--- title: Branch and Merge author: "Eric C. Anderson" output: html_document: toc: yes bookdown::html_chapter: toc: no layout: default_with_disqus --- # Branching and Merging {#branch-and-merge} ## What the heck is a branch? {#whats-a-branch} * Note that the motto for git is "_local branching on the cheap_," so branches must be important ![branching on the cheap](diagrams/git.png) * A branch is a "lightweight pointer to a commit"...What? * Let's refresh our memory on what commits, trees, and blobs are. ### Data structure within the .git folder * _Note: the next four diagrams are from Scott Chacon's [ProGit](http://git-scm.com/book/en/) book and are used under the [CC Noncommercial 3.0 Attribution license](https://creativecommons.org/licenses/by-nc/3.0/us/). I took them from [here](https://github.com/progit/progit/tree/master/figures)_ ![A commit](diagrams/18333fig0301-tn.png) * Contents of (different versions of) files in your repository are stored in blobs * (Sub)directory structure (locations of files in your repository) stored in trees + The tree and the blobs referenced in it constitute the "snapshot" of the repository. * Commits: for a given snapshot, points to a tree, has a comment, an author, a committer, __and__ + Unless it is the first commit, it has a pointer to a parent. --- --- ### A chain or sequence of commits * Commits form a chain by connecting to their parents. ![A commit](diagrams/18333fig0302-tn.png) * The arrows between commits point to parents. Thus the later commits are on the right, and the earlier commits on the left. * This is critical: every commit knows where it came from (the previous commit). * Just as a genealogy (i.e., links to parents) gives your family history, so too do the commits _and the links from each commit to its parent(s)_ constitute the __version history__ of your repository. --- --- ### Easy access to commits * Commits would not be useful if they were just stored in the .git directory of your repository with no way to access them. * Branches are how you access commits so that you can: + Check them out (be able to access their contents) + Modify the contents and make a new commit * They are "lightweight" in that they don't take up much space (each branch is just a file with the sha-1 hash of a commit in it) * Upon initialization, every repository gets a branch called _master_ ![A commit](diagrams/18333fig0303-tn.png) * It is customary to let _master_ be where your _stable_, deployable code is kept. --- --- ### What it means to be _on a branch_ * If you are on a branch, (say _master_ for the sake of argument) that means that 1. There is a commit pointed to by _master_. (Let's say it is 8e334ab3) 2. Any changes you make, __when staged and committed__ will create a new commit whose parent (in this case) is 8e334ab3. * In git, if you are on a branch, then the HEAD points to that branch. ![A commit](diagrams/18333fig0305-tn.png) * Hey, there are two branches pointing to the same commit! + That is just fine. You can have as many branches as you want pointing to a commit. + Question. If changes are committed now, what happens to the two branches? This is crucial... ## Creating and using branches {#creating-and-using-branches} ### Creating a branch called "testing" * In the shell, when you type ```{r, eval=FALSE} git branch some-name ``` git will make a new branch for you named `some-name`. * __It will point to whatever commit HEAD currently points to.__ (i.e. the branch that you are currently on.) * Let's do this together. 1. Create a new rstudio project with version control: + File -> New Project -> New Directory -> Empty Project + Let the directory name be `branchy` and make it a subdirectory of Desktop. + _Important:_ check the box to __"create a git repository"__ 2. At this point there are no commits. Stage `.gitignore` and `branchy.Rproj` and commit them. 3. Look at the "diff" / History window, and notice that there is one commit and _HEAD_ and _master_ are at it: ![rstudio git hist](diagrams/rs-git-hist-1.png) 4. Let's "move _master_ forward" by making a new .Rmd file called `simple.Rmd` and committing it. Now our history looks like: ![rstudio git hist](diagrams/rs-git-hist-2.png) 5. Now, let's make a new branch called _testing_. In the shell (in the correct directory): ```{r, eval = FALSE} git branch testing ``` * Notice that RStudio's git-history shows us that _HEAD_, _master_, and _testing_, all now point to the same commit. (__You might have to hit the refresh button__). They all have the same parent and history too. ![branches](diagrams/testy-master.png) * __However__ the visual there doesn't tell us which branch we are one (where _HEAD_ is attached.) + No worries, upper right of the git pane tells us. We should still be on _master_ because we have not "checked-out" the _testing_ branch yet. ![branch indicator](diagrams/branch-indicator.png) ### Checking out a branch * If we want to be _on a branch_ we need to "check that branch out". Either: ```{r, eval=FALSE} git checkout testing ``` in the shell, or use the dropdown menu by clicking on the __Branch Indicator__ in RStudio. * In this case nothing exciting happened because _testing_ and _master_ both point to the same commit. * If _testing_ had pointed to a different commit, the state of the files in your working area might have changed. * What if you had uncommitted changes on some of the files that changed?!! + Thankfully, git does not let you overwrite unstored changes when you switch branches. (More on this later.) ### Make some changes on the _testing_ branch 1. Make sure you are the _testing_ branch 1. Add some text to the bottom of simple.Rmd 2. Stage that file and commit it 3. Don't stage the "simple.html" file. ### Make __some more__ changes to the _testing_ branch 1. Add some more text to the bottom. Stage and commit. 2. Don't stage the html file. ### Make even more changes to the _testing_ branch 1. Modify the `.gitignore` file so git ignores all files that end in `.html` in your repo. ### Now look at your commit history * Your _testing_ branch has left _master_ in the dust! ![master in the dust](diagrams/master-in-the-dust.png) * What if, after all of this, we decide that our _testing_ branch is good stuff and we want to add the changes in _testing_ to _master_ so we can deploy it, etc? * This is where merging comes in. ## Merging branches {#merging-branches} The basic idea of a merge: 1. Switch to the branch that you want to add the stuff into 2. Tell it to merge in changes from another branch (or branches) * Sometimes this is easy (_fast-forward merges_) * Sometimes it is more complicated, if both branches have changed from their common parent. But, git takes care of most of the details ### Simplest case = Fast-forward merge * To merge _testing_ into _master_ is simple because _master_ is a __direct__ ancestor of _testing_. * Here is how you do it: 1. Switch to the master branch (best to make sure your working area is clean) 2. In the shell (in a directory within the `branchy` repo) type this: ```{r, eval=FALSE} git merge testing ``` * git should tell you that it must made a fast-forward merge: ``` Updating 9f5b2cf..51503c5 Fast-forward .gitignore | 1 + simple.Rmd | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) ``` ### Look at your history diagram again * You might have to hit the "Refresh" button in the "RStudio: Review Changes" window. * See that all that happened is that the _master_ branch has been brought up to where the _testing_ branch is. * __No new commits were generated__ ![fast-forward-result](diagrams/fast-forward-result.png) ### We can delete _testing_ now if we want _master_ points to the same place, so it is redundant. ``` git branch -d testing ``` If a branch is not redundant, git won't let you delete it with the `-d` option (have to use `-D`, but be careful you don't lose track of work that you would like to merge in at some point.) ### Harder case, both branches in the merge have moved forward * Requires a "3-way-merge" + 3 commits involved: the two branches and their common ancestor. * Git takes care of the difficulties here (finding the "merge base", etc.) * This creates a new commit. This is as it must and should be! ### Let's create the need for a 3-way merge Note, through all of this, __do not__ be amending your previous commits. 1. Make a new branch called _changes-on-bottom_ and check it out. + Note: if you want to simultaneously create a new branch named _changes-on-bottom_ __and__ check it out at the current HEAD location. You can just do this: ```{r, eval=FALSE} git checkout -b changes-on-bottom ``` 2. Add stuff to `simple.Rmd` to the bottom of the file. Commit. 3. Delete some of the stuff you just added. Commit. 4. Add even more to the bottom of the file. Commit. 4. Now checkout _master_ to make changes on the _master_ branch. Notice that your file has been "rolled back" to its state on the _master_ branch. 5. Add some stuff to the **top** part of the file. commit. 6. Remove half of what you just added. Commit. 7. Add some more to the top. Commit. ### Now, look at your history * Be sure to choose "all branches" to the right of the "History" button: ![needs-3-way-merge](diagrams/needs-3-way-merge.png) * __It sort of looks like _master_ is ahead of _changes-on-bottom___ but it isn't. * They are diverging _independently_ ### Will we be able to merge these? * Both branches change the same file, __BUT__, * That is not a problem _unless the same line in the same file has been changed_ in the different branches being merged. ### Let's merge them __into__ the _master_ branch * The command is the same: + Make sure you are on the _master_ branch + Use the `git merge` command: ```{r, eval=FALSE} git merge changes-on-bottom ``` * It will pop up an editor for you to write a commit message + This might fail if you don't have an editor set up. + If you get emacs and are unfamiliar with it, this will take some explanation. * git expects you to write a commit message because this is a non-trivial merge---it has to make a new commit because it is not equivalent to any previous commit (unlike in the fast-forward case). + git provides a basic message (pre-written) in the editor window. + all you have to do is save the file and that message will be commit message. * in the `nano` editor, `^X` (cntrl-x) is the key sequence to exit the file. * if you haven't modified the file at all, hitting `^X` should just save the file and exit. * if you have modified the file `nano` will ask if you want to save it (say Yes), and then hit return to save it with the filename recommended (and, actually, required, by git). ### Now, check out your history ![after the merge](diagrams/after-the-merge.png) * There is a new commit in there and it has two parents. * Whoa! The _changes-on-bottom_ branch __did not__ move forward. * You could delete _changes-on-bottom_ now, OR you could continue to modify it (it won't have the changes you did on _master_) and merge those in later. Let that me a fun weekend mission for the motivated...explore what happens. ## For next git session {#for-next-time} * When git won't let you checkout a branch + Stashing * Merge conflicts, and resolving them * Remotes ### If we have time today * It is really fun to browse the version history for [Hadley's advanced R book](https://github.com/hadley/adv-r.git). + For fun: make a new project that is a clone of it and look at its history and try to understand it.