---
title: "Git and GitHub"
author: "Dr. Hua Zhou"
date: "Jan 16, 2018"
output:
ioslides_presentation: default
subtitle: Biostat M280
bibliography: ../bib-HZ.bib
---
## {.flexbox .vcenter}
> If it's not in source control, it doesn't exist.
## Collaborative research
Statisticians, as opposed to closet mathematicians, rarely do things in vacuum.
- We talk to scientists/clients about their data and questions.
- We write code (a lot!) together with team members or coauthors.
- We run code/program on different platforms.
- We write manuscripts/reports with co-authors.
- We distribute software so potential users have access to your methods.
## {.flexbox .vcenter}
> In every project you have at least one other collaborator, **future-you**. You don’t want future-you to curse past-you.
>
> Hadley Wickham
## Why version control?
- A centralized repository helps coordinate multi-person projects.
- Time machine. Keep track of all the changes and revert back easily (reproducible).
- Storage efficiency.
- Synchronize files across multiple computers and platforms.
- [GitHub](github.com) is becoming a de facto central repository for open source development. E.g., all packages in Julia are distributed through GitHub; [Hadley Wickham](http://r-pkgs.had.co.nz) also recommends Git/GitHub as the best practices for R package development.
- Advertise yourself through GitHub.
## Available version control tools
- Open source: Git, Apache subversion (aka svn), cvs, mercurial.
- Proprietary: Visual SourceSafe (VSS), etc.
- Dropbox? Mostly for file backup and sharing, limited version control (1 month?).
We use Git in this course.
## Git
- Currently the most popular version control system according to [Google Trend](https://trends.google.com/trends/explore?date=all&q=%2Fm%2F05vqwg,%2Fm%2F012ct9,%2Fm%2F08441_,%2Fm%2F08w6d6,%2Fm%2F09d6g&hl=en-US&tz=&tz=).
- Initially designed and developed by [Linus Torvalds](http://en.wikipedia.org/wiki/Linus_Torvalds) in 2005 for Linux kernel development. Git is the British English slang for unpleasant person.
> I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.
>
> Linus Torvalds
## Centralized version control
Svn is a **centralized** version control system:
## Distributed version control
Git is a **distributed** version control system:
## What do I need to use Git?
- A **Git server** enabling multi-person collaboration through a centralized repository.
- [github.com](github.com): unlimited public repositories, private repositories costs $, but unlimited private repositories for free from [Student Developer Pack](https://education.github.com/pack).
- [bitbucket.org](bitbucket.org): unlimited public repositories, unlimited private repositories for academic account (register for free using your edu email).
- We use github.com in this course for developing and submitting homework.
----
- A **Git client** on your own machine.
- Linux: shipped with many Linux distributions, e.g., Ubuntu. If not, install using a package manager, e.g., `yum install git` on CentOS.
- Mac: install by `port install git` or other package managers.
- Windows: [GitHub Desktop](https://desktop.github.com) (GUI), [TortoiseGIT](https://tortoisegit.org) (is this good?).
- Don't totally rely on GUI or IDE. Learn to use Git on command line, which is needed for cluster and cloud computing.
## Git workflow
## Git survival commands {.smaller}
- Synchronize local Git directory with remote repository:
```{bash, eval=FALSE}
git pull
```
same as `git fetch` plus `git merge`.
- Modify files in local working directory.
- Add snapshots to staging area:
```{bash, eval=FALSE}
git add FILES
```
- Commit: store snapshots permanently to (local) Git repository
```{bash, eval=FALSE}
git commit -m "MESSAGE"
```
- Push commits to remote repository:
```{bash, eval=FALSE}
git push
```
## Git basic usage {.smaller}
- Register for an account on a Git server, e.g., github.com.
- Upload your SSH public key to the server.
- Identify yourself at local machine, e.g.,
```{bash, eval=FALSE}
git config --global user.name "Your Name"
git config --global user.email "your_email@ucla.edu"
```
Name and email appear in each commit you make.
- Initialize a project.
- Create a repository, e.g., `biostat-m280-2018-winter` on the server.
- Clone the repository to your local machine:
```{bash, eval=FALSE}
git clone git@github.com:biostat-m280-2018-winter.git
```
## {.smaller}
- Working with your local copy.
- `git pull`: update local Git repository with remote repository (fetch + merge).
- `git log FILENAME`: display the current status of working directory.
- `git diff`: show differences (by default difference from the most recent commit).
- `git add file1 file2 ...`: add file(s) to the staging area.
- `git commit`: commit changes in staging area to Git directory.
- `git push`: publish commits in local Git repository to remote repository.
- `git reset --soft HEAD~1`: undo the last commit.
- `git checkout FILENAME`: go back to the last commit, discarding all changes made.
- `git rm FILENAME`: remove files from git control.
## Branching in Git
----
- For this course, you need to have two branches:
- `develop` for your own development.
- `master` for releases, i.e., homework submission.
- Note `master` is the default branch when you initialize the project; create and switch to `develop` branch immediately after project initialization.
----
- Commonly used commands:
- `git branch branchname`: create a branch.
- `git branch`: show all project branches.
- `git checkout branchname`: switch to a branch.
- `git tag`: show tags (major landmarks).
- `git tag tagname`: create a tag.
## Sample session | Getting started with homework
- Obtain [Student Developer Pack](https://education.github.com/pack).
- Create a private repository `biostat-m280-2018-winter`. Add `Hua-Zhou` and `juhkim111` as your collaborators with write permission.
## {.smaller}
On your local machine:
- Clone the repository, create `develop` branch, where your work on solutions.
```{bash, eval=FALSE}
# clone the project
git clone git@github.com:biostat-m280-2018-winter.git
# enter project folder
cd biostat-m280-2018-winter
# what branches are there?
git branch
# create develop branch
git branch develop
# switch to the develop branch
git checkout develop
# create folder for HW1
mkdir hw1
cd hw1
# let's write solutions
echo "sample solution" > hw1.Rmd
echo "some bug" >> hw1.Rmd
# commit the code
git add hw1.Rmd
git commit -m "start working on problem #1"
# push to remote repo
git push
```
## {.smaller}
- Submit and tag HW1 solution to master branch.
```{bash, eval=FALSE}
# which branch are we in?
git branch
# change to the master branch
git checkout master
# merge develop branch to master branch
# git pull origin develop
git merge develop
# push to the remote master branch
git push
# tag version hw1
git tag hw1
git push --tags
```
- RStudio has good Git integration. But practice command line operations also.
## Etiquettes of using Git {.smaller}
- Be judicious what to put in repository.
- Not too less: Make sure collaborators or yourself can reproduce everything on other machines.
- Not too much: No need to put all intermediate files in repository. Make good use of the `.gitignore` file.
- Strictly version control system is for source files only. E.g. only `xxx.Rmd`, `xxx.bib`, and figure files are necessary to produce a pdf file. Pdf file doesn't need to be version controlled or, if version controlled, doesn't need to be frequently committed.
- Commit early, commit often and don't spare the horses.
- Adding an informative message when you commit is not optional. Spending one minute on commit message saves hours later for your collaborators and yourself. Read the following sentence to yourself 3 times:
**Write every commit message like the next person who reads it is an axe-wielding maniac who knows where you live.**