--- title: "Git and GitHub" author: "Dr. Hua Zhou @ UCLA" date: "Jan 15, 2019" output: # ioslides_presentation: default html_document: toc: true toc_depth: 4 bibliography: ../bib-HZ.bib --- ## If it's not in source control, it doesn't exist.

## Collaborative research Statisticians, as opposed to closet mathematicians, rarely do things in vacuum. - We talk to scientists/clients about their data and questions. - We write code (a lot!) together with team members or coauthors. - We run code/program on different platforms. - We write manuscripts/reports with co-authors. - We distribute software so potential users have access to your methods. > In every project you have at least one other collaborator, **future-you**. You don’t want future-you to curse past-you. > > Hadley Wickham ## Why version control? - A centralized repository helps coordinate multi-person projects. - Time machine. Keep track of all the changes and revert back easily (reproducible). - Storage efficiency. - Synchronize files across multiple computers and platforms. - [GitHub](github.com) is becoming a de facto central repository for open source development. E.g., all packages in Julia are distributed through GitHub; [Hadley Wickham](http://r-pkgs.had.co.nz) also recommends Git/GitHub as the best practices for R package development. - **Advertise** yourself through GitHub. - AmStat article: [What Is Data Science Specialization, and What Can It Do for You?](http://magazine.amstat.org/blog/2018/01/01/data-science-mooc/) > What should an employer look for when they see a certification on a résumé? > > For our program, and likely data science in general, they should look at the applicant’s GitHub page. They should see interesting project and code contributions. ## Available version control tools - Open source: Git, Apache subversion (aka svn), cvs, mercurial. - Proprietary: Visual SourceSafe (VSS), etc. - Dropbox? Mostly for file backup and sharing, limited version control (1 month?). We use Git in this course. ## Git - Currently the most popular version control system according to [Google Trend](https://trends.google.com/trends/explore?date=all&q=%2Fm%2F05vqwg,%2Fm%2F012ct9,%2Fm%2F08441_,%2Fm%2F08w6d6,%2Fm%2F09d6g&hl=en-US&tz=&tz=). - Initially designed and developed by [Linus Torvalds](http://en.wikipedia.org/wiki/Linus_Torvalds) in 2005 for Linux kernel development. Git is the British English slang for unpleasant person.
> I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'. > > Linus Torvalds
## Centralized version control Svn is a **centralized** version control system:

## Distributed version control Git is a **distributed** version control system:

## What do I need to use Git? - A **Git server** enabling multi-person collaboration through a centralized repository. - [github.com](github.com): unlimited public repositories, private repositories costs $, but unlimited private repositories for free from [Student Developer Pack](https://education.github.com/pack). - [bitbucket.org](bitbucket.org): unlimited public repositories, unlimited private repositories for academic account (register for free using your edu email). - We use github.com in this course for developing and submitting homework. ---- - A **Git client** on your own machine. - Linux: shipped with many Linux distributions, e.g., Ubuntu. If not, install using a package manager, e.g., `yum install git` on CentOS. - Mac: install by `port install git` or other package managers. - Windows: [Git for Windows](https://gitforwindows.org) (GUI). - RStudio supports Git. - Do **not** totally rely on GUI or IDE. Learn to use Git on command line, which is needed for cluster and cloud computing. ## Git workflow

## Git survival commands {.smaller} - Synchronize local Git directory with remote repository: ```{bash, eval=FALSE} git pull ``` same as `git fetch` plus `git merge`. - Modify files in local working directory. - Add snapshots to staging area: ```{bash, eval=FALSE} git add FILES ``` - Commit: store snapshots permanently to (local) Git repository ```{bash, eval=FALSE} git commit -m "MESSAGE" ``` - Push commits to remote repository: ```{bash, eval=FALSE} git push ``` ## Git basic usage - Register for an account on a Git server, e.g., github.com. - Upload your SSH public key to the server. - Identify yourself at local machine, e.g., ```{bash, eval=FALSE} git config --global user.name "Your Name" git config --global user.email "your_email@ucla.edu" ``` Name and email appear in each commit you make. - Initialize a project. - Create a repository, e.g., `biostat-m280-2019-winter` on the server. - Clone the repository to your local machine: ```{bash, eval=FALSE} git clone git@github.com:[USERNAME]/biostat-m280-2019-winter.git ``` - Working with your local copy. - `git pull`: update local Git repository with remote repository (fetch + merge). - `git log FILENAME`: display the current status of working directory. - `git diff`: show differences (by default difference from the most recent commit). - `git add file1 file2 ...`: add file(s) to the staging area. - `git commit`: commit changes in staging area to Git directory. - `git push`: publish commits in local Git repository to remote repository. - `git reset --soft HEAD~1`: undo the last commit. - `git checkout FILENAME`: go back to the last commit, discarding all changes made. - `git rm FILENAME`: remove files from git control. ## Branching in Git

---- - For this course, you need to have two branches: - `develop` for your own development. - `master` for releases, i.e., homework submission. - Note `master` is the default branch when you initialize the project; create and switch to `develop` branch immediately after project initialization.

---- - Commonly used commands: - `git branch branchname`: create a branch. - `git branch`: show all project branches. - `git checkout branchname`: switch to a branch. - `git tag`: show tags (major landmarks). - `git tag tagname`: create a tag. ## Sample session | Getting started with homework On [GitHub.com](https://github.com): - Obtain [Student Developer Pack](https://education.github.com/pack). - Create a private repository `biostat-m280-2019-winter`. Add `Hua-Zhou` and `juhkim111` as your collaborators with write permission. On your local machine: - Clone the repository, create `develop` branch, where your work on solutions. ```{bash, eval=FALSE} # clone the project git clone git@github.com:biostat-m280-2019-winter.git # enter project folder cd biostat-m280-2019-winter # what branches are there? git branch # create develop branch git branch develop # switch to the develop branch git checkout develop # create folder for HW1 mkdir hw1 cd hw1 # let's write solutions echo "sample solution" > hw1.Rmd echo "some bug" >> hw1.Rmd # commit the code git add hw1.Rmd git commit -m "start working on problem #1" # push to remote repo git push ``` - Submit and tag HW1 solution to master branch. ```{bash, eval=FALSE} # which branch are we in? git branch # change to the master branch git checkout master # merge develop branch to master branch # git pull origin develop git merge develop # push to the remote master branch git push # tag version hw1 git tag hw1 git push --tags ``` - RStudio has good Git integration. But practice command line operations also. ## Etiquettes of using Git - Be judicious what to put in repository. - Not too less: Make sure collaborators or yourself can reproduce everything on other machines. - Not too much: No need to put all intermediate files in repository. Make good use of the `.gitignore` file. - Strictly version control system is for source files only. E.g. only `xxx.Rmd`, `xxx.bib`, and figure files are necessary to produce a pdf file. Pdf file doesn't need to be version controlled or, if version controlled, doesn't need to be frequently committed. - Commit early, commit often and don't spare the horses. - Adding an informative message when you commit is not optional. Spending one minute on commit message saves hours later for your collaborators and yourself. Read the following sentence to yourself 3 times: **Write every commit message like the next person who reads it is an axe-wielding maniac who knows where you live.**