---
title: "Tribal Consulting Program"
subtitle: "Introduction to Git and GitHub with R"
author: "_Yikun Zhang_"
date: ""
output:
  xaringan::moon_reader:
    css: ["uw.css", "fonts.css"]
    lib_dir: libs
    nature:
      highlightStyle: tomorrow-night-bright
      highlightLines: true
      countIncrementalSlides: false
      titleSlideClass: ["center","top"]
---

```{r setup, include=FALSE, purl=FALSE}
options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(comment = "##")
library(kableExtra)
```

# Outline

1. Introduction to Git

2. Introduction to GitHub

3. Basic Git Commands

4. Connect Git, GitHub, and RStudio
* Acknowledgement: Parts of these slides are adapted from the web-based resource "Happy Git and GitHub for the useR" by Jennifer Bryan.

* Note: Slides marked with (*) can be skipped for a high-level presentation.
---
class: inverse

# Part 1: Introduction to Git

---

# Why Do We Need Git?

When working on a research project or paper, alone or with collaborators, we may end up with a folder of files like this:

* draft1.docx
* Draft2.docx
* draft2_profcomments.docx
* draft_final.docx
* Simulation_v1.R
* Simulation_v2.R
* Simulation_v2_asdf.R
* Simulation_v3.R
* actualfinaldraft.tex
* actualfinaldraft.pdf
* paper_bib.bib
* Draft_comments.tex
* Draft_comments.pdf

$\hspace{5em} \vdots$

---

# Why Do We Need Git?

- Fortunately, [Git](https://git-scm.com/) can help us keep track of changes and comments without creating multiple files.

---

# Why Do We Need Git?

- In short, [Git](https://git-scm.com/) is a **version control system**.
--

- It was created by Linus Torvalds in 2005 for the Linux kernel.

--

- It was originally developed to help groups of developers work collaboratively on big software projects.

--

- It manages the evolution of a set of files called a **repository** in a highly structured way.

--

- Git-based projects can be hosted on online services such as [GitHub](https://github.com/), [GitLab](https://about.gitlab.com/), [Bitbucket](https://bitbucket.org/), etc.

- Online Git-based projects are similar to Google Drive or Dropbox but work better for coding-intensive projects and software development.

---

# Some Benefits of Using Git

1. Keep old versions of our files indefinitely (changes are tracked, so reverting to an earlier version is simple).

--

2. Work with multiple collaborators:
    - Git keeps track of who changed what and when they changed it.

--

3. Non-conflicting changes are merged automatically into the most recent version, and Git warns us about conflicts.
---

# Install Git - Mac/Linux

**Mac/Linux:** [Git](https://git-scm.com/) is probably pre-installed on these operating systems.

- Open the `Terminal` app and type `git --version` to confirm which version is installed and `which git` to find out where it is.
- If for some reason Git is not pre-installed, we can download it from [here](http://git-scm.com/downloads).
  * Ubuntu or Debian Linux: `sudo apt-get install git`
  * Fedora or RedHat Linux: `sudo yum install git`

- Other installation instructions can be found [here](https://happygitwithr.com/install-git).

---

# Install Git - Windows

**Windows:** Here is one of the two options listed [here](https://happygitwithr.com/install-git).

* Install [Git for Windows](https://gitforwindows.org/) (also known as `msysgit` or "Git Bash") to get Git in addition to the Bash shell.

* **Important note:** When asked about "Adjusting your PATH environment", make sure to select "Git from the command line and also from 3rd-party software".

* Note that RStudio for Windows prefers Git to be installed below `C:/Program Files`, and this appears to be the default. This implies, for example, that the Git executable on our Windows system is found at `C:/Program Files/Git/bin/git.exe`. Unless we have a specific reason to do otherwise, follow this convention.

Hopefully, after following the instructions above, we have a Git Bash shell. To check this, open our shell and type `echo $SHELL`. We should see something like `/usr/bin/bash`.

---
class: inverse

# Part 2: Introduction to GitHub

---

# Why GitHub?

[Git](https://git-scm.com/) is software that can be used locally.

- We don't need Internet access for proper version control on our own computer.

--

However, if we want to share our work or collaborate with others, we need to use an online repository hosting service.
- We target [GitHub](https://github.com/) - not [GitLab](https://about.gitlab.com/) or [Bitbucket](https://bitbucket.org/) - for the sake of specificity and popularity.

- This [blog post](https://radixweb.com/blog/github-vs-gitlab) discusses the pros and cons of GitHub vs. GitLab.

---

# Benefits of Using GitHub

1. Just as RStudio does for R, GitHub adds many user-friendly features to Git.
    * Effective collaboration and code distribution.
    * Allows others to comment on our code and suggest changes (pull requests/issues).
    * Builds our coding portfolio!

--

2. We can benefit from using R and RStudio with GitHub.
    - There is an active R package development community on GitHub.
    - RStudio has Git- and GitHub-related features.

---

# Special features of GitHub (*)

In addition to a well-designed user interface, GitHub offers two especially important features:

- **Issues:** A list of bugs, feature requests, to-dos, etc.
  - Issues are tightly integrated with email and allow us to copy/embed important conversations in the associated repo.
  - Issues can be assigned to people (e.g., to-dos) and tagged ("bug" or "progress-report").
  - Issues are tightly integrated with commits, so we can record which changes address them.
The Issues page of ggplot2.
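
As a small illustration of how issues and commits connect: GitHub recognizes closing keywords such as `fixes` followed by an issue number in a commit message, links the commit to that issue, and closes the issue once the commit reaches the default branch. The sketch below is only an illustration; the issue number `#12` and the file `notes.txt` are hypothetical, and the commands run in a throwaway local repository so nothing real is touched.

```bash
# Work in a throwaway repository (hypothetical example, not a real project)
cd "$(mktemp -d)"
git init -q
git config user.name "Firstname Lastname"
git config user.email "email@website.com"

# Hypothetical file and issue number, for illustration only
echo "Describe the data cleaning step" > notes.txt
git add notes.txt
git commit -q -m "Document the data cleaning step (fixes #12)"

# Show the message of the latest commit
git log -1 --format=%s
```

After pushing such a commit to GitHub, the issue page would show a reference to it.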
---

# Special features of GitHub (*)

In addition to a well-designed user interface, GitHub offers two especially important features:

- **Pull Requests:** A formal proposal for making changes to the main development branch.
  - Git allows a project to have multiple, independent branches of development, with the notion that some should eventually be merged back into the main development branch.
  - GitHub facilitates and preserves the discussion of the [pull requests](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests), holistically and line-by-line.

---

# Register A GitHub Account

Go to [https://github.com](https://github.com) and register a free account!

- GitHub can become a part of our professional portfolio, especially if we plan to do any coding as part of our career.

- So choose our username **carefully**! Some rules of thumb:
  * Incorporate our actual name to increase visibility and recognizability.
  * Reuse our username from other contexts, e.g., Twitter or Slack.
  * Shorter is better than longer.
  * Make it timeless (don't include the current affiliation, etc.).
  * All lowercase is recommended.

---

# Set Up Git (Optional) (*)

Run the following in our shell/terminal (filling in our own information):

```{bash, eval = FALSE}
git config --global user.name 'Firstname Lastname'
git config --global user.email 'email@website.com'
git config --global --list
```

**Note:** The email should be the one associated with our GitHub account!

We also suggest setting up our default editor. At some point, when we have a merge conflict, Git will drop us into an editor to resolve it. (For instance, we can use Sublime Text on a Mac.)
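
A minimal sketch of setting the default editor, assuming Sublime Text is installed and its `subl` command-line launcher is on the `PATH` (both are assumptions; substitute the command for whichever editor we prefer):

```bash
# Assumes the `subl` launcher exists on this machine.
# -n opens a new window; -w makes Git wait until the file is closed.
git config --global core.editor "subl -n -w"

# Print the value back to confirm it was saved
git config --global core.editor
```

The `-w` (wait) flag matters: without it, Git would continue before we finish editing the commit or merge message.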
Alternatively, we can set our Git username and email within R:

```{r eval=FALSE}
library(usethis)
use_git_config(user.name = "Firstname Lastname", user.email = "email@website.com")
```

Check out [Software Carpentry's guide](https://swcarpentry.github.io/git-novice/02-setup.html) for more examples of what line to run for different editor-operating system combinations.

---

# Install a Git Client (Optional) (*)

Sometimes it is useful to install a GUI for Git.

- For instance, [GitKraken](https://www.gitkraken.com/), which works on all common operating systems.

See [this webpage](https://happygitwithr.com/git-client.html) for Jenny Bryan's notes on various Git clients and why they are helpful.

We will discuss using Git with RStudio in more detail later. Here is a [summary guideline](https://jennybc.github.io/2014-05-12-ubc/ubc-r/session03_git.html).

---

# Create A New Repository on GitHub

1. Go to GitHub and click the green "New Repository" button.

2. Repository name: `git_intro` or similar, following the [naming conventions](https://github.com/bcgov/BC-Policy-Framework-For-GitHub/blob/master/BC-Gov-Org-HowTo/Naming-Repos.md) here.

3. Description: Whatever we want, but fill in something.

4. Public or Private.
    - GitHub offers free unlimited private repositories for all users.
    - Since 2020, the free plan also allows collaborators on private repositories (some advanced features remain limited to paid plans), making them a good place for our personal projects, for job applications, and for testing things out before making our project open source.

5. **Yes**, initialize the repository with a README.

6. Choose a license (optional).

---

# Add Collaborators to A GitHub Repo (*)

We can always add collaborators to our GitHub repos through their usernames or emails.
---
class: inverse

# Part 3: Basic Git Commands

---

# Commits, Diffs, and Tags

We now connect the fundamental concepts of Git to the data science workflow:

--

- **repository:** A repository or repo is just a directory of files that Git manages holistically.

--

- **commit:** A commit functions like a snapshot of all the files in the repo, at a specific moment.
  - We must also write a short **commit message** every time we make a commit.
  - Every commit needs a nickname to identify it. Git automatically assigns each commit what is called a SHA (named after the Secure Hash Algorithm used to compute it), a seemingly random string of 40 hexadecimal characters.
  - We can also designate certain snapshots as special with a **tag**, which is a name of our choosing.

---

# Commits, Diffs, and Tags
- **diff:** The set of differences between Version A and Version B of a file is called a "_diff_", and Git users contemplate diffs a lot.
  - We can inspect the diffs between any two commits and keep track of the **history** of our project.

---

# Run the Git Commands With the Shell (*)

The **shell** is a program on our computer that runs other programs. Sometimes, people use the terms `Terminal`, `Console`, `Command Line`, and `Shell` interchangeably.

We can also open a shell within RStudio:

* *Tools > Terminal > New Terminal* launches a shell within RStudio. This will often be sufficient for our purposes.

* **Windows:** Here, we focus on the **Git Bash** shell, which should have come with our install of Git for Windows.

* Confirm that RStudio detected the presence of Git Bash. This should have happened automatically if we installed in the default location, but check by going to *Tools > Global Options > Terminal* and making sure we see `Git Bash` under "_New terminals open with:_".

---

# Basic Shell Commands (*)

* `pwd` (**p**rint **w**orking **d**irectory). Shows the current working directory of our shell.

* `ls` (**l**i**s**t). Shows the files in the current working directory. Use `ls -a` to also list hidden files, such as `.Rhistory` and `.git`.

* `cd` (**c**hange **d**irectory). Allows us to navigate through our directories by changing the shell's working directory.

* We can use relative paths such as `..` (parent directory) and `.` (current directory).

* `~` always refers to our home directory.

* We can drag and drop a file or folder into the shell window to paste the absolute path.

* We can use `Tab` to autocomplete unambiguous directory and file names.

* Navigate previous commands with _arrow-up_ and _arrow-down_.

* Search for previous commands with `Ctrl + r`.

$\hspace{5em} \vdots$

---

# Basic Git Commands (*)

These commands are used within the Shell/Terminal, and we will show how to run some of these commands in RStudio as well.
* New local Git repo from a repo on GitHub:

```{bash, eval = FALSE}
git clone https://github.com/zhangyk8/git_intro.git
```

* `git remote -v` lists all remotes; use it to check that the remote was set up correctly after cloning.

* `git remote add origin GITHUB_URL` adds the remote `GITHUB_URL` with the nickname `origin`.

* `git add FILENAME` stages changes to the file and starts tracking the file if it isn't tracked already. Use `git add .` to stage all new and modified files in the current directory (but be careful with this!).

* `git commit -m "COMMIT MESSAGE"` commits all staged changes.

---

# Basic Git Commands (*)

* `git status` informs us of our current branch, any changes or untracked files, and whether we are in sync with our remotes.

* `git log` shows the commit logs.

* `git diff` shows changes between commits, between a commit and the working tree, etc.

* `git pull` pulls commits from the remote into our local copy.

* `git push` pushes commits from our local copy to the remote.

* `git fetch` downloads commits from the remote without merging them into our local branch.

* `git checkout` switches to a branch.

* `git init` creates an empty Git repo within our current directory.

The detailed documentation can be found at [https://git-scm.com/docs](https://git-scm.com/docs).

---

# Clone Our Git Repo from GitHub (*)

.middler-nocent[
1. Open a shell/Terminal window.

2. **Change our working directory to a new folder or desired folder.**

3. Clone the repo by entering (filling in the relevant info):

```{bash, eval = FALSE}
git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git
```

**Note:** We can get the direct link by navigating to our repo, clicking the green "<> Code" button, and copying the text from the "HTTPS" option.

4. Explore our repo by entering (filling in the relevant info):

```{bash, eval = FALSE}
cd YOUR-REPOSITORY
ls
head README.md
git remote show origin
git status
```
]

---

# Commit and Push A Local Change (*)

1. Edit our README from the shell with the following line:

```{bash, eval = FALSE}
echo "A line I wrote using the shell on my local computer" >> README.md
```

2. Check the status:

```{bash, eval = FALSE}
git status
```

3. Stage our changes and check the status:

```{bash, eval = FALSE}
git add .
git status
```

4. Commit, push, and check the status:

```{bash, eval = FALSE}
git commit -m "A commit from my local computer"
git push
git status
```

Now check the repository online at GitHub. What changed?

---

# Notes on GitHub Credentials (*)

When we interact with GitHub, we have to include credentials in the request.

- GitHub no longer accepts our account password when we talk to GitHub as a Git server. With HTTPS, we use a **personal access token (PAT)** instead.

- Alternatively, we can [set up keys for the SSH credential](https://happygitwithr.com/ssh-keys).

On [GitHub](https://github.com/), after signing in, we can manage our personal access tokens from [https://github.com/settings/tokens](https://github.com/settings/tokens).

- It is also reachable via "Settings > Developer settings > Personal access tokens". We can click on "Generate new token" here, or call `usethis::create_github_token()` from R.

---

# Notes on GitHub Credentials (*)
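
One way to avoid retyping the PAT on every push over HTTPS is a Git credential helper. The sketch below uses Git's built-in `cache` helper, which keeps credentials in memory for a limited time. This is only one option, and it is an assumption that it suits our system: the `cache` helper may not be available on Windows, where a credential manager bundled with Git for Windows is typically used instead.

```bash
# Keep credentials in memory for one hour (3600 seconds)
git config --global credential.helper "cache --timeout=3600"

# Print the value back to confirm the helper setting
git config --global credential.helper

# On the next `git push`, enter the GitHub username and paste the PAT
# when Git asks for the password.
```

After the timeout expires, Git simply asks for the username and PAT again.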
Read more detailed instructions [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens).

---

# Collaboration on Repo Through Branches (*)

Branching allows one or many people to work in parallel without overwriting each other's work.

- Create a new branch:

```{bash, eval=FALSE}
git branch BRANCH-NAME
# Switch to the new branch
git checkout BRANCH-NAME
```

- Once we have done our work and committed it to the feature branch, we can switch back to `main` and merge the feature branch.

```{bash, eval=FALSE}
git checkout main
git merge BRANCH-NAME
```

---

# Dealing With Conflicts in Merging (*)

Most of the time, the merge will go smoothly. However, if both of the branches we are merging changed the same part of the same file, we will get a merge conflict.

```{bash, eval=FALSE}
git merge BRANCH-NAME
# Auto-merging index.html
# CONFLICT (content): Merge conflict in index.html
# Automatic merge failed; fix conflicts and then commit the result.
```

- To resolve a merge conflict, we first determine which files are in conflict, which we can do with `git status`:

```{bash, eval=FALSE}
git status
# On branch main
# You have unmerged paths.
#   (fix conflicts and run "git commit")
#
# Unmerged paths:
#   (use "git add