{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# R2. Git/GitHub tips and traps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the purposes of this material, it is assumed that you have read through [Lesson 2](../../lessons/02/index.rst). We'll start by reviewing some of these concepts and then cover some simple tricks that prevent a whole host of headaches when using GitHub within your groups. The bulk of this lesson, however, will build on these materials. Specifically, we cover **merging**, including using [jupytext](https://github.com/mwouts/jupytext) to deal with the oddities of performing these operations on notebooks. If you are looking for a more in-depth tutorial on various Git concepts, see this tutorial from [Software Carpentry](http://swcarpentry.github.io/git-novice/). The goal for today is to get you comfortable with Git.\n", "\n", "We will also do a quick tutorial of GitHub with Google Colab and talk about the beauty and dangers of this feature.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple practices can save you from huge pains\n", "\n", "Git is extremely useful and powerful, but it can cause serious annoyances if not used carefully. Follow Justin's suggestion from [homework 1](../../homework/01/hw1.4.ipynb): \"it is wise to commit and push often.\" And from [Lesson 2](../../lessons/02/version_control_with_git.ipynb), pull frequently and \"always `git pull` when you start working.\" These practices will go a long way toward preventing clashes where two team members have unknowingly edited the same file at the same time. **Even if you feel that a particular piece of work is not yet finished, commit and push it any time you step away from it.** It is much better for a teammate to look at work you might feel insecure about due to its incompleteness than to have a horrible merge issue later on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the sandbox exceedingly liberally\n", "\n", "The simplest and easiest way to avoid clashing with group members' work is to use the sandbox provided in your GitHub repository extensively. When doing this, each member can work on a differently named file. For example, I might work on `hw3.2_ar.ipynb`, where the `ar` signifies my initials, while my teammate Liana works on `hw3.2_lm.ipynb`. Each team member can have their own version of various pieces of the homework for scratch work. The key reason to do this is that **you might not know when a team member is working on the same file**. If two people are working on the same file at the same time, a **merge conflict** is all but certain in the near future. One will push to the repository first, leaving changes that the other does not have in the file they are working with. This means that if the second forces a push, their changes will be kept at the expense of the other's. Without additional tools, merging notebooks with disparate changes is far from trivial. 
Instead, to avoid all this, both members can work on their own files in the sandbox, discuss various parts, and combine and edit a final submission when the time comes.\n", "\n", "Note that if you open two Jupyter notebooks next to each other in JupyterLab, it is quite easy to create a combined version of a document. This is much better than repeatedly overwriting each other's work. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use `checkout` to avoid destroying others' work\n", "\n", "When you use separate files for each team member in the sandbox, you will often want to look at and build on the work of your teammates. This means that you will often pull material from the repo and open the notebooks of others. If you run such a notebook and poke around in it, you may make a \"change\" in the document that you do not intend to. If you commit and push this change, you might cause a merge conflict with changes the owner of this file has made since you last pulled. If you insist on pushing, you might destroy another's work. If you check `git status` and see a change in such a file that you don't want to push, use `git checkout FILE_NAME_HERE` to discard your unintended changes and prevent later merge conflicts. Your teammates will thank you!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configuration & basics\n", "\n", "If you haven't done so already, make sure to configure your Git parameters. Namely, you must set your name and contact information. Ignore this step if you already did it when setting up for the class.\n", "\n", "```bash\n", "git config --global user.name \"YOUR NAME\"\n", "git config --global user.email \"YOUR EMAIL ADDRESS\"\n", "```\n", "\n", "Naturally, you fill in the name and email with your own details.\n", "\n", "For the purposes of this exercise, we will be playing around with a repository set up specifically for practice. 
Justin recommends storing cloned and forked GitHub repos in a folder called `git` in your home directory. I will assume this convention for this tutorial. For today, please fork the repository located at https://github.com/AnkitaRoychoudhury/gitbrachmerge. You can do this in the browser: navigate to the repository on GitHub and click the 'Fork' button at the upper right of the page. Choose to fork it to your personal GitHub account, *not* the BE/Bi 103 organization, to avoid clutter there. Once you have made this fork, clone the repository by copying its URL and performing\n", "\n", "```bash\n", "cd ~/git\n", "git clone URL_YOU_COPIED\n", "```\n", "\n", "You should additionally already be familiar with the following commands from [Lesson 2](../../lessons/02/version_control_with_git.ipynb):\n", "\n", "| Command | Description |\n", "|------------------------------|----------------------------------------------------------------|\n", "| git status | Show which files in your working directory are modified or staged for the next commit |\n", "| git add | Stage the files you want to include in your next commit |\n", "| git commit -m \"your_message\" | Use this to commit with a descriptive message |\n", "| git push origin main | Push the commit onto remote repo (on GitHub) |\n", "| git pull | Retrieve files from remote repo (on GitHub) |\n", "\n", "For additional help with these commands and their uses (as well as a bunch of other useful commands), see [this cheat sheet](https://education.github.com/git-cheat-sheet-education.pdf) from GitHub. This very handy reference shows all the commands you are likely to need and then some. The most common ones are summarized above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Syncing your forked repository to the upstream repository\n", "\n", "For this course, you will do all your work with your teammates in a single repo. 
You will not need to deal with forking repos, but for today, this is the easiest way to demonstrate merge clashes and how to deal with them. You want to be able to sync your repository with my `gitbrachmerge` repository so you can retrieve any updates in it. The original repository is typically called the **upstream repository**: your fork derives from it and changes it, so you are downstream. You want the upstream repository to be a **remote** repository, which is simply a repository we track, fetch from, and merge from. To see which repositories are remote, do\n", "\n", "    git remote -v\n", "\n", "The `-v` just means \"verbose,\" so it will also tell you the URLs. Entering that now will show a single repository, `origin`, which you can fetch from and push to. In your case, `origin` is your fork of the `gitbrachmerge` repository.\n", "\n", "We now want to add the upstream repository. To do this, copy the URL of my `gitbrachmerge` repository and do:\n", "\n", "    git remote add upstream https://github.com/AnkitaRoychoudhury/gitbrachmerge.git\n", "\n", "Now try doing `git remote -v`, and you will see that you are now also tracking the upstream repository.\n", "\n", "Now, when you want to pull from the upstream repository, you do\n", "\n", "    git pull upstream master\n", "\n", "This will pull in all the changes from the upstream repository. If you want to pull in changes to your own forked repository, it's still just\n", "\n", "    git pull\n", "\n", "which, in a freshly cloned repository whose default branch is `master`, is shorthand for\n", "\n", "    git pull origin master" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merging\n", "\n", "### If possible, avoid merging\n", "\n", "The best way to merge is to avoid merging in the first place. This is doubly true when working with Jupyter notebooks. The simple practices above are meant to help avoid the need to merge. 
Here are some more excellent tips on avoiding the need to merge, from a [Software Carpentry](http://swcarpentry.github.io/git-novice/09-conflict/index.html) tutorial. These include some more advanced features of Git than we've covered today, like branches, but they are nevertheless good advice, especially the project management strategies:\n", "\n", ">Git’s ability to resolve conflicts is very useful, but conflict resolution costs time and effort, and can introduce errors if conflicts are not resolved correctly. If you find yourself resolving a lot of conflicts in a project, consider these technical approaches to reducing them:\n", ">\n", ">- Pull from upstream more frequently, especially before starting new work\n", ">- Make smaller more atomic commits\n", ">- Where logically appropriate, break large files into smaller ones so that it is less likely that two authors will alter the same file simultaneously\n", ">\n", ">Conflicts can also be minimized with project management strategies:\n", ">\n", ">- Clarify who is responsible for what areas with your collaborators\n", ">- Discuss what order tasks should be carried out in with your collaborators so that tasks expected to change the same lines won’t be worked on simultaneously\n", ">- If the conflicts are stylistic churn (e.g. tabs vs. spaces), establish a project convention that is governing and use code style tools (e.g. htmltidy, perltidy, rubocop, etc.) to enforce, if necessary\n", "\n", "Daisie Huang and Ivan Gonzalez (eds): \"Software Carpentry: Version\n", "Control with Git.\" Version 2016.06, June 2016,\n", "https://github.com/swcarpentry/git-novice, 10.5281/zenodo.57467." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Merging with simple text files\n", "\n", "#### Exercise 1\n", "\n", "We will first play around with the simplest case of merges by dealing with text files. In the repository you forked, there is a text file called `text_merge/edit_me.txt`. We will play around with editing this file concurrently with two people and deal with the resulting merge issues. Open this file in your favorite text editor and add a few lines. This could be accomplished, for example, with\n", "\n", "```bash\n", "cd ~/git/gitbrachmerge/text_merging/\n", "vim edit_me.txt\n", "\n", "# OR double-click 'edit_me.txt' to open it in a new Jupyter window\n", "```\n", "\n", "When you are done editing the file, save it and exit. (In `vim`, tap `i` for insert, then navigate with arrow keys and make your edits. When done, tap `ESC`, then type `:wq`, for write & quit, then `Enter`. If you make a mistake and want to bail without saving, use `ESC`, then `:q!`, `Enter`; a plain `:q` refuses to quit while there are unsaved changes.)\n", "\n", "In the meantime, I have also edited this document, committed, and pushed it to the `master` branch of the upstream repo. Commit your own edits first (`git add edit_me.txt`, then `git commit -m \"your_message\"`), since Git refuses to merge into a file that has uncommitted changes. If you then pull from the upstream `master` branch, Git will try to merge my changes with yours. Try this now:\n", "\n", "```\n", "git pull upstream master\n", "```\n", "\n", "If we didn't edit the same lines, Git should be able to automerge the `edit_me.txt` file. But if we did make edits to the same lines, Git doesn't know which version to keep, yours or mine! **To handle this merge, you need to manually edit the document and deal with the discrepancies.** Otherwise, you'll end up publishing material with ugly merge issues." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`<<<<<<" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A final note on commit messages\n", "\n", "Git really comes into its own when managing projects larger than your homework assignments in this course. One key ingredient of good version control in this context is writing informative commit messages. 
Justin stressed in [Lesson 0](../../lessons/00/intro_to_jupyterlab.ipynb) that comments in code should focus on _why_, not _how_. The important information is _why_ a design choice was made and what its context was, not the implementation details (the _how_), which should be fairly self-evident from good self-documenting code. Future You is the most important reader.\n", "\n", "All of these same thoughts apply to commit messages. Their goal is to give context and explain why, not how, changes were made. I highly recommend [this](https://chris.beams.io/posts/git-commit/) post on how to write good commit messages. This may sound like a silly thing to harp on, so read that blog post for more insight on why your `git log --oneline` should never look like [this](https://xkcd.com/1296/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*This recitation was originally written by Muir Morrison and was added to by Julian Wagner and Ankita Roychoudhury.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }