{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# An interactive Git Tutorial: the tool you didn't know you needed\n", "\n", "## From personal workflows to open collaboration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** this tutorial was closely modeled on, and therefore owes a lot to, the excellent materials offered in:\n", "\n", "- \"Git for Scientists: A Tutorial\" by John McDonnell (no link, as this tutorial seems to have disappeared from the internet).\n", "- Emanuele Olivetti's lecture notes and exercises from the G-Node summer school on [Advanced Scientific Programming in Python](https://python.g-node.org/wiki/schedule).\n", "\n", "In particular I've reused the excellent images from the [Pro Git book](http://git-scm.com/book) that John had already selected and downloaded, as well as some of his outline. But this version of the tutorial aims to be 100% reproducible by being executed directly as an IPython notebook, and it is itself hosted on GitHub so that others can more easily improve it by collaborating there. Many thanks to John and Emanuele for making their materials available online.\n", "\n", "After writing this document, I discovered [J.R. Johansson](https://github.com/jrjohansson)'s [tutorial on version control](http://nbviewer.ipython.org/urls/raw.github.com/jrjohansson/scientific-python-lectures/master/Lecture-7-Revision-Control-Software.ipynb), which is also written as a fully reproducible notebook and is also aimed at a scientific audience. It has a similar spirit to this one, and is part of his excellent series [Lectures on Scientific Computing with Python](https://github.com/jrjohansson/scientific-python-lectures) that is entirely available as Jupyter Notebooks."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wikipedia" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "“Revision control, also known as version control, source control\n", "or software configuration management (SCM), is the\n", "**management of changes to documents, programs, and other\n", "information stored as computer files.**”" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Reproducibility?**\n", "\n", "* Tracking and recreating every step of your work\n", "* In the software world: it's called *Version Control*!\n", "\n", "What do (good) version control tools give you?\n", "\n", "* Peace of mind (backups)\n", "* Freedom (exploratory branching)\n", "* Collaboration (synchronization)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Git is an enabling technology: Use version control for everything" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Paper writing (never get `paper_v5_john_jane_final_oct22_really_final.tex` by email again!)\n", "* Grant writing\n", "* Everyday research\n", "* Teaching (never accept an emailed homework assignment again!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Teaching courses with Git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "<!-- offline: \n", "<img src=\"files/images/indefero_projects_notes.png\" width=\"100%\">\n", "<img src=\"https://raw.github.com/fperez/reprosw/master/fig/indefero_projects_notes.png\" width=\"100%\">\n", "-->" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Annotated history of each student's workflow (and backup!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<!-- offline: \n", " <img src=\"files/images/indefero_projects1.png\" width=\"100%\">\n", "<img src=\"https://raw.github.com/fperez/reprosw/master/fig/indefero_projects1.png\" width=\"100%\">\n", " -->\n", "\n", "" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Git is easy!\n", "\n", "<a href=\"https://xkcd.com/1597\"> \n", " <img src=\"images/xkcd-git.png\" width=\"400px\"> \n", "</a>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Very high-level picture: an overview of key concepts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **commit**: *a snapshot of work at a point in time*\n", "\n", "<!-- offline: \n", "\n", "\n", "<img src=\"https://raw.github.com/fperez/reprosw/master/fig/commit_anatomy.png\">\n", "-->\n", "\n", "\n", "\n", "Credit: ProGit book, by Scott Chacon, CC License." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A **repository**: a group of *linked* commits\n", "\n", "<!-- offline: \n", "\n", "\n", "<img src=\"https://raw.github.com/fperez/reprosw/master/fig/threecommits.png\" >\n", "-->\n", "\n", "\n", "\n", "Note: these form a Directed Acyclic Graph (DAG), with nodes identified by their *hash*."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A **hash**: a fingerprint of the content of each commit *and its parent*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hashlib import sha1\n", "\n", "# Our first commit: its hash fingerprints the content plus the metadata\n", "data1 = b'This is the start of my paper.'\n", "meta1 = b'date: 1/1/17'\n", "hash1 = sha1(data1 + meta1).hexdigest()\n", "print('Hash:', hash1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Our second commit, linked to the first\n", "data2 = b'Some more text in my paper...'\n", "meta2 = b'date: 1/2/17'\n", "# Note we add the parent hash here: any change to the first commit\n", "# would therefore also change this hash!\n", "hash2 = sha1(data2 + meta2 + hash1.encode()).hexdigest()\n", "print('Hash:', hash2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this is pretty much the essence of Git!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First things first: git must be configured before first use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The minimal amount of configuration for git to work without pestering you is to tell it who you are. You should run a version of these commands in your shell:\n", "\n", "```bash\n", "git config --global user.name \"Your Name\"\n", "git config --global user.email \"your.email@yourplace.org\"\n", "```\n", "\n", "While we're at it, we also turn on colored output, which is very useful:\n", "\n", "```bash\n", "git config --global color.ui \"auto\"\n", "```\n", "\n", "You can also set git to use the credential memory cache so you don't have to retype passwords too frequently (on Linux, for example, with `git config --global credential.helper cache`).\n", "\n", "GitHub's help pages explain how to configure the credential helper on [Mac OS X](https://help.github.com/articles/caching-your-github-password-in-git/#platform-mac), [Windows](https://help.github.com/articles/caching-your-github-password-in-git/#platform-windows) and [Linux](https://help.github.com/articles/caching-your-github-password-in-git/#platform-linux)." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## The plan for this tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The rest of this tutorial is structured as follows: after the brief overview above of the key concepts you need to understand for git to really make sense, we will dive into hands-on work. We will discuss 5 \"stages of git\" with scenarios of increasing sophistication and complexity, introducing the necessary commands for each stage:\n", " \n", "1. Local, single-user, linear workflow\n", "2. Single local user, branching\n", "3. Using remotes as a single user\n", "4. Remotes for collaborating in a small team\n", "5. Full-contact GitHub: distributed collaboration with large teams\n", " \n", "In reality, this tutorial only covers stages 1-4, since for #5 there are many high-quality, software development-oriented tutorials and documents online. But most scientists start working alone with a few files or with a small team, so I feel it's important to first build the key concepts and practices around problems scientists encounter in their everyday life, without the jargon of the software world. Once you've become familiar with stages 1-4, the excellent existing tutorials about collaborating on open-source projects on GitHub should make sense."
] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }