{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# jupyter-diff\n", "\n", "From Jupyter Lab Open Studio! (at Bloomberg HQ)\n", "\n", "Kunal Marwaha and Saul Shanabrook // June 22 2019\n", "\n", "This is an experiment in using version control for Jupyter notebooks, using the\n", "tool [notedown](https://github.com/aaren/notedown) to automatically store\n", "notebooks in Markdown." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "attributes": { "classes": [], "id": "", "n": "2" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I ❤ Jupyter!\n" ] } ], "source": [ "print(\"I ❤ Jupyter!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why\n", "\n", "Jupyter is a great tool for creating *notebooks* (`.ipynb` files). With\n", "notebooks, one can explore new computational ideas and share narratives with\n", "code.\n", "\n", "Ideas change, and many people turn to some form of version control to track\n", "those changes.\n", "\n", "![PhD Comics by Jorge Cham](phd101212s.png)\n", "\n", "However, the underlying format of notebooks is JSON. It is relatively difficult\n", "to see changes to JSON objects in classic tools like [diff][diff].\n", "There is still no consensus on how to put Jupyter notebooks in version control.\n", "See [here][j1], [here][j2], and [here][j3], with varied solutions like\n", "[nbdime](https://nbdime.readthedocs.io/en/latest/),\n", "[nbstripout](https://github.com/kynan/nbstripout), or\n", "[jupytext](https://github.com/mwouts/jupytext).
\n", "\n", "Here, I will use\n", "[notedown](https://github.com/aaren/notedown) to work with `.md` files in\n", "Jupyter Lab, and use [git](https://git-scm.com/) for version control.\n", "\n", "[diff]:http://man7.org/linux/man-pages/man1/diff.1.html\n", "[j1]:https://stackoverflow.com/q/18734739\n", "[j2]:http://tiny.cc/7ofo8y\n", "[j3]:http://tiny.cc/fofo8y" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "attributes": { "classes": [], "id": "", "n": "5" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bool(\"Markdown\") and bool(\"Jupyter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "First, boot up Jupyter Lab.\n", "\n", "Let's install notedown [directly from the\n", "notebook][jakevdp]:\n", "\n", "[jakevdp]:http://tiny.cc/onfo8y" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "attributes": { "classes": [], "id": "", "n": "3" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: numpy in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (1.15.1)\n", "\u001b[33mYou are using pip version 19.0.1, however version 19.1.1 is available.\n", "You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n" ] } ], "source": [ "import sys\n", "!{sys.executable} -m pip install numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Now, we'll append the line below to your Jupyter configuration file\n", "(`~/.jupyter/jupyter_notebook_config.py`):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "attributes": { "classes": [], "id": "", "n": "5" } }, "outputs": [], "source": [ "LINE_TO_ADD = \"c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "attributes": { "classes": [], "id": "", "n": "8" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing 'LINE_TO_ADD' (str) to file '/Users/kmarwaha/.jupyter/jupyter_notebook_config.py'.\n" ] } ], "source": [ "%store LINE_TO_ADD >> ~/.jupyter/jupyter_notebook_config.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "If you're following along, you can relaunch Jupyter and create a new\n", "Markdown file. Once you create the file, right-click it and choose\n", "`Open With -> Notebook`.\n", "![\"Open With\" image](open-with.png)\n", "\n", "## Issues\n", "Not everything is perfect. Here are the issues I've seen:\n", "\n", "1. **Links wrap, and occasionally get spaces added to them.** This comes from\n", "the 80-character line limit in the Markdown format, and some links will be\n", "broken because of it. One partial solution is to use reference links (see\n", "examples [here][mdcheat]). You can also use URL shortening services like\n", "http://tiny.cc.\n", "\n", "2. **Sometimes, Markdown cells get pasted together.** It can be easier to edit\n", "Markdown cells in smaller chunks. To separate them, I sometimes add little code\n", "snippets.\n", "\n", "3. **GitHub doesn't render the Markdown as a notebook!** Maybe GitHub will try\n", "it out if more people use this format. As a compromise, I converted the\n", "notebook to `notebook-output.ipynb` for you to read online :)\n", "This was done with\n", "`notedown notebook.md > notebook-output.ipynb`\n", "\n", "[mdcheat]:\n", "https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "attributes": { "classes": [], "id": "", "n": "26" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "~~~~~~~~~~~~~~~~~~~~~\n", "~~~~~~~~~:-)~~~~~~~~~\n", "~~~~~~~~~~~~~~~~~~~~~\n" ] } ], "source": [ "for i in range(3):\n", "    print(\"~\"*9 + (\":-)\" if i==1 else \"~~~\") + \"~\"*9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample project\n", "Let's look at some GeoJSON data in San Francisco."
] }, { "cell_type": "code", "execution_count": 57, "metadata": { "attributes": { "classes": [], "id": "", "n": "57" } }, "outputs": [], "source": [ "import json\n", "\n", "# Load the GeoJSON file into a plain Python dict\n", "with open('san-francisco.geojson') as json_file:\n", "    data = json.load(json_file)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see a list of neighborhoods (at least, according to this dataset):" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "attributes": { "classes": [], "id": "", "n": "58" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Seacliff\n", "Marina\n", "Pacific Heights\n", "Nob Hill\n", "Presidio Heights\n", "Downtown/Civic Center\n", "Excelsior\n", "Bernal Heights\n", "Western Addition\n", "Chinatown\n", "North Beach\n", "Haight Ashbury\n", "Outer Mission\n", "Crocker Amazon\n", "West of Twin Peaks\n", "South of Market\n", "Potrero Hill\n", "Inner Richmond\n", "Bayview\n", "Noe Valley\n", "Inner Sunset\n", "Diamond Heights\n", "Lakeshore\n", "Russian Hill\n", "Treasure Island/YBI\n", "Twin Peaks\n", "Outer Richmond\n", "Visitacion Valley\n", "Golden Gate Park\n", "Parkside\n", "Financial District\n", "Ocean View\n", "Mission\n", "Presidio\n", "Castro/Upper Market\n", "Outer Sunset\n", "Glen Park\n" ] } ], "source": [ "# Print the name of each neighborhood in the dataset\n", "for feature in data['features']:\n", "    print(feature['properties']['name'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's install GeoPandas!"
] }, { "cell_type": "code", "execution_count": 75, "metadata": { "attributes": { "classes": [], "id": "", "n": "75" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: geopandas in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (0.5.0)\n", "Requirement already satisfied: descartes in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (1.1.0)\n", "Requirement already satisfied: pyproj in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (2.2.0)\n", "Requirement already satisfied: fiona in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (1.8.6)\n", "Requirement already satisfied: pandas in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (0.24.1)\n", "Requirement already satisfied: shapely in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (1.6.4.post2)\n", "Requirement already satisfied: matplotlib in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from descartes) (2.2.3)\n", "Requirement already satisfied: aenum; python_version < \"3.6\" in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pyproj->geopandas) (2.1.2)\n", "Requirement already satisfied: attrs>=17 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (18.1.0)\n", "Requirement already satisfied: click-plugins>=1.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (1.1.1)\n", "Requirement already satisfied: click<8,>=4.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (7.0)\n", "Requirement already satisfied: munch in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (2.3.2)\n", "Requirement already satisfied: six>=1.7 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (1.11.0)\n", "Requirement already satisfied: cligj>=0.5 in 
/Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (0.5.0)\n", "Requirement already satisfied: python-dateutil>=2.5.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (2.7.3)\n", "Requirement already satisfied: numpy>=1.12.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (1.15.1)\n", "Requirement already satisfied: pytz>=2011k in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (2018.5)\n", "Requirement already satisfied: cycler>=0.10 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (0.10.0)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (2.2.0)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (1.0.1)\n", "Requirement already satisfied: setuptools in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from kiwisolver>=1.0.1->matplotlib->descartes) (40.2.0)\n", "\u001b[33mYou are using pip version 19.0.1, however version 19.1.1 is available.\n", "You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n" ] } ], "source": [ "import sys\n", "!{sys.executable} -m pip install geopandas descartes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok! Let's use it!" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "attributes": { "classes": [], "id": "", "n": "79" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "  <thead>\n", "    <tr><th></th><th>name</th><th>cartodb_id</th><th>created_at</th><th>updated_at</th><th>geometry</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>Seacliff</td><td>1</td><td>2013-02-10T05:44:04</td><td>2013-02-10T05:44:04</td><td>(POLYGON ((-122.484089 37.78791, -122.484346 3...</td></tr>\n",
"    <tr><th>1</th><td>Marina</td><td>20</td><td>2013-02-10T05:44:04</td><td>2013-02-10T05:44:04</td><td>(POLYGON ((-122.446806 37.805401, -122.44678 3...</td></tr>\n",
"    <tr><th>2</th><td>Pacific Heights</td><td>23</td><td>2013-02-10T05:44:04</td><td>2013-02-10T05:44:04</td><td>(POLYGON ((-122.446825 37.787251, -122.447228 ...</td></tr>\n",
"    <tr><th>3</th><td>Nob Hill</td><td>25</td><td>2013-02-10T05:44:04</td><td>2013-02-10T05:44:04</td><td>(POLYGON ((-122.418609 37.78891, -122.421954 3...</td></tr>\n",
"    <tr><th>4</th><td>Presidio Heights</td><td>29</td><td>2013-02-10T05:44:04</td><td>2013-02-10T05:44:04</td><td>(POLYGON ((-122.4626 37.789041, -122.460923 37...</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " name cartodb_id created_at updated_at \\\n", "0 Seacliff 1 2013-02-10T05:44:04 2013-02-10T05:44:04 \n", "1 Marina 20 2013-02-10T05:44:04 2013-02-10T05:44:04 \n", "2 Pacific Heights 23 2013-02-10T05:44:04 2013-02-10T05:44:04 \n", "3 Nob Hill 25 2013-02-10T05:44:04 2013-02-10T05:44:04 \n", "4 Presidio Heights 29 2013-02-10T05:44:04 2013-02-10T05:44:04 \n", "\n", " geometry \n", "0 (POLYGON ((-122.484089 37.78791, -122.484346 3... \n", "1 (POLYGON ((-122.446806 37.805401, -122.44678 3... \n", "2 (POLYGON ((-122.446825 37.787251, -122.447228 ... \n", "3 (POLYGON ((-122.418609 37.78891, -122.421954 3... \n", "4 (POLYGON ((-122.4626 37.789041, -122.460923 37... " ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import descartes\n", "import geopandas as gpd\n", "import matplotlib.pyplot as plt\n", "file = gpd.read_file('san-francisco.geojson')\n", "file.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also see nice maps :-)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "attributes": { "classes": [], "id": "", "n": "82" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = file.plot(color='blue')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Does it work?\n", "\n", "You can decide for yourself. Explore the diffs on GitHub, for\n", "example [here][1] or [here][2].\n", "\n", "In my opinion, this is still a prototype, but a\n", "cool possibility for using Jupyter and Git together.\n", "[1]:https://github.com/marwahaha/jupyter-diff/commit/8479a35\n", "[2]:https://github.com/marwahaha/jupyter-diff/commit/e0faff\n", "\n", "## Thanks!\n", "\n", "Check out the latest [Jupyter Lab\n", "interface](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)\n", "for working with notebooks in a friendly, responsive way." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }