{ "cells": [ { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "4e3ff556-2ed1-4264-9e5d-c34080b5ef43" } }, "source": [ "# Context\n", "Reading data from a software version control system can be pretty useful if you want to answer questions about how a software system evolves, such as\n", "* Who are the main committers to the software?\n", "* Are there any areas of the code that only one developer knows?\n", "* What have we been working on during the last months?" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "d090f3d2-84d9-4133-b461-3cd1584af048" } }, "source": [ "In my previous notebook, I showed you how to read a Git repository directly in Python with [Pandas](http://pandas.pydata.org/) and [GitPython](https://gitpython.readthedocs.io/en/stable/). As much as I like that approach (because everything is in one place and therefore reproducible), it's (currently) very slow at reading all the statistics information (but I'll work on that!). What I want now is a really fast method to read in a complete Git repository. \n", "\n", "I take this opportunity to show you how to read any kind of structured, linear data into a Pandas DataFrame. The general rule of thumb is: As long as you see a pattern in the raw data, Pandas can read and tame it, too!" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "f1852e51-8531-4ccc-af20-8348509cfc9c" } }, "source": [ "# The idea\n", "We are taking a shortcut for retrieving the commit history by exporting it into a log file. You can use, e.g.,\n", "
\n",
    "git log --all --numstat --pretty=format:'--%h--%ad--%aN' --no-renames > git.log \n",
    "
\n", "to do this. This will output a file with all the log information of a repository." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "0bd50158-8902-4051-ac36-6e5f4e708b83" } }, "source": [ "In this notebook, we analyze the Git repository of [aim42](https://github.com/aim42/aim42) (an open book project about how to improve legacy systems). \n", "\n", "The first entries of that file look something like this:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "nbpresent": { "id": "86a45314-4211-4cd1-93fb-9c156534a2e0" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--ea7e08b--Tue Nov 29 21:42:16 2016 +0100--feststelltaste\n", "2\t0\tsrc/main/asciidoc/appendices/bibliography.adoc\n", "1\t7\tsrc/main/asciidoc/pattern-index.adoc\n", "12\t1\tsrc/main/asciidoc/patterns/improve/anticorruption-layer.adoc\n", "\n", "--fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--feststelltaste\n", "2\t0\tsrc/main/asciidoc/appendices/bibliography.adoc\n", "2\t2\tsrc/main/asciidoc/patterns/analyze/busfactor.adoc\n" ] } ], "source": [ "# print the first eight lines of the exported log file\n", "with open(r'data/gitlog_aim42.log') as log:\n", "    for line in log.readlines()[:8]:\n", "        print(line, end='')" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "5aee8b0f-d3e8-46d7-a680-0d6ff1966976" } }, "source": [ "For each commit, we chose to create a header line with the following commit info (by using --pretty=format:'--%h--%ad--%aN'):\n", "
\n",
    "--fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--feststelltaste\n",
    "
\n", "It contains the abbreviated SHA key, the timestamp and the author's name of the commit, separated by --. \n", "\n", "Every other row holds statistics about one modified file:\n", "
\n",
    "2\t0\tsrc/main/asciidoc/appendices/bibliography.adoc\n",
    "
\n", "\n", "It contains the number of lines inserted, the number of lines deleted and the relative path of the file. With a little trick and a little bit of data wrangling, we can read that information into a nicely structured DataFrame.\n", "\n", "Let's get started!" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "3976cf03-5f93-4093-8ca4-fabaf7a85a83" } }, "source": [ "# Import the data" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "5614c8bf-36b0-4371-bf78-b362561cfba3" } }, "source": [ "First, I'll show you my approach to reading nearly anything into a DataFrame. The key is to use Pandas' read_csv for reading \"non-character separated values\". How to do that? We simply choose a separator that doesn't occur in the file that we want to read, so that every line ends up as a single value in one column. My favorite character for this is the \"DEVICE CONTROL TWO\" character U+0012. I haven't encountered a situation yet where this character was included in a data set.\n", "\n", "We just read our Git log file without any headers (because there are none) and give the only column a nice name." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "nbpresent": { "id": "b38ec597-d6b6-4b10-ba27-709b9415b124" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
raw
0--ea7e08b--Tue Nov 29 21:42:16 2016 +0100--fes...
12\\t0\\tsrc/main/asciidoc/appendices/bibliograph...
21\\t7\\tsrc/main/asciidoc/pattern-index.adoc
312\\t1\\tsrc/main/asciidoc/patterns/improve/anti...
4--fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--fes...
\n", "
" ], "text/plain": [ " raw\n", "0 --ea7e08b--Tue Nov 29 21:42:16 2016 +0100--fes...\n", "1 2\\t0\\tsrc/main/asciidoc/appendices/bibliograph...\n", "2 1\\t7\\tsrc/main/asciidoc/pattern-index.adoc\n", "3 12\\t1\\tsrc/main/asciidoc/patterns/improve/anti...\n", "4 --fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--fes..." ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "commits = pd.read_csv(\"data/gitlog_aim42.log\", \n", " sep=\"\\u0012\", \n", " header=None, \n", " names=['raw'])\n", "commits.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "0d4414c2-99ba-42b6-9968-8d9f30c260af" } }, "source": [ "# Data Wrangling" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "ad927882-c62b-4461-991c-0cb43c38c09b" } }, "source": [ "OK, but now we have a data wrangling challenge. The commit info and the statistics for the modified files sit in the same column, although they are different kinds of data. What we want is the commit info and the file statistics in separate columns, so that we can get some serious analysis started." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "9a844847-3efc-4812-aabc-7c0867c25321" } }, "source": [ "# Commit info\n", "Let's treat the commit info first. Luckily, we set a marker to identify the commit info: each commit info row starts with a --. So let's extract all the commit info rows from the original commits DataFrame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "nbpresent": { "id": "e34097f6-1626-49ba-bb93-b5b2dfb20d8c" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
raw
0--ea7e08b--Tue Nov 29 21:42:16 2016 +0100--fes...
4--fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--fes...
7--c3d4e2d--Thu Dec 22 05:47:32 2016 +0100--Dr....
8--3f793e8--Tue Nov 29 21:42:16 2016 +0100--fes...
12--5d297c9--Wed Dec 21 20:49:33 2016 +0100--fes...
\n", "
" ], "text/plain": [ " raw\n", "0 --ea7e08b--Tue Nov 29 21:42:16 2016 +0100--fes...\n", "4 --fa1ca6f--Thu Dec 22 08:04:18 2016 +0100--fes...\n", "7 --c3d4e2d--Thu Dec 22 05:47:32 2016 +0100--Dr....\n", "8 --3f793e8--Tue Nov 29 21:42:16 2016 +0100--fes...\n", "12 --5d297c9--Wed Dec 21 20:49:33 2016 +0100--fes..." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_marker = commits[\n", " commits['raw'].str.startswith(\"--\")]\n", "commit_marker.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "3363bbf6-f66e-4b19-a1e0-8b35404eead4" } }, "source": [ "With this, we can focus on extracting the information from a commit info row. The next command might look a little frightening, but don't worry. We'll go through it step by step." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "nbpresent": { "id": "577cb8ae-fdd7-4835-93c0-3d52c7e7fa6d" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
shadateauthor
0ea7e08b2016-11-29 20:42:16feststelltaste
4fa1ca6f2016-12-22 07:04:18feststelltaste
7c3d4e2d2016-12-22 04:47:32Dr. Gernot Starke
83f793e82016-11-29 20:42:16feststelltaste
125d297c92016-12-21 19:49:33feststelltaste
\n", "
" ], "text/plain": [ " sha date author\n", "0 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "4 fa1ca6f 2016-12-22 07:04:18 feststelltaste\n", "7 c3d4e2d 2016-12-22 04:47:32 Dr. Gernot Starke\n", "8 3f793e8 2016-11-29 20:42:16 feststelltaste\n", "12 5d297c9 2016-12-21 19:49:33 feststelltaste" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_info = commit_marker['raw'].str.extract(\n", " r\"^--(?P<sha>.*?)--(?P<date>.*?)--(?P<author>.*?)$\", \n", " expand=True) \n", "commit_info['date'] = pd.to_datetime(commit_info['date'])\n", "commit_info.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "4f628303-6f28-4d73-9213-189c2c8fe2fd" } }, "source": [ "We want to extract some data from the raw column. For this, we use the extract method on the string representation (note the str) of all the rows. This method expects a regular expression. We provide our own regex \n", "
\n",
    "^--(?P<sha>.\\*?)--(?P<date>.\\*?)--(?P<author>.\\*?)$\n",
    "
\n", "that works as follows:\n", "\n", "* ^: the beginning of the row\n", "* --: the two dashes that we chose as the separator between the entries in the git log file\n", "* (?P<sha>.*?)--: a named match group (marked by the ( and ) ) with the name sha that captures as few characters as possible (.* matches any characters, the trailing ? makes the match non-greedy) up to the next occurrence of the -- separator.\n", "* and so on until \n", "* \\$: the marker for the end of the row (the ^ is optional here, but the $ is needed: without it, the non-greedy author group would match an empty string)\n", "\n", "I use these ugly-looking named match groups because Pandas uses the name of each group as the name of the resulting column (so we avoid renaming the columns later on).\n", "\n", "The expand=True keyword delivers a DataFrame with one column for each regex group.\n", "\n", "We simply store the result in a new DataFrame variable commit_info.\n", "\n", "Because we've worked with the string representation of the row, Pandas didn't recognize the right data types for our newly created columns. That's why we need to cast the date column to the right type.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "995385e3-6483-4915-9714-c3146674dbae" } }, "source": [ "OK, this part is ready, let's have a look at the file statistics!" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "c1baff0f-da59-418c-8ac5-172ecc162a90" } }, "source": [ "# File statistics" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "491233b3-f1cf-4270-85ca-b7982a4c842d" } }, "source": [ "Every row that is not a commit info row is a file statistics row. So we just reuse the index of our already prepared commit_info DataFrame to get all the other data by saying \"give me all rows of commits whose index is not in the index of the commit_info DataFrame\"."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "nbpresent": { "id": "0a9c1f53-c819-4fad-abf6-fa6ddcc6e53e" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
raw
12\\t0\\tsrc/main/asciidoc/appendices/bibliograph...
21\\t7\\tsrc/main/asciidoc/pattern-index.adoc
312\\t1\\tsrc/main/asciidoc/patterns/improve/anti...
52\\t0\\tsrc/main/asciidoc/appendices/bibliograph...
62\\t2\\tsrc/main/asciidoc/patterns/analyze/busfa...
\n", "
" ], "text/plain": [ " raw\n", "1 2\\t0\\tsrc/main/asciidoc/appendices/bibliograph...\n", "2 1\\t7\\tsrc/main/asciidoc/pattern-index.adoc\n", "3 12\\t1\\tsrc/main/asciidoc/patterns/improve/anti...\n", "5 2\\t0\\tsrc/main/asciidoc/appendices/bibliograph...\n", "6 2\\t2\\tsrc/main/asciidoc/patterns/analyze/busfa..." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "file_stats_marker = commits[\n", " ~commits.index.isin(commit_info.index)]\n", "file_stats_marker.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "60078fab-210d-4b85-a688-4820de6d98a9" } }, "source": [ "Luckily, the row's data is just a tab-separated string that we can easily split with the split method. We expand the result to get a DataFrame, rename the default columns to something that makes more sense and adjust some data types. For the latter, we pass errors='coerce', which lets to_numeric return NaN for every entry that is not a number (git prints - instead of line counts for binary files)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "nbpresent": { "id": "0a64edd8-1d4e-4456-8d54-db30f82da038" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
insertionsdeletionsfilename
12.00.0src/main/asciidoc/appendices/bibliography.adoc
21.07.0src/main/asciidoc/pattern-index.adoc
312.01.0src/main/asciidoc/patterns/improve/anticorrupt...
52.00.0src/main/asciidoc/appendices/bibliography.adoc
62.02.0src/main/asciidoc/patterns/analyze/busfactor.adoc
\n", "
" ], "text/plain": [ " insertions deletions filename\n", "1 2.0 0.0 src/main/asciidoc/appendices/bibliography.adoc\n", "2 1.0 7.0 src/main/asciidoc/pattern-index.adoc\n", "3 12.0 1.0 src/main/asciidoc/patterns/improve/anticorrupt...\n", "5 2.0 0.0 src/main/asciidoc/appendices/bibliography.adoc\n", "6 2.0 2.0 src/main/asciidoc/patterns/analyze/busfactor.adoc" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "file_stats = file_stats_marker['raw'].str.split(\n", " \"\\t\", expand=True)\n", "file_stats = file_stats.rename(\n", " columns={ 0: \"insertions\", 1: \"deletions\", 2: \"filename\"})\n", "file_stats['insertions'] = pd.to_numeric(\n", " file_stats['insertions'], errors='coerce')\n", "file_stats['deletions'] = pd.to_numeric(\n", " file_stats['deletions'], errors='coerce')\n", "file_stats.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "bdb87d56-b9e9-44d4-91b0-37d6c00c1d42" } }, "source": [ "# Putting it all together\n", "Now we have three parts: all commits, the separated commit info and the file statistics.\n", "\n", "We only need to glue the commit info and the file statistics together into a normalized DataFrame. For this, we have to make some adjustments to the indexes.\n", "\n", "We want every file statistics row to carry the info of the commit it belongs to. That means we reindex the commit info by using the index of the commits DataFrame..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "nbpresent": { "id": "81cdfe40-3a5d-44cf-9b55-eaf0a635761f" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
shadateauthor
0ea7e08b2016-11-29 20:42:16feststelltaste
1NaNNaTNaN
2NaNNaTNaN
\n", "
" ], "text/plain": [ " sha date author\n", "0 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "1 NaN NaT NaN\n", "2 NaN NaT NaN" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_info.reindex(commits.index).head(3)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "179fb755-5d48-421f-9f14-245aa203c5d1" } }, "source": [ "...and fill the missing values for the file statistics' rows to get the needed structure. Together, this is done like the following:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "nbpresent": { "id": "022f08fe-2746-4631-b0e2-1df34c154675" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
shadateauthor
0ea7e08b2016-11-29 20:42:16feststelltaste
1ea7e08b2016-11-29 20:42:16feststelltaste
2ea7e08b2016-11-29 20:42:16feststelltaste
3ea7e08b2016-11-29 20:42:16feststelltaste
4fa1ca6f2016-12-22 07:04:18feststelltaste
\n", "
" ], "text/plain": [ " sha date author\n", "0 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "1 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "2 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "3 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "4 fa1ca6f 2016-12-22 07:04:18 feststelltaste" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_data = commit_info.reindex(\n", " commits.index).ffill()\n", "commit_data.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "d9667ccd-8a26-4cb3-8d63-11ed06ef8f3d" } }, "source": [ "After filling the file statistics rows, we can throw away the dedicated commit info rows by reusing the index from above (compare the index values in the two outputs to see this clearly)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "nbpresent": { "id": "60bbce75-8da3-443d-8b38-9a2609cc2e8d" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
shadateauthor
1ea7e08b2016-11-29 20:42:16feststelltaste
2ea7e08b2016-11-29 20:42:16feststelltaste
3ea7e08b2016-11-29 20:42:16feststelltaste
5fa1ca6f2016-12-22 07:04:18feststelltaste
6fa1ca6f2016-12-22 07:04:18feststelltaste
\n", "
" ], "text/plain": [ " sha date author\n", "1 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "2 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "3 ea7e08b 2016-11-29 20:42:16 feststelltaste\n", "5 fa1ca6f 2016-12-22 07:04:18 feststelltaste\n", "6 fa1ca6f 2016-12-22 07:04:18 feststelltaste" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_data = commit_data[~commit_data.index.isin(commit_info.index)]\n", "commit_data.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "43c8d1f2-07cb-4135-a562-fbe14674f767" } }, "source": [ "The easy step afterward is to join the file_stats DataFrame with the commit_data." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "nbpresent": { "id": "cffa207d-719c-4ea3-ae5c-556ca5ee692e" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
shadateauthorinsertionsdeletionsfilename
1ea7e08b2016-11-29 20:42:16feststelltaste2.00.0src/main/asciidoc/appendices/bibliography.adoc
2ea7e08b2016-11-29 20:42:16feststelltaste1.07.0src/main/asciidoc/pattern-index.adoc
3ea7e08b2016-11-29 20:42:16feststelltaste12.01.0src/main/asciidoc/patterns/improve/anticorrupt...
5fa1ca6f2016-12-22 07:04:18feststelltaste2.00.0src/main/asciidoc/appendices/bibliography.adoc
6fa1ca6f2016-12-22 07:04:18feststelltaste2.02.0src/main/asciidoc/patterns/analyze/busfactor.adoc
\n", "
" ], "text/plain": [ " sha date author insertions deletions \\\n", "1 ea7e08b 2016-11-29 20:42:16 feststelltaste 2.0 0.0 \n", "2 ea7e08b 2016-11-29 20:42:16 feststelltaste 1.0 7.0 \n", "3 ea7e08b 2016-11-29 20:42:16 feststelltaste 12.0 1.0 \n", "5 fa1ca6f 2016-12-22 07:04:18 feststelltaste 2.0 0.0 \n", "6 fa1ca6f 2016-12-22 07:04:18 feststelltaste 2.0 2.0 \n", "\n", " filename \n", "1 src/main/asciidoc/appendices/bibliography.adoc \n", "2 src/main/asciidoc/pattern-index.adoc \n", "3 src/main/asciidoc/patterns/improve/anticorrupt... \n", "5 src/main/asciidoc/appendices/bibliography.adoc \n", "6 src/main/asciidoc/patterns/analyze/busfactor.adoc " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commit_data = commit_data.join(file_stats)\n", "commit_data.head()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "002860f4-1961-4409-8b2b-20835285ff24" } }, "source": [ "We're done!" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "56ee3c98-8b97-41bf-938d-108f98a4606b" } }, "source": [ "# Complete code block\n", "Too much code to look through? Here is everything from above in a condensed format."
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "nbpresent": { "id": "6eb87420-81da-4191-9758-f404134a26ce" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 219 ms\n" ] } ], "source": [ "%%time\n", "import pandas as pd\n", "\n", "commits = pd.read_csv(r'C:\\dev\\repos\\aim42\\git.log', sep=\"\\u0012\", header=None, names=['raw'])\n", "\n", "commit_marker = commits[commits['raw'].str.startswith(\"--\", na=False)]\n", "commit_info = commit_marker['raw'].str.extract(r\"^--(?P<sha>.*?)--(?P<date>.*?)--(?P<author>.*?)$\", expand=True)\n", "commit_info['date'] = pd.to_datetime(commit_info['date'])\n", "\n", "file_stats_marker = commits[~commits.index.isin(commit_info.index)]\n", "file_stats = file_stats_marker['raw'].str.split(\"\\t\", expand=True)\n", "file_stats = file_stats.rename(columns={0: \"insertions\", 1: \"deletions\", 2: \"filename\"})\n", "file_stats['insertions'] = pd.to_numeric(file_stats['insertions'], errors='coerce')\n", "file_stats['deletions'] = pd.to_numeric(file_stats['deletions'], errors='coerce')\n", "\n", "commit_data = commit_info.reindex(commits.index).ffill()\n", "commit_data = commit_data[~commit_data.index.isin(commit_info.index)]\n", "commit_data = commit_data.join(file_stats)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "e9d7a22e-fec0-4b9a-801d-923e8b4ab42e" } }, "source": [ "Just some milliseconds to run through, not bad!" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "nbpresent": { "id": "417386b7-669e-4c2e-93bc-2abae6b03699" } }, "source": [ "# Summary\n", "In this notebook, I showed you how to read imperfectly structured data via the non-character separator trick. I also showed you how to transform rows that contain multiple kinds of data into one nicely structured DataFrame.\n", "\n", "Now that we have the Git repository DataFrame, we can do some nice things with it, e.g.
visualizing the code churn of a project, but that's a story for another notebook! But to give you a short preview:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "nbpresent": { "id": "edfb29bb-0a19-4835-be91-60f8ae4941ca" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEMCAYAAAAvaXplAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuUVvV97/H3hxGiXAMmQgSpmoDRQNKSis1xJZ0jKUgu\nSNrEQy5HG8mJq9iQpm1WIDkNY5K1qvRkhZhW2zRE0erhEHs8mhOL6MJpVhJNQJNiBGGWiQhDGA/X\nyHWG4Xv+2HtkM89cn8s8t89rrVnPnt/+7c1vs+H5PL/fbz97KyIwMzPLGlbuBpiZWeVxOJiZWQ6H\ng5mZ5XA4mJlZDoeDmZnlcDiYmVmOfsNB0mpJbZK2dCv/jKRtkp6TdFumfLmklnTd3Ez5LElbJO2Q\ntCpTPkLS2nSbpyRNzay7Ma2/XdINhR+umZkNxEB6DncD87IFkhqBDwIzI2Im8D/S8suB64HLgfnA\nnZKUbnYXsDgipgPTJXXtczFwICKmAauAlem+xgNfBq4ErgJWSBqX53Gamdkg9BsOEfEj4GC34j8D\nbouIU2mdfWn5dcDaiDgVES8BLcBsSZOAMRGxKa13L7Aws82adPlB4Jp0eR6wISIOR8QhYANw7SCP\nz8zM8pDvnMN04D2Snpb0pKR3puWTgV2Zeq1p2WRgd6Z8d1p21jYR0QkcljShj32ZmVmJnVPAduMj\n4g8kXQl8D7i0SG1S/1XMzKyU8g2HXcD/BoiITZI6JZ1P8ul+aqbelLSsFbioh3Iy6/ZIagDGRsQB\nSa1AY7dtnuypMZJ8gygzszxERI8fyAc6rCTO/kT/f0jnBiRNB0ZExH7gEeC/pFcgXQK8BfhZROwl\nGS6anU5Q3wA8nO7rEeDGdPkjwMZ0+THgjySNSyen/ygt6+0Ae/xZsWJFr+sGW6+Y+ypXvUpuWzGP\nodrb77bV5jFUWtv60m/PQdIDJJ/gz5f0MrAC+C5wt6TngJPpmz0RsVXSOmAr0AEsiTMtuAW4BzgX\neDQi1qflq4H7JLUA+4FF6b4OSvoqsBkI4NZIJqYHpbGxsWj1irmvctUrV9sGqljH4HOQX71ynM9i\n16uFv99K+H+q/tKjGkiKWjiOWtbU1ERTU1O5m2FF4vNZGyQRBQ4rmRWk2J9Irbx8Pmufew5mZnXK\nPQczMxsUh4OZmeXI93sOZlYnTp2CTZugs7PcLSmfWbNg5Mhyt2JoORzMrE9PPQXvfz+8/e3lbkl5\ntLTAbbfBJz9Z7pYMLYeDmfWpvR1+//dh48b+69aiT386+TuoN55zMLM+dXZCQ0O5W1E+DQ31OaTm\ncDCzPjkcHA5mZjlOn4ZhdfxO4XAwM+uBew4OBzOzHPUeDsOGORzMzHJ4WMnhYGaWo957Dg0NSUDW\nG4eDmfXJ4eCeg5lZjtOnHQ4OBzOzbjo7PefgcDAz68bDSg4HM7Mc9R4OvpS1F5JWS2qTtKWHdX8l\n6bSkCZmy5ZJaJG2TNDdTPkvSFkk7JK3KlI+QtDbd5ilJUzPrbkzrb5d0Q2GHamb58JyDr1bqzd3A\nvO6FkqYAfwTszJRdDlwPXA7MB
+6U1PUIuruAxRExHZguqWufi4EDETENWAWsTPc1HvgycCVwFbBC\n0rhBH6GZFcRzDu459CgifgQc7GHVN4DPdyu7DlgbEaci4iWgBZgtaRIwJiI2pfXuBRZmtlmTLj8I\nXJMuzwM2RMThiDgEbACuHdBRmVnR1PuwksNhECQtAHZFxHPdVk0GdmV+b03LJgO7M+W707KztomI\nTuBwOkzV277MbAh5WKk+w2HQD/uRdB7wRZIhpVJQ/1XMbKh4WMnhMFBvBi4G/iOdT5gCPCtpNsmn\n+6mZulPSslbgoh7KyazbI6kBGBsRByS1Ao3dtnmyt0Y1NTW9ttzY2EhjY2NvVa2KRcAdd8DSpSB/\njBgSHlaqnXBobm6mubl5QHUVEf1Xki4Gvh8RM3tY92tgVkQclHQFcD/JBPJk4HFgWkSEpKeBpcAm\n4AfAHRGxXtISYEZELJG0CFgYEYvSCenNwCyS4a/NwDvT+YfubYiBHIdVv7Y2mDQJjh2D884rd2vq\nw8qVsG9f8lqPVq+GH/8Yvvvdcrek+CQRET1+zBrIpawPAD8hucLoZUndH7MdpENBEbEVWAdsBR4F\nlmTetW8BVgM7gJaIWJ+WrwbeIKkF+AtgWbqvg8BXSULhp8CtPQWD1Y9/+Rf4wAeS5Xp8pm+51HvP\nYdiw+ryUtd9hpYj4WD/rL+32+98Cf9tDvWeAnJ5HRJwkufy1p33fA9zTXxutPjz0EHzwg7B5s8Nh\nKHnOoXaGlQajjk+5VZPdu+GFF2D+fLjwQjh5stwtqh++WsnhYFax3vMeGDkSLrsMXvc69xyGUr0P\nK9VrOORztZJZ0WzcCKdOwdy5vdeJSHoOW7fCuefCiBHuOQylzk4YPrzcrSgfh4NZGcyZA69/PRzs\n6Tv4qf37YdSoJBggCQf3HIaO5xwcDmZlMWNG3+s/9KEzwQDJsJJ7DkPHcw6Vf7VSRNIDb29P/m+c\nPAnbtsGRI8m6rjrZ1/44HKzsxo/ve/3LL8O//duZ391zGFr1PucwYgQ88kjyIaUSdXTAk08m/ydG\njDjz8+Y3wxvekNTp+sJo99e+OBysbI4fT177+zJbZ+eZf+RQeT2HiMKHHdrb4dlni/cJ9ehR2LkT\nPvxhGD367HW9fXLsrfzkyfoOh/e+Fx57LPk7rVT//M/wpjcNfru+QsLhYGVzKP1KY3+9gO7fhh7q\nnsO2bUkbfv5z+MUvzn4TPXIEHnyw8LCKSIbXxhXppvTDhiU/t9ySO5nc/Q2hpzeI7mXf+U5x2lWN\nhg/v+4KJWuVwsLLpeoM/caLvesePJ5exdumt5xCR1D169MzP8eMDH2MFeP55aGpK/ry3vQ2eew5e\nfBFmzkw+mc2Zk/z5WZ//fP/zJmbVxuFgZdPRkbz29an79OlkfXZCetQo+PSn4eabk4Bpb0/21dGR\nvHGPHp3UGTUq6XEM5kobCb75TZgwAX7zm2RIYcYMuPrq/I7RrFo5HKxsusKhr57DiRNJMGSHOf7h\nH5IbwY0enYRB1wTc8OG+U6tZsTgcrGwG0nM4duzsISVIrm7q7wonMyuMw8HKpqMj+cR/4kRyjXb3\n67ClnsPBzErP4WBl09EBY8Ykt8UYPjy5XDJ7HXZnJ/zd38H555e3nWb1qI6/FG/l1tGR3Ejv4YfP\n9B66Jpbb2+Ezn4E1a2DatHK31Kz+OBysbLqGlRYsyL08FOB970vK3//+oW+bWb3zsJKVTUdH33f7\nnDu3Pr98ZFYJ3HOwsukvHMysfBwOVjYOB7PK1W84SFotqU3SlkzZSknbJP1C0r9KGptZt1xSS7p+\nbqZ8lqQtknZIWpUpHyFpbbrNU5KmZtbdmNbfLumG4hyyVQqHg1nlGkjP4W5gXreyDcDbIuJ3gRZg\nOYCkK4DrgcuB+cCd0mvfWb0LWBwR04Hpkrr2uRg4EBHTgFXAynRf44EvA1cCVwErJBXptmRWCRw
O\nZpWr33CIiB8BB7uVPRERXTcXfhqYki4vANZGxKmIeIkkOGZLmgSMiYhNab17gYXp8nXAmnT5QeCa\ndHkesCEiDkfEIZJAunaQx2cVzOFgVrmKMedwE/BoujwZ2JVZ15qWTQZ2Z8p3p2VnbRMRncBhSRP6\n2JfVgAg4fNjhYFapCgoHSV8COiLifxapPQC+dVodWLEC/uIv+n/Qj5mVR97fc5D0p8D7ODMMBMmn\n+4syv09Jy3orz26zR1IDMDYiDkhqBRq7bfNkb+1pamp6bbmxsZHGxsbeqloFeOaZ5DV7K24zK63m\n5maam5sHVFcxgCehSLoY+H5EzEx/vxb4OvCeiNifqXcFcD/JBPJk4HFgWkSEpKeBpcAm4AfAHRGx\nXtISYEZELJG0CFgYEYvSCenNwCySHs5m4J3p/EP39sVAjsMqx9y58Pjj8JWvwN/8TblbY1afJBER\nPY7W9NtzkPQAySf48yW9DKwAvgiMAB5PL0Z6OiKWRMRWSeuArUAHsCTzrn0LcA9wLvBoRKxPy1cD\n90lqAfYDiwAi4qCkr5KEQgC39hQMVp3a2+Fb34JPfarcLTGzngyo51Dp3HOoPldfDStX+glrZuXU\nV8/B35C2smhv95VKZpXM4WBl0d6e3JHVzCqTw8HKout23WZWmRwOVhYeVjKrbH6eQw04ciT5JJ4v\n5fm1w3y3GzXKPQezSudwqHL79sGb3gSjR+e3fb4XeeW73enTMHkyvPyyew5mlczhUOX27YNLL4Xt\n28vdkoGJgJkzk2X3HMwql+ccqtyrr8LYsf3XqxQSzJ+fLDsczCqXw6HK/fa31RUOABMmJK8eVjKr\nXA6HKvfb38KYMeVuxeB0hYN7DmaVy+FQ5fbsyX8yulze8pbktaGhvO0ws945HKrYkSPw538Ob31r\nuVsyOHPm5H+1k5kNDd94r4rt3QvveAe0tZW7JWZWjXzjvRp1/LifpGZmpeFwqGInTjgczKw0HA5V\n7PhxP2bTzErD4VDFPKxkZqXicKhiJ06452BmpeFwqGLuOZhZqfQbDpJWS2qTtCVTNl7SBknbJT0m\naVxm3XJJLZK2SZqbKZ8laYukHZJWZcpHSFqbbvOUpKmZdTem9bdLuqE4h1w73HMws1IZSM/hbmBe\nt7JlwBMRcRmwEVgOIOkK4HrgcmA+cKf02l3/7wIWR8R0YLqkrn0uBg5ExDRgFbAy3dd44MvAlcBV\nwIpsCJl7DmZWOv2GQ0T8CDjYrfg6YE26vAZYmC4vANZGxKmIeAloAWZLmgSMiYhNab17M9tk9/Ug\ncE26PA/YEBGHI+IQsAG4dhDHVvOOHk0enGNmVmz5zjlcEBFtABGxF7ggLZ8M7MrUa03LJgO7M+W7\n07KztomITuCwpAl97MtS1XhHVjOrDsWakC7mvSvyfPhk/Xn11eq7I6uZVYd8nwTXJmliRLSlQ0av\npOWtwEWZelPSst7Ks9vskdQAjI2IA5JagcZu2zzZW4OamppeW25sbKSxsbG3qjXjt7+FSy4pdyvM\nrFo0NzfT3Nw8oLoDuvGepIuB70fEzPT320kmkW+X9AVgfEQsSyek7yeZQJ4MPA5Mi4iQ9DSwFNgE\n/AC4IyLWS1oCzIiIJZIWAQsjYlE6Ib0ZmEXSw9kMvDOdf+jevrq88d5NN8HVV8PixeVuiZlVo75u\nvNdvz0HSAySf4M+X9DKwArgN+J6km4CdJFcoERFbJa0DtgIdwJLMu/YtwD3AucCjEbE+LV8N3Cep\nBdgPLEr3dVDSV0lCIYBbewqGeuY5BzMrFd+yu4rNmwef+xxc62u4zCwPvmV3jXLPwcxKxeFQxarx\n+dFmVh0cDlXs1VfdczCz0nA4VDH3HMysVBwOVWrzZjh82OFgZqXhq5Wq1Ic/DK+8Aj/8YblbYmbV\nylcr1ailS8vdAjOrVQ6HKnXqFDQ0lLsVZlarHA5VqrPT4WB
mpeNwqFIOBzMrJYdDlXI4mFkpORyq\n1KlTcE6+N1w3M+uHw6HK3HEHXHwxbNzonoOZlY7Docr88pewc2ey7HAws1JxOFSZkydh5Mhk2eFg\nZqXicKgyJ0/C6NHJsucczKxUHA5V5uRJGDUqWXbPwcxKxeFQZbI9B4eDmZWKw6HKuOdgZkOhoHCQ\n9DlJv5S0RdL9kkZIGi9pg6Ttkh6TNC5Tf7mkFknbJM3NlM9K97FD0qpM+QhJa9NtnpI0tZD21gL3\nHMxsKOQdDpIuBD4DzIqItwPnAB8FlgFPRMRlwEZgeVr/CuB64HJgPnCnpK5bxd4FLI6I6cB0SfPS\n8sXAgYiYBqwCVubb3lqR7Tl4QtrMSqXQYaUGYJSkc4DzgFbgOmBNun4NsDBdXgCsjYhTEfES0ALM\nljQJGBMRm9J692a2ye7rQWBOge2teu45mNlQyDscImIP8HXgZZJQOBwRTwATI6ItrbMXuCDdZDKw\nK7OL1rRsMrA7U747LTtrm4joBA5JmpBvm2uBw8HMhkIhw0qvJ/lk/zvAhSQ9iI8D3R/JVsxHtPX4\nxKJ64glpMxsKhYxavxf4VUQcAJD0EPCfgDZJEyOiLR0yeiWt3wpclNl+SlrWW3l2mz2SGoCxXX9e\nd01NTa8tNzY20tjYWMChVaa/+ivo6IALL0x+dziY2WA0NzfT3Nw8oLp5P0Na0mxgNXAlcBK4G9gE\nTCWZRL5d0heA8RGxLJ2Qvh+4imS46HFgWkSEpKeBpen2PwDuiIj1kpYAMyJiiaRFwMKIWNRDW2r+\nGdIdHTBiBPzqV8lN9z71Kdi7FyZOLHfLzKxa9fUM6bx7DhHxM0kPAj8HOtLXbwNjgHWSbgJ2klyh\nRERslbQO2JrWX5J5R78FuAc4F3g0Itan5auB+yS1APuBnGCoB/fcA5/7HLz97XDJJWeuUnLPwcxK\nJe+eQyWptJ5DBLzwQjI/0NkJp0+f/dpTWfa1u9tvh5tvhk98AoYPhwcegI9/HA4cgPHjh/74zKw2\nlKTnYGd76SXYtSt5g//xj+Eb34ApU2DYsOQTfvfXnsqGDUt+1O1UzZwJH/lIEgzgnoOZlZ7DoUgW\nLEjmBEaNSt7cH3oI3v3u0vxZXSHhcDCzUnE4FEFEMlHc2grjxvVfv1Bd33PoCgkzs2Kr+3Bob4ct\nW5LX9nZ48UXYv//M3ED3OYKun1On4MSJZF7h6NGk1zAUwQDw3vfCvn3Jn2lmVgp1PyG9Zg389V/D\ntGnJm+0FFyTPaM7ODXSfI2hoSMb9zz03+Xnd65Jt3vWuoh6WmVlJeUK6D8eOwZ/8CfzjP5a7JWZm\nlaPun+fQ0eGxezOz7hwODgczsxwOB4eDmVkOh4PDwcwsR92HQ3u7w8HMrLu6Dwf3HMzMcjkcHA5m\nZjkcDg4HM7McDgeHg5lZDoeDw8HMLIfDocM3sDMz687h4J6DmVkOh4PDwcwsR0HhIGmcpO9J2ibp\neUlXSRovaYOk7ZIekzQuU3+5pJa0/txM+SxJWyTtkLQqUz5C0tp0m6ckTS2kvQBHjiTPb+j62b/f\n4WBm1l2hPYdvAo9GxOXAO4AXgGXAExFxGbARWA4g6QrgeuByYD5wp/Ta05LvAhZHxHRguqR5afli\n4EBETANWASsLbC9f+xrMmwef+ETyc+AAXHppoXs1M6steT/sR9JY4OcR8eZu5S8AfxgRbZImAc0R\n8VZJy4CIiNvTev8GNAE7gY0RcUVavijd/s8krQdWRMRPJTUAeyPijT20ZcAP+1myBGbMSF7NzOpZ\nXw/7KaTncAmwT9Ldkp6V9G1JI4GJEdEGEBF7gQvS+pOBXZntW9OyycDuTPnutOysbSKiEzgkaUIB\nbebYMRg5spA9mJnVvkLC4RxgFvAPETELOEoypNT9I3wxn0PaY8INhsPBzKx/hTwmdDewKyI2p7//\nK0k4tEmamBlWeiVd3wp
clNl+SlrWW3l2mz3psNLYiDjQU2OamppeW25sbKSxsbHHRh87BuedN8Aj\nNDOrIc3NzTQ3Nw+obt5zDgCS/h34bxGxQ9IKoOsz+YGIuF3SF4DxEbEsnZC+H7iKZLjocWBaRISk\np4GlwCbgB8AdEbFe0hJgRkQsSeciFkbEoh7aMeA5h2uugS99CebMyfuwzcxqQl9zDoX0HCB5Q79f\n0nDgV8AngQZgnaSbSCabrweIiK2S1gFbgQ5gSeYd/RbgHuBckquf1qflq4H7JLUA+4GcYBis48c9\nrGRm1p+Ceg6VYjA9h3e8A+69N3k1M6tnpbpaqSp5zsHMrH91Fw7t7b7RnplZfxwOZmaWo+7CwTfa\nMzPrn8PBzMxyOBzMzCxHXYaD5xzMzPpWV+EQAadOwTmFfvXPzKzG1VU4dHQkwaCCb99nZlbb6i4c\nPKRkZta/ugsHT0abmfWvrsKhvd3hYGY2EHUVDu45mJkNTN2Fg+cczMz6V3fh4J6DmVn/6uKK/xMn\n4PnnYccOh4OZ2UDURTisXg1f+QpccgnMm1fu1piZVb66CId9++Dmm5OAMDOz/tXFnMPBgzB+fLlb\nYWZWPQoOB0nDJD0r6ZH09/GSNkjaLukxSeMydZdLapG0TdLcTPksSVsk7ZC0KlM+QtLadJunJE3N\np40OBzOzwSlGz+GzwNbM78uAJyLiMmAjsBxA0hXA9cDlwHzgTum1uxzdBSyOiOnAdEldMwOLgQMR\nMQ1YBawcbOO2bIEXX3Q4mJkNRkHhIGkK8D7gO5ni64A16fIaYGG6vABYGxGnIuIloAWYLWkSMCYi\nNqX17s1sk93Xg8CcwbRvwwZ417tg9GiYOXMwW5qZ1bdCew7fAD4PRKZsYkS0AUTEXuCCtHwysCtT\nrzUtmwzszpTvTsvO2iYiOoFDkiYMtHH79sGCBbB+PVx66YCPycys7uV9tZKk9wNtEfELSY19VI0+\n1g36j+1tRVNT02vLjY2NNDY20tnpZzeYmXVpbm6mubl5QHULeeu8Glgg6X3AecAYSfcBeyVNjIi2\ndMjolbR+K3BRZvspaVlv5dlt9khqAMZGxIGeGpMNhy6nTkFDQ55HZ2ZWY7o+OHe59dZbe62b97BS\nRHwxIqZGxKXAImBjRPxX4PvAn6bVbgQeTpcfARalVyBdArwF+Fk69HRY0ux0gvqGbtvcmC5/hGSC\ne8A6Ox0OZmb5KMWgy23AOkk3ATtJrlAiIrZKWkdyZVMHsCQiuoacbgHuAc4FHo2I9Wn5auA+SS3A\nfpIQGjCHg5lZfnTm/bl6SYqejuPOO+G55+Cuu8rQKDOzCieJiOhxLremvyHtCWkzs/zUdDh4QtrM\nLD81HQ6eczAzy4/DwczMcjgczMwsR82HgyekzcwGr6bDwRPSZmb5qelw8LCSmVl+HA5mZpbD4WBm\nZjlqPhw8IW1mNng1HQ6ekDYzy09Nh4OHlczM8uNwMDOzHA4HMzPLUfPh4AlpM7PBq+lw8IS0mVl+\najocPKxkZpafvMNB0hRJGyU9L+k5SUvT8vGSNkjaLukxSeMy2yyX1CJpm6S5mfJZkrZI2iFpVaZ8\nhKS16TZPSZraX7si4MUX4d//HZ54wuFgZpaPQnoOp4C/jIi3Ae8CbpH0VmAZ8EREXAZsBJYDSLoC\nuB64HJgP3Cmp69mldwGLI2I6MF3SvLR8MXAgIqYBq4CV/TXqYx+DK6+E5cth/ny46qoCjtDMrE7l\nPV0bEXuBvenyEUnbgCnAdcAfptXWAM0kgbEAWBsRp4CXJLUAsyXtBMZExKZ0m3uBhcBj6b5WpOUP\nAn/fX7t+8hN45hm45JJ8j8zMzIoy5yDpYuB3gaeBiRHRBq8FyAVptcnArsxmrWnZZGB3pnx3WnbW\nNhHRCRySNKGvthw7BqNGFXAwZmZWeDhIGk3yqf6zEXEEiG5Vuv9e0B/XX4Xjx2HkyCL+i
WZmdaig\nbwFIOockGO6LiIfT4jZJEyOiTdIk4JW0vBW4KLP5lLSst/LsNnskNQBjI+JAT21pamoiAo4ehZ/+\ntJE5cxoLOTQzs5rT3NxMc3PzgOoqIv8P9pLuBfZFxF9mym4nmUS+XdIXgPERsSydkL4fuIpkuOhx\nYFpEhKSngaXAJuAHwB0RsV7SEmBGRCyRtAhYGBGLemhHRAQnTsC4cXDyZN6HZGZWNyQRET2OyOQd\nDpKuBn4IPEcydBTAF4GfAetIPvHvBK6PiEPpNstJrkDqIBmG2pCWvxO4BzgXeDQiPpuWvw64D/g9\nYD+wKCJe6qEtEREcPJhMRB86lNchmZnVlZKEQyXpCofW1uQy1j17yt0iM7PK11c41NQ3pI8ehREj\nyt0KM7PqV1Ph8Md/DG98Y7lbYWZW/WrqnqXHj8NDD5W7FWZm1a+meg5Hj8Lo0eVuhZlZ9au5cPC3\no83MClcz4RDhW2eYmRVLzYTD8ePJlUq+RbeZWeFqJhw8pGRmVjw1Ew7f/jaMHVvuVpiZ1YaaCYeW\nFvjGN8rdCjOz2lBTt88wM7OBq5vbZ5iZWXE4HMzMLIfDwczMcjgczMwsh8PBzMxyOBzMzCyHw8HM\nzHI4HMzMLEdVhIOkayW9IGmHpC+Uuz1mZrWu4sNB0jDg74F5wNuAj0p6a3lbZYPV3Nxc7iZYEfl8\n1r6KDwdgNtASETsjogNYC1xX5jbZIPnNpLb4fNa+agiHycCuzO+707IBGeg/4oHUK+a+ylWvXG0b\nqGIdg89BfvXKcT6LXa8W/n4r4f9pNYRDQerpxFbyMQyUw6G89RwOpa1XyW3rruLvyirpD4CmiLg2\n/X0ZEBFxe6ZOZR+EmVmF6u2urNUQDg3AdmAO8BvgZ8BHI2JbWRtmZlbDzil3A/oTEZ2S/hzYQDIM\nttrBYGZWWhXfczAzs6FX8xPSlULSq/2sf1LSrKFqT6n0d5y1pB7OaT2dT6iPczpQDoehUy9dtHo5\nTqiPY62HY8yqt+PtlcNh6EjSH0r6fqbgW5JuKGejSkHSSElPSNos6T8kLUjLf0fSVknflvRLSesl\nva7c7S1AXZzTOjqfUCfndCAcDkMrqI9PJieAhRHx+8A1wNcz694CfCsiZgCHgT8pQ/uKqR7OaT2d\nT6iPc9qvir9ayaqSgNskvRs4DVwo6YJ03a8j4rl0+Rng4jK0zwbH57MOORyG1imgIfP7ueVqSAkJ\n+ARwPvB7EXFa0q85c6wnM3U7qf6/g1o/p/V2PqH2z+mAeFhp6ASwE7hC0nBJryf5Yl8tGgu8kr6R\n/GfgdzLrevw2ZpWql3NaL+cT6uec9ss9hyGQfsv7ZES0SloH/BL4NfBsplrVj3Gmx3kCuB/4v5L+\nA9gMZL+0WPXHCfVxTuvpfEJ9nNPB8JfghoCkdwD/FBF/UO62lFK9HCfUx7HWwzFm1dvx9sfDSiUm\n6WaST15fKndbSqlejhPq41jr4Riz6u14B8I9BzMzy+Geg5mZ5XA4WF4kTZG0UdLzkp6TtDQtHy9p\ng6Ttkh6TNC4tn5DWf1XSHd329aSkFyT9XNKzkt5QjmOqZ0U+n8Ml/VO6zVZJHyrHMVlhPKxkeZE0\nCZgUEb+QNJrkC1DXAZ8E9kfESklfAMZHxDJJI4HfBWYAMyJiaWZfTwJ/GRE/H/ojMSj6+WwChkXE\nl9PfJ0Q9Q+igAAABqklEQVTEgSE+JCuQew6Wl4jYGxG/SJePkFzeOIXkDWVNWm0NsDCtcywifsLZ\nX5rK8r/FMiry+bwJ+NvMvh0MVcj/Ia1gki4m+RT5NDAxItogecMBLuh9y7Pckw4p/feSNNIGrJDz\n2TXsBHxN0jOS/pekN5awuVYiDgcrSDoE8SDw2fQTZ/dxyoGMW34sImYC7wbeLekTRW6mDVARzuc5\nJD2OH0XEO0kC5ut9b2KVyOFgeZN0DskbyX0R8XBa3
CZpYrp+EvBKf/uJiN+kr0eBB4DZpWmx9aUY\n5zMi9gNHI+KhtOh7wO+VqMlWQg4HK8R3ga0R8c1M2SPAn6bLNwIPd9+IzP14JDVIOj9dHg58gOS2\nBTb0Cj6fqe+n92ACeC+wtZiNtKHhq5UsL5KuBn4IPMeZ+99/EfgZsA64iOQGZtdHxKF0m18DY4AR\nwCFgLvByup9zSO6E+QTJlUv+hzmEinU+I+IFSVOB+4BxwP8DPhkRu4f2iKxQDgczM8vhYSUzM8vh\ncDAzsxwOBzMzy+FwMDOzHA4HMzPL4XAwM7McDgczM8vhcDAzsxz/H6YhN13PuZ4HAAAAAElFTkSu\nQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "timed_commits = commit_data.set_index(pd.DatetimeIndex(commit_data['date']))[['insertions', 'deletions']].resample('1D').sum()\n", "(timed_commits['insertions'] - timed_commits['deletions']).cumsum().fillna(method='ffill').plot()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "b9045217-458c-49e7-ab17-b873047bc10e" } }, "source": [ "Stay tuned!" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "nbpresent": { "slides": { "1c3bf261-5fad-42f1-be29-bc970f66b78f": { "id": "1c3bf261-5fad-42f1-be29-bc970f66b78f", "prev": "2f379bd9-addb-4c73-aff7-0effc56da804", "regions": { "05f3625e-4608-4f56-b0c8-42757d38085e": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "cffa207d-719c-4ea3-ae5c-556ca5ee692e", "part": "whole" }, "id": "05f3625e-4608-4f56-b0c8-42757d38085e" } } }, "2f379bd9-addb-4c73-aff7-0effc56da804": { "id": "2f379bd9-addb-4c73-aff7-0effc56da804", "prev": "fa62ca0d-9305-405d-a301-04fab08ee8f3", "regions": { "ac003123-264d-4b81-95cd-d79b47595f27": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "022f08fe-2746-4631-b0e2-1df34c154675", "part": "whole" }, "id": "ac003123-264d-4b81-95cd-d79b47595f27" } } }, 
"3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d": { "id": "3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d", "prev": "88ab58e4-fc80-487b-83b5-687302c35522", "regions": { "a666344f-e0ba-42c7-9f85-12abfa643611": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "b38ec597-d6b6-4b10-ba27-709b9415b124", "part": "whole" }, "id": "a666344f-e0ba-42c7-9f85-12abfa643611" } } }, "46431dec-27a6-40c6-af95-5fb2f230ebd4": { "id": "46431dec-27a6-40c6-af95-5fb2f230ebd4", "prev": "90fbdba6-3c4e-436f-b4a1-a4a469dde026", "regions": { "ee88eafd-7388-4463-9651-9e672e3cf093": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "577cb8ae-fdd7-4835-93c0-3d52c7e7fa6d", "part": "whole" }, "id": "ee88eafd-7388-4463-9651-9e672e3cf093" } } }, "467b68fa-ba68-43f0-be64-972fa50485c2": { "id": "467b68fa-ba68-43f0-be64-972fa50485c2", "prev": "46431dec-27a6-40c6-af95-5fb2f230ebd4", "regions": { "06523258-1daa-4bbe-9b61-1af7bac105f8": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "0a9c1f53-c819-4fad-abf6-fa6ddcc6e53e", "part": "whole" }, "id": "06523258-1daa-4bbe-9b61-1af7bac105f8" } } }, "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084": { "id": "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084", "prev": "467b68fa-ba68-43f0-be64-972fa50485c2", "regions": { "447d5ea1-72a7-484d-b127-7c87b2fb17a5": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "0a64edd8-1d4e-4456-8d54-db30f82da038", "part": "whole" }, "id": "447d5ea1-72a7-484d-b127-7c87b2fb17a5" } } }, "69bc04e6-b390-4c39-8cfc-e901363073db": { "id": "69bc04e6-b390-4c39-8cfc-e901363073db", "prev": "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1", "regions": { "d71fa94c-1908-4c10-8468-35ec4c0d0b60": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "f1852e51-8531-4ccc-af20-8348509cfc9c", "part": "whole" }, "id": "d71fa94c-1908-4c10-8468-35ec4c0d0b60" } } }, "88ab58e4-fc80-487b-83b5-687302c35522": { "id": "88ab58e4-fc80-487b-83b5-687302c35522", "prev": 
"69bc04e6-b390-4c39-8cfc-e901363073db", "regions": { "08604d82-a07c-41b3-ad38-e27b39ee83a2": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "86a45314-4211-4cd1-93fb-9c156534a2e0", "part": "whole" }, "id": "08604d82-a07c-41b3-ad38-e27b39ee83a2" } } }, "90fbdba6-3c4e-436f-b4a1-a4a469dde026": { "id": "90fbdba6-3c4e-436f-b4a1-a4a469dde026", "prev": "3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d", "regions": { "66d3717e-256a-4063-96cf-415ced19032d": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "e34097f6-1626-49ba-bb93-b5b2dfb20d8c", "part": "whole" }, "id": "66d3717e-256a-4063-96cf-415ced19032d" } } }, "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1": { "id": "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1", "prev": null, "regions": { "0e54fbca-ee14-4d41-8134-a7e4848c8244": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "4e3ff556-2ed1-4264-9e5d-c34080b5ef43", "part": "whole" }, "id": "0e54fbca-ee14-4d41-8134-a7e4848c8244" } } }, "ce90176b-054d-44da-82e5-24e00f159d44": { "id": "ce90176b-054d-44da-82e5-24e00f159d44", "prev": "1c3bf261-5fad-42f1-be29-bc970f66b78f", "regions": { "a049b370-d20d-4b31-b12d-d1e98cf186a5": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "edfb29bb-0a19-4835-be91-60f8ae4941ca", "part": "whole" }, "id": "a049b370-d20d-4b31-b12d-d1e98cf186a5" } } }, "fa62ca0d-9305-405d-a301-04fab08ee8f3": { "id": "fa62ca0d-9305-405d-a301-04fab08ee8f3", "prev": "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084", "regions": { "e33a8ad9-898a-435b-9cd8-a0a9e971b7c0": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "81cdfe40-3a5d-44cf-9b55-eaf0a635761f", "part": "whole" }, "id": "e33a8ad9-898a-435b-9cd8-a0a9e971b7c0" } } } }, "themes": { "default": "66eaa5a2-7a73-471f-ab56-5b57ef30cb08", "theme": { "66eaa5a2-7a73-471f-ab56-5b57ef30cb08": { "id": "66eaa5a2-7a73-471f-ab56-5b57ef30cb08", "palette": { "19cc588f-0593-49c9-9f4b-e4d7cc113b1c": { "id": 
"19cc588f-0593-49c9-9f4b-e4d7cc113b1c", "rgb": [ 252, 252, 252 ] }, "31af15d2-7e15-44c5-ab5e-e04b16a89eff": { "id": "31af15d2-7e15-44c5-ab5e-e04b16a89eff", "rgb": [ 68, 68, 68 ] }, "50f92c45-a630-455b-aec3-788680ec7410": { "id": "50f92c45-a630-455b-aec3-788680ec7410", "rgb": [ 155, 177, 192 ] }, "c5cc3653-2ee1-402a-aba2-7caae1da4f6c": { "id": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "rgb": [ 43, 126, 184 ] }, "efa7f048-9acb-414c-8b04-a26811511a21": { "id": "efa7f048-9acb-414c-8b04-a26811511a21", "rgb": [ 25.118061674008803, 73.60176211453744, 107.4819383259912 ] } }, "rules": { "blockquote": { "color": "50f92c45-a630-455b-aec3-788680ec7410" }, "code": { "font-family": "Anonymous Pro" }, "h1": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 8 }, "h2": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 6 }, "h3": { "color": "50f92c45-a630-455b-aec3-788680ec7410", "font-family": "Lato", "font-size": 5.5 }, "h4": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 5 }, "h5": { "font-family": "Lato" }, "h6": { "font-family": "Lato" }, "h7": { "font-family": "Lato" }, "pre": { "font-family": "Anonymous Pro", "font-size": 4 } }, "text-base": { "font-family": "Merriweather", "font-size": 4 } }, "ecdad483-e20b-4d01-87c5-efd37e40b00e": { "backgrounds": { "backgroundColor": { "background-color": "backgroundColor", "id": "backgroundColor" } }, "id": "ecdad483-e20b-4d01-87c5-efd37e40b00e", "palette": { "backgroundColor": { "id": "backgroundColor", "rgb": [ 256, 256, 256 ] }, "headingColor": { "id": "headingColor", "rgb": [ 34, 34, 34 ] }, "linkColor": { "id": "linkColor", "rgb": [ 42, 118, 221 ] }, "mainColor": { "id": "mainColor", "rgb": [ 34, 34, 34 ] } }, "rules": { "a": { "color": "linkColor" }, "h1": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 5.25 }, "h2": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 4 }, 
"h3": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 3.5 }, "h4": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 3 }, "h5": { "color": "headingColor", "font-family": "Source Sans Pro" }, "h6": { "color": "headingColor", "font-family": "Source Sans Pro" }, "h7": { "color": "headingColor", "font-family": "Source Sans Pro" }, "li": { "color": "mainColor", "font-family": "Source Sans Pro", "font-size": 6 }, "p": { "color": "mainColor", "font-family": "Source Sans Pro", "font-size": 6 } }, "text-base": { "color": "mainColor", "font-family": "Source Sans Pro", "font-size": 6 } } } } } }, "nbformat": 4, "nbformat_minor": 0 }