{ "cells": [ { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "4e3ff556-2ed1-4264-9e5d-c34080b5ef43" } }, "source": [ "# Introduction\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are multiple reasons for analyzing a version control system like your Git repository. See, for example, Adam Tornhill's book [\"Your Code as a Crime Scene\"](https://pragprog.com/book/atcrime/your-code-as-a-crime-scene) or his upcoming book [\"Software Design X-Rays\"](http://www.adamtornhill.com/swevolution/reviewersprogress.html) for plenty of inspiration.\n", "\n", "You can\n", "- analyze knowledge islands\n", "- distinguish frequently changing code from stable code\n", "- identify code that is temporally coupled\n", "\n", "Having the necessary data for those analyses in a Pandas <tt>DataFrame</tt> gives you many possibilities to quickly gain insights into the evolution of your software system." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "f1852e51-8531-4ccc-af20-8348509cfc9c" } }, "source": [ "# The idea\n", "\n", "In another [blog post](https://www.feststelltaste.de/reading-a-git-log-file-output-with-pandas/) I showed you a way to read in Git log data with [Pandas](http://pandas.pydata.org/)'s DataFrame and [GitPython](https://gitpython.readthedocs.io/en/stable/). Looking back, this was really complicated and tedious. With a few tricks, we can do it much better this time:\n", "\n", "- We use GitPython's feature to directly access an underlying Git installation. This is much faster than using GitPython's object representation of the repository and makes it possible to have everything we need in one notebook.\n", "- We read the data in memory by using StringIO to avoid unnecessary file access. This avoids storing the Git output on disk and reading it back from disk again, which makes this method much faster.\n", "- We also exploit Pandas's <tt>read_csv</tt> method even more. 
This makes the transformation of the Git log into a <tt>DataFrame</tt> as easy as pie." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading the Git log\n", "The first step is to connect GitPython with the Git repo. Once we have an instance of the repo, we can gain access to the underlying Git installation of the operating system via <tt>repo.git</tt>.\n", "\n", "In this case, again, we use the Spring Pet Clinic project, a small sample application for the Spring framework." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<git.cmd.Git at 0x1f05402f948>" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import git\n", "\n", "# Path to a local clone of the Spring Pet Clinic repository\n", "GIT_REPO_PATH = r'../../spring-petclinic/'\n", "repo = git.Repo(GIT_REPO_PATH)\n", "# repo.git gives access to the underlying Git command line installation\n", "git_bin = repo.git\n", "git_bin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the <tt>git_bin</tt>, we can execute almost any Git command we like directly. In our hypothetical use case, we want to retrieve some information about the change frequency of files. For this, we need the complete history of the Git repo including statistics for the changed files (via <tt>--numstat</tt>).\n", "\n", "We use a little trick to make sure that the format of the file statistics fits nicely with the commit's metadata (SHA <tt>%h</tt>, UNIX timestamp <tt>%at</tt> and author's name <tt>%aN</tt>). 
The <tt>--numstat</tt> option provides data for additions, deletions and the affected file name in one line, separated by the tab character <tt>\\t</tt>:\n", "<p>\n", "<tt>1<b>\\t</b>1<b>\\t</b>some/file/name.ext</tt>\n", "</p>\n", "\n", "We use the same tab separator <tt>\\t</tt> for the format string:\n", "<p>\n", "<tt>%h<b>\\t</b>%at<b>\\t</b>%aN</tt>\n", "</p>\n", "\n", "And here is the trick: we put as many tabs as the file statistics contain (two) plus one additional tab in front of the format string, to pretend that there are empty file statistics in front of the commit's metadata.\n", "\n", "The result looks like this:\n", "\n", "<p>\n", "<tt>\\t\\t\\t%h\\t%at\\t%aN</tt>\n", "</p>\n", "\n", "Note: If you want to export the Git log on the command line into a file to read that file later, you need to use the hex code placeholder <tt>%x09</tt> for the tab character as separator instead of <tt>\\t</tt> in the format string. Otherwise, the trick doesn't work.\n", "\n", "\n", "OK, let's first execute the Git log export:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\t\\t\\t101c9dc\\t1498817227\\tDave Syer\\n2\\t3\\tpom.xml\\n\\n\\t\\t\\tffa967c\\t1492026060\\tAntoine Rey\\n1'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "git_log = git_bin.execute('git log --numstat --pretty=format:\"\\t\\t\\t%h\\t%at\\t%aN\"')\n", "git_log[:80]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have now read the complete file history into the <tt>git_log</tt> variable. Don't let all the <tt>\\t</tt> characters confuse you.\n", "\n", "Let's read the result into a Pandas <tt>DataFrame</tt> by using the <tt>read_csv</tt> method. 
Because we can't provide a file path to CSV data, we have to use StringIO to read in our in-memory buffered content.\n", "\n", "Pandas reads the first line of the tab-separated \"file\", sees the tab-separated columns and parses all other lines with the same column layout. Additionally, we set <tt>header</tt> to <tt>None</tt> because we don't have a header row, and we provide nice names for all the columns that we read in." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style>\n", " .dataframe thead tr:only-child th {\n", " text-align: right;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>additions</th>\n", " <th>deletions</th>\n", " <th>filename</th>\n", " <th>sha</th>\n", " <th>timestamp</th>\n", " <th>author</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>101c9dc</td>\n", " <td>1.498817e+09</td>\n", " <td>Dave Syer</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " additions 
deletions filename sha timestamp author\n", "0 NaN NaN NaN 101c9dc 1.498817e+09 Dave Syer\n", "1 2 3 pom.xml NaN NaN NaN\n", "2 NaN NaN NaN ffa967c 1.492026e+09 Antoine Rey\n", "3 1 1 readme.md NaN NaN NaN\n", "4 NaN NaN NaN fd1c742 1.488785e+09 Antoine Rey" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "from io import StringIO\n", "\n", "# Wrap the in-memory Git log in StringIO so that read_csv can treat it like a file\n", "commits_raw = pd.read_csv(StringIO(git_log),\n", "    sep=\"\\t\",\n", "    header=None,\n", "    names=['additions', 'deletions', 'filename', 'sha', 'timestamp', 'author']\n", "    )\n", "commits_raw.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get two different kinds of rows:\n", "\n", "For each commit, we get a row with the commit's metadata (SHA, timestamp and author) and empty file statistics.\n", "\n", "For each other row, we get some statistics about a modified file:\n", "<pre>\n", "2\t0\tsrc/main/asciidoc/appendices/bibliography.adoc\n", "</pre>\n", "\n", "It contains the number of lines inserted, the number of lines deleted and the relative path of the file. With a little bit of data wrangling, we can bring that information into a nicely structured DataFrame.\n", "\n", "The last steps are easy. We forward-fill the empty cells, so that each file statistics row carries the metadata of its commit." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style>\n", " .dataframe thead tr:only-child th {\n", " text-align: right;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>additions</th>\n", " <th>deletions</th>\n", " <th>filename</th>\n", " <th>sha</th>\n", " <th>timestamp</th>\n", " <th>author</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>101c9dc</td>\n", " <td>1.498817e+09</td>\n", " <td>Dave Syer</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>101c9dc</td>\n", " <td>1.498817e+09</td>\n", " <td>Dave Syer</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " additions deletions filename sha timestamp author\n", "0 NaN NaN NaN 101c9dc 1.498817e+09 Dave Syer\n", "1 2 3 pom.xml 101c9dc 1.498817e+09 Dave Syer\n", "2 2 3 pom.xml ffa967c 1.492026e+09 Antoine Rey\n", "3 1 1 readme.md ffa967c 1.492026e+09 Antoine Rey\n", "4 1 1 readme.md fd1c742 1.488785e+09 Antoine Rey" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commits 
= commits_raw.ffill()\n", "commits.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And drop all the commit metadata rows that don't contain file statistics." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style>\n", " .dataframe thead tr:only-child th {\n", " text-align: right;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>additions</th>\n", " <th>deletions</th>\n", " <th>filename</th>\n", " <th>sha</th>\n", " <th>timestamp</th>\n", " <th>author</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>101c9dc</td>\n", " <td>1.498817e+09</td>\n", " <td>Dave Syer</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>pom.xml</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " additions deletions filename sha timestamp author\n", "1 2 3 pom.xml 101c9dc 1.498817e+09 Dave Syer\n", "2 2 3 pom.xml ffa967c 1.492026e+09 Antoine Rey\n", "3 1 1 readme.md ffa967c 1.492026e+09 Antoine Rey\n", "4 1 1 
readme.md fd1c742 1.488785e+09 Antoine Rey\n", "5 1 0 pom.xml fd1c742 1.488785e+09 Antoine Rey" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "commits = commits.dropna()\n", "commits.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are finished! This is it.\n", "\n", "In summary, you need just a \"one-liner\" to convert Git log output that was exported with\n", "```\n", "git log --numstat --pretty=format:\"%x09%x09%x09%h%x09%at%x09%aN\" > git.log\n", "```\n", "into a <tt>DataFrame</tt>:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style>\n", " .dataframe thead tr:only-child th {\n", " text-align: right;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>additions</th>\n", " <th>deletions</th>\n", " <th>filename</th>\n", " <th>sha</th>\n", " <th>timestamp</th>\n", " <th>author</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>101c9dc</td>\n", " <td>1.498817e+09</td>\n", " <td>Dave Syer</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2</td>\n", " <td>3</td>\n", " <td>pom.xml</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>ffa967c</td>\n", " <td>1.492026e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>readme.md</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>1</td>\n", " <td>0</td>\n", " 
<td>pom.xml</td>\n", " <td>fd1c742</td>\n", " <td>1.488785e+09</td>\n", " <td>Antoine Rey</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " additions deletions filename sha timestamp author\n", "1 2 3 pom.xml 101c9dc 1.498817e+09 Dave Syer\n", "2 2 3 pom.xml ffa967c 1.492026e+09 Antoine Rey\n", "3 1 1 readme.md ffa967c 1.492026e+09 Antoine Rey\n", "4 1 1 readme.md fd1c742 1.488785e+09 Antoine Rey\n", "5 1 0 pom.xml fd1c742 1.488785e+09 Antoine Rey" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv(\"../../spring-petclinic/git.log\",\n", "    sep=\"\\t\",\n", "    header=None,\n", "    names=[\n", "        'additions',\n", "        'deletions',\n", "        'filename',\n", "        'sha',\n", "        'timestamp',\n", "        'author']).ffill().dropna().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bonus section\n", "We can now convert some columns to their correct data types. The <tt>additions</tt> and <tt>deletions</tt> columns represent the added and deleted lines of code, respectively. But there are a few exceptions for binary files like images, for which <tt>--numstat</tt> reports <tt>-</tt> instead of line counts. We handle these rows with the <tt>errors='coerce'</tt> option: it leads to <tt>NaN</tt> values in those rows, which we drop after the conversion.\n", "\n", "The <tt>timestamp</tt> column holds a UNIX timestamp (the number of seconds elapsed since January 1st, 1970) that we can easily convert with Pandas' <tt>to_datetime</tt> function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "commits['additions'] = pd.to_numeric(commits['additions'], errors='coerce')\n", "commits['deletions'] = pd.to_numeric(commits['deletions'], errors='coerce')\n", "commits = commits.dropna()\n", "commits['timestamp'] = pd.to_datetime(commits['timestamp'], unit=\"s\")\n", "commits.head()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1f05fc9e0f0>" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFBZJREFUeJzt3X+sZ3Wd3/Hnq+DSDXcDuNCbyUB7IZk1AWY7Zm7YJqvm\n3tJVqk1R07AQYpjVdjRhjU1I2tEmla4hIVvR/mF1OxuIbNzlSkSUAHaXEq/UpKgzhjL8kBV0yHI7\nzlTB0WsN7eC7f9wzu9+5vXfu99f1zvfT5yO5uef7OedzzufNyby+h8893+9JVSFJatff2uoBSJI2\nl0EvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJatzZWz0AgAsvvLBmZmZOafvZz37G\nueeeuzUD2kTWNXlarc26Js/q2g4ePPjDqrpoo35nRNDPzMxw4MCBU9oWFxeZm5vbmgFtIuuaPK3W\nZl2TZ3VtSV7sp59TN5LUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxGwZ9kkuSfDXJM0meTvKhrv31\nSR5J8t3u9wU9fT6c5PkkzyV522YWIEk6vX6u6E8At1TV5cA/AG5OcjmwD3i0qnYAj3av6dZdD1wB\nXAN8OslZmzF4SdLGNgz6qjpSVd/uln8KPAtsB64F7u42uxt4Z7d8LbBQVa9W1feB54Grxj1wSVJ/\nBvpkbJIZ4I3AN4DpqjrSrfoBMN0tbwce7+n2Ute2aWb2PbSZu1/X4dvfsSXHlaRBpKr62zCZAr4G\n3FZVX0zy46o6v2f9K1V1QZJPAY9X1ee69juBr1TVF1btby+wF2B6enr3wsLCKcdbXl5mamqqr7Ed\nWjre13bjtnP7eQP3GaSuSdJqXdBubdY1eVbXNj8/f7CqZjfq19cVfZLXAfcBf1pVX+yajybZVlVH\nkmwDjnXtS8AlPd0v7tpOUVX7gf0As7Oztfq7KQb5voo9W3VFf+PcwH1a/R6OVuuCdmuzrskzbG39\n3HUT4E7g2ar6RM+qB4CbuuWbgC/3tF+f5JwklwI7gG8OPDJJ0lj0c0X/28B7gENJnujaPgLcDtyb\n5H3Ai8B1AFX1dJJ7gWdYuWPn5qp6bewjlyT1ZcOgr6qvA1ln9dXr9LkNuG2EcUmSxsRPxkpS4wx6\nSWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJek\nxhn0ktQ4g16SGmfQS1Lj+nlm7F1JjiV5qqft80me6H4On3zEYJKZJD/vWfdHmzl4SdLG+nlm7GeB\nTwF/crKhqn735HKSO4
DjPdu/UFW7xjVASdJo+nlm7GNJZtZalySsPBT8H453WJKkcRl1jv7NwNGq\n+m5P26XdtM3Xkrx5xP1LkkaUqtp4o5Ur+ger6spV7Z8Bnq+qO7rX5wBTVfWjJLuBLwFXVNVP1tjn\nXmAvwPT09O6FhYVT1i8vLzM1NdVXEYeWjm+80SbYuf28gfsMUtckabUuaLc265o8q2ubn58/WFWz\nG/XrZ45+TUnOBt4N7D7ZVlWvAq92yweTvAD8BnBgdf+q2g/sB5idna25ublT1i8uLrK6bT179j00\nTAkjO3zj3MB9BqlrkrRaF7Rbm3VNnmFrG2Xq5h8B36mql042JLkoyVnd8mXADuB7IxxDkjSifm6v\nvAf4b8AbkryU5H3dquuBe1Zt/hbgye52yy8AH6iql8c5YEnSYPq56+aGddr3rNF2H3Df6MOSJI2L\nn4yVpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMM\neklqnEEvSY0z6CWpcQa9JDXOoJekxvXzKMG7khxL8lRP261JlpI80f28vWfdh5M8n+S5JG/brIFL\nkvrTzxX9Z4Fr1mj/ZFXt6n4eBkhyOSvPkr2i6/Ppkw8LlyRtjQ2DvqoeA/p9wPe1wEJVvVpV3wee\nB64aYXySpBGlqjbeKJkBHqyqK7vXtwK/BxwHDgC3VNUrST4FPF5Vn+u2uxP4SlV9YY197gX2AkxP\nT+9eWFg4Zf3y8jJTU1N9FXFo6Xhf243bzu3nDdxnkLomSat1Qbu1WdfkWV3b/Pz8waqa3ajf2UMe\n7zPAx4Dqft8BvHeQHVTVfmA/wOzsbM3NzZ2yfnFxkdVt69mz76FBDj02h2+cG7jPIHVNklbrgnZr\ns67JM2xtQ911U1VHq+q1qvoF8Mf8zfTMEnBJz6YXd22SpC0yVNAn2dbz8l3AyTtyHgCuT3JOkkuB\nHcA3RxuiJGkUG07dJLkHmAMuTPIS8FFgLskuVqZuDgPvB6iqp5PcCzwDnABurqrXNmfokqR+bBj0\nVXXDGs13nmb724DbRhmUJGl8/GSsJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEG\nvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxGwZ9kruSHEvyVE/bv0/ynSRPJrk/\nyfld+0ySnyd5ovv5o80cvCRpY/1c0X8WuGZV2yPAlVX1m8BfAh/uWfdCVe3qfj4wnmFKkoa1YdBX\n1WPAy6va/qKqTnQvHwcu3oSxSZLGYBxz9O8FvtLz+tJu2uZrSd48hv1LkkaQqtp4o2QGeLCqrlzV\n/m+AWeDdVVVJzgGmqupHSXYDXwKuqKqfrLHPvcBegOnp6d0LCwunrF9eXmZqaqqvIg4tHe9ru3Hb\nuf28gfsMUtckabUuaLc265o8q2ubn58/WFWzG/U7e9gDJtkD/BPg6ureLarqVeDVbvlgkheA3wAO\nrO5fVfuB/QCzs7M1Nzd3yvrFxUVWt61nz76HhqxiNIdvnBu4zyB1TZJW64J2a7OuyTNsbUNN3SS5\nBvhXwD+tqv/V035RkrO65cuAHcD3hjmGJGk8NryiT3IPMAdcmOQl4KOs3GVzDvBIEoDHuzts3gL8\nQZL/A/wC+EBVvbzmjiVJvxQbBn1V3bBG853rbHsfcN+og5IkjY+fjJWkxhn0ktQ4g16SGmfQS1Lj\nDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6g\nl6TGbRj0Se5KcizJUz1tr0/ySJLvdr8v6Fn34STPJ3kuyds2a+CSpP70c0X/WeCaVW37gEeragfw\naPeaJJcD1wNXdH0+ffJh4ZKkrbFh0FfVY8DqB3xfC9zdLd8NvLOnfaGqXq2q7wPPA1eN
aaySpCEM\nO0c/XVVHuuUfANPd8nbgr3q2e6lrkyRtkVTVxhslM8CDVXVl9/rHVXV+z/pXquqCJJ8CHq+qz3Xt\ndwJfqaovrLHPvcBegOnp6d0LCwunrF9eXmZqaqqvIg4tHe9ru3Hbuf28gfsMUtckabUuaLc265o8\nq2ubn58/WFWzG/U7e8jjHU2yraqOJNkGHOval4BLera7uGv7f1TVfmA/wOzsbM3NzZ2yfnFxkdVt\n69mz76FBxj42h2+cG7jPIHVNklbrgnZrs67JM2xtw07dPADc1C3fBHy5p/36JOckuRTYAXxzyGNI\nksZgwyv6JPcAc8CFSV4CPgrcDtyb5H3Ai8B1AFX1dJJ7gWeAE8DNVfXaJo1dktSHDYO+qm5YZ9XV\n62x/G3DbKIOSJI2Pn4yVpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS\n1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxg37cHCSvAH4fE/TZcC/Bc4H/gXwP7v2\nj1TVw0OPUJI0kqGDvqqeA3YBJDkLWALuB34P+GRVfXwsI5QkjWRcUzdXAy9U1Ytj2p8kaUzGFfTX\nA/f0vP5gkieT3JXkgjEdQ5I0hFTVaDtIfgX4H8AVVXU0yTTwQ6CAjwHbquq9a/TbC+wFmJ6e3r2w\nsHDK+uXlZaampvoaw6Gl4yPVMKyd288buM8gdU2SVuuCdmuzrsmzurb5+fmDVTW7Ub9xBP21wM1V\n9dY11s0AD1bVlafbx+zsbB04cOCUtsXFRebm5voaw8y+h/oc7Xgdvv0dA/cZpK5J0mpd0G5t1jV5\nVteWpK+gH8fUzQ30TNsk2daz7l3AU2M4hiRpSEPfdQOQ5Fzgd4D39zT/YZJdrEzdHF61TpL0SzZS\n0FfVz4BfX9X2npFGJEkaKz8ZK0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9J\njTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0b9VGCh4GfAq8BJ6pqNsnrgc8D\nM6w8SvC6qnpltGFKkoY1jiv6+ara1fMk8n3Ao1W1A3i0ey1J2iKbMXVzLXB3t3w38M5NOIYkqU+p\nquE7J98HjrMydfOfqmp/kh9X1fnd+gCvnHy9qu9eYC/A9PT07oWFhVPWLy8vMzU11dc4Di0dH7qG\nUezcft7AfQapa5K0Whe0W5t1TZ7Vtc3Pzx/smU1Z10hz9MCbqmopyd8BHknynd6VVVVJ1nwnqar9\nwH6A2dnZmpubO2X94uIiq9vWs2ffQ4OPfAwO3zg3cJ9B6pokrdYF7dZmXZNn2NpGCvqqWup+H0ty\nP3AVcDTJtqo6kmQbcGyUY5zJZoZ4g7ll54mR35gO3/6OkfpL+v/L0HP0Sc5N8msnl4G3Ak8BDwA3\ndZvdBHx51EFKkoY3yhX9NHD/yjQ8ZwN/VlX/Ocm3gHuTvA94Ebhu9GFKkoY1dNBX1feAv79G+4+A\nq0cZlCRpfPxkrCQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS\n1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcaM8SvCSJF9N8kySp5N8qGu/NclSkie6n7ePb7iSpEGN\n8ijBE8AtVfXt7tmxB5M80q37ZFV9fPThSZJGNcqjBI8AR7rlnyZ5Ftg+roFJksZjLHP0SWaANwLf\n6Jo+mOTJJHcluWAcx5AkDSdVNdoOkinga8BtVfXFJNPAD4ECPgZsq6r3rtFvL7AXYHp6evfCwsIp\n65eXl5mamuprDIeWjo9Uwy/T9K/C0Z+Pto+d288bz2DGaJDzNWlarc26Js/q2ubn5w9W1exG/UYK\n+iSvAx4E/ryqPrHG+hngwaq68nT7mZ2drQMHDpzS
tri4yNzcXF/jmNn3UH8DPgPcsvMEdxwa5U8j\ncPj2d4xpNOMzyPmaNK3WZl2TZ3VtSfoK+lHuuglwJ/Bsb8gn2daz2buAp4Y9hiRpdKNcWv428B7g\nUJInuraPADck2cXK1M1h4P0jjVCSNJJR7rr5OpA1Vj08/HAkSePmJ2MlqXEGvSQ1zqCXpMYZ9JLU\nOINekhpn0EtS40b7iKa2xFZ+EvhM/FSupNPzil6SGmfQS1LjnLrRQNabNrpl5wn2bOKUklNG0vC8\nopekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mN8/ZKaQOHlo5v6q2jW+V0t8R6O2tbNu2KPsk1SZ5L\n8nySfZt1HEnS6W1K0Cc5C/iPwD8GLmflObKXb8axJEmnt1lTN1cBz1fV9wCSLADXAs9s0vHUuK38\nIrdbdm7ZobfMVv33dspoc2xW0G8H/qrn9UvAb23SsSQ1YhxvMJv9dRzj9st4c0tVjX+nyT8Drqmq\nf969fg/wW1X1+z3b7AX2di/fADy3ajcXAj8c++C2nnVNnlZrs67Js7q2v1dVF23UabOu6JeAS3pe\nX9y1/bWq2g/sX28HSQ5U1ezmDG/rWNfkabU265o8w9a2WXfdfAvYkeTSJL8CXA88sEnHkiSdxqZc\n0VfViSS/D/w5cBZwV1U9vRnHkiSd3qZ9YKqqHgYeHmEX607rTDjrmjyt1mZdk2eo2jblj7GSpDOH\n33UjSY0744K+5a9OSHI4yaEkTyQ5sNXjGVaSu5IcS/JUT9vrkzyS5Lvd7wu2cozDWKeuW5Msdefs\niSRv38oxDiPJJUm+muSZJE8n+VDX3sI5W6+2iT5vSf52km8m+e9dXf+uax/qnJ1RUzfdVyf8JfA7\nrHzI6lvADVXVxCdqkxwGZqtqou/xTfIWYBn4k6q6smv7Q+Dlqrq9e4O+oKr+9VaOc1Dr1HUrsFxV\nH9/KsY0iyTZgW1V9O8mvAQeBdwJ7mPxztl5t1zHB5y1JgHOrajnJ64CvAx8C3s0Q5+xMu6L/669O\nqKr/DZz86gSdQarqMeDlVc3XAnd3y3ez8o9toqxT18SrqiNV9e1u+afAs6x8er2Fc7ZebROtVix3\nL1/X/RRDnrMzLejX+uqEiT9pPQr4L0kOdp8Mbsl0VR3pln8ATG/lYMbsg0me7KZ2Jm56o1eSGeCN\nwDdo7Jytqg0m/LwlOSvJE8Ax4JGqGvqcnWlB37o3VdUuVr7V8+ZuqqA5tTIfeObMCY7mM8BlwC7g\nCHDH1g5neEmmgPuAf1lVP+ldN+nnbI3aJv68VdVrXV5cDFyV5MpV6/s+Z2da0G/41QmTrKqWut/H\ngPtZmapqxdFuvvTkvOmxLR7PWFTV0e4f3C+AP2ZCz1k3z3sf8KdV9cWuuYlztlZtrZw3gKr6MfBV\n4BqGPGdnWtA3+9UJSc7t/lhEknOBtwJPnb7XRHkAuKlbvgn48haOZWxO/qPqvIsJPGfdH/buBJ6t\nqk/0rJr4c7ZebZN+3pJclOT8bvlXWblB5TsMec7OqLtuALrboP4Df/PVCbdt8ZDGIsllrFzFw8on\nkv9sUmtLcg8wx8o36R0FPgp8CbgX+LvAi8B1VTVRf9hcp645Vv73v4DDwPt75kgnQpI3Af8VOAT8\nomv+CCtz2ZN+ztar7QYm+Lwl+U1W/th6FisX5PdW1R8k+XWGOGdnXNBLksbrTJu6kSSNmUEvSY0z\n6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1Lj/i9EaS/dzB7bPQAAAABJRU5ErkJggg==\n", "text/plain": [ "<matplotlib.figure.Figure at 0x1f05b918cf8>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", 
"commits[commits['filename'].str.endswith(\".java\")]\\\n", " .groupby('filename')\\\n", " .count()['additions']\\\n", " .hist()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "nbpresent": { "id": "417386b7-669e-4c2e-93bc-2abae6b03699" } }, "source": [ "# Summary\n", "In this notebook, I showed you how to read some a Git log output with another separator trick in only one line. This is a very handy method and a good base for further analysis! " ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" }, "nbpresent": { "slides": { "1c3bf261-5fad-42f1-be29-bc970f66b78f": { "id": "1c3bf261-5fad-42f1-be29-bc970f66b78f", "prev": "2f379bd9-addb-4c73-aff7-0effc56da804", "regions": { "05f3625e-4608-4f56-b0c8-42757d38085e": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "cffa207d-719c-4ea3-ae5c-556ca5ee692e", "part": "whole" }, "id": "05f3625e-4608-4f56-b0c8-42757d38085e" } } }, "2f379bd9-addb-4c73-aff7-0effc56da804": { "id": "2f379bd9-addb-4c73-aff7-0effc56da804", "prev": "fa62ca0d-9305-405d-a301-04fab08ee8f3", "regions": { "ac003123-264d-4b81-95cd-d79b47595f27": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "022f08fe-2746-4631-b0e2-1df34c154675", "part": "whole" }, "id": "ac003123-264d-4b81-95cd-d79b47595f27" } } }, "3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d": { "id": "3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d", "prev": "88ab58e4-fc80-487b-83b5-687302c35522", "regions": { "a666344f-e0ba-42c7-9f85-12abfa643611": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "b38ec597-d6b6-4b10-ba27-709b9415b124", "part": "whole" }, "id": "a666344f-e0ba-42c7-9f85-12abfa643611" } } }, 
"46431dec-27a6-40c6-af95-5fb2f230ebd4": { "id": "46431dec-27a6-40c6-af95-5fb2f230ebd4", "prev": "90fbdba6-3c4e-436f-b4a1-a4a469dde026", "regions": { "ee88eafd-7388-4463-9651-9e672e3cf093": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "577cb8ae-fdd7-4835-93c0-3d52c7e7fa6d", "part": "whole" }, "id": "ee88eafd-7388-4463-9651-9e672e3cf093" } } }, "467b68fa-ba68-43f0-be64-972fa50485c2": { "id": "467b68fa-ba68-43f0-be64-972fa50485c2", "prev": "46431dec-27a6-40c6-af95-5fb2f230ebd4", "regions": { "06523258-1daa-4bbe-9b61-1af7bac105f8": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "0a9c1f53-c819-4fad-abf6-fa6ddcc6e53e", "part": "whole" }, "id": "06523258-1daa-4bbe-9b61-1af7bac105f8" } } }, "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084": { "id": "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084", "prev": "467b68fa-ba68-43f0-be64-972fa50485c2", "regions": { "447d5ea1-72a7-484d-b127-7c87b2fb17a5": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "0a64edd8-1d4e-4456-8d54-db30f82da038", "part": "whole" }, "id": "447d5ea1-72a7-484d-b127-7c87b2fb17a5" } } }, "69bc04e6-b390-4c39-8cfc-e901363073db": { "id": "69bc04e6-b390-4c39-8cfc-e901363073db", "prev": "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1", "regions": { "d71fa94c-1908-4c10-8468-35ec4c0d0b60": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "f1852e51-8531-4ccc-af20-8348509cfc9c", "part": "whole" }, "id": "d71fa94c-1908-4c10-8468-35ec4c0d0b60" } } }, "88ab58e4-fc80-487b-83b5-687302c35522": { "id": "88ab58e4-fc80-487b-83b5-687302c35522", "prev": "69bc04e6-b390-4c39-8cfc-e901363073db", "regions": { "08604d82-a07c-41b3-ad38-e27b39ee83a2": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "86a45314-4211-4cd1-93fb-9c156534a2e0", "part": "whole" }, "id": "08604d82-a07c-41b3-ad38-e27b39ee83a2" } } }, "90fbdba6-3c4e-436f-b4a1-a4a469dde026": { "id": "90fbdba6-3c4e-436f-b4a1-a4a469dde026", "prev": 
"3935cf8e-ea1d-4e1d-ad46-fcb10946cd1d", "regions": { "66d3717e-256a-4063-96cf-415ced19032d": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "e34097f6-1626-49ba-bb93-b5b2dfb20d8c", "part": "whole" }, "id": "66d3717e-256a-4063-96cf-415ced19032d" } } }, "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1": { "id": "b7d251b4-9300-4d87-a4c2-ce9e82cceaf1", "prev": null, "regions": { "0e54fbca-ee14-4d41-8134-a7e4848c8244": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "4e3ff556-2ed1-4264-9e5d-c34080b5ef43", "part": "whole" }, "id": "0e54fbca-ee14-4d41-8134-a7e4848c8244" } } }, "ce90176b-054d-44da-82e5-24e00f159d44": { "id": "ce90176b-054d-44da-82e5-24e00f159d44", "prev": "1c3bf261-5fad-42f1-be29-bc970f66b78f", "regions": { "a049b370-d20d-4b31-b12d-d1e98cf186a5": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "edfb29bb-0a19-4835-be91-60f8ae4941ca", "part": "whole" }, "id": "a049b370-d20d-4b31-b12d-d1e98cf186a5" } } }, "fa62ca0d-9305-405d-a301-04fab08ee8f3": { "id": "fa62ca0d-9305-405d-a301-04fab08ee8f3", "prev": "5ce0a5c0-69e6-403c-8b9e-e91a4f6d2084", "regions": { "e33a8ad9-898a-435b-9cd8-a0a9e971b7c0": { "attrs": { "height": 1, "width": 1, "x": 0, "y": 0 }, "content": { "cell": "81cdfe40-3a5d-44cf-9b55-eaf0a635761f", "part": "whole" }, "id": "e33a8ad9-898a-435b-9cd8-a0a9e971b7c0" } } } }, "themes": { "default": "66eaa5a2-7a73-471f-ab56-5b57ef30cb08", "theme": { "66eaa5a2-7a73-471f-ab56-5b57ef30cb08": { "id": "66eaa5a2-7a73-471f-ab56-5b57ef30cb08", "palette": { "19cc588f-0593-49c9-9f4b-e4d7cc113b1c": { "id": "19cc588f-0593-49c9-9f4b-e4d7cc113b1c", "rgb": [ 252, 252, 252 ] }, "31af15d2-7e15-44c5-ab5e-e04b16a89eff": { "id": "31af15d2-7e15-44c5-ab5e-e04b16a89eff", "rgb": [ 68, 68, 68 ] }, "50f92c45-a630-455b-aec3-788680ec7410": { "id": "50f92c45-a630-455b-aec3-788680ec7410", "rgb": [ 155, 177, 192 ] }, "c5cc3653-2ee1-402a-aba2-7caae1da4f6c": { "id": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "rgb": [ 
43, 126, 184 ] }, "efa7f048-9acb-414c-8b04-a26811511a21": { "id": "efa7f048-9acb-414c-8b04-a26811511a21", "rgb": [ 25.118061674008803, 73.60176211453744, 107.4819383259912 ] } }, "rules": { "blockquote": { "color": "50f92c45-a630-455b-aec3-788680ec7410" }, "code": { "font-family": "Anonymous Pro" }, "h1": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 8 }, "h2": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 6 }, "h3": { "color": "50f92c45-a630-455b-aec3-788680ec7410", "font-family": "Lato", "font-size": 5.5 }, "h4": { "color": "c5cc3653-2ee1-402a-aba2-7caae1da4f6c", "font-family": "Lato", "font-size": 5 }, "h5": { "font-family": "Lato" }, "h6": { "font-family": "Lato" }, "h7": { "font-family": "Lato" }, "pre": { "font-family": "Anonymous Pro", "font-size": 4 } }, "text-base": { "font-family": "Merriweather", "font-size": 4 } }, "ecdad483-e20b-4d01-87c5-efd37e40b00e": { "backgrounds": { "backgroundColor": { "background-color": "backgroundColor", "id": "backgroundColor" } }, "id": "ecdad483-e20b-4d01-87c5-efd37e40b00e", "palette": { "backgroundColor": { "id": "backgroundColor", "rgb": [ 256, 256, 256 ] }, "headingColor": { "id": "headingColor", "rgb": [ 34, 34, 34 ] }, "linkColor": { "id": "linkColor", "rgb": [ 42, 118, 221 ] }, "mainColor": { "id": "mainColor", "rgb": [ 34, 34, 34 ] } }, "rules": { "a": { "color": "linkColor" }, "h1": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 5.25 }, "h2": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 4 }, "h3": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 3.5 }, "h4": { "color": "headingColor", "font-family": "Source Sans Pro", "font-size": 3 }, "h5": { "color": "headingColor", "font-family": "Source Sans Pro" }, "h6": { "color": "headingColor", "font-family": "Source Sans Pro" }, "h7": { "color": "headingColor", "font-family": "Source Sans Pro" }, "li": { "color": 
"mainColor", "font-family": "Source Sans Pro", "font-size": 6 }, "p": { "color": "mainColor", "font-family": "Source Sans Pro", "font-size": 6 } }, "text-base": { "color": "mainColor", "font-family": "Source Sans Pro", "font-size": 6 } } } } } }, "nbformat": 4, "nbformat_minor": 1 }