{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Context\n", "\n", "Often, it isn't possible to use the real data that an analysis was originally applied to. In these cases, we can generate a synthetic dataset that exhibits similar phenomena, based on the real data. This notebook shows an example of how to do this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get base data\n", "The existing dataset from which we derive the synthetic one. It is only used to get some realistic file names." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
additionsdeletionsfileshatimestampauthor
111pom.xmlf96d80e2018-06-12 08:32:28Dirk Mahler
355jqassistant/layer.adocd6e95092018-05-30 14:59:44Dirk Mahler
511jqassistant/layer.adoc87b88d92018-05-18 22:43:32Dirk Mahler
740pom.xmlebb50e02018-05-17 20:51:14Dirk Mahler
911jqassistant/index.adocb9b6dcf2018-05-16 21:32:29Dirk Mahler
\n", "
" ], "text/plain": [ " additions deletions file sha timestamp \\\n", "1 1 1 pom.xml f96d80e 2018-06-12 08:32:28 \n", "3 5 5 jqassistant/layer.adoc d6e9509 2018-05-30 14:59:44 \n", "5 1 1 jqassistant/layer.adoc 87b88d9 2018-05-18 22:43:32 \n", "7 4 0 pom.xml ebb50e0 2018-05-17 20:51:14 \n", "9 1 1 jqassistant/index.adoc b9b6dcf 2018-05-16 21:32:29 \n", "\n", " author \n", "1 Dirk Mahler \n", "3 Dirk Mahler \n", "5 Dirk Mahler \n", "7 Dirk Mahler \n", "9 Dirk Mahler " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from lib.ozapfdis import git_tc\n", "\n", "log = git_tc.log_numstat(\"C:/dev/repos/buschmais-spring-petclinic\")\n", "log.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
additionsdeletionsfileshatimestampauthortype
23445src/test/java/org/springframework/samples/petc...e5254152016-08-19 16:54:56Antoine Reyother
235257src/test/java/org/springframework/samples/petc...e5254152016-08-19 16:54:56Antoine Reyother
236219src/test/java/org/springframework/samples/petc...e5254152016-08-19 16:54:56Antoine Reyother
237233src/test/java/org/springframework/samples/petc...e5254152016-08-19 16:54:56Antoine Reyother
238106src/test/java/org/springframework/samples/petc...e5254152016-08-19 16:54:56Antoine Reyother
\n", "
" ], "text/plain": [ " additions deletions file \\\n", "234 4 5 src/test/java/org/springframework/samples/petc... \n", "235 25 7 src/test/java/org/springframework/samples/petc... \n", "236 21 9 src/test/java/org/springframework/samples/petc... \n", "237 23 3 src/test/java/org/springframework/samples/petc... \n", "238 10 6 src/test/java/org/springframework/samples/petc... \n", "\n", " sha timestamp author type \n", "234 e525415 2016-08-19 16:54:56 Antoine Rey other \n", "235 e525415 2016-08-19 16:54:56 Antoine Rey other \n", "236 e525415 2016-08-19 16:54:56 Antoine Rey other \n", "237 e525415 2016-08-19 16:54:56 Antoine Rey other \n", "238 e525415 2016-08-19 16:54:56 Antoine Rey other " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# keep only Java source files; endswith avoids treating '.' as a regex wildcard\n", "log = log[log.file.str.endswith(\".java\")].copy()\n", "log.loc[log.file.str.contains(\"/jdbc/\"), 'type'] = \"jdbc\"\n", "log.loc[log.file.str.contains(\"/jpa/\"), 'type'] = \"jpa\"\n", "log.loc[log.type.isna(), 'type'] = \"other\"\n", "log.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create synthetic dataset 1\n", "This dataset simulates the commits for the first technology, \"JDBC\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create committed lines" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lines
0118
150
278
3142
4123
\n", "
" ], "text/plain": [ " lines\n", "0 118\n", "1 50\n", "2 78\n", "3 142\n", "4 123" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "np.random.seed(0)\n", "# growth phase: on average, more lines are added than deleted\n", "added_lines = [int(np.random.normal(30,50)) for i in range(0,600)]\n", "# decline phase: on average, more lines are deleted than added\n", "added_lines.extend([int(np.random.normal(-50,100)) for i in range(0,200)])\n", "added_lines.extend([int(np.random.normal(-2,20)) for i in range(0,200)])\n", "added_lines.extend([int(np.random.normal(-3,10)) for i in range(0,200)])\n", "df_jdbc = pd.DataFrame()\n", "df_jdbc['lines'] = added_lines\n", "df_jdbc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add timestamp" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 00:00:00\n", "1 00:00:01\n", "2 00:00:02\n", "3 00:00:03\n", "4 00:00:04\n", "dtype: timedelta64[ns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "times = pd.timedelta_range(\"00:00:00\",\"23:59:59\", freq=\"s\")\n", "times = pd.Series(times)\n", "times.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2013-05-15 03:35:33\n", "1 2013-05-16 02:15:44\n", "2 2013-05-17 15:12:26\n", "3 2013-05-20 00:16:06\n", "4 2013-05-21 17:43:53\n", "dtype: datetime64[ns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range('2013-05-15', '2017-07-23')\n", "dates = pd.to_datetime(dates)\n", "# keep working days only (drop Saturday=5 and Sunday=6)\n", "dates = dates[~dates.dayofweek.isin([5,6])]\n", "dates = pd.Series(dates)\n", "# add a randomly sampled time of day to each date\n", "dates = dates.add(times.sample(len(dates), replace=True).values)\n", "dates.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
linestimestamp
01182013-05-15 03:35:33
1502013-05-16 02:15:44
2782013-05-17 15:12:26
31422013-05-24 05:52:31
41232013-05-28 08:15:35
\n", "
" ], "text/plain": [ " lines timestamp\n", "0 118 2013-05-15 03:35:33\n", "1 50 2013-05-16 02:15:44\n", "2 78 2013-05-17 15:12:26\n", "3 142 2013-05-24 05:52:31\n", "4 123 2013-05-28 08:15:35" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jdbc['timestamp'] = dates.sample(len(df_jdbc), replace=True).sort_values().reset_index(drop=True)\n", "df_jdbc = df_jdbc.sort_index()\n", "df_jdbc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Treat first commit separately\n", "Set a fixed value for the first commit because the code base has to start with some code." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
linestimestamp
02502013-05-15 03:35:33
1502013-05-16 02:15:44
2782013-05-17 15:12:26
31422013-05-24 05:52:31
41232013-05-28 08:15:35
\n", "
" ], "text/plain": [ " lines timestamp\n", "0 250 2013-05-15 03:35:33\n", "1 50 2013-05-16 02:15:44\n", "2 78 2013-05-17 15:12:26\n", "3 142 2013-05-24 05:52:31\n", "4 123 2013-05-28 08:15:35" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jdbc.loc[0, 'lines'] = 250\n", "df_jdbc.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "df_jdbc = df_jdbc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add file names\n", "Sample file names including their paths from an existing dataset" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "df_jdbc['file'] = log[log['type'] == 'jdbc']['file'].sample(len(df_jdbc), replace=True).values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check dataset" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "df_jdbc.lines.hist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the cumulative sum of the committed lines and check whether the data was created as intended." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_jdbc_timed = df_jdbc.set_index('timestamp')\n", "df_jdbc_timed['count'] = df_jdbc_timed.lines.cumsum()\n", "df_jdbc_timed['count'].plot()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Timestamp('2017-07-21 19:02:47')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "last_non_zero_timestamp = df_jdbc_timed[df_jdbc_timed['count'] >= 0].index.max()\n", "last_non_zero_timestamp" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
linestimestampfile
02502013-05-15 03:35:33src/main/java/org/springframework/samples/petc...
1502013-05-16 02:15:44src/main/java/org/springframework/samples/petc...
2782013-05-17 15:12:26src/main/java/org/springframework/samples/petc...
31422013-05-24 05:52:31src/main/java/org/springframework/samples/petc...
41232013-05-28 08:15:35src/main/java/org/springframework/samples/petc...
\n", "
" ], "text/plain": [ " lines timestamp \\\n", "0 250 2013-05-15 03:35:33 \n", "1 50 2013-05-16 02:15:44 \n", "2 78 2013-05-17 15:12:26 \n", "3 142 2013-05-24 05:52:31 \n", "4 123 2013-05-28 08:15:35 \n", "\n", " file \n", "0 src/main/java/org/springframework/samples/petc... \n", "1 src/main/java/org/springframework/samples/petc... \n", "2 src/main/java/org/springframework/samples/petc... \n", "3 src/main/java/org/springframework/samples/petc... \n", "4 src/main/java/org/springframework/samples/petc... " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jdbc = df_jdbc[df_jdbc.timestamp <= last_non_zero_timestamp]\n", "df_jdbc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create synthetic dataset 2" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
linestimestampfile
01502015-05-17 15:12:26src/main/java/org/springframework/samples/petc...
1862015-05-20 00:16:06src/main/java/org/springframework/samples/petc...
2-272015-05-24 05:52:31src/main/java/org/springframework/samples/petc...
3142015-06-04 21:09:15src/main/java/org/springframework/samples/petc...
4662015-06-06 19:22:39src/main/java/org/springframework/samples/petc...
\n", "
" ], "text/plain": [ " lines timestamp \\\n", "0 150 2015-05-17 15:12:26 \n", "1 86 2015-05-20 00:16:06 \n", "2 -27 2015-05-24 05:52:31 \n", "3 14 2015-06-04 21:09:15 \n", "4 66 2015-06-06 19:22:39 \n", "\n", " file \n", "0 src/main/java/org/springframework/samples/petc... \n", "1 src/main/java/org/springframework/samples/petc... \n", "2 src/main/java/org/springframework/samples/petc... \n", "3 src/main/java/org/springframework/samples/petc... \n", "4 src/main/java/org/springframework/samples/petc... " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_jpa = pd.DataFrame([int(np.random.normal(20,50)) for i in range(0,600)], columns=['lines'])\n", "df_jpa.loc[0,'lines'] = 150\n", "df_jpa['timestamp'] = pd.DateOffset(years=2) + dates.sample(len(df_jpa), replace=True).sort_values().reset_index(drop=True)\n", "df_jpa = df_jpa.sort_index()\n", "df_jpa['file'] = log[log['type'] == 'jpa']['file'].sample(len(df_jpa), replace=True).values\n", "df_jpa.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check dataset" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_jpa.lines.hist()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_jpa_timed = df_jpa.set_index('timestamp')\n", "df_jpa_timed['count'] = df_jpa_timed.lines.cumsum()\n", "df_jpa_timed['count'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add some noise" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2013-05-15 17:36:46\n", "1 2013-05-16 22:05:34\n", "2 2013-05-17 19:27:07\n", "3 2013-05-20 06:28:34\n", "4 2013-05-21 03:46:00\n", "dtype: datetime64[ns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates_other = 
pd.date_range(df_jdbc.timestamp.min(), df_jpa.timestamp.max())\n", "dates_other = pd.to_datetime(dates_other)\n", "dates_other = dates_other[~dates_other.dayofweek.isin([5,6])]\n", "dates_other = pd.Series(dates_other)\n", "dates_other = dates_other.add(times.sample(len(dates_other), replace=True).values)\n", "dates_other.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
linestimestampfile
0382013-05-15 17:36:46src/test/java/org/springframework/samples/petc...
1742013-05-15 17:36:46src/main/java/org/springframework/samples/petc...
21432013-05-15 17:36:46src/main/java/org/springframework/samples/petc...
3-542013-05-15 17:36:46src/test/java/org/springframework/samples/petc...
4-462013-05-15 17:36:46src/main/java/org/springframework/samples/petc...
\n", "
" ], "text/plain": [ " lines timestamp \\\n", "0 38 2013-05-15 17:36:46 \n", "1 74 2013-05-15 17:36:46 \n", "2 143 2013-05-15 17:36:46 \n", "3 -54 2013-05-15 17:36:46 \n", "4 -46 2013-05-15 17:36:46 \n", "\n", " file \n", "0 src/test/java/org/springframework/samples/petc... \n", "1 src/main/java/org/springframework/samples/petc... \n", "2 src/main/java/org/springframework/samples/petc... \n", "3 src/test/java/org/springframework/samples/petc... \n", "4 src/main/java/org/springframework/samples/petc... " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_other = pd.DataFrame([int(np.random.normal(5,100)) for i in range(0,40000)], columns=['lines'])\n", "df_other['timestamp'] = dates_other.sample(len(df_other), replace=True).sort_values().reset_index(drop=True)\n", "df_other = df_other.sort_index()\n", "df_other['file'] = log[log['type'] == 'other']['file'].sample(len(df_other), replace=True).values\n", "df_other.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check dataset" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_other.lines.hist()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_other_timed = df_other.set_index('timestamp')\n", "df_other_timed['count'] = 
df_other_timed.lines.cumsum()\n", "df_other_timed['count'].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concatenate all datasets" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
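<table><thead><tr><th></th><th>additions</th><th>deletions</th><th>file</th><th>timestamp</th></tr></thead><tbody>
<tr><th>41799</th><td>56</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2019-07-19 19:16:44</td></tr>
<tr><th>41798</th><td>0</td><td>62</td><td>src/main/java/org/springframework/samples/petc...</td><td>2019-07-19 19:16:44</td></tr>
<tr><th>41770</th><td>0</td><td>117</td><td>src/main/java/org/springframework/samples/petc...</td><td>2019-07-19 06:09:16</td></tr>
<tr><th>41777</th><td>0</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2019-07-19 06:09:16</td></tr>
<tr><th>41776</th><td>0</td><td>46</td><td>src/main/java/org/springframework/samples/petc...</td><td>2019-07-19 06:09:16</td></tr></tbody></table>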
\n", "
" ], "text/plain": [ " additions deletions \\\n", "41799 56 0 \n", "41798 0 62 \n", "41770 0 117 \n", "41777 0 0 \n", "41776 0 46 \n", "\n", " file timestamp \n", "41799 src/main/java/org/springframework/samples/petc... 2019-07-19 19:16:44 \n", "41798 src/main/java/org/springframework/samples/petc... 2019-07-19 19:16:44 \n", "41770 src/main/java/org/springframework/samples/petc... 2019-07-19 06:09:16 \n", "41777 src/main/java/org/springframework/samples/petc... 2019-07-19 06:09:16 \n", "41776 src/main/java/org/springframework/samples/petc... 2019-07-19 06:09:16 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Combine the JPA, JDBC and remaining datasets in chronological order\n", "df = pd.concat([df_jpa, df_jdbc, df_other], ignore_index=True).sort_values(by='timestamp')\n", "# Positive line deltas become additions, negative ones become deletions\n", "df.loc[df.lines > 0, 'additions'] = df.lines\n", "df.loc[df.lines < 0, 'deletions'] = df.lines * -1\n", "df = df.fillna(0).reset_index(drop=True)\n", "df = df[['additions', 'deletions', 'file', 'timestamp']]\n", "# Rows at the earliest timestamp: move deletions over to additions, then zero the deletions\n", "df.loc[(df.deletions > 0) & (df.loc[0].timestamp == df.timestamp), 'additions'] = df.deletions\n", "df.loc[df.loc[0].timestamp == df.timestamp, 'deletions'] = 0\n", "df['additions'] = df.additions.astype(int)\n", "df['deletions'] = df.deletions.astype(int)\n", "# Show the newest entries first\n", "df = df.sort_values(by='timestamp', ascending=False)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Truncate the data at a fixed date" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
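<table><thead><tr><th></th><th>additions</th><th>deletions</th><th>file</th><th>timestamp</th></tr></thead><tbody>
<tr><th>31486</th><td>19</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-31 19:41:29</td></tr>
<tr><th>31485</th><td>55</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 12:48:20</td></tr>
<tr><th>31484</th><td>29</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 12:48:20</td></tr>
<tr><th>31461</th><td>0</td><td>99</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 00:38:54</td></tr>
<tr><th>31467</th><td>19</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 00:38:54</td></tr></tbody></table>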
\n", "
" ], "text/plain": [ " additions deletions \\\n", "31486 19 0 \n", "31485 55 0 \n", "31484 29 0 \n", "31461 0 99 \n", "31467 19 0 \n", "\n", " file timestamp \n", "31486 src/main/java/org/springframework/samples/petc... 2017-12-31 19:41:29 \n", "31485 src/main/java/org/springframework/samples/petc... 2017-12-30 12:48:20 \n", "31484 src/main/java/org/springframework/samples/petc... 2017-12-30 12:48:20 \n", "31461 src/main/java/org/springframework/samples/petc... 2017-12-30 00:38:54 \n", "31467 src/main/java/org/springframework/samples/petc... 2017-12-30 00:38:54 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df[df.timestamp < pd.Timestamp('2018-01-01')]\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export the data" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"datasets/git_log_refactoring.gz\", index=False, compression='gzip')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check loaded data" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
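<table><thead><tr><th></th><th>additions</th><th>deletions</th><th>file</th><th>timestamp</th></tr></thead><tbody>
<tr><th>0</th><td>19</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-31 19:41:29</td></tr>
<tr><th>1</th><td>55</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 12:48:20</td></tr>
<tr><th>2</th><td>29</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 12:48:20</td></tr>
<tr><th>3</th><td>0</td><td>99</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 00:38:54</td></tr>
<tr><th>4</th><td>19</td><td>0</td><td>src/main/java/org/springframework/samples/petc...</td><td>2017-12-30 00:38:54</td></tr></tbody></table>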
\n", "
" ], "text/plain": [ " additions deletions file \\\n", "0 19 0 src/main/java/org/springframework/samples/petc... \n", "1 55 0 src/main/java/org/springframework/samples/petc... \n", "2 29 0 src/main/java/org/springframework/samples/petc... \n", "3 0 99 src/main/java/org/springframework/samples/petc... \n", "4 19 0 src/main/java/org/springframework/samples/petc... \n", "\n", " timestamp \n", "0 2017-12-31 19:41:29 \n", "1 2017-12-30 12:48:20 \n", "2 2017-12-30 12:48:20 \n", "3 2017-12-30 00:38:54 \n", "4 2017-12-30 00:38:54 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_loaded = pd.read_csv(\"datasets/git_log_refactoring.gz\")\n", "df_loaded.head()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 31487 entries, 0 to 31486\n", "Data columns (total 4 columns):\n", "additions 31487 non-null int64\n", "deletions 31487 non-null int64\n", "file 31487 non-null object\n", "timestamp 31487 non-null object\n", "dtypes: int64(2), object(2)\n", "memory usage: 984.0+ KB\n" ] } ], "source": [ "df_loaded.info()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }