{ "cells": [ { "cell_type": "code", "execution_count": 410, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from pandas import DataFrame, read_csv\n", "import pandas as pd\n", "import numpy as np\n", "import plotly.offline as py\n", "import cufflinks\n", "from plotly import tools\n", "py.init_notebook_mode(connected=True)\n", "import plotly.graph_objs as go\n", "import time\n", "import datetime\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import and format data" ] }, { "cell_type": "code", "execution_count": 411, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "date datetime64[ns]\n", "store category\n", "item category\n", "sales int64\n", "dtype: object" ] }, "execution_count": 411, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('SEG_Saleshistory_Stores.csv')\n", "df.columns = ['date', 'store', 'item', 'sales']\n", "df.date = pd.to_datetime(df.date)\n", "df.item = df.item.astype('category')\n", "df.store = df.store.astype('category')\n", "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below are some summary statistics on the data. Overall, the quantities for individual items at individual stores are quite small. It would be difficult to forecast daily quantities for individual items at individual stores, so we will work towards forecasting weekly item sales at individual stores." ] }, { "cell_type": "code", "execution_count": 412, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datestoreitemsales
count425575425575425575.0425575.000000
unique34218139.0NaN
top2016-11-05 00:00:00SEGWD741795.0NaN
freq4110611127516.0NaN
first2016-02-04 00:00:00NaNNaNNaN
last2017-01-11 00:00:00NaNNaNNaN
meanNaNNaNNaN1.670345
stdNaNNaNNaN1.112165
minNaNNaNNaN1.000000
25%NaNNaNNaN1.000000
50%NaNNaNNaN1.000000
75%NaNNaNNaN2.000000
maxNaNNaNNaN112.000000
\n", "
" ], "text/plain": [ " date store item sales\n", "count 425575 425575 425575.0 425575.000000\n", "unique 342 181 39.0 NaN\n", "top 2016-11-05 00:00:00 SEGWD7 41795.0 NaN\n", "freq 4110 6111 27516.0 NaN\n", "first 2016-02-04 00:00:00 NaN NaN NaN\n", "last 2017-01-11 00:00:00 NaN NaN NaN\n", "mean NaN NaN NaN 1.670345\n", "std NaN NaN NaN 1.112165\n", "min NaN NaN NaN 1.000000\n", "25% NaN NaN NaN 1.000000\n", "50% NaN NaN NaN 1.000000\n", "75% NaN NaN NaN 2.000000\n", "max NaN NaN NaN 112.000000" ] }, "execution_count": 412, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include='all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at a few cuts of the data to see if we can spot any trends. Below are plots of a handful of individual stores' sales. It looks like the big jump in sales in September is explained by the addition of a large number of new stores." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Total Sales" ] }, { "cell_type": "code", "execution_count": 413, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_total = df.groupby(pd.Grouper(freq='W', key='date')).sum().fillna(0).unstack('date', 0)\n", "df_total.index.levels[1]\n", "\n", "len(df_total) == len(df_total.index.levels[1])\n", "\n", "trace = go.Scatter(\n", " x = df_total.index.levels[1],\n", " y = df_total\n", ")\n", "\n", "layout = go.Layout(\n", " title='Total Sales'\n", ")\n", "\n", "\n", "fig = go.Figure(data=[trace], layout=layout)\n", "py.iplot(fig, filename='total-sales')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Individual Store Sales" ] }, { "cell_type": "code", "execution_count": 414, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ] [ (1,2) x2,y1 ] [ (1,3) x3,y1 ] \n", "[ (2,1) x4,y2 ] [ (2,2) x5,y2 ] [ (2,3) x6,y2 ] \n", "[ (3,1) x7,y3 ] [ (3,2) x8,y3 ] [ (3,3) x9,y3 ] \n", "[ (4,1) x10,y4 ] [ (4,2) x11,y4 ] [ (4,3) x12,y4 ] \n", "[ (5,1) x13,y5 ] [ (5,2) x14,y5 ] [ (5,3) x15,y5 ] \n", "[ (6,1) x16,y6 ] [ (6,2) x17,y6 ] [ (6,3) x18,y6 ] \n", "[ (7,1) x19,y7 ] [ (7,2) x20,y7 ] [ (7,3) x21,y7 ] \n", "[ (8,1) x22,y8 ] [ (8,2) x23,y8 ] [ (8,3) x24,y8 ] \n", "[ (9,1) x25,y9 ] [ (9,2) x26,y9 ] [ (9,3) x27,y9 ] \n", "[ (10,1) x28,y10 ] [ (10,2) x29,y10 ] [ (10,3) x30,y10 ]\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_1w = df.groupby(['store']+[pd.Grouper(freq='W', key='date')]).sum().fillna(0).unstack('date', 0)\n", "\n", "rows = 10\n", "cols = 3\n", "spidx = np.arange(rows*cols).reshape(rows,cols)\n", "\n", "fig = tools.make_subplots(rows=rows, cols=cols, shared_yaxes=True, subplot_titles=df_1w.index[:rows*cols])\n", "\n", "for i in range(rows):\n", "\n", " for j in range(cols):\n", "\n", " trace = go.Scatter(\n", " x = df_1w.iloc[1].index.levels[1],\n", " y = df_1w.iloc[spidx[i,j]],\n", " )\n", "\n", " fig.append_trace(trace, i+1, j+1)\n", "\n", "fig['layout'].update(height=250*rows, title='Sales by Store', showlegend=False);\n", "py.iplot(fig, filename='sales-by-store')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### New Stores" ] }, { "cell_type": "code", "execution_count": 415, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "store_sales = df.groupby(['store']+[pd.Grouper(freq='W', key='date')]).sum().fillna(0).unstack('date')\n", "stores_with_sales = store_sales['sales'].where(store_sales.sales > 0).count()\n", "\n", "stores_with_sales.index\n", "\n", "trace = go.Bar(\n", " x = stores_with_sales.index,\n", " y = stores_with_sales\n", ")\n", "\n", "layout = go.Layout(\n", " title='No. of Stores with Sales'\n", ")\n", "\n", "\n", "fig = go.Figure(data=[trace], layout=layout)\n", "py.iplot(fig, filename='stores-with-sales')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Items" ] }, { "cell_type": "code", "execution_count": 416, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ] [ (1,2) x2,y1 ] [ (1,3) x3,y1 ] \n", "[ (2,1) x4,y2 ] [ (2,2) x5,y2 ] [ (2,3) x6,y2 ] \n", "[ (3,1) x7,y3 ] [ (3,2) x8,y3 ] [ (3,3) x9,y3 ] \n", "[ (4,1) x10,y4 ] [ (4,2) x11,y4 ] [ (4,3) x12,y4 ] \n", "[ (5,1) x13,y5 ] [ (5,2) x14,y5 ] [ (5,3) x15,y5 ] \n", "[ (6,1) x16,y6 ] [ (6,2) x17,y6 ] [ (6,3) x18,y6 ] \n", "[ (7,1) x19,y7 ] [ (7,2) x20,y7 ] [ (7,3) x21,y7 ] \n", "[ (8,1) x22,y8 ] [ (8,2) x23,y8 ] [ (8,3) x24,y8 ] \n", "[ (9,1) x25,y9 ] [ (9,2) x26,y9 ] [ (9,3) x27,y9 ] \n", "[ (10,1) x28,y10 ] [ (10,2) x29,y10 ] [ (10,3) x30,y10 ]\n", "[ (11,1) x31,y11 ] [ (11,2) x32,y11 ] [ (11,3) x33,y11 ]\n", "[ (12,1) x34,y12 ] [ (12,2) x35,y12 ] [ (12,3) x36,y12 ]\n", "[ (13,1) x37,y13 ] [ (13,2) x38,y13 ] [ (13,3) x39,y13 ]\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_1w = df.groupby(['item']+[pd.Grouper(freq='W', key='date')]).sum().fillna(0).unstack('date', 0)\n", "rows = 13\n", "cols = 3\n", "\n", "fig = tools.make_subplots(rows=rows, cols=cols, shared_yaxes=True, subplot_titles=df_1w.index[:rows*cols])\n", "\n", "spidx = np.arange(rows*cols).reshape(rows,cols)\n", "\n", "for i in range(rows):\n", "\n", "    for j in range(cols):\n", "\n", "        trace = go.Scatter(\n", "            x = df_1w.iloc[1].index.levels[1],\n", "            y = df_1w.iloc[spidx[i,j]],\n", "        )\n", "\n", "        fig.append_trace(trace, i+1, j+1)\n", "\n", "fig['layout'].update(height=250*rows, title='Sales by Item', showlegend=False);\n", "py.iplot(fig, filename='sales-by-item')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### New Items" ] }, { "cell_type": "code", "execution_count": 417, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "item_sales = df.groupby(['item']+[pd.Grouper(freq='W', key='date')]).sum().fillna(0).unstack('date')\n", "items_with_sales = item_sales['sales'].where(item_sales.sales > 0).count()\n", "\n", "items_with_sales.index\n", "\n", "trace = go.Bar(\n", "    x = items_with_sales.index,\n", "    y = items_with_sales\n", ")\n", "\n", "layout = go.Layout(\n", "    title='No. of Items with Sales'\n", ")\n", "\n", "\n", "fig = go.Figure(data=[trace], layout=layout)\n", "py.iplot(fig, filename='items-with-sales')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Trim date range to have consistent history" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But before we do that, let's trim the range of the data to go from 2016-03-01 to 2016-12-15 in order to avoid the anomalies evident in the total sales graph above. We're going to trim to an approximate range here to reduce the amount of data we have to process. Once we aggregate into the order periods we will trim again so that we end up with whole periods on either end of our data." ] }, { "cell_type": "code", "execution_count": 418, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = df[(df['date'] > '2016-02-15') & (df['date'] < '2016-12-30')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add records with zero sales so there is a record for every period for every store-item combination" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to make sure that we add records with zero sales for any combination of store, item and period that doesn't appear in the dataset so that our trailing averages are calculated correctly."
] }, { "cell_type": "code", "execution_count": 419, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2237703" ] }, "execution_count": 419, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import itertools\n", "\n", "beg_date = df['date'].min()\n", "end_date = df['date'].max()\n", "dates = pd.date_range(start=beg_date, end=end_date, freq='D')\n", "\n", "items = df['item'].value_counts().index\n", "stores = df['store'].value_counts().index\n", "all_periods = pd.DataFrame(list(itertools.product(dates, stores, items)), columns=['date', 'store', 'item'])\n", "\n", "all_periods['sales'] = 0\n", "all_periods.date = pd.to_datetime(all_periods.date)\n", "all_periods.item = all_periods.item.astype('category')\n", "all_periods.store = all_periods.store.astype('category')\n", "\n", "df_all = pd.concat([all_periods, df]).groupby(['store', 'item', 'date']).sum().fillna(0).reset_index()\n", "len(df_all)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add Columns for Periods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time we also want to predict sales for order periods of two and three times per week as well as weekly, assuming orders are placed on the same days each week. The day numbers start on Monday with 0 and end on Sunday with 6.\n", "\n", "For the **two** orders per week periods, we will predict sales for:\n", "* Wednesday through Friday (days 2 through 4)\n", "* Saturday through Tuesday (days 5, 6, 0 and 1)\n", "\n", "For the **three** orders per week periods, we will predict sales for:\n", "* Wednesday and Thursday (days 2 and 3)\n", "* Friday and Saturday (days 4 and 5)\n", "* Sunday through Tuesday (days 6, 0 and 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to aggregate sales over the correct periods we need to add columns to represent the series for each order period. 
The end dates of the two-orders-per-week periods are stored in the column `freq2_end`, and those of the three-orders-per-week periods in `freq3_end`.\n", "\n", "We also add the columns `freq2_per` and `freq3_per` to identify which intra-week period (zero-indexed) each record falls in, which allows the model to compensate for differences in sales volume between intra-week periods." ] }, { "cell_type": "code", "execution_count": 420, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
storeitemdatesalesfreq2_perfreq3_perfreq2_endfreq3_end
234SEGWD103417742016-10-072.0122016-10-072016-10-08
235SEGWD103417742016-10-080.0022016-10-112016-10-08
236SEGWD103417742016-10-090.0002016-10-112016-10-11
237SEGWD103417742016-10-100.0002016-10-112016-10-11
238SEGWD103417742016-10-110.0002016-10-112016-10-11
239SEGWD103417742016-10-120.0112016-10-142016-10-13
240SEGWD103417742016-10-132.0112016-10-142016-10-13
241SEGWD103417742016-10-141.0122016-10-142016-10-15
242SEGWD103417742016-10-151.0022016-10-182016-10-15
243SEGWD103417742016-10-161.0002016-10-182016-10-18
\n", "
" ], "text/plain": [ " store item date sales freq2_per freq3_per freq2_end \\\n", "234 SEGWD103 41774 2016-10-07 2.0 1 2 2016-10-07 \n", "235 SEGWD103 41774 2016-10-08 0.0 0 2 2016-10-11 \n", "236 SEGWD103 41774 2016-10-09 0.0 0 0 2016-10-11 \n", "237 SEGWD103 41774 2016-10-10 0.0 0 0 2016-10-11 \n", "238 SEGWD103 41774 2016-10-11 0.0 0 0 2016-10-11 \n", "239 SEGWD103 41774 2016-10-12 0.0 1 1 2016-10-14 \n", "240 SEGWD103 41774 2016-10-13 2.0 1 1 2016-10-14 \n", "241 SEGWD103 41774 2016-10-14 1.0 1 2 2016-10-14 \n", "242 SEGWD103 41774 2016-10-15 1.0 0 2 2016-10-18 \n", "243 SEGWD103 41774 2016-10-16 1.0 0 0 2016-10-18 \n", "\n", " freq3_end \n", "234 2016-10-08 \n", "235 2016-10-08 \n", "236 2016-10-11 \n", "237 2016-10-11 \n", "238 2016-10-11 \n", "239 2016-10-13 \n", "240 2016-10-13 \n", "241 2016-10-15 \n", "242 2016-10-15 \n", "243 2016-10-18 " ] }, "execution_count": 420, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_f = df_all.copy()\n", "\n", "# Assign each record to its respective intra-week group.\n", "#\n", "# These are the intra-week periods that each day of the week belongs to.\n", "freq = list([[0, 0, 1, 1, 1, 0, 0],\n", " [0, 0, 1, 1, 2, 2, 0]\n", " ])\n", "\n", "# Map the day of the week of each record to its respective intra-week period.\n", "for i, f in enumerate(freq):\n", " df_f['freq' + str(i + 2) + '_per'] = df_f['date'].dt.weekday.map(pd.Series(f))\n", "\n", "# Assign each record to its respective group within each series of intra-week groups.\n", "# Group membership is indicated in a separate column by the end date of the group.\n", "#\n", "# 1. Calculate the numeric day of the week for each date in the range of dates\n", "# in the data.\n", "# 2. Create a boolean array with an entry for each record indicating whether\n", "# the date of the record falls on a day of the week on which an intra-week\n", "# period ends.\n", "# 3. 
Calculate the cumulative sum of the boolean array for the range of dates, which\n", "#    will then represent the sequential period each date in our range belongs to.\n", "# 4. Index the cumulative sums by the range of dates to create a lookup table.\n", "# 5. Map the 'date' column in our data to the sequence number using the lookup table.\n", "# 6. Group the lookup table by the period, aggregating the date column by max, which\n", "#    represents the end date of each sequential period, to create another lookup table.\n", "# 7. Map the sequence number series we created earlier to the period ending date using\n", "#    the new lookup table and add it to our data frame.\n", "#\n", "\n", "# These are the days of the week that new periods begin on for order frequencies of two\n", "# and three times per week.\n", "period_ends = list([[2, 5],\n", "                    [2, 4, 6]\n", "                   ])\n", "\n", "# Execute the same process for each of our order frequencies\n", "for i, p in enumerate(period_ends):\n", "    # Steps 1 through 4\n", "    periods = pd.Series(dates.weekday).isin(p).cumsum()\n", "    date_lookup = pd.DataFrame({'date': dates, 'period': periods})\n", "    date_lookup.set_index('date', inplace=True)\n", "    \n", "    # Step 5\n", "    seq_col = df_f['date'].map(date_lookup.period)\n", "\n", "    # Step 6\n", "    period_lookup = date_lookup.reset_index().groupby('period').max()\n", "    period_lookup.to_csv('freq' + str(i+2) + '.csv')\n", "\n", "    # Step 7\n", "    df_f['freq' + str(i+2) + '_end'] = seq_col.map(period_lookup.date)\n", "\n", "df_f[df_f['date'] > '2016-10-06'].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That looks like it's working correctly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculating trailing averages\n", "Adding in the rolling average sales is more complicated because we want to calculate the average of like intra-week periods. For example, for the two orders per week frequency one period spans Wednesday through Friday and the other Saturday through Tuesday. When we calculate the trailing average for a Wednesday-Friday period we want to include only earlier Wednesday-Friday periods and exclude the Saturday-Tuesday periods that a strictly sequential calculation would mix in." ] }, { "cell_type": "code", "execution_count": 421, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "# This function calculates the trailing average for a given order frequency.\n", "def calc_trailing(orders_per_week):\n", "    # Check to make sure orders per week is in the available range\n", "    if orders_per_week not in [2, 3]:\n", "        raise ValueError('orders_per_week must be either 2 or 3.')\n", "    \n", "    freq_per = 'freq' + str(orders_per_week) + '_per'\n", "    freq_end = 'freq' + str(orders_per_week) + '_end'\n", "    freq_end_avg = freq_end + '_avg'\n", "    \n", "    f = {'sales': 'sum', freq_per: 'mean'}\n", "    g = ['store', 'item', freq_end]\n", "    \n", "    # Here we filter the data frame for each of the intra-week periods in\n", "    # the specified order frequency and perform the trailing average\n", "    # calculations on them separately.\n", "    df_final = pd.DataFrame()\n", "    for i, n in enumerate(df_f[freq_per].value_counts().index):\n", "        df_model = df_f[df_f[freq_per] == n].groupby(g).agg(f).fillna(0)\n", "        \n", "        rolling_avg = (df_model\n", "                       .apply(lambda x: x.rolling(window=3).mean())\n", "                       .shift(1)\n", "                       )\n", "\n", "        df_model[freq_end_avg] = rolling_avg['sales']\n", "        df_final = df_final.append(df_model.reset_index())\n", "    \n", "    return df_final.groupby(g).sum()" ] }, { "cell_type": "code", "execution_count": 422, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
salesfreq3_perfreq3_end_avg
storeitemfreq3_end
SEGWD103417742016-02-160.00NaN
2016-02-182.01NaN
2016-02-201.02NaN
2016-02-230.00NaN
2016-02-250.01NaN
2016-02-272.02NaN
2016-03-011.00NaN
2016-03-030.01NaN
2016-03-052.02NaN
2016-03-082.000.333333
2016-03-102.010.666667
2016-03-121.021.666667
2016-03-152.001.000000
2016-03-171.010.666667
2016-03-192.021.666667
2016-03-223.001.666667
2016-03-240.011.000000
2016-03-262.021.666667
2016-03-294.002.333333
2016-03-310.011.000000
\n", "
" ], "text/plain": [ " sales freq3_per freq3_end_avg\n", "store item freq3_end \n", "SEGWD103 41774 2016-02-16 0.0 0 NaN\n", " 2016-02-18 2.0 1 NaN\n", " 2016-02-20 1.0 2 NaN\n", " 2016-02-23 0.0 0 NaN\n", " 2016-02-25 0.0 1 NaN\n", " 2016-02-27 2.0 2 NaN\n", " 2016-03-01 1.0 0 NaN\n", " 2016-03-03 0.0 1 NaN\n", " 2016-03-05 2.0 2 NaN\n", " 2016-03-08 2.0 0 0.333333\n", " 2016-03-10 2.0 1 0.666667\n", " 2016-03-12 1.0 2 1.666667\n", " 2016-03-15 2.0 0 1.000000\n", " 2016-03-17 1.0 1 0.666667\n", " 2016-03-19 2.0 2 1.666667\n", " 2016-03-22 3.0 0 1.666667\n", " 2016-03-24 0.0 1 1.000000\n", " 2016-03-26 2.0 2 1.666667\n", " 2016-03-29 4.0 0 2.333333\n", " 2016-03-31 0.0 1 1.000000" ] }, "execution_count": 422, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_final = calc_trailing(3)\n", "df_final.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Remove Missing Stores and Items" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Then we need to get rid of all the rows where there were no sales because an item hadn't yet been introduced or a store didn't exist. We'll do that by adding a column for cumulative sales and then dropping all rows with zero cumulative sales." 
] }, { "cell_type": "code", "execution_count": 423, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def remove_missing(df_final):\n", " beg_len = len(df_final)\n", " df_final['cum_sales'] = df_final.groupby(level=[0,1]).cumsum()['sales']\n", " df_final_masked = df_final[df_final.cum_sales != 0].dropna(how='any')\n", " \n", " print('{} records removed'.format(beg_len - len(df_final_masked)))\n", " print('{} records remaining'.format(len(df_final_masked)))\n", " \n", " return df_final_masked.reset_index()" ] }, { "cell_type": "code", "execution_count": 424, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "487591 records removed\n", "479492 records remaining\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
storeitemfreq3_endsalesfreq3_perfreq3_end_avgcum_sales
4501SEGWD104417832016-08-301.000.0000001.0
4502SEGWD104417832016-09-012.010.0000003.0
4503SEGWD104417832016-09-031.020.0000004.0
4504SEGWD104417832016-09-065.000.3333339.0
4505SEGWD104417832016-09-080.010.6666679.0
\n", "
" ], "text/plain": [ " store item freq3_end sales freq3_per freq3_end_avg cum_sales\n", "4501 SEGWD104 41783 2016-08-30 1.0 0 0.000000 1.0\n", "4502 SEGWD104 41783 2016-09-01 2.0 1 0.000000 3.0\n", "4503 SEGWD104 41783 2016-09-03 1.0 2 0.000000 4.0\n", "4504 SEGWD104 41783 2016-09-06 5.0 0 0.333333 9.0\n", "4505 SEGWD104 41783 2016-09-08 0.0 1 0.666667 9.0" ] }, "execution_count": 424, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_final_masked = remove_missing(df_final)\n", "store = 'SEGWD104'\n", "item = 41783\n", "df_final_masked.query('(store == @store) & (item == @item)').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Encode Store and Item Categories" ] }, { "cell_type": "code", "execution_count": 425, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "# We have some extra logic here to deal with needing to categorize the three order\n", "# per week variable, but not the two, which is already binary.\n", "def encode_cat_vars(df_final_masked):\n", " \n", " # Make sure sales is the first column\n", " cols = df_final_masked.columns.tolist()\n", " cols.remove('sales')\n", " cols = ['sales'] + cols \n", " df_final_masked = df_final_masked[cols]\n", " \n", " # Initial list of columns to drop\n", " drop_cols = ['cum_sales', 41793, 'SEGWD103']\n", "\n", " # Initial dummy variables\n", " stores = pd.get_dummies(df_final_masked['store'])\n", " items = pd.get_dummies(df_final_masked['item'])\n", " concat_tables = [df_final_masked, stores, items]\n", " \n", " # Create dummy variables for freq3_end if necessary\n", " try:\n", " freq = cols[cols.index('freq2_end')][:5]\n", " except:\n", " freq = 'freq3'\n", " freq3_d = pd.get_dummies(df_final_masked['freq3_per'], prefix='freq3_per')\n", " concat_tables.append(freq3_d)\n", " drop_cols.append(freq + '_per_2')\n", " \n", " # Add dummy variables\n", " df_final_masked = pd.concat(concat_tables, axis=1)\n", " \n", " # Drop columns and return\n", " 
return df_final_masked.drop(drop_cols, axis=1).sort_values(freq + '_end')" ] }, { "cell_type": "code", "execution_count": 426, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
salesstoreitemfreq3_endfreq3_perfreq3_end_avgSEGWD104SEGWD116SEGWD12SEGWD123...4204542046420474204842049420504205142052freq3_per_0freq3_per_1
4104002.0SEGWD671417972016-02-1603.0000000000...0000000010
493442.0SEGWD151417912016-02-1601.0000000000...0000000010
866612.0SEGWD1690417932016-02-1602.3333330000...0000000010
3935061.0SEGWD649417952016-02-1600.3333330000...0000000010
3664253.0SEGWD54417972016-02-1602.0000000000...0000000010
\n", "

5 rows × 226 columns

\n", "
" ], "text/plain": [ " sales store item freq3_end freq3_per freq3_end_avg \\\n", "410400 2.0 SEGWD671 41797 2016-02-16 0 3.000000 \n", "49344 2.0 SEGWD151 41791 2016-02-16 0 1.000000 \n", "86661 2.0 SEGWD1690 41793 2016-02-16 0 2.333333 \n", "393506 1.0 SEGWD649 41795 2016-02-16 0 0.333333 \n", "366425 3.0 SEGWD54 41797 2016-02-16 0 2.000000 \n", "\n", " SEGWD104 SEGWD116 SEGWD12 SEGWD123 ... 42045 42046 \\\n", "410400 0 0 0 0 ... 0 0 \n", "49344 0 0 0 0 ... 0 0 \n", "86661 0 0 0 0 ... 0 0 \n", "393506 0 0 0 0 ... 0 0 \n", "366425 0 0 0 0 ... 0 0 \n", "\n", " 42047 42048 42049 42050 42051 42052 freq3_per_0 freq3_per_1 \n", "410400 0 0 0 0 0 0 1 0 \n", "49344 0 0 0 0 0 0 1 0 \n", "86661 0 0 0 0 0 0 1 0 \n", "393506 0 0 0 0 0 0 1 0 \n", "366425 0 0 0 0 0 0 1 0 \n", "\n", "[5 rows x 226 columns]" ] }, "execution_count": 426, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = encode_cat_vars(df_final_masked)\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split Data for Training and Testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time we'll train the model with data through the end of October and then use the remaining data to \n", "test the model. This is where we also do the final trimming of the data to ensure it includes only whole intra-week periods."
] }, { "cell_type": "code", "execution_count": 427, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Values are the beginning of training, end of training and end of testing, and\n", "# represent the end of the respective intra-week order period.\n", "date_range = dict(\n", "    freq2_end=['2016-03-01', '2016-10-28', '2016-12-23'],\n", "    freq3_end=['2016-03-03', '2016-10-29', '2016-12-24']\n", ")\n", "\n", "freq_avg = data.columns.tolist()[5]\n", "freq = data.columns.tolist()[3]\n", "beg_train, end_train, end_test = date_range[freq]\n", "\n", "# Add constant\n", "data['const'] = 1\n", "\n", "# Normalize average sales variable.\n", "mu = data[freq_avg].mean()\n", "std = data[freq_avg].std()\n", "data[freq_avg] = (data[freq_avg] - mu) / std\n", "\n", "data_train = data[data[freq].ge(beg_train) & data[freq].le(end_train)]\n", "data_test = data[data[freq].gt(end_train) & data[freq].le(end_test)]\n", "\n", "X_train = data_train.iloc[:,5:].values\n", "y_train = data_train.iloc[:,0].values\n", "\n", "X_test = data_test.iloc[:,5:].values\n", "y_test = data_test.iloc[:,0].values\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is where we actually train the model. I ran it for 100 iterations - more iterations likely won't increase the predictive power of the model, but there are some other diagnostics we can run to see what other improvements we can make."
] }, { "cell_type": "code", "execution_count": 428, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "SGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,\n", " fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n", " loss='squared_loss', n_iter=100, penalty='l2', power_t=0.25,\n", " random_state=None, shuffle=True, verbose=0, warm_start=False)" ] }, "execution_count": 428, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn import linear_model\n", "clf = linear_model.SGDRegressor(n_iter=100)\n", "clf.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Model" ] }, { "cell_type": "code", "execution_count": 429, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R-squared: 0.5402\n", "Total error in sales quantity: 21932\n", "Total error as a % of actual: 12.81%\n", "Total error in sales quantity with zero min prediction: 23599\n", "Total error as a % of actual with zero min prediction: 13.79%\n" ] } ], "source": [ "predict = clf.predict(X_test)\n", "# Negative predictions are impossible sales quantities; flag them so we can\n", "# also report the error with those predictions effectively removed.\n", "predict_neg = predict < 0\n", "error = predict - y_test\n", "# Error contributed by rows where the prediction is negative; subtracting it\n", "# from the total approximates clamping those predictions to zero.\n", "error_neg = predict_neg @ error\n", "# Each row of the identity matrix isolates one feature, so each prediction\n", "# is that feature's coefficient plus the intercept.\n", "np.savetxt('modelparams.csv', clf.predict(np.eye(X_test.shape[1])), delimiter=\",\")\n", "print('R-squared: {:.{p}f}'.format(clf.score(X_test, y_test), p=4))\n", "print('Total error in sales quantity: {:.{p}f}'.format(sum(error), p=0))\n", "print('Total error as a % of actual: {:.{p}f}%'.format(sum(error) / sum(y_test)*100, p=2))\n", "print('Total error in sales quantity with zero min prediction: {:.{p}f}'.format(sum(error)-error_neg, p=0))\n", "print('Total error as a % of actual with zero min prediction: {:.{p}f}%'.format((sum(error)-error_neg) / sum(y_test)*100, p=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is still quite a good fit overall. As you would expect, it is not quite as accurate as the model that was based on seven-day periods." ] }, { "cell_type": "code", "execution_count": 436, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
salesstoreitemfreq3_endfreq3_perfreq3_end_avgpredictvariance
732761.0SEGWD168417802016-11-0100.3333330.645468-0.354532
2854760.0SEGWD2366420432016-11-0101.0000001.1746021.174602
4070400.0SEGWD67420402016-11-0100.0000000.2471990.247199
2220976.0SEGWD2261417952016-11-0104.0000004.417526-1.582474
3861610.0SEGWD622420452016-11-0100.0000000.1498430.149843
\n", "
" ], "text/plain": [ " sales store item freq3_end freq3_per freq3_end_avg \\\n", "73276 1.0 SEGWD168 41780 2016-11-01 0 0.333333 \n", "285476 0.0 SEGWD2366 42043 2016-11-01 0 1.000000 \n", "407040 0.0 SEGWD67 42040 2016-11-01 0 0.000000 \n", "222097 6.0 SEGWD2261 41795 2016-11-01 0 4.000000 \n", "386161 0.0 SEGWD622 42045 2016-11-01 0 0.000000 \n", "\n", " predict variance \n", "73276 0.645468 -0.354532 \n", "285476 1.174602 1.174602 \n", "407040 0.247199 0.247199 \n", "222097 4.417526 -1.582474 \n", "386161 0.149843 0.149843 " ] }, "execution_count": 436, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_predict = data_test.iloc[:,:5]\n", "data_predict[freq_avg] = data[freq_avg] * std + mu\n", "data_predict['predict'] = predict\n", "data_predict['variance'] = predict - data_predict['sales']\n", "data_predict.head()" ] }, { "cell_type": "code", "execution_count": 437, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "py.iplot(data_predict.groupby(freq).sum()[['sales', 'predict']].iplot(asFigure=True,\n", " kind='bar',xTitle='Dates',yTitle='Sales',title='Actual vs. Predicted'))" ] }, { "cell_type": "code", "execution_count": 445, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
freq3_end2016-11-01 00:00:002016-11-03 00:00:002016-11-05 00:00:002016-11-08 00:00:00...2016-12-20 00:00:002016-12-22 00:00:002016-12-24 00:00:00Total
salespredictvariancesalespredictvariancesalespredictvariancesales...variancesalespredictvariancesalespredictvariancesalespredictvariance
storeitem
SEGWD103417743.01.496760-1.5032403.01.632384-1.3676160.01.6564271.6564273.0...0.6981571.01.0281950.0281950.01.2536341.25363430.037.8817737.881773
417751.01.3707420.3707421.01.5063650.5063652.00.926219-1.0737812.0...0.3707420.00.7007800.7007800.02.1345972.13459728.032.8433614.843361
417760.00.8209010.8209010.00.7551280.7551280.00.3763790.3763790.0...0.8209011.00.352336-0.6476640.00.3763790.3763796.014.6122828.612282
417773.01.491449-1.5085510.00.8214870.8214874.01.248323-2.7516771.0...0.8942421.00.821487-0.1785130.00.8455300.84553028.029.8998451.899845
417781.01.4727890.4727891.00.601431-0.3985691.00.826870-0.1731301.0...-1.5272111.01.0042240.0042240.00.8268700.82687027.027.2366550.236655
417791.01.1528550.1528551.01.0870820.0870820.01.7153141.7153142.0...-1.0415602.01.489874-0.5101260.01.1111251.11112535.034.260152-0.739848
417803.01.547540-1.4524602.00.877578-1.1224221.01.3044140.3044141.0...1.3531251.00.676182-0.3238182.01.304414-0.69558635.032.857197-2.142803
417810.00.2530610.2530610.0-0.014109-0.0141090.00.0099340.0099340.0...0.2530610.0-0.014109-0.0141090.00.0099340.0099340.01.9910901.991090
417821.02.0698701.0698701.01.3999080.3999081.01.6253470.6253474.0...-0.7357150.01.1985121.1985123.01.625347-1.37465335.038.7470443.747044
417833.01.997093-1.0029072.01.125735-0.8742653.01.351174-1.6488263.0...-0.0029070.01.5285271.5285270.01.7539671.75396745.042.035298-2.964702
417860.00.3160150.3160150.00.0488450.0488450.00.0728880.0728880.0...0.3160150.00.0488450.0488450.00.0728880.0728880.03.5019883.501988
417883.01.492568-1.5074321.01.8295880.8295880.00.8466490.8466494.0...-1.1046390.01.6281911.6281911.00.645253-0.35474733.036.9755743.975574
417892.01.118977-0.8810230.00.8518070.8518071.01.2786430.2786433.0...-0.2768350.01.0532031.0532031.00.875850-0.12415029.030.0233361.023336
417902.00.999797-1.0002030.01.3368171.3368171.00.756671-0.2433292.0...1.2011941.00.934024-0.0659760.00.5552750.55527521.024.9476783.947678
417913.03.2226390.2226395.02.552677-2.4473231.02.5767201.5767202.0...1.4100731.01.5456960.5456960.01.3683431.36834328.054.93391526.933915
4179210.04.952470-5.0475303.03.0741300.0741302.02.4939840.4939845.0...1.9385070.01.4629591.4629590.01.8897951.88979537.069.66413532.664135
4179310.04.662246-5.3377543.03.9922840.9922845.03.814931-1.18506910.0...-1.1363585.03.186699-1.8133013.03.4121380.412138113.0104.991987-8.008013
417946.03.695985-2.3040154.02.824627-1.1753735.02.244481-2.7555196.0...-1.6998263.04.4357971.4357973.02.043085-0.95691594.086.031049-7.968951
417957.04.138919-2.8610813.03.2675610.2675619.03.895793-5.1042076.0...-2.2499102.04.2745432.2745432.03.8957931.895793115.0108.543855-6.456145
417960.02.3334222.3334224.02.267649-1.7323515.02.090296-2.9097046.0...-1.2568041.02.8718381.8718383.01.687503-1.31249771.057.558860-13.441140
417970.02.0857022.0857020.02.0199292.0199295.02.043972-2.9560281.0...-1.9073160.02.0199292.0199293.01.439783-1.56021746.045.168899-0.831101
417983.03.3153670.3153672.02.0412160.0412165.01.863863-3.1361373.0...0.7111784.02.444008-1.5559923.02.065259-0.93474170.059.576130-10.423870
417992.01.360804-0.6391963.00.690842-2.3091581.00.714885-0.2851151.0...-0.4377991.00.892239-0.1077611.01.3190740.31907433.028.576936-4.423064
418000.00.1701170.1701170.0-0.097052-0.0970520.0-0.073009-0.0730090.0...0.1701170.0-0.097052-0.0970520.0-0.073009-0.0730090.00.0004520.000452
41803NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
418040.00.1104890.1104890.0-0.156681-0.1566810.0-0.132638-0.1326380.0...0.1104890.0-0.156681-0.1566810.0-0.132638-0.1326380.0-1.430642-1.430642
420400.00.7761370.7761370.00.5089680.5089680.00.1302180.1302180.0...0.3733450.00.1061750.1061750.00.1302180.1302180.06.4890806.489080
420410.00.8170530.8170530.00.7512800.7512800.00.3725300.3725300.0...0.4142600.00.1470910.1470910.00.1711340.1711340.08.4780298.478029
420420.00.8281250.8281250.00.3595590.3595590.00.5849990.5849990.0...0.4253320.00.1581630.1581630.00.1822060.1822060.08.1395748.139574
420430.00.7409280.7409280.00.2723620.2723620.00.2964050.2964050.0...0.3381350.00.0709660.0709660.00.0950090.0950090.05.8454405.845440
.....................................................................
SEGWD97417860.00.1082890.1082890.0-0.158880-0.1588800.0-0.134837-0.1348370.0...0.1082890.0-0.158880-0.1588800.0-0.134837-0.1348370.0-1.483428-1.483428
417882.01.083446-0.9165542.01.017673-0.9823271.01.2431120.2431121.0...-0.3123651.00.816277-0.1837231.00.638924-0.36107627.028.1636281.163628
417892.00.508458-1.4915420.01.8524591.8524593.01.473710-1.5262900.0...-0.4845601.00.845478-0.1545221.00.265332-0.73466825.023.426750-1.573250
417901.00.792072-0.2079280.00.5249020.5249024.00.548945-3.4510550.0...0.5906751.00.323506-0.6764941.00.548945-0.45105516.015.732939-0.267061
417914.02.612121-1.3878795.02.546348-2.4536520.02.1675982.1675987.0...-1.1864832.02.3449520.3449523.02.973184-0.02681670.066.060203-3.939797
417923.02.730781-0.2692194.02.463612-1.5363884.03.494636-0.5053643.0...-1.6650303.03.0678000.0678003.02.286258-0.71374268.072.5331754.533175
417934.02.037765-1.9622352.02.5761800.5761804.02.801620-1.1983802.0...-0.7608393.03.1803690.1803692.02.3988270.39882761.066.3733885.373388
417944.02.682674-1.3173263.02.415505-0.5844955.02.036755-2.9632455.0...-2.1229112.03.4224861.4224863.03.0437370.04373775.069.767440-5.232560
417953.03.9311940.9311943.02.657043-0.3429573.02.681086-0.3189144.0...-0.4715993.04.0668171.0668172.03.0838781.08387877.083.0160166.016016
417962.01.722904-0.2770961.01.0529420.0529424.01.076985-2.9230152.0...-0.4784922.00.650149-1.3498510.01.6811741.68117434.033.843588-0.156412
417972.01.676580-0.3234202.01.208014-0.7919864.00.829265-3.1707353.0...-0.5248160.00.6038250.6038252.01.030661-0.96933935.033.738801-1.261199
417980.01.2950741.2950741.01.4306980.4306981.01.0519480.0519483.0...-0.9063221.00.423716-0.5762842.01.454741-0.54525928.030.0203642.020364
417990.00.3474930.3474930.00.0803240.0803240.00.1043670.1043670.0...0.7502860.00.0803240.0803240.00.3057630.3057634.05.8686451.868645
418000.0-0.037608-0.0376080.0-0.304778-0.3047780.0-0.280735-0.2807350.0...-0.0376080.0-0.304778-0.3047780.0-0.280735-0.2807350.0-4.984964-4.984964
41803NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
41804NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
420400.00.1656190.1656190.0-0.101550-0.1015500.00.1238890.1238890.0...0.1656190.0-0.101550-0.1015500.0-0.077507-0.0775070.00.0938900.093890
420410.00.2065350.2065350.0-0.060635-0.0606350.0-0.036592-0.0365920.0...0.2065350.0-0.060635-0.0606350.0-0.036592-0.0365920.00.8744610.874461
420420.00.2176070.2176070.0-0.049563-0.0495630.00.1758770.1758770.0...0.2176070.0-0.049563-0.0495630.0-0.025520-0.0255200.01.3415911.341591
420430.00.3318060.3318060.0-0.136760-0.1367600.0-0.112717-0.1127170.0...0.1304090.0-0.136760-0.1367600.0-0.112717-0.1127170.0-0.549751-0.549751
420440.00.5294520.5294520.0-0.140510-0.1405100.0-0.116467-0.1164670.0...0.1266600.0-0.140510-0.1405100.0-0.116467-0.1164670.0-0.236952-0.236952
420450.00.1535350.1535350.0-0.113635-0.1136350.00.1118040.1118040.0...0.1535350.0-0.113635-0.1136350.0-0.089592-0.0895920.0-0.196142-0.196142
420460.00.0739520.0739520.0-0.193217-0.1932170.0-0.169174-0.1691740.0...0.0739520.0-0.193217-0.1932170.0-0.169174-0.1691740.0-2.307511-2.307511
420471.01.2477210.2477210.00.5777590.5777595.00.601802-4.3981981.0...-0.7522790.00.5777590.5777591.01.4073880.40738835.027.071326-7.928674
420481.01.2578380.2578381.00.386480-0.6135202.01.216108-0.7838922.0...0.0634232.00.587876-1.4121241.01.0147120.01471238.028.321106-9.678894
420492.01.131052-0.8689481.00.259693-0.7403074.01.290718-2.7092821.0...0.1310522.01.468071-0.5319292.00.485133-1.51486742.028.500577-13.499423
420502.01.472176-0.5278240.00.3994220.3994222.00.826257-1.1737434.0...0.0693842.00.802214-1.1977862.01.027653-0.97234731.024.603784-6.396216
420510.01.2894081.2894081.01.0222390.0222393.01.046282-1.9537181.0...0.0949931.00.619446-0.3805541.00.643489-0.35651129.025.655051-3.344949
420522.01.079798-0.9202021.00.409836-0.5901644.00.635276-3.3647241.0...1.2811950.00.2084400.2084404.00.232483-3.76751729.019.416036-9.583964
Total8733.09304.347387571.3473876118.06650.350036532.35003610153.07259.384284-2893.6157169235.0...1261.3086165625.06810.8363731185.8363736752.07032.609569280.609569171148.0193080.17185021932.171850
\n", "

7060 rows × 75 columns

\n", "
" ], "text/plain": [ "freq3_end 2016-11-01 00:00:00 \\\n", " sales predict variance \n", "store item \n", "SEGWD103 41774 3.0 1.496760 -1.503240 \n", " 41775 1.0 1.370742 0.370742 \n", " 41776 0.0 0.820901 0.820901 \n", " 41777 3.0 1.491449 -1.508551 \n", " 41778 1.0 1.472789 0.472789 \n", " 41779 1.0 1.152855 0.152855 \n", " 41780 3.0 1.547540 -1.452460 \n", " 41781 0.0 0.253061 0.253061 \n", " 41782 1.0 2.069870 1.069870 \n", " 41783 3.0 1.997093 -1.002907 \n", " 41786 0.0 0.316015 0.316015 \n", " 41788 3.0 1.492568 -1.507432 \n", " 41789 2.0 1.118977 -0.881023 \n", " 41790 2.0 0.999797 -1.000203 \n", " 41791 3.0 3.222639 0.222639 \n", " 41792 10.0 4.952470 -5.047530 \n", " 41793 10.0 4.662246 -5.337754 \n", " 41794 6.0 3.695985 -2.304015 \n", " 41795 7.0 4.138919 -2.861081 \n", " 41796 0.0 2.333422 2.333422 \n", " 41797 0.0 2.085702 2.085702 \n", " 41798 3.0 3.315367 0.315367 \n", " 41799 2.0 1.360804 -0.639196 \n", " 41800 0.0 0.170117 0.170117 \n", " 41803 NaN NaN NaN \n", " 41804 0.0 0.110489 0.110489 \n", " 42040 0.0 0.776137 0.776137 \n", " 42041 0.0 0.817053 0.817053 \n", " 42042 0.0 0.828125 0.828125 \n", " 42043 0.0 0.740928 0.740928 \n", "... ... ... ... 
\n", "SEGWD97 41786 0.0 0.108289 0.108289 \n", " 41788 2.0 1.083446 -0.916554 \n", " 41789 2.0 0.508458 -1.491542 \n", " 41790 1.0 0.792072 -0.207928 \n", " 41791 4.0 2.612121 -1.387879 \n", " 41792 3.0 2.730781 -0.269219 \n", " 41793 4.0 2.037765 -1.962235 \n", " 41794 4.0 2.682674 -1.317326 \n", " 41795 3.0 3.931194 0.931194 \n", " 41796 2.0 1.722904 -0.277096 \n", " 41797 2.0 1.676580 -0.323420 \n", " 41798 0.0 1.295074 1.295074 \n", " 41799 0.0 0.347493 0.347493 \n", " 41800 0.0 -0.037608 -0.037608 \n", " 41803 NaN NaN NaN \n", " 41804 NaN NaN NaN \n", " 42040 0.0 0.165619 0.165619 \n", " 42041 0.0 0.206535 0.206535 \n", " 42042 0.0 0.217607 0.217607 \n", " 42043 0.0 0.331806 0.331806 \n", " 42044 0.0 0.529452 0.529452 \n", " 42045 0.0 0.153535 0.153535 \n", " 42046 0.0 0.073952 0.073952 \n", " 42047 1.0 1.247721 0.247721 \n", " 42048 1.0 1.257838 0.257838 \n", " 42049 2.0 1.131052 -0.868948 \n", " 42050 2.0 1.472176 -0.527824 \n", " 42051 0.0 1.289408 1.289408 \n", " 42052 2.0 1.079798 -0.920202 \n", "Total 8733.0 9304.347387 571.347387 \n", "\n", "freq3_end 2016-11-03 00:00:00 \\\n", " sales predict variance \n", "store item \n", "SEGWD103 41774 3.0 1.632384 -1.367616 \n", " 41775 1.0 1.506365 0.506365 \n", " 41776 0.0 0.755128 0.755128 \n", " 41777 0.0 0.821487 0.821487 \n", " 41778 1.0 0.601431 -0.398569 \n", " 41779 1.0 1.087082 0.087082 \n", " 41780 2.0 0.877578 -1.122422 \n", " 41781 0.0 -0.014109 -0.014109 \n", " 41782 1.0 1.399908 0.399908 \n", " 41783 2.0 1.125735 -0.874265 \n", " 41786 0.0 0.048845 0.048845 \n", " 41788 1.0 1.829588 0.829588 \n", " 41789 0.0 0.851807 0.851807 \n", " 41790 0.0 1.336817 1.336817 \n", " 41791 5.0 2.552677 -2.447323 \n", " 41792 3.0 3.074130 0.074130 \n", " 41793 3.0 3.992284 0.992284 \n", " 41794 4.0 2.824627 -1.175373 \n", " 41795 3.0 3.267561 0.267561 \n", " 41796 4.0 2.267649 -1.732351 \n", " 41797 0.0 2.019929 2.019929 \n", " 41798 2.0 2.041216 0.041216 \n", " 41799 3.0 0.690842 -2.309158 \n", " 41800 0.0 -0.097052 
-0.097052 \n", " 41803 NaN NaN NaN \n", " 41804 0.0 -0.156681 -0.156681 \n", " 42040 0.0 0.508968 0.508968 \n", " 42041 0.0 0.751280 0.751280 \n", " 42042 0.0 0.359559 0.359559 \n", " 42043 0.0 0.272362 0.272362 \n", "... ... ... ... \n", "SEGWD97 41786 0.0 -0.158880 -0.158880 \n", " 41788 2.0 1.017673 -0.982327 \n", " 41789 0.0 1.852459 1.852459 \n", " 41790 0.0 0.524902 0.524902 \n", " 41791 5.0 2.546348 -2.453652 \n", " 41792 4.0 2.463612 -1.536388 \n", " 41793 2.0 2.576180 0.576180 \n", " 41794 3.0 2.415505 -0.584495 \n", " 41795 3.0 2.657043 -0.342957 \n", " 41796 1.0 1.052942 0.052942 \n", " 41797 2.0 1.208014 -0.791986 \n", " 41798 1.0 1.430698 0.430698 \n", " 41799 0.0 0.080324 0.080324 \n", " 41800 0.0 -0.304778 -0.304778 \n", " 41803 NaN NaN NaN \n", " 41804 NaN NaN NaN \n", " 42040 0.0 -0.101550 -0.101550 \n", " 42041 0.0 -0.060635 -0.060635 \n", " 42042 0.0 -0.049563 -0.049563 \n", " 42043 0.0 -0.136760 -0.136760 \n", " 42044 0.0 -0.140510 -0.140510 \n", " 42045 0.0 -0.113635 -0.113635 \n", " 42046 0.0 -0.193217 -0.193217 \n", " 42047 0.0 0.577759 0.577759 \n", " 42048 1.0 0.386480 -0.613520 \n", " 42049 1.0 0.259693 -0.740307 \n", " 42050 0.0 0.399422 0.399422 \n", " 42051 1.0 1.022239 0.022239 \n", " 42052 1.0 0.409836 -0.590164 \n", "Total 6118.0 6650.350036 532.350036 \n", "\n", "freq3_end 2016-11-05 00:00:00 \\\n", " sales predict variance \n", "store item \n", "SEGWD103 41774 0.0 1.656427 1.656427 \n", " 41775 2.0 0.926219 -1.073781 \n", " 41776 0.0 0.376379 0.376379 \n", " 41777 4.0 1.248323 -2.751677 \n", " 41778 1.0 0.826870 -0.173130 \n", " 41779 0.0 1.715314 1.715314 \n", " 41780 1.0 1.304414 0.304414 \n", " 41781 0.0 0.009934 0.009934 \n", " 41782 1.0 1.625347 0.625347 \n", " 41783 3.0 1.351174 -1.648826 \n", " 41786 0.0 0.072888 0.072888 \n", " 41788 0.0 0.846649 0.846649 \n", " 41789 1.0 1.278643 0.278643 \n", " 41790 1.0 0.756671 -0.243329 \n", " 41791 1.0 2.576720 1.576720 \n", " 41792 2.0 2.493984 0.493984 \n", " 41793 5.0 3.814931 
-1.185069 \n", " 41794 5.0 2.244481 -2.755519 \n", " 41795 9.0 3.895793 -5.104207 \n", " 41796 5.0 2.090296 -2.909704 \n", " 41797 5.0 2.043972 -2.956028 \n", " 41798 5.0 1.863863 -3.136137 \n", " 41799 1.0 0.714885 -0.285115 \n", " 41800 0.0 -0.073009 -0.073009 \n", " 41803 NaN NaN NaN \n", " 41804 0.0 -0.132638 -0.132638 \n", " 42040 0.0 0.130218 0.130218 \n", " 42041 0.0 0.372530 0.372530 \n", " 42042 0.0 0.584999 0.584999 \n", " 42043 0.0 0.296405 0.296405 \n", "... ... ... ... \n", "SEGWD97 41786 0.0 -0.134837 -0.134837 \n", " 41788 1.0 1.243112 0.243112 \n", " 41789 3.0 1.473710 -1.526290 \n", " 41790 4.0 0.548945 -3.451055 \n", " 41791 0.0 2.167598 2.167598 \n", " 41792 4.0 3.494636 -0.505364 \n", " 41793 4.0 2.801620 -1.198380 \n", " 41794 5.0 2.036755 -2.963245 \n", " 41795 3.0 2.681086 -0.318914 \n", " 41796 4.0 1.076985 -2.923015 \n", " 41797 4.0 0.829265 -3.170735 \n", " 41798 1.0 1.051948 0.051948 \n", " 41799 0.0 0.104367 0.104367 \n", " 41800 0.0 -0.280735 -0.280735 \n", " 41803 NaN NaN NaN \n", " 41804 NaN NaN NaN \n", " 42040 0.0 0.123889 0.123889 \n", " 42041 0.0 -0.036592 -0.036592 \n", " 42042 0.0 0.175877 0.175877 \n", " 42043 0.0 -0.112717 -0.112717 \n", " 42044 0.0 -0.116467 -0.116467 \n", " 42045 0.0 0.111804 0.111804 \n", " 42046 0.0 -0.169174 -0.169174 \n", " 42047 5.0 0.601802 -4.398198 \n", " 42048 2.0 1.216108 -0.783892 \n", " 42049 4.0 1.290718 -2.709282 \n", " 42050 2.0 0.826257 -1.173743 \n", " 42051 3.0 1.046282 -1.953718 \n", " 42052 4.0 0.635276 -3.364724 \n", "Total 10153.0 7259.384284 -2893.615716 \n", "\n", "freq3_end 2016-11-08 00:00:00 ... 2016-12-20 00:00:00 \\\n", " sales ... variance \n", "store item ... \n", "SEGWD103 41774 3.0 ... 0.698157 \n", " 41775 2.0 ... 0.370742 \n", " 41776 0.0 ... 0.820901 \n", " 41777 1.0 ... 0.894242 \n", " 41778 1.0 ... -1.527211 \n", " 41779 2.0 ... -1.041560 \n", " 41780 1.0 ... 1.353125 \n", " 41781 0.0 ... 0.253061 \n", " 41782 4.0 ... -0.735715 \n", " 41783 3.0 ... 
-0.002907 \n", " 41786 0.0 ... 0.316015 \n", " 41788 4.0 ... -1.104639 \n", " 41789 3.0 ... -0.276835 \n", " 41790 2.0 ... 1.201194 \n", " 41791 2.0 ... 1.410073 \n", " 41792 5.0 ... 1.938507 \n", " 41793 10.0 ... -1.136358 \n", " 41794 6.0 ... -1.699826 \n", " 41795 6.0 ... -2.249910 \n", " 41796 6.0 ... -1.256804 \n", " 41797 1.0 ... -1.907316 \n", " 41798 3.0 ... 0.711178 \n", " 41799 1.0 ... -0.437799 \n", " 41800 0.0 ... 0.170117 \n", " 41803 NaN ... NaN \n", " 41804 0.0 ... 0.110489 \n", " 42040 0.0 ... 0.373345 \n", " 42041 0.0 ... 0.414260 \n", " 42042 0.0 ... 0.425332 \n", " 42043 0.0 ... 0.338135 \n", "... ... ... ... \n", "SEGWD97 41786 0.0 ... 0.108289 \n", " 41788 1.0 ... -0.312365 \n", " 41789 0.0 ... -0.484560 \n", " 41790 0.0 ... 0.590675 \n", " 41791 7.0 ... -1.186483 \n", " 41792 3.0 ... -1.665030 \n", " 41793 2.0 ... -0.760839 \n", " 41794 5.0 ... -2.122911 \n", " 41795 4.0 ... -0.471599 \n", " 41796 2.0 ... -0.478492 \n", " 41797 3.0 ... -0.524816 \n", " 41798 3.0 ... -0.906322 \n", " 41799 0.0 ... 0.750286 \n", " 41800 0.0 ... -0.037608 \n", " 41803 NaN ... NaN \n", " 41804 NaN ... NaN \n", " 42040 0.0 ... 0.165619 \n", " 42041 0.0 ... 0.206535 \n", " 42042 0.0 ... 0.217607 \n", " 42043 0.0 ... 0.130409 \n", " 42044 0.0 ... 0.126660 \n", " 42045 0.0 ... 0.153535 \n", " 42046 0.0 ... 0.073952 \n", " 42047 1.0 ... -0.752279 \n", " 42048 2.0 ... 0.063423 \n", " 42049 1.0 ... 0.131052 \n", " 42050 4.0 ... 0.069384 \n", " 42051 1.0 ... 0.094993 \n", " 42052 1.0 ... 1.281195 \n", "Total 9235.0 ... 
1261.308616 \n", "\n", "freq3_end 2016-12-22 00:00:00 \\\n", " sales predict variance \n", "store item \n", "SEGWD103 41774 1.0 1.028195 0.028195 \n", " 41775 0.0 0.700780 0.700780 \n", " 41776 1.0 0.352336 -0.647664 \n", " 41777 1.0 0.821487 -0.178513 \n", " 41778 1.0 1.004224 0.004224 \n", " 41779 2.0 1.489874 -0.510126 \n", " 41780 1.0 0.676182 -0.323818 \n", " 41781 0.0 -0.014109 -0.014109 \n", " 41782 0.0 1.198512 1.198512 \n", " 41783 0.0 1.528527 1.528527 \n", " 41786 0.0 0.048845 0.048845 \n", " 41788 0.0 1.628191 1.628191 \n", " 41789 0.0 1.053203 1.053203 \n", " 41790 1.0 0.934024 -0.065976 \n", " 41791 1.0 1.545696 0.545696 \n", " 41792 0.0 1.462959 1.462959 \n", " 41793 5.0 3.186699 -1.813301 \n", " 41794 3.0 4.435797 1.435797 \n", " 41795 2.0 4.274543 2.274543 \n", " 41796 1.0 2.871838 1.871838 \n", " 41797 0.0 2.019929 2.019929 \n", " 41798 4.0 2.444008 -1.555992 \n", " 41799 1.0 0.892239 -0.107761 \n", " 41800 0.0 -0.097052 -0.097052 \n", " 41803 NaN NaN NaN \n", " 41804 0.0 -0.156681 -0.156681 \n", " 42040 0.0 0.106175 0.106175 \n", " 42041 0.0 0.147091 0.147091 \n", " 42042 0.0 0.158163 0.158163 \n", " 42043 0.0 0.070966 0.070966 \n", "... ... ... ... 
\n", "SEGWD97 41786 0.0 -0.158880 -0.158880 \n", " 41788 1.0 0.816277 -0.183723 \n", " 41789 1.0 0.845478 -0.154522 \n", " 41790 1.0 0.323506 -0.676494 \n", " 41791 2.0 2.344952 0.344952 \n", " 41792 3.0 3.067800 0.067800 \n", " 41793 3.0 3.180369 0.180369 \n", " 41794 2.0 3.422486 1.422486 \n", " 41795 3.0 4.066817 1.066817 \n", " 41796 2.0 0.650149 -1.349851 \n", " 41797 0.0 0.603825 0.603825 \n", " 41798 1.0 0.423716 -0.576284 \n", " 41799 0.0 0.080324 0.080324 \n", " 41800 0.0 -0.304778 -0.304778 \n", " 41803 NaN NaN NaN \n", " 41804 NaN NaN NaN \n", " 42040 0.0 -0.101550 -0.101550 \n", " 42041 0.0 -0.060635 -0.060635 \n", " 42042 0.0 -0.049563 -0.049563 \n", " 42043 0.0 -0.136760 -0.136760 \n", " 42044 0.0 -0.140510 -0.140510 \n", " 42045 0.0 -0.113635 -0.113635 \n", " 42046 0.0 -0.193217 -0.193217 \n", " 42047 0.0 0.577759 0.577759 \n", " 42048 2.0 0.587876 -1.412124 \n", " 42049 2.0 1.468071 -0.531929 \n", " 42050 2.0 0.802214 -1.197786 \n", " 42051 1.0 0.619446 -0.380554 \n", " 42052 0.0 0.208440 0.208440 \n", "Total 5625.0 6810.836373 1185.836373 \n", "\n", "freq3_end 2016-12-24 00:00:00 Total \\\n", " sales predict variance sales \n", "store item \n", "SEGWD103 41774 0.0 1.253634 1.253634 30.0 \n", " 41775 0.0 2.134597 2.134597 28.0 \n", " 41776 0.0 0.376379 0.376379 6.0 \n", " 41777 0.0 0.845530 0.845530 28.0 \n", " 41778 0.0 0.826870 0.826870 27.0 \n", " 41779 0.0 1.111125 1.111125 35.0 \n", " 41780 2.0 1.304414 -0.695586 35.0 \n", " 41781 0.0 0.009934 0.009934 0.0 \n", " 41782 3.0 1.625347 -1.374653 35.0 \n", " 41783 0.0 1.753967 1.753967 45.0 \n", " 41786 0.0 0.072888 0.072888 0.0 \n", " 41788 1.0 0.645253 -0.354747 33.0 \n", " 41789 1.0 0.875850 -0.124150 29.0 \n", " 41790 0.0 0.555275 0.555275 21.0 \n", " 41791 0.0 1.368343 1.368343 28.0 \n", " 41792 0.0 1.889795 1.889795 37.0 \n", " 41793 3.0 3.412138 0.412138 113.0 \n", " 41794 3.0 2.043085 -0.956915 94.0 \n", " 41795 2.0 3.895793 1.895793 115.0 \n", " 41796 3.0 1.687503 -1.312497 71.0 \n", " 
41797 3.0 1.439783 -1.560217 46.0 \n", " 41798 3.0 2.065259 -0.934741 70.0 \n", " 41799 1.0 1.319074 0.319074 33.0 \n", " 41800 0.0 -0.073009 -0.073009 0.0 \n", " 41803 NaN NaN NaN NaN \n", " 41804 0.0 -0.132638 -0.132638 0.0 \n", " 42040 0.0 0.130218 0.130218 0.0 \n", " 42041 0.0 0.171134 0.171134 0.0 \n", " 42042 0.0 0.182206 0.182206 0.0 \n", " 42043 0.0 0.095009 0.095009 0.0 \n", "... ... ... ... ... \n", "SEGWD97 41786 0.0 -0.134837 -0.134837 0.0 \n", " 41788 1.0 0.638924 -0.361076 27.0 \n", " 41789 1.0 0.265332 -0.734668 25.0 \n", " 41790 1.0 0.548945 -0.451055 16.0 \n", " 41791 3.0 2.973184 -0.026816 70.0 \n", " 41792 3.0 2.286258 -0.713742 68.0 \n", " 41793 2.0 2.398827 0.398827 61.0 \n", " 41794 3.0 3.043737 0.043737 75.0 \n", " 41795 2.0 3.083878 1.083878 77.0 \n", " 41796 0.0 1.681174 1.681174 34.0 \n", " 41797 2.0 1.030661 -0.969339 35.0 \n", " 41798 2.0 1.454741 -0.545259 28.0 \n", " 41799 0.0 0.305763 0.305763 4.0 \n", " 41800 0.0 -0.280735 -0.280735 0.0 \n", " 41803 NaN NaN NaN NaN \n", " 41804 NaN NaN NaN NaN \n", " 42040 0.0 -0.077507 -0.077507 0.0 \n", " 42041 0.0 -0.036592 -0.036592 0.0 \n", " 42042 0.0 -0.025520 -0.025520 0.0 \n", " 42043 0.0 -0.112717 -0.112717 0.0 \n", " 42044 0.0 -0.116467 -0.116467 0.0 \n", " 42045 0.0 -0.089592 -0.089592 0.0 \n", " 42046 0.0 -0.169174 -0.169174 0.0 \n", " 42047 1.0 1.407388 0.407388 35.0 \n", " 42048 1.0 1.014712 0.014712 38.0 \n", " 42049 2.0 0.485133 -1.514867 42.0 \n", " 42050 2.0 1.027653 -0.972347 31.0 \n", " 42051 1.0 0.643489 -0.356511 29.0 \n", " 42052 4.0 0.232483 -3.767517 29.0 \n", "Total 6752.0 7032.609569 280.609569 171148.0 \n", "\n", "freq3_end \n", " predict variance \n", "store item \n", "SEGWD103 41774 37.881773 7.881773 \n", " 41775 32.843361 4.843361 \n", " 41776 14.612282 8.612282 \n", " 41777 29.899845 1.899845 \n", " 41778 27.236655 0.236655 \n", " 41779 34.260152 -0.739848 \n", " 41780 32.857197 -2.142803 \n", " 41781 1.991090 1.991090 \n", " 41782 38.747044 3.747044 \n", " 41783 
42.035298 -2.964702 \n", " 41786 3.501988 3.501988 \n", " 41788 36.975574 3.975574 \n", " 41789 30.023336 1.023336 \n", " 41790 24.947678 3.947678 \n", " 41791 54.933915 26.933915 \n", " 41792 69.664135 32.664135 \n", " 41793 104.991987 -8.008013 \n", " 41794 86.031049 -7.968951 \n", " 41795 108.543855 -6.456145 \n", " 41796 57.558860 -13.441140 \n", " 41797 45.168899 -0.831101 \n", " 41798 59.576130 -10.423870 \n", " 41799 28.576936 -4.423064 \n", " 41800 0.000452 0.000452 \n", " 41803 NaN NaN \n", " 41804 -1.430642 -1.430642 \n", " 42040 6.489080 6.489080 \n", " 42041 8.478029 8.478029 \n", " 42042 8.139574 8.139574 \n", " 42043 5.845440 5.845440 \n", "... ... ... \n", "SEGWD97 41786 -1.483428 -1.483428 \n", " 41788 28.163628 1.163628 \n", " 41789 23.426750 -1.573250 \n", " 41790 15.732939 -0.267061 \n", " 41791 66.060203 -3.939797 \n", " 41792 72.533175 4.533175 \n", " 41793 66.373388 5.373388 \n", " 41794 69.767440 -5.232560 \n", " 41795 83.016016 6.016016 \n", " 41796 33.843588 -0.156412 \n", " 41797 33.738801 -1.261199 \n", " 41798 30.020364 2.020364 \n", " 41799 5.868645 1.868645 \n", " 41800 -4.984964 -4.984964 \n", " 41803 NaN NaN \n", " 41804 NaN NaN \n", " 42040 0.093890 0.093890 \n", " 42041 0.874461 0.874461 \n", " 42042 1.341591 1.341591 \n", " 42043 -0.549751 -0.549751 \n", " 42044 -0.236952 -0.236952 \n", " 42045 -0.196142 -0.196142 \n", " 42046 -2.307511 -2.307511 \n", " 42047 27.071326 -7.928674 \n", " 42048 28.321106 -9.678894 \n", " 42049 28.500577 -13.499423 \n", " 42050 24.603784 -6.396216 \n", " 42051 25.655051 -3.344949 \n", " 42052 19.416036 -9.583964 \n", "Total 193080.171850 21932.171850 \n", "\n", "[7060 rows x 75 columns]" ] }, "execution_count": 445, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pivot to store/item rows with (period, measure) columns, including margin totals\n", "pivot = (pd.pivot_table(data_predict, index=['store', 'item'], columns=[freq],\n", "                        values=['sales', 'predict', 'variance'],\n", "                        aggfunc=np.sum, margins=True, margins_name='Total')\n", "         .swaplevel(axis=1)\n", "         .sort_index(level=0, axis=1, 
sort_remaining=False)\n", "         )\n", "pivot" ] }, { "cell_type": "code", "execution_count": 446, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Export the pivot with a timestamped filename, e.g. pivot_W-SAT_12221430.csv\n", "timestring = datetime.datetime.fromtimestamp(time.time()).strftime('%m%d%H%M')\n", "pivot.to_csv('pivot_' + freq[:5] + '_' + timestring + '.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improvement Opportunities\n", "* Run model diagnostics to understand bias and variance.\n", "* Analyze the errors (cut by store, by item, and by period) to identify areas where the errors are systematically larger. Explore segmenting or grouping stores and/or products.\n", "* Obtain a longer sales history, going back at least another year, to factor in normal seasonality.\n", "* Evaluate alternative optimization algorithms beyond regression.\n", "* Incorporate pricing and promotional variables.\n" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }