{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Mean Absolute Difference in Pandas Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook explains how to calculate the mean absolute difference of time series data in `pandas`.\n",
    "\n",
    "This notebook will use gold and silver price data from `rdatasets` for this tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Packages"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The documentation for each package used in this tutorial is linked below:\n",
    "* [pandas](https://pandas.pydata.org/docs/)\n",
    "* [statsmodels](https://www.statsmodels.org/stable/index.html)\n",
    "    * [statsmodels.api](https://www.statsmodels.org/stable/api.html#statsmodels-api)\n",
    "* [numpy](https://numpy.org/doc/stable/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create initial dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is from `rdatasets` imported using the Python package `statsmodels`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 9132 entries, 0 to 9131\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   date    9132 non-null   object \n",
      " 1   gold    9132 non-null   float64\n",
      " 2   silver  9132 non-null   float64\n",
      "dtypes: float64(2), object(1)\n",
      "memory usage: 214.2+ KB\n"
     ]
    }
   ],
   "source": [
    "df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['date'] = pd.to_datetime(df.date)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Time series aggregation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `pandas` function `rolling` can be used to create aggregations on windows of specific lengths.  Here, an aggregate of the daily gold and silver price data will be created covering the primary week.  \n",
    "\n",
    "First, a datetime index needs to be created from the **date** column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.set_index('date', inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas aggregation will require a function that can be called.  The mean difference will be called using three functions from `numpy`: `mean`, `abs` and `diff`.  `diff` calculates the difference between the current value and a prior value (by default the immediate prior value)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mean_absolute_difference(series):\n",
    "    return np.mean(np.abs(np.diff(series)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If, instead of an offset (**'7D'** representing 7 days), a number is used, it will just use the prior number of observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "weekly_resample = df.rolling('7D')\n",
    "aggregated_df = weekly_resample.agg(['mean', mean_absolute_difference])\n",
    "aggregated_df.columns = ['_'.join(col).strip() + '_week' for col in aggregated_df.columns.values]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gold_mean_week</th>\n",
       "      <th>gold_mean_absolute_difference_week</th>\n",
       "      <th>silver_mean_week</th>\n",
       "      <th>silver_mean_absolute_difference_week</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1977-12-30</th>\n",
       "      <td>100.000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>223.420</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-02</th>\n",
       "      <td>100.000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>223.420</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-03</th>\n",
       "      <td>100.000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>225.560</td>\n",
       "      <td>3.210000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-04</th>\n",
       "      <td>100.000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>225.315</td>\n",
       "      <td>3.893333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-05</th>\n",
       "      <td>100.000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>225.850</td>\n",
       "      <td>3.772500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-06</th>\n",
       "      <td>100.000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>226.604</td>\n",
       "      <td>3.972500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-09</th>\n",
       "      <td>100.246</td>\n",
       "      <td>0.3075</td>\n",
       "      <td>227.844</td>\n",
       "      <td>2.975000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-10</th>\n",
       "      <td>100.436</td>\n",
       "      <td>0.3775</td>\n",
       "      <td>227.670</td>\n",
       "      <td>1.822500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-11</th>\n",
       "      <td>100.886</td>\n",
       "      <td>0.7025</td>\n",
       "      <td>228.998</td>\n",
       "      <td>1.532500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-12</th>\n",
       "      <td>101.062</td>\n",
       "      <td>1.0450</td>\n",
       "      <td>228.978</td>\n",
       "      <td>2.165000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-13</th>\n",
       "      <td>101.586</td>\n",
       "      <td>1.1725</td>\n",
       "      <td>230.050</td>\n",
       "      <td>2.722500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-16</th>\n",
       "      <td>101.958</td>\n",
       "      <td>1.2200</td>\n",
       "      <td>230.624</td>\n",
       "      <td>2.575000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-17</th>\n",
       "      <td>102.124</td>\n",
       "      <td>1.2225</td>\n",
       "      <td>230.866</td>\n",
       "      <td>2.590000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-18</th>\n",
       "      <td>101.848</td>\n",
       "      <td>1.1075</td>\n",
       "      <td>230.214</td>\n",
       "      <td>2.312500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-19</th>\n",
       "      <td>102.012</td>\n",
       "      <td>0.8800</td>\n",
       "      <td>230.412</td>\n",
       "      <td>1.377500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-20</th>\n",
       "      <td>101.832</td>\n",
       "      <td>0.7675</td>\n",
       "      <td>229.686</td>\n",
       "      <td>1.372500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-23</th>\n",
       "      <td>101.938</td>\n",
       "      <td>0.9150</td>\n",
       "      <td>229.988</td>\n",
       "      <td>2.065000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-24</th>\n",
       "      <td>102.300</td>\n",
       "      <td>0.6950</td>\n",
       "      <td>230.678</td>\n",
       "      <td>1.602500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-25</th>\n",
       "      <td>102.836</td>\n",
       "      <td>0.4975</td>\n",
       "      <td>231.986</td>\n",
       "      <td>1.590000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1978-01-26</th>\n",
       "      <td>103.152</td>\n",
       "      <td>0.5600</td>\n",
       "      <td>232.508</td>\n",
       "      <td>2.332500</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            gold_mean_week  gold_mean_absolute_difference_week  \\\n",
       "date                                                             \n",
       "1977-12-30         100.000                                 NaN   \n",
       "1978-01-02         100.000                              0.0000   \n",
       "1978-01-03         100.000                              0.0000   \n",
       "1978-01-04         100.000                              0.0000   \n",
       "1978-01-05         100.000                              0.0000   \n",
       "1978-01-06         100.000                              0.0000   \n",
       "1978-01-09         100.246                              0.3075   \n",
       "1978-01-10         100.436                              0.3775   \n",
       "1978-01-11         100.886                              0.7025   \n",
       "1978-01-12         101.062                              1.0450   \n",
       "1978-01-13         101.586                              1.1725   \n",
       "1978-01-16         101.958                              1.2200   \n",
       "1978-01-17         102.124                              1.2225   \n",
       "1978-01-18         101.848                              1.1075   \n",
       "1978-01-19         102.012                              0.8800   \n",
       "1978-01-20         101.832                              0.7675   \n",
       "1978-01-23         101.938                              0.9150   \n",
       "1978-01-24         102.300                              0.6950   \n",
       "1978-01-25         102.836                              0.4975   \n",
       "1978-01-26         103.152                              0.5600   \n",
       "\n",
       "            silver_mean_week  silver_mean_absolute_difference_week  \n",
       "date                                                                \n",
       "1977-12-30           223.420                                   NaN  \n",
       "1978-01-02           223.420                              0.000000  \n",
       "1978-01-03           225.560                              3.210000  \n",
       "1978-01-04           225.315                              3.893333  \n",
       "1978-01-05           225.850                              3.772500  \n",
       "1978-01-06           226.604                              3.972500  \n",
       "1978-01-09           227.844                              2.975000  \n",
       "1978-01-10           227.670                              1.822500  \n",
       "1978-01-11           228.998                              1.532500  \n",
       "1978-01-12           228.978                              2.165000  \n",
       "1978-01-13           230.050                              2.722500  \n",
       "1978-01-16           230.624                              2.575000  \n",
       "1978-01-17           230.866                              2.590000  \n",
       "1978-01-18           230.214                              2.312500  \n",
       "1978-01-19           230.412                              1.377500  \n",
       "1978-01-20           229.686                              1.372500  \n",
       "1978-01-23           229.988                              2.065000  \n",
       "1978-01-24           230.678                              1.602500  \n",
       "1978-01-25           231.986                              1.590000  \n",
       "1978-01-26           232.508                              2.332500  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aggregated_df.head(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}