{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wellness Data\n",
    "\n",
    "The original wellness data of athletes consists of 19 columns. Our aim for this dataset is to come up with a quantitative  summary measure(s). For complete understanding as to what each variable means, please refer to the [CodeBook](https://www.dropbox.com/s/c5v4hig2afq881h/3.%20DataFest%202019%20-%20Codebook.xlsx?dl=0). \n",
    "\n",
    "We'll first look at the dataset in depth, to possibly merge highly correlated measures or come up with new measures by transforming the existing ones. The summary of our wellness dataset is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 5011 entries, 0 to 5010\n",
      "Data columns (total 19 columns):\n",
      "Date                   5011 non-null object\n",
      "PlayerID               5011 non-null int64\n",
      "Fatigue                5011 non-null int64\n",
      "Soreness               5011 non-null int64\n",
      "Desire                 5011 non-null int64\n",
      "Irritability           5011 non-null int64\n",
      "BedTime                5011 non-null object\n",
      "WakeTime               5011 non-null object\n",
      "SleepHours             5011 non-null float64\n",
      "SleepQuality           5011 non-null int64\n",
      "MonitoringScore        5011 non-null int64\n",
      "Pain                   5011 non-null object\n",
      "Illness                5011 non-null object\n",
      "Menstruation           4995 non-null object\n",
      "Nutrition              4174 non-null object\n",
      "NutritionAdjustment    4266 non-null object\n",
      "USGMeasurement         4843 non-null object\n",
      "USG                    629 non-null float64\n",
      "TrainingReadiness      5011 non-null object\n",
      "dtypes: float64(2), int64(7), object(10)\n",
      "memory usage: 743.9+ KB\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from factor_analyzer import FactorAnalyzer\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "dfWell = pd.read_csv('https://www.dropbox.com/s/170bc3dimgn8ru8/wellness.csv?dl=1')\n",
    "dfWell.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, there are some null values in our dataset. After close inspection, we decided to fill the null values of Menstruation, Nutrition and NutritionAdjustment with their modes and drop the USG and USGMeasurement columns as they were ~85% null. The resulting dataset looks as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 5011 entries, 0 to 5010\n",
      "Data columns (total 19 columns):\n",
      "Date                   5011 non-null object\n",
      "PlayerID               5011 non-null int64\n",
      "Fatigue                5011 non-null int64\n",
      "Soreness               5011 non-null int64\n",
      "Desire                 5011 non-null int64\n",
      "Irritability           5011 non-null int64\n",
      "BedTime                5011 non-null object\n",
      "WakeTime               5011 non-null object\n",
      "SleepHours             5011 non-null float64\n",
      "SleepQuality           5011 non-null int64\n",
      "MonitoringScore        5011 non-null int64\n",
      "Pain                   5011 non-null object\n",
      "Illness                5011 non-null object\n",
      "Menstruation           5011 non-null object\n",
      "Nutrition              5011 non-null object\n",
      "NutritionAdjustment    5011 non-null object\n",
      "USGMeasurement         4843 non-null object\n",
      "USG                    629 non-null float64\n",
      "TrainingReadiness      5011 non-null object\n",
      "dtypes: float64(2), int64(7), object(10)\n",
      "memory usage: 743.9+ KB\n"
     ]
    }
   ],
   "source": [
    "# As Menstruation, Nutrition and Nutrition Adjustment are categorical, fill NA with their Modes\n",
    "dfWell.Menstruation.fillna(dfWell.Menstruation.mode()[0],inplace=True)\n",
    "dfWell.Nutrition.fillna(dfWell.Nutrition.mode()[0],inplace=True)\n",
    "dfWell.NutritionAdjustment.fillna(dfWell.NutritionAdjustment.mode()[0],inplace=True)\n",
    "dfWell.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>PlayerID</th>\n",
       "      <th>Fatigue</th>\n",
       "      <th>Soreness</th>\n",
       "      <th>Desire</th>\n",
       "      <th>Irritability</th>\n",
       "      <th>BedTime</th>\n",
       "      <th>WakeTime</th>\n",
       "      <th>SleepHours</th>\n",
       "      <th>SleepQuality</th>\n",
       "      <th>MonitoringScore</th>\n",
       "      <th>Pain</th>\n",
       "      <th>Illness</th>\n",
       "      <th>Menstruation</th>\n",
       "      <th>Nutrition</th>\n",
       "      <th>NutritionAdjustment</th>\n",
       "      <th>TrainingReadiness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>23:00:00</td>\n",
       "      <td>07:00:00</td>\n",
       "      <td>8.00</td>\n",
       "      <td>2</td>\n",
       "      <td>13</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>0%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>23:00:00</td>\n",
       "      <td>07:00:00</td>\n",
       "      <td>8.00</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>0%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>22:30:00</td>\n",
       "      <td>06:30:00</td>\n",
       "      <td>8.00</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>100%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>00:30:00</td>\n",
       "      <td>07:00:00</td>\n",
       "      <td>6.50</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>95%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>23:45:00</td>\n",
       "      <td>07:00:00</td>\n",
       "      <td>7.25</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>100%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Date  PlayerID  Fatigue  Soreness  Desire  Irritability   BedTime  \\\n",
       "0  2018-07-21         1        3         3       2             3  23:00:00   \n",
       "1  2018-07-21         2        4         3       4             4  23:00:00   \n",
       "2  2018-07-21         3        3         3       5             4  22:30:00   \n",
       "3  2018-07-21         4        2         3       5             4  00:30:00   \n",
       "4  2018-07-21         5        5         3       4             4  23:45:00   \n",
       "\n",
       "   WakeTime  SleepHours  SleepQuality  MonitoringScore Pain Illness  \\\n",
       "0  07:00:00        8.00             2               13   No      No   \n",
       "1  07:00:00        8.00             4               19  Yes      No   \n",
       "2  06:30:00        8.00             4               19   No      No   \n",
       "3  07:00:00        6.50             1               15   No      No   \n",
       "4  07:00:00        7.25             4               20   No      No   \n",
       "\n",
       "  Menstruation  Nutrition NutritionAdjustment TrainingReadiness  \n",
       "0          Yes  Excellent                 Yes                0%  \n",
       "1          Yes  Excellent                 Yes                0%  \n",
       "2           No  Excellent                 Yes              100%  \n",
       "3          Yes  Excellent                 Yes               95%  \n",
       "4           No       Okay                 Yes              100%  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# As USG measurement has almost 87% NA, it doesn't make sense to fill NA with random or mean values. \n",
    "# Hence, drop USG related columns\n",
    "dfWell = dfWell.drop(['USGMeasurement', 'USG'], axis=1)\n",
    "dfWell.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we drop the BedTime and WakeTime as this information is captured by the SleepHours already. We are also going to convert the TrainingReadiness and SleepTime to a scale of 1 to 7 to match all the other scales which are captured in the MonitoringScore. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>PlayerID</th>\n",
       "      <th>Fatigue</th>\n",
       "      <th>Soreness</th>\n",
       "      <th>Desire</th>\n",
       "      <th>Irritability</th>\n",
       "      <th>SleepHours</th>\n",
       "      <th>SleepQuality</th>\n",
       "      <th>MonitoringScore</th>\n",
       "      <th>Pain</th>\n",
       "      <th>Illness</th>\n",
       "      <th>Menstruation</th>\n",
       "      <th>Nutrition</th>\n",
       "      <th>NutritionAdjustment</th>\n",
       "      <th>TrainingReadiness</th>\n",
       "      <th>SleepHoursScaled</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>8.00</td>\n",
       "      <td>2</td>\n",
       "      <td>13</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>8.00</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>8.00</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>7.0</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>6.50</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>6.7</td>\n",
       "      <td>2.586207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>7.25</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>7.0</td>\n",
       "      <td>2.793103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5006</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>8</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>6.50</td>\n",
       "      <td>6</td>\n",
       "      <td>33</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Poor</td>\n",
       "      <td>No</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.586207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5007</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>10</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>6.75</td>\n",
       "      <td>3</td>\n",
       "      <td>18</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>I Don't Know</td>\n",
       "      <td>6.7</td>\n",
       "      <td>2.655172</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5008</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>12</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>7.50</td>\n",
       "      <td>3</td>\n",
       "      <td>21</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Okay</td>\n",
       "      <td>I Don't Know</td>\n",
       "      <td>6.4</td>\n",
       "      <td>2.862069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5009</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>13</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>7.50</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>7.0</td>\n",
       "      <td>2.862069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5010</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>14</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>8.00</td>\n",
       "      <td>6</td>\n",
       "      <td>24</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>7.0</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5011 rows × 16 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Date  PlayerID  Fatigue  Soreness  Desire  Irritability  \\\n",
       "0     2018-07-21         1        3         3       2             3   \n",
       "1     2018-07-21         2        4         3       4             4   \n",
       "2     2018-07-21         3        3         3       5             4   \n",
       "3     2018-07-21         4        2         3       5             4   \n",
       "4     2018-07-21         5        5         3       4             4   \n",
       "...          ...       ...      ...       ...     ...           ...   \n",
       "5006  2017-08-01         8        6         7       7             7   \n",
       "5007  2017-08-01        10        3         4       4             4   \n",
       "5008  2017-08-01        12        5         4       5             4   \n",
       "5009  2017-08-01        13        4         4       4             4   \n",
       "5010  2017-08-01        14        4         4       5             5   \n",
       "\n",
       "      SleepHours  SleepQuality  MonitoringScore Pain Illness Menstruation  \\\n",
       "0           8.00             2               13   No      No          Yes   \n",
       "1           8.00             4               19  Yes      No          Yes   \n",
       "2           8.00             4               19   No      No           No   \n",
       "3           6.50             1               15   No      No          Yes   \n",
       "4           7.25             4               20   No      No           No   \n",
       "...          ...           ...              ...  ...     ...          ...   \n",
       "5006        6.50             6               33   No      No           No   \n",
       "5007        6.75             3               18   No      No           No   \n",
       "5008        7.50             3               21   No      No          Yes   \n",
       "5009        7.50             4               20   No      No           No   \n",
       "5010        8.00             6               24   No      No           No   \n",
       "\n",
       "      Nutrition NutritionAdjustment  TrainingReadiness  SleepHoursScaled  \n",
       "0     Excellent                 Yes                1.0          3.000000  \n",
       "1     Excellent                 Yes                1.0          3.000000  \n",
       "2     Excellent                 Yes                7.0          3.000000  \n",
       "3     Excellent                 Yes                6.7          2.586207  \n",
       "4          Okay                 Yes                7.0          2.793103  \n",
       "...         ...                 ...                ...               ...  \n",
       "5006       Poor                  No                4.0          2.586207  \n",
       "5007  Excellent        I Don't Know                6.7          2.655172  \n",
       "5008       Okay        I Don't Know                6.4          2.862069  \n",
       "5009       Okay                 Yes                7.0          2.862069  \n",
       "5010       Okay                 Yes                7.0          3.000000  \n",
       "\n",
       "[5011 rows x 16 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We do not require Bed Time and Wake Time as well. We'll be using SleepHours instead\n",
    "dfWell = dfWell.drop(['BedTime', 'WakeTime'], axis=1)\n",
    "# Convert TrainingReadiness from String to fraction\n",
    "dfWell['TrainingReadiness'] = 1 + (dfWell['TrainingReadiness'].str.rstrip('%').astype('float') / 100.0 * 6)\n",
    "# Use MinMaxScaler to convert sleeptime to a number from 1 to 7\n",
    "from sklearn import preprocessing\n",
    "min_max_scaler = preprocessing.MinMaxScaler()\n",
    "dfWell[\"SleepHoursScaled\"] = 1 + (min_max_scaler.fit_transform(dfWell[[\"SleepHours\"]]) * 6)\n",
    "dfWell"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We merge the MonitoringScore with TrainingReadiness and SleepHoursScaled to define a new measure called the WellnessScore. Our dataset looks a lot cleaner and concise now:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>PlayerID</th>\n",
       "      <th>Pain</th>\n",
       "      <th>Illness</th>\n",
       "      <th>Menstruation</th>\n",
       "      <th>Nutrition</th>\n",
       "      <th>NutritionAdjustment</th>\n",
       "      <th>WellnessScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>1</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>17.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>2</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>23.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>3</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>29.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>4</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>24.286207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>5</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>29.793103</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Date  PlayerID Pain Illness Menstruation  Nutrition  \\\n",
       "0  2018-07-21         1   No      No          Yes  Excellent   \n",
       "1  2018-07-21         2  Yes      No          Yes  Excellent   \n",
       "2  2018-07-21         3   No      No           No  Excellent   \n",
       "3  2018-07-21         4   No      No          Yes  Excellent   \n",
       "4  2018-07-21         5   No      No           No       Okay   \n",
       "\n",
       "  NutritionAdjustment  WellnessScore  \n",
       "0                 Yes      17.000000  \n",
       "1                 Yes      23.000000  \n",
       "2                 Yes      29.000000  \n",
       "3                 Yes      24.286207  \n",
       "4                 Yes      29.793103  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a WellnessScore which is the sum of all the quantitative values\n",
    "dfWell[\"WellnessScore\"] = dfWell[\"MonitoringScore\"] + dfWell[\"TrainingReadiness\"] + dfWell[\"SleepHoursScaled\"]\n",
    "# Drop all the quantitative values as we'll be using only the WellnessScore from now on\n",
    "dfWell = dfWell.drop(['Fatigue','Soreness','Desire','Irritability','SleepHours','SleepQuality','MonitoringScore',\\\n",
    "                      'TrainingReadiness','SleepHoursScaled'],axis=1)\n",
    "dfWell.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But the data we have here isn't completely independent. Intuitively, doesn't our wellness today depend on how we felt yesterday or even a week before? So, we define a new score by applying exponential moving average to the WellnessScore. Now, each player's <b>'EWMScore'</b> will depend on a week of WellnessScores with exponentially decreasing weights as me move further back in time. The resulting sample of rows like this: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>PlayerID</th>\n",
       "      <th>Pain</th>\n",
       "      <th>Illness</th>\n",
       "      <th>Menstruation</th>\n",
       "      <th>Nutrition</th>\n",
       "      <th>NutritionAdjustment</th>\n",
       "      <th>WellnessScore</th>\n",
       "      <th>EWMScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>14</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>34.000000</td>\n",
       "      <td>34.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>2</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>35.824138</td>\n",
       "      <td>35.824138</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>3</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>30.000000</td>\n",
       "      <td>30.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>5</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>31.596552</td>\n",
       "      <td>31.596552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2017-08-01</td>\n",
       "      <td>13</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>29.862069</td>\n",
       "      <td>29.862069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5006</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>14</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>28.206897</td>\n",
       "      <td>28.557073</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5007</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>15</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>34.837931</td>\n",
       "      <td>35.176056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5008</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>16</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Okay</td>\n",
       "      <td>Yes</td>\n",
       "      <td>25.000000</td>\n",
       "      <td>26.956113</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5009</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>10</td>\n",
       "      <td>No</td>\n",
       "      <td>Slightly Off</td>\n",
       "      <td>No</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>22.993103</td>\n",
       "      <td>24.044602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5010</td>\n",
       "      <td>2018-07-21</td>\n",
       "      <td>1</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Excellent</td>\n",
       "      <td>Yes</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>20.872639</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5011 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Date  PlayerID Pain       Illness Menstruation  Nutrition  \\\n",
       "0     2017-08-01        14   No            No           No       Okay   \n",
       "1     2017-08-01         2   No            No           No  Excellent   \n",
       "2     2017-08-01         3   No            No           No  Excellent   \n",
       "3     2017-08-01         5   No            No           No  Excellent   \n",
       "4     2017-08-01        13   No            No           No       Okay   \n",
       "...          ...       ...  ...           ...          ...        ...   \n",
       "5006  2018-07-21        14   No            No           No  Excellent   \n",
       "5007  2018-07-21        15   No            No           No  Excellent   \n",
       "5008  2018-07-21        16   No            No           No       Okay   \n",
       "5009  2018-07-21        10   No  Slightly Off           No  Excellent   \n",
       "5010  2018-07-21         1   No            No          Yes  Excellent   \n",
       "\n",
       "     NutritionAdjustment  WellnessScore   EWMScore  \n",
       "0                    Yes      34.000000  34.000000  \n",
       "1                    Yes      35.824138  35.824138  \n",
       "2                    Yes      30.000000  30.000000  \n",
       "3                    Yes      31.596552  31.596552  \n",
       "4                    Yes      29.862069  29.862069  \n",
       "...                  ...            ...        ...  \n",
       "5006                 Yes      28.206897  28.557073  \n",
       "5007                 Yes      34.837931  35.176056  \n",
       "5008                 Yes      25.000000  26.956113  \n",
       "5009                 Yes      22.993103  24.044602  \n",
       "5010                 Yes      17.000000  20.872639  \n",
       "\n",
       "[5011 rows x 9 columns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a Exponential Moving Average for 7 days span\n",
    "dfWell = dfWell.sort_values(by='Date')\n",
    "dfWell['EWMScore'] = dfWell.groupby('PlayerID')['WellnessScore'].transform(lambda x: x.ewm(span=7 , min_periods=1).mean())\n",
    "dfWell.reset_index(drop=True, inplace=True)\n",
    "dfWell"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have the final cleaned and summarized wellness dataset which we'll be using for our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfWell.to_csv(\"Clean Data/WellnessMeasure.csv\", index = None, header=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}