{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Health Analytics based on Apple Health Data\n", "\n", "This project takes raw Apple Health data from the iPhone and uses data points such as Step Count, Sleep, and Flights Climbed to build a health analytics and prediction system for understanding activity patterns. We also collected **User Information** for the 25 users whose data is analysed in this project.\n", "\n", "The goal is to help users understand their health patterns and reflect on their own health metrics relative to their friends and community.\n", "\n", "**Libraries used:**\n", "\n", "1. Pandas\n", "2. Numpy\n", "3. Matplotlib\n", "4. Seaborn\n", "5. Sklearn\n", "6. Pyplot\n", "7. CalMap" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from matplotlib import pyplot as plt\n", "import os.path\n", "from pathlib import Path\n", "import seaborn as sns; sns.set(style=\"ticks\", color_codes=True)\n", "import calmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read the CSV files for all 25 users. Each user's data was collected as XML and converted to CSV." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def read_files(filename, number_folders, delimiter=';'):\n", "    # load users/{i}/{filename} for i = 1..number_folders; users without the file are skipped\n", "    data = []\n", "    for i in range(1, number_folders + 1):\n", "        file = \"users/\" + str(i) + \"/\" + str(filename)\n", "        if Path(file).is_file():\n", "            # parse with the caller-supplied delimiter\n", "            data.append(pd.read_csv(file, delimiter=delimiter))\n", "        else:\n", "            print(i, filename, \"Not found\")\n", "    return data\n", "\n", "data = read_files('HKQuantityTypeIdentifierStepCount.csv', 25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Pre-Processing and 
Feature Engineering\n", "\n", "In this step we preprocess the data and perform feature engineering as follows:\n", "\n", "1. Drop columns that are not needed\n", "2. Remove the trailing timezone offset (e.g. -0400) from all time fields such as creationDate, startDate, and endDate\n", "3. Split each datetime into separate features: Date and Time\n", "4. Drop the time fields that are not needed\n", "5. Create new features" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def preprocess(data, drop_columns):\n", "    \"\"\"\n", "    Input:\n", "        data (list of dataframe): input list of all dataframes\n", "        drop_columns (list of str): names of the columns to drop\n", "    Output:\n", "        list of dataframe: the same list, with each dataframe cleaned in place\n", "    \"\"\"\n", "    for df in data:\n", "        # drop the columns that are not needed\n", "        df.drop(drop_columns, axis=1, inplace=True)\n", "        # strip the trailing 6-character timezone offset from each date field\n", "        for col in ['creationDate', 'startDate', 'endDate']:\n", "            df[col] = df[col].astype(str).str[:-6]\n", "    return data\n", "\n", "column_names = ['type','sourceName','sourceVersion','device','unit']\n", "data_clean = preprocess(data, column_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function below splits each datetime field into separate date and time features so that the day and the time of day can be extracted." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def time_transformation(df):\n", "    \"\"\"\n", "    Input:\n", "        df (dataframe): input dataframe\n", "    Output:\n", "        df (dataframe): return a dataframe with formatted datetime columns\n", "    \"\"\"\n", "    # for each raw field, add a PREFIX_DATE (timestamp) and a PREFIX_TIME (time of day) column\n", "    for prefix, col in [('CREATION', 'creationDate'), ('START', 'startDate'), ('END', 'endDate')]:\n", "        parts = df[col].str.split(' ')\n", "        df[prefix + '_DATE'] = pd.to_datetime(parts.str[0], format='%Y-%m-%d')\n", "        df[prefix + '_TIME'] = pd.to_datetime(parts.str[1], format='%H:%M:%S').dt.time\n", "    return df" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def perfom_time_transform(data):\n", "    for i in range(len(data)):\n", "        data[i] = time_transformation(data[i])\n", "        # drop the raw and intermediate date/time columns, keeping END_DATE\n", "        data[i].drop(['creationDate','startDate','endDate','CREATION_DATE','CREATION_TIME',\n", "                      'END_TIME','START_DATE','START_TIME'], axis=1, inplace=True)\n", "    return data\n", "\n", "data_transform = perfom_time_transform(data)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value END_DATE\n", "0 119 2014-11-22\n", "1 104 2014-11-22\n", "2 78 2014-11-22\n", "3 78 2014-11-22\n", "4 10 2014-11-22" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_transform[1].head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create new features from the date field:\n", "\n", "1. month\n", "2. day_of_week\n", "3. year\n", "4. day" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def create_date_features(data):\n", "    # range(1, 2) yields only i = 1, so the features are added to the first dataframe only\n", "    for i in range(1,2):\n", "        data[i-1]['month'] = data[i-1]['END_DATE'].apply(lambda x:x.month)\n", "        data[i-1]['day_of_week'] = data[i-1]['END_DATE'].apply(lambda x:x.weekday())\n", "        data[i-1]['year'] = data[i-1]['END_DATE'].apply(lambda x:x.year)\n", "        data[i-1]['day'] = data[i-1]['END_DATE'].apply(lambda x:x.day)\n", "    return data\n", "\n", "data_with_features = create_date_features(data_transform)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value END_DATE\n", "0 119 2014-11-22\n", "1 104 2014-11-22\n", "2 78 2014-11-22\n", "3 78 2014-11-22\n", "4 10 2014-11-22" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_with_features[1].head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate Lifetime Monthly Average Metrics\n", "\n", "For each user, calculate the average daily steps per month, then average those monthly values over the user's whole history to create a feature." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def calculate_average_steps(data, filename, number_folders):\n", "    final = []\n", "    # index into data, which only holds users whose file exists\n", "    counter = 0\n", "    for i in range(1, number_folders + 1):\n", "        # check whether this user has the file\n", "        file = \"users/\" + str(i) + \"/\" + str(filename)\n", "        if Path(file).is_file():\n", "            user_df = data[counter]\n", "            user_df.index = user_df['END_DATE']\n", "            # monthly total divided by 30 approximates the daily average for that month\n", "            monthly_average = user_df[\"value\"].resample('M').sum()/30\n", "            # lifetime average: mean of the monthly values\n", "            final.append([i, monthly_average.sum()/len(monthly_average)])\n", "            counter = counter + 1\n", "        else:\n", "            # no file for this user: record an average of 0\n", "            final.append([i, 0])\n", "    df = pd.DataFrame(final,columns=['id','avg'])\n", "    return df\n", "\n", "life_time_avg = calculate_average_steps(data_with_features, 'HKQuantityTypeIdentifierStepCount.csv', 25)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id avg\n", "0 1 5073.455556\n", "1 2 6606.426190\n", "2 3 8978.688889\n", "3 4 3232.273333\n", "4 5 4884.705208" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "life_time_avg.head(5)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, date\n", "\n", "def sleep_time_transform(data):\n", "    # day boundaries used when a record crosses midnight\n", "    endtime = datetime.strptime('2359','%H%M').time()\n", "    starttime = datetime.strptime('0000','%H%M').time()\n", "    for i in range(len(data)):\n", "        data[i] = time_transformation(data[i])\n", "        # column that holds each record's duration in seconds\n", "        data[i]['value'] = 0.0\n", "        for index, row in data[i].iterrows():\n", "            if row['START_DATE'] == row['END_DATE']:\n", "                # same-day record: end time minus start time\n", "                diff = datetime.combine(date.min, row['END_TIME']) - datetime.combine(date.min, row['START_TIME'])\n", "            else:\n", "                # record crosses midnight: time from start until 23:59 plus time from 00:00 until end\n", "                diff = (datetime.combine(date.min, endtime) - datetime.combine(date.min, row['START_TIME'])) + (datetime.combine(date.min, row['END_TIME']) - datetime.combine(date.min, starttime))\n", "            data[i].at[index, 'value'] = diff.total_seconds()\n", "        # drop the raw and intermediate date/time columns, keeping END_DATE\n", "        data[i].drop(['creationDate','startDate','endDate','CREATION_DATE','CREATION_TIME', 'END_TIME',\n", "                      'START_DATE','START_TIME'], axis=1, inplace=True)\n", "        # convert seconds to hours, then total the hours per day\n", "        data[i]['value'] = data[i]['value']/3600\n", "        data[i] = data[i].groupby('END_DATE', as_index=False).sum()\n", "    return data" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "1 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "4 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "5 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "6 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "7 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "8 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "10 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "11 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "12 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "13 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "16 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "20 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "22 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n", "25 HKCategoryTypeIdentifierSleepAnalysis.csv Not found\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:16: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead\n", " app.launch_new_instance()\n", "/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:20: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " id avg\n", "0 1 0.000000\n", "1 2 4.130765\n", "2 3 3.579188\n", "3 4 0.000000\n", "4 5 0.000000" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sleep_analysis = read_files('HKCategoryTypeIdentifierSleepAnalysis.csv', 25)\n", "sleep_analysis_clean = preprocess(sleep_analysis, ['type','sourceName','sourceVersion','device','value'])\n", "sleep_analysis_transform = sleep_time_transform(sleep_analysis_clean)\n", "sleep_analysis_life_time_avg = calculate_average_steps(sleep_analysis_transform, \n", " 'HKCategoryTypeIdentifierSleepAnalysis.csv', 25)\n", "sleep_analysis_life_time_avg.head(5)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id avg\n", "0 1 2.183262\n", "1 2 4.100364\n", "2 3 3.759584\n", "3 4 2.027935\n", "4 5 2.102282" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distance_analysis = read_files('HKQuantityTypeIdentifierDistanceWalkingRunning.csv', 25)\n", "distance_analysis_clean = preprocess(distance_analysis, ['type','sourceName','sourceVersion','device','unit'])\n", "distance_analysis_transform = perfom_time_transform(distance_analysis_clean)\n", "distance_analysis_features = create_date_features(distance_analysis_transform)\n", "distance_analysis_life_time_avg = calculate_average_steps(distance_analysis_features, \n", " 'HKQuantityTypeIdentifierDistanceWalkingRunning.csv', 25)\n", "distance_analysis_life_time_avg.head(5)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id avg\n", "0 1 5073.455556\n", "1 2 6606.426190\n", "2 3 8978.688889\n", "3 4 3232.273333\n", "4 5 4884.705208" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "steps_analysis = read_files('HKQuantityTypeIdentifierStepCount.csv', 25)\n", "steps_analysis_clean = preprocess(steps_analysis, ['type','sourceName','sourceVersion','device','unit'])\n", "steps_analysis_transform = perfom_time_transform(steps_analysis_clean)\n", "steps_analysis_features = create_date_features(steps_analysis_transform)\n", "steps_analysis_life_time_avg = calculate_average_steps(steps_analysis_features, \n", " 'HKQuantityTypeIdentifierStepCount.csv', 25)\n", "steps_analysis_life_time_avg.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merging User Data with Features Created from Apple Health\n", "\n", "We surveyed all our users about their **Activities**, **Profession, City, Sex, Eating Habits, etc.** in order to build a better health analysis system.\n", "\n", "We merge the survey data with features such as:\n", "* Step Count\n", "* Distance of walking or running\n", "* Sleep analysis" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "user_information = pd.read_csv(\"all_users.csv\", delimiter = ',')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Age Sex Profession Nationality \\\n", "0 1 Bhavya Sharma 23 Male Student Indian \n", "1 2 Ayush Jain 25 Male Student Indian \n", "2 3 Rohan Panikkar 26 Male Student Indian \n", "3 4 Om Khard 24 Male Freelance Editor Indian \n", "4 5 Vidushi Dikshit 24 Female Software Engineer Indian \n", "\n", " City Running Swimming Cycling Gymming Commute Commute_Time Veg \\\n", "0 Pittsburgh 0 0 0 1 Walk 10 1 \n", "1 Pittsburgh 0 0 0 1 Walk 10 0 \n", "2 Pittsburgh 0 0 0 0 Walk 20 0 \n", "3 Mumbai 1 1 0 1 Bus 60 1 \n", "4 Kansas 0 0 0 1 Car 45 0 \n", "\n", " Non_Veg \n", "0 0 \n", "1 1 \n", "2 1 \n", "3 1 \n", "4 1 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_information.head(5)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# renaming columns\n", "steps_analysis_life_time_avg.rename(columns={\"avg\": \"steps\"}, inplace = True)\n", "distance_analysis_life_time_avg.rename(columns={\"avg\": \"distance\"}, inplace = True)\n", "sleep_analysis_life_time_avg.rename(columns={\"avg\": \"sleep\"}, inplace = True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "merged = pd.merge(user_information, steps_analysis_life_time_avg, on='id', how='left')\n", "merged = pd.merge(merged, distance_analysis_life_time_avg, on='id', how='left')\n", "merged = pd.merge(merged, sleep_analysis_life_time_avg, on='id', how='left')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id Name Age Sex Profession Nationality \\\n", "0 1 Bhavya Sharma 23 Male Student Indian \n", "1 2 Ayush Jain 25 Male Student Indian \n", "2 3 Rohan Panikkar 26 Male Student Indian \n", "3 4 Om Khard 24 Male Freelance Editor Indian \n", "4 5 Vidushi Dikshit 24 Female Software Engineer Indian \n", "\n", " City Running Swimming Cycling Gymming Commute Commute_Time Veg \\\n", "0 Pittsburgh 0 0 0 1 Walk 10 1 \n", "1 Pittsburgh 0 0 0 1 Walk 10 0 \n", "2 Pittsburgh 0 0 0 0 Walk 20 0 \n", "3 Mumbai 1 1 0 1 Bus 60 1 \n", "4 Kansas 0 0 0 1 Car 45 0 \n", "\n", " Non_Veg steps distance sleep \n", "0 0 5073.455556 2.183262 0.000000 \n", "1 1 6606.426190 4.100364 4.130765 \n", "2 1 8978.688889 3.759584 3.579188 \n", "3 1 3232.273333 2.027935 0.000000 \n", "4 1 4884.705208 2.102282 0.000000 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory Data Analysis\n", "\n", "We will now look at patterns and relationships for different sub-groups of the data, for example:\n", "\n", "1. Patterns by Nationality\n", "2. Eating Habits - Veg or Non-Veg\n", "3. Patterns by Hobbies\n", "4. 
Patterns by Activity" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Let's look at the pairplots first to see relationships between our users' attributes\n", "pairplot = merged.drop((['id','Name']),axis=1)\n", "\n", "g = sns.pairplot(pairplot, hue = 'Nationality')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Univariate Plots\n", "\n", "The univariate plots help us understand the distribution of each feature." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pairplot.hist()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Density Plots\n", "\n", "We also drew density plots, which show the shape of each distribution as a smooth curve." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pairplot.plot(kind='density', subplots=True, layout=(4,3), sharex=False)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation Matrix Plot\n", "\n", "From the correlation matrix, Swimming and Cycling appear to be strongly negatively correlated, and Age and Profession appear to be positively correlated. 
This could help us group users and understand their patterns." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# correlations are computed on the numeric columns only\n", "correlations = pairplot.select_dtypes(include=[np.number]).corr()\n", "# plot correlation matrix\n", "fig = plt.figure()\n", "fig.set_size_inches(30, 15)\n", "ax = fig.add_subplot(111)\n", "cax = ax.matshow(correlations, vmin=-1, vmax=1)\n", "fig.colorbar(cax)\n", "# one tick per column of the matrix, labelled with that column's name\n", "ticks = np.arange(0, len(correlations.columns), 1)\n", "ax.set_xticks(ticks)\n", "ax.set_yticks(ticks)\n", "ax.set_xticklabels(correlations.columns.tolist())\n", "ax.set_yticklabels(correlations.columns.tolist())\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Model: Unsupervised Learning\n", "\n", "We use unsupervised learning because no ground-truth data is available. The aim is to understand **patterns** in user health behaviour and to give users insights that help them **self-reflect** and move toward healthier living patterns.\n", "\n", "We use KMeans clustering as our unsupervised learning method." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "from sklearn.cluster import KMeans\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## One Hot Encoding" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "merged['Profession']=merged['Profession'].apply(lambda x:x.replace(' ','_'))\n", "merged['City']=merged['City'].apply(lambda x:x.replace(' ','_'))\n", "merged.drop((['id','Name']),axis=1,inplace=True)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def one_hot(df, list_var):\n", "    # append one indicator column per category of each variable in list_var\n", "    list_df = pd.get_dummies(df[list_var])\n", "    df = pd.concat([df,list_df],axis=1)\n", "    return df" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def get_object_type(df):\n", "    # names of the columns stored as strings (dtype object)\n", "    cat_col = []\n", "    for col in df.columns:\n", "        if df[col].dtype == 'object':\n", "            cat_col.append(col)\n", "    return cat_col" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "cat_col = get_object_type(pairplot)\n", "clean_cat_col= []\n", "for col in cat_col:\n", "    if col != 'steps':\n", "        clean_cat_col.append(col)\n", "\n", "merged_clean = one_hot(merged,clean_cat_col)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age Running Swimming Cycling Gymming Commute_Time Veg Non_Veg \\\n", "0 23 0 0 0 1 10 1 0 \n", "1 25 0 0 0 1 10 0 1 \n", "2 26 0 0 0 0 20 0 1 \n", "3 24 1 1 0 1 60 1 1 \n", "4 24 0 0 0 1 45 0 1 \n", "\n", " steps distance ... City_Jersey_City City_Kansas \\\n", "0 5073.455556 2.183262 ... 0 0 \n", "1 6606.426190 4.100364 ... 0 0 \n", "2 8978.688889 3.759584 ... 0 0 \n", "3 3232.273333 2.027935 ... 0 0 \n", "4 4884.705208 2.102282 ... 0 1 \n", "\n", " City_Mumbai City_Perth City_Pittsburgh City_SFO Commute_Bus \\\n", "0 0 0 1 0 0 \n", "1 0 0 1 0 0 \n", "2 0 0 1 0 0 \n", "3 1 0 0 0 1 \n", "4 0 0 0 0 0 \n", "\n", " Commute_Car Commute_Train Commute_Walk \n", "0 0 0 1 \n", "1 0 0 1 \n", "2 0 0 1 \n", "3 0 0 0 \n", "4 1 0 0 \n", "\n", "[5 rows x 32 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.drop((['Sex','Profession','Nationality','City','Commute']),axis=1,inplace=True)\n", "merged_clean.head()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "#change the object data type of categorical data\n", "def change_dtype(df,numeric_names,to_type):\n", "    \"\"\"\n", "    input:\n", "        df (dataframe): input dataframe\n", "        numeric_names (list): names of numeric data\n", "        to_type (str): target type. 'category', 'str', 'bool'\n", "    \"\"\"\n", "    numeric_col = numeric_names\n", "    categorical_col = list(df.columns.difference(numeric_col))\n", "\n", "    for c in categorical_col:\n", "        df[c] = df[c].astype(to_type)\n", "\n", "    return df" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Age int64\n", "Running int64\n", "Swimming int64\n", "Cycling int64\n", "Gymming int64\n", "Commute_Time int64\n", "Veg int64\n", "Non_Veg int64\n", "steps float64\n", "distance float64\n", "sleep float64\n", "Sex_Female uint8\n", "Sex_Male uint8\n", "Profession_Entrepreneur uint8\n", "Profession_Freelance_Editor uint8\n", "Profession_Software_Engineer uint8\n", "Profession_Student uint8\n", "Nationality_China uint8\n", "Nationality_Indian uint8\n", "Nationality_Taiwanese uint8\n", "City_Adelaide uint8\n", "City_Delhi uint8\n", "City_Jersey_City uint8\n", "City_Kansas uint8\n", "City_Mumbai uint8\n", "City_Perth uint8\n", "City_Pittsburgh uint8\n", "City_SFO uint8\n", "Commute_Bus uint8\n", "Commute_Car uint8\n", "Commute_Train uint8\n", "Commute_Walk uint8\n", "dtype: object" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.dtypes" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age Running Swimming Cycling Gymming Commute_Time Veg Non_Veg \\\n", "0 23 0 0 0 1 10 1 0 \n", "1 25 0 0 0 1 10 0 1 \n", "2 26 0 0 0 0 20 0 1 \n", "3 24 1 1 0 1 60 1 1 \n", "4 24 0 0 0 1 45 0 1 \n", "\n", " steps distance ... City_Jersey_City City_Kansas \\\n", "0 5073.455556 2.183262 ... 0 0 \n", "1 6606.426190 4.100364 ... 0 0 \n", "2 8978.688889 3.759584 ... 0 0 \n", "3 3232.273333 2.027935 ... 0 0 \n", "4 4884.705208 2.102282 ... 0 1 \n", "\n", " City_Mumbai City_Perth City_Pittsburgh City_SFO Commute_Bus \\\n", "0 0 0 1 0 0 \n", "1 0 0 1 0 0 \n", "2 0 0 1 0 0 \n", "3 1 0 0 0 1 \n", "4 0 0 0 0 0 \n", "\n", " Commute_Car Commute_Train Commute_Walk \n", "0 0 0 1 \n", "1 0 0 1 \n", "2 0 0 1 \n", "3 0 0 0 \n", "4 1 0 0 \n", "\n", "[5 rows x 32 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.head()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from copy import deepcopy\n", "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "plt.rcParams['figure.figsize'] = (16, 9)\n", "plt.style.use('ggplot')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 5073.455556\n", "1 6606.426190\n", "2 8978.688889\n", "3 3232.273333\n", "4 4884.705208\n", "Name: steps, dtype: float64\n", "0 0.000000\n", "1 4.130765\n", "2 3.579188\n", "3 0.000000\n", "4 0.000000\n", "Name: sleep, dtype: float64\n", "0 2.183262\n", "1 4.100364\n", "2 3.759584\n", "3 2.027935\n", "4 2.102282\n", "Name: distance, dtype: float64\n" ] } ], "source": [ "print(merged_clean[\"steps\"].head())\n", "print(merged_clean[\"sleep\"].head())\n", "print(merged_clean[\"distance\"].head())" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Index(['Age', 'Running', 'Swimming', 'Cycling', 'Gymming', 'Commute_Time',\n", " 'Veg', 'Non_Veg', 'steps', 'distance', 'sleep', 'Sex_Female',\n", " 'Sex_Male', 'Profession_Entrepreneur', 'Profession_Freelance_Editor',\n", " 'Profession_Software_Engineer', 'Profession_Student',\n", " 'Nationality_China', 'Nationality_Indian', 'Nationality_Taiwanese',\n", " 'City_Adelaide', 'City_Delhi', 'City_Jersey_City', 'City_Kansas',\n", " 'City_Mumbai', 'City_Perth', 'City_Pittsburgh', 'City_SFO',\n", " 'Commute_Bus', 'Commute_Car', 'Commute_Train', 'Commute_Walk'],\n", " dtype='object')\n" ] } ], "source": [ "cols = merged_clean.columns\n", "print(cols)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "cluster = KMeans(n_clusters = 6)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "merged_clean[\"cluster\"] = cluster.fit_predict(merged_clean)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "from scipy.spatial.distance import cdist\n", "# k means determine k\n", "distortions = []\n", "K = range(1,10)\n", "for k in K:\n", " kmeanModel = KMeans(n_clusters=k).fit(merged_clean)\n", " kmeanModel.fit(merged_clean)\n", " distortions.append(sum(np.min(cdist(merged_clean, kmeanModel.cluster_centers_, 'euclidean'), axis=1)) / merged_clean.shape[0])\n", "\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the elbow\n", "plt.plot(K, distortions, 'bx-')\n", "plt.xlabel('k')\n", "plt.ylabel('Distortion')\n", "plt.title('The Elbow Method showing the optimal k')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PCA Component Creation to Visualize Clusters better\n", "\n", "In the below approach, we are creating 2 PCA components, x and y reduce our dimensionality and 
visualize our clusters better." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "from sklearn.decomposition import PCA\n", "\n", "pca = PCA(n_components = 2)\n", "\n", "# Fit PCA once and reuse the projection for both components\n", "components = pca.fit_transform(merged_clean[cols])\n", "merged_clean[\"x\"] = components[:,0]\n", "merged_clean[\"y\"] = components[:,1]\n", "merged_clean = merged_clean.reset_index()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>index</th><th>Age</th><th>Running</th><th>Swimming</th><th>Cycling</th><th>Gymming</th><th>Commute_Time</th><th>Veg</th><th>Non_Veg</th><th>steps</th><th>...</th><th>City_Perth</th><th>City_Pittsburgh</th><th>City_SFO</th><th>Commute_Bus</th><th>Commute_Car</th><th>Commute_Train</th><th>Commute_Walk</th><th>cluster</th><th>x</th><th>y</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>20</th><td>20</td><td>29</td><td>1</td><td>0</td><td>0</td><td>0</td><td>30</td><td>1</td><td>0</td><td>7682.252941</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>2</td><td>2005.328686</td><td>9.375952</td></tr>\n",
"    <tr><th>21</th><td>21</td><td>23</td><td>0</td><td>0</td><td>0</td><td>1</td><td>30</td><td>0</td><td>1</td><td>3951.920833</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>3</td><td>-1724.992209</td><td>-2.710310</td></tr>\n",
"    <tr><th>22</th><td>22</td><td>23</td><td>0</td><td>0</td><td>0</td><td>1</td><td>30</td><td>0</td><td>1</td><td>5483.919608</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>5</td><td>-193.001967</td><td>2.131796</td></tr>\n",
"    <tr><th>23</th><td>23</td><td>24</td><td>1</td><td>1</td><td>1</td><td>1</td><td>15</td><td>1</td><td>0</td><td>5092.648485</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>-584.218441</td><td>-14.121743</td></tr>\n",
"    <tr><th>24</th><td>24</td><td>29</td><td>0</td><td>0</td><td>0</td><td>0</td><td>20</td><td>0</td><td>1</td><td>6647.680000</td><td>...</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>970.791726</td><td>-3.761315</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>5 rows × 36 columns</p>\n", "</div>" ], "text/plain": [ " index Age Running Swimming Cycling Gymming Commute_Time Veg \\\n", "20 20 29 1 0 0 0 30 1 \n", "21 21 23 0 0 0 1 30 0 \n", "22 22 23 0 0 0 1 30 0 \n", "23 23 24 1 1 1 1 15 1 \n", "24 24 29 0 0 0 0 20 0 \n", "\n", " Non_Veg steps ... City_Perth City_Pittsburgh City_SFO \\\n", "20 0 7682.252941 ... 0 1 0 \n", "21 1 3951.920833 ... 0 1 0 \n", "22 1 5483.919608 ... 0 1 0 \n", "23 0 5092.648485 ... 0 1 0 \n", "24 1 6647.680000 ... 0 1 0 \n", "\n", " Commute_Bus Commute_Car Commute_Train Commute_Walk cluster \\\n", "20 0 0 0 1 2 \n", "21 0 0 0 1 3 \n", "22 0 0 0 1 5 \n", "23 0 0 0 1 1 \n", "24 0 0 0 1 0 \n", "\n", " x y \n", "20 2005.328686 9.375952 \n", "21 -1724.992209 -2.710310 \n", "22 -193.001967 2.131796 \n", "23 -584.218441 -14.121743 \n", "24 970.791726 -3.761315 \n", "\n", "[5 rows x 36 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.tail()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "final_clusters = merged_clean[['Age','steps','x','y']]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Only the offline plotting helpers are used below\n", "import plotly.graph_objs as go\n", "from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot\n", "init_notebook_mode()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now create a scatter plot of the 6 clusters. We chose 6 using the Elbow approach, as the point after which the distortion decreases the least." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "marker": { "color": "rgba(15,152,152,0.5)", "line": { "color": "rgb(0,0,0)", "width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 1", "type": "scatter", "x": [ 929.5673558037972, 770.3876419499859, 970.791726354481 ], "y": [ -14.199603744343172, 5.482825273902316, -3.7613150139503007 ] }, { "marker": { "color": "rgba(180,18,180,0.5)", "line": { "color": "rgb(0,0,0)", "width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 2", "type": "scatter", "x": [ -603.3995047239753, -792.2612586433485, -840.6202239004308, -693.8226931532614, -1057.8248239147922, -798.2933254142713, -1232.4634663307731, -949.8500421055809, -791.8264690374339, -584.2184409801187 ], "y": [ -19.111934735938156, 15.310185208568756, 0.05291181976939672, -4.387872669047419, -20.52438235804893, 45.23409069975479, -0.9901264752402309, -0.1898647329021422, 0.2529855525943868, -14.121743028532391 ] }, { "marker": { "color": "rgba(132,132,132,0.8)", "line": { "color": "rgb(0,0,0)", "width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 3", "type": "scatter", "x": [ 1910.108593709318, 2069.9610386080826, 2005.3286864700403 ], "y": [ 10.374653845195164, 9.43982066340518, 9.375951612497188 ] }, { "marker": { "color": "rgba(122,122,12,0.8)", "line": { "color": "rgb(0,0,0)", "width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 4", "type": "scatter", "x": [ -2444.7308933340146, -2097.0868655865793, -1724.9922086216868 ], "y": [ 25.06536243375068, -3.647475041951041, -2.710309841219584 ] }, { "marker": { "color": "rgba(210,20,30,0.5)", "line": { "color": "rgb(0,0,0)", "width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 5", "type": "scatter", "x": [ 3301.782746192508, 2999.658712431589 ], "y": [ 3.3413428171831594, -2.6545935056592564 ] }, { "marker": { "color": "rgba(240,60,70,0.5)", "line": { "color": "rgb(0,0,0)", 
"width": 1 }, "size": 10 }, "mode": "markers", "name": "Cluster 5", "type": "scatter", "x": [ 3301.782746192508, 2999.658712431589 ], "y": [ 3.3413428171831594, -2.6545935056592564 ] } ], "layout": {} }, "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# One trace per cluster label (0-5), shown as Cluster 1-6\n", "trace0 = go.Scatter(x =final_clusters[merged_clean.cluster == 0][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 0][\"y\"],\n", " name = \"Cluster 1\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(15,152,152,0.5)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "trace1 = go.Scatter(x =final_clusters[merged_clean.cluster == 1][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 1][\"y\"],\n", " name = \"Cluster 2\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(180,18,180,0.5)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "trace2 = go.Scatter(x =final_clusters[merged_clean.cluster == 2][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 2][\"y\"],\n", " name = \"Cluster 3\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(132,132,132,0.8)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "trace3 = go.Scatter(x =final_clusters[merged_clean.cluster == 3][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 3][\"y\"],\n", " name = \"Cluster 4\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(122,122,12,0.8)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "trace4 = go.Scatter(x =final_clusters[merged_clean.cluster == 4][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 4][\"y\"],\n", " name = \"Cluster 5\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(210,20,30,0.5)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "# The sixth trace must select cluster label 5, not repeat label 4\n", "trace5 = go.Scatter(x =final_clusters[merged_clean.cluster == 5][\"x\"],\n", " y = final_clusters[merged_clean.cluster == 5][\"y\"],\n", " name = \"Cluster 6\",\n", " mode = \"markers\",\n", " marker = dict(size = 10,\n", " color = \"rgba(240,60,70,0.5)\",\n", " line = dict(width = 1, color = \"rgb(0,0,0)\")))\n", "\n", "\n", "\n", 
"data = [trace0,trace1,trace2,trace3,trace4,trace5]\n", "iplot(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating New Features to identify specific clusters\n", "\n", "These features will be helpful to understand the business value and make inferences from our clusters by comparing all features against the ones which are interpretable in different clusters" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "merged_clean[\"0\"] = merged_clean.cluster == 0\n", "merged_clean[\"1\"] = merged_clean.cluster == 1\n", "merged_clean[\"2\"] = merged_clean.cluster == 2\n", "merged_clean[\"3\"] = merged_clean.cluster == 3\n", "merged_clean[\"4\"] = merged_clean.cluster == 4\n", "merged_clean[\"5\"] = merged_clean.cluster == 5" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Age\n", "False 23 6\n", " 24 5\n", " 25 3\n", " 26 3\n", " 22 2\n", " 28 1\n", " 29 1\n", " 48 1\n", "True 25 1\n", " 28 1\n", " 29 1\n", "Name: Age, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.groupby(['0']).Age.value_counts()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 Running\n", "False 0 11\n", " 1 4\n", "True 0 7\n", " 1 3\n", "Name: Running, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.groupby(['1']).Running.value_counts()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2 Running\n", "False 0 17\n", " 1 5\n", "True 1 2\n", " 0 1\n", "Name: Running, dtype: int64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.groupby(['2']).Running.value_counts()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "3 Running\n", "False 0 16\n", " 1 6\n", "True 0 2\n", " 1 1\n", "Name: Running, dtype: int64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.groupby(['3']).Running.value_counts()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4 Running\n", "False 0 16\n", " 1 7\n", "True 0 2\n", "Name: Running, dtype: int64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_clean.groupby(['4']).Running.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cluster Analysis and Recommendations\n", "\n", "In this phase we interpret each of the clusters separately to understand the useful and **meaningful patterns** in our data. \n", "\n", "We were able to find out insights by highest **average steps, sleep and distance patterns.**\n", "\n", "In the below section, we give an example of a sample analysis of clusters by **highest average steps**" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['index', 'Age', 'Running', 'Swimming', 'Cycling', 'Gymming',\n", " 'Commute_Time', 'Veg', 'Non_Veg', 'steps', 'distance', 'sleep',\n", " 'Sex_Female', 'Sex_Male', 'Profession_Entrepreneur',\n", " 'Profession_Freelance_Editor', 'Profession_Software_Engineer',\n", " 'Profession_Student', 'Nationality_China', 'Nationality_Indian',\n", " 'Nationality_Taiwanese', 'City_Adelaide', 'City_Delhi',\n", " 'City_Jersey_City', 'City_Kansas', 'City_Mumbai', 'City_Perth',\n", " 'City_Pittsburgh', 'City_SFO', 'Commute_Bus', 'Commute_Car',\n", " 'Commute_Train', 'Commute_Walk', 'cluster', 'x', 'y', '0', '1', '2',\n", " '3', '4', '5', 'Sex', 'Profession', 'Nationality', 'City', 'Commute',\n", " 'Name'],\n", " dtype='object')" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"merged_new = user_information[['Sex', 'Profession', 'Nationality','City','Commute','Name']].copy()\n", "\n", "merged_clean_new = merged_clean\n", "\n", "analysis_df = pd.concat((merged_clean_new,merged_new),axis=1)\n", "\n", "analysis_df.columns" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Sex \n", "False Male 13\n", " Female 9\n", "True Male 2\n", " Female 1\n", "Name: Sex, dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['0']).Sex.value_counts()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 Sex \n", "False Male 10\n", " Female 5\n", "True Female 5\n", " Male 5\n", "Name: Sex, dtype: int64" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['1']).Sex.value_counts()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>index</th><th>Age</th><th>Running</th><th>Swimming</th><th>Cycling</th><th>Gymming</th><th>Commute_Time</th><th>Veg</th><th>Non_Veg</th><th>steps</th><th>...</th><th>2</th><th>3</th><th>4</th><th>5</th><th>Sex</th><th>Profession</th><th>Nationality</th><th>City</th><th>Commute</th><th>Name</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>23</td><td>0</td><td>0</td><td>0</td><td>1</td><td>10</td><td>1</td><td>0</td><td>5073.455556</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>Male</td><td>Student</td><td>Indian</td><td>Pittsburgh</td><td>Walk</td><td>Bhavya Sharma</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>25</td><td>0</td><td>0</td><td>0</td><td>1</td><td>10</td><td>0</td><td>1</td><td>6606.426190</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>Male</td><td>Student</td><td>Indian</td><td>Pittsburgh</td><td>Walk</td><td>Ayush Jain</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>26</td><td>0</td><td>0</td><td>0</td><td>0</td><td>20</td><td>0</td><td>1</td><td>8978.688889</td><td>...</td><td>False</td><td>False</td><td>True</td><td>False</td><td>Male</td><td>Student</td><td>Indian</td><td>Pittsburgh</td><td>Walk</td><td>Rohan Panikkar</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>24</td><td>1</td><td>1</td><td>0</td><td>1</td><td>60</td><td>1</td><td>1</td><td>3232.273333</td><td>...</td><td>False</td><td>True</td><td>False</td><td>False</td><td>Male</td><td>Freelance Editor</td><td>Indian</td><td>Mumbai</td><td>Bus</td><td>Om Khard</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>24</td><td>0</td><td>0</td><td>0</td><td>1</td><td>45</td><td>0</td><td>1</td><td>4884.705208</td><td>...</td><td>False</td><td>False</td><td>False</td><td>False</td><td>Female</td><td>Software Engineer</td><td>Indian</td><td>Kansas</td><td>Car</td><td>Vidushi Dikshit</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>5 rows × 48 columns</p>\n", "</div>" ], "text/plain": [ " index Age Running Swimming Cycling Gymming Commute_Time Veg \\\n", "0 0 23 0 0 0 1 10 1 \n", "1 1 25 0 0 0 1 10 0 \n", "2 2 26 0 0 0 0 20 0 \n", "3 3 24 1 1 0 1 60 1 \n", "4 4 24 0 0 0 1 45 0 \n", "\n", " Non_Veg steps ... 2 3 4 5 Sex \\\n", "0 0 5073.455556 ... False False False False Male \n", "1 1 6606.426190 ... False False False False Male \n", "2 1 8978.688889 ... False False True False Male \n", "3 1 3232.273333 ... False True False False Male \n", "4 1 4884.705208 ... False False False False Female \n", "\n", " Profession Nationality City Commute Name \n", "0 Student Indian Pittsburgh Walk Bhavya Sharma \n", "1 Student Indian Pittsburgh Walk Ayush Jain \n", "2 Student Indian Pittsburgh Walk Rohan Panikkar \n", "3 Freelance Editor Indian Mumbai Bus Om Khard \n", "4 Software Engineer Indian Kansas Car Vidushi Dikshit \n", "\n", "[5 rows x 48 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.head()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df[analysis_df.cluster==0][\"Commute\"].count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From our analysis we can infer that **Cluster 2** (Cluster Index 1) has the highest Commute count, but its commute modes are mixed: **Walk, Bus, Train and Car**. 
This does not point to any valuable insight; it only helps us understand the nature of the cluster." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 Commute\n", "False Walk 9\n", " Bus 4\n", " Car 2\n", "True Walk 4\n", " Bus 3\n", " Train 2\n", " Car 1\n", "Name: Commute, dtype: int64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['1']).Commute.value_counts()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0\n", "False 5555.514915\n", "True 6567.138175\n", "Name: steps, dtype: float64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['0']).steps.mean()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1\n", "False 6233.210009\n", "True 4842.459252\n", "Name: steps, dtype: float64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['1']).steps.mean()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2\n", "False 5404.844824\n", "True 7672.052171\n", "Name: steps, dtype: float64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['2']).steps.mean()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3\n", "False 5961.760429\n", "True 3588.004405\n", "Name: steps, dtype: float64" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['3']).steps.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the two analysis steps below, we can see that **Cluster 5 (Cluster Index 4)** has the **highest mean** of average **steps**. 
Also, the only means of commute in this cluster is **Walk**." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4\n", "False 5402.935067\n", "True 8827.618056\n", "Name: steps, dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['4']).steps.mean()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5\n", "False 5693.400613\n", "True 5590.332442\n", "Name: steps, dtype: float64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['5']).steps.mean()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 Commute\n", "False Walk 9\n", " Bus 4\n", " Car 2\n", "True Walk 4\n", " Bus 3\n", " Train 2\n", " Car 1\n", "Name: Commute, dtype: int64" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.groupby(['1']).Commute.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Winners\n", "\n", "Using **Step Count** as an example metric, Cluster 5 (Cluster Index = 4) gives two users, Rohan Panikkar and Devang Varia, who have the best step averages. Their datapoints suggest that these users tend to be healthier on average." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2 Rohan Panikkar\n", "8 Devang Varia\n", "Name: Name, dtype: object" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df[analysis_df.cluster==4][\"Name\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analysis of Winners" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>index</th><th>Age</th><th>Running</th><th>Swimming</th><th>Cycling</th><th>Gymming</th><th>Commute_Time</th><th>Veg</th><th>Non_Veg</th><th>steps</th><th>...</th><th>2</th><th>3</th><th>4</th><th>5</th><th>Sex</th><th>Profession</th><th>Nationality</th><th>City</th><th>Commute</th><th>Name</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>2</th><td>2</td><td>26</td><td>0</td><td>0</td><td>0</td><td>0</td><td>20</td><td>0</td><td>1</td><td>8978.688889</td><td>...</td><td>False</td><td>False</td><td>True</td><td>False</td><td>Male</td><td>Student</td><td>Indian</td><td>Pittsburgh</td><td>Walk</td><td>Rohan Panikkar</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>1 rows × 48 columns</p>\n", "</div>
" ], "text/plain": [ " index Age Running Swimming Cycling Gymming Commute_Time Veg \\\n", "2 2 26 0 0 0 0 20 0 \n", "\n", " Non_Veg steps ... 2 3 4 5 Sex \\\n", "2 1 8978.688889 ... False False True False Male \n", "\n", " Profession Nationality City Commute Name \n", "2 Student Indian Pittsburgh Walk Rohan Panikkar \n", "\n", "[1 rows x 48 columns]" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.loc[analysis_df['index'] == 2]" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>index</th><th>Age</th><th>Running</th><th>Swimming</th><th>Cycling</th><th>Gymming</th><th>Commute_Time</th><th>Veg</th><th>Non_Veg</th><th>steps</th><th>...</th><th>2</th><th>3</th><th>4</th><th>5</th><th>Sex</th><th>Profession</th><th>Nationality</th><th>City</th><th>Commute</th><th>Name</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>8</th><td>8</td><td>25</td><td>0</td><td>0</td><td>0</td><td>1</td><td>15</td><td>0</td><td>1</td><td>8676.547222</td><td>...</td><td>False</td><td>False</td><td>True</td><td>False</td><td>Male</td><td>Student</td><td>Indian</td><td>Pittsburgh</td><td>Walk</td><td>Devang Varia</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>1 rows × 48 columns</p>\n", "</div>" ], "text/plain": [ " index Age Running Swimming Cycling Gymming Commute_Time Veg \\\n", "8 8 25 0 0 0 1 15 0 \n", "\n", " Non_Veg steps ... 2 3 4 5 Sex \\\n", "8 1 8676.547222 ... False False True False Male \n", "\n", " Profession Nationality City Commute Name \n", "8 Student Indian Pittsburgh Walk Devang Varia \n", "\n", "[1 rows x 48 columns]" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.loc[analysis_df['index'] == 8]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## No. of Steps for User 8 (Devang Varia)\n", "\n", "Comparing both users, we found that User 8 has the better steps pattern and is our winner." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/lib/python3.6/site-packages/calmap/__init__.py:294: FutureWarning:\n", "\n", "how in .resample() is deprecated\n", "the new syntax is .resample(...).sum()\n", "\n", "/anaconda3/lib/python3.6/site-packages/calmap/__init__.py:146: MatplotlibDeprecationWarning:\n", "\n", "The get_axis_bgcolor function was deprecated in version 2.0. 
Use get_facecolor instead.\n", "\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib\n", "# Drop zero-step records and index the remaining rows by their end date\n", "steps_analysis_features[8] = steps_analysis_features[8][steps_analysis_features[8].value != 0]\n", "steps_analysis_features[8].index=pd.to_datetime(steps_analysis_features[8].END_DATE)\n", "events = pd.Series(steps_analysis_features[8]['value'])\n", "# Calendar heatmap of step counts for User 8\n", "fig,ax=calmap.calendarplot(events, monthticks=True, cmap='GnBu', vmin=0, \n", " vmax=max(steps_analysis_features[8]['value']),fig_kws=dict(figsize=(12, 8)));\n", "# Add a colorbar on the same scale as the heatmap\n", "c = fig.add_axes([1.0, 0.2, 0.02, 0.6])\n", "normalize = matplotlib.colors.Normalize(0,max(steps_analysis_features[8]['value']))\n", "cb = matplotlib.colorbar.ColorbarBase(c, cmap='GnBu', norm=normalize)\n", "cb.set_label('# of Steps')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Future Enhancements\n", "\n", "In the future, we could create a HealthScore, calculated from the insights from our clusters, and use it as our **Ground Truth label**. This label could then be used to train **Classification** algorithms for **Real-Time** analysis of a Health Index for users on a daily basis.\n", "\n", "Through this project, we hope to use the insights to help users reflect on their data and then improve their overall health." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References\n", "\n", "1. https://developer.apple.com/documentation/healthkit\n", "\n", "2. http://www.ryanpraski.com/apple-health-data-how-to-export-analyze-visualize-guide/\n", "\n", "3. http://ericwolter.com/projects/health-export.html\n", "\n", "4. 
https://github.com/amandasolis/Fitbit/blob/master/FitbitSummaryPlots.ipynb\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }