{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Divvy Data Analysis\n", "**View on:** [nbviewer](https://nbviewer.jupyter.org/github/chrisluedtke/divvy-data-analysis/blob/master/notebook.ipynb), [Google Colab](https://colab.research.google.com/github/chrisluedtke/divvy-data-analysis/blob/master/notebook.ipynb)\n", "\n", "![animation](https://github.com/chrisluedtke/divvy-data-analysis/blob/master/img/divvy_day.gif?raw=true)\n", "
View on YouTube
\n", "\n", "**Contents**\n", "* [Data-Sourcing](#Data-Sourcing)\n", "* [Exploration](#Exploration)\n", "* [Merge-Station-Coordinates](#Merge-Station-Coordinates)\n", "* [Calculate-Distances](#Calculate-Distances)\n", "* [Exploration-with-Distance](#Exploration-with-Distance)\n", "* [Farthest-Ridden-Bike](#Farthest-Ridden-Bike)\n", "* [Check-Daylight-Savings](#Check-Daylight-Savings)\n", "* [Animated-Plot](#Animated-Plot)\n", "* [Perception-of-Circle-Size](#Perception-of-Circle-Size)\n", "* [Dualmap](#Dualmap)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "%matplotlib inline\n", "import folium\n", "import folium.plugins\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "import divvydata\n", "from nb_utils.colors import linear_gradient, polylinear_gradient\n", "from nb_utils.data_processing import my_melt, add_empty_rows\n", "from nb_utils.geospatial import haversine\n", "from nb_utils.mapping import (create_map, gen_maps_by_group, \n", " render_html_map_to_png)\n", "\n", "pd.options.display.max_columns = None\n", "plt.style.use('seaborn')\n", "sns.set_context('talk', rc={'figure.figsize':(10, 7)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Sourcing\n", "Data from: https://www.divvybikes.com/system-data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function get_historical_data in module divvydata.historical_data:\n", "\n", "get_historical_data(years:List[str], write_to:str=None, rides=True, stations=True)\n", " Gathers and cleans historical Divvy data\n", " \n", " write_to: optional local folder path to extract zip files to\n", " returns: (pandas.DataFrame of rides, pandas.DataFrame of stations)\n", "\n" ] } ], 
"source": [ "help(divvydata.get_historical_data)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on class StationsFeed in module divvydata.stations_feed:\n", "\n", "class StationsFeed(builtins.object)\n", " | Client that pulls data from Divvy JSON feed:\n", " | \n", " | https://feeds.divvybikes.com/stations/stations.json\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self)\n", " | Initialize self. See help(type(self)) for accurate signature.\n", " | \n", " | monitor_data(self, interval_sec=5, runtime_sec=1000)\n", " | Listens to JSON feed and tracks events.\n", " | \n", " | interval_sec: default 5 seconds.\n", " | runtime_sec: default 1000 seconds. Set to None to run indefinitely.\n", " | \n", " | returns: pandas.DataFrame\n", " | \n", " | update_data(self)\n", " | Overwrites `data` attribute with most recent station data.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Static methods defined here:\n", " | \n", " | get_current_data()\n", " | Pulls current data. 
Does not assign to an attribute.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | data_call_time\n", " | \n", " | event_history_time_span\n", "\n" ] } ], "source": [ "help(divvydata.StationsFeed)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# rides, stations = divvydata.get_historical_data(\n", "# years=[str(yr) for yr in range(2013,2019)],\n", "# rides=True, \n", "# stations=True\n", "# )" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# rides.to_pickle('data/rides.pkl')\n", "# stations.to_pickle('data/stations.pkl')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(17425340, 10)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " bikeid birthyear end_time from_station_id gender \\\n", "356742 914 1982.0 2013-06-27 09:46:00 91 Male \n", "356744 711 1982.0 2013-06-27 11:11:00 88 Male \n", "356745 711 1982.0 2013-06-27 11:13:00 88 Male \n", "356746 145 1978.0 2013-06-27 14:38:00 17 Male \n", "356747 711 1982.0 2013-06-27 16:01:00 88 Male \n", "\n", " start_time to_station_id trip_id tripduration usertype \n", "356742 2013-06-27 01:06:00 48 3940 31177.0 Subscriber \n", "356744 2013-06-27 11:09:00 88 4113 140.0 Subscriber \n", "356745 2013-06-27 11:12:00 88 4119 87.0 Subscriber \n", "356746 2013-06-27 11:24:00 61 4134 11674.0 Subscriber \n", "356747 2013-06-27 11:39:00 34 4162 15758.0 Subscriber " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rides = pd.read_pickle('data/rides.pkl')\n", "# drop unused cols to save space\n", "rides = rides.drop(columns=['from_station_name', 'to_station_name'])\n", "print(rides.shape)\n", "rides.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4846, 7)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " as_of_date dpcapacity id lat lon \\\n", "900 2015-12-31 35 2 41.872293 -87.624091 \n", "1421 2016-06-30 35 2 41.872667 -87.623971 \n", "2847 2016-09-30 35 2 41.872638 -87.623979 \n", "2266 2016-12-31 35 2 41.872638 -87.623979 \n", "3123 2017-06-30 27 2 41.881060 -87.619486 \n", "\n", " name online_date \n", "900 Michigan Ave & Balbo Ave NaT \n", "1421 Buckingham Fountain 2015-05-08 00:00:00 \n", "2847 Michigan Ave & Balbo Ave 2015-05-08 00:00:00 \n", "2266 Michigan Ave & Balbo Ave 2015-05-08 00:00:00 \n", "3123 Buckingham Fountain 2013-06-10 10:43:46 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stations = pd.read_pickle('data/stations.pkl')\n", "stations = stations.rename(columns={'latitude':'lat', 'longitude':'lon'})\n", "print(stations.shape)\n", "stations.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploration" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "bikeid 0\n", "birthyear 4328042\n", "end_time 0\n", "from_station_id 0\n", "gender 4335754\n", "start_time 0\n", "to_station_id 0\n", "trip_id 0\n", "tripduration 0\n", "usertype 0\n", "dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rides.isna().sum()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "count 6386.000000\n", "mean 2728.678359\n", "std 871.209718\n", "min 2.000000\n", "25% 2059.250000\n", "50% 2779.500000\n", "75% 3427.000000\n", "max 5164.000000\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# How many times has each bike been ridden?\n", "bike_use = rides['bikeid'].value_counts()\n", "print(bike_use.describe().to_string())\n", "sns.distplot(bike_use)\n", "plt.title('Rides per Bike');" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Count of bikes released per month\n", "frst_use = (rides.groupby('bikeid')['start_time'].min()\n", " .dt.to_period(\"Q\")\n", " .rename('first_use_quarter'))\n", "\n", "quarterly_counts = frst_use.value_counts()\n", "\n", "all_dates = pd.date_range(start=rides.start_time.min().date(), \n", " end=rides.start_time.max().date(), \n", " freq='Q').to_period(\"Q\")\n", "\n", "all_dates = pd.Series(index=all_dates,\n", " data=0)\n", "\n", "all_dates.update(quarterly_counts)\n", "\n", "all_dates.plot(kind='bar', width=1, \n", " title='Bikes released per month');" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# # color bike usage by quarter first ridden\n", "# bike_use_q = rides[['bikeid']].merge(frst_use, left_on='bikeid',\n", "# right_index=True, how='left')\n", "# bike_use_q = bike_use_q.sort_values('first_use_quarter')\n", "\n", "# top8_qs = frst_use.value_counts().index[:8]\n", "# bike_use_q = bike_use_q.loc[bike_use_q['first_use_quarter'].isin(top8_qs)]\n", "\n", "# bike_use_grpd = bike_use_q.groupby('first_use_quarter')\n", "\n", "# for group_name, group_df in bike_use_grpd:\n", "# group_df = group_df['bikeid'].value_counts()\n", "# sns.distplot(group_df)\n", "# plt.xlim([0, 5000])\n", "# plt.title(f'{group_name} bikes')\n", "# plt.show();" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Count of quarterly rides\n", "(rides['start_time'].groupby([rides.start_time.dt.to_period(\"Q\")])\n", " .count()\n", " .plot.bar(width=1));" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Trip Duration (Minutes)\n", "count 1.742534e+07\n", "mean 1.821090e+01\n", "std 2.739044e+02\n", "min 1.000000e+00\n", "25% 6.850000e+00\n", "50% 1.176667e+01\n", "75% 1.995000e+01\n", "max 2.389400e+05\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print('Trip Duration (Minutes)')\n", "print(rides.tripduration.divide(60).describe().to_string())\n", "sns.distplot(rides.loc[rides.tripduration < \n", " rides.tripduration.quantile(.95),\n", " 'tripduration'].divide(60))\n", "plt.title('Trip Duration (Minutes)');" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "603.7503168125318" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rides.tripduration.sum() / 60 / 60 / 24 / 365 #years" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bike Use (Hours)\n", "count 6386.000000\n", "mean 828.194923\n", "std 342.489106\n", "min 0.369722\n", "25% 611.624236\n", "50% 822.267778\n", "75% 1013.357014\n", "max 4981.602500\n" ] } ], "source": [ "sum_duration_bike = (rides.groupby('bikeid')['tripduration'].sum()\n", " .divide(60).divide(60))\n", "print('Bike Use (Hours)')\n", "print(sum_duration_bike.describe().to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merge Station Coordinates" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Stations have moved\n", "(stations.drop_duplicates(['id', 'lat', 'lon'])['id']\n", " .value_counts()\n", " .plot.hist(title='Station Instances'));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, Divvy kept the same station ID while physically moving those stations around. This adds a lot of complexity to route analysis.\n", "\n", "One solution would be to round lat/lon coordinates to some [degree of precision](https://en.wikipedia.org/wiki/Decimal_degrees#Precision), and then remove duplicates on rounded position. While that may seem to reduce the problem, there would be no way to determine whether a station initially at position A, moved to position B, and then back to position A.\n", "\n", "Another approach is to calculate the rolling difference of lat/lon coordinates and filter out differences below a desired precision. Let's do that." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "fix_stns = stations.copy()\n", "\n", "fix_stns = fix_stns.sort_values(['id', 'as_of_date'])\n", "\n", "fix_stns['dist_m'] = np.concatenate(\n", " fix_stns.groupby('id')\n", " .apply(lambda x: haversine(\n", " x['lat'].values, x['lon'].values,\n", " x['lat'].shift().values, x['lon'].shift().values)).values\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# NaNs are first instance, so keep those\n", "mask = fix_stns.dist_m.isna() | (fix_stns.dist_m > 30)\n", "fix_stns.loc[mask, 'id'].value_counts().plot(kind='hist', title='Reduced Station Instances');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can assess the problem by plotting stations that have moved:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = fix_stns.loc[fix_stns.id.duplicated(keep=False)]\n", "\n", "m = folium.Map(location=[df.lat.mean(), \n", " df.lon.mean()],\n", " tiles='CartoDB dark_matter',\n", " zoom_start=12)\n", "\n", "for g_k, g_df in df.groupby('id'):\n", " total_dist = g_df.dist_m.sum()\n", " if total_dist < 10:\n", " continue\n", "\n", " text = (f\"Station {g_df.id.values[0]}
\"\n", " f\"{int(total_dist)} m\")\n", " folium.PolyLine(\n", " locations=list(zip(g_df.lat, g_df.lon)), \n", " tooltip=text, color=\"#E37222\", weight=3\n", " ).add_to(m)\n", " \n", "folium.plugins.Fullscreen(\n", " position='topright',\n", " force_separate_button=True\n", ").add_to(m)\n", "\n", "m.save('maps/stations_moved.html')\n", "m" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "rides['end_date'] = rides['end_time'].dt.date\n", "\n", "day_aggs = (rides.groupby(['to_station_id', 'end_date'])\n", " .agg({'from_station_id':'median',\n", " 'tripduration':'mean',\n", " 'end_time':'count'})\n", " .rename(columns={'from_station_id':'trip_origin_median',\n", " 'tripduration':'trip_duration_mean',\n", " 'end_time':'trip_counts'}))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "as_of_date dist_m lat lon\n", "2015-12-31 NaN 41.853661 -87.635135\n", "2016-06-30 1880.031485 41.870257 -87.639474\n", "2016-09-30 0.000000 41.870257 -87.639474\n", "2016-12-31 0.000000 41.870257 -87.639474\n", "2017-06-30 0.000000 41.870257 -87.639474\n", "2017-12-31 0.000000 41.870257 -87.639474\n", "2019-03-05 0.000000 41.870257 -87.639474\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stn_id = 414\n", "min_date, max_date = '2015-12-31', '2016-06-30'\n", "\n", "grouped = fix_stns.groupby('id')\n", "cols = ['as_of_date', 'dist_m', 'lat', 'lon']\n", "print(grouped.get_group(stn_id)[cols].to_string(index=False))\n", "\n", "stn_aggs = day_aggs.loc[stn_id]\n", "\n", "dates = pd.DataFrame(data={k:0 for k in stn_aggs}, \n", " index=pd.date_range(min_date, max_date))\n", "\n", "dates.update(stn_aggs)\n", "\n", "for col in ['trip_duration_mean', 'trip_counts']:\n", " plt.bar(x=dates.index, height=dates[col])\n", " plt.xlim(min_date, max_date)\n", " plt.title(col)\n", " plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### TODO\n", "* subtract daily average of all active stations\n", "* pick a few nearby stations -- should observe that durations change after a move\n", "* monitor the set of station origins, should change \n", "\n", "From here I would create a lookup table for stations that have moved. The rows would span each day the station was active, and I would merge with my `rides` table on a `ride_id_date` key.\n", "\n", "But for now, I'll average each duplicated station's lat/lon positions to make things easy." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [], "source": [ "stations = (fix_stns.groupby('id')[['lat', 'lon']].mean())" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Merge Station Coordinates\n", "rides = (rides.merge(stations.rename(columns={'lat':'from_lat',\n", " 'lon':'from_lon'}),\n", " left_on='from_station_id', right_index=True,\n", " how='left')\n", " .merge(stations.rename(columns={'lat':'to_lat',\n", " 'lon':'to_lon'}),\n", " left_on='to_station_id', right_index=True,\n", " how='left'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate Distances" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "rides['dist'] = haversine(rides.from_lat, rides.from_lon, \n", " rides.to_lat, rides.to_lon)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Taxicab (Manhattan) distance: sum of the east-west and north-south legs\n", "rides['taxi_dist'] = (haversine(rides.from_lat, rides.from_lon, \n", " rides.from_lat, rides.to_lon) + \n", " haversine(rides.to_lat, rides.to_lon, \n", " rides.from_lat, rides.to_lon))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# rides.to_pickle('data/rides_with_dist.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploration with Distance" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "rides = pd.read_pickle('data/rides_with_dist.pkl')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " dist taxi_dist\n", "count 1.742475e+07 1.742475e+07\n", "mean 2.005762e+00 2.499794e+00\n", "std 1.587529e+00 2.005664e+00\n", "min 0.000000e+00 0.000000e+00\n", "25% 9.254459e-01 1.145991e+00\n", "50% 1.548781e+00 1.944045e+00\n", "75% 2.646230e+00 3.304592e+00\n", "max 3.679380e+01 4.167998e+01" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rides[['dist', 'taxi_dist']].divide(1000).describe()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "45.5 trips to the moon and back\n" ] } ], "source": [ "sum_dist = rides.dist.sum()\n", "m_to_moon = 384401000\n", "\n", "print(round(sum_dist / m_to_moon / 2, 1), 'trips to the moon and back')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(rides.dist.loc[(1 < rides.dist) &\n", " (rides.dist < rides.dist.quantile(.99))]\n", " .divide(1000)\n", " .plot.hist(bins=100, title='Distance per Ride (kilometers)'));" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# sum distance ridden, km\n", "dist_sum = rides.groupby('bikeid')['dist'].sum().divide(1000)\n", "dist_sum.plot.hist(bins=100, title='Distance ridden per bike (kilometers)');" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bikeid\n", "410 10451.034962\n", "1315 9785.608171\n", "1385 9743.838634\n", "73 9726.479546\n", "877 9642.728393\n" ] } ], "source": [ "print(dist_sum.sort_values(ascending=False).head().to_string())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Farthest Ridden Bike" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "rides = pd.read_pickle('data/rides_with_dist.pkl')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Path of the farthest ridden bike\n", "df = (rides.loc[(rides.bikeid==410)]\n", " .sort_values('start_time'))\n", "\n", "df['start_q'] = df.start_time.dt.to_period(\"Q\").astype(str)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dropped 1 colors from end of the gradient\n" ] } ], "source": [ "date_range = (pd.date_range(df['start_time'].min().date(),\n", " df['start_time'].max().date())\n", " .to_period(\"Q\").astype(str).unique())\n", "\n", "colors = [\n", " '#fe0000', #red\n", " '#fdfe02', #yellow \n", " '#011efe', #blue\n", "] \n", "\n", "gradient = polylinear_gradient(colors, len(date_range))\n", "\n", "date_colors = pd.Series(index=date_range, data=gradient, name='date_color')\n", "\n", "df = df.merge(date_colors, left_on='start_q', right_index=True, how='left')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[df.from_lat.mean(), \n", " df.from_lon.mean()],\n", " tiles='CartoDB dark_matter',\n", " zoom_start=13)\n", "\n", "for q_group, q_df in df.groupby('start_q'):\n", " points = []\n", " from_locs = list(zip(q_df.from_lat, q_df.from_lon))\n", " to_locs = list(zip(q_df.to_lat, q_df.to_lon))\n", " for from_i, to_i, in zip(from_locs, to_locs):\n", " points.append(from_i)\n", " points.append(to_i)\n", " \n", " folium.PolyLine(\n", " points,\n", " weight=1,\n", " color=q_df.date_color.values[0],\n", " tooltip=q_group.replace('Q', ' Q')\n", " ).add_to(m)\n", "\n", "folium.plugins.Fullscreen(\n", " position='topright',\n", " force_separate_button=True\n", ").add_to(m)\n", "\n", "m.save('maps/longest_ridden_rainbow.html')\n", "\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check Daylight Savings\n", "\n", "The time component of my analysis is very important. If DST were a problem, a large number of rows could be +/- 1 hour off. \n", "\n", "Sanity check:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rides = pd.read_pickle('data/rides_with_dist.pkl')" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "dst_start = { # clock 1:59AM to 3:00AM\n", " '2013':'03-10',\n", " '2014':'03-09',\n", " '2015':'03-08',\n", " '2016':'03-13',\n", " '2017':'03-12',\n", " '2018':'03-11',\n", "}\n", "\n", "for yy, mm_dd in dst_start.items():\n", " uh_oh = rides.loc[(f'{yy}-{mm_dd} 01:59:59' < rides['start_time']) &\n", " (rides['start_time'] < f'{yy}-{mm_dd} 03:00:00')]\n", " if not uh_oh.empty:\n", " print(uh_oh)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " start_time end_time tripduration\n", "13441772 2017-11-05 01:24:00 2017-11-05 01:04:00 2385.0\n", "13441765 2017-11-05 01:29:00 2017-11-05 01:05:00 2158.0\n", "13441752 2017-11-05 01:47:00 2017-11-05 01:04:00 1028.0\n", "13441751 2017-11-05 01:51:00 2017-11-05 01:07:00 932.0\n", "13441750 2017-11-05 01:54:00 2017-11-05 01:06:00 715.0\n", "13441749 2017-11-05 01:54:00 2017-11-05 01:06:00 724.0\n", "13441748 2017-11-05 01:54:00 2017-11-05 01:06:00 716.0\n", "13441747 2017-11-05 01:56:00 2017-11-05 01:04:00 479.0\n", "13441745 2017-11-05 01:58:00 2017-11-05 01:27:00 1699.0\n", "13441746 2017-11-05 01:58:00 2017-11-05 01:13:00 887.0\n", "13441744 2017-11-05 01:59:00 2017-11-05 01:13:00 831.0\n", "17156273 2018-11-04 01:34:41 2018-11-04 01:12:43 2282.0\n", "17156279 2018-11-04 01:46:45 2018-11-04 01:06:53 1208.0\n", "17156280 2018-11-04 01:48:12 2018-11-04 01:02:46 874.0\n", "17156281 2018-11-04 01:50:42 2018-11-04 01:01:53 671.0\n", "17156283 2018-11-04 01:53:16 2018-11-04 01:11:13 1077.0\n", "17156284 2018-11-04 01:55:29 2018-11-04 01:14:56 1167.0\n", "17156285 2018-11-04 01:59:57 2018-11-04 01:27:14 1637.0" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DST End, clock 1:59AM to 1:00AM\n", "rides.loc[(rides.end_time < rides.start_time), \n", " ['start_time', 'end_time', 'tripduration']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Animated Plot\n", "\n", "Seeking animated plot where:\n", "* circles positioned at stations\n", "* **size** represents station usage\n", "* **color** represents type of use (gaining bikes or losing bikes)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rides = pd.read_pickle('data/rides_with_dist.pkl')" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "# subsample to get a working flow before applying to large dataset\n", "# rides = rides.sample(10000)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " from_station_id start_time to_station_id end_time\n", "356742 91 2013-06-27 01:06:00 48 2013-06-27 09:46:00\n", "356744 88 2013-06-27 11:09:00 88 2013-06-27 11:11:00\n", "356745 88 2013-06-27 11:12:00 88 2013-06-27 11:13:00\n", "356746 17 2013-06-27 11:24:00 61 2013-06-27 14:38:00\n", "356747 88 2013-06-27 11:39:00 34 2013-06-27 16:01:00" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reshape DataFrame from 'ride' orientation to 'station interaction' orientation\n", "rides[['from_station_id', 'start_time', \n", " 'to_station_id', 'end_time']].head()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "stn_agg = my_melt(rides)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " station_id time lat lon type\n", "356742 91 2013-06-27 01:06:00 41.88338 -87.641170 departure\n", "356744 88 2013-06-27 11:09:00 41.88402 -87.656271 departure\n", "356745 88 2013-06-27 11:12:00 41.88402 -87.656271 departure\n", "356746 17 2013-06-27 11:24:00 41.90322 -87.673333 departure\n", "356747 88 2013-06-27 11:39:00 41.88402 -87.656271 departure" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stn_agg.head()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# actually 'half hour'\n", "stn_agg['hour'] = (stn_agg.time.dt.hour + \n", " stn_agg.time.dt.minute // 30 * 0.5)\n", "# stn_agg['dow'] = stn_agg.time.dt.dayofweek\n", "stn_agg['month'] = stn_agg.time.dt.to_period('M')\n", "\n", "# Here I assume if the station was used any time in a month,\n", "# then it was active/available for that entire month\n", "stn_days = (stn_agg.groupby('station_id')['month']\n", " .nunique()\n", " .multiply(30)\n", " .rename('days_active'))\n", "\n", "# pivot to get arrival and departure count columns\n", "id_cols = ['station_id', 'lat', 'lon', 'hour']\n", "stn_agg = (stn_agg.pivot_table(index=id_cols, columns='type', \n", " aggfunc='size', fill_value=0)\n", " .reset_index()\n", " .merge(stn_days, left_on='station_id', \n", " right_index=True))\n", "stn_agg['total_use'] = stn_agg.arrival + stn_agg.departure\n", "stn_agg['avg_use'] = stn_agg.total_use.divide(stn_agg.days_active, fill_value=0)\n", "stn_agg['pt_departures'] = stn_agg.departure.divide(stn_agg.total_use, fill_value=0.5)\n", "\n", "# stn_agg.to_pickle('data/station_aggregates.pkl')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "stn_agg = pd.read_pickle('data/station_aggregates.pkl')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table>\n",
 "  <thead>\n",
 "    <tr><th></th><th>station_id</th><th>lat</th><th>lon</th><th>hour</th><th>arrival</th><th>departure</th><th>days_active</th><th>total_use</th><th>avg_use</th><th>pt_departures</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>2</td><td>41.876582</td><td>-87.621302</td><td>0.0</td><td>234</td><td>209</td><td>1320</td><td>443</td><td>0.335606</td><td>0.471783</td></tr>\n",
 "    <tr><th>1</th><td>2</td><td>41.876582</td><td>-87.621302</td><td>0.5</td><td>227</td><td>184</td><td>1320</td><td>411</td><td>0.311364</td><td>0.447689</td></tr>\n",
 "    <tr><th>2</th><td>2</td><td>41.876582</td><td>-87.621302</td><td>1.0</td><td>153</td><td>96</td><td>1320</td><td>249</td><td>0.188636</td><td>0.385542</td></tr>\n",
 "    <tr><th>3</th><td>2</td><td>41.876582</td><td>-87.621302</td><td>1.5</td><td>123</td><td>96</td><td>1320</td><td>219</td><td>0.165909</td><td>0.438356</td></tr>\n",
 "    <tr><th>4</th><td>2</td><td>41.876582</td><td>-87.621302</td><td>2.0</td><td>107</td><td>71</td><td>1320</td><td>178</td><td>0.134848</td><td>0.398876</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " station_id lat lon hour arrival departure days_active \\\n", "0 2 41.876582 -87.621302 0.0 234 209 1320 \n", "1 2 41.876582 -87.621302 0.5 227 184 1320 \n", "2 2 41.876582 -87.621302 1.0 153 96 1320 \n", "3 2 41.876582 -87.621302 1.5 123 96 1320 \n", "4 2 41.876582 -87.621302 2.0 107 71 1320 \n", "\n", " total_use avg_use pt_departures \n", "0 443 0.335606 0.471783 \n", "1 411 0.311364 0.447689 \n", "2 249 0.188636 0.385542 \n", "3 219 0.165909 0.438356 \n", "4 178 0.134848 0.398876 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stn_agg.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [], "source": [ "## sanity checks\n", "# (set(rides.to_station_id) | set(rides.from_station_id)) - set(stn_agg.station_id)\n", "# (set(rides.to_station_id) | set(rides.from_station_id)) - set(stations.index)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stn_agg.pt_departures.plot.hist(bins=50, title='dot color range');" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stn_agg.avg_use.plot.hist(bins=50, title='dot size range');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Perception of Circle Size\n", "\n", "If we directly map the radius of each station dot to the average use of the station, a dot of radius 2 will appear non-linearly larger than a dot of radius 1. However, if we set the **area** of the dot to the average use, I would argue that the average user would not perceive the the second dot as twice as large as the first. I could not find any empirical discussion on this.\n", "\n", "https://eagereyes.org/blog/2008/linear-vs-quadratic-change\n", "\n", "In my tests, a circle radius 60 is about the largest I would like a cirlce to appear, otherwise it obscures other information." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def radius(x):\n", " '''radius when radius defined by station interactions'''\n", " return x * 2.19\n", "\n", "def area(x):\n", " '''radius when area defined by station interactions'''\n", " return ((x / np.pi) ** (1/2)) * 20\n", "\n", "def between(x):\n", " return x ** (58/100) * 8.7\n", "\n", "x_vals = np.linspace(0, 27, 100)\n", "plots = pd.DataFrame({'x_vals':x_vals,\n", " 'radius': [radius(x) for x in x_vals],\n", " 'area': [area(x) for x in x_vals],\n", " 'between': [between(x) for x in x_vals]})\n", "plots.plot.line(x='x_vals', xlim=(0,27), ylim=(0,60))\n", "plt.xlabel('N Station Interactions')\n", "plt.ylabel('Circle Radius on Plot');" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table>\n",
 "  <thead>\n",
 "    <tr><th></th><th>station_id</th><th>lat</th><th>lon</th><th>hour</th><th>arrival</th><th>departure</th><th>days_active</th><th>total_use</th><th>avg_use</th><th>pt_departures</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>28580</th><td>664</td><td>41.939354</td><td>-87.683282</td><td>0.0</td><td>0</td><td>1</td><td>30</td><td>1</td><td>0.033333</td><td>1.0</td></tr>\n",
 "    <tr><th>28581</th><td>664</td><td>41.939354</td><td>-87.683282</td><td>6.0</td><td>0</td><td>2</td><td>30</td><td>2</td><td>0.066667</td><td>1.0</td></tr>\n",
 "    <tr><th>28582</th><td>664</td><td>41.939354</td><td>-87.683282</td><td>8.0</td><td>1</td><td>0</td><td>30</td><td>1</td><td>0.033333</td><td>0.0</td></tr>\n",
 "    <tr><th>28583</th><td>664</td><td>41.939354</td><td>-87.683282</td><td>9.0</td><td>0</td><td>1</td><td>30</td><td>1</td><td>0.033333</td><td>1.0</td></tr>\n",
 "    <tr><th>28584</th><td>664</td><td>41.939354</td><td>-87.683282</td><td>11.0</td><td>3</td><td>2</td><td>30</td><td>5</td><td>0.166667</td><td>0.4</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " station_id lat lon hour arrival departure \\\n", "28580 664 41.939354 -87.683282 0.0 0 1 \n", "28581 664 41.939354 -87.683282 6.0 0 2 \n", "28582 664 41.939354 -87.683282 8.0 1 0 \n", "28583 664 41.939354 -87.683282 9.0 0 1 \n", "28584 664 41.939354 -87.683282 11.0 3 2 \n", "\n", " days_active total_use avg_use pt_departures \n", "28580 30 1 0.033333 1.0 \n", "28581 30 2 0.066667 1.0 \n", "28582 30 1 0.033333 0.0 \n", "28583 30 1 0.033333 1.0 \n", "28584 30 5 0.166667 0.4 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stn_agg.loc[stn_agg.station_id==664].head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def expand_and_interpolate(df):\n", " # expand by every half hour and fill with zeros\n", " steps = 24 * 2\n", " hours = pd.Series(np.linspace(0, 24, steps, endpoint=False),\n", " index=[1]*steps, name='hour')\n", " df = add_empty_rows(df=stn_agg, fill_series=hours, \n", " constants=['station_id', 'lat','lon', 'days_active'])\n", " df['pt_departures'] = df['pt_departures'].fillna(0.50)\n", " df = df.fillna(0)\n", "\n", " # expand by every 2 minutes and fill with interpolation\n", " steps = 24 * 2 * 15\n", " hours = pd.Series(np.linspace(0, 24, steps, endpoint=False).round(3),\n", " index=[1]*steps, name='hour')\n", " df = add_empty_rows(df=df, fill_series=hours, \n", " constants=['station_id', 'lat','lon', 'days_active'])\n", "\n", " # add hour 24 that matches hour 0\n", " df = (df.append(df.loc[df['hour']==df['hour'].min()]\n", " .assign(hour=24))\n", " .sort_values(['station_id', 'hour']))\n", "\n", " df[['avg_use', 'pt_departures']] = df[['avg_use', 'pt_departures']].interpolate()\n", " \n", " return df\n", "\n", "\n", "def get_percent_depart_gradient():\n", " # Generate color gradient for each percentage pt_departures\n", " strt_color = \"#18f0da\" #blue, gathering bikes\n", " mid_color = \"#e6e6e6\" #gray\n", " end_color = \"#f06e18\" 
#orange, \"radiating\" bikes\n", " start_pos = 25\n", " mid_width = 5\n", "\n", " steps = int((100 - start_pos * 2 - mid_width) / 2) + 1\n", "\n", " color_list = ([strt_color] * start_pos +\n", " linear_gradient(strt_color, mid_color, steps) + \n", " [mid_color] * mid_width +\n", " linear_gradient(mid_color, end_color, steps) + \n", " [end_color] * start_pos)\n", "\n", " gradient = pd.Series(data=color_list,\n", " index=np.linspace(0, 1, 101, endpoint=True).round(2),\n", " name='color')\n", " return gradient" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "stn_agg_interp = expand_and_interpolate(df=stn_agg)\n", "\n", "gradient = get_percent_depart_gradient()\n", "\n", "stn_agg_interp['pt_departures_rd'] = stn_agg_interp['pt_departures'].round(2)\n", "stn_agg_interp = (stn_agg_interp.drop(columns='color', errors='ignore')\n", " .merge(gradient, left_on='pt_departures_rd', \n", " right_index=True, how='left'))\n", "stn_agg_interp['radius'] = stn_agg_interp.avg_use.apply(between)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [], "source": [ "create_map??" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "create_map(stn_agg_interp.loc[stn_agg_interp.hour==17])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gen_maps_by_group(stn_agg_interp, group_label='hour', height_px=1350, \n", " width_px=int(1350*1.777), preview=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From here I use `gen_maps_by_group` to generate `.html` maps for each frame of the animation, then `render_maps_dir_to_pngs` to iterate over the maps to `.png`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "maps_dir = os.path.join(os.getcwd(), 'maps')\n", "output_dir = os.path.join(os.getcwd(), 'maps/pngs')" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function render_html_map_to_png in module utils.mapping:\n", "\n", "render_html_map_to_png(map_path, output_path, map_x_px=None, map_y_px=None, driver=None, sleep_s=3.0, quit_after=True, preview=False)\n", "\n" ] } ], "source": [ "help(render_html_map_to_png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dualmap" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def add_to_dualmap(df, m, add_to):\n", " for i, r in df.iterrows():\n", " if r.avg_use < 0.01:\n", " continue\n", " popup_txt=(f'
Station {r.station_id}

'\n", " f'Avg Uses: {round(r.avg_use,1)}
'\n", " f'Departures: {round(r.pt_departures*100)}%')\n", " folium.CircleMarker(\n", " location=(r.lat, r.lon), \n", " radius=r.radius,\n", " color=r['color'],\n", " weight=0.5,\n", " popup=folium.Popup(popup_txt, max_width=500),\n", " fill=True).add_to(add_to)\n", " \n", " return m\n", "\n", "m = folium.plugins.DualMap(\n", " location=(41.89, -87.63), \n", " tiles=\"CartoDB dark_matter\", \n", " zoom_start=14)\n", "\n", "m = add_to_dualmap(stn_agg_interp.loc[stn_agg_interp.hour==8],\n", " m, m.m1)\n", "\n", "m = add_to_dualmap(stn_agg_interp.loc[stn_agg_interp.hour==17],\n", " m, m.m2)\n", "\n", "m.save('maps/am_v_pm.html')\n", "\n", "m" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }