{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Travel time prediction in Indian Metro cities using Uber Movement data and OpenStreetMap\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uber provides anonymized and aggregated travel time data through the [Uber Movement](https://movement.uber.com/) platform for many cities across the world. For India, current and historical data is available for 5 cities - Bangalore, Hyderabad, New Delhi, Mumbai and Kolkata. It also provides the ward boundaries in the form of a JSON file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[OpenStreetMap](https://wiki.openstreetmap.org/wiki/About_OpenStreetMap) (OSM) is a free, editable map of the whole world that is being built by volunteers largely from scratch and released with an open-content license. OSM data includes a global navigable street network dataset. Several services exist that provide routing and network analysis on top of this data. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this project, we use the open travel time dataset from Uber and leverage open-source routing services for OpenStreetMap to build a fairly accurate model for travel time within each of the metro cities in India. We show that by using the rich ecosystem of Python geospatial libraries, we can easily consume, process, and visualize large amounts of geospatial data and incorporate it into a machine learning model."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Open datasets**\n", "- Uber Movement - Travel times and ward boundaries\n", "- OpenStreetMap\n", "\n", "**Python libraries**\n", "- geopandas\n", "- shapely\n", "- matplotlib\n", "- folium\n", "- scikit-learn\n", "\n", "**Services**\n", "- Open Source Routing Machine (OSRM)\n", "- OpenRouteService (ORS) API " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import geopandas as gpd\n", "import numpy as np\n", "import requests\n", "import shapely\n", "import matplotlib.pyplot as plt\n", "import datetime\n", "import os\n", "import math\n", "import random\n", "import folium\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.ensemble import RandomForestRegressor \n", "from sklearn.preprocessing import OneHotEncoder\n", "from sklearn.compose import ColumnTransformer\n", "from sklearn.pipeline import Pipeline\n", "from matplotlib import pyplot\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Travel Times" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_folder = os.path.join('data', 'uber')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Uber Movement Travel Times data comes as a CSV file for each quarter. Here we are using the **Travel Times By Date By Hour Buckets (All Days)** dataset. This data set includes the arithmetic mean, geometric mean, and standard deviations for aggregated travel times between every ward in the city, for every day of the quarter and aggregated into time categories. 
This is a large dataset with over **7M rows**.\n", "\n", "We import the data as a Pandas DataFrame and call `convert_dtypes()` to select the best datatypes for each column. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "travel_times_file = 'bangalore-wards-2020-1-All-DatesByHourBucketsAggregate.csv'\n", "travel_times_filepath = os.path.join(data_folder, travel_times_file)\n", "travel_times = pd.read_csv(travel_times_filepath)\n", "travel_times = travel_times.convert_dtypes()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 7640967 entries, 0 to 7640966\n", "Data columns (total 17 columns):\n", " # Column Dtype \n", "--- ------ ----- \n", " 0 sourceid Int64 \n", " 1 dstid Int64 \n", " 2 month Int64 \n", " 3 day Int64 \n", " 4 start_hour Int64 \n", " 5 end_hour Int64 \n", " 6 mean_travel_time Float64\n", " 7 standard_deviation_travel_time Float64\n", " 8 geometric_mean_travel_time Float64\n", " 9 geometric_standard_deviation_travel_time Float64\n", " 10 time_period int64 \n", " 11 travel_time Float64\n", " 12 src_lon float64\n", " 13 src_lat float64\n", " 14 dst_lon float64\n", " 15 dst_lat float64\n", " 16 distance float64\n", "dtypes: Float64(5), Int64(6), float64(5), int64(1)\n", "memory usage: 1.1 GB\n" ] } ], "source": [ "travel_times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ward Boundaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The travel times dataset contains details of travel between *Zones*. For Indian cities, the zones are **Wards** as defined by the local municipal corporation. This data comes as a **GeoJSON** file that contains the polygon representation of each ward. We use `geopandas` to read the file as a GeoDataFrame."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "wards_file = 'bangalore_wards.json'\n", "wards_filepath = os.path.join(data_folder, wards_file)\n", "wards = gpd.read_file(wards_filepath)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WARD_NOWARD_NAMEMOVEMENT_IDDISPLAY_NAMEgeometry
02Chowdeswari Ward1Unnamed Road, BengaluruMULTIPOLYGON (((77.59229 13.09720, 77.59094 13...
13Atturu29th Cross Bhel Layout, Adityanagar, Vidyaranya...MULTIPOLYGON (((77.56862 13.12705, 77.57064 13...
24Yelahanka Satellite Town315th A Cross Road, Yelahanka Satellite Town, Y...MULTIPOLYGON (((77.59094 13.09842, 77.59229 13...
351Vijnanapura4SP Naidu Layout 4th Cross Street, SP Naidu Lay...MULTIPOLYGON (((77.67683 13.01147, 77.67695 13...
453Basavanapura5Medahalli Kadugodi Road, Bharathi Nagar, Krish...MULTIPOLYGON (((77.72899 13.02061, 77.72994 13...
..................
193172Madivala1940 1st B Cross Road, Cashier Layout, 1st Stage,...MULTIPOLYGON (((77.61399 12.92347, 77.61419 12...
19426Ramamurthy Nagar195Kalkere-Agara Main Road, Horamavu Agara, Kalke...MULTIPOLYGON (((77.68336 13.05192, 77.68384 13...
19525Horamavu1960 Horamavu Agara Main Road, 1st Block, Mallapp...MULTIPOLYGON (((77.64931 13.07853, 77.64993 13...
19686Marathahalli1970 3rd Cross Road, Manjunatha Layout, Marathaha...MULTIPOLYGON (((77.68549 12.94121, 77.68539 12...
197198Hemmigepura198BGS Road, Kodipalya, BengaluruMULTIPOLYGON (((77.49854 12.92574, 77.49854 12...
\n", "

198 rows × 5 columns

\n", "
" ], "text/plain": [ " WARD_NO WARD_NAME MOVEMENT_ID \\\n", "0 2 Chowdeswari Ward 1 \n", "1 3 Atturu 2 \n", "2 4 Yelahanka Satellite Town 3 \n", "3 51 Vijnanapura 4 \n", "4 53 Basavanapura 5 \n", ".. ... ... ... \n", "193 172 Madivala 194 \n", "194 26 Ramamurthy Nagar 195 \n", "195 25 Horamavu 196 \n", "196 86 Marathahalli 197 \n", "197 198 Hemmigepura 198 \n", "\n", " DISPLAY_NAME \\\n", "0 Unnamed Road, Bengaluru \n", "1 9th Cross Bhel Layout, Adityanagar, Vidyaranya... \n", "2 15th A Cross Road, Yelahanka Satellite Town, Y... \n", "3 SP Naidu Layout 4th Cross Street, SP Naidu Lay... \n", "4 Medahalli Kadugodi Road, Bharathi Nagar, Krish... \n", ".. ... \n", "193 0 1st B Cross Road, Cashier Layout, 1st Stage,... \n", "194 Kalkere-Agara Main Road, Horamavu Agara, Kalke... \n", "195 0 Horamavu Agara Main Road, 1st Block, Mallapp... \n", "196 0 3rd Cross Road, Manjunatha Layout, Marathaha... \n", "197 BGS Road, Kodipalya, Bengaluru \n", "\n", " geometry \n", "0 MULTIPOLYGON (((77.59229 13.09720, 77.59094 13... \n", "1 MULTIPOLYGON (((77.56862 13.12705, 77.57064 13... \n", "2 MULTIPOLYGON (((77.59094 13.09842, 77.59229 13... \n", "3 MULTIPOLYGON (((77.67683 13.01147, 77.67695 13... \n", "4 MULTIPOLYGON (((77.72899 13.02061, 77.72994 13... \n", ".. ... \n", "193 MULTIPOLYGON (((77.61399 12.92347, 77.61419 12... \n", "194 MULTIPOLYGON (((77.68336 13.05192, 77.68384 13... \n", "195 MULTIPOLYGON (((77.64931 13.07853, 77.64993 13... \n", "196 MULTIPOLYGON (((77.68549 12.94121, 77.68539 12... \n", "197 MULTIPOLYGON (((77.49854 12.92574, 77.49854 12... \n", "\n", "[198 rows x 5 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wards" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(10,10))\n", "wards['geometry'].plot(color='grey',ax=ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Pre-Processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Travel Times" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "np.random.seed(0)\n", "travel_times = pd.concat([travel_times]*5, ignore_index=True)\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "\n", "travel_times['random'] = np.random.uniform(0, 1, len(travel_times))\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "\n", "travel_times['travel_time'] = np.exp(travel_times['random']*np.log(travel_times['geometric_standard_deviation_travel_time']) + np.log(travel_times['geometric_mean_travel_time']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The source data contains the travel times grouped by blocks of time (peak/off-peak etc.), defined by the `start_hour` and `end_hour` columns. To allow us to model this easily, we add a `time_period` column and assign an integer category value. " ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceiddstidmonthdaystart_hourend_hourmean_travel_timestandard_deviation_travel_timegeometric_mean_travel_timegeometric_standard_deviation_travel_timetime_periodtravel_timesrc_lonsrc_latdst_londst_latdistancerandom
0102973131016322.8425.14270.591.73362.06389877.56381712.98278477.56628712.9703591778.90.548814
110297117190306.71200.99256.581.845396.84205777.56381712.98278477.56628712.9703591778.90.715189
21029727190282.94206.01233.292.05354.27946577.56381712.98278477.56628712.9703591778.90.602763
310297119710294.18183.97258.091.612334.55468877.56381712.98278477.56628712.9703591778.90.544883
41029729710263.55149.79232.891.672289.40480977.56381712.98278477.56628712.9703591778.90.423655
.........................................................
38204830162833216193120.83459.433089.71.1543091.90823877.58754512.92454577.59907713.00597611869.00.005112
3820483181272210162943.88338.432924.381.1233128.37955977.67258312.99447477.54881212.96370317072.90.595019
382048323495211902624.09863.772493.341.3753387.56280877.50630713.03805077.63259412.97372519533.40.973561
382048335612011910162366.86349.152340.331.1632635.21100677.60374113.03517477.54704412.97209313119.40.799564
38204834128251241902135.83604.692033.071.3952273.70285277.55194112.96054577.55385713.02602910127.40.339695
\n", "

38204835 rows × 18 columns

\n", "
" ], "text/plain": [ " sourceid dstid month day start_hour end_hour mean_travel_time \\\n", "0 102 97 3 13 10 16 322.8 \n", "1 102 97 1 17 19 0 306.71 \n", "2 102 97 2 7 19 0 282.94 \n", "3 102 97 1 19 7 10 294.18 \n", "4 102 97 2 9 7 10 263.55 \n", "... ... ... ... ... ... ... ... \n", "38204830 162 83 3 2 16 19 3120.83 \n", "38204831 8 127 2 2 10 16 2943.88 \n", "38204832 34 95 2 1 19 0 2624.09 \n", "38204833 56 120 1 19 10 16 2366.86 \n", "38204834 128 25 1 24 19 0 2135.83 \n", "\n", " standard_deviation_travel_time geometric_mean_travel_time \\\n", "0 425.14 270.59 \n", "1 200.99 256.58 \n", "2 206.01 233.29 \n", "3 183.97 258.09 \n", "4 149.79 232.89 \n", "... ... ... \n", "38204830 459.43 3089.7 \n", "38204831 338.43 2924.38 \n", "38204832 863.77 2493.34 \n", "38204833 349.15 2340.33 \n", "38204834 604.69 2033.07 \n", "\n", " geometric_standard_deviation_travel_time time_period travel_time \\\n", "0 1.7 3 362.063898 \n", "1 1.84 5 396.842057 \n", "2 2.0 5 354.279465 \n", "3 1.61 2 334.554688 \n", "4 1.67 2 289.404809 \n", "... ... ... ... \n", "38204830 1.15 4 3091.908238 \n", "38204831 1.12 3 3128.379559 \n", "38204832 1.37 5 3387.562808 \n", "38204833 1.16 3 2635.211006 \n", "38204834 1.39 5 2273.702852 \n", "\n", " src_lon src_lat dst_lon dst_lat distance random \n", "0 77.563817 12.982784 77.566287 12.970359 1778.9 0.548814 \n", "1 77.563817 12.982784 77.566287 12.970359 1778.9 0.715189 \n", "2 77.563817 12.982784 77.566287 12.970359 1778.9 0.602763 \n", "3 77.563817 12.982784 77.566287 12.970359 1778.9 0.544883 \n", "4 77.563817 12.982784 77.566287 12.970359 1778.9 0.423655 \n", "... ... ... ... ... ... ... 
\n", "38204830 77.587545 12.924545 77.599077 13.005976 11869.0 0.005112 \n", "38204831 77.672583 12.994474 77.548812 12.963703 17072.9 0.595019 \n", "38204832 77.506307 13.038050 77.632594 12.973725 19533.4 0.973561 \n", "38204833 77.603741 13.035174 77.547044 12.972093 13119.4 0.799564 \n", "38204834 77.551941 12.960545 77.553857 13.026029 10127.4 0.339695 \n", "\n", "[38204835 rows x 18 columns]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "travel_times" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# Map each integer category to its inclusive [first_hour, last_hour] range\n", "categories_to_hour = {\n", "    1: [0, 6],\n", "    2: [7, 9],\n", "    3: [10, 15],\n", "    4: [16, 18],\n", "    5: [19, 23]\n", "}\n", "\n", "def get_time_period(hour):\n", "    # Return the category whose hour range contains the given hour\n", "    for category, (start_hour, end_hour) in categories_to_hour.items():\n", "        if start_hour <= hour <= end_hour:\n", "            return category" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "travel_times['time_period'] = travel_times['start_hour'].apply(get_time_period)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Travel time has a strong correlation with the day of the week, so we compute a new column `dow` from the `day` and `month` columns." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "year = 2020\n", "\n", "def get_dow(row):\n", "    # Monday is 0 and Sunday is 6\n", "    return datetime.date(year, int(row['month']), int(row['day'])).weekday()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "travel_times['dow'] = travel_times.apply(get_dow, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "travel_times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ward Boundaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For modeling purposes, we use the centroid of each ward to represent the ward. 
We use the GeoPandas `centroid` property to get the point geometry representing the centroid.\n", "\n", "Our source data comes in the *EPSG:4326 WGS84 Geographic Projection* - which is not suitable for geoprocessing operations. To get an accurate centroid computation, we must re-project the data to a *Planar Projection*. We use a UTM projection suitable for the region of the data - *WGS 84 UTM Zone 43N* - which is defined by the code [EPSG:32643](http://epsg.io/32643). Once computed, we transform it back to EPSG:4326 and add it to our GeoDataFrame." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "centroid_utm = wards.geometry.to_crs('EPSG:32643').centroid\n", "wards['centroid'] = centroid_utm.to_crs('EPSG:4326')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(10,10))\n", "wards['geometry'].plot(color='grey',ax=ax)\n", "wards['centroid'].plot(color='red',ax=ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distance Computation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have the travel times for each pair of source and destination wards. The travel time is strongly correlated with the distance between the wards. We need to compute the actual distance along the road network between each pair of wards." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "ward_no = wards['MOVEMENT_ID']\n", "index = pd.MultiIndex.from_product([ward_no, ward_no], names = ['sourceid', 'dstid'])\n", "\n", "distancematrix = pd.DataFrame(index = index).reset_index()\n", "distancematrix = distancematrix.query('sourceid != dstid')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "def get_coordinates(row):\n", "    # Look up the source and destination wards by their MOVEMENT_ID\n", "    source_ward = wards[wards['MOVEMENT_ID'] == row['sourceid']].iloc[0]\n", "    dst_ward = wards[wards['MOVEMENT_ID'] == row['dstid']].iloc[0]\n", "\n", "    src_lon, src_lat = source_ward['centroid'].x, source_ward['centroid'].y\n", "    dst_lon, dst_lat = dst_ward['centroid'].x, dst_ward['centroid'].y\n", "    return src_lon, src_lat, dst_lon, dst_lat" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "distancematrix[['src_lon', 'src_lat', 'dst_lon', 'dst_lat']] = distancematrix.apply(get_coordinates, axis=1, result_type='expand')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceiddstidsrc_lonsrc_latdst_londst_lat
11277.58042213.12170977.56003713.102805
21377.58042213.12170977.58392613.090987
31477.58042213.12170977.66956513.006063
41577.58042213.12170977.71545613.016847
51677.58042213.12170977.70550213.022373
.....................
3919819819377.50501512.89190377.59450712.910882
3919919819477.50501512.89190377.61441812.920018
3920019819577.50501512.89190377.67653913.033613
3920119819677.50501512.89190377.65327213.044560
3920219819777.50501512.89190377.69149512.950743
\n", "

39006 rows × 6 columns

\n", "
" ], "text/plain": [ " sourceid dstid src_lon src_lat dst_lon dst_lat\n", "1 1 2 77.580422 13.121709 77.560037 13.102805\n", "2 1 3 77.580422 13.121709 77.583926 13.090987\n", "3 1 4 77.580422 13.121709 77.669565 13.006063\n", "4 1 5 77.580422 13.121709 77.715456 13.016847\n", "5 1 6 77.580422 13.121709 77.705502 13.022373\n", "... ... ... ... ... ... ...\n", "39198 198 193 77.505015 12.891903 77.594507 12.910882\n", "39199 198 194 77.505015 12.891903 77.614418 12.920018\n", "39200 198 195 77.505015 12.891903 77.676539 13.033613\n", "39201 198 196 77.505015 12.891903 77.653272 13.044560\n", "39202 198 197 77.505015 12.891903 77.691495 12.950743\n", "\n", "[39006 rows x 6 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distancematrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need the driving distance for approximately 40,000 pairs of source and destination coordinates. To do this efficiently, we ran the [Open Source Routing Machine (OSRM)](https://hub.docker.com/r/osrm/osrm-backend/) service locally using the Docker images provided by the project. OSRM holds the network graph in memory, so routing is extremely fast. We write and apply the following function to get the driving distance in meters." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "def get_distance(row):\n", "    # Ask the local OSRM route service for the driving distance in meters\n", "    coordinates = '{},{};{},{}'.format(\n", "        row['src_lon'], row['src_lat'], row['dst_lon'], row['dst_lat'])\n", "    url = 'http://127.0.0.1:5000/route/v1/driving/'\n", "    response = requests.get(url + coordinates)\n", "    if response.status_code == 200:\n", "        data = response.json()\n", "        return data['routes'][0]['distance']\n", "    # Non-200 response: return None rather than referencing an unassigned variable\n", "    return None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting distance data is saved locally and used in the subsequent analysis."
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "osrm_data_folder = os.path.join('data', 'osrm')\n", "distancematrix_file = 'distancematrix.csv'\n", "distancematrix_filepath = os.path.join(osrm_data_folder, distancematrix_file)\n", "distancematrix = pd.read_csv(distancematrix_filepath)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "travel_times = pd.merge(travel_times, distancematrix, on=['sourceid', 'dstid']) " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceiddstidmonthdaystart_hourend_hourmean_travel_timestandard_deviation_travel_timegeometric_mean_travel_timegeometric_standard_deviation_travel_timetime_perioddowsrc_lonsrc_latdst_londst_latdistance
0102973131016322.80425.14270.591.703477.56381712.98278477.56628712.9703591778.9
110297117190306.71200.99256.581.845477.56381712.98278477.56628712.9703591778.9
21029727190282.94206.01233.292.005477.56381712.98278477.56628712.9703591778.9
310297119710294.18183.97258.091.612677.56381712.98278477.56628712.9703591778.9
41029729710263.55149.79232.891.672677.56381712.98278477.56628712.9703591778.9
......................................................
7640962162833216193120.83459.433089.701.154077.58754512.92454577.59907713.00597611869.0
764096381272210162943.88338.432924.381.123677.67258312.99447477.54881212.96370317072.9
76409643495211902624.09863.772493.341.375577.50630713.03805077.63259412.97372519533.4
76409655612011910162366.86349.152340.331.163677.60374113.03517477.54704412.97209313119.4
7640966128251241902135.83604.692033.071.395477.55194112.96054577.55385713.02602910127.4
\n", "

7640967 rows × 17 columns

\n", "
" ], "text/plain": [ " sourceid dstid month day start_hour end_hour mean_travel_time \\\n", "0 102 97 3 13 10 16 322.80 \n", "1 102 97 1 17 19 0 306.71 \n", "2 102 97 2 7 19 0 282.94 \n", "3 102 97 1 19 7 10 294.18 \n", "4 102 97 2 9 7 10 263.55 \n", "... ... ... ... ... ... ... ... \n", "7640962 162 83 3 2 16 19 3120.83 \n", "7640963 8 127 2 2 10 16 2943.88 \n", "7640964 34 95 2 1 19 0 2624.09 \n", "7640965 56 120 1 19 10 16 2366.86 \n", "7640966 128 25 1 24 19 0 2135.83 \n", "\n", " standard_deviation_travel_time geometric_mean_travel_time \\\n", "0 425.14 270.59 \n", "1 200.99 256.58 \n", "2 206.01 233.29 \n", "3 183.97 258.09 \n", "4 149.79 232.89 \n", "... ... ... \n", "7640962 459.43 3089.70 \n", "7640963 338.43 2924.38 \n", "7640964 863.77 2493.34 \n", "7640965 349.15 2340.33 \n", "7640966 604.69 2033.07 \n", "\n", " geometric_standard_deviation_travel_time time_period dow \\\n", "0 1.70 3 4 \n", "1 1.84 5 4 \n", "2 2.00 5 4 \n", "3 1.61 2 6 \n", "4 1.67 2 6 \n", "... ... ... ... \n", "7640962 1.15 4 0 \n", "7640963 1.12 3 6 \n", "7640964 1.37 5 5 \n", "7640965 1.16 3 6 \n", "7640966 1.39 5 4 \n", "\n", " src_lon src_lat dst_lon dst_lat distance \n", "0 77.563817 12.982784 77.566287 12.970359 1778.9 \n", "1 77.563817 12.982784 77.566287 12.970359 1778.9 \n", "2 77.563817 12.982784 77.566287 12.970359 1778.9 \n", "3 77.563817 12.982784 77.566287 12.970359 1778.9 \n", "4 77.563817 12.982784 77.566287 12.970359 1778.9 \n", "... ... ... ... ... ... 
\n", "7640962 77.587545 12.924545 77.599077 13.005976 11869.0 \n", "7640963 77.672583 12.994474 77.548812 12.963703 17072.9 \n", "7640964 77.506307 13.038050 77.632594 12.973725 19533.4 \n", "7640965 77.603741 13.035174 77.547044 12.972093 13119.4 \n", "7640966 77.551941 12.960545 77.553857 13.026029 10127.4 \n", "\n", "[7640967 rows x 17 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "travel_times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Modeling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the `scikit-learn` library to build and train a linear regressor.\n", "\n", "The independent variables considered are `sourceid, dstid, day, time_period, dow, src_lon, src_lat, dst_lon, dst_lat, distance`. The dependent variable is the travel time `geometric_mean_travel_time`.\n", "Of the independent variables, the two categorical variables `time_period` and `dow` are one-hot encoded.\n", "We sample the travel times to get a subset that will be used for training."
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "num_samples = 500000\n", "samples = travel_times.sample(n=num_samples, random_state=1)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "sel_input=['sourceid', 'dstid', 'day', 'time_period', 'dow', 'src_lon', 'src_lat', 'dst_lon', 'dst_lat', 'distance']\n", "cat_ip=['time_period','dow']\n", "scale_ip= list(set(sel_input)-set(cat_ip))\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "\"['dow'] not in index\"", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msamples\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0msel_input\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msamples\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'travel_time'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mx_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_test\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtrain_test_split\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtest_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrandom_state\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m~/opt/anaconda3/envs/spatial_data_science/lib/python3.9/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 3028\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_iterator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3029\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3030\u001b[0;31m \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_listlike_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mraise_missing\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3031\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3032\u001b[0m \u001b[0;31m# take() does not accept boolean indexers\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/envs/spatial_data_science/lib/python3.9/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_get_listlike_indexer\u001b[0;34m(self, key, axis, raise_missing)\u001b[0m\n\u001b[1;32m 1264\u001b[0m \u001b[0mkeyarr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnew_indexer\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0max\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reindex_non_unique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeyarr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1265\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1266\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_read_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeyarr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mraise_missing\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mraise_missing\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1267\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mkeyarr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1268\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/opt/anaconda3/envs/spatial_data_science/lib/python3.9/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_validate_read_indexer\u001b[0;34m(self, key, indexer, axis, raise_missing)\u001b[0m\n\u001b[1;32m 1314\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mraise_missing\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1315\u001b[0m \u001b[0mnot_found\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0max\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1316\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"{not_found} not in index\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1317\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1318\u001b[0m \u001b[0mnot_found\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mmissing_mask\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: \"['dow'] not in index\"" ] } ], "source": [ "x = samples[sel_input].values\n", "y = samples['travel_time']\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset contains two categorical variables, 'time_period' and 'dow'. Their integer codes have no numeric meaning for a linear model, so they are one-hot encoded; the remaining features are standardized." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "category_Trans= ColumnTransformer([('encoder',OneHotEncoder(categories='auto', sparse=False),[sel_input.index(i) for i in cat_ip]),\n", " ('scaler',StandardScaler(),[sel_input.index(i) for i in scale_ip])],remainder='passthrough')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "regressor = Pipeline(steps=[('ct',category_Trans),('model',LinearRegression())])" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'regressor' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mregressor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_train\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0my_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'regressor' 
is not defined" ] } ], "source": [ "regressor.fit(x_train,y_train)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Accuracy: 0.7442950836977186\n" ] } ], "source": [ "print('Training Accuracy: ', regressor.score(x_train,y_train))" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Prediction Accuracy: 0.7441506633075343\n" ] } ], "source": [ "print('Prediction Accuracy: ', regressor.score(x_test,y_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking Model Performance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model scores about 0.74 on both the training and test partitions (for a regressor, `score` returns the coefficient of determination, R²). However, that dataset is not representative of real-world requests, because every trip in it runs between ward centroids. We want to see how the model performs on routing requests between arbitrary points. To do this, we create a dataset with random source and destination coordinates and compare the model's predictions against travel times from a commercial data provider, Google Maps." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Points within a Polygon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We generate random coordinate pairs within the bounding box of the city. To ensure that the points fall within the actual city geometry, we use a spatial join to keep only the points that intersect a ward. After the join, we select the first 100 source points and the first 100 destination points."
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Draw more candidates than needed; points outside the wards are dropped by the spatial join in the next cell\n", "n_points = 200\n", "\n", "x_min, y_min, x_max, y_max = wards.total_bounds\n", "\n", "np.random.seed(0)\n", "src_x = np.random.uniform(x_min, x_max, n_points)\n", "src_y = np.random.uniform(y_min, y_max, n_points)\n", "dst_x = np.random.uniform(x_min, x_max, n_points)\n", "dst_y = np.random.uniform(y_min, y_max, n_points)\n", "\n", "src_gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy(src_x, src_y), crs='EPSG:4326')\n", "dst_gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy(dst_x, dst_y), crs='EPSG:4326')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "src_gdf = gpd.sjoin(src_gdf, wards, how='inner', op='intersects')\n", "dst_gdf = gpd.sjoin(dst_gdf, wards, how='inner', op='intersects')\n", "\n", "src_selected = src_gdf[:100].reset_index()\n", "dst_selected = dst_gdf[:100].reset_index()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(10,10))\n", "wards['geometry'].plot(color='grey',ax=ax)\n", "src_selected.geometry.plot(color='green',ax=ax)\n", "dst_selected.geometry.plot(color='red',ax=ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Days and Times" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Sample 100 dates from the first quarter of 2020\n", "start_date = datetime.date(2020, 1, 1)\n", "end_date = datetime.date(2020, 3, 31)\n", "n_days = (end_date - start_date).days\n", "\n", "random.seed(0)\n", "random_dates = [start_date + datetime.timedelta(days=random.randrange(n_days))\n", "                for _ in range(100)]\n", "months = [x.month for x in random_dates]\n", "days = [x.day for x in random_dates]\n", "dows = [x.weekday() for x in random_dates]\n", "# time_period takes the values 1 to 5\n", "random_time_periods = [random.choice(range(1, 6)) for _ in range(100)]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "data = pd.DataFrame({\n", "    'sourceid': src_selected['MOVEMENT_ID'],\n", "    'dstid': dst_selected['MOVEMENT_ID'],\n", "    'month': months,\n", "    'day': days,\n", "    'time_period': random_time_periods,\n", "    'dow': dows,\n", "    'src_lon': src_selected.geometry.x,\n", "    'src_lat': src_selected.geometry.y,\n", "    'dst_lon': dst_selected.geometry.x,\n", "    'dst_lat': dst_selected.geometry.y,\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Test Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each source-destination pair also needs a driving distance, since `distance` is one of the model's input features. The resulting distance data is saved locally (under `data/osrm`) and used in the subsequent analysis."
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "osrm_data_folder = os.path.join('data', 'osrm')\n", "model_test_file = 'model_test.csv'\n", "model_test_filepath = os.path.join(osrm_data_folder, model_test_file)\n", "model_test = pd.read_csv(model_test_filepath)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " sourceid dstid month day time_period dow src_lon src_lat \\\n", "0 91 184 2 19 2 2 77.637890 12.930561 \n", "1 91 184 2 23 1 6 77.644817 12.944129 \n", "2 91 165 1 6 1 0 77.643652 12.941535 \n", "3 10 165 2 3 5 0 77.655367 12.950985 \n", "4 10 165 3 6 4 4 77.658390 12.949876 \n", ".. ... ... ... ... ... ... ... ... \n", "95 185 49 3 1 1 6 77.556457 12.861758 \n", "96 159 49 1 9 5 3 77.588550 12.909991 \n", "97 76 178 1 12 4 6 77.648405 13.006604 \n", "98 193 178 3 27 2 4 77.597410 12.915178 \n", "99 11 115 1 17 1 4 77.557797 13.049415 \n", "\n", " dst_lon dst_lat distance \n", "0 77.590090 12.888096 9686.2 \n", "1 77.586162 12.885430 14531.8 \n", "2 77.761146 12.935575 16056.7 \n", "3 77.713831 12.937901 10362.1 \n", "4 77.716294 12.927034 8051.9 \n", ".. ... ... ... \n", "95 77.572991 13.007320 20097.0 \n", "96 77.570795 12.997582 12067.3 \n", "97 77.661723 12.881504 20443.1 \n", "98 77.650824 12.889310 8032.5 \n", "99 77.526487 12.968088 12629.0 \n", "\n", "[100 rows x 11 columns]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predicted vs. Reference Travel Times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To validate our model against real-world data, we collected reference travel times from Google Maps. Google Maps allows one to set a specific departure time in the past and get a range of travel times. We used our randomly generated source and destination pairs along with the random departure times to collect this reference data, recording the lower and upper ends of each range in minutes.
" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "reference_data_folder = os.path.join('data', 'googlemaps')\n", "reference_file = 'googlemaps_traveltimes.csv'\n", "reference_filepath = os.path.join(reference_data_folder, reference_file)\n", "reference_data = pd.read_csv(reference_filepath)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sourceid dstid month day time_period dow src_lon src_lat \\\n", "0 91 184 2 19 2 2 77.637890 12.930561 \n", "1 91 184 2 23 1 6 77.644817 12.944129 \n", "2 91 165 1 6 1 0 77.643652 12.941535 \n", "3 10 165 2 3 5 0 77.655367 12.950985 \n", "4 10 165 3 6 4 4 77.658390 12.949876 \n", ".. ... ... ... ... ... ... ... ... \n", "95 185 49 3 1 1 6 77.556457 12.861758 \n", "96 159 49 1 9 5 3 77.588550 12.909991 \n", "97 76 178 1 12 4 6 77.648405 13.006604 \n", "98 193 178 3 27 2 4 77.597410 12.915178 \n", "99 11 115 1 17 1 4 77.557797 13.049415 \n", "\n", " dst_lon dst_lat distance goog_distance goog_min goog_max \n", "0 77.590090 12.888096 9686.2 10700 22 40 \n", "1 77.586162 12.885430 14531.8 15700 30 40 \n", "2 77.761146 12.935575 16056.7 16200 30 40 \n", "3 77.713831 12.937901 10362.1 11000 20 35 \n", "4 77.716294 12.927034 8051.9 10300 24 40 \n", ".. ... ... ... ... ... ... \n", "95 77.572991 13.007320 20097.0 20300 40 50 \n", "96 77.570795 12.997582 12067.3 11900 22 40 \n", "97 77.661723 12.881504 20443.1 24000 45 80 \n", "98 77.650824 12.889310 8032.5 8000 18 35 \n", "99 77.526487 12.968088 12629.0 13400 24 28 \n", "\n", "[100 rows x 14 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reference_data" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "def predict_time(row):\n", " input = row[['sourceid', 'dstid', 'day', 'time_period', 'dow', 'src_lon', 'src_lat', 'dst_lon', 'dst_lat', 'distance']]\n", " return round(regressor.predict([input])[0]/60)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " sourceid dstid distance goog_distance goog_min goog_max predicted \\\n", "0 91 184 9686.2 10700 22 40 29 \n", "1 91 184 14531.8 15700 30 40 26 \n", "2 91 165 16056.7 16200 30 40 34 \n", "3 10 165 10362.1 11000 20 35 29 \n", "4 10 165 8051.9 10300 24 40 31 \n", "5 10 165 10610.8 10700 18 35 30 \n", "6 10 165 11898.9 11500 22 40 37 \n", "7 177 165 20806.9 21200 40 60 52 \n", "8 15 20 17683.4 16800 30 65 40 \n", "9 15 20 20441.6 18800 35 60 45 \n", "10 15 82 13335.8 16000 24 40 25 \n", "11 169 148 25992.0 26400 55 110 70 \n", "12 169 148 23628.8 23200 40 65 59 \n", "13 169 180 29579.8 28500 55 100 74 \n", "14 169 180 31056.3 29200 65 120 81 \n", "15 169 180 31774.2 30000 60 110 82 \n", "16 168 76 15065.9 19500 35 55 42 \n", "17 1 154 22362.3 23800 40 50 41 \n", "18 176 198 14221.3 14100 28 50 38 \n", "19 196 198 26543.1 33500 50 85 63 \n", "20 196 198 32441.9 34800 60 75 65 \n", "21 196 198 30500.7 30700 50 90 73 \n", "22 5 198 34863.5 34000 55 80 78 \n", "23 194 144 9954.9 10800 26 50 31 \n", "24 198 35 21076.7 21300 35 50 51 \n", "25 198 188 9760.8 10200 18 35 24 \n", "26 198 188 14291.3 14300 24 28 26 \n", "27 198 55 20778.0 21500 40 75 49 \n", "28 198 13 33244.1 32800 55 70 67 \n", "29 32 13 19364.5 19600 30 60 43 \n", "30 32 13 21956.7 21500 40 75 48 \n", "31 180 13 27069.0 27500 45 80 63 \n", "32 66 13 15000.5 17400 28 50 37 \n", "33 81 13 22482.0 23600 40 65 58 \n", "34 78 192 22069.3 21800 40 50 41 \n", "35 78 192 20063.5 19400 40 50 41 \n", "36 78 166 17271.0 16500 28 45 45 \n", "37 195 166 16745.3 15100 26 40 43 \n", "38 2 166 30049.5 31200 45 60 59 \n", "39 179 166 16929.2 16900 30 55 44 \n", "\n", " within_range \n", "0 Y \n", "1 N \n", "2 Y \n", "3 Y \n", "4 Y \n", "5 Y \n", "6 Y \n", "7 Y \n", "8 Y \n", "9 Y \n", "10 Y \n", "11 Y \n", "12 Y \n", "13 Y \n", "14 Y \n", "15 Y \n", "16 Y \n", "17 Y \n", "18 Y \n", "19 Y \n", "20 Y \n", "21 Y \n", "22 Y \n", "23 Y \n", "24 N \n", "25 Y \n", "26 Y \n", "27 Y \n", "28 Y \n", "29 Y \n", 
"30 Y \n", "31 Y \n", "32 Y \n", "33 Y \n", "34 Y \n", "35 Y \n", "36 Y \n", "37 N \n", "38 Y \n", "39 Y " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reference_data['predicted'] = reference_data.apply(predict_time, axis=1)\n", "results = reference_data[['sourceid', 'dstid', 'distance', 'goog_distance', 'goog_min', 'goog_max', 'predicted']].copy()\n", "# 'Y' when the predicted time (minutes) falls inside the Google Maps range [goog_min, goog_max]\n", "results['within_range'] = np.where(\n", "    (results['predicted'] <= results['goog_max'])\n", "    & (results['predicted'] >= results['goog_min']), 'Y', 'N')\n", "results.head(40)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real-time Routing and Prediction\n", "\n", "To demonstrate the technique in practice, we combine the trained model with a live routing service: OpenRouteService supplies the route and driving distance, and our model predicts the travel time." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "load_dotenv()\n", "\n", "ORS_API_KEY = os.getenv('ORS_API_KEY')" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "def get_driving_route(source_coordinates, dest_coordinates):\n", "    # Coordinates arrive as (lat, lon); the ORS query string expects 'lon,lat'\n", "    parameters = {\n", "        'api_key': ORS_API_KEY,\n", "        'start' : '{},{}'.format(source_coordinates[1], source_coordinates[0]),\n", "        'end' : '{},{}'.format(dest_coordinates[1], dest_coordinates[0])\n", "    }\n", "\n", "    response = requests.get(\n", "        'https://api.openrouteservice.org/v2/directions/driving-car', params=parameters)\n", "\n", "    # Fail loudly: callers index into the returned JSON, so a sentinel value would only defer the error\n", "    response.raise_for_status()\n", "    return response.json()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "def get_ward(coordinates):\n", "    # coordinates is (lat, lon); the point geometry is built as (x=lon, y=lat)\n", "    df = pd.DataFrame({'x': [coordinates[1]], 'y': [coordinates[0]]})\n", "    src_gdf = gpd.GeoDataFrame(geometry=gpd.points_from_xy(df.x, df.y), crs='EPSG:4326')\n", "    src_gdf = gpd.sjoin(src_gdf, wards, how='inner', op='intersects')\n", "    if src_gdf.empty:\n", "        raise ValueError('Coordinates fall outside the ward boundaries.')\n", "    return int(src_gdf['MOVEMENT_ID'].iloc[0])" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "def get_route(source, destination, departure_time):\n", "    # source and destination are (lat, lon) tuples\n", "    sourceid = get_ward(source)\n", "    dstid = get_ward(destination)\n", "    day = departure_time.day\n", "    time_period = get_time_period(departure_time.hour)\n", "    dow = departure_time.weekday()\n", "    driving_data = get_driving_route(source, destination)\n", "    summary = driving_data['features'][0]['properties']['summary']\n", "    distance = summary['distance']\n", "    # Feature order must match sel_input, which was used for training\n", "    features = [sourceid, dstid, day, time_period, dow, source[1], source[0], destination[1], destination[0], distance]\n", "    travel_time = round(regressor.predict([features])[0]/60)\n", "    ors_travel_time = round(summary['duration']/60)\n", "    route = driving_data['features'][0]['geometry']['coordinates']\n", "\n", "    # ORS returns [lon, lat] pairs; folium expects [lat, lon]\n", "    route = [[coord[1], coord[0]] for coord in route]\n", "    m = folium.Map(location=[(source[0] + destination[0])/2,(source[1] + destination[1])/2], zoom_start=13)\n", "\n", "    tooltip = 'Model predicted time = {} mins, Default travel time = {} mins'.format(travel_time, ors_travel_time)\n", "    folium.PolyLine(\n", "        route,\n", "        weight=8,\n", "        color='blue',\n", "        opacity=0.6,\n", "        tooltip=tooltip\n", "    ).add_to(m)\n", "\n", "    folium.Marker(\n", "        location=(source[0],source[1]),\n", "        icon=folium.Icon(icon='play',color='green')\n", "    ).add_to(m)\n", "\n", "    folium.Marker(\n", "        location=(destination[0],destination[1]),\n", "        icon=folium.Icon(icon='stop',color='red')\n", "    ).add_to(m)\n", "\n", "    return m" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Live Demo\n", "\n", "We pick a set 
of coordinates within the city, fetch a driving route between them from the OpenRouteService API, and predict the travel time for that route with our model." ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "# Coordinates are (lat, lon)\n", "source = 12.946538, 77.579975\n", "destination = 12.994029, 77.661008\n", "departure_time = datetime.datetime.now()" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_route(source, destination, departure_time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check how the model performs by comparing with the travel time predicted by Google Maps." ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import webbrowser\n", "\n", "url='https://www.google.com/maps/dir/{},{}/{},{}'.format(source[0],source[1],destination[0],destination[1])\n", "webbrowser.open(url)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }