{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<center><a href=\"https://www.featuretools.com/\"><img src=\"http://www.featuretools.com/wp-content/uploads/2017/12/FeatureLabs-Logo-Tangerine-2000.png\" width=\"400\" height=\"200\" /></a></center>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predict Taxi Trip Duration using Premium Primitives\n", "\n", "This tutorial shows how to use [premium primitives](http://primitives.featurelabs.com) in Featuretools to predict the duration of a taxi trip in New York City. An accurate predictive model would give passengers an informative time estimate before they begin their trip. \n", "\n", "In this notebook we will:\n", "\n", "1. [Load Data](#Step-1:-Load-Data)\n", "2. [Select Primitives](#Step-2.-Selecting-Premium-Primitives)\n", "3. [Run Featuretools](#Step-3.-Running-Featuretools)\n", "4. [Build a model](#Step-4:-Building-the-Model)\n", "5. [Interpret features](#Step-5:-Interpretting-Features)\n", "6. [Apply to new data](#Step-6:-Apply-feature-engineering-and-modeling-to-new-data)\n", "\n", "To learn more about Featuretools, visit our [documentation](http://featuretools.featurelabs.com)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import featuretools as ft\n", "import pandas as pd\n", "import numpy as np\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import mean_squared_error\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Load Data\n", "\n", "First, we load in a copy of the data. 
It is 175 MB, so it may take a few minutes to download." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "es = ft.entityset.read_entityset(\"s3://featurelabs-static/nyc_taxi_entityset_train.tar\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This entity set contains data on nearly **1.5 million taxi trips** in New York City over a period of several months. For each trip, we have a handful of columns, shown below.\n", "\n", "With [graphviz installed](https://docs.featuretools.com/getting_started/install.html#installing-graphviz), we can generate a visualization of the entity set." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: taxi Pages: 1 -->\n", "<svg width=\"246pt\" height=\"151pt\"\n", " viewBox=\"0.00 0.00 246.38 151.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 147)\">\n", "<title>taxi</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-147 242.3799,-147 242.3799,4 -4,4\"/>\n", "<!-- trips -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>trips</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" points=\"0,-.5 0,-142.5 238.3799,-142.5 238.3799,-.5 0,-.5\"/>\n", "<text text-anchor=\"middle\" x=\"119.1899\" y=\"-127.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">trips</text>\n", "<polyline fill=\"none\" stroke=\"#000000\" points=\"0,-120.5 238.3799,-120.5 \"/>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-105.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">id : 
index</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-91.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">pickup_datetime : datetime_time_index</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-77.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">store_and_fwd_flag : boolean</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-63.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">trip_duration : numeric</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-49.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">passenger_count : ordinal</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-35.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">vendor_id : categorical</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-21.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">pickup_latlong : latlong</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-7.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">dropoff_latlong : latlong</text>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.dot.Digraph at 0x1048b6ef0>" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "es.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The primary data types for this problem are **geospatial** (latitude and longitude) and **temporal**. By default, most machine learning algorithms have a difficult time processing these data types. Therefore, in order to get the most out of this data, we need to perform feature engineering to extract predictive signals before applying machine learning." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2. Selecting Premium Primitives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Featuretools has several premium primitives that can be used to assist with preparing this data as numeric feature vectors for machine learning. 
\n", "\n", "Below we've selected several primitives that apply to the data types in this dataset. To learn more about any of the primitives, click on the links to view the documentation.\n", "\n", "* [City Block Distance](https://primitives.featurelabs.com/#CityblockDistance) - Cars cannot travel diagonally through a city block, so this primitive gives a more realistic estimate than straight-line distance of how far the passenger has to travel during their trip. \n", "\n", "\n", "* [Lat Long To City](https://primitives.featurelabs.com/#LatLongToCity) - An important factor in the length of a trip is where it begins or ends. This primitive converts the pickup and drop-off locations to a city or borough, e.g., a trip that begins in Manhattan and ends in Brooklyn must cross the East River.\n", "\n", "\n", "* [Is In Geo Box](https://primitives.featurelabs.com/#IsInGeoBox) - Trips that start or end near points of interest can also be relevant. To capture this, we can use a geobox to detect trips that start or end within a couple of important areas in New York City that have a lot of taxi trips.\n", "\n", " * Area around JFK Airport - (40.62, -73.85), (40.70, -73.75)\n", " * Area around LaGuardia Airport - (40.76, -73.89), (40.78, -73.85)\n", " \n", "\n", "* [Part Of Day](https://primitives.featurelabs.com/#PartOfDay) - Traffic conditions greatly affect the duration of a trip. We know traffic varies by time of day, so we can use this primitive to extract whether the trip occurs during the morning, afternoon, evening, or night.\n", "\n", "\n", "* [Is Federal Holiday](https://primitives.featurelabs.com/#IsFederalHoliday) - A typical Monday morning may have heavy traffic going into the city, but if it is a federal holiday, the traffic is likely lighter.\n", "\n", "\n", "* [Season](https://primitives.featurelabs.com/#Season), [Quarter](https://primitives.featurelabs.com/#Quarter) - The weather outside may determine street conditions. 
Using these primitives we can extract the time of year. Note: this demo data only spans a few months, but these primitives may be very relevant when we expand the dataset.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Running Featuretools\n", "\n", "Next, we run Featuretools using the primitives selected above." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from featuretools.primitives import (CityblockDistance, LatLongToCity, IsInGeoBox, PartOfDay, \n", " IsFederalHoliday, Season, NthWeekOfMonth, Quarter)\n", "\n", "trans_primitives = [CityblockDistance,\n", " LatLongToCity,\n", " IsInGeoBox((40.62, -73.85), (40.70, -73.75)), # JFK Airport\n", " IsInGeoBox((40.76, -73.89), (40.78, -73.85)), # LaGuardia Airport\n", " IsFederalHoliday,\n", " PartOfDay,\n", " Season,\n", " NthWeekOfMonth,\n", " Quarter]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can create the feature matrix using [Deep Feature Synthesis](https://featuretools.featurelabs.com/automated_feature_engineering/afe.html)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Built 16 features\n", "Elapsed: 04:55 | Progress: 100%|██████████| Remaining: 00:00\n" ] } ], "source": [ "fm, features = ft.dfs(entityset=es,\n", " target_entity=\"trips\",\n", " trans_primitives=trans_primitives,\n", " chunk_size=.1, # lowering this gives more frequent updates\n", " verbose=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " 
<th></th>\n", " <th>store_and_fwd_flag</th>\n", " <th>trip_duration</th>\n", " <th>passenger_count</th>\n", " <th>vendor_id</th>\n", " <th>CITYBLOCK_DISTANCE(dropoff_latlong, pickup_latlong)</th>\n", " <th>LATLONG_TO_CITY(pickup_latlong)</th>\n", " <th>LATLONG_TO_CITY(dropoff_latlong)</th>\n", " <th>IS_IN_GEOBOX(pickup_latlong, point1=(40.62, -73.85), point2=(40.7, -73.75))</th>\n", " <th>IS_IN_GEOBOX(dropoff_latlong, point1=(40.62, -73.85), point2=(40.7, -73.75))</th>\n", " <th>IS_IN_GEOBOX(pickup_latlong, point1=(40.76, -73.89), point2=(40.78, -73.85))</th>\n", " <th>IS_IN_GEOBOX(dropoff_latlong, point1=(40.76, -73.89), point2=(40.78, -73.85))</th>\n", " <th>IS_FEDERAL_HOLIDAY(pickup_datetime)</th>\n", " <th>PART_OF_DAY(pickup_datetime)</th>\n", " <th>SEASON(pickup_datetime)</th>\n", " <th>NTH_WEEK_OF_MONTH(pickup_datetime)</th>\n", " <th>QUARTER(pickup_datetime)</th>\n", " </tr>\n", " <tr>\n", " <th>id</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <td>id0000001</td>\n", " <td>False</td>\n", " <td>1105</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>4.457185</td>\n", " <td>New York City</td>\n", " <td>Long Island City</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>Morning</td>\n", " <td>summer</td>\n", " <td>3.0</td>\n", " <td>2</td>\n", " </tr>\n", " <tr>\n", " <td>id0000003</td>\n", " <td>False</td>\n", " <td>1046</td>\n", " <td>5</td>\n", " <td>2</td>\n", " <td>1.770763</td>\n", " <td>Weehawken</td>\n", " <td>Hoboken</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>Morning</td>\n", " <td>spring</td>\n", " <td>3.0</td>\n", " <td>1</td>\n", " </tr>\n", 
" <tr>\n", " <td>id0000005</td>\n", " <td>False</td>\n", " <td>368</td>\n", " <td>1</td>\n", " <td>2</td>\n", " <td>0.904869</td>\n", " <td>Manhattan</td>\n", " <td>Manhattan</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>Morning</td>\n", " <td>spring</td>\n", " <td>5.0</td>\n", " <td>2</td>\n", " </tr>\n", " <tr>\n", " <td>id0000008</td>\n", " <td>False</td>\n", " <td>303</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0.967836</td>\n", " <td>New York City</td>\n", " <td>New York City</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>Morning</td>\n", " <td>summer</td>\n", " <td>3.0</td>\n", " <td>2</td>\n", " </tr>\n", " <tr>\n", " <td>id0000009</td>\n", " <td>False</td>\n", " <td>547</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>4.147816</td>\n", " <td>Manhattan</td>\n", " <td>Manhattan</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>False</td>\n", " <td>Night</td>\n", " <td>spring</td>\n", " <td>2.0</td>\n", " <td>2</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " store_and_fwd_flag trip_duration passenger_count vendor_id \\\n", "id \n", "id0000001 False 1105 1 2 \n", "id0000003 False 1046 5 2 \n", "id0000005 False 368 1 2 \n", "id0000008 False 303 1 1 \n", "id0000009 False 547 1 1 \n", "\n", " CITYBLOCK_DISTANCE(dropoff_latlong, pickup_latlong) \\\n", "id \n", "id0000001 4.457185 \n", "id0000003 1.770763 \n", "id0000005 0.904869 \n", "id0000008 0.967836 \n", "id0000009 4.147816 \n", "\n", " LATLONG_TO_CITY(pickup_latlong) LATLONG_TO_CITY(dropoff_latlong) \\\n", "id \n", "id0000001 New York City Long Island City \n", "id0000003 Weehawken Hoboken \n", "id0000005 Manhattan Manhattan \n", "id0000008 New York City New York City \n", "id0000009 Manhattan Manhattan \n", "\n", " IS_IN_GEOBOX(pickup_latlong, point1=(40.62, -73.85), point2=(40.7, 
-73.75)) \\\n", "id \n", "id0000001 False \n", "id0000003 False \n", "id0000005 False \n", "id0000008 False \n", "id0000009 False \n", "\n", " IS_IN_GEOBOX(dropoff_latlong, point1=(40.62, -73.85), point2=(40.7, -73.75)) \\\n", "id \n", "id0000001 False \n", "id0000003 False \n", "id0000005 False \n", "id0000008 False \n", "id0000009 False \n", "\n", " IS_IN_GEOBOX(pickup_latlong, point1=(40.76, -73.89), point2=(40.78, -73.85)) \\\n", "id \n", "id0000001 False \n", "id0000003 False \n", "id0000005 False \n", "id0000008 False \n", "id0000009 False \n", "\n", " IS_IN_GEOBOX(dropoff_latlong, point1=(40.76, -73.89), point2=(40.78, -73.85)) \\\n", "id \n", "id0000001 False \n", "id0000003 False \n", "id0000005 False \n", "id0000008 False \n", "id0000009 False \n", "\n", " IS_FEDERAL_HOLIDAY(pickup_datetime) PART_OF_DAY(pickup_datetime) \\\n", "id \n", "id0000001 False Morning \n", "id0000003 False Morning \n", "id0000005 False Morning \n", "id0000008 False Morning \n", "id0000009 False Night \n", "\n", " SEASON(pickup_datetime) NTH_WEEK_OF_MONTH(pickup_datetime) \\\n", "id \n", "id0000001 summer 3.0 \n", "id0000003 spring 3.0 \n", "id0000005 spring 5.0 \n", "id0000008 summer 3.0 \n", "id0000009 spring 2.0 \n", "\n", " QUARTER(pickup_datetime) \n", "id \n", "id0000001 2 \n", "id0000003 1 \n", "id0000005 2 \n", "id0000008 2 \n", "id0000009 2 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fm.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the primitives above, we created many new features to feed into our machine learning algorithm. Because some of these features are categorical, we will perform [categorical encoding](https://featuretools.featurelabs.com/generated/featuretools.encode_features.html#featuretools.encode_features) (with one-hot encoding) using featuretools before continuing. 
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Encoding pass 1: 100%|██████████| 16/16 [00:08<00:00, 1.96feature/s]\n", "Encoding pass 2: 100%|██████████| 35/35 [00:00<00:00, 235.92feature/s]\n" ] } ], "source": [ "fm_encoded, features_encoded = ft.encode_features(fm, features, top_n=5, verbose=True, include_unknown=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Building the Model\n", "\n", "After applying Featuretools, we have a feature matrix of all numeric data that is ready for machine learning. \n", "\n", "The final preparation step is to apply a `log` transform to our trip durations. This compresses the long tail of very long trips, so the model can better distinguish short trips during training. We can later undo this transform to generate final predictions." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "X = fm_encoded.copy()\n", "y = (X.pop('trip_duration') + 1).apply(np.log)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To validate our model, we will do a simple train/test split." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we are ready to train and score our model. For the purposes of this example, we will not perform any hyperparameter tuning of our model."
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.\n", "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 58.9s\n", "[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed: 2.5min finished\n" ] }, { "data": { "text/plain": [ "RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,\n", " max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,\n", " oob_score=False, random_state=0, verbose=True,\n", " warm_start=False)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "estimator = RandomForestRegressor(n_estimators=100,\n", " n_jobs=-1,\n", " random_state=0,\n", " verbose=True)\n", "estimator.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the trained model, we can look at the mean squared error on the test set. Note that this error is measured on the log-transformed durations." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.\n", "[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 6.1s\n", "[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed: 15.9s finished\n" ] }, { "data": { "text/plain": [ "0.2237187249878536" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = estimator.predict(X_test)\n", "mean_squared_error(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Interpretting Features\n", "\n", "Featuretools primitives are valuable because they transform the raw data (e.g., dates, latitudes, and longitudes) into meaningful attributes a machine learning model can learn from. 
Compared to other techniques for feature engineering, they are also more interpretable by humans looking to understand the model.\n", "\n", "\n", "To understand the model better, let's take a look at the most important features discovered when we trained the random forest" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>feature</th>\n", " <th>importance</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <td>8</td>\n", " <td>CITYBLOCK_DISTANCE(dropoff_latlong, pickup_lat...</td>\n", " <td>0.797597</td>\n", " </tr>\n", " <tr>\n", " <td>31</td>\n", " <td>NTH_WEEK_OF_MONTH(pickup_datetime)</td>\n", " <td>0.041128</td>\n", " </tr>\n", " <tr>\n", " <td>27</td>\n", " <td>PART_OF_DAY(pickup_datetime) = Night</td>\n", " <td>0.011538</td>\n", " </tr>\n", " <tr>\n", " <td>6</td>\n", " <td>vendor_id = 2</td>\n", " <td>0.011130</td>\n", " </tr>\n", " <tr>\n", " <td>7</td>\n", " <td>vendor_id = 1</td>\n", " <td>0.010091</td>\n", " </tr>\n", " <tr>\n", " <td>1</td>\n", " <td>passenger_count = 1</td>\n", " <td>0.008714</td>\n", " </tr>\n", " <tr>\n", " <td>28</td>\n", " <td>SEASON(pickup_datetime) = spring</td>\n", " <td>0.008686</td>\n", " </tr>\n", " <tr>\n", " <td>24</td>\n", " <td>PART_OF_DAY(pickup_datetime) = Afternoon</td>\n", " <td>0.008539</td>\n", " </tr>\n", " <tr>\n", " <td>2</td>\n", " <td>passenger_count = 2</td>\n", " <td>0.007104</td>\n", " </tr>\n", " <tr>\n", " <td>30</td>\n", " <td>SEASON(pickup_datetime) = summer</td>\n", " <td>0.006361</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": 
[ " feature importance\n", "8 CITYBLOCK_DISTANCE(dropoff_latlong, pickup_lat... 0.797597\n", "31 NTH_WEEK_OF_MONTH(pickup_datetime) 0.041128\n", "27 PART_OF_DAY(pickup_datetime) = Night 0.011538\n", "6 vendor_id = 2 0.011130\n", "7 vendor_id = 1 0.010091\n", "1 passenger_count = 1 0.008714\n", "28 SEASON(pickup_datetime) = spring 0.008686\n", "24 PART_OF_DAY(pickup_datetime) = Afternoon 0.008539\n", "2 passenger_count = 2 0.007104\n", "30 SEASON(pickup_datetime) = summer 0.006361" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "importances = pd.DataFrame(zip(X.columns, estimator.feature_importances_), columns=[\"feature\", \"importance\"]).sort_values(\"importance\", ascending=False)\n", "importances.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you might expect, **a majority of the top features are the result of applying the premium primitives**. Let's take a closer look at some features in particular." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### As the distance increases, the trip duration increases\n", "\n", "Unsurprisingly, the city block distance of the trip is the most important feature. The longer the trip's distance, the longer it will take. However, we will see below that this isn't always the case."
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x138793e10>" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fm.sample(1000).plot.scatter(x='CITYBLOCK_DISTANCE(dropoff_latlong, pickup_latlong)', y='trip_duration')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### On average, trips are shorter in duration during the winter and longer during the summer\n", "\n", "This may be because people are more likely to take a taxi, even for short distances, when it is cold outside in New York City." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x13859c908>" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fm.groupby(\"SEASON(pickup_datetime)\")[\"trip_duration\"].mean().plot.bar(title=\"Trip duration by Season\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Trips are longer in duration in the afternoon even though they cover a shorter distance\n", "\n", "This runs counter to what we saw earlier, that longer-distance trips take more time. This is why it is important to extract numerous features from your data, so your model can learn multivariate relationships. \n", "\n", "The likely explanation for this is that there is more traffic in the afternoon than at night. 
" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x1572b0a20>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA20AAAE/CAYAAADVKysfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3deZglZXn38e9PQNlBYaJmYBgMGGNYFEeCSxIVk+CKiRvEBQxm3hiNGmIUfY1bEpcsmhgTDBFkCS4IEtAgxldRXNEB2XGZoLKIMGzDqjhwv39UtZxpeqb3U9V9vp/rOtfU8pyq+1T3nLvvqqeeSlUhSZIkSeqn+3UdgCRJkiRpwyzaJEmSJKnHLNokSZIkqccs2iRJkiSpxyzaJEmSJKnHLNokSZIkqccs2jQvkvxVkg/O0bZenuSLc7GtKe7vYUlum4ftbpqkkiyf622P288mSW5Lsmw+9zPFWN6d5IYkV3UdiyR1bS5z4wa2f0iSz8zX9ueTOVLaOIs2Tar9cht73ZPkzoH5F030nqr666r6k2HHOhNJrkrypLH5qrq8qrbuMKSNapP+2PH/aZK7B+YvqKq7q2rrqrqi4zh3BV4N/GpV7TTB+qe2v09jsV+V5ONJHjP8aCVpeoadG5P8Z5K7ktzavi5K8rdJth3Y/nFV9bQpbuttM4mj78yRWqws2jSp9stt67aQuQJ41sCyE8e3T7Lp8KOcWJ9imStt0h/7ebwK+PLAz2PvruMbsAtwXVVdv5E2V7SfYxvgccD3ga8OFtGS1Ecd5cZ3VtU2wBLgMOA3gS8n2WIOtr0omCO1WFm0adaS/E179uejSW4FXtwuO7Zdv1vb5eGPk/y4ff35Rra3JMmnk9yS5BvArgPrdktS49p/Jcmh7fTLk5yd5P1JbgTenGT3JGcluTHJ9UlOSLJd2/6jwC8Dn2nPZB0+fh9JdmrjuTHJ95P80bjP/tH2rOWtSS5Oss8kh+xZSX7QxvLuJPdLsnmSm5P82sC2H5rkjiQ7TPYzGHc81uti0sb2L0k+237Gs5M8uF12c5LLkuw98P6dkpyaZE0b5ys3sq/t2+2vSfLDJG9M4wDgM8Cydp8f2ljM1biyqt4MHAu8e2AfH2jPMN6S5FtJHt8uX9oen+0H2u6b5CeLsViXtLDMdW4cVFU/rapvAs8CHgIc0m7zF7cTtLnl/UmuS7I2yYVJHpnkT4EXAm9qv59Pbdu/OcnlbS67JMmzBz7Ly5N8Kcn72rxxeZLfHVi/Q5Jjk1yT5KYkpwyse3aSC9r3fSXJHpN8PHPkOOZIgUWb5s7vAx8BtgM+voE2vwXsBjyNpph60gbaHQncSpOIVgJ/tIF2G/J44DKaM5HvAQL8Tbu9RwIPA/4KoKoOBn4MPK09C/feCbb3ceAHNMXdC4G/S/LbA+ufA5wAbE/zJfz+SeI7ENgHeAzwPOClVfVT4CTgxQPt/hD4bFXdMLWPvVEvBI4AdgQK+AbwdWAH4DTgH6BJ8sCngW8BS4HfAf4yyf4b2O6/AVvSHNOn0Jz5fWlVnUnzx8QV7XF9+TRi/STw2CSbt/PnAHsBDwJOBj6R5AFVdTXwFeD5A+99CfDRqlo3jf1J0nyZy9x4H1W1Fvg8zRW38Z4G7AfsDjwQOAi4sar+rY3lne338++37b8HPKGN9W+BjyR58MD2Hg9cRJM33gccPbDuI8D9aXLsLwH/
DJDkscB/AC9v33cMcFqS+2/kY5kjN84cOaIs2jRXvlJVn6qqe6rqzg20eXtV3VFVFwDHAQePb5BkM5oi6K/athfSFETTcUVVHdn2W7+zqr5XVZ+vqruq6jqaZPPbk22kjWdXYF/giPbM5nnAh2m++MZ8qao+W1V3t7E+apLNvruqbqqqH9EUeGPH4TjgD5OknX8J0//sG3JKVX27TXz/BdxWVR9pY/448Oi23eOAbavqne3xWk2TmA8av8H2Z/UCmmNza1VdTnNsXzK+7TT9mOa7aTuAqjqhqm5sk8zfAdvS/IEDzTF7cRvPpm2cc3XMJGm25iQ3TuLHNH+wj/dzmu/LRwBU1aVV9ZMNbaSqTqqqa9pYPwL8EFgx0OR/q+qYNm8cB+yUZMckOwP7A69oc9vPq+rs9j0rgX+rqm+1OfmYdvljN/J5zJEbZ44cURZtmitXTrPNj2iuXI33YGCTCdrOOJYkD0lyUpKrk9xC07Vgxylu65eB66vq9nHxLB2YH0yCdwBbTSO+XxyHqvoqsA54Ytt9ZBnw31OMczLXDkzfOcH82MAru9B017h57AW8nuYq5Xi/RPOzGvz5jD82M7EUuAdYC5Dk9Um+k2QtcBPN8R37+Z0K7J1mFLADaO4POG+W+5ekuTJXuXFjlgI3jl9YVf8DfJCm98q1ST6YZJsNbSTJoQPdGG+mKfYGc+X4XAdN7tiZJk+unWCzuwBvGJdTHsrG84Q5cuPMkSPKok1zpSZvws4D08tozhaNdy3Nl9H4tmNuB0iy5cCy8V+W42N5D/AzYM+q2hY4lKbL5IbaD/oxsGOSwUJsGXD1Rt4zmY0dh+Npzoq9BDipqn42i/3MxJXA96tq+4HXNlX1rAnaXgfcTZPExsz22EDTnehbVfXTJE8GDgeeS9P99IHAbbQ/v6q6AzgFeBFze9ZVkubCXOXGCaUZOfIpwJcn3HnVP1XVPsAeNF0XD58oriQPoynuXgHsUFXbA99h/Vy5IVfS5MltN7Du7eNyypZVddJGtmeO3Dhz5IiyaNMw/VWSLZLsSXPT9H3691fVz2m6Jry9bbsH63cl+En7enGaZ62sZP0vxIlsQ1PsrW27cbxu3Pprafqb30dV/QBYBbwzyQOSPAp4GfCfk+xzY17f3py8jGa438HjcAJNH/4/pElOw/Z14K4kf9He+L1Jkj0zwRDD7c/qZJpjs3XblfTPmcGxaW/M3inJ22mK6je1q7ahObN6PbAZ8DbueyXzeJr7Hp8xk31LUscmzY3jtfloBc39VmuYIF+0g07s23aLux24i+akKNw3721NU8itad6aP6btVjmZqroS+H/Av7a5bbMkv9Wu/g/glUke237Pb53kWeNOhI5njhzHHCmwaNNwfQW4HPgf4F1V9YUNtHsFzdmia2n6in94bEVVFfDHNF9Y19P02z5nkv2+lea+tLXA6TRnnQa9k6ZIvDnJayd4/wtpbuT+Cc0X8Juq6ouT7HNjPgWcD3ybpuvCsWMrquqHNDd6/6yqvjaLfcxI2yf+6TTH64c0x/jfafrIT+RPaf4Q+CHwJZr+89NJpMvSPMj8Npqf4yOB3xr43TiD5o+B77f7uAW4Ztw2zgY2Bc6pKh9QKmmhmWpuhGbEx1uBG2i+b78BPKG9ojLe9jQ59Gaa789rgLHBtj5E023upiQnt/eP/wvwzbbdrzJ5bh00NkDI92hy958BVNU3aHL6kTRd977H+oOJTMQceS9zpH4hzd/A0vxJshtNd4KpdLMYeUmOBy6vqrd1HctCkeRs4JiqOrbrWCRpKsyNM2OOnD5z5OLgcxqkHmnvKzgQ2LPrWBaKJPvR3K/xia5jkSTNH3Pk9JkjFw+7R0o9keRdwAU0z865out4FoIkJwJnAq8ZN8KnJGkRMUdOnzlycbF7pCRJkiT1mFfaJEmSJKnHLNokSZIkqcd6MRDJjjvuWMuXL+86DEnSEJx77rnXV9WSruNYKMyRkjQaNpYfe1G0LV++
nFWrVnUdhiRpCJL8qOsYFhJzpCSNho3lR7tHSpIkSVKPWbRJkiRJUo9ZtEmSNAeSbJ7km0kuSHJJkrdP0OYBST6eZHWSc5IsH36kkqSFxqJNkqS58TPgKVW1N/Ao4IAk+41rcxhwU1XtBrwPeM+QY5QkLUAWbZIkzYFq3NbObta+alyzA4Hj2umTgf2TZEghSpIWKIs2SZLmSJJNkpwPXAd8rqrOGddkKXAlQFWtA9YCOww3SknSQmPRJknSHKmqu6vqUcBOwL5J9pjJdpKsTLIqyao1a9bMbZCSpAXHok2SpDlWVTcDZwEHjFt1NbAzQJJNge2AGyZ4/1FVtaKqVixZ4nPIJWnU9eLh2vNt+RH/3XUIM/bDdz+j6xAkSVOQZAnw86q6OckWwO9w34FGTgcOAb4OPA/4QlWNv+9NkhYt/y6fmZEo2iRJGoKHAscl2YSmJ8tJVfXpJO8AVlXV6cDRwAlJVgM3Agd1F64kaaGwaJMkaQ5U1YXAoydY/paB6Z8Czx9mXJKkhc972iRJkiSpxyzaJEmSJKnHLNokSZIkqccs2iRJkiSpxyzaJEmSJKnHLNokSZIkqccs2iRJkiSpxyzaJEmSJKnHplS0JfnzJJckuTjJR5NsnmTXJOckWZ3k40nu37Z9QDu/ul2/fD4/gCRJkiQtZpMWbUmWAq8GVlTVHsAmwEHAe4D3VdVuwE3AYe1bDgNuape/r20nSZIkSZqBqXaP3BTYIsmmwJbANcBTgJPb9ccBz2mnD2znadfvnyRzE64kSZIkjZZJi7aquhr4B+AKmmJtLXAucHNVrWubXQUsbaeXAle2713Xtt9hbsOWJEmSpNEwle6RD6S5erYr8MvAVsABs91xkpVJViVZtWbNmtluTpIkSZIWpal0j3wq8IOqWlNVPwc+CTwB2L7tLgmwE3B1O301sDNAu3474IbxG62qo6pqRVWtWLJkySw/hiRJkiQtTlMp2q4A9kuyZXtv2v7ApcBZwPPaNocAp7XTp7fztOu/UFU1dyFLkiRJ0uiYyj1t59AMKHIecFH7nqOANwCHJ1lNc8/a0e1bjgZ2aJcfDhwxD3FLkiRJ0kjYdPImUFVvBd46bvHlwL4TtP0p8PzZhyZJkiRJmlLRJmnhWX7Ef3cdwqz88N3P6DoESZKkXrBo07yycJAkSZJmZ6oP15YkSZIkdcCiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zNEjJWmOOWqqJEmaS15pkyRJkqQes2iTJGmWkuyc5Kwklya5JMlrJmjzpCRrk5zfvt7SRaySpIXH7pGSJM3eOuAvquq8JNsA5yb5XFVdOq7dl6vqmR3EJ0lawLzSJknSLFXVNVV1Xjt9K3AZsLTbqCRJi4VX2iRJmkNJlgOPBs6ZYPXjklwA/Bh4XVVdMsTQ1DMLedAiByyShsuiTZKkOZJka+AU4LVVdcu41ecBu1TVbUmeDvwXsPsGtrMSWAmwbNmyeYxYkrQQ2D1SkqQ5kGQzmoLtxKr65Pj1VXVLVd3WTp8BbJZkx4m2VVVHVdWKqlqxZMmSeY1bktR/Fm2SJM1SkgBHA5dV1Xs30OYhbTuS7EuTg28YXpSSpIXK7pGSJM3eE4CXABclOb9d9iZgGUBVfRB4HvCKJOuAO4GDqqq6CFaStLBYtEmSNEtV9RUgk7T5APCB4UQkaWMcBEYLjd0jJUmSJKnHLNokSZIkqccs2iRJkiSpxyzaJEmSJKnHHIhEkqQRtZAHYwAHZJA0OrzSJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST02paItyfZJTk7ynSSXJXlckgcl+VyS77f/PrBtmyTvT7I6yYVJ
9pnfjyBJkiRJi9dUr7T9M3BmVT0C2Bu4DDgC+HxV7Q58vp0HeBqwe/taCRw5pxFLkiRJ0giZtGhLsh3wW8DRAFV1V1XdDBwIHNc2Ow54Tjt9IHB8Nb4BbJ/koXMeuSRJkiSNgKlcadsVWAN8OMm3k3woyVbAg6vqmrbNT4AHt9NLgSsH3n9Vu2w9SVYmWZVk1Zo1a2b+CSRJkiRpEZtK0bYpsA9wZFU9Gride7tCAlBVBdR0dlxVR1XViqpasWTJkum8VZKkXkmyc5Kzklya5JIkr5mgjfd8S5JmZCpF21XAVVV1Tjt/Mk0Rd+1Yt8f23+va9VcDOw+8f6d2mSRJi9U64C+q6pHAfsArkzxyXBvv+ZYkzcikRVtV/QS4Msmvtov2By4FTgcOaZcdApzWTp8OvLQ9o7gfsHagG6UkSYtOVV1TVee107fSDNg1/tYA7/mWJM3IplNs92fAiUnuD1wOvIym4DspyWHAj4AXtG3PAJ4OrAbuaNtKkjQSkiwHHg2cM27Vhu759sSmJGmjplS0VdX5wIoJVu0/QdsCXjnLuCRJWnCSbA2cAry2qm6ZxXZW0nShZNmyZXMUnSRpoZrqc9okSdJGJNmMpmA7sao+OUGTKd/z7WBdkqRBFm2SJM1SktA8z/SyqnrvBpp5z7ckaUamek+bJEnasCcALwEuSnJ+u+xNwDKAqvog3vMtSZohizZJkmapqr4CZJI23vMtSZoRu0dKkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZI0B5Ick+S6JBdvYP2TkqxNcn77esuwY5QkLUybdh2AJEmLxLHAB4DjN9Lmy1X1zOGEI0laLLzSJknSHKiqs4Ebu45DkrT4WLRJkjQ8j0tyQZLPJPn1roORJC0Mdo+UJGk4zgN2qarbkjwd+C9g94kaJlkJrARYtmzZ8CKUJPWSV9okSRqCqrqlqm5rp88ANkuy4wbaHlVVK6pqxZIlS4YapySpfyzaJEkagiQPSZJ2el+aHHxDt1FJkhYCu0dKkjQHknwUeBKwY5KrgLcCmwFU1QeB5wGvSLIOuBM4qKqqo3AlSQuIRZskSXOgqg6eZP0HaB4JIEnStEy5e2SSTZJ8O8mn2/ldk5yTZHWSjye5f7v8Ae386nb98vkJXZIkSZIWv+nc0/Ya4LKB+fcA76uq3YCbgMPa5YcBN7XL39e2kyRJkiTNwJSKtiQ7Ac8APtTOB3gKcHLb5DjgOe30ge087fr9x268liRJkiRNz1SvtP0T8HrgnnZ+B+DmqlrXzl8FLG2nlwJXArTr17bt15NkZZJVSVatWbNmhuFLkiRJ0uI2adGW5JnAdVV17lzu2GfQSJIkSdLkpjJ65BOAZyd5OrA5sC3wz8D2STZtr6btBFzdtr8a2Bm4KsmmwHb4HBpJkiRJmpFJr7RV1RuraqeqWg4cBHyhql4EnEXzzBmAQ4DT2unT23na9V/wOTSSJEmSNDPTGT1yvDcAhydZTXPP2tHt8qOBHdrlhwNHzC5ESZIkSRpd03q4dlV9EfhiO305sO8EbX4KPH8OYpMkSZKkkTebK22SJEmSpHlm0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJknSHEhy
TJLrkly8gfVJ8v4kq5NcmGSfYccoSVqYLNokSZobxwIHbGT904Dd29dK4MghxCRJWgQs2iRJmgNVdTZw40aaHAgcX41vANsneehwopMkLWQWbZIkDcdS4MqB+avaZZIkbZRFmyRJPZNkZZJVSVatWbOm63AkSR2zaJMkaTiuBnYemN+pXXYfVXVUVa2oqhVLliwZSnCSpP6yaJMkaThOB17ajiK5H7C2qq7pOihJUv9t2nUAkiQtBkk+CjwJ2DHJVcBbgc0AquqDwBnA04HVwB3Ay7qJVJK00Fi0SZI0B6rq4EnWF/DKIYUjSVpE7B4pSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk9ZtEmSZIkST02adGWZOckZyW5NMklSV7TLn9Qks8l+X777wPb5Uny/iSrk1yYZJ/5/hCSJEmStFhN5UrbOuAvquqRwH7AK5M8EjgC+HxV7Q58vp0HeBqwe/taCRw551FLkiRJ0oiYtGirqmuq6rx2+lbgMmApcCBwXNvsOOA57fSBwPHV+AawfZKHznnkkiRJkjQCpnVPW5LlwKOBc4AHV9U17aqfAA9up5cCVw687ap22fhtrUyyKsmqNWvWTDNsSZIkSRoNUy7akmwNnAK8tqpuGVxXVQXUdHZcVUdV1YqqWrFkyZLpvFWSJEmSRsaUirYkm9EUbCdW1SfbxdeOdXts/72uXX41sPPA23dql0mSJEmSpmkqo0cGOBq4rKreO7DqdOCQdvoQ4LSB5S9tR5HcD1g70I1SkiRJkjQNm06hzROAlwAXJTm/XfYm4N3ASUkOA34EvKBddwbwdGA1cAfwsjmNWJIkSZJGyKRFW1V9BcgGVu8/QfsCXjnLuCRJkiRJTHP0SEmSJEnScFm0SZIkSVKPWbRJkiRJUo9ZtEmSJElSj1m0SZI0B5IckOS7SVYnOWKC9YcmWZPk/Pb18i7ilCQtPFMZ8l+SJG1Ekk2AfwV+B7gK+FaS06vq0nFNP15Vrxp6gJKkBc0rbZIkzd6+wOqquryq7gI+BhzYcUySpEXCok2SpNlbClw5MH9Vu2y85ya5MMnJSXYeTmiSpIXOok2SpOH4FLC8qvYCPgcct6GGSVYmWZVk1Zo1a4YWoCSpnyzaJEmavauBwStnO7XLfqGqbqiqn7WzHwIes6GNVdVRVbWiqlYsWbJkzoOVJC0sFm2SJM3et4Ddk+ya5P7AQcDpgw2SPHRg9tnAZUOMT5K0gDl6pCRJs1RV65K8CvgssAlwTFVdkuQdwKqqOh14dZJnA+uAG4FDOwtYkrSgWLRJkjQHquoM4Ixxy94yMP1G4I3DjkuStPDZPVKSJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6zKJNkiRJknrMok2SJEmSesyiTZIkSZJ6bF6KtiQHJPluktVJjpiPfUiS1CeT5b4kD0jy8Xb9OUmWDz9KSdJCNOdFW5JNgH8FngY8Ejg4ySPnej+SJPXFFHPfYcBNVbUb8D7gPcONUpK0UM3HlbZ9gdVVdXlV3QV8DDhwHvYjSVJfTCX3HQgc106fDOyfJEOMUZK0QM1H0bYUuHJg/qp2mSRJi9VUct8v2lTVOmAtsMNQopMkLWibdrXjJCuBle3sbUm+21Usc2BH4Pr52HDsPDOZeTv24PGfhMe+Owv92O8y73tY4BZRjlzov6sLmce+Ox77bi3kv8s3mB/n
o2i7Gth5YH6ndtl6quoo4Kh52P/QJVlVVSu6jmMUeey747Hvjse+l6aS+8baXJVkU2A74IaJNrZYcqS/q93x2HfHY9+txXr856N75LeA3ZPsmuT+wEHA6fOwH0mS+mIque904JB2+nnAF6qqhhijJGmBmvMrbVW1LsmrgM8CmwDHVNUlc70fSZL6YkO5L8k7gFVVdTpwNHBCktXAjTSFnSRJk5qXe9qq6gzgjPnYdk8t+C4sC5jHvjse++547HtootxXVW8ZmP4p8Pxhx9Uxf1e747Hvjse+W4vy+MeeGZIkSZLUX/NxT5skSZIkaY5YtEmSJElSj1m0SZIkSVKPdfZw7YUsyQOA5wLLGTiGVfWOrmIaJUn+YILFa4GLquq6YcczSpJcBIy/EXYtsAr4m6qa8JlTmr0kJ1TVSyZbJnXJ/Ngt82N3zI/dGoUcadE2M6fR/Ec8F/hZx7GMosOAxwFntfNPovlZ7JrkHVV1QleBjYDPAHcDH2nnDwK2BH4CHAs8q5uwRsKvD84k2QR4TEexSBtifuyW+bE75sduLfocadE2MztV1QFdBzHCNgV+raquBUjyYOB44DeAswGT0vx5alXtMzB/UZLzqmqfJC/uLKpFLMkbgTcBWyS5ZWwxcBeLdFhjLWjmx26ZH7tjfuzAKOVI72mbma8l2bPrIEbYzmMJqXVdu+xG4OcdxTQqNkmy79hMksfSPEgYYF03IS1uVfWuqtoG+Puq2rZ9bVNVO1TVG7uOTxrH/Ngt82N3zI8dGKUc6ZW2mXkicGiSH9B0/whQVbVXt2GNjC8m+TTwiXb+ue2yrYCbuwtrJLwcOCbJ1jS/97cAL2+P/bs6jWyRq6o3JlkK7ML69wqd3V1U0n2YH7tlfuyO+bFDo5Ajfbj2DCTZZaLlVfWjYccyipKEJhE9oV30VeCU8pd5aJJsB1BVa7uOZVQkeTfNPRKX0tw3Ac0fw8/uLippfebHbpkfu2d+7MYo5EiLthlKsjfwm+3sl6vqgi7jkYbBkeG6k+S7wF5V5eAO6jXzo0aR+bFbo5AjvadtBpK8BjgR+KX29Z9J/qzbqEZHkj9I8v0ka5PckuTWgZtPNb9OAw6k6Z9/+8BL8+9yYLOug5A2xvzYLfNjp8yP3Vr0OdIrbTOQ5ELgcVV1ezu/FfB1++wPR5LVwLOq6rKuYxk1SS6uqj26jmOUJPkXmmf/LAX2Bj7PwFDqVfXqjkKT7sP82C3zY3fMj90YpRzpQCQzE+7tL0s7nY5iGUXXmpA687Uke1bVRV0HMkJWtf+eC5zeZSDSFJgfu2V+7I75sRsjkyO90jYDSQ4HDgFOpUlGBwLHVtU/dRrYiEjyz8BDgP9i/bMpn+wsqBGR5FJgN8CR4STdh/mxWzkPRy4AAA8GSURBVObH7pgfNd8s2mYoyT40QxsX8JWq+nbHIY2MJB+eYHFV1R8NPZgR48hw3UlyEc33zaC1NGcZ/6aqbhh+VNJ9mR+7Y37sjvmxW6OQI+0eOXN30/xyFHBPx7GMlKp6WdcxjJok21bVLcCtXccywj5D873zkXb+IGBL4CfAscCzuglLug/zY0fMj8NnfuyNRZ8jvdI2A+3oWH8MnEJz+fv3gaOq6l86DWyRS/L6qvq7gZtO17OYbjbtmySfrqpntg/MLda/R6Wq6mEdhTYykpxXVftMtCzJRVW1Z1exSWPMj90wP3bH/NgPo5AjvdI2M4cBvzEwOtZ7gK8DJqX5NXZz9aqNttKcq6pntv/u2nUsI2yTJPtW1TcBkjwW2KRdt667sKT1mB+7YX7siPmxNxZ9jrRomxlHx+pAVX2q/fe4rmMZZUmWAruw/sNDz+4uopHxcuCYJFvTfN/cAry8HVL9XZ1GJt3L/NgB82M/mB87tehzpEXbzHwYOCfJqe38c4CjO4xnpCR5OPA6YDnrfzE+pauYRkV71vyFwKXc+4dZASaleVZV3wL2TLJdO792YPVJ3UQl3Yf5sUPmx+6YH7s1
CjnSe9pmKMljgCe0s192dKzhSXIB8EGaZ3L84oxuVZ3bWVAjIsl3gb2q6meTNtacSPLiqvrPdij1+6iq9w47JmljzI/dMT92x/zYjVHKkV5pm7nzgWtoj2GSZVV1RbchjYx1VXVk10GMqMuBzRh4/o/m3Vbtv9t0GoU0debH7pgfu2N+7MbI5EivtM1Akj8D3gpcy7399X2A4pAkeRtwHc3DWwcfHnpjVzGNiiSnAHsDn2f9Y+/IZJLMjx0zP3bH/Kj5ZtE2A0lW04yOteAf1LcQtcPqjuewukOQ5JCJlnvz+/xJ8paNrK6q+uuhBSNNwvzYLfNjd8yP3RilHGn3yJm5kuYp6+qAw+p2I8kmwO9W1Yu6jmXE3D7Bsq1ohlbfAVg0CUmLgvmxQ+bHbpgfOzUyOdKibWYuB76Y5L9Z/xL4ornZsc+SbAkcDiyrqpVJdgd+tao+3XFoi1pV3Z1klyT3r6q7uo5nVFTVP45NJ9kGeA3wMuBjwD9u6H1SR8yPHTI/dsP82J1RypEWbTNzRfu6f/vScH2YZmSsx7fzVwOfAExK8+9y4KtJTmfg7JZ/kM2vJA+i+UPsRcBxwD5VdVO3UUkTMj92y/zYHfNjR0YlR1q0TVN7CXybqnpd17GMsF+pqhcmORigqu5I4sNbh+N/29f9GIGRmvogyd8DfwAcBexZVbd1HJI0IfNjL5gfu2N+7MAo5UgHIpmBJF+vqsd1HceoSvI1YH/gq1W1T5JfAT5aVft2HNrISLI1wGL+cuyLJPfQdDNbR/Og1l+sornJettOApMmYH7slvmxe+bH4RqlHOmVtpk5v738/QnWvwT+ye5CGilvA84Edk5yIs1DXA/tMqBRkWQP4ATgQe389cBLq+qSTgNbxKrqfl3HIE2D+bFbb8P82AnzYzdGKUd6pW0Gknx4gsVVVX809GBGVJIdgP1ozqR8o6qu7zikkdCexf2/VXVWO/8k4J1V9fiNvlHSSDA/ds/82A3zo+abRZsWnCSfAj4CnF5VEw31qnmS5IKq2nuyZZKk4TM/dsf8qPk2MpcU51KShyf5fJKL2/m9kry567hGyD8AvwlcmuTkJM9LsnnXQY2Iy5P8VZLl7evNNCNmSZL5sXvmx+6YHzWvvNI2A0m+BPwl8O9V9eh22cVVtUe3kY2WdqSypwB/DBywmG427askDwTeDjyxXfRl4G2LcWhdSdNnfuwH8+PwmR813xyIZGa2rKpvjhtFd11XwYyiJFsAzwJeCOxD81wOzbM2+by66zgk9Zb5sWPmx26YHzXfLNpm5vp2GN0CSPI84JpuQxodSU4C9qUZIesDwJeq6p5uo1rc2tHgNqiqnj2sWCT1mvmxQ+bH4TM/aljsHjkDSR5G8xC/xwM3AT8AXlRVP+o0sBGR5PeA/1dVd3cdy6hIsga4EvgocA7NqGS/UFVf6iIuSf1ifuyW+XH4zI8aFgcimZmqqqcCS4BHVNUT8VjOuySvB6iqzwJ/MG7dOzsJanQ8BHgTsAfwz8DvANdX1ZdMSJIGmB87YH7slPlRQ+EX6cycAlBVt1fVre2ykzuMZ1QcNDD9xnHrDhhmIKOmqu6uqjOr6hCa5/+sBr6Y5FUdhyapX8yP3TA/dsT8qGHxnrZpSPII4NeB7ZIMnsnaFnBI3fmXDUxPNK85luQBwDOAg4HlwPuBU7uMSVI/mB87Z37skPlRw2DRNj2/CjwT2J5mZKYxt9IMq6v5VRuYnmhecyjJ8TRdP84A3l5VF3cckqR+MT92y/zYEfOjhsWBSKYhyXuq6g1J3lJV7+g6nlGT5G7gdpqzhlsAd4ytAjavqs26im2xS3IPzbGH9f8ACM09LD4DSBph5sdumR+7Y37UsFi0TUOSi4C9gHOrap+u45EkqQ/Mj5I0v+weOT1n0gxhvHWSW1i/n/g9VbVdN2FJktQp86MkzSOvtM1AktOq6sCB+d8EDq6qP+0wLEmSOmV+lKT54ZW2GaiqA5M8mmaUoBfQ
PDz0lG6jkiSpW+ZHSZofFm3TkOThNInoYOB64OM0Vyuf3GlgkiR1yPwoSfPL7pHT0I4Q9GXgsKpa3S67vKoe1m1kkiR1x/woSfPrfl0HsMD8AXANcFaS/0iyPz60UpIk86MkzSOvtM1Akq2AA2m6gTwFOB44tar+p9PAJEnqkPlRkuaHRdssJXkg8HzghVW1f9fxSJLUB+ZHSZo7Fm2SJEmS1GPe0yZJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0aZZSXJ3kvOTXJzkE0m2HFj3nCSV5BEDy5YnubN9z6VJjk+yWZLfa5edn+S2JN9tp4/fyL6fmOSbSb7TvlYOrHtbkqsHtvnujWzni+3+Lmy384Ek249rs95nSbJ523bPgTZ/meTf2+mHJvn0FI7f1yZZ/8UkKybbzlyYyr6SvHbwZzzVdknOGH9MZxjjnkmOne12JGkYzJHmyKm0M0dqKizaNFt3VtWjqmoP4C7gTwbWHQx8pf130P9W1aOAPYGdgBdU1Wfb7TwKWAW8qJ1/6UQ7TfIQ4CPAn1TVI4AnAv8nyTMGmr1vbJtVdcQkn+NFVbUXsBfwM+C0cevX+yxV9VPgtcC/pbG0/exj+zkc+I9J9klVPX6yNj3zWmDShDS+XVU9vapunu3Oq+oiYKcky2a7LUkaAnOkOXLSduZITYVFm+bSl4HdAJJsTZMkDgMOmqhxVd0NfBNYOoN9vRI4tqrOa7d1PfB67k0IM1JVd7XbWZZkb9jwZ6mqM2keJvtS4H3A26rqpnb1c4Ez2/cfmuS09izd95O8dWwbSW4bmH5DkouSXDD+rGeS+yU5NsnfTPC+542dWWvbfDDJqiTfS/LMDX3WJFsk+ViSy5KcCmwxsO7IdhuXJHl7u+zVwC/TPDz3rHbZ7yb5epLz2rPIW2+g3Q+T7NieRf5OG+f3kpyY5KlJvtoem33b9lslOaY9S/ztJAcOhP4pNvA7JUk9Zo40R5ojNWMWbZoTSTYFngZc1C46EDizqr4H3JDkMRO8Z3PgN2i/uKfp14Fzxy1b1S4f8+e5t+vH7011w22ivAAY67Kysc/yWuBvgSVVdQJAkl2Bm6rqZwPt9qVJUnsBz8+4LhZJntbu5zeqam/g7wZWbwqcCHy/qt48hY+wvN3fM4APtsd5Iq8A7qiqXwPeCgx+rv9bVSvaeH87yV5V9X7gx8CTq+rJSXYE3gw8tar2oTn+h49vN8F+dwP+keb4PgL4Q5qE/zrgTWP7B75QVfsCTwb+Ps1De2n385tTOA6S1AvmSHMk5kjNkkWbZmuLJOfTfElcARzdLj8Y+Fg7/THW7/7xK+17rgWuqaoL5ym2wa4fn53mezMwvcHPUlU/Br4AHDnQ/qHAmnHb+1xV3VBVdwKfpPkCHvRU4MNVdUe73RsH1v07cHFV/e0UYz+pqu6pqu8Dl3NvYh3vt4D/bPd3ITD4c3hBkvOAb9Mk+UdO8P792uVfbX+ehwC7TCG+H1TVRVV1D3AJ8PlqHhh5EU0yBfhd4Ih2u18ENgfGuntcR3OWUpL6zhxpjjRHak5s2nUAWvDubPvY/0KSBwFPAfZMUsAmQCX5y7bJ/1bVo9qzUF9N8uyqOn2a+72U5qzXYL/6x9B8wc1Kkk1o7iW4bGOfpe59Mv097WvMnTRfoIPGP8V+Ok+1/xrw5CT/2N4nMP79c7mvsbOgrwMeW1U3td1KJjoTGZpEO/5+jMkMnl29Z2D+Hu79Tgrw3Kr67gTv35zmGEtS35kjzZHmSM0Jr7RpPjwPOKGqdqmq5VW1M/ADxl2ub/vYHwG8cQb7+Ffg0CSPAkiyA/Ae1u8yMW1JNgPeBVzZnlmb0mcZ53vcezZszO8keVCSLYDnAF8dt/5zwMvSjibVJsIxRwNnACe1XWwArk3ya0nuB/z+uG09v+3f/yvAw4CJvtQBzqbpdkGSPWi6eQBsC9wOrE3yYJouPWNuBbZpp78BPCHJ2D0aWyV5+ATtZuKzwJ8lSbvtRw+s
ezhw8Sy2LUldMkeaI82RmjaLNs2Hg4FTxy07hfuOkAXwX8CWSabV/7qqrgFeDPxHku/QnGk7pqo+NYN4AU5MciHNF91WNH3nYXqfZSy224H/Hfuibn2zfd+FwClVtWrce84ETgdWtd0dXjdu/XtpumGc0CahI4BP03zua8aFcEW7v8/QjBz2UyZ2JLB1ksuAd9De/1BVF7T7+g7N6GODyfMo4MwkZ1XVGuBQ4KPtsfs693Yz+UW7Dex7Mn8NbAZcmOSSdn7Mk4H/nuF2Jalr5khzpDlS05Z7r15LmitJfh94TFW9OcmhwIqqetUQ9nss8OmqOnm+99WFJA8AvgQ8sarWdR2PJGn6zJHzwxy5uHlPmzQPqurUtjuK5tYy4AiTkSQtXObIeWOOXMS80qZea4chfs+4xT+oqvF91KeyrVOBXcctfsMMRs1aUObyGEqS+sMcOXvmSC0UFm2SJEmS1GMORCJJkiRJPWbRJkmSJEk9ZtEmSZIkST1m0SZJkiRJPWbRJkmSJEk99v8BeI/2onUP18QAAAAASUVORK5CYII=\n", "text/plain": [ "<Figure size 1080x288 with 2 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, axs = plt.subplots(ncols=2, figsize=(15, 4))\n", "fm.groupby(\"PART_OF_DAY(pickup_datetime)\")[\"trip_duration\"].mean().plot.bar(title=\"Trip duration by Time of Day\", ax=axs[0])\n", "fm.groupby(\"PART_OF_DAY(pickup_datetime)\")[\"CITYBLOCK_DISTANCE(dropoff_latlong, pickup_latlong)\"].mean().plot.bar(title=\"Trip Distance by Time of Day\", ax=axs[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trips are shorter in time during federal holidays\n", "\n", "This may be because fewer people are on the road on a holiday." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x15faa82e8>" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fm.groupby(\"IS_FEDERAL_HOLIDAY(pickup_datetime)\")[\"trip_duration\"].mean().plot.bar(title=\"Trip duration by Is Federal Holiday\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6: Apply feature engineering and modeling to new data\n", "\n", "Once we are happy with our model, we can apply it to new data. Below we load roughly 625,000 new trips whose duration is unknown (note: `trip_duration` is no longer a column in the data)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "es_test = ft.entityset.read_entityset(\"s3://featurelabs-static/nyc_taxi_entityset_test.tar\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n", "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n", " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n", "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n", " -->\n", "<!-- Title: taxi Pages: 1 -->\n", "<svg width=\"246pt\" height=\"137pt\"\n", " viewBox=\"0.00 0.00 246.38 137.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n", "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 133)\">\n", "<title>taxi</title>\n", "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-133 242.3799,-133 242.3799,4 -4,4\"/>\n", "<!-- trips -->\n", "<g id=\"node1\" class=\"node\">\n", "<title>trips</title>\n", "<polygon fill=\"none\" stroke=\"#000000\" 
points=\"0,-.5 0,-128.5 238.3799,-128.5 238.3799,-.5 0,-.5\"/>\n", "<text text-anchor=\"middle\" x=\"119.1899\" y=\"-113.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">trips</text>\n", "<polyline fill=\"none\" stroke=\"#000000\" points=\"0,-106.5 238.3799,-106.5 \"/>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-91.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">id : index</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-77.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">pickup_datetime : datetime_time_index</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-63.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">store_and_fwd_flag : boolean</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-49.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">passenger_count : ordinal</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-35.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">vendor_id : categorical</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-21.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">pickup_latlong : latlong</text>\n", "<text text-anchor=\"start\" x=\"8\" y=\"-7.3\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">dropoff_latlong : latlong</text>\n", "</g>\n", "</g>\n", "</svg>\n" ], "text/plain": [ "<graphviz.dot.Digraph at 0x148ad9198>" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "es_test.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can reuse the feature definitions that we created earlier to rerun feature engineering on this dataset. You can read more about saving and loading feature definitions in the [deployment guide](https://featuretools.featurelabs.com/guides/deployment.html#saving-features). 
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Elapsed: 02:17 | Progress: 100%|██████████| Remaining: 00:00\n" ] } ], "source": [ "# remove trip_duration from the features to calculate\n", "features = [f for f in features_encoded if f.get_name() != \"trip_duration\"]\n", "\n", "fm_test = ft.calculate_feature_matrix(entityset=es_test,\n", " features=features,\n", " chunk_size=.1,\n", " verbose=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "with the new feature matrix in hand, we are ready to reapply our estimator that was previously trained to generate predictions" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.\n", "[Parallel(n_jobs=8)]: Done 34 tasks | elapsed: 4.2s\n", "[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed: 12.2s finished\n" ] }, { "data": { "text/plain": [ "id\n", "id0000002 1026.497582\n", "id0000006 681.823890\n", "id0000007 791.126881\n", "id0000017 1029.409864\n", "id0000018 2454.557293\n", " ... \n", "id3999960 562.255700\n", "id3999966 1464.219648\n", "id3999967 244.016590\n", "id3999981 649.003684\n", "id3999997 1333.935868\n", "Length: 625134, dtype: float64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "preds = estimator.predict(fm_test)\n", "preds = np.exp(preds) - 1 # undo log transform\n", "preds = pd.Series(preds, index=fm_test.index)\n", "preds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<p>\n", " <img src=\"https://www.featurelabs.com/wp-content/uploads/2017/12/logo.png\" alt=\"Featuretools\" />\n", "</p>\n", "\n", "Featuretools was created by the developers at [Feature Labs](https://www.featurelabs.com/). 
If building impactful data science pipelines is important to you or your business, please [get in touch](https://www.featurelabs.com/contact/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }