{ "metadata": { "name": "", "signature": "sha256:27c8ab1081d88e3708b11d220568fcde04bd5aa8639869b9c0b842d0fb5239d3" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Wherefore art thou CitiBike?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Citibike is NYC's new(ish) and very popular bike share program. Citibike customers can take a bike from any station, and leave it at any other station. It's natural to assume that after some time, certain areas of the city will end up with a high density of bikes, and others areas will have few bikes. Moreover, this relative density is likely to fluctuate in both space and time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following is an attempt to visualize the approximate density of available bikes throughout the day, using Citibike's own system data available at: http://citibikenyc.com/system-data" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Import Statements" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# data handling and formatting\n", "import pandas as pd\n", "import numpy as np\n", "import datetime as dt\n", "from copy import copy, deepcopy\n", "\n", "# plotting\n", "import matplotlib.pyplot as plt\n", "import matplotlib.animation as animation\n", "import matplotlib.figure as fig\n", "import matplotlib.image as mpimg\n", "\n", "# image creation / modification \n", "import scipy.ndimage as ndi\n", "from scipy import misc\n", "from matplotlib.colors import LinearSegmentedColormap\n", "\n", "# mapping / shapefile stuff\n", "from mpl_toolkits.basemap import Basemap\n", "from shapelib import ShapeFile\n", "import dbflib" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the path to the Citibike data file. This is the first available file on http://citibikenyc.com/system-data , and spans all of July 2013 so not all of the current citibike stations are going to be active in this file." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data_path = '2013-07-CitiBikeTripData.csv'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameter time_increment refers to the amount of time (minutes) by which we step forward in each frame of the animation, and time_window_size is the size of the time window we're looking at in each frame. number_of_days is the number of days' worth of data that should be used in the visualization." ] }, { "cell_type": "code", "collapsed": false, "input": [ "time_increment = 5 \n", "time_window_size = 10\n", "number_of_days = 3 # the number of days worth of data that should be used in the visualization" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The boundary lat/lon values of the map are chosen to give a decent bounding box around lower Manhattan and much of northwestern brooklyn. This is the area we'll be looking in." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lon0 = -74.0417\n", "lon1 = -73.8982\n", "lat0 = 40.6851\n", "lat1 = 40.7721" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following parameters are for use with a grid-density gaussian filter algorithm, where image_width and image_height refer to the width and height (in pixels) of a grid (2D array) that we will use to represent the geographic area we're looking at. \n", "\n", "The gaussian_filter_radius parameter is the radius we'll be feeding into the scipy.ndimage.gaussianfilter() method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "image_width = 200\n", "image_height = 200\n", "gaussian_filter_radius = 3" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data comes in the form of a large CSV file, so use Pandas' read_csv method to read the data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# read in the undocking data\n", "undocking_dataframe = pd.read_csv(data_path, usecols=[1,5,6], index_col=0, parse_dates=True)\n", "#read in the docking datax\n", "docking_dataframe_unsorted = pd.read_csv(data_path, usecols=[2,9,10], index_col=0, parse_dates=True)\n", "docking_dataframe = docking_dataframe_unsorted.sort_index()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have two dataframes, undocking_dataframe and docking_dataframe each indexed by a timestamp (sorted earliest to latest), and containing a latitude and a longitude. See below for an example." ] }, { "cell_type": "code", "collapsed": false, "input": [ "undocking_dataframe.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
start station latitudestart station longitude
starttime
2013-07-01 00:00:00 40.753231-73.970325
2013-07-01 00:00:02 40.749718-74.002950
2013-07-01 00:01:04 40.730287-73.990765
2013-07-01 00:01:06 40.718939-73.992663
2013-07-01 00:01:10 40.734927-73.992005
\n", "

5 rows \u00d7 2 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " start station latitude start station longitude\n", "starttime \n", "2013-07-01 00:00:00 40.753231 -73.970325\n", "2013-07-01 00:00:02 40.749718 -74.002950\n", "2013-07-01 00:01:04 40.730287 -73.990765\n", "2013-07-01 00:01:06 40.718939 -73.992663\n", "2013-07-01 00:01:10 40.734927 -73.992005\n", "\n", "[5 rows x 2 columns]" ] } ], "prompt_number": 7 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "mpl_toolkits.basemap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need a map on which to plot our geographic data. We use mpl_toolkits.basemap.Basemap to instantiate a map, using our latitude and longitude bounds." ] }, { "cell_type": "code", "collapsed": false, "input": [ "m = Basemap(llcrnrlon=lon0, llcrnrlat=lat0, urcrnrlon=lon1, urcrnrlat=lat1, resolution='h', lat_0=lat0, lon_0=lon0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create the effect of a more detailed map, we add a background image to the basemap." ] }, { "cell_type": "code", "collapsed": false, "input": [ "osm_image = misc.imread('map.png')\n", "#plot it onto the map\n", "m.imshow(osm_image, extent=[lon0, lat0, lon1, lat1], origin='upper', aspect='auto')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "" ] } ], "prompt_number": 9 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Custom Colormap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a custom colormap for the m.imshow() method to use. This will display high numbers as red, low numbers as blue, and in-the-middle numbers as clear." ] }, { "cell_type": "code", "collapsed": false, "input": [ "bwr_cmap = plt.get_cmap('bwr')\n", "cdict_new = bwr_cmap._segmentdata.copy()\n", "cdict_new['alpha'] = ((0.0, 1.0, 1.0),\n", " # (0.25,1.0, 1.0),\n", " (0.6, 0.4, 0.0),\n", " # (0.75,1.0, 1.0),\n", " (1.0, 1.0, 1.0))\n", "blue_red1 = LinearSegmentedColormap('BlueRed1', cdict_new)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Grid Density Gaussian Filter Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method here is adapted from this StackOverflow question: \n", "http://stackoverflow.com/questions/6652671/efficient-method-of-calculating-density-of-irregularly-spaced-points\n", "\n", "As a slight modification, the algorithm is broken into three methods:\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# return the gaussian filtered data for a given current_img\n", "def grid_density_gaussian_filter_img(current_img, radius):\n", "\t\tr = radius\n", "\t\treturn ndi.gaussian_filter(current_img, (r,r)) #apply gaussian filter to the image and return the ndimage\n", "\n", "# go through and bring all image data closer to zero\n", "def decay_img(current_img): \n", "\t\tfor index,value in np.ndenumerate( current_img ):\n", "\t\t\tif value > 0:\n", "\t\t\t\tcurrent_img[index] -= 1\n", "\t\t\tif value < 0:\n", "\t\t\t\tcurrent_img[index] += 1\n", "\t\treturn current_img\n", "\t\t\t\n", "# add 'positive_data' and subtract 'negative data' to previous_img and return a new image with those values\n", "def iterate_image(previous_img, x0, y0, x1, y1, w, h, positive_data, negative_data):\n", "\t\timgw = previous_img.shape[0]\n", "\t\timgh = previous_img.shape[1]\n", "\t\tnew_img = copy(previous_img)\n", "\t\tkx = (imgw - 1) / (x1 - x0)\n", "\t\tky = (imgh - 1) / (y1 - y0)\n", "\t\tfor x, y in positive_data:\n", "\t\t\t\tix = int((x - x0) * kx) \n", "\t\t\t\tiy = int((y - y0) * ky)\n", "\t\t\t\tif 0 <= ix < imgw and 0 <= iy < imgh:\n", "\t\t\t\t\tnew_img[iy][ix] += 1\n", "\t\tfor x, y in negative_data:\n", "\t\t\t\tix = int((x - x0) * kx)\n", "\t\t\t\tiy = int((y - y0) * ky)\n", "\t\t\t\tif 0 <= ix < imgw and 0 <= iy < imgh:\n", "\t\t\t\t\tnew_img[iy][ix] -= 1\n", "\t\treturn new_img" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Create An Array Of matplotlib.image.AxesImage Objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate an animated heatmap of all the docking or undocking events in the CitiBike system, one approach is to generate an array of objects of the type matplotlib.image.AxesImage and then call Matplotlib's animation.ArtistAnimation method on that array to generate an animation. Below is the code for the hard part -- building the array of AxesImages from our data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# find the earliest undocking timestamp in the database -- this will be the start time of the data used for the animation\n", "initial_start_date_undocking = undocking_dataframe.index[0]\n", "\n", "# ims will hold the series of images that make up the animation\n", "ims = []\n", "\n", "# determine the number of frames in the animation\n", "num_frames = (number_of_days*24*60)/time_increment" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(num_frames):\n", "\t# find the indices of the next time windows for undocking events\n", "\tnext_start_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=time_increment*i) )\n", "\tnext_end_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=(time_window_size + time_increment*i) ) )\n", "\t# grab the next time windows for undocking events\n", "\tnext_start_date_undocking = undocking_dataframe.index[next_start_date_undocking_index]\n", "\tnext_end_date_undocking = undocking_dataframe.index[next_end_date_undocking_index]\n", "\t\n", "\t# find the indices of the next time windows for re-docking events\n", "\tnext_start_date_docking_index = docking_dataframe.index.searchsorted( next_start_date_undocking )\n", "\tnext_end_date_docking_index = docking_dataframe.index.searchsorted( next_end_date_undocking )\n", "\t# grab the next time windows for undocking events\n", "\tnext_start_date_docking = docking_dataframe.index[next_start_date_docking_index]\n", "\tnext_end_date_docking = docking_dataframe.index[next_end_date_docking_index]\n", "\t\n", "\t# get the next few undocking events\n", "\tnext_data_undocking = undocking_dataframe[next_start_date_undocking:next_end_date_undocking]\n", "\tx_udata = next_data_undocking['start station longitude']\n", "\ty_udata = next_data_undocking['start station latitude']\n", "\tx_undocking, y_undocking = m(x_udata,y_udata)\n", "\t\n", "\t# get the next few docking events\n", "\tnext_data_docking = docking_dataframe[next_start_date_docking:next_end_date_docking]\n", "\tx_ddata = next_data_docking['end station longitude']\n", "\ty_ddata = next_data_docking['end station latitude']\n", "\tx_docking, y_docking = m(x_ddata,y_ddata)\t\n", "\t\n", "\t# get the new heatmap image\n", "\tif i == 0:\n", "\t\tnew_heatmap_image = np.zeros((image_width, image_height))\n", "\telse:\n", "\t\tnew_heatmap_image = iterate_image(heatmap_image, lon0, lat0, lon1, lat1, image_width, image_height, zip(x_docking,y_docking), zip(x_undocking,y_undocking))\n", "\t\n", "\tif i % 100 == 0:\n", "\t\tnew_heatmap_image = decay_img(new_heatmap_image)\n", "\t\t\n", "\t# generate the z dimension with the newlt iterated image\n", "\tz_dimension = grid_density_gaussian_filter_img(new_heatmap_image, gaussian_filter_radius)\n", "\t\n", "\t#reset the heatmap_image variable\n", "\theatmap_image = new_heatmap_image\n", "\t\n", "\t# produce the im from this data\n", "\theatmap_im = m.imshow(z_dimension , origin='lower', extent=[lon0, lat0, lon1, lat1], alpha=.4, vmin=-1, vmax=1, cmap=blue_red1, aspect='auto')\n", "\t# find the text string for displaying time/date\n", "\tcurrent_date = initial_start_date_undocking + dt.timedelta(minutes=time_increment*i)\n", "\tcurrent_date_string = current_date.strftime('%I:%M%p on %a %b %d')\n", "\tcurrent_date_plotted = plt.text(-73.9509, 40.6909, current_date_string, fontsize=14)\n", "\t# append all Artist instances to the array of drawn images\n", "\tims.append([current_date_plotted,heatmap_im])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Animate the Images" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# set the figure size\n", "plt.gcf().set_size_inches(15.0, 12.0)\n", "plt.gcf().subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)\n", "\n", "# run the animation\t\n", "anim = animation.ArtistAnimation(plt.gcf(), ims, interval=50, blit=True, repeat=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "#plot all of these points\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 } ], "metadata": {} } ] }