{
 "metadata": {
  "name": "",
  "signature": "sha256:27c8ab1081d88e3708b11d220568fcde04bd5aa8639869b9c0b842d0fb5239d3"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Wherefore art thou CitiBike?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Citibike is NYC's new(ish) and very popular bike share program. Citibike customers can take a bike from any station, and leave it at any other station. It's natural to assume that after some time, certain areas of the city will end up with a high density of bikes, and others areas will have few bikes. Moreover, this relative density is likely to fluctuate in both space and time."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following is an attempt to visualize the approximate density of available bikes throughout the day, using Citibike's own system data available at: http://citibikenyc.com/system-data"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Import Statements"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# data handling and formatting\n",
      "import pandas as pd\n",
      "import numpy as np\n",
      "import datetime as dt\n",
      "from copy import copy, deepcopy\n",
      "\n",
      "# plotting\n",
      "import matplotlib.pyplot as plt\n",
      "import matplotlib.animation as animation\n",
      "import matplotlib.figure as fig\n",
      "import matplotlib.image as mpimg\n",
      "\n",
      "# image creation / modification \n",
      "import scipy.ndimage as ndi\n",
      "from scipy import misc\n",
      "from matplotlib.colors import LinearSegmentedColormap\n",
      "\n",
      "# mapping / shapefile stuff\n",
      "from mpl_toolkits.basemap import Basemap\n",
      "from shapelib import ShapeFile\n",
      "import dbflib"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Parameters"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Set the path to the Citibike data file. This is the first available file on http://citibikenyc.com/system-data , and spans all of July 2013 so not all of the current citibike stations are going to be active in this file."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "data_path = '2013-07-CitiBikeTripData.csv'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The parameter <code>time_increment</code> refers to the amount of time (minutes) by which we step forward in each frame of the animation, and <code>time_window_size</code> is the size of the time window we're looking at in each frame. <code>number_of_days</code> is the number of days' worth of data that should be used in the visualization."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "time_increment = 5 \n",
      "time_window_size = 10\n",
      "number_of_days = 3 # the number of days worth of data that should be used in the visualization"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The boundary lat/lon values of the map are chosen to give a decent bounding box around lower Manhattan and much of northwestern brooklyn. This is the area we'll be looking in."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "lon0 = -74.0417\n",
      "lon1 = -73.8982\n",
      "lat0 = 40.6851\n",
      "lat1 = 40.7721"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following parameters are for use with a grid-density gaussian filter algorithm, where <code>image_width</code> and <code>image_height</code> refer to the width and height (in pixels) of a grid (2D array) that we will use to represent the geographic area we're looking at. \n",
      "\n",
      "The <code>gaussian_filter_radius</code> parameter is the radius we'll be feeding into the <code>scipy.ndimage.gaussianfilter()</code> method."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "image_width = 200\n",
      "image_height = 200\n",
      "gaussian_filter_radius = 3"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Pandas"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The data comes in the form of a large CSV file, so use Pandas' <code>read_csv</code> method to read the data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# read in the undocking data\n",
      "undocking_dataframe = pd.read_csv(data_path, usecols=[1,5,6], index_col=0, parse_dates=True)\n",
      "#read in the docking datax\n",
      "docking_dataframe_unsorted = pd.read_csv(data_path, usecols=[2,9,10], index_col=0, parse_dates=True)\n",
      "docking_dataframe = docking_dataframe_unsorted.sort_index()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We now have two dataframes, <code>undocking_dataframe</code> and <code>docking_dataframe</code> each indexed by a timestamp (sorted earliest to latest), and containing a latitude and a longitude. See below for an example."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "undocking_dataframe.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>start station latitude</th>\n",
        "      <th>start station longitude</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>starttime</th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>2013-07-01 00:00:00</th>\n",
        "      <td> 40.753231</td>\n",
        "      <td>-73.970325</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2013-07-01 00:00:02</th>\n",
        "      <td> 40.749718</td>\n",
        "      <td>-74.002950</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2013-07-01 00:01:04</th>\n",
        "      <td> 40.730287</td>\n",
        "      <td>-73.990765</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2013-07-01 00:01:06</th>\n",
        "      <td> 40.718939</td>\n",
        "      <td>-73.992663</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2013-07-01 00:01:10</th>\n",
        "      <td> 40.734927</td>\n",
        "      <td>-73.992005</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 2 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "                     start station latitude  start station longitude\n",
        "starttime                                                           \n",
        "2013-07-01 00:00:00               40.753231               -73.970325\n",
        "2013-07-01 00:00:02               40.749718               -74.002950\n",
        "2013-07-01 00:01:04               40.730287               -73.990765\n",
        "2013-07-01 00:01:06               40.718939               -73.992663\n",
        "2013-07-01 00:01:10               40.734927               -73.992005\n",
        "\n",
        "[5 rows x 2 columns]"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "mpl_toolkits.basemap"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we need a map on which to plot our geographic data. We use <code>mpl_toolkits.basemap.Basemap</code> to instantiate a map, using our latitude and longitude bounds."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "m = Basemap(llcrnrlon=lon0, llcrnrlat=lat0, urcrnrlon=lon1, urcrnrlat=lat1, resolution='h', lat_0=lat0, lon_0=lon0)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To create the effect of a more detailed map, we add a background image to the basemap."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "osm_image = misc.imread('map.png')\n",
      "#plot it onto the map\n",
      "m.imshow(osm_image, extent=[lon0, lat0, lon1, lat1], origin='upper', aspect='auto')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "<matplotlib.image.AxesImage at 0x10dbaa450>"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Custom Colormap"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Create a custom colormap for the <code>m.imshow()</code> method to use. This will display high numbers as red, low numbers as blue, and in-the-middle numbers as clear."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "bwr_cmap = plt.get_cmap('bwr')\n",
      "cdict_new = bwr_cmap._segmentdata.copy()\n",
      "cdict_new['alpha'] = ((0.0, 1.0, 1.0),\n",
      "                #   (0.25,1.0, 1.0),\n",
      "                   (0.6, 0.4, 0.0),\n",
      "                #   (0.75,1.0, 1.0),\n",
      "                   (1.0, 1.0, 1.0))\n",
      "blue_red1 = LinearSegmentedColormap('BlueRed1', cdict_new)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Grid Density Gaussian Filter Methods"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The method here is adapted from this StackOverflow question: \n",
      "http://stackoverflow.com/questions/6652671/efficient-method-of-calculating-density-of-irregularly-spaced-points\n",
      "\n",
      "As a slight modification, the algorithm is broken into three methods:\n",
      "<ul>\n",
      "<li><code>grid_density_gaussian_filter_img</code> applies a gaussian filter to the input <code>image</code> (which is a 2D array) using <code>scipy.ndimage.gaussian_filter</code> and returns the result. \n",
      "\n",
      "<li><code>iterate_image</code> takes a 2D-array (<code>previous_img</code>) and the docking/undocking data (<code>positive_data</code> and <code>negative_date</code>). The 'grid density' algorithm simply increments a pixel in <code>previous_img</code> for each corresponding point in <code>positive_data</code>, or decrements a pixel in <code>previous_img</code> for each corresponding point in <code>negative_date</code>.\n",
      "\n",
      "<li><code>deacy_img</code> is an optionally-used method which takes the current image array (<code>current_img</code>) and reduces the absolute value of every nonzero pixel in the array by 1. The resulting image is returned. This prevents areas of really high density from swamping out all the other areas.\n",
      "</ul>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# return the gaussian filtered data for a given current_img\n",
      "def grid_density_gaussian_filter_img(current_img, radius):\n",
      "\t\tr = radius\n",
      "\t\treturn ndi.gaussian_filter(current_img, (r,r)) #apply gaussian filter to the image and return the ndimage\n",
      "\n",
      "# go through and bring all image data closer to zero\n",
      "def decay_img(current_img): \n",
      "\t\tfor index,value in np.ndenumerate( current_img ):\n",
      "\t\t\tif value > 0:\n",
      "\t\t\t\tcurrent_img[index] -= 1\n",
      "\t\t\tif value < 0:\n",
      "\t\t\t\tcurrent_img[index] += 1\n",
      "\t\treturn current_img\n",
      "\t\t\t\n",
      "# add 'positive_data' and subtract 'negative data' to previous_img and return a new image with those values\n",
      "def iterate_image(previous_img, x0, y0, x1, y1, w, h, positive_data, negative_data):\n",
      "\t\timgw = previous_img.shape[0]\n",
      "\t\timgh = previous_img.shape[1]\n",
      "\t\tnew_img = copy(previous_img)\n",
      "\t\tkx = (imgw - 1) / (x1 - x0)\n",
      "\t\tky = (imgh - 1) / (y1 - y0)\n",
      "\t\tfor x, y in positive_data:\n",
      "\t\t\t\tix = int((x - x0) * kx) \n",
      "\t\t\t\tiy = int((y - y0) * ky)\n",
      "\t\t\t\tif 0 <= ix < imgw and 0 <= iy < imgh:\n",
      "\t\t\t\t\tnew_img[iy][ix] += 1\n",
      "\t\tfor x, y in negative_data:\n",
      "\t\t\t\tix = int((x - x0) * kx)\n",
      "\t\t\t\tiy = int((y - y0) * ky)\n",
      "\t\t\t\tif 0 <= ix < imgw and 0 <= iy < imgh:\n",
      "\t\t\t\t\tnew_img[iy][ix] -= 1\n",
      "\t\treturn new_img"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Create An Array Of matplotlib.image.AxesImage Objects"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To generate an animated heatmap of all the docking or undocking events in the CitiBike system, one approach is to generate an array of objects of the type <code>matplotlib.image.AxesImage</code> and then call Matplotlib's <code>animation.ArtistAnimation</code> method on that array to generate an animation. Below is the code for the hard part -- building the array of AxesImages from our data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# find the earliest undocking timestamp in the database -- this will be the start time of the data used for the animation\n",
      "initial_start_date_undocking = undocking_dataframe.index[0]\n",
      "\n",
      "# ims will hold the series of images that make up the animation\n",
      "ims = []\n",
      "\n",
      "# determine the number of frames in the animation\n",
      "num_frames = (number_of_days*24*60)/time_increment"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for i in range(num_frames):\n",
      "\t# find the indices of the next time windows for undocking events\n",
      "\tnext_start_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=time_increment*i) )\n",
      "\tnext_end_date_undocking_index = undocking_dataframe.index.searchsorted( initial_start_date_undocking + dt.timedelta(minutes=(time_window_size + time_increment*i) ) )\n",
      "\t# grab the next time windows for undocking events\n",
      "\tnext_start_date_undocking = undocking_dataframe.index[next_start_date_undocking_index]\n",
      "\tnext_end_date_undocking = undocking_dataframe.index[next_end_date_undocking_index]\n",
      "\t\n",
      "\t# find the indices of the next time windows for re-docking events\n",
      "\tnext_start_date_docking_index = docking_dataframe.index.searchsorted( next_start_date_undocking )\n",
      "\tnext_end_date_docking_index = docking_dataframe.index.searchsorted( next_end_date_undocking )\n",
      "\t# grab the next time windows for undocking events\n",
      "\tnext_start_date_docking = docking_dataframe.index[next_start_date_docking_index]\n",
      "\tnext_end_date_docking = docking_dataframe.index[next_end_date_docking_index]\n",
      "\t\n",
      "\t# get the next few undocking events\n",
      "\tnext_data_undocking = undocking_dataframe[next_start_date_undocking:next_end_date_undocking]\n",
      "\tx_udata = next_data_undocking['start station longitude']\n",
      "\ty_udata = next_data_undocking['start station latitude']\n",
      "\tx_undocking, y_undocking = m(x_udata,y_udata)\n",
      "\t\n",
      "\t# get the next few docking events\n",
      "\tnext_data_docking = docking_dataframe[next_start_date_docking:next_end_date_docking]\n",
      "\tx_ddata = next_data_docking['end station longitude']\n",
      "\ty_ddata = next_data_docking['end station latitude']\n",
      "\tx_docking, y_docking = m(x_ddata,y_ddata)\t\n",
      "\t\n",
      "\t# get the new heatmap image\n",
      "\tif i == 0:\n",
      "\t\tnew_heatmap_image = np.zeros((image_width, image_height))\n",
      "\telse:\n",
      "\t\tnew_heatmap_image = iterate_image(heatmap_image, lon0, lat0, lon1, lat1, image_width, image_height, zip(x_docking,y_docking), zip(x_undocking,y_undocking))\n",
      "\t\n",
      "\tif i % 100 == 0:\n",
      "\t\tnew_heatmap_image = decay_img(new_heatmap_image)\n",
      "\t\t\n",
      "\t# generate the z dimension with the newlt iterated image\n",
      "\tz_dimension = grid_density_gaussian_filter_img(new_heatmap_image, gaussian_filter_radius)\n",
      "\t\n",
      "\t#reset the heatmap_image variable\n",
      "\theatmap_image = new_heatmap_image\n",
      "\t\n",
      "\t# produce the im from this data\n",
      "\theatmap_im = m.imshow(z_dimension , origin='lower', extent=[lon0, lat0, lon1, lat1], alpha=.4, vmin=-1, vmax=1, cmap=blue_red1, aspect='auto')\n",
      "\t# find the text string for displaying time/date\n",
      "\tcurrent_date = initial_start_date_undocking + dt.timedelta(minutes=time_increment*i)\n",
      "\tcurrent_date_string = current_date.strftime('%I:%M%p on %a %b %d')\n",
      "\tcurrent_date_plotted = plt.text(-73.9509, 40.6909, current_date_string, fontsize=14)\n",
      "\t# append all Artist instances to the array of drawn images\n",
      "\tims.append([current_date_plotted,heatmap_im])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Animate the Images"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# set the figure size\n",
      "plt.gcf().set_size_inches(15.0, 12.0)\n",
      "plt.gcf().subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)\n",
      "\n",
      "# run the animation\t\n",
      "anim = animation.ArtistAnimation(plt.gcf(), ims, interval=50, blit=True, repeat=True)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#plot all of these points\n",
      "plt.show()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 15
    }
   ],
   "metadata": {}
  }
 ]
}