{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "_Since we announced [our collaboration with the World Bank and more partners to create the Open Traffic platform](https://mapzen.com/blog/announcing-open-traffic/), we’ve been busy. We’ve shared [two](https://mapzen.com/blog/open-traffic-osmlr-technical-preview/) [technical](https://mapzen.com/blog/osmlr-2nd-technical-preview/) previews of the OSMLR linear referencing system. Now we’re ready to share more about how we’re using [Mapzen Map Matching](https://mapzen.com/blog/map-matching/) to “snap” GPS-derived locations to OSMLR segments, and how we’re using a data-driven approach to evaluate and improve the algorithms._" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# A \"data-driven\" approach to improving map-matching - Part II:\n", "## _PARAMETER TUNING_\n", "============================================================================================" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In the [last](https://mapzen.com/blog/map-matching-validation/) blog post on Mapzen's map-matching service, we showed you how Mapzen uses synthetic GPS data to validate the results of our map-matching algorithm. This time around, we'll dive a _bit_ deeper into the internals of the algorithm itself to see how we can use our validation metrics to fine-tune the map-matching parameters." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## 0. 
Set up test environment" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import os\n", "import sys\n", "# Make the local validator package (one directory up) importable\n", "sys.path.insert(0, os.path.abspath('..'))\n", "import validator.validator as val\n", "import numpy as np\n", "import glob\n", "import pandas as pd\n", "import pickle\n", "import seaborn as sns\n", "from matplotlib import pyplot as plt\n", "from IPython.display import Image\n", "from IPython.display import HTML\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "#### User vars" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "# API keys are read from environment variables\n", "mapzenKey = os.environ.get('MAPZEN_API')\n", "gmapsKey = os.environ.get('GOOGLE_MAPS')\n", "cityName = 'San Francisco'\n", "numRoutes = 200" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "#### Load our old routes" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Check out the last [notebook](http://nbviewer.jupyter.org/github/opentraffic/reporter-quality-testing-rig/blob/33868a9703f9c00d42dd520339eac3ec04fb4ea0/notebooks/map_matching_part_I.ipynb) in this series if you don't have routes yet. Or just read along!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "# Load the pickled routes generated in Part I, closing the file when done\n", "with open('{0}_{1}_routes.pkl'.format(cityName, numRoutes), 'rb') as f:\n", "    routeList = pickle.load(f)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## 1. 
Parameter Definitions" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The Open Traffic Reporter map-matching service is based on the Hidden Markov Model (HMM) design of [Newson and Krumm (2009)](http://research.microsoft.com/en-us/um/people/jckrumm/Publications%202009/map%20matching%20ACM%20GIS%20camera%20ready.pdf). HMMs are a class of models that use a directed graph structure to represent the probability of observing a particular sequence of events, or in our case, a particular sequence of road segments that defines a route. For our purposes, it is enough to know that an HMM is governed by two probability distributions, the emission probability and the transition probability, which we've parameterized using $\\sigma_z$ and $\\beta$, respectively. If you are interested in a more detailed explanation of how HMMs work in the context of map-matching, please see the excellent map-matching primer [here](https://github.com/valhalla/meili/blob/master/docs/meili/algorithms.md) written by Mapzen's own routing experts." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "![image](https://raw.githubusercontent.com/valhalla/meili/master/docs/meili/figures/model.png)\n", "
\n", "| | sample_rate | noise | beta | sigma_z | score |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 1 | 0.0 | 0.5 | 0.5 | 0.000000 |\n",
"| 1 | 5 | 0.0 | 0.5 | 0.5 | 0.000000 |\n",
"| 2 | 10 | 0.0 | 0.5 | 0.5 | 0.000000 |\n",
"| 3 | 20 | 0.0 | 0.5 | 0.5 | 0.219928 |\n",
"| 4 | 30 | 0.0 | 0.5 | 0.5 | 0.219928 |\n", "