{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": { "word_id": "4818_07_kde" }, "source": [ "# 7.6. Estimating a probability distribution nonparametrically with a Kernel Density Estimation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need to download the *Storms* dataset on the book's website, and extract it in the current directory. (http://ipython-books.github.io)\n", "\n", "You also need matplotlib's toolkit *basemap*. (http://matplotlib.org/basemap/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's import the usual packages. The kernel density estimation with a Gaussian kernel is implemented in *SciPy.stats*." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scipy.stats as st\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.basemap import Basemap\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Let's open the data with Pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# http://www.ncdc.noaa.gov/ibtracs/index.php?name=wmo-data\n", "df = pd.read_csv(\"data/Allstorms.ibtracs_wmo.v03r05.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. The dataset contains information about most storms since 1848. A single storm may appear multiple times across several consecutive days." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df[df.columns[[0,1,3,8,9]]].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. We use Pandas' `groupby` function to obtain the average location of every storm." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dfs = df.groupby('Serial_Num')\n", "pos = dfs[['Latitude', 'Longitude']].mean()\n", "y, x = pos.values.T\n", "pos.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. We display the storms on a map with basemap. This toolkit allows us to easily project the geographical coordinates on the map." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "m = Basemap(projection='mill', llcrnrlat=-65 ,urcrnrlat=85,\n", " llcrnrlon=-180, urcrnrlon=180)\n", "x0, y0 = m(-180, -65)\n", "x1, y1 = m(180, 85)\n", "plt.figure(figsize=(10,6))\n", "m.drawcoastlines()\n", "m.fillcontinents(color='#dbc8b2')\n", "xm, ym = m(x, y)\n", "m.plot(xm, ym, '.r', alpha=.1);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. To perform the Kernel Density Estimation, we need to stack the x and y coordinates of the storms into a 2xN array." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "h = np.vstack((xm, ym))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "kde = st.gaussian_kde(h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. The `gaussian_kde` routine returned a Python function. To see the results on a map, we need to evaluate this function on a 2D grid spanning the entire map. We create this grid with `meshgrid`, and we pass the x, y values to the `kde` function. We need to arrange the shape of the array since `kde` accepts a 2xN array as input." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "k = 50\n", "tx, ty = np.meshgrid(np.linspace(x0, x1, 2*k),\n", " np.linspace(y0, y1, k))\n", "v = kde(np.vstack((tx.ravel(), ty.ravel()))).reshape((k, 2*k))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Finally, we display the estimated density with `imshow`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure(figsize=(10,6))\n", "m.drawcoastlines()\n", "m.fillcontinents(color='#dbc8b2')\n", "xm, ym = m(x, y)\n", "m.imshow(v, origin='lower', extent=[x0,x1,y0,y1],\n", " cmap=plt.get_cmap('Reds'));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }