{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# In Depth: Kernel Density Estimation" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In the previous chapter we covered Gaussian mixture models, which are a kind of hybrid between a clustering estimator and a density estimator.\n", "Recall that a density estimator is an algorithm that takes a $D$-dimensional dataset and produces an estimate of the $D$-dimensional probability distribution that data is drawn from.\n", "The GMM algorithm accomplishes this by representing the density as a weighted sum of Gaussian distributions.\n", "*Kernel density estimation* (KDE) is in some senses an algorithm that takes the mixture-of-Gaussians idea to its logical extreme: it uses a mixture consisting of one Gaussian component *per point*, resulting in an essentially nonparametric estimator of density.\n", "In this chapter, we will explore the motivation and uses of KDE.\n", "\n", "We begin with the standard imports:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('seaborn-whitegrid')\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Motivating Kernel Density Estimation: Histograms\n", "\n", "As mentioned previously, a density estimator is an algorithm that seeks to model the probability distribution that generated a dataset.\n", "For one-dimensional data, you are probably already familiar with one simple density estimator: the histogram.\n", "A histogram divides the data into discrete bins, counts the number of points that fall in each bin, and then visualizes the results in an intuitive manner.\n", "\n", "For example, let's create some data that is drawn from two normal distributions:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "def make_data(N, f=0.3, rseed=1):\n", " rand = np.random.RandomState(rseed)\n", " x = rand.randn(N)\n", " x[int(f * N):] += 5\n", " return x\n", "\n", "x = make_data(1000)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We have previously seen that the standard count-based histogram can be created with the `plt.hist` function.\n", "By specifying the `density` parameter of the histogram, we end up with a normalized histogram where the height of the bins does not reflect counts, but instead reflects probability density (see the following figure):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD0CAYAAACLpN0/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVTUlEQVR4nO3df0zU9x3H8Rd3x1F6B3UsXdosA5XkNrQjiE23xIFpkDltl5htWjBlbcy0sTq3yigNyVpCCUITmsY14pwbGiw/bGwWF7P9QWyAEteml5EWd7CsplvnOkXpEu6y4zzvuz+stxHQA+7O4z4+H3/x/X7u+7332+/5yofvjyPDsixLAIC0Z0t1AQCAxCDQAcAQBDoAGIJABwBDEOgAYAgCHQAM4UjVG3u93lS9NQCktbVr1865PmWBLt26qFTw+XwqKipKdRkJY1I/JvUi0c9St9T7ud1kmFMuAGAIAh0ADEGgA4AhCHQAMETMi6KRSESNjY0aHx+X0+lUc3OzCgoKouNvvPGG3nrrLWVkZGjHjh3avHmzgsGg6urqdPXqVblcLrW1tSkvLy+pjQDA3S7mDL2/v1+hUEh9fX2qra1Va2trdGxyclI9PT3q7e3VsWPH1NbWJsuy1NPTI4/Ho+7ubm3ZskWHDh1KahMAgHkEutfrVVlZmSSppKREo6Oj0bG8vDz99re/VWZmpq5cuaKsrCxlZGTM2Ka8vFznzp1LUvkAgJtinnLx+/1yu93RZbvdrnA4LIfjxqYOh0MnTpzQL37xC9XU1ES3ycnJkSS5XC5NTU3NuW+fzxd3A4kSDAaXVD3xMqkfk3qR6GepS+d+Yga62+1WIBCILkcikWiY3/Tkk09q27Zt2rlzp/74xz/O2CYQCCg3N3fOfS+lm/eX+sMEC2VSPyb1IiWmn+UvnJnX6z5ufSyu95kPjs+dFdeDRaWlpRocHJQkjYyMyOPxRMcuXLigvXv3yrIsZWZmyul0ymazqbS0VAMDA5KkwcHBJfVEKACYKuYMvbKyUsPDw6qqqpJlWWppaVFnZ6fy8/NVUVGhr33ta3riiSeUkZGhsrIyPfLII/r617+u+vp6VVdXKzMzU+3t7XeiFwC4q8UMdJvNpqamphnrCgsLoz/v3btXe/funTGenZ2tgwcPJqhEAMB88GARABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMASBDgCGINABwBAEOgAYgkAHAEMQ6ABgCAIdAAxBoAOAIQh0ADAEgQ4AhiDQAcAQBDoAGIJABwBDEOgAYAgCHQAMQaADgCEIdAAwBIEOAIZwpLoAAKm3/IUz83rdx62PJbkSxIMZOgAYgkAHAEMQ6ABgiJjn0CORiBobGzU+Pi6n06nm5mYVFBREx48dO6YzZ26cf1u/fr327t0ry7JUXl6u5cuXS5JKSkpUW1ubnA4AAJLmEej9/f0KhULq6+vTyMiIWltb1dHRIUn65JNPdPr0ab355puy2Wyqrq7Whg0blJ2drdWrV+vw4cNJbwAAcEPMUy5er1dlZWWSbsy0R0dHo2MPPPCAjh49KrvdroyMDIXDYWVlZen8+fO6dOmSampqtHPnTl24cCF5HQAAJM1jhu73++V2u6PLdrtd4XBYDodDmZmZysvLk2VZeuWVV7Rq1SqtWLFCV65c0a5du7Rp0ya9//77qqur06lTp2bt2+fzJbabOASDwSVVT7xM6sekXqQ720+i32eu/XF8lo6Yge52uxUIBKLLkUhEDsf/NpuenlZDQ4NcLpdeeuklSdJDDz0ku90uSXr44Yd1+fJlWZaljIyMGfsuKipKSBOJ4PP5llQ98TKpH5N6kRLVz/x+653/+yx+fxyfO8vr9d5yLOYpl9LSUg0ODkqSRkZG5PF4omOWZenZZ5/VV7/6VTU1NUVD/PXXX9fx48clSWNjY3rwwQdnhTkAILFiztArKys1PDysqqoqWZallpYWdXZ2Kj8/X5FIRO+9955CoZCGhoYkSfv379euXbtUV1engYEB2e12HThwIOmNAMDdLmag22w2NTU1zVhXWFgY/fnDDz+cc7sjR47EWRoAYCH4LhdgCZnvd6oAc+FJUQAwBIEOAIYg0AHAEAQ6ABiCQAcAQxDoAGAIblsEDMZtkHcXZugAYAgCHQAMQaADgCEIdAAwBIEOAIYg0AHAEAQ6ABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMASBDgCGINABwBAEOgAYgkAHAEMQ6ABgCAIdAAwR80/QRSIRNTY2anx8XE6nU83NzSooKIiOHzt2TGfO3PgzV+vXr9fevXsVDAZVV1enq1evyuVyqa2tTXl5ecnrAgAQe4be39+vUCikvr4+1dbWqrW1NTr2ySef6PTp0+rt7dXJkyf1zjvvaGxsTD09PfJ4POru7taWLVt06NChpDYBAJhHoHu9XpWVlUmSSkpKNDo6Gh174IEHdPToUdntdmVkZCgcDisrK2vGNuXl5Tp37lySygcA3BQz0P1+v9xud3TZbrcrHA5LkjIzM5WXlyfLstTW1qZVq1ZpxYoV8vv9ysnJkSS5XC5NTU0lqXwAwE0xz6G73W4FAoHociQSkcPxv82mp6fV0NAgl8ull156adY2gUBAubm5c+7b5/PFVXwiBYPBJVVPvEzqx6RepPTuZ66607mfuaRzPzEDvbS0VG+//bY2b96skZEReTye6JhlWXr22Wf1jW98Q7t27ZqxzcDAgIqLizU4OKi1a9fOue+ioqIEtJAYPp9vSdUTL5P6MakXKVY/F+5oLQs1V9131/FJPa/Xe8uxmIFeWVmp4eFhVVVVybIstbS0qLOzU/n5+YpEInrvvfcUCoU0NDQkSdq/f7+qq6tVX1+v6upqZWZmqr29PXHdAADmFDPQbTabmpqaZqwrLCyM/vzhhx/Oud3BgwfjLA0AsBA8WAQAhiDQAcAQMU+5AMBNy184c4uR2RdzP259LLnFYBZm6ABgCAIdAAxBoAOAIQh0ADAEgQ4AhiDQAcAQBDoAGIJABwBDEOgAYAieFAXugNlPWC7tr8lFemKGDgCGINABwBAEOgAYgkAHAEMQ6ABgCAIdAAxBoAOAIQh0ADAEgQ4AhiDQAcAQBDoAGIJABwBDEOgAYAgCHQAMEfPrcyORiBobGzU+Pi6n06nm5mYVFBTMeM3k5KSqq6t1+vRpZWVlybIslZeXa/ny5ZKkkpIS1dbWJqUBAMANMQO9v79foVBIfX19GhkZUWtrqzo6OqLjQ0NDam9v18TERHTd3//+d61evVqHDx9OTtUAgFlinnLxer0qKyuTdGOmPTo6OnMHNps6Ozu1bNmy6Lrz58/r0qVLqqmp0c6dO3XhAl/mDwDJFnOG7vf75Xa7o8t2u13hcFgOx41N161bN2ub+++/X7t27dKmTZv0/vvvq66uTqdOnZr1Op/PF0/tCRUMBpdUPfEyqR+TermbpOsxS+fPW8xAd7vdCgQC0eVIJBIN81t56KGHZLfbJUkPP/ywLl++LMuylJGRMeN1RUVFi6k5KXw+35KqJ14m9WNGL3ffb6npesyW+ufN6/XecizmKZfS0lINDg5KkkZGRuTxeGK+4euvv67jx49LksbGxvTggw/OCnMAQGLFnKFXVlZqeHhYVVVVsixLLS0t6uzsVH5+vioqKubcZteuXaqrq9PAwIDsdrsOHDiQ8MIBADPFDHSbzaampqYZ6woLC2e97uzZs9Gf77vvPh05ciQB5QEA5osHiwDAEAQ6ABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMETM+9CRXpa/cOb/lm79uPnHrY8lvxgAdxQzdAAwBIEOAIYg0AHAEJxDTxMzz40DwGzM0AHAEAQ6ABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMASBDgCGINABwBAEOgAYgkAHAEMQ6ABgCAIdAAxBoAOAIQh0ADBEzECPRCJ68cUX9cQTT6impkZ/+9vfZr1mcnJSGzdu1PT0tCQpGAzqxz/+sbZv366dO3dqcnIy8ZUDAGaIGej9/f0KhULq6+tTbW2tWltbZ4wPDQ1px44dmpiYiK7r6emRx+NRd3e3tmzZokOHDiW+cgDADDH/YpHX61VZWZkkqaSkRKOjozPGbTabOjs79f3vf3/GNj/60Y8kSeXl5QT6ErSQv4D0cetjSawEQKLEDHS/3y+32x1dttvtCofDcjhubLpu3bo5t8nJyZEkuVwuTU1Nzblvn8+3qKKTIRgMLql6lpJU/7twbNJTuh6zdP68xQx0t9utQCAQXY5EItEwn882gUBAubm5c76uqKhoIbUmlc/nW1L1zHYhZe+c6n+XpX9s5iN1xy9V0vWYLfXPm9frveVYzHPopaWlGhwclCSNjIzI4/HEfMPS0lINDAxIkgYHB7V27dr51goAWKSYM/TKykoNDw+rqqpKlmWppaVFnZ2dys/PV0VFxZzbVFdXq76+XtXV1crMzFR7e3vCCwcAzBQz0G02m5qammasKywsnPW6s2fPRn/Ozs7WwYMHE1AegHQ13wvvXHRPHB4sAgBDEOgAYAgCHQAMQaADgCFiXhQFuLgFpAdm6ABgCGboQBwW8p04QLIxQwcAQzBDTzFmeAAShRk6ABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMASBDgCGINABwBAEOgAYgidFk4QnQAHcaczQAcAQBDoAGIJABwBDcA4dQErxF7EShxk6ABiCQAcAQxDoAGCImOfQI5GIGhsbNT4+LqfTqebmZhUUFETHT548qd7eXjkcDu3evVuPPvqo/v3vf2vjxo3yeDySpA0bNuipp55KXhcAgNiB3t/fr1AopL6+Po2MjKi1tVUdHR2SpImJCXV1denUqVOanp7W9u3btW7dOv35z3/W448/rp///OdJbwAAcEPMUy5er1dlZWWSpJKSEo2OjkbHPvjgA61Zs0ZOp1M5OTnKz8/X2NiYRkdHdf78eT355JPat2+fLl++nLwOAACS5hHofr9fbrc7umy32xUOh6NjOTk50TGXyyW/36+VK1dq3759OnHihDZs2KDm5uYklA4A+H8xT7m43W4FAoHociQSkcPhmHMsEAgoJydHxcXFys7OliRVVlbq4MGDc+7b5/PFVXwiBYPBJVVPOkrWvx/HBtKdy4t0/rzFDPTS0lK9/fbb2rx5s0ZGRqIXOiWpuLhYr732mqanpxUKhfTRRx/J4/Govr5e3/72t7V582adO3dOq1evnnPfRUVFieskTj6fL8H1XEjgvtJDso5n4o/N7fHFakvTnfoM3OnP20J5vd5bjsUM9MrKSg0PD6uqqkqWZamlpUWdnZ3Kz89XRUWFampqtH37dlmWpeeee05ZWVmqra1VQ0ODenp6lJ2dzSkXLAkENUwXM9BtNpuamppmrCssLIz+vG3bNm3btm3G+Fe+8hV1dXUlqEQAwHzwXS5IGL6TA0gtnhQFAEMQ6ABgCAIdAAzBOfQF4k4JAEsVM3QAMASBDgCGINABwBAEOgAYgouiuOMWd2H51t+Nw4NKwA3M0AHAEAQ6ABiCQAcAQxDoAGAIAh0ADEGgA4AhCHQAMASBDgCG4MEipD2+AfPuwF/Eio0ZOgAYghn65zYdv6DbPV4OAEsdM3QAMASBDgCGINABwBAEOgAYwuiLotzOBuBuwgwdAAwRc4YeiUTU2Nio8fFxOZ1ONTc3q6CgIDp+8uRJ9fb2yuFwaPfu3Xr00Uc1OTmpn/3sZwoGg/rSl76kAwcOKDs7O2FFM/MGcCt38wNIMWfo/f39CoVC6uvrU21trVpbW6NjExMT6urqUm9vr37961/r1VdfVSgU0qFDh/T444+ru7tbq1atUl9fX1KbAADMY4bu9XpVVlYmSSopKdHo6Gh07IMPPtCaNWvkdDrldDqVn5+vsbExeb1ePfPMM5Kk8vJyvfrqq3r66aeT0wEALMLtZ/LJfcgwWb8dxAx0v98vt9sdXbbb7QqHw3I4HPL7/crJyYmOuVwu+f3+GetdLpempqbm3LfX611U0ae2PrCo7QBgKVhs9sUSM9DdbrcCgUB0ORKJyOFwzDkWCASUk5MTXX/PPfcoEAgoNzd31n7Xrl2biPoBAJ+LeQ69tLRUg4ODkqSRkRF5PJ7oWHFxsbxer6anpzU1NaWPPvpIHo9HpaWlGhgYkCQNDg4S3gBwB2RYlmXd7gU373L5y1/+Isuy1NLSosHBQeXn56uiokInT55UX1+fLMvSM888o40bN+rKlSuqr69XIBDQF77wBbW3t+vee++9Uz0BwF0pZqDfLaamplRXVye/369r167phRde0Jo1a1Jd1oLEusU03Vy7dk0NDQ26ePGiQqGQdu/erYqKilSXFberV6/qe9/7nn7zm9+osLAw1eXE5Ze//KXOnj2ra9euqbq6Wlu3bk11SYty8//8xYsXZbPZ9PLLL6flseHBos91dnbqm9/8pk6cOKEDBw6oqakp1SUt2O1uMU1Hp0+f1rJly9Td3a2jR4/q5ZdfTnVJcbt27ZpefPFF3XPPPakuJW7vvvuu/vSnP6mnp0ddXV3617/+leqSFm1gYEDhcFi9vb3as2ePXnvttVSXtChGP/q/EE8//bScTqck6fr168rKykpxRQt3u1tM09F3vvMdbdy4UZJkWZbsdnuKK4pfW1ubqqqqdOTIkVSXErd33nlHHo9He/bskd/v1/PPP5/qkhZtxYoVun79uiKRiPx+f/TGj3STnlXH6c0339Tx48dnrGtpaVFxcbEmJiZUV1enhoaGFFW3eLe7xTQduVwuSTf62rdvn37605+mtqA4vfXWW8rLy1NZWZkRgf7ZZ5/pn//8pw4fPqx//OMf2r17t/7whz8oIyMj1aUt2L333quLFy9q06ZN+uyzz3T48OFUl7Qo6fk/PU5bt26d81zf+Pi49u/fr+eff16PPPJICiqLz+1uMU1Xn376qfbs2aPt27fru9/9bqrLicupU6eUkZGhc+fOyefzqb6+Xh0dHbr//vtTXdqiLFu2TCtXrpTT6dTKlSuVlZWlyclJffGLX0x1aQt27Ngxfetb31Jtba0+/fRTPfXUU/rd736Xdr+pcw79c3/961/1k5/8RO3t7Vq/fn2qy1mU291imo6uXLmiHTt2qK6uTj/4wQ9SXU7c3njjDZ04cUJdXV0qKipSW1tb2oa5dONZkqGhIVmWpUuXLuk///mPli1bluqyFiU3Nzf6MOR9992ncDis69evp7iqheMul8/t3r1b4+Pj+vKXvyzpxmy3o6MjxVUtzFy3mKbjlfqbmpub9fvf/14rV66MrvvVr35lxAXFmpoaNTY2pvXxkaRXXnlF7777rizL0nPPPRe9hpNuAoGAGhoaNDExoWvXrumHP/xhWv5GSKADgCE45QIAhiDQAcAQBDoAGIJABwBDEOgAYAgCHQAMQaADgCEIdAAwxH8B2HcDlo+LR6cAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "hist = plt.hist(x, bins=30, density=True)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Notice that for equal binning, this normalization simply changes the scale on the y-axis, leaving the relative heights essentially the same as in a histogram built from counts.\n", "This normalization is chosen so that the total area under the histogram is equal to 1, as we can confirm by looking at the output of the histogram function:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "density, bins, patches = hist\n", "widths = bins[1:] - bins[:-1]\n", "(density * widths).sum()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "One of the issues with using a histogram as a density estimator is that the choice of bin size and location can lead to representations that have qualitatively different features.\n", "For example, if we look at a version of this data with only 20 points, the choice of how to draw the bins can lead to an entirely different interpretation of the data!\n", "Consider this example, visualized in the following figure:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "deletable": true, "editable": true, "tags": [] }, "outputs": [], "source": [ "x = make_data(20)\n", "bins = np.linspace(-5, 10, 10)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsIAAAD3CAYAAAAXOjLuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAaZ0lEQVR4nO3df0xV9/3H8dflXkDkYh1bG7tkoCW5LeocxWbZZsA5Sjpts5ilUWASWxNNra6dMmZjViWMATZjaVyDztmhsxXQaBYXsy1jXcAyt643JS3uSjJNu27r6q8u494MEO/5/uG3t6XSHuBeOOfez/PxF/eee855v885vvvyeMr1WJZlCQAAADBMmtMFAAAAAE4gCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIzks/tANBpVXV2dBgYGlJGRoYaGBuXn58eWv/jiizp58qQ8Ho82bNigVatWaWhoSLW1tbp69aqys7O1Z88e5ebmTmsjAAAAwGTY3hHu6urSyMiIOjs7VVNTo+bm5tiya9euqb29XR0dHTp06JD27Nkjy7LU3t6uQCCgo0ePavXq1WptbZ3WJgAAAIDJsg3CwWBQJSUlkqSioiL19/fHluXm5uqXv/yl0tPTdeXKFWVmZsrj8YxZp7S0VGfPnp2m8gEAAICpsX00IhwOy+/3x157vV6Njo7K57u5qs/n0wsvvKCf/OQnqq6ujq2Tk5MjScrOztbg4OAt2w0GgwlpAABSzdKlS6e8LrMVAMY33my1DcJ+v1+RSCT2OhqNxkLw+9atW6c1a9Zo48aN+tOf/jRmnUgkojlz5ky4oGQTCoVUWFjodBkJkSq90Ie7pEof0sz0koggy2x1l1TphT7cJVX6kJydrbaPRhQXF6unp0eS1NfXp0AgEFt28eJFbd26VZZlKT09XRkZGUpLS1NxcbG6u7slST09PSkxlAEAAJBabO8Il5eXq7e3VxUVFbIsS42NjWpra1NeXp7Kysp0zz33aO3atfJ4PCopKdEXv/hFff7zn9eOHTtUWVmp9PR0tbS0zEQvAAAAwITZBuG0tDTV19ePea+goCD289atW7V169Yxy7OysrR3794ElQgAAAAkHl+oAQAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRCMIAAAAwEkEYAAAARiIIAwAAwEg+pwsAAACYbvOfOv2hVxcdq8POm80POl2CUbgjDAAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCTbr1iORqOqq6vTwMCAMjIy1NDQoPz8/NjyQ4cO6fTpm19buHz5cm3dulWWZam0tFTz58+XJBUVFammpmZ6OgAAAACmwDYId3V1aWRkRJ2dnerr61Nzc7P27dsnSXr77bd16tQpHT9+XGlpaaqsrNT999+vrKwsLVq0SPv375/2BgAAAICpsH00IhgMqqSkRNLNO7v9/f2xZfPmzdPBgwfl9Xrl8Xg0OjqqzMxMnTt3Tu+++66qq6u1ceNGXbx4cfo6AAAAAKbA9o5wOByW3++PvfZ6vRodHZXP51N6erpyc3NlWZaeeeYZLVy4UAsWLNCVK1e0adMmrVy5Uq+++qpqa2t14sSJW7YdCoUS240DhoaGUqIPKXV6oQ93SZU+pOTpJRlqtJMsx3oiUqWXVOnD7SZ6jFPpfDjZi20Q9vv9ikQisdfRaFQ+3werDQ8Pa+fOncrOztbu3bslSYsXL5bX65Uk3Xfffbp06ZIsy5LH4xmz7cLCwoQ04aRQKJQSfUip0wt9uEuq9CHNTC/BYDDubaTC8ea6cZ/k7yM5/nV6osc4+c/HB5ycrbaPRhQXF6unp0eS1NfXp0AgEFtmWZYef/xx3X333aqvr4+F3+eee06HDx+WJJ0/f1533nnnLSEYAAAAcJLtHeHy8nL19vaqoqJClmWpsbFRbW1tysvLUzQa1SuvvKKRkRGdOXNGkrR9+3Zt2rRJtbW16u7ultfrVVNT07Q3AgAAAEyGbRBOS0tTfX39mPcKCgpiP7/xxhvjrnfgwIE4SwMAAACmD1+oAQAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRCMIAAAAwEkEYAAAARiIIAwAAwEgEYQAAABiJIAwAAAAjEYQBAABgJIIwAAAAjEQQBgAAgJEIwgAAADASQRgAAABG8jldAJCK5j91OkFbupig7YzvzeYHp3X7AAC4GXeEAQAAYCSCMAAAAIxEEAYAAICRbJ8Rjkajqqur08DAgDIyMtTQ0KD8/PzY8kOHDun06ZvPQy5fvlxbt27V0NCQamtrdfXqVWVnZ2vPnj3Kzc2dvi4AAACASbK9I9zV1aWRkRF1dnaqpqZGzc3NsWVvv/22Tp06pY6ODh07dkwvv/yyzp8/r/b2dgUCAR09elSrV69Wa2vrtDYBAAAATJZtEA4GgyopKZEkFRUVqb+/P7Zs3rx5OnjwoLxerzwej0ZHR5WZmTlmndLSUp09e3aaygcAAACmxvbRiHA4LL/fH3vt9Xo1Ojoqn8+n9PR05ebmyrIsPfPMM1q4cKEWLFigcDisnJwcSVJ2drYGBwfH3XYoFEpQG84ZGhpKiT6k1OklVfqYCTNxnFLpfCRLL8lQo51kOdYTkSq9pEofbjfRY5xK58PJXmyDsN/vVyQSib2ORqPy+T5YbXh4WDt37lR2drZ27959yzqRSERz5swZd9uFhYVxFe8GoVAoJfqQUqcXd/Qxvb//N1Fm4ji543wkxkz0EgwG495GKhxvrhv3Sf4+UmsuJ//5+ICTs9X20Yji4mL19PRIkvr6+hQIBGLLLMvS448/rrvvvlv19fXyer2xdbq7uyVJPT09Wrp0adwNAAAAAIlke0e4vLxcvb29qqiokGVZamxsVFtbm/Ly8hSNRvXKK69oZGREZ86ckSRt375dlZWV2rFjhyorK5Wenq6WlpZpbwQAAACYDNsgnJaWpvr6+jHvFRQUxH5+4403xl1v7969cZYGAAAATB++UAMAAABGIggDAADASARhAAAAGIkgDAAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRCMIAAAAwEkEYAAAARiIIAwAAwEgEYQAAABiJIAwAAAAjEYQBAABgJIIwAAAAjEQQBgAAgJEIwgAAADASQRgAAABGIggDAADASD67D0SjUdXV1WlgYEAZGRlqaGhQfn7+mM9cu3ZNlZWVOnXqlDIzM2VZlkpLSzV//nxJUlFRkWpqaqalAQAAAGAqbINwV1eXRkZG1NnZqb6+PjU3N2vfvn2x5WfOnFFLS4suX74ce+/vf/+7Fi1apP37909P1QAAAECcbB+NCAaDKikpkXTzzm5/f//YDaSlqa2tTXPnzo29d+7cOb377ruqrq7Wxo0bdfHixcRWDQAAAMTJ9o5wOByW3++PvfZ6vRodHZXPd3PVZcuW3bLO7bffrk2bNmnlypV69dVXVVtbqxMnTtzyuVAoFE/trjA0NJQSfUip00uq9DET5j91eob2FN9fhn+9/q4E1RGfZLm2kqFGO8lyrCciVXpJlT7cbqLHOJXOh5O92AZhv9+vSCQSex2NRmMh+OMsXrxYXq9XknTffffp0qVLsixLHo9nzOcKCwunUrOrhEKhlOhDSp1e3NEH/wqSSM6fz5tm4toKBoNxb8Mtxyse7vhznBip0kvy95Ecc3mixzj5z8cHnJytto9GFBcXq6enR5LU19enQCBgu7PnnntOhw8fliSdP39ed9555y0hGAAAAHCS7R3h8vJy9fb2qqKiQpZlqbGxUW1tbcrLy1NZWdm462zatEm1tbXq7u6W1+tVU1NTwgsHAAAA4mEbhNPS0lRfXz/mvYKCgls+99JLL8V+vu2223TgwIEElAcAAABMD75QAwAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRCMIAAAAwEkEYAAAARiIIAwAAwEgEYQAAABiJIAwAAAAjEYQBAABgJIIwAAAAjEQQBgAAgJEIwgAAADASQRgAAABGIggDAADASARhAAAAGIkgDAAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBItkE4Go1q165dWrt2raqrq/XWW2/d8plr167pgQce0PDwsCRpaGhI3/72t1VVVaWNGzfq2rVria8cAAAAiINtEO7q6tLIyIg6OztVU1Oj5ubmMcvPnDmjDRs26PLly7H32tvbFQgEdPToUa1evVqtra2JrxwAAACIg20QDgaDKikpkSQVFRWpv79/7AbS0tTW1qa5c+eOu05paanOnj2bwJIBAACA+PnsPhAOh+X3+2OvvV6vRkdH5fPdXHXZsmXjrpOTkyNJys7O1uDg4LjbDoVCUyraTYaGhlKiDyl1ekmVPvABt5zPZLm2kqFGO8lyrCciVXpJlT7cbqLHOJXOh5O92AZhv9+vSCQSex2NRmMheCLrRCIRzZkzZ9zPFRYWTqZWVwqFQinRh5Q6vbijj4sO7z+1OH8+b5qJaysYDMa9Dbccr3i4489xYqRKL8nfR3LM5Yke4+Q/Hx9wcrbaPhpRXFysnp4eSVJfX58CgYDtzoqLi9Xd3S1J6unp0dKlSydTKwAAADDtbO8Il5eXq7e3VxUVFbIsS42NjWpra1NeXp7KysrGXaeyslI7duxQZWWl0tPT1dLSkvDCAQAAgHjYBuG0tDTV19ePea+goOCWz7300kuxn7OysrR3794ElAcAAABMD75QAwAAAEYiCAMAAMBIto9GIH7znzrtdAkT8uv1dzldAgAARptcZnDuN2G82fygY/tOJO4IAwAAwEgEYQAAABiJIAwAAAAjEYQBAABgJIIwAAAAjEQQBgAAgJEIwgAAADASQRgAAABGIggDAADASARhAAAAGIkgDAAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAI/mcLgDusfLwRUkXnS7jE73Z/KDTJQD4BPOfOp3Arbl7Hk1OqvSSKn0AN3FHGAAAAEYiCAMAAMBIBGEAAAAYyfYZ4Wg0qrq6Og0MDCgjI0MNDQ3Kz8+PLT927Jg6Ojrk8/m0efNmrVixQv/5z3/0wAMPKBAISJLuv/9+rV+/fvq6AAAAACbJNgh3dXVpZGREnZ2d6uvrU3Nzs/bt2ydJunz5so4cOaITJ05oeHhYVVVVWrZsmf7617/qoYce0tNPPz3tDQAAAABTYftoRDAYVElJiSSpqKhI/f39sWWvv/667r33XmVkZCgnJ0d5eXk6f/68+vv7de7cOa1bt05PPPGELl26NH0dAAAAAFNge0c4HA7L7/fHXnu9Xo2Ojsrn8ykcDisnJye2LDs7W+FwWHfddZcWL16sr3zlKzp16pQaGhq0d+/eW7YdCoUS1IZzhoaGUqKPZDGRY805ST1uOZ/Jcm0lQ40Aklsi54yTs9U2CPv9fkUikdjraDQqn8837rJIJKKcnBwtWbJEWVlZkqTy8vJxQ7AkFRYWxlW8G4RCoQn0we9dTJSJXDMTOyfTjXOeSM6fz5tm4toKBoNxb8PZ48W1D5ggkXPGydlq+2hEcXGxenp6JEl9fX2x/wFOkpYsWaJgMKjh4WENDg7qwoULCgQC+v73v6/f/va3kqSzZ89q0aJFiegBAAAASBjbO8Ll5eXq7e1VRUWFLMtSY2Oj2tralJeXp7KyMlVXV6uqqkqWZWnbtm3KzMxUTU2Ndu7cqfb2dmVlZamhoWEmegEAAAAmzDYIp6Wlqb6+fsx7BQUFsZ/XrFmjNWvWjFn+uc99TkeOHElQiQAAAEDi8YUaAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRbL9Qw83mP3Xa6RL+30WnCzDGxM855ySVuOfPuvRJ19abzQ/OYB0AgHhxRxgAAABGIggDAADASARhAAAAGIkgDAAAACMRhAEAAGAkgjAAAACMRBAGAACAkQjCAAAAMBJBGAAAAEYiCAMAAMBIBGEAAAAYiSAMAAAAIxGEAQAAYCSCMAAAAIxEEAYAAICRCMIAAAAwEkEYAAAARvLZfSAajaqurk4DAwPKyMhQQ0OD8vPzY8uPHTumjo4O+Xw+bd68WStWrNC1a9f03e9+V0NDQ7rjjjvU1NSkrKysaW0EAAAAmAzbO8JdXV0aGRlRZ2enampq1NzcHFt2+fJlHTlyRB0dHXr++ef14x//WCMjI2ptbdVDDz2ko0ePauHChers7JzWJgAAAIDJsg3CwWBQJSUlkqSioiL19/fHlr3++uu69957lZGRoZycHOXl5en8+fNj1iktLdUf//jHaSofAAAAmBrbRyPC4bD8fn/stdfr1ejoqHw+n8LhsHJycmLLsrOzFQ6Hx7yfnZ2twcHBcbcdCoXiKv7X6++Ka/1EGBoa0qxZs5wuIyFSpRf6cJdU6UOy7yXemZYoTtaRqLls0nWTLOjDXZzuI5FzZmhoyLG5ZRuE/X6/IpFI7HU0GpXP5xt3WSQSUU5OTuz9WbNmKRKJaM6cOeNuu7CwMN76HRcKhVKiDyl1eqEPd0mVPqSZ6SUYDMa9jVQ43lw37kMf7pIqfUjOzlbbRyOKi4vV09MjSerr61MgEIgtW7JkiYLBoIaHhzU4OKgLFy4oEAiouLhY3d3dkqSenh4tXbo0ET0AAAAACWN7R7i8vFy9vb2qqKiQZVlqbGxUW1ub8vLyVFZWpurqalVVVcmyLG3btk2ZmZnavHmzduzYoWPHjulTn/qUWlpaZqIXAAAAYMJs7winpaWpvr5eHR0d6uzsVEFBgR599FGVlZVJktasWaMTJ07o5MmTeuCBByRJn/nMZ/T888+ro6ND+/bt0+zZs6e3i2lWV1eXFPt+/7NO1vtRiazFDX3NRA3j7ePD7020hrq6uluuCbttJ2qZXV1TXe6GawCJw2ydOmZrYvbBbI1vn6nAY1mW5cSOg8Fg0jwy4fF49HGHabqfa/mkfX/cZyezzodNRy9TrSWebU3nOUlkP3b7+HAfH97vRGvweDySNOaaGG/dT9reVJd92EfPh916idjndJmp59jimY3M1vj3/XGfZbYyW9//nMRsTSQnZyvfLAcAAAAjEYQBAABgJIIwAAAAjEQQBgAAgJFsf30apN27dyfFvt//rJP1flQia3FDXzNRw3j7+PB7E61hvHXstp2oZROta6b2CXditk4dszUx+2C2xrfPVMBvjYgT3+ziPvThLqnSh8RvjZhJXDfuQx/ukip9SPzWCAAAAGDGOXpHGABwq3jvCAMAbjXebHUsCAMAAABO4tEIAAAAGIkgDAAAACMRhAEAAGAkx4PwhQsXtHTpUg0PDztdypQMDg7qscce07p167R27Vq99tprTpc0KdFoVLt27dLatWtVXV2tt956y+mSpuT69euqra1VVVWVHn74Yf3+9793uqS4XL16VcuXL9eFCxecLiUuP/3pT7V27Vp985vf1PHjx50uZ0quX7+umpoaVVRUqKqqKmnOCbPVWcxWd2K2uodbZqujQTgcDmvPnj3KyMhwsoy4tLW16Utf+pJeeOEFNTU1qb6+3umSJqWrq0sjIyPq7OxUTU2NmpubnS5pSk6dOqW5c+fq6NGjOnjwoH7wgx84XdKUXb9+Xbt27dKsWbOcLiUuf/7zn/Xaa6+pvb1dR44c0b///W+nS5qS7u5ujY6OqqOjQ1u2bNGzzz7rdEm2mK3OY7a6D7PVXdwyWx0LwpZl6emnn9b27duVlZXlVBlxe+SRR1RRUSFJunHjhjIzMx2uaHKCwaBKSkokSUVFRerv73e4oqn5+te/rieffFLSzWvL6/U6XNHU7dmzRxUVFbrjjjucLiUuL7/8sgKBgLZs2aLHHntMX/3qV50uaUoWLFigGzduKBqNKhwOy+dz9xdyMlvdgdnqPsxWd3HLbJ2RvR4/flyHDx8e895nP/tZrVq1Svfcc89MlJAQ4/XR2NioJUuW6PLly6qtrdXOnTsdqm5qwuGw/H5/7LXX69Xo6Kjr/2P/UdnZ2ZJu9vPEE0/oO9/5jrMFTdHJkyeVm5urkpISHThwwOly4vLee+/pX//6l/bv369//OMf2rx5s37zm9/I4/E4XdqkzJ49W//85z+1cuVKvffee9q/f7/TJcUwW92L2eouzFb3cctsdez3CJeXl2vevHmSpL6+Pi1ZskQvvviiE6XEbWBgQNu3b9f3vvc9LV++3OlyJqWpqUlf+MIXtGrVKklSaWmpenp6HK5qat555x1t2bIl9ixbMvrWt74lj8cjj8ejUCik+fPna9++fbr99tudLm3SfvSjHyk3N1cbNmyQJH3jG99QW1ubPv3pTztc2eQ0NTUpIyNDNTU1euedd7R+/Xr96le/cu0dSmarOzBb3YXZ6j5uma2O/dX0d7/7Xeznr33ta/r5z3/uVClx+dvf/qYnn3xSzz77bFLdgXlfcXGx/vCHP2jVqlXq6+tTIBBwuqQpuXLlijZs2KBdu3bpy1/+stPlTNmHA0t1dbXq6uqSclBLN7/B5xe/+IUeffRRXbp0Sf/73/80d+5cp8uatDlz5ig9PV2SdNttt2l0dFQ3btxwuKqPx2x1B2aruzBb3cctszW5/o3GhVpaWjQyMqIf/vCHkiS/3699+/Y5XNXElZeXq7e3VxUVFbIsS42NjU6XNCX79+/Xf//7X7W2tqq1tVWS9LOf/Szp/6eIZLZixQr95S9/0cMPPyzLsrRr166kfL7wkUce0c6dO1VVVaXr169r27Ztmj17ttNlpTxmqzswW92H2ZpYfMUyAAAAjOT47xEGAAAAnEAQBgAAgJEIwgAAADASQRgAAABGIggDAADASARhAAAAGIkgDAAAACP9H4FUpE14/F18AAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1, 2, figsize=(12, 4),\n", " sharex=True, sharey=True,\n", " subplot_kw={'xlim':(-4, 9),\n", " 'ylim':(-0.02, 0.3)})\n", "fig.subplots_adjust(wspace=0.05)\n", "for i, offset in enumerate([0.0, 0.6]):\n", " ax[i].hist(x, bins=bins + offset, density=True)\n", " ax[i].plot(x, np.full_like(x, -0.01), '|k',\n", " markeredgewidth=1)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "On the left, the histogram makes clear that this is a bimodal distribution.\n", "On the right, we see a unimodal distribution with a long tail.\n", "Without seeing the preceding code, you would probably not guess that these two histograms were built from the same data. With that in mind, how can you trust the intuition that histograms confer?\n", "And how might we improve on this?\n", "\n", "Stepping back, we can think of a histogram as a stack of blocks, where we stack one block within each bin on top of each point in the dataset.\n", "Let's view this directly (see the following figure):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(-0.2, 8.0)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAD3CAYAAADfYKXJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAQ90lEQVR4nO3db2xUdb7H8c/QUtqBlIYKi4abWkiqsw+Q0PsAogioyJ/EjWKRUugFi+yl6V7500ClgQZEKVWJYHZpCyyoFUFRNPQJBJWgECXspBq5eySxqeTSVoRawdoWZtq5D4gsf+cctjNnfh3er0fans7v+5vTvHOYf/WEQqGQAABG6hPrAQAAt0akAcBgRBoADEakAcBgRBoADEakAcBgiXYHBAIBvfDCC2psbFSfPn20du1ajRgxwo3ZAOCOZ3slffjwYQWDQe3evVtFRUXauHGjC2MBACQHkc7MzFRXV5e6u7vV1tamxETbi28AQITYFtfr9aqxsVFTp05Va2urqqqqrvm+3++P2nAAEM+ys7Ntj7GN9JtvvqmHHnpIxcXFam5u1ty5c1VbW6t+/frd1kK9lWVZ8vl8sR4jathf7xbP+4vnvUnOL3BtI52amqq+fftKkgYOHKhgMKiurq6eTQcAcMQ20vPmzVNpaany8vIUCAS0ZMkSeb1eN2YDgDuebaT79++vTZs2uTELAOA6vJkFAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxm+zcO9+7dq48++kiSdPHiRVmWpaNHjyo1NTXqwwHAnc420tOnT9f06dMlSWvWrNHTTz9NoAHAJZ5QKBRycuC3336rV155RTU1Ndd83e/3y+v1RmU4E3R2dio5OTnWY0QN++vd4nl/8bw3SWpvb1d2drbtcbZX0r+rrq5WUVHRTb/n8/mcT9bLWJbF/nox9td7xfPepMsXuE44euLwwoULamho0JgxY3o0FADg9jiK9PHjxzV27NhozwIAuI6jSDc0NGjYsGHRngUAcB1Hj0k/99xz0Z4DAHATvJkFAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYEQaAAxGpAHAYI7+fFZ1dbU+++wzBQIBzZo1SzNmzIj2XAAAOYj0sWPHVFdXp127dqmjo0Pbt293Yy4AgBxE+siRI8rKylJRUZHa2tq0fPlyN+YCAMhBpFtbW9XU1KSqqiqdPn1ahYWF2r9/vzwez5VjLMuK6pCx1NnZyf56sXjf36bKrTp3/jdX1hqaPlBFC551ZS0p/s+dU7aRTktL0/Dhw5WUlKThw4erX79++vnnn5Wenn7lGJ/PF9UhY8myLPbXi8X7/s6d/00Pzil2Za0TtdtcvS/j/dz5/X5Hx9m+uiM7O1tffPGFQqGQzpw5o46ODqWlpfV0PgCAA7ZX0hMnTtTx48eVk5OjUCiksrIyJSQkuDEbANzxHL0EjycLASA2eDMLABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABiMSAOAwYg0ABjM0d84fOqppzRgwABJ0rBhw1ReXh7VoQAAl9lG+uLFiwqFQqqpqXFjHgDAVWwj/d1336mjo0MFBQUKBoNaunSpRo0adc0xlmVFa76Y6+zsZH+9WLzv71RDvepK5ruy1l2pKa7el/F+7pyyjXRycrLmz5+vGTNm6IcfftCCBQu0f/9+JSb+60d9Pl9Uh4wly7LYXy8W7/vLyByhvDnFrqx1onabq/dlvJ87v9/v6DjbSGdmZiojI0Mej0eZmZlKS0vT2bNndffdd/d4SABAeLav7vjggw+0fv16SdKZM2fU1tamwYMHR30wAICDK+mcnBytWLFCs2bNksfj0bp16655qAMAED22tU1KStKGDRvcmAUAcB3ezAIABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwR5FuaWnR+PHjVV9fH+15AABXsY10IBBQWVmZkpOT3ZgHAHAV20hXVFQoNzdXQ4YMcWMeAMBVwv618L1792rQoEEaN26ctmzZcsvjLMuK+GCm6OzsZH+9WLzv71RDvepK5ruy1l2pKa7el/F+7pwKG+kPP/xQHo9HX375pSzLUklJiSorKzV48OBrjvP5fFEdMpYsy2J/vVi87y8jc4Ty5hS7staJ2m2u3pfxfu78fr+j48JGeufOnVf+Oz8/X6tXr74h0ACA6OEleABgsLBX0lerqamJ5hwAgJvgShoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADGb7Nw67urq0cuVKNTQ0yOPxaM2aNcrKynJjNgC449leSR86dEiStHv3bi1evFivv/561IcCAFxmeyX92GOPacKECZKkpqYmpaam3nCMZVkRH8wUnZ2d7K8Xi/f9nWqoV13JfFfWutR+QTlzz7uylnR5bxmZI1xbb2j6QBUteNa19ZyyjbQkJSYmqqSkRAcPHtQbb7xxw/d9Pl/EBzOFZVnsrxeL9/1lZI5Q3pxiV9basmqhHnRpLUmqK5nv2t4k6UTtNld/V/x+v6PjHD9xWFFRoQMHDmjVqlVqb2//twcDADhnG+mPP/5Y1dXVkqSUlBR5PB716cOLQgDADbYPdzz++ONasWKFZs+erWAwqNLSUiUnJ7sxGwDc8Wwj7fV6tWnTJjdmAQBch8ctAMBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADEakAcBgRBoADBb2z2cFAgGVlpaqsbFRly5dUmFhoR599FG3ZgOAO17YSO/bt09paWl69dVX9csvv+jJJ58k0gDgorCRnjJliiZPnixJCoVCSkhIcGUoAMBlnlAoFLI7qK2tTYWFhXrmmWf0xBNPXPM9v98vr9cbtQFjrbOzU8nJya6t97etO/Rjy3nX1rtrYH8tKlzg2npuc/v8ue2//vt/9HPbRVfWavmpSelD7nFlLUlqOdOk9D+4t95dqSl6s3KTa+u1t7crOzvb9riwV9KS1NzcrKKiIuXl5d0Q6N/5fL7bn7CXsCzL1f11BKUH5xS7tt7RdzZw/nqxjMwRynPp92XLqoX689oqV9aSpL+WzNdfKv7u2nonare5+rvi9/sdHRc20ufOnVNBQYHKyso0duzYiAwGAHAu7EvwqqqqdOHCBW3evFn5+fnKz89XZ2enW7MBwB0v7JX0ypUrtXLlSrdmAQBchzezAIDBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBiDQAGIxIA4DBHEX6m2++UX5+frRnAQBcJ+wfopWkrVu3at++fUpJSXFjHgDAVTyhUCgU7oADBw7ovvvu0/Lly/X+++/f8H2/3y+v1xu1Aa/3t6079GPLedfWO9VQr4zMEa6td+zYV0ryprq23sXfzmvMmLGurfd/pxr0HxmZrq3n9vlze39fffWl+vUf6MpaLT81KX3IPa6sJUktZ5qU/gf31rsrNUVvVm5ybb329nZlZ2fbHmd7JT158mSdPn067DE+n8/5ZD3UEZQenFPs2np1JfOV5+J6/3tyof68tsq19f5aMt/V+3PLqoVxff5c39+38/WXir+7staWVe7/brq1N0k6UbvN1Zb5/X5Hx/HEIQAYjEgDgMGINAAYzFGkhw0bdtMnDQEA0cWVNAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMGINAAYjEgDgMES7Q7o7u7W6tWrdfLkSSUlJemll15SRkaGG7MBwB3P9kr6k08+0aVLl/Tee++puLhY69evd2MuAIAcRNrv92vcuHGSpFGjRunEiRNRHwoAcJntwx1tbW0aMGDAlf9PSEhQMBhUYuK/ftSyrOhMdxMpidLRdzZE7Pb++fU/9MdR/3nL7w8a0C+i69nNkZqc6Hi933/Gbg/hRHt/1892O/tzepvhmHz+erqWdO3+rv767dxH//z6H5J0w+/Src5duNu2W7cn587Jz/Zk/aHpA11tmVOeUCgUCndAeXm5HnjgAU2bNk2S9PDDD+vzzz+/8n2/36/s7OzoThlFHo9H4e4Cy7Lk8/liPke4n/l3fvZ30d5fT2aLxG2afP4isdbV+7v667czj8fjkaQbfpdudRvhbttu3Z6cOyc/G8n1o81pO20f7hg9evSVKH/99dfKysrq+XQAAEdsH+6YNGmSjh49qtzcXIVCIa1bt86NuQAAchDpPn366MUXX3RjFgDAdWwfk7bj9/sjNQsA3FGcPCbd40gDAKKHt4UDgMGINAAYjEgDgMEiFun6+nplZ2fr4sWLkbpJI/z6669auHCh5syZo5kzZ6quri7WI0VEd3e3ysrKNHPmTOXn5+vUqVOxHiliAoGAli1bpry8POXk5OjTTz+N9UhR0dLSovHjx6u+vj7Wo0RcdXW1Zs6cqenTp2vPnj2xHieiAoGAiouLlZubq7y8PNvzF5FIt7W1qaKiQklJSZG4OaPs2LFDY8aM0TvvvKPy8vK4eTliPH9w1r59+5SWlqZ3331X27Zt09q1a2M9UsQFAgGVlZUpOTk51qNE3LFjx1RXV6ddu3appqZGP/74Y6xHiqjDhw8rGAxq9+7dKioq0saNG8Me3+NIh0IhrVq1SkuXLlVKSkpPb8448+bNU25uriSpq6tL/fr1i/FEkRHPH5w1ZcoULVq0SNLl38+EhIQYTxR5FRUVys3N1ZAhQ2I9SsQdOXJEWVlZKioq0sKFCzVhwoRYjxRRmZmZ6urqUnd3t9ra2q75HKSbsX0zy9X27Nmjt95665qv3XPPPZo2bZruv//+25/WMDfb37p16zRy5EidPXtWy5YtU2lpaYymiywnH5zVW/Xv31/S5T0+//zzWrx4cWwHirC9e/dq0KBBGjdunLZs2RLrcSKutbVVTU1Nqqqq0unTp1VYWKj9+/df+YyR3s7r9aqxsVFTp05Va2urqqqqwh7f49dJT5o0SUOHDpV0+bM9Ro4cqZ07d/bkJo1z8uRJLV26VMuXL9f48eNjPU5E2H1wVm/X3NysoqKiK49Lx5PZs2fL4/HI4/HIsizde++9qqys1ODBg2M9WkS89tprGjRokAoKCiRJf/rTn7Rjxw6lp6fHeLLIKC8vV1JSkoqLi9Xc3Ky5c+eqtrb2lv9K7/Fl08GDB6/89yOPPKLt27f39CaN8v3332vRokXauHFjXPxr4XejR4/WoUOHNG3atLj74Kxz586poKBAZWVlGjt2bKzHibirL4Ly8/O1evXquAm0dPldeG+//baeffZZ/fTTT+ro6FBaWlqsx4qY1NRU9e3bV5I0cOBABYNBdXV13fL43v9v2yjbsGGDLl26pJdfflmSNGDAAFVWVsZ4qp6L5w/Oqqqq0oULF7R582Zt3rxZkrR169a4fJItHk2cOFHHjx9XTk6OQqGQysrK4up5hXnz5qm0tFR5eXkKBAJasmSJvF7vLY/nbeEAYDDezAIABiPSAGAwIg0ABiPSAGAwIg0ABiPSAGAwIg0ABvt/sOtW0dK/WmEAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "bins = np.arange(-3, 8)\n", "ax.plot(x, np.full_like(x, -0.1), '|k',\n", " markeredgewidth=1)\n", "for count, edge in zip(*np.histogram(x, bins)):\n", " for i in range(count):\n", " ax.add_patch(plt.Rectangle(\n", " (edge, i), 1, 1, ec='black', alpha=0.5))\n", "ax.set_xlim(-4, 8)\n", "ax.set_ylim(-0.2, 8)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The problem with our two binnings stems from the fact that the height of the block stack often reflects not the actual density of points nearby, but coincidences of how the bins align with the data points.\n", "This misalignment between points and their blocks is a potential cause of the poor histogram results seen here.\n", "But what if, instead of stacking the blocks aligned with the *bins*, we were to stack the blocks aligned with the *points they represent*?\n", "If we do this, the blocks won't be aligned, but we can add their contributions at each location along the x-axis to find the result.\n", "Let's try this (see the following figure):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAD3CAYAAADfYKXJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbbklEQVR4nO3dbWwU170G8Gft9Xq9GHCMTSBwy0sSh72VCM3eD0EqIWmVQpAStZQU49QKpa0U5KpJQCGNRSzStBC3jUqqCExSJW1pAm1aUpEvRLSNSMJtEXcvoUR3RRPqpLUxxjYGY9Zr78vcDxTjl52dmZ23M2een4SAndmZ/9mZffbsmZcNKIqigIiIhFTidgFERKSOIU1EJDCGNBGRwBjSREQCY0gTEQmMIU1EJLCg1gzpdBrf+9730NnZiZKSEjz77LO4+eabnaiNiMj3NHvSR44cQSaTwf79+9HU1ISdO3c6UBYREQE6QnrBggXIZrPI5XIYHBxEMKjZ+SYiIotoJm4kEkFnZyfuu+8+9Pf3o62tbdz0eDxuW3FERDKLxWKa82iG9C9+8Qt8/vOfx+bNm9HV1YWHH34Yb731FsrLyw2tyKsSiQSi0ajbZdiG7fM2mdsnc9sA/R1czZCeNm0aysrKAADTp09HJpNBNps1Vx0REemiGdLr169Hc3MzGhoakE6n8fjjjyMSiThRGxGR72mG9JQpU/DCCy84UQsREU3Ai1mIiATGkCYiEhhDmohIYAxpIiKBMaSJiATGkCYiEhhDmohIYAxpIiKBMaSJiATGkCYiEhhDmohIYAxpIiKBMaSJiATGkCYiEhhDmohIYAxpIiKBMaSJiATGkCYiEhhDmohIYJq/cXjgwAG8+eabAIDh4WEkEgkcPXoU06ZNs704IiK/0wzp1atXY/Xq1QCAZ555Bl/96lcZ0EREDgkoiqLomfHUqVP40Y9+hL179457PB6PIxKJ2FKcCFKpFMLhsNtl2Ibt8zat9l0ezuJUdwq1U4K4dUb56OM9VzL4qG8YFWUl+NzsCkPrHM7kcHYgjQXV5dozmyD7tksmk4jFYprzafakr9mzZw+ampryTotGo/or85hEIsH2eZjf29d1aQhHe/6FOVVTEY3OHjNhAPH+cwiGQ4hG5xta50Aqjf5P+hFdNLPIqvWRfdvF43Fd8+k6cDgwMID29nbceeedpooiImdd+56s7/uysWWSM3SF9PHjx7F06VK7ayEiiykT/h59fDS8i0hcBVAmLZHsoiuk29vbMXfuXLtrISKHXAvZYqOWvWnn6BqT/ta3vmV3HURkg2s95YmhamYYRIHCkHYQL2YhkthoGKv0mYvJWkUpvgdOxjGkiXzA0gOH1i2KdGBIE0nsek9a5fEi0ltRlOIOOFJRGNJEPmT27AxGtHMY0kQSGz2LY0LP19yBQ57d4SSGNJHE1ML0+vnTxQx3jF0C2Y0hTSSx0TC29MAhA9pJDGkiiY2eJ42Jwx35z5/Wt1AOdziJIU0kMbWetNrl4nqXyYx2DkOayIfM3niJPWnnMKSJJKYexvmHQfQuk+PSzmFIE0nN3I2U8i+RAe0khjSRxNSuLDR1njQPHDqKIU0kMdX7SZtcJjPaOQxpIh8yddN/E88j4xjSRBJTVLrSZs6TZkA7iyFNJDFF5SwOU+dJc0zaUQxpIonZ8UO05CxdP5+1Z88e/PnPf0Y6nca6devw4IMP2l0XEVlA+37SxS2Tp+E5RzOkjx07hhMnTmDfvn0YGhrCK6+84kRdRGQjtWEQ3c9nRjtGM6Tff/991NXVoampCYODg9iyZYsTdRGRCSOZHIYzWQylswCATE7B5VR63HTgatiOfbyyPIhAIIDhTHZ0nomS6Qwy2avLqygrRbCUo6Z2Cigah2q3bt2Ks2fPoq2tDR0dHdi4cSMOHTqEQCAAAIjH44hEIo4U64ZUKoVwOOx2GbZh+7xNrX0f9Q3j2L+uGF7eusU3oLQkgFPnhnDy3JDm/PfeMhU3VpYZXo8esm+7ZDKJWCymOZ9mT7qqqgoLFy5EKBTCwoULUV5ejgsXLmDGjBmj80SjUXPVCiyRSLB9HubX9mU6LuHMULfh5UWjt6K0JICB8j50Zvo057/55rn4j2p7Ommyb7t4PK5rPs3vKbFYDO+99x4URUF3dzeGhoZQVVVltj4islHxY83W3+uDzNHsSd9zzz04fvw41qxZA0VR0NLSgtLSUidqI6IiFX0LUoPP5wFE++k6BY8HC4m8pdjsvH7KHtNXFDwsSyShou/JAZUTq7XmJ9swpImoaBzusB9DmkhC5oc77F0P6ceQJpKQ2R6u/gOHjGm7MaSJpGTucm+ONYuDIU0koeJPwTN2n2lGuf0Y0kQ0ie4xaaa07RjSRBIye+DQ/jWRXgxpIgmZv+JQ3wLYk7YfQ5pIQmZ/YJbZKw6GNBFNxgOHwmBIE0mo6DHp0b853CEKhjSRhIoekzb4PJ5PbT+GNJGEig5Pgz9Qy560/RjSRDJy6GIWsh9Dmogm4cUs4mBIE0nI9F3w9J4nzTFp2zGkiSRk9mIWu9dD+jGkiSTEH6KVh67fOPzKV76CyspKAMDcuXOxY8cOW4siInP4Q7Ty0Azp4eFhKIqCvXv3OlEPEQmB6SuKgKJxhODkyZPYsmUL5syZg0wmg02bNmHJkiWj0+PxOCKRiN11uiaVSiEcDrtdhm3YPm+b2L7zgxm09w+j50oWF1MZw8ubVxVCqDSAc4MZXB7Oas4/c0oZpodLMP+GEG6sLDO8vkJk33bJZBKxWExzPs2edDgcxje/+U08+OCD+OSTT/Dtb38bhw4dQjB4/anRaNRctQJLJBJsn4f5rX3pjov434vnEawEaiqNL+/Kv/+UT736R0sOQD+AJbNmIvofVcZXWIDs2y4ej+uaTzOkFyxYgHnz5iEQCGDBggWoqqpCT08PZs+ebbpIIrKWW2PE/K1D+2ie3fG73/0Ozz33HACgu7sbg4ODqK2ttb0wIjLOrahkRNtHsye9Zs0aPPXUU1i3bh0CgQC2b98+bqiDiIgdaftopm0oFMLzzz/vRC1EZJJ7ww5MabvwYhYiibgW0cxo2zCkiSTCsJQPQ5pIKu6kND8b7MOQJiLT2IO3D0OaSCI8T1o+DGkiifDcDvkwpIkkwg6tfBjSRBJxa9iBHw72YUgTScS94Q6mtF0Y0kRkHjPaNgxpIom4dnaHO6v1BYY0kUQ47CAfhjSRTFw7T9qd9foBQ5pIIjxwKB+GNBGZxp60fRjSRBLhgUP5MKSJJOLWsAPv3WEfhjSRRJiV8tEV0n19fVi+fDnOnDljdz1EZAJvsCQfzZBOp9NoaWlBOBx2oh4i8iKmtG00Q7q1tRX19fWYOXOmE/UQURGyOQWpdBaZbM6V9aezOVfXL7OCvxZ+4MABVFdXY9myZXjppZdU50skEpYXJopUKsX2eZhf2tc9mMbhjy+7VseR3h4c+dsZ/NecCBbVWvOtW/Ztp1fBkP7973+PQCCAv/zlL0gkEnjyySexe/du1NbWjpsvGo3aWqSbEokE2+dhfmlf5YUkTlzscLsczJtfi+hnbrBkWbJvu3g8rmu+giH92muvjf67sbER27ZtmxTQRETX8OwS6/EUPCIJMBzlVbAnPdbevXvtrIOITBDn3hmi1CEP9qSJJCBKT1qUOmTCkCYiyzCjrceQJpKAKOHInrT1GNJEEhDlBkei1CEThjSRBBiN8mJIE0lAlA6sIGVIhSFNRJYR5cNCJgxpIimIkY7inK8tD4Y0kQSE6cGKUodEGNJEEmA2yoshTSQBUXrSgpQhFYY0kQREGQsW5cNCJgxpIrKMKB8WMmFIE0lAlB6sKHXIhCFNJAGGo7wY0kQSEGWYQYwq5MKQJpKAKD1p3mDJegxpIrIMI9p6DGkisg5T2nKav3GYzWaxdetWtLe3IxAI4JlnnkFdXZ0TtRGRTqKMMogyNi4TzZ70O++8AwDYv38/HnvsMfz0pz+1vSgiMobhKK+AomOkP5PJIBgM4s0338Rf//pXtLa2jk6Lx+OIRCK2FummVCqFcDjsyLo+7R9BbzKje/5IWQmiM83V5mT73OD19rX3j+BCgX1iZGQEoVAIF4ay6B5MO1hZflXhIGZPnfwFfc60MsyaWgYAOHVuCCPZ8bHznzPDqCi73mf8x4VhdF8aQigUwo2VQcydHho3f6InheRITrOekpIAPje7opim2C6ZTCIWi2nOpzncAQDBYBBPPvkkDh8+jJ/97GeTpkejUeMVekQikXCsfR3/142eK5d0z39jOIxo9DOm1ulk+9zg9fZ9+uE59FwZUJ3ee6kHNTVTgTBQI8hnUU+ebt+imTWIzq8GAPx3XzsGMuM/UOYtnIcZleWj///H37pw5p//QE3NVMybcQOit9aOm/+Dy/9Ez3BKs5ZQoATR6C1FtMJ+8Xhc13y6Dxy2trbi7bffxtNPP41kMll0YaTO6OlL/IrrB3Js47GtyLefT3xk7L6d720hyhi8EzRD+g9/+AP27NkDAKioqEAgEEBJCU8KsYPR/c5PO6pfybKNtdoxcfrY/+d7qt4OigznbWsOd3zpS1/CU089hYceegiZTAbNzc2eHuMTmQT7E1FeWmFp9Fuhn94rmiEdiUTwwgsvOFELGd1RbaqCxCHLNh4/3KExw6T5tYdHVNcrwQvIcQuBGN6hZNgDqSBZNvH44QsdY9JjnpD3JdD5wsjw8jGkPUyGHZAKk/HgsPHOiIPrEhBDWiDsSNNEsmzjcWdr5Jte8MChieEOCT7kGNICkeUNSTSJ1tkdPHCoiiEtEOM7qo/2VJ+SZQtrHTic1JPWPE9a7yl43n+fMKQFYnRf8vauR3p4PWCuMX7gMP+/1eaXGUPawyR5/5LP6NlvtS5msXp9ImNIC8TBg97kEV4PmGu0hvImfmPQPE/awOvi9ZeQIS0QWb7aEk1k9LJwzeUVX4rnMKQ9jKEuPxlOIQO0e8aT5te4mMXIvu/19wlDWiAe35fIBrLsE+NCV9fZHerTDK/b3NNdx5D2MFnewKROxm2sq0mK6n+Mr8/jryFDWiCGz5P2fB+BtMiyhTXPk57QUivvJ+319wlDWiCGD554e98jHbw+njpK6zxpm+4nnW/ZXsOQFojXdyYiNZqn4Bldno/eKwxpgfA8aZpIlm2seQVhwfOk88xvTVmewJD2MD/1JnxLsm1s5J4bo//mgUMSBX+IliaSZRtf27XVdvFJ9+7ggcNRBX8+K51Oo7m5GZ2dnRgZGcHGjRvxxS9+0anafIfjcjSRLNtYmfD3pOk8cKiqYEgfPHgQVVVV+PGPf4yLFy/iy1/+MkPaTh7fmYjUaH9L5P2k1RQM6ZUrV2LFihUArr7IpaWljhTlN+lsDgCQMzrcoSijz9UrACBYKsYoVyabU31rlglSo10URUEmp+Rt59htmpMkjHL/3lezKg3K5Mbvy2NDPZczvp+Plc7mNJ8fLAkgEAgUvQ47BRQdA6GDg4PYuHEjvva1r+H+++8fNy0ejyMSidhWoNtSqRTC4bCt69j3t37Vnddqc6eFcPfCytH/O9E+Ne99MohPL45MejxSVoLVn62yZB1utq+Q/qEM3vvkCh6ITp807fWT/bo/sDOZDILBgn0tz3Kybfcvmo7pYWc7oclkErFYTHM+zVegq6sLTU1NaGhomBTQ10SjUeMVekQikbC9fTO7PkI660xI31Q7BdHonNH/O9E+Ne2ZLlwJXp70eGV5ENHoQkvW4Wb7Cum5PIyPUl2IRudPmlbT+ZHukO7t7UFNTa3F1YnBybbV1c1H9ZSQI+u6Jh6P65qvYEj39vZiw4YNaGlpwdKlSy0pjCbz0/jaWOpH+uV/QRQoquO0fmi/aES+srPgwF9bWxsGBgawa9cuNDY2orGxEalUyqnafMPJ3UOkfdHXYaTwOLFIRN4WBXvSW7duxdatW52qxbecDE6RglG1Jy1OibZRoH7lnR/aLxqRX3O5D6F7hJPBKdLOqHrOrKNVuENhT1ooInVeJmJIk3BE+iBxmp/bTvkxpF3m9NdbkULAzwfO1A4cyt9yQQn8wjOkfUbgfXGUSB8kdlEfj/dB4wUk8qvOkHaZ0+9JkUJAoFIcp3bgkNwh8rZgSLvM6X1DpH3RD8MaahRFyf8LJS7UQmLviwxpEo5IvX2n+bjppIIh7TLHA0mgEPD1edKKvh9kJWeIvM8xpF3m/HCHOHuj3hvAyyrvfZL90njBiPyyM6Rd5vyBQ2fXV4hApThOrSdN7hB5iI0h7TKne7Yi7Yqq50mLVKRNFOQ/cEjuEHmfY0i7jD3pfI8LVKRNVMek5W86GcSQJuH4Oaj88AFFxjCkfUaoEChQishjhFbQ+4Os5AyRX3eGtMv8PdwhUDEOu3rPFv+2XzQi74sMaZf5+8BhcdNkoHo/accrIUDs/Y0h7TLHdw6B9sZClYhTpT3U7ifN3rU7RH7VGdIkJL+GlT9bTYUwpF3m6xssFRrucK4Ml+S/j7hPP5tcJ3KnQFdInzx5Eo2NjXbX4ktO7xwi7YsiH6yx29XhDv+2XzQib4mCP0QLAC+//DIOHjyIiooKJ+rxHeeHpMXZHXngUGUCOU7k/S2gaLxr3377bdx2223YsmULfvvb306aHo/HEYlEbCtQj2Q6h9O9w7YsOz0ygrJQaNxj1RWlmFcVyjt/4nwKqaz+LZ7OKvh7b8pUjUaESktwa005AGDxjWGkR4YRDoc1n/dx3zAuj+QsreVM3zBSmfzLjNaGUVISML2OsdsvWlOOcFnhL48nu4ZgbSvzu5TKouPSCD574/jOj9H9IZPJIBjU7Gt5kpNt+0xVCFPLS1Wn3z4rjJKA+f1xrGQyiVgspjmf5iuwYsUKdHR0FJwnGo3qr8wG5y+n8G73P21Zdm9/D2pqKsc9Vj11GqLRWXnnP3ahHf0jaUPrqKmZWnR9xejOXv37tkW34OO/n9a1/U79z7/QPTRkaR2VVZWoVJnWowDIml/H2O23YuE8zKgsLzj/4bMfIZtzoFtVBtTUXN8WYxnZH3p7e1BTU2thYeJwsm1JAMkC+9ttt92CYKm1h/Di8biu+eQ4cOj4VxX1FQr8rWkSQ1/xvNQwFXqaIPLXXnKPm7uFFCEt0qnGXnqTGzlwJcNBLj3bRoZ2kvXcfF/LEdICva8EKoUmYACTF+kK6blz5+Y9aCgKkS6tFunsCS1GSvVQs9Tp6UnL0E6ynJsf8FL0pJ3mxzeyH5rspQ9YchaHO0xy/rdc5Xgz+60nrdUEGdpI8pEjpJ1eHw8cepLWtvF+C8ku7EmbJNLXVBnCTFbcNuRFkoS0w+srNM1DOeC74Q6tnrQMjSRb8MChxxR6M3vpbe6za1k0+aGNVBwOd5ArDPUcJehl8sAhFYtXHJok0u8EeumN7reetNaHEsesSY2bQ2FyhLRAby6RaqHxvPQBSnSNHCEt0HnSXgoCn412aPJDG6k4HO4wSaTzpD3FUEjL0mgi43jg0GNkyStjF7N4n/YpeM7UQR7EkDbH8d8JLDTNQ+90vw13aH0o8XgCqeF50iaJ9DuBXgoz/53dYW46+ReHO0wS6c0lUClEJAEpQtrpaJTnsnADY9JeapgKzYtZHKmCvIhnd3iNJO9mSZqhm+bFLBJ8EJE9eDGLSUKdJ+2h6PPfgUNz08m/3Nw3gloz5HI5bNu2DadPn0YoFMIPfvADzJs3z4nadBPpPGkvhRnvJ21sOvmX0AcO//jHP2JkZAS/+c1vsHnzZjz33HNO1GWIKG8ufl0mIqtphnQ8HseyZcsAAEuWLMGHH35oe1FGifJDtJ7LaJ8Nd2g2WIo2kh3c/CapOdwxODiIysrK0f+XlpYik8kgGLz+1EQiYU91BqycW9zzXnzxRXznO99RnZ6qiSAczk549Ipqm+2qo9BzinkuAPR1tiOVSunafnfNNLx4AMW1y8pljt1+6b4OJPoKz+/k9ivW2HWNbd/Yx43U8+KLLwLApH1JbRmFlq213mK3nd7nWrn+sfo62qGx69gmoGh8R9+xYwduv/12rFq1CgBw11134d133x2dHo/HEYvF7K3SRoFAoOAwRSKRQDQadb2OQs8p5rnX2N0+M7VZsUyRt58V6xrbvrGPG6knEAgAwKR9SW0ZhZattV4z207Pc61cv930ZqfmcMcdd9wxGsoffPAB6urqzFdHRES6aA533HvvvTh69Cjq6+uhKAq2b9/uRF1ERAQdIV1SUoLvf//7TtRCREQTaI5Ja4nH41bVQkTkK3rGpE2HNBER2UeKy8KJiGTFkCYiEhhDmohIYJaF9JkzZxCLxTA8PGzVIoVw+fJlPPLII/j617+OtWvX4sSJE26XZIlcLoeWlhasXbsWjY2N+PTTT90uyTLpdBpPPPEEGhoasGbNGvzpT39yuyRb9PX1Yfny5Thz5ozbpVhuz549WLt2LVavXo033njD7XIslU6nsXnzZtTX16OhoUFz+1kS0oODg2htbUUoFLJicUJ59dVXceedd+LXv/41duzYIc3piF64cVaxDh48iKqqKrz++uv4+c9/jmeffdbtkiyXTqfR0tKCcDjsdimWO3bsGE6cOIF9+/Zh7969OHfunNslWerIkSPIZDLYv38/mpqasHPnzoLzmw5pRVHw9NNPY9OmTaioqDC7OOGsX78e9fX1AIBsNovy8nKXK7KGF26cVayVK1fi0UcfBXB1/ywtLXW5Iuu1traivr4eM2cWeVMVgb3//vuoq6tDU1MTHnnkEdx9991ul2SpBQsWIJvNIpfLYXBwcNx9kPLRvJhlrDfeeAO//OUvxz120003YdWqVVi0aJHxagWTr33bt2/H4sWL0dPTgyeeeALNzc0uVWctPTfO8qopU6YAuNrG7373u3jsscfcLchiBw4cQHV1NZYtW4aXXnrJ7XIs19/fj7Nnz6KtrQ0dHR3YuHEjDh06NHqPEa+LRCLo7OzEfffdh/7+frS1tRWc3/R50vfeey9mzZoF4Oq9PRYvXozXXnvNzCKFc/r0aWzatAlbtmzB8uXL3S7HElo3zvK6rq4uNDU1jY5Ly+Shhx5CIBBAIBBAIpHA/PnzsXv3btTW1rpdmiV+8pOfoLq6Ghs2bAAAPPDAA3j11VcxY8YMlyuzxo4dOxAKhbB582Z0dXXh4YcfxltvvaX6Ld10t+nw4cOj//7CF76AV155xewihfLxxx/j0Ucfxc6dO6X4tnDNHXfcgXfeeQerVq2S7sZZvb292LBhA1paWrB06VK3y7Hc2E5QY2Mjtm3bJk1AA1evwvvVr36Fb3zjGzh//jyGhoZQVVXldlmWmTZtGsrKygAA06dPRyaTQTY78XbI13n/u63Nnn/+eYyMjOCHP/whAKCyshK7d+92uSrzZL5xVltbGwYGBrBr1y7s2rULAPDyyy9LeZBNRvfccw+OHz+ONWvWQFEUtLS0SHVcYf369WhubkZDQwPS6TQef/xxRCIR1fl5WTgRkcB4MQsRkcAY0kREAmNIExEJjCFNRCQwhjQRkcAY0kREAmNIExEJ7P8By3rdik2MKNQAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x_d = np.linspace(-4, 8, 2000)\n", "density = sum((abs(xi - x_d) < 0.5) for xi in x)\n", "\n", "plt.fill_between(x_d, density, alpha=0.5)\n", "plt.plot(x, np.full_like(x, -0.1), '|k', markeredgewidth=1)\n", "\n", "plt.axis([-4, 8, -0.2, 8]);" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The result looks a bit messy, but it's a much more robust reflection of the actual data characteristics than is the standard histogram.\n", "Still, the rough edges are not aesthetically pleasing, nor are they reflective of any true properties of the data.\n", "In order to smooth them out, we might decide to replace the blocks at each location with a smooth function, like a Gaussian.\n", "Let's use a standard normal curve at each point instead of a block (see the following figure):" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAD3CAYAAADfYKXJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAcCElEQVR4nO3deWxU570+8GfwbrwFMCGJm0DSpvFtL43KH7epRJMu6YJ+XUWFAyVNqSolQkpoEIlACaAuENRETVsJSFI1amm2Jj9UkeZeaEIphNCwGNuAPdh432Y8+76dOee9f3ChELBn7Dkz5z1nno8UKcFnznzfGD9+5z3vYhNCCBARkZRmGV0AERFNjiFNRCQxhjQRkcQY0kREEmNIExFJjCFNRCSx0mwu+u53v4uamhoAQFNTE7Zv357XooiI6KKMIZ1MJiGEwJ49ewpRDxERXSHjcMf58+cRj8exZs0aPPjgg2hvby9AWUREBAC2TCsOu7u70dHRge9///sYHBzET37yE+zfvx+lpRc74a2trQUplIjIapYsWZLxmozDHYsWLcJtt90Gm82GRYsWoaGhAW63GzfddNO03sis7HY7mpubjS4jb9g+c7Ny+6zcNiD7Dm7G4Y633noLzzzzDABgYmICkUgEjY2NuVVHRERZydiTXr58OTZu3IgHHngANpsN27ZtuzzUQURE+ZUxbcvLy/Hcc88VohYiIvoILmYhIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGIMaSIiiTGkiYgkxpAmIpIYQ5qISGJZhbTX68W9996Lvr6+fNdDRERXyBjSiqJg8+bNqKysLEQ9RER0hYwhvWPHDrS0tGD+/PmFqIeIiK5QOtUX9+7dizlz5mDp0qV48cUXJ73ObrfrXpgsEokE22dibJ95Wblt02ETQojJvrhq1SrYbDbYbDbY7XYsXLgQu3btQmNj4+VrWltbsWTJkoIUawS73Y7m5majy8gbts/crNw+K7cNyD47p+xJv/LKK5f/ffXq1di6detVAU1E1hFLpdHnimIsEEcooQAAZpeXYkF9BT7eWIv66jKDKyxOU4Y0EVlfNJnGv/q86HKEoGrXfrDumQjjSI8HtzfOxj13zMX8Wk4iKKSsQ3rPnj35rIOIDHDeGcI/zruQVLSM1/a7oxjwRPHZW2/A5++Yi9ISLrMoBPakiYqQpgkcueBG23BgWq8TAmgd8mMsEMf/W3wTais5BJJv/FVIVGQ0TWB/p3PaAX0lZzCBN06OwBtJ6lcYXRdDmqiICCHw9y4nup3hnO8VTqTxVusogzrPGNJEReRorwd2R+4BfUkspWLv6TEE44pu96SrMaSJikTXeAinBv263zeSTOOvbWNIKKru9yaGNFFRcIeTOGifyNv9fdEU/uecA9p1pvBRbhjSRBanqBr+55wD6TwH6KAnhg/7vXl9j2LEkCayuA96PfBGUgV5rxODPgx5owV5r2LBkCaysFF/LKepdtMlBHCg04lYKl2w97Q6hjSRRaVVDQftroK/bzSp4qDdhSn2bqNpYEgTWdSpIT980cIMc3xUryuC7gn9pvoVM4Y0kQUF4wpODvgMreGf3W4Oe+iAIU1kQe9fcOd9Nkcm8ZSKIz1uQ2uwAoY0kcWMBeK4MBExugwAgN0RxrA3ZnQZpsaQJrIQIQTel6z3eqjbdd19qik7DGkiC+lzR+AIJowu4yq+aAptw/ovRy8WDGkii9A0gWN9cq74Oz7gQzTJh4gzwZAmsojzznDBVhZOVyqtSfsLRHYMaSIL0DSB4wNyh2DneBCusFxDMWbAkCaygPPOMAIxufd0FgI4esFjdBmmw5AmMjlNEzgheS/6kiFvjBswTRNDmsjkLrgi8Evei77S0V4P9/WYBoY0kYkJIXBy0Njl39PlCiVxwSXHYhszYEgTmZgjnIY7bL6DYI/1eniKS5YY0kQm1uky52wJf0xBlyNkdBmmwJAmMilnMIGJiHnGoj/q+ICPy8WzwJAmMqnTJl9qHYor6BpnbzoThjSRCQXjijQ73eXi+ICXvekMGNJEJtQ+EoBmgWls4UQaneNBo8uQGkOayGSSaRXnxqwTbCc4Nj0lhjSRyXSNh5BKa0aXoZtwIg07Z3pMiiFNZCJCCHSMBIwuQ3cnB32cNz0JhjSRiQx6Y6ZaAp6tQEzh6eKTYEgTmUj7iLmn3U3l5KCPe3pcR8aQVlUVGzduREtLCx544AH09PQUoi4i+gh/NIVBj3UPdfVGUuhzm39aod4yhvShQ4cAAK+//jrWrVuHX//613kvioiu1TEaMLqEvDsx4Gdv+iNKM13wla98Bffddx8AYHx8HHV1dddcY7fbdS9MFolEgu0zMau0T1EFDncFkVKvntWRTqfh8ch1OnguPB7gkC2Am2rLLPO9y1XGkAaA0tJSPPnkk3j33Xfx29/+9pqvNzc3616YLOx2O9tnYlZp39nRIOpuuPbH1eNxY968RgMqyh9/STW+1Nxkme/dZFpbW7O6LusHhzt27MCBAwfw9NNPIxaz7rgYkWyEEEUx1HHJsC+GiZA5d/fLh4wh/de//hUvvPACAKCqqgo2mw2zZnFSCFGhOIIJU+4ZnQuzHWSQTxmHO7761a9i48aNWLVqFdLpNDZt2oTKyspC1EZEAM4UUS/6kl5XBI1zVaPLkELGkK6ursZvfvObQtRCRB8RT6noscBud9MlBGB3J/BfRhciAY5bEEmsczxYtJsP9ftSiCbTRpdhOIY0kaSEEDhrod3upku16D4l08WQJpLUsC+GgAX36ZiOjtGgpXb8mwmGNJGkzowWby/6koSiFv2hAAxpIgmFEwr63VGjy5DC6eFAUW9jypAmklDneMgSx2PpIRRXcMFVfDNcLmFIE0lG04SljsfSQ+tQ8W68xJAmksyAN4pwglPPrjQRSmDUHze6DEMwpIkkc5YPDK/r9LB1DzyYCkOaSCLBuIJBLx8YXk+/OwpfNGV0GQXHkCaSyLmxIIp06DUrp4eKrzfNkCaShMoHhhnZHSHEUsU1Xs+QJpJEnzuCWIo7v00lrQl0jBTXLzKGNJEkuMIwO2dGA0irxbNUnCFNJAFfNIURH088ykYspcLuCBtdRsEwpIkkUIwb++eibaR4FrcwpIkMpqgauhwho8swFW8khUFvcXzyYEgTGazbGUZSKZ4xVr20Fsl0PIY0kcH4wHBmRnwxuMLWP1WcIU1kIGcwgYmQ9YMmX04PBYwuIe8Y0kQG6uADw5z0TIQRsfg5iAxpIoPEUyp6nMUzlSwfVE2gfThgdBl5xZAmMkjneBDpIj5xRC9nxgKWPgeRIU1kAE0T6OADQ10kFc3S5yAypIkMMOCNIhQv7pPA9dRm4XMQGdJEBrD6OGqhBeMK+j3WPAeRIU1UYN5IEsPcp0N3Vl3cwpAmKrD2kYDRJVjSeCABR9B65yAypIkKKKGosHOfjryxYm+aIU1UQOfGglBUaz7gkkGvK4JAzFrnIDKkiQpE1QSHOvJMCKDNYv+PGdJEBdLriiCcsPYSZhl0jgWRUKxzDBlDmqgAhBCWHC+VkaIKS+0syJAmKoBRf5y73RVQ+4jfMucglk71RUVRsGnTJoyNjSGVSuGRRx7Bl7/85ULVRmQZp4fZiy6kaFLFeWcYn76l3uhScjZlSO/btw8NDQ341a9+hUAggO985zsMaaJp8kaS6HdHjS6j6Jwe9uNTN9fBZrMZXUpObGKK0xyj0SiEEKipqYHf78fy5ctx8ODBq65pbW1FdXV13gs1SiKRQGVlpdFl5A3bl3/HhqPo9yXzcu90Oo3S0in7WqalR9vuW1SDpvpynSrSVywWw5IlSzJeN+X/gdmzZwMAIpEIHn30Uaxbt+661zU3N0+/QpOw2+1sn4kZ3b5QQkFkbBDz5uVnbrTH48a8eY15ubfR9Gibv7QK9zd/TKeK9NXa2prVdRkfHDocDjz44IP49re/jW9+85s5F0ZUTFqH/NAm/7BKeTbmj5t+qfiUIe3xeLBmzRps2LABy5cvL1RNRJYQTabROWadqWBmdWrQ3A9tpwzp3bt3IxQKYefOnVi9ejVWr16NRILTiIiycXrYzyXgEuhzR+CPmnep+JRj0k899RSeeuqpQtVCZBnxlGqpBRVmJsTFYaev/MeNRpcyI1zMQpQHp4f9lj53z2zsjhCiJj1VnCFNpLNYKs2NlCST1gTaTHoaDkOaSGenBtmLllHHaADJtPk2XmJIE+kokkyjg71oKaXSGs6a8DkBQ5pIRx/2eZG26KnVVnB62HwbLzGkiXTii6bQOc6jsWQWTaroMtnxZQxpIp180Ovh6kITODXoh2aiTzsMaSIdjPpj6HVFjC6DshCMK+ieCBtdRtYY0kQ50jSBwz1uo8ugaTg16MMUG4BKhSFNlKMuRwiuUH62IqX88ERS6DPJHt8MaaIcJBQVH/R6jC6DZuCkSXrTDGmiHBzr8yCWMt8CCQKcwQSGvDGjy8iIIU00Q45gnJsomdyJAZ/RJWTEkCaaAVUTeK9rAib4tExTGAvEMeKTuzfNkCaagRMDPngi5t2jmP7tuOS9aYY00TS5QglTfEym7Iz4Yhj1y9ubZkgTTYOiatjf6eTKQos53i/vL12GNNE0HL3ggZfDHJYz7IthLCDngbVTHp9FckgoKkJxBdGUimRaRfr/zs0rLbGhsrQE1RUlqKssQ2VZicGVWluvK8zN/C3sX31eLF/SZHQZ12BIS0YIAU8khWFfFGOBBFyhBMKJ7I79mV1RgsbaCtxYV4lbGqpwU30Vykv5YUkPgVgKBzonjC6D8ujS2HTTDdVGl3IVhrQkvJEkuhwh9ExEEIorM7pHNKkimoxh0HPxIUjJLBtubqjC7Y2zcUdjDeqryvQsuWgk0yre7hjnaStF4GJvugo2m83oUi5jSBtI0wT63BG0DQfyMh6magIjvhhGfDEc7nZjQX0lPrmgFp+8sRazK/itz4amCew/5+R0uyIx6o9jxBfHrXPl6U3zJ9UAmiZw3hnGiQEv/LGZ9ZpnwhlMwBlM4P0eDxbOq8anb6nnLIUpCHFxd7t+k2zEQ/o41ufBx+Z8TJreNEO6gIQQGPTG8P4Ft6EzBDQh0O+Oot8dRTQYRLDCg0/fUo+6Sg6HXOnEgI8PCouQI5hAvyeKOxprjC4FAEO6YPzRFP7Z47o8XiyLuKLheL8PJwZ8uL2xBp9pqsetc6ql6UUYpXXIh2N9XqPLIIMc6/Pi9nmzpfg5YEjnmaoJnBz04eSAT+oDSoUA+lwR9LkiuKG6DP/Z1IBP3VxXlNP6Tg76cPQCtx8tZp5wEt0TYdy1oM7oUhjS+eQKJXCgawKesLk2hPfHFBzpceNffR7ceWMtFjc1YEF9pdFl5Z0QAu9f8KB1yG90KSSBY71efGJ+LUpmGdubZkjngfZ/vecP+32mfjCnqAKd4yF0jocwv64C/3lLPT65oBYVpdbrXafSGg50OnlOIV0WjCs4OxbE3R9rMLQOhrTOQgkF+886pV1iOlOuUBIHQy4c6XHjEzfW4j9uqkPTDXLNJ50pbySJd846uNybrnG834vmm4ztmDCkddTriuDdrgkkFOue1KGoAl3jIXSNh1BXVYa7FtTikwtqMa+mwujSpk0IgfaRAD7o9UBRzfuJh/InllLROuTH5++YZ1gNDGkdqJrA0V4PThfZWGYoruDEwMWZIXNryvGJ+bW4Y/5sNNZUSN/DdoUTOHTehfFAwuhSSHKnh/xY3NSAGoMWgDGkcxRJpvHfZxyWG96YLm8kBW/Eiw/7vairKsPt82Zj4bzZaLqhCmUl8uwfEoil8GG/D+edIZ6qQllRVIFjvR589VMLDHl/hnQOxgJxvHNmHNGkdYc3ZiIUV9A+EkD7SODy/iFNN1z858a6yoKHthACo/6L5xFecIUZzjRtXY4Q7r61AfNrCz/LKauQ7ujowLPPPos9e/bkux5TEEKgYzSIw91uU8/eKIQr9w8BLm761FhbgQV1lZhfV4HG2grMqS5Hqc7BrWoCjmAcp8djOOYdnPGmVUTAxXUEh7vdWL6kqeBDeRlD+qWXXsK+fftQVVVViHqkl1Y1HOp249wYT4meCVUTl/cQuWSWzYaG6jLcMLscDVVlqKsqQ21lKWoqSlFVXoLK0hKUldiu+eEQQkBRBeKKinBCQSCmwBdNwRVOwhmMQ1EFPJ4E5s1jQFPuRv1x9Lkj+Pj82oK+b8aQvvXWW/G73/0OTzzxRCHqkVo0mcbfzozzYZPONCHgi6bgi04+BW6WzYbSEhtm2Wyw2S6GfVoV/CRDBXW4x4Pb5s4u6JBdxpD+2te+htHR0SmvsdvtuhUkm0QiAbvdDl8sjcMDEUQVa+0pnE6n4fG4jS4jb9g+85KxbR4AexU/Fi8o3MiCLg8Om5ub9biNlOx2O0rnNKHN4URVfQWsNujj8bgxb16j0WXkDdtnXrK2zS1suHnhwpwP0Whtbc3qOnnmRklICIGzzjj+dsbBxQ5EBODilLzDPYXr4TOkJ6GoGvafc6LDWdzzn4noWn2uCPrdhdnnJauQbmpqwl/+8pd81yKNaDKN/986ivPOsNGlEJGkDnW7C3LuJXvSH+EKJ/DaiWE4gpzBQUSTC8UVfNif/4MhGNJX6HWF8ZeTIwgn0kaXQkQm0DYcgCuU3w4dQxoXHxCeGPDh7Q4+ICSi7GlC4O9dE1DzeOpS0Yf0pQeEH/TyuCQimj53OIlTg7683b+oQzqSTOMtPiAkohwdH/DBnadj8oo2pJ3BBF47PnzVHhJERDOhagIHOp15GfYoypDuGg/hzVMjiCT5gJCI9OEOJ/My26OoQlrTBP7Z7cKBTifSeRzoJ6LidHLQh1F/TNd7Fk1Ix1Jp7G0bQ9twwOhSiMiihAD2n3Pqes5pUYT0RCiBV48PX954nogoX8KJNP7eNQGh0za6lg/pc2NBLlAhooLqc0VwWqdP7ZY941BRNRw670LneMjoUoioCB294MGC+krc0pDbBseW7En7oym8cXKEAU1EhtGEwDtnxnOeRWa5kO6ZCOPVE8N5m1hORJStaFLF3zrGkVZnvlueZUJaUTUctE/gnTOOgmwfSESUDUcwgffsM3+QaIkxaXc4if3nHPBEJj/IlIjIKHZHGPVV5bjnjrnTfq2pQ1oIgfaRAI5e8HBxChFJ7cN+L2orS/HpW+qn9TrThnQ4oeDdrgkMeTn3mYjM4T37BCrLSvDx+TVZv8Z0IS2EwHlnGP/sduu6qoeIKN+EAP77rAPfvvvmrF9jqpCOJNP4x3kX+lyFOQCSiEhvqiawr30cn2/I7npThLQQAp3jIbx/wcPeMxGZ3nSeoUkf0r5oCv847+K+G0RUlKQNaUXVcHLAh1ND/ryeH0ZEJDPpQloIgQuuCI70uLkpEhEVPalC2hGM4/0eD8YCcaNLISKSghQh7YumcKzPgwsTnLVBRHQlQ0PaH03h+IAP550h6LQ/NhGRpRgS0u5wEqcGfeieCDOciYimULCQFkJgyBtD24gfgx5OpyMiykbeQzqeUtHlCOHsaAD+mJLvtyMispS8hLSqCQx5o7A7wuhzRzjPmYhohnQLaVUTGPHF0OuKoNcdQTzF5dtERLnSJaT/dmYcQ94YT0QhItKZLiHN+c1ERPmRMaQ1TcPWrVvR3d2N8vJy/OIXv8Btt91WiNqIiIpexoNo33vvPaRSKbzxxhtYv349nnnmmULURUREyKIn3draiqVLlwIA7r77bpw7d+6aayrKLHPo+DXKS2axfSbG9pmXlds2HRlDOhKJoKbm3+dxlZSUIJ1Oo7T03y/94o3Wnf+cqK9EZSXbZ1Zsn3lZuW0AEMt2TZ/IYNu2beKdd965/N9Lly696uunTp3KdAupbdmyZcqvd3V1SVHHVK+ZyWsvyXf7cqlNj3vK/P3T472ubN+Vfz6derZs2XLdv0uT3WOqe2d631y+d9m8Vs/3z7dsszNjSO/fv188+eSTQggh2traxI9//OMZvZGsMv2eKtQPeRa/Lyd9zUxee0m+25dLbXrcU+bvnx7vdWX7rvzz6dQD4Lp/lya7x1T3zvS+uXzvsnmtnu+fb9lmZ8bhjvvvvx8ffPABWlpaIITAtm3bptGhJyKiXGQM6VmzZuFnP/tZIWohIqKP4KNTIiKJMaSJiCQmxfFZRtqyZYvRJQCYWR2XXiNLG64nH7XJ2N5C1jTZe13559OpZ7LXZfM+0/nadOuayWvz+f5GsQmR29kora2tetVCRFRUlixZkvGanEOaiIjyh2PSREQSY0gTEUmMIU1EJDHdQrqvrw9LlixBMpnU65ZSCIfDePjhh/GDH/wAK1asQFtbm9El6ULTNGzevBkrVqzA6tWrMTQ0ZHRJulEUBRs2bMDKlSuxfPlyHDx40OiS8sLr9eLee+9FX1+f0aXo7oUXXsCKFSvwve99D2+++abR5ehKURSsX78eLS0tWLlyZcbvny4hHYlEsGPHDpSXl+txO6m8/PLL+NznPoc///nP2L59u2VWX1p5n/B9+/ahoaEBr776Kn7/+9/j5z//udEl6U5RFGzevBmVlZVGl6K748ePo62tDa+99hr27NkDp9NpdEm6Onz4MNLpNF5//XWsXbsWzz///JTX5xzSQgg8/fTTePzxx1FVVZXr7aTz0EMPoaWlBQCgqioqKioMrkgf2ewTblZf//rX8dhjjwG4+PezpKTE4Ir0t2PHDrS0tGD+/PlGl6K7o0eP4s4778TatWvx8MMP47777jO6JF0tWrQIqqpC0zREIpGrtn2+nmktZnnzzTfxxz/+8ao/u/nmm7Fs2TLcdddd069WMtdr37Zt27B48WK43W5s2LABmzZtMqg6fWWzT7hZzZ49G8DFNj766KNYt26dsQXpbO/evZgzZw6WLl2KF1980ehydOf3+zE+Po7du3djdHQUjzzyCPbv3w+bzWZ0abqorq7G2NgYvvGNb8Dv92P37t1TXp/zPOn7778fCxYsAAC0t7dj8eLFeOWVV3K5pXS6u7vx+OOP44knnsC9995rdDm62L59Oz7zmc9g2bJlAIAvfOELOHLkiMFV6cfhcGDt2rWXx6WtZNWqVbDZbLDZbLDb7Vi4cCF27dqFxsZGo0vTxbPPPos5c+ZgzZo1AIBvfetbePnllzF37lyDK9PH9u3bUV5ejvXr18PhcOCHP/wh3n777Uk/pefcbXr33Xcv//uXvvQl/OEPf8j1llLp7e3FY489hueff94SnxYu+exnP4tDhw5h2bJlaG9vx5133ml0SbrxeDxYs2YNNm/ejHvuucfocnR3ZSdo9erV2Lp1q2UCGri4Cu9Pf/oTfvSjH8HlciEej6OhocHosnRTV1eHsrIyAEB9fT3S6TRUVZ30evN/ts2z5557DqlUCr/85S8BADU1Ndi1a5fBVeXOyvuE7969G6FQCDt37sTOnTsBAC+99JIlH7JZ0Re/+EWcPHkSy5cvhxACmzdvttRzhYceegibNm3CypUroSgKfvrTn6K6unrS67ksnIhIYlzMQkQkMYY0EZHEGNJERBJjSBMRSYwhTUQkMYY0EZHEGNJERBL7X6Ub4i9vXzzMAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from scipy.stats import norm\n", "x_d = np.linspace(-4, 8, 1000)\n", "density = sum(norm(xi).pdf(x_d) for xi in x)\n", "\n", "plt.fill_between(x_d, density, alpha=0.5)\n", "plt.plot(x, np.full_like(x, -0.1), '|k', markeredgewidth=1)\n", "\n", "plt.axis([-4, 8, -0.2, 5]);" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This smoothed-out plot, with a Gaussian distribution contributed at the location of each input point, gives a much more accurate idea of the shape of the data distribution, and one that has much less variance (i.e., changes much less in response to differences in sampling).\n", "\n", "What we've landed on in the last two plots is what's called kernel density estimation in one dimension: we have placed a \"kernel\"—a square or \"tophat\"-shaped kernel in the former, a Gaussian kernel in the latter—at the location of each point, and used their sum as an estimate of density.\n", "With this intuition in mind, we'll now explore kernel density estimation in more detail." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Kernel Density Estimation in Practice\n", "\n", "The free parameters of kernel density estimation are the *kernel*, which specifies the shape of the distribution placed at each point, and the *kernel bandwidth*, which controls the size of the kernel at each point.\n", "In practice, there are many kernels you might use for kernel density estimation: in particular, the Scikit-Learn KDE implementation supports six kernels, which you can read about in the [\"Density Estimation\" section](http://scikit-learn.org/stable/modules/density.html) of the documentation.\n", "\n", "While there are several versions of KDE implemented in Python (notably in the SciPy and `statsmodels` packages), I prefer to use Scikit-Learn's version because of its efficiency and flexibility.\n", "It is implemented in the `sklearn.neighbors.KernelDensity` estimator, which handles KDE in multiple dimensions with one of six kernels and one of a couple dozen distance metrics.\n", "Because KDE can be fairly computationally intensive, the Scikit-Learn estimator uses a tree-based algorithm under the hood and can trade off computation time for accuracy using the `atol` (absolute tolerance) and `rtol` (relative tolerance) parameters.\n", "The kernel bandwidth can be determined using Scikit-Learn's standard cross-validation tools, as we will soon see.\n", "\n", "Let's first show a simple example of replicating the previous plot using the Scikit-Learn `KernelDensity` estimator (see the following figure):" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD0CAYAAACLpN0/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAhIUlEQVR4nO3de3BU9d0/8HeySy5kk0AEivoIYh5TUZ80hv76tGWCtTZjZZyO03EgpGa0zuhItbWaUqeMIkMpYKd0HKcD1GpTSwsEf7bPj9o+9mnUEi55EFYCuWxCSMj9tpfsPXs9398fKSsxCZtsdvdc9v36K9lzzp7PlyzvnHzP93y/aUIIASIiUr10uQsgIqL4YKATEWkEA52ISCMY6EREGsFAJyLSCAY6EZFG6OU6sdFolOvURESqtmbNmmlfly3QgZmLisZkMmH16tVxrkYebIvyaKUdANuiVPNpy/UuhtnlQkSkEQx0IiKNYKATEWkEA52ISCMY6EREGsFAJyLSCAY6EZFGMNCJiDSCgU5EpBEMdCIijWCgExFpBAOdiEgjGOhERBoRdbZFSZKwfft2tLe3IyMjAzt37sTKlSsj23/3u9/hr3/9KwDg3nvvxbPPPgufz4ctW7bAarUiJycHr776KgoKChLXCiIiin6FXldXh0AggNraWlRXV2PPnj2RbX19fTh27BiOHDmCo0eP4uTJk2hra8Phw4dRVFSEQ4cO4eGHH8a+ffsS2ggiIppFoBuNRpSVlQEASkpK0NzcHNm2fPlyvPnmm9DpdEhLS0MoFEJmZuakY9atW4eGhoYElU9ERFdF7XJxu90wGAyR73U6HUKhEPR6PRYsWICCggIIIfDzn/8cd955J1atWgW3243c3FwAQE5ODlwu17TvbTKZYira5/PFfKzSsC3Ko5V2AGyLUiWqLVED3WAwwOPxRL6XJAl6/aeH+f1+bN26FTk5OXjllVemHOPxeJCXlzfte8e6YgdXLlEmrbRFK+0A2Balkm3FotLSUtTX1wMAGhsbUVRUFNkmhMD3vvc9fP7zn8eOHTug0+kixxw/fhwAUF9fH/NSc0SkHJIkYHX70WfzYtTpQygsyV0SfUbUK/Ty8nKcOnUKFRUVEEJg165dqKmpwYoVKyBJEj7++GMEAgGcOHECAPDCCy9g06ZNePHFF7Fp0yYsWLAAe/fuTXhDiCgxvIEQznWPoXXIifFAOPK6Pj0NhcsM+OKti7EsN0vGCumqqIGenp6OHTt2THqtsLAw8nVTU9O0x73++uvzLI2I5HZpxIUPTKPwBcNTtoUkgfZhFy6NuHDPisVYW3gD9Do+2iKnqIFORKlHCIGGLivOdNlmsS/wSc8Yhuzj+FbJTViYwViRC3+dEtEkQgic6LDMKsyvNeTw4Z1z/fD4QwmqjKJhoBPRJJ/02mHsGYvpWJsngD+dH5i2i4YSj4FORBE9Vg9OdJjn9R4Wlx9/axqCJIk4VUWzxUAnIgCAxx/C+83DEHHI4R6rF6c7rfN/I5oTBjoRQQiBOtMIvIH4dZWc7bah2+KJviPFDQOdiNA27EKXOf7h+z+tw5PGrlNiMdCJUpwvGEb9pfn1m8/E4w/j+KXRhLw3TcVAJ0pxDV3WuHa1fJZpyMWulyRhoBOlMKvbj4t9joSf56P2Uc79kgQMdKIUdqrTCikew1qisHuDMY9tp9ljoBOlqCHHODpH3Uk739luG1y+YNLOl4oY6EQpqiHJ48SDYZH0c6YaBjpRChq0j6PH6k36eVuHnLC4/Uk/b6pgoBOloI+vzG3irXgRAnyCNIEY6EQpZtTlwxUZhxF2jrox7PDJdn4tY6ATpZhPFDDapKHLIncJmsRAJ0ohTl8Q7cPJG9kyk26Ll1fpCcBAJ0ohF/rsSRl3PhtnrrAvPd4Y6EQpIhCS0DzglLuMiC6zB6MuXqXHEwOdKEW0D7sUt5LQuW75+/O1hIFOlAKEEGjst8tdxhSXRlxwePn0aLww0IlSwKDDB4tLeQ/0CAF80sur9HhhoBOlgCYFXp1f1TLo4CIYccJAJ9K48UAYHSPyD1WcSTAscFHBv3DUhIFOpHGmYSdCkjKGKs7kQr8dYYXXqAYMdCINE0KgZSDxC1jMl8cfRvuwS+4yVI+BTqRhFm8YFndA7jJmpbHPDqGQh57UioFOpGGdNuWNbJnJiNOHIU4HMC8MdCKNCoYl9NjVcXV+VWOfXe4SVI2BTqRRl0fdCIbV1YXRMeKG2x+SuwzVYqATaVTroHLmbZktSQg0q+AmrlIx0Ik0yOULom8s+UvMxUNTv4NDGGPEQCfSoLZhF9Q6YMTtD+GKRbkPQikZA51IY4QQaBtSX3fLtS70sdslFgx0Io0xu/yqGXs+k16bF2MedbdBDgx0Io0xaeSJyybeHJ2zqIEuSRK2bduGjRs3oqqqCj09PVP2sdlseOCBB+D3TzzEIIRAWVkZqqqqUFVVhb1798a/ciKaQpIELmkk0FuHnAiFJbnLUBV9tB3q6uoQCARQW1uLxsZG7NmzB/v3749sP3HiBPbu3Quz2Rx5rbe3F3fddRcOHDiQmKqJaFr9Y+OaGcc9HgjjstmNO5bnyV2KakS9QjcajSgrKwMAlJSUoLm5efIbpKejpqYGixYtirzW0tKCkZERVFVV4cknn0RXV1d8qyaiabUNq/tm6Gc19bPbZS6iXqG73W4YDIbI9zqdDqFQCHr9xKFr166dcszSpUvx1FNP4cEHH8S5c+ewZcsWvPvuu1P2M5lMMRXt8/liPlZp2BblUWs7wpJAQ4sDgWu6KUKhECwW83WOUjaLBVihcyAvS6fan8t0EtWWqIFuMBjg8Xgi30uSFAnzmdx9993Q6XQAgC9+8YsYHR2FEAJpaWmT9lu9enUsNcNkMsV8rNKwLcqj1nZcHnUhb/Hk/5sWixlLliyVqaL4CBoKsPr2Jar9uUxnPm0xGo0zbova5VJaWor6+noAQGNjI4qKiqKe8Fe/+hXefvttAEBbWxtuvPHGKWFORPHVPqzNh3Fah/jk6GxFvUIvLy/HqVOnUFFRASEEdu3ahZqaGqxYsQL333//tMc89dRT2LJlC44fPw6dTofdu3fHvXAi+pQ/FEaXWZuB7vGH+eToLEUN9PT0dOzYsWPSa4WFhVP2+/DDDyNf5+fn44033ohDeUQ0G11mj+KXmZuP5gEnPp8ldxXKxweLiDTg0og2xp7PpNvqgSfAMenRMNCJVM4XDKPHqs6ZFWdLCKBLRasvyYWBTqRyl0fdKXHTsNMW4JqjUTDQiVSuXSOP+kfjDoTRPzYudxmKxkAnUjFvIKTahSxi0TLIJ0evh4FOpGIdI27VLmQRi8ujbviCYbnLUCwGOpGKtWt8dMtnBcNC8yN65oOBTqRSLl8Qg/bU61NW4+LXycJAJ1KpjtHU6m65asjhg9XNIYzTYaATqZRWFrKIRavK10xNFAY6kQo5xoMYcvjkLkM2piEnpBQYez9XDHQiFepI8RuDHn8Y3VZP9B1TDAOdSIVSbXTLdFp4c3QKBjqRyti9AYw6eVPwisWD8QDHpF+LgU6kMqnyqH80YUlobg3V+WKgE6kMH6z5FLtdJmOgE6mI1e2HxR2QuwzFMLv8GHWl7mifz2KgE6kIb4ZOxSdHP8VAJ1IJIURKP0w0k7ZhV0rMBz8bDHQilTC7/BjzBuUuQ3HGA2FcsXBMOsBAJ1KNNl6dz4hTAUxgoBOpgBCcNvZ6rpg98AZCcpchOwY6kQoM2Mfh8jGwZiIJwb9gwEAnUgVenUfHMekMdCLFC0sCl0bccpeheBaXH6PO1B6TzkAnUrhem5dzlsxSS4rfHGWgEylcO+crmbX2FB+TzkAnUrBASEKnmWOsZ2tiTHrqdk8x0IkUrMviRiAkyV2GqqTyzVEGOpGCtQ1xdMtcdVu88PhTc4gnA51IoTz+EHqsXrnLUJ2JMempeZXOQCdSqPYRFySRujf45qN10AmRgv92DHQihTKl+BC8+bC4Axh1pd4yfQx0IgWyuP1cN3SeWgYdcpeQdAx0IgXi1fn8tQ27EAqn1gghBjqRwkiS4OiWOPAHU28MPwOdSGF6bF64U3TYXbylWrdL1ECXJAnbtm3Dxo0bUVVVhZ6enin72Gw2PPDAA/D7J/r8fD4fvv/976OyshJPPvkkbDZb/Csn0iiukRk/vTYvnL7UWeUpaqDX1dUhEAigtrYW1dXV2LNnz6TtJ06cwBNPPAGz2Rx57fDhwygqKsKhQ4fw8MMPY9++ffGvnEiDxgNhdJpT99H1eBMCMKXQL8iogW40GlFWVgYAKCkpQXNz8+Q3SE9HTU0NFi1aNO0x69atQ0NDQxxLJtKutmFnSk8ulQgtKTQmPWqgu91uGAyGyPc6nQ6h0Kf9e2vXrsXixYunHJObmwsAyMnJgcvFGzxE0Qgh0JxCV5PJ4hgPon9sXO4ykkIfbQeDwQCP59M7xZIkQa+//mHXHuPxeJCXlzftfiaTaS61Rvh8vpiPVRq2RXnkaofZE0LblfgGeigUgsVijr6jCsynLX//2Im1K3PiXFHsEvUZixropaWl+Oijj7B+/Xo0NjaiqKgo6puWlpbi+PHjKC4uRn19PdasWTPtfqtXr557xZj4RRDrsUrDtiiPXO3obx3BkiWZcX1Pi8WMJUuWxvU95TKftvh0aVj177cha4EuzlXFZj6fMaPROOO2qF0u5eXlyMjIQEVFBXbv3o2f/OQnqKmpwQcffDDjMZs2bUJHRwc2bdqE2tpaPPvsszEVTpQqfMEwF7JIoGBYpMS6rFGv0NPT07Fjx45JrxUWFk7Z78MPP4x8nZ2djddffz0O5RGlhvZhF4Lh1LhxJ5eWQSeK/22R3GUkFB8sIpKZEAIX++1yl6F5ww4fzBqfsIuBTiSzAfs4LO6A3GWkhGaNPznKQCeS2YU+bYeMkrQNaXvCLgY6kYxcviAuj/LJ0GTxBcO4rOEncRnoRDJq6ndwVaIkax7Q7mgiBjqRTIJhCRcH2N2SbH02L+xebd6zYKATyaRtyIXxQFjuMlJSi0anWGCgE8lACIFPesfkLiNltQw6NDkJGgOdSAZdFg9sHm3+2a8GHn8YVyzauznKQCdKMiEEznVz0Re5NWnw/gUDnSjJ+sfGMWj3yV1GyuuxeuEY19ZqRgx0oiQ7y6tzRRACaNbYVToDnSiJBu3j6LF65S6D/kVrN0cZ6ERJdOaKVe4S6BoefxhdGnpylIFOlCQD9nF0W3h1rjQX+7XT7cJAJ0oCIQROXbbIXQZNo9fmxZhGhpAy0ImS4IrFg4EUWahYjbQyBQMDnSjBJEngJK/OFa110ImgBqbVZaATJVjzoANWLmChaBNruqp/zVEGOlEC+YJhnO7kyBY1uNjvgFD5VMYMdKIEOt1p4YyKKjHi9GHYqe4neBnoKU6SBHzBMLyBEHzBsOqvUJRk2OHT1JC4VHChzy53CfOil7sASh5/KIw+mxf9Y+MYdflh9wbg8U++ekxLAwyZeixamIGluZm4KT8LNy/OxsIMflTmIiwJ/MM0Av5+VJdLI26U3R5CTqY6P+/qrJpmLSwJdJndaB1yosfqjfqYsxCAyxeCyxdCn82LTzAR8p/Ly8K/LzOgaFku8hcuSE7xKna22waLyy93GTRHYUmgacCBL992g9ylxISBrlHBsISL/Q6c7x2Dyxea13sJMdF9MOzw4WSHBTcvzsZ/3JyP25cZoNex1+6zRpw+nOniBFxq1dTvwP+5tQC69DS5S5kzBrrGSJJAy6ATDV2WKd0p8TIwNo6BsXEcz9Ch+OZ8fOGWRar9EzXeAiEJ7zcPc+FnFXP7Q+gYdeGO5XlylzJn/F+oIcMOHz5oG8GoMzl/6o8HwjhzxYZzPWO488Y85KX4aA4hBD5qH+VKRBrQ2GtnoJM8QmEJDV1WGHvGZLkJd7Xf0WpxwqobwpdWFeAGQ2byC5FZ04ADrRpdfDjVDDl8GHKM48b8bLlLmRMGuspZ3X78rXlYETfgBATahl1oH3Hh9mW5+NKqAizNTY1g7x/z4p/tZrnLoDg632vHjf/BQKckaRl04KO2UQTDyuqvFQK4NOLCpREXbv+cAV9aVYBluVlyl5UwVrcff7kwpKmFEgjoGHHDeXsQeVnqGdXFQFehsCTwz/ZRVTy00jHiRseIG4XLDPjPVQX4XJ62gt0xHsSfzw/AF0zt+wdaJAmBxl471hUtlbuUWWOgq4w3EMJ7F4dUNxVr56gbnaNu3LpkIb64sgD/tjgbaWnqGxZ2Lcd4EO8a++c9LJSUq2nAgf+8rQCZep3cpcwKA11FrG4//l/joKpXKu+2eNFt8eLG/CysWbkYhUsNSFfheF+r248/nx9gmGtcICShecCBNSsL5C5lVhjoKtE/5sWxC4PwB9U/ZzMwMYrgvYtDyMtegJJb8nHXTfnIWqCOq6BeqxfvNWnnZ0HXd77XjpJbFqviQSMGugp0jLjwfvMwQhq86eYcD6L+kgWnL1tRtDwXd9+cj5vysxTZHSNJAme7bWjosnKOlhTi8oXQNuzEXTfly11KVAx0hWvqd+CDNu1P8hSSBFoHnWgddKIgJwN3LM/FHcvzFDNvjM0TQF3rCAbs6rp3QfFh/NfDc0q80LgWA13BznbbcLIj9ZYus3kCON1pxelOK5bnT0wKVrjUgIKcjKTX4guGcbbbhvO9dg5LTGFWdwBdFg8KlxrkLuW6GOgKJITA6U4rPr7CCZ6unRRs0cIFuPWGHKy4YSFuXpSd0D53hzeIC/12NA04EAixr5yAs1dsuG1JjqKv0qMGuiRJ2L59O9rb25GRkYGdO3di5cqVke1Hjx7FkSNHoNfrsXnzZtx3332w2+144IEHUFRUBAD4xje+gcceeyxxrdAQIQT+ecmMxl673KUojt0bRKPXjsY+O9LSgCWGTNyYn4XP5WVhWV4mChZmxDz7oxACY94geqwe1He4IPVfiXP1pHZDDh/6x8ZxS8FCuUuZUdRAr6urQyAQQG1tLRobG7Fnzx7s378fAGA2m3Hw4EG8++678Pv9qKysxNq1a9Ha2oqHHnoIL7/8csIboCVCCHzYpo4HhuQmBGB2+WF2+QFM/Hulp6UhP1uPxTkZyMtaAEOWHtkLdMhakI4FunSk/+vKKiQJ+ENheANhOMeDGPMGMOr0w/uvycUsniCWqOuJb0qSM1ds6g50o9GIsrIyAEBJSQmam5sj2y5evIh77rkHGRkZyMjIwIoVK9DW1obm5ma0tLTg0UcfRUFBAV566SUsW7Ysca3QAEkSqDONoIWTO8VM+tdV9phXveP0Sdn6bF4M2Mdx8yJl/saPGuhutxsGw6c3AnQ6HUKhEPR6PdxuN3JzcyPbcnJy4Ha7cdttt+Huu+/GV7/6VRw7dgw7d+7E66+/PuW9TSZTTEX7fL6Yj1Uan8+HltZWNPR6cWVM/gm25iMUCsFiUf8EVVppB8C2JMKfT9pxf2Fu9B2vI1EZFjXQDQYDPB5P5HtJkqDX66fd5vF4kJubi+LiYmRnT/wGKy8vnzbMAWD16tUxFW0ymWI+VmlaWlvRG14Ml06PJUvkrmZ+LBYzlixRz7wXM9FKOwC2JRGCAPJvvAU3zeMqfT4ZZjQaZ9wW9Q5SaWkp6uvrAQCNjY2RG50AUFxcDKPRCL/fD5fLhc7OThQVFeGll17C3//+dwBAQ0MD7rrrrpgK1zpJEjjV40HbsEvuUohoDho6rXKXMK2oV+jl5eU4deoUKioqIITArl27UFNTgxUrVuD+++9HVVUVKisrIYTA888/j8zMTFRXV2Pr1q04fPgwsrOzsXPnzmS0RVUkSeD9lmH02AOqvzInSjW9Ni/6bF7F3SCNGujp6enYsWPHpNcKCwsjX2/YsAEbNmyYtP2WW27BwYMH41Si9kiSwH83D+PSCK/MidSqodOquFlDuWR7koUZ5kSaMGAfR7fVK3cZkzDQk2gizIcY5kQacfKyBUJBEy0x0JPkaph3jLjlLoWI4sTi8itqUAMDPQnCksDfmhjmRFp06rIFobAy5vthoCdYKCzhvYuDuDzKMCfSIpcvhPN9drnLAMBAT6iJMB9Cl9kTfWciUq2Pr9jg8cu/HCEDPUGCYQnHLgziioVhTqR1gZCE0wp42IiBngCBkIT/Oj+AHoUNaSKixGkZdGDE6ZO1BgZ6nPmCYfz5fD/6x7hUGVEqEQL4qG1U1mGMDPQ48gZCePeTfgza5f0tTUTyGHL40Dwg3xTYDPQ4cfmC+L/Gfow61T0FLhHNz8nLFngD8twgZaDHgd0bwNFz/bC6A3KXQkQy8wXDON4uz7ztDPR5GnX5UHu2D85xrpJDRBPahl2yjHBjoM9Dn82Ld871R9aiJCK66gPTCHzB5GYDAz1GHSMu/Nf5AQRCynjkl4iUxeUL4fil5Ha9MNBjcL53DH9tGkJIUs4sa0SkPK2DTlweTd7kXQz0OZAkgeOXzPhnuxkKmjGTiBTsH62jcPmSc4+NgT5LgZCE95qG8EnPmNylEJGK+IJhvN88DCkJf9Ez0GfB5QviHWMfOjljIhHFoH9sHP/blfi5XhjoUQw5xnH4414+MERE83Lmig1d5sReFDLQr6N5wIF3zvXD4+ewRCKav/9uHobNk7gHEPUJe2cVC4Ul1HeYcaHPIXcpRKQhgZCEM11W3Jqg5GWgf4ZjPIi/NQ1h2MEJtogo/hJ5a5SBfo1Osxv/05L8p7uIiOKBgY6J1YVOdljQqJB1AYmIYpHygT7q9OH9lmHOlEhEqpeygR6WBM5223CmywaJj30SkQakZKAPO3yoM43A7OLYciLSjpQKdF8wjIYuKy702TkXCxFpTkoEuiQJtA45ceqyhXOXE5FmaTrQhRDotXlxosPC7hUi0jzNBnr/mBcNnVb0j43LXQoRUVJoKtCFEOixenG228YgJ6KUo4lAD4QktA+70Ng3BgvHkxNRilJtoAshMOz0oXXQibZhF9f2JKKUp8pA77D68b+nu2H3JmdZJyIiNVBloPc5AghkMMyJiK4VNdAlScL27dvR3t6OjIwM7Ny5EytXroxsP3r0KI4cOQK9Xo/Nmzfjvvvug81mw49+9CP4fD4sW7YMu3fvRnZ2dkIbQkSU6qKuWFRXV4dAIIDa2lpUV1djz549kW1msxkHDx7EkSNH8NZbb+GXv/wlAoEA9u3bh4ceegiHDh3CnXfeidra2oQ2goiIZhHoRqMRZWVlAICSkhI0NzdHtl28eBH33HMPMjIykJubixUrVqCtrW3SMevWrcPp06cTVD4REV0VtcvF7XbDYDBEvtfpdAiFQtDr9XC73cjNzY1sy8nJgdvtnvR6Tk4OXC7XtO9tMpliKrp4iQ66Bdp48tNvyERmJtuiJFppB8C2KNECKQCfLxhz/l1P1EA3GAzweDyR7yVJgl6vn3abx+NBbm5u5PWsrCx4PB7k5eVN+96rV6+OqWiTyRTzsUrDtiiPVtoBsC1KNZ+2GI3GGbdF7XIpLS1FfX09AKCxsRFFRUWRbcXFxTAajfD7/XC5XOjs7ERRURFKS0tx/PhxAEB9fT3WrFkTU+FERDR7Ua/Qy8vLcerUKVRUVEAIgV27dqGmpgYrVqzA/fffj6qqKlRWVkIIgeeffx6ZmZnYvHkzXnzxRRw9ehSLFy/G3r17k9EWIqKUFjXQ09PTsWPHjkmvFRYWRr7esGEDNmzYMGn7kiVL8NZbb8WpRCIimo2oXS5ERKQODPQE2759u2rOf3VfuWv+rHjWo4S2JaOG6c5x7WuzrWH79u1TPhczHXu994zlmGh1zWcfJXwOEiFNCHkWYzMajTHfLFXT3e60tDRc75840W2Jdv7p9p3LMddKVFtirSfW91LSz2S+57i2Ldeed7Y1pKWlAcCkz8VMx17vPWM55rNmasv1ap9rPcky31EuM2Unr9CJiDSCgU5EpBEMdCIijWCgExFphCrnQ1eTV155RTXnv7qv3DV/VjzrUULbklHDdOe49rXZ1jDdMTMde733jOWY2dYVyz5K+BwkAke5yIxtUR6ttANgW5SKo1yIiOi6GOhERBrBQCci0ggGOhGRRsh6U5SIiOZuppuisgU6ERHFF7tciIg0goFORKQRqg70zs5OrFmzBn6/elcCd7lcePrpp/Hoo49i48aNOH/+vNwlzYkkSdi2bRs2btyIqqoq9PT0yF1SzILBILZs2YLKyko88sgj+OCDD+Quad6sVivuvfdedHZ2yl1KzH79619j48aN+Pa3v4133nlH7nJiFgwGUV1djYqKClRWVibkZ6LaQHe73Xj11VeRkZEhdynzUlNTgy9/+cv4wx/+gN27d09Z7k/p6urqEAgEUFtbi+rqauzZs0fukmJ27NgxLFq0CIcOHcKbb76Jn/70p3KXNC/BYBDbtm1DVlaW3KXE7MyZMzh//jwOHz6MgwcPYnh4WO6SYnb8+HGEQiEcOXIEzzzzDF577bW4n0OVgS6EwMsvv4wXXngB2dnZcpczL48//jgqKioAAOFwGJmZmTJXNDdGoxFlZWUAgJKSEjQ3N8tcUey++c1v4rnnngMw8RnT6XQyVzQ/r776KioqKrBs2TK5S4nZyZMnUVRUhGeeeQZPP/00vva1r8ldUsxWrVqFcDgMSZLgdruh18d/Ki3FT871zjvv4O2335702k033YT169fjjjvukKmq2EzXll27dqG4uBhmsxlbtmzB1q1bZaouNm63GwaDIfK9TqdDKBRKyIc10XJycgBMtOkHP/gBfvjDH8pb0Dz86U9/QkFBAcrKyvDGG2/IXU7MxsbGMDg4iAMHDqC/vx+bN2/G+++/H1lJSU0WLlyIgYEBPPjggxgbG8OBAwfifg5VDlssLy/H8uXLAQCNjY0oLi7GH//4R5mril17ezteeOEF/PjHP8a9994rdzlzsnv3bnzhC1/A+vXrAQDr1q1DfX29zFXFbmhoCM8880ykH12tvvOd7yAtLQ1paWkwmUy49dZbsX//fixdulTu0ubkF7/4BQoKCvDEE08AAL71rW+hpqYGN9xwg8yVzd3u3buRkZGB6upqDA0N4bHHHsNf/vKXuP5Vrr7LKAD/+Mc/Il9//etfx29/+1sZq5mfy5cv47nnnsNrr72mur84AKC0tBQfffQR1q9fj8bGRhQVFcldUswsFgueeOIJbNu2DV/5ylfkLmderr3Aqaqqwvbt21UX5sDEAzS///3v8d3vfhejo6MYHx/HokWL5C4rJnl5eViwYAEAID8/H6FQCOFwOK7nUGWga8nevXsRCATws5/9DABgMBiwf/9+mauavfLycpw6dQoVFRUQQmDXrl1ylxSzAwcOwOl0Yt++fdi3bx8A4De/+Y2qbyqq3X333YezZ8/ikUcegRAC27ZtU+29jccffxxbt25FZWUlgsEgnn/+eSxcuDCu51BllwsREU2lylEuREQ0FQOdiEgjGOhERBrBQCci0ggGOhGRRjDQiYg0goFORKQRDHQiIo34/xFs+pUuy7zIAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.neighbors import KernelDensity\n", "\n", "# instantiate and fit the KDE model\n", "kde = KernelDensity(bandwidth=1.0, kernel='gaussian')\n", "kde.fit(x[:, None])\n", "\n", "# score_samples returns the log of the probability density\n", "logprob = kde.score_samples(x_d[:, None])\n", "\n", "plt.fill_between(x_d, np.exp(logprob), alpha=0.5)\n", "plt.plot(x, np.full_like(x, -0.01), '|k', markeredgewidth=1)\n", "plt.ylim(-0.02, 0.22);" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The result here is normalized such that the area under the curve is equal to 1." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Selecting the Bandwidth via Cross-Validation\n", "\n", "The final estimate produced by a KDE procedure can be quite sensitive to the choice of bandwidth, which is the knob that controls the bias–variance trade-off in the estimate of density.\n", "Too narrow a bandwidth leads to a high-variance estimate (i.e., overfitting), where the presence or absence of a single point makes a large difference. Too wide a bandwidth leads to a high-bias estimate (i.e., underfitting), where the structure in the data is washed out by the wide kernel.\n", "\n", "There is a long history in statistics of methods to quickly estimate the best bandwidth based on rather stringent assumptions about the data: if you look up the KDE implementations in the SciPy and `statsmodels` packages, for example, you will see implementations based on some of these rules.\n", "\n", "In machine learning contexts, we've seen that such hyperparameter tuning often is done empirically via a cross-validation approach.\n", "With this in mind, Scikit-Learn's `KernelDensity` estimator is designed such that it can be used directly within the package's standard grid search tools.\n", "Here we will use `GridSearchCV` to optimize the bandwidth for the preceding dataset.\n", "Because we are looking at such a small dataset, we will use leave-one-out cross-validation, which minimizes the reduction in training set size for each cross-validation trial:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "from sklearn.model_selection import LeaveOneOut\n", "\n", "bandwidths = 10 ** np.linspace(-1, 1, 100)\n", "grid = GridSearchCV(KernelDensity(kernel='gaussian'),\n", " {'bandwidth': bandwidths},\n", " cv=LeaveOneOut())\n", "grid.fit(x[:, None]);" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now we can find the choice of bandwidth that maximizes the score (which in this case defaults to the log-likelihood):" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "{'bandwidth': 1.1233240329780276}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grid.best_params_" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The optimal bandwidth happens to be very close to what we used in the example plot earlier, where the bandwidth was 1.0 (i.e., the default width of `scipy.stats.norm`)." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Example: Not-so-Naive Bayes\n", "\n", "This example looks at Bayesian generative classification with KDE, and demonstrates how to use the Scikit-Learn architecture to create a custom estimator.\n", "\n", "In [In Depth: Naive Bayes Classification](05.05-Naive-Bayes.ipynb) we explored naive Bayesian classification, in which we create a simple generative model for each class, and use these models to build a fast classifier.\n", "For Gaussian naive Bayes, the generative model is a simple axis-aligned Gaussian.\n", "With a density estimation algorithm like KDE, we can remove the \"naive\" element and perform the same classification with a more sophisticated generative model for each class.\n", "It's still Bayesian classification, but it's no longer naive.\n", "\n", "The general approach for generative classification is this:\n", "\n", "1. Split the training data by label.\n", "\n", "2. For each set, fit a KDE to obtain a generative model of the data.\n", " This allows you, for any observation $x$ and label $y$, to compute a likelihood $P(x~|~y)$.\n", " \n", "3. From the number of examples of each class in the training set, compute the *class prior*, $P(y)$.\n", "\n", "4. For an unknown point $x$, the posterior probability for each class is $P(y~|~x) \\propto P(x~|~y)P(y)$.\n", " The class that maximizes this posterior is the label assigned to the point.\n", "\n", "The algorithm is straightforward and intuitive to understand; the more difficult piece is couching it within the Scikit-Learn framework in order to make use of the grid search and cross-validation architecture.\n", "\n", "This is the code that implements the algorithm within the Scikit-Learn framework; we will step through it following the code block:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "deletable": true, "editable": true, "tags": [] }, "outputs": [], "source": [ "from sklearn.base import BaseEstimator, ClassifierMixin\n", "\n", "\n", "class KDEClassifier(BaseEstimator, ClassifierMixin):\n", " \"\"\"Bayesian generative classification based on KDE\n", " \n", " Parameters\n", " ----------\n", " bandwidth : float\n", " the kernel bandwidth within each class\n", " kernel : str\n", " the kernel name, passed to KernelDensity\n", " \"\"\"\n", " def __init__(self, bandwidth=1.0, kernel='gaussian'):\n", " self.bandwidth = bandwidth\n", " self.kernel = kernel\n", " \n", " def fit(self, X, y):\n", " self.classes_ = np.sort(np.unique(y))\n", " training_sets = [X[y == yi] for yi in self.classes_]\n", " self.models_ = [KernelDensity(bandwidth=self.bandwidth,\n", " kernel=self.kernel).fit(Xi)\n", " for Xi in training_sets]\n", " self.logpriors_ = [np.log(Xi.shape[0] / X.shape[0])\n", " for Xi in training_sets]\n", " return self\n", " \n", " def predict_proba(self, X):\n", " logprobs = np.array([model.score_samples(X)\n", " for model in self.models_]).T\n", " result = np.exp(logprobs + self.logpriors_)\n", " return result / result.sum(axis=1, keepdims=True)\n", " \n", " def predict(self, X):\n", " return self.classes_[np.argmax(self.predict_proba(X), 1)]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Anatomy of a Custom Estimator" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Let's step through this code and discuss the essential features:\n", "\n", "```python\n", "from sklearn.base import BaseEstimator, ClassifierMixin\n", "\n", "class KDEClassifier(BaseEstimator, ClassifierMixin):\n", " \"\"\"Bayesian generative classification based on KDE\n", " \n", " Parameters\n", " ----------\n", " bandwidth : float\n", " the kernel bandwidth within each class\n", " kernel : str\n", " the kernel name, passed to KernelDensity\n", " \"\"\"\n", "```\n", "\n", "Each estimator in Scikit-Learn is a class, and it is most convenient for this class to inherit from the `BaseEstimator` class as well as the appropriate mixin, which provides standard functionality.\n", "For example, here the `BaseEstimator` contains (among other things) the logic necessary to clone/copy an estimator for use in a cross-validation procedure, and `ClassifierMixin` defines a default `score` method used by such routines.\n", "We also provide a docstring, which will be captured by IPython's help functionality (see [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb))." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next comes the class initialization method:\n", "\n", "```python\n", " def __init__(self, bandwidth=1.0, kernel='gaussian'):\n", " self.bandwidth = bandwidth\n", " self.kernel = kernel\n", "```\n", "\n", "This is the actual code that is executed when the object is instantiated with `KDEClassifier`.\n", "In Scikit-Learn, it is important that *initialization contains no operations* other than assigning the passed values by name to `self`.\n", "This is due to the logic contained in `BaseEstimator` required for cloning and modifying estimators for cross-validation, grid search, and other functions.\n", "Similarly, all arguments to `__init__` should be explicit: i.e., `*args` or `**kwargs` should be avoided, as they will not be correctly handled within cross-validation routines." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next comes the `fit` method, where we handle training data:\n", "\n", "```python \n", " def fit(self, X, y):\n", " self.classes_ = np.sort(np.unique(y))\n", " training_sets = [X[y == yi] for yi in self.classes_]\n", " self.models_ = [KernelDensity(bandwidth=self.bandwidth,\n", " kernel=self.kernel).fit(Xi)\n", " for Xi in training_sets]\n", " self.logpriors_ = [np.log(Xi.shape[0] / X.shape[0])\n", " for Xi in training_sets]\n", " return self\n", "```\n", "\n", "Here we find the unique classes in the training data, train a `KernelDensity` model for each class, and compute the class priors based on the number of input samples.\n", "Finally, `fit` should always return `self` so that we can chain commands. For example:\n", "```python\n", "label = model.fit(X, y).predict(X)\n", "```\n", "Notice that each persistent result of the fit is stored with a trailing underscore (e.g., `self.logpriors_`).\n", "This is a convention used in Scikit-Learn so that you can quickly scan the members of an estimator (using IPython's tab completion) and see exactly which members are fit to training data." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Finally, we have the logic for predicting labels on new data:\n", "```python\n", " def predict_proba(self, X):\n", " logprobs = np.vstack([model.score_samples(X)\n", " for model in self.models_]).T\n", " result = np.exp(logprobs + self.logpriors_)\n", " return result / result.sum(axis=1, keepdims=True)\n", " \n", " def predict(self, X):\n", " return self.classes_[np.argmax(self.predict_proba(X), 1)]\n", "```\n", "Because this is a probabilistic classifier, we first implement `predict_proba`, which returns an array of class probabilities of shape `[n_samples, n_classes]`.\n", "Entry `[i, j]` of this array is the posterior probability that sample `i` is a member of class `j`, computed by multiplying the likelihood by the class prior and normalizing.\n", "\n", "The `predict` method uses these probabilities and simply returns the class with the largest probability." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Using Our Custom Estimator\n", "\n", "Let's try this custom estimator on a problem we have seen before: the classification of handwritten digits.\n", "Here we will load the digits and compute the cross-validation score for a range of candidate bandwidths using the ``GridSearchCV`` meta-estimator (refer back to [Hyperparameters and Model Validation](05.03-Hyperparameters-and-Model-Validation.ipynb)):" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "digits = load_digits()\n", "\n", "grid = GridSearchCV(KDEClassifier(),\n", " {'bandwidth': np.logspace(0, 2, 100)})\n", "grid.fit(digits.data, digits.target);" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next we can plot the cross-validation score as a function of bandwidth (see the following figure):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "best param: {'bandwidth': 6.135907273413174}\n", "accuracy = 0.9677298050139276\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAEWCAYAAACHVDePAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAwKElEQVR4nO3de1xUZf4H8M9cmGFguF9UQMBAFEVCvKfkLTMvq6WVZpltlnZbd9NatS1rTY2s/dVmm6aVFm2Kpamla0U30LxiqBCKIiAXheEmzAwwzMz5/YFOkqKIHAbmfN6vl6+cOZfnO3j68MxzznmOTBAEAURE5NDk9i6AiIjEx7AnIpIAhj0RkQQw7ImIJIBhT0QkAQx7IiIJYNhTsxUUFKBv376N3tu1axcGDRqEffv2oaCgAJGRkZg8eTImT56MP/3pT5gyZQq2bdtmW3/r1q3o16+fbZ1Lf/7+979ftb0ePXrgwQcfvGLZ4sWL0aNHD5SXl9/QZ5g7dy62bt16zXUOHDiAiRMnXrWeyz/f5MmTMWnSJHzxxRc3VAMAnDt3DhMnTsSkSZPw66+/3vD2RDdKae8CqOPatGkT3nvvPWzYsAGRkZEoKCiAs7Mztm/fblunsLAQjzzyCDQaDcaOHQsA6N+/P95///1mtaFWq5Gbm4vCwkIEBgYCAIxGI1JTU1v/AzXDHz9fcXExJk6ciKioKPTs2bPZ+zlw4AB8fX2xYcMGEaokuhLDnlpk7dq12Lp1Kz777DMEBQU1uV5gYCDmzZuHDz/80Bb2N0KhUGDcuHH46quv8MQTTwAAvv32W4wePRofffSRbb3ExEQkJCRALpfD19cXL730Erp164bi4mIsWrQIJSUlCAgIQFlZmW2b7OxsLF++HJWVlbBYLJg5cybuvffeG6qvU6dOCAkJQW5uLnr27InPP/8cGzduhNVqhaenJ1566SWEhYVh0aJFqKysRH5+PlxcXKDT6VBdXY2ZM2ciISGhyfov327EiBEoKyuDWq3G8ePHUVpainHjxsHb2xs//vgjdDodli1bhiFDhiAnJwdLly6F0WhESUkJevbsibfffhtqtRp9+vTBnDlzsHfvXpSUlODhhx/GI488AgB4//338eWXX0KpVCIkJATx8fFwc3Nr8nNRByIQNVN+fr4QExMjvP7660JERITw6aefXnX5H2VlZQm33nqrIAiCsGXLFiE2NlaYNGlSoz9ffPFFk+0dP35cGDdunO39WbNmCSdPnhQiIiKEsrIy4ZdffhHuuOMOoayszNbGuHHjBKvVKjz11FPCW2+9JQiCIOTm5goxMTHCli1bhPr6emH8+PFCenq6IAiCUFVVJYwbN0749ddfhf379wsTJkxosp7LHTlyRBgwYIBQVFQkHDhwQJgxY4ZgNBoFQRCElJQUW90LFy4UZs2aZdtuy5Ytwpw5cwRBEK5Z/x+3W7hwoXDfffcJJpNJKCkpESIiIoRPPvlEEARB2LBhg/DnP/9ZEARBiI+PF7Zt2yYIgiCYTCZh4sSJwu7duwVBEISIiAghISFBEARBOH78uBAVFSXU1tYKSUlJwp133ilUVlYKgiAIK1asEN57771rfi7qONizpxtiNBqRlZWFtWvX4tlnn0VsbCwiIyOvuY1MJoOzs7Pt9Y0M4wBAVFQU5HI50tPT4ePjA4PBgIiICNvylJQUjB8/Ht7e3gCAKVOmYPny5SgoKMAvv/yChQsXAgBCQkIwaNAgAEBubi7Onj2LF154wbaf2tpa/Pbbb9fssdbW1mLy5MkAAIvFAi8vL7zxxhvo0qULEhISkJeXh+nTp9vWv3DhAiorKwEA/fr1u+o+r1X/1bYbOXIknJyc4OfnBxcXF8TFxQEAgoODbW09//zz2Lt3L9atW4fc3FyUlJTAaDTa9jF69GgAQO/evWEymWA0GrFv3z7cdddd8PDwANBwXgQAVq5c2eTn8vT0bPJnRe0Lw55uiLOzM1avXg0nJyfMnTsXTz/9NLZu3XrN/+mPHz/eKJxbYtKkSdixYwe8vb1tYXuJcJXpnQRBgNlshkwma7RcqWw45C0WC9zd3RuNv5eWlsLNzQ1paWlN1vHHMfvLWa1WTJ48Gc8//7ztdUlJiS08XVxcrrrdteq/2nYqlarR60uf6XLz58+HxWLBuHHjMGLECJw7d65RO2q1GkDDL+JL7SkUCttrAKiqqkJVVdV1Pxd1DLwah26IXC6Hk5MTAGDOnDkIDw/HggULYLVar7p+Tk4O3nvvPTz66KM31e7kyZOxe/du7Nq164orZYYNG4Zdu3bZrszZsmULPD09ERISgri4OCQmJgIAioqKcODAAQBAt27doFarbcF96eqY9PT0Ftc4dOhQ7Ny5EyUlJQCAjRs3YtasWdfd7lr1t9SePXvw9NNPY/z48ZDJZDh69CgsFss1t7ntttvw3XffQa/XAwBWrVqFDRs2tPhzUfvCnj21mEwmw+uvv4577rkHb7/9Nu6///5GwxxyuRxqtRrz58/HiBEjbNsdPnz4it65QqG45iWRnTp1QlhYGNzc3K74FjF06FA88sgjmDVrFqxWK7y9vfH+++9DLpfj5ZdfxuLFizFu3Dh07tzZdsWMSqXCe++9h+XLl+ODDz6A2WzGX//6V/Tr18/2C+FGxcXF4fHHH8ejjz4KmUwGrVaLd999t1Fv+WquVX9LPfvss3j66afh4eEBjUaDAQMG4OzZs9fcZvjw4Th9+jQeeOABAEB4eDheffVVaLXaFn0ual9kwtW+QxIRkUPhMA4RkQQw7ImIJEC0sD969Chmzpx5xfs//PADpk6dimnTpmHz5s1iNU9ERJcR5QTtunXrsGPHDmg0mkbv19fX47XXXsMXX3wBjUaDBx54AKNGjYKvr68YZRAR0UWi9OyDg4OxatWqK97Pzs5GcHAwPDw8oFKp0K9fPxw6dEiMEoiI6DKi9OzHjh1ru/vvcnq9Hm5ubrbXrq6utmt6/8heE10REXVkTd2p3abX2Wu1WhgMBttrg8HQKPz/qKmiryczM/O6t/ATtRSPLxLTzRxf1+okt+nVOGFhYcjLy0NlZSVMJhMOHz58xfzoRETU+tqkZ//VV1/BaDRi2rRpWLRoEWbPng1BEDB16lR06tSpLUogIpI00cI+KCjIdmnln/70J9v7o0aNwqhRo8RqloiIroI3VRERSQAnQqMWMdSZUVhZg8LKGuiq61BhMKHcaIKx7veZFc1WAbX1FtTWW2Ay/z4r5uWTMckAODspoFEpoFLKUVdvRa3Zgrp6Cy7N2mQVBFyoqUe5wYTKmnooZDLbNhqnhj/OKgWsVgE19RbUmCxQKeXwdlXBy0UFjUoOGRom7VIr5fByVcHbVQUPjVPDtk4KeLo4oZuvK5ydFG3w0yNqewx7uq4KgwmH8ypwrKAS6YUXkF5UBV113RXrqRRyuKp/nxNdLpNBo5JD46SAk0KOyydJvBS+VqHhF0KNyYI6sxXOTgo4O8mhVipwadJHGWTw0Dgh0MsFnhonCBBQY7Kipt6MGpMFNfUWXDCaoJDLoFEp4KlxgsliRUl1LU6cq0LtZb9oLq1/NTIZEOipQZifFmF+WoT7a9GjsxbRQZ5wUvBLMHVsDHu6qvxyIz7am4M9p0pxqqThXgi5DOju74bbu/sh3F+LQC8NAj2d4e/mDG9XFVxUig4x7W1tvQXlBhOqautt4V+mN+GMzoDTOj1Ol+hxIKcMtfUNvyRcVQoMCfPF4Fu8oVEpcP5cFY5U5dn2J4MMfm5qBHpqEOipgYeLk70+GlGTGPbUSFZxNd778TS+OnYOchkwNNwXd/cNxIBQb/QJ9IBG1fGHOZydFAjw1CAAmibXsVoFFF2oQXrhBaScKkXyKR2SMosvW6O0yW29XVUI99MizN8VQV4uCPLSIMBTg3A/LbxcVU1uRyQmhj0BaAi3NcnZ+Ne3WVAr5Xh0aChmD7sFnT2cr7+xA5LLZReD2gV3RXWBIAgoN5hgsQo4deoUunfvblvXIggoqapDYWUNCiqMyCk1ILvEgG8zilFmMDXab6CnBlGB7gjxcYWXiwo+rir0CnBH7wD3DvGtiDouhj2hTF+H+ZuP4ucsHSb06YJld0exB/oHMpkMPtqG57aWuSjh7974l2AXDw1u7ep5xXZGkxlFlTXIr6hB1vlqpBdVIaPwAn46qUPdZecSbvFzxeRbAzGsuw98XNXwclXB3VnJXwDUahj2Epet0+OhDw6gzGDCq3dH4aFBwQyYVuSiUiLc3w3h/m4Y2cO/0bIakwWl+jrsOV2Kbb8W4q2kLLyV9PtyN2cl+od4oX+oN/oGe6K7vxt8tSr++1CLMOwlLLfUgBnr9sNsEbD1ydsQFehh75IkRaNSoKu3Cx4YGIwHBgbj3IUanDxfjQqjCWV6E7J1BhzKLcePJ0/atnF3VqJ3gAfGR3fBhD5d4M1vYNRMDHuJyi83Ysa6/TCZrdg4ZzB6dna3d0mS18VDgy4eV540LjeYkF54AWd0epzW6XHgTDle2paOf+7IwIgefpg5JBS3d/dlj5+uiWEvQfnlRjywbj8MJgs+e3wQg76d83ZV4fYIP9we4QcAEAQBmeeqsT2tEFuOFCIp8yDC/Fwx67ZQjIvqAj83tZ0rpvaIYS8xZ8sagr66th6fPjYIvQM4dNPRyGQy9ApwR68Adyy4swd2Hi/C+r25WLI9Ay/vyED/EC+M7d0ZE6K7XPWbAkkTw15CLo3RN/ToB3OM3gGolHLc0zcId8cEIvNcNb797Ty+ySjGsp2ZWL4rE4O6eePumEBMigmAi4r/u0sZ//UloriqFtPX7keduWHohj16x3J5b/9vd0Qgt9SA7WlF2J5WiEVbj+P13Sfw56HdMGtIKO/wlSiGvUS88c1JlBtM2Pb0UPQK4Bi9owv1dcVf7+iOeaPDcTivAmt+ysb/fZeF93/OxoToLrg7JhCDbvGBQs6TulLBsJeA34qqsOVIAR4b1o1BLzEymQwDQr0x4BFvZJ6rwod7crDz2DlsPlyATu5qDL7FB30CPdA7wAM9OrvxUk4HxrCXgNf+lwl3Zyc8M7L79VcmhxXZxR1v3ncrlt0dhaTMYuw8dg4Hc8qxPa3Ito63qwphfq6ICvTAgFBv9A/1gr+bNKfMcDQMeweXnKVDyqlSvDghkmO1BKBhIriJ0QGYGB0AACjV1yGjqAqniquRrdPjVLEeGw+exfq9uQCAUB8X9A/1xsBQb4T5N8z5r3FSwN/dGVo1I6Sj4L+UA7NYBazYlYlgbxfMHBJi73KonfLVqjE8wg/DL17HDwAmsxUZRRdwKLcch3Ir8H1mMb5ILWi0nVwG9OzsjgGhXhgS5oO47n5wZfi3W/yXcWA7jhbixPlqvDujL9TKjj81MbUdlVKOvsFe6BvshTm3N8yKeqZUj8LK2ovPADAjt9SIw3nl2Hy4AB/vy4NaKUdcd1+M7d0ZY6M6w92Z3yTbE4a9g7JYBaz64TR6dnbDhD5d7F0OdXByucw2odsf1VusOJxbgW8yzuPbjPNIyizBP7alY3RPf9wV1RndfF0R4KmBjysncbMnhr2D2nX8HM7oDHjvwVj+D0aiclLIMSTMB0PCfPDyn3ohLb8S29OK8PWxIvwv/fxl68kanhmsUsBVrUSojyvC/FwR7v/7YyA9XXg1kFgY9g7IahXw7g+n0d1fi7t6d7Z3OSQhMpnMNvzz4oRInCyuRmFFDYoqa1BcXdcwBGSy4EJNPXLLDNhzurTRw+h9XFUNzwD21yLMzxVdvV0Q6NnwpC8vFyd2XG4Cw94BffvbeZwsrsa/p8dAzptmyE6UCjl6B3hc825ti1VAQYUR2To9sksMOF2iR7ZOj/+ln0Olsb7RuhonBQI8nRHgqYGniwoap4aH2btrnODlooK3qwpdvTWI7OLOqSGugj8RByMIDWP13XxdbZfWEbVXCrkMIT6uCPFxxaiejZeVG0worKhBYWXDn3MX/1tUWYPCihrU1FtgNFlQXVsPq/D7dnIZEOanxZAwH9zfvyvngLqIYe9gfjqpQ0ZRFd64N5q3wlOH5u3a0FvvE3TtsLZaBVTV1qPMYMIZnQHphRdwvPACEg/l45N9eYgKdMeMgSG4p28gNCrpXpXGsHcwn+zLhb+bGnf3DbR3KURtQi6XwdNFBU+XhvH+Mb06AQAuGOux/WghNh7MxwtfNkwGN31gVzw8JBSBntKb+llu7wKo9RRUGPFTlg7TB3SFk4L/tCRtHi5OeHhIKHbNG4bPnxiCoeE++CAlByPf/Amrvj/V6MSwFLBn70ASD+VDBmDawGB7l0LUbtgmgwv1RkGFEa/97wT+9V0WdhwtwrK7ozDoFh97l9gm2P1zEPUWKzYdyseIHv6S/IpK1BxBXi74z4xYfPRIfxhNFkxbux8T3klBwr5cVNXWX38HHRjD3kF8n1kMXXUdZrBXT3Rdo3p2wrfP3o5XJ/eGIAAvbc/AwOVJWLD5KA7llkMQhOvvpIPhMI6D+O+Bs+ji4YwRPfyuvzIRwVWtxMwhoXhocAiOF17ApkP52JFWhC1HCtDdX4v5YyJwV1Rnh7mRiz17B3C2zIiUU6WYNqArlDwxS3RDZDIZooM8seKePjjwwmisvDcaMhnw5H+P4N41+5CaV27vElsFk8EBrEs5A4VchmkDutq7FKIOzVWtxP39u2LXvDjET+mD/HIjpq7eh5kfHsAv2aUdeniHYd/BnSquxmcHz+LBQcHo4sETs0StQamQY/rAYPz0/AgsvKsnMs9VY8a6A5iy+hecLtHbu7wWESXsrVYrlixZgmnTpmHmzJnIy8trtPyjjz7ClClTMHXqVHz33XdilCAZK3ZlwkWlwN/uiLB3KUQOx0WlxJMjwrBn4Ui8encUzpYZce+aX3DkbIW9S7thooR9UlISTCYTEhMTsWDBAsTHx9uWVVVV4ZNPPsGmTZvw0UcfYcWKFWKUIAnJWTr8eFKHeaO680HRRCJydlJg5uAQbH3qNnhonDBj3X78eKLE3mXdEFHCPjU1FXFxcQCAmJgYpKen25ZpNBoEBASgpqYGNTU1DnOmu62ZLVYs2/kbQnxc8PBtfOQgUVsI8XHFF0/chnB/LR775DAS9ud1mHF8US691Ov10Gq1ttcKhQJmsxlKZUNzXbp0wYQJE2CxWDB37twm95OZmdmi9mtra1u8bUfxzakqZBXr8eKITjhzKsve5UiKFI4vurZ/DvdG/M/1eGlbOn4+notnBvtCrWydvrNYx5coYa/VamEwGGyvrVarLeiTk5NRUlKC77//HgAwe/ZsxMbGIjo6+or9REZGtqj9zMzMFm/bUaz45QBu8XPF7LH9+O2ojUnh+KLrS4zqhX9/fwr//v4UztXIsfbh/q1y9/rNHF+pqalNLhNlGCc2NhbJyckAgLS0NERE/H7y0MPDA87OzlCpVFCr1XBzc0NVVZUYZTis2noLDuaUY0SEP4OeyE7kchmeHROBD2f1x9kyIx5Yux/nLtTYu6wmiRL2Y8aMgUqlwvTp0/Haa69h8eLFWL9+Pb7//nv0798fffr0wf33349p06YhNDQUQ4cOFaMMh3Uotxx1ZiviInztXQqR5I2O7ISExwah3GDCjHUHUFxVa++SrkqUYRy5XI6lS5c2ei8sLMz293nz5mHevHliNC0JKadKoVLIMaibt71LISIAMV098fGjA/DwhwcxY91+bJozBH5uanuX1QhvquqAkrN06B/qxedsErUj/UK8sf7PA1FUWYv71vyCs2VGe5fUCMO+gympqsWJ89WI684Jz4jam4HdvPHpY4NQYazHlNW/IKPogr1LsmHYdzB7TpcCAOK6c7yeqD3qF+KFL54YAieFDNPe349fskvtXRIAhn2Hk3KqFD6uKvTq4m7vUoioCd07uWHrU7ehi4czHvnoEHYeO2fvkhj2HYnVKiDlVCmGdfeFXM5LLonasy4eGnz+xBBEB3ngmY1H8Mm+XLvWw7DvQE6cr0apvo7j9UQdhKeLCgmzB2F0T38s2Z6B13efgNVqn+kVGPYdSMopHQCO1xN1JBqVAmse6ocHBnbF6p+y8cSnqTDUmdu8DoZ9B3Igpxxhfq7o5O5s71KI6AYoFXKsuKcPlkzshaTMYkxd/Qvyy9v20kyGfQchCALS8isRG+xl71KIqAVkMhkeHdYN6/88EIWVNZi+dj/K9HVt1j7DvoPIL69BucGEmGBPe5dCRDdheIQfPp09CDp9HZ787xGYzNY2aZdh30H8mt/wZJyYrp72LYSIbtqtXT2xcmo0DuaU459fZbRJm7zfvoNIy6+ExkmBHp3c7F0KEbWCu/sG4sT5aqz5ORthflo8OqybqO0x7DuItPxK9An0gFLBL2NEjuL5sT1wukSPpV//hvTCC/jn5N6itcXk6ADqzBZkFFZxvJ7IwSjkMqx5KBZ/u6M7tqUVYty/U5CpE2eKZIZ9B5B5rhomi5Xj9UQOSKmQ4293RODzJ26DXCbDh4fLxWlHlL1Sq0o723Byti979kQOq1+IF76bfzuOZYjzfGP27DuAtPxKdHJXo4vHzT/fkojaL7VSAa1KIcq+GfYdQFp+JYdwiOimMOzbuQqDCbllRsR05Z2zRNRyDPt2qN5iRW29BQCQVlAJgDdTEdHN4QnadmjOJ4exN7sMg7p5QxAAuQyIDvKwd1lE1IEx7NuZOrMFe7PL0N1fi/MXanGqRI9bgzzgquY/FRG1HBOknUkvvACT2Yp5o7tjbO/OOH+hFmolR9uI6OYw7NuZgzkN19T3D2k4IdvZg3PXE9HNY5exnTmcW45b/Fzho1XbuxQiciAM+3bEahVwOK8CA0K87V0KETkYhn07clqnx4WaegzoxrAnotbFsG9HDuY0TIA0IJQ3UBFR62LYtyOHc8vh56ZGsLeLvUshIgfDsG9HDuVWYECoF2Qymb1LISIHw7BvJ4oqa1BYWYP+PDlLRCJg2LcTh/Marq8fEMqwJ6LWx7BvJw7llMNVpUBkFz5QnIhaH8O+HbBaBew5XYrYEC8+UJyIRMFkaQd2Hj+HnFID7u0XZO9SiMhBMeztzGIV8HZSFiI6aTExOsDe5RCRgxJlIjSr1YpXXnkFJ0+ehEqlwrJlyxASEmJb/vPPP+M///kPBEFA79698fLLL0v2csMdRwuRrTPgvQdjoZBL82dAROITpWeflJQEk8mExMRELFiwAPHx8bZler0eb7zxBtasWYPPP/8cgYGBqKioEKOMds9sseLfSafQs7Mb7urd2d7lEJEDEyXsU1NTERcXBwCIiYlBenq6bdmvv/6KiIgIvP7665gxYwZ8fX3h7S3Nyw2//LUQuWVGPDsmAnL26olIRKIM4+j1emi1WttrhUIBs9kMpVKJiooKHDhwANu2bYOLiwsefPBBxMTEoFu3blfsJzMzs0Xt19bWtnjbtvTOd/kI81ahq6wcmZnS/HbTEXWU44s6JrGOL1HCXqvVwmAw2F5brVYolQ1NeXp6ok+fPvDz8wMA9O/fH5mZmVcN+8jIyBa1n5mZ2eJt24q+zoz8C2fw/Nge6NUr3N7l0A3oCMcXdVw3c3ylpqY2uUyUYZzY2FgkJycDANLS0hAREWFb1rt3b2RlZaG8vBxmsxlHjx5FeLj0wi5H1/DLMMzP1c6VEJEUNKtnbzKZoFKpmr3TMWPGYO/evZg+fToEQcCKFSuwfv16BAcHY/To0ViwYAEee+wxAMBdd93V6JeBVGTr9ACAMD/tddYkIrp5zQr7qVOnYvDgwbjvvvuaFcxyuRxLly5t9F5YWJjt7xMmTMCECRNusFTHkq3TQyGXIdiH0xkTkfiaFfbbt29HSkoK3n33XVRUVGDSpEkYP348XF05BNFS2To9gr1doFYq7F0KEUlAs8bs5XI5br/9dkydOhWenp5ISEjA7Nmz8emnn4pdn8M6ozNwvJ6I2kyzevYrV67E999/j4EDB+Lxxx9HdHQ0rFYrpkyZgoceekjsGh2OxSrgTKkBwyP87F0KEUlEs8I+NDQUX375JVxcXFBfXw+gobf/7rvvilqcoyqsqIHJbOXJWSJqM80axhEEAatWrQIAzJ07F9u2bQMABAVxlsaWuHQlzi0cxiGiNtKssN+0aRMWLFgAAHj//fexceNGUYtydLzskojaWrNP0F66A9bJyUmyM1S2lmydHt6uKni5Nv/eBSKim9GsMfvRo0djxowZiI6ORkZGBkaNGiV2XQ4tu4RX4hBR22pW2D/11FMYOXIkcnJycPfdd6Nnz55i1+XQsnV6jOnVyd5lEJGENGsYJy8vD8nJyThz5gySkpKwZMkSsetyWJVGE8oMJo7XE1GbalbYXzo5e+TIERQUFKCyslLMmhxa9qUJ0Pw5jENEbadZYe/i4oK5c+eiU6dOiI+PR2lpqdh1tbp1yWew9UiBvcvglThEZBfNCnuZTAadTgeDwQCj0Qij0Sh2Xa3uwz052Hw4395lIFunh0ohR5AXJ0AjorbTrLB/5plnkJSUhMmTJ+OOO+7AkCFDxK6rVdXWW3C+qhYl1XX2LgXZJQZ083Xlw8WJqE0162qcY8eOYfbs2QAaLsPsaM6WN3wT0VW1fdgb6sy469/JcFYqMKCbN44VVKJ/qFeb10FE0tasnv3PP/8Mi8Uidi2iyStrCPvqOjNqTG37OQ7nVSC/vAYuKgW+SitCSXUdegd4tGkNRETN6tlXVFQgLi4OQUFBkMlkkMlk2LRpk9i1tZq8st+fh1tSXYsQn7a7Emb/mTIo5TJsnDMYaqUCuWUGBHlp2qx9IiKgmWG/Zs0asesQ1aWePQCUVNe1adgfOFOG6CAPuKgaftS8CoeI7KFZYf/ll19e8d4zzzzT6sWIJa/cCI2TAjX1FpS04bi9oc6MYwUXMOf2W9qsTSKiq2lW2Pv6+gJomOr4t99+g9VqFbWo1pZXZkBsiCf2ni5DSXVtm7WbmlcBs1XA4Ft82qxNIqKraVbYT58+vdHrxx57TJRixGC2WFFYUYPxfbrgYE55m15+uf9MGRRyGfqF8OobIrKvZoV9Tk6O7e86nQ5FRUWiFdTaiiprYbYK6ObjCj+tukXDOGaLFYWVNTc81n8gpxzRQR5wVTfrx0xEJJpmpdCSJUsgk8kgCAKcnZ2xcOFCsetqNbkXr8QJ9nGBn7tzi4Zx1qXk4P++O4nkv49EF4/mXUljNJlxNL8Sj3O8nojagWaF/QcffIDs7Gz06tULSUlJuO2228Suq9XkXbyhKtTHFf5uauSX39hUD4IgYOuRAtRbBHybUYxZt4U2azuO1xNRe9Ksm6qef/55ZGZmAmgY0lm0aJGoRbWmvFID1Eo5/N3U8HdT3/CY/Ynz1ThV0jB52e70802ud7pEjzv+72ck7MuFIAg4cKac4/VE1G40K+yLi4sxdepUAMDjjz+OkpISUYtqTXnlRgR7u0Aul8HfzRnlBhNM5uZfTbQ9rQgKuQwzBgXjQE4ZyvRX/2Xx5jcncbpEj5e2Z+Dpz47gp6wS9An0gJbj9UTUDjR71stLJ2nPnj3bri+9tFqFRq/PlhltJ1b93dUAgNImAvuPBEHAV0eLMCzcFzMGBsMqAEmZxVesd7zgAnZnnMe8UeFYPK4nvskoRnphFYdwiKjdaFa3c/HixXj22WdRWloKf39//POf/xS7rha77/19uNVXhiWRDWGdV27AsO4N9wn4uzWEfUl1HQI8r3+i9cjZChRW1mD+mAj0DnBHV28Ndqefx7QBwY3W+9d3J+GhccJjt98Cd2cn9A/1xttJWZgSG9j6H5CIqAWa1bOPjIzEihUrsGfPHjz11FPt+hm0nd2dselYBapq61FSXYfaeitCfBrmjvd3cwYAlFQ174qcHWlFUCvluLN3J8hkMtzVuzP2ni5DVW29bZ3DueX46aQOTwwPg7uzEwCgX4gXEmYPQkQnt1b+dERELdOssH/uuec6zAnaJ0eEwVgv4L/7z9rmxPnjME5zTtKaLVbsPH4Oo3r6w+1iiN8V1QUmixU/nmg4ZyEIAt745iR8tWrMui1EjI9DRNQqHO4EbVSgB2IDNPhwTw5OFlcDAEK8G3r2Pq4qyGSNwz6vzIC0/Mor9rPvTBlK9SZMujXA9l7frp7wd1Pjf8fPY/+ZMsxNSMWBnHI8MzLMNtEZEVF7dMMnaPPy8tr1CVoAuD/KE6X6Oqz5KRsKuQyBF6cUVirk8HFVQ3fZjVUvbkvHA2v3N7r+3moVsOqH0/B0ccLInv629+VyGcb27ozdGecxfe1+HMwtxzMjw/HgYPbqiah9a1Z39IUXXsD8+fOh0+ng7++PV155ReSybk50Z2fEdPVEWn4lgr1d4KT4/Xeav9vvUybUmS04lFuO2nor/rEtHR//eQBkMhkSD+fjYE45Vk6NhrOTotG+Zw4JQW6ZAeP7dMHdMYHQqBovJyJqj5rVs8/IyIDRaIRKpUJlZSWee+45seu6KTKZDE+OCAMA28nZS/zdf7+x6mj+BdTWWzE8wg/JWTpsTytCSVUtVuzKxOBbvHFf/6Ar9h3RyQ0JswfhgYHBDHoi6jCaFfafffYZEhISMHz4cLz22msIDw8Xu66bNiayEwbf4o1h4b6N3m+4i7ZhGGdfdhlkMuCtaTHoG+yJpV//hoVbjqHObMVrU6Ihk/Gh4ETkGJoV9v7+/vD394fBYMCgQYNQXV19zfWtViuWLFmCadOmYebMmcjLy7vqOo899hg2btzYssqvQy6XYdOcIZg7PKzR+/5uzijVm2CxCth3phS9urjD21WF16b0QVVNPX48qcNfR3dHN9+2e5oVEZHYmhX2bm5uSEpKsj17trKy8prrJyUlwWQyITExEQsWLEB8fPwV67z99tuoqqpqUdE3w99dDYtVwLkLNThythJDLt7l2rOzOxaPj8Sonv54PI4zVRKRY2lW2C9btgwBAQGYP38+cnNz8eKLL15z/dTUVMTFxQEAYmJikJ6e3mj57t27IZPJbOu0pUt30X6TUQyT2YohYb9PaTB7WDd89MgAqJTN+rEQEXUYzboaR6vVolevXgDQrBuq9Ho9tNrfH6ytUChgNpuhVCqRlZWFr7/+Gu+88w7+85//XHM/l27kulG1tbVNbmssaxivT9yXDbkM8DCVIjOzvEXtkDRd6/giulliHV+i3Amk1WphMBhsr61WK5TKhqa2bduG4uJizJo1C4WFhXByckJgYCBuv/32K/YTGRnZovYzMzOb3FbbyQj8rwhZZXW4NcgD/W/t3aI2SLqudXwR3aybOb5SU1ObXCZK2MfGxuLHH3/E+PHjkZaWhoiICNuyv//977a/r1q1Cr6+vlcNerH4XRzGAYDBYZyVkoikQZSwHzNmDPbu3Yvp06dDEASsWLEC69evR3BwMEaPHi1Gk83m7KSAh8YJF2rqbSdniYgcnShhL5fLsXTp0kbvhYWFXbHeX/7yFzGavy5/NzUMdWYMCPW2S/tERG1NkrN3de+kRSd3Z7jyKVJEJBGSTLt/3RcDAcL1VyQichCSDHvOaUNEUsO7h4iIJIBhT0QkAQx7IiIJYNgTEUkAw56ISAIY9kREEsCwJyKSAIY9EZEEMOyJiCSAYU9EJAEMeyIiCWDYExFJAMOeiEgCGPZERBLAsCcikgCGPRGRBDDsiYgkgGFPRCQBDHsiIglg2BMRSQDDnohIAhj2REQSwLAnIpIAhj0RkQQw7ImIJIBhT0QkAQx7IiIJYNgTEUkAw56ISAIY9kREEsCwJyKSAIY9EZEEMOyJiCRAKcZOrVYrXnnlFZw8eRIqlQrLli1DSEiIbfmGDRuwc+dOAMDw4cPxzDPPiFEGERFdJErPPikpCSaTCYmJiViwYAHi4+Nty/Lz87Fjxw5s2rQJmzdvxp49e3DixAkxyiAiootE6dmnpqYiLi4OABATE4P09HTbss6dO+ODDz6AQqEAAJjNZqjVajHKICKii0QJe71eD61Wa3utUChgNpuhVCrh5OQEb29vCIKAlStXolevXujWrdtV95OZmdmi9mtra1u8LdH18PgiMYl1fIkS9lqtFgaDwfbaarVCqfy9qbq6OrzwwgtwdXXFyy+/3OR+IiMjW9R+ZmZmi7cluh4eXySmmzm+UlNTm1wmyph9bGwskpOTAQBpaWmIiIiwLRMEAU899RR69OiBpUuX2oZziIhIPKL07MeMGYO9e/di+vTpEAQBK1aswPr16xEcHAyr1YqDBw/CZDIhJSUFADB//nz07dtXjFKIiAgihb1cLsfSpUsbvRcWFmb7+/Hjx8VoloiImsCbqoiIJIBhT0QkAQx7IiIJYNgTEUkAw56ISAIY9kREEsCwJyKSAIY9EZEEMOyJiCSAYU9EJAEMeyIiCWDYExFJAMOeiEgCGPZERBLAsCcikgCGPRGRBDDsiYgkgGFPRCQBDHsiIglg2BMRSQDDnohIAhj2REQSwLAnIpIAhj0RkQQw7ImIJIBhT0QkAQx7IiIJYNgTEUkAw56ISAIY9kREEsCwJyKSAIY9EZEEMOyJiCSAYU9EJAEMeyIiCRAl7K1WK5YsWYJp06Zh5syZyMvLa7R88+bNmDJlCu6//378+OOPYpRARESXUYqx06SkJJhMJiQmJiItLQ3x8fFYvXo1AECn0yEhIQFbtmxBXV0dZsyYgaFDh0KlUolRChERQaSefWpqKuLi4gAAMTExSE9Pty07duwY+vbtC5VKBTc3NwQHB+PEiRNilEFERBeJ0rPX6/XQarW21wqFAmazGUqlEnq9Hm5ubrZlrq6u0Ov1V91Pampqi2u4mW2JrofHF4lJjONLlLDXarUwGAy211arFUql8qrLDAZDo/C/pF+/fmKURkQkSaIM48TGxiI5ORkAkJaWhoiICNuy6OhopKamoq6uDtXV1cjOzm60nIiIWp9MEAShtXdqtVrxyiuvICsrC4IgYMWKFUhOTkZwcDBGjx6NzZs3IzExEYIgYO7cuRg7dmxrl0BERJcRJeyJiKh9EWXMvj05cuQIEhMTAQD/+Mc/4O7ubueKyBHt27cPX3/9NZYvX27vUsjB7Nu3Dzt37kRNTQ0ef/xx9OzZs0X7cfg7aDdv3oylS5fi3nvvxa5du+xdDjmgvLw8ZGZmoq6uzt6lkAOqqanBq6++itmzZ2PPnj0t3o/Dh73FYoFarYafnx90Op29yyEHFBISgkcffdTeZZCDGjVqFGpqapCQkIB77rmnxftx+GEcjUYDk8kEnU4HX19fe5dDRHRDysvL8cYbb2DevHnw8fFp8X46dM/+6NGjmDlzJoCm5+O5//77sWTJEmzatAmTJk2yZ7nUATXnGCNqqeYcX/Hx8dDpdPjXv/6F3bt3t7itDtuzX7duHXbs2AGNRgOg6fl4oqKiEB8fb+dqqSNq7jF2yZtvvmmvUqkDau7xtXLlylZpr8P27IODg7Fq1Srb62vNx0PUEjzGSExtfXx12LAfO3asbQoGoOn5eIhaiscYiamtj68OG/Z/dK35eIhaA48xEpPYx5fDhP215uMhag08xkhMYh9fDtMtGTNmDPbu3Yvp06fb5uMhak08xkhMYh9fnBuHiEgCHGYYh4iImsawJyKSAIY9EZEEMOyJiCSAYU9EJAEMeyIiCWDYExFJAMOeHNbWrVtbbSbK5ORkLFq06LrrLV++HEVFRY3ey87Otk1je+jQIZw4cQIAMHTo0Fapjag5GPZEregf//gHAgICmly+ZcsWlJSUtGFFRA0cZroEoqtJS0vDrFmzoNfr8Ze//AW1tbX473//C7PZDJlMhnfffRenTp3CunXr4OTkhIKCAowfPx5PPvkksrOz8cILL0Cj0UCj0cDDwwMff/wxzGYzZs+ejSVLlkClUuHFF1/E6tWrERQUhM2bN+OVV16Bm5sbnnvuOQiCAD8/PwBAeno6UlJSkJGRgfDwcJhMJixYsABFRUXw9PTEO++8AycnJzv/xMhRsWdPDk2j0WDDhg1Yu3Ytli5ditzcXKxduxYbN25EeHi47QHORUVFWLVqFRITE/HBBx8AAFauXIl58+Zhw4YN6Nu3L4CG+UtSUlIAADk5OTh69CgAICUlBSNHjrS1u2bNGkycOBEJCQm44447AABRUVGIi4vD888/j4CAABiNRjz77LPYuHEj9Ho9MjMz2+znQtLDsCeH1q9fP8hkMvj4+MDNzQ1KpRILFy7E4sWLcfLkSdt84REREVAqlXBxcYGzszMAIDc3F9HR0QAaZiQEgICAANTW1uLYsWMICwuDt7c3jh07Bjc3t0ZzkV9t2z/y8PBAUFAQAMDX1xc1NTXi/BCIwLAnB3f8+HEAgE6nQ3V1NT7++GO89dZbWLZsGdRqNS7NAyiTya7YNiwsDL/++isANHpq0PDhw/HGG29g2LBhGDp0KJYtW2brvV9t20s1XGrnWm0SiYVhTw6ttrYWDz/8MJ588kksX74csbGxmDZtGh588EE4Oztf82TpokWLsHr1asyaNcs2XAMAd955J44cOYLBgwdj2LBhSE9Px+jRoxtt++STTyIpKQkzZ87EDz/8YHv/1ltvxZtvvons7OzW/7BE18ApjomIJIA9eyIiCWDYExFJAMOeiEgCGPZERBLAsCcikgCGPRGRBDDsiYgkgGFPRCQB/w/hvuUzgQ+qmgAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "ax.semilogx(np.array(grid.cv_results_['param_bandwidth']),\n", " grid.cv_results_['mean_test_score'])\n", "ax.set(title='KDE Model Performance', ylim=(0, 1),\n", " xlabel='bandwidth', ylabel='accuracy')\n", "print(f'best param: {grid.best_params_}')\n", "print(f'accuracy = {grid.best_score_}')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This indicates that our KDE classifier reaches a cross-validation accuracy of over 96%, compared to around 80% for the naive Bayes classifier:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "0.8069281956050759" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.model_selection import cross_val_score\n", "cross_val_score(GaussianNB(), digits.data, digits.target).mean()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "One benefit of such a generative classifier is interpretability of results: for each unknown sample, we not only get a probabilistic classification, but a *full model* of the distribution of points we are comparing it to!\n", "If desired, this offers an intuitive window into the reasons for a particular classification that algorithms like SVMs and random forests tend to obscure.\n", "\n", "If you would like to take this further, here are some ideas for improvements that could be made to our KDE classifier model:\n", "\n", "- You could allow the bandwidth in each class to vary independently.\n", "- You could optimize these bandwidths not based on their prediction score, but on the likelihood of the training data under the generative model within each class (i.e. use the scores from `KernelDensity` itself rather than the global prediction accuracy).\n", "\n", "Finally, if you want some practice building your own estimator, you might tackle building a similar Bayesian classifier using Gaussian mixture models instead of KDE." ] } ], "metadata": { "anaconda-cloud": {}, "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }