{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing CDFs\n", "\n", "Copyright 2019 Allen Downey\n", "\n", "BSD 3-clause license: https://opensource.org/licenses/BSD-3-Clause" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import seaborn as sns\n", "sns.set_style('white')\n", "\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import inspect\n", "\n", "def psource(obj):\n", " \"\"\"Prints the source code for a given object.\n", "\n", " obj: function or method object\n", " \"\"\"\n", " print(inspect.getsource(obj))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Constructor\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/11).\n", "\n", "The `Cdf` class inherits from `pd.Series`. The `__init__` method is essentially unchanged, but it includes a workaround for what I think is bad behavior." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def __init__(self, *args, **kwargs):\n", " \"\"\"Initialize a Pmf.\n", "\n", " Note: this cleans up a weird Series behavior, which is\n", " that Series() and Series([]) yield different results.\n", " See: https://github.com/pandas-dev/pandas/issues/16737\n", " \"\"\"\n", " if args or ('index' in kwargs):\n", " super().__init__(*args, **kwargs)\n", " else:\n", " underride(kwargs, dtype=np.float64)\n", " super().__init__([], **kwargs)\n", "\n" ] } ], "source": [ "from empiricaldist import Cdf\n", "\n", "psource(Cdf.__init__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create an empty `Cdf` and then add elements.\n", "\n", "Here's a `Cdf` that representat a four-sided die." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "d4 = Cdf()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "d4[1] = 1\n", "d4[2] = 2\n", "d4[3] = 3\n", "d4[4] = 4" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
11
22
33
44
\n", "
" ], "text/plain": [ "1 1\n", "2 2\n", "3 3\n", "4 4\n", "dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a normalized `Cdf`, the last probability is 1.\n", "\n", "`normalize` makes that true. The return value is the total probability before normalizing." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def normalize(self):\n", " \"\"\"Make the probabilities add up to 1 (modifies self).\n", "\n", " :return: normalizing constant\n", " \"\"\"\n", " total = self.ps[-1]\n", " self /= total\n", " return total\n", "\n" ] } ], "source": [ "psource(Cdf.normalize)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.normalize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the Cdf is normalized." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Properties\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/2).\n", "\n", "In a `Cdf` the index contains the quantities (`qs`) and the values contain the probabilities (`ps`).\n", "\n", "These attributes are available as properties that return arrays (same semantics as the Pandas `values` property)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.qs" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.25, 0.5 , 0.75, 1. ])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sharing\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/12).\n", "\n", "Because `Cdf` is a `Series` you can initialize it with any type `Series.__init__` can handle.\n", "\n", "Here's an example with a dictionary." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
a0.333333
b0.666667
c1.000000
\n", "
" ], "text/plain": [ "a 0.333333\n", "b 0.666667\n", "c 1.000000\n", "dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dict(a=1, b=2, c=3)\n", "cdf = Cdf(d)\n", "cdf.normalize()\n", "cdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example with two lists." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qs = [1,2,3,4]\n", "ps = [0.25, 0.5, 0.75, 1.0]\n", "d4 = Cdf(ps, index=qs)\n", "d4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can copy a `Cdf` like this." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4_copy = Cdf(d4)\n", "d4_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, you have to be careful about sharing. In this example, the copies share the arrays:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.index is d4_copy.index" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.ps is d4_copy.ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can avoid sharing with `copy=True`" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "dtype: float64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4_copy = Cdf(d4, copy=True)\n", "d4_copy" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.index is d4_copy.index" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.ps is d4_copy.ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or by calling `copy` explicitly." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "dtype: float64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4_copy = d4.copy()\n", "d4_copy" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.index is d4_copy.index" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.ps is d4_copy.ps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Displaying CDFs\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/13).\n", "\n", "`Cdf` provides `_repr_html_`, so it looks good when displayed in a notebook." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def _repr_html_(self):\n", " \"\"\"Returns an HTML representation of the series.\n", "\n", " Mostly used for Jupyter notebooks.\n", " \"\"\"\n", " df = pd.DataFrame(dict(probs=self))\n", " return df._repr_html_()\n", "\n" ] } ], "source": [ "psource(Cdf._repr_html_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cdf` provides `plot`, which plots the Cdf as a line." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def plot(self, **options):\n", " \"\"\"Plot the Cdf as a line.\n", "\n", " :param options: passed to plt.plot\n", "\n", " :return:\n", " \"\"\"\n", " underride(options, label=self.name)\n", " plt.plot(self.qs, self.ps, **options)\n", "\n" ] } ], "source": [ "psource(Cdf.plot)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "def decorate_dice(title):\n", " \"\"\"Labels the axes.\n", " \n", " title: string\n", " \"\"\"\n", " plt.xlabel('Outcome')\n", " plt.ylabel('CDF')\n", " plt.title(title)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "d4.plot()\n", "decorate_dice('One die')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cdf` also provides `step`, which plots the Cdf as a step function." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def step(self, **options):\n", " \"\"\"Plot the Cdf as a step function.\n", "\n", " :param options: passed to plt.step\n", "\n", " :return:\n", " \"\"\"\n", " underride(options, label=self.name, where=\"post\")\n", " plt.step(self.qs, self.ps, **options)\n", "\n" ] } ], "source": [ "psource(Cdf.step)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAESCAYAAAAVLtXjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAT30lEQVR4nO3de5RdZXnH8e8EU2MxI0KspAEVl/CUVE2EaBIMFNSgpGJS8QaKmJpKFOolVS5e8bK8UFMVXVHBUsVyEYGYoDWgqEUSglwKhoQ8OrbcIlhxQQYMhCST/nHO2O14Zk4S2DM5eb+ftVicvd99zn427+I373nPvnRt3boVSVJZRo10AZKk4Wf4S1KBDH9JKpDhL0kFMvwlqUCGvyQV6AkjXYA03CJiPvB2YDSwFbgJ+EBm3lnT/qYAl2Tms5r73iMzP13HvqRtZfirKBHxWWAS8MrMvCsiRgFvAq6NiKmZeXed+8/Mr9T5+dK2MvxVjIjYB5gP7JuZ9wNkZh9wXkQcDJwOnBQRtwNfB14KPAM4LzM/1PyMo4EPAn8GbADem5nXttjX24H3AOuBVZX1ZwDjMvPkiJgAfKm5j9HARZn5ycf9wKUWnPNXSaYCt/UH/wA/BGZUlp+cmYcChwDvjYj9ImJ/4JPArMx8AfA24LKI2L36QRExGTgDOCwzXwg8Okg93wTOzcyDgRcBL4uI1+344UnbzvBXaUYPsv6JNOb/+y0ByMx1wP8CewIzgfHAVRFxM3A+0Ac8Z8BnvRS4MjPvbS6fPXBnzT8YfwN8vPlZK2l8A5i8A8ckbTenfVSSlcD+EbF3JZj7HQGsqCw/XHm9FegCdgOuyszX9zdExL7Ar1vsq6vyenOL9t2a2xySmRuanzUOeGQbj0V6TBz5qxjNUfxZwIXN+XYAImIucAzwmTYfcRVwZET8VfN9s4CfA08asN2Vze32aS6/pUUtvTT+GC1oftYewHJg9vYdlbRjDH8VJTNPB/4dWBIRt0bEL4GXAdMz8442711DY57/ooi4Bfg48KrMfGjAdquAU2hMD90AjBnkI48DpkXEKuA64MLMPP8xHJ60zbq8pbMklceRvyQVyPCXpAIZ/pJUIMNfkgrUEef5T506deuECRPabyhJ+oPVq1ffl
5lPa9XWEeE/YcIELrvsspEuQ5I6SkQMevqy0z6SVCDDX5IKZPhLUoEMf0kqkOEvSQUy/CWpQLWFf0RMjYiftFh/dERcHxHXRsQ/1LV/SdLgagn/iDgF+BoDbmUbEaOBzwFH0niK0dsiYu86apCkTnfpjXdz6Y131/LZdY38fwW8usX6A4GezLw/Mx8FrgEOrakGSepoF99wFxffcFctn11L+GfmpcCmFk3dwPrK8oPAU+qoQZI0uOH+wbcXGFtZHgs8MMw1SFLxhvvePrfReID2nsBDwGHAZ4e5Bkkq3rCEf0QcBzw5M8+OiAXAFTS+dZzbfKi2JGkY1Rb+mXk7MK35+oLK+suBy+varySpPS/ykqQCGf6SVCDDX5IKZPhLUoEMf0kqkOEvSQUy/CWpQIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJUoFoe4xgRo4BFwCRgIzAvM3sq7acCxwK9wJmZ+d066pAktVbXyH8OMCYzpwOnAQv7GyLiecBxNJ7veyTwsYj485rqkCS1UFf4zwCWAWTmSmBKpe1A4CeZ+UhmPgL8Enh+TXVIklqoK/y7gfWV5S0R0T/FtAo4LCLGRsRewCHA7jXVIUlqoa7w7wXGVveTmZsBMvM24EvA92lMB10H3FdTHZKkFuoK/+XALICImEZjtE9z+WnAuMycAbwL2Be4taY6JEkt1HK2D7AYmBkRK4AuYG5ELAB6gMuBZ0fE9cCjwPsyc0tNdUiSWqgl/DOzD5g/YPXayusT69ivJGnbeJGXJBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBTL8JalAhr8kFcjwl6QC1fIYx4gYBSwCJgEbgXmZ2VNpfy9wLNAHfDIzF9dRhySptbpG/nOAMZk5HTgNWNjfEBF7AO8EpgNHAp+vqQZJ0iDqCv8ZwDKAzFwJTKm0/R64A9i9+U9fTTVIkgZRV/h3A+sry1siojrFdBewBrgJOKumGiRJg6gr/HuBsdX9ZObm5uujgPHAfsAzgDkR8aKa6pAktVBX+C8HZgFExDRgVaXtfuBhYGNmPgI8AOxRUx2SpBZqOdsHWAzMjIgVQBcwNyIWAD2ZuTQiXgasjIg+4BrgBzXVIUlqoZbwz8w+YP6A1Wsr7R8BPlLHviVJ7XmRlyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBarrIi9JHeSC6+5kyc3rRroMDbDmnl4mju+u5bMd+Utiyc3rWHNP70iXoQEmju9m9uQJtXy2I39JQCNovnXi9JEuQ8PEkb8kFcjwl6QCGf6SVCDDX5IKZPhLUoEMf0kqkOEvSQUy/CWpQIa/JBWolit8I2IUsAiYBGwE5mVmT7NtMvD5yubTgDmZuayOWiRJf6qu2zvMAcZk5vSImAYsBGYDZObNwOEAEfFa4NcGvyQNr7qmfWYAywAycyUwZeAGEbE78FHgnTXVIEkaRF3h3w2sryxviYiB3zLeCnw7M++rqQZJ0iDqmvbpBcZWlkdl5uYB27wReE1N+5ckDaGukf9yYBZAc85/VbUxIp4CPDEz76pp/5KkIdQ18l8MzIyIFUAXMDciFgA9mbkUOAC4vaZ9S5LaqCX8M7MPmD9g9dpK+/U0zgiSJI0AL/KSpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBTL8JalAQ4Z/RFxSeX1U/eVIkoZDu5H/XpXX76uzEEnS8NmeaZ+u2qqQJA2rdrd36IqI0TT+SPS/7gLIzEfrLk6SVI924f9MIPn/Uf8vmv/eCjy7rqIkSfUaMvwzc7/hKkSSNHza3tUzIv4WeD0wDrgbuDAzf1x3YZKk+gwZ/hFxEnAU8AXgNzSmgT4QEftn5tnDUJ8kqQbtRv5vBA7NzC3N5Z9HxJXAlYDhL0kdqt2pno9Wgh+Az
NwIDHwerySpg7QL/75B1nvOvyR1sHbTPgc3n8Nb1QUcONSbImIUsAiYBGwE5mVmT6X9KOAjzcWbgJMyc+v2FC5J2nHtRv6TgG8AJwDHAqcC5wGT27xvDjAmM6cDpwEL+xsiYizwz8ArM3MajQe5j9uR4iVJO6Zd+J8AzATWZeYdwF3N5ePbvG8GsAwgM1cCUypthwCrgIUR8VPgN5n52x2oXZK0g9qF/yzgtZm5ASAzb6dxzv+r2ryvG1hfWd4SEf1TTOOAI2h8izgKeHdEHLCddUuSHoN24f/7gXPxmbkJeLDN+3qBsdX9ZGb/GUK/A67PzHsz8yHgatpPI0mSHkftwn9DRPzRPXyay+1+nF1O41sDETGNxjRPvxuB50bEuOa3gWnAmu2qWpL0mLQ72+dU4DsRcRXw38AzgJfT+C1gKIuBmc0zhbqAuRGxAOjJzKURcTpwRXPbizPz1h0+AknSdmt3Y7fVEXEoMBv4SxqnZX4sM4ec9snMPmD+gNVrK+0XARftUMWSpMes7Y3dMnM9jdM7JUm7CB/gLkkFMvwlqUCGvyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBTL8JalAhr8kFcjwl6QCGf6SVCDDX5IKZPhLUoEMf0kqkOEvSQVq+ySvHRERo4BFwCRgIzAvM3sq7WcBLwb6Hwc5u/nEMEnSMKgl/IE5wJjMnB4R04CFNJ4D3O8g4OWZeV9N+5ckDaGu8J8BLAPIzJURMaW/ofmtYH/g7Ih4OvCvmXluTXVoJ3TBdXey5OZ1I12GKtbc08vE8d0jXYaGUV1z/t1AdRpnS0T0/6HZHfgi8CbgFcA7IuL5NdWhndCSm9ex5p7ekS5DFRPHdzN78oSRLkPDqK6Rfy8wtrI8KjM3N19vAL6QmRsAIuJHNH4b+HlNtWgnNHF8N986cfpIlyEVq66R/3JgFkBzzn9Vpe0A4JqI2C0iRtOYIrqppjokSS3UNfJfDMyMiBVAFzA3IhYAPZm5NCLOB1YCm4DzMnN1TXVIklqoJfwzsw+YP2D12kr7mcCZdexbktSeF3lJUoEMf0kqkOEvSQUy/CWpQIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBarlGb4RMQpYBEwCNgLzMrOnxTbfA5Zk5lfqqEOS1FpdI/85wJjMnA6cBixssc0ngD1r2r8kaQh1hf8MYBlAZq4EplQbI+I1QB/w/Zr2L0kaQl3h3w2sryxviYgnAETEc4HjgA/XtG9JUhu1zPkDvcDYyvKozNzcfP1mYALwI+BZwKMRcXtmLqupFknSAHWF/3LgaODiiJgGrOpvyMxT+l9HxBnAvQa/JA2vusJ/MTAzIlYAXcDciFgA9GTm0pr2KUnaRrWEf2b2AfMHrF7bYrsz6ti/JGloXuQlSQUy/CWpQIa/JBXI8JekAhn+klQgw1+SCmT4S1KBDH9JKpDhL0kFMvwlqUCGvyQVyPCXpAIZ/pJUIMNfkgpk+EtSgQx/SSqQ4S9JBTL8JalAtTzGMSJGAYuAScBGYF5m9lTaTwLeAmwFPpaZ362jDklSa3WN/OcAYzJzOnAasLC/ISLGAe8ADgFeCnw5IrpqqkOS1EJd4T8DWAaQmSuBKf0NmXkfMCkzNwF7Aw9k5taa6pAktVBX+HcD6yvLWyLiD1NMmbk5Ik4GVgKX1FSDJGkQdYV/LzC2up/M3FzdIDO/BIwHDouII2qqQ5LUQi0/+ALLgaOBiyNiGrCqvyEiAvgUcAywicYPwn011SFJaqGu8F8MzIyIFUAXMDciFgA9mbk0Im4BrqVxts/3M/M/a6pDktRCLeGfmX3A/AGr11baPwp8tI59S5La8yIvSSqQ4S9JBTL8JalAhr8kFcjwl6QCGf6SVCDDX5IKVNdFXjuFS2+8m4tvuGuky9AAa+7pZeL47pEuQyqaI38Nu
4nju5k9ecJIlyEVbZce+R9z8D4cc/A+I12GJO10HPlLUoEMf0kqkOEvSQUy/CWpQIa/JBXI8JekAhn+klQgw1+SCtQRF3mtXr36voi4Y6TrkKQO88zBGrq2bt06nIVIknYCTvtIUoEMf0kqkOEvSQUy/CWpQIa/JBXI8JekAnXEef7bKiKmAp/JzMMHrD8a+DCwGTg3M88ZgfK2yxDHsgB4K/Db5qoTMzOHubxtEhGjgXOBZwFPBD6RmUsr7R3TL9twLJ3UL7sB5wABbAHmZuavKu0d0S/bcBwd0yf9IuIvgBuBmZm5trL+ce+TXSb8I+IU4Hjg9wPWjwY+B7yw2bY8Ii7PzHuHv8ptM9ixNB0EvDkzbxzeqnbIm4DfZebxEbEX8F/AUujIfhn0WJo6qV+OBsjMF0fE4cC/ALOh4/pl0ONo6qQ+6f9v/1Xg4RbrH/c+2ZWmfX4FvLrF+gOBnsy8PzMfBa4BDh3WyrbfYMcCcDBwekRcExGnD2NNO+LbwIcqy5srrzutX4Y6FuigfsnM7wBvay4+E/hNpblj+qXNcUAH9UnTZ4GvAL8esL6WPtllwj8zLwU2tWjqBtZXlh8EnjIsRe2gIY4F4CJgPvASYEZEvHLYCttOmflQZj4YEWOBS4APVpo7ql/aHAt0UL8AZObmiPgG8EUax9Ov0/plsOOADuqTiHgL8NvMvKJFcy19ssuE/xB6gbGV5bHAAyNUy2MSEV3A5zPzvuYI4HvAC0a4rCFFxL7Aj4FvZuYFlaaO65fBjqUT+wUgM08ADgDOiYjdm6s7rl9aHUcH9snfAzMj4ifAZOC8iNi72VZLn+wyc/5DuA3YPyL2BB4CDqPx9aoTdQO3RsSBNOb+XkLjR8idUkQ8HbgSODkzrxrQ3FH90uZYOq1fjgf2ycxPARuAPho/mEIH9Uub4+ioPsnMw/pfN/8AzK/M6dfSJ7ts+EfEccCTM/Ps5q/+V9D4pnNuZq4b2eq2z4BjeT+N0edG4KrM/I+RrW5I7weeCnwoIvrny88Bdu/Afml3LJ3UL5cB/xYRVwOjgXcDr46ITvv/pd1xdFKf/Im6M8y7ekpSgUqY85ckDWD4S1KBDH9JKpDhL0kFMvwlqUC77Kme0lAiYj8a50rvReM0wVuAUzPzwUG2/zvguswceOm91JEc+as4EfEkGjdlOzMzD8/MFwPXARcO8bZ30bhwSNoleJ6/ihMRrwEOz8yTB6xfCfwCuCAzl0XEK4A30Lip2/nNthnAKcAcGt+cv5yZX42If2puuxm4OjNPjYgzgOcA44A9gUXAMTRuRXBCZq6MiH8EjgO2Ahdl5ln1Hr3U4MhfJXo2jTunDvQ/NC6d/yOZ+T3gZuDNwF8DRwFTgUOAiRHxPOB1zeVDaFyK338TsYcz8xU0rkadlZlHA58G3hARE4HX0/iDMgOYExHxuB2lNATn/FWidcCLWqzfH7i6stzVYpsAfpaZW2jcT+ZdEfFaYGVmbgKIiJ/S+CMBcFPz3w8Aa5qv7wfGAM+lcSvi/nsFPZXGN4Wd+oEj2jU48leJltC4g+If/gBExDwaT3zaAIxvrj6o8p4+Gv+/rAUOiohRETE6In5AYzpoakQ8oXk3ycOa66AxnTOYBFYDRzSf2PZ1YNVjPDZpmxj+Kk5mPkTjKVAfjIjlEXEdjWmcY4GvAe+JiB8CEypvWwGcB9wJLAOW03ioxvmZeQtwcXPdz4Dbge9sQx230Bj1XxMRN9D45rGz3kRNuxh/8JWkAjnyl6QCGf6SVCDDX5IKZPhLUoEMf0kqkOEvSQUy/CWpQP8HzlvxKE0LlRkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "d4.step()\n", "decorate_dice('One die')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make Cdf from sequence\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/14).\n", "\n", "\n", "The following function makes a `Cdf` object from a sequence of values." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " @staticmethod\n", " def from_seq(seq, normalize=True, sort=True, **options):\n", " \"\"\"Make a CDF from a sequence of values.\n", "\n", " seq: any kind of sequence\n", " normalize: whether to normalize the Cdf, default True\n", " sort: whether to sort the Cdf by values, default True\n", " options: passed to the pd.Series constructor\n", "\n", " :return: CDF object\n", " \"\"\"\n", " pmf = Pmf.from_seq(seq, normalize=False, sort=sort, **options)\n", " return pmf.make_cdf(normalize=normalize)\n", "\n" ] } ], "source": [ "psource(Cdf.from_seq)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
a0.2
e0.4
l0.8
n1.0
\n", "
" ], "text/plain": [ "a 0.2\n", "e 0.4\n", "l 0.8\n", "n 1.0\n", "dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdf = Cdf.from_seq(list('allen'))\n", "cdf" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.2
20.6
30.8
51.0
\n", "
" ], "text/plain": [ "1 0.2\n", "2 0.6\n", "3 0.8\n", "5 1.0\n", "dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cdf = Cdf.from_seq(np.array([1, 2, 2, 3, 5]))\n", "cdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selection\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/15).\n", "\n", "`Cdf` inherits [] from Series, so you can look up a quantile and get its cumulative probability." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.25" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4[1]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4[4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cdf` objects are mutable, but in general the result is not a valid Cdf." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.25
20.50
30.75
41.00
51.25
\n", "
" ], "text/plain": [ "1 0.25\n", "2 0.50\n", "3 0.75\n", "4 1.00\n", "5 1.25\n", "dtype: float64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4[5] = 1.25\n", "d4" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
probs
10.2
20.4
30.6
40.8
51.0
\n", "
" ], "text/plain": [ "1 0.2\n", "2 0.4\n", "3 0.6\n", "4 0.8\n", "5 1.0\n", "dtype: float64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.normalize()\n", "d4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluating CDFs\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/16).\n", "\n", "Evaluating a `Cdf` forward maps from a quantity to its cumulative probability." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "d6 = Cdf.from_seq([1,2,3,4,5,6])" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.5)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.forward(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`forward` interpolates, so it works for quantities that are not in the distribution." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.5)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.forward(3.5)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.forward(0)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(1.)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.forward(7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`__call__` is a synonym for `forward`, so you can call the `Cdf` like a function (which it is)." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.16666667)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6(1.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`forward` can take an array of quantities, too." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "def decorate_cdf(title):\n", " \"\"\"Labels the axes.\n", " \n", " title: string\n", " \"\"\"\n", " plt.xlabel('Quantity')\n", " plt.ylabel('CDF')\n", " plt.title(title)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "qs = np.linspace(0, 7)\n", "ps = d6(qs)\n", "plt.plot(qs, ps)\n", "decorate_cdf('Forward evaluation')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cdf` also provides `inverse`, which computes the inverse `Cdf`:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(3.)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.inverse(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`quantile` is a synonym for `inverse`" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(3.)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.quantile(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`inverse` and `quantile` work with arrays " ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ps = np.linspace(0, 1)\n", "qs = d6.quantile(ps)\n", "plt.plot(qs, ps)\n", "decorate_cdf('Inverse evaluation')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These functions provide a simple way to make a Q-Q plot.\n", "\n", "Here are two samples from the same distribution." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "cdf1 = Cdf.from_seq(np.random.normal(size=100))\n", "cdf2 = Cdf.from_seq(np.random.normal(size=100))\n", "\n", "cdf1.plot()\n", "cdf2.plot()\n", "decorate_cdf('Two random samples')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's how we compute the Q-Q plot." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def qq_plot(cdf1, cdf2):\n", " \"\"\"Compute results for a Q-Q plot.\n", " \n", " Evaluates the inverse Cdfs for a \n", " range of cumulative probabilities.\n", " \n", " :param cdf1: Cdf\n", " :param cdf2: Cdf\n", " \n", " :return: tuple of arrays\n", " \"\"\"\n", " ps = np.linspace(0, 1)\n", " q1 = cdf1.quantile(ps)\n", " q2 = cdf2.quantile(ps)\n", " return q1, q2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is near the identity line, which suggests that the samples are from the same distribution." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "q1, q2 = qq_plot(cdf1, cdf2)\n", "plt.plot(q1, q2)\n", "plt.xlabel('Quantity 1')\n", "plt.ylabel('Quantity 2')\n", "plt.title('Q-Q plot');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's how we compute a P-P plot" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def pp_plot(cdf1, cdf2):\n", " \"\"\"Compute results for a P-P plot.\n", " \n", " Evaluates the Cdfs for all quantities in either Cdf.\n", " \n", " :param cdf1: Cdf\n", " :param cdf2: Cdf\n", " \n", " :return: tuple of arrays\n", " \"\"\"\n", " qs = cdf1.index.union(cdf2)\n", " p1 = cdf1(qs)\n", " p2 = cdf2(qs)\n", " return p1, p2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's what it looks like." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "p1, p2 = pp_plot(cdf1, cdf2)\n", "plt.plot(p1, p2)\n", "plt.xlabel('Cdf 1')\n", "plt.ylabel('Cdf 2')\n", "plt.title('P-P plot');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistics\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/17).\n", "\n", "`Cdf` overrides the statistics methods to compute `mean`, `median`, etc." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def mean(self):\n", " \"\"\"Expected value.\n", "\n", " :return: float\n", " \"\"\"\n", " return self.make_pmf().mean()\n", "\n" ] } ], "source": [ "psource(Cdf.mean)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.5" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.mean()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def var(self):\n", " \"\"\"Variance.\n", "\n", " :return: float\n", " \"\"\"\n", " return self.make_pmf().var()\n", "\n" ] } ], "source": [ "psource(Cdf.var)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.916666666666667" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.var()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def std(self):\n", " \"\"\"Standard deviation.\n", "\n", " :return: float\n", " \"\"\"\n", " return self.make_pmf().std()\n", "\n" ] } ], "source": [ "psource(Cdf.std)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"1.7078251276599332" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sampling\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/18).\n", "\n", "`choice` chooses random values from the `Cdf`, following the API of `np.random.choice`." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def choice(self, *args, **kwargs):\n", " \"\"\"Makes a random sample.\n", "\n", " Uses the probabilities as weights unless `p` is provided.\n", "\n", " args: same as np.random.choice\n", " options: same as np.random.choice\n", "\n", " :return: NumPy array\n", " \"\"\"\n", " # TODO: Make this more efficient by implementing the inverse CDF method.\n", " pmf = self.make_pmf()\n", " return pmf.choice(*args, **kwargs)\n", "\n" ] } ], "source": [ "psource(Cdf.choice)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 4, 6, 3, 1, 2, 4, 2, 5])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.choice(size=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`sample` chooses random values from the `Cdf`, following the API of `pd.Series.sample`." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def sample(self, *args, **kwargs):\n", " \"\"\"Makes a random sample.\n", "\n", " Uses the probabilities as weights unless `weights` is provided.\n", "\n", " This function returns an array containing a sample of the quantities in this Pmf,\n", " which is different from Series.sample, which returns a Series with a sample of\n", " the rows in the original Series.\n", "\n", " args: same as Series.sample\n", 
" options: same as Series.sample\n", "\n", " :return: NumPy array\n", " \"\"\"\n", " # TODO: Make this more efficient by implementing the inverse CDF method.\n", " pmf = self.make_pmf()\n", " return pmf.sample(*args, **kwargs)\n", "\n" ] } ], "source": [ "psource(Cdf.sample)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 6, 4, 4, 5, 2, 2, 2, 4])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.sample(n=10, replace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arithmetic\n", "\n", "For comments or questions about this section, see [this issue](https://github.com/AllenDowney/EmpyricalDistributions/issues/9).\n", "\n", "`Cdf` provides `add_dist`, which computes the distribution of the sum.\n", "\n", "The implementation converts the `Cdf` to a `Pmf`, uses outer products to compute the convolution of the two distributions, and then converts the result back to a `Cdf` with `make_same`." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def add_dist(self, x):\n", " \"\"\"Computes the distribution of the sum of values drawn from self and x.\n", "\n", " x: Distribution, scalar, or sequence\n", "\n", " :return: new Distribution, same subtype as self\n", " \"\"\"\n", " pmf = self.make_pmf()\n", " res = pmf.add_dist(x)\n", " return self.make_same(res)\n", "\n" ] } ], "source": [ "psource(Cdf.add_dist)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " def make_same(self, dist):\n", " \"\"\"Convert the given dist to Cdf\n", "\n", " :param dist:\n", " :return: Cdf\n", " \"\"\"\n", " return dist.make_cdf()\n", "\n" ] } ], "source": [ "psource(Cdf.make_same)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the distribution of the sum of two dice."
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "2 0.027778\n", "3 0.083333\n", "4 0.166667\n", "5 0.277778\n", "6 0.416667\n", "7 0.583333\n", "8 0.722222\n", "9 0.833333\n", "10 0.916667\n", "11 0.972222\n", "12 1.000000\n", "dtype: float64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6 = Cdf.from_seq([1,2,3,4,5,6])\n", "\n", "twice = d6.add_dist(d6)\n", "twice" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7.000000000000002" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAESCAYAAAAVLtXjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAATK0lEQVR4nO3de5SdVXnH8e+EoOmiGW9YHeN9Cc9iRMgSagg3sTVCUAwVKSiK4sKGpYgVLReVol22ahEv1IX3a0sEKmAQNXhDkWCCXEKByENj5RYDRYQMIAZy6R/vGT2MZzLEzj4nc/b3sxYr7/vu97z7OSTzO3vec87eA5s2bUKSVJdpvS5AktR9hr8kVcjwl6QKGf6SVCHDX5IqZPhLUoWm97oAqbSIOAPYt7U7DPwSeLC1PzczH+z4wP9/v+8Cds7MN0bE54GzM/P7JfqStpThr76XmceNbkfEzcARmXlll2s4upv9SRMx/FW1iPg4cF9mnhIRQ8CvgL/KzEsi4nXAQZl5WEScArwGWA/cBBybmXeMuda2wBnAPOB/gTuBta22HwGfzMyvR8QrgA/Q3HZ9ADgmM6+NiD2BDwPbARuA92fmRYX/F6hS3vNX7c4H5re2DwDuoAlvgFcC50XEUa1z/jIzdwGuB77c4VpvAXakubU0D3jm2BMi4inAfwBHta51GvChiHgC8CXg9Zn5QmAB8KmI+KNrSJPB8FftLgOe3grlA2hG5PMi4jHAi4Fv0wT/lzLzgdZjPgH8deucdi8FFmXmQ61zz+rQ317A9Zl5DUBmnp+Z84G5wBDwjYhY0ep3E7DLJD5X6fe87aOqZebGiLgIOBCYA7weOBk4FLg8M++PiG1ognjUNJqfnYEOl2w/tr5D+/r2a0XEAPACYBvg55k5p63tacBdf8rzkibiyF9qbv2cAFyXmQ8BPwQ+CJzXal8CvCkitmvtHwdcmpnrxlznO8CRETEjImYAh3XoazmwU0Q8v7W/gOY20DJgh4jYFyAiZgP/DcyajCcojeXIX4LvA08DPtXav5gmuL/Z2v8C8AzgioiYBqwCjuhwnc8Az6N5T+BumvB+hMy8MyKOAL4SEdOBEeDwzLwrIg4BTmu9cEyjuf9/8+Q8RemRBpzSWZLq420fSaqQ4S9JFTL8JalChr8kVWhKfNpnzpw5m2bN8hNvkrQlbrjhhl9n5pM7tU2J8J81axbnn39+r8uQpCklIm4Zr83bPpJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalCxcI/Iua0lq4be/ygiPhZRPw0It5cqn9J0viKhH9EnAB8Hpgx5vi2wMeAl9GskvR3EfHUEjVI0lR33lW3c95Vtxe5dqmR
/y+AV3U4vhOwKjPvaS2acRmwT6EaJGlKO/fK2zj3ytuKXLtI+GfmecDDHZoGgbVt+/cBjytRgyRpfN1+w3cEmNm2PxO4t8s1SFL1uj23z89p1il9InA/sC/wkS7XIEnV60r4R8RrgT/PzM9GxPE0a6ROA76Ymau7UYMk6Q+KhX9r4ek9WtuL2o5/kz8sjC1JW7VFy29l8YrejFFXrhlheGiwyLX9kpckbcbiFatZuWakJ30PDw2yYHaZtUymxHz+ktRLw0ODnLNwbq/LmFSO/CWpQoa/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kV8ktekqaEXk2zUHKKhV5y5C9pSujVNAslp1joJUf+kqaMfpxmoVcc+UtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekCjmls6RHrVcLqkD/LqrSK478JT1qvVpQBfp3UZVeceQvaYu4oEp/cOQvSRUy/CWpQkVu+0TENOBMYFdgHXB0Zq5qa38X8BpgI/AvmXlBiTokSZ2VGvkfDMzIzLnAScDpow0R8XjgOGAu8DLg44VqkCSNo1T47w0sAcjMZcDubW0PALcA27X+21ioBknSOEqF/yCwtm1/Q0S032K6DVgJXA2cUagGSdI4SoX/CDCzvZ/MXN/ang8MAc8BngkcHBEvKlSHJKmDUuG/FDgQICL2AK5ra7sHeBBYl5m/A+4FHl+oDklSB6W+5HUBMC8iLgcGgKMi4nhgVWZeGBEvBZZFxEbgMuB7heqQJHVQJPwzcyNwzJjDN7a1nwqcWqJvSdLE/JKXJFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVyPCXpAq5gLs0BS1afiuLV6zuer8r14wwPDTY9X41+Rz5S1PQ4hWrWblmpOv9Dg8NsmD2rK73q8nnyF+aooaHBjln4dxel6EpypG/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqVGQB94iYBpwJ7AqsA47OzFVt7fOBU1u7VwNvzcxNJWqRJP2xUiP/g4EZmTkXOAk4fbQhImYCpwGvyMw9gJuB7QvVIUnqoFT47w0sAcjMZcDubW17AtcBp0fET4A7M/OuQnVIkjooFf6DwNq2/Q0RMXqLaXvgJcCJwHzg7yNix0J1SJI6KBX+I8DM9n4yc31r+27gZ5l5R2beD1wKzC5UhySpgyJv+AJLgYOAcyNiD5rbPKOuAnaOiO2Be4E9gM8VqkMqZtHyW1m8YnVP+l65ZoThocGe9K3+UCr8LwDmRcTlwABwVEQcD6zKzAsj4mTg4ta552bm9YXqkIpZvGJ1z0J4eGiQBbNndb1f9Y8i4Z+ZG4Fjxhy+sa39bODsEn1L3TQ8NMg5C+f2ugxpi/klL0mqkOEvSRUy/CWpQoa/JFXI8JekCm02/CPi623b88uXI0nqholG/k9q2/6HkoVIkrpnS277DBSrQpLUVRN9yWsgIraleZEY3R4AyMyHShcnSSpjovB/FpD8YdR/U+vPTcBzSxUlSSprs+Gfmc/pViGSpO6ZcG6fiHg5cBjNPPy3A1/LzEtKFyZJKmez4R8Rb6VZcOUTwJ00t4HeExE7ZOZnu1CfJKmAiUb+RwD7ZOaG1v5/RcR3ge8Chr8kTVETfdTzobbgByAz1wHrxzlfkjQFTBT+G8c57mf+JWkKm+i2z26t1bjaDQA7FapHktQFE4X/rsD+wA+Bh4BnAjsD3ylclySpoIlu+7wBmAeszsxbgNta+68vXZgkqZyJwv9A4NDM/C1AZt5M85n/VxauS5JU0ETh/0Bmbmo/kJkPA/eVK0mSVNpE4f/biHjEHD6t/U3jnC9JmgImesP3ROAbEfED4H9o
3vDdn+a9AEnSFLXZkX9m3gDsA1wDbAdcDeyVmdd0oTZJUiETTuyWmWuBr3ahFklSl7iAuyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SarQhB/1lLZ2i5bfyuIVq7ve78o1IwwPDXa9X2kyOPLXlLd4xWpWrhnper/DQ4MsmD2r6/1Kk6HIyD8ipgFn0qwHsA44OjNXdTjnW8DizPx0iTpUj+GhQc5ZOLfXZUhTRqmR/8HAjMycC5wEnN7hnA8ATyzUvyRpM0qF/97AEoDMXAbs3t4YEa+mWR/YFcEkqQdKhf8gsLZtf0NETAeIiJ2B1wL/WKhvSdIESn3aZwSY2bY/LTPXt7aPBGbRrAv8bOChiLg5M5cUqkWSNEap8F8KHAScGxF7ANeNNmTmCaPbEfE+4A6DX5K6q1T4XwDMi4jLgQHgqIg4HliVmRcW6lOS9CgVCf/M3AgcM+bwjR3Oe1+J/iVJm+eXvCSpQoa/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVajUGr6q0KLlt7J4xequ97tyzQjDQ4Nd71eayhz5a9IsXrGalWtGut7v8NAgC2bP6nq/0lTmyF+TanhokHMWzu11GZIm4Mhfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqUJG5fSJiGnAmsCuwDjg6M1e1tb8DOLy1++3MfH+JOiRJnZUa+R8MzMjMucBJwOmjDRHxXOAIYE9gLvCyiNilUB2SpA5Khf/ewBKAzFwG7N7WdhtwQGZuyMyNwLbA7wrVIUnqoNSUzoPA2rb9DRExPTPXZ+bDwK8jYgA4DbgmM28qVIckqYNSI/8RYGZ7P5m5fnQnImYAZ7XOeUuhGiRJ4ygV/kuBAwEiYg/gutGG1oh/MXBtZi7MzA2FapAkjaPUbZ8LgHkRcTkwABwVEccDq4BtgBcDj42I+a3zT87MnxaqRZI0RpHwb72Re8yYwze2bc8o0a8k6dHxS16SVCEXcO8zi5bfyuIVq3vS98o1IwwPDfakb0lbxpF/n1m8YjUr14z0pO/hoUEWzJ7Vk74lbRlH/n1oeGiQcxbO7XUZkrZijvwlqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoWc0rmQXi2q4oIqkh4NR/6F9GpRFRdUkfRoOPIvyEVVJG2tHPlLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekChn+klQhw1+SKtTXX/I676rbOffK23rSt9MsSNqaOfIvxGkWJG3N+nrkf8huT+eQ3Z7e6zIkaavjyF+SKlRk5B8R04AzgV2BdcDRmbmqrf3NwEJgPfCBzLyoRB2SpM5KjfwPBmZk5lzgJOD00YaIeCpwHLAXsD/wwYh4bKE6JEkdlAr/vYElAJm5DNi9re1FwNLMXJeZa4FVwC6F6pAkdVAq/AeBtW37GyJi+jht9wGPK1SHJKmDUuE/Asxs7ycz14/TNhO4t1AdkqQOSoX/UuBAgIjYA7iure0KYJ+ImBERjwN2Aq4vVIckqYNSn/O/AJgXEZcDA8BREXE8sCozL4yIM4Cf0Lz4vCczf1eoDklSBwObNm3qdQ0Tioi7gFt6XYckTTHPyswnd2qYEuEvSZpcfsNXkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVahvF3OJiG2BLwLPBh5LM3X0hT0tqgsi4i+Aq4B5mXljr+vphog4GXgl8BjgzMz8Qo9LKqb17/orNP+uNwBv7ue/54iYA3w4M/eLiOcBXwY20cwK8NbM3NjL+koY85xnA/9G83e9DjgyM++cjH76eeT/OuDuzNwHmA98ssf1FNcKhs8AD/a6lm6JiP2APWmmCH8x8IyeFlTegcD0zNwT+Cfg
n3tcTzERcQLweWBG69BHgfe2fqYHgAW9qq2UDs/5E8DbMnM/4HzgxMnqq5/D/z+BU9r21493Yh/5CPBp4Fe9LqSL9qeZO+oC4JtAvy8MdBMwvbVg0iDwcI/rKekXwKva9ncDftza/g7w0q5XVN7Y53x4Zq5obU8HJm0qnL4N/8y8PzPvi4iZwNeB9/a6ppIi4o3AXZl5ca9r6bLtadaLOBQ4BjgrIgZ6W1JR99Pc8rkR+BxwRk+rKSgzz+ORL24DmTk6JUFfTgU/9jln5hqAiNgTOBb42GT11bfhDxARzwAuAf49Mxf1up7C3kQzmd6PgNnAV1urpvW7u4GLM/OhzEyakVHHuUz6xDtonu+ONMukfiUiZkzwmH7Rfn+/mqngI+Iwmt/oX56Zd03Wdfv5Dd+nAN8Fjs3MH/S6ntIyc9/R7dYLwDGZeUfvKuqay4C3R8RHgSFgO5oXhH51D38YGf4G2BbYpnfldNU1EbFfZv6I5n28S3pcT3ER8Tqa9c73y8zfTOa1+zb8gXcDTwBOiYjRe//zM7OaN0NrkJkXRcS+NOtETKP5BMiGHpdV0seAL0bET2g+3fTuzHygxzV1yzuBz0XEY4Cf09zO7VsRsQ3Nbb1bgfMjAuDHmXnqZFzfWT0lqUJ9fc9fktSZ4S9JFTL8JalChr8kVcjwl6QK9fNHPaVxRcRzaKbDeBLNZ+WvBU7MzPvGOf9vgOWZWdPUGepjjvxVnYj4M+BC4F8zc7/M3AtYDnxtMw97O81cOlJf8HP+qk5EvJrmG5PHjjm+jGbitEWZuSQiDgAOp5kk8KxW297ACcDBNL85fyozPxMR72ydux64NDNPjIj3Ac+jmX/oicCZwCHAjsAbMnNZRLwNeC3NNMVnZ2bfztWjrYsjf9XouTSzJ471S2DfsQcz81vACuBI4Pk0UwvMoZlKejgiXgD8bWt/T2CHiHhF6+EPZuYBNNPxHpiZBwEfAg6PiGHgMJoXlL2Bg6P1NU6pNO/5q0argRd1OL4DcGnbfqfZQQO4ojWFxG9p5hU6FFiWmQ8DtKZeeH7r/Ktbf94LrGxt30MzX/vOwLOA0bmnnkDzm0L+Cc9J2iKO/FWjxTQzoP7+BSAijgbuogn0odbhF7Y9ZiPNz8uNwAsjYlpEbBsR36O5HTQnIqa3ppPet3UMmts540ngBuAlrcU6vkyzNoFUnOGv6mTm/cBBwHsjYmlELKe5jfMamlWU3hER3wdmtT3scuCrNJNsLQGW0swoelZmXguc2zp2BXAz8I1HUce1NKP+yyLiSprfPFZPxnOUJuIbvpJUIUf+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRV6P8Aq6HcPypuK3gAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "twice.step()\n", "decorate_dice('Two dice')\n", "twice.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add a constant to a distribution, you could construct a deterministic `Cdf`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "2 0.166667\n", "3 0.333333\n", "4 0.500000\n", "5 0.666667\n", "6 0.833333\n", "7 1.000000\n", "dtype: float64" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "const = Cdf.from_seq([1])\n", "d6.add_dist(const)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But `add_dist` also handles constants as a special case:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "2 0.166667\n", "3 0.333333\n", "4 0.500000\n", "5 0.666667\n", "6 0.833333\n", "7 1.000000\n", "dtype: float64" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.add_dist(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other arithmetic operations are also implemented, including `sub_dist`, `mul_dist`, and `div_dist`:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "-3 0.041667\n", "-2 0.125000\n", "-1 0.250000\n", " 0 0.416667\n", " 1 0.583333\n", " 2 0.750000\n", " 3 0.875000\n", " 4 0.958333\n", " 5 1.000000\n", "dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4 = Cdf.from_seq([1,2,3,4])\n", "d6.sub_dist(d4)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "1 0.0625\n", "2 0.1875\n", "3 0.3125\n", "4 0.5000\n", "6 0.6250\n", "8 0.7500\n", "9 0.8125\n", "12 0.9375\n", "16 1.0000\n", "dtype: float64" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.mul_dist(d4)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "0.250000 0.0625\n", "0.333333 0.1250\n", "0.500000 0.2500\n", "0.666667 0.3125\n", "0.750000 0.3750\n", "1.000000 0.6250\n", "1.333333 0.6875\n", "1.500000 0.7500\n", "2.000000 0.8750\n", "3.000000 0.9375\n", "4.000000 1.0000\n", "dtype: float64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.div_dist(d4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison operators\n", "\n", "`Cdf` implements comparison operators that return probabilities.\n", "\n", "You can compare a `Cdf` to a scalar:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.3333333333333333" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.lt_dist(3)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.75" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.ge_dist(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or compare `Cdf` objects:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.25" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.gt_dist(d6)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.41666666666666663" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d6.le_dist(d4)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.16666666666666666" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d4.eq_dist(d6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interestingly, this way of comparing distributions is nontransitive: with the three distributions below, `A` beats `B`, `B` beats `C`, and `C` beats `A`, each with probability 5/9."
] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "A = Cdf.from_seq([2, 2, 4, 4, 9, 9])\n", "B = Cdf.from_seq([1, 1, 6, 6, 8, 8])\n", "C = Cdf.from_seq([3, 3, 5, 5, 7, 7])" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5555555555555556" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.gt_dist(B)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5555555555555556" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B.gt_dist(C)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5555555555555556" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.gt_dist(A)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }