{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dollar-unit sampling and \"taint\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dollar-unit sampling\n", "\n", "Dollar-unit sampling (DUS) is a form of probability proportional to size (PPS) sampling that originated in statistical financial auditing. In this case, \"size\" is an upper bound on the value of the item; DUS is sometimes called sampling with probability proportional to an upper bound (PPUB).\n", "\n", "The idea is that each item in a population has a reported value and a true value; the true value is known only if the item is audited—which is expensive. There is a possibility that the reported value is too high. We sample units with probability proportional to their reported values.\n", "\n", "Intuitively, it makes sense to focus attention on the items with larger reported values, because \"that's where the money is.\" Errors in the values of those items potentially have a larger effect on the error in the total reported value for the population.\n", "\n", "### The math\n", "\n", "There are $N$ items, $\\{x_j\\}_{j=1}^N$. There are $N$ known numbers $\\{u_j\\}_{j=1}^N$ such that\n", "\\begin{equation*} \n", " 0 \\le x_j \\le u_j, \\; j = 1, \\ldots, N.\n", "\\end{equation*}\n", "Define $u := \\sum_{j=1}^N u_j$, $x := \\sum_{j=1}^N x_j$, and $\\mu := x/N = \\frac{1}{N} \\sum_{j=1}^N x_j$.\n", "\n", "We will consider only sampling with replacement (although this can be extended to sampling without replacement).\n", "\n", "In the $i$th draw, we select item $j$ with probability $\\pi_j := u_j/u$. \n", "\n", "Suppose the $i$th draw selects item $J$.\n", "Then $X_i = x_J$ and $U_i = u_J$.\n", "If we sample with replacement, $\\{X_i\\}$ are iid, as are $\\{U_i \\}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Taint\n", "\n", "Define $t_j := x_j/u_j$. This value, the \"taint,\" is the fraction of the reported value that the true value represents.\n", "\n", "Define $t := x/u$. (This is _not_ parallel to the definitions of $x$ and $u$.)\n", "\n", "Define $T_i := X_i/U_i$, the taint of the item selected in the $i$th draw. Then $\\{T_i \\}$ are iid.\n", "\n", "Calculate:\n", "\\begin{equation*} \n", " \\mathbb E T_i = \\sum_{j=1}^N t_j \\pi_j = \\sum_{j=1}^N (x_j/u_j) (u_j/u) = \\frac{1}{u}\\sum_{j=1}^N x_j = x/u = t.\n", "\\end{equation*}\n", "Thus the expected value of the taint for any draw, times $u$, is the population total $x$.\n", "\n", "Since $u$ is known, we can translate an upper confidence bound for $\\mathbb E T_i$ into an upper confidence bound for $x$ or for $\\mu$ by multiplication.\n", "\n", "This is the relationship we will exploit to connect auditing problems to the problem of finding a nonparametric confidence bound for the mean of a non-negative or bounded population.\n", "\n", "The main point is that since $\\mathbb P \\{T_i \\in [0, 1]\\} = 1$, we can use any of the methods we've developed for finding confidence bounds for the mean of nonnegative or bounded populations (Binomial with thresholding, Chebychev's inequality, Hoeffding's inequality, Markov's inequality, Anderson's method, the Kaplan-Markov method, or other supermartingale-based methods) to make confidence bonds for $\\mathbb E T_i$. \n", "In turn, any confidence bound for $t$ can be translated easily into a rigorous confidence bound for the population mean $\\mu$ and population total $x$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some examples\n", "\n", "### Litigation and elections\n", "\n", "A plaintiff might want to find a confidence bound in one direction; a defendant might want a confidence bound in the opposite direction.\n", "\n", "+ Employment litigation\n", " - wage and hour: overtime is nonnegative, and has an upper bound based on the number of days an employee worked\n", " - missed meal or rest breaks: nonnegative and has an upper bound based on days worked\n", "+ Toxic tort class action\n", " - damages are nonnegative\n", "+ Construction defects, product defects\n", " - damages nonnegative and bounded (e.g., by replacement cost)\n", "+ Illegal charges for loan origination\n", " - nonnegative and bounded by total origination fee\n", "+ Patent and intellectual property infringement\n", " - damages nonnegative; upper bound from royalty rate and number of potentially infringing items\n", "+ Healthcare fraud\n", " - damages nonnegative and bounded by billed amount\n", "+ Tax fraud\n", " - taxes paid is an upper bound\n", "+ Under-refunding of security deposits\n", " - amount not refunded is an upper bound\n", "+ Election integrity\n", " - upper and lower bounds on the error in counting the votes on a ballot depend on the election rules; I'll talk about this in my plenary lecture" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's simple code to draw a weighted random sample in Python. For other approaches, see \n", "[the cryptorandom library](https://github.com/statlab/cryptorandom)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# This is the first cell with code: set up the Python environment\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import math\n", "import numpy as np\n", "import scipy as sp\n", "import scipy.stats\n", "from scipy.stats import binom\n", "import pandas as pd\n", "from ipywidgets import interact, interactive, fixed\n", "import ipywidgets as widgets\n", "from IPython.display import clear_output, display, HTML" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def weightedRandomSample(n, weights):\n", " '''\n", " Weighted random sample of size n drawn with replacement.\n", " Returns indices of the selected items and the raw uniform values used to \n", " select them.\n", " '''\n", " if any(weights < 0):\n", " print('negative weight in weightedRandomSample')\n", " return float('NaN')\n", " else:\n", " wc = np.cumsum(weights, dtype=float)/np.sum(weights, dtype=float) # ensure weights sum to 1\n", " theSam = np.random.random_sample((n,))\n", " return np.array(wc).searchsorted(theSam), theSam" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5\n", " 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8\n", " 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9]\n" ] }, { "data": { "text/plain": [ "(array([ 4., 7., 14., 8., 13., 13., 12., 15.]),\n", " array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]),\n", " <BarContainer object of 8 artists>)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAGdCAYAAACyzRGfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAa+klEQVR4nO3da4xU9f348c/Iyoh0dy3oght2BVtbFLxFrEVthWhpVqQaW+8Xim2icYsgvQC1F/AnrjSpoZGIxQeoMaAPKkhrvaBV0ah1AVFrW5GKstFS0tbuArajwvk9+MX9/7fgBT3zHWZ9vZLz4Jw5O9/PBJN9e+bMTiHLsiwAABLZq9IDAACfLOIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSqqn0AP9tx44d8frrr0dtbW0UCoVKjwMAfAhZlsWWLVuisbEx9trr/a9t7HHx8frrr0dTU1OlxwAAPoKOjo4YMmTI+56zx8VHbW1tRPzf8HV1dRWeBgD4MLq6uqKpqan79/j72ePi4923Wurq6sQHAFSZD3PLhBtOAYCkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJ1VR6AADY0wydcU+lRyirV64bX9H1XfkAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASGq342PlypUxYcKEaGxsjEKhEMuWLXvPcy+99NIoFAoxb968jzEiANCb7HZ8bNu2LY488siYP3/++563bNmy+P3vfx+NjY0feTgAoPep2d0faGlpiZaWlvc957XXXovvfOc7cf/998f48eM/8nAAQO+z2/HxQXbs2BEXXXRRfP/7348RI0Z84PmlUilKpVL3fldXV94jAQB7kNzjY+7cuVFTUxNXXHHFhzq/ra0tZs+enfcY9GJDZ9xT6RHK6pXrXC0EerdcP+2yevXq+MUvfhG33HJLFAqFD/UzM2fOjM7Ozu6to6Mjz5EAgD1MrvHx2GOPxebNm6O5uTlqamqipqYmXn311fjud78bQ4cO3eXPFIvFqKur67EBAL1Xrm+7XHTRRXHKKaf0OPbVr341Lrroopg0aVKeSwEAVWq342Pr1q2xfv367v0NGzbE2rVrY8CAAdHc3BwDBw7scf7ee+8dgwcPjs9//vMff1oAoOrtdnysWrUqxo4d270/bdq0iIiYOHFi3HLLLbkNBgD0TrsdH2PGjIksyz70+a+88sruLgEA9GK+2wUASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFI1lR4A+GQZOuOeSo/Ax/TKdeMrPQJVzpUPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKR2Oz5WrlwZEyZMiMbGxigUCrFs2bLux95+++2YPn16HH744dG/f/9obGyMiy++OF5//fU8ZwYAqthux8e2bdviyCOPjPnz5+/02Jtvvhlr1qyJH//4x7FmzZq46667Yt26dfG1r30tl2EBgOpXs7s/0NLSEi0tLbt8rL6+PlasWNHj2A033BBf+MIXYuPGjdHc3PzRpgQAeo3djo/d1dnZGYVCIfbbb79dPl4qlaJUKnXvd3V1lXskAKCCyhof//nPf2LGjBlx/vnnR11d3S7PaWtri9mzZ5dzDAByNHTGPZUegSpXtk+7vP3223HuuefGjh074sYbb3zP82bOnBmdnZ3dW0dHR7lGAgD2AGW58vH222/H2WefHRs2bIjf/e5373nVIyKiWCxGsVgsxxgAwB4o9/h4NzxeeumlePjhh2PgwIF5LwEAVLHdjo+tW7fG+vXru/c3bNgQa9eujQEDBkRjY2N84xvfiDVr1sRvfvOb2L59e2zatCkiIgYMGBB9+/bNb3IAoCrtdnysWrUqxo4d270/bdq0iIiYOHFizJo1K5YvXx4REUcddVSPn3v44YdjzJgxH31SAKBX2O34GDNmTGRZ9p6Pv99jAAC+2wUASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACCp3Y6PlStXxoQJE6KxsTEKhUIsW7asx+NZlsWsWbOisbEx+vXrF2PGjIkXXnghr3kBgCq32/Gxbdu2OPLII2P+/Pm7fPxnP/tZXH/99TF//vxob2+PwYMHx1e+8pXYsmXLxx4WAKh+Nbv7Ay0tLdHS0rLLx7Isi3nz5sVVV10VZ555ZkRE3HrrrTFo0KBYvHhxXHrppR9vWgCg6uV6z8eGDRti06ZNMW7cuO5jxWIxTjrppHjiiSfyXAoAqFK7feXj/WzatCkiIgYNGtTj+KBBg+LVV1/d5c+USqUolUrd+11dXXmOBADsYcryaZdCodBjP8uynY69q62tLerr67u3pqamcowEAOwhco2PwYMHR8T/uwLyrs2bN+90NeRdM2fOjM7Ozu6to6Mjz5EAgD1MrvExbNiwGDx4cKxYsaL72FtvvRWPPvpoHH/88bv8mWKxGHV1dT02AKD32u17PrZu3Rrr16/v3t+wYUOsXbs2BgwYEM3NzTF16tS49tpr45BDDolDDjkkrr322th3333j/PPPz3VwAKA67XZ8rFq1KsaOHdu9P23atIiImDhxYtxyyy3xgx/8IP7973/H5ZdfHm+88UYcd9xx8cADD0RtbW1+UwMAVWu342PMmDGRZdl7Pl4oFGLWrFkxa9asjzMXANBL+W4XACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJBU7vHxzjvvxI9+9KMYNmxY9OvXLw4++OC4+uqrY8eOHXkvBQBUoZq8n3Du3Llx0003xa233hojRoyIVatWxaRJk6K+vj6mTJmS93IAQJXJPT6efPLJOP3002P8+PERETF06NBYsmRJrFq1Ku+lAIAqlPvbLieeeGI89NBDsW7duoiIePbZZ+Pxxx+PU089Ne+lAIAqlPuVj+nTp0dnZ2cMHz48+vTpE9u3b485c+bEeeedt8vzS6VSlEql7v2urq68RwIA9iC5x8edd94Zt99+eyxevDhGjBgRa9eujalTp0ZjY2NMnDhxp/Pb2tpi9uzZeY8BVWvojHsqPQJAWRWyLMvyfMKmpqaYMWNGtLa2dh+75ppr4vbbb48///nPO52/qysfTU1N0dnZGXV1dXmORi/hlzPAx/PKdeNzf86urq6or6//UL+/c7/y8eabb8Zee/W8laRPnz7v+VHbYrEYxWIx7zEAgD1U7vExYcKEmDNnTjQ3N8eIESPimWeeieuvvz4uueSSvJcCAKpQ7vFxww03xI9//OO4/PLLY/PmzdHY2BiXXnpp/OQnP8l7KQCgCuUeH7W1tTFv3ryYN29e3k8NAPQCvtsFAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASKqm0gOQr6Ez7qn0CADwvlz5AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUmWJj9deey0uvPDCGDhwYOy7775x1FFHxerVq8uxFABQZWryfsI33ngjTjjhhBg7dmzce++90dDQEH/5y19iv/32y3spAKAK5R4fc+fOjaampli0aFH3saFDh+a9DABQpXJ/22X58uUxatSoOOuss6KhoSGOPvrouPnmm9/z/FKpFF1dXT02AKD3yj0+Xn755ViwYEEccsghcf/998dll10WV1xxRdx22227PL+trS3q6+u7t6amprxHAgD2IIUsy7I8n7Bv374xatSoeOKJJ7qPXXHFFdHe3h5PPvnkTueXSqUolUrd+11dXdHU1BSdnZ1RV1eX52ifCENn3FPpEQDYw71y3fjcn7Orqyvq6+s/1O/v3K98HHjggXHYYYf1OHbooYfGxo0bd3l+sViMurq6HhsA0HvlHh8nnHBCvPjiiz2OrVu3Lg466KC8lwIAqlDu8XHllVfGU089Fddee22sX78+Fi9eHAsXLozW1ta8lwIAqlDu8XHsscfG0qVLY8mSJTFy5Mj4n//5n5g3b15ccMEFeS8FAFSh3P/OR0TEaaedFqeddlo5nhoAqHK+2wUASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJFX2+Ghra4tCoRBTp04t91IAQBUoa3y0t7fHwoUL44gjjijnMgBAFSlbfGzdujUuuOCCuPnmm+PTn/50uZYBAKpM2eKjtbU1xo8fH6eccsr7nlcqlaKrq6vHBgD0XjXleNI77rgj1qxZE+3t7R94bltbW8yePbscY+zS0Bn3JFsLANhZ7lc+Ojo6YsqUKXH77bfHPvvs84Hnz5w5Mzo7O7u3jo6OvEcCAPYguV/5WL16dWzevDmOOeaY7mPbt2+PlStXxvz586NUKkWfPn26HysWi1EsFvMeAwDYQ+UeHyeffHI8//zzPY5NmjQphg8fHtOnT+8RHgDAJ0/u8VFbWxsjR47scax///4xcODAnY4DAJ88/sIpAJBUWT7t8t8eeeSRFMsAAFXAlQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkFTu8dHW1hbHHnts1NbWRkNDQ5xxxhnx4osv5r0MAFClco+PRx99NFpbW+Opp56KFStWxDvvvBPjxo2Lbdu25b0UAFCFavJ+wvvuu6/H/qJFi6KhoSFWr14dX/7yl/NeDgCoMrnHx3/r7OyMiIgBAwbs8vFSqRSlUql7v6urq9wjAQAVVNYbTrMsi2nTpsWJJ54YI0eO3OU5bW1tUV9f3701NTWVcyQAoMLKGh/f+c534rnnnoslS5a85zkzZ86Mzs7O7q2jo6OcIwEAFVa2t10mT54cy5cvj5UrV8aQIUPe87xisRjFYrFcYwAAe5jc4yPLspg8eXIsXbo0HnnkkRg2bFjeSwAAVSz3+GhtbY3FixfH3XffHbW1tbFp06aIiKivr49+/frlvRwAUGVyv+djwYIF0dnZGWPGjIkDDzywe7vzzjvzXgoAqEJledsFAOC9+G4XACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AICnxAQAkJT4AgKTEBwCQlPgAAJISHwBAUuIDAEhKfAAASYkPACAp8QEAJCU+AICkxAcAkJT4AACSEh8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJBU2eLjxhtvjGHDhsU+++wTxxxzTDz22GPlWgoAqCJliY8777wzpk6dGldddVU888wz8aUvfSlaWlpi48aN5VgOAKgiZYmP66+/Pr71rW/Ft7/97Tj00ENj3rx50dTUFAsWLCjHcgBAFanJ+wnfeuutWL16dcyYMaPH8XHjxsUTTzyx0/mlUilKpVL3fmdnZ0REdHV15T1aRETsKL1ZlucFgGpRjt+x7z5nlmUfeG7u8fH3v/89tm/fHoMGDepxfNCgQbFp06adzm9ra4vZs2fvdLypqSnv0QCAiKifV77n3rJlS9TX17/vObnHx7sKhUKP/SzLdjoWETFz5syYNm1a9/6OHTvin//8ZwwcOHCX51eTrq6uaGpqio6Ojqirq6v0OGXR21+j11f9evtr9PqqX295jVmWxZYtW6KxsfEDz809Pvbff//o06fPTlc5Nm/evNPVkIiIYrEYxWKxx7H99tsv77Eqqq6urqr/g/owevtr9PqqX29/jV5f9esNr/GDrni8K/cbTvv27RvHHHNMrFixosfxFStWxPHHH5/3cgBAlSnL2y7Tpk2Liy66KEaNGhWjR4+OhQsXxsaNG+Oyyy4rx3IAQBUpS3ycc8458Y9//COuvvrq+Otf/xojR46M3/72t3HQQQeVY7k9VrFYjJ/+9Kc7va3Um/T21+j1Vb/e/hq9vur3SXiN/62QfZjPxAAA5MR3uwAASYkPACAp8QEAJCU+AICkxEcZrFy5MiZMmBCNjY1RKBRi2bJllR4pV21tbXHsscdGbW1tNDQ0xBlnnBEvvvhipcfK1YIFC+KII47o/qM/o0ePjnvvvbfSY5VNW1tbFAqFmDp1aqVHycWsWbOiUCj02AYPHlzpsXL32muvxYUXXhgDBw6MfffdN4466qhYvXp1pcfKxdChQ3f6NywUCtHa2lrp0XLxzjvvxI9+9KMYNmxY9OvXLw4++OC4+uqrY8eOHZUeLYmy/Xn1T7Jt27bFkUceGZMmTYqvf/3rlR4nd48++mi0trbGscceG++8805cddVVMW7cuPjjH/8Y/fv3r/R4uRgyZEhcd9118dnPfjYiIm699dY4/fTT45lnnokRI0ZUeLp8tbe3x8KFC+OII46o9Ci5GjFiRDz44IPd+3369KngNPl744034oQTToixY8fGvffeGw0NDfGXv/yl1/yF6Pb29ti+fXv3/h/+8If4yle+EmeddVYFp8rP3Llz46abbopbb701RowYEatWrYpJkyZFfX19TJkypdLjlZ34KIOWlpZoaWmp9Bhlc9999/XYX7RoUTQ0NMTq1avjy1/+coWmyteECRN67M+ZMycWLFgQTz31VK+Kj61bt8YFF1wQN998c1xzzTWVHidXNTU1vfJqx7vmzp0bTU1NsWjRou5jQ4cOrdxAOTvggAN67F933XXxmc98Jk466aQKTZSvJ598Mk4//fQYP358RPzfv92SJUti1apVFZ4sDW+78LF1dnZGRMSAAQMqPEl5bN++Pe64447Ytm1bjB49utLj5Kq1tTXGjx8fp5xySqVHyd1LL70UjY2NMWzYsDj33HPj5ZdfrvRIuVq+fHmMGjUqzjrrrGhoaIijjz46br755kqPVRZvvfVW3H777XHJJZdU/ReOvuvEE0+Mhx56KNatWxcREc8++2w8/vjjceqpp1Z4sjRc+eBjybIspk2bFieeeGKMHDmy0uPk6vnnn4/Ro0fHf/7zn/jUpz4VS5cujcMOO6zSY+XmjjvuiDVr1kR7e3ulR8ndcccdF7fddlt87nOfi7/97W9xzTXXxPHHHx8vvPBCDBw4sNLj5eLll1+OBQsWxLRp0+KHP/xhPP3003HFFVdEsViMiy++uNLj5WrZsmXxr3/9K775zW9WepTcTJ8+PTo7O2P48OHRp0+f2L59e8yZMyfOO++8So+WRkZZRUS2dOnSSo9RNpdffnl20EEHZR0dHZUeJXelUil76aWXsvb29mzGjBnZ/vvvn73wwguVHisXGzduzBoaGrK1a9d2HzvppJOyKVOmVG6oMtq6dWs2aNCg7Oc//3mlR8nN3nvvnY0ePbrHscmTJ2df/OIXKzRR+YwbNy477bTTKj1GrpYsWZINGTIkW7JkSfbcc89lt912WzZgwIDslltuqfRoSbjywUc2efLkWL58eaxcuTKGDBlS6XFy17dv3+4bTkeNGhXt7e3xi1/8In75y19WeLKPb/Xq1bF58+Y45phjuo9t3749Vq5cGfPnz49SqdSrbtDs379/HH744fHSSy9VepTcHHjggTtdiTv00EPjV7/6VYUmKo9XX301HnzwwbjrrrsqPUquvv/978eMGTPi3HPPjYiIww8/PF599dVoa2uLiRMnVni68hMf7LYsy2Ly5MmxdOnSeOSRR2LYsGGVHimJLMuiVCpVeoxcnHzyyfH888/3ODZp0qQYPnx4TJ8+vVeFR0REqVSKP/3pT/GlL32p0qPk5oQTTtjpI+7r1q3rdV/g+e4N7e/emNlbvPnmm7HXXj1vu+zTp4+P2vLRbd26NdavX9+9v2HDhli7dm0MGDAgmpubKzhZPlpbW2Px4sVx9913R21tbWzatCkiIurr66Nfv34Vni4fP/zhD6OlpSWamppiy5Ytcccdd8Qjjzyy0yd9qlVtbe1O9+j0798/Bg4c2Cvu3fne974XEyZMiObm5ti8eXNcc8010dXV1av+j/LKK6+M448/Pq699to4++yz4+mnn46FCxfGwoULKz1abnbs2BGLFi2KiRMnRk1N7/p1NWHChJgzZ040NzfHiBEj4plnnonrr78+LrnkkkqPlkal3/fpjR5++OEsInbaJk6cWOnRcrGr1xYR2aJFiyo9Wm4uueSS7KCDDsr69u2bHXDAAdnJJ5+cPfDAA5Ueq6x60z0f55xzTnbggQdme++9d9bY2JideeaZveZ+nf/fr3/962zkyJFZsVjMhg8fni1cuLDSI+Xq/vvvzyIie/HFFys9Su66urqyKVOmZM3Nzdk+++yTHXzwwdlVV12VlUqlSo+WRCHLsqwy2QMAfBL5Ox8AQFLiAwBISnwAAEmJDwAgKfEBACQlPgCApMQHAJCU+AAAkhIfAEBS4gMASEp8AABJiQ8AIKn/BWk/M4kKz6sQAAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 640x480 with 1 Axes>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# illustrate the random sampling code\n", "vals = 10\n", "n = 100\n", "w = np.arange(vals)+1 # linearly increasing weights\n", "wrs, raw = weightedRandomSample(n, w)\n", "print(np.sort(wrs))\n", "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "bins = np.arange(np.min(wrs)-0.5, np.max(wrs)+0.5, 1)\n", "ax.hist(wrs, bins=bins)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 4 }