{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Algorithms for Pseudo-Random Permutations and Samples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Random Sampling\n", "\n", "Random sampling is one of the most fundamental tools in Statistics.\n", "\n", "Used for:\n", "\n", "+ Surveys and extrapolation\n", "    - Opinion surveys and polls\n", "    - Census (for some purposes) and Current Population Survey\n", "    - Environmental statistics\n", "    - Litigation, including class actions, discrimination, ...\n", "+ Experiments\n", "    - Medicine / clinical trials\n", "    - Agriculture\n", "    - Marketing\n", "    - Product development\n", "+ Quality control and auditing\n", "    - Process control\n", "    - Financial auditing\n", "    - Healthcare auditing\n", "    - Election auditing\n", "+ Sampling and resampling methods\n", "    - Bootstrap\n", "    - Permutation tests\n", "    - MCMC\n", "\n", "and on and on.\n", "\n", "---\n", "## Simple random sampling\n", "\n", "Draw $k \\le n$ items from a population of $n$ items in such a way that\n", "each of the ${n \\choose k}$ subsets of size $k$ is equally likely.\n", "\n", "Many standard statistical methods assume that the sample is drawn in this way, or that subjects are allocated between treatment and control in this way (e.g., $k$ of $n$ subjects are assigned to treatment, and the remaining $n-k$ to control).\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Random permutations\n", "\n", "There are $n!$ possible permutations of $n$ distinct objects.\n", "\n", "$n!$ grows quickly with $n$:\n", "\n", "$$ n! \\sim \\sqrt{2\\pi n}\\left(\\frac{n}{e}\\right)^n \\quad (\\mbox{Stirling's approximation})$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pigeonhole principle\n", "\n", "If you put $N>n$ pigeons in $n$ pigeonholes, then at least one\n", "pigeonhole must contain more than one pigeon.\n", "\n", "## Corollary\n", "At most $n$ pigeons can be put in $n$ pigeonholes if at most\n", "one pigeon is put in each hole." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some pigeon coops & flocks\n", "\n", "| Expression | Exact value | Scientific notation |\n", "|:---------------:|----------:|----------:|\n", "|$2^{32}$ | 4,294,967,296 | 4.29e9 |\n", "|$2^{64}$ | 18,446,744,073,709,551,616 | 1.84e19|\n", "|$2^{128}$ | | 3.40e38 |\n", "|$2^{32 \\times 624}$ | | 9.27e6010|\n", "|$13!$ | 6,227,020,800 | 6.23e9|\n", "|$21!$ | 51,090,942,171,709,440,000 | 5.11e19|\n", "|$35!$ | | 1.03e40|\n", "|$2084!$ | | 3.73e6013 |\n", "|${50 \\choose 10}$ | 10,272,278,170 | 1.03e10|\n", "|${100 \\choose 10}$ | 17,310,309,456,440 | 1.73e13|\n", "|${500 \\choose 10}$ | | 2.46e20 |\n", "|$\\frac{2^{32}}{{50 \\choose 10}}$ | 0.418 | |\n", "|$\\frac{2^{64}}{{500 \\choose 10}}$ | 0.075 | |\n", "\n", "We will come back to these numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "