{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Examples and Exercises from Think Stats, 2nd Edition\n", "\n", "http://thinkstats2.com\n", "\n", "Copyright 2016 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function, division\n", "\n", "%matplotlib inline\n", "\n", "import numpy as np\n", "\n", "import brfss\n", "\n", "import thinkstats2\n", "import thinkplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The estimation game\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Root mean squared error is one of several ways to summarize the average error of an estimation process." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def RMSE(estimates, actual):\n", " \"\"\"Computes the root mean squared error of a sequence of estimates.\n", "\n", " estimate: sequence of numbers\n", " actual: actual value\n", "\n", " returns: float RMSE\n", " \"\"\"\n", " e2 = [(estimate-actual)**2 for estimate in estimates]\n", " mse = np.mean(e2)\n", " return np.sqrt(mse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function simulates experiments where we try to estimate the mean of a population based on a sample with size `n=7`. We run `iters=1000` experiments and collect the mean and median of each sample." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Experiment 1\n", "rmse xbar 0.3862603150323446\n", "rmse median 0.48032836544952223\n" ] } ], "source": [ "import random\n", "\n", "def Estimate1(n=7, iters=1000):\n", " \"\"\"Evaluates RMSE of sample mean and median as estimators.\n", "\n", " n: sample size\n", " iters: number of iterations\n", " \"\"\"\n", " mu = 0\n", " sigma = 1\n", "\n", " means = []\n", " medians = []\n", " for _ in range(iters):\n", " xs = [random.gauss(mu, sigma) for _ in range(n)]\n", " xbar = np.mean(xs)\n", " median = np.median(xs)\n", " means.append(xbar)\n", " medians.append(median)\n", "\n", " print('Experiment 1')\n", " print('rmse xbar', RMSE(means, mu))\n", " print('rmse median', RMSE(medians, mu))\n", " \n", "Estimate1()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using $\\bar{x}$ to estimate the mean works a little better than using the median; in the long run, it minimizes RMSE. But using the median is more robust in the presence of outliers or large errors.\n", "\n", "\n", "## Estimating variance\n", "\n", "The obvious way to estimate the variance of a population is to compute the variance of the sample, $S^2$, but that turns out to be a biased estimator; that is, in the long run, the average error doesn't converge to 0.\n", "\n", "The following function computes the mean error for a collection of estimates." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def MeanError(estimates, actual):\n", " \"\"\"Computes the mean error of a sequence of estimates.\n", "\n", " estimate: sequence of numbers\n", " actual: actual value\n", "\n", " returns: float mean error\n", " \"\"\"\n", " errors = [estimate-actual for estimate in estimates]\n", " return np.mean(errors)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function simulates experiments where we try to estimate the variance of a population based on a sample with size `n=7`. We run `iters=1000` experiments and two estimates for each sample, $S^2$ and $S_{n-1}^2$." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean error biased -0.1374939469040375\n", "mean error unbiased 0.006257061945289593\n" ] } ], "source": [ "def Estimate2(n=7, iters=1000):\n", " mu = 0\n", " sigma = 1\n", "\n", " estimates1 = []\n", " estimates2 = []\n", " for _ in range(iters):\n", " xs = [random.gauss(mu, sigma) for i in range(n)]\n", " biased = np.var(xs)\n", " unbiased = np.var(xs, ddof=1)\n", " estimates1.append(biased)\n", " estimates2.append(unbiased)\n", "\n", " print('mean error biased', MeanError(estimates1, sigma**2))\n", " print('mean error unbiased', MeanError(estimates2, sigma**2))\n", " \n", "Estimate2()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean error for $S^2$ is non-zero, which suggests that it is biased. The mean error for $S_{n-1}^2$ is close to zero, and gets even smaller if we increase `iters`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The sampling distribution\n", "\n", "The following function simulates experiments where we estimate the mean of a population using $\\bar{x}$, and returns a list of estimates, one from each experiment." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def SimulateSample(mu=90, sigma=7.5, n=9, iters=1000):\n", " xbars = []\n", " for j in range(iters):\n", " xs = np.random.normal(mu, sigma, n)\n", " xbar = np.mean(xs)\n", " xbars.append(xbar)\n", " return xbars\n", "\n", "xbars = SimulateSample()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the \"sampling distribution of the mean\" which shows how much we should expect $\\bar{x}$ to vary from one experiment to the next." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl4VNX9x/H3l4R9VwJYAgQRF1zqEpdq645V6y4VECqoFduKUrefqDy4W7fauitaRGwFLdaWKm5txVoVJSqigCJVKAGVgILIFpJ8f3/MJWYyk0yA3NxZPq/n4cmcc09mPgQm37nbOebuiIiIADSLOoCIiKQPFQUREammoiAiItVUFEREpJqKgoiIVFNREBGRaioKIiJSTUVBRESqqSiIiEi1/KgDbKkuXbp4UVFR1DFERDLKO++8s8LdC1KNy7iiUFRURElJSdQxREQyipktbsg4HT4SEZFqKgoiIlJNRUFERKqpKIiISDUVBRERqRZaUTCzCWa23Mw+rGO7mdndZrbQzOaY2b5hZRERkYYJ85LUicC9wKQ6th8H9Av+HAg8EHwVEckp5Zsq2FRRWef2BYuWU1lVRcvm+fTr3ZVWLZuHliW0ouDu/zazonqGnAxM8th6oDPNrJOZ7eDun4eVSUSkqa1dv5GqKud/n39FRWUVX5St5u8z5rBy1VogVhC2xF1XDaKwW+cwogLR3rzWA1hSo10a9CUUBTMbCYwE6NWrV5OEExGpz2elK5izYCnrN5aTn5fHoqUreXP2f+ncoQ35eXlUeVX1L/7G1KZVi0Z/zpqiLAqWpM+TDXT38cB4gOLi4qRjRETC9sWKb3joyX8zZ0FpnWO+/mbdVj9/63p+4efnNWPAD3Zju45tt/r5GyLKolAK9KzRLgSWRZRFRCTO6jXr2bipgqt//1e+Wr2WFs3zt/hQT01G7FPvXjsXsuLrNbRu1YKfHLYnRT260K5NC7bv1K7Rsm+LKIvCNGCUmU0hdoJ5tc4niEhTWbt+I+vWl7P8qzXMWbCUiopK8vOa8d8lZbw3f0nC+LoKQvcuHejbqyvdtmvP+o2b2KGgI/vt3hsLjoV0aNuq3j2AdBNaUTCzycDhQBczKwWuAZoDuPuDwHTgeGAhsA44O6wsIiKbvf9xKdff/+w2P88dlw+kT2GXRkiUXsK8+mhIiu0OXBDW64uIALg7cxYs5W//nM37H9d9LqAubVu3ZO36jVwyYgC7FHWjZYt82rdtFULS9JBxU2eLiNSnfFMFQy57BID8/Dwq6rn+H2D7Tm1ZuWot3bt0YPedvkeXzu2orKxix54FHLhXn6aInFZUFEQk45V9tYYPFizlvskz4vrrKwg/2Lsvl509IORkmUdFQUQykrvzxuxPuXPiyw3+nvZtW3HWSQdx5EG7hpgss6koiEhGcXfOv/aPKW8My8/P4/djzqBDu1a0bJ5Pfn5eEyXMbCoKIpIR1q7fyP1PzGDmnM/qHNOyRXMKu3Xi/DMOpW+vlMsRSxIqCiKSlt7/uJTPl69m6fKvmf7vpJMtV9t/jyJOP2Yf+vXu1kTpspeKgoikDXfn3idmMOPtjxs0/qbRp7Drjt3DDZVjVBREJC3M+nARtzz8QoPGnnv6IRx/6J4hJ8pNKgoiEpm16zdy6yMvMndh3dOeFe/em5Ytm9N9+w4M+cn+mCWbS1Mai4qCiETC3TlrzKN1br9o2JH8aL+daNZMqwY3JRUFEWlyFRWVDLr04aTbCjq358FrhzZxItlMRUFEmszKVd8y/qnXKJm7OGHbFT8/ln1366n7CSKmoiAioVq6fBWjb5qSfAWtwA0XnUz/vjs0WSapm4qCiIRm9Zr1XHTTlHrH3H31YHp07dREiSQVFQURCcULr83l4amv1bn9wqFHcPgBuzRhImkIFQURaXTLlq9KWhBuueRU+vToovMGaUxFQUQaVVVVFRcmOWQ09ffn6x6DDKALgEWkUQ0OFrjZrGf3zjx91y9UEDKE9hREpFFs2lTJ4MsS7z244/KBEaSRraU9BRHZZlVVVUkLwg0XnazzBxlGRUFEttlPLx6f0HfRsCN170EG0uEjEdlqr7/336TLYU68eQTt27aKIJFsK+0piMhW+fCTpUkLwl1XDVJByGAqCiKyVa659+8JfVeffzyF3TpHkEYaiw4fiUiDuTvnjJ3EN9+uT9j29F2/iCCRNDYVBRFpsLPGPMq6DeUJ/SoI2UOHj0SkQdZvKE9aEM4b+KMI0khYtKcgIimtWrOOc8dOiuu7btSJ7NGvR0SJJCzaUxCReq1dvzGhIAAqCFlKewoiUqf3Py7l+vufTeh/4Botl5mtQi0KZnYscBeQBzzi7rfU2t4LeAzoFIwZ4+7Tw8wkIqlVVVUlvUsZYNItZ9O2dcsmTiRNJbTDR2aWB9wHHAf0B4aYWf9aw8YCT7n7PsBg4P6w8ohIw9VVEJ66c6QKQpYLc0/hAGChu38KYGZTgJOBeTXGONAheNwRWBZiHhFpgNE3P5nQt89uPRn9s6PIy9NpyGwXZlHoASyp0S4FDqw15lrgJTO7EGgLHB1iHhFJYc3aDZR++XVc371jh7BDQceIEklTC7PsJ1tRw2u1hwAT3b0QOB543MwSMpnZSDMrMbOSsrKyEKKKyB+nzWTEVRPj+sb+4icqCDkmzKJQCvSs0S4k8fDQucBTAO7+JtAK6FL7idx9vLsXu3txQUFBSHFFctdbcz7jmX/OTujfZ7eeSUZLNguzKMwC+plZHzNrQexE8rRaY/4HHAVgZrsRKwraFRBpQlVVVdz2hxcT+v9w41kRpJGohXZOwd0rzGwU8CKxy00nuPtcM7seKHH3acClwMNmdjGxQ0sj3L32ISYRCVGyK400l1HuCvU+heCeg+m1+sbVeDwPOCTMDCJSt7kLEy/4U0HIbbq+TCSHjbsn/ojuozcNjyiJpAsVBZEcdeGNk+Pa+/XvTYd2rSNKI+lCcx+J5KDTRz+Y0DfmvB9HkETSjfYURHJMsoJw5IG70qyZfh2I9hREcsodj76c0Pebi09l56JuEaSRdKSPBiI5onxTBW/O/m9c33WjTlRBkDgqCiI5wN0ZctkjcX1DTzhQC+VIAhUFkRww8NcPJfSdNmCfCJJIulNREMlyA5OcWP7jredEkEQygYqCSJZyd04f/WDC1MT3jh1C61YtIskk6U9FQSQLfVa6Iukho0vPHqCpsKVeKgoiWaaiopLLbp+a0D/sxAM5eO++ESSSTKL7FESyzNi7/5bQN+HG4XRsryksJDUVBZEs88ni5XFtzXoqW0KHj0SySO0pLH77fwMjSiKZSkVBJEuUzF2c0FfUI2F1W5F66fCRSBb46z9n8/i0mXF9115wYkRpJJNpT0Ekw81duCyhIPQp7MKeO2sKC9lyKgoiGa726mkAd1yucwmydVQURDJYsrURdLWRbAsVBZEMtHkKi9pUEGRbqSiIZBh3TzqFxe+vHBRBGsk2uvpIJMOcNebRhL6Hrh1Gl87tIkgj2UZFQSSDVFVVsW5DeVzfxJtH0L5tq4gSSbbR4SORDPLTi8fHtW8afYoKgjQqFQWRDHHRTVMS+nbdsXsESSSbqSiIZIBbH3mBpctXxfVNvHlENGEkq6koiGSAtz9YFNc+49hiHTaSUOhEs0iau/Oxf8S1Lzv7GH6w944RpZFspz0FkTRWUVHJ6+8ujOtTQZAwqSiIpLFBlz4c175y5HERJZFcEWpRMLNjzexjM1toZmPqGHOGmc0zs7lm9kSYeUQyibsn9BXv3juCJJJLQjunYGZ5wH3AAKAUmGVm09x9Xo0x/YArgUPc/Wsz6xpWHpFMM+X5krj2XVdpGgsJX5h7CgcAC939U3cvB6YAJ9cacx5wn7t/DeDuyxERAKa++E5cu7Bb54iSSC4Jsyj0AJbUaJcGfTXtDOxsZq+b2UwzOzbZE5nZSDMrMbOSsrKykOKKpI+BtWZA/clhe0aURHJNmEXBkvTVPkiaD/QDDgeGAI+YWaeEb3If7+7F7l5cUFDQ6EFF0sn6DeUJb5RzTjskkiySe8IsCqVAzxrtQmBZkjF/c/dN7v4Z8DGxIiGSk5YuX8WwKybE9V1/4UkRpZFcFGZRmAX0M7M+ZtYCGAzUXjfwr8ARAGbWhdjhpE9DzCSSttau35h0fqPdd/peBGkkV4VWFNy9AhgFvAjMB55y97lmdr2Zbf7o8yKw0szmAa8Al7v7yrAyiaQrd0+6TsJNo0+JII3kMkt2LXQ6Ky4u9pKSktQDRTJEXSupPfnb88jPz4sgkWQjM3vH3YtTjdMdzSIRuzDJIaOJN49QQZBIqCiIRGjd+nI+L1sd1zfhxuGaAVUio6IgEqGfjYm/0uiBa4bSsX3riNKIqCiIRGby9FkJfV23ax9BEpHvqCiIRKB8U0XCNBaT7/h5RGlEvqOiIBKBl9+YH9e+ZMQAWjTXmlcSPRUFkSZWUVHJhL+8Htd3yD59I0ojEq/eomBmE2s8Hh56GpEcUHvhnFOP2juiJCKJUu0pfL/G49FhBhHJBZ8uSZzld+iJB0aQRCS5VEUhs253Fklzl9/xdFx70i1nY5ZsQmGRaKQ6s1VoZncTmwZ78+Nq7n5RaMlEsswrb30c1y7o3J62rVtGlEYkuVRF4fIajzXhkMhWqqio5N4nXonru3/ckIjSiNSt3qLg7o81VRCRbHbbH16Ka++1cyHNmuniP0k/Kf9XmtlwM3vXzNYGf0rM7KymCCeSDd6Zu5h35i2O67vmghMiSiNSv3r3FIJf/r8GLgHeJXZuYV/gdjPD3SeFH1Ekc7k7N49/Pq5v6Am62kjSV6o9hV8Bp7r7K+6+2t1Xufu/gNODbSJSjyGXPRLXbtu6JacN2CeiNCKppSoKHdx9Ue3OoK9DGIFEssmmisq49qRbzo4oiUjDpCoK67dym0jOGzj6wbj2DRedHFESkYZLdUnqbmY2J0m/ATuGkEckK7h7wp2f/fvuEEkWkS2Rqih8H+gGLKnV3xtYFkoikSww7Ir4xXPuumpQRElEtkyqw0e/A75x98U1/wDrgm0iUou7s2Hjpri+wm6dI0ojsmVSFYUid084fOTuJUBRKIlEMty9T8yIa197wYmR5BDZGqmKQn2rh2shWZEkZrwdP8fRnjv3iCiJyJZLVRRmmdl5tTvN7FzgnSTjRXLaOWPjZ4Y5/4xDI0oisnVSnWj+NfCMmQ3luyJQDLQATg0zmEimeXbGHFavib9S+5hD+keURmTrpJoQ70vgYDM7Atgj6H4uuKtZRALvzV/Co8+8Edd3yYgBEaUR2XoNWinc3V8BXkk5UCRH3fjgc3Htg/bqo3WXJSNp7l6RbXT3HxN3nC8/98cRJBHZdioKItugoqKSV2ctiOt76s6REaUR2XYqCiLb4JGn/xPXHnHKweTl6W0lmSvU/71mdqyZfWxmC81sTD3jBpqZm1lxmHlEGpO78/Ib8+P6Tjxir4jSiDSO0IqCmeUB9wHHAf2BIWaWcH2embUHLgLeCiuLSBhG3Tg5rv2LQbonQTJfmHsKBwAL3f1Tdy8HpgDJ5g6+AbgN2BBiFpFG9VnpCr5Y8U1c31EH7RpRGpHGE2ZR6EH87KqlQV81M9sH6Onuz4aYQ6TRXXb71Lj2xWcdTbNmOpcgmS/M/8WWpK96inkza0ZsptVLUz6R2UgzKzGzkrKyskaMKLLlFi9bGddunp/HD/fbKaI0Io0rzKJQCvSs0S4kfg2G9sTukp5hZouAg4BpyU42u/t4dy929+KCgoIQI4ukdsmtf45ra4lNySZhFoVZQD8z62NmLYDBwLTNG919tbt3cfcidy8CZgInBdNyi6SlydNnJfS1aN6giQFEMkJoRcHdK4BRwIvAfOApd59rZteb2Ulhva5ImKa+GD85sG5Uk2wT6kccd58OTK/VN66OsYeHmUVkW9WeAXXoCQfqRjXJOvofLdJAT7/8blz7tAH7RJREJDwqCiIN9NyrH0QdQSR0KgoiDfDsjPilyoef8oOIkoiES0VBpAFqL6Bz4uGa40iyk4qCSArvzvtfXHuPft/DLNm9mSKZT0VBJIWbHoq7gI7rRumKasleKgoi9Zj83Ntx7UP21XQWkt1UFETqUFVVxdSX4i9DvfDMIyJKI9I0VBRE6vDTi8fHtYeecCDNm+dFlEakaagoiCRx52P/SOjTzWqSC1QURGp58MlXef3dhXF9d101KKI0Ik1LRUGkhtkfLUlYd/n4Q/egsFvniBKJNC0VBZFA6Zdfc8MDz8X1HX/oHpx7+g8jSiTS9FQURAKjb34yoU8FQXKNioIIMOyKCQl9T9/1iwiSiERLS0ZJTquqqkq49BTgyd+eF0EakehpT0FyWrKCMPSEA8nP1/0Ikpu0pyA56bPSFVx2+9SE/tE/O5JDi3eOIJFIelBRkJyzZu2GpAXh7qsH06NrpwgSiaQPHT6SnLJg0ZeMuGpiQv9No09RQRBBewqSY6783TMJfbrKSOQ72lOQnHHHoy8n9E39/fkRJBFJX9pTkJzwxLNv8+bs/8b1aQ9BJJH2FCTr/WvmRzz9cvy6CL8aclhEaUTSm4qCZLWKikrumzwjof+og3Zr+jAiGUBFQbKWuzPo0ofj+pqZ6bCRSD1UFCRrDfz1Qwl9f9aJZZF6qShIVjrz8j8k9N126ekRJBHJLLr6SLLOv2Z+xMbyTXF9d1w+kD6FXSJKJJI5tKcgWaf2ieURpxysgiDSQCoKklUemPJqXHu//r058Yi9IkojknlCLQpmdqyZfWxmC81sTJLtl5jZPDObY2b/NLPeYeaR7PbKWx/zjzfj11e+cuSxEaURyUyhFQUzywPuA44D+gNDzKx/rWHvAcXuvhcwFbgtrDyS3co3VXDvE6/E9V049AjMLKJEIpkpzD2FA4CF7v6pu5cDU4CTaw5w91fcfV3QnAkUhphHstiQyx6Jaxtw+AG7RBNGJIOFWRR6AEtqtEuDvrqcCzyfbIOZjTSzEjMrKSsra8SIkuncndNHP5jQP1U3qIlslTCLQrL9dk860GwYUAzcnmy7u49392J3Ly4oKGjEiJLpJj83K6Hv4et/FkESkewQ5n0KpUDPGu1CYFntQWZ2NHA1cJi7bwwxj2SZz8tWJ0x099hvzqZdm5YRJRLJfGHuKcwC+plZHzNrAQwGptUcYGb7AA8BJ7n78hCzSBYadePkuPaY845VQRDZRqEVBXevAEYBLwLzgafcfa6ZXW9mJwXDbgfaAX82s9lmNq2OpxOJc/NDiaef9t+jqOmDiGSZUKe5cPfpwPRafeNqPD46zNeX7FNZWcUZl4xP6NfMpyKNQ3c0S0ZJVhAuPkufLUQai4qCZIS6Lj0957RD+OF+O0WQSCQ7aZZUyQjDrpiQ0Df+umFs36ldBGlEspeKgqQ1d0+6WM6g44pVEERCoKIgaauuk8o3//oUdunTPYJEItlP5xQkLZVvqkhaEH520kEqCCIh0p6CpKXaE9wB3D/uTLpt3yGCNCK5Q0VB0kpFRSWDLn04of/xW86hTesWESQSyS0qCpI2SuYu5jfjE+9UnnjzCBUEkSaioiCRW71mPeeMfSzptit+fizt27Zq4kQiuUtFQSL15PMlPPVCSdJtf7rtXFq1bN7EiURym4qCROKr1Ws5b9zjSbf177sDN1x0ctJtIhIuFQVpUuWbKjh37CTWbShPun3SLWfTtrWmvxaJioqCNJlly1dx4U1Tkm4r3r03V448rokTiUhtKgrSJN6bv4QbH3wu6bYnf3se+fl5TZxIRJJRUZBQfbL4S8bc+UzSbXdfPZgeXTs1cSIRqY+KgoRi8bKVXHLrn+vcPvX352NmTZhIRBpCRUEaXbJ1D2rSKmki6UtFQRrF8q/WcPuEl/h0SVmdYybcOJyO7Vs3YSoR2VIqCrLNPliwlGvv+3vSbc3MuG/cmXTdrn0TpxKRraGiIFvlnzPn8483P2LBoi/rHDPuVyfw/V0KmzCViGwrFQVpsI3lm1j4vzLG3TOt3nG/ufhUdi7q1kSpRKQxqShISkuXr+KiOm46q2mHgo7cc/VgXVUkksFUFCSpb75dz6wPF3H/5FfrHbdv/14cf+ie7L1roYqBSBZQURDKN1Xw9pxFTPzrG+xYWMA78xan/J52bVry4DVDad1K6xyIZBMVhRz1WekKFi9byT1/eiWuv76C0L/vDlw47EhdSSSSxVQUcsRXq9cyb+Hn/G7SP7b4e/fo9z3+79wfa/ZSkRygopBFKioqWbR0JV+s/IavVq/lsb++Sbs2Lfl23cYGP0evHbbj9AH70rJlPt/fpZAWzfVfRCSX6B2fgSoqKvlk8XKe/89c3v9oCUU9tmflqrV8XrY6YWyqgrBLn+6s+XY91446ke07tQsrsohkCBWFNDfrw0VMfOYNvljxDa1aNmfDxk0JYz78ZNkWPef+exRxwZmHa+1jEUkQalEws2OBu4A84BF3v6XW9pbAJGA/YCUwyN0XhZkp3X306RdMmjaTZctXsWbthrhtyQpCXXp270yPrp3YrlNbBh6zH61bNdehIBFJKbTfEmaWB9wHDABKgVlmNs3d59UYdi7wtbvvZGaDgVuBQWFlqs/G8k18seKb6nb5pgo+K11J80Zc/KWispLZ85ewXae2Cdtmz1/CsiSHf1LJy2vGqUfvw247dsfM6LXDdnTu0KYx4opIDgrzo+MBwEJ3/xTAzKYAJwM1i8LJwLXB46nAvWZm7u6NHeaF1+by73c+oaqqKmHb0i9X1blmcLr4XkFHRp5xKH17FgDQulVz3SwmIo0uzKLQA1hSo10KHFjXGHevMLPVwPbAisYMsuLrb3lk6ms0eqUJUY+unTjmkP7s2qc7fXsVqACISJMIsygk+y1W+/dyQ8ZgZiOBkQC9evXa4iCrvlnX4ILQrFkzenSLLRH57doNbCyv4IC9+mzxa9bF3Sn7ag3771lEXrNmcds2llewb/+eFPXo0mivJyKyJcIsCqVAzxrtQqD2ZTKbx5SaWT7QEfiq9hO5+3hgPEBxcfEWf+Dvsl07fj7wh3wZnDM4eJ++Scf16NZJN2iJSE4LsyjMAvqZWR9gKTAYOLPWmGnAcOBNYCDwrzDOJ3Rq34bjfrRHYz+tiEjWCa0oBOcIRgEvErskdYK7zzWz64ESd58G/AF43MwWEttDGBxWHhERSS3UC9fdfTowvVbfuBqPNwA/DTODiIg0XLPUQ0REJFeoKIiISDUVBRERqaaiICIi1VQURESkmoVwW0CozKwMSL2IcP260MhTaYQg3TMq37ZRvm2T7vkg/TL2dveCVIMyrig0BjMrcffiqHPUJ90zKt+2Ub5tk+75IDMyJqPDRyIiUk1FQUREquVqURgfdYAGSPeMyrdtlG/bpHs+yIyMCXLynIKIiCSXq3sKIiKSRE4UBTO72MzmmtmHZjbZzFqZ2Z/M7OOgb4KZNU+nfDW23WNm30aVra58FnOTmS0ws/lmdlGa5TvKzN41s9lm9h8z2ynCfKODbHPN7NdB33Zm9rKZfRJ87Zxm+W43s4/MbI6ZPWNmnaLKV1fGGtsuMzM3s8hWp6orn5ldGPyemWtmt0WVb4u4e1b/Ibbk52dA66D9FDACOJ7Yym8GTAZ+mU75gsfFwOPAt2n48zsbmAQ0C/q7plm+BcBuQd+vgIkR5dsD+BBoQ2xW4n8A/YDbgDHBmDHArWmW7xggPxhza1T56ssYbOtJbHr+xUCXdMoHHBE8bhmMi+Q9sqV/cmJPgdg/VOtgdbc2wDJ3n+4B4G1iK8OlTT4zywNuB/4vwlybJeQDfglc7+5VAO6+PM3yOdAh2N6RxFX/mspuwEx3X+fuFcCrwKnAycBjwZjHgFPSKZ+7vxS0AWYS7fujrp8hwO+IvUeiPDlaV75fAre4+0aI/D3SYFlfFNx9KXAH8D/gc2C1u7+0eXtw2OhnwAtplm8UMM3dP48iVwPy9QUGmVmJmT1vZv3SLN/PgelmVkrs3/eWKPIR+wR5qJltb2ZtiO2h9gS6bf63Db52TbN8NZ0DPN/kyb6TNKOZnQQsdff3I8wGdf8MdwZ+ZGZvmdmrZrZ/pCkbKOuLQnCs9mSgD/A9oK2ZDasx5H7g3+7+WhrlO4vY4kP3RJGppnp+fi2BDR67Y/NhYEKa5bsYON7dC4FHgTujyOfu84kdfnmZ2AeP94GKer+pCaXKZ2ZXB+0/RRKQejNeDYyr51ubRD358oHOwEHA5cBTZmZR5WyorC8KwNHAZ+5e5u6bgL8ABwOY2TVAAXBJmuW7DtgJWGhmi4A2wZKl6ZLvYKAUeDoY8wywVxrlOwT4vru/FYx5kuDfPAru/gd339fdDyW27OwnwJdmtgNA8DWyQwt15MPMhgMnAEODw6yRSZJxEbEPAu8H75FC4F0z654m+T4h9h75S3CU+m2gith8SGktF4rC/4CDzKxNUKWPAuab2c+BHwNDNh8XT6N8d7p7d3cvcvciYJ27R3X1TNKfH/BX4MhgzGHETuymS755QEcz2zkYM4BY5kiYWdfgay/gNGIXNkwDhgdDhgN/iyZd8nxmdixwBXCSu6+LKttmSTJOcveuNd4jpcC+7v5FmuSbTI33SPB/sQXpNUFeUqGu0ZwO3P0tM5sKvEtsl+49YncariV2xcKbwR7dX9z9+jTKlxbqydca+JOZXQx8S+wYfjrlKwWeNrMq4Gtix8Wj8rSZbQ9sAi5w96/N7BZihxPOJVbYolyrPFm+e4kdInw5eH/MdPdfpFPGCLMkk+xnOAGYYGYfAuXA8Kj3uBpCdzSLiEi1XDh8JCIiDaSiICIi1VQURESkmoqCiIhUU1EQEZFqKgqSFczs6mAmyjnBzKgHhvx6M8ws49bfFUkl6+9TkOxnZj8gduftvu6+MZhCuUXEsUQykvYUJBvsAKyoMRvlCndfBmBm48xsVjDX/fjNc88En/R/Z2b/tth6EPub2V8str7BjcGYomBNgceCPZCpwYRncczsGDN702LrN/zZzNolGZPy9YJxw8zs7WBv56FgtlzM7IFg8sG5ZnZdjfGLzOy64LU/MLNdG/lnKzlGRUGywUvEZs1cYGb3m9lhNbbd6+77u/sexO7CPqHGtvJgrpoHiU03jRwjAAACAElEQVQzcQGxufFHBHenAuwCjHf3vYBviK3NUC3YKxkLHO3u+wIl1D2XVr2vZ2a7AYOAQ9x9b6ASGBp879XB5IN7AYeZWc25plYEr/0AcFnqH5dI3VQUJOO5+7fAfsBIoAx40sxGBJuPsNjUxR8Qm4dm9xrfOi34+gEw190/D/Y2PuW76aOXuPvrweM/Aj+s9fIHAf2B181sNrF5jHrXETXV6x0V/D1mBc91FLBj8D1nmNm7xKbx2D14zc3+Enx9Byiq47VFGkTnFCQruHslMAOYERSA4WY2hdjU6MXuvsTMrgVa1fi2jcHXqhqPN7c3vzdqzwNTu23Ay+4+pAExU72eAY+5+5VxL2DWh9gewP7BnDoT6/h7VKL3tGwj7SlIxjOzXSx+kZ+9iU12uPkX54rgOP/ArXj6XsGJbIAhwH9qbZ8JHGLBGtDBbK07s3X+CQysMePmdmbWm9gKcmuB1WbWDThuK59fJCV9qpBs0A64x2KLy1cAC4GR7r7KzB4mdrhmETBrK557PrG9joeIzZH/QM2N7l4WHKqabGYtg+6xbMVU4u4+z8zGAi+ZWTO+m3Fzppm9B8wldqjp9fqeR2RbaJZUkTqYWRHwbHCSWiQn6PCRiIhU056CiIhU056CiIhUU1EQEZFqKgoiIlJNRUFERKqpKIiISDUVBRERqfb/Zlk9L7gkTxUAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cdf = thinkstats2.Cdf(xbars)\n", "thinkplot.Cdf(cdf)\n", "thinkplot.Config(xlabel='Sample mean',\n", " ylabel='CDF')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean of the sample means is close to the actual value of $\\mu$." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "89.94056816952832" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(xbars)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An interval that contains 90% of the values in the sampling disrtribution is called a 90% confidence interval." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(85.87345905535176, 94.11925824713033)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ci = cdf.Percentile(5), cdf.Percentile(95)\n", "ci" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the RMSE of the sample means is called the standard error." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.487879588208278" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stderr = RMSE(xbars, 90)\n", "stderr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confidence intervals and standard errors quantify the variability in the estimate due to random sampling." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating rates\n", "\n", "The following function simulates experiments where we try to estimate the mean of an exponential distribution using the mean and median of a sample. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rmse L 1.075066354067901\n", "rmse Lm 1.7723826281429338\n", "mean error L 0.315202440018787\n", "mean error Lm 0.464767815400564\n" ] } ], "source": [ "def Estimate3(n=7, iters=1000):\n", " lam = 2\n", "\n", " means = []\n", " medians = []\n", " for _ in range(iters):\n", " xs = np.random.exponential(1.0/lam, n)\n", " L = 1 / np.mean(xs)\n", " Lm = np.log(2) / thinkstats2.Median(xs)\n", " means.append(L)\n", " medians.append(Lm)\n", "\n", " print('rmse L', RMSE(means, lam))\n", " print('rmse Lm', RMSE(medians, lam))\n", " print('mean error L', MeanError(means, lam))\n", " print('mean error Lm', MeanError(medians, lam))\n", " \n", "Estimate3()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The RMSE is smaller for the sample mean than for the sample median.\n", "\n", "But neither estimator is unbiased." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** In this chapter we used $\\bar{x}$ and median to estimate µ, and found that $\\bar{x}$ yields lower MSE. Also, we used $S^2$ and $S_{n-1}^2$ to estimate σ, and found that $S^2$ is biased and $S_{n-1}^2$ unbiased.\n", "Run similar experiments to see if $\\bar{x}$ and median are biased estimates of µ. Also check whether $S^2$ or $S_{n-1}^2$ yields a lower MSE." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Suppose you draw a sample with size n=10 from an exponential distribution with λ=2. Simulate this experiment 1000 times and plot the sampling distribution of the estimate L. Compute the standard error of the estimate and the 90% confidence interval.\n", "\n", "Repeat the experiment with a few different values of `n` and make a plot of standard error versus `n`.\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** In games like hockey and soccer, the time between goals is roughly exponential. So you could estimate a team’s goal-scoring rate by observing the number of goals they score in a game. This estimation process is a little different from sampling the time between goals, so let’s see how it works.\n", "\n", "Write a function that takes a goal-scoring rate, `lam`, in goals per game, and simulates a game by generating the time between goals until the total time exceeds 1 game, then returns the number of goals scored.\n", "\n", "Write another function that simulates many games, stores the estimates of `lam`, then computes their mean error and RMSE.\n", "\n", "Is this way of making an estimate biased?" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def SimulateGame(lam):\n", " \"\"\"Simulates a game and returns the estimated goal-scoring rate.\n", "\n", " lam: actual goal scoring rate in goals per game\n", " \"\"\"\n", " goals = 0\n", " t = 0\n", " while True:\n", " time_between_goals = random.expovariate(lam)\n", " t += time_between_goals\n", " if t > 1:\n", " break\n", " goals += 1\n", "\n", " # estimated goal-scoring rate is the actual number of goals scored\n", " L = goals\n", " return L" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 1 }