{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Statistical Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most fundamental problems statistics tries to address is _causal inference_, determining what effect some _treatment_ has.\n", "\n", "One of the most serious difficulties in making causal inferences is _confounding_, the differences due to something other than treatment masquerading as the effect of treatment.\n", "\n", "### The Method of Comparison\n", "The basic method to mitigate confounding is to compare(at least) two groups, one that receives _treatment_ and a _control group_ that does not (or that gets a different treatment).\n", "To minimize bias, the treatment group and the control group should be as similar as possible but for the fact that\n", "one gets treatment and the other does not.\n", "\n", "If subjects _self-select_ for treatment, that generally results in bias. So does allowing the experimenter flexibility to select the groups.\n", "The best way to minimize bias, and to be able to quantify the uncertainty in the resulting inferences, is to assign subjects to treatment or control _randomly_.\n", "\n", "For human subjects, the mere fact of receiving treatment—even a treatment with no real effect—can\n", "produce changes in response. This is called _the placebo effect_.\n", "For that reason, it is important that human subjects be _blind_ to whether they are treated or not, for instance,\n", "by giving subjects in the control group a _placebo_.\n", "That makes the treatment and control groups more similar.\n", "Both groups receive something: the difference is in _what_ they\n", "receive, rather than _whether_ they receive anything.\n", "\n", "Also, subjective elements can deliberately or inadvertently enter the assessment of subjects' responses to treatment,\n", "making it important for the people assessing the responses to be _blind_ to which subjects received treatment.\n", "When neither the subjects nor the assessors know who was treated, the experiment is _double blind_.\n", "\n", "See [SticiGui: Does Treatment Have an Effect?](http://www.stat.berkeley.edu/~stark/SticiGui/Text/experiments.htm) for more discussion." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The 2-sample problem \n", "\n", "Suppose we have a group of $N$ individuals who are randomized into two groups, a _treatment_ group of size $N_t$ and a _control_ group of size $N_c = N - N_t$.\n", "Label the individuals from $1$ to $N$.\n", "Let ${\\mathcal T}$ denote the labels of individuals assigned to treatment and ${\\mathcal C}$ denote \n", "the labels of those assigned to control.\n", "\n", "For each of the $N$ individuals, we measure a quantitative (real-valued) response.\n", "Each individual $i$ has two _potential responses_: the response $c_i $individual would have if assigned to \n", "the control group, and the response $t_i$ the individual would have if assigned to the treatment group.\n", "\n", "We assume that individual $i$'s response depends _only_ on that individual's assigment, and not on anyone else's assignment.\n", "This is the assumption of _non-interference_. \n", "In some cases, this assumption is reasonable; in others, it is not.\n", "\n", "For instance, imagine testing a vaccine for a communicable disease.\n", "If you and I have contact, whether you get the disease might depend on whether I am vaccinated—and _vice versa_—since if the vaccine protects me from illness, I won't infect you.\n", "Similarly, suppose we are testing the effectiveness of an advertisement for a product.\n", "If you and I are connected and you buy the product, I might be more likely to buy it, even if I don't\n", "see the advertisement.\n", "\n", "Conversely, suppose that \"treatment\" is exposure to a carcinogen, and the response whether the\n", "subject contracts cancer. \n", "On the assumption that cancer is not communicable, my exposure and your disease\n", "status have no connection.\n", "\n", "The _strong null hypothesis_ is that individual by individual, treatment makes no difference whatsoever: $c_i = t_i$ for all $i$.\n", "\n", "If so, any differences between statistics computed for the treatment and control groups are entirely due to the luck of the draw: which individuals happened to be assigned to treatment and which to control.\n", "\n", "We can find the _null distribution_ of any statistic computed from the responses of the two groups: if the strong null hypothesis is true, we know what individual $i$'s response would have been whether assigned to treatment or to control—namely, the same.\n", "\n", "For instance, suppose we suspect that treatment tends to increase response: in general, $t_i \\ge c_i$.\n", "Then we might expect $\\bar{c} = \\frac{1}{N_c} \\sum_{i \\in {\\mathcal C}} c_i$ to be less than\n", "$\\bar{t} = \\frac{1}{N_t} \\sum_{i \\in {\\mathcal T}} t_i$.\n", "How large a difference between $\\bar{c}$ and $\\bar{t}$ would be evidence that treatment increases the response,\n", "beyond what might happen by chance through the luck of the draw?\n", "\n", "This amounts to asking whether the observed difference in means between the two groups is a high percentile\n", "of the distribution of that difference in means, calculated on the assumption that the null hypothesis is true.\n", "\n", "Because of how subjects are assigned to treatment or to control, all allocations of $N_t$ subjects to\n", "treatment are equally likely.\n", "\n", "One way to partition the $N$ subjects randomly into a group of size $N_c$ and a group of size $N_t$ is\n", "to permute the $N$ subjects at random, then take the first $N_c$ in the permuted list to be the control\n", "group, and the remaining $N_t$ to be the treatment group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aside: Random number generation and simulation\n", "\n", "Most computers cannot generate true random numbers (there are rare exceptions that have _hardware random\n", "number generators_).\n", "Instead, they generate _psdudo-random numbers_ using algorithms called _pseudo-random number generators_ (PRNGs).\n", "PRNGs are deterministic, but the sequence of numbers they generate can behave like random numbers for\n", "some purposes—depending on the PRNG and the purpose.\n", "\n", "Older languages, operating systems, and non-statistical software (including Excel) tend to use low-quality\n", "PRNGs that are not suitable for statistical inference.\n", "\n", "A _linear congruential generator_ makes sequences of the form\n", "\n", "$$ X_{n+1} = ( a X_n + c ) \\mbox{ mod } m,$$\n", "\n", "with $m > 0$, $0 \\le c < m$, $0 < a < m$, and \"seed\" $X_0$.\n", "This generator has the benefit of simplicity, but the quality is not adequate for statistics.\n", "Moreover, for some choices of $a$, $c$, and $m$, this can have especially poor behavior.\n", "One particularly bad generator, RANDU, was used on IBM mainframes in the 1960s.\n", "It has $m = 2^{31}$, $c=0$, and $a = 65539$.\n", "It generates sequences with a very strong correlation: if you plot triples of numbers it generates\n", "as points in $\\Re^3$, the numbers line up on 15 planes.\n", "Despite the fact that problems with this generator were already known in the early 1960s,\n", "it was implemented in the IBM/360 in the 1970s and widely promulgated.\n", "(See Marsaglia, G., 1968. Random numbers fall mainly in the planes. _PNAS_, _61_ 25-28. http://www.pnas.org/content/61/1/25.full.pdf)\n", "Many scientific \"results\" are now in question because they relied on RANDU.\n", "\n", "The Wichmann-Hill generator (published in 1982)\n", "was popular for a while; it combines three linear congruential generators.\n", "Excel purported to implement it, but at least through 2008, its implementation had bugs.\n", "(For instance,\n", "despite the fact that the Wichmann-Hill generator gives numbers between 0 and 1, the Excel\n", "PRNG occasionally gave negative numbers.\n", "See McCullough, B.D., 2008. Microsoft's 'Not the Wichmann-Hill' random number generator. _Computational Statistics and Data Analysis_, _52_, 4587–4593.)\n", "\n", "Current high-end statistics packages and programming languages (e.g., R, Python, SAS, MATLAB) use the Mersenne Twister\n", "PRNG.\n", "The Mersenne Twister has a very long period ($2^{19937}-1$) and passed the [DIEHARD tests](https://en.wikipedia.org/wiki/Diehard_tests) for equidistribution, etc.\n", "However, it is not adequate for cryptography (the _state space_ is so small that its future values can be predicted\n", "from a relatively small number of observations).\n", "_Linear congruential_ generators are generally not adequate for statistics.\n", "In particular, beware the algorithms in _Numerical Recipes_ and the Excel PRNG." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random sampling\n", "\n", "One standard way to draw a (pseudo-)random sample from a list is to assign a pseudo-random number to each\n", "element of the list, then sort the list by those numbers.\n", "Thus, the first element of the sample is the element of the list that was assigned the smallest number,\n", "the second element of the sample is the list element assigned the second-smallest number, etc.\n", "This generates a random sample in the same way that shuffling cards well generates random \"hands\" of cards.\n", "\n", "Formally, this is fine, but PRNGs have limitations.\n", "Even the Mersenne Twister runs into trouble generating random permutations of long vectors using the naiive approach (assign a random number to each element of the vector, then sort by those numbers).\n", "The period of the Mersenne Twister is about $4 \\times 10^{6002}$. \n", "That's less than the number of\n", "permutations of 2081 objects.\n", "\n", "### The Knuth Shuffle\n", "\n", "Constructing a random permutation as described above is inefficient, in part because it requires sorting.\n", "The _Knuth Shuffle_ is more efficient and can generate a random permutation in place.\n", "The Knuth shuffle goes through the list, randomly swapping the current element\n", "with elements above it in the list (or leaving it in place).\n", "That is, item $i$ is swapped with item $j \\le i$ with probability $1/i$\n", "(swapping item $i$ with item $i$ leaves it in place, of course)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# boilerplate\n", "%matplotlib inline\n", "from __future__ import division\n", "import math\n", "import numpy as np\n", "import scipy as sp\n", "from scipy import stats # distributions\n", "from scipy import special # special functions\n", "from scipy import random # random variables, distributions, etc.\n", "import matplotlib.pyplot as plt\n", "from IPython.html import widgets\n", "# from IPython.html.widgets import interact, interactive, FloatSliderWidget, fixed # interactive stuff" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Random sampling using random permutations\n", "n = 20\n", "d = range(1,n+1) # dummy data for illustration\n", "np.random.RandomState(seed=20150824) # deliberately set the seed for reproducible results\n", "x = np.random.random(n) # to use scipy instead of numpy, we could set x = sp.random.uniform(low=0, high=1, size=n)\n", "i = np.argsort(x)\n", "sam = [d[j] for j in i]\n", "print 'x:', x\n", "print 'i:', i\n", "print 'shuffled data:', sam" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computational Assignment\n", "\n", "Implement Knuth's algorithm in Python from scratch, using numpy.random.random().\n", "\n", "Compare the cpu time taken to generate $10^4$ random permutations of arrays of length $10^k$, for $k = 2, 3, 4, 5$ for\n", "1. your implementation of Knuth's algorithm\n", "1. the method above generating $10^k$ uniform random numbers and sorting the results\n", "1. the numpy.random.shuffle(), which randomly shuffles the array in place\n", "1. numpy.random.permutation(), which generates a new permutated array\n", "\n", "IPython has a \"magic\" command %%timeit that will help you do this. For an example, see the cell below:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit n = 10**3\n", "x = range(n)\n", "for i in range(n):\n", " np.random.shuffle(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit n = 10**3\n", "x = range(n)\n", "for i in range(n):\n", " np.random.permutation(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%timeit n= 10**3\n", "x = range(n)\n", "for i in range(n):\n", " y = np.random.random(n) # to use scipy instead of numpy, we could set x = sp.random.uniform(low=0, high=1, size=n)\n", " i = np.argsort(y)\n", " sam = [x[j] for j in i]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hypothesis testing\n", "[To do.]\n", "\n", "### Null and alternative hypotheses\n", "[To do.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Significance level and power\n", "\n", "Chance of the same event: rejecting the null hypothesis.\n", "\n", "Currently vilified, partly because it is misused.\n", "\n", "[To do.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $P$-values\n", "[To do.]\n", "Start with family of hypothesis tests for any desired significance level..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Confidence Sets\n", "[To do.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence bounds for Binomial $p$ and Hypergeometric $G$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from scipy.optimize import brentq\n", "from scipy.stats import (binom, hypergeom)\n", "\n", "def binom_conf_interval(n, x, cl=0.975, alternative=\"two-sided\", p=None,\n", " **kwargs):\n", " \"\"\"\n", " Compute a confidence interval for a binomial p, the probability of success in each trial.\n", "\n", " Parameters\n", " ----------\n", " n : int\n", " The number of Bernoulli trials.\n", " x : int\n", " The number of successes.\n", " cl : float in (0, 1)\n", " The desired confidence level.\n", " alternative : {\"two-sided\", \"lower\", \"upper\"}\n", " Indicates the alternative hypothesis.\n", " p : float in (0, 1)\n", " Starting point in search for confidence bounds for probability of success in each trial.\n", " kwargs : dict\n", " Key word arguments\n", "\n", " Returns\n", " -------\n", " tuple\n", " lower and upper confidence level with coverage (approximately)\n", " 1-alpha.\n", "\n", " Notes\n", " -----\n", " xtol : float\n", " Tolerance\n", " rtol : float\n", " Tolerance\n", " maxiter : int\n", " Maximum number of iterations.\n", " \"\"\"\n", " assert alternative in (\"two-sided\", \"lower\", \"upper\")\n", "\n", " if p is None:\n", " p = x / n\n", " ci_low = 0.0\n", " ci_upp = 1.0\n", "\n", " if alternative == 'two-sided':\n", " cl = 1 - (1-cl)/2\n", "\n", " if alternative != \"upper\" and x > 0:\n", " f = lambda q: cl - binom.cdf(x - 1, n, q)\n", " ci_low = brentq(f, 0.0, p, *kwargs)\n", " if alternative != \"lower\" and x < n:\n", " f = lambda q: binom.cdf(x, n, q) - (1 - cl)\n", " ci_upp = brentq(f, 1.0, p, *kwargs)\n", "\n", " return ci_low, ci_upp\n", "\n", "\n", "def hypergeom_conf_interval(n, x, N, cl=0.975, alternative=\"two-sided\", G=None,\n", " **kwargs):\n", " \"\"\"\n", " Confidence interval for a hypergeometric distribution parameter G, the number of good \n", " objects in a population in size N, based on the number x of good objects in a simple\n", " random sample of size n.\n", "\n", " Parameters\n", " ----------\n", " n : int\n", " The number of draws without replacement.\n", " x : int\n", " The number of \"good\" objects in the sample.\n", " N : int\n", " The number of objects in the population.\n", " cl : float in (0, 1)\n", " The desired confidence level.\n", " alternative : {\"two-sided\", \"lower\", \"upper\"}\n", " Indicates the alternative hypothesis.\n", " G : int in [0, N]\n", " Starting point in search for confidence bounds for the hypergeometric parameter G.\n", " kwargs : dict\n", " Key word arguments\n", "\n", " Returns\n", " -------\n", " tuple\n", " lower and upper confidence level with coverage (at least)\n", " 1-alpha.\n", "\n", " Notes\n", " -----\n", " xtol : float\n", " Tolerance\n", " rtol : float\n", " Tolerance\n", " maxiter : int\n", " Maximum number of iterations.\n", " \"\"\"\n", " assert alternative in (\"two-sided\", \"lower\", \"upper\")\n", "\n", " if G is None:\n", " G = (x / n)*N\n", " ci_low = 0\n", " ci_upp = N\n", "\n", " if alternative == 'two-sided':\n", " cl = 1 - (1-cl)/2\n", "\n", " if alternative != \"upper\" and x > 0:\n", " f = lambda q: cl - hypergeom.cdf(x-1, N, q, n)\n", " ci_low = math.ceil(brentq(f, 0.0, G, *kwargs))\n", "\n", " if alternative != \"lower\" and x < n:\n", " f = lambda q: hypergeom.cdf(x, N, q, n) - (1-cl)\n", " ci_upp = math.floor(brentq(f, G, N, *kwargs))\n", "\n", " return ci_low, ci_upp\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Permutation tests\n", "Basic setup: null hypothesis under which the probability distribution of the data is invariant under the action of some group.\n", "That is, if the null hypothesis is true, then there are many sets of data that would be (known to be) just as\n", "likely as the data actually observed.\n", "\n", "Examples: symmetry about a point, rotational invariance, ...\n", "[To do.]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neyman model for causal inference\n", "The ticket model.\n", "Each \"subject\" is represented by a ticket with two numbers on it, the response that the subject would have\n", "if assigned to control, and the response if assigned to treatment. \n", "These are assumed to be pre-ordained, fixed numbers. \n", "(Note that non-interference is built into this framework.)\n", "\n", "We assign subjects at random according to the experimental design.\n", "\n", "Non-interference: when is it realistic?\n", "Strong null hypothesis.\n", "\n", "### Generalizations\n", "Two probability distributions on each ticket instead of two numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gender Bias in Teaching Evaluations\n", "MacNell, Driscoll, and Hunt (2014. [What's in a Name: Exposing Gender Bias in Student Ratings of Teaching](http://link.springer.com/article/10.1007%2Fs10755-014-9313-4), _Innovative Higher Education_) conducted a controlled, randomized experiment on\n", "the effect of students' perception of instructors' gender on teaching evaluations\n", "in an online course.\n", "Students in the class did not know the instructors' true genders.\n", "\n", "MacNell et al. randomized 43 students in an online course into four groups: two taught by a female\n", "instructor and two by a male instructor.\n", "One of the groups taught by each instructor was led to believe the instructor was male;\n", "the other was led to believe the instructor was female.\n", "Comparable instructor biographies were given to all students.\n", "Instructors treated the groups identically, including returning assignments at the same time.\n", "\n", "When students thought the instructor was female, students rated the instructor lower, on average,\n", "in every regard.\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Characteristic F - M
Caring -0.52
Consistent -0.47
Enthusiastic -0.57
Fair -0.76
Feedback -0.47
Helpful -0.46
Knowledgeable -0.35
Praise -0.67
Professional -0.61
Prompt -0.80
Respectful -0.61
Responsive -0.22
\n", "\n", "Those results are for a 5-point scale, so a difference of 0.8 is 16% of the entire scale." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MacNell et al. graciously shared their data.\n", "The evaluation data are coded as follows:\n", "\n", " Group\n", " 3 (8 students) - TA identified as male, true TA gender female \n", "\t 4 (12 students) - TA identified as male, true TA gender male\n", "\t 5 (12 students) - TA identified as female, true TA gender female\n", "\t 6 (11 students) - TA identified as female, true TA gender male\n", " tagender - 1 if TA is actually male, 0 if actually female \n", " taidgender - 1 if TA is identified as male, 0 if identified as female \n", " gender - 1 if student is male, 0 if student is female\n", "\n", "There are grades for 47 students but evaluations for only 43 (4 did not respond). \n", "The grades are not linked to the evaluations, per the IRB protocol.\n", "\n", "Let's think about the experiment: the students were assigned at random to four groups,\n", "with each group equally likely " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interlude: partitioning sets into more than two subsets\n", "\n", "Recall that there are $n \\choose k$ ways of picking a subset of size $k$ from $n$ items;\n", "hence there are $n \\choose k$ ways of dividing a set into a subset of size $k$ and one of size $n-k$\n", "(once you select those that belong to the subset of size $k$, the rest must be in the complementary\n", "subset of size $n-k$.\n", "\n", "In this problem, we are partitioning 43 things into 4 subsets, one of size 8, one of size 11, and\n", "two of size 12.\n", "How many ways are there of doing that?\n", "\n", "\n", "Recall the [Fundamental Rule of Counting](http://www.stat.berkeley.edu/~stark/SticiGui/SticiGui/Text/counting.htm#fundamental_rule): \n", "If a set of choices, $T_1, T_2, \\ldots, T_k$, could result, respectively, \n", "in $n_1, n_2, \\ldots, n_k$ possible outcomes, the entire set of $k$ choices has\n", "$\\prod_{i=1}^k n_k$ possible outcomes.\n", "\n", "We can think of the allocation of students to the four groups as choosing 8 of the 43\n", "students for the first group, then 11 of the remaining 35 for the second, \n", "then 12 of the remaining 24 for the third.\n", "The fourth group must containe the remaining 12.\n", "\n", "The number of ways of doing that is\n", "\n", "$$ \n", " {43 \\choose 8}{35 \\choose 11}{24 \\choose 12} =\n", " \\frac{43}{8! 35!} \\frac{35!}{11! 24!} \\frac{24!}{12! 12!} = \\frac{43!}{8! 11! 12! 12!}.\n", "$$\n", "\n", "Does the number depend on the order in which we made the choices?\n", "Suppose we made the choices in a different order: first 12 students for one group, then\n", "8 for the second, then 12 for the third (the fourth gets the remaining 11 students).\n", "The number of ways of doing that is\n", "$$ \n", " {43 \\choose 12}{31 \\choose 8}{23 \\choose 12} =\n", " \\frac{43}{12! 31!} \\frac{31!}{8! 23!} \\frac{23!}{12! 11!} = \\frac{43!}{8! 11! 12! 12!}.\n", "$$\n", "The number does not depend on the order in which we make the choices.\n", "\n", "By the same reasoning, the number of ways of dividing a set of $n$ objects into\n", "$m$ subsets of sizes $k_1, \\ldots k_m$ is given by the _multinomial coefficient_\n", "$$\n", " {n \\choose k_1, k_2, \\ldots, k_m} =\n", " {n \\choose k_1}{n-k_1 \\choose k_2} {n-k_1-k_2 \\choose k_3} \\cdots {n - \\sum_{i=1}^{m-1} k_i \\choose k_{m-1}}\n", "$$\n", "\n", "$$ = \\frac{n! (n-k_1)! (n-k_1 - k_2)! \\cdots \n", " (n - k_1 - \\cdots - k_{m-1}!}{k_1! (n-k_1)! k_2! (n-k_1-k_2)! \\cdots k_m!}\n", " = \\frac{n!}{\\prod_{i=1}^m k_i!}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will check how surprising it would be for the means to be so much lower when the TA is identified as female, if in fact there is \"no real difference\" in how they were rated, and the apparent difference is just due to the luck of the draw: which students happened to end up in which section.\n", "\n", "In the actual randomization, all $43 \\choose 8, 11, 12, 12$ allocations\n", "were equally likely.\n", "But there might be real differences between the two instructors.\n", "Hence, we'd like to use each of them as his or her own \"control.\"\n", "\n", "Each student's potential responses are represented by a ticket with 4 numbers:\n", "\n", "+ the rating that the student would assign to instructor 1 if instructor 1 is identified as male\n", "+ the rating that the student would assign to instructor 1 if instructor 1 is identified as female\n", "+ the rating that the student would assign to instructor 2 if instructor 2 is identified as male\n", "+ the rating that the student would assign to instructor 2 if instructor 2 is identified as female\n", "\n", "The null hypothesis is that the first two numbers are equal and the second two numbers are equal,\n", "but the first two numbers might be different from the second two numbers.\n", "This corresponds to the hypohtesis that \n", "students assigned to a given TA would rate him or her the same, whether that TA seemed to be male or female.\n", "For all students assigned instructor 1, we know both of the first two numbers if the hull hypothesis\n", "is true; but we know neither of the second two numbers.\n", "Similarly, if the null hypothesis is true, we know both of the second two numbers for all students\n", "assigned to instructor 2, but we know neither of the first two numbers for those students.\n", "\n", "Because of how the randomization was performed, all allocations \n", "of students to sections that keep the number of students in each section the same are equally likely, so\n", "in particular all allocations that keep the same students assigned to each actual instructor\n", "the same are equally likely.\n", "\n", "Hence, all ${20 \\choose 8}$ ways of splitting the 20 students assigned to the female instructor into two groups, one with 8 students and one with 12, are equally likely. Similarly, all\n", "${23 \\choose 12}$ ways of splitting the 23 students assigned to the male instructor into two groups, one with 12 students and one with 11, are equally likely.\n", "We can thus imagine shuffling the female TA's students between her sections, and the male TA's students\n", "between his sections, and examine the distribution of the difference between the mean score for the sections where the\n", "TA was identified as male is larger than the mean score for the sections where the TA was identified as\n", "female.\n", "\n", "If the difference is rarely as large as the observed mean difference, the observed mean difference gives\n", "evidence that being identified as female really does lower the scores." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Number of allocations to 8, 11, 12, 12\n", "print sp.special.binom(43,8)*sp.special.binom(35,11)*sp.special.binom(24,12) # big number!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# the data are in a .csv file called \"Macnell-RatingsData.csv\" in the directory Data\n", "import csv # library for dealing with .csv files\n", "ratings = np.genfromtxt ('./Data/Macnell-RatingsData.csv', delimiter=\",\", names=True)\n", "print ratings\n", "print ratings['overall']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computational assignment\n", "\n", "Rosenzweig, M.R., E.L. Bennet, and M.C. Diamond, 1972. Brain changes in response to experience, _Scientific American_, _226_, 22–29 report an experiment in which 11 triples of male rats, each triple from the same litter, were\n", "assigned at random to three different environments, \"enriched\" (E), standard, and \"impoverished.\"\n", "\n", "After 65 days, they euthanized the rats and measured their cortical masses (mg).\n", "Here are the results: [To do: check that this is impoverished, not standard.]\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
enriched 689 656 668 660 679 663 664 647 694 633 653
impoverished 657 623 652 654 658 646 600 640 605 635 642
difference 32 33 16 6 21 17 64 7 89 -2 11\n", "
\n", "\n", "1. Code, from scratch, an appropriate 2-sample permutation test in Python using the sum of the 11 differences as the test statistic. The permutation scheme should match the actual randomization in the experiment.\n", "You may either construct a test that generates permutations at random, or write code that enumerates all permutations (since there are only 2048 possible equally-likely data in this case).\n", "If you choose to enumerate all permutations, you can use the itertools library.\n", "\n", "1. Explain whether a 1-sided or 2-sided test is appropriate.\n", "\n", "1. Under the Neyman model, find the $P$-value of the strong null hypothesis that the treatment makes no difference whatsoever.\n", "Use $10^4$ random assignments or enumerate all possible assignments.\n", "Your code should work in the general case (different data, different number of litters), not only the particular set of 22 numbers given.\n", "That said, for reference, the $P$-value for a 2-sided test should be $1/2^9$ for\n", "these data, so you can check your answer.\n", "\n", "1. Find an upper confidence bound for the $P$-value by inverting Binomial tests. You may re-use code from the class to do this.\n", "\n", "For your convenience, the data are below in a numpy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "brain_wts = np.array([ [689, 656, 668, 660, 679, 663, 664, 647, 694, 633, 653],\\\n", " [657, 623, 652, 654, 658, 646, 600, 640, 605, 635, 642] ])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%run talkTools.py" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }