{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": { "word_id": "4818_07_ztest" }, "source": [ "# 7.2. Getting started with statistical hypothesis testing: a simple z-test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many frequentist methods for hypothesis testing roughly involve the following steps:\n", "\n", "1. Writing down the hypotheses, notably the **null hypothesis**, which is the *opposite* of the hypothesis you want to prove (with a certain degree of confidence).\n", "2. Computing a **test statistic**, a mathematical formula depending on the test type, the model, the hypotheses, and the data.\n", "3. Using the computed value to reject the null hypothesis or to fail to reject it (that is, fail to conclude).\n", "\n", "Here, we flip a coin $n$ times and we observe $h$ heads. We want to know whether the coin is fair (null hypothesis). This example is extremely simple yet quite good for pedagogical purposes. Besides, it is the basis of many more complex methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We denote by $\\mathcal B(q)$ the [Bernoulli distribution](http://en.wikipedia.org/wiki/Bernoulli_distribution) with unknown parameter $q$. A Bernoulli variable:\n", "\n", "* is 0 (tails) with probability $1-q$,\n", "* is 1 (heads) with probability $q$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's suppose that, after $n=100$ flips, we get $h=61$ heads. We choose a significance level of 0.05: is the coin fair or not? Our null hypothesis is: *the coin is fair* ($q = 1/2$)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import scipy.stats as st\n", "import scipy.special as sp" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "n = 100  # number of coin flips\n", "h = 61  # number of heads\n", "q = .5  # probability of heads under the null hypothesis (fair coin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Let's compute the **z-score**, which is defined by the formula $z = (\\bar x - q) \\sqrt{n / (q(1-q))}$, where $\\bar x$ (`xbar` in the code) is the estimated average of the distribution. We will explain this formula in the next section *How it works...*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "xbar = float(h)/n  # observed proportion of heads\n", "z = (xbar - q) * np.sqrt(n / (q*(1-q))); z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Now, from the z-score, we can compute the two-sided p-value as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# abs(z) keeps the two-sided p-value valid when z is negative\n", "pval = 2 * (1 - st.norm.cdf(abs(z))); pval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. This p-value (about 0.028) is less than 0.05, so we reject the null hypothesis and conclude that *the coin is probably not fair*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }