{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import scipy.stats" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Problem 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**20 our of 100 U.S. Senators are women, yet when the Senate formed an intramural baseball team of 9 people only 1 woman was chosen for the team. What is the probability of this occurring by chance? What is the p-value with which the null hypothesis \"there is no discrimination against women Senators\" can be rejected?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I first define the PMF for the multivariate hypergeometric distribution." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def multi_hypergeom_pmf(k, K):\n", " N = sum(K)\n", " n = sum(k)\n", " choose = scipy.misc.comb\n", " return np.product([choose(K_i, k_i) for K_i, k_i in zip(K, k)]) / choose(N, n)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming each senator has an equal chance of being chosen for the team, the number of men and women on the team is distributed according to the hypergeometric distribution. The probability of exactly 8 men and 1 woman on the team is 0.3048. The probability of 8 or more men on the team (i.e., the team observed or a team with even more men) is **0.4267** which is nowhere near a reasonable threshold such as 0.05 for rejecting the null hypothesis." ] }, { "cell_type": "code", "collapsed": false, "input": [ "senate_counts = (80, 20)\n", "prob_8_men = multi_hypergeom_pmf((8, 1), senate_counts)\n", "prob_9_men = multi_hypergeom_pmf((9, 0), senate_counts)\n", "print 'P(8 men) =', prob_8_men\n", "print 'P(9 men) =', prob_9_men\n", "print 'P(8 or more men) =', prob_8_men + prob_9_men" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "P(8 men) = 0.304773971521\n", "P(9 men) = 0.121909588608\n", "P(8 or more men) = 0.42668356013\n" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Problem 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A large jelly bean jar has 20% red jelly beans, 30% blue, and 50% yellow. If 6 jelly beans are chosen at random, what is the chance of getting exactly 2 of each color? What is the name of this distribution?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming an infinitely large bag (or sampling with replacement), this is the multinomial distribution and the probability of getting exactly 2 of each color is **0.0810**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def multinomial_pmf(x, p):\n", " factorial = scipy.misc.factorial\n", " n = sum(x)\n", " n_equiv = factorial(n) / np.product([factorial(x_i) for x_i in x])\n", " p_outcome = np.product([p_i**x_i for p_i, x_i in zip(p, x)])\n", " return n_equiv * p_outcome" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "multinomial_pmf([2, 2, 2], [0.2, 0.3, 0.5])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "0.081000000000000016" ] } ], "prompt_number": 5 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Problem 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A small jelly bean jar has 2 red jelly beans, 3 blue, and 5 yellow. If 6 jelly beans are chosen at random, what is the chance of getting exactly 2 of each color? What is the name of this distribution?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the multivariate hypergeometric distribution. The probability of getting 2 jelly beans of each color is **0.1429**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "multi_hypergeom_pmf([2, 2, 2], [2, 3, 5])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "0.14285714285714271" ] } ], "prompt_number": 7 } ], "metadata": {} } ] }