{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 7: Testing Hypotheses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Reading**: \n", "* [Testing Hypotheses](https://www.inferentialthinking.com/chapters/11/testing-hypotheses.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please complete this notebook by filling in the cells provided.\n", "\n", "Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. Refer to the policies page to learn more about how to learn cooperatively.\n", "\n", "For all problems that require you to write out explanations and sentences, you **must** provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to reassign variables anywhere in the notebook! For example, if you use `max_temperature` in your answer to one question, do not reassign it later on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Don't change this cell; just run it. \n", "\n", "import numpy as np\n", "from datascience import *\n", "\n", "# These lines do some fancy plotting magic.\n", "import matplotlib\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "import scipy.stats\n", "\n", "\n", "#import otter\n", "#grader = otter.Notebook()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Spam Calls\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: 781 Fun" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yanay gets a lot of spam calls. An area code is defined to be a three-digit number from 200 to 999, inclusive. In reality, many of these area codes are not in use, but for this question we'll simplify things and assume they all are. 
**Throughout these questions, you should assume that Yanay's area code is 781.**" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "**Question 1.** Assuming each area code is just as likely as any other, what's the probability that the area codes of two back-to-back spam calls are both 781?\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "manual_grade": true, "manual_problem_id": "catching_cheaters_1" }, "outputs": [], "source": [ "prob_781 = ...\n", "prob_781" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "**Question 2.** Rohan already knows that Yanay's area code is 781. Rohan randomly guesses the last 7 digits (0-9 inclusive) of Yanay's phone number. What's the probability that Rohan correctly guesses Yanay's number, assuming he’s equally likely to choose any digit?\n", "\n", "*Note: A phone number contains an area code and 7 additional digits, i.e. xxx-xxx-xxxx*\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prob_yanay_num = ...\n", "prob_yanay_num" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yanay suspects that the spammers are using his area code (781) more often than chance alone would predict, to trick him into thinking it's someone from his area calling him. Ashley thinks that this is not the case, and that spammers are just choosing the area codes of their spam calls at random from all possible area codes (*Remember, for this question we’re assuming the possible area codes are 200-999, inclusive*). Yanay wants to test his claim using the 50 spam calls he received in the past month.\n", "\n", "Here's a dataset of the area codes of the 50 spam calls he received in the past month." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Just run this cell\n", "spam = Table.read_table('spam.csv')\n", "spam" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false }, "source": [ "**Question 3.** Define the null hypothesis and alternative hypothesis for this investigation. \n", "\n", "*Hint: Don’t forget that your null hypothesis should fully describe a probability model that we can use for simulation later.*\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$H_0: \\pi = \\frac{1}{800}$\n", "\n", "H0: pi = 1/800\n", "\n", "\n", "\n", "$H_a: \\pi > \\frac{1}{800}$\n", "\n", "Ha: pi > 1/800" ] }, { "cell_type": "markdown", "metadata": { "export_pdf": true, "for_assignment_type": "solution" }, "source": [ "### Binomial Test\n", "\n", "Below is the code to run a binomial test, along with the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Number of spam calls whose area code is 781\n", "num_781 = spam.where(\"Area Code\", 781).num_rows\n", "\n", "# Total number of spam calls in the sample\n", "total_calls = spam.num_rows\n", "\n", "# One-sided exact binomial test of H0: pi = 1/800 (binomtest requires SciPy >= 1.7)\n", "pval = scipy.stats.binomtest(num_781, total_calls, 1/800, alternative = \"greater\").pvalue\n", "\n", "print(f\"Binomial Test Results: The p-value = {pval}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 4.** Suppose you use a p-value cutoff of 5%. What do you conclude from the hypothesis test? Why?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Write your answer here, replacing this text.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose now that we want to re-run this hypothesis test using the simulation approach. 
\n", "\n", "\n", "**Question 5.** Which of the following test statistics would be a reasonable choice to help differentiate between the two hypotheses?\n", "\n", "*Hint*: For a refresher on choosing test statistics, check out the textbook section on [Test Statistics](https://www.inferentialthinking.com/chapters/11/3/decisions-and-uncertainty.html#Step-2:-The-Test-Statistic).\n", "\n", "1. The proportion of area codes that are 781 in 50 random spam calls\n", "2. The probability of getting an area code of 781 out of all the possible area codes.\n", "3. The proportion of area codes that are 781 in 50 random spam calls divided by 2\n", "4. The number of times you see the area code 781 in 50 random spam calls\n", "\n", "Assign `reasonable_test_statistics` to an array of numbers corresponding to these test statistics.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "reasonable_test_statistics = make_array(1,4)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false }, "source": [ "