{ "cells": [ { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/9f5keQe.png)" }, { "metadata": {}, "cell_type": "markdown", "source": "* **Hypothesis testing is used when we need to make decisions about populations on the basis of sample information alone**\n* **A variety of statistical tests are used to help arrive at these decisions, but the steps are common to all tests**" }, { "metadata": {}, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/sz2xKlU.jpg)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/jYN6L0X.png?1)" }, { "metadata": {}, "cell_type": "markdown", "source": "## Approaches" }, { "metadata": {}, "cell_type": "markdown", "source": "**Hypothesis Testing (Critical Value Approach)** \n**Hypothesis Testing (P-Value Approach)** " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Critical Value approach" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/QGnkJsp.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "For an upper-tailed test (use the absolute value of the test statistic for a two-tailed test): \nIf test statistic < critical value: Fail to reject the null hypothesis. \nIf test statistic >= critical value: Reject the null hypothesis. " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### p-value approach" }, { "metadata": { "run_control": { "marked": false }, "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**p-values**" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If p-value <= alpha: Reject the null hypothesis (i.e. a significant result). \nIf p-value > alpha: Fail to reject the null hypothesis (i.e. 
not a significant result). " }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "The conclusion of a statistical test is stated as one of two outcomes: rejecting or failing to reject the null hypothesis. \n\nRejecting the null hypothesis means that there is sufficient statistical evidence against it. \n\nFailing to reject the null hypothesis means that there is insufficient statistical evidence to reject it; it does not prove that the null hypothesis is true." }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/PUAA4mR.jpg?1)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**HOW TO DEFINE A NULL HYPOTHESIS:**\n* Every hypothesis test contains a set of two opposing statements, or hypotheses, about a population parameter\n* The first hypothesis is called the null hypothesis, denoted **H0**\n* The null hypothesis always states that **the population parameter is equal to the claimed value**\n* For example, if the claim is that the average waiting time to get an ordered item in a hotel is five minutes, the null hypothesis is \n![Imgur](https://i.imgur.com/S4SqUQ1.png)(That is, the population mean is 5 minutes.)\n\n" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**HOW TO DEFINE AN ALTERNATIVE HYPOTHESIS** \n* Three possibilities exist for the second (or alternative) hypothesis, denoted **Ha**\n![Imgur](https://i.imgur.com/qOhBYaI.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you want to test whether the hotel is correct in claiming that its average waiting time to get an ordered item is five minutes, and **it doesn’t matter whether the actual average time is more or less than that, you use the not-equal-to alternative**. 
Your hypotheses for that test would be\n![Imgur](https://i.imgur.com/iqHyi7S.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you only want to see whether the time turns out to be **greater than what the hotel claims** (that is, whether the hotel is falsely advertising its short waiting time), **you use the greater-than alternative**, and your two hypotheses are\n![Imgur](https://i.imgur.com/7KxVK5D.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you think **the average waiting time for an ordered item is actually less than five minutes** (and could be marketed by the hotel as such), **the less-than alternative is the one you want**, and your two hypotheses would be\n![Imgur](https://i.imgur.com/ky7hnOw.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Type I and Type II Errors" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/yYmql3R.png?1)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/czKU38H.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/gfyfuWJ.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/eugQbwB.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "Install the ISLR package if it is not already available: `install.packages(\"ISLR\",repos=\"https://cran.r-project.org\")`" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(ISLR)", "execution_count": 1, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Warning 
message:\n\"package 'ISLR' was built under R version 3.4.3\"" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "dim(Wage)", "execution_count": 2, "outputs": [ { "data": { "text/html": "
| year | age | maritl | race | education | region | jobclass | health | health_ins | logwage | wage |
---|---|---|---|---|---|---|---|---|---|---|---|
231655 | 2006 | 18 | 1. Never Married | 1. White | 1. < HS Grad | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 2. No | 4.318063 | 75.04315 |
86582 | 2004 | 24 | 1. Never Married | 1. White | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 2. No | 4.255273 | 70.47602 |
161300 | 2003 | 45 | 2. Married | 1. White | 3. Some College | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 1. Yes | 4.875061 | 130.98218 |
155159 | 2003 | 43 | 2. Married | 3. Asian | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 5.041393 | 154.68529 |
11443 | 2005 | 50 | 4. Divorced | 1. White | 2. HS Grad | 2. Middle Atlantic | 2. Information | 1. <=Good | 1. Yes | 4.318063 | 75.04315 |
376662 | 2008 | 54 | 2. Married | 1. White | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 4.845098 | 127.11574 |