{ "cells": [ { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/9f5keQe.png)" }, { "metadata": {}, "cell_type": "markdown", "source": "* **Hypothesis testing is used when we need to make decisions about a population based only on sample information.**\n* **A variety of statistical tests are used to help arrive at these decisions, but the steps are common to all tests.**" }, { "metadata": {}, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/sz2xKlU.jpg)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/jYN6L0X.png?1)" }, { "metadata": {}, "cell_type": "markdown", "source": "## Approaches" }, { "metadata": {}, "cell_type": "markdown", "source": "**Hypothesis Testing (Critical Value Approach)**  \n**Hypothesis Testing (P-Value Approach)** " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Critical value approach" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/QGnkJsp.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If the test statistic does not fall in the rejection region (for a right-tailed test, test statistic < critical value): fail to reject the null hypothesis.  \nIf the test statistic falls in the rejection region (for a right-tailed test, test statistic >= critical value): reject the null hypothesis. " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### p-value approach" }, { "metadata": { "run_control": { "marked": false }, "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**p-values**" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If p-value <= alpha: reject the null hypothesis (i.e. significant result).  \nIf p-value > alpha: fail to reject the null hypothesis (i.e. 
not significant result). " }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "The conclusion of a statistical test is always stated as a dichotomy: either reject the null hypothesis or fail to reject it. \n\nRejecting the null hypothesis means that there is sufficient statistical evidence against it: the observed data would be unlikely if the null hypothesis were true. \n\nFailing to reject the null hypothesis means that there is insufficient statistical evidence to reject it; it does not prove that the null hypothesis is true." }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/PUAA4mR.jpg?1)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**HOW TO DEFINE A NULL HYPOTHESIS:**\n* Every hypothesis test contains a set of two opposing statements, or hypotheses, about a population parameter\n* The first hypothesis is called the null hypothesis, denoted **H0**\n* The null hypothesis always states that **the population parameter is equal to the claimed value**\n* For example, if the claim is that the average waiting time to get an ordered item in a hotel is five minutes, the null hypothesis is \n![Imgur](https://i.imgur.com/S4SqUQ1.png) (that is, the population mean is 5 minutes).\n\n" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**HOW TO DEFINE AN ALTERNATIVE HYPOTHESIS:** \n* Three possibilities exist for the second (or alternative) hypothesis, denoted **Ha**:\n![Imgur](https://i.imgur.com/qOhBYaI.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you want to test whether the hotel is correct in claiming that its average waiting time for an ordered item is five minutes, and **it doesn’t matter whether the actual average time is more or less than that, you use the not-equal-to alternative**. 
Your hypotheses for that test would be\n![Imgur](https://i.imgur.com/iqHyi7S.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you only want to see whether the time turns out to be **greater than what the hotel claims** (that is, whether the hotel is falsely advertising its short waiting time), **you use the greater-than alternative**, and your two hypotheses are\n![Imgur](https://i.imgur.com/7KxVK5D.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "If you think **the average waiting time for an ordered item is actually less than five minutes** (and could be marketed by the hotel as such), **the less-than alternative is the one you want**, and your two hypotheses would be\n![Imgur](https://i.imgur.com/ky7hnOw.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Type I and Type II Errors" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/yYmql3R.png?1)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/czKU38H.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/gfyfuWJ.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/eugQbwB.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "If the ISLR package is not installed yet, install it first with `install.packages(\"ISLR\", repos = \"https://cran.r-project.org\")`" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(ISLR)", "execution_count": 1, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Warning 
message:\n\"package 'ISLR' was built under R version 3.4.3\"" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "dim(Wage)", "execution_count": 2, "outputs": [ { "data": { "text/html": "<ol class=list-inline>\n\t<li>3000</li>\n\t<li>11</li>\n</ol>\n", "text/latex": "\\begin{enumerate*}\n\\item 3000\n\\item 11\n\\end{enumerate*}\n", "text/markdown": "1. 3000\n2. 11\n\n\n", "text/plain": "[1] 3000 11" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "collapsed": true, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "?Wage", "execution_count": 3, "outputs": [] }, { "metadata": { "collapsed": true, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "try(data(package=\"ISLR\"))", "execution_count": 4, "outputs": [] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "head(Wage)", "execution_count": 5, "outputs": [ { "data": { "text/html": "<table>\n<thead><tr><th></th><th scope=col>year</th><th scope=col>age</th><th scope=col>maritl</th><th scope=col>race</th><th scope=col>education</th><th scope=col>region</th><th scope=col>jobclass</th><th scope=col>health</th><th scope=col>health_ins</th><th scope=col>logwage</th><th scope=col>wage</th></tr></thead>\n<tbody>\n\t<tr><th scope=row>231655</th><td>2006</td><td>18</td><td>1. Never Married</td><td>1. White</td><td>1. &lt; HS Grad</td><td>2. Middle Atlantic</td><td>1. Industrial</td><td>1. &lt;=Good</td><td>2. No</td><td>4.318063</td><td>75.04315</td></tr>\n\t<tr><th scope=row>86582</th><td>2004</td><td>24</td><td>1. Never Married</td><td>1. White</td><td>4. College Grad</td><td>2. Middle Atlantic</td><td>2. Information</td><td>2. &gt;=Very Good</td><td>2. No</td><td>4.255273</td><td>70.47602</td></tr>\n\t<tr><th scope=row>161300</th><td>2003</td><td>45</td><td>2. Married</td><td>1. White</td><td>3. Some College</td><td>2. Middle Atlantic</td><td>1. Industrial</td><td>1. &lt;=Good</td><td>1. Yes</td><td>4.875061</td><td>130.98218</td></tr>\n\t<tr><th scope=row>155159</th><td>2003</td><td>43</td><td>2. Married</td><td>3. Asian</td><td>4. College Grad</td><td>2. Middle Atlantic</td><td>2. Information</td><td>2. &gt;=Very Good</td><td>1. Yes</td><td>5.041393</td><td>154.68529</td></tr>\n\t<tr><th scope=row>11443</th><td>2005</td><td>50</td><td>4. Divorced</td><td>1. White</td><td>2. HS Grad</td><td>2. Middle Atlantic</td><td>2. Information</td><td>1. &lt;=Good</td><td>1. Yes</td><td>4.318063</td><td>75.04315</td></tr>\n\t<tr><th scope=row>376662</th><td>2008</td><td>54</td><td>2. Married</td><td>1. White</td><td>4. College Grad</td><td>2. Middle Atlantic</td><td>2. Information</td><td>2. &gt;=Very Good</td><td>1. Yes</td><td>4.845098</td><td>127.11574</td></tr>\n</tbody>\n</table>\n", "text/latex": "\\begin{tabular}{r|lllllllllll}\n & year & age & maritl & race & education & region & jobclass & health & health\\_ins & logwage & wage\\\\\n\\hline\n\t231655 & 2006 & 18 & 1. Never Married & 1. White & 1. < HS Grad & 2. Middle Atlantic & 1. Industrial & 1. <=Good & 2. No & 4.318063 & 75.04315 \\\\\n\t86582 & 2004 & 24 & 1. Never Married & 1. White & 4. College Grad & 2. Middle Atlantic & 2. Information & 2. >=Very Good & 2. No & 4.255273 & 70.47602 \\\\\n\t161300 & 2003 & 45 & 2. Married & 1. White & 3. Some College & 2. Middle Atlantic & 1. Industrial & 1. <=Good & 1. Yes & 4.875061 & 130.98218 \\\\\n\t155159 & 2003 & 43 & 2. Married & 3. Asian & 4. College Grad & 2. Middle Atlantic & 2. Information & 2. >=Very Good & 1. Yes & 5.041393 & 154.68529 \\\\\n\t11443 & 2005 & 50 & 4. Divorced & 1. White & 2. HS Grad & 2. Middle Atlantic & 2. Information & 1. <=Good & 1. Yes & 4.318063 & 75.04315 \\\\\n\t376662 & 2008 & 54 & 2. Married & 1. White & 4. College Grad & 2. Middle Atlantic & 2. Information & 2. >=Very Good & 1. Yes & 4.845098 & 127.11574 \\\\\n\\end{tabular}\n", "text/markdown": "\n| | year | age | maritl | race | education | region | jobclass | health | health_ins | logwage | wage | \n|---|---|---|---|---|---|---|---|---|---|---|---|\n| 231655 | 2006 | 18 | 1. Never Married | 1. White | 1. < HS Grad | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 2. No | 4.318063 | 75.04315 | \n| 86582 | 2004 | 24 | 1. Never Married | 1. White | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 2. No | 4.255273 | 70.47602 | \n| 161300 | 2003 | 45 | 2. Married | 1. White | 3. Some College | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 1. Yes | 4.875061 | 130.98218 | \n| 155159 | 2003 | 43 | 2. Married | 3. Asian | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 5.041393 | 154.68529 | \n| 11443 | 2005 | 50 | 4. Divorced | 1. White | 2. HS Grad | 2. Middle Atlantic | 2. Information | 1. <=Good | 1. 
Yes | 4.318063 | 75.04315 | \n| 376662 | 2008 | 54 | 2. Married | 1. White | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 4.845098 | 127.11574 | \n\n\n", "text/plain": " year age maritl race education region \n231655 2006 18 1. Never Married 1. White 1. < HS Grad 2. Middle Atlantic\n86582 2004 24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic\n161300 2003 45 2. Married 1. White 3. Some College 2. Middle Atlantic\n155159 2003 43 2. Married 3. Asian 4. College Grad 2. Middle Atlantic\n11443 2005 50 4. Divorced 1. White 2. HS Grad 2. Middle Atlantic\n376662 2008 54 2. Married 1. White 4. College Grad 2. Middle Atlantic\n jobclass health health_ins logwage wage \n231655 1. Industrial 1. <=Good 2. No 4.318063 75.04315\n86582 2. Information 2. >=Very Good 2. No 4.255273 70.47602\n161300 1. Industrial 1. <=Good 1. Yes 4.875061 130.98218\n155159 2. Information 2. >=Very Good 1. Yes 5.041393 154.68529\n11443 2. Information 1. <=Good 1. Yes 4.318063 75.04315\n376662 2. Information 2. >=Very Good 1. 
Yes 4.845098 127.11574" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# One-sample t-test on Wage$wage (3000 male workers in the Mid-Atlantic region)\n# H0: mu = 250.7036 vs Ha: mu < 250.7036 (left-tailed test)\nt.test(Wage$wage, alternative = \"less\", mu = 250.7036)\n# Turn off scientific notation for numbers printed after this point\noptions(scipen = 999)", "execution_count": 6, "outputs": [ { "data": { "text/plain": "\n\tOne Sample t-test\n\ndata: Wage$wage\nt = -182.45, df = 2999, p-value < 2.2e-16\nalternative hypothesis: true mean is less than 250.7036\n95 percent confidence interval:\n -Inf 112.9571\nsample estimates:\nmean of x \n 111.7036 \n" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Practice" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "http://www.statstutor.ac.uk/types/tests-and-quizzes/confidence-intervals-and-hypothesis-testing/" } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "name": "r", "display_name": "R", "language": "R" }, "language_info": { "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.1", "file_extension": ".r", "codemirror_mode": "r" }, "nav_menu": {}, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "base_numbering": 1, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }