{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import matplotlib.ticker as ticker\n", "from IPython.display import display, HTML\n", "import os\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 27\n", "Which is cheaper: eating out or dining in? The mean cost of a flank steak, broccoli, and rice bought at the grocery store is \\$13.04 (Money.msn website, November 7, 2012). A sample of 100 neighborhood restaurants showed a mean price of \\$12.75 and a standard deviation of \\$2 for a comparable restaurant meal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A. Develop appropriate hypotheses for a test to determine whether the sample data support the conclusion that the mean cost of a restaurant meal is less than fixing a comparable meal at home.\n", "\n", "B. Using the sample from the 100 restaurants, what is the p-value?\n", "\n", "C. At α = .05, what is your conclusion?\n", "\n", "D. Repeat the preceding hypothesis test using the critical value approach." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This question asks whether the cost difference between eating out and dining in is statistically significant. 
More specifically, it asks whether eating out costs less than eating in, so this is a 1-tailed (lower-tail) test.\n", "\n", "The general equation you'll use for this problem is \n", "$$Z = \\frac{\\bar{x}-\\mu_o}{\\frac{s}{\\sqrt{n}}}$$\n", "\n", "where \n", "* $s$ is the sample standard deviation of restaurant prices ($\\$2$)\n", "* $\\mu_o$ is the null hypothesis mean (in this case, it's the grocery store price $\\$13.04$)\n", "* $n$ is the sample size (100 restaurants)\n", "* $\\bar{x}$ is the sample mean (here, it's the $\\$12.75$ mean price of a restaurant meal)\n", "* $\\mu$ is the actual mean price of all restaurants. We don't know this value, but we have the mean of a sample of 100 restaurants.\n", "* $Z$ is the test statistic\n", "\n", "The null hypothesis ($H_o$) is that there is no difference in mean price between eating out and eating in ($\\mu = \\mu_o$)\n", "\n", "\n", "$$H_o: \\mu = \\mu_o$$\n", "$$H_a: \\mu < \\mu_o$$\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-1.4499999999999957\n" ] } ], "source": [ "z = (12.75 - 13.04)/(2/(100**0.5))\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's about -1.45. Looking at a [standard normal table](https://en.wikipedia.org/wiki/Standard_normal_table) at the value for 1.45 (the normal distribution is symmetrical, so the lower-tail area below -1.45 equals the upper-tail area above 1.45), we see that the cumulative probability ($f(Z)$) at that level is 0.42647+0.5 = 0.92647. The p-value is $1 - f(Z)$, so the p-value is $0.0735$. That's above the $\\alpha = 0.05$ significance level, so we cannot reject the null hypothesis, and the sample does not support the conclusion that eating out is significantly cheaper.\n", "\n", "The critical value approach reaches the same conclusion by comparing $Z$ directly against a cutoff instead of converting it to a p-value. For a lower-tailed test at $\\alpha = 0.05$, the critical value is $-1.645$, and we reject $H_o$ only if $Z \\leq -1.645$. Since $-1.45 > -1.645$, we again cannot reject the null hypothesis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 41\n", "\n", "Ten years ago 53% of American families owned stocks or stock funds. Sample data collected by the Investment Company Institute indicate that the percentage is now 46%.\n", "\n", "A. Develop appropriate hypotheses such that rejection of Ho will support the conclusion that a smaller proportion of American families own stocks or stock funds in 2012 than 10 years ago.\n", "\n", "B. Assume the Investment Company Institute sampled 300 American families to estimate that the percent owning stocks or stock funds was 46% in 2012. What is the p-value for your hypothesis test?\n", "\n", "C. At α = .01, what is your conclusion?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I initially tried applying the same approach as in problem 27, but we aren't given a standard deviation. My statistics book has an approach that doesn't need one when the data are percentages (i.e. population proportions). This problem is a test of a population proportion, which uses the equation:\n", "\n", "$$Z = \\frac{\\hat{p}-p_o}{\\sqrt{\\frac{p_o(1-p_o)}{n}}}$$\n", "\n", "Given in the problem\n", "* $p_o = 53\\%$ of families owned stocks or stock funds\n", "* $H_a: p < p_o$ (1-tailed)\n", "* $\\hat{p} = 46\\%$\n", "* $n = 300$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the alternative hypothesis, $H_a$, it's easy to formulate the null hypothesis, $H_o$ \n", "\n", "$$H_o: p = p_o$$\n", "\n", "Or in English: the proportion of American families that owned stocks or funds in 2012 is the same as the proportion that owned them in 2002.\n", "\n", "Now I'll plug values into the test statistic equation."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$Z = \\frac{0.46-0.53}{\\sqrt{\\frac{0.53(1-0.53)}{300}}} = -2.429$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.429247718971547" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(0.46-0.53)/((0.53*(1-0.53))/300)**0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Going back to the table of standard normals, for z = 2.43 (using symmetry again, since our statistic is -2.43), f(z) = 0.49245+0.5 = 0.99245. \n", "\n", "$$1-0.99245 = 0.00755$$\n", "\n", "and as $0.00755 < 0.01$, we can reject the null hypothesis and conclude that a smaller proportion of families owned stocks or funds in 2012 than in 2002." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:py36]", "language": "python", "name": "conda-env-py36-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }