{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Skew test\n",
    "\n",
    "Allen Downey\n",
    "\n",
    "[MIT License](https://en.wikipedia.org/wiki/MIT_License)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "sns.set(style='white')\n",
    "\n",
    "from thinkstats2 import Pmf, Cdf\n",
    "\n",
    "import thinkstats2\n",
    "import thinkplot\n",
    "\n",
    "decorate = thinkplot.config"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose you buy a loaf of bread every day for a year, take it\n",
    "home, and weigh it.  You suspect that the distribution of weights is\n",
    "more skewed than a normal distribution with the same mean and\n",
    "  standard deviation.\n",
    "\n",
    "To test your suspicion, write a definition for a class named\n",
    "  `SkewTest` that extends `thinkstats.HypothesisTest` and provides\n",
    "  two methods:\n",
    "\n",
    "* `TestStatistic` should compute the skew of a given sample.\n",
    "\n",
    "* `RunModel` should simulate the null hypothesis and return\n",
    "  simulated data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class HypothesisTest(object):\n",
    "    \"\"\"Represents a hypothesis test.\"\"\"\n",
    "\n",
    "    def __init__(self, data):\n",
    "        \"\"\"Initializes.\n",
    "\n",
    "        data: data in whatever form is relevant\n",
    "        \"\"\"\n",
    "        self.data = data\n",
    "        self.MakeModel()\n",
    "        self.actual = self.TestStatistic(data)\n",
    "        self.test_stats = None\n",
    "\n",
    "    def PValue(self, iters=1000):\n",
    "        \"\"\"Computes the distribution of the test statistic and p-value.\n",
    "\n",
    "        iters: number of iterations\n",
    "\n",
    "        returns: float p-value\n",
    "        \"\"\"\n",
    "        self.test_stats = np.array([self.TestStatistic(self.RunModel()) \n",
    "                                       for _ in range(iters)])\n",
    "\n",
    "        count = sum(self.test_stats >= self.actual)\n",
    "        return count / iters\n",
    "\n",
    "    def MaxTestStat(self):\n",
    "        \"\"\"Returns the largest test statistic seen during simulations.\n",
    "        \"\"\"\n",
    "        return np.max(self.test_stats)\n",
    "\n",
    "    def PlotHist(self, label=None):\n",
    "        \"\"\"Draws a Cdf with vertical lines at the observed test stat.\n",
    "        \"\"\"\n",
    "        plt.hist(self.test_stats, color='C4', alpha=0.5)\n",
    "        plt.axvline(self.actual, linewidth=3, color='0.8')\n",
    "        plt.xlabel('Test statistic')\n",
    "        plt.ylabel('Count')\n",
    "        plt.title('Distribution of the test statistic under the null hypothesis')\n",
    "\n",
    "    def TestStatistic(self, data):\n",
    "        \"\"\"Computes the test statistic.\n",
    "\n",
    "        data: data in whatever form is relevant        \n",
    "        \"\"\"\n",
    "        raise UnimplementedMethodException()\n",
    "\n",
    "    def MakeModel(self):\n",
    "        \"\"\"Build a model of the null hypothesis.\n",
    "        \"\"\"\n",
    "        pass\n",
    "\n",
    "    def RunModel(self):\n",
    "        \"\"\"Run the model of the null hypothesis.\n",
    "\n",
    "        returns: simulated data\n",
    "        \"\"\"\n",
    "        raise UnimplementedMethodException()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To test this class, I'll generate a sample from an actual Gaussian distribution, so the null hypothesis is true."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mu = 1000\n",
    "sigma = 35\n",
    "data = np.random.normal(mu, sigma, size=365)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can make a `SkewTest` and compute the observed skewness."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test = SkewTest(data)\n",
    "test.actual"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's the p-value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "test = SkewTest(data)\n",
    "test.PValue()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the distribution of the test statistic under the null hypothesis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "test.PlotHist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of the time the p-value exceeds 5%, so we would conclude that the observed skewness could plausibly be due to random sample.\n",
    "\n",
    "But let's see how often we get a false positive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iters = 100\n",
    "count = 0\n",
    "\n",
    "for i in range(iters):\n",
    "    data = np.random.normal(mu, sigma, size=365)\n",
    "    test = SkewTest(data)\n",
    "    p_value = test.PValue()\n",
    "    if p_value < 0.05:\n",
    "        count +=1\n",
    "        \n",
    "print(count/iters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the long run, the false positive rate is the threshold we used, 5%."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}