{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "NBIO208.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    },
    "papermill": {
      "duration": 11.308284,
      "end_time": "2021-01-28T05:37:53.680473",
      "environment_variables": {},
      "exception": null,
      "input_path": "__notebook__.ipynb",
      "output_path": "__notebook__.ipynb",
      "parameters": {},
      "start_time": "2021-01-28T05:37:42.372189",
      "version": "2.1.0"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0hgErj1IiJG1"
      },
      "source": [
        "# NBIO208. Statistical Testing in Python"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4DBpQPRniJG1"
      },
      "source": [
        "The goal of this lesson is to introduce Python libraries, load and look at a dataset, begin some common statistical testing.  It includes:\n",
        "- [1 The simplicity underlying common statistical tests](#anchor1)\n",
        "- [2 The python environment](#anchor2)\n",
        "- [A note about $p$-values](#anchor2b)\n",
        "- [3 Pearson and Spearman correlation](#anchor3)\n",
        "- [4 One mean comparisons](#anchor4)\n",
        "- [5 Many mean comparisons](#anchor5)\n",
        "- [6 Checking assumptions](#anchor6)\n",
        "- [Resources](#anchor7)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0b3h8W5-iJG2"
      },
      "source": [
        "<a name=\"anchor1\"></a>\n",
        "## 1 The simplicity underlying common statistical tests\n",
        "\n",
        "Most of the common statistical models (t-test, correlation, ANOVA, chi-square, etc.) are special cases of linear models, or a very close approximation. This beautiful simplicity means that there is less to learn. In particular, it all comes down to $y = a \\cdot x + b$, where $a$ is the slope of the line and $b$ is the y-intercept where the line crosses the y-axis. \n",
        "\n",
        "There are certain assumptions we check to use these \"parametric tests.\"  When assumptions are not met, we have alternative \"non-parametric\" counterparts.  We will think of \"non-parametric\"\" tests as ranked versions of the corresponding parametric tests. \n",
        "\n",
        "This lesson is adapted from [Tests as Linear](https://github.com/eigenfoo/tests-as-linear), which is also available [in R](https://github.com/lindeloev/tests-as-linear). \n",
        "\n",
        "**See the [Cheat Sheet](https://lindeloev.github.io/tests-as-linear/linear_tests_cheat_sheet.pdf)**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3hYxOGLBchMJ"
      },
      "source": [
        "<a name=\"anchor2\"></a>\n",
        "## 2 The Python Environment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Sno4smqRiJG3"
      },
      "source": [
        "In part 2, we will:\n",
        "<img src=\"https://github.com/DeisData/datasets/blob/main/Part2_steps.png?raw=true?raw=true\" alt=\"drawing\" style=\"width:350px;\"/>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RwuaB2M7iJG3"
      },
      "source": [
        "### 2.1.  Load libraries. <a class=\"anchor\" id=\"bullet2\"></a>\n",
        "Think of these as useful powerful toolboxes we are opening up on our workbench. Many additional libraries are available from the Python Package Index.\n",
        "\n",
        "we `import` some key scientific libraries, sometimes using a shorter *alias* for ease of coding, with the syntax `import library_name as alias_name` \n",
        "\n",
        "- [numpy](https://numpy.org) - adding support for large, multi-dimensional arrays and matrices, and mathematical functions for arrays.\n",
        "- [pandas](https://pandas.pydata.org/) - offers data structures and operations for manipulating numerical tables and time series.\n",
        "- [matplotlib](https://matplotlib.org/) - the most widely used scientific plotting library in Python.\n",
        "- [seaborn](https://seaborn.pydata.org/) - for drawing attractive and informative statistical graphics.\n",
        "- [statsmodels](https://www.statsmodels.org/stable/index.html) and [scipy](https://www.scipy.org/) for hypothesis testing and regression models."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8TEg4K1AiJG3"
      },
      "source": [
        "# [1]:\n",
        "# Python libraries for data structures, arrays, and math\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "# Libraris for plotting\n",
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "# Importing the statistics module\n",
        "import statistics\n",
        "\n",
        "# Library for hypothesis testing\n",
        "import scipy\n",
        "\n",
        "# Libraries for regression modeling\n",
        "import statsmodels.api as sm\n",
        "import statsmodels.formula.api as smf\n",
        "\n",
        "# Library for latex \n",
        "from IPython.display import Latex, display, Markdown"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mbR3OGMiiJG4"
      },
      "source": [
        "### 2.2 Load data.<a class=\"anchor\" id=\"bullet3\"></a>\n",
        "Loads a copy of the data into our environment.  Unlike working with a spreadsheet, it does not affect the original file.\n",
        "\n",
        "*Our data*  This synthetic dataset contains information on new born babies and their parents.  It comes from [here](https://www.sheffield.ac.uk/mash/statistics/datasets). \n",
        "\n",
        "Read a Comma Separated Values (CSV) data file with `pd.read_csv()`. \n",
        "*[Need to read in a different file type?](https://realpython.com/pandas-read-write-files/)*\n",
        "- Argument is the name of the file to be read.\n",
        "- Assign result to a variable to store the data that was read."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jWi0xwl1iJG4"
      },
      "source": [
        "# url is the name and path of the data file\n",
        "url = \"https://raw.githubusercontent.com/DeisData/datasets/main/Birthweight_reduced_kg_R.csv\"\n",
        "\n",
        "# data is the name of the dataFrame we are storing our data in\n",
        "# pd is pandas and read_csv is a tool in pandas for reading in a csv file\n",
        "data = pd.read_csv(url)    "
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hG_fqj-WiJG5"
      },
      "source": [
        "### 2.3. Inspect the data table <a class=\"anchor\" id=\"bullet4\"></a>\n",
        "There are lots of functions and methods we can apply to the dataframe to start inspecting it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5YtAh5rqiJG5"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">Functions attached to objects are called methods.  They are common in pandas.  Methods have parentheses like functions, but come after the variable.\n",
        "</div>"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IhO5uiE8iJG5",
        "outputId": "e37a4921-b815-4ea3-ebac-1852c2206532"
      },
      "source": [
        "# Method to view head of dataset\n",
        "data.head()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ID</th>\n",
              "      <th>Length</th>\n",
              "      <th>Birthweight</th>\n",
              "      <th>Headcirc</th>\n",
              "      <th>Gestation</th>\n",
              "      <th>smoker</th>\n",
              "      <th>mage</th>\n",
              "      <th>mnocig</th>\n",
              "      <th>mheight</th>\n",
              "      <th>mppwt</th>\n",
              "      <th>fage</th>\n",
              "      <th>fedyrs</th>\n",
              "      <th>fnocig</th>\n",
              "      <th>fheight</th>\n",
              "      <th>lowbwt</th>\n",
              "      <th>mage35</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1360</td>\n",
              "      <td>56</td>\n",
              "      <td>4.55</td>\n",
              "      <td>34</td>\n",
              "      <td>44</td>\n",
              "      <td>0</td>\n",
              "      <td>20</td>\n",
              "      <td>0</td>\n",
              "      <td>162</td>\n",
              "      <td>57</td>\n",
              "      <td>23</td>\n",
              "      <td>10</td>\n",
              "      <td>35</td>\n",
              "      <td>179</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1016</td>\n",
              "      <td>53</td>\n",
              "      <td>4.32</td>\n",
              "      <td>36</td>\n",
              "      <td>40</td>\n",
              "      <td>0</td>\n",
              "      <td>19</td>\n",
              "      <td>0</td>\n",
              "      <td>171</td>\n",
              "      <td>62</td>\n",
              "      <td>19</td>\n",
              "      <td>12</td>\n",
              "      <td>0</td>\n",
              "      <td>183</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>462</td>\n",
              "      <td>58</td>\n",
              "      <td>4.10</td>\n",
              "      <td>39</td>\n",
              "      <td>41</td>\n",
              "      <td>0</td>\n",
              "      <td>35</td>\n",
              "      <td>0</td>\n",
              "      <td>172</td>\n",
              "      <td>58</td>\n",
              "      <td>31</td>\n",
              "      <td>16</td>\n",
              "      <td>25</td>\n",
              "      <td>185</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>1187</td>\n",
              "      <td>53</td>\n",
              "      <td>4.07</td>\n",
              "      <td>38</td>\n",
              "      <td>44</td>\n",
              "      <td>0</td>\n",
              "      <td>20</td>\n",
              "      <td>0</td>\n",
              "      <td>174</td>\n",
              "      <td>68</td>\n",
              "      <td>26</td>\n",
              "      <td>14</td>\n",
              "      <td>25</td>\n",
              "      <td>189</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>553</td>\n",
              "      <td>54</td>\n",
              "      <td>3.94</td>\n",
              "      <td>37</td>\n",
              "      <td>42</td>\n",
              "      <td>0</td>\n",
              "      <td>24</td>\n",
              "      <td>0</td>\n",
              "      <td>175</td>\n",
              "      <td>66</td>\n",
              "      <td>30</td>\n",
              "      <td>12</td>\n",
              "      <td>0</td>\n",
              "      <td>184</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "     ID  Length  Birthweight  Headcirc  Gestation  smoker  mage  mnocig  \\\n",
              "0  1360      56         4.55        34         44       0    20       0   \n",
              "1  1016      53         4.32        36         40       0    19       0   \n",
              "2   462      58         4.10        39         41       0    35       0   \n",
              "3  1187      53         4.07        38         44       0    20       0   \n",
              "4   553      54         3.94        37         42       0    24       0   \n",
              "\n",
              "   mheight  mppwt  fage  fedyrs  fnocig  fheight  lowbwt  mage35  \n",
              "0      162     57    23      10      35      179       0       0  \n",
              "1      171     62    19      12       0      183       0       0  \n",
              "2      172     58    31      16      25      185       0       1  \n",
              "3      174     68    26      14      25      189       0       0  \n",
              "4      175     66    30      12       0      184       0       0  "
            ]
          },
          "execution_count": 190,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YO-L17ywiJG6"
      },
      "source": [
        "<p class=MsoNormal><span lang=EN-GB>This dataset contains information on new\n",
        "born babies and their parents.<span style='mso-spacerun:yes'>  </span>It\n",
        "contains mostly continuous variables (although some have only a few values <span\n",
        "class=GramE>e.g.</span> number of cigarettes smoked per day) and is most useful\n",
        "for correlation and regression.<span style='mso-spacerun:yes'>  </span><span style='mso-spacerun:yes'> </span></span></p>\n",
        "\n",
        "<p class=MsoNormal><span lang=EN-GB>Main dependent variable = Birthweight (lbs)</span></p>\n",
        "\n",
        "<table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width=600\n",
        " style='width:6.25in;margin-left:4.65pt;border-collapse:collapse;mso-yfti-tbllook:\n",
        " 1184;mso-padding-alt:0in 5.4pt 0in 5.4pt'>\n",
        " <tr style='mso-yfti-irow:0;mso-yfti-firstrow:yes;height:21.0pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:solid #4BACC6 1.0pt;border-left:\n",
        "  solid #4BACC6 1.0pt;border-bottom:none;border-right:none;background:#4BACC6;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:21.0pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-themecolor:background1;mso-fareast-language:EN-GB'>Name</span></b><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-fareast-language:EN-GB'><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-top:solid #4BACC6 1.0pt;\n",
        "  background:#4BACC6;padding:0in 5.4pt 0in 5.4pt;height:21.0pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-themecolor:background1;mso-fareast-language:EN-GB'>Variable</span></b><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-fareast-language:EN-GB'><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:solid #4BACC6 1.0pt;border-left:\n",
        "  none;border-bottom:none;border-right:solid #4BACC6 1.0pt;background:#4BACC6;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:21.0pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-themecolor:background1;mso-fareast-language:EN-GB'>Data type</span></b><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:white;mso-fareast-language:EN-GB'><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:1;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:solid #4BACC6 1.0pt;border-right:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>ID<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border-top:solid #4BACC6 1.0pt;\n",
        "  border-left:none;border-bottom:solid #4BACC6 1.0pt;border-right:none;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Baby number<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:solid #4BACC6 1.0pt;border-left:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'><o:p>&nbsp;</o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:2;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;length<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Length of baby (cm)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:3;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:none;border-left:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Birthweight<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Weight of baby (kg)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:none;border-right:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:4;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:solid #4BACC6 1.0pt;border-right:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>headcirumference</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border-top:solid #4BACC6 1.0pt;\n",
        "  border-left:none;border-bottom:solid #4BACC6 1.0pt;border-right:none;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Head Circumference<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:solid #4BACC6 1.0pt;border-left:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:5;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:none;border-left:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Gestation<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Gestation (weeks)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:none;border-right:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:6;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:none;border-left:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;smoker<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Mother smokes 1 = smoker 0 =\n",
        "  non-smoker<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:none;border-right:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Binary<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:7;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:solid #4BACC6 1.0pt;border-right:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>motherage</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border-top:solid #4BACC6 1.0pt;\n",
        "  border-left:none;border-bottom:solid #4BACC6 1.0pt;border-right:none;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Maternal age<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:solid #4BACC6 1.0pt;border-left:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:8;height:12.35pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:12.35pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>mnocig</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:12.35pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Number of cigarettes smoked per day\n",
        "  by mother<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:12.35pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:9;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:none;border-left:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>mheight</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  class=GramE><span lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\n",
        "  \"Times New Roman\";mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Mothers</span></span><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'> height (cm)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:none;border-right:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:10;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:none;border-left:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>mppwt</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  class=GramE><span lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\n",
        "  \"Times New Roman\";mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Mothers</span></span><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'> pre-pregnancy weight (kg)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:none;border-right:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:11;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border:solid #4BACC6 1.0pt;border-right:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>fage</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border-top:solid #4BACC6 1.0pt;\n",
        "  border-left:none;border-bottom:solid #4BACC6 1.0pt;border-right:none;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Father's age<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border:solid #4BACC6 1.0pt;border-left:\n",
        "  none;padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:12;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  class=SpellE><b><span lang=EN-GB style='mso-ascii-font-family:Calibri;\n",
        "  mso-fareast-font-family:\"Times New Roman\";mso-hansi-font-family:Calibri;\n",
        "  mso-bidi-font-family:\"Times New Roman\";color:black;mso-fareast-language:EN-GB'>fedyrs</span></b></span><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Father’s years in education<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:13;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>fnocig</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Number of cigarettes smoked per day\n",
        "  by father<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:14;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>fheight</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Father's height (kg)<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Scale<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:15;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;<span class=SpellE>lowbwt</span><o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Low birth weight, 0 = No and 1 = yes<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>&nbsp;Binary<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        " <tr style='mso-yfti-irow:16;mso-yfti-lastrow:yes;height:13.85pt'>\n",
        "  <td width=129 style='width:96.65pt;border-top:none;border-left:solid #4BACC6 1.0pt;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:none;mso-border-top-alt:solid #4BACC6 1.0pt;\n",
        "  padding:0in 5.4pt 0in 5.4pt;height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><b><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>mage35<o:p></o:p></span></b></p>\n",
        "  </td>\n",
        "  <td width=316 style='width:237.25pt;border:none;border-bottom:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Mother over 35, 0 = No and 1 = yes<o:p></o:p></span></p>\n",
        "  </td>\n",
        "  <td width=155 style='width:116.1pt;border-top:none;border-left:none;\n",
        "  border-bottom:solid #4BACC6 1.0pt;border-right:solid #4BACC6 1.0pt;\n",
        "  mso-border-top-alt:solid #4BACC6 1.0pt;padding:0in 5.4pt 0in 5.4pt;\n",
        "  height:13.85pt'>\n",
        "  <p class=MsoNormal style='margin-bottom:0in;line-height:normal'><span\n",
        "  lang=EN-GB style='mso-ascii-font-family:Calibri;mso-fareast-font-family:\"Times New Roman\";\n",
        "  mso-hansi-font-family:Calibri;mso-bidi-font-family:\"Times New Roman\";\n",
        "  color:black;mso-fareast-language:EN-GB'>Binary<o:p></o:p></span></p>\n",
        "  </td>\n",
        " </tr>\n",
        "</table>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y3nQ0AfeiJHD"
      },
      "source": [
        "**Your turn to explore!** Uncomment and try more Pandas methods we can apply to our dataframe:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RNdCCqloiJHE"
      },
      "source": [
        "# TRY THESE!\n",
        "# Method to see the tail end of the data\n",
        "# data.tail()\n",
        "\n",
        "# Use the DataFrame.info() method to find out more about a dataframe.\n",
        "# data.info()\n",
        "\n",
        "# Get Dimensions (rows, columns)\n",
        "# data.shape\n",
        "\n",
        "# Get data types\n",
        "# data.dtypes\n",
        "\n",
        "# The DataFrame.columns variable stores information about the dataframe’s columns.\n",
        "# data.columns"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "M3Bvnz-1iJHF"
      },
      "source": [
        "# Use DataFrame.describe() to get summary statistics about data.\n",
        "# data.describe()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9xMcUBGgchMV"
      },
      "source": [
        "<div class=\"alert alert-block alert-info\">\n",
        "    <ul><li>Use the Pandas library to get basic statistics out of tabular data.</li>\n",
        "\n",
        "<li>Use `index_col` to specify that a column’s values should be used as row headings.</li>\n",
        "\n",
        "<li>Use `DataFrame.info` to find out more about a dataframe.</li>\n",
        "\n",
        "<li>The `DataFrame.columns` variable stores information about the dataframe’s columns.</li>\n",
        "\n",
        "<li>Use `DataFrame.T` to transpose a dataframe.</li>\n",
        "\n",
        "<li>Use `DataFrame.describe` to get summary statistics about data.</li></ul>\n",
        "</div>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "65yzm4ytiJHF"
      },
      "source": [
        "<a name=\"anchor2b\"></a>\n",
        "## A note about $p$-values"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uqAZq7uMiJHF"
      },
      "source": [
        "When we are doing statistical testing, we form a null hypothesis that states that there is no relationship, either between the two variables being studied (one variable does not affect the other), or between a variable and some constant number like a population mean. \n",
        "\n",
        "The $p$-value, or probability value, returned with your statistical test is a number describing how likely it is that your data would have occurred by random chance (i.e. that the null hypothesis is true).  It is a number between 0 and 1.  The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.  A common threshold of significance is $p < 0.05$. \n",
        "\n",
        "<img src=\"https://github.com/DeisData/datasets/blob/main/p-value.png?raw=true\" alt=\"drawing\" style=\"width:450px;\"/>\n",
        "\n",
        "*Image from [What a p-value tells you about statistical significance](https://www.simplypsychology.org/p-value.html)*\n",
        "\n",
        "There is a lot of conversation around use and [mis-use of $p$-values](https://en.wikipedia.org/wiki/Misuse_of_p-values) in scientific analysis.  Here are a few articles you may find interesting:\n",
        "- [\"After p values: The new statistics for undergraduate neuroscience education\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5777847/)\n",
        "- [\"Estimation for better inference in neuroscience\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6709209/) \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8mzzyS6AiJHG"
      },
      "source": [
        "<a name=\"anchor3\"></a>\n",
        "## 3 Pearson and Spearman correlation\n",
        "\n",
        "### 3.1 Theory: As linear models \n",
        "\n",
        "**Model:** the recipe for $y$ is a slope ($\\beta_1$) times $x$ plus an y-intercept ($\\beta_0$, aka a straight line).\n",
        "\n",
        "$y = \\beta_0 + \\beta_1 x \\qquad \\mathcal{H}_0: \\beta_1 = 0$\n",
        "\n",
        "... which is a mathy way of writing the good old $y = ax + b$ (here ordered as $y = b + ax$). The task of linear models is simply to find the numbers that best predict `y`.\n",
        "\n",
        "Either way you write it, it's an intercept ($\\beta_0$) and a slope ($\\beta_1$) yielding a straight line."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pL34VprbiJHG"
      },
      "source": [
        "**With our data**\n",
        "we can use the seaborn regression plot tool `sns.regplot`to quickly visualize a linear model fit between mother's height `mheight` and baby length `Length` when asking the question \"Can moether's height predict baby length?\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Q_wu2XyfiJHG",
        "outputId": "396f65d6-bacf-4ea9-9c8d-35f23f9aa75a"
      },
      "source": [
        "sns.regplot(x='mheight',y='Length',data=data)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<AxesSubplot:xlabel='mheight', ylabel='Length'>"
            ]
          },
          "execution_count": 193,
          "metadata": {},
          "output_type": "execute_result"
        },
        {
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAvD0lEQVR4nO3deXAc93Xg8e+bCzMAcQwkQqJIkBAtKbRkSTxArteyZUp2nI0PuRI7jpJ4y7lWipM4sl2Kd5VklSy3UuU4ScXKHrYY27VeOz7kjWRr42wi2QrNKIksgrQO0qItiQJ4CyJxH4O53v7RPcMBMABmgOmZ6Zn3qWJh0Ojp/v3Y5MNvfv36/URVMcYY0zwCtW6AMcaY6rLAb4wxTcYCvzHGNBkL/MYY02Qs8BtjTJMJ1boBpbj88su1r6+v1s0wxhhfOXz48AVVXb9wuy8Cf19fHwMDA7VuhjHG+IqIDBXbblM9xhjTZCzwG2NMk7HAb4wxTcYCvzHGNBkL/MYY02Q8zeoRkUFgEsgAaVXtF5HtwGeBKJAGflNVn/ayHcaY2jhwfJgHD57g1OgMvfFW7r51K3u39ax6P1MZ1Rjx36aq21W13/3+U8B/UdXtwP3u98aYBnPg+DD3P3qM4ckEXbEww5MJ7n/0GAeOD69qP1M5tZjqUaDDfd0JnK1BG4wxHnvw4AnCQaE1EkLE+RoOCg8ePLGq/UzleP0AlwKPiYgCD6rqfuCjwD+IyJ/h/OJ5U7E3ishdwF0Amzdv9riZxphKOzU6Q1csPG9bLBzk9OjMqvYzleP1iP8WVd0J/DTwWyJyK/Bh4GOq2gt8DPh8sTeq6n5V7VfV/vXrFz1xbIypc73xVmZTmXnbZlMZNsVbV7WfqRxPA7+qnnW/DgOPAHuADwEPu7t8w91mjGkwd9+6lVRGmUmmUXW+pjLK3bduXdV+pnI8C/wi0iYi7bnXwDuAozhz+m91d7sdeNGrNhhjamfvth723XEDPe1RxmdT9LRH2XfHDYuydUrdz1SOl3P8VwCPiEjuPF9R1b8XkSngAREJAQnceXxjTOPZu62npABe6n6mMjwL/Kp6Ari5yPYngV1endcYY8zy7MldY4xpMr6ox2+MMaZ0yXR2UaZUIQv8xhjTIBKpDGMzKWaSaaLh4JL7WeA3xhifm0mmGZtJkVhmlF/IAr8xxvjU1FyasZkkyXS2rPdZ4DfGGB9RVSbn0ozPpEhlygv4ORb4jTHGB7JZZTKRZnw2RTq7uoCfY4HfGGPqWDarjM+mmEikyGS1Ise0wG+MMXUonckyPptiMpEmq5UJ+DkW+I0xpo6kMlnGZlJMzTlF67xggd8YY+pAIpVhfDbF9Fza83NZ4DfGmBpRVabm0kwk0syVmINfCRb4jTGmyjJZZaLCN2zLYYHfGGOqJFVww9ar+ftSWOA3xhiPJVIZJmadG7b1wAK/McZ4QFWZTmaYTKSYTVZv/r4UFviNMaaCUpksk4k0kzWavy+Fp4FfRAaBSSADpFW1393+EeC3gTTwbVX9hJftMMYYL9Xz6L6Yaoz4b1PVC7lvROQ24L3ATao6JyK20KZZlQPHh3nw4AlOjc7QG2/l7lu32rqtpqr8MLovphZTPR8GPqmqcwCqOlyDNhifO3B8mPsfPUY4KHTFwgxPJrj/0WPsAwv+xlN+G90X4/Wauwo8JiKHReQud9t1wFtE5Psi8j0R2e1xG0wDevDgCcJBoTUSQsT5Gg4KDx48UeummQaVzmQZmU5yamSW4YmEb4M+eD/iv0VVz7rTOY+LyHH3nHHgjcBu4CER2aoLklrdXxR3AWzevNnjZhq/OTU6Q1csPG9bLBzk9OhMjVpkGtVsMsNEojqlFKrF0xG/qp51vw4DjwB7gNPAw+p4GsgClxd5735V7VfV/vXr13vZTONDvfHWRYtJz6YybIq31qhFppHkSiGfGpnh3PhsQwV98DDwi0ibiLTnXgPvAI4C3wRud7dfB0SAC0scxpii7r51K6mMMpN0noCcSaZJZZS7b91a66YZH0tlslycmuPkyAwXp+ZWvcJVvfNyqucK4BERyZ3nK6r69yISAb4gIkeBJPChhdM8xqxk77Ye9uHM9Z8enWGTZfWYNZhJppmYTTOTbKyR/VI8C/yqegK4ucj2JPBBr85rmsfebT0W6M2q5ZYynEisfu1av7Ind40xTSWZzjKRSDHlwcpWfmGB3xjTFJptOmc5FviNMQ0rk1WmmnQ6ZzkW+I0xDSdXBnk6malp3ft6ZYHfGNMQarWMoR9Z4DfG+FpuGcPJRJp01qZzSmGB3xjjS3PpDOOzKabnbDqnXBb4jTG+kauMOTGbImHTOYtkVTnx2jSHBkc4cnJsyf0s8Btj6p5N5yxtZDrJ4aFRDg2OcHholNGZ1IrvscBvjKlbiVSuMqZN5+Qk01mOnh1nYHCUgcFRXnptatE+V3S08Matl/GXSxzDAr8xpq5Yds58qsqpkVkODY0wMDjKs6fGSKTnf+qJhgNs7+2if0s3/X1xeuMxYpGQBX5jTH1LZbJMzKaYmkv7ahlDL0wmUhw5OcahQSfYD0/OLdrnmp517O6L078lzg1XdRIJlV5s2QK/MaZmGmEZw0rIZJUXzk0wMDTKwOAIx89PsvB3X7w1TH9fN7v74uzcHKe7LbLq81ngN8ZUXa5Q2nQTj+7PTyQYGBzh0OAoR06OMj03/xdfOCjcuLHTCfZb4mxd34Zb5n7NLPAbY6pmes6pm9OMo/vZZIZnT49xaNDJwDk9Ortony3drfT3xdnd181NmzqJhoOetMUCvzHGU5msMplwUjGbqVBaYU79ocFRjp4ZJ73g0017NMTOzfH8XH1PR7QqbbPAb4zxRDM+WTsynczP0xfLqQ8IvH5DB7vdUf11V7QTDFRm+qYcFviNMRU1NZdumidrk+ksR8+MO9k3Q6O8/Nr0on2u7Iiyuy/Orr44O3vjrIvWPux62gIRGQQmgQyQVtX+gp/dC/wpsF5VbbF1Y3xMVZlIOAG/cDrn6RMjfO3QKc5NzLKhI8adu3vZs7W7Km3y4tyl5NTHwkG293Y50zd9cTZ2xSp2U7ZSqvGr57aFgV1EeoGfBE5W4fzGGI/MpTNMJdJFc++fPjHCA0+8SCggdERDXJye44EnXuQervU8+Ffy3BOzTk79gDuqL5ZTf62bU7+7r5vrr+ogHCw9p74WavWZ4y+ATwDfqtH5jTGrlMk6T9ZOJlIk00vfrP3aoVOEAkLMzUyJhYPMpjJ87dApzwP/Ws6dz6kfHOXQ0Ag/KpJT390WoX+Lc1N215Y4Xa2rz6mvBa8DvwKPiYgCD6rqfhG5Azijqs8u9/FHRO4C7gLYvHmzx800xqxk1n3QqtRVrc5NzNKxYD47Gg5wfmJxGmOllXvu8+MJBoaWz6m/aWMnu9wHqLZeXrmc+lrwOvDfoqpnRaQHeFxEjgO/D7xjpTeq6n5gP0B/f39zpAQYU2dSmSyTiTRTq6iKuaEjxsXpufyoGyCRynJlR6zSzSz73LPJDM+cGsvflC2aU39Zqzuq9zanvhY8Dfyqetb9OiwijwBvBa4GcqP9TcAREdmjque9bIsxpjSqykwys+YHre7c3csDT7zIbCpDNBwgkcqSzip37u6tYGtLO/dsMkMineWKjhY+/tAzHD0zsSinviMaYtcWJ5++v6+b9e0tnrezVjwL/CLSBgRUddJ9/Q5gn6r2FOwzCPRbVo8xtZdMZ5maW93ovpg9W7u5h2v52qFTnJ+Y5coqZvXs2drNryT6+PJTJzk1Mks6q6Szyj/88NX8PgGBG67qyFe0rFVOfS14OeK/AnjEHdmHgK+o6t97eD5jTJkyWWUqkWZybvkbtau1Z2t31dI3k+ksz58Zd+rfDI1yokhO/YbOKLv7uunfEmfH5i7aWmqfU18LnvVaVU8AN6+wT59X5zfGLK0RFjhRVYZGZjg0OMrhwRGePT3OXJGc+h2bu/Jz9Rvj3t9f8IPm/HVnTJOanksz7uOnap2c+lEOuatPvTY1P6degGuvWJcf1fshp74WLPAb0+CyWWUqmWZ8JuW7ImnpTJYXzk0yMORk3xTLqb+sLZKvaLlrc5zO1nBtGusjFviNaVC5Fa0mE2myPprOOTc+my9d/MzJMaaTRXLqN3XlK1pe7fOc+lqwwG9MA8lklemkk5njl+mcmWTazakf5fAyOfW5kgg3beykpYFy6mvBAn+dOnB8mAcPnuDU6Ay98VbuvnUre7f1rPzGBlDpvv/ld37M5558helkhrZIkF9/89X8ztuvW9O5yzlmqcrp98J9f/lNW9ixOV7yU7XFlFrUbK3Fz7KqvDQ85ZREGBzh2NllcurdufrCnPpaFn5rFOKHO/r9/f06MDBQ62ZUzYHjw9z/6DHCQcnXGElllH133NDwwb/Sff/L7/yYB554iYA4edtZdf7cc/s1iwJ1qecu55he9Du3bygALaEg08k0qYxyz+2rL35WWNSs8GGrhccsdb+FLkzNcXhoND+qH5+dX6c+GBCud+vU9/fFubaneE79as/fjKLhIBvjrYcLqyLn2Ii/Dj148AThoNAacS5PayTETDLNgwdPNHzgr3TfP/fkKwQEQgEnsyMgkM5m+dyTrywK0qWeu5xjetHv/3ngZQQlFAySySrRUBDVtRU/K7WoWan7JdNZnjt9afrmxIXFOfVXdUWdh6fKyKmvZeG3RmKBvw6dGp2hKzY/MyEWDnJ6dKZGLaqeSvd9OpkhtCCbLyAsumFYzrnLOWapVjp37qna6bk0QyPTTgGygg/ray1+VmpRs6X2Ozc+wysXpvOrTz17enzRA2GtkSA7ervo73OmcDZ2lZ9TX8vCb43EAn8d6o23MjyZyI/+AGZTGTbFW2vYquqodN/bIs6IsHDWIKvO9tWeu5xjlqrYuWeSaa7sjHJmbJa5ghu1XhQ/K/WYhftlsspMMs1EIk0yk+XXvjh/OlaA665sz5cvvn5DB6E15tTXsvBbI7EnG+rQ3bduJZVx/lM5BbOcOdy7b91a66Z5rtJ9//U3X01WnamYrGbdr8721Z67nGOWqvDcmWyWyUSK2VSW9+3YNC/og1OALJ1VZlMZFOfrWouflXLMdCbLG6/uZnQmyeDFaV6+MM25iTmmk879CIDL1kX4qRuu4A/e9Xoe/vCb+Mwv7eTX3nw1N23qWnPQ96rvzchu7tapXNbG6dEZNjVpVk+l+r6arJ6Vzl3prJ5kOstjx87z+Sdf4dz4ygXNcpktlSx+VuyYm7pjzlOyQ8Vz6gVn9anbX9/D7r5u+i5r9Tyn3ou+N6Llbu5a4DemRhKpDDPJDNNz6bp5onYmmeYHJ8fyq0+dHUss2ufqy9vc0sVxy6mvY5bVY0wdyNW5n0lmmE1mKlL6eK2yqvz41Uk3p36UH56bWLR2bi6nfndfN7sW5NQbf7LAb4yHcsF+ei7NTDJTF6UTLkzN5R+eOjw0ykQiPe/nwYBww1VuTv2Wbq7pWdc0deqbhQV+Yzwwl84wmXDSLxeOoKvellSG586MMzA4ysDQKK8skVO/212QZMfmrnnZRcZfAiIEA0JkYc5xAbu6xlSIqjLllj32YlGTctoxeHHGWZBkcJTnziyRU7+5K7/61Gpy6k31iAhBEYJB92tACAXmf5/bVsrNdQv8xqxRKpNlKpFmIpGq2eh+fDbllkRwyhdfnErO+3kupz5X0bISOfWmcsLBAOFggFDQDegBIRQIEAg4T4hXeqrN08Dvrqk7CWSAtKr2i8ifAu8BksDLwK+o6piX7TCOZi78VqpS/o5UlceOnuevnnyF06MzNUkpTGeyHDs3kZ+++fH5SRb+yrl8XSS/IMnOLXE6Y8vXqf/Svwzy0OHTzKYyxMJBPrBrE//+TX2e9aEZiBQG8ksj80DB95FgoOIpsLl/x+H1fTcW+3k1Rvy3LVhM/XHgPlVNi8ifAPcB/7EK7WhqhUXAumJhhicT3P/oMfaBBX/Xcn9Hb772cmbd9MsDx4f59HedQmHt0RAXp+d44IkXuQdvC4WdGZt1Av3gCD84NcbMgpz6SCjAzZs68xUty8mp/9K/DPLFp4YICAQDzj2KLz41BGDBfwnBgBAQITRv+sUZpYeDAUIBqcmnqsJ/x2g2XWyfqk/1qOpjBd8+Bby/2m1oRs1c+K1UhX9HWVVaQkHSmRQPfPdFNl92qWzDV5+uTqGw6blLdeoHVsip390X56ZNXcve0FvOQ4dPu0Hffb8A2SwPHT7ddIE/F8xDgepNvVTSwv/rxXgd+BV4TEQUeFBV9y/4+a8CXy/2RhG5C7gLYPPmzZ42shk0c+G3UqQyWYYuTtMeDTGXzuQLoEVCAc6Nl1aobK2FwjJZN6feLXR27OzEomUGO2NhN6c+zq4tcS5fV5mc+tlUhoWDUxFne6PJTb/k5tTDuQDvvg7UcVAvRbH/6wt5HfhvUdWzItIDPC4ix1X1IICI/D6QBv662BvdXxL7wXly1+N2NrxmLvxWTCarJFIZEqkMs6kMyXSWnvZo2YXKltuvFK9NzjHg3pBdKqf+DVd15NeUvaZnHQEPSiLEwkHnF17BoVWZ10e/CLpTLOHcSD0YIBy8NGqv59F6JRT7v76Qp4FfVc+6X4dF5BFgD3BQRD4EvBt4m/qhZkQDuPvWrdz/6DFmkul5C300Q+E3KB7oF7pzdy8PPPEis6nMvEU+FhYAK3W/YhKpDM+fGeeQm2o5dHHxJ65N8Vh+VL+9tzo59R/YtcmZ089mEXGCflad7bWUS2MMBC7lp4uQn1MvvHlaTjpjIyv8v74Uz/5FiUgbEFDVSff1O4B9IvLvcG7mvlVVbZ6hSvZu62EfNEXht3QmSzKTJZnOMpd2vpZSC2fP1m7u4doVC4CVuh84GUCvXJh25+lHee70WL6SZU5bJMiOzXF3VB9nQ2f1c+pz8/jVyuoRcR4wCi+YSw9I7g/5wG7KU/h/HQkUjfGeFWkTka3AI+63IeArqvrHIvIS0AJcdH/2lKr+xnLHsiJtZimqSiKVJZHKMJfOMpfO1PxJ2bGZJIeHxhgYGmFgcJSL0/Nz6gMCP5GvU9/N6zd0NPz0QzgYIBoOEg0HiIQCtIT8N4XkRyJS3SJtqnoCuLnI9mu8OqdpfKmMM4qfS2VIuKP5Ws8WFubUHxoc4cVXpxbl1K9f15JfT3bn5jgdK9x886vc1EskFCASDNASdr7aw2L1paTALyLrgf8A9BW+R1V/1ZtmmWaXzWp+mmYuk3Gna7TmQR6cTxn5nPqhUX5wcmxR9ktLYU59X5wt3d7Xqa+W3Hx67mnTcFBqmrduylfqiP9bwD8B38F5CteYikmkMsylnACfzqjzpw5KFheamsvVqXcycM6NL86p37q+jd1bnPVkb9zYueqc+noQCgQuzcEHC+bi3adOjb+VGvhbVdWerjVroqokM87IPemO5hOp+ihVvFA+p96dvvnhucU59V0Lcuovq1BOfbUU3mDN5bLnRvGNfs+h2ZUa+P9WRN6pqn/naWuM7+WCey6wJzNZ0hklk9W6DPCFXpucy6dZHjk5yuSCnPpQQHjDxo58RUuvcuorIZcGKQKBgJMlEwpcmpYJBwO+/kRi1mbZwC8ikzjPMArweyIyB6Tc71VVO7xvoqmUShVpU1XSWWdKJpV1Ansqs3zaZG6d1HMTs2yok3VSE6kMz50ez1e0XCqnPlfobHtvF7GIk43y9IkR7n3ouZr1p/Dv86rOGL/8pj5ue32PcyPV5trNCmzN3SZRWLip8AGufXfcUDT4Z7JuMHdH7OlMllRWyaxi/v3pEyM88IRT1Kzwgad7bve2qNlCqsqJC9P5QmfPnRkvmlO/052+6d/SzZWd0UXHqWV/AiIcGRrlzx//MZGQ0BoOknBvfC91LU3zWlM6p4h8V1XfttI2U7+KFWmbnkvx2e+9zBtfd1l+eiY3cq9kLvzXDlWnqFkxozNJjgyN5h+gGlkipz63+lQpOfXV6k9AhHBBWmQ0FCQSCnDfw88TDQcKrmXACu6Zsqw01RMF2oDLRSTOpUoeHcBVHrfNrFIm64zKCzNkBi9O0xEN5UsVKEowIAxenObs2NqKi63Eq6JmxaQyWY6dnXCmbwZHeXF4atE+Pe0t+do3O3q7ys6p96I/4YKcd+eGq/OnGCu4Z9ZqpRH/3cBHcYL8kYLtE8D/8KhNZgXpXGZMJks6kyWjzs3TjDvvXuwm6hUlFiDzQiWLmi2Uy6k/5GbfPHNqjERq/lRUSyjAzb1d7O6Ls3tLN73dsTXl1K+lP8UecGoJBcvKorGCe2atlg38qvoA8ICIfERV/1uV2mS4NMeeyhTcPHVfryY7Zi2Fxdaq0ueemktz5OQohwedKZzzE4tz6l+3vi1/U/YNFc6pL6c/4YLg3hIK0BJa+2pLzV5wz6xdSTd3ReRni2weB55X1eGKt2qBRry5m844wSLljt5z3zvTNN48oZrLBFmpsJgX1nLuXE59LtXyhSI59fFWJ6c+t/pUd1vEg15csqg/e3p5y3Xr55UpqESQX0ouQ6vRC+6ZtVnq5m6pgf/bwL8F/tHdtBdn9azrgH2q+qXKNXUxvwV+VXWCefZSUM8F9HRGyWh9lB6oZ8MTCQbcm7JL59R35lefel2Vc+pFnIyeWDhINBz0NMgbs1prLdKWBV6vqq+6B7sC+Azwb4CDgKeBvx4VjtgL89nrsdyAHyRSGZ49PeamWo4yNLL4RmVvPJYf0Rfm1FdDLtBHQ0FiEQv0xt9KDfx9uaDvGgauU9UREUl50K6ayWXEZLOQzmbzN01zI3Yvp2Kaiapy4rVpDrnLDD5fLKe+Jcgut079Ujn1XgoHA8QiQVojQaKhoNWoMQ2j1MD/TyLyt8A33O/fh7OSVhsw5kXDKkl1ceB2MmCcjJjcDdOsYgHdQ6MzSQ7ncuoHRxidmT9mCAhsu7IjvyDJtiurX6c+HAzQ1hKirSVoNeNNwyo18P8WTrC/BSeX/38Df+Mum3ibR20rWS4DJu0G81RG8yP33GjdVF8qk+XomfH8w1MvrZBTv3NzF+3R6tapF3EexMqN7JfKnTemkZQU+N0A/3/cPzWRylwK4rnUxsIbp6b2VJXTo7NuoC+eUx8NBdi+uYt+NwOnN762nPrVyAX7tpYgbZGQTeGYplNqyYafBf4E6MEZ8Ve1SNtcOsupIjf7TO1NJZyceicDZ4RXJ+YW7XNNz7p89s0NV9WuTn0sEnSmcSIhKztsmlqpUz2fAt6jqi+Uc3ARGQQmcRZvSatqv4h0A1/HWc1rEPiAqo6Wc1xzSTlVLytRITOTVX50fjJf0XK5nPrdfd3sqkJO/XJawkGeGRrlS08NcXpsdk1VSQtVqtKpMbVQah7/P6vqLWUf3An8/ap6oWDbp4ARVf2kiPwnIL7SIi83bt+p33r8YLmnb3jlVIlcS0XJVycS+embI0NjTM2lF+0TC1+qMfPxt1/HG193WUX7WqpgQNxFvZ05+39+8UJZVUlLUW6lU2NqZa15/AMi8nXgm0D+s7yqPryKtrwX5wEwgC8CBwBb3WsVyqkSWc6+s6kMz54ay68+dWp0cfGx3ngsv85sVyycf3hqNpXhoYHTVQ38y2XiFKtKutZKll4c05hqKjXwdwAzwDsKtimwUuBX4DERUeBBVd0PXKGq5wBU9ZyIFP2fIiJ3AXcBXLXJ+3oyflROlcjl9lVVXn5tOj99c7RITv26lhA7t3Tlyxdf0RHlF/7qKTqiIQRZdEyvhQIB5+ZsS4hoeOm0Sy8qWVp1TON3pWb1/Moqj3+Lqp51g/vjInK81De6vyT2gzPVs8rzN7RyqkQu3DedzTI2k0KB93/2X5fMqd/d5zxAVSyn3suqm8UERGhtCbKuJTSvMuVyvKhkadUxjd+VlF4hIteJyHdF5Kj7/U0i8gcrvU9Vz7pfh4FHgD3AqyKywT3OBpyngM0q3Lm7l3RWmU1lUJyvS1WJfP/OjUwn05wbTzA0MsOJCzOMzKQYdf8AXNHRwrtu3MAfved6vvmbt/Dff3EHH3pTHzdc1Vk0C6ac86+WiNDWEqKnI8qWy1rpaY+WHPTBqWSZyigzyTSqzte1VrL04pjGVFOpN3e/B/wuznTNDnfbUVV9wzLvaQMCqjrpvn4c2Ae8DbhYcHO3W1U/sdz57ebu0paqeqmqnBqdZcCtaPnsqTES6fk59ZFggJ1buujf0s3uvjibVpFT71XFz1zq5boK5Nl7UcnSqmMaP1hrdc5DqrpbRH5QEPifUdXty7xnK84oH5wppa+o6h+LyGXAQ8Bm4CTwc6o6stz5LfCXZjKR4sjJSzdlhyfrN6e+mHAwQHs0xLqWkC0WbkwFrDWr54KIvA7nZi0i8n7g3HJvUNUTwM1Ftl/EGfWbNcpklePnJ/K1b46fnyyaU5+raFnrnPpiggFnKmfdCjdpjTGVU06tnv3ANhE5A7wC/JJnrTJLOj+RYMBdT/bIycU59eGgcOPGTvr7utm9Jc7W9W11WT44FgnSHg3TFgnWZfuMaWSlZvWcAN6+YN7+o8CnPWybAWaTGZ45NZYviXC6SE79lu5W+vucEf3NvV3zsmzqSTgYYF1LiHXRkBVDM6aGSk+PAFR1uuDbj2OBv+Kyqrw8PJWvaHn0zDjpBfM37dEQOzc78/T9W+L0dFS3Tn05wsEArZGV8+2NMdVTVuBfwD6fV8jIdJIBd0GSw0OjRXPqr9/Q4Swe3hfnuiva67rIWCQUoC0SotVq2htTl9YS+Bv6oapKFDRbSjKd5ejZ8Xz2zcuvTS/a58qOKLv74uzqi7OzN8666Foulfdy0zhtLaGqZQpZoTRjVmfZaCIikxQP8AJ483hmHSgsaNYRDXFxeo4HnniRe1i5oFkxqsrJkZn84uHPFcmpj4YD7OiN51ef2thV/Tr15crVyFlXxWCfU1gorSsWZngywf2PHmMfWPA3ZgXLBn5Vba9WQ+pJOQXNljKZSHF4aMzJwBkaLZpTf23POrckQjc3XNXhixue9ZJ+aYXSjFm9+p4/qJFyip/lZLLKC+cmnOmboRF+VCSnvrstkn94ateWOF2t9ZVTv5xYJEhHNExrnaRfWqE0Y1bPAn8RpRYfOz+eYGDIKYlw5OQo03OZeT8PB4Wb3Jz6/r44Wy+vz5z65axrCdERC9ddRo4VSjNm9SzwF3Hn7l4eeOJFZlOZeYuW/MyOq/jXly/myxcvl1O/u6+bmzZ11l3ALEUoEGBdNER7Hefb333rVu5/9BgzyfS8xVCsUJoxK7PAX8Serd3cw7V89emTnB6bIRIMEosE+a/ffsHXOfUrqbfpnOXs3dbDPrBCacasggX+BS5OzXF4yHl4amhkhrHZFHApr95vOfUrERF3Oifku5z7vdt6LNAbswpNH/iT6SzPnxl3pm8GRzlxYXFO/YbOKP19cfq3dLNjcxfrWvz/1xYQoSMWpjMW9vUvLmNM+fwfwcqUy6nPVbR89vQ4cwty6mPhIDs2d7kZON1sjDfOIwvhYICOaJj26Nrr3Btj/KkpAv/EbK5OffGcegGuvWJdPtBf75Oc+nLk5u/bGuDTijFmbRoyCuRy6nPZN8Vy6i9b5+TU92/pZteWLl/l1JfKWbYwSGcs7Lv5e2OMdxom8J8bn3Wnb0b5wclRppNFcuo3deWzb672YU59qUIBZyWr9qitZGWMWczzwC8iQWAAOKOq7xaR7cBngSiQBn5TVZ8u97izyQw/OOXUvjm8RE5932Wt+Zuyfs2pL5WI0BoJ0h4NlbUYeTG1Ln5W6vkrvV+t+aWdxv9KWnN3TScQ+TjQD3S4gf8x4C9U9f+JyDuBT6jq3uWOceP2nfrIY9/jpeGpfPbNsbMTi3LqO6Ihdm2J55caXN/e4lW36kZLOOgsbtISqkh2TmHxs8IHo/bdcUNVglCp56/0frXml3Yaf1nrmrurPekm4F3AH+Ms3AJOtc8O93UncHal45wbn+V9n/lXxmfn16kPBoTrN7Q7ywz2xbm2x9859aXysgRyrYuflXr+Su9Xa35pp2kMXk/1fBr4BFBY5fOjwD+IyJ8BAeBNxd4oIncBdwFErrwmH/Q3dEadh6e2xNmxuatpslSCAckvW+jljdpaFz8r9fyV3q/W/NJO0xg8i5oi8m5gWFUPi8jegh99GPiYqv6NiHwA+Dzw9oXvV9X9OAu8c1nf6/Wet11Df183G7saJ6e+FLGIk5Wz1nn7UtW6+Fmp56/0frXml3aaxuBlysctwB0iMgh8DbhdRL4MfAh42N3nG8CelQ50VVeM927f2FRBvzUS4qquGBs6Y1UL+uAUP0tllJlkGlXnazWLn5V6/krvV2t+aadpDJ7f3AVwR/z3ujd3XwA+rKoHRORtwKdUdddy779x+0791uMHPW9nreWmc9qj4aqvaFUol11Sq+JnpZ6/0vvVml/aafxjqZu7tQj8bwYewJlmSuCkcx5e7v2NHvj9VBXTGOMfNcnqyVHVA8AB9/WTwLIj/GaQG913xMINVx7CGFPfmiMlpo6EgwE6YmHaW6xImjGmNizwV0lLOEhXzIqkGWNqz6KQx6LhIPHWCLFI45aLMMb4iwV+j1jAN8bUKwv8FRaLBOmKWcA3/mNF4pqHpZNUSOEDVxb0jd/kisQNTyboioUZnkxw/6PHOHB8uNZNMx6wwL8GARHao2E2xmNc2Rlt6LLPprEVFolzSnyHCAeFBw+eqHXTjAdsqmcVwsEAna1h1kUsJdM0BisS11ws8JchFAjQ1ebk4NsTtqaRWJG45mJTPSUIBoTL2lro7Y7REQ1b0DcNx4rENRcb8S9DROiMhemKhW1KxzS0vdt62AdWJK5JWOBfwrpoiO7WiC1WbprG3m09FuibhAX+BVojIeJtYU9XuTLGmFqywO+KRZwnbS0l0xjT6Jo+8FvAN8Y0m6YN/G0tITpjYQv4xpim01SBXyS3+EnI5vCNMU3L88AvIkFgADijqu92t30E+G0gDXxbVT/hZRvCwQAd0TDtUXvS1lSfFT8z9aYaI/57gBeADgARuQ14L3CTqs6JiCf/A0SEtkiQ9mjYiqaZmskVPwsHZV7xs31gwd/UjKdJ6iKyCXgX8LmCzR8GPqmqcwCqWtHyf+FggO62CJu7W+npiFrQNzVlxc9MPfL66aRPA58AsgXbrgPeIiLfF5HvicjuYm8UkbtEZEBEBkYuXljxRK2REBs6Y/R2t9LVGiFoUzqmDpwanSG2IIHAip+ZWvMs8IvIu4FhVT284EchIA68Efhd4CEpUvxGVferar+q9ndfdvlS56A9GmZTvJUrO210b+pPb7yV2VRm3jYrfmZqzcsR/y3AHSIyCHwNuF1EvgycBh5Wx9M4nwaKR/YliAgdsTC98Rjr21uIhKysgqlPVvzM1CPPIqaq3qeqm1S1D7gTeEJVPwh8E7gdQESuAyLAynM5XBrh98ZjXL6uxeromLq3d1sP++64gZ72KOOzKXrao+y74wa7sWtqqhZ5/F8AviAiR4Ek8CFV1eXeIEC8NUJHLGxz98Z3rPiZqTdVCfyqegA44L5OAh8s5/2RUIB4W6TyDTPGmCZkcyXGGNNkLPAbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk7HAb4wxTcYCvzHGNBkL/MYY02Qs8BtjTJOxwG+MMU3GAr8xxjQZC/zGGNNkLPAbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk/E88ItIUER+ICJ/u2D7vSKiInK5120wxhhzSTVG/PcALxRuEJFe4CeBk1U4vzHGmAKeBn4R2QS8C/jcgh/9BfAJQL08vzHGmMW8HvF/GifAZ3MbROQO4IyqPrvcG0XkLhEZEJGB1157zdtWGmNME/Es8IvIu4FhVT1csK0V+H3g/pXer6r7VbVfVfvXr1/vVTONMabphDw89i3AHSLyTiAKdABfAq4GnhURgE3AERHZo6rnPWyLMcYYl2eBX1XvA+4DEJG9wL2q+r7CfURkEOhX1QtetcMYY8x8lsdvjDFNxsupnjxVPQAcKLK9rxrnN8YYc4mN+I0xpslY4DfGmCZjgd8YY5qMBX5jjGkyvgj8x89P8gv7n+LA8eFaN8UYY3zPF4E/FBCGJxPc/+gxC/7GGLNGvgj8AK2REOGg8ODBE7VuijHG+JpvAj9ALBzk9OhMrZthjDG+5qvAP5vKsCneWutmGGOMr/km8M8k06Qyyt23bq11U4wxxteqUrJhrTJZpac9yt23bmXvtp5aN8cYY3zNF4H/J65s56t3vbHWzTDGmIbgm6keY4wxlWGB3xhjmowFfmOMaTIW+I0xpslY4DfGmCYjqlrrNqxIRF4DhkrY9XKg0dbvbbQ+WX/qX6P1qZn7s0VV1y/c6IvAXyoRGVDV/lq3o5IarU/Wn/rXaH2y/ixmUz3GGNNkLPAbY0yTabTAv7/WDfBAo/XJ+lP/Gq1P1p8FGmqO3xhjzMoabcRvjDFmBRb4jTGmyfgq8IvIF0RkWESOFmz7IxE5IyLPuH/eWfCz+0TkJRH5kYj8VG1avbRy+iMifSIyW7D9s7Vr+dKK9cnd/hH3OhwTkU8VbPfdNXK3L+qPH67REv/mvl7Q5kEReabgZ3V9faC8Pvn4Gm0XkafcNg+IyJ6Cn5V/jVTVN3+AW4GdwNGCbX8E3Ftk3+uBZ4EW4GrgZSBY6z6soT99hfvV658l+nQb8B2gxf2+x+fXaKn+1P01KtafBT//c+B+v1yfVfTJl9cIeAz4aff1O4EDa7lGvhrxq+pBYKTE3d8LfE1V51T1FeAlYM8K76mqMvvjC0v06cPAJ1V1zt1n2N3u12u0VH/q3nL/5kREgA8AX3U31f31gbL7VPeW6I8CHe7rTuCs+3pV18hXgX8Zvy0iz7kfkeLuto3AqYJ9Trvb/KBYfwCuFpEfiMj3ROQtNWtd+a4D3iIi33fbvtvd7tdrtFR/wL/XCOAtwKuq+qL7vV+vT6GFfQJ/XqOPAn8qIqeAPwPuc7ev6ho1QuD/DPA6YDtwDudjHYAU2dcPuatL9eccsFlVdwAfB74iIh1Fj1B/QkAceCPwu8BD7kjMr9doqf74+RoB/ALzR8Z+vT6FFvbJr9fow8DHVLUX+BjweXf7qq6R7wO/qr6qqhlVzQJ/xaWPOaeB3oJdN3Hp41HdWqo/7ke5i+7rwzhzedfVrqVlOQ08rI6ngSxOoSlfXiOW6I+fr5GIhICfBb5esNmv1wco3icfX6MPAQ+7r7/BGuOc7wO/iGwo+PZngNyd8EeBO0WkRUSuBq4Fnq52+8q1VH9EZL2IBN3XW3H6c6L6LVyVbwK3A4jIdUAEp7qgL68RS/TH59fo7cBxVT1dsM2v1ydnUZ98fI3OAm91X98O5KauVneNan0Hu8y73V/F+aiWwvlN92vAl4Dngefcv4QNBfv/Ps5v9B/h3hGvpz/l9Ad4H3AM5w7+EeA9tW5/GX2KAF/G+SV2BLjd59eoaH/8cI2K9cfd/r+A3yiyf11fn3L75NdrBLwZOOy2+/vArrVcIyvZYIwxTcb3Uz3GGGPKY4HfGGOajAV+Y4xpMhb4jTGmyVjgN8aYJmOB35gFxKmQem+Z7/k7EelaYZ8DIrJokWy38uI7i73HGC9Y4DemAlT1nao6tsq3b8epuGhMVVjgN03Frcd+XEQ+JyJHReSvReTtIvLPIvJiQZ3z690R+gkR+Z2C939QRJ5266I/WPAU6KCIXO6+/s/uOR4Xka8u+PTwc+77fywibxGRCLAP+Hn3mD9frb8L07ws8JtmdA3wAHATsA34RZwnI+8Ffs/dZxvwUzg1Uf5QRMIi8nrg54FbVHU7kAF+qfDA7lTO+4AdOHViFk7thFR1D061xT9U1SRwP/B1Vd2uql/HGI+Fat0AY2rgFVV9HkBEjgHfVVUVkedxFup4Bvi2OvX250RkGLgCeBuwCzjkFOMkBiysxf9m4FuqOuse//8u+Hmu0NZh91zGVJ0FftOM5gpeZwu+z3Lp/0ThPhl3uwBfVNX7WFqxMrnFzp07pjFVZ1M9xpTuu8D7RaQHQES6RWTLgn2eBN4jIlERWQe8q4TjTgLtlW2qMUuzwG9MiVT1h8AfAI+JyHPA48CGBfscwqmq+izOtM4AML7Cof8R52ay3dw1VWHVOY2pMBFZp6pTItIKHATuUtUjtW6XMTk2x2hM5e0XkeuBKM49AQv6pq7YiN8YY5qMzfEbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk/n/hV5jFEwTUJUAAAAASUVORK5CYII=\n",
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ]
          },
          "metadata": {
            "needs_background": "light"
          },
          "output_type": "display_data"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-gwo-CPhiJHH"
      },
      "source": [
        "**Practice for you**. Use the seaborn regression tool to visualize a simple linear regression to consider the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E-ktOtSIiJHH"
      },
      "source": [
        "# Your code goes here!:\n",
        "#sns.regplot()\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uuyi5n71iJHH"
      },
      "source": [
        "This is often simply called a *regression* model which can be extended to *multiple regression* where there are several $\\beta$s and on the right-hand side multiplied with the predictors. \n",
        "\n",
        "Everything below, from [one-sample t-test](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank) to [two-way ANOVA](#6.2-Two-way-ANOVA) are just special cases of this system. Nothing more, nothing less."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dnn2z1ChiJHH"
      },
      "source": [
        "### 3.2 Pearson correlation\n",
        "We use [`scipy.stats.pearsonr`](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html) to calculate a pearson correlation coefficient and the p-value for testing non-correlation between two variables. Correlations of $-1$ or $+1$ imply an exact linear relationships, and 0 implies no correlation. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZNjW3XfmiJHH",
        "outputId": "f963cf11-159d-4886-f5ba-42f62a8e1701"
      },
      "source": [
        "# pearsonr returns the pearson's correlation coefficient and \n",
        "# the 2-tailed p-value\n",
        "stats.pearsonr(data.mheight, data.Length)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "(0.4849924026326507, 0.0011331311458135573)"
            ]
          },
          "execution_count": 195,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "COg9OPIziJHI"
      },
      "source": [
        "The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.\n",
        "\n",
        "**Practice for you**. Calculate the Pearson's R for the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uc3DLwl7iJHI"
      },
      "source": [
        "# Your code goes here!:\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a3AmczJmiJHI"
      },
      "source": [
        "### 3.3 Spearman rank correlation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3h8RSt2oiJHI"
      },
      "source": [
        "As the name implies, the *Spearman rank correlation* is a *Pearson correlation* on rank-transformed $x$ and $y$:\n",
        "\n",
        "$\\text{rank}(y) = \\beta_0 + \\beta_1 \\cdot \\text{rank}(x) \\qquad \\mathcal{H}_0: \\beta_1 = 0$\n",
        "\n",
        "I'll introduce [ranks](#3.4-Theory:-Rank-transformation) in a minute. For now, notice that the correlation coefficient of the linear model is identical to a \"real\" Pearson correlation, but p-values are an approximation which is is [appropriate for samples greater than N = 10 and almost perfect when N > 20](https://lindeloev.github.io/tests-as-linear/simulations/simulate_spearman.html).\n",
        "\n",
        "Such a nice and non-mysterious equivalence!\n",
        "\n",
        "The Spearman rank-order correlation coefficient is a *nonparametric measure* of the relationship between two variables. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jUa-DZ-hiJHJ"
      },
      "source": [
        "We will use [`scipy.stats.spearmanr`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html) to calculate the Spearman correlation coefficient with associated p-value."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8zFVX3cciJHJ",
        "outputId": "439d9972-e141-4cf0-a2f2-b9636e406256"
      },
      "source": [
        "# spearmanr returns the rank correlation coefficient and \n",
        "# the 2-tailed p-value\n",
        "stats.spearmanr(data.mheight, data.Length)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "SpearmanrResult(correlation=0.4553274463786581, pvalue=0.002446213023703486)"
            ]
          },
          "execution_count": 197,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iAJC9OmpiJHJ"
      },
      "source": [
        "**Practice for you**. Calculate the Spearman's R for the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cAbe0Z8riJHJ"
      },
      "source": [
        "# Your code goes here!:\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0zu7FIC_iJHJ"
      },
      "source": [
        "### 3.4 Theory: Rank-transformation\n",
        "`scipy.stats.rankdata` simply takes an array of numbers and \"replaces\" them with the integers of their rank (1st smallest, 2nd smallest, 3rd smallest, etc.). `pd.DataFrame.rank` performs a similar function, but with support for `pandas.DataFrames`. So the result of the rank-transformation `scipy.stats.rankdata([3.6, 3.4, -5.0, 8.2])` is `[3, 2, 1, 4]`. See that in the figure above?\n",
        "\n",
        "A _signed_ rank is the same, just where we rank according to absolute size first and then add in the sign second. So the signed rank here would be `[2, 1, -3, 4]`. Or in code:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R0nXu8ziiJHK"
      },
      "source": [
        "def signed_rank(df):\n",
        "    return np.sign(df) * df.abs().rank()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bxTLVV_OiJHK"
      },
      "source": [
        "Ranks are all you need to do to convert most parametric tests into their \"non-parametric\" counterparts! "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7D5qLFEu6DFF"
      },
      "source": [
        "#### Quick Aside:  Can we look at all the quantitative data at once?\n",
        "`sns.pairplot()` from seaborn combines joint and marginal views — but rather than focusing on a single relationship, it visualizes every pairwise combination of variables simultaneously:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "I9elJgoZ6DFG",
        "scrolled": true
      },
      "source": [
        "# Uncomment and run to see a matrix of all relationships, color-coded by smoker / non-smoker\n",
        "# sns.pairplot(data=data, hue=\"smoker\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GsgaT33miJHK"
      },
      "source": [
        "\n",
        "***\n",
        "\n",
        "<a name=\"anchor4\"></a>\n",
        "## 4 One mean comparisons\n",
        "\n",
        "### 4.1 One sample t-test and Wilcoxon signed-rank\n",
        "\n",
        "The T-test and Wilcoxon signed-rank test calculate probability for the mean of ONE group of scores.  This is a test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.\n",
        "\n",
        "\n",
        "#### 4.1.1 Theory: As linear models\n",
        "\n",
        "**t-test** model: A single number predicts $y$.\n",
        "\n",
        "$y = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$\n",
        "\n",
        "In other words, it's our good old $y = \\beta_0 + \\beta_1*x$ where the last term is gone since there is no $x$ (essentially $x=0$).\n",
        "\n",
        "The same is to a very close approximately true for **Wilcoxon signed-rank test**, just with the [signed ranks](#3.0.2-Theory:-rank-transformation) of $y$ instead of $y$ itself.\n",
        "\n",
        "$\\text{signed_rank}(y) = \\beta_0$\n",
        "\n",
        "[This approximation is good enough when the sample size is larger than 14 and almost perfect if the sample size is larger than 50](https://lindeloev.github.io/tests-as-linear/simulations/simulate_wilcoxon.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XQICc_n-iJHL"
      },
      "source": [
        "We will use [`scipy.stats.ttest_1samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html) for the parametric comparison test, and [`scipy.stats.wilcoxon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html) for the non-parametric comparison test."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GqPm4KIiiJHL"
      },
      "source": [
        "#### 4.1.2 Python code: One-sample t-test\n",
        "\n",
        "Try running the Python code below and see that the linear model (`smf.ols`) produces the same $t$, $p$, and $r$ as `scipy.stats.ttest_1samp`. \n",
        "\n",
        "First, we extract the subsets of data for smokers and non-smokers."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xkO3BuMTiJHL"
      },
      "source": [
        "# Average of non-smoker mother birthweights\n",
        "data_nonsmokers = data[data.smoker==0]\n",
        "data_smokers = data[data.smoker==1]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bp4Q7h9yiJHL"
      },
      "source": [
        "Next, we will calculate the average birthweight for each group."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cdA_tFuBiJHL",
        "outputId": "4cb5ba14-291f-45ec-81d8-b82deba1d49a"
      },
      "source": [
        "avg_birthwgt_nonsmoker = statistics.mean(data_nonsmokers.Birthweight)\n",
        "avg_birthwgt_smoker = statistics.mean(data_smokers.Birthweight)\n",
        "\n",
        "print(avg_birthwgt_nonsmoker, avg_birthwgt_smoker)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "3.5095 3.1340909090909093\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TNrsozSFiJHL"
      },
      "source": [
        "Let's compare the data for smoker's to the average birthweight from a non-smoking mother."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CmjtD9vFiJHM",
        "outputId": "f9eda4c1-0df6-4163-8b38-b375fbce4b37"
      },
      "source": [
        "t, p = stats.ttest_1samp(data_smokers.Birthweight, avg_birthwgt_nonsmoker)\n",
        "print(t,p)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-2.7894379594885215 0.010986630284506064\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6yd64v0biJHM"
      },
      "source": [
        "and the equivalent linear model with an intercept-only is fit with `smf.ols`, the tool for ordinary least squares linear regression.  If we want to make an equivalent comparison, we will shift the birthweight data by the average birthweight for non-smokers, and then see what the linear model looks like!"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qf4E1ik_iJHM",
        "outputId": "5d65f987-ee0d-4794-dc72-907d7c16901b"
      },
      "source": [
        "# We will shift the birthweight data by the average for non-smoker\n",
        "# and store the result in a new column, Birthweight_adj\n",
        "data[\"Birthweight_adj\"] = data[\"Birthweight\"]-avg_birthwgt_nonsmoker\n",
        "data_smokers.head()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ID</th>\n",
              "      <th>Length</th>\n",
              "      <th>Birthweight</th>\n",
              "      <th>Headcirc</th>\n",
              "      <th>Gestation</th>\n",
              "      <th>smoker</th>\n",
              "      <th>mage</th>\n",
              "      <th>mnocig</th>\n",
              "      <th>mheight</th>\n",
              "      <th>mppwt</th>\n",
              "      <th>fage</th>\n",
              "      <th>fedyrs</th>\n",
              "      <th>fnocig</th>\n",
              "      <th>fheight</th>\n",
              "      <th>lowbwt</th>\n",
              "      <th>mage35</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>792</td>\n",
              "      <td>53</td>\n",
              "      <td>3.64</td>\n",
              "      <td>38</td>\n",
              "      <td>40</td>\n",
              "      <td>1</td>\n",
              "      <td>20</td>\n",
              "      <td>2</td>\n",
              "      <td>170</td>\n",
              "      <td>59</td>\n",
              "      <td>24</td>\n",
              "      <td>12</td>\n",
              "      <td>12</td>\n",
              "      <td>185</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21</th>\n",
              "      <td>1388</td>\n",
              "      <td>51</td>\n",
              "      <td>3.14</td>\n",
              "      <td>33</td>\n",
              "      <td>41</td>\n",
              "      <td>1</td>\n",
              "      <td>22</td>\n",
              "      <td>7</td>\n",
              "      <td>160</td>\n",
              "      <td>53</td>\n",
              "      <td>24</td>\n",
              "      <td>16</td>\n",
              "      <td>12</td>\n",
              "      <td>176</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>575</td>\n",
              "      <td>50</td>\n",
              "      <td>2.78</td>\n",
              "      <td>30</td>\n",
              "      <td>37</td>\n",
              "      <td>1</td>\n",
              "      <td>19</td>\n",
              "      <td>7</td>\n",
              "      <td>165</td>\n",
              "      <td>60</td>\n",
              "      <td>20</td>\n",
              "      <td>14</td>\n",
              "      <td>0</td>\n",
              "      <td>183</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23</th>\n",
              "      <td>569</td>\n",
              "      <td>50</td>\n",
              "      <td>2.51</td>\n",
              "      <td>35</td>\n",
              "      <td>39</td>\n",
              "      <td>1</td>\n",
              "      <td>22</td>\n",
              "      <td>7</td>\n",
              "      <td>159</td>\n",
              "      <td>52</td>\n",
              "      <td>23</td>\n",
              "      <td>14</td>\n",
              "      <td>25</td>\n",
              "      <td>200</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>24</th>\n",
              "      <td>1363</td>\n",
              "      <td>48</td>\n",
              "      <td>2.37</td>\n",
              "      <td>30</td>\n",
              "      <td>37</td>\n",
              "      <td>1</td>\n",
              "      <td>20</td>\n",
              "      <td>7</td>\n",
              "      <td>163</td>\n",
              "      <td>47</td>\n",
              "      <td>20</td>\n",
              "      <td>10</td>\n",
              "      <td>35</td>\n",
              "      <td>185</td>\n",
              "      <td>1</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "      ID  Length  Birthweight  Headcirc  Gestation  smoker  mage  mnocig  \\\n",
              "20   792      53         3.64        38         40       1    20       2   \n",
              "21  1388      51         3.14        33         41       1    22       7   \n",
              "22   575      50         2.78        30         37       1    19       7   \n",
              "23   569      50         2.51        35         39       1    22       7   \n",
              "24  1363      48         2.37        30         37       1    20       7   \n",
              "\n",
              "    mheight  mppwt  fage  fedyrs  fnocig  fheight  lowbwt  mage35  \n",
              "20      170     59    24      12      12      185       0       0  \n",
              "21      160     53    24      16      12      176       0       0  \n",
              "22      165     60    20      14       0      183       0       0  \n",
              "23      159     52    23      14      25      200       1       0  \n",
              "24      163     47    20      10      35      185       1       0  "
            ]
          },
          "execution_count": 204,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xafESP5UiJHM",
        "outputId": "3d51ab2e-11f6-49c7-8e11-a1cb55e24b50"
      },
      "source": [
        "# Equivalent linear model: intercept-only\n",
        "res = smf.ols(formula=\"Birthweight_adj ~ 1\", data=data[data.smoker==1]).fit() \n",
        "print(res.summary())"
      ],
      "execution_count": null,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "                            OLS Regression Results                            \n",
            "==============================================================================\n",
            "Dep. Variable:        Birthweight_adj   R-squared:                      -0.000\n",
            "Model:                            OLS   Adj. R-squared:                 -0.000\n",
            "Method:                 Least Squares   F-statistic:                       nan\n",
            "Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan\n",
            "Time:                        13:25:46   Log-Likelihood:                -20.584\n",
            "No. Observations:                  22   AIC:                             43.17\n",
            "Df Residuals:                      21   BIC:                             44.26\n",
            "Df Model:                           0                                         \n",
            "Covariance Type:            nonrobust                                         \n",
            "==============================================================================\n",
            "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
            "------------------------------------------------------------------------------\n",
            "Intercept     -0.3754      0.135     -2.789      0.011      -0.655      -0.096\n",
            "==============================================================================\n",
            "Omnibus:                        0.182   Durbin-Watson:                   1.305\n",
            "Prob(Omnibus):                  0.913   Jarque-Bera (JB):                0.016\n",
            "Skew:                           0.021   Prob(JB):                        0.992\n",
            "Kurtosis:                       2.876   Cond. No.                         1.00\n",
            "==============================================================================\n",
            "\n",
            "Notes:\n",
            "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y0gezJ9CiJHM"
      },
      "source": [
        "Notice that the `t` statistics and `p` values for the linear model and t-test the same!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PTfWUsbMiJHN"
      },
      "source": [
        "#### 4.1.2 Python code: One-sample  Wilcoxon signed-rank test\n",
        "We use [`scipy.stats.wilcoxon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html) for the non-parametric comparison test and compare it to the $p$-value for a ranked data linear model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Gs2oUva1iJHN",
        "outputId": "d979980e-28a9-426c-946e-0f553e5469c9"
      },
      "source": [
        "stats.wilcoxon(data[data.smoker==1].Birthweight_adj)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "WilcoxonResult(statistic=51.0, pvalue=0.01272726058959961)"
            ]
          },
          "execution_count": 206,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RMUu_TP2iJHN",
        "outputId": "79892ed4-1177-4a35-bc6a-736f2ddd2475"
      },
      "source": [
        "signed_rank_data = signed_rank(data[data.smoker==1])\n",
        "res = smf.ols(\"Birthweight_adj ~ 1\", data=signed_rank_data).fit()\n",
        "print(res.summary())"
      ],
      "execution_count": null,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "                            OLS Regression Results                            \n",
            "==============================================================================\n",
            "Dep. Variable:        Birthweight_adj   R-squared:                      -0.000\n",
            "Model:                            OLS   Adj. R-squared:                 -0.000\n",
            "Method:                 Least Squares   F-statistic:                       nan\n",
            "Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan\n",
            "Time:                        13:25:46   Log-Likelihood:                -84.360\n",
            "No. Observations:                  22   AIC:                             170.7\n",
            "Df Residuals:                      21   BIC:                             171.8\n",
            "Df Model:                           0                                         \n",
            "Covariance Type:            nonrobust                                         \n",
            "==============================================================================\n",
            "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
            "------------------------------------------------------------------------------\n",
            "Intercept     -6.8636      2.443     -2.809      0.011     -11.945      -1.782\n",
            "==============================================================================\n",
            "Omnibus:                        1.920   Durbin-Watson:                   1.250\n",
            "Prob(Omnibus):                  0.383   Jarque-Bera (JB):                1.649\n",
            "Skew:                           0.603   Prob(JB):                        0.438\n",
            "Kurtosis:                       2.411   Cond. No.                         1.00\n",
            "==============================================================================\n",
            "\n",
            "Notes:\n",
            "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hS6GYFA2iJHO"
      },
      "source": [
        "### 4.2 Paired samples t-test and Wilcoxon matched pairs\n",
        "\n",
        "A paired t-test or matched pairs test is used when we are interested in the difference between two variables for the same subject. Often the two variables might be measurements that are separated by time (e.g. cholesterol levels before and after a special diet). \n",
        "\n",
        "#### 4.2.1 Theory: As linear models\n",
        "\n",
        "**t-test** model: a single number (intercept) predicts the pairwise differences.\n",
        "\n",
        "$y_2-y_1 = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$\n",
        "\n",
        "This means that there is just one $y = y_2 - y_1$ to predict and it becomes a [one-sample t-test](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank) on the pairwise differences. The idea is therefore also the same as for the one-sample t-test. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JFX2CTjYiJHO"
      },
      "source": [
        "Similarly, the **Wilcoxon matched pairs** only differ from **Wilcoxon signed-rank** in that it's testing the signed ranks of the pairwise $y_2-y_1$ differences.\n",
        "\n",
        "$\\text{signed_rank}(y_2-y_1) = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uTK0_2d_iJHO"
      },
      "source": [
        "**Let's load some data with matched pairs** from a study tested whether cholesterol was reduced after using a certain brand of margarine as part of a low fat, low cholesterol diet. \n",
        "\n",
        "The subjects consumed on average 2.31g of the active ingredient, stanol easter, a day. This data set contains information on 18 people using margarine to reduce cholesterol over three time points."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "23otu9QniJHO",
        "outputId": "8bc9c8e8-4c09-455a-dd34-1f4967824223"
      },
      "source": [
        "url = \"https://raw.githubusercontent.com/DeisData/datasets/main/Cholesterol_R.csv\"\n",
        "data_matched = pd.read_csv(url)\n",
        "data_matched.head()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ID</th>\n",
              "      <th>Before</th>\n",
              "      <th>After4weeks</th>\n",
              "      <th>After8weeks</th>\n",
              "      <th>Margarine</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>6.42</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.75</td>\n",
              "      <td>B</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>6.76</td>\n",
              "      <td>6.20</td>\n",
              "      <td>6.13</td>\n",
              "      <td>A</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>6.56</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.71</td>\n",
              "      <td>B</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>4.80</td>\n",
              "      <td>4.27</td>\n",
              "      <td>4.15</td>\n",
              "      <td>A</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>8.43</td>\n",
              "      <td>7.71</td>\n",
              "      <td>7.67</td>\n",
              "      <td>B</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   ID  Before  After4weeks  After8weeks Margarine\n",
              "0   1    6.42         5.83         5.75         B\n",
              "1   2    6.76         6.20         6.13         A\n",
              "2   3    6.56         5.83         5.71         B\n",
              "3   4    4.80         4.27         4.15         A\n",
              "4   5    8.43         7.71         7.67         B"
            ]
          },
          "execution_count": 208,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oFlU3MbfiJHO",
        "outputId": "abae48cd-bf92-4a6c-9f33-51e5cbda42ab"
      },
      "source": [
        "# Plot the data for each ID, Before and After 8 weeks\n",
        "fig, ax = plt.subplots()\n",
        "sns.scatterplot(x='ID',y='Before',data=data_matched, ax=ax)\n",
        "sns.scatterplot(x='ID',y='After8weeks',data=data_matched, ax=ax)\n",
        "ax.set_ylabel('Cholesterol Before and After 8 wks')\n",
        "ax.set_xlabel('Patient ID')"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/plain": [
              "Text(0.5, 0, 'Patient ID')"
            ]
          },
          "execution_count": 209,
          "metadata": {},
          "output_type": "execute_result"
        },
        {
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAfj0lEQVR4nO3de5wcdZnv8c93YNiEXGBzIUEFh2SRLPFwHV1A4sKiLCCoeJADqyuiR+Csa0SOrri64tnz2nOOnpVVXBXiHXFRQKPgCouLqPGGTkJEEFxMCMolV4SEkJGQfvaPqk46w3RPTU93dXX19/16zau7q7uqn1Rqnqn61fP7/RQRmJlZ+fR1OgAzM2sPJ3gzs5JygjczKykneDOzknKCNzMrqT07HUCtWbNmxcDAQKfDMDPrGsuXL98YEbNHe69QCX5gYIChoaFOh2Fm1jUkPVjvPTfRmJmVlBO8mVlJOcGbmZWUE7yZWUk5wZuZlVShqmistSqVYM2mrazbPMyc6ZMYmDmFvj51Oiwzy4kTfElVKsEt96zlkutWMry9wqT+Pi4/+whOWTjXSd6sR7iJpqTWbNq6M7kDDG+vcMl1K1mzaWuHIzOzvDjBl9S6zcM7k3vV8PYK67cMdygiM8ubE3xJzZk+iUn9u//3TurvY79pkzoUkZnlzQm+pAZmTuHys4/YmeSrbfADM6d0ODIzy4tvspZUX584ZeFcFixexPotw+w3zVU0Zr3GCb7E+vrEvNlTmTd7aqdDMbMOcBONmVlJOcGbmZWUE7yZWUk5wZuZlZQTvJlZSTnBm5mVlBO8mVlJOcGbmZWUOzpZbjw+vVm+xkzwkuYDD0XE7yWdABwGXB0Rj7c3NCsTj09vlr8sTTRfBXZI+iPgM8BBwL9k2bikd0i6R9Ldkq6V5KEMe5THpzfLX5YEX4mIZ4AzgY9ExDuA/cdaSdJzgcXAYES8ENgDOGciwVr38vj0ZvnLkuC3SzoXOA/4ZrqsP+P29wQmS9oT2Bt4ZPwhWhl4fHqz/GVJ8OcDxwL/EBEPSDoIuGaslSLiYeAfgd8AjwJPRMStIz8n6QJJQ5KGNmzYML7orWt4fHqz/CkiGn9AOjoilo9YdkZE3DTGen9I0n7/34DHgeuBGyKi7h+HwcHBGBoayhi6dZtqFY3HpzdrHUnLI2JwtPeynMF/StJ/qdnYucD7Mqz3MuCBiNgQEduBrwHHZQnYyqk6Pv0x82Yxb/ZUJ3ezNstSB38WcIOk1wHHA28ATs6w3m+AYyTtDWwDTgJ8em5mlpMxE3xErJZ0DvB14LfAyRGxLcN6d0i6AVgBPAPcCSyZWLhmZpZV3QQv6RdAbQP9DJJSxzskERGHjbXxiLgMuGzCUZqZ2bg1OoM/PbcozMys5eom+Ih4EEDS3wPLgB9FhLsdmpl1iSxVNGuAc4EhST+V9GFJr2pvWGZmNlFjJviI+GxEvAk4kaSD02vJ0NHJzMw6K8tokp8GDgXWkTTVnEVSGWNmZgWWpQ5+Jkn1zOPAY8DGdPAxK7pKBR5bBVvWwrS5MGM+9HmOF8uX5wHonCx18GcCSPpj4M+B2yXtERHPa3dwNgGVCtx3Eyy9ELZvg/7JcOZVsOAMJ3nLjecB6Kwxf9MlnS7pg8BngYuA7wDvb3dgNkGPrdqV3CF5XHphstwsJ54HoLOyNNGcCnwf+GhEeLjfbrFl7a7kXrV9Gzy5FmYd3JmYrOc0mgdg3uypHYqqd2RponlrHoFYi02bmzTL1Cb5/skwdW7nYrKeU50HoDbJex6A/LgxtqxmzE/a3PsnJ6+rbfAz5nc2Luspngegs8YcDz5PHg++xapVNE+uTc7cXUVjHeB5ANqr0XjwWdrgrVv19SXt7RNsc3eZm01EdR4At7nnr+7pnKRBSbdLukbSAZK+LekJST+TdGSeQVrnVMvcTrtiGed+6g5Ou2IZt9yzlkqlOFd+Zja6RtfrnwA+BPwr8CPgqojYB7g0fc96gMvczLpXowTfHxE3R8S1QETEDSRPbgN8C7xHNCpzM7Nia5TghyWdLOm1QEh6NYCkPwV25BGcdV61zK2Wy9zMukOjBH8R8D+BN5EMUXCipMdJmmcWtz80KwKXuZl1L5dJ2phc5mad5kqu+lwmaRPiMjfrJA9Y1jz3ejGzQnMlV/MaJnhJfZKOyysYM7ORXMnVvIYJPiIqwIdzisXM7FlcydW8LE00t0r6r5Lc2GVmuXMlV/Oy3GS9BJgC7JC0DRBJx6fpbY3MzIzkJv8pC+eyYPEiV3KNU5bx4KflEYiZWT2u5GpOlin7JOn1kv4ufX2ApBe3PzQzM5uILG3wnwCOBf4iff0k8PG2RWRmZi2RpQ3+TyLiKEl3AkTE7yTt1ea4rIyqE5BsWZtMKegJSMzaKkuC3y5pDyAAJM0GKo1XMRuhUoH7boKlFybzxFanEFxwhpO8WZtk+c26AlgK7CfpH4AfAP+3rVFZsVQqsPF+eGBZ8lhp4u/7Y6t2JXdIHpdemCw3s7bIUkXzJUnLgZNISiRfHRH3tj0yK4ZWnXlvWbsruVdt35bMFzvBKQXNbHRZqmi+GBH3RcTHI+KfI+JeSV/MsN4hklbW/GyWdHFLorb8tOrMe9rc5I9Drf7JyWTgZtYWWU7BFta+SNvjjx5rpYj4VUQcERFHpJ9/iqSpx7pJozPv8ZgxPznzryb56pXAjPmtidPMnqVuE42k9wB/C0yWtLm6GHgaWDLO7zkJWBURDzYVpXVO9cy7Nsk3c+bd15c061x4aPLHYaqraHqGq6c6plEb/K8jYpqk6yLi7Al+zznAtRPchnVC9cx7ZBt8M2fefX1Je7vb3HuHq6c6qu6MTpJWpPXvKyLiqKa/IKmZfwRYGBHrRnn/AuACgAMPPPDoBx/0SX7hVM/AfOZt47Xxfrhq0bOvAC9c5j/0LdLsjE6bJN0OHCTpxpFvRsQrM37/qcCK0ZJ7up0lpE0+g4ODxZk/0Hbxmbc1y9VTHdUowb8COAr4Is8eE348ifhc3Dxj1ptadQ/HmlL3Ojsino6InwDHRcT3IuJ7wBBwAPDuLBuXtDfwcuBrrQjWzLpMwaqnKpVg9YYn+fGqjaze8CSVSrkbDbIMVfCEpFeTDDZ2CvBV4MosG4+Ip4CZTUdnZt2tVdVTLajE6cXJuxuVSb6cpHnlz4HbSZpqXhwR5+cUm5mVwUTv4bSoEqfe5N0LFi8q7TjzjfbOvwHzgeMj4vURcRMlHmSs1y7drHh8DNbRot7UvTh5d6MmmqNJ6tf/XdJq4MvAHrlElbNevHSzYvEx2ECLKnGqk3fXJvmyT97d6CbrnRHx7oiYD3wAOBLYS9LNae16adS7dFuzaWuHI7Ne4WOwgRaNY9SLk3dnuclKRPwQ+KGkxSRVMecw/uEKCqvRpVtZ2+asWHwMNtCi3tS9OHl3pgRfFREVkrb5f2tPOJ3Ri5duViw+Bhto4ThGvTZ5t/ub05uXblYsPgbHUK3EGViUPHqojEzqjkXTCYODgzE0NNSR765UgjWbtvbMpZsVj49Ba0ZTY9FImtFooxHx2EQDK5Jeu3Sz4vExaK3WqA1+OcmYMwIOBH6XPt8X+A1wULuDMzOz5tVN8BFxEICkK4EbI+Jb6etTgZflE173qV5mr9s8zJzpvsw2s87JUkXzooi4qPoiIm6W9L/bGFPXcmcVMyuSLLeiN0p6n6QBSc+X9F5gU7sD60burGJmRZIlwZ8LzCaZMPvrwH7pMhuhF8e6MLPiGrOJJq2WeXsOsXQ9d1YxsyIZ8wxe0gskLZF0q6TvVH/yCK7buLOKFYFHpbSqLDdZryeZ4OPTwI72hjN+Rapa6cWxLqxYfKPfamVJ8M9ExCfbHkkTingwu7OKdVIvTmph9WW5yXqTpL+StL+kGdWftkeWgatWzHbnG/1WK8sZ/Hnp47tqlgUwr/XhjI+HWDXbnW/0W60xz+Aj4qBRfjqe3GHXwVzLB7P1Mt/ot1qZxoOX9ELgUGBn5oyIq9sVVFbVg3lkG7wPZutVvtE/hkolmct1y9pkpqgmx5XvFmMOFyzpMuAEkgT/LeBU4AcRcVarg2lmuGAPsWpmmVQqcN9Nz54ZasEZXZ3kGw0XnOVfdRZwErA2Is4HDgf+oIXxTUi1auWYebOYN3uqk7uZje6xVbuSOySPSy9MlpdUlgS/LZ2q7xlJ04H1FOAGq5nZuGxZuyu5V23flkwDWFJZ2uCHJO0LfIpkjPgngZ+2Mygzs5abNjdplqlN8v2TkzleSypLFc1fRcTjEXEl8HLgvLSpxsxKrHRDHsyYn7S5909OXlfb4GfM72xcbZSpiqYqIta0KQ4zK5Ai9hKfsL6+5IbqhYcmzTJTy19FU95/2XhVKrDxfnhgWfJYqYy9jllJlbaXeF8fzDoYBhYljyVO7jDOM/jSKmn5lFmz3Eu8HDJlL0nHSzo/fT5bUrkm3O7B8imzRtxLvByyjAd/GfBu4D3pon7gmnYGlbseLJ8ya8RDHpRDliaaM4EjgRUAEfGIpGltjSpvPVg+ZdaIhzwohyxNNE9HMp5BAEjK/Cdc0r6SbpB0n6R7JR3bbKBt1YPlU2ZjcS/x7pflDP46SVcB+0p6C/Amkk5PWXwUuCUizpK0F7B3k3G2Vw+WT5lZ+TVM8JIEfAVYAGwGDgHeHxHfHmvD6bAGLwXeCBARTwNPTzDe9qmWT806uNORWK/qsZEOrf0aJviICElfj4ijgTGT+gjzgA3A5yQdTjLMwdsjYrdCWkkXABcAHHjggeP8CrOScKmutUGWI+cnkl7UxLb3BI4CPhkRRwJbgUtHfigilkTEYEQMzp49u4mvMSsBl+paG2RJ8CeSJPlVku6S9AtJd2VY7yHgoYi4I319A0nCN7ORXKprbZDlJuupzWw4ItZK+q2kQyLiVyRjyv+ymW2ZlZ5Lda0Nsowm+SCwL3BG+rNvuiyLtwFfSs/4jwD+T3NhmpWcS3WtDcY8g5f0duAtwNfSRddIWhIRHxtr3YhYCYw6lZSZ1XCprrVBliaaNwN/Uq1+kfRB4MfAmAnerMiq8/mu2zzMnOkF6KnpUl1rsSwJXsCOmtc70mXF4Npha0Ipxzs3GyFLgv8ccIekpenrVwOfaVtE4+HaYWtSvfHOFyxe5OFwLTftvoqsm+AlHRQRD0TE5ZK+CxxPcuZ+fkTc2bIIJqJe7fCFh3buMtdXFF3B451bp+VxFdko89wAIOm2iFgREVdExEcLk9yheLXD1SuKqxbBF05PHu+7ybNDFZDHO7dOy2PWrEYJvi8dC/4Fki4Z+dOyCCaiWjtcq5O1w+6N2DU83rl1WqOryFZp1AZ/Dkl7+55AMcd/r9YOj2yD71TtcKMrCldGFEqpxzt3M2FXqF5F1ib5Vl9F1k3wae/TD0q6KyJubtk3tlLRaofdG7GrVMc7L1WbuwsPukb1KnJkG3wrryKVzOXR4APSHJIeqM+JiFMlHQocGxEtr6QZHByMoaGhVm82P/7lsk7beH9y72fkScaFy3wVWUDVKpqJXEVKWh4Ro3YozVIm+XmSUsn3pq//g2SM+GKUShZJ0a4orPe4mbCrtPsqMkvmmRUR1wEVgIh4ht07Plmtam/EgUXJo5O75alohQfWUVmyz1ZJM9k1J+sxwBNtjcrMmtPKQcsqlaTJ54FlyaPLfbtOliaaS4AbgfmSfgjMBs5qa1Rm1pxWNRP6flIpjJngI2KFpD8lmY9VwK8iYnvbIzOz5rRi0LIi9hK3cWv4p1jS8yXNStvdpwGnAK/IJTIz65yi9RK3pjQai+bvgDcCIenLwMuA7wKvkHRCRFycR4Bm1gHu01EKjZpozgX+GNgb+A0wNyKekrQnsDKH2MysU4rWS9ya0ijBD0fE08DTklZFxFOQlElKejqf8MysI9ynoxQaJfh9Jb2G5Mbq9PQ56et92h6ZmXWWZ5jqeo0S/PdIJtkG+H7N8+prs+7mQbms5BoNNnZ+noGY5cp13tYDfCRbb/LY/dYDnOCtN7nO23qAE7z1Jg/KZT2gUUen19R7DyAivtb6cMxy4jpv6wGNqmjOaPBeAE7w1r1c521F0OZKLlfRWO9ynbd1Ug6VXGNuRdI+ki6XNJT+fFiSOzqZWc+qVILVG57kx6s2snrDk1Qqjac+HVUOlVxZxoP/LHA3cHb6+i9JpvBr2EZvZlZGlUpwyz1rnzVZ9ikL545vPtUcplfMch0wPyIui4jV6c//Aua15NvNzLrMmk1bdyZ3gOHtFS65biVrNm0d34ZyqOTKkuC3STq++kLSS4BtDT5v1lYtuTw2a9K6zcM7k3vV8PYK67cMj29DrZxesY4sTTQXAVfXtLv/DjivZRGYjUPLLo/NmjRn+iQm9fftluQn9fex37RJ49tQDpVcY83otAfw+og4HDgMOCwijoyIu7JsXNIaSb+QtFLSUAvitR7XsstjsyYNzJzC5WcfwaT+JH1WTzIGZk4Z/8aqlVwDi5LHFpfpNjyDj4gdko5On29u8jtOjIiNTa5rtptGl8fzZk/tUFTWS/r6xCkL57Jg8SLWbxlmv2mTGJg5pZBXkFmaaO6UdCNwPbDzNMk9Wa0TWnZ5bDYBfX1i3uyphT+pyHI9MAPYBPwZSe/WM4DTM24/gFslLZd0wWgfkHRBtcZ+w4YNGTdrvaqll8dmJaeI9lUgSHpORDwiaT/g28DbIqLuZCGDg4MxNOSmemusUgnWbNpa+MtjszxIWh4Rg6O9l6Un6wsk3Sbp7vT1YZLel+WLI+KR9HE9sBR4cfawzUZXvTw+Zt4s5s2e6uRuVkeWJppPAe8BtgOkFTTnjLWSpCmSplWfAyeT9Ig1M7McZLnJundE/FTa7SzpmQzrzQGWpuvtCfxLRNwy/hDNzKwZWRL8RknzSW6YIuks4NGxVoqI1cDhEwvPzMyalSXBvxVYAiyQ9DDwAPC6tkZlO28krts8zJzpvpFoZuOXJcFHRLwsbUfvi4gtkg5qd2C9zN3xzawVstxk/SpARGyNiC3pshvaF5K5O76ZtUKjOVkXAAuBfUbMzzodcLfBNnJ3fDNrhUZNNIeQ9Fjdl93nZ90CvKWNMfU8d8c3s1ZoNCfrN4BvSDo2In6cY0w9r9odf2QbvLvjp9o8UbFZWWS5yXqmpHtIJvm4haT08eKIuKatkfWwbhqtLnc5TFRsVhZZfiNOTocKPh14CHgB8K62RmXujl9PDhMVm5VFlgTfnz6eBlwbEY+1MR6zxhpNVGxmu8mS4G+SdB8wCNwmaTYwzskHzVokh4mKzcpizAQfEZcCxwKDEbEdeAp4VbsDMxtVDhMVm5XFmDdZJe1NMlzBgcAFwHNISii/2d7QzEaRw0TFZmWRpYrmc8By4Lj09UMk0/c5wVtnVCcqnnVwpyMxK7Qspz3zI+JD7BoPfhvgkg4zs4LLcgb/tKTJ7BoueD7w+7ZGZWZWZF3S2S5Lgr+MpIPTAZK+BLwEeGM7gzIzK6wu6myXpYrm28BrSJL6tSTVNN9tb1hmZgXVRZ3tGo0medSIRdVZnA6UdGBErGhfWGZmBdWos13Bbvw3aqL5cIP3AvizFsdiZlZ81c52tUm+oJ3tGo0meWKegZiZdYVqZ7uRbfAF7GyXpaNTP/A/gJemi74LXJX2ajUz6y1d1NkuSxXNJ0kGHPtE+vov02X/vV1BmZkVWpd0tsuS4F8UEYfXvP6OpJ+3KyAzM2uNLNcUO9LOTQBImgfsaF9IZmbWClnO4N8F3C5pNckQBc8Hzm9rVGZmNmFjJviIuE3SwSQjSAq4LyI8VIGZWcFlOYMHOBoYSD9/uCQi4uq2RWVmZhOWpUzyi8B8YCW72t4DcIJvpy4ZzMjMiivLGfwgcGhERLuDsVQXDWZkZsWVJVvcDRSvD26ZddFgRmZWXI0GG7uJpClmGvBLST+lZhz4iHhl+8PrUV00mJGZFVejJpp/zC0K210XDWZkZsXVqInmYeCZiPhe7Q/JWf1D+YTXo6qDGfVPTl4XeDAjMyuuRmfwHwH+dpTlT6XvnZHlCyTtAQwBD0fE6eOMrzd10WBGZlZcjRL8QETcNXJhRAxJGhjHd7wduBeYPs7YeluXDGZkZsXV6JRwUoP3JmfZuKTnAa8APj2eoMzMbOIaJfifSXrLyIWS3gwsz7j9jwB/A1TqfUDSBZKGJA1t2LAh42bNzGwsjZpoLgaWSnoduxL6ILAXcOZYG5Z0OrA+IpZLOqHe5yJiCbAEYHBw0J2prOtUKsGaTVtZt3mYOdMnMTBzCn196nRYZg2n7FsHHCfpROCF6eJ/jYjvZNz2S4BXSjqNpLlnuqRrIuL1E4rYrEAqleCWe9ZyyXUrGd5eYVJ/H5effQSnLJzrJG8dpzxGIEjP4N85VhXN4OBgDA0NtT0es1ZZveFJTrtiGcPbd7VCTurv41uLFzFv9tQORma9QtLyiBgc7T3X3ZlNwLrNw7sld4Dh7RXWbxnuUERmu+SS4CPiu66BtzKaM30Sk/p3/zWa1N/HftMaFaGZ5cNn8GYTMDBzCpeffcTOJF9tgx+YOaXDkZlln/DDzEbR1ydOWTiXBYsXsX7LMPtNcxWNFYcTvNkE9fWJebOn+qaqFY6baMzMSsoJ3syspNxEYzZRnj/XCsoJ3mwiPH+uFZiPQLOJ8Py5VmBO8GYT0Wj+XLMOc4I3m4jq/Lm1PH+uFYQTvNlEeP5cKzDfZDWbCM+fawXmBG82UZ4/1wrKpxlmZiXlBG9mVlJO8GZmJeUEb2ZWUk7wZmYllcuk21lJ2gA82Ok4MpgFbOx0EOPQbfGCY85Lt8XcbfFC+2N+fkTMHu2NQiX4biFpqN4s5kXUbfGCY85Lt8XcbfFCZ2N2E42ZWUk5wZuZlZQTfHOWdDqAceq2eMEx56XbYu62eKGDMbsN3syspHwGb2ZWUk7wZmYl5QQ/CkkHSLpd0r2S7pH09lE+c4KkJyStTH/e34lYR8S0RtIv0niGRnlfkq6Q9GtJd0k6qhNx1sRzSM3+Wylps6SLR3ym4/tZ0mclrZd0d82yGZK+Len+9PEP66x7iqRfpfv80g7H/P8l3Zf+3y+VtG+ddRseRznG+wFJD9f8359WZ90i7eOv1MS7RtLKOuvms48jwj8jfoD9gaPS59OA/wAOHfGZE4BvdjrWETGtAWY1eP804GZAwDHAHZ2OuSa2PYC1JJ02CrWfgZcCRwF31yz7EHBp+vxS4IN1/k2rgHnAXsDPRx5HOcd8MrBn+vyDo8Wc5TjKMd4PAO/McNwUZh+PeP/DwPs7uY99Bj+KiHg0Ilakz7cA9wLP7WxULfEq4OpI/ATYV9L+nQ4qdRKwKiIK15M5Ir4PPDZi8auAL6TPvwC8epRVXwz8OiJWR8TTwJfT9dputJgj4taIeCZ9+RPgeXnEkkWdfZxFofZxlSQBZwPX5hFLPU7wY5A0ABwJ3DHK28dK+rmkmyUtzDeyUQVwq6Tlki4Y5f3nAr+tef0QxfnDdQ71fxmKtp8B5kTEo5CcEAD7jfKZIu/vN5FczY1mrOMoT3+dNil9tk4zWFH38SJgXUTcX+f9XPaxE3wDkqYCXwUujojNI95eQdKccDjwMeDrOYc3mpdExFHAqcBbJb10xPsaZZ2O18lK2gt4JXD9KG8XcT9nVdT9/V7gGeBLdT4y1nGUl08C84EjgEdJmjxGKuQ+Bs6l8dl7LvvYCb4OSf0kyf1LEfG1ke9HxOaIeDJ9/i2gX9KsnMMcGdMj6eN6YCnJ5Wuth4ADal4/D3gkn+gaOhVYERHrRr5RxP2cWldt3kof14/ymcLtb0nnAacDr4u0MXikDMdRLiJiXUTsiIgK8Kk6cRRxH+8JvAb4Sr3P5LWPneBHkbaffQa4NyIur/OZuennkPRikn25Kb8onxXPFEnTqs9JbqjdPeJjNwJvSKtpjgGeqDYzdFjds52i7ecaNwLnpc/PA74xymd+Bhws6aD0KuWcdL2OkHQK8G7glRHxVJ3PZDmOcjHi/tCZdeIo1D5OvQy4LyIeGu3NXPdxHnebu+0HOJ7kMu8uYGX6cxpwEXBR+pm/Bu4huWv/E+C4Dsc8L43l52lc702X18Ys4OMkVQe/AAYLsK/3JknY+9QsK9R+Jvnj8yiwneSM8c3ATOA24P70cUb62ecA36pZ9zSSKqxV1f+TDsb8a5L26uoxfeXImOsdRx2K94vpcXoXSdLev+j7OF3++erxW/PZjuxjD1VgZlZSbqIxMyspJ3gzs5JygjczKykneDOzknKCNzMrKSd4Kw1JO9LR+e6WdL2kvRt89gRJx9W8vkjSG5r83gFJf9HgvbtrvvMJSXemox9+X9LpzXynWRZO8FYm2yLiiIh4IfA0ST19PScAOxN8RFwZEVc3+b0DwKgJfhTLIuLIiDgEWAz8s6STmvxes4ac4K2slgF/JOkMSXekZ83/LmlOOoDcRcA70jP+RenY4+8EkDRf0i3pQFDLJC1Il39eyXj6P5K0WtJZ6Xf9P2BRuq13ZA0wIlYCf0/Smcus5ZzgrXTSsUBOJekF+QPgmIg4kmQo2b+JiDXAlcA/pWf8y0ZsYgnwtog4Gngn8Ima9/Yn6el8Oklih2Q8+GXptv5pnOGuABaMcx2zTPbsdABmLTS5ZgadZSTjCR0CfCUd12Qv4IFGG0hHED0OuD4dAgfgD2o+8vVIBr/6paQ5LYh5tNEQzVrCCd7KZFtEHFG7QNLHgMsj4kZJJ5DMEtRIH/D4yO3U+H3t5puKcndHkkwoY9ZybqKxstsHeDh9fl7N8i0k0zHuJpJx/x+Q9FrYOY/t4WN8x6jbGoukw4C/IxkAzqzlnOCt7D5A0tyyDNhYs/wm4MzqTdYR67wOeLOk6mh/Y00BdxfwTDrr1Fg3WRdVyyRJEvviiLgt6z/GbDw8mqSZWUn5DN7MrKSc4M3MSsoJ3syspJzgzcxKygnezKyknODNzErKCd7MrKT+E1rzXuDMRUuTAAAAAElFTkSuQmCC\n",
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ]
          },
          "metadata": {
            "needs_background": "light"
          },
          "output_type": "display_data"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TXrFCFHciJHP"
      },
      "source": [
        "#### 4.2.2 Python code: Paired sample t-test"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BoGHIfLpiJHP",
        "outputId": "3c998d83-fc24-4011-8aa5-9ad67b93b37f"
      },
      "source": [
        "t, p = scipy.stats.ttest_rel(data_matched.After8weeks, data_matched.Before)\n",
        "print(t,p)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-14.945967764585072 3.27857534017563e-11\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "n2DBIV2NiJHP",
        "outputId": "4f0e3aa1-5fd5-4249-bf5c-6589ca2e2fab"
      },
      "source": [
        "# Let's add a column to our dataFrame subtracting the difference in cholesterol measures\n",
        "data_matched[\"cholesterol_diff\"] = data_matched.After8weeks - data_matched.Before\n",
        "data_matched.head()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ID</th>\n",
              "      <th>Before</th>\n",
              "      <th>After4weeks</th>\n",
              "      <th>After8weeks</th>\n",
              "      <th>Margarine</th>\n",
              "      <th>cholesterol_diff</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>6.42</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.75</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.67</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>6.76</td>\n",
              "      <td>6.20</td>\n",
              "      <td>6.13</td>\n",
              "      <td>A</td>\n",
              "      <td>-0.63</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>6.56</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.71</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.85</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>4.80</td>\n",
              "      <td>4.27</td>\n",
              "      <td>4.15</td>\n",
              "      <td>A</td>\n",
              "      <td>-0.65</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>8.43</td>\n",
              "      <td>7.71</td>\n",
              "      <td>7.67</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.76</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   ID  Before  After4weeks  After8weeks Margarine  cholesterol_diff\n",
              "0   1    6.42         5.83         5.75         B             -0.67\n",
              "1   2    6.76         6.20         6.13         A             -0.63\n",
              "2   3    6.56         5.83         5.71         B             -0.85\n",
              "3   4    4.80         4.27         4.15         A             -0.65\n",
              "4   5    8.43         7.71         7.67         B             -0.76"
            ]
          },
          "execution_count": 211,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8BMF1kLgiJHP",
        "outputId": "1d9277bf-e331-4220-b9e4-edd9b366eab7"
      },
      "source": [
        "# The ttest_rel is similar to calculating the difference in y values for each match, and \n",
        "# fitting a linear model\n",
        "res = smf.ols(formula=\"cholesterol_diff ~ 1\", data=data_matched).fit()\n",
        "print(res.summary())"
      ],
      "execution_count": null,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "                            OLS Regression Results                            \n",
            "==============================================================================\n",
            "Dep. Variable:       cholesterol_diff   R-squared:                       0.000\n",
            "Model:                            OLS   Adj. R-squared:                  0.000\n",
            "Method:                 Least Squares   F-statistic:                       nan\n",
            "Date:                Thu, 26 Aug 2021   Prob (F-statistic):                nan\n",
            "Time:                        13:25:47   Log-Likelihood:                 5.9885\n",
            "No. Observations:                  18   AIC:                            -9.977\n",
            "Df Residuals:                      17   BIC:                            -9.087\n",
            "Df Model:                           0                                         \n",
            "Covariance Type:            nonrobust                                         \n",
            "==============================================================================\n",
            "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
            "------------------------------------------------------------------------------\n",
            "Intercept     -0.6289      0.042    -14.946      0.000      -0.718      -0.540\n",
            "==============================================================================\n",
            "Omnibus:                        0.404   Durbin-Watson:                   2.273\n",
            "Prob(Omnibus):                  0.817   Jarque-Bera (JB):                0.371\n",
            "Skew:                           0.291   Prob(JB):                        0.831\n",
            "Kurtosis:                       2.605   Cond. No.                         1.00\n",
            "==============================================================================\n",
            "\n",
            "Notes:\n",
            "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=18\n",
            "  warnings.warn(\"kurtosistest only valid for n>=20 ... continuing \"\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sLIScqiWiJHQ"
      },
      "source": [
        "You will notice the `t` statistics and `p` values are similar for the Paired T-test and a linear fit of the difference in the matched data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6HSCX4biiJHQ"
      },
      "source": [
        "#### 4.2.3 Python code: Wilcoxon matched pairs\n",
        "\n",
        "Again, we do the signed-ranks trick. This is still an approximation, but a close one:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "G8ZaBCfliJHQ",
        "outputId": "b322549b-266e-431f-94c7-3f40279cc884"
      },
      "source": [
        "stats.wilcoxon(data_matched.After8weeks, data_matched.Before)\n"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "WilcoxonResult(statistic=0.0, pvalue=7.62939453125e-06)"
            ]
          },
          "execution_count": 213,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CdHI7fOCiJHQ"
      },
      "source": [
        "**Practice for you.**  Compare the matched data after 4 weeks and after 8 weeks.  What are the results of the paired sample t-test? Of the Wilcoxon matched pairs test?"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8GFD3RuIiJHQ"
      },
      "source": [
        "# Your code goes here!:\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3jbgVe6TiJHQ"
      },
      "source": [
        "\n",
        "***\n",
        "\n",
        "<a name=\"anchor5\"></a>\n",
        "\n",
        "# 5 Many mean comparisons\n",
        "\n",
        "ANOVAs are linear models with categorical predictors.  The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c_7kMbIOiJHQ"
      },
      "source": [
        "### 5.1 Theory: As linear models\n",
        "\n",
        "Model: One mean for each group predicts $y$.\n",
        "\n",
        "$y = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\beta_3 x_3 +... \\qquad \\mathcal{H}_0: y = \\beta_0$\n",
        "\n",
        "where $x_i$ are indicators ($x=0$ or $x=1$) where at most one $x_i=1$ while all others are $x_i=0$. \n",
        "\n",
        "Notice how this is just \"more of the same\" of what we already did in other models above. When there are only two groups, this model is $y = \\beta_0 + \\beta_1*x$, i.e. the independent t-test. If there is only one group, it is $y = \\beta_0$, i.e. the one-sample t-test."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SHaL0Py26DFM"
      },
      "source": [
        "Since we now regress on more than one $x$, the one-way ANOVA is a **multiple regression** model.\n",
        "\n",
        "The **Kruskal-Wallis** test is simply a **one-way ANOVA** on the rank-transformed $y$ (`value`):\n",
        "\n",
        "$\\text{rank}(y) = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\beta_3 x_3 +...$\n",
        "\n",
        "This approximation is [good enough for 12 or more data points](https://lindeloev.github.io/tests-as-linear/simulations/simulate_kruskall.html). "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0f3mJJgI6DFM",
        "outputId": "577a822d-a725-4caa-9edc-00132e9272ce"
      },
      "source": [
        "### 5.2 Example data\n",
        "\n",
        "We make a two-level factor with the `Margarine` levels `A` and `B`, so that the **one-way ANOVA** basically becomes a \"two-sample t-test\". We will test the null hypothesis that the two groups have the same population mean of cholesterol level after 8 weeks."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zyrUWuo-iJHR",
        "outputId": "9b384cda-e37a-4be3-96e9-fffe4249acde"
      },
      "source": [
        "data_matched.head()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ID</th>\n",
              "      <th>Before</th>\n",
              "      <th>After4weeks</th>\n",
              "      <th>After8weeks</th>\n",
              "      <th>Margarine</th>\n",
              "      <th>cholesterol_diff</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>6.42</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.75</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.67</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>6.76</td>\n",
              "      <td>6.20</td>\n",
              "      <td>6.13</td>\n",
              "      <td>A</td>\n",
              "      <td>-0.63</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>6.56</td>\n",
              "      <td>5.83</td>\n",
              "      <td>5.71</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.85</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>4.80</td>\n",
              "      <td>4.27</td>\n",
              "      <td>4.15</td>\n",
              "      <td>A</td>\n",
              "      <td>-0.65</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>8.43</td>\n",
              "      <td>7.71</td>\n",
              "      <td>7.67</td>\n",
              "      <td>B</td>\n",
              "      <td>-0.76</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   ID  Before  After4weeks  After8weeks Margarine  cholesterol_diff\n",
              "0   1    6.42         5.83         5.75         B             -0.67\n",
              "1   2    6.76         6.20         6.13         A             -0.63\n",
              "2   3    6.56         5.83         5.71         B             -0.85\n",
              "3   4    4.80         4.27         4.15         A             -0.65\n",
              "4   5    8.43         7.71         7.67         B             -0.76"
            ]
          },
          "execution_count": 215,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9trjagAx6DFN"
      },
      "source": [
        "### 5.3 Analysis of variance [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)\n",
        "The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.  We will see how to test these in **6. Checking Assumptions**\n",
        "\n",
        "- The samples are independent. \n",
        "\n",
        "- Each sample is from a normally distributed population. [KS Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html)\n",
        "\n",
        "- The population standard deviations of the groups are all equal. This property is known as homoscedasticity. [Bartlett Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "teJ5vgXaiJHR"
      },
      "source": [
        "Let's see scipy's dedicated ANOVA function `scipy.stats.f_oneway`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "pI-ItdZ4iJHS",
        "outputId": "a865f73b-ee0c-41ba-ff72-e39d0101be4f"
      },
      "source": [
        "# Extract subsets of data\n",
        "data_matched_A = data_matched[data_matched.Margarine=='A']\n",
        "data_matched_B = data_matched[data_matched.Margarine=='B']\n",
        "\n",
        "scipy.stats.f_oneway(data_matched_A.After8weeks,\n",
        "                    data_matched_B.After8weeks)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "text/plain": [
              "F_onewayResult(statistic=1.2662631426081903, pvalue=0.27706640602454957)"
            ]
          },
          "execution_count": 216,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nq6y4Hpt6DFP"
      },
      "source": [
        "### 5.4 Kruskal-Wallis H-test\n",
        "If ANOVA assumptions are not met, we can consider a non-parametric test.\n",
        "The [Kruskal-Wallis H-test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html) (`scipy.stats.kruskal`) tests the null hypothesis that the population median of all of the groups are equal. \n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "9p25-kFg6DFP",
        "outputId": "3e4afcbc-ae98-419a-a5c9-fa496b0c6591"
      },
      "source": [
        "# Notice here, we have combined the steps of extracting subsets of data,  \n",
        "# accessing the .After8weeks column from the subset,\n",
        "# and doing the Kruskal test!  Python is powerful!\n",
        "\n",
        "stats.kruskal(data_matched[data_matched.Margarine=='A'].After8weeks,\n",
        "                    data_matched[data_matched.Margarine=='B'].After8weeks)\n",
        "\n",
        "# Python starts working inside the parentheses, just like algebra."
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "KruskalResult(statistic=1.0311890838206494, pvalue=0.3098795496397732)"
            ]
          },
          "execution_count": 217,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oXvG_vgp6DFP"
      },
      "source": [
        "Do Not Reject the hypothesis that median cholesterol levels at 8 weeks are equal for the Margarine A and Margarine B groups."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zfpC9aPn6DFQ"
      },
      "source": [
        "### 5.5 **CHALLENGE for you!**  \n",
        "Think back to the Birthweight `data`.  Can you set up a test of equal means for the birthweights measured from four groups of mothers who are:\n",
        "- over 35 and smoke\n",
        "- over 35 and do not smoke\n",
        "- under 35 and smoke\n",
        "- under 35 and do not smoke"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "j_8D8xww6DFQ"
      },
      "source": [
        "# Try it here!: \n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9MEHq10ZiJHT"
      },
      "source": [
        "<a name=\"anchor6\"></a>\n",
        "\n",
        "## 6. Checking Assumptions\n",
        "\n",
        "Parametric tests like the t-tests and ANOVA have important assumptions that must be satisfied in order for the associated p-value to be valid, including:  \n",
        "\n",
        "- The samples are independent. \n",
        "- Each sample is from a normally distributed population.\n",
        "- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.\n",
        "\n",
        "Here are just a few tools to check some of these assumptions.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0iCNmQn4iJHT"
      },
      "source": [
        "### 6.1 Checking normality (Gaussian distribution)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ms3VJDETiJHT"
      },
      "source": [
        "Baby weight is normally distributed (i.e. bell-shaped.)  We can check this visually with a histogram `sns.histplot` and with the [Kolmogorv Smirnov Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) `scipy.stats.kstest`"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "omS3q7IEiJHT",
        "outputId": "1763c4a3-fb9b-47ee-f78c-099095699da9"
      },
      "source": [
        "# [68]\n",
        "# Are the samples normally distributed?\n",
        "sns.histplot(data.Birthweight)\n",
        "stats.kstest(data.Birthweight,'norm')\n"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "KstestResult(statistic=0.9725710502961632, pvalue=5.07943581694757e-66)"
            ]
          },
          "execution_count": 219,
          "metadata": {},
          "output_type": "execute_result"
        },
        {
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAARJ0lEQVR4nO3dfZBkVX3G8e8DCwEFRIuJgX3JqqEwhjJqxhdAjUGtQiWSpIjgK6hxTSwVjWJAK5qkKhUrWgZFS9mgQSPBF8TEN1BUEIiI7i6IwEKwUNl1ibtCAhopyOIvf/TdODXO7jSz232353w/VV3Tfe/tPr8zh324c/r26VQVkqR27NF3AZKk8TL4JakxBr8kNcbgl6TGGPyS1JglfRcwjIMOOqhWrlzZdxmSNFHWrl3746qamr19IoJ/5cqVrFmzpu8yJGmiJPnBXNud6pGkxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/GrG0uUrSDIRt6XLV/T969IiNhFLNki7wqaNGzjhrK/3XcZQPv7KI/suQYuYZ/yS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGjOy4E/yoSSbk1w3Y9s7ktyY5Nokn05y4KjalyTNbZRn/OcAx8zadjFweFU9GvgP4PQRti9JmsPIgr+qLgPumLXtS1W1tXv4DWDZqNqXJM2tzzn+lwEX9ti+JDWpl+BP8hZgK3DuDo5ZlWRNkjVbtmwZX3GStMiNPfiTnAQcC7ywqmp7x1XV6qqarqrpqamp8RUoSYvcWL+IJckxwF8Av1tVPxtn25KkgVFeznkecCVwWJKNSV4OvBfYH7g4yTVJPjCq9iVJcxvZGX9VPX+OzR8cVXuSpOH4yV1JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGjOy4E/yoSSbk1w3Y9tDklyc5Obu54NH1b4kaW6jPOM/Bzhm1rbTgK9U1aHAV7rHkqQxGlnwV9VlwB2zNh8HfLi7/2HgD0bVviRpbuOe439oVd0G0P381TG3L0nN223f3E2yKsmaJGu2bNnSdzmStGiMO/h/lORggO7n5u0dWFWrq2q6qqanpqbGVqAkLXbjDv7PACd1908C/m3M7UtS80Z5Oed5wJXAYUk2Jnk58HbgmUluBp7ZPZYkjdGSUb1wVT1/O7uePqo2JUnz223f3JUkjYbBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGtNL8Cd5fZLrk1yX5Lwk+/RRhyS1aOzBn2Qp8FpguqoOB/YEThx3HZLUqr6mepYA+yZZAjwA2NRTHZLUnLEHf1X9EHgncCtwG3BnVX1p9nFJViVZk2TNli1bxl2mJC1afUz1PBg4DngYcAjwwCQvmn1cVa2uqumqmp6amhp3mZK0aPUx1fMM4HtVtaWq/he4ADiyhzokqUl9BP+twJOSPCBJgKcD63uoQ5Ka1Mcc/1XA+cA64DtdDavHXYcktWpJH41W1duAt/XRtiS1zk/uSlJjhgr+JEcNs02StPsb9oz/zCG3SZJ2czuc409yBINLLaeS/PmMXQcwWGpBkjRh5ntzd29gv+64/Wdsvws4flRFSZJGZ4fBX1VfA76W5Jyq+sGYapIkjdCwl3P+SpLVwMqZz6mqo0dRlCRpdIYN/k8CHwDOBu4bXTmSpFEbNvi3VtX7R1qJJGkshr2c87NJXpXk4CQP2XYbaWWSpJEY9oz/pO7nqTO2FfDwXVuOJsnS5SvYtHFD32UsTnssYbCG4WQ4ZNlyfrjh1r7L0JCGCv6qetioC9Hk2bRxAyec9fW+yxjax185Qat//3yrv1uNzFDBn+Qlc22vqo/s2nIkSaM27FTP42fc34fBGvrrAINfkibMsFM9r5n5OMmDgH8eSUWSpJFa6LLMPwMO3ZWFSJLGY9g5/s8yuIoHBouz/SbwiVEVJUkanWHn+N854/5W4AdVtXEE9UiSRmyoqZ5usbYbGazQ+WDg3lEWJUkanWG/get5wDeBPwaeB1yVxGWZJWkCDTvV8xbg8VW1GSDJFPBl4PxRFSZJGo1hr+rZY1vod26/H8+VJO1Ghj3jvyjJF4HzuscnAF8YTUmSpFGa7zt3fwN4aFWdmuSPgCcDAa4Ezl1oo0kOZLC2/+EMLhN9WVVdudDXkyQNb74z/jOANwNU1QXABQBJprt9v7/Adt8NXFRVxyfZG3jAAl9HknQ/zRf8K6vq2tkbq2pNkpULaTDJAcBTgZO717oXLw+VpLGZ7w3afXawb98FtvlwYAvwT0muTnJ2kgfOPijJqiRrkqzZsmXLApuSNBbd9wdMwm3p8hV9/7Z6N98Z/7eSvKKq/nHmxiQvB9buRJuPA15TVVcleTdwGvCXMw+qqtXAaoDp6en6pVeRtPuYoO8P8LsD5g/+1wGfTvJCfhH008DewB8usM2NwMaquqp7fD6D4JckjcEOg7+qfgQcmeT3GFyBA/D5qvrqQhusqv9MsiHJYVV1E4O1/W9Y6OtJku6fYdfjvwS4ZBe2+xrg3O6KnluAl+7C15Yk7cCwH+DaparqGgZTRpKkMXPZBUlqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5Ia01vwJ9kzydVJPtdXDZLUoj7P+E8B1vfYviQ1qZfgT7IMeA5wdh/tS1LLlvTU7hnAm4D9t3dAklXAKoAVK1aMp6rdwNLlK9i0cUPfZUhaxMYe/EmOBTZX1dokT9vecVW1GlgNMD09XeOprn+bNm7ghLO+3ncZQ/n4K4/suwRJC9DHVM9RwHOTfB/4GHB0ko/2UIckNWnswV9Vp1fVsqpaCZwIfLWqXjTuOiSpVV7HL0mN6evNXQCq6lLg0j5rkKTWeMYvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTG9LtkgSWO3xxKS9F3F0A5Ztpwfbrh1l76mwS+pLT/fOjHfeQGj+d4Lp3okqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGjD34kyxPckmS9UmuT3LKuGuQpJb1sUjbVuANVbUuyf7A2iQXV9UNPdQiSc0Z+xl/Vd1WVeu6+z8B1gNLx12HJLWq12WZk6wEHgtcNce+VcAqgBUrViy4jaXLV7Bp44YFP1+SFpvegj/JfsCngNdV1V2z91fVamA1wPT0dC20nU0bNzS/9rYkzdTLVT1J9mIQ+udW1QV91CBJrerjqp4AHwTWV9W7xt2+JLWujzP+o4AXA0cnuaa7PbuHOiSpSWOf46+qK4DJ+aZjSVpk/OSuJDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY3pJfiTHJPkpiTfTXJaHzVIUqvGHvxJ9gTeBzwLeBTw/CSPGncdktSqPs74nwB8t6puqap7gY8Bx/VQhyQ1KVU13gaT44FjqupPuscvBp5YVa+eddwqYFX38DDgphGWdRDw4xG+/u6ihX620Edoo5/2cef9elVNzd64ZIQNbk/m2PZL//epqtXA6tGXA0nWVNX0ONrqUwv9bKGP0EY/7ePo9DHVsxFYPuPxMmBTD3VIUpP6CP5vAYcmeViSvYETgc/0UIckNWnsUz1VtTXJq4EvAnsCH6qq68ddxyxjmVLaDbTQzxb6CG300z6OyNjf3JUk9ctP7kpSYwx+SWpMM8GfZHmSS5KsT3J9klPmOCZJ3tMtJXFtksf1UevOGLKfT0tyZ5Jruttb+6h1oZLsk+SbSb7d9fGv5zhmosdyyD5O9DjOlGTPJFcn+dwc+yZ6LLeZp49jHcs+ruPvy1bgDVW1Lsn+wNokF1fVDTOOeRZwaHd7IvD+7uckGaafAJdX1bE91Lcr3AMcXVU/TbIXcEWSC6vqGzOOmfSxHKaPMNnjONMpwHrggDn2TfpYbrOjPsIYx7KZM/6quq2q1nX3f8JgAJbOOuw44CM18A3gwCQHj7nUnTJkPydaNz4/7R7u1d1mX6Uw0WM5ZB8XhSTLgOcAZ2/nkIkeSxiqj2PVTPDPlGQl8Fjgqlm7lgIbZjzeyASH5g76CXBEN41wYZLfGm9lO6/7s/kaYDNwcVUturEcoo8w4ePYOQN4E/Dz7eyf+LFk/j7CGMeyueBPsh/wKeB1VXXX7N1zPGUiz7Lm6ec6Bmt4/DZwJvCvYy5vp1XVfVX1GAaf/H5CksNnHTLxYzlEHyd+HJMcC2yuqrU7OmyObRMzlkP2caxj2VTwd3OlnwLOraoL5jhkUSwnMV8/q+qubdMIVfUFYK8kB425zF2iqv4buBQ4ZtauRTGWsP0+LpJxPAp4bpLvM1ip9+gkH511zKSP5bx9HPdYNhP8SQJ8EFhfVe/azmGfAV7SXUXwJODOqrptbEXuAsP0M8mvdceR5AkM/ju4fXxV7pwkU0kO7O7vCzwDuHHWYRM9lsP0cdLHEaCqTq+qZVW1ksHyLV+tqhfNOmyix3KYPo57LFu6quco4MXAd7p5U4A3AysAquoDwBeAZwPfBX4GvHT8Ze60Yfp5PPBnSbYCdwMn1mR9hPtg4MMZfKnPHsAnqupzSf4UFs1YDtPHSR/H7VpkYzmnPsfSJRskqTHNTPVIkgYMfklqjMEvSY0x+CWpMQa/JDXG4NeikuS+bnXDbydZl+TIbvshSc7fznNWJnnBjMcnJ3nvLq7rb5I8Y55j/irJG+fYfmCSV+3KetQ2g1+Lzd1V9Zjuo++nA38HUFWbqur42QcnWQKsBF4we9+uVFVvraovL/DpBwIGv3YZg1+L2QHAf8H/n9Vf190/Ocknk3wW+BLwduAp3V8Kr++ee0iSi5LcnOTvu+c9L8m7uvunJLmlu/+IJFd0938nydeSrE3yxW2rSCY5J8nx3f1nJ7kxyRUZrDM/c332RyW5NMktSV7bbXs78IiuvneM7telVrT0yV21Yd/uE8v7MPj069HbOe4I4NFVdUeSpwFv3LYWepKTgccwWNn0HuCmJGcClwGnds9/CnB7kqXAk4HLuzWSzgSOq6otSU4A/hZ42bZGk+wDnAU8taq+l+S8WXU9Evg9YP+u3fcDpwGHdwu2STvN4Ndic/e2gExyBPCROVa1hMEyx3fs4HW+UlV3dq9zA4OVE69Isl8GX3CzHPgX4KkM/idwAXAYcDhwcbfsyp7A7DVlHgncUlXf6x6fB6yasf/zVXUPcE+SzcBDh+y3NDSDX4tWVV3ZrXA4Ncfu/5nn6ffMuH8fv/i3ciWDtWJuAi5ncDZ/BPAGBushXV9VR+zgdedaYniYdqVdxjl+LVpJHsngrHu+VQ5/wmBqZRiXAW/sfl7NYFrmnu6vg5uAqe4vDZLslV/+Qo0bgYdn8CU5ACcM0eb9qU+al2cTWmy2zfHD4Oz6pKq6r5t62Z5rga1Jvg2cQ/eG8HZczmCa57LudTfQLZdcVfd2b+C+J8mDGPz7OgO4ftuTq+ru7tLMi5L8GPjmfB2qqtuT/Hv35vSFVXXqfM+RdsTVOaUxS7Jf9yXqAd4H3FxV/9B3XWqHUz3S+L2i+6vkeuBBDK7ykcbGM35Jaoxn/JLUGINfkhpj8EtSYwx+SWqMwS9Jjfk/3mX00k2CiPYAAAAASUVORK5CYII=\n",
            "text/plain": [
              "<Figure size 432x288 with 1 Axes>"
            ]
          },
          "metadata": {
            "needs_background": "light"
          },
          "output_type": "display_data"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fqs5xCF_iJHT"
      },
      "source": [
        "**Your turn** Is the number of cigarettes smoked per day normally distributed?"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yK5wUSo3iJHT"
      },
      "source": [
        "# Try it here!:\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OrxZ_MmviJHU"
      },
      "source": [
        "### 6.2 Checking heteroscedasticity"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hONK1r62iJHU"
      },
      "source": [
        "The population standard deviations of the groups are all equal. This property is known as homoscedasticity. One of several tests available to check this is the [Bartlett Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html) `scipy.stats.bartlett`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wh5_dEbEiJHU",
        "outputId": "3e1bc804-fc35-4a33-ea2f-e17dce402cee"
      },
      "source": [
        "# Are the standard deviations equal for the smokers and non-smokers birthweight data?\n",
        "stats.bartlett(data_smokers.Birthweight, data_nonsmokers.Birthweight)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "data": {
            "application/javascript": [
              "\n",
              "        if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n",
              "    "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "BartlettResult(statistic=0.743871469606483, pvalue=0.38842352374096334)"
            ]
          },
          "execution_count": 221,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vO1ozBzQiJHU"
      },
      "source": [
        "**Practice for you**:  Are the standard deviations of birthweight data equal for the groups of over 35 and under 35 mothers?"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Tr3UwfEmiJHU"
      },
      "source": [
        "# Try it here:\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NpgITG806DFQ"
      },
      "source": [
        "**Key Points**\n",
        "\n",
        "   * Hypothesis testing and p-values give you the **significance** of an\n",
        "     effect / difference.\n",
        "     \n",
        "   * Parametric and Non-parametric tests are special cases of linear models!\n",
        "\n",
        "   * **Regressions**  enable you to express rich\n",
        "     links in your data.\n",
        "\n",
        "   * **Visualizing** your data and fitting simple models give insight into the \n",
        "     data.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nbz8MQmXchMZ"
      },
      "source": [
        "***\n",
        "<a name=\"anchor7\"></a>\n",
        "\n",
        "## Resources   <a class=\"anchor\" id=\"resources\"></a>\n",
        "- This workshop follows [Tests-as-linear](https://lindeloev.github.io/tests-as-linear/)\n",
        "- See more about [Scientific Computing](https://guides.library.brandeis.edu/science/computing) and [Python](https://guides.library.brandeis.edu/c.php?g=1150530&p=8519775) on our [Science Services library guide](https://guides.library.brandeis.edu/science)\n",
        "- Request 1-1 consultations with [Brandeis Data Services](https://guides.library.brandeis.edu/dataservices)\n",
        "- Take time with tutorials at [Kaggle.com](https://www.kaggle.com/learn)\n",
        "- [Brandeis LinkedIn Learning portal](https://www.brandeis.edu/its/support/linkedin-learning/index.html)\n",
        "- [Stackoverflow](https://stackoverflow.com/)\n",
        "- [Statsmodels User Guide](https://www.statsmodels.org/stable/user-guide.html)\n",
        "- [Think stats](http://greenteapress.com/wp/think-stats-2e) book\n",
        "- [Pandas Getting Started Tutorials](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)\n",
        "- Data Visualization:  [Python Graph Gallery](https://python-graph-gallery.com/)\n",
        "- [Seaborn example gallery](http://seaborn.pydata.org/examples/) "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IxGmhWcRiJHV"
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}