{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "NBIO208.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "papermill": { "duration": 11.308284, "end_time": "2021-01-28T05:37:53.680473", "environment_variables": {}, "exception": null, "input_path": "__notebook__.ipynb", "output_path": "__notebook__.ipynb", "parameters": {}, "start_time": "2021-01-28T05:37:42.372189", "version": "2.1.0" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "0hgErj1IiJG1" }, "source": [ "# NBIO208. Statistical Testing in Python" ] }, { "cell_type": "markdown", "metadata": { "id": "4DBpQPRniJG1" }, "source": [ "The goal of this lesson is to introduce Python libraries, load and look at a dataset, begin some common statistical testing. It includes:\n", "- [1 The simplicity underlying common statistical tests](#anchor1)\n", "- [2 The python environment](#anchor2)\n", "- [A note about $p$-values](#anchor2b)\n", "- [3 Pearson and Spearman correlation](#anchor3)\n", "- [4 One mean comparisons](#anchor4)\n", "- [5 Many mean comparisons](#anchor5)\n", "- [6 Checking assumptions](#anchor6)\n", "- [Resources](#anchor7)" ] }, { "cell_type": "markdown", "metadata": { "id": "0b3h8W5-iJG2" }, "source": [ "\n", "## 1 The simplicity underlying common statistical tests\n", "\n", "Most of the common statistical models (t-test, correlation, ANOVA, chi-square, etc.) are special cases of linear models, or a very close approximation. This beautiful simplicity means that there is less to learn. In particular, it all comes down to $y = a \\cdot x + b$, where $a$ is the slope of the line and $b$ is the y-intercept where the line crosses the y-axis. \n", "\n", "There are certain assumptions we check to use these \"parametric tests.\" When assumptions are not met, we have alternative \"non-parametric\" counterparts. We will think of \"non-parametric\"\" tests as ranked versions of the corresponding parametric tests. \n", "\n", "This lesson is adapted from [Tests as Linear](https://github.com/eigenfoo/tests-as-linear), which is also available [in R](https://github.com/lindeloev/tests-as-linear). \n", "\n", "**See the [Cheat Sheet](https://lindeloev.github.io/tests-as-linear/linear_tests_cheat_sheet.pdf)**" ] }, { "cell_type": "markdown", "metadata": { "id": "3hYxOGLBchMJ" }, "source": [ "\n", "## 2 The Python Environment" ] }, { "cell_type": "markdown", "metadata": { "id": "Sno4smqRiJG3" }, "source": [ "In part 2, we will:\n", "\"drawing\"" ] }, { "cell_type": "markdown", "metadata": { "id": "RwuaB2M7iJG3" }, "source": [ "### 2.1. Load libraries. \n", "Think of these as useful powerful toolboxes we are opening up on our workbench. Many additional libraries are available from the Python Package Index.\n", "\n", "we `import` some key scientific libraries, sometimes using a shorter *alias* for ease of coding, with the syntax `import library_name as alias_name` \n", "\n", "- [numpy](https://numpy.org) - adding support for large, multi-dimensional arrays and matrices, and mathematical functions for arrays.\n", "- [pandas](https://pandas.pydata.org/) - offers data structures and operations for manipulating numerical tables and time series.\n", "- [matplotlib](https://matplotlib.org/) - the most widely used scientific plotting library in Python.\n", "- [seaborn](https://seaborn.pydata.org/) - for drawing attractive and informative statistical graphics.\n", "- [statsmodels](https://www.statsmodels.org/stable/index.html) and [scipy](https://www.scipy.org/) for hypothesis testing and regression models." ] }, { "cell_type": "code", "metadata": { "id": "8TEg4K1AiJG3" }, "source": [ "# [1]:\n", "# Python libraries for data structures, arrays, and math\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Libraris for plotting\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# Importing the statistics module\n", "import statistics\n", "\n", "# Library for hypothesis testing\n", "import scipy\n", "\n", "# Libraries for regression modeling\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf\n", "\n", "# Library for latex \n", "from IPython.display import Latex, display, Markdown" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "mbR3OGMiiJG4" }, "source": [ "### 2.2 Load data.\n", "Loads a copy of the data into our environment. Unlike working with a spreadsheet, it does not affect the original file.\n", "\n", "*Our data* This synthetic dataset contains information on new born babies and their parents. It comes from [here](https://www.sheffield.ac.uk/mash/statistics/datasets). \n", "\n", "Read a Comma Separated Values (CSV) data file with `pd.read_csv()`. \n", "*[Need to read in a different file type?](https://realpython.com/pandas-read-write-files/)*\n", "- Argument is the name of the file to be read.\n", "- Assign result to a variable to store the data that was read." ] }, { "cell_type": "code", "metadata": { "id": "jWi0xwl1iJG4" }, "source": [ "# url is the name and path of the data file\n", "url = \"https://raw.githubusercontent.com/DeisData/datasets/main/Birthweight_reduced_kg_R.csv\"\n", "\n", "# data is the name of the dataFrame we are storing our data in\n", "# pd is pandas and read_csv is a tool in pandas for reading in a csv file\n", "data = pd.read_csv(url) " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "hG_fqj-WiJG5" }, "source": [ "### 2.3. Inspect the data table \n", "There are lots of functions and methods we can apply to the dataframe to start inspecting it." ] }, { "cell_type": "markdown", "metadata": { "id": "5YtAh5rqiJG5" }, "source": [ "
Functions attached to objects are called methods. They are common in pandas. Methods have parentheses like functions, but come after the variable.\n", "
" ] }, { "cell_type": "code", "metadata": { "id": "IhO5uiE8iJG5", "outputId": "e37a4921-b815-4ea3-ebac-1852c2206532" }, "source": [ "# Method to view head of dataset\n", "data.head()" ], "execution_count": null, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDLengthBirthweightHeadcircGestationsmokermagemnocigmheightmppwtfagefedyrsfnocigfheightlowbwtmage35
01360564.55344402001625723103517900
11016534.3236400190171621912018300
2462584.10394103501725831162518501
31187534.07384402001746826142518900
4553543.9437420240175663012018400
\n", "
" ], "text/plain": [ " ID Length Birthweight Headcirc Gestation smoker mage mnocig \\\n", "0 1360 56 4.55 34 44 0 20 0 \n", "1 1016 53 4.32 36 40 0 19 0 \n", "2 462 58 4.10 39 41 0 35 0 \n", "3 1187 53 4.07 38 44 0 20 0 \n", "4 553 54 3.94 37 42 0 24 0 \n", "\n", " mheight mppwt fage fedyrs fnocig fheight lowbwt mage35 \n", "0 162 57 23 10 35 179 0 0 \n", "1 171 62 19 12 0 183 0 0 \n", "2 172 58 31 16 25 185 0 1 \n", "3 174 68 26 14 25 189 0 0 \n", "4 175 66 30 12 0 184 0 0 " ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "YO-L17ywiJG6" }, "source": [ "

This dataset contains information on new\n", "born babies and their parents.  It\n", "contains mostly continuous variables (although some have only a few values e.g. number of cigarettes smoked per day) and is most useful\n", "for correlation and regression.   

\n", "\n", "

Main dependent variable = Birthweight (lbs)

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

Name

\n", "
\n", "

Variable

\n", "
\n", "

Data type

\n", "
\n", "

ID

\n", "
\n", "

Baby number

\n", "
\n", "

 

\n", "
\n", "

 length

\n", "
\n", "

Length of baby (cm)

\n", "
\n", "

Scale

\n", "
\n", "

 Birthweight

\n", "
\n", "

Weight of baby (kg)

\n", "
\n", "

Scale

\n", "
\n", "

 headcirumference

\n", "
\n", "

Head Circumference

\n", "
\n", "

Scale

\n", "
\n", "

 Gestation

\n", "
\n", "

Gestation (weeks)

\n", "
\n", "

Scale

\n", "
\n", "

 smoker

\n", "
\n", "

Mother smokes 1 = smoker 0 =\n", " non-smoker

\n", "
\n", "

 Binary

\n", "
\n", "

 motherage

\n", "
\n", "

Maternal age

\n", "
\n", "

Scale

\n", "
\n", "

 mnocig

\n", "
\n", "

Number of cigarettes smoked per day\n", " by mother

\n", "
\n", "

Scale

\n", "
\n", "

 mheight

\n", "
\n", "

Mothers height (cm)

\n", "
\n", "

 Scale

\n", "
\n", "

 mppwt

\n", "
\n", "

Mothers pre-pregnancy weight (kg)

\n", "
\n", "

 Scale

\n", "
\n", "

 fage

\n", "
\n", "

Father's age

\n", "
\n", "

 Scale

\n", "
\n", "

fedyrs

\n", "
\n", "

Father’s years in education

\n", "
\n", "

Scale

\n", "
\n", "

 fnocig

\n", "
\n", "

Number of cigarettes smoked per day\n", " by father

\n", "
\n", "

Scale

\n", "
\n", "

 fheight

\n", "
\n", "

Father's height (kg)

\n", "
\n", "

 Scale

\n", "
\n", "

 lowbwt

\n", "
\n", "

Low birth weight, 0 = No and 1 = yes

\n", "
\n", "

 Binary

\n", "
\n", "

mage35

\n", "
\n", "

Mother over 35, 0 = No and 1 = yes

\n", "
\n", "

Binary

\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "id": "y3nQ0AfeiJHD" }, "source": [ "**Your turn to explore!** Uncomment and try more Pandas methods we can apply to our dataframe:" ] }, { "cell_type": "code", "metadata": { "id": "RNdCCqloiJHE" }, "source": [ "# TRY THESE!\n", "# Method to see the tail end of the data\n", "# data.tail()\n", "\n", "# Use the DataFrame.info() method to find out more about a dataframe.\n", "# data.info()\n", "\n", "# Get Dimensions (rows, columns)\n", "# data.shape\n", "\n", "# Get data types\n", "# data.dtypes\n", "\n", "# The DataFrame.columns variable stores information about the dataframe’s columns.\n", "# data.columns" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "M3Bvnz-1iJHF" }, "source": [ "# Use DataFrame.describe() to get summary statistics about data.\n", "# data.describe()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9xMcUBGgchMV" }, "source": [ "
\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "65yzm4ytiJHF" }, "source": [ "\n", "## A note about $p$-values" ] }, { "cell_type": "markdown", "metadata": { "id": "uqAZq7uMiJHF" }, "source": [ "When we are doing statistical testing, we form a null hypothesis that states that there is no relationship, either between the two variables being studied (one variable does not affect the other), or between a variable and some constant number like a population mean. \n", "\n", "The $p$-value, or probability value, returned with your statistical test is a number describing how likely it is that your data would have occurred by random chance (i.e. that the null hypothesis is true). It is a number between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. A common threshold of significance is $p < 0.05$. \n", "\n", "\"drawing\"\n", "\n", "*Image from [What a p-value tells you about statistical significance](https://www.simplypsychology.org/p-value.html)*\n", "\n", "There is a lot of conversation around use and [mis-use of $p$-values](https://en.wikipedia.org/wiki/Misuse_of_p-values) in scientific analysis. Here are a few articles you may find interesting:\n", "- [\"After p values: The new statistics for undergraduate neuroscience education\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5777847/)\n", "- [\"Estimation for better inference in neuroscience\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6709209/) \n" ] }, { "cell_type": "markdown", "metadata": { "id": "8mzzyS6AiJHG" }, "source": [ "\n", "## 3 Pearson and Spearman correlation\n", "\n", "### 3.1 Theory: As linear models \n", "\n", "**Model:** the recipe for $y$ is a slope ($\\beta_1$) times $x$ plus an y-intercept ($\\beta_0$, aka a straight line).\n", "\n", "$y = \\beta_0 + \\beta_1 x \\qquad \\mathcal{H}_0: \\beta_1 = 0$\n", "\n", "... which is a mathy way of writing the good old $y = ax + b$ (here ordered as $y = b + ax$). The task of linear models is simply to find the numbers that best predict `y`.\n", "\n", "Either way you write it, it's an intercept ($\\beta_0$) and a slope ($\\beta_1$) yielding a straight line." ] }, { "cell_type": "markdown", "metadata": { "id": "pL34VprbiJHG" }, "source": [ "**With our data**\n", "we can use the seaborn regression plot tool `sns.regplot`to quickly visualize a linear model fit between mother's height `mheight` and baby length `Length` when asking the question \"Can moether's height predict baby length?\"" ] }, { "cell_type": "code", "metadata": { "id": "Q_wu2XyfiJHG", "outputId": "396f65d6-bacf-4ea9-9c8d-35f23f9aa75a" }, "source": [ "sns.regplot(x='mheight',y='Length',data=data)" ], "execution_count": null, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 193, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAvD0lEQVR4nO3deXAc93Xg8e+bCzMAcQwkQqJIkBAtKbRkSTxArteyZUp2nI0PuRI7jpJ4y7lWipM4sl2Kd5VklSy3UuU4ScXKHrYY27VeOz7kjWRr42wi2QrNKIksgrQO0qItiQJ4CyJxH4O53v7RPcMBMABmgOmZ6Zn3qWJh0Ojp/v3Y5MNvfv36/URVMcYY0zwCtW6AMcaY6rLAb4wxTcYCvzHGNBkL/MYY02Qs8BtjTJMJ1boBpbj88su1r6+v1s0wxhhfOXz48AVVXb9wuy8Cf19fHwMDA7VuhjHG+IqIDBXbblM9xhjTZCzwG2NMk7HAb4wxTcYCvzHGNBkL/MYY02Q8zeoRkUFgEsgAaVXtF5HtwGeBKJAGflNVn/ayHcaY2jhwfJgHD57g1OgMvfFW7r51K3u39ax6P1MZ1Rjx36aq21W13/3+U8B/UdXtwP3u98aYBnPg+DD3P3qM4ckEXbEww5MJ7n/0GAeOD69qP1M5tZjqUaDDfd0JnK1BG4wxHnvw4AnCQaE1EkLE+RoOCg8ePLGq/UzleP0AlwKPiYgCD6rqfuCjwD+IyJ/h/OJ5U7E3ishdwF0Amzdv9riZxphKOzU6Q1csPG9bLBzk9OjMqvYzleP1iP8WVd0J/DTwWyJyK/Bh4GOq2gt8DPh8sTeq6n5V7VfV/vXrFz1xbIypc73xVmZTmXnbZlMZNsVbV7WfqRxPA7+qnnW/DgOPAHuADwEPu7t8w91mjGkwd9+6lVRGmUmmUXW+pjLK3bduXdV+pnI8C/wi0iYi7bnXwDuAozhz+m91d7sdeNGrNhhjamfvth723XEDPe1RxmdT9LRH2XfHDYuydUrdz1SOl3P8VwCPiEjuPF9R1b8XkSngAREJAQnceXxjTOPZu62npABe6n6mMjwL/Kp6Ari5yPYngV1endcYY8zy7MldY4xpMr6ox2+MMaZ0yXR2UaZUIQv8xhjTIBKpDGMzKWaSaaLh4JL7WeA3xhifm0mmGZtJkVhmlF/IAr8xxvjU1FyasZkkyXS2rPdZ4DfGGB9RVSbn0ozPpEhlygv4ORb4jTHGB7JZZTKRZnw2RTq7uoCfY4HfGGPqWDarjM+mmEikyGS1Ise0wG+MMXUonckyPptiMpEmq5UJ+DkW+I0xpo6kMlnGZlJMzTlF67xggd8YY+pAIpVhfDbF9Fza83NZ4DfGmBpRVabm0kwk0syVmINfCRb4jTGmyjJZZaLCN2zLYYHfGGOqJFVww9ar+ftSWOA3xhiPJVIZJmadG7b1wAK/McZ4QFWZTmaYTKSYTVZv/r4UFviNMaaCUpksk4k0kzWavy+Fp4FfRAaBSSADpFW1393+EeC3gTTwbVX9hJftMMYYL9Xz6L6Yaoz4b1PVC7lvROQ24L3ATao6JyK20KZZlQPHh3nw4AlOjc7QG2/l7lu32rqtpqr8MLovphZTPR8GPqmqcwCqOlyDNhifO3B8mPsfPUY4KHTFwgxPJrj/0WPsAwv+xlN+G90X4/Wauwo8JiKHReQud9t1wFtE5Psi8j0R2e1xG0wDevDgCcJBoTUSQsT5Gg4KDx48UeummQaVzmQZmU5yamSW4YmEb4M+eD/iv0VVz7rTOY+LyHH3nHHgjcBu4CER2aoLklrdXxR3AWzevNnjZhq/OTU6Q1csPG9bLBzk9OhMjVpkGtVsMsNEojqlFKrF0xG/qp51vw4DjwB7gNPAw+p4GsgClxd5735V7VfV/vXr13vZTONDvfHWRYtJz6YybIq31qhFppHkSiGfGpnh3PhsQwV98DDwi0ibiLTnXgPvAI4C3wRud7dfB0SAC0scxpii7r51K6mMMpN0noCcSaZJZZS7b91a66YZH0tlslycmuPkyAwXp+ZWvcJVvfNyqucK4BERyZ3nK6r69yISAb4gIkeBJPChhdM8xqxk77Ye9uHM9Z8enWGTZfWYNZhJppmYTTOTbKyR/VI8C/yqegK4ucj2JPBBr85rmsfebT0W6M2q5ZYynEisfu1av7Ind40xTSWZzjKRSDHlwcpWfmGB3xjTFJptOmc5FviNMQ0rk1WmmnQ6ZzkW+I0xDSdXBnk6malp3ft6ZYHfGNMQarWMoR9Z4DfG+FpuGcPJRJp01qZzSmGB3xjjS3PpDOOzKabnbDqnXBb4jTG+kauMOTGbImHTOYtkVTnx2jSHBkc4cnJsyf0s8Btj6p5N5yxtZDrJ4aFRDg2OcHholNGZ1IrvscBvjKlbiVSuMqZN5+Qk01mOnh1nYHCUgcFRXnptatE+V3S08Matl/GXSxzDAr8xpq5Yds58qsqpkVkODY0wMDjKs6fGSKTnf+qJhgNs7+2if0s3/X1xeuMxYpGQBX5jTH1LZbJMzKaYmkv7ahlDL0wmUhw5OcahQSfYD0/OLdrnmp517O6L078lzg1XdRIJlV5s2QK/MaZmGmEZw0rIZJUXzk0wMDTKwOAIx89PsvB3X7w1TH9fN7v74uzcHKe7LbLq81ngN8ZUXa5Q2nQTj+7PTyQYGBzh0OAoR06OMj03/xdfOCjcuLHTCfZb4mxd34Zb5n7NLPAbY6pmes6pm9OMo/vZZIZnT49xaNDJwDk9Ortony3drfT3xdnd181NmzqJhoOetMUCvzHGU5msMplwUjGbqVBaYU79ocFRjp4ZJ73g0017NMTOzfH8XH1PR7QqbbPAb4zxRDM+WTsynczP0xfLqQ8IvH5DB7vdUf11V7QTDFRm+qYcFviNMRU1NZdumidrk+ksR8+MO9k3Q6O8/Nr0on2u7Iiyuy/Orr44O3vjrIvWPux62gIRGQQmgQyQVtX+gp/dC/wpsF5VbbF1Y3xMVZlIOAG/cDrn6RMjfO3QKc5NzLKhI8adu3vZs7W7Km3y4tyl5NTHwkG293Y50zd9cTZ2xSp2U7ZSqvGr57aFgV1EeoGfBE5W4fzGGI/MpTNMJdJFc++fPjHCA0+8SCggdERDXJye44EnXuQervU8+Ffy3BOzTk79gDuqL5ZTf62bU7+7r5vrr+ogHCw9p74WavWZ4y+ATwDfqtH5jTGrlMk6T9ZOJlIk00vfrP3aoVOEAkLMzUyJhYPMpjJ87dApzwP/Ws6dz6kfHOXQ0Ag/KpJT390WoX+Lc1N215Y4Xa2rz6mvBa8DvwKPiYgCD6rqfhG5Azijqs8u9/FHRO4C7gLYvHmzx800xqxk1n3QqtRVrc5NzNKxYD47Gg5wfmJxGmOllXvu8+MJBoaWz6m/aWMnu9wHqLZeXrmc+lrwOvDfoqpnRaQHeFxEjgO/D7xjpTeq6n5gP0B/f39zpAQYU2dSmSyTiTRTq6iKuaEjxsXpufyoGyCRynJlR6zSzSz73LPJDM+cGsvflC2aU39Zqzuq9zanvhY8Dfyqetb9OiwijwBvBa4GcqP9TcAREdmjque9bIsxpjSqykwys+YHre7c3csDT7zIbCpDNBwgkcqSzip37u6tYGtLO/dsMkMineWKjhY+/tAzHD0zsSinviMaYtcWJ5++v6+b9e0tnrezVjwL/CLSBgRUddJ9/Q5gn6r2FOwzCPRbVo8xtZdMZ5maW93ovpg9W7u5h2v52qFTnJ+Y5coqZvXs2drNryT6+PJTJzk1Mks6q6Szyj/88NX8PgGBG67qyFe0rFVOfS14OeK/AnjEHdmHgK+o6t97eD5jTJkyWWUqkWZybvkbtau1Z2t31dI3k+ksz58Zd+rfDI1yokhO/YbOKLv7uunfEmfH5i7aWmqfU18LnvVaVU8AN6+wT59X5zfGLK0RFjhRVYZGZjg0OMrhwRGePT3OXJGc+h2bu/Jz9Rvj3t9f8IPm/HVnTJOanksz7uOnap2c+lEOuatPvTY1P6degGuvWJcf1fshp74WLPAb0+CyWWUqmWZ8JuW7ImnpTJYXzk0yMORk3xTLqb+sLZKvaLlrc5zO1nBtGusjFviNaVC5Fa0mE2myPprOOTc+my9d/MzJMaaTRXLqN3XlK1pe7fOc+lqwwG9MA8lklemkk5njl+mcmWTazakf5fAyOfW5kgg3beykpYFy6mvBAn+dOnB8mAcPnuDU6Ay98VbuvnUre7f1rPzGBlDpvv/ld37M5558helkhrZIkF9/89X8ztuvW9O5yzlmqcrp98J9f/lNW9ixOV7yU7XFlFrUbK3Fz7KqvDQ85ZREGBzh2NllcurdufrCnPpaFn5rFOKHO/r9/f06MDBQ62ZUzYHjw9z/6DHCQcnXGElllH133NDwwb/Sff/L7/yYB554iYA4edtZdf7cc/s1iwJ1qecu55he9Du3bygALaEg08k0qYxyz+2rL35WWNSs8GGrhccsdb+FLkzNcXhoND+qH5+dX6c+GBCud+vU9/fFubaneE79as/fjKLhIBvjrYcLqyLn2Ii/Dj148AThoNAacS5PayTETDLNgwdPNHzgr3TfP/fkKwQEQgEnsyMgkM5m+dyTrywK0qWeu5xjetHv/3ngZQQlFAySySrRUBDVtRU/K7WoWan7JdNZnjt9afrmxIXFOfVXdUWdh6fKyKmvZeG3RmKBvw6dGp2hKzY/MyEWDnJ6dKZGLaqeSvd9OpkhtCCbLyAsumFYzrnLOWapVjp37qna6bk0QyPTTgGygg/ray1+VmpRs6X2Ozc+wysXpvOrTz17enzRA2GtkSA7ervo73OmcDZ2lZ9TX8vCb43EAn8d6o23MjyZyI/+AGZTGTbFW2vYquqodN/bIs6IsHDWIKvO9tWeu5xjlqrYuWeSaa7sjHJmbJa5ghu1XhQ/K/WYhftlsspMMs1EIk0yk+XXvjh/OlaA665sz5cvvn5DB6E15tTXsvBbI7EnG+rQ3bduJZVx/lM5BbOcOdy7b91a66Z5rtJ9//U3X01WnamYrGbdr8721Z67nGOWqvDcmWyWyUSK2VSW9+3YNC/og1OALJ1VZlMZFOfrWouflXLMdCbLG6/uZnQmyeDFaV6+MM25iTmmk879CIDL1kX4qRuu4A/e9Xoe/vCb+Mwv7eTX3nw1N23qWnPQ96rvzchu7tapXNbG6dEZNjVpVk+l+r6arJ6Vzl3prJ5kOstjx87z+Sdf4dz4ygXNcpktlSx+VuyYm7pjzlOyQ8Vz6gVn9anbX9/D7r5u+i5r9Tyn3ou+N6Llbu5a4DemRhKpDDPJDNNz6bp5onYmmeYHJ8fyq0+dHUss2ufqy9vc0sVxy6mvY5bVY0wdyNW5n0lmmE1mKlL6eK2yqvz41Uk3p36UH56bWLR2bi6nfndfN7sW5NQbf7LAb4yHcsF+ei7NTDJTF6UTLkzN5R+eOjw0ykQiPe/nwYBww1VuTv2Wbq7pWdc0deqbhQV+Yzwwl84wmXDSLxeOoKvellSG586MMzA4ysDQKK8skVO/212QZMfmrnnZRcZfAiIEA0JkYc5xAbu6xlSIqjLllj32YlGTctoxeHHGWZBkcJTnziyRU7+5K7/61Gpy6k31iAhBEYJB92tACAXmf5/bVsrNdQv8xqxRKpNlKpFmIpGq2eh+fDbllkRwyhdfnErO+3kupz5X0bISOfWmcsLBAOFggFDQDegBIRQIEAg4T4hXeqrN08Dvrqk7CWSAtKr2i8ifAu8BksDLwK+o6piX7TCOZi78VqpS/o5UlceOnuevnnyF06MzNUkpTGeyHDs3kZ+++fH5SRb+yrl8XSS/IMnOLXE6Y8vXqf/Svwzy0OHTzKYyxMJBPrBrE//+TX2e9aEZiBQG8ksj80DB95FgoOIpsLl/x+H1fTcW+3k1Rvy3LVhM/XHgPlVNi8ifAPcB/7EK7WhqhUXAumJhhicT3P/oMfaBBX/Xcn9Hb772cmbd9MsDx4f59HedQmHt0RAXp+d44IkXuQdvC4WdGZt1Av3gCD84NcbMgpz6SCjAzZs68xUty8mp/9K/DPLFp4YICAQDzj2KLz41BGDBfwnBgBAQITRv+sUZpYeDAUIBqcmnqsJ/x2g2XWyfqk/1qOpjBd8+Bby/2m1oRs1c+K1UhX9HWVVaQkHSmRQPfPdFNl92qWzDV5+uTqGw6blLdeoHVsip390X56ZNXcve0FvOQ4dPu0Hffb8A2SwPHT7ddIE/F8xDgepNvVTSwv/rxXgd+BV4TEQUeFBV9y/4+a8CXy/2RhG5C7gLYPPmzZ42shk0c+G3UqQyWYYuTtMeDTGXzuQLoEVCAc6Nl1aobK2FwjJZN6feLXR27OzEomUGO2NhN6c+zq4tcS5fV5mc+tlUhoWDUxFne6PJTb/k5tTDuQDvvg7UcVAvRbH/6wt5HfhvUdWzItIDPC4ix1X1IICI/D6QBv662BvdXxL7wXly1+N2NrxmLvxWTCarJFIZEqkMs6kMyXSWnvZo2YXKltuvFK9NzjHg3pBdKqf+DVd15NeUvaZnHQEPSiLEwkHnF17BoVWZ10e/CLpTLOHcSD0YIBy8NGqv59F6JRT7v76Qp4FfVc+6X4dF5BFgD3BQRD4EvBt4m/qhZkQDuPvWrdz/6DFmkul5C300Q+E3KB7oF7pzdy8PPPEis6nMvEU+FhYAK3W/YhKpDM+fGeeQm2o5dHHxJ65N8Vh+VL+9tzo59R/YtcmZ089mEXGCflad7bWUS2MMBC7lp4uQn1MvvHlaTjpjIyv8v74Uz/5FiUgbEFDVSff1O4B9IvLvcG7mvlVVbZ6hSvZu62EfNEXht3QmSzKTJZnOMpd2vpZSC2fP1m7u4doVC4CVuh84GUCvXJh25+lHee70WL6SZU5bJMiOzXF3VB9nQ2f1c+pz8/jVyuoRcR4wCi+YSw9I7g/5wG7KU/h/HQkUjfGeFWkTka3AI+63IeArqvrHIvIS0AJcdH/2lKr+xnLHsiJtZimqSiKVJZHKMJfOMpfO1PxJ2bGZJIeHxhgYGmFgcJSL0/Nz6gMCP5GvU9/N6zd0NPz0QzgYIBoOEg0HiIQCtIT8N4XkRyJS3SJtqnoCuLnI9mu8OqdpfKmMM4qfS2VIuKP5Ws8WFubUHxoc4cVXpxbl1K9f15JfT3bn5jgdK9x886vc1EskFCASDNASdr7aw2L1paTALyLrgf8A9BW+R1V/1ZtmmWaXzWp+mmYuk3Gna7TmQR6cTxn5nPqhUX5wcmxR9ktLYU59X5wt3d7Xqa+W3Hx67mnTcFBqmrduylfqiP9bwD8B38F5CteYikmkMsylnACfzqjzpw5KFheamsvVqXcycM6NL86p37q+jd1bnPVkb9zYueqc+noQCgQuzcEHC+bi3adOjb+VGvhbVdWerjVroqokM87IPemO5hOp+ihVvFA+p96dvvnhucU59V0Lcuovq1BOfbUU3mDN5bLnRvGNfs+h2ZUa+P9WRN6pqn/naWuM7+WCey6wJzNZ0hklk9W6DPCFXpucy6dZHjk5yuSCnPpQQHjDxo58RUuvcuorIZcGKQKBgJMlEwpcmpYJBwO+/kRi1mbZwC8ikzjPMArweyIyB6Tc71VVO7xvoqmUShVpU1XSWWdKJpV1Ansqs3zaZG6d1HMTs2yok3VSE6kMz50ez1e0XCqnPlfobHtvF7GIk43y9IkR7n3ouZr1p/Dv86rOGL/8pj5ue32PcyPV5trNCmzN3SZRWLip8AGufXfcUDT4Z7JuMHdH7OlMllRWyaxi/v3pEyM88IRT1Kzwgad7bve2qNlCqsqJC9P5QmfPnRkvmlO/052+6d/SzZWd0UXHqWV/AiIcGRrlzx//MZGQ0BoOknBvfC91LU3zWlM6p4h8V1XfttI2U7+KFWmbnkvx2e+9zBtfd1l+eiY3cq9kLvzXDlWnqFkxozNJjgyN5h+gGlkipz63+lQpOfXV6k9AhHBBWmQ0FCQSCnDfw88TDQcKrmXACu6Zsqw01RMF2oDLRSTOpUoeHcBVHrfNrFIm64zKCzNkBi9O0xEN5UsVKEowIAxenObs2NqKi63Eq6JmxaQyWY6dnXCmbwZHeXF4atE+Pe0t+do3O3q7ys6p96I/4YKcd+eGq/OnGCu4Z9ZqpRH/3cBHcYL8kYLtE8D/8KhNZgXpXGZMJks6kyWjzs3TjDvvXuwm6hUlFiDzQiWLmi2Uy6k/5GbfPHNqjERq/lRUSyjAzb1d7O6Ls3tLN73dsTXl1K+lP8UecGoJBcvKorGCe2atlg38qvoA8ICIfERV/1uV2mS4NMeeyhTcPHVfryY7Zi2Fxdaq0ueemktz5OQohwedKZzzE4tz6l+3vi1/U/YNFc6pL6c/4YLg3hIK0BJa+2pLzV5wz6xdSTd3ReRni2weB55X1eGKt2qBRry5m844wSLljt5z3zvTNN48oZrLBFmpsJgX1nLuXE59LtXyhSI59fFWJ6c+t/pUd1vEg15csqg/e3p5y3Xr55UpqESQX0ouQ6vRC+6ZtVnq5m6pgf/bwL8F/tHdtBdn9azrgH2q+qXKNXUxvwV+VXWCefZSUM8F9HRGyWh9lB6oZ8MTCQbcm7JL59R35lefel2Vc+pFnIyeWDhINBz0NMgbs1prLdKWBV6vqq+6B7sC+Azwb4CDgKeBvx4VjtgL89nrsdyAHyRSGZ49PeamWo4yNLL4RmVvPJYf0Rfm1FdDLtBHQ0FiEQv0xt9KDfx9uaDvGgauU9UREUl50K6ayWXEZLOQzmbzN01zI3Yvp2Kaiapy4rVpDrnLDD5fLKe+Jcgut079Ujn1XgoHA8QiQVojQaKhoNWoMQ2j1MD/TyLyt8A33O/fh7OSVhsw5kXDKkl1ceB2MmCcjJjcDdOsYgHdQ6MzSQ7ncuoHRxidmT9mCAhsu7IjvyDJtiurX6c+HAzQ1hKirSVoNeNNwyo18P8WTrC/BSeX/38Df+Mum3ibR20rWS4DJu0G81RG8yP33GjdVF8qk+XomfH8w1MvrZBTv3NzF+3R6tapF3EexMqN7JfKnTemkZQU+N0A/3/cPzWRylwK4rnUxsIbp6b2VJXTo7NuoC+eUx8NBdi+uYt+NwOnN762nPrVyAX7tpYgbZGQTeGYplNqyYafBf4E6MEZ8Ve1SNtcOsupIjf7TO1NJZyceicDZ4RXJ+YW7XNNz7p89s0NV9WuTn0sEnSmcSIhKztsmlqpUz2fAt6jqi+Uc3ARGQQmcRZvSatqv4h0A1/HWc1rEPiAqo6Wc1xzSTlVLytRITOTVX50fjJf0XK5nPrdfd3sqkJO/XJawkGeGRrlS08NcXpsdk1VSQtVqtKpMbVQah7/P6vqLWUf3An8/ap6oWDbp4ARVf2kiPwnIL7SIi83bt+p33r8YLmnb3jlVIlcS0XJVycS+embI0NjTM2lF+0TC1+qMfPxt1/HG193WUX7WqpgQNxFvZ05+39+8UJZVUlLUW6lU2NqZa15/AMi8nXgm0D+s7yqPryKtrwX5wEwgC8CBwBb3WsVyqkSWc6+s6kMz54ay68+dWp0cfGx3ngsv85sVyycf3hqNpXhoYHTVQ38y2XiFKtKutZKll4c05hqKjXwdwAzwDsKtimwUuBX4DERUeBBVd0PXKGq5wBU9ZyIFP2fIiJ3AXcBXLXJ+3oyflROlcjl9lVVXn5tOj99c7RITv26lhA7t3Tlyxdf0RHlF/7qKTqiIQRZdEyvhQIB5+ZsS4hoeOm0Sy8qWVp1TON3pWb1/Moqj3+Lqp51g/vjInK81De6vyT2gzPVs8rzN7RyqkQu3DedzTI2k0KB93/2X5fMqd/d5zxAVSyn3suqm8UERGhtCbKuJTSvMuVyvKhkadUxjd+VlF4hIteJyHdF5Kj7/U0i8gcrvU9Vz7pfh4FHgD3AqyKywT3OBpyngM0q3Lm7l3RWmU1lUJyvS1WJfP/OjUwn05wbTzA0MsOJCzOMzKQYdf8AXNHRwrtu3MAfved6vvmbt/Dff3EHH3pTHzdc1Vk0C6ac86+WiNDWEqKnI8qWy1rpaY+WHPTBqWSZyigzyTSqzte1VrL04pjGVFOpN3e/B/wuznTNDnfbUVV9wzLvaQMCqjrpvn4c2Ae8DbhYcHO3W1U/sdz57ebu0paqeqmqnBqdZcCtaPnsqTES6fk59ZFggJ1buujf0s3uvjibVpFT71XFz1zq5boK5Nl7UcnSqmMaP1hrdc5DqrpbRH5QEPifUdXty7xnK84oH5wppa+o6h+LyGXAQ8Bm4CTwc6o6stz5LfCXZjKR4sjJSzdlhyfrN6e+mHAwQHs0xLqWkC0WbkwFrDWr54KIvA7nZi0i8n7g3HJvUNUTwM1Ftl/EGfWbNcpklePnJ/K1b46fnyyaU5+raFnrnPpiggFnKmfdCjdpjTGVU06tnv3ANhE5A7wC/JJnrTJLOj+RYMBdT/bIycU59eGgcOPGTvr7utm9Jc7W9W11WT44FgnSHg3TFgnWZfuMaWSlZvWcAN6+YN7+o8CnPWybAWaTGZ45NZYviXC6SE79lu5W+vucEf3NvV3zsmzqSTgYYF1LiHXRkBVDM6aGSk+PAFR1uuDbj2OBv+Kyqrw8PJWvaHn0zDjpBfM37dEQOzc78/T9W+L0dFS3Tn05wsEArZGV8+2NMdVTVuBfwD6fV8jIdJIBd0GSw0OjRXPqr9/Q4Swe3hfnuiva67rIWCQUoC0SotVq2htTl9YS+Bv6oapKFDRbSjKd5ejZ8Xz2zcuvTS/a58qOKLv74uzqi7OzN8666Foulfdy0zhtLaGqZQpZoTRjVmfZaCIikxQP8AJ483hmHSgsaNYRDXFxeo4HnniRe1i5oFkxqsrJkZn84uHPFcmpj4YD7OiN51ef2thV/Tr15crVyFlXxWCfU1gorSsWZngywf2PHmMfWPA3ZgXLBn5Vba9WQ+pJOQXNljKZSHF4aMzJwBkaLZpTf23POrckQjc3XNXhixue9ZJ+aYXSjFm9+p4/qJFyip/lZLLKC+cmnOmboRF+VCSnvrstkn94ateWOF2t9ZVTv5xYJEhHNExrnaRfWqE0Y1bPAn8RpRYfOz+eYGDIKYlw5OQo03OZeT8PB4Wb3Jz6/r44Wy+vz5z65axrCdERC9ddRo4VSjNm9SzwF3Hn7l4eeOJFZlOZeYuW/MyOq/jXly/myxcvl1O/u6+bmzZ11l3ALEUoEGBdNER7Hefb333rVu5/9BgzyfS8xVCsUJoxK7PAX8Serd3cw7V89emTnB6bIRIMEosE+a/ffsHXOfUrqbfpnOXs3dbDPrBCacasggX+BS5OzXF4yHl4amhkhrHZFHApr95vOfUrERF3Oifku5z7vdt6LNAbswpNH/iT6SzPnxl3pm8GRzlxYXFO/YbOKP19cfq3dLNjcxfrWvz/1xYQoSMWpjMW9vUvLmNM+fwfwcqUy6nPVbR89vQ4cwty6mPhIDs2d7kZON1sjDfOIwvhYICOaJj26Nrr3Btj/KkpAv/EbK5OffGcegGuvWJdPtBf75Oc+nLk5u/bGuDTijFmbRoyCuRy6nPZN8Vy6i9b5+TU92/pZteWLl/l1JfKWbYwSGcs7Lv5e2OMdxom8J8bn3Wnb0b5wclRppNFcuo3deWzb672YU59qUIBZyWr9qitZGWMWczzwC8iQWAAOKOq7xaR7cBngSiQBn5TVZ8u97izyQw/OOXUvjm8RE5932Wt+Zuyfs2pL5WI0BoJ0h4NlbUYeTG1Ln5W6vkrvV+t+aWdxv9KWnN3TScQ+TjQD3S4gf8x4C9U9f+JyDuBT6jq3uWOceP2nfrIY9/jpeGpfPbNsbMTi3LqO6Ihdm2J55caXN/e4lW36kZLOOgsbtISqkh2TmHxs8IHo/bdcUNVglCp56/0frXml3Yaf1nrmrurPekm4F3AH+Ms3AJOtc8O93UncHal45wbn+V9n/lXxmfn16kPBoTrN7Q7ywz2xbm2x9859aXysgRyrYuflXr+Su9Xa35pp2kMXk/1fBr4BFBY5fOjwD+IyJ8BAeBNxd4oIncBdwFErrwmH/Q3dEadh6e2xNmxuatpslSCAckvW+jljdpaFz8r9fyV3q/W/NJO0xg8i5oi8m5gWFUPi8jegh99GPiYqv6NiHwA+Dzw9oXvV9X9OAu8c1nf6/Wet11Df183G7saJ6e+FLGIk5Wz1nn7UtW6+Fmp56/0frXml3aaxuBlysctwB0iMgh8DbhdRL4MfAh42N3nG8CelQ50VVeM927f2FRBvzUS4qquGBs6Y1UL+uAUP0tllJlkGlXnazWLn5V6/krvV2t+aadpDJ7f3AVwR/z3ujd3XwA+rKoHRORtwKdUdddy779x+0791uMHPW9nreWmc9qj4aqvaFUol11Sq+JnpZ6/0vvVml/aafxjqZu7tQj8bwYewJlmSuCkcx5e7v2NHvj9VBXTGOMfNcnqyVHVA8AB9/WTwLIj/GaQG913xMINVx7CGFPfmiMlpo6EgwE6YmHaW6xImjGmNizwV0lLOEhXzIqkGWNqz6KQx6LhIPHWCLFI45aLMMb4iwV+j1jAN8bUKwv8FRaLBOmKWcA3/mNF4pqHpZNUSOEDVxb0jd/kisQNTyboioUZnkxw/6PHOHB8uNZNMx6wwL8GARHao2E2xmNc2Rlt6LLPprEVFolzSnyHCAeFBw+eqHXTjAdsqmcVwsEAna1h1kUsJdM0BisS11ws8JchFAjQ1ebk4NsTtqaRWJG45mJTPSUIBoTL2lro7Y7REQ1b0DcNx4rENRcb8S9DROiMhemKhW1KxzS0vdt62AdWJK5JWOBfwrpoiO7WiC1WbprG3m09FuibhAX+BVojIeJtYU9XuTLGmFqywO+KRZwnbS0l0xjT6Jo+8FvAN8Y0m6YN/G0tITpjYQv4xpim01SBXyS3+EnI5vCNMU3L88AvIkFgADijqu92t30E+G0gDXxbVT/hZRvCwQAd0TDtUXvS1lSfFT8z9aYaI/57gBeADgARuQ14L3CTqs6JiCf/A0SEtkiQ9mjYiqaZmskVPwsHZV7xs31gwd/UjKdJ6iKyCXgX8LmCzR8GPqmqcwCqWtHyf+FggO62CJu7W+npiFrQNzVlxc9MPfL66aRPA58AsgXbrgPeIiLfF5HvicjuYm8UkbtEZEBEBkYuXljxRK2REBs6Y/R2t9LVGiFoUzqmDpwanSG2IIHAip+ZWvMs8IvIu4FhVT284EchIA68Efhd4CEpUvxGVferar+q9ndfdvlS56A9GmZTvJUrO210b+pPb7yV2VRm3jYrfmZqzcsR/y3AHSIyCHwNuF1EvgycBh5Wx9M4nwaKR/YliAgdsTC98Rjr21uIhKysgqlPVvzM1CPPIqaq3qeqm1S1D7gTeEJVPwh8E7gdQESuAyLAynM5XBrh98ZjXL6uxeromLq3d1sP++64gZ72KOOzKXrao+y74wa7sWtqqhZ5/F8AviAiR4Ek8CFV1eXeIEC8NUJHLGxz98Z3rPiZqTdVCfyqegA44L5OAh8s5/2RUIB4W6TyDTPGmCZkcyXGGNNkLPAbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk7HAb4wxTcYCvzHGNBkL/MYY02Qs8BtjTJOxwG+MMU3GAr8xxjQZC/zGGNNkLPAbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk/E88ItIUER+ICJ/u2D7vSKiInK5120wxhhzSTVG/PcALxRuEJFe4CeBk1U4vzHGmAKeBn4R2QS8C/jcgh/9BfAJQL08vzHGmMW8HvF/GifAZ3MbROQO4IyqPrvcG0XkLhEZEJGB1157zdtWGmNME/Es8IvIu4FhVT1csK0V+H3g/pXer6r7VbVfVfvXr1/vVTONMabphDw89i3AHSLyTiAKdABfAq4GnhURgE3AERHZo6rnPWyLMcYYl2eBX1XvA+4DEJG9wL2q+r7CfURkEOhX1QtetcMYY8x8lsdvjDFNxsupnjxVPQAcKLK9rxrnN8YYc4mN+I0xpslY4DfGmCZjgd8YY5qMBX5jjGkyvgj8x89P8gv7n+LA8eFaN8UYY3zPF4E/FBCGJxPc/+gxC/7GGLNGvgj8AK2REOGg8ODBE7VuijHG+JpvAj9ALBzk9OhMrZthjDG+5qvAP5vKsCneWutmGGOMr/km8M8k06Qyyt23bq11U4wxxteqUrJhrTJZpac9yt23bmXvtp5aN8cYY3zNF4H/J65s56t3vbHWzTDGmIbgm6keY4wxlWGB3xhjmowFfmOMaTIW+I0xpslY4DfGmCYjqlrrNqxIRF4DhkrY9XKg0dbvbbQ+WX/qX6P1qZn7s0VV1y/c6IvAXyoRGVDV/lq3o5IarU/Wn/rXaH2y/ixmUz3GGNNkLPAbY0yTabTAv7/WDfBAo/XJ+lP/Gq1P1p8FGmqO3xhjzMoabcRvjDFmBRb4jTGmyfgq8IvIF0RkWESOFmz7IxE5IyLPuH/eWfCz+0TkJRH5kYj8VG1avbRy+iMifSIyW7D9s7Vr+dKK9cnd/hH3OhwTkU8VbPfdNXK3L+qPH67REv/mvl7Q5kEReabgZ3V9faC8Pvn4Gm0XkafcNg+IyJ6Cn5V/jVTVN3+AW4GdwNGCbX8E3Ftk3+uBZ4EW4GrgZSBY6z6soT99hfvV658l+nQb8B2gxf2+x+fXaKn+1P01KtafBT//c+B+v1yfVfTJl9cIeAz4aff1O4EDa7lGvhrxq+pBYKTE3d8LfE1V51T1FeAlYM8K76mqMvvjC0v06cPAJ1V1zt1n2N3u12u0VH/q3nL/5kREgA8AX3U31f31gbL7VPeW6I8CHe7rTuCs+3pV18hXgX8Zvy0iz7kfkeLuto3AqYJ9Trvb/KBYfwCuFpEfiMj3ROQtNWtd+a4D3iIi33fbvtvd7tdrtFR/wL/XCOAtwKuq+qL7vV+vT6GFfQJ/XqOPAn8qIqeAPwPuc7ev6ho1QuD/DPA6YDtwDudjHYAU2dcPuatL9eccsFlVdwAfB74iIh1Fj1B/QkAceCPwu8BD7kjMr9doqf74+RoB/ALzR8Z+vT6FFvbJr9fow8DHVLUX+BjweXf7qq6R7wO/qr6qqhlVzQJ/xaWPOaeB3oJdN3Hp41HdWqo/7ke5i+7rwzhzedfVrqVlOQ08rI6ngSxOoSlfXiOW6I+fr5GIhICfBb5esNmv1wco3icfX6MPAQ+7r7/BGuOc7wO/iGwo+PZngNyd8EeBO0WkRUSuBq4Fnq52+8q1VH9EZL2IBN3XW3H6c6L6LVyVbwK3A4jIdUAEp7qgL68RS/TH59fo7cBxVT1dsM2v1ydnUZ98fI3OAm91X98O5KauVneNan0Hu8y73V/F+aiWwvlN92vAl4Dngefcv4QNBfv/Ps5v9B/h3hGvpz/l9Ad4H3AM5w7+EeA9tW5/GX2KAF/G+SV2BLjd59eoaH/8cI2K9cfd/r+A3yiyf11fn3L75NdrBLwZOOy2+/vArrVcIyvZYIwxTcb3Uz3GGGPKY4HfGGOajAV+Y4xpMhb4jTGmyVjgN8aYJmOB35gFxKmQem+Z7/k7EelaYZ8DIrJokWy38uI7i73HGC9Y4DemAlT1nao6tsq3b8epuGhMVVjgN03Frcd+XEQ+JyJHReSvReTtIvLPIvJiQZ3z690R+gkR+Z2C939QRJ5266I/WPAU6KCIXO6+/s/uOR4Xka8u+PTwc+77fywibxGRCLAP+Hn3mD9frb8L07ws8JtmdA3wAHATsA34RZwnI+8Ffs/dZxvwUzg1Uf5QRMIi8nrg54FbVHU7kAF+qfDA7lTO+4AdOHViFk7thFR1D061xT9U1SRwP/B1Vd2uql/HGI+Fat0AY2rgFVV9HkBEjgHfVVUVkedxFup4Bvi2OvX250RkGLgCeBuwCzjkFOMkBiysxf9m4FuqOuse//8u+Hmu0NZh91zGVJ0FftOM5gpeZwu+z3Lp/0ThPhl3uwBfVNX7WFqxMrnFzp07pjFVZ1M9xpTuu8D7RaQHQES6RWTLgn2eBN4jIlERWQe8q4TjTgLtlW2qMUuzwG9MiVT1h8AfAI+JyHPA48CGBfscwqmq+izOtM4AML7Cof8R52ay3dw1VWHVOY2pMBFZp6pTItIKHATuUtUjtW6XMTk2x2hM5e0XkeuBKM49AQv6pq7YiN8YY5qMzfEbY0yTscBvjDFNxgK/McY0GQv8xhjTZCzwG2NMk/n/hV5jFEwTUJUAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ] }, { "cell_type": "markdown", "metadata": { "id": "-gwo-CPhiJHH" }, "source": [ "**Practice for you**. Use the seaborn regression tool to visualize a simple linear regression to consider the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\"" ] }, { "cell_type": "code", "metadata": { "id": "E-ktOtSIiJHH" }, "source": [ "# Your code goes here!:\n", "#sns.regplot()\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "uuyi5n71iJHH" }, "source": [ "This is often simply called a *regression* model which can be extended to *multiple regression* where there are several $\\beta$s and on the right-hand side multiplied with the predictors. \n", "\n", "Everything below, from [one-sample t-test](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank) to [two-way ANOVA](#6.2-Two-way-ANOVA) are just special cases of this system. Nothing more, nothing less." ] }, { "cell_type": "markdown", "metadata": { "id": "dnn2z1ChiJHH" }, "source": [ "### 3.2 Pearson correlation\n", "We use [`scipy.stats.pearsonr`](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html) to calculate a pearson correlation coefficient and the p-value for testing non-correlation between two variables. Correlations of $-1$ or $+1$ imply an exact linear relationships, and 0 implies no correlation. " ] }, { "cell_type": "code", "metadata": { "id": "ZNjW3XfmiJHH", "outputId": "f963cf11-159d-4886-f5ba-42f62a8e1701" }, "source": [ "# pearsonr returns the pearson's correlation coefficient and \n", "# the 2-tailed p-value\n", "stats.pearsonr(data.mheight, data.Length)" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "(0.4849924026326507, 0.0011331311458135573)" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "COg9OPIziJHI" }, "source": [ "The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.\n", "\n", "**Practice for you**. Calculate the Pearson's R for the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\"" ] }, { "cell_type": "code", "metadata": { "id": "uc3DLwl7iJHI" }, "source": [ "# Your code goes here!:\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "a3AmczJmiJHI" }, "source": [ "### 3.3 Spearman rank correlation" ] }, { "cell_type": "markdown", "metadata": { "id": "3h8RSt2oiJHI" }, "source": [ "As the name implies, the *Spearman rank correlation* is a *Pearson correlation* on rank-transformed $x$ and $y$:\n", "\n", "$\\text{rank}(y) = \\beta_0 + \\beta_1 \\cdot \\text{rank}(x) \\qquad \\mathcal{H}_0: \\beta_1 = 0$\n", "\n", "I'll introduce [ranks](#3.4-Theory:-Rank-transformation) in a minute. For now, notice that the correlation coefficient of the linear model is identical to a \"real\" Pearson correlation, but p-values are an approximation which is is [appropriate for samples greater than N = 10 and almost perfect when N > 20](https://lindeloev.github.io/tests-as-linear/simulations/simulate_spearman.html).\n", "\n", "Such a nice and non-mysterious equivalence!\n", "\n", "The Spearman rank-order correlation coefficient is a *nonparametric measure* of the relationship between two variables. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. " ] }, { "cell_type": "markdown", "metadata": { "id": "jUa-DZ-hiJHJ" }, "source": [ "We will use [`scipy.stats.spearmanr`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html) to calculate the Spearman correlation coefficient with associated p-value." ] }, { "cell_type": "code", "metadata": { "id": "8zFVX3cciJHJ", "outputId": "439d9972-e141-4cf0-a2f2-b9636e406256" }, "source": [ "# spearmanr returns the rank correlation coefficient and \n", "# the 2-tailed p-value\n", "stats.spearmanr(data.mheight, data.Length)" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "SpearmanrResult(correlation=0.4553274463786581, pvalue=0.002446213023703486)" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "iAJC9OmpiJHJ" }, "source": [ "**Practice for you**. Calculate the Spearman's R for the question: \"Can gestational age (`Gestation`) predict baby weight (`Birthweight`)?\"" ] }, { "cell_type": "code", "metadata": { "id": "cAbe0Z8riJHJ" }, "source": [ "# Your code goes here!:\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "0zu7FIC_iJHJ" }, "source": [ "### 3.4 Theory: Rank-transformation\n", "`scipy.stats.rankdata` simply takes an array of numbers and \"replaces\" them with the integers of their rank (1st smallest, 2nd smallest, 3rd smallest, etc.). `pd.DataFrame.rank` performs a similar function, but with support for `pandas.DataFrames`. So the result of the rank-transformation `scipy.stats.rankdata([3.6, 3.4, -5.0, 8.2])` is `[3, 2, 1, 4]`. See that in the figure above?\n", "\n", "A _signed_ rank is the same, just where we rank according to absolute size first and then add in the sign second. So the signed rank here would be `[2, 1, -3, 4]`. Or in code:" ] }, { "cell_type": "code", "metadata": { "id": "R0nXu8ziiJHK" }, "source": [ "def signed_rank(df):\n", " return np.sign(df) * df.abs().rank()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "bxTLVV_OiJHK" }, "source": [ "Ranks are all you need to do to convert most parametric tests into their \"non-parametric\" counterparts! " ] }, { "cell_type": "markdown", "metadata": { "id": "7D5qLFEu6DFF" }, "source": [ "#### Quick Aside: Can we look at all the quantitative data at once?\n", "`sns.pairplot()` from seaborn combines joint and marginal views — but rather than focusing on a single relationship, it visualizes every pairwise combination of variables simultaneously:" ] }, { "cell_type": "code", "metadata": { "id": "I9elJgoZ6DFG", "scrolled": true }, "source": [ "# Uncomment and run to see a matrix of all relationships, color-coded by smoker / non-smoker\n", "# sns.pairplot(data=data, hue=\"smoker\")" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "GsgaT33miJHK" }, "source": [ "\n", "***\n", "\n", "\n", "## 4 One mean comparisons\n", "\n", "### 4.1 One sample t-test and Wilcoxon signed-rank\n", "\n", "The T-test and Wilcoxon signed-rank test calculate probability for the mean of ONE group of scores. This is a test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.\n", "\n", "\n", "#### 4.1.1 Theory: As linear models\n", "\n", "**t-test** model: A single number predicts $y$.\n", "\n", "$y = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$\n", "\n", "In other words, it's our good old $y = \\beta_0 + \\beta_1*x$ where the last term is gone since there is no $x$ (essentially $x=0$).\n", "\n", "The same is to a very close approximately true for **Wilcoxon signed-rank test**, just with the [signed ranks](#3.0.2-Theory:-rank-transformation) of $y$ instead of $y$ itself.\n", "\n", "$\\text{signed_rank}(y) = \\beta_0$\n", "\n", "[This approximation is good enough when the sample size is larger than 14 and almost perfect if the sample size is larger than 50](https://lindeloev.github.io/tests-as-linear/simulations/simulate_wilcoxon.html)." ] }, { "cell_type": "markdown", "metadata": { "id": "XQICc_n-iJHL" }, "source": [ "We will use [`scipy.stats.ttest_1samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html) for the parametric comparison test, and [`scipy.stats.wilcoxon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html) for the non-parametric comparison test." ] }, { "cell_type": "markdown", "metadata": { "id": "GqPm4KIiiJHL" }, "source": [ "#### 4.1.2 Python code: One-sample t-test\n", "\n", "Try running the Python code below and see that the linear model (`smf.ols`) produces the same $t$, $p$, and $r$ as `scipy.stats.ttest_1samp`. \n", "\n", "First, we extract the subsets of data for smokers and non-smokers." ] }, { "cell_type": "code", "metadata": { "id": "xkO3BuMTiJHL" }, "source": [ "# Average of non-smoker mother birthweights\n", "data_nonsmokers = data[data.smoker==0]\n", "data_smokers = data[data.smoker==1]" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Bp4Q7h9yiJHL" }, "source": [ "Next, we will calculate the average birthweight for each group." ] }, { "cell_type": "code", "metadata": { "id": "cdA_tFuBiJHL", "outputId": "4cb5ba14-291f-45ec-81d8-b82deba1d49a" }, "source": [ "avg_birthwgt_nonsmoker = statistics.mean(data_nonsmokers.Birthweight)\n", "avg_birthwgt_smoker = statistics.mean(data_smokers.Birthweight)\n", "\n", "print(avg_birthwgt_nonsmoker, avg_birthwgt_smoker)" ], "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.5095 3.1340909090909093\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "TNrsozSFiJHL" }, "source": [ "Let's compare the data for smoker's to the average birthweight from a non-smoking mother." ] }, { "cell_type": "code", "metadata": { "id": "CmjtD9vFiJHM", "outputId": "f9eda4c1-0df6-4163-8b38-b375fbce4b37" }, "source": [ "t, p = stats.ttest_1samp(data_smokers.Birthweight, avg_birthwgt_nonsmoker)\n", "print(t,p)" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "-2.7894379594885215 0.010986630284506064\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "6yd64v0biJHM" }, "source": [ "and the equivalent linear model with an intercept-only is fit with `smf.ols`, the tool for ordinary least squares linear regression. If we want to make an equivalent comparison, we will shift the birthweight data by the average birthweight for non-smokers, and then see what the linear model looks like!" ] }, { "cell_type": "code", "metadata": { "id": "qf4E1ik_iJHM", "outputId": "5d65f987-ee0d-4794-dc72-907d7c16901b" }, "source": [ "# We will shift the birthweight data by the average for non-smoker\n", "# and store the result in a new column, Birthweight_adj\n", "data[\"Birthweight_adj\"] = data[\"Birthweight\"]-avg_birthwgt_nonsmoker\n", "data_smokers.head()" ], "execution_count": null, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDLengthBirthweightHeadcircGestationsmokermagemnocigmheightmppwtfagefedyrsfnocigfheightlowbwtmage35
20792533.64384012021705924121218500
211388513.14334112271605324161217600
22575502.7830371197165602014018300
23569502.51353912271595223142520010
241363482.37303712071634720103518510
\n", "
" ], "text/plain": [ " ID Length Birthweight Headcirc Gestation smoker mage mnocig \\\n", "20 792 53 3.64 38 40 1 20 2 \n", "21 1388 51 3.14 33 41 1 22 7 \n", "22 575 50 2.78 30 37 1 19 7 \n", "23 569 50 2.51 35 39 1 22 7 \n", "24 1363 48 2.37 30 37 1 20 7 \n", "\n", " mheight mppwt fage fedyrs fnocig fheight lowbwt mage35 \n", "20 170 59 24 12 12 185 0 0 \n", "21 160 53 24 16 12 176 0 0 \n", "22 165 60 20 14 0 183 0 0 \n", "23 159 52 23 14 25 200 1 0 \n", "24 163 47 20 10 35 185 1 0 " ] }, "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "code", "metadata": { "id": "xafESP5UiJHM", "outputId": "3d51ab2e-11f6-49c7-8e11-a1cb55e24b50" }, "source": [ "# Equivalent linear model: intercept-only\n", "res = smf.ols(formula=\"Birthweight_adj ~ 1\", data=data[data.smoker==1]).fit() \n", "print(res.summary())" ], "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: Birthweight_adj R-squared: -0.000\n", "Model: OLS Adj. R-squared: -0.000\n", "Method: Least Squares F-statistic: nan\n", "Date: Thu, 26 Aug 2021 Prob (F-statistic): nan\n", "Time: 13:25:46 Log-Likelihood: -20.584\n", "No. Observations: 22 AIC: 43.17\n", "Df Residuals: 21 BIC: 44.26\n", "Df Model: 0 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -0.3754 0.135 -2.789 0.011 -0.655 -0.096\n", "==============================================================================\n", "Omnibus: 0.182 Durbin-Watson: 1.305\n", "Prob(Omnibus): 0.913 Jarque-Bera (JB): 0.016\n", "Skew: 0.021 Prob(JB): 0.992\n", "Kurtosis: 2.876 Cond. No. 1.00\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "Y0gezJ9CiJHM" }, "source": [ "Notice that the `t` statistics and `p` values for the linear model and t-test the same!" ] }, { "cell_type": "markdown", "metadata": { "id": "PTfWUsbMiJHN" }, "source": [ "#### 4.1.2 Python code: One-sample Wilcoxon signed-rank test\n", "We use [`scipy.stats.wilcoxon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html) for the non-parametric comparison test and compare it to the $p$-value for a ranked data linear model." ] }, { "cell_type": "code", "metadata": { "id": "Gs2oUva1iJHN", "outputId": "d979980e-28a9-426c-946e-0f553e5469c9" }, "source": [ "stats.wilcoxon(data[data.smoker==1].Birthweight_adj)" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "WilcoxonResult(statistic=51.0, pvalue=0.01272726058959961)" ] }, "execution_count": 206, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "code", "metadata": { "id": "RMUu_TP2iJHN", "outputId": "79892ed4-1177-4a35-bc6a-736f2ddd2475" }, "source": [ "signed_rank_data = signed_rank(data[data.smoker==1])\n", "res = smf.ols(\"Birthweight_adj ~ 1\", data=signed_rank_data).fit()\n", "print(res.summary())" ], "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: Birthweight_adj R-squared: -0.000\n", "Model: OLS Adj. R-squared: -0.000\n", "Method: Least Squares F-statistic: nan\n", "Date: Thu, 26 Aug 2021 Prob (F-statistic): nan\n", "Time: 13:25:46 Log-Likelihood: -84.360\n", "No. Observations: 22 AIC: 170.7\n", "Df Residuals: 21 BIC: 171.8\n", "Df Model: 0 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -6.8636 2.443 -2.809 0.011 -11.945 -1.782\n", "==============================================================================\n", "Omnibus: 1.920 Durbin-Watson: 1.250\n", "Prob(Omnibus): 0.383 Jarque-Bera (JB): 1.649\n", "Skew: 0.603 Prob(JB): 0.438\n", "Kurtosis: 2.411 Cond. No. 1.00\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "hS6GYFA2iJHO" }, "source": [ "### 4.2 Paired samples t-test and Wilcoxon matched pairs\n", "\n", "A paired t-test or matched pairs test is used when we are interested in the difference between two variables for the same subject. Often the two variables might be measurements that are separated by time (e.g. cholesterol levels before and after a special diet). \n", "\n", "#### 4.2.1 Theory: As linear models\n", "\n", "**t-test** model: a single number (intercept) predicts the pairwise differences.\n", "\n", "$y_2-y_1 = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$\n", "\n", "This means that there is just one $y = y_2 - y_1$ to predict and it becomes a [one-sample t-test](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank) on the pairwise differences. The idea is therefore also the same as for the one-sample t-test. " ] }, { "cell_type": "markdown", "metadata": { "id": "JFX2CTjYiJHO" }, "source": [ "Similarly, the **Wilcoxon matched pairs** only differ from **Wilcoxon signed-rank** in that it's testing the signed ranks of the pairwise $y_2-y_1$ differences.\n", "\n", "$\\text{signed_rank}(y_2-y_1) = \\beta_0 \\qquad \\mathcal{H}_0: \\beta_0 = 0$" ] }, { "cell_type": "markdown", "metadata": { "id": "uTK0_2d_iJHO" }, "source": [ "**Let's load some data with matched pairs** from a study tested whether cholesterol was reduced after using a certain brand of margarine as part of a low fat, low cholesterol diet. \n", "\n", "The subjects consumed on average 2.31g of the active ingredient, stanol easter, a day. This data set contains information on 18 people using margarine to reduce cholesterol over three time points." ] }, { "cell_type": "code", "metadata": { "id": "23otu9QniJHO", "outputId": "8bc9c8e8-4c09-455a-dd34-1f4967824223" }, "source": [ "url = \"https://raw.githubusercontent.com/DeisData/datasets/main/Cholesterol_R.csv\"\n", "data_matched = pd.read_csv(url)\n", "data_matched.head()" ], "execution_count": null, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDBeforeAfter4weeksAfter8weeksMargarine
016.425.835.75B
126.766.206.13A
236.565.835.71B
344.804.274.15A
458.437.717.67B
\n", "
" ], "text/plain": [ " ID Before After4weeks After8weeks Margarine\n", "0 1 6.42 5.83 5.75 B\n", "1 2 6.76 6.20 6.13 A\n", "2 3 6.56 5.83 5.71 B\n", "3 4 4.80 4.27 4.15 A\n", "4 5 8.43 7.71 7.67 B" ] }, "execution_count": 208, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "code", "metadata": { "id": "oFlU3MbfiJHO", "outputId": "abae48cd-bf92-4a6c-9f33-51e5cbda42ab" }, "source": [ "# Plot the data for each ID, Before and After 8 weeks\n", "fig, ax = plt.subplots()\n", "sns.scatterplot(x='ID',y='Before',data=data_matched, ax=ax)\n", "sns.scatterplot(x='ID',y='After8weeks',data=data_matched, ax=ax)\n", "ax.set_ylabel('Cholesterol Before and After 8 wks')\n", "ax.set_xlabel('Patient ID')" ], "execution_count": null, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'Patient ID')" ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAEGCAYAAABvtY4XAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAfj0lEQVR4nO3de5wcdZnv8c93YNiEXGBzIUEFh2SRLPFwHV1A4sKiLCCoeJADqyuiR+Csa0SOrri64tnz2nOOnpVVXBXiHXFRQKPgCouLqPGGTkJEEFxMCMolV4SEkJGQfvaPqk46w3RPTU93dXX19/16zau7q7uqn1Rqnqn61fP7/RQRmJlZ+fR1OgAzM2sPJ3gzs5JygjczKykneDOzknKCNzMrqT07HUCtWbNmxcDAQKfDMDPrGsuXL98YEbNHe69QCX5gYIChoaFOh2Fm1jUkPVjvPTfRmJmVlBO8mVlJOcGbmZWUE7yZWUk5wZuZlVShqmistSqVYM2mrazbPMyc6ZMYmDmFvj51Oiwzy4kTfElVKsEt96zlkutWMry9wqT+Pi4/+whOWTjXSd6sR7iJpqTWbNq6M7kDDG+vcMl1K1mzaWuHIzOzvDjBl9S6zcM7k3vV8PYK67cMdygiM8ubE3xJzZk+iUn9u//3TurvY79pkzoUkZnlzQm+pAZmTuHys4/YmeSrbfADM6d0ODIzy4tvspZUX584ZeFcFixexPotw+w3zVU0Zr3GCb7E+vrEvNlTmTd7aqdDMbMOcBONmVlJOcGbmZWUE7yZWUk5wZuZlZQTvJlZSTnBm5mVlBO8mVlJOcGbmZWUOzpZbjw+vVm+xkzwkuYDD0XE7yWdABwGXB0Rj7c3NCsTj09vlr8sTTRfBXZI+iPgM8BBwL9k2bikd0i6R9Ldkq6V5KEMe5THpzfLX5YEX4mIZ4AzgY9ExDuA/cdaSdJzgcXAYES8ENgDOGciwVr38vj0ZvnLkuC3SzoXOA/4ZrqsP+P29wQmS9oT2Bt4ZPwhWhl4fHqz/GVJ8OcDxwL/EBEPSDoIuGaslSLiYeAfgd8AjwJPRMStIz8n6QJJQ5KGNmzYML7orWt4fHqz/CkiGn9AOjoilo9YdkZE3DTGen9I0n7/34DHgeuBGyKi7h+HwcHBGBoayhi6dZtqFY3HpzdrHUnLI2JwtPeynMF/StJ/qdnYucD7Mqz3MuCBiNgQEduBrwHHZQnYyqk6Pv0x82Yxb/ZUJ3ezNstSB38WcIOk1wHHA28ATs6w3m+AYyTtDWwDTgJ8em5mlpMxE3xErJZ0DvB14LfAyRGxLcN6d0i6AVgBPAPcCSyZWLhmZpZV3QQv6RdAbQP9DJJSxzskERGHjbXxiLgMuGzCUZqZ2bg1OoM/PbcozMys5eom+Ih4EEDS3wPLgB9FhLsdmpl1iSxVNGuAc4EhST+V9GFJr2pvWGZmNlFjJviI+GxEvAk4kaSD02vJ0NHJzMw6K8tokp8GDgXWkTTVnEVSGWNmZgWWpQ5+Jkn1zOPAY8DGdPAxK7pKBR5bBVvWwrS5MGM+9HmOF8uX5wHonCx18GcCSPpj4M+B2yXtERHPa3dwNgGVCtx3Eyy9ELZvg/7JcOZVsOAMJ3nLjecB6Kwxf9MlnS7pg8BngYuA7wDvb3dgNkGPrdqV3CF5XHphstwsJ54HoLOyNNGcCnwf+GhEeLjfbrFl7a7kXrV9Gzy5FmYd3JmYrOc0mgdg3uypHYqqd2RponlrHoFYi02bmzTL1Cb5/skwdW7nYrKeU50HoDbJex6A/LgxtqxmzE/a3PsnJ6+rbfAz5nc2Luspngegs8YcDz5PHg++xapVNE+uTc7cXUVjHeB5ANqr0XjwWdrgrVv19SXt7RNsc3eZm01EdR4At7nnr+7pnKRBSbdLukbSAZK+LekJST+TdGSeQVrnVMvcTrtiGed+6g5Ou2IZt9yzlkqlOFd+Zja6RtfrnwA+BPwr8CPgqojYB7g0fc96gMvczLpXowTfHxE3R8S1QETEDSRPbgN8C7xHNCpzM7Nia5TghyWdLOm1QEh6NYCkPwV25BGcdV61zK2Wy9zMukOjBH8R8D+BN5EMUXCipMdJmmcWtz80KwKXuZl1L5dJ2phc5mad5kqu+lwmaRPiMjfrJA9Y1jz3ejGzQnMlV/MaJnhJfZKOyysYM7ORXMnVvIYJPiIqwIdzisXM7FlcydW8LE00t0r6r5Lc2GVmuXMlV/Oy3GS9BJgC7JC0DRBJx6fpbY3MzIzkJv8pC+eyYPEiV3KNU5bx4KflEYiZWT2u5GpOlin7JOn1kv4ufX2ApBe3PzQzM5uILG3wnwCOBf4iff0k8PG2RWRmZi2RpQ3+TyLiKEl3AkTE7yTt1ea4rIyqE5BsWZtMKegJSMzaKkuC3y5pDyAAJM0GKo1XMRuhUoH7boKlFybzxFanEFxwhpO8WZtk+c26AlgK7CfpH4AfAP+3rVFZsVQqsPF+eGBZ8lhp4u/7Y6t2JXdIHpdemCw3s7bIUkXzJUnLgZNISiRfHRH3tj0yK4ZWnXlvWbsruVdt35bMFzvBKQXNbHRZqmi+GBH3RcTHI+KfI+JeSV/MsN4hklbW/GyWdHFLorb8tOrMe9rc5I9Drf7JyWTgZtYWWU7BFta+SNvjjx5rpYj4VUQcERFHpJ9/iqSpx7pJozPv8ZgxPznzryb56pXAjPmtidPMnqVuE42k9wB/C0yWtLm6GHgaWDLO7zkJWBURDzYVpXVO9cy7Nsk3c+bd15c061x4aPLHYaqraHqGq6c6plEb/K8jYpqk6yLi7Al+zznAtRPchnVC9cx7ZBt8M2fefX1Je7vb3HuHq6c6qu6MTpJWpPXvKyLiqKa/IKmZfwRYGBHrRnn/AuACgAMPPPDoBx/0SX7hVM/AfOZt47Xxfrhq0bOvAC9c5j/0LdLsjE6bJN0OHCTpxpFvRsQrM37/qcCK0ZJ7up0lpE0+g4ODxZk/0Hbxmbc1y9VTHdUowb8COAr4Is8eE348ifhc3Dxj1ptadQ/HmlL3Ojsino6InwDHRcT3IuJ7wBBwAPDuLBuXtDfwcuBrrQjWzLpMwaqnKpVg9YYn+fGqjaze8CSVSrkbDbIMVfCEpFeTDDZ2CvBV4MosG4+Ip4CZTUdnZt2tVdVTLajE6cXJuxuVSb6cpHnlz4HbSZpqXhwR5+cUm5mVwUTv4bSoEqfe5N0LFi8q7TjzjfbOvwHzgeMj4vURcRMlHmSs1y7drHh8DNbRot7UvTh5d6MmmqNJ6tf/XdJq4MvAHrlElbNevHSzYvEx2ECLKnGqk3fXJvmyT97d6CbrnRHx7oiYD3wAOBLYS9LNae16adS7dFuzaWuHI7Ne4WOwgRaNY9SLk3dnuclKRPwQ+KGkxSRVMecw/uEKCqvRpVtZ2+asWHwMNtCi3tS9OHl3pgRfFREVkrb5f2tPOJ3Ri5duViw+Bhto4ThGvTZ5t/ub05uXblYsPgbHUK3EGViUPHqojEzqjkXTCYODgzE0NNSR765UgjWbtvbMpZsVj49Ba0ZTY9FImtFooxHx2EQDK5Jeu3Sz4vExaK3WqA1+OcmYMwIOBH6XPt8X+A1wULuDMzOz5tVN8BFxEICkK4EbI+Jb6etTgZflE173qV5mr9s8zJzpvsw2s87JUkXzooi4qPoiIm6W9L/bGFPXcmcVMyuSLLeiN0p6n6QBSc+X9F5gU7sD60burGJmRZIlwZ8LzCaZMPvrwH7pMhuhF8e6MLPiGrOJJq2WeXsOsXQ9d1YxsyIZ8wxe0gskLZF0q6TvVH/yCK7buLOKFYFHpbSqLDdZryeZ4OPTwI72hjN+Rapa6cWxLqxYfKPfamVJ8M9ExCfbHkkTingwu7OKdVIvTmph9WW5yXqTpL+StL+kGdWftkeWgatWzHbnG/1WK8sZ/Hnp47tqlgUwr/XhjI+HWDXbnW/0W60xz+Aj4qBRfjqe3GHXwVzLB7P1Mt/ot1qZxoOX9ELgUGBn5oyIq9sVVFbVg3lkG7wPZutVvtE/hkolmct1y9pkpqgmx5XvFmMOFyzpMuAEkgT/LeBU4AcRcVarg2lmuGAPsWpmmVQqcN9Nz54ZasEZXZ3kGw0XnOVfdRZwErA2Is4HDgf+oIXxTUi1auWYebOYN3uqk7uZje6xVbuSOySPSy9MlpdUlgS/LZ2q7xlJ04H1FOAGq5nZuGxZuyu5V23flkwDWFJZ2uCHJO0LfIpkjPgngZ+2Mygzs5abNjdplqlN8v2TkzleSypLFc1fRcTjEXEl8HLgvLSpxsxKrHRDHsyYn7S5909OXlfb4GfM72xcbZSpiqYqIta0KQ4zK5Ai9hKfsL6+5IbqhYcmzTJTy19FU95/2XhVKrDxfnhgWfJYqYy9jllJlbaXeF8fzDoYBhYljyVO7jDOM/jSKmn5lFmz3Eu8HDJlL0nHSzo/fT5bUrkm3O7B8imzRtxLvByyjAd/GfBu4D3pon7gmnYGlbseLJ8ya8RDHpRDliaaM4EjgRUAEfGIpGltjSpvPVg+ZdaIhzwohyxNNE9HMp5BAEjK/Cdc0r6SbpB0n6R7JR3bbKBt1YPlU2ZjcS/x7pflDP46SVcB+0p6C/Amkk5PWXwUuCUizpK0F7B3k3G2Vw+WT5lZ+TVM8JIEfAVYAGwGDgHeHxHfHmvD6bAGLwXeCBARTwNPTzDe9qmWT806uNORWK/qsZEOrf0aJviICElfj4ijgTGT+gjzgA3A5yQdTjLMwdsjYrdCWkkXABcAHHjggeP8CrOScKmutUGWI+cnkl7UxLb3BI4CPhkRRwJbgUtHfigilkTEYEQMzp49u4mvMSsBl+paG2RJ8CeSJPlVku6S9AtJd2VY7yHgoYi4I319A0nCN7ORXKprbZDlJuupzWw4ItZK+q2kQyLiVyRjyv+ymW2ZlZ5Lda0Nsowm+SCwL3BG+rNvuiyLtwFfSs/4jwD+T3NhmpWcS3WtDcY8g5f0duAtwNfSRddIWhIRHxtr3YhYCYw6lZSZ1XCprrVBliaaNwN/Uq1+kfRB4MfAmAnerMiq8/mu2zzMnOkF6KnpUl1rsSwJXsCOmtc70mXF4Npha0Ipxzs3GyFLgv8ccIekpenrVwOfaVtE4+HaYWtSvfHOFyxe5OFwLTftvoqsm+AlHRQRD0TE5ZK+CxxPcuZ+fkTc2bIIJqJe7fCFh3buMtdXFF3B451bp+VxFdko89wAIOm2iFgREVdExEcLk9yheLXD1SuKqxbBF05PHu+7ybNDFZDHO7dOy2PWrEYJvi8dC/4Fki4Z+dOyCCaiWjtcq5O1w+6N2DU83rl1WqOryFZp1AZ/Dkl7+55AMcd/r9YOj2yD71TtcKMrCldGFEqpxzt3M2FXqF5F1ib5Vl9F1k3wae/TD0q6KyJubtk3tlLRaofdG7GrVMc7L1WbuwsPukb1KnJkG3wrryKVzOXR4APSHJIeqM+JiFMlHQocGxEtr6QZHByMoaGhVm82P/7lsk7beH9y72fkScaFy3wVWUDVKpqJXEVKWh4Ro3YozVIm+XmSUsn3pq//g2SM+GKUShZJ0a4orPe4mbCrtPsqMkvmmRUR1wEVgIh4ht07Plmtam/EgUXJo5O75alohQfWUVmyz1ZJM9k1J+sxwBNtjcrMmtPKQcsqlaTJ54FlyaPLfbtOliaaS4AbgfmSfgjMBs5qa1Rm1pxWNRP6flIpjJngI2KFpD8lmY9VwK8iYnvbIzOz5rRi0LIi9hK3cWv4p1jS8yXNStvdpwGnAK/IJTIz65yi9RK3pjQai+bvgDcCIenLwMuA7wKvkHRCRFycR4Bm1gHu01EKjZpozgX+GNgb+A0wNyKekrQnsDKH2MysU4rWS9ya0ijBD0fE08DTklZFxFOQlElKejqf8MysI9ynoxQaJfh9Jb2G5Mbq9PQ56et92h6ZmXWWZ5jqeo0S/PdIJtkG+H7N8+prs+7mQbms5BoNNnZ+noGY5cp13tYDfCRbb/LY/dYDnOCtN7nO23qAE7z1Jg/KZT2gUUen19R7DyAivtb6cMxy4jpv6wGNqmjOaPBeAE7w1r1c521F0OZKLlfRWO9ynbd1Ug6VXGNuRdI+ki6XNJT+fFiSOzqZWc+qVILVG57kx6s2snrDk1Qqjac+HVUOlVxZxoP/LHA3cHb6+i9JpvBr2EZvZlZGlUpwyz1rnzVZ9ikL545vPtUcplfMch0wPyIui4jV6c//Aua15NvNzLrMmk1bdyZ3gOHtFS65biVrNm0d34ZyqOTKkuC3STq++kLSS4BtDT5v1lYtuTw2a9K6zcM7k3vV8PYK67cMj29DrZxesY4sTTQXAVfXtLv/DjivZRGYjUPLLo/NmjRn+iQm9fftluQn9fex37RJ49tQDpVcY83otAfw+og4HDgMOCwijoyIu7JsXNIaSb+QtFLSUAvitR7XsstjsyYNzJzC5WcfwaT+JH1WTzIGZk4Z/8aqlVwDi5LHFpfpNjyDj4gdko5On29u8jtOjIiNTa5rtptGl8fzZk/tUFTWS/r6xCkL57Jg8SLWbxlmv2mTGJg5pZBXkFmaaO6UdCNwPbDzNMk9Wa0TWnZ5bDYBfX1i3uyphT+pyHI9MAPYBPwZSe/WM4DTM24/gFslLZd0wWgfkHRBtcZ+w4YNGTdrvaqll8dmJaeI9lUgSHpORDwiaT/g28DbIqLuZCGDg4MxNOSmemusUgnWbNpa+MtjszxIWh4Rg6O9l6Un6wsk3Sbp7vT1YZLel+WLI+KR9HE9sBR4cfawzUZXvTw+Zt4s5s2e6uRuVkeWJppPAe8BtgOkFTTnjLWSpCmSplWfAyeT9Ig1M7McZLnJundE/FTa7SzpmQzrzQGWpuvtCfxLRNwy/hDNzKwZWRL8RknzSW6YIuks4NGxVoqI1cDhEwvPzMyalSXBvxVYAiyQ9DDwAPC6tkZlO28krts8zJzpvpFoZuOXJcFHRLwsbUfvi4gtkg5qd2C9zN3xzawVstxk/SpARGyNiC3pshvaF5K5O76ZtUKjOVkXAAuBfUbMzzodcLfBNnJ3fDNrhUZNNIeQ9Fjdl93nZ90CvKWNMfU8d8c3s1ZoNCfrN4BvSDo2In6cY0w9r9odf2QbvLvjp9o8UbFZWWS5yXqmpHtIJvm4haT08eKIuKatkfWwbhqtLnc5TFRsVhZZfiNOTocKPh14CHgB8K62RmXujl9PDhMVm5VFlgTfnz6eBlwbEY+1MR6zxhpNVGxmu8mS4G+SdB8wCNwmaTYwzskHzVokh4mKzcpizAQfEZcCxwKDEbEdeAp4VbsDMxtVDhMVm5XFmDdZJe1NMlzBgcAFwHNISii/2d7QzEaRw0TFZmWRpYrmc8By4Lj09UMk0/c5wVtnVCcqnnVwpyMxK7Qspz3zI+JD7BoPfhvgkg4zs4LLcgb/tKTJ7BoueD7w+7ZGZWZWZF3S2S5Lgr+MpIPTAZK+BLwEeGM7gzIzK6wu6myXpYrm28BrSJL6tSTVNN9tb1hmZgXVRZ3tGo0medSIRdVZnA6UdGBErGhfWGZmBdWos13Bbvw3aqL5cIP3AvizFsdiZlZ81c52tUm+oJ3tGo0meWKegZiZdYVqZ7uRbfAF7GyXpaNTP/A/gJemi74LXJX2ajUz6y1d1NkuSxXNJ0kGHPtE+vov02X/vV1BmZkVWpd0tsuS4F8UEYfXvP6OpJ+3KyAzM2uNLNcUO9LOTQBImgfsaF9IZmbWClnO4N8F3C5pNckQBc8Hzm9rVGZmNmFjJviIuE3SwSQjSAq4LyI8VIGZWcFlOYMHOBoYSD9/uCQi4uq2RWVmZhOWpUzyi8B8YCW72t4DcIJvpy4ZzMjMiivLGfwgcGhERLuDsVQXDWZkZsWVJVvcDRSvD26ZddFgRmZWXI0GG7uJpClmGvBLST+lZhz4iHhl+8PrUV00mJGZFVejJpp/zC0K210XDWZkZsXVqInmYeCZiPhe7Q/JWf1D+YTXo6qDGfVPTl4XeDAjMyuuRmfwHwH+dpTlT6XvnZHlCyTtAQwBD0fE6eOMrzd10WBGZlZcjRL8QETcNXJhRAxJGhjHd7wduBeYPs7YeluXDGZkZsXV6JRwUoP3JmfZuKTnAa8APj2eoMzMbOIaJfifSXrLyIWS3gwsz7j9jwB/A1TqfUDSBZKGJA1t2LAh42bNzGwsjZpoLgaWSnoduxL6ILAXcOZYG5Z0OrA+IpZLOqHe5yJiCbAEYHBw0J2prOtUKsGaTVtZt3mYOdMnMTBzCn196nRYZg2n7FsHHCfpROCF6eJ/jYjvZNz2S4BXSjqNpLlnuqRrIuL1E4rYrEAqleCWe9ZyyXUrGd5eYVJ/H5effQSnLJzrJG8dpzxGIEjP4N85VhXN4OBgDA0NtT0es1ZZveFJTrtiGcPbd7VCTurv41uLFzFv9tQORma9QtLyiBgc7T3X3ZlNwLrNw7sld4Dh7RXWbxnuUERmu+SS4CPiu66BtzKaM30Sk/p3/zWa1N/HftMaFaGZ5cNn8GYTMDBzCpeffcTOJF9tgx+YOaXDkZlln/DDzEbR1ydOWTiXBYsXsX7LMPtNcxWNFYcTvNkE9fWJebOn+qaqFY6baMzMSsoJ3syspNxEYzZRnj/XCsoJ3mwiPH+uFZiPQLOJ8Py5VmBO8GYT0Wj+XLMOc4I3m4jq/Lm1PH+uFYQTvNlEeP5cKzDfZDWbCM+fawXmBG82UZ4/1wrKpxlmZiXlBG9mVlJO8GZmJeUEb2ZWUk7wZmYllcuk21lJ2gA82Ok4MpgFbOx0EOPQbfGCY85Lt8XcbfFC+2N+fkTMHu2NQiX4biFpqN4s5kXUbfGCY85Lt8XcbfFCZ2N2E42ZWUk5wZuZlZQTfHOWdDqAceq2eMEx56XbYu62eKGDMbsN3syspHwGb2ZWUk7wZmYl5QQ/CkkHSLpd0r2S7pH09lE+c4KkJyStTH/e34lYR8S0RtIv0niGRnlfkq6Q9GtJd0k6qhNx1sRzSM3+Wylps6SLR3ym4/tZ0mclrZd0d82yGZK+Len+9PEP66x7iqRfpfv80g7H/P8l3Zf+3y+VtG+ddRseRznG+wFJD9f8359WZ90i7eOv1MS7RtLKOuvms48jwj8jfoD9gaPS59OA/wAOHfGZE4BvdjrWETGtAWY1eP804GZAwDHAHZ2OuSa2PYC1JJ02CrWfgZcCRwF31yz7EHBp+vxS4IN1/k2rgHnAXsDPRx5HOcd8MrBn+vyDo8Wc5TjKMd4PAO/McNwUZh+PeP/DwPs7uY99Bj+KiHg0Ilakz7cA9wLP7WxULfEq4OpI/ATYV9L+nQ4qdRKwKiIK15M5Ir4PPDZi8auAL6TPvwC8epRVXwz8OiJWR8TTwJfT9dputJgj4taIeCZ9+RPgeXnEkkWdfZxFofZxlSQBZwPX5hFLPU7wY5A0ABwJ3DHK28dK+rmkmyUtzDeyUQVwq6Tlki4Y5f3nAr+tef0QxfnDdQ71fxmKtp8B5kTEo5CcEAD7jfKZIu/vN5FczY1mrOMoT3+dNil9tk4zWFH38SJgXUTcX+f9XPaxE3wDkqYCXwUujojNI95eQdKccDjwMeDrOYc3mpdExFHAqcBbJb10xPsaZZ2O18lK2gt4JXD9KG8XcT9nVdT9/V7gGeBLdT4y1nGUl08C84EjgEdJmjxGKuQ+Bs6l8dl7LvvYCb4OSf0kyf1LEfG1ke9HxOaIeDJ9/i2gX9KsnMMcGdMj6eN6YCnJ5Wuth4ADal4/D3gkn+gaOhVYERHrRr5RxP2cWldt3kof14/ymcLtb0nnAacDr4u0MXikDMdRLiJiXUTsiIgK8Kk6cRRxH+8JvAb4Sr3P5LWPneBHkbaffQa4NyIur/OZuennkPRikn25Kb8onxXPFEnTqs9JbqjdPeJjNwJvSKtpjgGeqDYzdFjds52i7ecaNwLnpc/PA74xymd+Bhws6aD0KuWcdL2OkHQK8G7glRHxVJ3PZDmOcjHi/tCZdeIo1D5OvQy4LyIeGu3NXPdxHnebu+0HOJ7kMu8uYGX6cxpwEXBR+pm/Bu4huWv/E+C4Dsc8L43l52lc702X18Ys4OMkVQe/AAYLsK/3JknY+9QsK9R+Jvnj8yiwneSM8c3ATOA24P70cUb62ecA36pZ9zSSKqxV1f+TDsb8a5L26uoxfeXImOsdRx2K94vpcXoXSdLev+j7OF3++erxW/PZjuxjD1VgZlZSbqIxMyspJ3gzs5JygjczKykneDOzknKCNzMrKSd4Kw1JO9LR+e6WdL2kvRt89gRJx9W8vkjSG5r83gFJf9HgvbtrvvMJSXemox9+X9LpzXynWRZO8FYm2yLiiIh4IfA0ST19PScAOxN8RFwZEVc3+b0DwKgJfhTLIuLIiDgEWAz8s6STmvxes4ac4K2slgF/JOkMSXekZ83/LmlOOoDcRcA70jP+RenY4+8EkDRf0i3pQFDLJC1Il39eyXj6P5K0WtJZ6Xf9P2BRuq13ZA0wIlYCf0/Smcus5ZzgrXTSsUBOJekF+QPgmIg4kmQo2b+JiDXAlcA/pWf8y0ZsYgnwtog4Gngn8Ima9/Yn6el8Oklih2Q8+GXptv5pnOGuABaMcx2zTPbsdABmLTS5ZgadZSTjCR0CfCUd12Qv4IFGG0hHED0OuD4dAgfgD2o+8vVIBr/6paQ5LYh5tNEQzVrCCd7KZFtEHFG7QNLHgMsj4kZJJ5DMEtRIH/D4yO3U+H3t5puKcndHkkwoY9ZybqKxstsHeDh9fl7N8i0k0zHuJpJx/x+Q9FrYOY/t4WN8x6jbGoukw4C/IxkAzqzlnOCt7D5A0tyyDNhYs/wm4MzqTdYR67wOeLOk6mh/Y00BdxfwTDrr1Fg3WRdVyyRJEvviiLgt6z/GbDw8mqSZWUn5DN7MrKSc4M3MSsoJ3syspJzgzcxKygnezKyknODNzErKCd7MrKT+E1rzXuDMRUuTAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ] }, { "cell_type": "markdown", "metadata": { "id": "TXrFCFHciJHP" }, "source": [ "#### 4.2.2 Python code: Paired sample t-test" ] }, { "cell_type": "code", "metadata": { "id": "BoGHIfLpiJHP", "outputId": "3c998d83-fc24-4011-8aa5-9ad67b93b37f" }, "source": [ "t, p = scipy.stats.ttest_rel(data_matched.After8weeks, data_matched.Before)\n", "print(t,p)" ], "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-14.945967764585072 3.27857534017563e-11\n" ] } ] }, { "cell_type": "code", "metadata": { "id": "n2DBIV2NiJHP", "outputId": "4f0e3aa1-5fd5-4249-bf5c-6589ca2e2fab" }, "source": [ "# Let's add a column to our dataFrame subtracting the difference in cholesterol measures\n", "data_matched[\"cholesterol_diff\"] = data_matched.After8weeks - data_matched.Before\n", "data_matched.head()" ], "execution_count": null, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDBeforeAfter4weeksAfter8weeksMargarinecholesterol_diff
016.425.835.75B-0.67
126.766.206.13A-0.63
236.565.835.71B-0.85
344.804.274.15A-0.65
458.437.717.67B-0.76
\n", "
" ], "text/plain": [ " ID Before After4weeks After8weeks Margarine cholesterol_diff\n", "0 1 6.42 5.83 5.75 B -0.67\n", "1 2 6.76 6.20 6.13 A -0.63\n", "2 3 6.56 5.83 5.71 B -0.85\n", "3 4 4.80 4.27 4.15 A -0.65\n", "4 5 8.43 7.71 7.67 B -0.76" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "code", "metadata": { "id": "8BMF1kLgiJHP", "outputId": "1d9277bf-e331-4220-b9e4-edd9b366eab7" }, "source": [ "# The ttest_rel is similar to calculating the difference in y values for each match, and \n", "# fitting a linear model\n", "res = smf.ols(formula=\"cholesterol_diff ~ 1\", data=data_matched).fit()\n", "print(res.summary())" ], "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: cholesterol_diff R-squared: 0.000\n", "Model: OLS Adj. R-squared: 0.000\n", "Method: Least Squares F-statistic: nan\n", "Date: Thu, 26 Aug 2021 Prob (F-statistic): nan\n", "Time: 13:25:47 Log-Likelihood: 5.9885\n", "No. Observations: 18 AIC: -9.977\n", "Df Residuals: 17 BIC: -9.087\n", "Df Model: 0 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -0.6289 0.042 -14.946 0.000 -0.718 -0.540\n", "==============================================================================\n", "Omnibus: 0.404 Durbin-Watson: 2.273\n", "Prob(Omnibus): 0.817 Jarque-Bera (JB): 0.371\n", "Skew: 0.291 Prob(JB): 0.831\n", "Kurtosis: 2.605 Cond. No. 1.00\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=18\n", " warnings.warn(\"kurtosistest only valid for n>=20 ... continuing \"\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "sLIScqiWiJHQ" }, "source": [ "You will notice the `t` statistics and `p` values are similar for the Paired T-test and a linear fit of the difference in the matched data." ] }, { "cell_type": "markdown", "metadata": { "id": "6HSCX4biiJHQ" }, "source": [ "#### 4.2.3 Python code: Wilcoxon matched pairs\n", "\n", "Again, we do the signed-ranks trick. This is still an approximation, but a close one:" ] }, { "cell_type": "code", "metadata": { "id": "G8ZaBCfliJHQ", "outputId": "b322549b-266e-431f-94c7-3f40279cc884" }, "source": [ "stats.wilcoxon(data_matched.After8weeks, data_matched.Before)\n" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "WilcoxonResult(statistic=0.0, pvalue=7.62939453125e-06)" ] }, "execution_count": 213, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "CdHI7fOCiJHQ" }, "source": [ "**Practice for you.** Compare the matched data after 4 weeks and after 8 weeks. What are the results of the paired sample t-test? Of the Wilcoxon matched pairs test?" ] }, { "cell_type": "code", "metadata": { "id": "8GFD3RuIiJHQ" }, "source": [ "# Your code goes here!:\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "3jbgVe6TiJHQ" }, "source": [ "\n", "***\n", "\n", "\n", "\n", "# 5 Many mean comparisons\n", "\n", "ANOVAs are linear models with categorical predictors. The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "c_7kMbIOiJHQ" }, "source": [ "### 5.1 Theory: As linear models\n", "\n", "Model: One mean for each group predicts $y$.\n", "\n", "$y = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\beta_3 x_3 +... \\qquad \\mathcal{H}_0: y = \\beta_0$\n", "\n", "where $x_i$ are indicators ($x=0$ or $x=1$) where at most one $x_i=1$ while all others are $x_i=0$. \n", "\n", "Notice how this is just \"more of the same\" of what we already did in other models above. When there are only two groups, this model is $y = \\beta_0 + \\beta_1*x$, i.e. the independent t-test. If there is only one group, it is $y = \\beta_0$, i.e. the one-sample t-test." ] }, { "cell_type": "markdown", "metadata": { "id": "SHaL0Py26DFM" }, "source": [ "Since we now regress on more than one $x$, the one-way ANOVA is a **multiple regression** model.\n", "\n", "The **Kruskal-Wallis** test is simply a **one-way ANOVA** on the rank-transformed $y$ (`value`):\n", "\n", "$\\text{rank}(y) = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\beta_3 x_3 +...$\n", "\n", "This approximation is [good enough for 12 or more data points](https://lindeloev.github.io/tests-as-linear/simulations/simulate_kruskall.html). " ] }, { "cell_type": "markdown", "metadata": { "id": "0f3mJJgI6DFM", "outputId": "577a822d-a725-4caa-9edc-00132e9272ce" }, "source": [ "### 5.2 Example data\n", "\n", "We make a two-level factor with the `Margarine` levels `A` and `B`, so that the **one-way ANOVA** basically becomes a \"two-sample t-test\". We will test the null hypothesis that the two groups have the same population mean of cholesterol level after 8 weeks." ] }, { "cell_type": "code", "metadata": { "id": "zyrUWuo-iJHR", "outputId": "9b384cda-e37a-4be3-96e9-fffe4249acde" }, "source": [ "data_matched.head()" ], "execution_count": null, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDBeforeAfter4weeksAfter8weeksMargarinecholesterol_diff
016.425.835.75B-0.67
126.766.206.13A-0.63
236.565.835.71B-0.85
344.804.274.15A-0.65
458.437.717.67B-0.76
\n", "
" ], "text/plain": [ " ID Before After4weeks After8weeks Margarine cholesterol_diff\n", "0 1 6.42 5.83 5.75 B -0.67\n", "1 2 6.76 6.20 6.13 A -0.63\n", "2 3 6.56 5.83 5.71 B -0.85\n", "3 4 4.80 4.27 4.15 A -0.65\n", "4 5 8.43 7.71 7.67 B -0.76" ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "9trjagAx6DFN" }, "source": [ "### 5.3 Analysis of variance [ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)\n", "The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid. We will see how to test these in **6. Checking Assumptions**\n", "\n", "- The samples are independent. \n", "\n", "- Each sample is from a normally distributed population. [KS Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html)\n", "\n", "- The population standard deviations of the groups are all equal. This property is known as homoscedasticity. [Bartlett Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html)" ] }, { "cell_type": "markdown", "metadata": { "id": "teJ5vgXaiJHR" }, "source": [ "Let's see scipy's dedicated ANOVA function `scipy.stats.f_oneway`." ] }, { "cell_type": "code", "metadata": { "id": "pI-ItdZ4iJHS", "outputId": "a865f73b-ee0c-41ba-ff72-e39d0101be4f" }, "source": [ "# Extract subsets of data\n", "data_matched_A = data_matched[data_matched.Margarine=='A']\n", "data_matched_B = data_matched[data_matched.Margarine=='B']\n", "\n", "scipy.stats.f_oneway(data_matched_A.After8weeks,\n", " data_matched_B.After8weeks)" ], "execution_count": null, "outputs": [ { "data": { "text/plain": [ "F_onewayResult(statistic=1.2662631426081903, pvalue=0.27706640602454957)" ] }, "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "Nq6y4Hpt6DFP" }, "source": [ "### 5.4 Kruskal-Wallis H-test\n", "If ANOVA assumptions are not met, we can consider a non-parametric test.\n", "The [Kruskal-Wallis H-test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html) (`scipy.stats.kruskal`) tests the null hypothesis that the population median of all of the groups are equal. \n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9p25-kFg6DFP", "outputId": "3e4afcbc-ae98-419a-a5c9-fa496b0c6591" }, "source": [ "# Notice here, we have combined the steps of extracting subsets of data, \n", "# accessing the .After8weeks column from the subset,\n", "# and doing the Kruskal test! Python is powerful!\n", "\n", "stats.kruskal(data_matched[data_matched.Margarine=='A'].After8weeks,\n", " data_matched[data_matched.Margarine=='B'].After8weeks)\n", "\n", "# Python starts working inside the parentheses, just like algebra." ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "KruskalResult(statistic=1.0311890838206494, pvalue=0.3098795496397732)" ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "oXvG_vgp6DFP" }, "source": [ "Do Not Reject the hypothesis that median cholesterol levels at 8 weeks are equal for the Margarine A and Margarine B groups." ] }, { "cell_type": "markdown", "metadata": { "id": "zfpC9aPn6DFQ" }, "source": [ "### 5.5 **CHALLENGE for you!** \n", "Think back to the Birthweight `data`. Can you set up a test of equal means for the birthweights measured from four groups of mothers who are:\n", "- over 35 and smoke\n", "- over 35 and do not smoke\n", "- under 35 and smoke\n", "- under 35 and do not smoke" ] }, { "cell_type": "code", "metadata": { "id": "j_8D8xww6DFQ" }, "source": [ "# Try it here!: \n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9MEHq10ZiJHT" }, "source": [ "\n", "\n", "## 6. Checking Assumptions\n", "\n", "Parametric tests like the t-tests and ANOVA have important assumptions that must be satisfied in order for the associated p-value to be valid, including: \n", "\n", "- The samples are independent. \n", "- Each sample is from a normally distributed population.\n", "- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.\n", "\n", "Here are just a few tools to check some of these assumptions.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "0iCNmQn4iJHT" }, "source": [ "### 6.1 Checking normality (Gaussian distribution)" ] }, { "cell_type": "markdown", "metadata": { "id": "Ms3VJDETiJHT" }, "source": [ "Baby weight is normally distributed (i.e. bell-shaped.) We can check this visually with a histogram `sns.histplot` and with the [Kolmogorv Smirnov Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) `scipy.stats.kstest`" ] }, { "cell_type": "code", "metadata": { "id": "omS3q7IEiJHT", "outputId": "1763c4a3-fb9b-47ee-f78c-099095699da9" }, "source": [ "# [68]\n", "# Are the samples normally distributed?\n", "sns.histplot(data.Birthweight)\n", "stats.kstest(data.Birthweight,'norm')\n" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "KstestResult(statistic=0.9725710502961632, pvalue=5.07943581694757e-66)" ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAARJ0lEQVR4nO3dfZBkVX3G8e8DCwEFRIuJgX3JqqEwhjJqxhdAjUGtQiWSpIjgK6hxTSwVjWJAK5qkKhUrWgZFS9mgQSPBF8TEN1BUEIiI7i6IwEKwUNl1ibtCAhopyOIvf/TdODXO7jSz232353w/VV3Tfe/tPr8zh324c/r26VQVkqR27NF3AZKk8TL4JakxBr8kNcbgl6TGGPyS1JglfRcwjIMOOqhWrlzZdxmSNFHWrl3746qamr19IoJ/5cqVrFmzpu8yJGmiJPnBXNud6pGkxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/GrG0uUrSDIRt6XLV/T969IiNhFLNki7wqaNGzjhrK/3XcZQPv7KI/suQYuYZ/yS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGjOy4E/yoSSbk1w3Y9s7ktyY5Nokn05y4KjalyTNbZRn/OcAx8zadjFweFU9GvgP4PQRti9JmsPIgr+qLgPumLXtS1W1tXv4DWDZqNqXJM2tzzn+lwEX9ti+JDWpl+BP8hZgK3DuDo5ZlWRNkjVbtmwZX3GStMiNPfiTnAQcC7ywqmp7x1XV6qqarqrpqamp8RUoSYvcWL+IJckxwF8Av1tVPxtn25KkgVFeznkecCVwWJKNSV4OvBfYH7g4yTVJPjCq9iVJcxvZGX9VPX+OzR8cVXuSpOH4yV1JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGjOy4E/yoSSbk1w3Y9tDklyc5Obu54NH1b4kaW6jPOM/Bzhm1rbTgK9U1aHAV7rHkqQxGlnwV9VlwB2zNh8HfLi7/2HgD0bVviRpbuOe439oVd0G0P381TG3L0nN223f3E2yKsmaJGu2bNnSdzmStGiMO/h/lORggO7n5u0dWFWrq2q6qqanpqbGVqAkLXbjDv7PACd1908C/m3M7UtS80Z5Oed5wJXAYUk2Jnk58HbgmUluBp7ZPZYkjdGSUb1wVT1/O7uePqo2JUnz223f3JUkjYbBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGtNL8Cd5fZLrk1yX5Lwk+/RRhyS1aOzBn2Qp8FpguqoOB/YEThx3HZLUqr6mepYA+yZZAjwA2NRTHZLUnLEHf1X9EHgncCtwG3BnVX1p9nFJViVZk2TNli1bxl2mJC1afUz1PBg4DngYcAjwwCQvmn1cVa2uqumqmp6amhp3mZK0aPUx1fMM4HtVtaWq/he4ADiyhzokqUl9BP+twJOSPCBJgKcD63uoQ5Ka1Mcc/1XA+cA64DtdDavHXYcktWpJH41W1duAt/XRtiS1zk/uSlJjhgr+JEcNs02StPsb9oz/zCG3SZJ2czuc409yBINLLaeS/PmMXQcwWGpBkjRh5ntzd29gv+64/Wdsvws4flRFSZJGZ4fBX1VfA76W5Jyq+sGYapIkjdCwl3P+SpLVwMqZz6mqo0dRlCRpdIYN/k8CHwDOBu4bXTmSpFEbNvi3VtX7R1qJJGkshr2c87NJXpXk4CQP2XYbaWWSpJEY9oz/pO7nqTO2FfDwXVuOJsnS5SvYtHFD32UsTnssYbCG4WQ4ZNlyfrjh1r7L0JCGCv6qetioC9Hk2bRxAyec9fW+yxjax185Qat//3yrv1uNzFDBn+Qlc22vqo/s2nIkSaM27FTP42fc34fBGvrrAINfkibMsFM9r5n5OMmDgH8eSUWSpJFa6LLMPwMO3ZWFSJLGY9g5/s8yuIoHBouz/SbwiVEVJUkanWHn+N854/5W4AdVtXEE9UiSRmyoqZ5usbYbGazQ+WDg3lEWJUkanWG/get5wDeBPwaeB1yVxGWZJWkCDTvV8xbg8VW1GSDJFPBl4PxRFSZJGo1hr+rZY1vod26/H8+VJO1Ghj3jvyjJF4HzuscnAF8YTUmSpFGa7zt3fwN4aFWdmuSPgCcDAa4Ezl1oo0kOZLC2/+EMLhN9WVVdudDXkyQNb74z/jOANwNU1QXABQBJprt9v7/Adt8NXFRVxyfZG3jAAl9HknQ/zRf8K6vq2tkbq2pNkpULaTDJAcBTgZO717oXLw+VpLGZ7w3afXawb98FtvlwYAvwT0muTnJ2kgfOPijJqiRrkqzZsmXLApuSNBbd9wdMwm3p8hV9/7Z6N98Z/7eSvKKq/nHmxiQvB9buRJuPA15TVVcleTdwGvCXMw+qqtXAaoDp6en6pVeRtPuYoO8P8LsD5g/+1wGfTvJCfhH008DewB8usM2NwMaquqp7fD6D4JckjcEOg7+qfgQcmeT3GFyBA/D5qvrqQhusqv9MsiHJYVV1E4O1/W9Y6OtJku6fYdfjvwS4ZBe2+xrg3O6KnluAl+7C15Yk7cCwH+DaparqGgZTRpKkMXPZBUlqjMEvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5Ia01vwJ9kzydVJPtdXDZLUoj7P+E8B1vfYviQ1qZfgT7IMeA5wdh/tS1LLlvTU7hnAm4D9t3dAklXAKoAVK1aMp6rdwNLlK9i0cUPfZUhaxMYe/EmOBTZX1dokT9vecVW1GlgNMD09XeOprn+bNm7ghLO+3ncZQ/n4K4/suwRJC9DHVM9RwHOTfB/4GHB0ko/2UIckNWnswV9Vp1fVsqpaCZwIfLWqXjTuOiSpVV7HL0mN6evNXQCq6lLg0j5rkKTWeMYvSY0x+CWpMQa/JDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTG9LtkgSWO3xxKS9F3F0A5Ztpwfbrh1l76mwS+pLT/fOjHfeQGj+d4Lp3okqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGjD34kyxPckmS9UmuT3LKuGuQpJb1sUjbVuANVbUuyf7A2iQXV9UNPdQiSc0Z+xl/Vd1WVeu6+z8B1gNLx12HJLWq12WZk6wEHgtcNce+VcAqgBUrViy4jaXLV7Bp44YFP1+SFpvegj/JfsCngNdV1V2z91fVamA1wPT0dC20nU0bNzS/9rYkzdTLVT1J9mIQ+udW1QV91CBJrerjqp4AHwTWV9W7xt2+JLWujzP+o4AXA0cnuaa7PbuHOiSpSWOf46+qK4DJ+aZjSVpk/OSuJDXG4Jekxhj8ktQYg1+SGmPwS1JjDH5JaozBL0mNMfglqTEGvyQ1xuCXpMYY/JLUGINfkhpj8EtSYwx+SWqMwS9JjTH4JakxBr8kNcbgl6TGGPyS1BiDX5IaY/BLUmMMfklqjMEvSY3pJfiTHJPkpiTfTXJaHzVIUqvGHvxJ9gTeBzwLeBTw/CSPGncdktSqPs74nwB8t6puqap7gY8Bx/VQhyQ1KVU13gaT44FjqupPuscvBp5YVa+eddwqYFX38DDgphGWdRDw4xG+/u6ihX620Edoo5/2cef9elVNzd64ZIQNbk/m2PZL//epqtXA6tGXA0nWVNX0ONrqUwv9bKGP0EY/7ePo9DHVsxFYPuPxMmBTD3VIUpP6CP5vAYcmeViSvYETgc/0UIckNWnsUz1VtTXJq4EvAnsCH6qq68ddxyxjmVLaDbTQzxb6CG300z6OyNjf3JUk9ctP7kpSYwx+SWpMM8GfZHmSS5KsT3J9klPmOCZJ3tMtJXFtksf1UevOGLKfT0tyZ5Jruttb+6h1oZLsk+SbSb7d9fGv5zhmosdyyD5O9DjOlGTPJFcn+dwc+yZ6LLeZp49jHcs+ruPvy1bgDVW1Lsn+wNokF1fVDTOOeRZwaHd7IvD+7uckGaafAJdX1bE91Lcr3AMcXVU/TbIXcEWSC6vqGzOOmfSxHKaPMNnjONMpwHrggDn2TfpYbrOjPsIYx7KZM/6quq2q1nX3f8JgAJbOOuw44CM18A3gwCQHj7nUnTJkPydaNz4/7R7u1d1mX6Uw0WM5ZB8XhSTLgOcAZ2/nkIkeSxiqj2PVTPDPlGQl8Fjgqlm7lgIbZjzeyASH5g76CXBEN41wYZLfGm9lO6/7s/kaYDNwcVUturEcoo8w4ePYOQN4E/Dz7eyf+LFk/j7CGMeyueBPsh/wKeB1VXXX7N1zPGUiz7Lm6ec6Bmt4/DZwJvCvYy5vp1XVfVX1GAaf/H5CksNnHTLxYzlEHyd+HJMcC2yuqrU7OmyObRMzlkP2caxj2VTwd3OlnwLOraoL5jhkUSwnMV8/q+qubdMIVfUFYK8kB425zF2iqv4buBQ4ZtauRTGWsP0+LpJxPAp4bpLvM1ip9+gkH511zKSP5bx9HPdYNhP8SQJ8EFhfVe/azmGfAV7SXUXwJODOqrptbEXuAsP0M8mvdceR5AkM/ju4fXxV7pwkU0kO7O7vCzwDuHHWYRM9lsP0cdLHEaCqTq+qZVW1ksHyLV+tqhfNOmyix3KYPo57LFu6quco4MXAd7p5U4A3AysAquoDwBeAZwPfBX4GvHT8Ze60Yfp5PPBnSbYCdwMn1mR9hPtg4MMZfKnPHsAnqupzSf4UFs1YDtPHSR/H7VpkYzmnPsfSJRskqTHNTPVIkgYMfklqjMEvSY0x+CWpMQa/JDXG4NeikuS+bnXDbydZl+TIbvshSc7fznNWJnnBjMcnJ3nvLq7rb5I8Y55j/irJG+fYfmCSV+3KetQ2g1+Lzd1V9Zjuo++nA38HUFWbqur42QcnWQKsBF4we9+uVFVvraovL/DpBwIGv3YZg1+L2QHAf8H/n9Vf190/Ocknk3wW+BLwduAp3V8Kr++ee0iSi5LcnOTvu+c9L8m7uvunJLmlu/+IJFd0938nydeSrE3yxW2rSCY5J8nx3f1nJ7kxyRUZrDM/c332RyW5NMktSV7bbXs78IiuvneM7telVrT0yV21Yd/uE8v7MPj069HbOe4I4NFVdUeSpwFv3LYWepKTgccwWNn0HuCmJGcClwGnds9/CnB7kqXAk4HLuzWSzgSOq6otSU4A/hZ42bZGk+wDnAU8taq+l+S8WXU9Evg9YP+u3fcDpwGHdwu2STvN4Ndic/e2gExyBPCROVa1hMEyx3fs4HW+UlV3dq9zA4OVE69Isl8GX3CzHPgX4KkM/idwAXAYcDhwcbfsyp7A7DVlHgncUlXf6x6fB6yasf/zVXUPcE+SzcBDh+y3NDSDX4tWVV3ZrXA4Ncfu/5nn6ffMuH8fv/i3ciWDtWJuAi5ncDZ/BPAGBushXV9VR+zgdedaYniYdqVdxjl+LVpJHsngrHu+VQ5/wmBqZRiXAW/sfl7NYFrmnu6vg5uAqe4vDZLslV/+Qo0bgYdn8CU5ACcM0eb9qU+al2cTWmy2zfHD4Oz6pKq6r5t62Z5rga1Jvg2cQ/eG8HZczmCa57LudTfQLZdcVfd2b+C+J8mDGPz7OgO4ftuTq+ru7tLMi5L8GPjmfB2qqtuT/Hv35vSFVXXqfM+RdsTVOaUxS7Jf9yXqAd4H3FxV/9B3XWqHUz3S+L2i+6vkeuBBDK7ykcbGM35Jaoxn/JLUGINfkhpj8EtSYwx+SWqMwS9Jjfk/3mX00k2CiPYAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ] }, { "cell_type": "markdown", "metadata": { "id": "fqs5xCF_iJHT" }, "source": [ "**Your turn** Is the number of cigarettes smoked per day normally distributed?" ] }, { "cell_type": "code", "metadata": { "id": "yK5wUSo3iJHT" }, "source": [ "# Try it here!:\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "OrxZ_MmviJHU" }, "source": [ "### 6.2 Checking heteroscedasticity" ] }, { "cell_type": "markdown", "metadata": { "id": "hONK1r62iJHU" }, "source": [ "The population standard deviations of the groups are all equal. This property is known as homoscedasticity. One of several tests available to check this is the [Bartlett Test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html) `scipy.stats.bartlett`." ] }, { "cell_type": "code", "metadata": { "id": "wh5_dEbEiJHU", "outputId": "3e1bc804-fc35-4a33-ea2f-e17dce402cee" }, "source": [ "# Are the standard deviations equal for the smokers and non-smokers birthweight data?\n", "stats.bartlett(data_smokers.Birthweight, data_nonsmokers.Birthweight)" ], "execution_count": null, "outputs": [ { "data": { "application/javascript": [ "\n", " if (window._pyforest_update_imports_cell) { window._pyforest_update_imports_cell('from scipy import stats'); }\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "BartlettResult(statistic=0.743871469606483, pvalue=0.38842352374096334)" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "metadata": { "id": "vO1ozBzQiJHU" }, "source": [ "**Practice for you**: Are the standard deviations of birthweight data equal for the groups of over 35 and under 35 mothers?" ] }, { "cell_type": "code", "metadata": { "id": "Tr3UwfEmiJHU" }, "source": [ "# Try it here:\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "NpgITG806DFQ" }, "source": [ "**Key Points**\n", "\n", " * Hypothesis testing and p-values give you the **significance** of an\n", " effect / difference.\n", " \n", " * Parametric and Non-parametric tests are special cases of linear models!\n", "\n", " * **Regressions** enable you to express rich\n", " links in your data.\n", "\n", " * **Visualizing** your data and fitting simple models give insight into the \n", " data.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Nbz8MQmXchMZ" }, "source": [ "***\n", "\n", "\n", "## Resources \n", "- This workshop follows [Tests-as-linear](https://lindeloev.github.io/tests-as-linear/)\n", "- See more about [Scientific Computing](https://guides.library.brandeis.edu/science/computing) and [Python](https://guides.library.brandeis.edu/c.php?g=1150530&p=8519775) on our [Science Services library guide](https://guides.library.brandeis.edu/science)\n", "- Request 1-1 consultations with [Brandeis Data Services](https://guides.library.brandeis.edu/dataservices)\n", "- Take time with tutorials at [Kaggle.com](https://www.kaggle.com/learn)\n", "- [Brandeis LinkedIn Learning portal](https://www.brandeis.edu/its/support/linkedin-learning/index.html)\n", "- [Stackoverflow](https://stackoverflow.com/)\n", "- [Statsmodels User Guide](https://www.statsmodels.org/stable/user-guide.html)\n", "- [Think stats](http://greenteapress.com/wp/think-stats-2e) book\n", "- [Pandas Getting Started Tutorials](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)\n", "- Data Visualization: [Python Graph Gallery](https://python-graph-gallery.com/)\n", "- [Seaborn example gallery](http://seaborn.pydata.org/examples/) " ] }, { "cell_type": "code", "metadata": { "id": "IxGmhWcRiJHV" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }