{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### New to Plotly?\n", "Plotly's Python library is free and open source! [Get started](https://plotly.com/python/getting-started/) by downloading the client and [reading the primer](https://plotly.com/python/getting-started/).\n", "
You can set up Plotly to work in [online](https://plotly.com/python/getting-started/#initialization-for-online-plotting) or [offline](https://plotly.com/python/getting-started/#initialization-for-offline-plotting) mode, or in [Jupyter notebooks](https://plotly.com/python/getting-started/#start-plotting-online).\n", "
We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports\n", "The tutorial below imports [NumPy](http://www.numpy.org/), [Pandas](https://plotly.com/pandas/intro-to-pandas-tutorial/), and [SciPy](https://www.scipy.org/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "from plotly.tools import FigureFactory as FF\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import scipy.stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generate Data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Let us generate some random data from the normal distribution. We will sample 50 points from a normal distribution with mean $\\mu = 0$ and variance $\\sigma^2 = 1$, and 50 points from another with mean $\\mu = 2$ and variance $\\sigma^2 = 1$." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data1 = np.random.normal(0, 1, size=50)\n", "data2 = np.random.normal(2, 1, size=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two normal probability density functions (p.d.f.s), plotted on the same axes, look like this:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.linspace(-4, 4, 160)\n", "y1 = scipy.stats.norm.pdf(x)\n", "y2 = scipy.stats.norm.pdf(x, loc=2)\n", "\n", "trace1 = go.Scatter(\n", "    x=x,\n", "    y=y1,\n", "    mode='lines+markers',\n", "    name='Mean of 0'\n", ")\n", "\n", "trace2 = go.Scatter(\n", "    x=x,\n", "    y=y2,\n", "    mode='lines+markers',\n", "    name='Mean of 2'\n", ")\n", "\n", "data = [trace1, trace2]\n", "\n", "py.iplot(data, filename='normal-dists-plot')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### One Sample T Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `One Sample T-Test` is a statistical test used to evaluate the null hypothesis that the true mean $m$ of the population from which a 1D sample of independent observations is drawn is equal to a hypothesized value $\\mu$. In other words, our null hypothesis is that\n", "\n", "$$\n", "\\begin{align*}\n", "m = \\mu\n", "\\end{align*}\n", "$$\n", "\n", "For our T-test, we will be using a significance level of `0.05`. As a matter of good scientific practice, always state the chosen significance level for a given test _before_ actually conducting the test. 
This is meant to ensure that the analyst does not modify the significance level for the purpose of achieving a desired outcome.\n", "\n", "For more information on the choice of 0.05 for a significance level, check out [this page](http://www.investopedia.com/exam-guide/cfa-level-1/quantitative-methods/hypothesis-testing.asp)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "true_mu = 0\n", "\n", "onesample_results = scipy.stats.ttest_1samp(data1, true_mu)\n", "\n", "matrix_onesample = [\n", "    ['', 'Test Statistic', 'p-value'],\n", "    ['Sample Data', onesample_results[0], onesample_results[1]]\n", "]\n", "\n", "onesample_table = FF.create_table(matrix_onesample, index=True)\n", "py.iplot(onesample_table, filename='onesample-table')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since our p-value is greater than our significance level of $0.05$, we do not have sufficient evidence to reject the null hypothesis. This is the expected result because the data was sampled from a normal distribution whose true mean is 0, the value we tested against." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Two Sample T Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we have two independently sampled datasets (with equal variance) and are interested in exploring the question of whether the true means $\\mu_1$ and $\\mu_2$ are identical, that is, whether the data could have been sampled from populations with the same mean, we would use a `Two Sample T-Test`.\n", "\n", "Typically, when a researcher is interested in the effect of a given test variable between two populations, they will take one sample from each population and designate them as the experimental group and the control group. 
The experimental group is the sample which will receive the variable being tested, while the control group will not.\n", "\n", "An outcome (e.g. blood pressure) is then observed for all the subjects, and a two-sided t-test can be used to investigate whether the two groups of subjects were sampled from populations with the same true mean, i.e. \"Does the drug have an effect?\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twosample_results = scipy.stats.ttest_ind(data1, data2)\n", "\n", "matrix_twosample = [\n", "    ['', 'Test Statistic', 'p-value'],\n", "    ['Sample Data', twosample_results[0], twosample_results[1]]\n", "]\n", "\n", "twosample_table = FF.create_table(matrix_twosample, index=True)\n", "py.iplot(twosample_table, filename='twosample-table')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since our p-value is much less than our significance level of $0.05$, we have strong evidence to reject our null hypothesis of identical means. This is in alignment with our setup, since we sampled from two normal distributions with different means." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "\n", "display(HTML(''))\n", "display(HTML(''))\n", "\n", "! pip install git+https://github.com/plotly/publisher.git --upgrade\n", "import publisher\n", "publisher.publish(\n", " 'python-T-Test.ipynb', 'python/t-test/', 'T-Test | plotly',\n", " 'Learn how to perform a one sample and two sample t-test using Python.',\n", " title='T-Test in Python. 
| plotly',\n", " name='T-Test',\n", " language='python',\n", " page_type='example_index', has_thumbnail='false', display_as='statistics', order=7,\n", " ipynb= '~notebook_demo/115')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }