{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### New to Plotly?\n", "Plotly's Python library is free and open source! [Get started](https://plotly.com/python/getting-started/) by downloading the client and [reading the primer](https://plotly.com/python/getting-started/).\n", "
You can set up Plotly to work in [online](https://plotly.com/python/getting-started/#initialization-for-online-plotting) or [offline](https://plotly.com/python/getting-started/#initialization-for-offline-plotting) mode, or in [Jupyter notebooks](https://plotly.com/python/getting-started/#start-plotting-online).\n", "
We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports\n", "The tutorial below imports [NumPy](http://www.numpy.org/), [Pandas](https://plotly.com/pandas/intro-to-pandas-tutorial/), and [SciPy](https://www.scipy.org/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "from plotly.tools import FigureFactory as FF\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import scipy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us import a dataset to compute statistics on. We will look at alcohol consumption by country in 2010." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2010_alcohol_consumption_by_country.csv')\n", "df = data[0:10]\n", "\n", "table = FF.create_table(df)\n", "py.iplot(table, filename='alcohol-data-sample')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Mean and Standard Deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two of the most basic statistical operations are the `mean` $\\mu$ and `standard deviation` $\\sigma$ of a one-dimensional array of data, that is, a sequence of numeric values. The `mean` of a set of numbers $x_1, \\ldots, x_N$ is defined as:\n", "\n", "$$\\begin{align*}\n", "\\mu = \\frac{1}{N}\\sum_{i=1}^{N}{x_i}\n", "\\end{align*}\n", "$$\n", "\n", "The mean is what is colloquially called the _average_ of a set of values. 
The standard deviation, on the other hand, is a statistical metric that describes the spread of the data, or how far the values are from the mean. The `standard deviation` of a set of data is defined as:\n", "\n", "$$\\begin{align*}\n", "\\sigma = \\sqrt{\\frac{1}{N-1}\\sum_{i=1}^{N}{(x_i-\\mu)^2}}\n", "\\end{align*}\n", "$$\n", "\n", "Note that `np.std` divides by $N$ by default (`ddof=0`); pass `ddof=1` to apply the $N-1$ divisor shown above." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean is 6.2083769633507835\n", "The standard deviation is 4.130671000635401\n" ] } ], "source": [ "mean = np.mean(data['alcohol'])\n", "st_dev = np.std(data['alcohol'])\n", "\n", "print(\"The mean is %r\" % mean)\n", "print(\"The standard deviation is %r\" % st_dev)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Secondary Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also compute other statistics such as the `median`, `maximum`, and `minimum` of the data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The median is 6.4000000000000004\n", "The maximum is 17.5\n", "The minimum is 0.10000000000000001\n" ] } ], "source": [ "median = np.median(data['alcohol'])\n", "maximum = np.max(data['alcohol'])\n", "minimum = np.min(data['alcohol'])\n", "\n", "print(\"The median is %r\" % median)\n", "print(\"The maximum is %r\" % maximum)\n", "print(\"The minimum is %r\" % minimum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Visualize the Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can visualize these statistics by producing a Plotly box plot or violin plot."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = data['alcohol'].values.tolist()\n", "\n", "fig = FF.create_violin(y, title='Violin Plot', colors='#604d9e')\n", "py.iplot(fig, filename='alcohol-violin-visual')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = data['alcohol'].values.tolist()\n", "\n", "trace = go.Box(\n", "    y=y,\n", "    name='Box Plot',\n", "    boxpoints='all',\n", "    jitter=0.3,\n", "    marker=dict(\n", "        color='rgb(214,12,140)',\n", "    ),\n", ")\n", "\n", "layout = go.Layout(\n", "    width=500,\n", "    yaxis=dict(\n", "        title='Alcohol Consumption by Country',\n", "        zeroline=False\n", "    ),\n", ")\n", "\n", "# Use a separate name so the `data` DataFrame loaded earlier is not overwritten\n", "traces = [trace]\n", "fig = go.Figure(data=traces, layout=layout)\n", "py.iplot(fig, filename='alcohol-box-plot')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }