{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### New to Plotly?\n", "Plotly's Python library is free and open source! [Get started](https://plotly.com/python/getting-started/) by downloading the client and [reading the primer](https://plotly.com/python/getting-started/).\n", "
You can set up Plotly to work in [online](https://plotly.com/python/getting-started/#initialization-for-online-plotting) or [offline](https://plotly.com/python/getting-started/#initialization-for-offline-plotting) mode, or in [Jupyter notebooks](https://plotly.com/python/getting-started/#start-plotting-online).\n", "
We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports\n", "The tutorial below imports [NumPy](http://www.numpy.org/), [Pandas](https://plotly.com/pandas/intro-to-pandas-tutorial/), and [SciPy](https://www.scipy.org/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "from plotly.tools import FigureFactory as FF\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import scipy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import Data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "We will import a dataset on which to perform our discrete frequency analysis: alcohol consumption by country in 2010. The table below shows the first ten rows." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2010_alcohol_consumption_by_country.csv')\n", "df = data[0:10]\n", "\n", "table = FF.create_table(df)\n", "py.iplot(table, filename='alcohol-data-sample')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Probability Distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can plot a histogram of the data with the y-axis showing the probability of each bin, by setting `histnorm='probability'`." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = data['alcohol'].values.tolist()\n", "\n", "trace = go.Histogram(x=x, histnorm='probability',\n", " xbins=dict(start=np.min(x),\n", " size=0.25,\n", " end=np.max(x)),\n", " marker=dict(color='rgb(25, 25, 100)'))\n", "\n", "layout = go.Layout(\n", " title=\"Histogram with Probability Distribution\"\n", ")\n", "\n", "fig = go.Figure(data=go.Data([trace]), layout=layout)\n", "py.iplot(fig, filename='histogram-prob-dist')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Frequency Counts" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trace = go.Histogram(x=x,\n", " xbins=dict(start=np.min(x),\n", " size=0.25,\n", " end=np.max(x)),\n", " marker=dict(color='rgb(25, 25, 100)'))\n", "\n", "layout = go.Layout(\n", " title=\"Histogram with Frequency Count\"\n", ")\n", "\n", "fig = go.Figure(data=go.Data([trace]), layout=layout)\n", "py.iplot(fig, filename='histogram-discrete-freq-count')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percentage" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trace = go.Histogram(x=x, histnorm='percent',\n", " xbins=dict(start=np.min(x),\n", " size=0.25,\n", " end=np.max(x)),\n", " marker=dict(color='rgb(50, 50, 125)'))\n", "\n", "layout = go.Layout(\n", " title=\"Histogram with Percentage\"\n", ")\n", "\n", "fig = go.Figure(data=go.Data([trace]), layout=layout)\n", 
"py.iplot(fig, filename='histogram-percentage')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cumulative Distribution Function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also take the cumulative sum of our dataset and then plot the cumulative distribution function, or `CDF`, as a scatter plot." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cumsum = np.cumsum(x)\n", "\n", "trace = go.Scatter(x=[i for i in range(len(cumsum))], y=10*cumsum/np.linalg.norm(cumsum),\n", " marker=dict(color='rgb(150, 25, 120)'))\n", "layout = go.Layout(\n", " title=\"Cumulative Distribution Function\"\n", ")\n", "\n", "fig = go.Figure(data=go.Data([trace]), layout=layout)\n", "py.iplot(fig, filename='cdf-dataset')" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "\n", "display(HTML(''))\n", "display(HTML(''))\n", "\n", "! pip install git+https://github.com/plotly/publisher.git --upgrade\n", "import publisher\n", "publisher.publish(\n", " 'python-Discrete-Frequency.ipynb', 'python/discrete-frequency/', 'Discrete Frequency | plotly',\n", " 'Learn how to perform discrete frequency analysis using Python.',\n", " title='Discrete Frequency in Python. | plotly',\n", " name='Discrete Frequency',\n", " language='python',\n", " page_type='example_index', has_thumbnail='false', display_as='statistics', order=3,\n", " ipynb= '~notebook_demo/110')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }