{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### New to Plotly?\n", "Plotly's Python library is free and open source! [Get started](https://plotly.com/python/getting-started/) by downloading the client and [reading the primer](https://plotly.com/python/getting-started/).\n", "
You can set up Plotly to work in [online](https://plotly.com/python/getting-started/#initialization-for-online-plotting) or [offline](https://plotly.com/python/getting-started/#initialization-for-offline-plotting) mode, or in [Jupyter notebooks](https://plotly.com/python/getting-started/#start-plotting-online).\n", "
We also have a quick-reference [cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf) (new!) to help you get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports\n", "The tutorial below imports [NumPy](http://www.numpy.org/), [Pandas](https://plotly.com/pandas/intro-to-pandas-tutorial/), and [SciPy](https://www.scipy.org/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "import plotly.tools as tools\n", "from plotly.tools import FigureFactory as FF\n", "\n", "import numpy as np\n", "import pandas as pd\n", "# Import the submodule explicitly so that scipy.linalg.norm is available\n", "import scipy.linalg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize the data before and after normalization, let us import a dataset of Apple stock prices from 2014:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the 2014 Apple stock prices and preview the first ten rows as a table\n", "apple_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_apple_stock.csv')\n", "df = apple_data[0:10]\n", "\n", "table = FF.create_table(df)\n", "py.iplot(table, filename='apple-data-sample')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Normalize by a Constant\n", "Normalize a dataset by dividing each data point by a constant, such as the standard deviation of the data." 
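, "\n", "
As a small numeric check of this idea, here is a minimal sketch on made-up values rather than the Apple data. The names `sample`, `sample_std`, and `sample_scaled` are hypothetical, and the sketch assumes the population standard deviation, which is NumPy's default.

```python
import numpy as np

# Hypothetical sample values, not taken from the Apple dataset
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation (NumPy's default, ddof=0)
sample_std = np.std(sample)

# Divide every value by the same constant
sample_scaled = sample / sample_std

print(sample_std)
print(np.std(sample_scaled))
```

Dividing by the standard deviation rescales the spread of the data to 1 but does not center it: the mean is divided by the same constant."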
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ]\n", "[ (2,1) x2,y2 ]\n", "\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = apple_data['AAPL_y']\n", "\n", "# Compute the standard deviation once and divide every point by it\n", "data_std = np.std(data)\n", "data_norm_by_std = [number / data_std for number in data]\n", "\n", "trace1 = go.Histogram(\n", "    x=data,\n", "    opacity=0.75,\n", "    name='data'\n", ")\n", "\n", "trace2 = go.Histogram(\n", "    x=data_norm_by_std,\n", "    opacity=0.75,\n", "    name='normalized by std = ' + str(data_std),\n", ")\n", "\n", "# Stack the original and normalized histograms in a 2x1 grid\n", "fig = tools.make_subplots(rows=2, cols=1)\n", "\n", "fig.append_trace(trace1, 1, 1)\n", "fig.append_trace(trace2, 2, 1)\n", "\n", "fig['layout'].update(height=600, width=800, title='Normalize by a Constant')\n", "py.iplot(fig, filename='apple-data-normalize-constant')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Normalize to [0, 1]\n", "Normalize a dataset by dividing each data point by the Euclidean norm of the dataset. For non-negative data such as stock prices, every normalized value then lies in [0, 1]." 
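, "\n", "
To see why dividing by the norm keeps non-negative values inside [0, 1], here is a minimal sketch on three made-up numbers. It uses NumPy's `np.linalg.norm`, which computes the Euclidean norm of a 1-D array; the names `sample`, `sample_norm`, and `sample_unit` are hypothetical.

```python
import numpy as np

# Hypothetical non-negative values, not taken from the Apple dataset
sample = np.array([3.0, 4.0, 12.0])

# Euclidean norm: sqrt(9 + 16 + 144) = 13
sample_norm = np.linalg.norm(sample)

# Each value is at most the norm, so each ratio is at most 1
sample_unit = sample / sample_norm

print(sample_unit)
```

The result has norm 1, but its smallest and largest values do not generally land exactly on 0 and 1."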
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ]\n", "[ (2,1) x2,y2 ]\n", "\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compute the Euclidean norm once and divide every point by it\n", "data_norm = scipy.linalg.norm(data)\n", "data_norm_to_0_1 = [number / data_norm for number in data]\n", "\n", "trace1 = go.Histogram(\n", "    x=data,\n", "    opacity=0.75,\n", "    name='data',\n", ")\n", "\n", "trace2 = go.Histogram(\n", "    x=data_norm_to_0_1,\n", "    opacity=0.75,\n", "    name='normalized to [0,1]',\n", ")\n", "\n", "fig = tools.make_subplots(rows=2, cols=1)\n", "\n", "fig.append_trace(trace1, 1, 1)\n", "fig.append_trace(trace2, 2, 1)\n", "\n", "fig['layout'].update(height=600, width=800, title='Normalize to [0,1]')\n", "py.iplot(fig, filename='apple-data-normalize-0-1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Normalizing to any Interval\n", "Normalize a dataset to an interval [a, b], where a < b are real numbers, by linearly rescaling it so that the minimum maps to a and the maximum maps to b." 
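, "\n", "
One common way to do this is linear min-max rescaling. The sketch below shows it on a few made-up prices; `rescale_to_interval` is a hypothetical helper written for this illustration, not part of NumPy, Pandas, or Plotly.

```python
import numpy as np

def rescale_to_interval(values, a, b):
    # Hypothetical helper: linear min-max rescaling onto [a, b]
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant dataset has no spread to stretch over the interval
        raise ValueError('cannot rescale constant data')
    return a + (values - lo) * (b - a) / (hi - lo)

# Hypothetical prices, not taken from the Apple dataset
sample = [70.0, 80.0, 95.0, 120.0]
rescaled = rescale_to_interval(sample, 10, 50)

print(rescaled)
```

The smallest input maps to a, the largest maps to b, and every other value keeps its relative position between them."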
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the format of your plot grid:\n", "[ (1,1) x1,y1 ]\n", "[ (2,1) x2,y2 ]\n", "\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 10\n", "b = 50\n", "\n", "# Linearly rescale so that min(data) maps to a and max(data) maps to b\n", "data_min = data.min()\n", "data_max = data.max()\n", "data_norm_to_a_b = [a + (number - data_min) * (b - a) / (data_max - data_min) for number in data]\n", "\n", "trace1 = go.Histogram(\n", "    x=data,\n", "    opacity=0.75,\n", "    name='data',\n", ")\n", "\n", "trace2 = go.Histogram(\n", "    x=data_norm_to_a_b,\n", "    opacity=0.75,\n", "    name='normalized to [10,50]',\n", ")\n", "\n", "fig = tools.make_subplots(rows=2, cols=1)\n", "\n", "fig.append_trace(trace1, 1, 1)\n", "fig.append_trace(trace2, 2, 1)\n", "\n", "fig['layout'].update(height=600, width=800, title='Normalize to [10,50]')\n", "py.iplot(fig, filename='apple-data-normalize-a-b')" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "\n", "display(HTML(''))\n", "display(HTML(''))\n", "\n", "! pip install git+https://github.com/plotly/publisher.git --upgrade\n", "import publisher\n", "publisher.publish(\n", " 'python_Normalization.ipynb', 'python/normalization/', 'Normalization | plotly',\n", " 'Learn how to normalize data by fitting to intervals on the real line and dividing by a constant',\n", " title='Normalization in Python. | plotly',\n", " name='Normalization',\n", " language='python',\n", " page_type='example_index', has_thumbnail='false', display_as='mathematics', order=2,\n", " ipynb= '~notebook_demo/103')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }