{
"metadata": {
"name": "",
"signature": "sha256:a21a0c2a16c38cd5f3d80b0d0037df205b3bdca960394c3208af0ffa70b229e7"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext watermark"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%watermark -a 'Sebastian Raschka' -v -d -p numpy,scipy,matplotlib,scikit-learn"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Sebastian Raschka 24/08/2014 \n",
"\n",
"CPython 3.4.1\n",
"IPython 2.1.0\n",
"\n",
"numpy 1.8.1\n",
"scipy 0.14.0\n",
"matplotlib 1.3.1\n",
"scikit-learn 0.15.0b1\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Matplotlib examples - visualization techniques for exploratory data analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are numerous visualization techniques that are useful for exploratory data analysis. In practice, the choice depends heavily on the kind of data and the question at hand.\n",
"\n",
"This IPython notebook is a small gallery of visualizations of the Iris flower dataset. It is meant primarily as a matplotlib code reference, so some of the plots may be more useful than others in the context of this dataset."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Sections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Reading in the dataset](#Reading-in-the-dataset)\n",
"- [Pie chart](#Pie-chart)\n",
"- [Bar plot](#Bar-plot)\n",
"- [Box plot](#Box-plot)\n",
"- [1D Histogram](#1D-Histogram)\n",
"- [2D Histogram](#2D-Histogram)\n",
"- [3D Histogram](#3D-Histogram)\n",
"- [Scatter plot](#Scatter-plot)\n",
"- [3D Scatter plot](#3D-Scatter-plot)"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Reading in the dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#Sections)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# dictionary of the feature names\n",
"feature_dict = {i: label for i, label in zip(\n",
"    range(4),\n",
"    ('sepal length in cm',\n",
"     'sepal width in cm',\n",
"     'petal length in cm',\n",
"     'petal width in cm'))}\n",
"\n",
"# reading the CSV file directly from the UCI machine learning repository\n",
"# (pd.read_csv is the public entry point for the CSV parser)\n",
"df = pd.read_csv(\n",
"    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',\n",
"    header=None,\n",
"    sep=',',\n",
")\n",
"\n",
"df.columns = [l for i,l in sorted(feature_dict.items())] + ['class label']\n",
"df.dropna(how=\"all\", inplace=True) # to drop the empty line at file-end\n",
"\n",
"df.tail()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>sepal length in cm</th><th>sepal width in cm</th><th>petal length in cm</th><th>petal width in cm</th><th>class label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>145</th><td>6.7</td><td>3.0</td><td>5.2</td><td>2.3</td><td>Iris-virginica</td></tr>\n",
"    <tr><th>146</th><td>6.3</td><td>2.5</td><td>5.0</td><td>1.9</td><td>Iris-virginica</td></tr>\n",
"    <tr><th>147</th><td>6.5</td><td>3.0</td><td>5.2</td><td>2.0</td><td>Iris-virginica</td></tr>\n",
"    <tr><th>148</th><td>6.2</td><td>3.4</td><td>5.4</td><td>2.3</td><td>Iris-virginica</td></tr>\n",
"    <tr><th>149</th><td>5.9</td><td>3.0</td><td>5.1</td><td>1.8</td><td>Iris-virginica</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"