{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [
{ "cell_type": "code", "collapsed": false, "input": [ "from __future__ import division\n", "\n", "import numpy as np\n", "import scipy as sp\n", "import pandas as pd\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "\n", "# IPython magic command for inline plotting\n", "%matplotlib inline\n", "# a wide, short default figure shape suits the notebook layout\n", "mpl.rcParams['figure.figsize'] = [15, 3]" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Quick Overview of matplotlib" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "# start just above 0, because pi/x is undefined at x = 0\n", "x = np.linspace(0.001, 1, 10001)\n", "y = np.cos(np.pi/x) * np.exp(-x**2)\n", "\n", "plt.plot(x, y)\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "* Plot the following functions over the domain $x \\in \\left[-1, 2\\right]$.\n", "  * $y = f(x) = x^2 \\exp(-x)$\n", "  * $y = f(x) = \\log x$\n", "  * $y = f(x) = 1 + x^x + 3 x^4$" ] },
{ "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "pandas" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "When we say pandas, we are not talking about the animal..." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "But about..." ] },
{ "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Data analysis: pandas" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The **pandas** data analysis module provides data structures and tools for data analysis. It focuses on data handling and manipulation; regression and other statistical models are usually fitted with companion packages such as statsmodels, which the Regression section below uses. 
It is designed to let you carry out your entire data workflow in Python without having to switch to a domain-specific language such as R.\n", "Although largely compatible with NumPy/SciPy, pandas has some important differences in indexing, data organization, and features. Its basic data types are not the NumPy `ndarray` but the **Series** and the **DataFrame**, which let you index data by label and align axes efficiently." ] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Series" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A `Series` object is a one-dimensional array that can hold any data type, much like a NumPy `ndarray`. Like a dictionary, it has a set of index labels (similar to keys) for access; unlike a dictionary, it is ordered. Data alignment is intrinsic and will not be broken unless you do it explicitly.\n", "The index can be any list of values, for example dates or arbitrary axis labels, so a `Series` can act much like a `dict`." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series([1, 5, np.nan, 7.5, 2.1, 3])\n", "print(s)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "dates = pd.date_range('20140201', periods=s.size)\n", "s.index = dates\n", "print(s)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "letters = ['A', 'B', 'Ch', '#', '#', '---']\n", "s.index = letters\n", "print(s)\n", "print('\\nAccess is like a dictionary key:\\ns[\\'---\\'] = ' + str(s['---']))\n", "print('\\nRepeat labels are possible:\\ns[\\'#\\'] =\\n' + str(s['#']))" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "NumPy functions that expect an `ndarray` usually work on a `Series` as well, and element-wise functions such as `np.exp` return a `Series` with the same index." 
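, "\n", "\n", "Because every `Series` carries its index, arithmetic between two `Series` lines values up by label rather than by position. This is what intrinsic data alignment means in practice. Below is a minimal sketch with made-up labels `a` to `d`, chosen only for illustration; a label present in just one operand yields `NaN`:

```python
import numpy as np
import pandas as pd

# two Series whose labels only partly overlap and come in a different order
s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['c', 'b', 'd'])

# + pairs values by label, not by position; 'a' and 'd' have no partner, so they become NaN
total = s1 + s2
print(total)

# element-wise NumPy functions keep the index too
roots = np.sqrt(s1)
print(roots)
```

To treat a missing partner as zero instead, use the method form with a fill value, e.g. `s1.add(s2, fill_value=0)`."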
] },
{ "cell_type": "code", "collapsed": false, "input": [ "t = np.exp(s)\n", "print(t)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "String Methods" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A `Series` of strings exposes vectorized string methods through its `.str` attribute; missing values (`NaN`) are passed through unchanged." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "s.str.upper()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "s.str.lower()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "s.str.len()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])\n", "print(s2)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "s2.str.split('_')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Commonly used `.str` methods:\n", "\n", "
| Method | Description |\n",
"| --- | --- |\n",
"| `cat` | Concatenate strings |\n",
"| `split` | Split strings on a delimiter |\n",
"| `get` | Index into each element (retrieve the i-th element) |\n",
"| `join` | Join the strings in each element of the Series with the passed separator |\n",
"| `contains` | Return a boolean array indicating whether each string contains a pattern/regex |\n",
"| `replace` | Replace occurrences of a pattern/regex with some other string |\n",
"| `repeat` | Duplicate values (`s.str.repeat(3)` is equivalent to `x * 3` for each string `x`) |\n",
"| `pad` | Add whitespace to the left, right, or both sides of strings |\n",
"| `center` | Equivalent to `pad(side='both')` |\n",
"| `wrap` | Split long strings into lines no longer than a given width |\n",
"| `slice` | Slice each string in the Series |\n",
"| `slice_replace` | Replace a slice in each string with the passed value |\n",
"| `count` | Count occurrences of a pattern |\n",
"| `startswith` | Equivalent to `str.startswith(pat)` for each element |\n",
"| `endswith` | Equivalent to `str.endswith(pat)` for each element |\n",
"| `findall` | Compute a list of all occurrences of a pattern/regex for each string |\n",
"| `match` | Call `re.match` on each element and return a boolean array indicating whether it matches |\n",
"| `extract` | Extract the capture groups of the first regex match in each string as columns |\n",
"| `len` | Compute string lengths |\n",
"| `strip` | Equivalent to `str.strip` |\n",
"| `rstrip` | Equivalent to `str.rstrip` |\n",
"| `lstrip` | Equivalent to `str.lstrip` |\n",
"| `lower` | Equivalent to `str.lower` |\n",
"| `upper` | Equivalent to `str.upper` |\n
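\n\nThe regex-based methods in the table (`contains`, `replace`, `extract`) are easiest to understand on a small example. The sketch below uses made-up sample strings and patterns purely for illustration. `regex=True` is passed explicitly because the default for `str.replace` has changed between pandas versions:

```python
import numpy as np
import pandas as pd

s = pd.Series(['A1', 'B22', np.nan, 'c333'])

# True where the string starts with an upper-case letter (^ anchors the pattern)
starts_upper = s.str.contains('^[A-Z]', regex=True)

# replace every run of digits with a single '#'
no_digits = s.str.replace('[0-9]+', '#', regex=True)

# named capture groups become the columns of a DataFrame; the NaN row stays missing
parts = s.str.extract('(?P<letter>[A-Za-z])(?P<digits>[0-9]+)')

print(starts_upper)
print(no_digits)
print(parts)
```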
" ] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "DataFrame" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The `DataFrame` object is a two-dimensional, labeled, table-like structure, similar to a spreadsheet in Excel; each column is a `Series`, and different columns can hold different data types." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "s = pd.Series([1, 5, np.nan, 7.5, 2.1, 3])\n", "df = pd.DataFrame(s, columns=['x'])\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "t = np.exp(s)\n", "df['exp(x)'] = t\n", "df['exp(exp(x))'] = np.exp(t)\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "There are a number of ways to access the elements of a `DataFrame`." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "print(df['x'], '\\n')  # a column, by name\n", "letters = ['A', 'B', 'Ch', '#', '#', '---']\n", "df.index = letters\n", "print(df.loc['#'], '\\n')  # rows by label (both '#' rows)\n", "print(df.iloc[3], '\\n')  # a row by position (note the transposition in the output!)\n", "print(df[1:4])  # rows by slice" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise 1" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1 = pd.DataFrame(np.random.randn(dates.size, 4), index=dates, columns=list('ABCD'))\n", "print(df1)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Using the `DataFrame` `df1` created above, perform the following operations:\n", "\n", " 1. `df1.head()` and `df1.tail()`\n", " 2. `df1.describe()`\n", " 3. `df1.T`\n", " 4. `df1.sort_values(by='B')`\n", " 5. `df1.columns`, `df1.index`, `df1.values`" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1.sort_values(by='B')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Boolean indexing in a DataFrame" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A condition on one or more columns can be used to select data." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.randn(dates.size, 4), index=dates, columns=list('ABCD'))\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df[df.B > 0]  # rows where column B is positive" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df[df > 0]  # keeps the shape; values failing the condition become NaN" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df2 = df.copy()\n", "df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']\n", "print(df2)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df2[df2['E'].isin(['one'])]" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df.at[dates[0], 'A'] = 0  # set a single value by label\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df.iat[0, 1] = 0  # set a single value by position\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise 2" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "from random import randint\n", "df = pd.DataFrame({'A': [randint(1, 9) for _ in range(10)],\n", "                   'B': [randint(1, 9)*10 for _ in range(10)],\n", "                   'C': [randint(1, 9)*100 for _ in range(10)]})\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Find the entries of *A* for which the corresponding value in *B* is greater than 50 and the value in *C* equals 900." ] },
{ "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Missing Data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Real-world data often has gaps, and `pandas` aims to be flexible about handling missing data. `NaN` is the default missing-value marker." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# keep the first four dates and add an empty column E\n", "df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])\n", "print(df1)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1.loc[dates[0]:dates[1], 'E'] = 1\n", "print(df1)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1.dropna(how='all')  # drops rows that are entirely NaN; how='any' drops rows with at least one NaN" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1.fillna(value=15)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "pd.isnull(df1)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))\n", "print(df2)\n", "df2.loc[dates[0]:dates[2], 'B'] = np.nan\n", "print(df2)\n", "# the sum aligns on both index and columns; labels missing from either frame give NaN\n", "print(df1 + df2)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Stats" ] },
{ "cell_type": 
"markdown", "metadata": {}, "source": [ "Statistical operations *exclude* missing values by default." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "# 1000 standard-normal samples, plotted with lines at mean - std and mean + std\n", "from numpy.random import normal\n", "n = 1000\n", "x = pd.Series(normal(size=n))\n", "avg = x.mean()\n", "std = x.std()\n", "\n", "x_stdl = pd.Series(np.ones(n) * (avg - std))\n", "x_stdh = pd.Series(np.ones(n) * (avg + std))\n", "\n", "df_gauss = pd.DataFrame({'mean - std': x_stdl, 'mean + std': x_stdh, 'x': x})\n", "\n", "df_gauss.plot(style=['rx', 'rx', 'bx'])\n", "plt.figure()\n", "# histogram of successive differences; diff() leaves NaN in the first row, which hist() skips\n", "df_gauss['x'].diff().hist(color='g', bins=50)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Exercise 3" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Try the following with `df` as defined above:\n", "\n", " 1. `df.mean()`\n", " 2. `df.apply(np.cumsum)`\n", " 3. `df.apply(lambda x: x.max() - x.min())`\n", " 4. Plot a histogram" ] },
{ "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df.apply(lambda x: x.max() - x.min())  # range (max - min) of each column\n", "# What does lambda do?" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# a lambda is an anonymous, single-expression function: f and g below are equivalent\n", "def f(x):\n", "    return x*2\n", "\n", "g = lambda x: x*2\n", "\n", "print(f(3), g(3))" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Reading CSV files" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "# read_csv accepts a URL directly\n", "df = pd.read_csv('http://econpy.pythonanywhere.com/ex/NFL_1979.csv')\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df1 = df[0:10]" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "print(df1)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "A = df1[:3]\n", "B = df1[3:7]\n", "C = df1[7:10]\n", "print(A, B, C, sep='\\n\\n')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# concat stacks the pieces back together row-wise\n", "parts = [A, B, C]\n", "df2 = pd.concat(parts)\n", "print(df2)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n", "right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n", "print(left)\n", "print(right)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# SQL-style join on 'key': each left row matches both right rows, giving 2 x 2 = 4 rows\n", "pd.merge(left, right, on='key')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "rowadd = df.iloc[3]" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "print(rowadd)\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, 
"input": [ "# DataFrame.append was removed in pandas 2.0; pd.concat does the same job\n", "pd.concat([df, rowadd.to_frame().T], ignore_index=True)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n", "                         'foo', 'bar', 'foo', 'foo'],\n", "                   'B': ['one', 'one', 'two', 'three',\n", "                         'two', 'two', 'one', 'three'],\n", "                   'C': np.random.randn(8),\n", "                   'D': np.random.randn(8)})\n", "print(df)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# sum the numeric columns within each group of A\n", "df.groupby('A')[['C', 'D']].sum()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# grouping by two keys gives a hierarchical (MultiIndex) result\n", "df.groupby(['A', 'B']).sum()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Regression" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Regression analysis refers to the process of estimating relationships between variables.\n", "\n", "Simple linear regression fits a straight line through a set of data points $(x_i, y_i)$:\n", "\n", "$$y_i = a_0 + a_1 x_i + \\varepsilon_i$$\n", "\n", "where $\\varepsilon_i$ is the residual. With several explanatory variables, as in the model below, the right-hand side gains one coefficient per variable; statsmodels expands categorical variables such as `Region` into indicator columns." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "import statsmodels.formula.api as smf\n", "\n", "url = 'http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv'\n", "df = pd.read_csv(url)\n", "# keep the variables used in the model and drop rows with missing values\n", "df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()\n", "df.head()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# ordinary least squares with an R-style formula: Lottery explained by Literacy, Wealth and Region\n", "mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)\n", "res = mod.fit()\n", "print(res.summary())" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "T-Test" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The t-test assesses whether the means of two groups are statistically 
different from each other." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "town1_heights = pd.Series([5, 6, 7, 6, 7.1, 6, 4])\n", "town2_heights = pd.Series([5.5, 6.5, 7, 6, 7.1, 6])\n", "\n", "town1_mean = town1_heights.mean()\n", "town2_mean = town2_heights.mean()\n", "\n", "print('Town 1 avg. height:', town1_mean)\n", "print('Town 2 avg. height:', town2_mean)\n", "\n", "# raw effect size: absolute difference of the two means\n", "print('Effect size:', abs(town1_mean - town2_mean))\n", "\n", "# the towns have different sample sizes, so T2 gets NaN in its last row (ignored by boxplot)\n", "df = pd.DataFrame({'T1': town1_heights, 'T2': town2_heights})\n", "b = df.boxplot()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "from scipy import stats\n", "\n", "# Shapiro-Wilk test of normality (a small p-value suggests non-normal data)\n", "print('Town 1 Shapiro-Wilk p-value:', stats.shapiro(town1_heights)[1])\n", "\n", "# equal_var=False gives Welch's t-test, which does not assume equal variances\n", "print('T-test p-value:', stats.ttest_ind(town1_heights, town2_heights, equal_var=False)[1])" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Time Series" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "rng = pd.date_range('1/1/2012', periods=100, freq='s')  # 100 timestamps, one second apart" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "print(rng)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "ts.plot()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Panels" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Panels are three-dimensional data structures." 
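, "\n", "\n", "`Panel` was deprecated in pandas 0.20 and removed in pandas 0.25. On current versions, three-dimensional data is therefore usually held in a `DataFrame` with a row `MultiIndex`. The sketch below assumes three items named `ItemA` to `ItemC`, filled with random values (names chosen only to echo the cells that follow). It stacks one two-dimensional frame per item with `pd.concat`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-03', periods=5, freq='B')
items = ['ItemA', 'ItemB', 'ItemC']

# one 2-D frame per item, all sharing the same dates and columns
frames = {name: pd.DataFrame(np.random.randn(5, 4), index=dates, columns=list('ABCD'))
          for name in items}

# the dict keys become the outer level of an (item, date) row MultiIndex
stacked = pd.concat(frames, names=['item', 'date'])

one_item = stacked.loc['ItemA']                # one item: a 5 x 4 frame
one_date = stacked.xs(dates[0], level='date')  # one date across all items: 3 x 4
print(stacked.shape, one_item.shape, one_date.shape)
```

Selecting one item with `.loc` gives the same kind of two-dimensional frame that `panel['ItemA']` used to return."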
] },
{ "cell_type": "code", "collapsed": false, "input": [ "# pandas.Panel (and tm.makePanel) were removed in pandas 0.25; a dict of DataFrames\n", "# sharing the same axes stands in for it here: three items, 5 business days x 4 columns\n", "items = ['ItemA', 'ItemB', 'ItemC']\n", "major = pd.date_range('2000-01-03', periods=5, freq='B')\n", "panel = {item: pd.DataFrame(np.random.randn(5, 4), index=major, columns=list('ABCD'))\n", "         for item in items}" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "print(panel['ItemA'])" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Plotting" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We combine `pandas` and `matplotlib` for data analysis and visualization." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "# the old pd.options.display.mpl_style option no longer exists; a matplotlib style sheet sets the look instead\n", "plt.style.use('ggplot')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# a random walk: cumulative sum of Gaussian steps\n", "ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))\n", "ts = ts.cumsum()\n", "ts.plot()" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "# bar plot of one row (.iloc replaces the removed .ix indexer)\n", "\n", "ts = pd.DataFrame(np.random.randn(1000, 5), index=pd.date_range('1/1/2000', periods=1000))\n", "ts = ts.cumsum()\n", "print(ts.iloc[5])\n", "ts.iloc[5].plot(kind='bar'); plt.axhline(0, color='k')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])\n", "print(df)\n", "# df.plot(kind='bar')\n", "# df.plot(kind='bar', stacked=True)\n", "# df.plot(kind='barh', stacked=True)\n", "# print(pd.__version__)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "from pandas.plotting import scatter_matrix\n", "df = pd.DataFrame(np.random.randn(100, 4), columns=['a', 'b', 'c', 'd'])\n", "scatter_matrix(df, figsize=(7, 7), diagonal='kde')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "from pandas.plotting import andrews_curves, parallel_coordinates\n", "\n", "df = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/iris.csv')\n", "andrews_curves(df, 'Name')\n", "\n", "# parallel_coordinates(df, 'Name')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [ "from pandas.plotting import lag_plot\n", "# a noisy sine wave: random data would show no structure in a lag plot, this periodic signal does\n", "data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(np.linspace(-99 * np.pi, 99 * np.pi, num=1000)))\n", "lag_plot(data)" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "References" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "- [Pandas Data Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html)\n", "- [Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/index.html)\n", "- [On Andrews curves](http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/mvahtmlnode9.html)\n", "- [Parallel coordinates](http://en.wikipedia.org/wiki/Parallel_coordinates)" ] }
], "metadata": {} } ] }