{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Lede Program\n", "#Data and databases 2015\n", "#Session 07\n", "#Number munging: vectors, Pandas, probabilities" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Render our plots inline\n", "%matplotlib inline\n", "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "pd.set_option('display.mpl_style', 'default') # Make the graphs a bit prettier\n", "plt.rcParams['figure.figsize'] = (15, 5)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get our big data file from my webpage. You can download it over `http` in your browser, with `wget`, or however you like. You'll need to unzip it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!wget http://www.columbia.edu/~mj340/HMXPC_13.zip #etc. whatever" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#check contents of directory!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#Our ritual: Exploratory data analysis\n", "\n", "\n", "> Exploratory data analysis (EDA) seeks to reveal structure, or simple descriptions, in data. We look at numbers and graphs and try to find patterns. \n", " - Persi Diaconis, \"Theories of Data Analysis: From Magical Thinking Through Classical Statistics\"\n", "\n", "> . . . proceeding via a ‘dustbowl’ empiricism is dangerous at worst and foolish at best . . . . The purely empirical approach is particularly dangerous in an age when computers and packaged programs are readily available, since there is temptation to substitute immediate empirical analysis for more analytic thought and theory building.\n", " - Einhorn, “Alchemy in the Behavioral Sciences,” 1972\n", "\n", ">. . . 
we can view the techniques of EDA as a ritual designed to reveal patterns in a data set. Thus, we may believe that naturally occurring data sets contain structure, that EDA is a useful vehicle for revealing the structure. . . . If we make no attempt to check whether the structure could have arisen by chance, and tend to accept the findings as gospel, then the ritual comes close to magical thinking. ... a controlled form of magical thinking--in the guise of 'working hypothesis'--is a basic ingredient of scientific progress. \n", " - Persi Diaconis, \"Theories of Data Analysis: From Magical Thinking Through Classical Statistics\"\n", "\n", "#From data to databases to data mining\n", "- move from accessing and manipulating data to performing ever more complicated *queries* on our data\n", "\n", "\n", "#`Pandas`: first-line `python` tool for EDA\n", "- rich data structures\n", "- powerful ways to slice, dice, reformat, fix, and eliminate data\n", " - a taste of what it can do\n", "- tables like Excel or a spreadsheet\n", "- rich queries like databases\n", "- direct manipulation of vectors and matrices\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#`Pandas`: charismatic megafauna" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": 
[], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }