{ "metadata": { "name": "", "signature": "sha256:4160fb4cc19cc97187ca3a5f9acbad4dd7ccc7c28bee0a5faa9b043be2fc54ba" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Analytics Beyond the Basics with pandas and SQL\n", "===\n", "\n", "Wes McKinney\n", "---\n", "@wesmckinn, wes@cloudera.com\n", "---\n", "Strata-Hadoop World SJ 2015\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tutorial motivation\n", "===\n", "- Illustrate useful analytics techniques in both pandas and SQL\n", "- Tools\n", " - pandas 0.15.1 or higher (install via Anaconda)\n", " - PostgreSQL \n", "- Some SQL differences between database flavors\n", "- Materials: https://github.com/wesm/strata-sj-2015\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Topics\n", "===\n", "- More complex data grouping and augmentation\n", "- Top-N filtering\n", "- Subqueries\n", "- Semi- and anti-joins\n", "- Using window functions and equivalents\n", "- Histograms and quantile analysis\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Datasets\n", "===\n", "- SF Crime Reports since 2014\n", "- Crunchbase Data Export (CC-BY-NC license)\n", "- Companies\n", "- Funding rounds and investors\n", "- Acquisitions\n", "- Find these in CSV form in the GitHub repo, along with a Python data loading script" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ceremonial preparation of the datas\n", "===\n", "\n", "See gnarly data loading script in GitHub repo to put data into Postgres \n", "---\n", "(p.s. please let me know if there's a less painful way that also addresses the bad data issues)\n", "---" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "from datetime import datetime, time\n", "import pandas as pd\n", "import numpy as np\n", "import psycopg2 as pg\n", "import pandas.io.sql as sql\n", "\n", "CRIME_DATE_FORMAT = '%m/%d/%Y %H:%M:%S %p'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "from sqlalchemy import create_engine\n", "con = create_engine('postgresql://wesm:foo@localhost:5432/strata_2015')\n", "con" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "Engine(postgresql://wesm:***@localhost:5432/strata_2015)" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "table = pd.read_sql(\"select * from crimes limit 10;\", con)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "crimes = pd.read_csv('sf_crimes.csv')\n", "pd.to_datetime(crimes.Date[:1000])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "0 2014-01-30\n", "1 2014-07-14\n", "2 2015-01-18\n", "3 2014-09-21\n", "4 2014-03-05\n", "5 2014-03-08\n", "6 2014-04-06\n", "7 2014-09-02\n", "8 2014-07-19\n", "9 2014-09-29\n", "10 2014-04-14\n", "11 2014-07-16\n", "12 2014-01-27\n", "13 2014-05-04\n", "14 2014-08-16\n", "...\n", "985 2014-08-29\n", "986 2015-01-24\n", "987 2014-07-09\n", "988 2014-01-21\n", "989 2014-05-17\n", "990 2014-09-30\n", "991 2014-02-11\n", "992 2014-05-03\n", "993 2014-06-30\n", "994 2014-12-10\n", "995 2014-07-03\n", "996 2014-09-20\n", "997 2014-01-19\n", "998 2014-12-16\n", "999 2014-07-24\n", "Name: Date, Length: 1000, dtype: datetime64[ns]" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pd.to_datetime(crimes.Date[:1000])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "10 loops, best of 3: 99.5 ms per loop\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit pd.to_datetime(crimes.Date[:1000], format=CRIME_DATE_FORMAT)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 5.63 ms per loop\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "def null_on_error(series, converter):\n", " def f(x):\n", " try:\n", " return converter(x)\n", " except (TypeError, ValueError):\n", " return np.nan\n", " return series.map(f)\n", "\n", "def number_with_comma(x):\n", " try:\n", " x = x.replace(',', '')\n", " return float(x)\n", " except (ValueError, AttributeError):\n", " return np.nan\n", "\n", "def to_time(x):\n", " h, m = x.split(':')\n", " return time(int(h), int(m))\n", "\n", "crimes = pd.read_csv('sf_crimes.csv')\n", "crimes.columns = crimes.columns.map(str.lower)\n", "crimes.date = pd.to_datetime(crimes.date, format=CRIME_DATE_FORMAT)\n", "crimes.time = crimes.time.map(to_time)\n", "\n", "companies = pd.read_csv('companies.csv',\n", " parse_dates=['founded_at', 'first_funding_at'])\n", "companies['funding_total_usd'] = null_on_error(\n", " companies.pop(' funding_total_usd '), number_with_comma)\n", "\n", "investments = pd.read_csv('investments.csv')\n", "acquisitions = pd.read_csv('acquisitions.csv')\n", "rounds = pd.read_csv('rounds.csv')\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "/Users/wesm/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py:1154: DtypeWarning: Columns (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,21) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data = self._reader.read(nrows)\n" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "companies.columns = companies.columns.map(str.strip)\n", "\n", "rounds.columns = rounds.columns.map(str.strip)\n", "investments.columns = investments.columns.map(str.strip)\n", "acquisitions.columns = acquisitions.columns.map(str.strip)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "acquisitions['price_amount'] = null_on_error(acquisitions.price_amount, number_with_comma)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Less trivial data grouping\n", "===\n", "- Computed fields\n", "- pandas conveniences: arrays, functions\n", "- Coarser grouping with value mappings" ] }, { "cell_type": "code", "collapsed": false, "input": [ "crimes.loc[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "incidntnum 140090226\n", "category WARRANTS\n", "descript ENROUTE TO PAROLE OFFICER\n", "dayofweek Thursday\n", "date 2014-01-30 12:00:00\n", "time 21:56:00\n", "pddistrict MISSION\n", "resolution ARREST, BOOKED\n", "address 400 Block of SOUTH VAN NESS AV\n", "x -122.4177\n", "y 37.76627\n", "location (37.7662744315825, -122.417662868855)\n", "Name: 0, dtype: object" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "crimes.groupby('category').size().order(ascending=False)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "category\n", "LARCENY/THEFT 41300\n", "OTHER OFFENSES 22324\n", "NON-CRIMINAL 20838\n", "ASSAULT 13422\n", "VANDALISM 7712\n", "VEHICLE THEFT 7671\n", "WARRANTS 7267\n", "BURGLARY 6525\n", "DRUG/NARCOTIC 5801\n", "SUSPICIOUS OCC 5633\n", "MISSING PERSON 5052\n", "ROBBERY 3758\n", "FRAUD 3176\n", "SECONDARY CODES 2042\n", "WEAPON LAWS 1726\n", "TRESPASS 1247\n", "STOLEN PROPERTY 1165\n", "SEX OFFENSES, FORCIBLE 912\n", "FORGERY/COUNTERFEITING 807\n", "DRUNKENNESS 666\n", "KIDNAPPING 559\n", "PROSTITUTION 477\n", "DRIVING UNDER THE INFLUENCE 408\n", "DISORDERLY CONDUCT 389\n", "ARSON 292\n", "RUNAWAY 233\n", "LIQUOR LAWS 192\n", "EMBEZZLEMENT 153\n", "FAMILY OFFENSES 94\n", "SUICIDE 75\n", "BRIBERY 58\n", "LOITERING 38\n", "BAD CHECKS 35\n", "EXTORTION 33\n", "SEX OFFENSES, NON FORCIBLE 20\n", "GAMBLING 9\n", "PORNOGRAPHY/OBSCENE MAT 2\n", "TREA 1\n", "dtype: int64" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select category, count(*) as occs\n", "from crimes\n", "group by 1\n", "order by occs desc;\n", "\"\"\"\n", "\n", "pd.read_sql(query, con)[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
categoryoccs
0 LARCENY/THEFT 41300
1 OTHER OFFENSES 22324
2 NON-CRIMINAL 20838
3 ASSAULT 13422
4 VANDALISM 7712
5 VEHICLE THEFT 7671
6 WARRANTS 7267
7 BURGLARY 6525
8 DRUG/NARCOTIC 5801
9 SUSPICIOUS OCC 5633
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ " category occs\n", "0 LARCENY/THEFT 41300\n", "1 OTHER OFFENSES 22324\n", "2 NON-CRIMINAL 20838\n", "3 ASSAULT 13422\n", "4 VANDALISM 7712\n", "5 VEHICLE THEFT 7671\n", "6 WARRANTS 7267\n", "7 BURGLARY 6525\n", "8 DRUG/NARCOTIC 5801\n", "9 SUSPICIOUS OCC 5633" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "def top(x, k=10):\n", " return x.order(ascending=False)[:k]\n", "\n", "crimes.groupby([crimes.date.dt.year,\n", " crimes.time.map(lambda x: x.hour)]).size()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ " time\n", "2014 0 7217\n", " 1 4353\n", " 2 3642\n", " 3 2537\n", " 4 1709\n", " 5 1636\n", " 6 2195\n", " 7 3553\n", " 8 5124\n", " 9 5844\n", " 10 6210\n", " 11 6464\n", " 12 8803\n", " 13 7122\n", " 14 7373\n", " 15 8111\n", " 16 8671\n", " 17 9389\n", " 18 9865\n", " 19 9087\n", " 20 8452\n", " 21 7743\n", " 22 7924\n", " 23 7092\n", "2015 0 504\n", " 1 420\n", " 2 332\n", " 3 224\n", " 4 156\n", " 5 143\n", " 6 179\n", " 7 320\n", " 8 403\n", " 9 530\n", " 10 497\n", " 11 533\n", " 12 596\n", " 13 539\n", " 14 576\n", " 15 627\n", " 16 681\n", " 17 691\n", " 18 819\n", " 19 766\n", " 20 694\n", " 21 634\n", " 22 616\n", " 23 516\n", "dtype: int64" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select extract(year from date) as year,\n", " extract(hour from time) as hourofday,\n", " count(*) as occs \n", "from crimes\n", "group by 1, 2;\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearhourofdayoccs
0 2014 0 7217
1 2014 1 4353
2 2014 2 3642
3 2014 3 2537
4 2014 4 1709
5 2014 5 1636
6 2014 6 2195
7 2014 7 3553
8 2014 8 5124
9 2014 9 5844
10 2014 10 6210
11 2014 11 6464
12 2014 12 8803
13 2014 13 7122
14 2014 14 7373
15 2014 15 8111
16 2014 16 8671
17 2014 17 9389
18 2014 18 9865
19 2014 19 9087
20 2014 20 8452
21 2014 21 7743
22 2014 22 7924
23 2014 23 7092
24 2015 0 504
25 2015 1 420
26 2015 2 332
27 2015 3 224
28 2015 4 156
29 2015 5 143
30 2015 6 179
31 2015 7 320
32 2015 8 403
33 2015 9 530
34 2015 10 497
35 2015 11 533
36 2015 12 596
37 2015 13 539
38 2015 14 576
39 2015 15 627
40 2015 16 681
41 2015 17 691
42 2015 18 819
43 2015 19 766
44 2015 20 694
45 2015 21 634
46 2015 22 616
47 2015 23 516
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ " year hourofday occs\n", "0 2014 0 7217\n", "1 2014 1 4353\n", "2 2014 2 3642\n", "3 2014 3 2537\n", "4 2014 4 1709\n", "5 2014 5 1636\n", "6 2014 6 2195\n", "7 2014 7 3553\n", "8 2014 8 5124\n", "9 2014 9 5844\n", "10 2014 10 6210\n", "11 2014 11 6464\n", "12 2014 12 8803\n", "13 2014 13 7122\n", "14 2014 14 7373\n", "15 2014 15 8111\n", "16 2014 16 8671\n", "17 2014 17 9389\n", "18 2014 18 9865\n", "19 2014 19 9087\n", "20 2014 20 8452\n", "21 2014 21 7743\n", "22 2014 22 7924\n", "23 2014 23 7092\n", "24 2015 0 504\n", "25 2015 1 420\n", "26 2015 2 332\n", "27 2015 3 224\n", "28 2015 4 156\n", "29 2015 5 143\n", "30 2015 6 179\n", "31 2015 7 320\n", "32 2015 8 403\n", "33 2015 9 530\n", "34 2015 10 497\n", "35 2015 11 533\n", "36 2015 12 596\n", "37 2015 13 539\n", "38 2015 14 576\n", "39 2015 15 627\n", "40 2015 16 681\n", "41 2015 17 691\n", "42 2015 18 819\n", "43 2015 19 766\n", "44 2015 20 694\n", "45 2015 21 634\n", "46 2015 22 616\n", "47 2015 23 516" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "crimes[:1].T" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
incidntnum 140090226
category WARRANTS
descript ENROUTE TO PAROLE OFFICER
dayofweek Thursday
date 2014-01-30 12:00:00
time 21:56:00
pddistrict MISSION
resolution ARREST, BOOKED
address 400 Block of SOUTH VAN NESS AV
x -122.4177
y 37.76627
location (37.7662744315825, -122.417662868855)
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ " 0\n", "incidntnum 140090226\n", "category WARRANTS\n", "descript ENROUTE TO PAROLE OFFICER\n", "dayofweek Thursday\n", "date 2014-01-30 12:00:00\n", "time 21:56:00\n", "pddistrict MISSION\n", "resolution ARREST, BOOKED\n", "address 400 Block of SOUTH VAN NESS AV\n", "x -122.4177\n", "y 37.76627\n", "location (37.7662744315825, -122.417662868855)" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "crimes.groupby('category').size().order(ascending=False)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "category\n", "LARCENY/THEFT 41300\n", "OTHER OFFENSES 22324\n", "NON-CRIMINAL 20838\n", "ASSAULT 13422\n", "VANDALISM 7712\n", "VEHICLE THEFT 7671\n", "WARRANTS 7267\n", "BURGLARY 6525\n", "DRUG/NARCOTIC 5801\n", "SUSPICIOUS OCC 5633\n", "MISSING PERSON 5052\n", "ROBBERY 3758\n", "FRAUD 3176\n", "SECONDARY CODES 2042\n", "WEAPON LAWS 1726\n", "TRESPASS 1247\n", "STOLEN PROPERTY 1165\n", "SEX OFFENSES, FORCIBLE 912\n", "FORGERY/COUNTERFEITING 807\n", "DRUNKENNESS 666\n", "KIDNAPPING 559\n", "PROSTITUTION 477\n", "DRIVING UNDER THE INFLUENCE 408\n", "DISORDERLY CONDUCT 389\n", "ARSON 292\n", "RUNAWAY 233\n", "LIQUOR LAWS 192\n", "EMBEZZLEMENT 153\n", "FAMILY OFFENSES 94\n", "SUICIDE 75\n", "BRIBERY 58\n", "LOITERING 38\n", "BAD CHECKS 35\n", "EXTORTION 33\n", "SEX OFFENSES, NON FORCIBLE 20\n", "GAMBLING 9\n", "PORNOGRAPHY/OBSCENE MAT 2\n", "TREA 1\n", "dtype: int64" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "category_mapping = {\n", " 'LARCENY/THEFT': 'THEFT',\n", " 'ROBBERY': 'THEFT',\n", " 'VEHICLE THEFT': 'THEFT'\n", "}\n", "coarse_cat = crimes.category.map(lambda x: category_mapping.get(x, x))\n", "top(crimes.groupby(coarse_cat).size(), 20)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "category\n", "THEFT 52729\n", "OTHER OFFENSES 22324\n", "NON-CRIMINAL 20838\n", "ASSAULT 13422\n", "VANDALISM 7712\n", "WARRANTS 7267\n", "BURGLARY 6525\n", "DRUG/NARCOTIC 5801\n", "SUSPICIOUS OCC 5633\n", "MISSING PERSON 5052\n", "FRAUD 3176\n", "SECONDARY CODES 2042\n", "WEAPON LAWS 1726\n", "TRESPASS 1247\n", "STOLEN PROPERTY 1165\n", "SEX OFFENSES, FORCIBLE 912\n", "FORGERY/COUNTERFEITING 807\n", "DRUNKENNESS 666\n", "KIDNAPPING 559\n", "PROSTITUTION 477\n", "dtype: int64" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "#crimes.category.map(category_mapping.__getitem__)\n", "crimes.category.map(lambda x: category_mapping[x])" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "'WARRANTS'", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m#crimes.category.map(category_mapping.__getitem__)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mcrimes\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcategory\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mcategory_mapping\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Users/wesm/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc\u001b[0m in \u001b[0;36mmap\u001b[0;34m(self, arg, na_action)\u001b[0m\n\u001b[1;32m 2012\u001b[0m index=self.index).__finalize__(self)\n\u001b[1;32m 2013\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2014\u001b[0;31m \u001b[0mmapped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmap_f\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0marg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2015\u001b[0m return self._constructor(mapped,\n\u001b[1;32m 2016\u001b[0m index=self.index).__finalize__(self)\n", "\u001b[0;32m/Users/wesm/anaconda/lib/python2.7/site-packages/pandas/lib.so\u001b[0m in \u001b[0;36mpandas.lib.map_infer (pandas/lib.c:56502)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m#crimes.category.map(category_mapping.__getitem__)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mcrimes\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcategory\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mcategory_mapping\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: 'WARRANTS'" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select\n", " case \n", " when category in ('LARCENY/THEFT', 'ROBBERY', 'VEHICLE THEFT')\n", " then 'THEFT'\n", " else category\n", " end as coarse_cat,\n", " count(*) as occs\n", "from crimes\n", "group by 1\n", "order by occs desc;\n", "\"\"\"\n", "sql.read_sql(query, con)[:10]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coarse_catoccs
0 THEFT 52729
1 OTHER OFFENSES 22324
2 NON-CRIMINAL 20838
3 ASSAULT 13422
4 VANDALISM 7712
5 WARRANTS 7267
6 BURGLARY 6525
7 DRUG/NARCOTIC 5801
8 SUSPICIOUS OCC 5633
9 MISSING PERSON 5052
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ " coarse_cat occs\n", "0 THEFT 52729\n", "1 OTHER OFFENSES 22324\n", "2 NON-CRIMINAL 20838\n", "3 ASSAULT 13422\n", "4 VANDALISM 7712\n", "5 WARRANTS 7267\n", "6 BURGLARY 6525\n", "7 DRUG/NARCOTIC 5801\n", "8 SUSPICIOUS OCC 5633\n", "9 MISSING PERSON 5052" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "keys = [crimes.date.dt.year,\n", " crimes.time.map(lambda x: x.hour)]\n", "counts = crimes.groupby(keys).size()\n", "counts = counts.unstack(0)\n", "share = counts / counts.sum()\n", "share.plot(kind='bar')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEQCAYAAACk818iAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X+cVPV97/HXsAZhYZeF+osquDwUFI0GrJjtjSlg07oS\nAzWpIQRjF3/gDSWa3LTX0Op1oonW0tyi11SxUR72h642+GiwEU0RVm1qiLTuCsgG5JeYNEQjP5Zf\nsnTn/vH9zs6ZszNnzpyZ2TnnzPv5eJzHzvnxmfOd2ZnPnPM53/kOiIiIiIiIiIiIiIiIiIiIiIiI\niIhEXivQDWwDbs+zzYN2fRcw1S47D3jDMR0Abq1oS0VEpKA64G2gGfgI0AlMdm0zC3je3v448JMc\n9zME+C9gXEVaKSIi/YYUWH8ZJrHvAnqBdmCOa5vZwBP29nqgCTjdtc2ngO3AnhLaKiIiPhRK7GeS\nnYzftcsKbXOWa5svAE8GaaCIiBTnpALrUz7vJ+ERNxT4DHnq8+ecc05q+/btPncjIiJWFzAl14pC\nR+w/J7suPg5zRO61zVl2WdpVwH8A7+Xawfbt20mlUjmnu+66K+86r0lxilNcdeKi0Ma4xAEfy5e4\nCyX2DcBEzMXTocBcYJVrm1XA9fZ2C7Af2OtYPw94qsB+RESkTAqVYk4Ai4EXMT1kHgO2ALfY9csx\nPWJmYS6yHgYWOOJHYC6c3hykcbt27QoSpjjFKa5KcVFoYy3EFUrsAKvt5LTcNb84T+xh4JRiG5U2\nZUrO8pHiFKe4kMZFoY21EOe+6FkNKVsvEhERnxKJBOTJ4X6O2KtizJgx7Nu3r9rNCK3Ro0fzwQcf\nVLsZImXR2NRIz4Ge/vmGUQ0c3H+wii2KtkIXT6tm3759ga4i18oU5EOvo6Mj0P9CcYqr9L56DvRA\nkv7JmeQrsb+4x4U2sYuISDChrbEnEglyLRdDz4/ESSKRMEfraUn0+i7Aq8auI3YRkZhRYq8hUakP\nKi66cUH3FVQUnpNqxCmxi4hI2aVyybW8oWF0CjPAWEWmhobROdvi9uGHH6ZuuOGG1Nlnn51qaGhI\nTZkyJbV69er+9WvWrEmdd955qfr6+tTMmTNTu3fv7l+3du3a1IwZM1KjRo1KNTc3591HR0dHKpFI\npO644w7fz49IVAEpko5Jr++C8BikMVJH7D09+6hgXrf3X9iJEycYP348r7zyCgcPHuRb3/oWn//8\n53nnnXd4//33+exnP8u3v/1t9u3bx6WXXsrcuXP7Y0eOHMlNN93E0qVL895/b28vt912Gy0tLekL\nJCKx09g4hkQiodd4BUQqsYdFfX09d911F+PHjwfg05/+NBMmTGDDhg08++yzXHTRRXzuc59j6NCh\nJJNJurq62Lp1KwDTpk1j/vz5TJgwIe/9f+c736G1tZXzzjuvrD0DolIfVFx044qJyT5QCyYKz0k1\n4pTYy2Dv3r1s3bqVj370o2zevJmPfSwzmmZ9fT3nnnsumzZt8nVfu3fvZsWKFdx5553q7iUigYTh\nHCiVK4Hl6qdtTtkqmeyK7xve29vLVVddxcSJE3n44Ye56aabOPXUU7nvvvv6t7n88stZuHAh119/\nff+yNWvWcPPNN7Nz586s+5szZw7XXXcd1157LQsWLGDcuHHcfffdA1uqfuwScdnvZ/VjL5b6sVdI\nX18fX/rSlxg2bBgPPfQQYGroBw9mj3Fx4MABGhoaCt7fc889x6FDh7j22msBnAPqi4j4psQeUCqV\n4sYbb+S9995j5cqV1NXVAXDhhRfS1dXVv93hw4fZvn07F154YcH7XLt2LRs2bGDs2LGMHTuWZ555\nhmXLlnHNNdeUpc1RqQ8qLrpx6scejjgl9oC+/OUv093dzapVqzj55JP7l19zzTVs2rSJZ599lmPH\njvHNb36TKVOmMGnSJMB8IBw7doze3l5SqRQffvghx48fB+Cee+5h27ZtdHV10dnZyezZs1m4cCEr\nVqyoymMUCbtZV8/q71mTSCRobGqsdpPEyttH0y0s/dh37dqVSiQSqeHDh6dGjhzZPz355JOpVMr0\nYz///PNTw4cPH9CPfd26dalEIpFKJBKpIUOGpBKJRGrmzJk599PW1pa68847fT8/IlFi3ncpOwXr\nxx40Lg7wuOAYqYunkqHnR6KuHBdPa3nwMF08FSA69UHFRTdusGvsQUXhuSwlLrS/oCQi0dDYOKb/\nW9vDh4/kyBF/P5IhlaNSTETp+ZGwcJdU/L4uVYopjUoxIiI1RIm9hkSlPqi4aMdFQVSey0r2Y28F\nuoFtwO15tnnQru8CpjqWNwHfB7YAbwEtgVopIiK+Faqx1wE/Az4F/Bx4HZiHSdRps4DF9u/HgQfI\nJPAngJeBxzEXakcAB1z7UI09AD0/EhaqsVdHKTX2y4C3gV1AL9AOzHFtMxuTwAHWY47STwdGAZ/E\nJHWAEwxM6iIiUmaFEvuZwB7H/Lt2WaFtzgImAO8BK4D/BP4WqC+lsVKaqNQHFRftuEorxw90ROW5\nrFSN3e85jfsZTmFKL5cAf2P/Hga+kSu4ra2NZDJJMplk2bJleR9MY1Nj1rgQ5Z78jjNx/Phxbrzx\nRpqbm2lsbGTq1Km88MIL/etfeuklzj//fEaMGMEVV1zBO++8079u3bp1zJw5k6amppw/ttHc3Ex9\nfT0NDQ00NDTQ2trq2ZaOjo6s56sS852dndqf9pd33q2YeOiwk7XTTh7xA36gI3vk64o+n52dnRX/\nf+XbX0dHB21tbf350kuhj7wWTAUrnV2WAH3A/Y5tHsH8Z9rtfDcw3d73a5gjd4DLMYn9atc+fNfY\nB9TTyi3prz535MgRli5dyoIFCxg/fjw//OEPmTdvHps2baK+vp5zzjmHxx9/nM985jPccccdvPrq\nq7z22msAvP7662zdupUjR45w7733DhiPfcKECTz22GNcccUVnm1QjV3CYrBr7BrH3fCqsRf65ukG\nYCLQDPwCmIu5eOq0CnPxtB3zQbAf2GvX7QEmAVsxF2A3F9v4MEr/NF6a86fx3n///f6fxgNIJpOc\ncsopbN26lUmTJjFt2jSmTZvGmjVr8t5/rbwwRaQyCpViTmCS9ouY7opPY3rE3GIngOeBHZiLrMuB\nRY74rwD/iOkGeTFwb7kaHibl/Gk8gPnz53Paaadx5ZVX8uabb5atnblOmxWnuHLHRUFUnsugcX7G\nilltJ6flrvnFeWK7gGnFNipKent7mT9/Pm1tbUyaNInDhw9z6qmnZm3T2NjIoUOHfN3fk08+ySWX\nXEJfXx8PPPAAV155Jd3d3YwaNaoSzReRGIrUWDFhqbGn9fX18cUvfpFDhw7xgx/8gLq6Or761a/S\n29vLd7/73f7tLrroIu6+++6sX0LK95unbpMnT2bp0qVcfXX2pQnV2Aefc7ArgIaG0Rw8+EEVWxQO\nqrFXh8aKqYBUBX4aL5dSunRJeWX3xkhlJXmRMFFiD6gSP423Z88efvzjH3P8+HGOHTvG0qVL+fWv\nf80nPvGJsrQ5KvXBqMQFFdbH5+wfnkgkqK8v/APsUVU/sj5QV+ew/u/cIjUee8OoBnqSlRvruWGU\nvxfy7t27efTRRxk2bBhnnHFG//JHH32UefPmsXLlShYvXsx1111HS0sL7e3t/du8/PLL/V0ZE4kE\nw4cPZ8aMGaxdu5aenh4WLVrE9u3bGTZsGFOnTmX16tWMHj26vA9UJIfMGYlx9GiAs8VE9llmw6gG\nDu4/WIbWldfRw0ezSjiVzCvVEIbzfI0VE4Cen8GXXduFYurJURD08Q12rVxjzBiqsYuI1BAl9hoS\nlfpgVOKCiszjs2WVYmrQcReV/50Su4jklsKUK5LQcyD6NehyDB4WFWF4hKqxB6DnZ/DVYo29vw6d\nDE+tXP3fDdXYRURqiBJ7DYlKfTAqcUHF/fHFWVT+d6Htxz569OiaqIUFpb7tIpJPGDJnzhq7SNio\nxh6Omrdq7IZq7CIiNSTUiT0q9SzF1WZcUBrzJbqi8toMbY1dRIIpy5gvEmlh+I+rxi6REJUae3nG\nfDFxqrGHl2rsIiI1JNSJPSr1LMXVZlxQUWmnDFTMOO7Oax1Br3Ooxi4iUmHFjOPuvNYx2Nc5VGMX\n8SmyNfYhCegzN71++EI19vKO4x70t2D98qqx64hdJO766E9GcfulIMlNNXbFKU41dvFQzeF+Kzke\neyvQDWwDbs+zzYN2fRcw1bF8F/Am8Abw00AtFBGpokytvIRSiuNHSwbjh0sKfQTVAT8DPgX8HHgd\nmAdscWwzC1hs/34ceABoset2Ar8FfOCxD9XYJRIiW2MPVCsPGhfeWnlU4vwqpR/7ZcDbmCPvXqAd\nmOPaZjbwhL29HmgCTnfuv6jWiohISQol9jOBPY75d+0yv9ukgDXABuDmYhsXlVqr4mozLqiotFOq\nr1I1dr/nCvmOyi/H1NyvAv4Y+GSujdra2kgmkySTSZYtW5b1YDo6Ooqe7+zsLCle+9P+8s1Dh53C\n+/iyuefxjHc/Pnbayff+srfX/orfn3O+s7Ozf76jo4O2trb+fOmlUJmkBVMZarXzSzCdp+53bPOI\nbXm7ne8GpgN7Xfd1F3AI+I5ruWrsEglB+4cPNtXYox3nVyk19g3ARKAZGArMBVa5tlkFXG9vtwD7\nMUm9Hkh/j3YE8PvAxqJaLhJm6f7hSeg5oP7hEh6FEvsJTI+XF4G3gKcxPWJusRPA88AOzEXW5cAi\nu/wM4FWgE3NR9V+AHxXTuJynQYpTXEjigvK7P42rLkFfm36+ebraTk7LXfOLc8TtAKYEaZSIaFx1\nCS4MrxTV2CUSgtagy7m/wRtXPWhceGvXUYnzK1LjsTtPPxsbx1S7OSLhMSTz7cVKf3NRoi10id35\n9V1zu3hRqdEqLtpxQQXeny7W1pxK1thFpASNjWOyDlKGDx/JkSNKzFI5sUzsM2bMUJziKh7nly6C\nSjGcBwINDaM5eNBrqK3cYpnYRUSiynkg0NMT7CAgdDX2cohKjVZx0Y4TCatYJnYRkVoWy8QelRqt\n4qIdJxJWsUzsIiK1LJaJPSo1WsVFO04krGKZ2EVCLaFvkEplxbK7Y1RqtIqLdlxgKfrHDelJ6otK\nUn46Ypea19jUqCNoiZVYJvao1GgVl1v9yPpAiTbo/noO9GgMFomVWJZiJNqOHj6qUoVICWJ5xB6V\nGq3iykv90UWMWCZ2ES/un5wTCS3HGPzFlCZjmdijUktWXHn53Z9zzP/sXwwSCRnHGPzFXAOKZWIX\nEallsUzsUaklK668VGMXMcKd2APWl0REalm4E3vA+lJUasmKK6+g/d9F4sZPYm8FuoFtwO15tnnQ\nru8CprrW1QFvAM8FbKNITu7eLf3935P6opHUtkKJvQ54CJPcLwDmAZNd28wCzgUmAguBh13rbwPe\nYhC7H0Sllqy40qh3i0huhRL7ZcDbwC6gF2gH5ri2mQ08YW+vB5qA0+38WZjE/z1AHYZFRAZBocR+\nJrDHMf+uXeZ3m78G/hRTLR80UaklK05EKqFQYvd7fus+Gk8AVwO/wtTXPY/W29raSCaTJJNJu6Qj\ns3KnndJrOjqyEkau+c7OzqK2L3Ve+yvvvJvX9ua14pj38Xpx3bv2V9b9ZW+v/ZVxf/ZvOl96KVQe\nacFcjmq180swR9/3O7Z5BNPydjvfDcwAbgW+BJwAhgGNwErgetc+UqlU5vPDfMU7PZ/oHwwKTEuc\n20o8JRKO/3sy//88+7UCWa8XxVUpzv97VnGlxdnhMHLm8EJH7BswF0WbgaHAXGCVa5tVZJJ1C7Af\n+CXwZ8A4YALwBWAtA5O6iIiUWaHEfgJYDLyI6dnyNLAFuMVOAM8DOzAXWZcDi/Lc16Adauc8nVFc\naOM0KJdIefkZj321nZyWu+YXF7iPl+0kMkCm22KakrtIKcL9zdOAotJfW3EiUgmxTOwiIrUslok9\nrLVkxYnIYIhlYhcRqWWxTOxRqSUrTkQqIZaJXUSklsUysUellqw4EamEWCZ2EZFaFsvEHpVasuJE\npBJimdhFRGpZLBN7VGrJihORSohlYp919aysQaX0w8YiUkv8DAIWOf0/amz1JP39sHFUatBxjxOR\n0sTyiF1EpJYpsTtEpQYd1jj3uOr19Q2B9icipVFil7LJjKtupqPHDuk6h0gVxLLGHlRUatBRiSNF\n/7UOv9c5RKR0OmIXEYkZJXaHsNauoxonItWhxC4iEjNK7A5RqV1HJU5EqkOJXUQkZpTYHaJSu45K\nnIhUh5/E3gp0A9uA2/Ns86Bd3wVMtcuGAeuBTuAt4L6SWioiIr4USux1wEOY5H4BMA+Y7NpmFnAu\nMBFYCDxslx8DZgJTgIvt7cvL0uoKiUrtOipxIlIdhRL7ZcDbwC6gF2gH5ri2mQ08YW+vB5qA0+38\nEft3KOZD4oPSmisiIoUUSuxnAnsc8+/aZYW2OcversOUYvYC6zAlmdCKSu06KnEiUh2FEnvK5/0k\n8sT9N6YUcxbwO8CMXMFtbW0kk0mSyaRd0pFZudNO6TUdHVmJJtd8lp3Zs37ii53v7Oys6P1HaX/m\nf+eY9/H/y+aeR/uL1P7I+Z7T/sqwP/s3nS+9uBOyWwtmtI9WO78E6APud2zzCKbl7Xa+G5iOOUp3\nuhM4CvyVa3kqlcp8fiQSCTKfC4mscdVJgnPbfBKJYHFSmuz/HWT9/5L5/weKi1Oc//ee4kqLM9vl\nzuGFjtg3YC6KNmPq5HOBVa5tVgHX29stwH5MUj8FU28HGA78HvBGgf2JiEiJCiX2E8Bi4EVMffxp\nYAtwi50Angd2YC6yLgcW2eVjgbWYGvt64DngpTK2PYtzLPCgcp4GKS5wnIhUh59he1fbyWm5a35x\njriNwCVBGhVEZixwKFxhEhGJL33z1CEq/cOjEici1aHELiISM0rsDvUj67N+s9Pvz7lFpeatGrtI\nbdBP4zkcPXw0q2uRfs5NRKJIR+xlEJWat2rsIrVBiV0KamxqLLo8JSLVo8ReBlGpeQeN6znQY0pU\nSXtbREJNiV1EJGaU2MsgKjVv1cpFaoMSu4hIzCixl0FUauXqjy5SG5TYRURiRom9DKJSK1eNXaQ2\n1HxiL8dwvyIiYVLziT0z3G/wX1iKSq1cNXaR2lDziV0Gcp7F6ExGJHo0CFgZRKVW7jcu+0dLQD9c\nIhItOmIXEYkZJfYyiEqtXDV2kdqgxC4iEjNK7GUQ1lp5ueJEJFqU2EVEYkaJvQyiUitXjV2kNvhN\n7K1AN7ANuD3PNg/a9V3AVLtsHLAO2AxsAm4N3FIREfHFT2KvAx7CJPcLgHnAZNc2s4BzgYnAQuBh\nu7wX+BpwIdAC/HGO2MiLSq1cNXaR2uAnsV8GvA3swiTqdmCOa5vZwBP29nqgCTgd+CXQaZcfArYA\nv1lSi0VExJOfxH4msMcx/65dVmibs1zbNGNKNOuLa2L4RaVWrhq7SG3wk9j9jo7l/t65M24k8H3g\nNsyRe5a2tjaSySTJZNIu6cis3Gmn9JqOjqwE5Z7Pik3HO9cO2L60/XV0dNDZ2VnU9qXOV3p/2dzz\neMab7R3zPp5P7S9O+yPne077K8P+7N90vvTiZxCQFsxv1Lfa+SVAH3C/Y5tHMK1vt/PdwHRgL/AR\n4F+A1cCyHPefSqUynwFm0Kn0fMLsOS0Jzm2zHsggx8VZ9nMCWc9L0u9zqbjajAvvezZucXaAvpw5\n3M8R+wbMRdFmYCgwF1jl2mYVcL293QLsxyT1BPAY8Ba5k7qIiJSZn8R+AlgMvIhJ0E9jLoLeYieA\n54EdmIusy4FFdvkngOuAmcAbdkof+cdGztOnGMWJSLT4HbZ3tZ2clrvmF+eI+zdq4EtQs66exdHD\nR/vnG0Y1cHD/wSq2SERqmcZjL4Ojh49m1cF6kj2+4tSPXUQqIfZH0yIitUaJvYpUYxeRSlBiFxGJ\nGSX2KlKNXUQqQYk9oMbGMSQSifSXBEREQkOJPaCenn2Yb4cF/2aqauwiUglK7CIiMaPEXkWqsYtI\nJSixi4jEjBJ7FanGLiKVoMReQ2ZdPau/J09jU2O1myMiFaKxYqpoMGrljY1jbA8eK2n++B3PRkSi\nR0fsMVeObpkiEi1K7FWkWrmIVIISu4hIzCixV5H6o4tIJSixV1FjU2N/LxX1VBGRclFir6KeAz2m\nl4qdeg7466miGruIeFFiFxGJGSX2CFKNXUS8KLFHkGrzIuJF3zwdZAO+CRpAf20+Pa9vkYqIg98j\n9lagG9gG3J5nmwft+i5gqmP548BeYGPANsaKvgkqIpXmJ7HXAQ9hkvsFwDxgsmubWcC5wERgIfCw\nY90KGysl0E/xiYhffhL7ZcDbwC6gF2gH5ri2mQ08YW+vB5qAM+z8q0BptQfRkb6I+OYnsZ8J7HHM\nv2uXFbuNiIgMAj8XT/0eIrprBL4PLdva2mhubnYs6QBmmJs77aIJdo39ck66y9/AL+u45ne61rri\na3Z/ebYvuD+7jXv7zP7S22t/tbk/GzOh2P05YrW/3PuzsQPz5UB+CrYtmD4Y6Tr5EqAPuN+xzSOY\nR9Bu57uB6ZiLpgDNwHPARTnuP5VKZT4DTA05PZ/I6v1BEpzbOimuyDjfMYpTXDFxEXwvRDTOXm/L\nmcP9lGI2YC6KNgNDgbnAKtc2q4Dr7e0WYD+ZpC4iIoPIT2I/ASwGXgTeAp4GtgC32AngeWAH5iLr\ncmCRI/4p4N+BSZg6/IJyNFxERHLz+wWl1XZyWu6aX5wndl5RLRIRkZJoSAERkZhRYhcRiRkldhGR\nmFFiFxGJGSV2EZGYUWIXEYkZJXYRkZhRYhcRiRkldhGRmFFiFxGJGSV2EZGYUWIXEYkZJXYRkZhR\nYhcRiRkldhGRmFFiFxGJGSV2EZGYUWIXEYkZJXYRkZhRYhcRiRkldhGRmFFiFxGJGT+JvRXoBrYB\nt+fZ5kG7vguYWmSsiIiUUaHEXgc8hEnQFwDzgMmubWYB5wITgYXAw0XEiohImRVK7JcBbwO7gF6g\nHZjj2mY28IS9vR5oAs7wGSsiImVWKLGfCexxzL9rl/nZ5jd9xIqISJkVSuwpn/eTKLUhIiIyOFqA\nFxzzSxh4EfQR4AuO+W7gdJ+xAJ2YDxBNmjRp0uR/6iSgk4DtQDMw1N5Rrounz9vbLcBPiogVEZEq\nuAr4GeZC6BK77BY7pT1k13cBlxSIFRERERERv+qq3QCHycBNwFxMeedi4APg/Qru72PAe8Bxx/JW\nzBlGPpcDI23cDOAPgeHAzoq0MtsngWuBBkyZK58WoAf4EKgH7gS+jnm8r9vludwK/AI4WGS7Tgau\nA04FdgDzgf8JTMCU4Po8Ys8h83+/0sZs9WijiBQQliEFbgeesrfX22mIXRa0hLPAY92twD8DXwE2\nA3/gWHefR9x9wF9h+u3/JfAXmKR+F/CnRbbv73xs81PH7ZuB/4f5ULkL7+flceCwvf0A0GjbehRY\n4RF3j93nvwGLMInajxWYD+PbgL/HfNj9BPNdhu95xN2Gufh+st32ZGA85v8/0+e+a8Vpg7y/3xjk\n/UXJao91ozDvtX8Avuha9zcVa1FIbQM+kmP5ULyPnr3s8Vi3CZMgwVzc3QB81c6/4RH3FuaicD3m\niHiUXT4ceNMj7jlglf2bng47lufjbMsGMol2hH0M+Wxx3P5P17quAvsbAvw+5sPhPUzPpj/CnCXk\ns9H+PQn4lf0LphvsxpwRxiYyZ431wMv29ni8r/g3Yd483cA+zJldt13W5BHnpRJv1nGYD7Z0u1Zg\nHvPf452ox7im38B80S89n0+r43YT8Bjm+X8S01Mtn/vJvLYuxZx1vQ28gzkrzeUN4A7MGVcxpgHr\nMM/lOOBfgQOYM8mpHnENwN2YA7GDmDP59UBbgf0Ffa1ckmf6LeCXHnHP2vu+BvP+XgkMs+u8ckvQ\n10qodWMSrFsz5uJrPhs9Jq9T+c2u+ZHAi8Bf451QOvPczjXv9Abwj5ij0OmYN8t/2dvTPeLeJPPG\ndr8ovPb3feAGe3sF5s0EMAnzBvJqp9NQzLeF2/EuiW3GHG2PxnzgpY/2hjPwuXbaSOZFPwbz4eW8\nz3x+hDnLO4PMdyjGAt+w6/IZ7DfrS5izwiWY1/g3MB9aX7H3kU8fprTnnHrt3x0ecc62PAZ8C/Me\n+hrmDDUf50FCB9mvl//IE7MTc/b6DuY19TXMlxILeR3TqWIe5kuL12L+h78LvOYRtwpzFj4O+F/A\n/7Ht+zvgXo+4oK+V/8Z8AOWajnrEuQ+c/hz4MXAKlXmthFq6rv0C8Ld2egFTR77KI24v5lO+Ocf0\nC4+4dcAU17KPYF4kXvXg9ZgjS8guYzUx8MjYqQ7zYlxD5qjET01+F5k39g7MCxLM0UuhI9onbMx6\nMknhFUydPR+vF94Ij3VL7L5+hhkv6C3M0ccm4H97xN2GSe7fs7HpD6PTbFvz2Rpw3WC/WZ3/o3c8\n1rl9HfP6v9ixzM/rxdmWLrK/OOh1praFzBnzT1zr8p1xpfeVAH4HM0bULzHP5UKfbSzmOXGfEacP\nAobgffAX9LWyGfPBkYtXNWALA0vcbfb+dnvEBX2thF4d8NuY+uznMBcAT/KMMOWCT+ZZ91Se5WA+\n9c/IsTyBuTiaz7A8y08BLvKISzsL+Cfgu3i/OAqpx1xkLGQU5gPsUnI/XrfzSmhTM5kywTmYi6Fe\nHyJpH8X8z88vYl//ivnAcJYXzsAcma3xiBvsN6szmX7btc6rRAXmNfpPmLPIRvwl9ncxBxBfxxwU\nOBO7V6nwK5jn9AogibkuMx34JqYUkEuuD7STMAdpXtdxfoq5SP55zHN+jV0+HXMQks9rZN7rczBn\n2GleiT3oa+Va8r8m/yDPcoClwO/lWN6KKTnnU8prRULiarxPH8XbGMzF63TddJ+9/Zd416AH+816\nD7mvTUzElMv8mINJeHt9bJvEXFhPT+na7FgKX6yfCTyDSdgbMdccbiH3tS8w5bkgLsOUe54CzsYk\n14OYM95LPeLSPbr2Y86W0gchp2I6Q+QT9LUCpufc75K5HpfmVUUIGleO14pIbHn1hvJyQ+FNqhZX\nT+aMcLBfyhcjAAAB50lEQVQfX5D9BW1jpR7bZOBTDEycrTm2TbsVcybwz5izMucHv1f5LWhcup1B\nPkhEYi9oiUtx5YsLUxuDJtqgPeeCxpXygTBAoRq2SBh51Ry9uvXFIc6r69tg7i9MbfSKW4jp9XQI\nk2hX2r/LPGLAXJ84ZG/vwvRkW4kpH3mNZhs0zt3O7/tsZ05K7BJFp2FOo/flWPfvihuUuCi0EQYm\n2un4S7S/wnQ8SPdIOYS5NvYY2b2VyhUX9ANBJDaC9oZSXPniotBGCN61OWjPuaBxQdspIlJzgiba\nwRaVdoqIiIiIiIiIiIiIiEjMjAK+bG+PxYzJIiIiEdaMBlYSEYmVduAI5qvaz5BJ8m2Yr3T/CDOa\n4mLgTzADVL2GGW8ezOiVqzFfF3+F0kbFFBGRMjibTDJ33m7DjNY4AjMU8wEyY4v/X8z48WB+FOFc\ne/vjdl4kNDSkgNSiRJ7bYL4BeNhO+zG/mAQm+V+MSfr/g+y6/NDKNFMkGCV2kWzOn1Tsc8z3Yd4v\nQzDjlXj9PqdIVbl/FUakFvTg/ePcuaSP7Hsw9fc/dCz3GtxJZNApsUst+jXmV3g2Yn5JJ2WXpxy3\nyXE7PT8fuBEzgt8mYHYlGysiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIhJm/x8evA6RfDpmGwAAAABJ\nRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use Series indexing to perform vectorized mappings\n", "---" ] }, { "cell_type": "code", "collapsed": false, "input": [ "crimes['coarse_cat'] = coarse_cat\n", "thefts = crimes[crimes.coarse_cat == 'THEFT']\n", "\n", "thefts.groupby(thefts.date.dt.weekday).size()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "0 7201\n", "1 7082\n", "2 7353\n", "3 7243\n", "4 8259\n", "5 8368\n", "6 7223\n", "dtype: int64" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "day_names = pd.Series(['Weekday'] * 5 + ['Weekend'] * 2)\n", "day_names" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "0 Weekday\n", "1 Weekday\n", "2 Weekday\n", "3 Weekday\n", "4 Weekday\n", "5 Weekend\n", "6 Weekend\n", "dtype: object" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "day_type = day_names.take(thefts.date.dt.weekday)\n", "thefts.groupby(day_type.values).size()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "Weekday 37138\n", "Weekend 15591\n", "dtype: int64" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top-N selection / filtering\n", "===\n", "- SQL: use `ORDER BY` + `LIMIT` to find\n", " - Many databases feature optimized Top-N algorithms\n", "- pandas: more of a manual affair" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "with t0 as (\n", " select descript, count(*) as occs\n", " from crimes\n", " group by 1\n", " order by occs desc\n", " limit 5\n", ")\n", "select descript, dayofweek, count(*) as occs\n", "from crimes\n", "where descript in (select descript from t0)\n", "group by 1, 2\n", "order by descript, dayofweek;\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
descriptdayofweekoccs
0 AIDED CASE, MENTAL DISTURBED Friday 704
1 AIDED CASE, MENTAL DISTURBED Monday 718
2 AIDED CASE, MENTAL DISTURBED Saturday 747
3 AIDED CASE, MENTAL DISTURBED Sunday 710
4 AIDED CASE, MENTAL DISTURBED Thursday 745
5 AIDED CASE, MENTAL DISTURBED Tuesday 735
6 AIDED CASE, MENTAL DISTURBED Wednesday 781
7 DRIVERS LICENSE, SUSPENDED OR REVOKED Friday 899
8 DRIVERS LICENSE, SUSPENDED OR REVOKED Monday 876
9 DRIVERS LICENSE, SUSPENDED OR REVOKED Saturday 785
10 DRIVERS LICENSE, SUSPENDED OR REVOKED Sunday 778
11 DRIVERS LICENSE, SUSPENDED OR REVOKED Thursday 829
12 DRIVERS LICENSE, SUSPENDED OR REVOKED Tuesday 904
13 DRIVERS LICENSE, SUSPENDED OR REVOKED Wednesday 878
14 GRAND THEFT FROM LOCKED AUTO Friday 2492
15 GRAND THEFT FROM LOCKED AUTO Monday 2267
16 GRAND THEFT FROM LOCKED AUTO Saturday 2605
17 GRAND THEFT FROM LOCKED AUTO Sunday 2162
18 GRAND THEFT FROM LOCKED AUTO Thursday 2281
19 GRAND THEFT FROM LOCKED AUTO Tuesday 2234
20 GRAND THEFT FROM LOCKED AUTO Wednesday 2241
21 LOST PROPERTY Friday 860
22 LOST PROPERTY Monday 606
23 LOST PROPERTY Saturday 907
24 LOST PROPERTY Sunday 728
25 LOST PROPERTY Thursday 637
26 LOST PROPERTY Tuesday 593
27 LOST PROPERTY Wednesday 691
28 PETTY THEFT OF PROPERTY Friday 825
29 PETTY THEFT OF PROPERTY Monday 558
30 PETTY THEFT OF PROPERTY Saturday 985
31 PETTY THEFT OF PROPERTY Sunday 749
32 PETTY THEFT OF PROPERTY Thursday 673
33 PETTY THEFT OF PROPERTY Tuesday 607
34 PETTY THEFT OF PROPERTY Wednesday 644
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ " descript dayofweek occs\n", "0 AIDED CASE, MENTAL DISTURBED Friday 704\n", "1 AIDED CASE, MENTAL DISTURBED Monday 718\n", "2 AIDED CASE, MENTAL DISTURBED Saturday 747\n", "3 AIDED CASE, MENTAL DISTURBED Sunday 710\n", "4 AIDED CASE, MENTAL DISTURBED Thursday 745\n", "5 AIDED CASE, MENTAL DISTURBED Tuesday 735\n", "6 AIDED CASE, MENTAL DISTURBED Wednesday 781\n", "7 DRIVERS LICENSE, SUSPENDED OR REVOKED Friday 899\n", "8 DRIVERS LICENSE, SUSPENDED OR REVOKED Monday 876\n", "9 DRIVERS LICENSE, SUSPENDED OR REVOKED Saturday 785\n", "10 DRIVERS LICENSE, SUSPENDED OR REVOKED Sunday 778\n", "11 DRIVERS LICENSE, SUSPENDED OR REVOKED Thursday 829\n", "12 DRIVERS LICENSE, SUSPENDED OR REVOKED Tuesday 904\n", "13 DRIVERS LICENSE, SUSPENDED OR REVOKED Wednesday 878\n", "14 GRAND THEFT FROM LOCKED AUTO Friday 2492\n", "15 GRAND THEFT FROM LOCKED AUTO Monday 2267\n", "16 GRAND THEFT FROM LOCKED AUTO Saturday 2605\n", "17 GRAND THEFT FROM LOCKED AUTO Sunday 2162\n", "18 GRAND THEFT FROM LOCKED AUTO Thursday 2281\n", "19 GRAND THEFT FROM LOCKED AUTO Tuesday 2234\n", "20 GRAND THEFT FROM LOCKED AUTO Wednesday 2241\n", "21 LOST PROPERTY Friday 860\n", "22 LOST PROPERTY Monday 606\n", "23 LOST PROPERTY Saturday 907\n", "24 LOST PROPERTY Sunday 728\n", "25 LOST PROPERTY Thursday 637\n", "26 LOST PROPERTY Tuesday 593\n", "27 LOST PROPERTY Wednesday 691\n", "28 PETTY THEFT OF PROPERTY Friday 825\n", "29 PETTY THEFT OF PROPERTY Monday 558\n", "30 PETTY THEFT OF PROPERTY Saturday 985\n", "31 PETTY THEFT OF PROPERTY Sunday 749\n", "32 PETTY THEFT OF PROPERTY Thursday 673\n", "33 PETTY THEFT OF PROPERTY Tuesday 607\n", "34 PETTY THEFT OF PROPERTY Wednesday 644" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select c.descript, c.dayofweek, count(*) as occs\n", "from crimes c\n", " inner join (\n", " select descript, count(*) as occs\n", " from crimes\n", " group by 1\n", " order by occs desc\n", " limit 5\n", " ) c2 on c.descript = c2.descript\n", "group by 1, 2;\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
descriptdayofweekoccs
0 PETTY THEFT OF PROPERTY Thursday 673
1 LOST PROPERTY Wednesday 691
2 PETTY THEFT OF PROPERTY Wednesday 644
3 LOST PROPERTY Thursday 637
4 LOST PROPERTY Friday 860
5 PETTY THEFT OF PROPERTY Friday 825
6 GRAND THEFT FROM LOCKED AUTO Friday 2492
7 DRIVERS LICENSE, SUSPENDED OR REVOKED Sunday 778
8 DRIVERS LICENSE, SUSPENDED OR REVOKED Monday 876
9 GRAND THEFT FROM LOCKED AUTO Thursday 2281
10 AIDED CASE, MENTAL DISTURBED Saturday 747
11 GRAND THEFT FROM LOCKED AUTO Wednesday 2241
12 AIDED CASE, MENTAL DISTURBED Wednesday 781
13 AIDED CASE, MENTAL DISTURBED Thursday 745
14 GRAND THEFT FROM LOCKED AUTO Saturday 2605
15 AIDED CASE, MENTAL DISTURBED Friday 704
16 LOST PROPERTY Saturday 907
17 PETTY THEFT OF PROPERTY Saturday 985
18 DRIVERS LICENSE, SUSPENDED OR REVOKED Tuesday 904
19 PETTY THEFT OF PROPERTY Tuesday 607
20 AIDED CASE, MENTAL DISTURBED Monday 718
21 LOST PROPERTY Tuesday 593
22 DRIVERS LICENSE, SUSPENDED OR REVOKED Saturday 785
23 AIDED CASE, MENTAL DISTURBED Sunday 710
24 GRAND THEFT FROM LOCKED AUTO Tuesday 2234
25 LOST PROPERTY Monday 606
26 PETTY THEFT OF PROPERTY Monday 558
27 AIDED CASE, MENTAL DISTURBED Tuesday 735
28 PETTY THEFT OF PROPERTY Sunday 749
29 LOST PROPERTY Sunday 728
30 DRIVERS LICENSE, SUSPENDED OR REVOKED Friday 899
31 GRAND THEFT FROM LOCKED AUTO Sunday 2162
32 DRIVERS LICENSE, SUSPENDED OR REVOKED Wednesday 878
33 DRIVERS LICENSE, SUSPENDED OR REVOKED Thursday 829
34 GRAND THEFT FROM LOCKED AUTO Monday 2267
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ " descript dayofweek occs\n", "0 PETTY THEFT OF PROPERTY Thursday 673\n", "1 LOST PROPERTY Wednesday 691\n", "2 PETTY THEFT OF PROPERTY Wednesday 644\n", "3 LOST PROPERTY Thursday 637\n", "4 LOST PROPERTY Friday 860\n", "5 PETTY THEFT OF PROPERTY Friday 825\n", "6 GRAND THEFT FROM LOCKED AUTO Friday 2492\n", "7 DRIVERS LICENSE, SUSPENDED OR REVOKED Sunday 778\n", "8 DRIVERS LICENSE, SUSPENDED OR REVOKED Monday 876\n", "9 GRAND THEFT FROM LOCKED AUTO Thursday 2281\n", "10 AIDED CASE, MENTAL DISTURBED Saturday 747\n", "11 GRAND THEFT FROM LOCKED AUTO Wednesday 2241\n", "12 AIDED CASE, MENTAL DISTURBED Wednesday 781\n", "13 AIDED CASE, MENTAL DISTURBED Thursday 745\n", "14 GRAND THEFT FROM LOCKED AUTO Saturday 2605\n", "15 AIDED CASE, MENTAL DISTURBED Friday 704\n", "16 LOST PROPERTY Saturday 907\n", "17 PETTY THEFT OF PROPERTY Saturday 985\n", "18 DRIVERS LICENSE, SUSPENDED OR REVOKED Tuesday 904\n", "19 PETTY THEFT OF PROPERTY Tuesday 607\n", "20 AIDED CASE, MENTAL DISTURBED Monday 718\n", "21 LOST PROPERTY Tuesday 593\n", "22 DRIVERS LICENSE, SUSPENDED OR REVOKED Saturday 785\n", "23 AIDED CASE, MENTAL DISTURBED Sunday 710\n", "24 GRAND THEFT FROM LOCKED AUTO Tuesday 2234\n", "25 LOST PROPERTY Monday 606\n", "26 PETTY THEFT OF PROPERTY Monday 558\n", "27 AIDED CASE, MENTAL DISTURBED Tuesday 735\n", "28 PETTY THEFT OF PROPERTY Sunday 749\n", "29 LOST PROPERTY Sunday 728\n", "30 DRIVERS LICENSE, SUSPENDED OR REVOKED Friday 899\n", "31 GRAND THEFT FROM LOCKED AUTO Sunday 2162\n", "32 DRIVERS LICENSE, SUSPENDED OR REVOKED Wednesday 878\n", "33 DRIVERS LICENSE, SUSPENDED OR REVOKED Thursday 829\n", "34 GRAND THEFT FROM LOCKED AUTO Monday 2267" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "counts = crimes.groupby('descript').size()\n", "counts" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 36, "text": [ "descript\n", "ABANDONMENT OF CHILD 4\n", "ABORTION 1\n", "ACCESS CARD INFORMATION, PUBLICATION OF 1\n", "ACCESS CARD INFORMATION, THEFT OF 135\n", "ACCIDENTAL BURNS 1\n", "ACCIDENTAL SHOOTING 1\n", "ACTS AGAINST PUBLIC TRANSIT 34\n", "ADVERTISING DISTRIBUTORS PERMIT VIOLATION 2\n", "AEROSOL CONTAINER; SALE, PURCHASE OR POSSESSION OF 3\n", "AFFIXING ADVERTISMENTS TO POLES 1\n", "AGGRAVATED ASSAULT OF POLICE OFFICER,BODILY FORCE 39\n", "AGGRAVATED ASSAULT ON POLICE OFFICER WITH A KNIFE 2\n", "AGGRAVATED ASSAULT WITH A DEADLY WEAPON 1102\n", "AGGRAVATED ASSAULT WITH A GUN 158\n", "AGGRAVATED ASSAULT WITH A KNIFE 447\n", "...\n", "VIOLATION OF STAY AWAY ORDER 131\n", "WARRANT ARREST 4042\n", "WEAPON, ASSAULT, POSSESSION, MANUFACTURE, OR SALE 29\n", "WEAPON, ASSAULT, REGISTRATION OR TRANSFER VIOLATION 1\n", "WEAPON, DEADLY, CARRYING WITH INTENT TO COMMIT ASSAULT 37\n", "WEAPON, DEADLY, EXHIBITING TO RESIST ARREST 6\n", "WEAPON, DEADLY, POSSESSION OF TO VIOLATE 136.1 PC 1\n", "WEAPON, POSSESS OR BRING OTHER ON SCHOOL GROUNDS 33\n", "WEAPON, POSSESSING IN PUBLIC BUILDING OR OPEN MEETING 7\n", "WEAPON, TAKING OR ATTEMPTING TO TAKE FROM PEACE OFFICER 3\n", "WEAPONS POSSESSION BY JUVENILE SUSPECT 2\n", "WEARING MASK OR DISGUISE FOR UNLAWFUL PURPOSE 2\n", "WILLFUL CRUELTY TO CHILD 25\n", "WIRETAPS, UNAUTHORIZED 1\n", "YOUTH COURT 1\n", "Length: 744, dtype: int64" ] } ], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "counts.order(ascending=False)[:5].index" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 37, "text": [ "Index([u'GRAND THEFT FROM LOCKED AUTO', u'DRIVERS LICENSE, SUSPENDED OR REVOKED', u'AIDED CASE, MENTAL DISTURBED', u'PETTY THEFT OF PROPERTY', u'LOST PROPERTY'], dtype='object')" ] } ], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "# this could be very large\n", "K = 5\n", "counts = crimes.groupby('descript').size()\n", "top_descripts = counts.order(ascending=False)[:K].index\n", "top_descripts\n", "\n", "filtered = crimes[crimes.descript.isin(top_descripts)]\n", "result = (filtered\n", " .groupby(['descript', filtered.time.map(lambda x: x.hour)])\n", " .size())\n", "#result\n", "result.unstack('descript')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
descriptAIDED CASE, MENTAL DISTURBEDDRIVERS LICENSE, SUSPENDED OR REVOKEDGRAND THEFT FROM LOCKED AUTOLOST PROPERTYPETTY THEFT OF PROPERTY
time
0 170 384 526 250 284
1 120 250 327 177 234
2 114 193 237 106 83
3 103 121 133 42 43
4 74 131 78 11 27
5 77 97 96 32 22
6 123 45 144 44 35
7 165 126 194 97 83
8 217 201 276 167 142
9 234 208 405 206 207
10 245 204 588 265 227
11 284 207 665 246 229
12 288 286 655 395 283
13 282 262 633 289 228
14 254 243 652 264 270
15 265 236 666 321 294
16 331 329 779 319 317
17 275 386 1015 289 335
18 263 358 1731 277 345
19 300 291 1746 242 308
20 286 227 1595 243 274
21 252 279 1208 241 223
22 220 449 1121 238 239
23 198 436 812 261 309
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 39, "text": [ "descript AIDED CASE, MENTAL DISTURBED DRIVERS LICENSE, SUSPENDED OR REVOKED \\\n", "time \n", "0 170 384 \n", "1 120 250 \n", "2 114 193 \n", "3 103 121 \n", "4 74 131 \n", "5 77 97 \n", "6 123 45 \n", "7 165 126 \n", "8 217 201 \n", "9 234 208 \n", "10 245 204 \n", "11 284 207 \n", "12 288 286 \n", "13 282 262 \n", "14 254 243 \n", "15 265 236 \n", "16 331 329 \n", "17 275 386 \n", "18 263 358 \n", "19 300 291 \n", "20 286 227 \n", "21 252 279 \n", "22 220 449 \n", "23 198 436 \n", "\n", "descript GRAND THEFT FROM LOCKED AUTO LOST PROPERTY PETTY THEFT OF PROPERTY \n", "time \n", "0 526 250 284 \n", "1 327 177 234 \n", "2 237 106 83 \n", "3 133 42 43 \n", "4 78 11 27 \n", "5 96 32 22 \n", "6 144 44 35 \n", "7 194 97 83 \n", "8 276 167 142 \n", "9 405 206 207 \n", "10 588 265 227 \n", "11 665 246 229 \n", "12 655 395 283 \n", "13 633 289 228 \n", "14 652 264 270 \n", "15 666 321 294 \n", "16 779 319 317 \n", "17 1015 289 335 \n", "18 1731 277 345 \n", "19 1746 242 308 \n", "20 1595 243 274 \n", "21 1208 241 223 \n", "22 1121 238 239 \n", "23 812 261 309 " ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subqueries\n", "===\n", "- Mainly used in WHERE clauses in SQL\n", "- Scalar and uncorrelated / correlated cases." ] }, { "cell_type": "code", "collapsed": false, "input": [ "acquisitions.price_amount.isnull().value_counts()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 40, "text": [ "True 51424\n", "False 3816\n", "dtype: int64" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "acqs_known = acquisitions[acquisitions.price_amount.notnull()]\n", "acqs_known[acqs_known.price_amount > acqs_known.price_amount.mean()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_permalinkcompany_namecompany_category_listcompany_marketcompany_country_codecompany_state_codecompany_regioncompany_cityacquirer_permalinkacquirer_name...acquirer_country_codeacquirer_state_codeacquirer_regionacquirer_cityacquired_atacquired_monthacquired_quarteracquired_yearprice_amountprice_currency_code
31 /organization/21st-century-insurance 21st Century Insurance |Finance|Business Services|Insurance| Business Services USA AL AL - Other De Armanville /organization/american-international-group American International Group... USA NY New York City New York 2007-09-01 2007-09 2007-Q3 2007 749000000 USD
56 /organization/3com 3Com |Curated Web| Curated Web USA MA Boston Marlborough /organization/hewlett-packard Hewlett-Packard... USA CA SF Bay Area Palo Alto 2009-11-11 2009-11 2009-Q4 2009 2700000000 USD
69 /organization/3par 3PAR |Services|Cloud Data Services|Software| Cloud Data Services USA CA SF Bay Area Fremont /organization/hewlett-packard Hewlett-Packard... USA CA SF Bay Area Palo Alto 2010-09-02 2010-09 2010-Q3 2010 2350000000 USD
108 /organization/91-wireless 91 Wireless |Mobile| Mobile CHN NaN Fuzhou Shi Fuzhou Shi /organization/baidu Baidu... CHN NaN Beijing Beijing 2013-07-15 2013-07 2013-Q3 2013 1900000000 USD
136 /organization/abc ABC |News| News NaN NaN NaN NaN /organization/the-walt-disney-company The Walt Disney Company... USA CA Los Angeles Burbank 1995-08-05 1995-08 1995-Q3 1995 19000000000 USD
152 /organization/abovenet AboveNet |Information Technology|Internet|Service Provi... Internet USA NY New York City White Plains /organization/zayo-group Zayo... USA CO Denver Boulder 2012-03-19 2012-03 2012-Q1 2012 2300000000 USD
155 /organization/abraxis-bioscience Abraxis BioScience |Pharmaceuticals|Biotechnology| Biotechnology USA CA Los Angeles Los Angeles /organization/celgene Celgene... USA NJ Newark Summit 2010-06-30 2010-06 2010-Q2 2010 2900000000 USD
173 /organization/accelrys Accelrys |Software| Software USA CA San Diego San Diego /organization/dassault Dassault Systemes... FRA NaN NaN NaN 2014-01-30 2014-01 2014-Q1 2014 750000000 USD
187 /organization/access-midstream-partners-lp-inv... Access Midstream Partners LP Investments |Natural Gas Uses| Natural Gas Uses USA OK Oklahoma City Oklahoma City /organization/the-williams-companies The Williams Companies... USA OK OK - Other Snyder 2012-12-21 2012-12 2012-Q4 2012 2250000000 USD
195 /organization/acclarent Acclarent |Medical|Biotechnology| Biotechnology USA CA SF Bay Area Menlo Park /organization/johnson-johnson Johnson & Johnson... USA NJ Newark New Brunswick 2009-12-17 2009-12 2009-Q4 2009 785000000 USD
343 /organization/admob AdMob |Mobile|Advertising| Advertising USA CA SF Bay Area Mountain View /organization/google Google... USA CA SF Bay Area Mountain View 2009-11-09 2009-11 2009-Q4 2009 750000000 USD
382 /organization/advanced-computer-software Advanced Computer Software |Health Care|Information Technology|Software| Software GBR NaN Cobham Cobham /organization/vista-equity-partners Vista Equity Partners... USA TX Austin Austin 2014-11-25 2014-11 2014-Q4 2014 725000000 GBP
406 /organization/advanstar-communications Advanstar Communications |Market Research|Events|Business Services| Events USA CA Los Angeles Santa Monica /organization/ubm-electronics UBM Electronics... USA CA SF Bay Area San Francisco 2014-10-01 2014-10 2014-Q4 2014 972000000 USD
420 /organization/advo Advo NaN NaN NaN NaN NaN NaN /organization/valassis Valassis... USA MI Detroit Livonia 2006-07-06 2006-07 2006-Q3 2006 1300000000 USD
430 /organization/aeluros Aeluros |Semiconductors| Semiconductors USA CA SF Bay Area Mountain View /organization/broadcom Broadcom... USA CA Anaheim Irvine 2012-02-17 2012-02 2012-Q1 2012 3700000000 USD
443 /organization/affiliated-computer-services Affiliated Computer Services |Consulting| Consulting USA TX Dallas Dallas /organization/xerox Xerox... USA AL AL - Other Normal 2009-09-28 2009-09 2009-Q3 2009 5750000000 USD
516 /organization/airwatch AirWatch |Mobile Devices|Mobile| Mobile USA GA Atlanta Atlanta /organization/vmware VMware... USA CA SF Bay Area Palo Alto 2014-01-22 2014-01 2014-Q1 2014 1540000000 USD
536 /organization/aktiv-kapital Aktiv Kapital |Finance| Finance NaN NaN NaN NaN /organization/portfolio-recovery-associates Portfolio Recovery Associates... USA VA Norfolk - Virginia Beach Norfolk 2014-02-20 2014-02 2014-Q1 2014 1300000000 USD
540 /organization/alabama-gas-corp Alabama Gas Corp NaN NaN USA AL Birmingham Birmingham /organization/laclede-group Laclede Group... USA MO St. Louis St Louis 2014-04-07 2014-04 2014-Q2 2014 1340000000 USD
576 /organization/algeta Algeta |Biotechnology| Biotechnology NOR NaN Oslo Oslo /organization/bayer-ag-germany Bayer AG... DEU NaN Leverkusen Leverkusen 2013-12-19 2013-12 2013-Q4 2013 2900000000 USD
588 /organization/alios-biopharma Alios BioPharma |Biotechnology| Biotechnology USA CA SF Bay Area South San Francisco /organization/johnson-johnson Johnson & Johnson... USA NJ Newark New Brunswick 2014-09-30 2014-09 2014-Q3 2014 1750000000 USD
606 /organization/allergan Allergan |Medical|Pharmaceuticals|Biotechnology| Medical USA CA Anaheim Irvine /organization/actavis Actavis... USA NJ Newark Parsippany 2014-11-16 2014-11 2014-Q4 2014 65500000000 USD
618 /organization/adl-media-inc AllMedia Inc |Advertising| Advertising USA IA IA - Other Plano /organization/liberty-global Liberty Global... USA CO Denver Englewood 2014-05-08 2014-05 2014-Q2 2014 930000000 USD
629 /organization/alltel Alltel |Public Relations| Public Relations USA AR Little Rock Little Rock /organization/verizon Verizon Communications... USA NY New York City New York 2008-06-05 2008-06 2008-Q2 2008 28100000000 USD
654 /organization/altalink AltaLink |Local Businesses|Energy|Utilities| Utilities CAN AB Calgary Calgary /organization/midamerican-energy Berkshire Hathaway Energy... USA IA Des Moines Des Moines 2014-05-02 2014-05 2014-Q2 2014 3520000000 USD
691 /organization/amcol-international AMCOL International |Manufacturing| Manufacturing USA IL Chicago Hoffman Estates /organization/minerals-technologies Minerals Technologies... USA NY New York City New York 2014-03-10 2014-03 2014-Q1 2014 1700000000 USD
703 /organization/americanexpress American Express |Finance| Finance USA NY New York City New York /organization/standard-chartered-bank Standard Chartered Bank... GBR NaN London London 2008-02-28 2008-02 2008-Q1 2008 823000000 USD
716 /organization/american-petroleum-tankers American Petroleum Tankers NaN NaN USA PA Philadelphia Plymouth Meeting /organization/kinder-morgan Kinder Morgan Energy... USA TX Houston Houston 2013-12-23 2013-12 2013-Q4 2013 962000000 USD
719 /organization/american-realty-capital-healthca... American Realty Capital Healthcare Trust NaN NaN USA MA Boston Boston /organization/ventas Ventas... USA IL Chicago Chicago 2014-06-02 2014-06 2014-Q2 2014 2600000000 USD
723 /organization/ami-semiconductor AMI Semiconductor |Electronics|Manufacturing|Design| Design USA ID Idaho Falls Pocatello /organization/on-semiconductor ON Semiconductor... USA AZ Phoenix Phoenix 2007-12-14 2007-12 2007-Q4 2007 915000000 USD
..................................................................
12702 /organization/viawest ViaWest |Data Centers|Services|Web Hosting| Web Hosting USA CO Denver Greenwood Village /organization/shaw-communications Shaw Communications... CAN AB Calgary Calgary 2014-08-04 2014-08 2014-Q3 2014 1200000000 USD
12704 /organization/viber-media Viber Media |Android|iPhone|VoIP|Messaging| Android CYP NaN Cyprus Limassol /organization/rakuten Rakuten... JPN NaN Tokyo Tokyo 2014-02-14 2014-02 2014-Q1 2014 900000000 USD
12717 /organization/video-gaming-technologies Video Gaming Technologies |Video Games| Video Games USA TN Nashville Franklin /organization/aristocrat-technologies-inc Aristocrat Technologies, Inc... AUS NaN AUS - Other NaN 2014-07-07 2014-07 2014-Q3 2014 1300000000 USD
12749 /organization/vion-food-group Vion Food Group |Hospitality| Hospitality NLD NaN Eindhoven Eindhoven /organization/darling-international Darling Ingredients... USA TX Dallas Irving 2013-10-07 2013-10 2013-Q4 2013 1600000000 EUR
12764 /organization/viropharma ViroPharma |Pharmaceuticals|Biotechnology| Biotechnology USA PA Philadelphia Exton /organization/shire Shire... IRL NaN Dublin Dublin 2013-11-10 2013-11 2013-Q4 2013 4200000000 USD
12843 /organization/vizada Vizada |DOD/Military|Mobile| Mobile USA MD Washington, D.C. Rockville /organization/eads-astrium EADS Astrium... USA TX Houston Houston 2011-08-03 2011-08 2011-Q3 2011 960000000 USD
12854 /organization/vk Vkontake |Communities|Social Network Media|Social Media| Communities RUS NaN St. Petersburg Saint Petersburg /organization/mail-ru Mail.Ru Group... RUS NaN Moscow Moscow 2014-09-16 2014-09 2014-Q3 2014 1470000000 USD
12887 /organization/volvo-rents-construction Volvo Rents Construction NaN NaN CHE NaN CHE - Other Eison /organization/platinum-equity-llc Platinum Equity... USA CA Los Angeles Beverly Hills 2013-12-10 2013-12 2013-Q4 2013 1100000000 USD
12957 /organization/warner-chilcott Warner Chilcott |Pharmaceuticals|Biotechnology| Biotechnology USA NJ Newark Rockaway /organization/actavis Actavis... USA NJ Newark Parsippany 2013-05-20 2013-05 2013-Q2 2013 8500000000 USD
12981 /organization/waze Waze |Navigation|Transportation| Transportation USA CA SF Bay Area Palo Alto /organization/google Google... USA CA SF Bay Area Mountain View 2013-06-11 2013-06 2013-Q2 2013 996000000 USD
13011 /organization/webex WebEx |Curated Web| Curated Web USA CA SF Bay Area Santa Clara /organization/cisco Cisco... USA CA SF Bay Area San Jose 2007-03-15 2007-03 2007-Q1 2007 3200000000 USD
13030 /organization/webtrends Webtrends |SEO|Curated Web| Curated Web USA OR Portland, Oregon Portland /organization/netiq NetIQ... USA CA SF Bay Area San Jose 2001-01-17 2001-01 2001-Q1 2001 1000000000 USD
13061 /organization/western-gas-resources Western Gas Resources |Clean Energy|Oil| Clean Energy USA TX Seminole Seminole /organization/anadarko-petroleum-corporation Anadarko Petroleum Corporation... USA TX Seminole Seminole 2006-06-01 2006-06 2006-Q2 2006 4800000000 USD
13069 /organization/weyerhaeuser Weyerhaeuser |Manufacturing| Manufacturing USA WA Seattle Federal Way /organization/tri-pointe-homes Tri Pointe Homes... USA CA SF Bay Area San Ramon 2013-11-04 2013-11 2013-Q4 2013 2700000000 USD
13080 /organization/whatsapp WhatsApp |Messaging| Messaging USA CA SF Bay Area Santa Clara /organization/facebook Facebook... USA CA SF Bay Area Menlo Park 2014-02-19 2014-02 2014-Q1 2014 19000000000 USD
13081 /organization/wheelabrator-technologies Wheelabrator Technologies |Waste Management| Waste Management USA NH Manchester, New Hampshire Hampton /organization/energy-capital-partners Energy Capital Partners... USA NJ Newark Short Hills 2014-07-29 2014-07 2014-Q3 2014 1940000000 USD
13121 /organization/wild-flavors WILD Flavors |Food Processing| Food Processing DEU NaN Frankfurt Heidelberg /organization/archer-daniels-midland-company Archer Daniels Midland Company... USA IL Chicago Chicago 2014-07-07 2014-07 2014-Q3 2014 3000000000 USD
13143 /organization/wind-river Wind River |Software| Software USA CA SF Bay Area Alameda /organization/intel Intel... USA CA SF Bay Area Santa Clara 2009-07-17 2009-07 2009-Q3 2009 884000000 USD
13146 /organization/windsor-foods Windsor Foods |Consumer Goods|Food Processing| Food Processing USA TX Houston Houston /organization/ajinomoto Ajinomoto... JPN NaN Tokyo Tokyo 2014-09-10 2014-09 2014-Q3 2014 800000000 USD
13174 /organization/wittur Wittur |Manufacturing|Public Safety|Heavy Industry| Heavy Industry NaN NaN NaN NaN /organization/bain-capital-2 Bain Capital... USA MA Boston Boston 2014-12-23 2014-12 2014-Q4 2014 750000000 USD
13238 /organization/wyeth-pharmaceuticals Wyeth Pharmaceuticals |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology NaN NaN NaN NaN /organization/nestl Nestl\u0082... CHE NaN Vevey Vevey 2012-04-01 2012-04 2012-Q2 2012 11850000000 USD
13239 /organization/wyeth-pharmaceuticals Wyeth Pharmaceuticals |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology NaN NaN NaN NaN /organization/pfizer Pfizer... USA NY New York City New York 2009-01-23 2009-01 2009-Q1 2009 68000000000 USD
13277 /organization/xircom Xircom |Web Hosting| Web Hosting USA CA Los Angeles Thousand Oaks /organization/intel Intel... USA CA SF Bay Area Santa Clara 2001-01-15 2001-01 2001-Q1 2001 748000000 USD
13323 /organization/yammer Yammer |Twitter Applications|Networking|Social Media|... Enterprise Software USA CA SF Bay Area San Francisco /organization/microsoft Microsoft... USA WA Seattle Redmond 2012-06-25 2012-06 2012-Q2 2012 1200000000 USD
13324 /organization/yankee-candle-company Yankee Candle Company |Manufacturing| Manufacturing USA AR Fayetteville Rogers /organization/jarden Jarden... USA NY New York City Rye 2013-09-03 2013-09 2013-Q3 2013 1750000000 USD
13356 /organization/yoplait Yoplait |Food Processing| Food Processing USA MN Minneapolis Minneapolis /organization/generalmills General Mills... USA MN Minneapolis Minneapolis 2011-07-01 2011-07 2011-Q3 2011 1200000000 USD
13357 /organization/york-risk-services-group York Risk Services Group |Insurance| Insurance USA NJ Newark Parsippany /organization/onex Onex... CAN ON Toronto Toronto 2014-07-16 2014-07 2014-Q3 2014 1330000000 USD
13371 /organization/youtube YouTube |Video|Online Rental|Entertainment|Games| Games USA CA SF Bay Area San Bruno /organization/google Google... USA CA SF Bay Area Mountain View 2006-10-01 2006-10 2006-Q4 2006 1650000000 USD
13396 /organization/zappos Zappos |Curated Web| Curated Web USA NV Las Vegas Las Vegas /organization/amazon Amazon... USA WA Seattle Seattle 2009-07-22 2009-07 2009-Q3 2009 1200000000 USD
13437 /organization/ziggo Ziggo |Information Services|Entertainment|Investment... Investment Management NLD NaN Utrecht Utrecht /organization/liberty-global Liberty Global... USA CO Denver Englewood 2014-01-27 2014-01 2014-Q1 2014 13700000000 USD
\n", "

537 rows \u00d7 22 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 41, "text": [ " company_permalink \\\n", "31 /organization/21st-century-insurance \n", "56 /organization/3com \n", "69 /organization/3par \n", "108 /organization/91-wireless \n", "136 /organization/abc \n", "152 /organization/abovenet \n", "155 /organization/abraxis-bioscience \n", "173 /organization/accelrys \n", "187 /organization/access-midstream-partners-lp-inv... \n", "195 /organization/acclarent \n", "343 /organization/admob \n", "382 /organization/advanced-computer-software \n", "406 /organization/advanstar-communications \n", "420 /organization/advo \n", "430 /organization/aeluros \n", "443 /organization/affiliated-computer-services \n", "516 /organization/airwatch \n", "536 /organization/aktiv-kapital \n", "540 /organization/alabama-gas-corp \n", "576 /organization/algeta \n", "588 /organization/alios-biopharma \n", "606 /organization/allergan \n", "618 /organization/adl-media-inc \n", "629 /organization/alltel \n", "654 /organization/altalink \n", "691 /organization/amcol-international \n", "703 /organization/americanexpress \n", "716 /organization/american-petroleum-tankers \n", "719 /organization/american-realty-capital-healthca... \n", "723 /organization/ami-semiconductor \n", "... ... \n", "12702 /organization/viawest \n", "12704 /organization/viber-media \n", "12717 /organization/video-gaming-technologies \n", "12749 /organization/vion-food-group \n", "12764 /organization/viropharma \n", "12843 /organization/vizada \n", "12854 /organization/vk \n", "12887 /organization/volvo-rents-construction \n", "12957 /organization/warner-chilcott \n", "12981 /organization/waze \n", "13011 /organization/webex \n", "13030 /organization/webtrends \n", "13061 /organization/western-gas-resources \n", "13069 /organization/weyerhaeuser \n", "13080 /organization/whatsapp \n", "13081 /organization/wheelabrator-technologies \n", "13121 /organization/wild-flavors \n", "13143 /organization/wind-river \n", "13146 /organization/windsor-foods \n", "13174 /organization/wittur \n", "13238 /organization/wyeth-pharmaceuticals \n", "13239 /organization/wyeth-pharmaceuticals \n", "13277 /organization/xircom \n", "13323 /organization/yammer \n", "13324 /organization/yankee-candle-company \n", "13356 /organization/yoplait \n", "13357 /organization/york-risk-services-group \n", "13371 /organization/youtube \n", "13396 /organization/zappos \n", "13437 /organization/ziggo \n", "\n", " company_name \\\n", "31 21st Century Insurance \n", "56 3Com \n", "69 3PAR \n", "108 91 Wireless \n", "136 ABC \n", "152 AboveNet \n", "155 Abraxis BioScience \n", "173 Accelrys \n", "187 Access Midstream Partners LP Investments \n", "195 Acclarent \n", "343 AdMob \n", "382 Advanced Computer Software \n", "406 Advanstar Communications \n", "420 Advo \n", "430 Aeluros \n", "443 Affiliated Computer Services \n", "516 AirWatch \n", "536 Aktiv Kapital \n", "540 Alabama Gas Corp \n", "576 Algeta \n", "588 Alios BioPharma \n", "606 Allergan \n", "618 AllMedia Inc \n", "629 Alltel \n", "654 AltaLink \n", "691 AMCOL International \n", "703 American Express \n", "716 American Petroleum Tankers \n", "719 American Realty Capital Healthcare Trust \n", "723 AMI Semiconductor \n", "... ... \n", "12702 ViaWest \n", "12704 Viber Media \n", "12717 Video Gaming Technologies \n", "12749 Vion Food Group \n", "12764 ViroPharma \n", "12843 Vizada \n", "12854 Vkontake \n", "12887 Volvo Rents Construction \n", "12957 Warner Chilcott \n", "12981 Waze \n", "13011 WebEx \n", "13030 Webtrends \n", "13061 Western Gas Resources \n", "13069 Weyerhaeuser \n", "13080 WhatsApp \n", "13081 Wheelabrator Technologies \n", "13121 WILD Flavors \n", "13143 Wind River \n", "13146 Windsor Foods \n", "13174 Wittur \n", "13238 Wyeth Pharmaceuticals \n", "13239 Wyeth Pharmaceuticals \n", "13277 Xircom \n", "13323 Yammer \n", "13324 Yankee Candle Company \n", "13356 Yoplait \n", "13357 York Risk Services Group \n", "13371 YouTube \n", "13396 Zappos \n", "13437 Ziggo \n", "\n", " company_category_list \\\n", "31 |Finance|Business Services|Insurance| \n", "56 |Curated Web| \n", "69 |Services|Cloud Data Services|Software| \n", "108 |Mobile| \n", "136 |News| \n", "152 |Information Technology|Internet|Service Provi... \n", "155 |Pharmaceuticals|Biotechnology| \n", "173 |Software| \n", "187 |Natural Gas Uses| \n", "195 |Medical|Biotechnology| \n", "343 |Mobile|Advertising| \n", "382 |Health Care|Information Technology|Software| \n", "406 |Market Research|Events|Business Services| \n", "420 NaN \n", "430 |Semiconductors| \n", "443 |Consulting| \n", "516 |Mobile Devices|Mobile| \n", "536 |Finance| \n", "540 NaN \n", "576 |Biotechnology| \n", "588 |Biotechnology| \n", "606 |Medical|Pharmaceuticals|Biotechnology| \n", "618 |Advertising| \n", "629 |Public Relations| \n", "654 |Local Businesses|Energy|Utilities| \n", "691 |Manufacturing| \n", "703 |Finance| \n", "716 NaN \n", "719 NaN \n", "723 |Electronics|Manufacturing|Design| \n", "... ... \n", "12702 |Data Centers|Services|Web Hosting| \n", "12704 |Android|iPhone|VoIP|Messaging| \n", "12717 |Video Games| \n", "12749 |Hospitality| \n", "12764 |Pharmaceuticals|Biotechnology| \n", "12843 |DOD/Military|Mobile| \n", "12854 |Communities|Social Network Media|Social Media| \n", "12887 NaN \n", "12957 |Pharmaceuticals|Biotechnology| \n", "12981 |Navigation|Transportation| \n", "13011 |Curated Web| \n", "13030 |SEO|Curated Web| \n", "13061 |Clean Energy|Oil| \n", "13069 |Manufacturing| \n", "13080 |Messaging| \n", "13081 |Waste Management| \n", "13121 |Food Processing| \n", "13143 |Software| \n", "13146 |Consumer Goods|Food Processing| \n", "13174 |Manufacturing|Public Safety|Heavy Industry| \n", "13238 |Biotechnology|Pharmaceuticals|Health and Well... \n", "13239 |Biotechnology|Pharmaceuticals|Health and Well... \n", "13277 |Web Hosting| \n", "13323 |Twitter Applications|Networking|Social Media|... \n", "13324 |Manufacturing| \n", "13356 |Food Processing| \n", "13357 |Insurance| \n", "13371 |Video|Online Rental|Entertainment|Games| \n", "13396 |Curated Web| \n", "13437 |Information Services|Entertainment|Investment... \n", "\n", " company_market company_country_code company_state_code \\\n", "31 Business Services USA AL \n", "56 Curated Web USA MA \n", "69 Cloud Data Services USA CA \n", "108 Mobile CHN NaN \n", "136 News NaN NaN \n", "152 Internet USA NY \n", "155 Biotechnology USA CA \n", "173 Software USA CA \n", "187 Natural Gas Uses USA OK \n", "195 Biotechnology USA CA \n", "343 Advertising USA CA \n", "382 Software GBR NaN \n", "406 Events USA CA \n", "420 NaN NaN NaN \n", "430 Semiconductors USA CA \n", "443 Consulting USA TX \n", "516 Mobile USA GA \n", "536 Finance NaN NaN \n", "540 NaN USA AL \n", "576 Biotechnology NOR NaN \n", "588 Biotechnology USA CA \n", "606 Medical USA CA \n", "618 Advertising USA IA \n", "629 Public Relations USA AR \n", "654 Utilities CAN AB \n", "691 Manufacturing USA IL \n", "703 Finance USA NY \n", "716 NaN USA PA \n", "719 NaN USA MA \n", "723 Design USA ID \n", "... ... ... ... \n", "12702 Web Hosting USA CO \n", "12704 Android CYP NaN \n", "12717 Video Games USA TN \n", "12749 Hospitality NLD NaN \n", "12764 Biotechnology USA PA \n", "12843 Mobile USA MD \n", "12854 Communities RUS NaN \n", "12887 NaN CHE NaN \n", "12957 Biotechnology USA NJ \n", "12981 Transportation USA CA \n", "13011 Curated Web USA CA \n", "13030 Curated Web USA OR \n", "13061 Clean Energy USA TX \n", "13069 Manufacturing USA WA \n", "13080 Messaging USA CA \n", "13081 Waste Management USA NH \n", "13121 Food Processing DEU NaN \n", "13143 Software USA CA \n", "13146 Food Processing USA TX \n", "13174 Heavy Industry NaN NaN \n", "13238 Biotechnology NaN NaN \n", "13239 Biotechnology NaN NaN \n", "13277 Web Hosting USA CA \n", "13323 Enterprise Software USA CA \n", "13324 Manufacturing USA AR \n", "13356 Food Processing USA MN \n", "13357 Insurance USA NJ \n", "13371 Games USA CA \n", "13396 Curated Web USA NV \n", "13437 Investment Management NLD NaN \n", "\n", " company_region company_city \\\n", "31 AL - Other De Armanville \n", "56 Boston Marlborough \n", "69 SF Bay Area Fremont \n", "108 Fuzhou Shi Fuzhou Shi \n", "136 NaN NaN \n", "152 New York City White Plains \n", "155 Los Angeles Los Angeles \n", "173 San Diego San Diego \n", "187 Oklahoma City Oklahoma City \n", "195 SF Bay Area Menlo Park \n", "343 SF Bay Area Mountain View \n", "382 Cobham Cobham \n", "406 Los Angeles Santa Monica \n", "420 NaN NaN \n", "430 SF Bay Area Mountain View \n", "443 Dallas Dallas \n", "516 Atlanta Atlanta \n", "536 NaN NaN \n", "540 Birmingham Birmingham \n", "576 Oslo Oslo \n", "588 SF Bay Area South San Francisco \n", "606 Anaheim Irvine \n", "618 IA - Other Plano \n", "629 Little Rock Little Rock \n", "654 Calgary Calgary \n", "691 Chicago Hoffman Estates \n", "703 New York City New York \n", "716 Philadelphia Plymouth Meeting \n", "719 Boston Boston \n", "723 Idaho Falls Pocatello \n", "... ... ... \n", "12702 Denver Greenwood Village \n", "12704 Cyprus Limassol \n", "12717 Nashville Franklin \n", "12749 Eindhoven Eindhoven \n", "12764 Philadelphia Exton \n", "12843 Washington, D.C. Rockville \n", "12854 St. Petersburg Saint Petersburg \n", "12887 CHE - Other Eison \n", "12957 Newark Rockaway \n", "12981 SF Bay Area Palo Alto \n", "13011 SF Bay Area Santa Clara \n", "13030 Portland, Oregon Portland \n", "13061 Seminole Seminole \n", "13069 Seattle Federal Way \n", "13080 SF Bay Area Santa Clara \n", "13081 Manchester, New Hampshire Hampton \n", "13121 Frankfurt Heidelberg \n", "13143 SF Bay Area Alameda \n", "13146 Houston Houston \n", "13174 NaN NaN \n", "13238 NaN NaN \n", "13239 NaN NaN \n", "13277 Los Angeles Thousand Oaks \n", "13323 SF Bay Area San Francisco \n", "13324 Fayetteville Rogers \n", "13356 Minneapolis Minneapolis \n", "13357 Newark Parsippany \n", "13371 SF Bay Area San Bruno \n", "13396 Las Vegas Las Vegas \n", "13437 Utrecht Utrecht \n", "\n", " acquirer_permalink \\\n", "31 /organization/american-international-group \n", "56 /organization/hewlett-packard \n", "69 /organization/hewlett-packard \n", "108 /organization/baidu \n", "136 /organization/the-walt-disney-company \n", "152 /organization/zayo-group \n", "155 /organization/celgene \n", "173 /organization/dassault \n", "187 /organization/the-williams-companies \n", "195 /organization/johnson-johnson \n", "343 /organization/google \n", "382 /organization/vista-equity-partners \n", "406 /organization/ubm-electronics \n", "420 /organization/valassis \n", "430 /organization/broadcom \n", "443 /organization/xerox \n", "516 /organization/vmware \n", "536 /organization/portfolio-recovery-associates \n", "540 /organization/laclede-group \n", "576 /organization/bayer-ag-germany \n", "588 /organization/johnson-johnson \n", "606 /organization/actavis \n", "618 /organization/liberty-global \n", "629 /organization/verizon \n", "654 /organization/midamerican-energy \n", "691 /organization/minerals-technologies \n", "703 /organization/standard-chartered-bank \n", "716 /organization/kinder-morgan \n", "719 /organization/ventas \n", "723 /organization/on-semiconductor \n", "... ... \n", "12702 /organization/shaw-communications \n", "12704 /organization/rakuten \n", "12717 /organization/aristocrat-technologies-inc \n", "12749 /organization/darling-international \n", "12764 /organization/shire \n", "12843 /organization/eads-astrium \n", "12854 /organization/mail-ru \n", "12887 /organization/platinum-equity-llc \n", "12957 /organization/actavis \n", "12981 /organization/google \n", "13011 /organization/cisco \n", "13030 /organization/netiq \n", "13061 /organization/anadarko-petroleum-corporation \n", "13069 /organization/tri-pointe-homes \n", "13080 /organization/facebook \n", "13081 /organization/energy-capital-partners \n", "13121 /organization/archer-daniels-midland-company \n", "13143 /organization/intel \n", "13146 /organization/ajinomoto \n", "13174 /organization/bain-capital-2 \n", "13238 /organization/nestl \n", "13239 /organization/pfizer \n", "13277 /organization/intel \n", "13323 /organization/microsoft \n", "13324 /organization/jarden \n", "13356 /organization/generalmills \n", "13357 /organization/onex \n", "13371 /organization/google \n", "13396 /organization/amazon \n", "13437 /organization/liberty-global \n", "\n", " acquirer_name ... \\\n", "31 American International Group ... \n", "56 Hewlett-Packard ... \n", "69 Hewlett-Packard ... \n", "108 Baidu ... \n", "136 The Walt Disney Company ... \n", "152 Zayo ... \n", "155 Celgene ... \n", "173 Dassault Systemes ... \n", "187 The Williams Companies ... \n", "195 Johnson & Johnson ... \n", "343 Google ... \n", "382 Vista Equity Partners ... \n", "406 UBM Electronics ... \n", "420 Valassis ... \n", "430 Broadcom ... \n", "443 Xerox ... \n", "516 VMware ... \n", "536 Portfolio Recovery Associates ... \n", "540 Laclede Group ... \n", "576 Bayer AG ... \n", "588 Johnson & Johnson ... \n", "606 Actavis ... \n", "618 Liberty Global ... \n", "629 Verizon Communications ... \n", "654 Berkshire Hathaway Energy ... \n", "691 Minerals Technologies ... \n", "703 Standard Chartered Bank ... \n", "716 Kinder Morgan Energy ... \n", "719 Ventas ... \n", "723 ON Semiconductor ... \n", "... ... ... \n", "12702 Shaw Communications ... \n", "12704 Rakuten ... \n", "12717 Aristocrat Technologies, Inc ... \n", "12749 Darling Ingredients ... \n", "12764 Shire ... \n", "12843 EADS Astrium ... \n", "12854 Mail.Ru Group ... \n", "12887 Platinum Equity ... \n", "12957 Actavis ... \n", "12981 Google ... \n", "13011 Cisco ... \n", "13030 NetIQ ... \n", "13061 Anadarko Petroleum Corporation ... \n", "13069 Tri Pointe Homes ... \n", "13080 Facebook ... \n", "13081 Energy Capital Partners ... \n", "13121 Archer Daniels Midland Company ... \n", "13143 Intel ... \n", "13146 Ajinomoto ... \n", "13174 Bain Capital ... \n", "13238 Nestl\u0082 ... \n", "13239 Pfizer ... \n", "13277 Intel ... \n", "13323 Microsoft ... \n", "13324 Jarden ... \n", "13356 General Mills ... \n", "13357 Onex ... \n", "13371 Google ... \n", "13396 Amazon ... \n", "13437 Liberty Global ... \n", "\n", " acquirer_country_code acquirer_state_code acquirer_region \\\n", "31 USA NY New York City \n", "56 USA CA SF Bay Area \n", "69 USA CA SF Bay Area \n", "108 CHN NaN Beijing \n", "136 USA CA Los Angeles \n", "152 USA CO Denver \n", "155 USA NJ Newark \n", "173 FRA NaN NaN \n", "187 USA OK OK - Other \n", "195 USA NJ Newark \n", "343 USA CA SF Bay Area \n", "382 USA TX Austin \n", "406 USA CA SF Bay Area \n", "420 USA MI Detroit \n", "430 USA CA Anaheim \n", "443 USA AL AL - Other \n", "516 USA CA SF Bay Area \n", "536 USA VA Norfolk - Virginia Beach \n", "540 USA MO St. Louis \n", "576 DEU NaN Leverkusen \n", "588 USA NJ Newark \n", "606 USA NJ Newark \n", "618 USA CO Denver \n", "629 USA NY New York City \n", "654 USA IA Des Moines \n", "691 USA NY New York City \n", "703 GBR NaN London \n", "716 USA TX Houston \n", "719 USA IL Chicago \n", "723 USA AZ Phoenix \n", "... ... ... ... \n", "12702 CAN AB Calgary \n", "12704 JPN NaN Tokyo \n", "12717 AUS NaN AUS - Other \n", "12749 USA TX Dallas \n", "12764 IRL NaN Dublin \n", "12843 USA TX Houston \n", "12854 RUS NaN Moscow \n", "12887 USA CA Los Angeles \n", "12957 USA NJ Newark \n", "12981 USA CA SF Bay Area \n", "13011 USA CA SF Bay Area \n", "13030 USA CA SF Bay Area \n", "13061 USA TX Seminole \n", "13069 USA CA SF Bay Area \n", "13080 USA CA SF Bay Area \n", "13081 USA NJ Newark \n", "13121 USA IL Chicago \n", "13143 USA CA SF Bay Area \n", "13146 JPN NaN Tokyo \n", "13174 USA MA Boston \n", "13238 CHE NaN Vevey \n", "13239 USA NY New York City \n", "13277 USA CA SF Bay Area \n", "13323 USA WA Seattle \n", "13324 USA NY New York City \n", "13356 USA MN Minneapolis \n", "13357 CAN ON Toronto \n", "13371 USA CA SF Bay Area \n", "13396 USA WA Seattle \n", "13437 USA CO Denver \n", "\n", " acquirer_city acquired_at acquired_month acquired_quarter \\\n", "31 New York 2007-09-01 2007-09 2007-Q3 \n", "56 Palo Alto 2009-11-11 2009-11 2009-Q4 \n", "69 Palo Alto 2010-09-02 2010-09 2010-Q3 \n", "108 Beijing 2013-07-15 2013-07 2013-Q3 \n", "136 Burbank 1995-08-05 1995-08 1995-Q3 \n", "152 Boulder 2012-03-19 2012-03 2012-Q1 \n", "155 Summit 2010-06-30 2010-06 2010-Q2 \n", "173 NaN 2014-01-30 2014-01 2014-Q1 \n", "187 Snyder 2012-12-21 2012-12 2012-Q4 \n", "195 New Brunswick 2009-12-17 2009-12 2009-Q4 \n", "343 Mountain View 2009-11-09 2009-11 2009-Q4 \n", "382 Austin 2014-11-25 2014-11 2014-Q4 \n", "406 San Francisco 2014-10-01 2014-10 2014-Q4 \n", "420 Livonia 2006-07-06 2006-07 2006-Q3 \n", "430 Irvine 2012-02-17 2012-02 2012-Q1 \n", "443 Normal 2009-09-28 2009-09 2009-Q3 \n", "516 Palo Alto 2014-01-22 2014-01 2014-Q1 \n", "536 Norfolk 2014-02-20 2014-02 2014-Q1 \n", "540 St Louis 2014-04-07 2014-04 2014-Q2 \n", "576 Leverkusen 2013-12-19 2013-12 2013-Q4 \n", "588 New Brunswick 2014-09-30 2014-09 2014-Q3 \n", "606 Parsippany 2014-11-16 2014-11 2014-Q4 \n", "618 Englewood 2014-05-08 2014-05 2014-Q2 \n", "629 New York 2008-06-05 2008-06 2008-Q2 \n", "654 Des Moines 2014-05-02 2014-05 2014-Q2 \n", "691 New York 2014-03-10 2014-03 2014-Q1 \n", "703 London 2008-02-28 2008-02 2008-Q1 \n", "716 Houston 2013-12-23 2013-12 2013-Q4 \n", "719 Chicago 2014-06-02 2014-06 2014-Q2 \n", "723 Phoenix 2007-12-14 2007-12 2007-Q4 \n", "... ... ... ... ... \n", "12702 Calgary 2014-08-04 2014-08 2014-Q3 \n", "12704 Tokyo 2014-02-14 2014-02 2014-Q1 \n", "12717 NaN 2014-07-07 2014-07 2014-Q3 \n", "12749 Irving 2013-10-07 2013-10 2013-Q4 \n", "12764 Dublin 2013-11-10 2013-11 2013-Q4 \n", "12843 Houston 2011-08-03 2011-08 2011-Q3 \n", "12854 Moscow 2014-09-16 2014-09 2014-Q3 \n", "12887 Beverly Hills 2013-12-10 2013-12 2013-Q4 \n", "12957 Parsippany 2013-05-20 2013-05 2013-Q2 \n", "12981 Mountain View 2013-06-11 2013-06 2013-Q2 \n", "13011 San Jose 2007-03-15 2007-03 2007-Q1 \n", "13030 San Jose 2001-01-17 2001-01 2001-Q1 \n", "13061 Seminole 2006-06-01 2006-06 2006-Q2 \n", "13069 San Ramon 2013-11-04 2013-11 2013-Q4 \n", "13080 Menlo Park 2014-02-19 2014-02 2014-Q1 \n", "13081 Short Hills 2014-07-29 2014-07 2014-Q3 \n", "13121 Chicago 2014-07-07 2014-07 2014-Q3 \n", "13143 Santa Clara 2009-07-17 2009-07 2009-Q3 \n", "13146 Tokyo 2014-09-10 2014-09 2014-Q3 \n", "13174 Boston 2014-12-23 2014-12 2014-Q4 \n", "13238 Vevey 2012-04-01 2012-04 2012-Q2 \n", "13239 New York 2009-01-23 2009-01 2009-Q1 \n", "13277 Santa Clara 2001-01-15 2001-01 2001-Q1 \n", "13323 Redmond 2012-06-25 2012-06 2012-Q2 \n", "13324 Rye 2013-09-03 2013-09 2013-Q3 \n", "13356 Minneapolis 2011-07-01 2011-07 2011-Q3 \n", "13357 Toronto 2014-07-16 2014-07 2014-Q3 \n", "13371 Mountain View 2006-10-01 2006-10 2006-Q4 \n", "13396 Seattle 2009-07-22 2009-07 2009-Q3 \n", "13437 Englewood 2014-01-27 2014-01 2014-Q1 \n", "\n", " acquired_year price_amount price_currency_code \n", "31 2007 749000000 USD \n", "56 2009 2700000000 USD \n", "69 2010 2350000000 USD \n", "108 2013 1900000000 USD \n", "136 1995 19000000000 USD \n", "152 2012 2300000000 USD \n", "155 2010 2900000000 USD \n", "173 2014 750000000 USD \n", "187 2012 2250000000 USD \n", "195 2009 785000000 USD \n", "343 2009 750000000 USD \n", "382 2014 725000000 GBP \n", "406 2014 972000000 USD \n", "420 2006 1300000000 USD \n", "430 2012 3700000000 USD \n", "443 2009 5750000000 USD \n", "516 2014 1540000000 USD \n", "536 2014 1300000000 USD \n", "540 2014 1340000000 USD \n", "576 2013 2900000000 USD \n", "588 2014 1750000000 USD \n", "606 2014 65500000000 USD \n", "618 2014 930000000 USD \n", "629 2008 28100000000 USD \n", "654 2014 3520000000 USD \n", "691 2014 1700000000 USD \n", "703 2008 823000000 USD \n", "716 2013 962000000 USD \n", "719 2014 2600000000 USD \n", "723 2007 915000000 USD \n", "... ... ... ... \n", "12702 2014 1200000000 USD \n", "12704 2014 900000000 USD \n", "12717 2014 1300000000 USD \n", "12749 2013 1600000000 EUR \n", "12764 2013 4200000000 USD \n", "12843 2011 960000000 USD \n", "12854 2014 1470000000 USD \n", "12887 2013 1100000000 USD \n", "12957 2013 8500000000 USD \n", "12981 2013 996000000 USD \n", "13011 2007 3200000000 USD \n", "13030 2001 1000000000 USD \n", "13061 2006 4800000000 USD \n", "13069 2013 2700000000 USD \n", "13080 2014 19000000000 USD \n", "13081 2014 1940000000 USD \n", "13121 2014 3000000000 USD \n", "13143 2009 884000000 USD \n", "13146 2014 800000000 USD \n", "13174 2014 750000000 USD \n", "13238 2012 11850000000 USD \n", "13239 2009 68000000000 USD \n", "13277 2001 748000000 USD \n", "13323 2012 1200000000 USD \n", "13324 2013 1750000000 USD \n", "13356 2011 1200000000 USD \n", "13357 2014 1330000000 USD \n", "13371 2006 1650000000 USD \n", "13396 2009 1200000000 USD \n", "13437 2014 13700000000 USD \n", "\n", "[537 rows x 22 columns]" ] } ], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select *\n", "from acquisitions a\n", "where price_amount > (\n", " select avg(price_amount)\n", " from acquisitions\n", ");\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_permalinkcompany_namecompany_category_listcompany_marketcompany_country_codecompany_state_codecompany_regioncompany_cityacquirer_permalinkacquirer_name...acquirer_country_codeacquirer_state_codeacquirer_regionacquirer_cityacquired_atacquired_monthacquired_quarteracquired_yearprice_amountprice_currency_code
0 /organization/21st-century-insurance 21st Century Insurance |Finance|Business Services|Insurance| Business Services USA AL AL - Other De Armanville /organization/american-international-group American International Group... USA NY New York City New York 2007-09-01 2007-09 2007-Q3 2007 749000000 USD
1 /organization/3com 3Com |Curated Web| Curated Web USA MA Boston Marlborough /organization/hewlett-packard Hewlett-Packard... USA CA SF Bay Area Palo Alto 2009-11-11 2009-11 2009-Q4 2009 2700000000 USD
2 /organization/3par 3PAR |Services|Cloud Data Services|Software| Cloud Data Services USA CA SF Bay Area Fremont /organization/hewlett-packard Hewlett-Packard... USA CA SF Bay Area Palo Alto 2010-09-02 2010-09 2010-Q3 2010 2350000000 USD
3 /organization/91-wireless 91 Wireless |Mobile| Mobile CHN None Fuzhou Shi Fuzhou Shi /organization/baidu Baidu... CHN None Beijing Beijing 2013-07-15 2013-07 2013-Q3 2013 1900000000 USD
4 /organization/abc ABC |News| News None None None None /organization/the-walt-disney-company The Walt Disney Company... USA CA Los Angeles Burbank 1995-08-05 1995-08 1995-Q3 1995 19000000000 USD
5 /organization/abovenet AboveNet |Information Technology|Internet|Service Provi... Internet USA NY New York City White Plains /organization/zayo-group Zayo... USA CO Denver Boulder 2012-03-19 2012-03 2012-Q1 2012 2300000000 USD
6 /organization/abraxis-bioscience Abraxis BioScience |Pharmaceuticals|Biotechnology| Biotechnology USA CA Los Angeles Los Angeles /organization/celgene Celgene... USA NJ Newark Summit 2010-06-30 2010-06 2010-Q2 2010 2900000000 USD
7 /organization/accelrys Accelrys |Software| Software USA CA San Diego San Diego /organization/dassault Dassault Systemes... FRA None None None 2014-01-30 2014-01 2014-Q1 2014 750000000 USD
8 /organization/access-midstream-partners-lp-inv... Access Midstream Partners LP Investments |Natural Gas Uses| Natural Gas Uses USA OK Oklahoma City Oklahoma City /organization/the-williams-companies The Williams Companies... USA OK OK - Other Snyder 2012-12-21 2012-12 2012-Q4 2012 2250000000 USD
9 /organization/acclarent Acclarent |Medical|Biotechnology| Biotechnology USA CA SF Bay Area Menlo Park /organization/johnson-johnson Johnson & Johnson... USA NJ Newark New Brunswick 2009-12-17 2009-12 2009-Q4 2009 785000000 USD
10 /organization/admob AdMob |Mobile|Advertising| Advertising USA CA SF Bay Area Mountain View /organization/google Google... USA CA SF Bay Area Mountain View 2009-11-09 2009-11 2009-Q4 2009 750000000 USD
11 /organization/advanced-computer-software Advanced Computer Software |Health Care|Information Technology|Software| Software GBR None Cobham Cobham /organization/vista-equity-partners Vista Equity Partners... USA TX Austin Austin 2014-11-25 2014-11 2014-Q4 2014 725000000 GBP
12 /organization/advanstar-communications Advanstar Communications |Market Research|Events|Business Services| Events USA CA Los Angeles Santa Monica /organization/ubm-electronics UBM Electronics... USA CA SF Bay Area San Francisco 2014-10-01 2014-10 2014-Q4 2014 972000000 USD
13 /organization/advo Advo None None None None None None /organization/valassis Valassis... USA MI Detroit Livonia 2006-07-06 2006-07 2006-Q3 2006 1300000000 USD
14 /organization/aeluros Aeluros |Semiconductors| Semiconductors USA CA SF Bay Area Mountain View /organization/broadcom Broadcom... USA CA Anaheim Irvine 2012-02-17 2012-02 2012-Q1 2012 3700000000 USD
15 /organization/affiliated-computer-services Affiliated Computer Services |Consulting| Consulting USA TX Dallas Dallas /organization/xerox Xerox... USA AL AL - Other Normal 2009-09-28 2009-09 2009-Q3 2009 5750000000 USD
16 /organization/airwatch AirWatch |Mobile Devices|Mobile| Mobile USA GA Atlanta Atlanta /organization/vmware VMware... USA CA SF Bay Area Palo Alto 2014-01-22 2014-01 2014-Q1 2014 1540000000 USD
17 /organization/aktiv-kapital Aktiv Kapital |Finance| Finance None None None None /organization/portfolio-recovery-associates Portfolio Recovery Associates... USA VA Norfolk - Virginia Beach Norfolk 2014-02-20 2014-02 2014-Q1 2014 1300000000 USD
18 /organization/alabama-gas-corp Alabama Gas Corp None None USA AL Birmingham Birmingham /organization/laclede-group Laclede Group... USA MO St. Louis St Louis 2014-04-07 2014-04 2014-Q2 2014 1340000000 USD
19 /organization/algeta Algeta |Biotechnology| Biotechnology NOR None Oslo Oslo /organization/bayer-ag-germany Bayer AG... DEU None Leverkusen Leverkusen 2013-12-19 2013-12 2013-Q4 2013 2900000000 USD
20 /organization/alios-biopharma Alios BioPharma |Biotechnology| Biotechnology USA CA SF Bay Area South San Francisco /organization/johnson-johnson Johnson & Johnson... USA NJ Newark New Brunswick 2014-09-30 2014-09 2014-Q3 2014 1750000000 USD
21 /organization/allergan Allergan |Medical|Pharmaceuticals|Biotechnology| Medical USA CA Anaheim Irvine /organization/actavis Actavis... USA NJ Newark Parsippany 2014-11-16 2014-11 2014-Q4 2014 65500000000 USD
22 /organization/adl-media-inc AllMedia Inc |Advertising| Advertising USA IA IA - Other Plano /organization/liberty-global Liberty Global... USA CO Denver Englewood 2014-05-08 2014-05 2014-Q2 2014 930000000 USD
23 /organization/alltel Alltel |Public Relations| Public Relations USA AR Little Rock Little Rock /organization/verizon Verizon Communications... USA NY New York City New York 2008-06-05 2008-06 2008-Q2 2008 28100000000 USD
24 /organization/altalink AltaLink |Local Businesses|Energy|Utilities| Utilities CAN AB Calgary Calgary /organization/midamerican-energy Berkshire Hathaway Energy... USA IA Des Moines Des Moines 2014-05-02 2014-05 2014-Q2 2014 3520000000 USD
25 /organization/amcol-international AMCOL International |Manufacturing| Manufacturing USA IL Chicago Hoffman Estates /organization/minerals-technologies Minerals Technologies... USA NY New York City New York 2014-03-10 2014-03 2014-Q1 2014 1700000000 USD
26 /organization/americanexpress American Express |Finance| Finance USA NY New York City New York /organization/standard-chartered-bank Standard Chartered Bank... GBR None London London 2008-02-28 2008-02 2008-Q1 2008 823000000 USD
27 /organization/american-petroleum-tankers American Petroleum Tankers None None USA PA Philadelphia Plymouth Meeting /organization/kinder-morgan Kinder Morgan Energy... USA TX Houston Houston 2013-12-23 2013-12 2013-Q4 2013 962000000 USD
28 /organization/american-realty-capital-healthca... American Realty Capital Healthcare Trust None None USA MA Boston Boston /organization/ventas Ventas... USA IL Chicago Chicago 2014-06-02 2014-06 2014-Q2 2014 2600000000 USD
29 /organization/ami-semiconductor AMI Semiconductor |Electronics|Manufacturing|Design| Design USA ID Idaho Falls Pocatello /organization/on-semiconductor ON Semiconductor... USA AZ Phoenix Phoenix 2007-12-14 2007-12 2007-Q4 2007 915000000 USD
..................................................................
507 /organization/viawest ViaWest |Data Centers|Services|Web Hosting| Web Hosting USA CO Denver Greenwood Village /organization/shaw-communications Shaw Communications... CAN AB Calgary Calgary 2014-08-04 2014-08 2014-Q3 2014 1200000000 USD
508 /organization/viber-media Viber Media |Android|iPhone|VoIP|Messaging| Android CYP None Cyprus Limassol /organization/rakuten Rakuten... JPN None Tokyo Tokyo 2014-02-14 2014-02 2014-Q1 2014 900000000 USD
509 /organization/video-gaming-technologies Video Gaming Technologies |Video Games| Video Games USA TN Nashville Franklin /organization/aristocrat-technologies-inc Aristocrat Technologies, Inc... AUS None AUS - Other None 2014-07-07 2014-07 2014-Q3 2014 1300000000 USD
510 /organization/vion-food-group Vion Food Group |Hospitality| Hospitality NLD None Eindhoven Eindhoven /organization/darling-international Darling Ingredients... USA TX Dallas Irving 2013-10-07 2013-10 2013-Q4 2013 1600000000 EUR
511 /organization/viropharma ViroPharma |Pharmaceuticals|Biotechnology| Biotechnology USA PA Philadelphia Exton /organization/shire Shire... IRL None Dublin Dublin 2013-11-10 2013-11 2013-Q4 2013 4200000000 USD
512 /organization/vizada Vizada |DOD/Military|Mobile| Mobile USA MD Washington, D.C. Rockville /organization/eads-astrium EADS Astrium... USA TX Houston Houston 2011-08-03 2011-08 2011-Q3 2011 960000000 USD
513 /organization/vk Vkontake |Communities|Social Network Media|Social Media| Communities RUS None St. Petersburg Saint Petersburg /organization/mail-ru Mail.Ru Group... RUS None Moscow Moscow 2014-09-16 2014-09 2014-Q3 2014 1470000000 USD
514 /organization/volvo-rents-construction Volvo Rents Construction None None CHE None CHE - Other Eison /organization/platinum-equity-llc Platinum Equity... USA CA Los Angeles Beverly Hills 2013-12-10 2013-12 2013-Q4 2013 1100000000 USD
515 /organization/warner-chilcott Warner Chilcott |Pharmaceuticals|Biotechnology| Biotechnology USA NJ Newark Rockaway /organization/actavis Actavis... USA NJ Newark Parsippany 2013-05-20 2013-05 2013-Q2 2013 8500000000 USD
516 /organization/waze Waze |Navigation|Transportation| Transportation USA CA SF Bay Area Palo Alto /organization/google Google... USA CA SF Bay Area Mountain View 2013-06-11 2013-06 2013-Q2 2013 996000000 USD
517 /organization/webex WebEx |Curated Web| Curated Web USA CA SF Bay Area Santa Clara /organization/cisco Cisco... USA CA SF Bay Area San Jose 2007-03-15 2007-03 2007-Q1 2007 3200000000 USD
518 /organization/webtrends Webtrends |SEO|Curated Web| Curated Web USA OR Portland, Oregon Portland /organization/netiq NetIQ... USA CA SF Bay Area San Jose 2001-01-17 2001-01 2001-Q1 2001 1000000000 USD
519 /organization/western-gas-resources Western Gas Resources |Clean Energy|Oil| Clean Energy USA TX Seminole Seminole /organization/anadarko-petroleum-corporation Anadarko Petroleum Corporation... USA TX Seminole Seminole 2006-06-01 2006-06 2006-Q2 2006 4800000000 USD
520 /organization/weyerhaeuser Weyerhaeuser |Manufacturing| Manufacturing USA WA Seattle Federal Way /organization/tri-pointe-homes Tri Pointe Homes... USA CA SF Bay Area San Ramon 2013-11-04 2013-11 2013-Q4 2013 2700000000 USD
521 /organization/whatsapp WhatsApp |Messaging| Messaging USA CA SF Bay Area Santa Clara /organization/facebook Facebook... USA CA SF Bay Area Menlo Park 2014-02-19 2014-02 2014-Q1 2014 19000000000 USD
522 /organization/wheelabrator-technologies Wheelabrator Technologies |Waste Management| Waste Management USA NH Manchester, New Hampshire Hampton /organization/energy-capital-partners Energy Capital Partners... USA NJ Newark Short Hills 2014-07-29 2014-07 2014-Q3 2014 1940000000 USD
523 /organization/wild-flavors WILD Flavors |Food Processing| Food Processing DEU None Frankfurt Heidelberg /organization/archer-daniels-midland-company Archer Daniels Midland Company... USA IL Chicago Chicago 2014-07-07 2014-07 2014-Q3 2014 3000000000 USD
524 /organization/wind-river Wind River |Software| Software USA CA SF Bay Area Alameda /organization/intel Intel... USA CA SF Bay Area Santa Clara 2009-07-17 2009-07 2009-Q3 2009 884000000 USD
525 /organization/windsor-foods Windsor Foods |Consumer Goods|Food Processing| Food Processing USA TX Houston Houston /organization/ajinomoto Ajinomoto... JPN None Tokyo Tokyo 2014-09-10 2014-09 2014-Q3 2014 800000000 USD
526 /organization/wittur Wittur |Manufacturing|Public Safety|Heavy Industry| Heavy Industry None None None None /organization/bain-capital-2 Bain Capital... USA MA Boston Boston 2014-12-23 2014-12 2014-Q4 2014 750000000 USD
527 /organization/wyeth-pharmaceuticals Wyeth Pharmaceuticals |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology None None None None /organization/nestl Nestl\u0082... CHE None Vevey Vevey 2012-04-01 2012-04 2012-Q2 2012 11850000000 USD
528 /organization/wyeth-pharmaceuticals Wyeth Pharmaceuticals |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology None None None None /organization/pfizer Pfizer... USA NY New York City New York 2009-01-23 2009-01 2009-Q1 2009 68000000000 USD
529 /organization/xircom Xircom |Web Hosting| Web Hosting USA CA Los Angeles Thousand Oaks /organization/intel Intel... USA CA SF Bay Area Santa Clara 2001-01-15 2001-01 2001-Q1 2001 748000000 USD
530 /organization/yammer Yammer |Twitter Applications|Networking|Social Media|... Enterprise Software USA CA SF Bay Area San Francisco /organization/microsoft Microsoft... USA WA Seattle Redmond 2012-06-25 2012-06 2012-Q2 2012 1200000000 USD
531 /organization/yankee-candle-company Yankee Candle Company |Manufacturing| Manufacturing USA AR Fayetteville Rogers /organization/jarden Jarden... USA NY New York City Rye 2013-09-03 2013-09 2013-Q3 2013 1750000000 USD
532 /organization/yoplait Yoplait |Food Processing| Food Processing USA MN Minneapolis Minneapolis /organization/generalmills General Mills... USA MN Minneapolis Minneapolis 2011-07-01 2011-07 2011-Q3 2011 1200000000 USD
533 /organization/york-risk-services-group York Risk Services Group |Insurance| Insurance USA NJ Newark Parsippany /organization/onex Onex... CAN ON Toronto Toronto 2014-07-16 2014-07 2014-Q3 2014 1330000000 USD
534 /organization/youtube YouTube |Video|Online Rental|Entertainment|Games| Games USA CA SF Bay Area San Bruno /organization/google Google... USA CA SF Bay Area Mountain View 2006-10-01 2006-10 2006-Q4 2006 1650000000 USD
535 /organization/zappos Zappos |Curated Web| Curated Web USA NV Las Vegas Las Vegas /organization/amazon Amazon... USA WA Seattle Seattle 2009-07-22 2009-07 2009-Q3 2009 1200000000 USD
536 /organization/ziggo Ziggo |Information Services|Entertainment|Investment... Investment Management NLD None Utrecht Utrecht /organization/liberty-global Liberty Global... USA CO Denver Englewood 2014-01-27 2014-01 2014-Q1 2014 13700000000 USD
\n", "

537 rows \u00d7 22 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 42, "text": [ " company_permalink \\\n", "0 /organization/21st-century-insurance \n", "1 /organization/3com \n", "2 /organization/3par \n", "3 /organization/91-wireless \n", "4 /organization/abc \n", "5 /organization/abovenet \n", "6 /organization/abraxis-bioscience \n", "7 /organization/accelrys \n", "8 /organization/access-midstream-partners-lp-inv... \n", "9 /organization/acclarent \n", "10 /organization/admob \n", "11 /organization/advanced-computer-software \n", "12 /organization/advanstar-communications \n", "13 /organization/advo \n", "14 /organization/aeluros \n", "15 /organization/affiliated-computer-services \n", "16 /organization/airwatch \n", "17 /organization/aktiv-kapital \n", "18 /organization/alabama-gas-corp \n", "19 /organization/algeta \n", "20 /organization/alios-biopharma \n", "21 /organization/allergan \n", "22 /organization/adl-media-inc \n", "23 /organization/alltel \n", "24 /organization/altalink \n", "25 /organization/amcol-international \n", "26 /organization/americanexpress \n", "27 /organization/american-petroleum-tankers \n", "28 /organization/american-realty-capital-healthca... \n", "29 /organization/ami-semiconductor \n", ".. ... \n", "507 /organization/viawest \n", "508 /organization/viber-media \n", "509 /organization/video-gaming-technologies \n", "510 /organization/vion-food-group \n", "511 /organization/viropharma \n", "512 /organization/vizada \n", "513 /organization/vk \n", "514 /organization/volvo-rents-construction \n", "515 /organization/warner-chilcott \n", "516 /organization/waze \n", "517 /organization/webex \n", "518 /organization/webtrends \n", "519 /organization/western-gas-resources \n", "520 /organization/weyerhaeuser \n", "521 /organization/whatsapp \n", "522 /organization/wheelabrator-technologies \n", "523 /organization/wild-flavors \n", "524 /organization/wind-river \n", "525 /organization/windsor-foods \n", "526 /organization/wittur \n", "527 /organization/wyeth-pharmaceuticals \n", "528 /organization/wyeth-pharmaceuticals \n", "529 /organization/xircom \n", "530 /organization/yammer \n", "531 /organization/yankee-candle-company \n", "532 /organization/yoplait \n", "533 /organization/york-risk-services-group \n", "534 /organization/youtube \n", "535 /organization/zappos \n", "536 /organization/ziggo \n", "\n", " company_name \\\n", "0 21st Century Insurance \n", "1 3Com \n", "2 3PAR \n", "3 91 Wireless \n", "4 ABC \n", "5 AboveNet \n", "6 Abraxis BioScience \n", "7 Accelrys \n", "8 Access Midstream Partners LP Investments \n", "9 Acclarent \n", "10 AdMob \n", "11 Advanced Computer Software \n", "12 Advanstar Communications \n", "13 Advo \n", "14 Aeluros \n", "15 Affiliated Computer Services \n", "16 AirWatch \n", "17 Aktiv Kapital \n", "18 Alabama Gas Corp \n", "19 Algeta \n", "20 Alios BioPharma \n", "21 Allergan \n", "22 AllMedia Inc \n", "23 Alltel \n", "24 AltaLink \n", "25 AMCOL International \n", "26 American Express \n", "27 American Petroleum Tankers \n", "28 American Realty Capital Healthcare Trust \n", "29 AMI Semiconductor \n", ".. ... \n", "507 ViaWest \n", "508 Viber Media \n", "509 Video Gaming Technologies \n", "510 Vion Food Group \n", "511 ViroPharma \n", "512 Vizada \n", "513 Vkontake \n", "514 Volvo Rents Construction \n", "515 Warner Chilcott \n", "516 Waze \n", "517 WebEx \n", "518 Webtrends \n", "519 Western Gas Resources \n", "520 Weyerhaeuser \n", "521 WhatsApp \n", "522 Wheelabrator Technologies \n", "523 WILD Flavors \n", "524 Wind River \n", "525 Windsor Foods \n", "526 Wittur \n", "527 Wyeth Pharmaceuticals \n", "528 Wyeth Pharmaceuticals \n", "529 Xircom \n", "530 Yammer \n", "531 Yankee Candle Company \n", "532 Yoplait \n", "533 York Risk Services Group \n", "534 YouTube \n", "535 Zappos \n", "536 Ziggo \n", "\n", " company_category_list company_market \\\n", "0 |Finance|Business Services|Insurance| Business Services \n", "1 |Curated Web| Curated Web \n", "2 |Services|Cloud Data Services|Software| Cloud Data Services \n", "3 |Mobile| Mobile \n", "4 |News| News \n", "5 |Information Technology|Internet|Service Provi... Internet \n", "6 |Pharmaceuticals|Biotechnology| Biotechnology \n", "7 |Software| Software \n", "8 |Natural Gas Uses| Natural Gas Uses \n", "9 |Medical|Biotechnology| Biotechnology \n", "10 |Mobile|Advertising| Advertising \n", "11 |Health Care|Information Technology|Software| Software \n", "12 |Market Research|Events|Business Services| Events \n", "13 None None \n", "14 |Semiconductors| Semiconductors \n", "15 |Consulting| Consulting \n", "16 |Mobile Devices|Mobile| Mobile \n", "17 |Finance| Finance \n", "18 None None \n", "19 |Biotechnology| Biotechnology \n", "20 |Biotechnology| Biotechnology \n", "21 |Medical|Pharmaceuticals|Biotechnology| Medical \n", "22 |Advertising| Advertising \n", "23 |Public Relations| Public Relations \n", "24 |Local Businesses|Energy|Utilities| Utilities \n", "25 |Manufacturing| Manufacturing \n", "26 |Finance| Finance \n", "27 None None \n", "28 None None \n", "29 |Electronics|Manufacturing|Design| Design \n", ".. ... ... \n", "507 |Data Centers|Services|Web Hosting| Web Hosting \n", "508 |Android|iPhone|VoIP|Messaging| Android \n", "509 |Video Games| Video Games \n", "510 |Hospitality| Hospitality \n", "511 |Pharmaceuticals|Biotechnology| Biotechnology \n", "512 |DOD/Military|Mobile| Mobile \n", "513 |Communities|Social Network Media|Social Media| Communities \n", "514 None None \n", "515 |Pharmaceuticals|Biotechnology| Biotechnology \n", "516 |Navigation|Transportation| Transportation \n", "517 |Curated Web| Curated Web \n", "518 |SEO|Curated Web| Curated Web \n", "519 |Clean Energy|Oil| Clean Energy \n", "520 |Manufacturing| Manufacturing \n", "521 |Messaging| Messaging \n", "522 |Waste Management| Waste Management \n", "523 |Food Processing| Food Processing \n", "524 |Software| Software \n", "525 |Consumer Goods|Food Processing| Food Processing \n", "526 |Manufacturing|Public Safety|Heavy Industry| Heavy Industry \n", "527 |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology \n", "528 |Biotechnology|Pharmaceuticals|Health and Well... Biotechnology \n", "529 |Web Hosting| Web Hosting \n", "530 |Twitter Applications|Networking|Social Media|... Enterprise Software \n", "531 |Manufacturing| Manufacturing \n", "532 |Food Processing| Food Processing \n", "533 |Insurance| Insurance \n", "534 |Video|Online Rental|Entertainment|Games| Games \n", "535 |Curated Web| Curated Web \n", "536 |Information Services|Entertainment|Investment... Investment Management \n", "\n", " company_country_code company_state_code company_region \\\n", "0 USA AL AL - Other \n", "1 USA MA Boston \n", "2 USA CA SF Bay Area \n", "3 CHN None Fuzhou Shi \n", "4 None None None \n", "5 USA NY New York City \n", "6 USA CA Los Angeles \n", "7 USA CA San Diego \n", "8 USA OK Oklahoma City \n", "9 USA CA SF Bay Area \n", "10 USA CA SF Bay Area \n", "11 GBR None Cobham \n", "12 USA CA Los Angeles \n", "13 None None None \n", "14 USA CA SF Bay Area \n", "15 USA TX Dallas \n", "16 USA GA Atlanta \n", "17 None None None \n", "18 USA AL Birmingham \n", "19 NOR None Oslo \n", "20 USA CA SF Bay Area \n", "21 USA CA Anaheim \n", "22 USA IA IA - Other \n", "23 USA AR Little Rock \n", "24 CAN AB Calgary \n", "25 USA IL Chicago \n", "26 USA NY New York City \n", "27 USA PA Philadelphia \n", "28 USA MA Boston \n", "29 USA ID Idaho Falls \n", ".. ... ... ... \n", "507 USA CO Denver \n", "508 CYP None Cyprus \n", "509 USA TN Nashville \n", "510 NLD None Eindhoven \n", "511 USA PA Philadelphia \n", "512 USA MD Washington, D.C. \n", "513 RUS None St. Petersburg \n", "514 CHE None CHE - Other \n", "515 USA NJ Newark \n", "516 USA CA SF Bay Area \n", "517 USA CA SF Bay Area \n", "518 USA OR Portland, Oregon \n", "519 USA TX Seminole \n", "520 USA WA Seattle \n", "521 USA CA SF Bay Area \n", "522 USA NH Manchester, New Hampshire \n", "523 DEU None Frankfurt \n", "524 USA CA SF Bay Area \n", "525 USA TX Houston \n", "526 None None None \n", "527 None None None \n", "528 None None None \n", "529 USA CA Los Angeles \n", "530 USA CA SF Bay Area \n", "531 USA AR Fayetteville \n", "532 USA MN Minneapolis \n", "533 USA NJ Newark \n", "534 USA CA SF Bay Area \n", "535 USA NV Las Vegas \n", "536 NLD None Utrecht \n", "\n", " company_city acquirer_permalink \\\n", "0 De Armanville /organization/american-international-group \n", "1 Marlborough /organization/hewlett-packard \n", "2 Fremont /organization/hewlett-packard \n", "3 Fuzhou Shi /organization/baidu \n", "4 None /organization/the-walt-disney-company \n", "5 White Plains /organization/zayo-group \n", "6 Los Angeles /organization/celgene \n", "7 San Diego /organization/dassault \n", "8 Oklahoma City /organization/the-williams-companies \n", "9 Menlo Park /organization/johnson-johnson \n", "10 Mountain View /organization/google \n", "11 Cobham /organization/vista-equity-partners \n", "12 Santa Monica /organization/ubm-electronics \n", "13 None /organization/valassis \n", "14 Mountain View /organization/broadcom \n", "15 Dallas /organization/xerox \n", "16 Atlanta /organization/vmware \n", "17 None /organization/portfolio-recovery-associates \n", "18 Birmingham /organization/laclede-group \n", "19 Oslo /organization/bayer-ag-germany \n", "20 South San Francisco /organization/johnson-johnson \n", "21 Irvine /organization/actavis \n", "22 Plano /organization/liberty-global \n", "23 Little Rock /organization/verizon \n", "24 Calgary /organization/midamerican-energy \n", "25 Hoffman Estates /organization/minerals-technologies \n", "26 New York /organization/standard-chartered-bank \n", "27 Plymouth Meeting /organization/kinder-morgan \n", "28 Boston /organization/ventas \n", "29 Pocatello /organization/on-semiconductor \n", ".. ... ... \n", "507 Greenwood Village /organization/shaw-communications \n", "508 Limassol /organization/rakuten \n", "509 Franklin /organization/aristocrat-technologies-inc \n", "510 Eindhoven /organization/darling-international \n", "511 Exton /organization/shire \n", "512 Rockville /organization/eads-astrium \n", "513 Saint Petersburg /organization/mail-ru \n", "514 Eison /organization/platinum-equity-llc \n", "515 Rockaway /organization/actavis \n", "516 Palo Alto /organization/google \n", "517 Santa Clara /organization/cisco \n", "518 Portland /organization/netiq \n", "519 Seminole /organization/anadarko-petroleum-corporation \n", "520 Federal Way /organization/tri-pointe-homes \n", "521 Santa Clara /organization/facebook \n", "522 Hampton /organization/energy-capital-partners \n", "523 Heidelberg /organization/archer-daniels-midland-company \n", "524 Alameda /organization/intel \n", "525 Houston /organization/ajinomoto \n", "526 None /organization/bain-capital-2 \n", "527 None /organization/nestl \n", "528 None /organization/pfizer \n", "529 Thousand Oaks /organization/intel \n", "530 San Francisco /organization/microsoft \n", "531 Rogers /organization/jarden \n", "532 Minneapolis /organization/generalmills \n", "533 Parsippany /organization/onex \n", "534 San Bruno /organization/google \n", "535 Las Vegas /organization/amazon \n", "536 Utrecht /organization/liberty-global \n", "\n", " acquirer_name ... \\\n", "0 American International Group ... \n", "1 Hewlett-Packard ... \n", "2 Hewlett-Packard ... \n", "3 Baidu ... \n", "4 The Walt Disney Company ... \n", "5 Zayo ... \n", "6 Celgene ... \n", "7 Dassault Systemes ... \n", "8 The Williams Companies ... \n", "9 Johnson & Johnson ... \n", "10 Google ... \n", "11 Vista Equity Partners ... \n", "12 UBM Electronics ... \n", "13 Valassis ... \n", "14 Broadcom ... \n", "15 Xerox ... \n", "16 VMware ... \n", "17 Portfolio Recovery Associates ... \n", "18 Laclede Group ... \n", "19 Bayer AG ... \n", "20 Johnson & Johnson ... \n", "21 Actavis ... \n", "22 Liberty Global ... \n", "23 Verizon Communications ... \n", "24 Berkshire Hathaway Energy ... \n", "25 Minerals Technologies ... \n", "26 Standard Chartered Bank ... \n", "27 Kinder Morgan Energy ... \n", "28 Ventas ... \n", "29 ON Semiconductor ... \n", ".. ... ... \n", "507 Shaw Communications ... \n", "508 Rakuten ... \n", "509 Aristocrat Technologies, Inc ... \n", "510 Darling Ingredients ... \n", "511 Shire ... \n", "512 EADS Astrium ... \n", "513 Mail.Ru Group ... \n", "514 Platinum Equity ... \n", "515 Actavis ... \n", "516 Google ... \n", "517 Cisco ... \n", "518 NetIQ ... \n", "519 Anadarko Petroleum Corporation ... \n", "520 Tri Pointe Homes ... \n", "521 Facebook ... \n", "522 Energy Capital Partners ... \n", "523 Archer Daniels Midland Company ... \n", "524 Intel ... \n", "525 Ajinomoto ... \n", "526 Bain Capital ... \n", "527 Nestl\u0082 ... \n", "528 Pfizer ... \n", "529 Intel ... \n", "530 Microsoft ... \n", "531 Jarden ... \n", "532 General Mills ... \n", "533 Onex ... \n", "534 Google ... \n", "535 Amazon ... \n", "536 Liberty Global ... \n", "\n", " acquirer_country_code acquirer_state_code acquirer_region \\\n", "0 USA NY New York City \n", "1 USA CA SF Bay Area \n", "2 USA CA SF Bay Area \n", "3 CHN None Beijing \n", "4 USA CA Los Angeles \n", "5 USA CO Denver \n", "6 USA NJ Newark \n", "7 FRA None None \n", "8 USA OK OK - Other \n", "9 USA NJ Newark \n", "10 USA CA SF Bay Area \n", "11 USA TX Austin \n", "12 USA CA SF Bay Area \n", "13 USA MI Detroit \n", "14 USA CA Anaheim \n", "15 USA AL AL - Other \n", "16 USA CA SF Bay Area \n", "17 USA VA Norfolk - Virginia Beach \n", "18 USA MO St. Louis \n", "19 DEU None Leverkusen \n", "20 USA NJ Newark \n", "21 USA NJ Newark \n", "22 USA CO Denver \n", "23 USA NY New York City \n", "24 USA IA Des Moines \n", "25 USA NY New York City \n", "26 GBR None London \n", "27 USA TX Houston \n", "28 USA IL Chicago \n", "29 USA AZ Phoenix \n", ".. ... ... ... \n", "507 CAN AB Calgary \n", "508 JPN None Tokyo \n", "509 AUS None AUS - Other \n", "510 USA TX Dallas \n", "511 IRL None Dublin \n", "512 USA TX Houston \n", "513 RUS None Moscow \n", "514 USA CA Los Angeles \n", "515 USA NJ Newark \n", "516 USA CA SF Bay Area \n", "517 USA CA SF Bay Area \n", "518 USA CA SF Bay Area \n", "519 USA TX Seminole \n", "520 USA CA SF Bay Area \n", "521 USA CA SF Bay Area \n", "522 USA NJ Newark \n", "523 USA IL Chicago \n", "524 USA CA SF Bay Area \n", "525 JPN None Tokyo \n", "526 USA MA Boston \n", "527 CHE None Vevey \n", "528 USA NY New York City \n", "529 USA CA SF Bay Area \n", "530 USA WA Seattle \n", "531 USA NY New York City \n", "532 USA MN Minneapolis \n", "533 CAN ON Toronto \n", "534 USA CA SF Bay Area \n", "535 USA WA Seattle \n", "536 USA CO Denver \n", "\n", " acquirer_city acquired_at acquired_month acquired_quarter acquired_year \\\n", "0 New York 2007-09-01 2007-09 2007-Q3 2007 \n", "1 Palo Alto 2009-11-11 2009-11 2009-Q4 2009 \n", "2 Palo Alto 2010-09-02 2010-09 2010-Q3 2010 \n", "3 Beijing 2013-07-15 2013-07 2013-Q3 2013 \n", "4 Burbank 1995-08-05 1995-08 1995-Q3 1995 \n", "5 Boulder 2012-03-19 2012-03 2012-Q1 2012 \n", "6 Summit 2010-06-30 2010-06 2010-Q2 2010 \n", "7 None 2014-01-30 2014-01 2014-Q1 2014 \n", "8 Snyder 2012-12-21 2012-12 2012-Q4 2012 \n", "9 New Brunswick 2009-12-17 2009-12 2009-Q4 2009 \n", "10 Mountain View 2009-11-09 2009-11 2009-Q4 2009 \n", "11 Austin 2014-11-25 2014-11 2014-Q4 2014 \n", "12 San Francisco 2014-10-01 2014-10 2014-Q4 2014 \n", "13 Livonia 2006-07-06 2006-07 2006-Q3 2006 \n", "14 Irvine 2012-02-17 2012-02 2012-Q1 2012 \n", "15 Normal 2009-09-28 2009-09 2009-Q3 2009 \n", "16 Palo Alto 2014-01-22 2014-01 2014-Q1 2014 \n", "17 Norfolk 2014-02-20 2014-02 2014-Q1 2014 \n", "18 St Louis 2014-04-07 2014-04 2014-Q2 2014 \n", "19 Leverkusen 2013-12-19 2013-12 2013-Q4 2013 \n", "20 New Brunswick 2014-09-30 2014-09 2014-Q3 2014 \n", "21 Parsippany 2014-11-16 2014-11 2014-Q4 2014 \n", "22 Englewood 2014-05-08 2014-05 2014-Q2 2014 \n", "23 New York 2008-06-05 2008-06 2008-Q2 2008 \n", "24 Des Moines 2014-05-02 2014-05 2014-Q2 2014 \n", "25 New York 2014-03-10 2014-03 2014-Q1 2014 \n", "26 London 2008-02-28 2008-02 2008-Q1 2008 \n", "27 Houston 2013-12-23 2013-12 2013-Q4 2013 \n", "28 Chicago 2014-06-02 2014-06 2014-Q2 2014 \n", "29 Phoenix 2007-12-14 2007-12 2007-Q4 2007 \n", ".. ... ... ... ... ... \n", "507 Calgary 2014-08-04 2014-08 2014-Q3 2014 \n", "508 Tokyo 2014-02-14 2014-02 2014-Q1 2014 \n", "509 None 2014-07-07 2014-07 2014-Q3 2014 \n", "510 Irving 2013-10-07 2013-10 2013-Q4 2013 \n", "511 Dublin 2013-11-10 2013-11 2013-Q4 2013 \n", "512 Houston 2011-08-03 2011-08 2011-Q3 2011 \n", "513 Moscow 2014-09-16 2014-09 2014-Q3 2014 \n", "514 Beverly Hills 2013-12-10 2013-12 2013-Q4 2013 \n", "515 Parsippany 2013-05-20 2013-05 2013-Q2 2013 \n", "516 Mountain View 2013-06-11 2013-06 2013-Q2 2013 \n", "517 San Jose 2007-03-15 2007-03 2007-Q1 2007 \n", "518 San Jose 2001-01-17 2001-01 2001-Q1 2001 \n", "519 Seminole 2006-06-01 2006-06 2006-Q2 2006 \n", "520 San Ramon 2013-11-04 2013-11 2013-Q4 2013 \n", "521 Menlo Park 2014-02-19 2014-02 2014-Q1 2014 \n", "522 Short Hills 2014-07-29 2014-07 2014-Q3 2014 \n", "523 Chicago 2014-07-07 2014-07 2014-Q3 2014 \n", "524 Santa Clara 2009-07-17 2009-07 2009-Q3 2009 \n", "525 Tokyo 2014-09-10 2014-09 2014-Q3 2014 \n", "526 Boston 2014-12-23 2014-12 2014-Q4 2014 \n", "527 Vevey 2012-04-01 2012-04 2012-Q2 2012 \n", "528 New York 2009-01-23 2009-01 2009-Q1 2009 \n", "529 Santa Clara 2001-01-15 2001-01 2001-Q1 2001 \n", "530 Redmond 2012-06-25 2012-06 2012-Q2 2012 \n", "531 Rye 2013-09-03 2013-09 2013-Q3 2013 \n", "532 Minneapolis 2011-07-01 2011-07 2011-Q3 2011 \n", "533 Toronto 2014-07-16 2014-07 2014-Q3 2014 \n", "534 Mountain View 2006-10-01 2006-10 2006-Q4 2006 \n", "535 Seattle 2009-07-22 2009-07 2009-Q3 2009 \n", "536 Englewood 2014-01-27 2014-01 2014-Q1 2014 \n", "\n", " price_amount price_currency_code \n", "0 749000000 USD \n", "1 2700000000 USD \n", "2 2350000000 USD \n", "3 1900000000 USD \n", "4 19000000000 USD \n", "5 2300000000 USD \n", "6 2900000000 USD \n", "7 750000000 USD \n", "8 2250000000 USD \n", "9 785000000 USD \n", "10 750000000 USD \n", "11 725000000 GBP \n", "12 972000000 USD \n", "13 1300000000 USD \n", "14 3700000000 USD \n", "15 5750000000 USD \n", "16 1540000000 USD \n", "17 1300000000 USD \n", "18 1340000000 USD \n", "19 2900000000 USD \n", "20 1750000000 USD \n", "21 65500000000 USD \n", "22 930000000 USD \n", "23 28100000000 USD \n", "24 3520000000 USD \n", "25 1700000000 USD \n", "26 823000000 USD \n", "27 962000000 USD \n", "28 2600000000 USD \n", "29 915000000 USD \n", ".. ... ... \n", "507 1200000000 USD \n", "508 900000000 USD \n", "509 1300000000 USD \n", "510 1600000000 EUR \n", "511 4200000000 USD \n", "512 960000000 USD \n", "513 1470000000 USD \n", "514 1100000000 USD \n", "515 8500000000 USD \n", "516 996000000 USD \n", "517 3200000000 USD \n", "518 1000000000 USD \n", "519 4800000000 USD \n", "520 2700000000 USD \n", "521 19000000000 USD \n", "522 1940000000 USD \n", "523 3000000000 USD \n", "524 884000000 USD \n", "525 800000000 USD \n", "526 750000000 USD \n", "527 11850000000 USD \n", "528 68000000000 USD \n", "529 748000000 USD \n", "530 1200000000 USD \n", "531 1750000000 USD \n", "532 1200000000 USD \n", "533 1330000000 USD \n", "534 1650000000 USD \n", "535 1200000000 USD \n", "536 13700000000 USD \n", "\n", "[537 rows x 22 columns]" ] } ], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "companies.loc[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 43, "text": [ "permalink /organization/-fame\n", "name #fame\n", "homepage_url http://livfame.com\n", "category_list |Media|\n", "market Media \n", "status operating\n", "country_code IND\n", "state_code NaN\n", "region Mumbai\n", "city Mumbai\n", "funding_rounds 1\n", "founded_at nan\n", "founded_month NaN\n", "founded_quarter NaN\n", "founded_year NaN\n", "first_funding_at 2015-01-05\n", "last_funding_at 2015-01-05\n", "funding_total_usd 1e+07\n", "Name: 0, dtype: object" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Semi and Anti-Joins\n", "===\n", "- Implement efficient \"set filter\"-type relational algebra\n", "- Can be implemented as an explicit join, but only when no row duplication possible\n", "- Some databases support explicit `SEMI JOIN` or `ANTI JOIN`\n", "- No direct analogue in pandas; must implement using explicit set filtering" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rounds.loc[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ "company_permalink /organization/-fame\n", "company_name #fame\n", "company_category_list |Media|\n", "company_market Media\n", "company_country_code IND\n", "company_state_code NaN\n", "company_region Mumbai\n", "company_city Mumbai\n", "funding_round_permalink /funding-round/9a01d05418af9f794eebff7ace91f638\n", "funding_round_type venture\n", "funding_round_code B\n", "funded_at 2015-01-05\n", "funded_month 2015-01\n", "funded_quarter 2015-Q1 \n", "funded_year 2015\n", "raised_amount_usd 10,000,000 \n", "Unnamed: 16 NaN\n", "Name: 0, dtype: object" ] } ], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "companies.loc[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 45, "text": [ "permalink /organization/-fame\n", "name #fame\n", "homepage_url http://livfame.com\n", "category_list |Media|\n", "market Media \n", "status operating\n", "country_code IND\n", "state_code NaN\n", "region Mumbai\n", "city Mumbai\n", "funding_rounds 1\n", "founded_at nan\n", "founded_month NaN\n", "founded_quarter NaN\n", "founded_year NaN\n", "first_funding_at 2015-01-05\n", "last_funding_at 2015-01-05\n", "funding_total_usd 1e+07\n", "Name: 0, dtype: object" ] } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use this One Weird Trick\n", "===" ] }, { "cell_type": "code", "collapsed": false, "input": [ "len(companies)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 46, "text": [ "54292" ] } ], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select status, count(*) as count\n", "from companies c\n", "where not exists (\n", " select 1\n", " from rounds r\n", " where c.permalink = r.company_permalink\n", " and r.funding_round_type = 'venture' \n", ") \n", " and c.founded_at > '2010-01-01'\n", "group by 1;\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statuscount
0 None 250
1 operating 12309
2 closed 542
3 acquired 288
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 47, "text": [ " status count\n", "0 None 250\n", "1 operating 12309\n", "2 closed 542\n", "3 acquired 288" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ " QUERY PLAN \n", " ----------------------------------------------------------------------------------\n", " HashAggregate (cost=8454.36..8454.38 rows=2 width=9)\n", " -> Hash Semi Join (cost=4466.18..8388.91 rows=13090 width=9)\n", " Hash Cond: ((c.permalink)::text = (r.company_permalink)::text)\n", " -> Seq Scan on companies c (cost=0.00..2143.65 rows=18526 width=36)\n", " Filter: (founded_at > '2010-01-01'::date)\n", " -> Hash (cost=3635.51..3635.51 rows=42933 width=26)\n", " -> Seq Scan on rounds r (cost=0.00..3635.51 rows=42933 width=26)\n", " Filter: ((funding_round_type)::text = 'venture'::text)\n", " (8 rows)\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rounds[:20]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_permalinkcompany_namecompany_category_listcompany_marketcompany_country_codecompany_state_codecompany_regioncompany_cityfunding_round_permalinkfunding_round_typefunding_round_codefunded_atfunded_monthfunded_quarterfunded_yearraised_amount_usdUnnamed: 16
0 /organization/-fame #fame |Media| Media IND NaN Mumbai Mumbai /funding-round/9a01d05418af9f794eebff7ace91f638 venture B 2015-01-05 2015-01 2015-Q1 2015 10,000,000 NaN
1 /organization/hashoff #HASHOFF |Digital Media|Internet|Social Media| Digital Media USA CO Denver Denver /funding-round/669d6203c0374e6cf0e8d10f75ba0b8a debt_financing NaN 2014-12-08 2014-12 2014-Q4 2014 455,000 NaN
2 /organization/waywire #waywire |Entertainment|Politics|Social Media|News| News USA NY New York City New York /funding-round/cc409188fa2b63482bd9008f682c2efa seed NaN 2012-06-30 2012-06 2012-Q2 2012 1,750,000 NaN
3 /organization/tv-communications &TV Communications |Games| Games USA CA Los Angeles Los Angeles /funding-round/86d22afc65107b6941e6c43c671ecbb8 venture NaN 2010-06-04 2010-06 2010-Q2 2010 1,000,000 NaN
4 /organization/tv-communications &TV Communications |Games| Games USA CA Los Angeles Los Angeles /funding-round/59a3669a64e39360c2b939300bcda162 venture NaN 2010-09-23 2010-09 2010-Q3 2010 3,000,000 NaN
5 /organization/rock-your-paper 'Rock' Your Paper |Publishing|Education| Publishing EST NaN Tallinn Tallinn /funding-round/f06b420775f7cb6c1541a9db526534bb seed NaN 2012-08-09 2012-08 2012-Q3 2012 40,000 NaN
6 /organization/in-touch-network (In)Touch Network |Electronics|Guides|Coffee|Restaurants|Music|i... Electronics GBR NaN London London /funding-round/33c3f135f05d7b734b8d7b7c8ae82647 seed NaN 2011-04-01 2011-04 2011-Q2 2011 1,500,000 NaN
7 /organization/r-ranch-and-mine -R- Ranch and Mine |Tourism|Entertainment|Games| Entertainment USA TX Dallas Fort Worth /funding-round/029720f7eeb218f51c43df5155671472 equity_crowdfunding NaN 2014-08-17 2014-08 2014-Q3 2014 10,000 NaN
8 /organization/r-ranch-and-mine -R- Ranch and Mine |Tourism|Entertainment|Games| Entertainment USA TX Dallas Fort Worth /funding-round/766b0bcead9ca8560af5e1ade579fb7f equity_crowdfunding NaN 2014-09-26 2014-09 2014-Q3 2014 50,000 NaN
9 /organization/club-domains .Club Domains |Software| Software USA FL Ft. Lauderdale Oakland Park /funding-round/aee671c9707f8278a9544c8ae37650e0 venture B 2013-05-31 2013-05 2013-Q2 2013 7,000,000 NaN
10 /organization/fox-networks .Fox Networks |Advertising| Advertising NaN NaN NaN NaN /funding-round/69a1536a6f4506538afd7aa7241ddbe1 undisclosed NaN 2007-01-16 2007-01 2007-Q1 2007 4,912,393 NaN
11 /organization/0-6-com 0-6.com |Curated Web| Curated Web NaN NaN NaN NaN /funding-round/5727accaeaa57461bd22a9bdd945382d venture A 2008-03-19 2008-03 2008-Q1 2008 2,000,000 NaN
12 /organization/004-technologies 004 Technologies |Software| Software USA IL Springfield, Illinois Champaign /funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830 venture NaN 2014-07-24 2014-07 2014-Q3 2014 NaNNaN
13 /organization/01games-technology 01Games Technology |Games| Games HKG NaN Hong Kong Hong Kong /funding-round/7d53696f2b4f607a2f2a8cbb83d01839 seed NaN 2014-07-01 2014-07 2014-Q3 2014 41,250 NaN
14 /organization/1-2-3-listo 1,2,3 Listo |E-Commerce| E-Commerce CHL NaN Santiago Las Condes /funding-round/6de4609e894495105bc791ed64361288 seed NaN 2013-02-18 2013-02 2013-Q1 2013 40,000 NaN
15 /organization/1-4-all 1-4 All |Entertainment|Games|Software| Software USA NC NC - Other Connellys Springs /funding-round/e97a192e13ea0ee3c4f71136b4f3ec16 equity_crowdfunding NaN 2013-04-21 2013-04 2013-Q2 2013 NaNNaN
16 /organization/1-800-dentist 1-800-DENTIST |Health and Wellness| Health and Wellness USA CA Los Angeles Los Angeles /funding-round/5274aacc211163fc7c86539ce94bbacc undisclosed NaN 2010-08-19 2010-08 2010-Q3 2010 NaNNaN
17 /organization/1-800-doctors 1-800-DOCTORS |Health and Wellness| Health and Wellness USA NJ Newark Iselin /funding-round/9eb8c7790a0c200d79e75785d1c4aa12 convertible_note NaN 2011-03-02 2011-03 2011-Q1 2011 1,750,000 NaN
18 /organization/1-618-technology 1.618 Technology |Real Estate| Real Estate USA FL Orlando Orlando /funding-round/83b8f4c7d37ecef5e001a5e953bf461a equity_crowdfunding NaN 2014-01-22 2014-01 2014-Q1 2014 NaNNaN
19 /organization/10-minutes-with 10 Minutes With |Education| Education GBR NaN London London /funding-round/f245a74b4c54610ae843e17bdf4d1113 seed NaN 2013-01-01 2013-01 2013-Q1 2013 400,000 NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 48, "text": [ " company_permalink company_name \\\n", "0 /organization/-fame #fame \n", "1 /organization/hashoff #HASHOFF \n", "2 /organization/waywire #waywire \n", "3 /organization/tv-communications &TV Communications \n", "4 /organization/tv-communications &TV Communications \n", "5 /organization/rock-your-paper 'Rock' Your Paper \n", "6 /organization/in-touch-network (In)Touch Network \n", "7 /organization/r-ranch-and-mine -R- Ranch and Mine \n", "8 /organization/r-ranch-and-mine -R- Ranch and Mine \n", "9 /organization/club-domains .Club Domains \n", "10 /organization/fox-networks .Fox Networks \n", "11 /organization/0-6-com 0-6.com \n", "12 /organization/004-technologies 004 Technologies \n", "13 /organization/01games-technology 01Games Technology \n", "14 /organization/1-2-3-listo 1,2,3 Listo \n", "15 /organization/1-4-all 1-4 All \n", "16 /organization/1-800-dentist 1-800-DENTIST \n", "17 /organization/1-800-doctors 1-800-DOCTORS \n", "18 /organization/1-618-technology 1.618 Technology \n", "19 /organization/10-minutes-with 10 Minutes With \n", "\n", " company_category_list company_market \\\n", "0 |Media| Media \n", "1 |Digital Media|Internet|Social Media| Digital Media \n", "2 |Entertainment|Politics|Social Media|News| News \n", "3 |Games| Games \n", "4 |Games| Games \n", "5 |Publishing|Education| Publishing \n", "6 |Electronics|Guides|Coffee|Restaurants|Music|i... Electronics \n", "7 |Tourism|Entertainment|Games| Entertainment \n", "8 |Tourism|Entertainment|Games| Entertainment \n", "9 |Software| Software \n", "10 |Advertising| Advertising \n", "11 |Curated Web| Curated Web \n", "12 |Software| Software \n", "13 |Games| Games \n", "14 |E-Commerce| E-Commerce \n", "15 |Entertainment|Games|Software| Software \n", "16 |Health and Wellness| Health and Wellness \n", "17 |Health and Wellness| Health and Wellness \n", "18 |Real Estate| Real Estate \n", "19 |Education| Education \n", "\n", " company_country_code company_state_code company_region \\\n", "0 IND NaN Mumbai \n", "1 USA CO Denver \n", "2 USA NY New York City \n", "3 USA CA Los Angeles \n", "4 USA CA Los Angeles \n", "5 EST NaN Tallinn \n", "6 GBR NaN London \n", "7 USA TX Dallas \n", "8 USA TX Dallas \n", "9 USA FL Ft. Lauderdale \n", "10 NaN NaN NaN \n", "11 NaN NaN NaN \n", "12 USA IL Springfield, Illinois \n", "13 HKG NaN Hong Kong \n", "14 CHL NaN Santiago \n", "15 USA NC NC - Other \n", "16 USA CA Los Angeles \n", "17 USA NJ Newark \n", "18 USA FL Orlando \n", "19 GBR NaN London \n", "\n", " company_city funding_round_permalink \\\n", "0 Mumbai /funding-round/9a01d05418af9f794eebff7ace91f638 \n", "1 Denver /funding-round/669d6203c0374e6cf0e8d10f75ba0b8a \n", "2 New York /funding-round/cc409188fa2b63482bd9008f682c2efa \n", "3 Los Angeles /funding-round/86d22afc65107b6941e6c43c671ecbb8 \n", "4 Los Angeles /funding-round/59a3669a64e39360c2b939300bcda162 \n", "5 Tallinn /funding-round/f06b420775f7cb6c1541a9db526534bb \n", "6 London /funding-round/33c3f135f05d7b734b8d7b7c8ae82647 \n", "7 Fort Worth /funding-round/029720f7eeb218f51c43df5155671472 \n", "8 Fort Worth /funding-round/766b0bcead9ca8560af5e1ade579fb7f \n", "9 Oakland Park /funding-round/aee671c9707f8278a9544c8ae37650e0 \n", "10 NaN /funding-round/69a1536a6f4506538afd7aa7241ddbe1 \n", "11 NaN /funding-round/5727accaeaa57461bd22a9bdd945382d \n", "12 Champaign /funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830 \n", "13 Hong Kong /funding-round/7d53696f2b4f607a2f2a8cbb83d01839 \n", "14 Las Condes /funding-round/6de4609e894495105bc791ed64361288 \n", "15 Connellys Springs /funding-round/e97a192e13ea0ee3c4f71136b4f3ec16 \n", "16 Los Angeles /funding-round/5274aacc211163fc7c86539ce94bbacc \n", "17 Iselin /funding-round/9eb8c7790a0c200d79e75785d1c4aa12 \n", "18 Orlando /funding-round/83b8f4c7d37ecef5e001a5e953bf461a \n", "19 London /funding-round/f245a74b4c54610ae843e17bdf4d1113 \n", "\n", " funding_round_type funding_round_code funded_at funded_month \\\n", "0 venture B 2015-01-05 2015-01 \n", "1 debt_financing NaN 2014-12-08 2014-12 \n", "2 seed NaN 2012-06-30 2012-06 \n", "3 venture NaN 2010-06-04 2010-06 \n", "4 venture NaN 2010-09-23 2010-09 \n", "5 seed NaN 2012-08-09 2012-08 \n", "6 seed NaN 2011-04-01 2011-04 \n", "7 equity_crowdfunding NaN 2014-08-17 2014-08 \n", "8 equity_crowdfunding NaN 2014-09-26 2014-09 \n", "9 venture B 2013-05-31 2013-05 \n", "10 undisclosed NaN 2007-01-16 2007-01 \n", "11 venture A 2008-03-19 2008-03 \n", "12 venture NaN 2014-07-24 2014-07 \n", "13 seed NaN 2014-07-01 2014-07 \n", "14 seed NaN 2013-02-18 2013-02 \n", "15 equity_crowdfunding NaN 2013-04-21 2013-04 \n", "16 undisclosed NaN 2010-08-19 2010-08 \n", "17 convertible_note NaN 2011-03-02 2011-03 \n", "18 equity_crowdfunding NaN 2014-01-22 2014-01 \n", "19 seed NaN 2013-01-01 2013-01 \n", "\n", " funded_quarter funded_year raised_amount_usd Unnamed: 16 \n", "0 2015-Q1 2015 10,000,000 NaN \n", "1 2014-Q4 2014 455,000 NaN \n", "2 2012-Q2 2012 1,750,000 NaN \n", "3 2010-Q2 2010 1,000,000 NaN \n", "4 2010-Q3 2010 3,000,000 NaN \n", "5 2012-Q3 2012 40,000 NaN \n", "6 2011-Q2 2011 1,500,000 NaN \n", "7 2014-Q3 2014 10,000 NaN \n", "8 2014-Q3 2014 50,000 NaN \n", "9 2013-Q2 2013 7,000,000 NaN \n", "10 2007-Q1 2007 4,912,393 NaN \n", "11 2008-Q1 2008 2,000,000 NaN \n", "12 2014-Q3 2014 NaN NaN \n", "13 2014-Q3 2014 41,250 NaN \n", "14 2013-Q1 2013 40,000 NaN \n", "15 2013-Q2 2013 NaN NaN \n", "16 2010-Q3 2010 NaN NaN \n", "17 2011-Q1 2011 1,750,000 NaN \n", "18 2014-Q1 2014 NaN NaN \n", "19 2013-Q1 2013 400,000 NaN " ] } ], "prompt_number": 48 }, { "cell_type": "code", "collapsed": false, "input": [ "companies_with_round = rounds.company_permalink.unique()\n", "companies.permalink.isin(companies_with_round)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 49, "text": [ "0 True\n", "1 True\n", "2 True\n", "3 True\n", "4 True\n", "5 True\n", "6 True\n", "7 True\n", "8 True\n", "9 True\n", "10 True\n", "11 True\n", "12 True\n", "13 True\n", "14 True\n", "...\n", "54277 False\n", "54278 False\n", "54279 False\n", "54280 False\n", "54281 False\n", "54282 False\n", "54283 False\n", "54284 False\n", "54285 False\n", "54286 False\n", "54287 False\n", "54288 False\n", "54289 False\n", "54290 False\n", "54291 False\n", "Name: permalink, Length: 54292, dtype: bool" ] } ], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Window functions\n", "===\n", "- A SQL concept, primarily\n", "- Functions applying to whole sections / partitions of a table\n", "- Emulate in pandas using groupby-apply paradigm" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select year, hour, occs,\n", " occs / sum(occs) OVER (PARTITION BY YEAR) as share\n", "from (\n", " select extract(year from date) as year,\n", " extract(hour from time) as hour,\n", " count(*) as occs\n", " from crimes\n", " group by 1, 2\n", ") t0\n", "\"\"\"\n", "results = sql.read_sql(query, con)\n", "results" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearhouroccsshare
0 2014 0 7217 0.048076
1 2014 1 4353 0.028998
2 2014 2 3642 0.024261
3 2014 3 2537 0.016900
4 2014 4 1709 0.011385
5 2014 5 1636 0.010898
6 2014 6 2195 0.014622
7 2014 7 3553 0.023668
8 2014 8 5124 0.034134
9 2014 9 5844 0.038930
10 2014 10 6210 0.041368
11 2014 11 6464 0.043060
12 2014 12 8803 0.058641
13 2014 13 7122 0.047443
14 2014 14 7373 0.049115
15 2014 15 8111 0.054032
16 2014 16 8671 0.057762
17 2014 17 9389 0.062545
18 2014 18 9865 0.065716
19 2014 19 9087 0.060533
20 2014 20 8452 0.056303
21 2014 21 7743 0.051580
22 2014 22 7924 0.052786
23 2014 23 7092 0.047243
24 2015 0 504 0.042014
25 2015 1 420 0.035012
26 2015 2 332 0.027676
27 2015 3 224 0.018673
28 2015 4 156 0.013004
29 2015 5 143 0.011921
30 2015 6 179 0.014922
31 2015 7 320 0.026676
32 2015 8 403 0.033595
33 2015 9 530 0.044181
34 2015 10 497 0.041430
35 2015 11 533 0.044431
36 2015 12 596 0.049683
37 2015 13 539 0.044932
38 2015 14 576 0.048016
39 2015 15 627 0.052267
40 2015 16 681 0.056769
41 2015 17 691 0.057603
42 2015 18 819 0.068273
43 2015 19 766 0.063855
44 2015 20 694 0.057853
45 2015 21 634 0.052851
46 2015 22 616 0.051350
47 2015 23 516 0.043014
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ " year hour occs share\n", "0 2014 0 7217 0.048076\n", "1 2014 1 4353 0.028998\n", "2 2014 2 3642 0.024261\n", "3 2014 3 2537 0.016900\n", "4 2014 4 1709 0.011385\n", "5 2014 5 1636 0.010898\n", "6 2014 6 2195 0.014622\n", "7 2014 7 3553 0.023668\n", "8 2014 8 5124 0.034134\n", "9 2014 9 5844 0.038930\n", "10 2014 10 6210 0.041368\n", "11 2014 11 6464 0.043060\n", "12 2014 12 8803 0.058641\n", "13 2014 13 7122 0.047443\n", "14 2014 14 7373 0.049115\n", "15 2014 15 8111 0.054032\n", "16 2014 16 8671 0.057762\n", "17 2014 17 9389 0.062545\n", "18 2014 18 9865 0.065716\n", "19 2014 19 9087 0.060533\n", "20 2014 20 8452 0.056303\n", "21 2014 21 7743 0.051580\n", "22 2014 22 7924 0.052786\n", "23 2014 23 7092 0.047243\n", "24 2015 0 504 0.042014\n", "25 2015 1 420 0.035012\n", "26 2015 2 332 0.027676\n", "27 2015 3 224 0.018673\n", "28 2015 4 156 0.013004\n", "29 2015 5 143 0.011921\n", "30 2015 6 179 0.014922\n", "31 2015 7 320 0.026676\n", "32 2015 8 403 0.033595\n", "33 2015 9 530 0.044181\n", "34 2015 10 497 0.041430\n", "35 2015 11 533 0.044431\n", "36 2015 12 596 0.049683\n", "37 2015 13 539 0.044932\n", "38 2015 14 576 0.048016\n", "39 2015 15 627 0.052267\n", "40 2015 16 681 0.056769\n", "41 2015 17 691 0.057603\n", "42 2015 18 819 0.068273\n", "43 2015 19 766 0.063855\n", "44 2015 20 694 0.057853\n", "45 2015 21 634 0.052851\n", "46 2015 22 616 0.051350\n", "47 2015 23 516 0.043014" ] } ], "prompt_number": 50 }, { "cell_type": "code", "collapsed": false, "input": [ "keys = [crimes.date.dt.year,\n", " crimes.time.map(lambda x: x.hour)]\n", "counts = crimes.groupby(keys).size()\n", "counts" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ " time\n", "2014 0 7217\n", " 1 4353\n", " 2 3642\n", " 3 2537\n", " 4 1709\n", " 5 1636\n", " 6 2195\n", " 7 3553\n", " 8 5124\n", " 9 5844\n", " 10 6210\n", " 11 6464\n", " 12 8803\n", " 13 7122\n", " 14 7373\n", " 15 8111\n", " 16 8671\n", " 17 9389\n", " 18 9865\n", " 19 9087\n", " 20 8452\n", " 21 7743\n", " 22 7924\n", " 23 7092\n", "2015 0 504\n", " 1 420\n", " 2 332\n", " 3 224\n", " 4 156\n", " 5 143\n", " 6 179\n", " 7 320\n", " 8 403\n", " 9 530\n", " 10 497\n", " 11 533\n", " 12 596\n", " 13 539\n", " 14 576\n", " 15 627\n", " 16 681\n", " 17 691\n", " 18 819\n", " 19 766\n", " 20 694\n", " 21 634\n", " 22 616\n", " 23 516\n", "dtype: int64" ] } ], "prompt_number": 52 }, { "cell_type": "code", "collapsed": false, "input": [ "keys = [crimes.date.dt.year,\n", " crimes.time.map(lambda x: x.hour)]\n", "counts = crimes.groupby(keys).size()\n", "counts.name = 'count'\n", "counts.index.names = ['year', 'hour']\n", "counts2 = counts.reset_index()\n", "counts2" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearhourcount
0 2014 0 7217
1 2014 1 4353
2 2014 2 3642
3 2014 3 2537
4 2014 4 1709
5 2014 5 1636
6 2014 6 2195
7 2014 7 3553
8 2014 8 5124
9 2014 9 5844
10 2014 10 6210
11 2014 11 6464
12 2014 12 8803
13 2014 13 7122
14 2014 14 7373
15 2014 15 8111
16 2014 16 8671
17 2014 17 9389
18 2014 18 9865
19 2014 19 9087
20 2014 20 8452
21 2014 21 7743
22 2014 22 7924
23 2014 23 7092
24 2015 0 504
25 2015 1 420
26 2015 2 332
27 2015 3 224
28 2015 4 156
29 2015 5 143
30 2015 6 179
31 2015 7 320
32 2015 8 403
33 2015 9 530
34 2015 10 497
35 2015 11 533
36 2015 12 596
37 2015 13 539
38 2015 14 576
39 2015 15 627
40 2015 16 681
41 2015 17 691
42 2015 18 819
43 2015 19 766
44 2015 20 694
45 2015 21 634
46 2015 22 616
47 2015 23 516
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ " year hour count\n", "0 2014 0 7217\n", "1 2014 1 4353\n", "2 2014 2 3642\n", "3 2014 3 2537\n", "4 2014 4 1709\n", "5 2014 5 1636\n", "6 2014 6 2195\n", "7 2014 7 3553\n", "8 2014 8 5124\n", "9 2014 9 5844\n", "10 2014 10 6210\n", "11 2014 11 6464\n", "12 2014 12 8803\n", "13 2014 13 7122\n", "14 2014 14 7373\n", "15 2014 15 8111\n", "16 2014 16 8671\n", "17 2014 17 9389\n", "18 2014 18 9865\n", "19 2014 19 9087\n", "20 2014 20 8452\n", "21 2014 21 7743\n", "22 2014 22 7924\n", "23 2014 23 7092\n", "24 2015 0 504\n", "25 2015 1 420\n", "26 2015 2 332\n", "27 2015 3 224\n", "28 2015 4 156\n", "29 2015 5 143\n", "30 2015 6 179\n", "31 2015 7 320\n", "32 2015 8 403\n", "33 2015 9 530\n", "34 2015 10 497\n", "35 2015 11 533\n", "36 2015 12 596\n", "37 2015 13 539\n", "38 2015 14 576\n", "39 2015 15 627\n", "40 2015 16 681\n", "41 2015 17 691\n", "42 2015 18 819\n", "43 2015 19 766\n", "44 2015 20 694\n", "45 2015 21 634\n", "46 2015 22 616\n", "47 2015 23 516" ] } ], "prompt_number": 53 }, { "cell_type": "code", "collapsed": false, "input": [ "def add_share(x):\n", " x['share'] = x['count'] / x['count'].sum()\n", " return x\n", "counts2.groupby('year').apply(add_share)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearhourcountshare
0 2014 0 7217 0.048076
1 2014 1 4353 0.028998
2 2014 2 3642 0.024261
3 2014 3 2537 0.016900
4 2014 4 1709 0.011385
5 2014 5 1636 0.010898
6 2014 6 2195 0.014622
7 2014 7 3553 0.023668
8 2014 8 5124 0.034134
9 2014 9 5844 0.038930
10 2014 10 6210 0.041368
11 2014 11 6464 0.043060
12 2014 12 8803 0.058641
13 2014 13 7122 0.047443
14 2014 14 7373 0.049115
15 2014 15 8111 0.054032
16 2014 16 8671 0.057762
17 2014 17 9389 0.062545
18 2014 18 9865 0.065716
19 2014 19 9087 0.060533
20 2014 20 8452 0.056303
21 2014 21 7743 0.051580
22 2014 22 7924 0.052786
23 2014 23 7092 0.047243
24 2015 0 504 0.042014
25 2015 1 420 0.035012
26 2015 2 332 0.027676
27 2015 3 224 0.018673
28 2015 4 156 0.013004
29 2015 5 143 0.011921
30 2015 6 179 0.014922
31 2015 7 320 0.026676
32 2015 8 403 0.033595
33 2015 9 530 0.044181
34 2015 10 497 0.041430
35 2015 11 533 0.044431
36 2015 12 596 0.049683
37 2015 13 539 0.044932
38 2015 14 576 0.048016
39 2015 15 627 0.052267
40 2015 16 681 0.056769
41 2015 17 691 0.057603
42 2015 18 819 0.068273
43 2015 19 766 0.063855
44 2015 20 694 0.057853
45 2015 21 634 0.052851
46 2015 22 616 0.051350
47 2015 23 516 0.043014
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ " year hour count share\n", "0 2014 0 7217 0.048076\n", "1 2014 1 4353 0.028998\n", "2 2014 2 3642 0.024261\n", "3 2014 3 2537 0.016900\n", "4 2014 4 1709 0.011385\n", "5 2014 5 1636 0.010898\n", "6 2014 6 2195 0.014622\n", "7 2014 7 3553 0.023668\n", "8 2014 8 5124 0.034134\n", "9 2014 9 5844 0.038930\n", "10 2014 10 6210 0.041368\n", "11 2014 11 6464 0.043060\n", "12 2014 12 8803 0.058641\n", "13 2014 13 7122 0.047443\n", "14 2014 14 7373 0.049115\n", "15 2014 15 8111 0.054032\n", "16 2014 16 8671 0.057762\n", "17 2014 17 9389 0.062545\n", "18 2014 18 9865 0.065716\n", "19 2014 19 9087 0.060533\n", "20 2014 20 8452 0.056303\n", "21 2014 21 7743 0.051580\n", "22 2014 22 7924 0.052786\n", "23 2014 23 7092 0.047243\n", "24 2015 0 504 0.042014\n", "25 2015 1 420 0.035012\n", "26 2015 2 332 0.027676\n", "27 2015 3 224 0.018673\n", "28 2015 4 156 0.013004\n", "29 2015 5 143 0.011921\n", "30 2015 6 179 0.014922\n", "31 2015 7 320 0.026676\n", "32 2015 8 403 0.033595\n", "33 2015 9 530 0.044181\n", "34 2015 10 497 0.041430\n", "35 2015 11 533 0.044431\n", "36 2015 12 596 0.049683\n", "37 2015 13 539 0.044932\n", "38 2015 14 576 0.048016\n", "39 2015 15 627 0.052267\n", "40 2015 16 681 0.056769\n", "41 2015 17 691 0.057603\n", "42 2015 18 819 0.068273\n", "43 2015 19 766 0.063855\n", "44 2015 20 694 0.057853\n", "45 2015 21 634 0.052851\n", "46 2015 22 616 0.051350\n", "47 2015 23 516 0.043014" ] } ], "prompt_number": 55 }, { "cell_type": "code", "collapsed": false, "input": [ "rounds[:1].T" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
company_permalink /organization/-fame
company_name #fame
company_category_list |Media|
company_market Media
company_country_code IND
company_state_code NaN
company_region Mumbai
company_city Mumbai
funding_round_permalink /funding-round/9a01d05418af9f794eebff7ace91f638
funding_round_type venture
funding_round_code B
funded_at 2015-01-05
funded_month 2015-01
funded_quarter 2015-Q1
funded_year 2015
raised_amount_usd 10,000,000
Unnamed: 16 NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 56, "text": [ " 0\n", "company_permalink /organization/-fame\n", "company_name #fame\n", "company_category_list |Media|\n", "company_market Media\n", "company_country_code IND\n", "company_state_code NaN\n", "company_region Mumbai\n", "company_city Mumbai\n", "funding_round_permalink /funding-round/9a01d05418af9f794eebff7ace91f638\n", "funding_round_type venture\n", "funding_round_code B\n", "funded_at 2015-01-05\n", "funded_month 2015-01\n", "funded_quarter 2015-Q1 \n", "funded_year 2015\n", "raised_amount_usd 10,000,000 \n", "Unnamed: 16 NaN" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use ORDER BY in the window clause to achieve cumulative statistics" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", " select company_permalink, funded_at,\n", " sum(case when raised_amount_usd is null then 0 else raised_amount_usd end) \n", " OVER (PARTITION BY company_permalink ORDER BY funded_at)\n", " as cumulative_raised\n", " from rounds\n", " order by company_permalink;\n", "\"\"\"\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_permalinkfunded_atcumulative_raised
0 /organization/-fame 2015-01-05 10000000
1 /organization/-qounter 2014-03-01 700000
2 /organization/-qounter 2014-10-14 700000
3 /organization/0-6-com 2008-03-19 2000000
4 /organization/004-technologies 2014-07-24 0
5 /organization/01games-technology 2014-07-01 41250
6 /organization/0ndine-biomedical-inc 2009-09-11 43360
7 /organization/0ndine-biomedical-inc 2009-12-21 762851
8 /organization/0xdata 2013-01-03 1700000
9 /organization/0xdata 2014-07-19 10600000
10 /organization/1 2011-07-20 1000050
11 /organization/1 2013-02-05 1000050
12 /organization/1 2014-02-05 1150050
13 /organization/1-2-3-listo 2013-02-18 40000
14 /organization/1-4-all 2013-04-21 0
15 /organization/1-618-technology 2014-01-22 0
16 /organization/1-800-dentist 2010-08-19 0
17 /organization/1-800-doctors 2011-03-02 1750000
18 /organization/10-20-media 2009-06-18 500000
19 /organization/10-20-media 2010-03-30 750000
20 /organization/10-20-media 2011-01-11 1550000
21 /organization/10-20-media 2011-12-28 2050000
22 /organization/10-minutes-with 2013-01-01 400000
23 /organization/10-minutes-with 2014-10-09 4400000
24 /organization/1000-corks 2011-08-23 40000
25 /organization/1000-markets 2009-05-15 500000
26 /organization/1000chi 2010-04-01 43923865
27 /organization/1000jobboersen-de 2011-09-16 0
28 /organization/1000memories 2010-01-01 15000
29 /organization/1000memories 2011-02-16 2535000
............
87131 /organization/zynga 2010-04-24 70213000
87132 /organization/zynga 2010-06-14 370213000
87133 /organization/zynga 2010-10-12 376550786
87134 /organization/zynga 2011-02-18 866550786
87135 /organization/zyngenia 2010-09-09 25000000
87136 /organization/zynstra 2012-03-03 225000
87137 /organization/zynstra 2012-09-18 2550000
87138 /organization/zynstra 2013-09-12 6350000
87139 /organization/zynstra 2014-07-09 14750000
87140 /organization/zyomyx-inc 2010-01-29 555016
87141 /organization/zyomyx-inc 2013-06-30 12555016
87142 /organization/zyomyx-inc 2013-07-11 26775015
87143 /organization/zyomyx-inc 2014-03-26 34275015
87144 /organization/zypsee 2013-08-13 2150000
87145 /organization/zyraz-technology 2008-01-01 2007363
87146 /organization/zyraz-technology 2009-09-11 7407363
87147 /organization/zyraz-technology 2009-10-09 15398910
87148 /organization/zyraz-technology 2013-02-15 15419877
87149 /organization/zyrra 2010-10-21 50000
87150 /organization/zyrra 2010-11-15 495000
87151 /organization/zyrra 2010-12-15 1370000
87152 /organization/zyrra 2012-10-18 1510500
87153 /organization/zytoprotec 2013-01-29 2686600
87154 /organization/zzish 2014-03-24 320000
87155 /organization/zznode-science-and-technology-co... 2012-04-01 1587301
87156 /organization/zzzzapp-com 2011-11-01 7000
87157 /organization/zzzzapp-com 2013-03-19 39360
87158 /organization/zzzzapp-com 2013-08-01 71525
87159 /organization/zzzzapp-com 2014-01-01 84480
87160 /organization/zzzzapp-com 2014-09-10 97398
\n", "

87161 rows \u00d7 3 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 57, "text": [ " company_permalink funded_at \\\n", "0 /organization/-fame 2015-01-05 \n", "1 /organization/-qounter 2014-03-01 \n", "2 /organization/-qounter 2014-10-14 \n", "3 /organization/0-6-com 2008-03-19 \n", "4 /organization/004-technologies 2014-07-24 \n", "5 /organization/01games-technology 2014-07-01 \n", "6 /organization/0ndine-biomedical-inc 2009-09-11 \n", "7 /organization/0ndine-biomedical-inc 2009-12-21 \n", "8 /organization/0xdata 2013-01-03 \n", "9 /organization/0xdata 2014-07-19 \n", "10 /organization/1 2011-07-20 \n", "11 /organization/1 2013-02-05 \n", "12 /organization/1 2014-02-05 \n", "13 /organization/1-2-3-listo 2013-02-18 \n", "14 /organization/1-4-all 2013-04-21 \n", "15 /organization/1-618-technology 2014-01-22 \n", "16 /organization/1-800-dentist 2010-08-19 \n", "17 /organization/1-800-doctors 2011-03-02 \n", "18 /organization/10-20-media 2009-06-18 \n", "19 /organization/10-20-media 2010-03-30 \n", "20 /organization/10-20-media 2011-01-11 \n", "21 /organization/10-20-media 2011-12-28 \n", "22 /organization/10-minutes-with 2013-01-01 \n", "23 /organization/10-minutes-with 2014-10-09 \n", "24 /organization/1000-corks 2011-08-23 \n", "25 /organization/1000-markets 2009-05-15 \n", "26 /organization/1000chi 2010-04-01 \n", "27 /organization/1000jobboersen-de 2011-09-16 \n", "28 /organization/1000memories 2010-01-01 \n", "29 /organization/1000memories 2011-02-16 \n", "... ... ... \n", "87131 /organization/zynga 2010-04-24 \n", "87132 /organization/zynga 2010-06-14 \n", "87133 /organization/zynga 2010-10-12 \n", "87134 /organization/zynga 2011-02-18 \n", "87135 /organization/zyngenia 2010-09-09 \n", "87136 /organization/zynstra 2012-03-03 \n", "87137 /organization/zynstra 2012-09-18 \n", "87138 /organization/zynstra 2013-09-12 \n", "87139 /organization/zynstra 2014-07-09 \n", "87140 /organization/zyomyx-inc 2010-01-29 \n", "87141 /organization/zyomyx-inc 2013-06-30 \n", "87142 /organization/zyomyx-inc 2013-07-11 \n", "87143 /organization/zyomyx-inc 2014-03-26 \n", "87144 /organization/zypsee 2013-08-13 \n", "87145 /organization/zyraz-technology 2008-01-01 \n", "87146 /organization/zyraz-technology 2009-09-11 \n", "87147 /organization/zyraz-technology 2009-10-09 \n", "87148 /organization/zyraz-technology 2013-02-15 \n", "87149 /organization/zyrra 2010-10-21 \n", "87150 /organization/zyrra 2010-11-15 \n", "87151 /organization/zyrra 2010-12-15 \n", "87152 /organization/zyrra 2012-10-18 \n", "87153 /organization/zytoprotec 2013-01-29 \n", "87154 /organization/zzish 2014-03-24 \n", "87155 /organization/zznode-science-and-technology-co... 2012-04-01 \n", "87156 /organization/zzzzapp-com 2011-11-01 \n", "87157 /organization/zzzzapp-com 2013-03-19 \n", "87158 /organization/zzzzapp-com 2013-08-01 \n", "87159 /organization/zzzzapp-com 2014-01-01 \n", "87160 /organization/zzzzapp-com 2014-09-10 \n", "\n", " cumulative_raised \n", "0 10000000 \n", "1 700000 \n", "2 700000 \n", "3 2000000 \n", "4 0 \n", "5 41250 \n", "6 43360 \n", "7 762851 \n", "8 1700000 \n", "9 10600000 \n", "10 1000050 \n", "11 1000050 \n", "12 1150050 \n", "13 40000 \n", "14 0 \n", "15 0 \n", "16 0 \n", "17 1750000 \n", "18 500000 \n", "19 750000 \n", "20 1550000 \n", "21 2050000 \n", "22 400000 \n", "23 4400000 \n", "24 40000 \n", "25 500000 \n", "26 43923865 \n", "27 0 \n", "28 15000 \n", "29 2535000 \n", "... ... \n", "87131 70213000 \n", "87132 370213000 \n", "87133 376550786 \n", "87134 866550786 \n", "87135 25000000 \n", "87136 225000 \n", "87137 2550000 \n", "87138 6350000 \n", "87139 14750000 \n", "87140 555016 \n", "87141 12555016 \n", "87142 26775015 \n", "87143 34275015 \n", "87144 2150000 \n", "87145 2007363 \n", "87146 7407363 \n", "87147 15398910 \n", "87148 15419877 \n", "87149 50000 \n", "87150 495000 \n", "87151 1370000 \n", "87152 1510500 \n", "87153 2686600 \n", "87154 320000 \n", "87155 1587301 \n", "87156 7000 \n", "87157 39360 \n", "87158 71525 \n", "87159 84480 \n", "87160 97398 \n", "\n", "[87161 rows x 3 columns]" ] } ], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bucketing analyses / Histograms\n", "===\n", "- pandas has `cut` (linear / ad hoc binning) and `qcut` (quantile binning) functions\n", "- SQL: must be constructed from primitive operations and window functions" ] }, { "cell_type": "code", "collapsed": false, "input": [ "acquisitions.price_amount.isnull().value_counts()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 58, "text": [ "True 51424\n", "False 3816\n", "dtype: int64" ] } ], "prompt_number": 58 }, { "cell_type": "code", "collapsed": false, "input": [ "acqs_known = acquisitions[acquisitions.price_amount.notnull()]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "acqs_known.price_amount.describe()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 60, "text": [ "count 3.816000e+03\n", "mean 7.182556e+08\n", "std 3.884648e+09\n", "min 1.000000e+00\n", "25% 1.822500e+07\n", "50% 8.295000e+07\n", "75% 3.450000e+08\n", "max 1.500000e+11\n", "Name: price_amount, dtype: float64" ] } ], "prompt_number": 60 }, { "cell_type": "code", "collapsed": false, "input": [ "top = 1000000000.\n", "acqs = acqs_known[(acqs_known.price_amount <= top) & (acqs_known.price_amount > 0)]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 61 }, { "cell_type": "code", "collapsed": false, "input": [ "pd.cut(acqs.price_amount, 4)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 62, "text": [ "2 (-999998.999, 250000000.75]\n", "4 (-999998.999, 250000000.75]\n", "7 (-999998.999, 250000000.75]\n", "22 (-999998.999, 250000000.75]\n", "23 (-999998.999, 250000000.75]\n", "25 (250000000.75, 500000000.5]\n", "31 (500000000.5, 750000000.25]\n", "35 (500000000.5, 750000000.25]\n", "38 (-999998.999, 250000000.75]\n", "39 (-999998.999, 250000000.75]\n", "44 (250000000.75, 500000000.5]\n", "46 (250000000.75, 500000000.5]\n", "53 (-999998.999, 250000000.75]\n", "65 (-999998.999, 250000000.75]\n", "70 (250000000.75, 500000000.5]\n", "...\n", "13458 (-999998.999, 250000000.75]\n", "13463 (-999998.999, 250000000.75]\n", "13471 (-999998.999, 250000000.75]\n", "13472 (500000000.5, 750000000.25]\n", "13473 (-999998.999, 250000000.75]\n", "13475 (-999998.999, 250000000.75]\n", "13476 (-999998.999, 250000000.75]\n", "13480 (-999998.999, 250000000.75]\n", "13483 (-999998.999, 250000000.75]\n", "13488 (-999998.999, 250000000.75]\n", "13491 (-999998.999, 250000000.75]\n", "13495 (-999998.999, 250000000.75]\n", "13496 (250000000.75, 500000000.5]\n", "13505 (-999998.999, 250000000.75]\n", "13506 (-999998.999, 250000000.75]\n", "Name: price_amount, Length: 3400, dtype: category\n", "Categories (4, object): [(-999998.999, 250000000.75] < (250000000.75, 500000000.5] < (500000000.5, 750000000.25] < (750000000.25, 1000000000]]" ] } ], "prompt_number": 62 }, { "cell_type": "code", "collapsed": false, "input": [ "bins = pd.cut(acqs.price_amount, [0, top / 4, top / 2, 3 * top / 4, top])\n", "acqs.groupby(bins).size()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 63, "text": [ "price_amount\n", "(0, 250000000] 2650\n", "(250000000, 500000000] 464\n", "(500000000, 750000000] 188\n", "(750000000, 1000000000] 98\n", "dtype: int64" ] } ], "prompt_number": 63 }, { "cell_type": "code", "collapsed": false, "input": [ "# Get Quantile numbers\n", "(pd.qcut(acqs.price_amount, 4, labels=False) + 1).value_counts()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 64, "text": [ "1 871\n", "3 856\n", "4 844\n", "2 829\n", "dtype: int64" ] } ], "prompt_number": 64 }, { "cell_type": "code", "collapsed": false, "input": [ "quartiles = pd.qcut(acqs.price_amount, 4, labels=False)\n", "grouped = acqs.groupby(quartiles)\n", "\n", "def f(x):\n", " return pd.Series({\n", " 'lb': x.price_amount.min(),\n", " 'ub': x.price_amount.max(),\n", " 'num': len(x)\n", " })\n", "\n", "grouped.apply(f)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lbnumub
0 1 871 15000000
1 15040000 829 59000000
2 60000000 856 210000000
3 212000000 844 1000000000
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 65, "text": [ " lb num ub\n", "0 1 871 15000000\n", "1 15040000 829 59000000\n", "2 60000000 856 210000000\n", "3 212000000 844 1000000000" ] } ], "prompt_number": 65 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "with t0 as ( select \n", " company_permalink, price_amount\n", " from acquisitions a\n", " where price_amount is not null\n", " and price_amount <= 1000000000\n", ")\n", "select t2.bucket, sum(t2.price_amount), count(*)\n", "from (\n", " select t0.company_permalink, t0.price_amount, \n", " ceil((t0.price_amount - lb) / ((ub - lb) / 4)) as bucket\n", " from t0 \n", " cross join (\n", " select min(price_amount) - 1e-14 as lb, max(price_amount) as ub\n", " from t0\n", " ) t1\n", ") t2\n", "group by 1;\n", "\"\"\"\n", "\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bucketsumcount
0 1 1.588106e+11 2650
1 2 1.677637e+11 464
2 3 1.173550e+11 188
3 4 8.616420e+10 98
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 66, "text": [ " bucket sum count\n", "0 1 1.588106e+11 2650\n", "1 2 1.677637e+11 464\n", "2 3 1.173550e+11 188\n", "3 4 8.616420e+10 98" ] } ], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "query = \"\"\"\n", "select bucket, min(price_amount) as lb,\n", " max(price_amount) as ub, count(*)\n", "from (\n", " select \n", " ntile(4) over (order by price_amount) AS bucket,\n", " price_amount\n", " from acquisitions a\n", " where price_amount is not null\n", " and price_amount < 1000000000\n", ") t0\n", "group by 1\n", "order by bucket;\n", "\"\"\"\n", "\n", "sql.read_sql(query, con)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bucketlbubcount
0 1 1 15000000 847
1 2 15000000 57500000 847
2 3 57900000 206000000 847
3 4 206000000 996000000 846
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 67, "text": [ " bucket lb ub count\n", "0 1 1 15000000 847\n", "1 2 15000000 57500000 847\n", "2 3 57900000 206000000 847\n", "3 4 206000000 996000000 846" ] } ], "prompt_number": 67 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }