{ "metadata": { "name": "", "signature": "sha256:37bd0da9afd2725d4ef45643acc51cecc2b827f53c78a16123ec814adede01a9" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "##The Setup\n", "Import Pandas and set [IPython Notebook](http://ipython.org/notebook.html) display settings. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "# Show wide and long tables in full instead of truncating them\n", "pd.options.display.max_columns = 5200\n", "pd.options.display.max_rows = 5200\n", "\n", "wk = \"/Users/danielmsheehan/Desktop/\"  # Define our workspace" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Download and Unzip a Shapefile \n", "####[From the United States Census Bureau](https://www.census.gov/geo/maps-data/data/tiger-line.html)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Download the Shapefile archive, then unzip it into the workspace\n", "import urllib\n", "import zipfile\n", "\n", "zipLoc = wk + \"cb_2013_us_state_20m.zip\"\n", "fileURL = \"http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_state_20m.zip\"\n", "urllib.urlretrieve(fileURL, zipLoc)  # get that file (Python 2 API; Python 3 uses urllib.request.urlretrieve)\n", "\n", "zf = zipfile.ZipFile(zipLoc)  # named zf to avoid shadowing the built-in zip()\n", "zf.extractall(wk)  # unzip!\n", "zf.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##States Shapefile in [QGIS](http://www2.qgis.org/)\n", "\n", "![states](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/states_qgis.png)\n", "\n", "And now the data table associated with the shapefile (stored in the .dbf). 
Note: the data table in a GIS file is often referred to as an 'attribute table.'\n", "\n", "![state attributes](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/states_attribute.png)\n", "\n", "\n", "##[Back to the slides](http://localhost:4000/remark-develop/index_2.html#8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##GDAL/OGR: Conversion from Shapefile to JSON\n", "\n", "Using [GDAL/OGR](http://www.gdal.org/), convert the Shapefile to GeoJSON. The output file comes first, then the input: \n", "\n", "    ogr2ogr -f GeoJSON /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.json /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/cb_2013_us_state_20m.shp\n", "\n", "Then, after installing Node.js and the `topojson` command-line tool, we can create TopoJSON. The `-o` flag names the output file; without it, the 1.x `topojson` tool writes to standard output:\n", "\n", "    topojson -o /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.topo.json /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.json\n", "\n", "More simply, from inside the data directory:\n", "\n", "    topojson -o states_cln.topo.json states.json\n", "\n", "[So here's this TopoJSON file in a simple plot](http://stat4701-edav-d3.github.io/viz/plot_json_1.html) \n", " \n", "![plot json](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/plot_json.png) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##The Google Flu Data\n", "The raw data file is [here](https://www.google.org/flutrends/us/data.txt), published on the [Google Flu Trends site](https://www.google.org/flutrends/us/#US)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "inFluGoo = 'https://www.google.org/flutrends/us/data.txt'  # The online Google Flu Trends .txt\n", "\n", "# header=11 skips the 11-line text preamble and uses the next line as the column names\n", "dfFluGoo = pd.read_csv(inFluGoo, header=11)  # Let's read the Google data into a dataframe\n", "\n", "dfFluGoo.head(3)  # Let's see the first three rows" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", " | Date | \n", "United States | \n", "Alabama | \n", "Alaska | \n", "Arizona | \n", "Arkansas | \n", "California | \n", "Colorado | \n", "Connecticut | \n", "Delaware | \n", "District of Columbia | \n", "Florida | \n", "Georgia | \n", "Hawaii | \n", "Idaho | \n", "Illinois | \n", "Indiana | \n", "Iowa | \n", "Kansas | \n", "Kentucky | \n", "Louisiana | \n", "Maine | \n", "Maryland | \n", "Massachusetts | \n", "Michigan | \n", "Minnesota | \n", "Mississippi | \n", "Missouri | \n", "Montana | \n", "Nebraska | \n", "Nevada | \n", "New Hampshire | \n", "New Jersey | \n", "New Mexico | \n", "New York | \n", "North Carolina | \n", "North Dakota | \n", "Ohio | \n", "Oklahoma | \n", "Oregon | \n", "Pennsylvania | \n", "Rhode Island | \n", "South Carolina | \n", "South Dakota | \n", "Tennessee | \n", "Texas | \n", "Utah | \n", "Vermont | \n", "Virginia | \n", "Washington | \n", "West Virginia | \n", "Wisconsin | \n", "Wyoming | \n", "HHS Region 1 (CT, ME, MA, NH, RI, VT) | \n", "HHS Region 2 (NJ, NY) | \n", "HHS Region 3 (DE, DC, MD, PA, VA, WV) | \n", "HHS Region 4 (AL, FL, GA, KY, MS, NC, SC, TN) | \n", "HHS Region 5 (IL, IN, MI, MN, OH, WI) | \n", "HHS Region 6 (AR, LA, NM, OK, TX) | \n", "HHS Region 7 (IA, KS, MO, NE) | \n", "HHS Region 8 (CO, MT, ND, SD, UT, WY) | \n", "HHS Region 9 (AZ, CA, HI, NV) | \n", "HHS Region 10 (AK, ID, OR, WA) | \n", "Anchorage, AK | \n", "Birmingham, AL | \n", "Little Rock, AR | \n", "Mesa, AZ | \n", "Phoenix, AZ | \n", "Scottsdale, AZ | \n", "Tempe, AZ | \n", "Tucson, AZ | \n", "Berkeley, CA | \n", "Fresno, CA | \n", "Irvine, CA | \n", "Los Angeles, CA | \n", "Oakland, CA | \n", "Sacramento, CA | \n", "San Diego, CA | \n", "San Francisco, CA | \n", "San Jose, CA | \n", "Santa Clara, CA | \n", "Sunnyvale, CA | \n", "Colorado Springs, CO | \n", "Denver, CO | \n", "Washington, DC | \n", "Gainesville, FL | \n", "Jacksonville, FL | \n", "Miami, FL | \n", "Orlando, FL | \n", "Tampa, FL | \n", "Atlanta, GA | \n", "Roswell, GA | \n", 
"Honolulu, HI | \n", "Des Moines, IA | \n", "Boise, ID | \n", "Chicago, IL | \n", "Indianapolis, IN | \n", "Wichita, KS | \n", "Lexington, KY | \n", "Baton Rouge, LA | \n", "New Orleans, LA | \n", "Boston, MA | \n", "Somerville, MA | \n", "Baltimore, MD | \n", "Grand Rapids, MI | \n", "St Paul, MN | \n", "Kansas City, MO | \n", "Springfield, MO | \n", "St Louis, MO | \n", "Jackson, MS | \n", "Cary, NC | \n", "Charlotte, NC | \n", "Durham, NC | \n", "Greensboro, NC | \n", "Raleigh, NC | \n", "Lincoln, NE | \n", "Omaha, NE | \n", "Newark, NJ | \n", "Albuquerque, NM | \n", "Las Vegas, NV | \n", "Reno, NV | \n", "Albany, NY | \n", "Buffalo, NY | \n", "New York, NY | \n", "Rochester, NY | \n", "Cleveland, OH | \n", "Columbus, OH | \n", "Dayton, OH | \n", "Oklahoma City, OK | \n", "Tulsa, OK | \n", "Beaverton, OR | \n", "Eugene, OR | \n", "Portland, OR | \n", "Philadelphia, PA | \n", "Pittsburgh, PA | \n", "State College, PA | \n", "Providence, RI | \n", "Columbia, SC | \n", "Greenville, SC | \n", "Knoxville, TN | \n", "Memphis, TN | \n", "Nashville, TN | \n", "Austin, TX | \n", "Dallas, TX | \n", "Ft Worth, TX | \n", "Houston, TX | \n", "Irving, TX | \n", "Lubbock, TX | \n", "Plano, TX | \n", "San Antonio, TX | \n", "Salt Lake City, UT | \n", "Arlington, VA | \n", "Norfolk, VA | \n", "Reston, VA | \n", "Richmond, VA | \n", "Bellevue, WA | \n", "Seattle, WA | \n", "Spokane, WA | \n", "Madison, WI | \n", "Milwaukee, WI | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "2003-09-28 | \n", "902 | \n", "477 | \n", "NaN | \n", "606 | \n", "NaN | \n", "929 | \n", "233 | \n", "223 | \n", "NaN | \n", "927 | \n", "587 | \n", "514 | \n", "NaN | \n", "NaN | \n", "677 | \n", "544 | \n", "303 | \n", "272 | \n", "420 | \n", "1017 | \n", "NaN | \n", "1268 | \n", "344 | \n", "685 | \n", "484 | \n", "NaN | \n", "349 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "695 | \n", "NaN | \n", "649 | \n", "565 | \n", "NaN | \n", "616 | \n", "1040 | \n", "409 | \n", "1186 | \n", "NaN | \n", "462 | \n", "NaN | \n", "551 | \n", "1398 | \n", "NaN | \n", "NaN | \n", "1112 | \n", "588 | \n", "NaN | \n", "466 | \n", "NaN | \n", "322 | \n", "666 | \n", "1366 | \n", "631 | \n", "690 | \n", "1385 | \n", "385 | \n", "266 | \n", "878 | \n", "624 | \n", "NaN | \n", "407 | \n", "NaN | \n", "NaN | \n", "757 | \n", "NaN | \n", "585 | \n", "598 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "901 | \n", "848 | \n", "448 | \n", "562 | \n", "1003 | \n", "731 | \n", "990 | \n", "602 | \n", "NaN | \n", "235 | \n", "1153 | \n", "NaN | \n", "NaN | \n", "373 | \n", "609 | \n", "461 | \n", "519 | \n", "NaN | \n", "794 | \n", "NaN | \n", "NaN | \n", "731 | \n", "641 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "1154 | \n", "314 | \n", "332 | \n", "1505 | \n", "NaN | \n", "426 | \n", "330 | \n", "NaN | \n", "391 | \n", "NaN | \n", "NaN | \n", "561 | \n", "521 | \n", "NaN | \n", "503 | \n", "NaN | \n", "314 | \n", "540 | \n", "NaN | \n", "843 | \n", "NaN | \n", "505 | \n", "NaN | \n", "579 | \n", "406 | \n", "466 | \n", "437 | \n", "NaN | \n", "924 | \n", "1034 | \n", "NaN | \n", "NaN | \n", "444 | \n", "1204 | \n", "1122 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "425 | \n", "1150 | \n", "1200 | \n", "NaN | \n", "1412 | \n", "1122 | \n", "NaN | \n", "NaN | \n", "986 | \n", "261 | \n", "1066 | \n", "948 | \n", "NaN | \n", "1035 | \n", "NaN | \n", "668 | \n", "NaN | \n", "622 | \n", "452 | \n", "
1 | \n", "2003-10-05 | \n", "952 | \n", "501 | \n", "NaN | \n", "663 | \n", "NaN | \n", "849 | \n", "251 | \n", "243 | \n", "NaN | \n", "993 | \n", "582 | \n", "532 | \n", "NaN | \n", "NaN | \n", "732 | \n", "607 | \n", "303 | \n", "270 | \n", "442 | \n", "1096 | \n", "NaN | \n", "1374 | \n", "362 | \n", "748 | \n", "514 | \n", "NaN | \n", "359 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "716 | \n", "NaN | \n", "725 | \n", "660 | \n", "NaN | \n", "699 | \n", "1065 | \n", "409 | \n", "1176 | \n", "NaN | \n", "478 | \n", "NaN | \n", "597 | \n", "1517 | \n", "NaN | \n", "NaN | \n", "1198 | \n", "624 | \n", "NaN | \n", "504 | \n", "NaN | \n", "381 | \n", "711 | \n", "1335 | \n", "652 | \n", "775 | \n", "1613 | \n", "400 | \n", "271 | \n", "853 | \n", "688 | \n", "NaN | \n", "402 | \n", "NaN | \n", "NaN | \n", "796 | \n", "NaN | \n", "608 | \n", "674 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "891 | \n", "888 | \n", "436 | \n", "840 | \n", "1115 | \n", "740 | \n", "915 | \n", "594 | \n", "NaN | \n", "270 | \n", "1310 | \n", "NaN | \n", "NaN | \n", "386 | \n", "663 | \n", "581 | \n", "484 | \n", "NaN | \n", "877 | \n", "NaN | \n", "NaN | \n", "850 | \n", "657 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "1162 | \n", "323 | \n", "375 | \n", "1535 | \n", "NaN | \n", "423 | \n", "316 | \n", "NaN | \n", "397 | \n", "NaN | \n", "NaN | \n", "673 | \n", "536 | \n", "NaN | \n", "586 | \n", "NaN | \n", "331 | \n", "549 | \n", "NaN | \n", "831 | \n", "NaN | \n", "508 | \n", "NaN | \n", "730 | \n", "483 | \n", "535 | \n", "415 | \n", "NaN | \n", "894 | \n", "1042 | \n", "NaN | \n", "NaN | \n", "471 | \n", "1124 | \n", "1193 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "468 | \n", "1331 | \n", "1487 | \n", "NaN | \n", "2057 | \n", "1208 | \n", "NaN | \n", "NaN | \n", "989 | \n", "249 | \n", "1249 | \n", "963 | \n", "NaN | \n", "1135 | \n", "NaN | \n", "787 | \n", "NaN | \n", "626 | \n", "449 | \n", "
2 | \n", "2003-10-12 | \n", "1092 | \n", "492 | \n", "NaN | \n", "700 | \n", "NaN | \n", "1032 | \n", "283 | \n", "261 | \n", "NaN | \n", "1033 | \n", "606 | \n", "557 | \n", "NaN | \n", "NaN | \n", "799 | \n", "637 | \n", "312 | \n", "280 | \n", "460 | \n", "1144 | \n", "NaN | \n", "1445 | \n", "372 | \n", "791 | \n", "588 | \n", "NaN | \n", "381 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "815 | \n", "NaN | \n", "739 | \n", "861 | \n", "NaN | \n", "729 | \n", "1122 | \n", "428 | \n", "1340 | \n", "NaN | \n", "521 | \n", "NaN | \n", "670 | \n", "2010 | \n", "NaN | \n", "NaN | \n", "1343 | \n", "777 | \n", "NaN | \n", "538 | \n", "NaN | \n", "410 | \n", "819 | \n", "1411 | \n", "735 | \n", "760 | \n", "2089 | \n", "422 | \n", "285 | \n", "1102 | \n", "791 | \n", "NaN | \n", "428 | \n", "NaN | \n", "NaN | \n", "766 | \n", "NaN | \n", "629 | \n", "731 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "1165 | \n", "839 | \n", "468 | \n", "938 | \n", "1311 | \n", "826 | \n", "989 | \n", "609 | \n", "NaN | \n", "257 | \n", "1309 | \n", "641 | \n", "NaN | \n", "370 | \n", "615 | \n", "567 | \n", "497 | \n", "NaN | \n", "1030 | \n", "NaN | \n", "NaN | \n", "799 | \n", "685 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "1274 | \n", "369 | \n", "447 | \n", "1549 | \n", "NaN | \n", "457 | \n", "343 | \n", "NaN | \n", "408 | \n", "NaN | \n", "NaN | \n", "738 | \n", "521 | \n", "NaN | \n", "838 | \n", "NaN | \n", "373 | \n", "575 | \n", "1068 | \n", "824 | \n", "NaN | \n", "555 | \n", "NaN | \n", "652 | \n", "476 | \n", "671 | \n", "442 | \n", "NaN | \n", "922 | \n", "1089 | \n", "NaN | \n", "NaN | \n", "574 | \n", "1249 | \n", "1306 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "497 | \n", "1492 | \n", "1869 | \n", "NaN | \n", "3770 | \n", "1191 | \n", "NaN | \n", "NaN | \n", "1463 | \n", "295 | \n", "1289 | \n", "970 | \n", "NaN | \n", "1170 | \n", "NaN | \n", "994 | \n", "NaN | \n", "661 | \n", "437 | \n", "