{ "metadata": { "name": "", "signature": "sha256:37bd0da9afd2725d4ef45643acc51cecc2b827f53c78a16123ec814adede01a9" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "##The Setup\n", "Import pandas and set the [IPython Notebook](http://ipython.org/notebook.html) display settings." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "pd.options.display.max_columns = 5200\n", "pd.options.display.max_rows = 5200\n", "\n", "wk = \"/Users/danielmsheehan/Desktop/\" #Define our workspace" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Download and Unzip a Shapefile\n", "####[From the United States Census Bureau](https://www.census.gov/geo/maps-data/data/tiger-line.html)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#Download the Shapefile\n", "import zipfile\n", "try:\n", "    from urllib.request import urlretrieve #Python 3\n", "except ImportError:\n", "    from urllib import urlretrieve #Python 2\n", "\n", "zipLoc = wk+\"cb_2013_us_state_20m.zip\"\n", "fileURL = \"http://www2.census.gov/geo/tiger/GENZ2013/cb_2013_us_state_20m.zip\"\n", "urlretrieve(fileURL, zipLoc) #get that file\n", "\n", "zf = zipfile.ZipFile(zipLoc) #named zf to avoid shadowing the built-in zip\n", "zf.extractall(wk) #unzip!\n", "zf.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##States Shapefile in [QGIS](http://www2.qgis.org/)\n", "\n", "![states](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/states_qgis.png)\n", "\n", "And now the data table associated with the shapefile (in the .dbf). 
Note: the data (the columns) in a GIS file is often referred to as an 'attribute table.'\n", "\n", "![state attributes](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/states_attribute.png)\n", "\n", "\n", "##[Back to the slides](http://localhost:4000/remark-develop/index_2.html#8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##GDAL/OGR: Conversion from Shapefile to JSON\n", "\n", "Using [GDAL/OGR](http://www.gdal.org/), convert the Shapefile to GeoJSON:\n", "\n", "    ogr2ogr -f GeoJSON /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.json /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/cb_2013_us_state_20m.shp\n", "\n", "Then, after installing Node.js and the `topojson` command-line tool, we can create TopoJSON:\n", "\n", "    topojson /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.topo.json /Users/danielmsheehan/GitHub/d3-presentation/data/census/states/states.json\n", "\n", "More simply, from inside the data directory:\n", "\n", "    topojson states_cln.topo.json states.json\n", "\n", "[So here's this TopoJSON file in a simple plot](http://stat4701-edav-d3.github.io/viz/plot_json_1.html)\n", "\n", "![plot json](https://raw.githubusercontent.com/stat4701-edav-d3/d3-presentation/master/img/plot_json.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##The Google Flu Data\n", "From [here](https://www.google.org/flutrends/us/data.txt) at this [site](https://www.google.org/flutrends/us/#US)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "inFluGoo = 'https://www.google.org/flutrends/us/data.txt' #The online Google Flu Trends .txt\n", "\n", "dfFluGoo = pd.read_csv(inFluGoo, header=11) #Read the Google data into a dataframe; the column names are on row 11 of the file\n", "\n", "dfFluGoo.head(3) #Let's see the data" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ " Date United States Alabama Alaska Arizona Arkansas California \\\n", "0 2003-09-28 902 477 NaN 606 NaN 929 \n", "1 2003-10-05 952 501 NaN 663 NaN 849 \n", "2 2003-10-12 1092 492 NaN 700 NaN 1032 \n", "\n", " Colorado Connecticut Delaware District of Columbia Florida Georgia \\\n", "0 233 223 NaN 927 587 514 \n", "1 251 243 NaN 993 582 532 \n", "2 283 261 NaN 1033 606 557 \n", "\n", " Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine \\\n", "0 NaN NaN 677 544 303 272 420 1017 NaN \n", "1 NaN NaN 732 607 303 270 442 1096 NaN \n", "2 NaN NaN 799 637 312 280 460 1144 NaN \n", "\n", " Maryland Massachusetts Michigan Minnesota Mississippi Missouri \\\n", "0 1268 344 685 484 NaN 349 \n", "1 1374 362 748 514 NaN 359 \n", "2 1445 372 791 588 NaN 381 \n", "\n", " Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York \\\n", "0 NaN NaN NaN NaN 695 NaN 649 \n", "1 NaN NaN NaN NaN 716 NaN 725 \n", "2 NaN NaN NaN NaN 815 NaN 739 \n", "\n", " North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania \\\n", "0 565 NaN 616 1040 409 1186 \n", "1 660 NaN 699 1065 409 1176 \n", "2 861 NaN 729 1122 428 1340 \n", "\n", " Rhode Island South Carolina South Dakota Tennessee Texas Utah \\\n", "0 NaN 462 NaN 551 1398 NaN \n", "1 NaN 478 NaN 597 1517 NaN \n", "2 NaN 521 NaN 670 2010 NaN \n", "\n", " Vermont Virginia Washington West Virginia Wisconsin Wyoming \\\n", "0 NaN 1112 588 NaN 466 NaN \n", "1 NaN 1198 624 NaN 504 NaN \n", "2 NaN 1343 777 NaN 538 NaN \n", "\n", " HHS Region 1 (CT, ME, MA, NH, RI, VT) HHS Region 2 (NJ, NY) \\\n", "0 322 666 \n", "1 381 711 \n", "2 410 819 \n", "\n", " HHS Region 3 (DE, DC, MD, PA, VA, WV) \\\n", "0 1366 \n", "1 1335 \n", "2 1411 \n", "\n", " HHS Region 4 (AL, FL, GA, KY, MS, NC, SC, TN) \\\n", "0 631 \n", "1 652 \n", "2 735 \n", "\n", " HHS Region 5 (IL, IN, MI, MN, OH, WI) HHS Region 6 (AR, LA, NM, OK, TX) \\\n", "0 690 1385 \n", "1 775 1613 
\n", "2 760 2089 \n", "\n", " HHS Region 7 (IA, KS, MO, NE) HHS Region 8 (CO, MT, ND, SD, UT, WY) \\\n", "0 385 266 \n", "1 400 271 \n", "2 422 285 \n", "\n", " HHS Region 9 (AZ, CA, HI, NV) HHS Region 10 (AK, ID, OR, WA) \\\n", "0 878 624 \n", "1 853 688 \n", "2 1102 791 \n", "\n", " Anchorage, AK Birmingham, AL Little Rock, AR Mesa, AZ Phoenix, AZ \\\n", "0 NaN 407 NaN NaN 757 \n", "1 NaN 402 NaN NaN 796 \n", "2 NaN 428 NaN NaN 766 \n", "\n", " Scottsdale, AZ Tempe, AZ Tucson, AZ Berkeley, CA Fresno, CA \\\n", "0 NaN 585 598 NaN NaN \n", "1 NaN 608 674 NaN NaN \n", "2 NaN 629 731 NaN NaN \n", "\n", " Irvine, CA Los Angeles, CA Oakland, CA Sacramento, CA San Diego, CA \\\n", "0 NaN 901 848 448 562 \n", "1 NaN 891 888 436 840 \n", "2 NaN 1165 839 468 938 \n", "\n", " San Francisco, CA San Jose, CA Santa Clara, CA Sunnyvale, CA \\\n", "0 1003 731 990 602 \n", "1 1115 740 915 594 \n", "2 1311 826 989 609 \n", "\n", " Colorado Springs, CO Denver, CO Washington, DC Gainesville, FL \\\n", "0 NaN 235 1153 NaN \n", "1 NaN 270 1310 NaN \n", "2 NaN 257 1309 641 \n", "\n", " Jacksonville, FL Miami, FL Orlando, FL Tampa, FL Atlanta, GA \\\n", "0 NaN 373 609 461 519 \n", "1 NaN 386 663 581 484 \n", "2 NaN 370 615 567 497 \n", "\n", " Roswell, GA Honolulu, HI Des Moines, IA Boise, ID Chicago, IL \\\n", "0 NaN 794 NaN NaN 731 \n", "1 NaN 877 NaN NaN 850 \n", "2 NaN 1030 NaN NaN 799 \n", "\n", " Indianapolis, IN Wichita, KS Lexington, KY Baton Rouge, LA \\\n", "0 641 NaN NaN NaN \n", "1 657 NaN NaN NaN \n", "2 685 NaN NaN NaN \n", "\n", " New Orleans, LA Boston, MA Somerville, MA Baltimore, MD \\\n", "0 1154 314 332 1505 \n", "1 1162 323 375 1535 \n", "2 1274 369 447 1549 \n", "\n", " Grand Rapids, MI St Paul, MN Kansas City, MO Springfield, MO \\\n", "0 NaN 426 330 NaN \n", "1 NaN 423 316 NaN \n", "2 NaN 457 343 NaN \n", "\n", " St Louis, MO Jackson, MS Cary, NC Charlotte, NC Durham, NC \\\n", "0 391 NaN NaN 561 521 \n", "1 397 NaN NaN 673 536 \n", "2 408 NaN NaN 738 521 \n", 
"\n", " Greensboro, NC Raleigh, NC Lincoln, NE Omaha, NE Newark, NJ \\\n", "0 NaN 503 NaN 314 540 \n", "1 NaN 586 NaN 331 549 \n", "2 NaN 838 NaN 373 575 \n", "\n", " Albuquerque, NM Las Vegas, NV Reno, NV Albany, NY Buffalo, NY \\\n", "0 NaN 843 NaN 505 NaN \n", "1 NaN 831 NaN 508 NaN \n", "2 1068 824 NaN 555 NaN \n", "\n", " New York, NY Rochester, NY Cleveland, OH Columbus, OH Dayton, OH \\\n", "0 579 406 466 437 NaN \n", "1 730 483 535 415 NaN \n", "2 652 476 671 442 NaN \n", "\n", " Oklahoma City, OK Tulsa, OK Beaverton, OR Eugene, OR Portland, OR \\\n", "0 924 1034 NaN NaN 444 \n", "1 894 1042 NaN NaN 471 \n", "2 922 1089 NaN NaN 574 \n", "\n", " Philadelphia, PA Pittsburgh, PA State College, PA Providence, RI \\\n", "0 1204 1122 NaN NaN \n", "1 1124 1193 NaN NaN \n", "2 1249 1306 NaN NaN \n", "\n", " Columbia, SC Greenville, SC Knoxville, TN Memphis, TN Nashville, TN \\\n", "0 NaN NaN NaN NaN 425 \n", "1 NaN NaN NaN NaN 468 \n", "2 NaN NaN NaN NaN 497 \n", "\n", " Austin, TX Dallas, TX Ft Worth, TX Houston, TX Irving, TX Lubbock, TX \\\n", "0 1150 1200 NaN 1412 1122 NaN \n", "1 1331 1487 NaN 2057 1208 NaN \n", "2 1492 1869 NaN 3770 1191 NaN \n", "\n", " Plano, TX San Antonio, TX Salt Lake City, UT Arlington, VA Norfolk, VA \\\n", "0 NaN 986 261 1066 948 \n", "1 NaN 989 249 1249 963 \n", "2 NaN 1463 295 1289 970 \n", "\n", " Reston, VA Richmond, VA Bellevue, WA Seattle, WA Spokane, WA \\\n", "0 NaN 1035 NaN 668 NaN \n", "1 NaN 1135 NaN 787 NaN \n", "2 NaN 1170 NaN 994 NaN \n", "\n", " Madison, WI Milwaukee, WI \n", "0 622 452 \n", "1 626 449 \n", "2 661 437 " ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Getting Population Data from the US Census API" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#Need to get US State Population Dataset to create a Flu Rate. 
\n", "dfPop = pd.read_json('http://api.census.gov/data/2010/sf1?key=30699f15ab4d04a1e0943715b539d256c9a3ee44&get=P0010001&for=state')\n", "\n", "#The first row of the API response holds the column names, so drop it\n", "dfPop = dfPop.iloc[1:]\n", "\n", "dfPop.columns = ['pop', 'fips']\n", "\n", "print(dfPop.head(5))\n", "\n", "dfFIPS = pd.read_csv('https://raw.githubusercontent.com/nygeog/data/master/census/state_fips.csv', dtype={'fips':object})\n", "\n", "print(dfFIPS.head(5))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " pop fips\n", "1 4779736 01\n", "2 710231 02\n", "3 6392017 04\n", "4 2915918 05\n", "5 37253956 06\n", " name abbrev fips info\n", "0 Alabama AL 01 State; counties\n", "1 Alaska AK 02 State; boroughs\n", "2 American Samoa AS 60 Outlying area under U.S. sovereignty\n", "3 American Samoa * NaN 03 (FIPS 5-1 reserved code)\n", "4 Arizona AZ 04 State; counties" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "#NOTE: dfS (the state-level flu table) is not defined in the cells shown here;\n", "#both dfS and dfFIPS need a 'state' column for this merge to work\n", "df = dfS.merge(dfFIPS, how='left', on='state')\n", "\n", "df = df.merge(dfPop, how='left', on='fips') #attach the 2010 population by state FIPS code\n", "\n", "df.head(10)\n", "\n", "df.to_csv('/Users/danielmsheehan/GitHub/d3-presentation/data/flu/google/states_pop_week.csv', index=False)\n", "#print list(df.columns.values)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "df = pd.read_csv('/Users/danielmsheehan/GitHub/d3-presentation/data/flu/google/states_pop_week.csv', dtype={'fips':object})\n", "\n", "#df['pop'] = df['pop'].astype(float)\n", "\n", "#print df.dtypes\n", "\n", "#weekList is defined in the last cell of this notebook; run that assignment first\n", "for i in weekList:\n", "    df['rate'+i] = df[i]/df['pop'] * 10000 #flu value per 10,000 residents\n", "\n", "df = df.drop(weekList, axis=1) #keep only the rate columns\n", "\n", "df.head(53)\n", "\n", "df.to_csv('/Users/danielmsheehan/GitHub/d3-presentation/data/flu/google/states_pop_week_rate_state.csv', index=False)" ], "language": "python", "metadata": {}, "outputs": [] }, { 
"cell_type": "code", "collapsed": false, "input": [ "#Get a list of County FIPS\n", "dfCounties = pd.read_csv('http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt', header=None, dtype={1:object,2:object})\n", "dfCounties.columns = ['state_abbrev', 'state_fips','county_fips','fullname','note']\n", "\n", "dfCounties['fips'] = dfCounties['state_fips']\n", "\n", "dfCounties.head(5)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "dfAll = dfCounties.merge(df, on='fips', how='left')\n", "\n", "dfAll['id'] = dfAll.state_fips.map(str) + dfAll.county_fips.map(str)\n", "\n", "dfAll['id'] = dfAll['id'].astype(int)\n", "\n", "dfAll.head(10)\n", "\n", "dfAll.to_csv('/Users/danielmsheehan/GitHub/d3-presentation/data/flu/google/states_pop_week_rates_county.csv', index=False)\n", "\n", "weekList = ['2003-09-28','2003-10-05','2003-10-12','2003-10-19','2003-10-26','2003-11-02','2003-11-09','2003-11-16','2003-11-23','2003-11-30','2003-12-07','2003-12-14','2003-12-21','2003-12-28','2004-01-04','2004-01-11','2004-01-18','2004-01-25','2004-02-01','2004-02-08','2004-02-15','2004-02-22','2004-02-29','2004-03-07','2004-03-14','2004-03-21','2004-03-28','2004-04-04','2004-04-11','2004-04-18','2004-04-25','2004-05-02','2004-05-09','2004-05-16','2004-05-23','2004-05-30','2004-06-06','2004-06-13','2004-06-20','2004-06-27','2004-07-04','2004-07-11','2004-07-18','2004-07-25','2004-08-01','2004-08-08','2004-08-15','2004-08-22','2004-08-29','2004-09-05','2004-09-12','2004-09-19','2004-09-26','2004-10-03','2004-10-10','2004-10-17','2004-10-24','2004-10-31','2004-11-07','2004-11-14','2004-11-21','2004-11-28','2004-12-05','2004-12-12','2004-12-19','2004-12-26','2005-01-02','2005-01-09','2005-01-16','2005-01-23','2005-01-30','2005-02-06','2005-02-13','2005-02-20','2005-02-27','2005-03-06','2005-03-13','2005-03-20','2005-03-27','2005-04-03','2005-04-10','2005-04-17','2005-04-24','2005-05-01','2005-05-08','2005-05-15','
2005-05-22','2005-05-29','2005-06-05','2005-06-12','2005-06-19','2005-06-26','2005-07-03','2005-07-10','2005-07-17','2005-07-24','2005-07-31','2005-08-07','2005-08-14','2005-08-21','2005-08-28','2005-09-04','2005-09-11','2005-09-18','2005-09-25','2005-10-02','2005-10-09','2005-10-16','2005-10-23','2005-10-30','2005-11-06','2005-11-13','2005-11-20','2005-11-27','2005-12-04','2005-12-11','2005-12-18','2005-12-25','2006-01-01','2006-01-08','2006-01-15','2006-01-22','2006-01-29','2006-02-05','2006-02-12','2006-02-19','2006-02-26','2006-03-05','2006-03-12','2006-03-19','2006-03-26','2006-04-02','2006-04-09','2006-04-16','2006-04-23','2006-04-30','2006-05-07','2006-05-14','2006-05-21','2006-05-28','2006-06-04','2006-06-11','2006-06-18','2006-06-25','2006-07-02','2006-07-09','2006-07-16','2006-07-23','2006-07-30','2006-08-06','2006-08-13','2006-08-20','2006-08-27','2006-09-03','2006-09-10','2006-09-17','2006-09-24','2006-10-01','2006-10-08','2006-10-15','2006-10-22','2006-10-29','2006-11-05','2006-11-12','2006-11-19','2006-11-26','2006-12-03','2006-12-10','2006-12-17','2006-12-24','2006-12-31','2007-01-07','2007-01-14','2007-01-21','2007-01-28','2007-02-04','2007-02-11','2007-02-18','2007-02-25','2007-03-04','2007-03-11','2007-03-18','2007-03-25','2007-04-01','2007-04-08','2007-04-15','2007-04-22','2007-04-29','2007-05-06','2007-05-13','2007-05-20','2007-05-27','2007-06-03','2007-06-10','2007-06-17','2007-06-24','2007-07-01','2007-07-08','2007-07-15','2007-07-22','2007-07-29','2007-08-05','2007-08-12','2007-08-19','2007-08-26','2007-09-02','2007-09-09','2007-09-16','2007-09-23','2007-09-30','2007-10-07','2007-10-14','2007-10-21','2007-10-28','2007-11-04','2007-11-11','2007-11-18','2007-11-25','2007-12-02','2007-12-09','2007-12-16','2007-12-23','2007-12-30','2008-01-06','2008-01-13','2008-01-20','2008-01-27','2008-02-03','2008-02-10','2008-02-17','2008-02-24','2008-03-02','2008-03-09','2008-03-16','2008-03-23','2008-03-30','2008-04-06','2008-04-13','2008-04-20','2008-04-27'
,'2008-05-04','2008-05-11','2008-05-18','2008-05-25','2008-06-01','2008-06-08','2008-06-15','2008-06-22','2008-06-29','2008-07-06','2008-07-13','2008-07-20','2008-07-27','2008-08-03','2008-08-10','2008-08-17','2008-08-24','2008-08-31','2008-09-07','2008-09-14','2008-09-21','2008-09-28','2008-10-05','2008-10-12','2008-10-19','2008-10-26','2008-11-02','2008-11-09','2008-11-16','2008-11-23','2008-11-30','2008-12-07','2008-12-14','2008-12-21','2008-12-28','2009-01-04','2009-01-11','2009-01-18','2009-01-25','2009-02-01','2009-02-08','2009-02-15','2009-02-22','2009-03-01','2009-03-08','2009-03-15','2009-03-22','2009-03-29','2009-04-05','2009-04-12','2009-04-19','2009-04-26','2009-05-03','2009-05-10','2009-05-17','2009-05-24','2009-05-31','2009-06-07','2009-06-14','2009-06-21','2009-06-28','2009-07-05','2009-07-12','2009-07-19','2009-07-26','2009-08-02','2009-08-09','2009-08-16','2009-08-23','2009-08-30','2009-09-06','2009-09-13','2009-09-20','2009-09-27','2009-10-04','2009-10-11','2009-10-18','2009-10-25','2009-11-01','2009-11-08','2009-11-15','2009-11-22','2009-11-29','2009-12-06','2009-12-13','2009-12-20','2009-12-27','2010-01-03','2010-01-10','2010-01-17','2010-01-24','2010-01-31','2010-02-07','2010-02-14','2010-02-21','2010-02-28','2010-03-07','2010-03-14','2010-03-21','2010-03-28','2010-04-04','2010-04-11','2010-04-18','2010-04-25','2010-05-02','2010-05-09','2010-05-16','2010-05-23','2010-05-30','2010-06-06','2010-06-13','2010-06-20','2010-06-27','2010-07-04','2010-07-11','2010-07-18','2010-07-25','2010-08-01','2010-08-08','2010-08-15','2010-08-22','2010-08-29','2010-09-05','2010-09-12','2010-09-19','2010-09-26','2010-10-03','2010-10-10','2010-10-17','2010-10-24','2010-10-31','2010-11-07','2010-11-14','2010-11-21','2010-11-28','2010-12-05','2010-12-12','2010-12-19','2010-12-26','2011-01-02','2011-01-09','2011-01-16','2011-01-23','2011-01-30','2011-02-06','2011-02-13','2011-02-20','2011-02-27','2011-03-06','2011-03-13','2011-03-20','2011-03-27','2011-04-03','2011-04-1
0','2011-04-17','2011-04-24','2011-05-01','2011-05-08','2011-05-15','2011-05-22','2011-05-29','2011-06-05','2011-06-12','2011-06-19','2011-06-26','2011-07-03','2011-07-10','2011-07-17','2011-07-24','2011-07-31','2011-08-07','2011-08-14','2011-08-21','2011-08-28','2011-09-04','2011-09-11','2011-09-18','2011-09-25','2011-10-02','2011-10-09','2011-10-16','2011-10-23','2011-10-30','2011-11-06','2011-11-13','2011-11-20','2011-11-27','2011-12-04','2011-12-11','2011-12-18','2011-12-25','2012-01-01','2012-01-08','2012-01-15','2012-01-22','2012-01-29','2012-02-05','2012-02-12','2012-02-19','2012-02-26','2012-03-04','2012-03-11','2012-03-18','2012-03-25','2012-04-01','2012-04-08','2012-04-15','2012-04-22','2012-04-29','2012-05-06','2012-05-13','2012-05-20','2012-05-27','2012-06-03','2012-06-10','2012-06-17','2012-06-24','2012-07-01','2012-07-08','2012-07-15','2012-07-22','2012-07-29','2012-08-05','2012-08-12','2012-08-19','2012-08-26','2012-09-02','2012-09-09','2012-09-16','2012-09-23','2012-09-30','2012-10-07','2012-10-14','2012-10-21','2012-10-28','2012-11-04','2012-11-11','2012-11-18','2012-11-25','2012-12-02','2012-12-09','2012-12-16','2012-12-23','2012-12-30','2013-01-06','2013-01-13','2013-01-20','2013-01-27','2013-02-03','2013-02-10','2013-02-17','2013-02-24','2013-03-03','2013-03-10','2013-03-17','2013-03-24','2013-03-31','2013-04-07','2013-04-14','2013-04-21','2013-04-28','2013-05-05','2013-05-12','2013-05-19','2013-05-26','2013-06-02','2013-06-09','2013-06-16','2013-06-23','2013-06-30','2013-07-07','2013-07-14','2013-07-21','2013-07-28','2013-08-04','2013-08-11','2013-08-18','2013-08-25','2013-09-01','2013-09-08','2013-09-15','2013-09-22','2013-09-29','2013-10-06','2013-10-13','2013-10-20','2013-10-27','2013-11-03','2013-11-10','2013-11-17','2013-11-24','2013-12-01','2013-12-08','2013-12-15','2013-12-22','2013-12-29','2014-01-05','2014-01-12','2014-01-19','2014-01-26','2014-02-02','2014-02-09','2014-02-16','2014-02-23','2014-03-02','2014-03-09','2014-03-16','2014-03
-23','2014-03-30','2014-04-06','2014-04-13','2014-04-20','2014-04-27','2014-05-04','2014-05-11','2014-05-18','2014-05-25','2014-06-01','2014-06-08','2014-06-15','2014-06-22','2014-06-29','2014-07-06','2014-07-13','2014-07-20','2014-07-27','2014-08-03','2014-08-10','2014-08-17','2014-08-24','2014-08-31','2014-09-07','2014-09-14','2014-09-21','2014-09-28','2014-10-05','2014-10-12','2014-10-19','2014-10-26','2014-11-02','2014-11-09','2014-11-16','2014-11-23','2014-11-30','2014-12-07','2014-12-14','2014-12-21','2014-12-28','2015-01-04','2015-01-11','2015-01-18','2015-01-25','2015-02-01','2015-02-08','2015-02-15','2015-02-22','2015-03-01']\n", "\n", "for i in weekList:\n", " #print i\n", " dfSamp = dfAll[['id','rate'+i]]\n", " dfSamp.columns = ['id', 'rate']\n", " dfSamp.to_csv('/Users/danielmsheehan/GitHub/stat4701-edav-d3.github.com/viz/choropleth/pages/data/rate-'+i+'.tsv', index=False, sep='\\t')" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }