{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Digital Equity Data & Jupyter Notebooks\n", "Eleanor Tutt | tutte@carnegielibrary.org | Code for PGH Work Night | June 13, 2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Welcome!\n", "\n", "This is the same Jupyter notebook I will talk through. You can also use it as a starting point to explore digital equity data during the second half of the work night.\n", "\n", "I have been working with Jupyter notebooks on and off for several years, but I still consider myself a beginner and this talk is designed for beginners!\n", "\n", "I'll leave lots of time for you to explore Jupyter notebooks, python, and digital equity data using this notebook as a starting point - you can slow down or speed up then, preferably in small groups with new friends. :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some helpful resources\n", "\n", "#### Jupyter Notebooks\n", "\n", "* http://jupyter.org/\n", "* https://mybinder.org/\n", "\n", "#### Python / Pandas\n", "\n", "* http://pandas.pydata.org/pandas-docs/stable/10min.html\n", "* https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe\n", "* https://chrisalbon.com/#python\n", "\n", "#### Folium\n", "\n", "* https://python-visualization.github.io/folium/quickstart.html\n", "* http://mattgoldwasser.com/posts/choroplot/\n", "* http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb?flush_cache=true\n", "\n", "\n", "#### Digital Equity Data\n", "\n", "* https://data.wprdc.org/dataset/libraries\n", "* https://data.wprdc.org/dataset/clp-public-wifi\n", "* https://www.fcc.gov/reports-research/maps/\n", "* https://data.wprdc.org/dataset/pittsburgh-internet-service-providers-by-block-june-2016" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Jupyter toolbar" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# No code for this part - just talk!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up your notebook" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# if you want to use any python packages, add them at the beginning of your notebook\n", "# packages allow you to take \"shortcuts\" instead of coding everything yourself\n", "# like if you are making a pizza and you buy dough instead of making it totally from scratch\n", "\n", "# pandas is a python package popular with data scientists\n", "# is common to shorten pandas to pd when you import it\n", "import pandas as pd\n", "\n", "# folium is a python package that helps create interactive maps using leaflet\n", "import folium\n", "\n", "# later we are going to read in some geojson\n", "import simplejson\n", "\n", "# and this will help us with some colors for a choropleth map\n", "from branca.colormap import linear" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing data" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# pandas has a function called read_csv that can import csv data and store it as a dataframe\n", "# a dataframe is a common pandas object that stores information in rows and columns\n", "# you can think of it a like a spreadsheet, especially if you are using .csv data\n", "\n", "# here we are creating a dataframe named libraries from a previously downloaded csv file\n", "# original data from https://data.wprdc.org/dataset/libraries\n", "# you could also link directly to the wprdc download url in your code:\n", "# https://data.wprdc.org/datastore/dump/14babf3f-4932-4828-8b49-3c9a03bae6d0\n", "\n", "libraries = pd.read_csv(\"librarylocations.csv\")\n", "\n", "# here is library wifi use data - brand new on the WPRDC!\n", "# original data from https://data.wprdc.org/dataset/clp-public-wifi\n", "\n", "wifi = pd.read_csv(\"clp-public-wifi.csv\") \n", "\n", "# here we are creating a dataframe named fcc from previously downloaded data\n", "# original data from \n", "# we added the dype parameter to make sure tract numbers are imported as text and not numbers\n", "\n", "fcc = pd.read_csv(\"tract_map_dec_2016.csv\", dtype={'tractcode': str})\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looking at your data" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 73767 entries, 0 to 73766\n", "Data columns (total 3 columns):\n", "tractcode 73767 non-null object\n", "pcat_all 73767 non-null int64\n", "pcat_10x1 73767 non-null int64\n", "dtypes: int64(2), object(1)\n", "memory usage: 1.7+ MB\n" ] } ], "source": [ "# adding .info() to the name of a dataframe gives us some information about the dataframe -\n", "# how many rows of data we have, what the columns are named, and what the data types are\n", "\n", "fcc.info()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
_idCLPIDNameAddressCityStateZip4CountyPhoneSqFt...ThOpenThCloseFrOpenFrCloseSaOpenSaCloseSuOpenSuCloseLatLon
01CLP01ALLEGHENY LIBRARY1230 FEDERAL STPITTSBURGHPA15212-4704Allegheny(412) 237-189015005.0...10:00:0020:00:0010:00:0017:00:0010:00:0017:00:0012:00:0017:00:0040.456392-80.006613
12CLP02BEECHVIEW LIBRARY1910 BROADWAY AVEPITTSBURGHPA15216-3130Allegheny(412) 563-29008000.0...10:00:0020:00:0010:00:0017:00:0010:00:0017:00:00NaNNaN40.407026-80.027741
23CLP03BROOKLINE LIBRARY708 BROOKLINE BLVDPITTSBURGHPA15226-2102Allegheny(412) 561-100312651.0...10:00:0020:00:0010:00:0017:00:0010:00:0017:00:00NaNNaN40.394399-80.021427
34CLP04CARRICK LIBRARY1811 BROWNSVILLE RDPITTSBURGHPA15210-3907Allegheny(412) 882-38974000.0...NaNNaNNaNNaNNaNNaNNaNNaN40.397019-79.987547
45CLP05DOWNTOWN & BUSINESS LIBRARY612 SMITHFIELD STPITTSBURGHPA15222-2506Allegheny(412) 281-714112709.0...08:30:0018:00:0008:30:0017:00:0010:00:0017:00:00NaNNaN40.441749-79.997112
\n", "

5 rows × 26 columns

\n", "
" ], "text/plain": [ " _id CLPID Name Address City \\\n", "0 1 CLP01 ALLEGHENY LIBRARY 1230 FEDERAL ST PITTSBURGH \n", "1 2 CLP02 BEECHVIEW LIBRARY 1910 BROADWAY AVE PITTSBURGH \n", "2 3 CLP03 BROOKLINE LIBRARY 708 BROOKLINE BLVD PITTSBURGH \n", "3 4 CLP04 CARRICK LIBRARY 1811 BROWNSVILLE RD PITTSBURGH \n", "4 5 CLP05 DOWNTOWN & BUSINESS LIBRARY 612 SMITHFIELD ST PITTSBURGH \n", "\n", " State Zip4 County Phone SqFt ... ThOpen \\\n", "0 PA 15212-4704 Allegheny (412) 237-1890 15005.0 ... 10:00:00 \n", "1 PA 15216-3130 Allegheny (412) 563-2900 8000.0 ... 10:00:00 \n", "2 PA 15226-2102 Allegheny (412) 561-1003 12651.0 ... 10:00:00 \n", "3 PA 15210-3907 Allegheny (412) 882-3897 4000.0 ... NaN \n", "4 PA 15222-2506 Allegheny (412) 281-7141 12709.0 ... 08:30:00 \n", "\n", " ThClose FrOpen FrClose SaOpen SaClose SuOpen SuClose \\\n", "0 20:00:00 10:00:00 17:00:00 10:00:00 17:00:00 12:00:00 17:00:00 \n", "1 20:00:00 10:00:00 17:00:00 10:00:00 17:00:00 NaN NaN \n", "2 20:00:00 10:00:00 17:00:00 10:00:00 17:00:00 NaN NaN \n", "3 NaN NaN NaN NaN NaN NaN NaN \n", "4 18:00:00 08:30:00 17:00:00 10:00:00 17:00:00 NaN NaN \n", "\n", " Lat Lon \n", "0 40.456392 -80.006613 \n", "1 40.407026 -80.027741 \n", "2 40.394399 -80.021427 \n", "3 40.397019 -79.987547 \n", "4 40.441749 -79.997112 \n", "\n", "[5 rows x 26 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you also might want to look at the first few rows of your dataframe\n", "\n", "libraries.head(5)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CLPIDNameYearMonthWifiSessionsWifiMinutes
527CLP19WOODS RUN LIBRARY20171263079279
528CLP19WOODS RUN LIBRARY2018171699671
529CLP19WOODS RUN LIBRARY20182778108100
530CLP19WOODS RUN LIBRARY20183816104073
531CLP19WOODS RUN LIBRARY20184903107865
\n", "
" ], "text/plain": [ " CLPID Name Year Month WifiSessions WifiMinutes\n", "527 CLP19 WOODS RUN LIBRARY 2017 12 630 79279\n", "528 CLP19 WOODS RUN LIBRARY 2018 1 716 99671\n", "529 CLP19 WOODS RUN LIBRARY 2018 2 778 108100\n", "530 CLP19 WOODS RUN LIBRARY 2018 3 816 104073\n", "531 CLP19 WOODS RUN LIBRARY 2018 4 903 107865" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or the last few rows\n", "\n", "wifi.tail(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarizing your data" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Name Year\n", "ALLEGHENY LIBRARY 2016 13224\n", " 2017 19914\n", " 2018 6872\n", "BEECHVIEW LIBRARY 2016 5924\n", " 2017 8509\n", " 2018 2886\n", "BROOKLINE LIBRARY 2016 6840\n", " 2017 9076\n", " 2018 2350\n", "CARRICK LIBRARY 2016 4498\n", " 2017 1932\n", " 2018 0\n", "DOWNTOWN & BUSINESS LIBRARY 2016 36168\n", " 2017 47660\n", " 2018 16974\n", "EAST LIBERTY LIBRARY 2016 21521\n", " 2017 29329\n", " 2018 9712\n", "HAZELWOOD LIBRARY 2016 7968\n", " 2017 8198\n", " 2018 2638\n", "HILL DISTRICT LIBRARY 2016 7693\n", " 2017 12886\n", " 2018 3943\n", "HOMEWOOD LIBRARY 2016 11363\n", " 2017 14144\n", " 2018 4111\n", "KNOXVILLE LIBRARY 2016 4953\n", " 2017 16859\n", " 2018 4362\n", "LAWRENCEVILLE LIBRARY 2016 3877\n", " 2017 5909\n", " 2018 1788\n", "LIBRARY FOR THE BLIND & PHYSICALLY HANDICAPPED 2016 2407\n", " 2017 3712\n", " 2018 739\n", "MAIN (OAKLAND) LIBRARY 2016 108945\n", " 2017 134235\n", " 2018 53078\n", "MOUNT WASHINGTON LIBRARY 2016 2959\n", " 2017 4076\n", " 2018 992\n", "SHERADEN LIBRARY 2016 8805\n", " 2017 11926\n", " 2018 3026\n", "SOUTH SIDE LIBRARY 2016 7946\n", " 2017 9630\n", " 2018 3264\n", "SQUIRREL HILL LIBRARY 2016 44102\n", " 2017 57127\n", " 2018 20003\n", "WEST END LIBRARY 2016 3999\n", " 2017 4228\n", " 2018 1147\n", "WOODS RUN LIBRARY 2016 8495\n", " 2017 11395\n", " 2018 3213\n", "Name: WifiSessions, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can group by one or more columns in our data\n", "# and then summarize by another column\n", "# (try \"mean\" instead of \"sum\")\n", "\n", "wifi.groupby(by=['Name', 'Year'])['WifiSessions'].sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making a map" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[40.442, -79.997],\n", " zoom_start=13,\n", " tiles=\"Stamen Toner\")\n", "\n", "library_layer = folium.FeatureGroup(name='CLP Libraries')\n", "\n", "for index, row in libraries.iterrows():\n", " library_layer.add_child(folium.CircleMarker(location=(row[\"Lat\"],\n", " row[\"Lon\"]),\n", " radius=150,\n", " popup=row[\"Name\"],\n", " color=False,\n", " fill_opacity=1,\n", " fill_color=\"#d4237d\"))\n", "\n", "m.add_child(library_layer)\n", "\n", "folium.LayerControl().add_to(m)\n", "\n", "m" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Making a more complicated map!" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# side note: folium has a \"choropleth\" shortcut I generally use\n", "# but I just learned today they are thinking of depreciating it\n", "# so I made this map in a slightly more complicated way" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# our fcc dataframe has data for the entire US\n", "# but all those census tracts would make for a large file\n", "# we will load a previously downloaded geojson file\n", "# original data from https://data.wprdc.org/dataset/allegheny-county-census-tracts-2016\n", "# a good place to explore geojson as a data structure is http://geojson.io/\n", "\n", "allegheny_tracts = simplejson.load(open(\"Allegheny_County_Census_Tracts_2016.geojson\"))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#004529\n" ] }, { "data": { "text/html": [ "05" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a colormap\n", "\n", "colormap = linear.YlGn_09.scale(\n", " fcc.pcat_all.min(),\n", " fcc.pcat_all.max())\n", "\n", "print(colormap(5.0))\n", "\n", "colormap" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# and a way to connect tractcode/geoids to the relevant data\n", "\n", "fcc_dict = fcc.set_index('tractcode')['pcat_all']\n", "fcc_dict['42003468900']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m2 = folium.Map(location=[40.442, -79.997],\n", " zoom_start=13,\n", " tiles=\"Stamen Toner\")\n", "\n", "fcc_layer = folium.GeoJson(\n", " allegheny_tracts,\n", " name='Residential Internet',\n", " style_function=lambda feature: {\n", " 'fillColor': colormap(fcc_dict[feature['properties']['GEOID']]),\n", " 'color': 'black',\n", " 'weight': 1,\n", " 'fillOpacity': 0.8,\n", " }\n", ")\n", "\n", "m2.add_child(fcc_layer)\n", "m2.add_child(library_layer)\n", "\n", "\n", "folium.LayerControl().add_to(m2)\n", "\n", "\n", "m2" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }