{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Data Visualization of NYC High Schools College Enrollment for the 2013-14 School Year, by Marlon Pimentel

\n", "

Quick Summary

\n", "

Tools: Python (with pandas and folium) & Google's Geocoding API

\n", "

Using the public data available at NYC OpenData, I downloaded two data sets, one that only had the DBN codes associated with the schools along with the statistics of college enrollment, and another which was a directory of high schools (name, street address, city..etc). I combined what I needed from the two and then converted each address into coordinates using Google's Geocoding API. I then mapped each school into a city map based on their coordinates and color coded them based on their college enrollment, thus painting an image out of the data.

\n", "

Keep in Mind:

\n", "" ] }, { "cell_type": "code", "execution_count": 320, "metadata": {}, "outputs": [], "source": [ "import folium # map visuals\n", "import pandas as pd # data manipulation\n", "import requests\n", "from time import sleep" ] }, { "cell_type": "code", "execution_count": 322, "metadata": { "scrolled": true }, "outputs": [], "source": [ "schoolPerformance = pd.read_csv(r\"C:\\Users\\Marlon Pimentel\\Desktop\\2013_-_2014_DOE_HS_Performance.csv\")\n", "schoolInfo = pd.read_csv(r\"C:\\Users\\Marlon Pimentel\\Desktop\\DOE_High_School_Directory_2013-2014.csv\")" ] }, { "cell_type": "code", "execution_count": 323, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DBNPrinted_NamePrimary_Address_Line_1CityState_CodeneighborhoodBorocollegeStatLatitudeLongitudehasStat
001M292Henry Street School for International Studies220 Henry StreetNew YorkNYLower East SideM40.00%True
101M448University Neighborhood High School200 Monroe StreetNew YorkNYLower East SideM67.30%True
201M450East Side Community School420 East 12 StreetNew YorkNYEast VillageM55.70%True
301M509Marta Valle High School145 Stanton StreetNew YorkNYLower East SideM47.90%True
....................................
41813K430Brooklyn Technical High School29 Ft Greene PlaceBrooklynNYFort GreeneK94.80%True
41910X445Bronx High School of Science75 West 205 StreetBronxNYKingsbridgeX99.90%True
42014K449The Brooklyn Latin School223 Graham AvenueBrooklynNYWilliamsburgK85.70%True
42128Q687Queens High School for the Sciences at York Co...94-50 159 StreetJamaicaNYJamaicaQ100.00%True
\n", "

422 rows × 11 columns

\n", "
" ], "text/plain": [ " DBN Printed_Name \\\n", "0 01M292 Henry Street School for International Studies \n", "1 01M448 University Neighborhood High School \n", "2 01M450 East Side Community School \n", "3 01M509 Marta Valle High School \n", ".. ... ... \n", "418 13K430 Brooklyn Technical High School \n", "419 10X445 Bronx High School of Science \n", "420 14K449 The Brooklyn Latin School \n", "421 28Q687 Queens High School for the Sciences at York Co... \n", "\n", " Primary_Address_Line_1 City State_Code neighborhood Boro \\\n", "0 220 Henry Street New York NY Lower East Side M \n", "1 200 Monroe Street New York NY Lower East Side M \n", "2 420 East 12 Street New York NY East Village M \n", "3 145 Stanton Street New York NY Lower East Side M \n", ".. ... ... ... ... ... \n", "418 29 Ft Greene Place Brooklyn NY Fort Greene K \n", "419 75 West 205 Street Bronx NY Kingsbridge X \n", "420 223 Graham Avenue Brooklyn NY Williamsburg K \n", "421 94-50 159 Street Jamaica NY Jamaica Q \n", "\n", " collegeStat Latitude Longitude hasStat \n", "0 40.00% True \n", "1 67.30% True \n", "2 55.70% True \n", "3 47.90% True \n", ".. ... ... ... ... \n", "418 94.80% True \n", "419 99.90% True \n", "420 85.70% True \n", "421 100.00% True \n", "\n", "[422 rows x 11 columns]" ] }, "execution_count": 323, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# extracts only necessary columns off data sets\n", "schoolStat = schoolPerformance[\"college enroll 2011-12\"]\n", "schoolInfoSub = schoolInfo[[\"DBN\", \"Printed_Name\", \"Primary_Address_Line_1\",\\\n", " \"City\", \"State_Code\", \"neighborhood\", \"Boro\"]]\n", "\n", "\n", "# merging the two subsets into one, adding new columns for coords and renaming\n", "fullProfiles = pd.concat([schoolInfoSub, schoolStat], axis=1, sort=False)\n", "fullProfiles[\"Latitude\"] = \"\"\n", "fullProfiles[\"Longitude\"] = \"\"\n", "fullProfiles[\"hasStat\"] = \"True\"\n", "fullProfiles.rename(columns = {\"college enroll 2011-12\": \"collegeStat\"}, inplace = True)\n", "pd.set_option('max_rows', 8)\n", "fullProfiles" ] }, { "cell_type": "code", "execution_count": 324, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'bx': 31, 'bk': 25, 'manh': 27, 'queens': 24, 'si': 1}\n", "Total Unavailable: 108\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DBNPrinted_NamePrimary_Address_Line_1CityState_CodeneighborhoodBorocollegeStatLatitudeLongitudehasStat
001M292Henry Street School for International Studies220 Henry StreetNew YorkNYLower East SideM40.00%True
101M448University Neighborhood High School200 Monroe StreetNew YorkNYLower East SideM67.30%True
201M450East Side Community School420 East 12 StreetNew YorkNYEast VillageM55.70%True
301M509Marta Valle High School145 Stanton StreetNew YorkNYLower East SideM47.90%True
....................................
41813K430Brooklyn Technical High School29 Ft Greene PlaceBrooklynNYFort GreeneK94.80%True
41910X445Bronx High School of Science75 West 205 StreetBronxNYKingsbridgeX99.90%True
42014K449The Brooklyn Latin School223 Graham AvenueBrooklynNYWilliamsburgK85.70%True
42128Q687Queens High School for the Sciences at York Co...94-50 159 StreetJamaicaNYJamaicaQ100.00%True
\n", "

314 rows × 11 columns

\n", "
" ], "text/plain": [ " DBN Printed_Name \\\n", "0 01M292 Henry Street School for International Studies \n", "1 01M448 University Neighborhood High School \n", "2 01M450 East Side Community School \n", "3 01M509 Marta Valle High School \n", ".. ... ... \n", "418 13K430 Brooklyn Technical High School \n", "419 10X445 Bronx High School of Science \n", "420 14K449 The Brooklyn Latin School \n", "421 28Q687 Queens High School for the Sciences at York Co... \n", "\n", " Primary_Address_Line_1 City State_Code neighborhood Boro \\\n", "0 220 Henry Street New York NY Lower East Side M \n", "1 200 Monroe Street New York NY Lower East Side M \n", "2 420 East 12 Street New York NY East Village M \n", "3 145 Stanton Street New York NY Lower East Side M \n", ".. ... ... ... ... ... \n", "418 29 Ft Greene Place Brooklyn NY Fort Greene K \n", "419 75 West 205 Street Bronx NY Kingsbridge X \n", "420 223 Graham Avenue Brooklyn NY Williamsburg K \n", "421 94-50 159 Street Jamaica NY Jamaica Q \n", "\n", " collegeStat Latitude Longitude hasStat \n", "0 40.00% True \n", "1 67.30% True \n", "2 55.70% True \n", "3 47.90% True \n", ".. ... ... ... ... \n", "418 94.80% True \n", "419 99.90% True \n", "420 85.70% True \n", "421 100.00% True \n", "\n", "[314 rows x 11 columns]" ] }, "execution_count": 324, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# getting rid of schools from set that have no stat\n", "# tracking where these schools are located to include in final report\n", "\n", "unavailableStats = {\"bx\": 0, \"bk\": 0, \"manh\": 0, \"queens\": 0, \"si\": 0}\n", "\n", "for idx, row in fullProfiles.iterrows():\n", " if type(row[\"collegeStat\"]) == float:\n", " if row[\"Boro\"] == \"M\":\n", " unavailableStats[\"manh\"] += 1\n", " elif row[\"Boro\"] == \"X\":\n", " unavailableStats[\"bx\"] += 1\n", " elif row[\"Boro\"] == \"K\":\n", " unavailableStats[\"bk\"] += 1\n", " elif row[\"Boro\"] == \"Q\":\n", " unavailableStats[\"queens\"] += 1\n", " elif row[\"Boro\"] == \"R\":\n", " unavailableStats[\"si\"] += 1\n", "\n", " fullProfiles.loc[idx, \"hasStat\"] = \"False\"\n", "\n", "\n", "print(unavailableStats)\n", "\n", "totalUnvailable = 0\n", "\n", "for borough in unavailableStats.values():\n", " totalUnvailable += borough\n", "\n", "print(\"Total Unavailable: {}\".format(totalUnvailable))\n", "\n", "# new data set with schools that had no stat removed (108 removed out of 422, down to 314)\n", "fullProfilesv2 = fullProfiles[fullProfiles.hasStat != \"False\"]\n", "fullProfilesv2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Geocoding

\n", "

At this point, the data set is almost how I want it to be. I then began converting the school's full address into coordinates of latitude and longitude to add into the empty columns seen above.

" ] }, { "cell_type": "code", "execution_count": 325, "metadata": {}, "outputs": [], "source": [ "# 1. Add coords into new columns\n", "\n", "# first run will give us some with coords and some without due to being missed by the requests\n", "# then we will run a similar func until all have coords\n", "\n", "for idx, row in fullProfilesv2.iterrows():\n", "\n", " try:\n", " # sleep is to avoid sending too many requests quickly\n", " schoolAddress = row[\"Primary_Address_Line_1\"] + \", \" + row[\"City\"] \\\n", " + \", \" + row[\"State_Code\"]\n", "\n", " schoolAddressEdit = schoolAddress.replace(\" \", \"+\")\n", "\n", " r = requests.get(\"https://maps.googleapis.com/maps/api/geocode/json?address={}key=AIzaSyBU828rT3AJLuU4LtZksyB2iWH63oBLCsU\".format(schoolAddressEdit))\n", " jsonInfo = r.json()\n", " locationDict = jsonInfo['results'][0]['geometry']['location']\n", " row[\"Latitude\"] = locationDict['lat']\n", " row[\"Longitude\"] = locationDict['lng']\n", " sleep(2)\n", "\n", " except IndexError:\n", " row[\"Latitude\"] = \"missed\"\n", " row[\"Longitude\"] = \"missed\"\n", "\n" ] }, { "cell_type": "code", "execution_count": 387, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] } ], "source": [ "# runs until all have coordinates, then we can finally map by retrieving them\n", "# have to do it this way to meet API's limit (cus im not paying to get more requests, im broke)\n", "\n", "for idx, row in fullProfilesv2.iterrows():\n", " try:\n", " if row[\"Latitude\"] == \"missed\":\n", " schoolAddress = row[\"Primary_Address_Line_1\"] + \", \" + row[\"City\"] \\\n", " + \", \" + row[\"State_Code\"]\n", "\n", " schoolAddressEdit = schoolAddress.replace(\" \", \"+\")\n", "\n", " r = requests.get(\"https://maps.googleapis.com/maps/api/geocode/json?address={}key=AIzaSyBU828rT3AJLuU4LtZksyB2iWH63oBLCsU\".format(schoolAddressEdit))\n", " jsonInfo = r.json()\n", " locationDict = jsonInfo['results'][0]['geometry']['location']\n", " row[\"Latitude\"] = locationDict['lat']\n", " row[\"Longitude\"] = locationDict['lng']\n", " sleep(2)\n", "\n", " except IndexError:\n", " pass\n", "print(fullProfilesv2.Latitude[fullProfilesv2.Latitude == \"missed\"].count())\n" ] }, { "cell_type": "code", "execution_count": 388, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "False\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DBNPrinted_NamePrimary_Address_Line_1CityState_CodeneighborhoodBorocollegeStatLatitudeLongitudehasStat
001M292Henry Street School for International Studies220 Henry StreetNew YorkNYLower East SideM40.00%40.7134-73.986True
101M448University Neighborhood High School200 Monroe StreetNew YorkNYLower East SideM67.30%40.7123-73.9839True
201M450East Side Community School420 East 12 StreetNew YorkNYEast VillageM55.70%40.7294-73.9825True
301M509Marta Valle High School145 Stanton StreetNew YorkNYLower East SideM47.90%40.7205-73.986True
....................................
41813K430Brooklyn Technical High School29 Ft Greene PlaceBrooklynNYFort GreeneK94.80%40.689-73.9767True
41910X445Bronx High School of Science75 West 205 StreetBronxNYKingsbridgeX99.90%40.8783-73.8908True
42014K449The Brooklyn Latin School223 Graham AvenueBrooklynNYWilliamsburgK85.70%40.7103-73.944True
42128Q687Queens High School for the Sciences at York Co...94-50 159 StreetJamaicaNYJamaicaQ100.00%40.7009-73.7983True
\n", "

314 rows × 11 columns

\n", "
" ], "text/plain": [ " DBN Printed_Name \\\n", "0 01M292 Henry Street School for International Studies \n", "1 01M448 University Neighborhood High School \n", "2 01M450 East Side Community School \n", "3 01M509 Marta Valle High School \n", ".. ... ... \n", "418 13K430 Brooklyn Technical High School \n", "419 10X445 Bronx High School of Science \n", "420 14K449 The Brooklyn Latin School \n", "421 28Q687 Queens High School for the Sciences at York Co... \n", "\n", " Primary_Address_Line_1 City State_Code neighborhood Boro \\\n", "0 220 Henry Street New York NY Lower East Side M \n", "1 200 Monroe Street New York NY Lower East Side M \n", "2 420 East 12 Street New York NY East Village M \n", "3 145 Stanton Street New York NY Lower East Side M \n", ".. ... ... ... ... ... \n", "418 29 Ft Greene Place Brooklyn NY Fort Greene K \n", "419 75 West 205 Street Bronx NY Kingsbridge X \n", "420 223 Graham Avenue Brooklyn NY Williamsburg K \n", "421 94-50 159 Street Jamaica NY Jamaica Q \n", "\n", " collegeStat Latitude Longitude hasStat \n", "0 40.00% 40.7134 -73.986 True \n", "1 67.30% 40.7123 -73.9839 True \n", "2 55.70% 40.7294 -73.9825 True \n", "3 47.90% 40.7205 -73.986 True \n", ".. ... ... ... ... \n", "418 94.80% 40.689 -73.9767 True \n", "419 99.90% 40.8783 -73.8908 True \n", "420 85.70% 40.7103 -73.944 True \n", "421 100.00% 40.7009 -73.7983 True \n", "\n", "[314 rows x 11 columns]" ] }, "execution_count": 388, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(fullProfilesv2.Latitude[fullProfilesv2.Latitude == \"missed\"].count())\n", "print(any(fullProfilesv2.Latitude == \"missed\"))\n", "fullProfilesv2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Plotting onto Folium Map

\n", "

Here, I had gotten all of the necessary parts that I needed. Thus, I began to create a map through folium, a nice library for interactive maps. With each cell in my data set, I evaluated its statistic using a criteria I made, and placed it on the map with a specific color.

" ] }, { "cell_type": "code", "execution_count": 392, "metadata": {}, "outputs": [], "source": [ "fullProfilesv2.to_csv(\"schoolCompleteData.csv\", encoding='utf-8')" ] }, { "cell_type": "code", "execution_count": 401, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BX represented by 85 schools.\n", "BK represented by 93 schools.\n", "Manhattan represented by 75 schools.\n", "Queens represented by 52 schools.\n", "SI represented by 9 schools.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 401, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city_map = folium.Map(location=[40.738, -73.98],\n", " zoom_start=11,\n", " tiles=\"CartoDB dark_matter\")\n", "\n", "# this fixes our problem of the requests missing schools thus not having all coordinates to map at once\n", "# maps each school onto map with a color based on their stat\n", "\n", "boroughsShown = {\"BX\": 0, \"BK\": 0, \"Manhattan\": 0, \"Queens\": 0, \"SI\": 0}\n", "boroughsShown[\"BX\"] = fullProfilesv2.Boro[fullProfilesv2.Boro == \"X\"].count()\n", "boroughsShown[\"BK\"] = fullProfilesv2.Boro[fullProfilesv2.Boro == \"K\"].count()\n", "boroughsShown[\"SI\"] = fullProfilesv2.Boro[fullProfilesv2.Boro == \"R\"].count()\n", "boroughsShown[\"Manhattan\"] = fullProfilesv2.Boro[fullProfilesv2.Boro == \"M\"].count()\n", "boroughsShown[\"Queens\"] = fullProfilesv2.Boro[fullProfilesv2.Boro == \"Q\"].count()\n", "\n", "for boro, count in boroughsShown.items():\n", " print(\"{} represented by {} schools.\".format(boro, count))\n", "\n", "for idx, row in fullProfilesv2.iterrows():\n", " \n", " if row[\"Latitude\"] != \"missed\":\n", "\n", " stat = float(row[\"collegeStat\"].replace(\"%\", \"\"))\n", "\n", " if stat >= 90:\n", " color = \"#0000A0\" # dark blue\n", " elif stat >= 75 and stat < 90:\n", " color = \"#736AFF\" # light blue\n", " elif stat >= 65 and stat < 75:\n", " color = \"FFFFFF\" # white\n", " elif stat >= 50 and stat < 65:\n", " color = \"#FF0099\" # pink \n", " elif stat < 50:\n", " color = \"#FF0000\" # red\n", "\n", "# popup_text = \"{}
Neighborhood: {}
College Enrollment: {}
\"\n", "\n", "# popup_text = popup_text.format(row[\"Printed_Name\"],\n", "# row[\"neighborhood\"], stat)\n", "\n", " folium.CircleMarker(location=(row[\"Latitude\"],\n", " row[\"Longitude\"]),\n", " radius=2,\n", " color=color,\n", " fill=True).add_to(city_map)\n", " \n", "\n", "city_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Key

\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }