{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Using the public data available at NYC OpenData, I downloaded two data sets, one that only had the DBN codes associated with the schools along with the statistics of college enrollment, and another which was a directory of high schools (name, street address, city..etc). I combined what I needed from the two and then converted each address into coordinates using Google's Geocoding API. I then mapped each school into a city map based on their coordinates and color coded them based on their college enrollment, thus painting an image out of the data.
\n", "\n", " | DBN | \n", "Printed_Name | \n", "Primary_Address_Line_1 | \n", "City | \n", "State_Code | \n", "neighborhood | \n", "Boro | \n", "collegeStat | \n", "Latitude | \n", "Longitude | \n", "hasStat | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M292 | \n", "Henry Street School for International Studies | \n", "220 Henry Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "40.00% | \n", "\n", " | \n", " | True | \n", "
1 | \n", "01M448 | \n", "University Neighborhood High School | \n", "200 Monroe Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "67.30% | \n", "\n", " | \n", " | True | \n", "
2 | \n", "01M450 | \n", "East Side Community School | \n", "420 East 12 Street | \n", "New York | \n", "NY | \n", "East Village | \n", "M | \n", "55.70% | \n", "\n", " | \n", " | True | \n", "
3 | \n", "01M509 | \n", "Marta Valle High School | \n", "145 Stanton Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "47.90% | \n", "\n", " | \n", " | True | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
418 | \n", "13K430 | \n", "Brooklyn Technical High School | \n", "29 Ft Greene Place | \n", "Brooklyn | \n", "NY | \n", "Fort Greene | \n", "K | \n", "94.80% | \n", "\n", " | \n", " | True | \n", "
419 | \n", "10X445 | \n", "Bronx High School of Science | \n", "75 West 205 Street | \n", "Bronx | \n", "NY | \n", "Kingsbridge | \n", "X | \n", "99.90% | \n", "\n", " | \n", " | True | \n", "
420 | \n", "14K449 | \n", "The Brooklyn Latin School | \n", "223 Graham Avenue | \n", "Brooklyn | \n", "NY | \n", "Williamsburg | \n", "K | \n", "85.70% | \n", "\n", " | \n", " | True | \n", "
421 | \n", "28Q687 | \n", "Queens High School for the Sciences at York Co... | \n", "94-50 159 Street | \n", "Jamaica | \n", "NY | \n", "Jamaica | \n", "Q | \n", "100.00% | \n", "\n", " | \n", " | True | \n", "
422 rows × 11 columns
\n", "\n", " | DBN | \n", "Printed_Name | \n", "Primary_Address_Line_1 | \n", "City | \n", "State_Code | \n", "neighborhood | \n", "Boro | \n", "collegeStat | \n", "Latitude | \n", "Longitude | \n", "hasStat | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M292 | \n", "Henry Street School for International Studies | \n", "220 Henry Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "40.00% | \n", "\n", " | \n", " | True | \n", "
1 | \n", "01M448 | \n", "University Neighborhood High School | \n", "200 Monroe Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "67.30% | \n", "\n", " | \n", " | True | \n", "
2 | \n", "01M450 | \n", "East Side Community School | \n", "420 East 12 Street | \n", "New York | \n", "NY | \n", "East Village | \n", "M | \n", "55.70% | \n", "\n", " | \n", " | True | \n", "
3 | \n", "01M509 | \n", "Marta Valle High School | \n", "145 Stanton Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "47.90% | \n", "\n", " | \n", " | True | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
418 | \n", "13K430 | \n", "Brooklyn Technical High School | \n", "29 Ft Greene Place | \n", "Brooklyn | \n", "NY | \n", "Fort Greene | \n", "K | \n", "94.80% | \n", "\n", " | \n", " | True | \n", "
419 | \n", "10X445 | \n", "Bronx High School of Science | \n", "75 West 205 Street | \n", "Bronx | \n", "NY | \n", "Kingsbridge | \n", "X | \n", "99.90% | \n", "\n", " | \n", " | True | \n", "
420 | \n", "14K449 | \n", "The Brooklyn Latin School | \n", "223 Graham Avenue | \n", "Brooklyn | \n", "NY | \n", "Williamsburg | \n", "K | \n", "85.70% | \n", "\n", " | \n", " | True | \n", "
421 | \n", "28Q687 | \n", "Queens High School for the Sciences at York Co... | \n", "94-50 159 Street | \n", "Jamaica | \n", "NY | \n", "Jamaica | \n", "Q | \n", "100.00% | \n", "\n", " | \n", " | True | \n", "
314 rows × 11 columns
\n", "At this point, the data set is almost how I want it to be. I then began converting the school's full address into coordinates of latitude and longitude to add into the empty columns seen above.
" ] }, { "cell_type": "code", "execution_count": 325, "metadata": {}, "outputs": [], "source": [ "# 1. Add coords into new columns\n", "\n", "# first run will give us some with coords and some without due to being missed by the requests\n", "# then we will run a similar func until all have coords\n", "\n", "for idx, row in fullProfilesv2.iterrows():\n", "\n", " try:\n", " # sleep is to avoid sending too many requests quickly\n", " schoolAddress = row[\"Primary_Address_Line_1\"] + \", \" + row[\"City\"] \\\n", " + \", \" + row[\"State_Code\"]\n", "\n", " schoolAddressEdit = schoolAddress.replace(\" \", \"+\")\n", "\n", " r = requests.get(\"https://maps.googleapis.com/maps/api/geocode/json?address={}key=AIzaSyBU828rT3AJLuU4LtZksyB2iWH63oBLCsU\".format(schoolAddressEdit))\n", " jsonInfo = r.json()\n", " locationDict = jsonInfo['results'][0]['geometry']['location']\n", " row[\"Latitude\"] = locationDict['lat']\n", " row[\"Longitude\"] = locationDict['lng']\n", " sleep(2)\n", "\n", " except IndexError:\n", " row[\"Latitude\"] = \"missed\"\n", " row[\"Longitude\"] = \"missed\"\n", "\n" ] }, { "cell_type": "code", "execution_count": 387, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] } ], "source": [ "# runs until all have coordinates, then we can finally map by retrieving them\n", "# have to do it this way to meet API's limit (cus im not paying to get more requests, im broke)\n", "\n", "for idx, row in fullProfilesv2.iterrows():\n", " try:\n", " if row[\"Latitude\"] == \"missed\":\n", " schoolAddress = row[\"Primary_Address_Line_1\"] + \", \" + row[\"City\"] \\\n", " + \", \" + row[\"State_Code\"]\n", "\n", " schoolAddressEdit = schoolAddress.replace(\" \", \"+\")\n", "\n", " r = requests.get(\"https://maps.googleapis.com/maps/api/geocode/json?address={}key=AIzaSyBU828rT3AJLuU4LtZksyB2iWH63oBLCsU\".format(schoolAddressEdit))\n", " jsonInfo = r.json()\n", " locationDict = jsonInfo['results'][0]['geometry']['location']\n", " row[\"Latitude\"] = locationDict['lat']\n", " row[\"Longitude\"] = locationDict['lng']\n", " sleep(2)\n", "\n", " except IndexError:\n", " pass\n", "print(fullProfilesv2.Latitude[fullProfilesv2.Latitude == \"missed\"].count())\n" ] }, { "cell_type": "code", "execution_count": 388, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "False\n" ] }, { "data": { "text/html": [ "\n", " | DBN | \n", "Printed_Name | \n", "Primary_Address_Line_1 | \n", "City | \n", "State_Code | \n", "neighborhood | \n", "Boro | \n", "collegeStat | \n", "Latitude | \n", "Longitude | \n", "hasStat | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01M292 | \n", "Henry Street School for International Studies | \n", "220 Henry Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "40.00% | \n", "40.7134 | \n", "-73.986 | \n", "True | \n", "
1 | \n", "01M448 | \n", "University Neighborhood High School | \n", "200 Monroe Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "67.30% | \n", "40.7123 | \n", "-73.9839 | \n", "True | \n", "
2 | \n", "01M450 | \n", "East Side Community School | \n", "420 East 12 Street | \n", "New York | \n", "NY | \n", "East Village | \n", "M | \n", "55.70% | \n", "40.7294 | \n", "-73.9825 | \n", "True | \n", "
3 | \n", "01M509 | \n", "Marta Valle High School | \n", "145 Stanton Street | \n", "New York | \n", "NY | \n", "Lower East Side | \n", "M | \n", "47.90% | \n", "40.7205 | \n", "-73.986 | \n", "True | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
418 | \n", "13K430 | \n", "Brooklyn Technical High School | \n", "29 Ft Greene Place | \n", "Brooklyn | \n", "NY | \n", "Fort Greene | \n", "K | \n", "94.80% | \n", "40.689 | \n", "-73.9767 | \n", "True | \n", "
419 | \n", "10X445 | \n", "Bronx High School of Science | \n", "75 West 205 Street | \n", "Bronx | \n", "NY | \n", "Kingsbridge | \n", "X | \n", "99.90% | \n", "40.8783 | \n", "-73.8908 | \n", "True | \n", "
420 | \n", "14K449 | \n", "The Brooklyn Latin School | \n", "223 Graham Avenue | \n", "Brooklyn | \n", "NY | \n", "Williamsburg | \n", "K | \n", "85.70% | \n", "40.7103 | \n", "-73.944 | \n", "True | \n", "
421 | \n", "28Q687 | \n", "Queens High School for the Sciences at York Co... | \n", "94-50 159 Street | \n", "Jamaica | \n", "NY | \n", "Jamaica | \n", "Q | \n", "100.00% | \n", "40.7009 | \n", "-73.7983 | \n", "True | \n", "
314 rows × 11 columns
\n", "Here, I had gotten all of the necessary parts that I needed. Thus, I began to create a map through folium, a nice library for interactive maps. With each cell in my data set, I evaluated its statistic using a criteria I made, and placed it on the map with a specific color.
" ] }, { "cell_type": "code", "execution_count": 392, "metadata": {}, "outputs": [], "source": [ "fullProfilesv2.to_csv(\"schoolCompleteData.csv\", encoding='utf-8')" ] }, { "cell_type": "code", "execution_count": 401, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BX represented by 85 schools.\n", "BK represented by 93 schools.\n", "Manhattan represented by 75 schools.\n", "Queens represented by 52 schools.\n", "SI represented by 9 schools.\n" ] }, { "data": { "text/html": [ "