{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get some GeoJSON data from the web: a point layer of cities and a polygon layer of US states, both with some population data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import geopandas\n", "\n", "\n", "states = geopandas.read_file(\n", " \"https://rawcdn.githack.com/PublicaMundi/MappingAPI/master/data/geojson/us-states.json\",\n", " driver=\"GeoJSON\",\n", ")\n", "\n", "cities = geopandas.read_file(\n", " \"https://d2ad6b4ur7yvpq.cloudfront.net/naturalearth-3.3.0/ne_50m_populated_places_simple.geojson\",\n", " driver=\"GeoJSON\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And take a look at what our data looks like:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
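<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>density</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>count</th>\n", "      <td>52.000000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>mean</th>\n", "      <td>402.504404</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>std</th>\n", "      <td>1395.100812</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>min</th>\n", "      <td>1.264000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>25%</th>\n", "      <td>53.440000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50%</th>\n", "      <td>100.335000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>75%</th>\n", "      <td>234.050000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>max</th>\n", "      <td>10065.000000</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>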
" ], "text/plain": [ " density\n", "count 52.000000\n", "mean 402.504404\n", "std 1395.100812\n", "min 1.264000\n", "25% 53.440000\n", "50% 100.335000\n", "75% 234.050000\n", "max 10065.000000" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look how far the minimum and maximum values for the density are from the top and bottom quartile breakpoints! We have some outliers in our data that are well outside the meat of most of the distribution. Let's look into this to find the culprits within the sample." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
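<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>name</th>\n", "      <th>density</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>8</th>\n", "      <td>District of Columbia</td>\n", "      <td>10065.000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>30</th>\n", "      <td>New Jersey</td>\n", "      <td>1189.000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>51</th>\n", "      <td>Puerto Rico</td>\n", "      <td>1082.000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>39</th>\n", "      <td>Rhode Island</td>\n", "      <td>1006.000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>21</th>\n", "      <td>Massachusetts</td>\n", "      <td>840.200</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>31</th>\n", "      <td>New Mexico</td>\n", "      <td>17.160</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>34</th>\n", "      <td>North Dakota</td>\n", "      <td>9.916</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>26</th>\n", "      <td>Montana</td>\n", "      <td>6.858</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50</th>\n", "      <td>Wyoming</td>\n", "      <td>5.851</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Alaska</td>\n", "      <td>1.264</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>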
" ], "text/plain": [ " name density\n", "8 District of Columbia 10065.000\n", "30 New Jersey 1189.000\n", "51 Puerto Rico 1082.000\n", "39 Rhode Island 1006.000\n", "21 Massachusetts 840.200\n", "31 New Mexico 17.160\n", "34 North Dakota 9.916\n", "26 Montana 6.858\n", "50 Wyoming 5.851\n", "1 Alaska 1.264" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states_sorted = states.sort_values(by=\"density\", ascending=False)\n", "\n", "states_sorted.head(5).append(states_sorted.tail(5))[[\"name\", \"density\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like Washington D.C. and Alaska were the culprits on each end of the range. Washington was more dense than the next most dense state, New Jersey, than the least dense state, Alaska was from Wyoming, however. Washington D.C. has a has a relatively small land area for the amount of people that live there, so it makes sense that it's pretty dense. And Alaska has a lot of land area, but not much of it is habitable for humans.\n", "

\n", "However, we're mapping all of the states in the US so that we can compare them at a regional level. The very high figure for Washington D.C. at the top of our range would make it hard to differentiate between the other states, so let's account for that in the min and max values of our color scale by using quantile values close to the ends of the range (the 5th and 95th percentiles). Anything higher or lower than those values will just fall into the 'highest' and 'lowest' bins for coloring." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum: 8.54\n", "\n", "maximum: 1040.2\n", "\n", "Mean: 402.5\n" ] } ], "source": [ "def rd2(x):\n", " return round(x, 2)\n", "\n", "\n", "minimum, maximum = states[\"density\"].quantile([0.05, 0.95]).apply(rd2)\n", "\n", "mean = round(states[\"density\"].mean(), 2)\n", "\n", "\n", "print(f\"minimum: {minimum}\", f\"maximum: {maximum}\", f\"Mean: {mean}\", sep=\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks better. Our min and max values for the colorscale are much closer to the mean value now. Let's run with these values and make a colorscale. I'm just going to use a sequential light-to-dark color palette from [ColorBrewer](http://colorbrewer2.org/#type=sequential&scheme=Purples&n=5)." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "8.541040.2" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import branca\n", "\n", "\n", "colormap = branca.colormap.LinearColormap(\n", " colors=[\"#f2f0f7\", \"#cbc9e2\", \"#9e9ac8\", \"#756bb1\", \"#54278f\"],\n", " index=states[\"density\"].quantile([0.2, 0.4, 0.6, 0.8]),\n", " vmin=minimum,\n", " vmax=maximum,\n", ")\n", "\n", "colormap.caption = \"Population Density in the United States\"\n", "\n", "colormap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's narrow these cities down to United States cities with GeoPandas' spatial join between two GeoDataFrame objects, keeping only the points that fall 'within' a state polygon." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# `predicate` replaces the deprecated `op` keyword in recent GeoPandas releases\n", "us_cities = geopandas.sjoin(cities, states, how=\"inner\", predicate=\"within\")\n", "\n", "pop_ranked_cities = us_cities.sort_values(by=\"pop_max\", ascending=False)[\n", " [\"nameascii\", \"pop_max\", \"geometry\"]\n", "].iloc[:20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, now we have a new GeoDataFrame with the 20 most populous cities. Let's see the top 5." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
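<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>nameascii</th>\n", "      <th>pop_max</th>\n", "      <th>geometry</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>1224</th>\n", "      <td>New York</td>\n", "      <td>19040000</td>\n", "      <td>POINT (-73.98196 40.75192)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1222</th>\n", "      <td>Los Angeles</td>\n", "      <td>12500000</td>\n", "      <td>POINT (-118.18193 33.99192)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1186</th>\n", "      <td>Chicago</td>\n", "      <td>8990000</td>\n", "      <td>POINT (-87.75200 41.83194)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1184</th>\n", "      <td>Miami</td>\n", "      <td>5585000</td>\n", "      <td>POINT (-80.22605 25.78956)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1076</th>\n", "      <td>Philadelphia</td>\n", "      <td>5492000</td>\n", "      <td>POINT (-75.17194 40.00192)</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>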
" ], "text/plain": [ " nameascii pop_max geometry\n", "1224 New York 19040000 POINT (-73.98196 40.75192)\n", "1222 Los Angeles 12500000 POINT (-118.18193 33.99192)\n", "1186 Chicago 8990000 POINT (-87.75200 41.83194)\n", "1184 Miami 5585000 POINT (-80.22605 25.78956)\n", "1076 Philadelphia 5492000 POINT (-75.17194 40.00192)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pop_ranked_cities.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alright, let's build a map!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import folium\n", "from folium.plugins import Search\n", "\n", "\n", "m = folium.Map(location=[38, -97], zoom_start=4)\n", "\n", "\n", "def style_function(x):\n", " return {\n", " \"fillColor\": colormap(x[\"properties\"][\"density\"]),\n", " \"color\": \"black\",\n", " \"weight\": 2,\n", " \"fillOpacity\": 0.5,\n", " }\n", "\n", "\n", "stategeo = folium.GeoJson(\n", " states,\n", " name=\"US States\",\n", " style_function=style_function,\n", " tooltip=folium.GeoJsonTooltip(\n", " fields=[\"name\", \"density\"], aliases=[\"State\", \"Density\"], localize=True\n", " ),\n", ").add_to(m)\n", "\n", "citygeo = folium.GeoJson(\n", " pop_ranked_cities,\n", " name=\"US Cities\",\n", " tooltip=folium.GeoJsonTooltip(\n", " fields=[\"nameascii\", \"pop_max\"], aliases=[\"\", \"Population Max\"], localize=True\n", " ),\n", ").add_to(m)\n", "\n", "statesearch = Search(\n", " layer=stategeo,\n", " geom_type=\"Polygon\",\n", " placeholder=\"Search for a US State\",\n", " collapsed=False,\n", " search_label=\"name\",\n", " weight=3,\n", ").add_to(m)\n", "\n", "citysearch = Search(\n", " layer=citygeo,\n", " geom_type=\"Point\",\n", " placeholder=\"Search for a US City\",\n", " collapsed=True,\n", " search_label=\"nameascii\",\n", ").add_to(m)\n", "\n", "folium.LayerControl().add_to(m)\n", "colormap.add_to(m)\n", "\n", "m" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 2 }