{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "aa1bb64d", "metadata": {}, "outputs": [], "source": [ "import folium\n", "import numpy as np\n", "import pandas as pd\n", "import json\n", "import requests" ] }, { "cell_type": "markdown", "id": "c7fd1292", "metadata": {}, "source": [ "# Integrating Jenks Natural Break Optimization\n", "\n", "Choropleths provide an easy way to visually see data distributions across geography. By default, folium uses the breaks created by numpy.histogram (np.histogram), which generally creates an evenly spaced quantiles.\n", "\n", "This works well enough for evenly distributed data, but for unevenly distributed data, these even quantiles can obscure more than they show. To demonstrate this, I have created maps showing the labor force of each US state.\n", "\n", "The data was taken from the county-level data and aggregated. Since our geographic data does not have areas representing Puerto Rico or the United States as a whole, I removed those entries while keeping Washington, D.C. in our data set. Already, looking at the first five states alphabetically, we can see that Alaska (AK) has a work force roughly 2% the size of California (CA)." ] }, { "cell_type": "code", "execution_count": 2, "id": "a199cc25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
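<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>State</th>\n", "      <th>Civilian_labor_force_2011</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>AK</td>\n", "      <td>734088</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>AL</td>\n", "      <td>4381044</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>AR</td>\n", "      <td>2739713</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>AZ</td>\n", "      <td>6068526</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>CA</td>\n", "      <td>36769777</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>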
" ], "text/plain": [ " State Civilian_labor_force_2011\n", "0 AK 734088\n", "1 AL 4381044\n", "2 AR 2739713\n", "3 AZ 6068526\n", "4 CA 36769777" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = (\n", " \"https://raw.githubusercontent.com/python-visualization/folium/main/examples/data\"\n", ")\n", "us_states = f\"{url}/us-states.json\"\n", "\n", "geo_json_data = json.loads(requests.get(us_states).text)\n", "\n", "county_data = pd.read_csv(f\"{url}/us_county_data.csv\")\n", "clf = 'Civilian_labor_force_2011'\n", "labor_force = county_data[['State', clf]][\n", " (county_data[clf].str.strip()!='') & (~county_data['State'].isin(['PR', 'US']))\n", "]\n", "labor_force[clf] = labor_force[clf].astype(int)\n", "labor_force = labor_force.groupby('State').sum().reset_index()\n", "\n", "labor_force.head()" ] }, { "cell_type": "markdown", "id": "4b570c1f", "metadata": {}, "source": [ "Using default breaks, most states are represented as being part of the bottom quantile. This distribution is similar to what we might expect if US states follow a Power Law or a Zipf distribution." ] }, { "cell_type": "code", "execution_count": 3, "id": "8b06b85d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[38, -96], zoom_start=4)\n", "\n", "folium.Choropleth(\n", " geo_data=geo_json_data,\n", " data=labor_force,\n", " columns=['State', clf],\n", " key_on='id',\n", " fill_color='RdBu',\n", ").add_to(m)\n", "\n", "m" ] }, { "cell_type": "markdown", "id": "4a36162e", "metadata": {}, "source": [ "However, when using Jenks natural Breaks Optimization, we now see more granular detail at the bottom of the distribution, where most of our states are located. The upper western states (Idaho, Montana, Wyoming and the Dakotas) are distinguished from their Midwestern and Mountain West neighbors to the south. Gradations in the deep south between Mississippi and Alabama provide more visual information than in the previous map. Overall, this is a richer representation of the data distribution.\n", "\n", "One notable drawback of this representation is the legend. Because the lower bins are smaller, the numerical values overlap, making them unreadable." ] }, { "cell_type": "code", "execution_count": 4, "id": "7ac8ccec", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[38, -96], zoom_start=4)\n", "\n", "choropleth = folium.Choropleth(\n", " geo_data=geo_json_data,\n", " data=labor_force,\n", " columns=['State', clf],\n", " key_on='id',\n", " fill_color='RdBu',\n", " use_jenks=True,\n", ")\n", "choropleth.add_to(m)\n", "\n", "choropleth.color_scale.width = 800\n", "\n", "m" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }