{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Lets-Plot with GeoPandas to Create Maps\n", "\n", "GeoPandas **GeoDataFrame** is a tabular data structure that stores a set of shapes (*geometry*) for each observation.\n", "\n", "*GeoDataFrame* extends pandas *DataFrame* and as such, aside from the *geometry*, can contain other data.\n", "\n", "GeoPandas has three basic classes of geometric objects (shapes):\n", "\n", "- Points / Multi-Points\n", "- Lines / Multi-Lines\n", "- Polygons / Multi-Polygons\n", "\n", "All GeoPandas shapes are \"understood\" by *Lets-Plot* and can be plotted using various geometry layers, depending on the type of the shape. \n", "\n", "Use: \n", "\n", "- *geom_point, geom_text* with Points / Multi-Points\n", "- *geom_path* with Lines / Multi-Lines\n", "- *geom_polygon, geom_map* with Polygons / Multi-Polygons \n", "- *geom_rect* with Polygons / Multi-Polygons to display their bounding boxes\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:11.423642Z", "iopub.status.busy": "2024-04-26T11:50:11.423642Z", "iopub.status.idle": "2024-04-26T11:50:12.483498Z", "shell.execute_reply": "2024-04-26T11:50:12.483498Z" } }, "outputs": [], "source": [ "import geopandas as gpd\n", "\n", "from lets_plot import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:12.487199Z", "iopub.status.busy": "2024-04-26T11:50:12.487199Z", "iopub.status.idle": "2024-04-26T11:50:12.499350Z", "shell.execute_reply": "2024-04-26T11:50:12.499350Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:12.515087Z", "iopub.status.busy": "2024-04-26T11:50:12.515087Z", "iopub.status.idle": "2024-04-26T11:50:12.547310Z", "shell.execute_reply": "2024-04-26T11:50:12.546376Z" } }, "outputs": [], "source": [ "def get_naturalearth_data(data_type=\"admin_0_countries\", columns=[\"NAME\", \"geometry\"]):\n", " import shapefile\n", " from shapely.geometry import shape\n", "\n", " naturalearth_url = \"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/\" + \\\n", " \"data/naturalearth/{0}/data.shp?raw=true\".format(data_type)\n", " sf = shapefile.Reader(naturalearth_url)\n", "\n", " gdf = gpd.GeoDataFrame(\n", " [\n", " dict(zip([field[0] for field in sf.fields[1:]], record))\n", " for record in sf.records()\n", " ],\n", " geometry=[shape(s) for s in sf.shapes()]\n", " )[columns]\n", " gdf.columns = [col.lower() for col in gdf.columns]\n", "\n", " return gdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Polygon shapes - Naturalearth low-resolution world dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:12.547310Z", "iopub.status.busy": "2024-04-26T11:50:12.547310Z", "iopub.status.idle": "2024-04-26T11:50:13.715969Z", "shell.execute_reply": "2024-04-26T11:50:13.715969Z" } }, "outputs": [ { "data": { "text/html": [ "
\n",
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>name</th><th>iso_a3</th><th>continent</th><th>pop_est</th><th>gdp_md</th><th>geometry</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Fiji</td><td>FJI</td><td>Oceania</td><td>889953.0</td><td>5496</td><td>MULTIPOLYGON (((180.00000 -16.06713, 180.00000...</td></tr>\n",
"    <tr><th>1</th><td>Tanzania</td><td>TZA</td><td>Africa</td><td>58005463.0</td><td>63177</td><td>POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...</td></tr>\n",
"    <tr><th>2</th><td>W. Sahara</td><td>ESH</td><td>Africa</td><td>603253.0</td><td>907</td><td>POLYGON ((-8.66559 27.65643, -8.66512 27.58948...</td></tr>\n",
"    <tr><th>3</th><td>Canada</td><td>CAN</td><td>North America</td><td>37589262.0</td><td>1736425</td><td>MULTIPOLYGON (((-122.84000 49.00000, -122.9742...</td></tr>\n",
"    <tr><th>4</th><td>United States of America</td><td>USA</td><td>North America</td><td>328239523.0</td><td>21433226</td><td>MULTIPOLYGON (((-122.84000 49.00000, -120.0000...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>\n", "
" ], "text/plain": [ " name iso_a3 continent pop_est gdp_md \\\n", "0 Fiji FJI Oceania 889953.0 5496 \n", "1 Tanzania TZA Africa 58005463.0 63177 \n", "2 W. Sahara ESH Africa 603253.0 907 \n", "3 Canada CAN North America 37589262.0 1736425 \n", "4 United States of America USA North America 328239523.0 21433226 \n", "\n", " geometry \n", "0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... \n", "1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... \n", "2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... \n", "3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... \n", "4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world = get_naturalearth_data(columns=[\"NAME\", \"ISO_A3\", \"CONTINENT\", \"POP_EST\", \"GDP_MD\", \"geometry\"])\n", "world.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:13.715969Z", "iopub.status.busy": "2024-04-26T11:50:13.715969Z", "iopub.status.idle": "2024-04-26T11:50:13.731780Z", "shell.execute_reply": "2024-04-26T11:50:13.731780Z" } }, "outputs": [ { "data": { "text/plain": [ "array(['Oceania', 'Africa', 'North America', 'Asia', 'South America',\n", " 'Europe', 'Seven seas (open ocean)', 'Antarctica'], dtype=object)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "world.continent.unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### *geom_polygon()*" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:13.731780Z", "iopub.status.busy": "2024-04-26T11:50:13.731780Z", "iopub.status.idle": "2024-04-26T11:50:13.873933Z", "shell.execute_reply": "2024-04-26T11:50:13.873933Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use parameter `map` in `geom_polygon` to display Polygons / Multi-Polygons \n", "(ggplot() \n", " + geom_polygon(map=world, fill='white', color='gray') \n", " + ggsize(700, 400) \n", " + theme_void()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### geom_map()\n", "\n", "*geom_map()* is very similar to *geom_polygon()* but it automatically applies `Mercator` projection and other defaults that are more suitable for displaying blank maps. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:13.874938Z", "iopub.status.busy": "2024-04-26T11:50:13.874938Z", "iopub.status.idle": "2024-04-26T11:50:14.001375Z", "shell.execute_reply": "2024-04-26T11:50:14.001375Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(ggplot() \n", " + geom_map(map=world) \n", " + ggsize(700, 400) \n", " + theme_void()\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:14.002913Z", "iopub.status.busy": "2024-04-26T11:50:14.002913Z", "iopub.status.idle": "2024-04-26T11:50:14.017634Z", "shell.execute_reply": "2024-04-26T11:50:14.017634Z" } }, "outputs": [], "source": [ "# Under the Mercator projection Antarctica becomes disproportionately large on the world map, so \n", "# from now on let's show only the part between the 70th parallel south and the 85th parallel north:\n", "world_limits = coord_map(ylim=[-70, 85])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Point shapes - Naturalearth world capitals dataset." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:14.017634Z", "iopub.status.busy": "2024-04-26T11:50:14.017634Z", "iopub.status.idle": "2024-04-26T11:50:15.104770Z", "shell.execute_reply": "2024-04-26T11:50:15.104770Z" } }, "outputs": [ { "data": { "text/html": [ "
\n",
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>name</th><th>geometry</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Vatican City</td><td>POINT (12.45339 41.90328)</td></tr>\n",
"    <tr><th>1</th><td>San Marino</td><td>POINT (12.44177 43.93610)</td></tr>\n",
"    <tr><th>2</th><td>Vaduz</td><td>POINT (9.51667 47.13372)</td></tr>\n",
"    <tr><th>3</th><td>Lobamba</td><td>POINT (31.20000 -26.46667)</td></tr>\n",
"    <tr><th>4</th><td>Luxembourg</td><td>POINT (6.13000 49.61166)</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>\n", "
" ], "text/plain": [ " name geometry\n", "0 Vatican City POINT (12.45339 41.90328)\n", "1 San Marino POINT (12.44177 43.93610)\n", "2 Vaduz POINT (9.51667 47.13372)\n", "3 Lobamba POINT (31.20000 -26.46667)\n", "4 Luxembourg POINT (6.13000 49.61166)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cities = get_naturalearth_data(data_type=\"populated_places\")\n", "cities.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### geom_point()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.106971Z", "iopub.status.busy": "2024-04-26T11:50:15.106971Z", "iopub.status.idle": "2024-04-26T11:50:15.247826Z", "shell.execute_reply": "2024-04-26T11:50:15.247826Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use parameter `map` in `geom_point` to display Point shapes\n", "(ggplot()\n", " + geom_map(map=world)\n", " + geom_point(map=cities, color='red')\n", " + ggsize(800, 600) + theme_void() + world_limits\n", ") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### geom_text()\n", "\n", "The situation with *geom_text* is different: to display labels we have to specify a mapping for the \"label\" aesthetic.\n", "\n", "An aesthetic mapping binds a variable in *data* (passed via the `data` parameter) to its representation on the screen.\n", "\n", "Variables in a GeoDataFrame passed via the `map` parameter cannot be used in an aesthetic mapping.\n", "\n", "Fortunately, such a GeoDataFrame can also be passed via the `data` parameter, and Lets-Plot will understand that its *geometries* should be mapped to the \"x\" and \"y\" aesthetics automatically.\n", "\n", "In the next example we are going to show the names of cities as labels on the map. \n", "\n", "Let's show only South American capitals, because too many labels on the entire world map would quickly become illegible." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.247826Z", "iopub.status.busy": "2024-04-26T11:50:15.247826Z", "iopub.status.idle": "2024-04-26T11:50:15.295026Z", "shell.execute_reply": "2024-04-26T11:50:15.295026Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Obtain the bounding box of South America and use it to set the limits.\n", "south_am = world[world.continent == 'South America']\n", "south_am_bounds = south_am.geometry.total_bounds\n", "\n", "# Let's use a slightly expanded bounding box.\n", "from shapely.geometry import box\n", "south_am_box = box(*south_am_bounds).buffer(4)\n", "\n", "south_am_limits = coord_map(xlim=south_am_box.bounds[0::2], ylim=south_am_box.bounds[1::2])\n", "\n", "\n", "(ggplot() \n", " + geom_map(map=south_am) \n", " + geom_rect(map=gpd.GeoDataFrame({'geometry' : [south_am_box]}), alpha=0, color=\"#EFC623\"))\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.296263Z", "iopub.status.busy": "2024-04-26T11:50:15.296263Z", "iopub.status.idle": "2024-04-26T11:50:15.342161Z", "shell.execute_reply": "2024-04-26T11:50:15.342161Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add `text` layer and use its `data` parameter to pass the `cities` GeoDataFrame.\n", "# Also configure tooltip in the points layer to show the city name.\n", "(ggplot()\n", " + geom_map(map=south_am, fill=\"#e5f5e0\")\n", " + geom_point(data=cities, color='red', size=3, tooltips=layer_tooltips().line(\"@name\"))\n", " + geom_text(aes(label='name'), data=cities, vjust=1, position=position_nudge(y=-.2))\n", " + geom_rect(map=gpd.GeoDataFrame({'geometry' : [south_am_box]}), alpha=0, color=\"#EFC623\", size=16)\n", " + south_am_limits\n", " + ggsize(450, 691)\n", " + theme_void()\n", ") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choropleth\n", "\n", "As we saw earlier, *Lets-Plot* geom-layers accept *GeoDataFrame* in their `data` parameter. \n", "\n", "This makes it easy to bind aesthetics with variables in *GeoDataFrame*." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.343128Z", "iopub.status.busy": "2024-04-26T11:50:15.343128Z", "iopub.status.idle": "2024-04-26T11:50:15.467843Z", "shell.execute_reply": "2024-04-26T11:50:15.467843Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a choropleth by mapping the `continent` variable to the `fill` aesthetic.\n", "(ggplot() \n", " + geom_map(aes(fill='continent'), data=world, color='white') \n", " + world_limits\n", " + ggsize(900, 400) + theme_void()\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.467843Z", "iopub.status.busy": "2024-04-26T11:50:15.467843Z", "iopub.status.idle": "2024-04-26T11:50:15.593875Z", "shell.execute_reply": "2024-04-26T11:50:15.593875Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create another choropleth by mapping the `GDP estimate` variable to the `fill` aesthetic.\n", "(ggplot() \n", " + geom_map(aes(fill='gdp_md'), data=world, color=\"white\") \n", " + world_limits\n", " + ggsize(800, 400) + theme_void()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Joining `data` and `geometry` datasets\n", "\n", "In this example we will use both: the `data` and the `map` parameters.\n", "\n", "We will use the `data` parameter to pass \"average temperature per continent\" dataset to the geom layer.\n", "\n", "The continent geometries *GeoDataFrame* is passed via the `map` parameter as before.\n", "\n", "For this to work it is also necessary to specify fields by which *Lets-Plot* will join `data` and `map` datasets. We will do that using the *map_join* parameter." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.593875Z", "iopub.status.busy": "2024-04-26T11:50:15.593875Z", "iopub.status.idle": "2024-04-26T11:50:15.610570Z", "shell.execute_reply": "2024-04-26T11:50:15.609658Z" } }, "outputs": [], "source": [ "# Average temperatures by continent (fictional)\n", "climat_data = dict(\n", " region = ['Europe', 'Asia', 'North America', 'Africa', 'Australia', 'Oceania'],\n", " avg_temp = [8.6, 16.6, 11.7, 21.9, 14.9, 23.9]\n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-04-26T11:50:15.610570Z", "iopub.status.busy": "2024-04-26T11:50:15.610570Z", "iopub.status.idle": "2024-04-26T11:50:15.752709Z", "shell.execute_reply": "2024-04-26T11:50:15.751564Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Join `data` and `map` using the `map_join` parameter.\n", "# For the sake of the demo let's use `geom_rect` and customize the tooltip.\n", "(ggplot()\n", " + geom_rect(aes(fill='avg_temp'), \n", " data=climat_data, \n", " map=world, map_join=[['region'], ['continent']], \n", " color=\"white\",\n", " tooltips=layer_tooltips().line(\"^fill \\u00b0C\"))\n", " + scale_fill_gradient(low='light_blue', high=\"dark_green\", name=\"Average t[\\u00b0C]\")\n", " + ggsize(800, 400) + theme_void()\n", ") " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }