{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p><font size=\"6\"><b>Spatial operations and overlays: creating new geometries</b></font></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous notebook we have seen how to identify and use the spatial relationships between geometries. In this notebook, we will see how to create new geometries based on those relationships."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import geopandas\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "countries = geopandas.read_file(\"data/ne_110m_admin_0_countries.zip\")\n",
    "cities = geopandas.read_file(\"data/ne_110m_populated_places.zip\")\n",
    "rivers = geopandas.read_file(\"data/ne_50m_rivers_lake_centerlines.zip\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# defining the same example geometries as in the previous notebook\n",
    "belgium = countries.loc[countries['name'] == 'Belgium', 'geometry'].item()\n",
    "brussels = cities.loc[cities['name'] == 'Brussels', 'geometry'].item()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Spatial operations\n",
    "\n",
    "Next to the spatial predicates that return boolean values, Shapely and GeoPandas also provide operations that return new geometric objects.\n",
    "\n",
    "**Binary operations:**\n",
    "\n",
    "<table><tr>\n",
    "<td> <img src=\"img/spatial-operations-base.png\"/> </td>\n",
    "<td> <img src=\"img/spatial-operations-intersection.png\"/> </td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td> <img src=\"img/spatial-operations-union.png\"/> </td>\n",
    "<td> <img src=\"img/spatial-operations-difference.png\"/> </td>\n",
    "</tr></table>\n",
    "\n",
    "**Buffer:**\n",
    "\n",
    "<table><tr>\n",
    "<td> <img src=\"img/spatial-operations-buffer-point1.png\"/> </td>\n",
    "<td> <img src=\"img/spatial-operations-buffer-point2.png\"/> </td>\n",
    "</tr>\n",
    "<tr>\n",
    "<td> <img src=\"img/spatial-operations-buffer-line.png\"/> </td>\n",
    "<td> <img src=\"img/spatial-operations-buffer-polygon.png\"/> </td>\n",
    "</tr></table>\n",
    "\n",
    "\n",
    "See https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods for more details."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, using the toy data from above, let's construct a buffer around Brussels (which returns a Polygon):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "geopandas.GeoSeries([belgium, brussels.buffer(1)]).plot(alpha=0.5, cmap='tab10')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and now take the intersection, union or difference of those two polygons:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "brussels.buffer(1).intersection(belgium)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "brussels.buffer(1).union(belgium)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "brussels.buffer(1).difference(belgium)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Spatial operations with GeoPandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above we showed how to create a new geometry based on two individual shapely geometries. The same operations can be extended to GeoPandas. Given a GeoDataFrame, we can calculate the intersection, union or difference of each of the geometries with another geometry.\n",
    "\n",
    "Let's look at an example with a subset of the countries. We have a GeoDataFrame with the country polygons of Africa, and  now consider a rectangular polygon, representing an area around the equator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "africa = countries[countries.continent == 'Africa']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from shapely.geometry import LineString\n",
    "box = LineString([(-10, 0), (50, 0)]).buffer(10, cap_style=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(6, 6))\n",
    "africa.plot(ax=ax, facecolor='none', edgecolor='k')\n",
    "geopandas.GeoSeries([box]).plot(ax=ax, facecolor='C0', edgecolor='k', alpha=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intersection method of the GeoDataFrame will now calculate the intersection with the rectangle for each of the geometries of the africa GeoDataFrame element-wise. Note that for many of the countries, those that do not overlap with the rectangle, this will be an empty geometry:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "africa_intersection = africa.intersection(box)\n",
    "africa_intersection.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is returned is a new GeoSeries of the same length as the original dataframe, containing one row per country, but now containing only the intersection. In this example, the last element shown is an empty polygon, as that country was not overlapping with the box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# remove the empty polygons before plotting\n",
    "africa_intersection = africa_intersection[~africa_intersection.is_empty]\n",
    "# plot the intersection\n",
    "africa_intersection.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unary union and dissolve\n",
    "\n",
    "Another useful method is the `unary_union` attribute, which converts the set of geometry objects in a GeoDataFrame into a single geometry object by taking the union of all those geometries.\n",
    "\n",
    "For example, we can construct a single Shapely geometry object for the Africa continent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "africa_countries = countries[countries['continent'] == 'Africa']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "africa = africa_countries.unary_union"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "africa"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "print(str(africa)[:1000])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you might want to take the unary union of a set of geometries but *grouped* by one of the attributes of the GeoDataFrame (so basically doing \"groupby\" + \"unary_union\"). For this operation, GeoPandas provides the `dissolve()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "continents = countries.dissolve(by=\"continent\")   # , aggfunc=\"sum\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "continents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\" style=\"font-size:120%\">\n",
    "\n",
    "**REMEMBER**:\n",
    "\n",
    "GeoPandas (and Shapely for the individual objects) provide a whole lot of basic methods to analyze the geospatial data (distance, length, centroid, boundary, convex_hull, simplify, transform, ....), much more than what we can touch in this tutorial.\n",
    "\n",
    "An overview of all methods provided by GeoPandas can be found here: https://geopandas.readthedocs.io/en/latest/docs/reference.html\n",
    "\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's practice!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 1: What are the districts close to the Seine?**\n",
    "\n",
    "Below, the coordinates for the Seine river in the neighborhood of Paris are provided as a GeoJSON-like feature dictionary (created at http://geojson.io). \n",
    "\n",
    "Based on this `seine` object, we want to know which districts are located close (maximum 150 m) to the Seine. \n",
    "\n",
    "* Create a buffer of 150 m around the Seine.\n",
    "* Check which districts intersect with this buffered object.\n",
    "* Make a visualization of the districts indicating which districts are located close to the Seine.\n",
    " \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": false
   },
   "outputs": [],
   "source": [
    "districts = geopandas.read_file(\"data/paris_districts.geojson\").to_crs(epsg=2154)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": false
   },
   "outputs": [],
   "source": [
    "# created a line with http://geojson.io\n",
    "s_seine = geopandas.GeoDataFrame.from_features({\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"properties\":{},\"geometry\":{\"type\":\"LineString\",\"coordinates\":[[2.408924102783203,48.805619828930226],[2.4092674255371094,48.81703747481909],[2.3927879333496094,48.82325391133874],[2.360687255859375,48.84912860497674],[2.338714599609375,48.85827758964043],[2.318115234375,48.8641501307046],[2.298717498779297,48.863246707697],[2.2913360595703125,48.859519915404825],[2.2594070434570312,48.8311646245967],[2.2436141967773438,48.82325391133874],[2.236919403076172,48.82347994904826],[2.227306365966797,48.828339513221444],[2.2224998474121094,48.83862215329593],[2.2254180908203125,48.84856379804802],[2.2240447998046875,48.85409863123821],[2.230224609375,48.867989496547864],[2.260265350341797,48.89192242750887],[2.300262451171875,48.910203080780285]]}}]},\n",
    "                                               crs='EPSG:4326')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert to local UTM zone\n",
    "s_seine_utm = s_seine.to_crs(epsg=2154)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "fig, ax = plt.subplots(figsize=(20, 10))\n",
    "districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')\n",
    "s_seine_utm.plot(ax=ax)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access the single geometry object\n",
    "seine = s_seine_utm.geometry.item()\n",
    "seine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays1.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays2.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays3.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 2: Exploring a Land Use dataset**\n",
    "\n",
    "For the following exercises, we first introduce a new dataset: a dataset about the land use of Paris (a simplified version based on the open European [Urban Atlas](https://land.copernicus.eu/local/urban-atlas)). The land use indicates for what kind of activity a certain area is used, such as residential area or for recreation. It is a polygon dataset, with a label representing the land use class for different areas in Paris.\n",
    "\n",
    "In this exercise, we will read the data, explore it visually, and calculate the total area of the different classes of land use in the area of Paris.\n",
    "\n",
    "* Read in the `'paris_land_use.shp'` file and assign the result to a variable `land_use`.\n",
    "* Make a plot of `land_use`, using the `'class'` column to color the polygons. Add a legend with `legend=True`, and make the figure size a bit larger.\n",
    "* Add a new column `'area'` to the dataframe with the area of each polygon.\n",
    "* Calculate the total area in km² for each `'class'` using the `groupby()` method, and print the result.\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* Reading a file can be done with the `geopandas.read_file()` function.\n",
    "* To use a column to color the geometries, use the `column` keyword to indicate the column name.\n",
    "* The area of each geometry can be accessed with the `area` attribute of the `geometry` of the GeoDataFrame.\n",
    "* The `groupby()` method takes the column name on which you want to group as the first argument.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays4.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays5.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays6.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays7.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 3: Intersection of two polygons**\n",
    "\n",
    "For this exercise, we are going to use 2 individual polygons: the district of Muette extracted from the `districts` dataset, and the green urban area of Boulogne, a large public park in the west of Paris, extracted from the `land_use` dataset. The two polygons have already been assigned to the `muette` and `park_boulogne` variables.\n",
    "\n",
    "We first visualize the two polygons. You will see that they overlap, but the park is not fully located in the district of Muette. Let's determine the overlapping part:\n",
    "\n",
    "* Plot the two polygons in a single map to examine visually the degree of overlap\n",
    "* Calculate the intersection of the `park_boulogne` and `muette` polygons.\n",
    "* Plot the intersection.\n",
    "* Print the proportion of the area of the district that is occupied by the park.\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* To plot single Shapely objects, you can put those in a `GeoSeries([..])` to use the GeoPandas `plot()` method.\n",
    "* The intersection of to scalar polygons can be calculated with the `intersection()` method of one of the polygons, and passing the other polygon as the argument to that method.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "land_use = geopandas.read_file(\"data/paris_land_use.zip\")\n",
    "districts = geopandas.read_file(\"data/paris_districts.geojson\").to_crs(land_use.crs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# extract polygons\n",
    "land_use['area'] = land_use.geometry.area\n",
    "park_boulogne = land_use[land_use['class'] == \"Green urban areas\"].sort_values('area').geometry.iloc[-1]\n",
    "muette = districts[districts.district_name == 'Muette'].geometry.item()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# Plot the two polygons\n",
    "geopandas.GeoSeries([park_boulogne, muette]).plot(alpha=0.5, color=['green', 'blue'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays8.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays9.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays10.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 4: Intersecting a GeoDataFrame with a Polygon**\n",
    "\n",
    "Combining the land use dataset and the districts dataset, we can now investigate what the land use is in a certain district.\n",
    "\n",
    "For that, we first need to determine the intersection of the land use dataset with a given district. Let's take again the *Muette* district as example case.\n",
    "\n",
    "* Calculate the intersection of the `land_use` polygons with the single `muette` polygon. Call the result `land_use_muette`.\n",
    "* Remove the empty geometries from `land_use_muette`.\n",
    "* Make a quick plot of this intersection, and pass `edgecolor='black'` to more clearly see the boundaries of the different polygons.\n",
    "* Print the first five rows of `land_use_muette`.\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* The intersection of each geometry of a GeoSeries with another single geometry can be performed with the `intersection()` method of a GeoSeries.\n",
    "* The `intersection()` method takes as argument the geometry for which to calculate the intersection.\n",
    "* We can check which geometries are empty with the `is_empty` attribute of a GeoSeries.\n",
    "    \n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "land_use = geopandas.read_file(\"data/paris_land_use.zip\")\n",
    "districts = geopandas.read_file(\"data/paris_districts.geojson\").to_crs(land_use.crs)\n",
    "muette = districts[districts.district_name == 'Muette'].geometry.item()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays11.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays12.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays13.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays14.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays15.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see in the plot that we now only have a subset of the full land use dataset. The original `land_use_muette` (before removing the empty geometries) still has the same number of rows as the original `land_use`, though. But many of the rows, as you could see by printing the first rows, consist now of empty polygons when it did not intersect with the Muette district.\n",
    "\n",
    "The `intersection()` method also returned only geometries. If we want to combine those intersections with the attributes of the original land use, we can take a copy of this and replace the geometries with the intersections (you can uncomment and run to see the code):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays16.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays17.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays18.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 5: The land use of the Muette district**\n",
    "\n",
    "Based on the `land_use_muette` dataframe with the land use for the Muette districts as calculated above, we can now determine the total area of the different land use classes in the Muette district.\n",
    "\n",
    "* Calculate the total area per land use class.\n",
    "* Calculate the fraction (in percentage) for the different land use classes.\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* The intersection of each geometry of a GeoSeries with another single geometry can be performed with the `intersection()` method of a GeoSeries.\n",
    "* The `intersection()` method takes as argument the geometry for which to calculate the intersection.\n",
    "* We can check which geometries are empty with the `is_empty` attribute of a GeoSeries.\n",
    "    \n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays19.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays20.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above was only for a single district. If we want to do this more easily for all districts, we can do this with the overlay operation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The overlay operation\n",
    "\n",
    "In a spatial join operation, we are not changing the geometries itself. We are not joining geometries, but joining attributes based on a spatial relationship between the geometries. This also means that the geometries need to at least overlap partially.\n",
    "\n",
    "If you want to create new geometries based on joining (combining) geometries of different dataframes into one new dataframe (eg by taking the intersection of the geometries), you want an **overlay** operation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How does it differ compared to the intersection method?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the `intersection()` method introduced in the previous section, we could for example determine the intersection of a set of countries with another polygon, a circle in the example below:\n",
    "\n",
    "<img width=\"70%\" src=\"img/overlay-countries-circle-intersection.png\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, this method (`countries.intersection(circle)`) also has some limitations.\n",
    "\n",
    "* Mostly useful when intersecting a GeoSeries with a single polygon.\n",
    "* Does not preserve attribute information of the intersecting polygons.\n",
    "\n",
    "For cases where we require a bit more complexity, it is preferable to use the \"overlay\" operation, instead of the intersection method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Consider the following simplified example. On the left we see again the 3 countries. On the right we have the plot of a GeoDataFrame with some simplified geologic regions for the same area:\n",
    "\n",
    "<table width=\"80%\"><tr>\n",
    "<td> <img src=\"img/overlay-countries.png\"/> </td>\n",
    "<td> <img src=\"img/overlay-regions.png\"/> </td>\n",
    "</tr></table>\n",
    "\n",
    "By simply plotting them on top of each other, as shown below, you can see that the polygons of both layers intersect. \n",
    "\n",
    "But now, by \"overlaying\" the two layers, we can create a third layer that contains the result of intersecting both layers: all the intersections of each country with each geologic region. It keeps only those areas that were included in both layers.\n",
    "\n",
    "<table width=\"80%\"><tr>\n",
    "<td> <img src=\"img/overlay-both.png\"/> </td>\n",
    "<td> <img src=\"img/overlay-overlayed.png\"/> </td>\n",
    "</tr></table>\n",
    "\n",
    "This operation is called an intersection overlay, and in GeoPandas we can perform this operation with the `geopandas.overlay()` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another code example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "africa = countries[countries['continent'] == 'Africa']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "africa.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "cities['geometry'] = cities.buffer(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "intersection = geopandas.overlay(africa, cities, how='intersection')\n",
    "intersection.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "intersection.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the overlay method, we pass the full GeoDataFrame with all regions to intersect the countries with. The result contains all non-empty intersections of all combinations of countries and city regions.\n",
    "\n",
    "Note that the result of the overlay function also keeps the attribute information of both the countries as the city regions. That can be very useful for further analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "geopandas.overlay(africa, cities, how='intersection').plot()  # how=\"difference\"/\"union\"/\"symmetric_difference\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\" style=\"font-size:120%\">\n",
    "<b>REMEMBER</b> <br>\n",
    "\n",
    "* **Spatial join**: transfer attributes from one dataframe to another based on the spatial relationship\n",
    "* **Spatial overlay**: construct new geometries based on spatial operation between both dataframes (and combining attributes of both dataframes)\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's practice!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 6: Overlaying spatial datasets I**\n",
    "\n",
    "We will now combine both datasets in an overlay operation. Create a new `GeoDataFrame` consisting of the intersection of the land use polygons which each of the districts, but make sure to bring the attribute data from both source layers.\n",
    "\n",
    "* Create a new GeoDataFrame from the intersections of `land_use` and `districts`. Assign the result to a variable `combined`.\n",
    "* Print the first rows the resulting GeoDataFrame (`combined`).\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* The intersection of two GeoDataFrames can be calculated with the `geopandas.overlay()` function.\n",
    "* The `overlay()` functions takes first the two GeoDataFrames to combine, and a third `how` keyword indicating how to combine the two layers.\n",
    "* For making an overlay based on the intersection, you can pass `how='intersection'`.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "land_use = geopandas.read_file(\"data/paris_land_use.zip\")\n",
    "districts = geopandas.read_file(\"data/paris_districts.geojson\").to_crs(land_use.crs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays21.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays22.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 7: Overlaying spatial datasets II**\n",
    "\n",
    "Now that we created the overlay of the land use and districts datasets, we can more easily inspect the land use for the different districts. Let's get back to the example district of Muette, and inspect the land use of that district.\n",
    "\n",
    "* Add a new column `'area'` with the area of each polygon to the `combined` GeoDataFrame.\n",
    "* Create a subset called `land_use_muette` where the `'district_name'` is equal to \"Muette\".\n",
    "* Make a plot of `land_use_muette`, using the `'class'` column to color the polygons.\n",
    "* Calculate the total area for each `'class'` of `land_use_muette` using the `groupby()` method, and print the result.\n",
    "\n",
    "<details><summary>Hints</summary>\n",
    "\n",
    "* The area of each geometry can be accessed with the `area` attribute of the `geometry` of the GeoDataFrame.\n",
    "* To use a column to color the geometries, pass its name to the `column` keyword.\n",
    "* The `groupby()` method takes the column name on which you want to group as the first argument.\n",
    "* The total area for each class can be calculated by taking the `sum()` of the area.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays23.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays24.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays25.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays26.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE 8: Overlaying spatial datasets III**\n",
    "\n",
    "Thanks to the result of the overlay operation, we can now more easily perform a similar analysis for *all* districts. Let's investigate the fraction of green urban area in each of the districts.\n",
    "\n",
    "* Based on the `combined` dataset, calculate the total area per district using `groupby()`.\n",
    "* Select the subset of \"Green urban areas\" from `combined` and call this `urban_green`.\n",
    "* Now calculate the total area per district for this `urban_green` subset, and call this `urban_green_area`.\n",
    "* Determine the fraction of urban green area in each district.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays27.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays28.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays29.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays30.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays31.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "tags": [
     "nbtutor-solution"
    ]
   },
   "outputs": [],
   "source": [
    "# %load _solved/solutions/04-spatial-operations-overlays32.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative to calculate the area per land use class in each district:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "combined.groupby([\"district_name\", \"class\"])[\"area\"].sum().reset_index()"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Nbtutor - export exercises",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}