{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# Exploring Land Cover Data (Impact Observatory)\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Context\n",
"### Purpose\n",
"Introduce manipulation and exploratory analysis of classified land use and cover data, using example [data created by Impact Observatory](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class) from ESA Sentinel-2 imagery. \n",
"\n",
"### Dataset description\n",
"There are now many classified (categorical) land cover data products freely available that are useful for Environmental Data Science. These include:\n",
"- [ESA CCI land cover](https://www.esa-landcover-cci.org/), 300m spatial resolution global extent for years 1992-2015 \n",
"- [Copernicus Global Land Cover](https://zenodo.org/communities/copernicus-land-cover), 100m spatial resolution global extent for years 2015-2019 \n",
"- [USGS LCMAP](https://www.usgs.gov/special-topics/lcmap/data), 30m spatial resolution for USA for years 1985-2020\n",
"- [UKCEH LCMs](https://www.ceh.ac.uk/data/ukceh-land-cover-maps), various spatial resolutions for UK for various years 1990-2020\n",
"- [mapbiomas](https://mapbiomas.org/en), 30m spatial resolution for Brazil for years 1985-2020\n",
"- [Impact Observatory](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class), 10m spatial resolution global extent for 2017-2021\n",
"\n",
"These products are provided as 2D rasters (spatial) or 3D data cubes (spatio-temporal). The number and classification of discrete land cover classes varies between products, but at their most basic will distinguish between broad land covers such as 'crops', 'forest' and 'built-up'. The nominal (categorical) character of the data influences the types of analysis appropriate. \n",
"\n",
"This notebook uses data created by [Impact Observatory](https://www.impactobservatory.com/). [The data](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class) are a time series for 2017-2021 of annual global land use and land cover (LULC) mapped at 10m spatial resolution {cite:p}`impactobservatory`. The data are [derived](https://www.impactobservatory.com/global_maps) from [ESA Sentinel-2](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2) imagery, with each annual map assigning every pixel to one of 9 LULC classes {cite:p}`impactobservatory_paper`. The Impact Observatory LULC model uses deep learning methods to infer a single annual LULC class for each pixel in a Sentinel-2 image. Each annual global LULC map is produced by aggregating multiple inferences for images from across a given year (requiring the processing of approximately 2 million images per annual map).\n",
"\n",
"### Highlights\n",
"* Fetch land cover product data from an online repository\n",
"* Visualise the data (maps and spatio-temporal plots)\n",
"* Analyse aggregate change through bar charts and similar visualisations \n",
"* Analyse pixel-by-pixel change, including the use of Sankey diagrams\n",
"* Analyse zonal change using ancillary vector data\n",
"\n",
"### Contributions\n",
"\n",
"#### Dataset originator/creator\n",
"- [Esri](https://www.esri.com/) (licensor)\n",
"- [Impact Observatory](https://www.impactobservatory.com/) (processor, producer, licensor)\n",
"- [Microsoft](https://planetarycomputer.microsoft.com/) (host)\n",
"\n",
"The data are available under a [Creative Commons BY-4.0 license](https://creativecommons.org/licenses/by/4.0/).\n",
"\n",
"#### Code\n",
"- Data loading code built on snippet [from @acocac](https://github.com/alan-turing-institute/environmental-ds-book/issues/99#issuecomment-1185550940) via [ODC.stac docs](https://odc-stac.readthedocs.io/en/latest/notebooks/stac-load-e84-aws.html) and a [Microsoft Planetary example notebook](https://planetarycomputer.microsoft.com/dataset/io-lulc#Example-Notebook)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load libraries"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:05.397729Z",
"iopub.status.busy": "2022-09-23T12:40:05.397252Z",
"iopub.status.idle": "2022-09-23T12:40:11.772091Z",
"shell.execute_reply": "2022-09-23T12:40:11.770952Z"
},
"tags": [
"hide-input"
]
},
"source": [
"#system\n",
"import os\n",
"import warnings\n",
"warnings.filterwarnings(action='ignore')\n",
"\n",
"#data handling\n",
"import pystac_client\n",
"import odc.stac\n",
"from pystac.extensions.item_assets import ItemAssetsExtension\n",
"\n",
"import geopandas as gpd\n",
"import rasterio as rio\n",
"import numpy as np\n",
"import pandas as pd\n",
"from shapely.geometry import Polygon\n",
"import xarray as xr\n",
"import rioxarray\n",
"import planetary_computer\n",
"\n",
"#visualisation\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib.colors as mplc\n",
"import holoviews as hv\n",
"import hvplot.pandas \n",
"from holoviews import opts, dim\n",
"\n",
"#data analysis\n",
"from sklearn import metrics #for confusion matrix\n",
"from rasterstats import zonal_stats"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set project structure\n",
"The next cell creates a separate folder for any notebook outputs you may wish to save (e.g. below we use this directory when demonstrating how you might load data from a local file)."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:11.777033Z",
"iopub.status.busy": "2022-09-23T12:40:11.776473Z",
"iopub.status.idle": "2022-09-23T12:40:11.784838Z",
"shell.execute_reply": "2022-09-23T12:40:11.783928Z"
}
},
"source": [
"notebook_folder = './notebook'\n",
"if not os.path.exists(notebook_folder):\n",
" os.makedirs(notebook_folder)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 1. Fetch and Load Data\n",
"This notebook uses annual land cover [data from Impact Observatory](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class), 10m spatial resolution and global extent for 2017-2021. These data are hosted online by [Microsoft's Planetary Computer](https://planetarycomputer.microsoft.com/), which enables you, the user, to run the code in the cloud (e.g. via binder). \n",
"\n",
"Below we will use the [`pystac-client`](https://pystac-client.readthedocs.io/en/latest/) package to access [`STAC`](https://stacspec.org/en/) information from the [Planetary Computer API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/), then read it using the [`odc-stac`](https://odc-stac.readthedocs.io/en/latest/) package. In doing so we will:\n",
"- fetch data for a study area in Mato Grosso, Brazil\n",
"- resample the original 10m spatial resolution to 300m (to decrease execution times in this example notebook) \n",
"\n",
":::{note}\n",
"Fetching the data from the cloud adds some steps that would not be necessary had you already downloaded the data to your local machine. Consequently, in future for your own work you might ignore the _Cloud Data_ code blocks and start from _Local Data_.\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cloud Data\n",
"\n",
"First we need to specify the location and extent of the study area to fetch the relevant data. We do this by specifying a bounding box around the center point of the study area."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:11.788750Z",
"iopub.status.busy": "2022-09-23T12:40:11.788363Z",
"iopub.status.idle": "2022-09-23T12:40:11.805043Z",
"shell.execute_reply": "2022-09-23T12:40:11.803827Z"
}
},
"source": [
"#specify center point of the study area\n",
"x, y = (-56.1, -12.2) #area in MT\n",
"\n",
"#define a square bounding box for the study area\n",
"km2deg = 1.0 / 111 #1 deg corresponds to approximately 111 km at the equator\n",
"xd = 200 * km2deg\n",
"yd = 325 * km2deg\n",
"study_bbox_coords = [[x - xd, y - yd], [x - xd, y + yd], [x + xd, y + yd], [x + xd, y - yd]]\n",
"\n",
"#view bounding box\n",
"study_bbox_coords"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then visualise the location and extent of the study area [using `hvPlot`](https://hvplot.holoviz.org/user_guide/Geographic_Data.html). "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:11.810771Z",
"iopub.status.busy": "2022-09-23T12:40:11.810361Z",
"iopub.status.idle": "2022-09-23T12:40:16.762719Z",
"shell.execute_reply": "2022-09-23T12:40:16.760578Z"
}
},
"source": [
"#create a GeoPandas GeoDataFrame \n",
"study_poly = Polygon(study_bbox_coords)\n",
"\n",
"crs = 'EPSG:4326' #geographic lon/lat (the {'init': ...} dict form is deprecated in pyproj/GeoPandas)\n",
"sa_gpd = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[study_poly]) \n",
"\n",
"#interactive plot using hvplot.pandas\n",
"sa_gpd_interactive = sa_gpd.hvplot(\n",
" geo=True,crs=sa_gpd.crs.to_epsg(),alpha=0.5, xlim=(-75, -35), ylim=(-30, 10), tiles='CartoLight')\n",
"sa_gpd_interactive"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our study area defined, we can now query Microsoft Planetary Computer data that is available for this location [using `pystac-client`](https://pystac-client.readthedocs.io/en/latest/)."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:16.768099Z",
"iopub.status.busy": "2022-09-23T12:40:16.767731Z",
"iopub.status.idle": "2022-09-23T12:40:17.521164Z",
"shell.execute_reply": "2022-09-23T12:40:17.520132Z"
}
},
"source": [
"catalog = pystac_client.Client.open(\"https://planetarycomputer.microsoft.com/api/stac/v1\", modifier=planetary_computer.sign_inplace)\n",
"\n",
"#https://pystac-client.readthedocs.io/en/latest/tutorials/pystac-client-introduction.html#API-Search\n",
"query = catalog.search(\n",
" collections=[\"io-lulc-9-class\"],\n",
" limit=100,\n",
" bbox=study_poly.bounds\n",
")\n",
"\n",
"#https://pystac-client.readthedocs.io/en/latest/tutorials/pystac-client-introduction.html#Items\n",
"items = list(query.items()) #note: get_items() is deprecated in newer pystac-client\n",
"print(f\"Found: {len(items):d} datasets\")\n",
"\n",
"print(items)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the available datasets we can access and view the associated metadata. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:17.525808Z",
"iopub.status.busy": "2022-09-23T12:40:17.525216Z",
"iopub.status.idle": "2022-09-23T12:40:17.741239Z",
"shell.execute_reply": "2022-09-23T12:40:17.740165Z"
}
},
"source": [
"stac_json = query.item_collection_as_dict() #convert STAC items into a GeoJSON FeatureCollection (get_all_items_as_dict() is deprecated)\n",
"gdf = gpd.GeoDataFrame.from_features(stac_json) #convert GeoJSON to GeoPandas to view nicely\n",
"gdf.head()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The global LULC data are provided as 'tiles' covering regions of the earth's surface - each row in the DataFrame above is one year of data for tile _21L_. We can also plot the tile footprints to check that the study area falls within the returned tiles."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:17.745135Z",
"iopub.status.busy": "2022-09-23T12:40:17.744863Z",
"iopub.status.idle": "2022-09-23T12:40:18.100606Z",
"shell.execute_reply": "2022-09-23T12:40:18.099592Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig, axes = plt.subplots(figsize=(6, 12))\n",
"\n",
"#plot the tile footprints, coloured by tile ID\n",
"gdf.plot(\n",
" \"io:tile_id\",\n",
" ax=axes,\n",
" edgecolor=\"blue\",\n",
" categorical=True,\n",
" aspect=\"equal\",\n",
" alpha=0.5,\n",
" legend=True,\n",
" legend_kwds={\"loc\": \"upper left\", \"frameon\": False, \"ncol\": 1},\n",
")\n",
"\n",
"#add polygon to show our study area\n",
"sa_gpd.plot(edgecolor=\"red\", facecolor=\"none\", ax=axes)\n",
"\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the figure above we can see the whole study area (represented by the red rectangle) is within the tiles (represented by the blue rectangles) returned by the `pystac` query. \n",
"\n",
"Now we are happy we know what datasets we wish to fetch, we can [use `odc-stac`](https://odc-stac.readthedocs.io/en/latest/) to fetch and load the data into memory, specifying the projection and spatial resolution we want as well as how we want data grouped in layers:"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:18.105902Z",
"iopub.status.busy": "2022-09-23T12:40:18.105043Z",
"iopub.status.idle": "2022-09-23T12:40:18.208240Z",
"shell.execute_reply": "2022-09-23T12:40:18.206802Z"
},
"tags": [
"hide-input"
]
},
"source": [
"#projection\n",
"SMepsg = 3857 #https://epsg.io/3857 Web Mercator, a projected crs with units of m\n",
"SMepsg_str = \"epsg:{0}\".format(SMepsg)\n",
"\n",
"#spatial resolution (in units of projection)\n",
"datares = 300\n",
"\n",
"#https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html\n",
"lcxr = odc.stac.load(\n",
" items, #load the items from our query above\n",
" groupby=\"start_datetime\", #create spatial layers that have the same timestamp \n",
" chunks={}, #use Dask to speed loading\n",
" dtype='int',\n",
" #next lines load study area at resolution specified by datares variable\n",
" #without these entire image is returned at original resolution\n",
" crs=SMepsg_str, #specify the desired crs \n",
" resolution=datares, #specify the desired spatial resolution (units as per crs)\n",
" bbox=study_poly.bounds, #specify bounding box in Lon/Lat (ignores crs units, use x,y to use crs units)\n",
" resampling=\"mode\" #ensure sensible type for categorical data (options seem to be as for datacube.Datacube.load)\n",
")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data object we have created is an [`xarray`](https://xarray.dev/) [Dataset](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset) which is a dict-like collection of [DataArray](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray) objects with aligned dimensions. We can check the dimensions, coordinates, data and attributes (see `xarray` [terminology](https://docs.xarray.dev/en/stable/user-guide/terminology.html)): "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:18.213068Z",
"iopub.status.busy": "2022-09-23T12:40:18.212748Z",
"iopub.status.idle": "2022-09-23T12:40:18.236591Z",
"shell.execute_reply": "2022-09-23T12:40:18.235740Z"
}
},
"source": [
"lcxr"
],
"outputs": [],
"execution_count": null
},
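{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the Dataset structure concrete, here is a minimal sketch using a small synthetic array (not the land cover data) of how a Dataset bundles named DataArrays with shared, labelled dimensions:\n",
"```python\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"#tiny synthetic Dataset mimicking lcxr's time/y/x layout\n",
"arr = np.arange(24).reshape(2, 3, 4)\n",
"ds = xr.Dataset(\n",
"    {'data': (('time', 'y', 'x'), arr)},\n",
"    coords={'time': ['2017', '2018'], 'y': range(3), 'x': range(4)}\n",
")\n",
"print(ds.sizes)            #dimension name -> length\n",
"print(list(ds.data_vars))  #DataArrays held in the Dataset\n",
"print(ds['data'].sel(time='2017').shape)\n",
"```\n",
"Selecting by coordinate label with `.sel(time=...)`, as we do with `lcxr` below, works the same way on this toy Dataset."
]
},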
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Local Data\n",
"\n",
"In the previous section we fetched data from an online repository and loaded it into memory. In the sections below we will continue to work with that (and you can skip to the next section now, if you don't want to see how to save and/or read data from a local drive).\n",
"\n",
"The [recommended](https://docs.xarray.dev/en/stable/user-guide/io.html) way to store `xarray` data structures is [netCDF](https://docs.unidata.ucar.edu/netcdf-c/current/faq.html#What-Is-netCDF), a binary file format for self-described datasets that originated in the geosciences. \n",
"\n",
"For example, to save the `xarray` Dataset we created above on disk as netCDF, we would use something like:\n",
"```python\n",
"lcxr.to_netcdf(os.path.join(notebook_folder,\"MT-study-area.nc\"))\n",
"```\n",
"\n",
"Then to load the local netCDF file into a `xarray` Dataset we would use:\n",
"```python\n",
"lcxr_disk = xr.open_dataset(os.path.join(notebook_folder,\"MT-study-area.nc\"), decode_coords=\"all\")\n",
"```\n",
"\n",
":::{tip}\n",
"If you have saved this notebook to a local machine, try uncommenting the code below and running it. Check you can see where the data has been saved on your local machine and note how the Dataset loaded is identical to the one you saved.\n",
":::"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:18.240189Z",
"iopub.status.busy": "2022-09-23T12:40:18.239922Z",
"iopub.status.idle": "2022-09-23T12:40:18.244115Z",
"shell.execute_reply": "2022-09-23T12:40:18.242987Z"
}
},
"source": [
"#lcxr.to_netcdf(os.path.join(notebook_folder,\"MT-study-area.nc\"))\n",
"#lcxr_disk = xr.open_dataset(os.path.join(notebook_folder,\"MT-study-area.nc\"), decode_coords=\"all\")\n",
"#lcxr_disk"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 2. Visualisation\n",
"\n",
"Our [dataset](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class) contains a 2D raster layer of land cover types for each annual snapshot in time. Each map composites land use/cover predictions (for 9 classes) from imagery throughout the year, giving a representative snapshot of that year. \n",
"\n",
"We can visualise these spatio-temporal data in a variety of different ways, both static and interactive. But before exploring these, we will [create a `matplotlib` colormap](https://matplotlib.org/stable/tutorials/colors/colormap-manipulation.html) to display the 9 land cover classes with intuitive colors. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:18.247509Z",
"iopub.status.busy": "2022-09-23T12:40:18.247228Z",
"iopub.status.idle": "2022-09-23T12:40:19.457043Z",
"shell.execute_reply": "2022-09-23T12:40:19.455861Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"#class names and IDs are held in an 'asset' of the collection\n",
"collection = catalog.get_collection(\"io-lulc-9-class\")\n",
"ia = ItemAssetsExtension.ext(collection)\n",
"iaa = ia.item_assets[\"data\"]\n",
"\n",
"#get the names of the land cover classes with their ID in the raster (as a dictionary, name:ID)\n",
"class_names = {v[\"summary\"]: v[\"values\"][0] for v in iaa.properties[\"file:values\"]}\n",
"\n",
"#flip the keys:values in the dictionary (to create ID:name)\n",
"values_to_classes = {v: k for k, v in class_names.items()}\n",
"\n",
"max_class = max(values_to_classes.keys())\n",
"\n",
"#construct a matplotlib colormap\n",
"with rio.open(items[0].assets[\"data\"].href) as src:\n",
" colormap_def = src.colormap(1) # get metadata colormap for band 1\n",
" colormap = [\n",
" np.array(colormap_def[i]) / 255 for i in range(max_class+1)\n",
" ] # transform to matplotlib color format\n",
"lc_cmap = mplc.ListedColormap(colormap)\n",
"print(class_names)\n",
"lc_cmap"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"Note that there are more than 9 colours in the `colormap` created - that's because we also have colours for _'No Data'_ and for class IDs that are not used (skipped) in the data (e.g. there's no land cover class _6_ - we'll label these as _No Data_ below).\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the colormap with [`matplotlib`](https://matplotlib.org/stable/index.html) to plot a static 2D map for a given time point."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:19.461791Z",
"iopub.status.busy": "2022-09-23T12:40:19.461504Z",
"iopub.status.idle": "2022-09-23T12:40:23.263824Z",
"shell.execute_reply": "2022-09-23T12:40:23.262967Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"ts='2017' #which year layer\n",
"\n",
"#plotting helpers\n",
"vmin = 0\n",
"vmax = lc_cmap.N\n",
"\n",
"p=lcxr.sel(time=ts).isel(time=0).to_array(\"band\").plot.imshow(\n",
" col=\"band\",\n",
" cmap=lc_cmap,\n",
" vmin=vmin, vmax=vmax,\n",
" size=6\n",
")\n",
"\n",
"ticks = np.linspace(start=vmin+0.5, stop=vmax-0.5, num=vmax)\n",
"labels = [values_to_classes.get(i, \"No Data\") for i in range(vmax)]\n",
"p.cbar.set_ticks(ticks, labels=labels)\n",
"plt.axis('scaled')\n",
"plt.axis('off')\n",
"plt.title(\"Land cover for {0} at {1}m resolution\".format(ts, datares), size=14)\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, we could make a static plot showing maps for each point in time to visually examine change."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:23.268900Z",
"iopub.status.busy": "2022-09-23T12:40:23.268450Z",
"iopub.status.idle": "2022-09-23T12:40:44.305272Z",
"shell.execute_reply": "2022-09-23T12:40:44.304339Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"p = lcxr.data.plot(\n",
" col=\"time\",\n",
" cmap=lc_cmap,\n",
" vmin=vmin, vmax=vmax,\n",
" figsize=(16, 5)\n",
")\n",
"ticks = np.linspace(start=vmin+0.5, stop=vmax-0.5, num=vmax)\n",
"labels = [values_to_classes.get(i, \"No Data\") for i in range(vmax)]\n",
"p.cbar.set_ticks(ticks, labels=labels)\n",
"\n",
"for axes in p.axes.flat:\n",
" axes.axis('off')\n",
" axes.axis('scaled')"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use [`holoviews`](https://holoviews.org/) to create an interactive plot - you should be able to use the slider below to see change through time in the same space. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:44.309884Z",
"iopub.status.busy": "2022-09-23T12:40:44.309512Z",
"iopub.status.idle": "2022-09-23T12:40:52.713106Z",
"shell.execute_reply": "2022-09-23T12:40:52.712202Z"
},
"tags": [
"hide-input"
]
},
"source": [
"#plotting helpers\n",
"levels = [i for i in range(lc_cmap.N)]\n",
"bnorm = mplc.BoundaryNorm(levels, ncolors=len(levels))\n",
"\n",
"#https://holoviews.org/user_guide/Gridded_Datasets.html?highlight=dask#working-with-xarray-data-types\n",
"hv.extension('matplotlib')\n",
"hv_ds = hv.Dataset(lcxr)[:, :, :]\n",
"\n",
"lcmaps = hv_ds.to(hv.Image, kdims=[\"x\", \"y\"], dynamic=True).options(cmap=lc_cmap, norm=bnorm, fig_inches=(5,8), colorbar=True)\n",
"lcmaps"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 3. Analysing Aggregate Change\n",
"\n",
"In this section we're going to examine land cover change across the study area. Our aim is to produce a bar plot that shows increases or decreases in land area for each land cover class between two points in time. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can quickly identify the unique values in our land cover maps and how many times they are observed (i.e. how many pixels fall in each land cover class). We can use `numpy`'s `.unique` [function](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) to [count the occurrence of each value](https://stackoverflow.com/a/28663910), then output as a `Pandas` DataFrame."
]
},
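{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mechanics can be sketched on a toy array (hypothetical class IDs, not the real data):\n",
"```python\n",
"import numpy as np\n",
"\n",
"#count occurrences of each class ID in a small 'map'\n",
"lcmap = np.array([[2, 2, 5], [5, 5, 7]])\n",
"unique, counts = np.unique(lcmap, return_counts=True)\n",
"lcmap_counts = dict(zip(unique, counts))\n",
"```\n",
"Here `unique` holds the sorted class IDs present and `counts` gives one count per ID, so `lcmap_counts` maps each ID to its pixel count - the same structure we build per year below."
]
},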
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:52.717881Z",
"iopub.status.busy": "2022-09-23T12:40:52.717566Z",
"iopub.status.idle": "2022-09-23T12:40:58.239031Z",
"shell.execute_reply": "2022-09-23T12:40:58.238082Z"
}
},
"source": [
"#list of per-year pixel counts; start with the class ID:name dict so names form the first column\n",
"counts_ls = [values_to_classes]\n",
"\n",
"#loop over items\n",
"for i, j in enumerate(items):\n",
" lcmap = lcxr.isel(time=i).to_array(\"band\") #get pixels for this year as an array\n",
" unique, counts = np.unique(lcmap, return_counts=True) #get the unique values and corresponding counts \n",
" lcmap_counts = dict(zip(unique, counts)) #combine them into a dict for easier viewing\n",
" counts_ls.append(lcmap_counts) \n",
"\n",
"counts_df = pd.DataFrame(counts_ls).T #create df from list and transpose\n",
"\n",
"#create list of data years as string\n",
"counts_yr = list(lcxr[\"time\"].values.astype('datetime64[Y]').astype('str')) \n",
"\n",
"counts_yr.insert(0, \"Land Cover\") #insert header for the class-name column\n",
"counts_df.columns = counts_yr #then use as df header\n",
"counts_df #print"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It might be more intuitive (and useful) to work in units of area (e.g. square km) rather than pixel counts, so let's convert the pixel counts to area (sq km):"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:58.245520Z",
"iopub.status.busy": "2022-09-23T12:40:58.242800Z",
"iopub.status.idle": "2022-09-23T12:40:58.264016Z",
"shell.execute_reply": "2022-09-23T12:40:58.263003Z"
}
},
"source": [
"def cell_to_sqkm(x):\n",
" return x * ((datares*datares) / 1000000)\n",
"\n",
"sqkm_df = counts_df.copy() #deep copy (the default) so converting to area doesn't alter counts_df\n",
"sqkm_df.loc[:,'2017':] = sqkm_df.loc[:,'2017':].apply(cell_to_sqkm)\n",
"\n",
"sqkm_df"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that some land cover classes are not present in any year (indicated by `NaN`). We'll drop these and also [reshape the data from 'wide' to 'long'](https://pandas.pydata.org/docs/getting_started/intro_tutorials/07_reshape_table_layout.html#wide-to-long-format) for plotting. "
]
},
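{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what `pd.melt` does here (made-up numbers, not our data):\n",
"```python\n",
"import pandas as pd\n",
"\n",
"#wide: one row per class, one column per year\n",
"wide = pd.DataFrame({'Land Cover': ['Trees', 'Crops'],\n",
"                     '2017': [100.0, 50.0],\n",
"                     '2021': [80.0, 70.0]})\n",
"#long: one row per class-year combination\n",
"long_df = pd.melt(wide, id_vars=['Land Cover'],\n",
"                  var_name='Year', value_name='Area (sq km)')\n",
"```\n",
"The 2x3 wide table becomes a 4x3 long table with columns `Land Cover`, `Year` and `Area (sq km)`, which is the shape plotting libraries such as `hvplot` expect."
]
},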
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:58.268060Z",
"iopub.status.busy": "2022-09-23T12:40:58.267773Z",
"iopub.status.idle": "2022-09-23T12:40:58.278316Z",
"shell.execute_reply": "2022-09-23T12:40:58.277566Z"
}
},
"source": [
"sqkm_df = sqkm_df.dropna() #drop No Data rows\n",
"\n",
"#make data long\n",
"sqkm_df_long = pd.melt(sqkm_df, id_vars=['Land Cover'])\n",
"sqkm_df_long.rename(columns={'variable':'Year','value':'Area (sq km)'}, inplace = True)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the `Pandas` DataFrame we can create both static and interactive bar plots. Here's the static version with `matplotlib`. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:58.283494Z",
"iopub.status.busy": "2022-09-23T12:40:58.282741Z",
"iopub.status.idle": "2022-09-23T12:40:58.633183Z",
"shell.execute_reply": "2022-09-23T12:40:58.632218Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig, ax1 = plt.subplots(1, figsize=(8,4))\n",
"\n",
"#plot only land cover classes present in the data\n",
"sqkm_df[sqkm_df['2017'] > 0].set_index('Land Cover').plot(kind='bar', ax=ax1, cmap='viridis')\n",
"\n",
"x_ticks_labels = ['Water','Trees','Flooded Veg','Crops', 'Built Area', 'Bare Ground', 'Rangeland']\n",
"ax1.set_xticklabels(x_ticks_labels, rotation=45)\n",
"\n",
"plt.tick_params(labelsize=12) \n",
"plt.xlabel('Land Cover', labelpad=10, fontsize=16)\n",
"plt.ylabel(\"Area (sq km)\",labelpad=10, fontsize=16) \n",
"plt.legend(title=\"Year\")\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{tip}\n",
"We could also do an interactive plot with `hvplot` for Pandas data (see the cell below).\n",
":::"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:58.637736Z",
"iopub.status.busy": "2022-09-23T12:40:58.637379Z",
"iopub.status.idle": "2022-09-23T12:40:58.991909Z",
"shell.execute_reply": "2022-09-23T12:40:58.990741Z"
},
"tags": [
"hide-input"
]
},
"source": [
"hv.extension('bokeh')\n",
"color_lst = [mplc.to_hex(colormap[1]),mplc.to_hex(colormap[2]),mplc.to_hex(colormap[4]),\n",
" mplc.to_hex(colormap[5]),mplc.to_hex(colormap[7]),mplc.to_hex(colormap[8]),mplc.to_hex(colormap[11])]\n",
"lc_year_bar = sqkm_df_long.hvplot.bar(y='Area (sq km)', x='Land Cover', by='Year', rot=90).options(\n",
" color='Land Cover', cmap=color_lst, show_legend=False)\n",
"lc_year_bar"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"From the plots above we can see that through time the _Trees_ and _Rangeland_ classes have decreased in area, while _Crops_ has increased.\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could also show this by calculating and plotting the change between the first and final time snapshots. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:58.996990Z",
"iopub.status.busy": "2022-09-23T12:40:58.996379Z",
"iopub.status.idle": "2022-09-23T12:40:59.278541Z",
"shell.execute_reply": "2022-09-23T12:40:59.276576Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"#calculate\n",
"sqkm_df['Diffc-1721'] = sqkm_df['2021'] - sqkm_df['2017']\n",
"\n",
"#plot\n",
"fig, ax1 = plt.subplots(1, figsize=(8,4))\n",
"sqkm_df['Diffc-1721'].plot(kind='bar', ax=ax1, cmap='viridis')\n",
"x_ticks_labels = ['Water','Trees','Flooded Veg','Crops', 'Built Area', 'Bare Ground', 'Rangeland']\n",
"ax1.set_xticklabels(x_ticks_labels, rotation=45)\n",
"plt.title(\"Change 2017-2021\", size=16)\n",
"plt.tick_params(labelsize=12) \n",
"plt.xlabel('Land Cover', labelpad=10, fontsize=14)\n",
"plt.ylabel(\"Area (sq km)\",labelpad=10, fontsize=14) \n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 4. Analysing Pixel-by-Pixel Change\n",
"In the last section we saw overall change in the land covers (e.g. decrease in _Trees_ and _Rangeland_, increase in _Crops_). But which classes were changing to which other classes? \n",
"\n",
"To understand the _transitions_ between each class we need to analyse change for each pixel individually. We'll enumerate individual pixels' transitions between the first and last time snapshots using a _confusion matrix_ (via a [function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) from `sklearn`) and then visualise them using a Sankey diagram (using `holoviews`). \n",
"\n",
"First, the confusion matrix."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:40:59.285689Z",
"iopub.status.busy": "2022-09-23T12:40:59.285362Z",
"iopub.status.idle": "2022-09-23T12:41:03.536233Z",
"shell.execute_reply": "2022-09-23T12:41:03.535389Z"
}
},
"source": [
"#convert xarray DataArrays to flat numpy arrays\n",
"lc2017 = lcxr.sel(time='2017').isel(time=0).to_array(\"band\").values.flatten()\n",
"lc2021 = lcxr.sel(time='2021').isel(time=0).to_array(\"band\").values.flatten()\n",
"\n",
"#create the confusion matrix using sklearn\n",
"conf = metrics.confusion_matrix(lc2017, lc2021)\n",
"\n",
"#output as Pandas DF (no-data land covers were already dropped from sqkm_df)\n",
"conf_pd = pd.DataFrame(conf,\n",
"index=sqkm_df['Land Cover'],\n",
"columns=sqkm_df['Land Cover'])\n",
"conf_pd"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From [the docs](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix), the result is a:\n",
"\n",
">confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.\n",
"\n",
"So for us, this means 2017 classes are the rows, with 2021 classes in the columns (there are no predictions here - the 2021 observations stand in for the predictions). The counts of cells that did not change lie on the diagonal."
]
},
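{
"cell_type": "markdown",
"metadata": {},
"source": [
"The convention is easiest to see on a toy example (hypothetical class IDs, where 1 and 2 might stand for _Trees_ and _Crops_):\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"#four pixels observed at two dates\n",
"lc_t1 = np.array([1, 1, 2, 2])  #earlier date ('true')\n",
"lc_t2 = np.array([1, 2, 2, 2])  #later date ('predicted')\n",
"cm = confusion_matrix(lc_t1, lc_t2)\n",
"#cm[0, 1] == 1: one pixel moved from class 1 to class 2\n",
"#cm.trace() == 3: three pixels did not change class\n",
"```\n",
"Rows index the earlier date, columns the later date, and the diagonal counts unchanged pixels."
]
},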
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"This shows the most frequent transition was from Rangeland to Crops, with 129,038 cells. Given that each cell is 9 hectares (300m x 300m), that means more than 1.1 million hectares (over 11,000 sq km) changed from rangeland to cropland over the four years!\n",
":::"
]
},
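{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of that arithmetic:\n",
"```python\n",
"#one 300m x 300m cell in sq km and hectares (1 sq km = 100 ha)\n",
"cell_area_km2 = (300 * 300) / 1_000_000  #0.09 sq km\n",
"cell_area_ha = cell_area_km2 * 100       #9 ha\n",
"\n",
"#129,038 cells transitioned from Rangeland to Crops\n",
"changed_km2 = 129_038 * cell_area_km2  #~11,613 sq km\n",
"changed_ha = 129_038 * cell_area_ha    #~1.16 million ha\n",
"```\n"
]
},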
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the confusion matrix (after some manipulation) to create an [interactive Sankey diagram using holoviews](https://holoviews.org/reference/elements/matplotlib/Sankey.html) to visualise the transitions between each of the classes. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:03.541792Z",
"iopub.status.busy": "2022-09-23T12:41:03.541342Z",
"iopub.status.idle": "2022-09-23T12:41:04.245135Z",
"shell.execute_reply": "2022-09-23T12:41:04.244258Z"
},
"tags": [
"hide-input"
]
},
"source": [
"#create the nodes\n",
"lc_nodes = list(conf_pd.index) + list(conf_pd.index)\n",
"nodes = hv.Dataset(enumerate(lc_nodes), 'index', 'label')\n",
"\n",
"#create the edges (flows between nodes)\n",
"#list of source lcs\n",
"sourcelc = []\n",
"for i in range(0, 7):\n",
" sourcelc += [i] * 7\n",
"#list of target lcs\n",
"targetlc = list(range(7,14))\n",
"targetlc *= 7 \n",
" \n",
"#list of transition counts \n",
"conf_list = conf.flatten().tolist() \n",
"#convert to area and round (for hover display)\n",
"conf_sqkm_list = list(map(cell_to_sqkm, conf_list))\n",
"conf_sqkm_list = [round(item) for item in conf_sqkm_list]\n",
"\n",
"#combine lists for sankey\n",
"edges_sqkm = list(zip(sourcelc, targetlc, conf_sqkm_list))\n",
"\n",
"#from holoviews import opts, dim\n",
"hv.extension('bokeh')\n",
"value_dim = hv.Dimension('Transition', unit='sq km')\n",
"\n",
"#create the sankey from the edges and nodes\n",
"transitions = hv.Sankey((edges_sqkm, nodes), ['From', 'To'], vdims=value_dim)\n",
"\n",
"transitions = transitions.opts(\n",
" opts.Sankey(labels='label', label_position='right', show_values=False, width=900, height=300, cmap=color_lst,\n",
" edge_color=dim('From').str(), node_color=dim('label').str()))\n",
"transitions"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the data objects we created to make the Sankey diagram, we can also view the most frequent transitions in a table:"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:04.251203Z",
"iopub.status.busy": "2022-09-23T12:41:04.250639Z",
"iopub.status.idle": "2022-09-23T12:41:04.271866Z",
"shell.execute_reply": "2022-09-23T12:41:04.270797Z"
}
},
"source": [
"edges = list(zip(sourcelc, targetlc, conf_list)) #edges as counts\n",
"trans_df = pd.DataFrame(edges, columns =['From', 'To', 'Count']) #convert edges list of tuples to pandas df\n",
"\n",
"nodes_dict = dict(zip(list(nodes['index']),list(nodes['label']))) #create dictionary for next replace calls\n",
"trans_df['From'].replace(to_replace=nodes_dict, inplace=True) #replace ids with labels\n",
"trans_df['To'].replace(to_replace=nodes_dict, inplace=True) #replace ids with labels\n",
"\n",
"trans_df = trans_df[trans_df['To'] != trans_df['From']] #remove rows of 'no transition'\n",
"trans_df['Area (sqkm)'] = trans_df['Count'].apply(cell_to_sqkm) #add area column\n",
"trans_df.sort_values('Count', ascending=False).head() #output head, sorted"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"Now we can see that although large numbers of cells changed from Rangeland to other land covers, there were also relatively large numbers of cells changing into Rangeland. Check you can see this in the Sankey diagram above. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 5. Analysing Zonal Change\n",
"\n",
"Above we have analysed the data for our entire study area and on a pixel-by-pixel basis. But sometimes we want to work at a level between the entire study area and individuals, using zones that belong to some kind of vector geography. This is where raster [zonal statistics](https://gisgeography.com/zonal-statistics/) come in:\n",
"\n",
"> Zonal Statistics uses groupings to calculate statistics for specified zones.\n",
"\n",
"The statistics are summary statistics (e.g. mean, median, standard deviation) of all the pixels that fall in each zone. The zones might be watersheds, biomes or ecoregions (physically) or states, countries, counties or census districts (socio-economically). \n",
"\n",
"Here, we'll use municipality boundaries as zones. For Brazil these are freely available online in (zipped) shapefile format which we can read using [GeoPandas](https://geopandas.org) (which inherits from [Pandas](https://pandas.pydata.org/))."
]
},
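{
"cell_type": "markdown",
"metadata": {},
"source": [
"At its core, a categorical zonal statistic is just a per-zone frequency count of raster cells. A minimal sketch with a toy land cover raster and zone raster (all values hypothetical):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"#toy land cover classes (1-3) and zone ids (0-3) on a 4x4 grid\n",
"lc = np.array([[1, 1, 2, 2],\n",
"               [1, 2, 2, 2],\n",
"               [3, 3, 2, 1],\n",
"               [3, 3, 1, 1]])\n",
"zones = np.array([[0, 0, 1, 1],\n",
"                  [0, 0, 1, 1],\n",
"                  [2, 2, 3, 3],\n",
"                  [2, 2, 3, 3]])\n",
"\n",
"#count cells of each class within each zone\n",
"for z in np.unique(zones):\n",
"    counts = np.bincount(lc[zones == z])\n",
"    print(z, dict(enumerate(counts)))\n",
"```\n",
"\n",
"The `zonal_stats` function we use below does the same kind of counting, but with vector polygons defining the zones."
]
},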
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:04.276749Z",
"iopub.status.busy": "2022-09-23T12:41:04.276117Z",
"iopub.status.idle": "2022-09-23T12:41:06.300841Z",
"shell.execute_reply": "2022-09-23T12:41:06.299840Z"
}
},
"source": [
"#load shapefile for municipalities\n",
"zipfile = \"https://geoftp.ibge.gov.br/organizacao_do_territorio/malhas_territoriais/malhas_municipais/municipio_2021/UFs/MT/MT_Municipios_2021.zip\"\n",
"munis = gpd.read_file(zipfile)\n",
"munis.head()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Manipulation\n",
"Before we can use these data with the raster data we need to do some manipulation, including:\n",
"- setting the `dtype` of the _CD_MUN_ series appropriately\n",
"- setting the map projection to be consistent with that for the raster land cover data (we'll also do this for the study area polygon we made above)"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:06.305179Z",
"iopub.status.busy": "2022-09-23T12:41:06.304839Z",
"iopub.status.idle": "2022-09-23T12:41:07.400977Z",
"shell.execute_reply": "2022-09-23T12:41:07.399723Z"
}
},
"source": [
"#munis.dtypes #check dtypes\n",
"munis['CD_MUN'] = munis['CD_MUN'].astype(int) #set appropriate for muni code\n",
"#munis.crs #check projection\n",
"munis = munis.to_crs(SMepsg) #reproject to match our raster data\n",
"sa_gpd = sa_gpd.to_crs(SMepsg) #also reproject sa_polygon"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we can plot to check things are aligning nicely and to visualise our zones (municipalities)."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:07.406332Z",
"iopub.status.busy": "2022-09-23T12:41:07.406001Z",
"iopub.status.idle": "2022-09-23T12:41:08.721094Z",
"shell.execute_reply": "2022-09-23T12:41:08.720187Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig, axes = plt.subplots(figsize=(6, 12))\n",
"\n",
"munis.plot(\n",
" ax=axes,\n",
" edgecolor=\"blue\",\n",
" facecolor='None',\n",
" categorical=True,\n",
" aspect=\"equal\",\n",
" alpha=0.5\n",
")\n",
"\n",
"#add polygon to show our study area\n",
"sa_gpd.plot(edgecolor=\"red\", facecolor=\"red\", alpha=0.5, ax=axes)\n",
"axes.set_xlabel('x [metre]')\n",
"axes.set_ylabel('y [metre]')\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{tip}\n",
"We could also do an interactive plot with `hvplot` for Pandas data (not shown here to reduce notebook filesize - uncomment to run yourself). "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:08.725086Z",
"iopub.status.busy": "2022-09-23T12:41:08.724808Z",
"iopub.status.idle": "2022-09-23T12:41:08.728552Z",
"shell.execute_reply": "2022-09-23T12:41:08.727716Z"
}
},
"source": [
"#import hvplot.pandas \n",
"#munis.hvplot(\n",
"# geo=True,crs=munis.crs.to_epsg(),alpha=0.25, tiles=True) * sa_gpd.hvplot(\n",
"# geo=True,crs=sa_gpd.crs.to_epsg(), alpha=0.25)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also make a static plot with matplotlib to visualise municipality boundaries over the raster land cover data."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:08.733364Z",
"iopub.status.busy": "2022-09-23T12:41:08.732840Z",
"iopub.status.idle": "2022-09-23T12:41:26.021479Z",
"shell.execute_reply": "2022-09-23T12:41:26.020401Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig,ax1=plt.subplots(figsize=(8,8))\n",
"\n",
"#year 2017 is in the 0 position of lcxr['data']\n",
"lcxr['data'][0].plot(\n",
" cmap=lc_cmap,\n",
" vmin=vmin,\n",
" vmax=vmax,\n",
" ax=ax1\n",
")\n",
"\n",
"munis.plot(ax=ax1, facecolor='None', edgecolor='red', linewidth=1)\n",
"\n",
"ax1.set_xlim(min(lcxr['data'][0].x), max(lcxr['data'][0].x)) \n",
"ax1.set_ylim(min(lcxr['data'][0].y), max(lcxr['data'][0].y)) \n",
"plt.axis('off')\n",
"plt.title(\"Land cover for {0} at {1}m resolution\".format(ts, datares), size=14)\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we do our zonal statistics, to make comparison fair we will consider only those municipalities whose boundaries are completely within the study area. \n",
"\n",
"First, remove municipalities from the vector shapefile that are not completely within the study area."
]
},
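{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `within` predicate used below comes from [shapely](https://shapely.readthedocs.io/) (via GeoPandas). A minimal sketch with hypothetical polygons shows the behaviour: a polygon that straddles the boundary is not `within` the study area, so it gets dropped.\n",
"\n",
"```python\n",
"from shapely.geometry import Polygon\n",
"\n",
"#hypothetical study area and two candidate zones\n",
"study_area = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])\n",
"inside = Polygon([(2, 2), (4, 2), (4, 4), (2, 4)])\n",
"straddling = Polygon([(8, 8), (12, 8), (12, 12), (8, 12)])\n",
"\n",
"inside.within(study_area), straddling.within(study_area)  #(True, False)\n",
"```"
]
},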
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:26.025984Z",
"iopub.status.busy": "2022-09-23T12:41:26.025706Z",
"iopub.status.idle": "2022-09-23T12:41:26.387934Z",
"shell.execute_reply": "2022-09-23T12:41:26.386915Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"pip_mask = munis.within(sa_gpd.loc[0, 'geometry']) #spatial query: select only munis within study area\n",
"sa_munis = munis[pip_mask==True] #remove non-selected munis\n",
"sa_munis.plot(facecolor='None') #check by plotting"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, create a raster from the new vector shapefile to mask cells from our land cover raster."
]
},
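{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rasterising 'burns' a value into every cell whose centre falls inside a polygon. A minimal sketch on a tiny hypothetical grid (the 1-unit pixel size and burn value 7 are illustrative):\n",
"\n",
"```python\n",
"import rasterio.features\n",
"from rasterio.transform import from_origin\n",
"from shapely.geometry import Polygon\n",
"\n",
"#4x4 grid with 1-unit pixels, upper-left corner at (0, 4)\n",
"transform = from_origin(0, 4, 1, 1)\n",
"\n",
"#polygon covering the top-left 2x2 cells, burned as value 7\n",
"poly = Polygon([(0, 2), (2, 2), (2, 4), (0, 4)])\n",
"rasterio.features.rasterize([(poly, 7)], out_shape=(4, 4),\n",
"                            transform=transform, fill=-999)\n",
"```\n",
"\n",
"As in the cell below, cells outside all polygons receive the `fill` value, which we can later use as a nodata mask."
]
},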
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:26.391991Z",
"iopub.status.busy": "2022-09-23T12:41:26.391713Z",
"iopub.status.idle": "2022-09-23T12:41:26.594141Z",
"shell.execute_reply": "2022-09-23T12:41:26.592973Z"
}
},
"source": [
"#collapse to list for rio.features.rasterize below\n",
"sa_munis_shapes = sa_munis[['geometry', 'CD_MUN']].values.tolist()\n",
"\n",
"#get the spatial transform data needed for rio.features.rasterize below\n",
"lcxr_transform = lcxr['data'][0].rio.transform() #this only works if rioxarray has been loaded - why? \n",
"\n",
"#create the mask raster (cells are CD_MUN for municipality polygons within study area, else -999)\n",
"sa_munis_r = rio.features.rasterize(sa_munis_shapes, fill=-999, \n",
" out_shape=lcxr['data'][0].shape, \n",
" transform=lcxr_transform) \n",
"\n",
"#create xarray DataArray from the new raster\n",
"sa_munis_xr = xr.DataArray(data=sa_munis_r,\n",
" name='Municipality code',\n",
" dims=[\"y\", \"x\"],\n",
" coords={'y':('y',lcxr.y.data),\n",
" 'x':('x',lcxr.x.data)})"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualise the raster we've created from the vector shapefile of municipality boundaries."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:26.600069Z",
"iopub.status.busy": "2022-09-23T12:41:26.599719Z",
"iopub.status.idle": "2022-09-23T12:41:26.861089Z",
"shell.execute_reply": "2022-09-23T12:41:26.860223Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig,ax1=plt.subplots(figsize=(7,7))\n",
"\n",
"sa_munis_xr.plot.imshow(\n",
" cmap='Reds',\n",
" vmin=5090000, vmax=5110000,\n",
" ax=ax1\n",
")\n",
"\n",
"plt.axis('off')\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And finally, mask the existing land cover `xarray` DataSet with this new `xarray` DataArray."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:26.865516Z",
"iopub.status.busy": "2022-09-23T12:41:26.865214Z",
"iopub.status.idle": "2022-09-23T12:41:26.886553Z",
"shell.execute_reply": "2022-09-23T12:41:26.885608Z"
}
},
"source": [
"#mask original land cover xarray with new xarray\n",
"lcxr_mask = lcxr.where(sa_munis_xr >0) #nodata was set to -999"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's check the mask worked by plotting the new (masked) land cover data with the municipality polygons overlaid."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:26.892562Z",
"iopub.status.busy": "2022-09-23T12:41:26.892167Z",
"iopub.status.idle": "2022-09-23T12:41:44.204937Z",
"shell.execute_reply": "2022-09-23T12:41:44.204115Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"fig,ax1=plt.subplots(figsize=(8,8))\n",
"\n",
"p = lcxr_mask['data'][0].plot(\n",
" cmap=lc_cmap,\n",
" vmin=vmin,\n",
" vmax=vmax,\n",
" ax=ax1\n",
")\n",
"\n",
"munis.plot(ax=ax1, facecolor='None', edgecolor='red', linewidth=1)\n",
"\n",
"ax1.set_xlim(min(lcxr['data'][0].x), max(lcxr['data'][0].x)) \n",
"ax1.set_ylim(min(lcxr['data'][0].y), max(lcxr['data'][0].y)) \n",
"\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note} \n",
"From this plot you should be able to start thinking about the land cover composition of each municipality, and how they might differ. For example, which are the municipalities with relatively high (or low) crop cover compared to rangeland cover?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One last piece of data manipulation is needed as the [`zonal_stats` function](https://pythonhosted.org/rasterstats/manual.html#zonal-statistics) we will use from the [rasterstats package](https://pypi.org/project/rasterstats/) expects a 2D array in a format that can be read by the [rasterio package](https://pypi.org/project/rasterio/) (e.g. a `numpy array`). Hence data are for a single year of our data set. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:44.209367Z",
"iopub.status.busy": "2022-09-23T12:41:44.208766Z",
"iopub.status.idle": "2022-09-23T12:41:45.384816Z",
"shell.execute_reply": "2022-09-23T12:41:45.383845Z"
}
},
"source": [
"lcxr_transform = lcxr_mask['data'][0].rio.transform() #get the spatial transform info \n",
"lcxr_17 = np.array(lcxr_mask['data'][0].astype('int32')) #convert data for 2017 to numpy array"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Zonal Stats\n",
"\n",
"Now we're ready to actually calculate the statistics from the land cover raster by municipality shapefile polygon. When [specifying that our input raster is categorical](https://pythonhosted.org/rasterstats/manual.html#working-with-categorical-rasters) (as is the case for our land cover data), the `zonal_stats` function returns a list of dictionaries, one for each municipality polgyon. In turn, each dictionary provides cell counts (values) for each (land cover) class (keys)."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:45.389699Z",
"iopub.status.busy": "2022-09-23T12:41:45.389114Z",
"iopub.status.idle": "2022-09-23T12:41:45.830714Z",
"shell.execute_reply": "2022-09-23T12:41:45.829696Z"
}
},
"source": [
"#from rasterstats import zonal_stats\n",
"lcxr_17z = zonal_stats(sa_munis, lcxr_17,\n",
" affine=lcxr_transform,\n",
" categorical=True)\n",
"lcxr_17z "
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{tip} It's easier to analyse this output if we store it as a Pandas DataFrame."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:45.835499Z",
"iopub.status.busy": "2022-09-23T12:41:45.835182Z",
"iopub.status.idle": "2022-09-23T12:41:45.856476Z",
"shell.execute_reply": "2022-09-23T12:41:45.855408Z"
}
},
"source": [
"lcxr_17z_pd = pd.DataFrame(lcxr_17z) #convert list of dicts to DataFrame\n",
"lcxr_17z_pd.set_index(sa_munis['NM_MUN'],inplace=True) #use municipality name as index\n",
"lcxr_17z_pd.rename(columns=values_to_classes,inplace=True) #rename columns using dict created above\n",
"lcxr_17z_pd = lcxr_17z_pd.add_suffix('17') #remind ourselves these data are for a given year\n",
"lcxr_17z_pd.head() "
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could examine these total counts, but in many ways it is more useful to consider the proportion of a muncipality that is covered by each land cover. "
]
},
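{
"cell_type": "markdown",
"metadata": {},
"source": [
"Converting counts to proportions is a row-wise normalisation: divide each municipality's class counts by that municipality's total. A minimal pandas sketch with made-up counts:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"#hypothetical cell counts: rows are municipalities, columns are classes\n",
"counts = pd.DataFrame({'Crops': [60, 10], 'Rangeland': [40, 90]},\n",
"                      index=['muni_a', 'muni_b'])\n",
"counts.div(counts.sum(axis=1), axis=0)  #each row now sums to 1\n",
"```\n",
"\n",
"The loop below does the same thing column by column, which lets us control the column naming as we go."
]
},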
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:45.861191Z",
"iopub.status.busy": "2022-09-23T12:41:45.860368Z",
"iopub.status.idle": "2022-09-23T12:41:45.884497Z",
"shell.execute_reply": "2022-09-23T12:41:45.883639Z"
}
},
"source": [
"lcxr_17z_pd['sum17']=lcxr_17z_pd.sum(axis=1) #add column for total cells in the muni\n",
"\n",
"lcxr_17prop_pd = pd.DataFrame(index = lcxr_17z_pd.index) #create a new DataFrame to hold the proportions\n",
"\n",
"#calculate proportion for each land cover\n",
"for column in lcxr_17z_pd.loc[:,'Water17':'Rangeland17']:\n",
" lcxr_17prop_pd['{}-prop'.format(column)] = lcxr_17z_pd[column] / lcxr_17z_pd['sum17'] \n",
" \n",
"lcxr_17prop_pd.head()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could create a-spatial plots (e.g. bar plots) directly from this DataFrame. To create spatial plots (maps) we need to attach the spatial information for each municipality to create a GeoPandas GeoDataFrame:"
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:45.889162Z",
"iopub.status.busy": "2022-09-23T12:41:45.888562Z",
"iopub.status.idle": "2022-09-23T12:41:46.027279Z",
"shell.execute_reply": "2022-09-23T12:41:46.026130Z"
}
},
"source": [
"lcxr_17prop_gpd = pd.merge(sa_munis, lcxr_17prop_pd, how='left', left_on = 'NM_MUN', right_index=True)\n",
"lcxr_17prop_gpd.head()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we can do a static plot to visualise the proportion of two land covers across the municipalities."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:46.033876Z",
"iopub.status.busy": "2022-09-23T12:41:46.033510Z",
"iopub.status.idle": "2022-09-23T12:41:47.241179Z",
"shell.execute_reply": "2022-09-23T12:41:47.240261Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"cols = ['Crops17-prop','Rangeland17-prop'] #columns to plot\n",
"cbins = [0.15,0.3,0.45,0.6,0.75] #classification bins for plotting\n",
"\n",
"fig, ax = plt.subplots(1, 2, figsize=(9, 30))\n",
"\n",
"for ix, col in enumerate(cols):\n",
" lcxr_17prop_gpd.plot(\n",
" column=col, cmap='viridis',\n",
" scheme='UserDefined', \n",
" classification_kwds={'bins': cbins}, \n",
" linewidth=0., \n",
" legend=True, legend_kwds={\"title\":col,\"loc\": 5},\n",
" ax=ax[ix]\n",
" )\n",
" \n",
" ax[ix].title.set_text(cols[ix])\n",
" ax[ix].set_xlim(-6450000,-5750000) #axis limits to fit legend\n",
" ax[ix].set_xlabel('x [metre]')\n",
" ax[ix].set_ylabel('y [metre]')\n",
"\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An interactive plot enables linked pan/zoom of the two variables (although without consistent classification of polygons). "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:41:47.245220Z",
"iopub.status.busy": "2022-09-23T12:41:47.244887Z",
"iopub.status.idle": "2022-09-23T12:42:00.826083Z",
"shell.execute_reply": "2022-09-23T12:42:00.825246Z"
},
"tags": [
"hide-input"
]
},
"source": [
"cols = ['Crops17-prop','Rangeland17-prop'] #columns to plot\n",
"\n",
"#import hvplot.pandas # noqa\n",
"hv.extension('bokeh')\n",
"lcxr_17prop_interactive = lcxr_17prop_gpd.hvplot(\n",
" c=cols[0],geo=True, crs=lcxr_17prop_gpd.crs.to_epsg(),cmap='viridis', title=cols[0]) + lcxr_17prop_gpd.hvplot(\n",
" c=cols[1],geo=True, crs=lcxr_17prop_gpd.crs.to_epsg(),cmap='viridis', title=cols[1])\n",
"\n",
"lcxr_17prop_interactive"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note} \n",
"From these plots we should be able to see what we might have expected from the raster data: municipalities in the north and south of the study area are dominated by rangeland, while those in the centre are dominated more by cropland. We might then think about what social, economic, political or other characteristics of the municipalities have led to this situation. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To examine change through time, we need to calculate the zonal stats for a second year, then calculate proportional change between the two time points. First the zonal stats. "
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:42:00.904998Z",
"iopub.status.busy": "2022-09-23T12:42:00.904604Z",
"iopub.status.idle": "2022-09-23T12:42:02.700332Z",
"shell.execute_reply": "2022-09-23T12:42:02.699393Z"
},
"tags": [
"hide-input"
]
},
"source": [
"lcxr_21 = np.array(lcxr_mask['data'][4].astype('int32')) #convert data for 2021 to numpy array\n",
"\n",
"lcxr_21z = zonal_stats(sa_munis, lcxr_21,\n",
" affine=lcxr_transform,\n",
" categorical=True)\n",
"\n",
"lcxr_21z_pd = pd.DataFrame(lcxr_21z) #convert list of dicts to DataFrame\n",
"lcxr_21z_pd.set_index(sa_munis['NM_MUN'],inplace=True) #use municipality name as index\n",
"lcxr_21z_pd.rename(columns=values_to_classes,inplace=True) #rename columns using dict created above\n",
"lcxr_21z_pd = lcxr_21z_pd.add_suffix('21') #remind ourselves these data are for a given year\n",
"\n",
"lcxr_21z_pd['sum21']=lcxr_21z_pd.sum(axis=1) #add column for total cells in the muni\n",
"lcxr_21z_pd.head() "
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now calculate the proportional change between the two years."
]
},
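{
"cell_type": "markdown",
"metadata": {},
"source": [
"We define proportional change as `(final - initial) / initial`, so a value of -0.25 means a 25% decrease and 1.0 means the cover doubled. A quick sketch with made-up counts:\n",
"\n",
"```python\n",
"initial, final = 200, 150  #hypothetical cell counts for one class\n",
"(final - initial) / initial  #-0.25, i.e. a 25% decrease\n",
"```"
]
},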
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:42:02.706031Z",
"iopub.status.busy": "2022-09-23T12:42:02.705191Z",
"iopub.status.idle": "2022-09-23T12:42:02.754283Z",
"shell.execute_reply": "2022-09-23T12:42:02.753310Z"
},
"tags": [
"hide-input"
]
},
"source": [
"lcxr_1721z_pd = pd.merge(lcxr_17z_pd, lcxr_21z_pd, how='left', left_index = True, right_index=True)\n",
"\n",
"lcxr_1721prop_pd = pd.DataFrame(index = lcxr_1721z_pd.index) #create a new DataFrame to hold the proportions\n",
"\n",
"lcpresent = ['Water','Trees','Flooded vegetation','Crops', 'Built area', 'Bare ground', 'Rangeland']\n",
"\n",
"#calculate proportional change for each land cover\n",
"for column in lcpresent:\n",
" initial = lcxr_1721z_pd['{}17'.format(column)]\n",
" final = lcxr_1721z_pd['{}21'.format(column)]\n",
" lcxr_1721prop_pd['{}-propD'.format(column)] = (final - initial) / initial \n",
" \n",
"lcxr_1721prop_pd.describe()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the summary statistics we can already begin to see differences in the trajectories of change. To map change spatially we finally merge with the spatial information."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:42:02.759136Z",
"iopub.status.busy": "2022-09-23T12:42:02.758452Z",
"iopub.status.idle": "2022-09-23T12:42:02.770688Z",
"shell.execute_reply": "2022-09-23T12:42:02.769607Z"
}
},
"source": [
"lcxr_1721prop_gpd = pd.merge(sa_munis, lcxr_1721prop_pd, how='left', left_on = 'NM_MUN', right_index=True)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given that proportional change can be positive or negative, we should use a diverging colour palette."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:42:02.774710Z",
"iopub.status.busy": "2022-09-23T12:42:02.774399Z",
"iopub.status.idle": "2022-09-23T12:42:03.783627Z",
"shell.execute_reply": "2022-09-23T12:42:03.782594Z"
},
"tags": [
"hide-input"
]
},
"source": [
"%matplotlib inline\n",
"cols = ['Crops-propD','Rangeland-propD'] #columns to plot\n",
"cbins = [-2,-1.5,-1,-0.5, 0, 0.5,1,1.5,2,2.5] #classification bins for plotting\n",
"\n",
"fig, ax = plt.subplots(1, 2, figsize=(9, 30))\n",
"\n",
"for ix, col in enumerate(cols):\n",
" lcxr_1721prop_gpd.plot(\n",
" column=col, cmap='RdYlGn', #colour palette\n",
" scheme='UserDefined', \n",
" classification_kwds={'bins': cbins}, \n",
" linewidth=0., \n",
" legend=True, legend_kwds={\"title\":col,\"loc\": 5},\n",
" ax=ax[ix]\n",
" )\n",
" \n",
" ax[ix].title.set_text(cols[ix])\n",
" ax[ix].set_xlim(-6450000,-5750000) #axis limits to fit legend\n",
" ax[ix].set_xlabel('x [metre]')\n",
" ax[ix].set_ylabel('y [metre]')\n",
"\n",
"plt.show()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This allows us to see that all municipalities decreased their Rangeland cover and increased their Cropland cover between 2017 and 2021, and to locate municipalities with extremes of change (which we could then go and further investigate). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"This notebook introduce manipulation and exploratory analysis of classified land cover data. Specifically, it: \n",
"- used `pystac-client` to access information from the [Microsoft's Planetary Computer API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/) and used that with `odc-stac` to read land cover raster data from an online repository.\n",
"- used `matplotlib` and `holoviews` to visualise the spatio-temporal land cover data statically and interactively.\n",
"- used from `sklearn` to create a matrix of pixel-by-pixel change and then `holoviews` to visualise that change using a Sankey diagram.\n",
"- used `rasterstats` to analyse zonal change with tabular data and visualisations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citing this Notebook\n",
"\n",
"Please see [CITATION.cff](https://github.com/eds-book/b128b282-dee7-44a7-bc21-f1fd21452a83/blob/main/CITATION.cff) for the full citation information. The citation file can be exported to APA or BibTex formats (learn more [here](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional information\n",
"\n",
"**Review**: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available [here](https://github.com/eds-book-gallery/b128b282-dee7-44a7-bc21-f1fd21452a83/pull/2).\n",
"\n",
"**Dataset**: 10m Annual Land Use Land Cover (9-class) from Impact Observatory for 2017-2021.\n",
"\n",
"**License**: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons by Attribution 4.0 license. See further details [here](https://github.com/alan-turing-institute/environmental-ds-book/blob/main/LICENSE).\n",
"\n",
"**Contact**: If you have any suggestion or report an issue with this notebook, feel free to [create an issue](https://github.com/alan-turing-institute/environmental-ds-book/issues/new/choose) or send a direct message to [environmental.ds.book@gmail.com](mailto:environmental.ds.book@gmail.com)."
]
},
{
"cell_type": "code",
"metadata": {
"execution": {
"iopub.execute_input": "2022-09-23T12:42:03.788195Z",
"iopub.status.busy": "2022-09-23T12:42:03.787898Z",
"iopub.status.idle": "2022-09-23T12:42:03.795906Z",
"shell.execute_reply": "2022-09-23T12:42:03.794984Z"
},
"tags": [
"remove-input"
]
},
"source": [
"from datetime import date\n",
"\n",
"print('Notebook repository version: v2025.6.0')\n",
"print(f'Last tested: {date.today()}')"
],
"outputs": [],
"execution_count": null
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}