{
"cells": [
{
"cell_type": "markdown",
"id": "9d7f0a60-fce7-455e-a68a-568784c61e32",
"metadata": {
"tags": []
},
"source": [
"## Zonal Statistics\n",
"\n",
"In this tutorial, you'll learn how to use [xarray-spatial](https://xarray-spatial.readthedocs.io/en/latest/index.html) to work with zonal statistics. Zonal statistics help you better understand data from one source by analyzing it for different zones defined by another source. This operation uses two datasets: One dataset, the *zones raster*, defines one or more zones. A second dataset, the *values raster*, contains the data you want to analyze for each of the zones defined by the first dataset. If you're familiar with SQL or dataframe libraries like [pandas](https://pandas.pydata.org/), this is similar to a group by operation: you essentially group the *values* raster by the *zones* raster.\n",
"\n",
"In this tutorial, we'll use the [Esri / Observatory 10-Meter Land Cover](https://planetarycomputer.microsoft.com/dataset/io-lulc) dataset, which classifies pixels into one of ten classes, as the zones raster. Our values raster will be [Normalized difference vegetation index (NDVI)](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index), derived from [Sentinel-2 Level 2-A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a).\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8b31bf3b-da87-4729-892f-81889e8617a8",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"import planetary_computer\n",
"import pystac_client\n",
"import stackstac\n",
"import rasterio.features\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"from xrspatial.multispectral import true_color\n",
"\n",
"from xrspatial.classify import equal_interval\n",
"from xrspatial.zonal import stats as zonal_stats\n",
"from xrspatial.zonal import crosstab as zonal_crosstab"
]
},
{
"cell_type": "markdown",
"id": "21c9c407-3f22-431e-a1f9-237c0533f275",
"metadata": {
"tags": []
},
"source": [
"### Preparation: Create a local Dask cluster\n",
"\n",
"We'll make a local Dask cluster to process the data in parallel."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bc933697-db71-4374-ac71-aaa202f33b50",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/proxy/8787/status\n"
]
}
],
"source": [
"from dask.distributed import Client\n",
"\n",
"client = Client()\n",
"print(f\"/proxy/{client.scheduler_info()['services']['dashboard']}/status\")"
]
},
{
"cell_type": "markdown",
"id": "3ce0ca84-38d3-411e-9d44-23fbaed7f7be",
"metadata": {},
"source": [
"To follow the progress of your computation, you can [access the Dask Dashboard](https://planetarycomputer.microsoft.com/docs/quickstarts/scale-with-dask/#Open-the-dashboard) at the URL from the previous cell's output."
]
},
{
"cell_type": "markdown",
"id": "8656dbca-f331-42cd-95e7-bcd2c79082b5",
"metadata": {},
"source": [
"### Load and coregister data\n",
"\n",
"The area of interest covers Lake Bridgeport, Texas, USA. We'll use `pystac-client` to find items covering that area, and `stackstac` to load the items into a `DataArray`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2a2ca1db-76ec-4978-b5ac-8e5b67488c34",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quantile_stats = (\n",
" zonal_stats(\n",
" zones=landcover_data,\n",
" values=ndvi,\n",
" stats_funcs=[\"mean\", \"max\", \"min\", \"count\"],\n",
" )\n",
" .compute()\n",
" .set_index(\"zone\")\n",
" .rename(landcover_labels)\n",
")\n",
"quantile_stats.style.background_gradient()"
]
},
{
"cell_type": "markdown",
"id": "59ab8b09-0eb5-4722-b124-a8766f8212a0",
"metadata": {},
"source": [
"Unsurprisingly, landcover classes like \"trees\" and \"grass\" have relatively high NDVI values. Classes like \"water\" and \"bare\" are relatively low."
]
},
{
"cell_type": "markdown",
"id": "ec2b88c2-6af6-47ea-9eaa-bfcfe81a684e",
"metadata": {},
"source": [
"### Compute zonal cross-tabulation statistics\n",
"\n",
"[xrspatial.zonal.crosstab](https://xarray-spatial.readthedocs.io/en/latest/reference/_autosummary/xrspatial.zonal.crosstab.html) calculates cross-tabulated areas between the zone raster and value raster. This function requires a 2D zone raster and a categorical value raster of 2D or 3D data. \n",
"\n",
"The `crosstab` function calculates different cross-tabulation statistics. It has an `agg` parameter to define which aggregation method to use. It returns a DataFrame where each row represents a zone from the zone raster and each column represents a category from the value raster. \n",
"\n",
"This example uses the NASADEM elevation zones as its zone raster and the Esri Land Cover data as its categorical value raster. The resulting DataFrame will show the percentage of each land cover category for each of the five elevation zones."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "61164b07-7c0f-4977-a4a5-08806353985f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"num_zones = 5\n",
"elevation_cmap = ListedColormap(plt.get_cmap(\"Set1\").colors[:num_zones])\n",
"\n",
"quantile_zones = equal_interval(\n",
" nasadem_data, k=num_zones, name=\"Zones (Classified Elevation - NASADEM)\"\n",
")\n",
"fig, ax = plt.subplots(figsize=(8, 8))\n",
"img = quantile_zones.plot.imshow(cmap=elevation_cmap, ax=ax)\n",
"img.colorbar.set_ticks([0.4, 1.2, 2.0, 2.8, 3.6])\n",
"img.colorbar.set_ticklabels(range(5))\n",
"\n",
"ax.set_axis_off()"
]
},
{
"cell_type": "markdown",
"id": "3fd7b0d5-9f04-4ff1-8c49-769ff8514b7b",
"metadata": {},
"source": [
"Finally, calculate the cross-tabulation statistics and display a table demonstrating how the land cover categories are distributed over each of the five elevation zones. Set values for `zone_ids` and `cat_ids` to indicate the zones and the categories of interest. In this example, we'll use all the available zones and categories."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "fcfb1a9a-3e25-4e4f-8640-55d24964f8ba",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
water
\n",
"
trees
\n",
"
grass
\n",
"
flooded veg
\n",
"
crops
\n",
"
scrub
\n",
"
built area
\n",
"
bare
\n",
"
\n",
"
\n",
"
zone
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0.000000
\n",
"
19.270796
\n",
"
16.616853
\n",
"
10.577962
\n",
"
0.395211
\n",
"
7.421678
\n",
"
7.088585
\n",
"
38.439863
\n",
"
0.189053
\n",
"
\n",
"
\n",
"
1.000000
\n",
"
2.027249
\n",
"
23.448931
\n",
"
22.147768
\n",
"
0.109607
\n",
"
20.129374
\n",
"
13.828654
\n",
"
17.774741
\n",
"
0.533675
\n",
"
\n",
"
\n",
"
2.000000
\n",
"
2.075033
\n",
"
20.599665
\n",
"
23.654604
\n",
"
0.090979
\n",
"
16.611000
\n",
"
27.911911
\n",
"
8.759015
\n",
"
0.297792
\n",
"
\n",
"
\n",
"
3.000000
\n",
"
0.596741
\n",
"
27.804037
\n",
"
21.278863
\n",
"
0.029319
\n",
"
14.013572
\n",
"
28.321761
\n",
"
7.864891
\n",
"
0.090816
\n",
"
\n",
"
\n",
"
4.000000
\n",
"
0.216216
\n",
"
34.702703
\n",
"
19.106518
\n",
"
0.003180
\n",
"
6.206677
\n",
"
32.282989
\n",
"
7.033386
\n",
"
0.448331
\n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"crosstab = (\n",
" zonal_crosstab(\n",
" zones=quantile_zones,\n",
" values=landcover_data,\n",
" agg=\"percentage\",\n",
" )\n",
" .rename(columns=landcover_labels)\n",
" .compute()\n",
" .set_index(\"zone\")\n",
")\n",
"\n",
"crosstab.style.background_gradient()"
]
},
{
"cell_type": "markdown",
"id": "59f06173-b1ae-41a6-9799-90eaf9223e39",
"metadata": {},
"source": [
"### Next steps: analyze different datasets\n",
"\n",
"The [Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog) includes petabytes of environmental monitoring data. All data sets are available in consistent, analysis-ready formats. You can access them through APIs as well as directly via [Azure Storage](https://docs.microsoft.com/en-us/azure/storage/). \n",
"\n",
"Try using [xrspatial.zonal.stats](https://xarray-spatial.readthedocs.io/en/latest/reference/_autosummary/xrspatial.zonal.stats.html) and [xrspatial.zonal.crosstab](https://xarray-spatial.readthedocs.io/en/latest/reference/_autosummary/xrspatial.zonal.crosstab.html) with different datasets from the [Data Catalog](https://planetarycomputer.microsoft.com/catalog). For example, try using the land cover categories from the [Esri 10m Land Cover](https://www.arcgis.com/home/item.html?id=d6642f8a4f6d4685a24ae2dc0c73d4ac) dataset as zonal raster. Or try using the [Map of Biodiversity Importance (MoBI)](https://planetarycomputer.microsoft.com/dataset/mobi) dataset as a values raster.\n",
"\n",
"There are also [other zonal functions in xarray spatial](https://xarray-spatial.readthedocs.io/en/latest/reference/zonal.html) to use with your datasets."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"3e1b9c4dc4544d6e9a382d26ba13ddba": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "VBoxModel",
"state": {
"layout": "IPY_MODEL_d487752ac5c6435ab5645ff570a29832"
}
},
"d487752ac5c6435ab5645ff570a29832": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {}
}
},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}