{ "cells": [ { "cell_type": "markdown", "id": "9d7f0a60-fce7-455e-a68a-568784c61e32", "metadata": { "tags": [] }, "source": [ "## Zonal Statistics\n", "\n", "In this tutorial, you'll learn how to use [xarray-spatial](https://xarray-spatial.org/) to work with zonal statistics. Zonal statistics help you better understand data from one source by analyzing it for different zones defined by another source. This operation uses two datasets: One dataset, the *zones raster*, defines one or more zones. A second dataset, the *values raster*, contains the data you want to analyze for each of the zones defined by the first dataset. If you're familiar with SQL or dataframe libraries like [pandas](https://pandas.pydata.org/), this is similar to a group by operation: you essentially group the *values* raster by the *zones* raster.\n", "\n", "In this tutorial, we'll use the [Esri / Observatory 10-Meter Land Cover](https://planetarycomputer.microsoft.com/dataset/io-lulc) dataset, which classifies pixels into one of ten classes, as the zones raster. Our values raster will be [Normalized difference vegetation index (NDVI)](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index), derived from [Sentinel-2 Level 2-A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a).\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": 1, "id": "8b31bf3b-da87-4729-892f-81889e8617a8", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import planetary_computer\n", "import pystac_client\n", "import stackstac\n", "import rasterio.features\n", "\n", "import matplotlib.pyplot as plt\n", "from matplotlib.colors import ListedColormap\n", "\n", "from xrspatial.multispectral import true_color\n", "\n", "from xrspatial.classify import equal_interval\n", "from xrspatial.zonal import stats as zonal_stats\n", "from xrspatial.zonal import crosstab as zonal_crosstab" ] }, { "cell_type": "markdown", "id": "21c9c407-3f22-431e-a1f9-237c0533f275", "metadata": { "tags": [] }, "source": [ "### Preparation: Create a local Dask cluster\n", "\n", "We'll make a local Dask cluster to process the data in parallel." ] }, { "cell_type": "code", "execution_count": 2, "id": "bc933697-db71-4374-ac71-aaa202f33b50", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/proxy/8787/status\n" ] } ], "source": [ "from dask.distributed import Client\n", "\n", "client = Client()\n", "print(f\"/proxy/{client.scheduler_info()['services']['dashboard']}/status\")" ] }, { "cell_type": "markdown", "id": "3ce0ca84-38d3-411e-9d44-23fbaed7f7be", "metadata": {}, "source": [ "To follow the progress of your computation, you can [access the Dask Dashboard](https://planetarycomputer.microsoft.com/docs/quickstarts/scale-with-dask/#Open-the-dashboard) at the URL from the previous cell's output." ] }, { "cell_type": "markdown", "id": "8656dbca-f331-42cd-95e7-bcd2c79082b5", "metadata": {}, "source": [ "### Load and coregister data\n", "\n", "The area of interest covers Lake Bridgeport, Texas, USA. We'll use `pystac-client` to find items covering that area, and `stackstac` to load the items into a `DataArray`." ] }, { "cell_type": "code", "execution_count": 3, "id": "2a2ca1db-76ec-4978-b5ac-8e5b67488c34", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'Landcover' (y: 1338, x: 1115)>\n",
       "dask.array<getitem, shape=(1338, 1115), dtype=int8, chunksize=(1338, 1115), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "    band               <U4 'data'\n",
       "  * x                  (x) float64 -1.091e+07 -1.091e+07 ... -1.08e+07 -1.08e+07\n",
       "  * y                  (y) float64 4.029e+06 4.029e+06 ... 3.895e+06 3.895e+06\n",
       "    label:properties   object None\n",
       "    label:classes      object {'name': '', 'classes': ['nodata', 'water', 'tr...\n",
       "    label:description  <U4 'lulc'\n",
       "    proj:shape         object {74688, 101302}\n",
       "    label:type         <U6 'raster'\n",
       "    start_datetime     <U20 '2020-01-01T00:00:00Z'\n",
       "    end_datetime       <U20 '2021-01-01T00:00:00Z'\n",
       "    raster:bands       object {'nodata': 0, 'spatial_resolution': 10}\n",
       "    epsg               int64 3857\n",
       "Attributes:\n",
       "    spec:        RasterSpec(epsg=3857, bounds=(-10909400, 3895100, -10797900,...\n",
       "    crs:         epsg:3857\n",
       "    transform:   | 100.00, 0.00,-10909400.00|\\n| 0.00,-100.00, 4028900.00|\\n|...\n",
       "    resolution:  100
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " band \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'stackstac-0446337b875acfbadb5267ecf6c67e26' (band: 4,\n",
       "                                                                y: 1338, x: 1115)>\n",
       "dask.array<nanmedian, shape=(4, 1338, 1115), dtype=float64, chunksize=(1, 669, 1115), chunktype=numpy.ndarray>\n",
       "Coordinates: (12/19)\n",
       "  * band                                     (band) <U5 'blue' 'green' ... 'nir'\n",
       "  * x                                        (x) float64 -1.091e+07 ... -1.08...\n",
       "  * y                                        (y) float64 4.029e+06 ... 3.895e+06\n",
       "    s2:datatake_type                         <U8 'INS-NOBS'\n",
       "    constellation                            <U10 'Sentinel 2'\n",
       "    sat:orbit_state                          <U10 'descending'\n",
       "    ...                                       ...\n",
       "    proj:shape                               object {10980}\n",
       "    title                                    (band) <U20 'Band 2 - Blue - 10m...\n",
       "    common_name                              (band) <U5 'blue' 'green' ... 'nir'\n",
       "    center_wavelength                        (band) float64 0.49 0.56 ... 0.842\n",
       "    full_width_half_max                      (band) float64 0.098 ... 0.145\n",
       "    epsg                                     int64 3857\n",
       "Attributes:\n",
       "    spec:        RasterSpec(epsg=3857, bounds=(-10909400, 3895100, -10797900,...\n",
       "    crs:         epsg:3857\n",
       "    transform:   | 100.00, 0.00,-10909400.00|\\n| 0.00,-100.00, 4028900.00|\\n|...\n",
       "    resolution:  100
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates: (12/19)\n", " * band (band) 0, other=np.nan) # Sentinels uses 0 as nodata\n", " )\n", " .median(dim=\"time\", keep_attrs=True)\n", " .persist()\n", ")\n", "sentinel_data" ] }, { "cell_type": "markdown", "id": "7226d5ce-b53c-4cf3-85d7-61ad81845188", "metadata": {}, "source": [ "Create a true color image for the Sentinel data:" ] }, { "cell_type": "code", "execution_count": 8, "id": "ca6d1e79-1b9c-4696-9965-cec3bda7f352", "metadata": {}, "outputs": [], "source": [ "sentinel_img = true_color(\n", " sentinel_data.sel(band=\"red\"),\n", " sentinel_data.sel(band=\"green\"),\n", " sentinel_data.sel(band=\"blue\"),\n", " c=30,\n", " th=0.075,\n", " name=\"True color (Sentinel)\",\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "id": "cdcf26cf-0565-4a56-a938-dd9f3ce7e6b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(18, 8))\n", "\n", "landcover_data.plot.imshow(ax=ax1, cmap=landcover_cmap)\n", "sentinel_img.plot.imshow(ax=ax2)\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "id": "457e5a1b-dbba-4281-b45f-31d6f35dbe8b", "metadata": {}, "source": [ "As a final bit of setup, we'll compute NDVI for the sentinel-2 scene. This will be our values raster when we compute zonal statistics later on." ] }, { "cell_type": "code", "execution_count": 10, "id": "dafca186-2f18-4ce1-a828-fe32ff005a1d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'stackstac-0446337b875acfbadb5267ecf6c67e26' (y: 1338, x: 1115)>\n",
       "dask.array<truediv, shape=(1338, 1115), dtype=float64, chunksize=(669, 1115), chunktype=numpy.ndarray>\n",
       "Coordinates: (12/14)\n",
       "  * x                                        (x) float64 -1.091e+07 ... -1.08...\n",
       "  * y                                        (y) float64 4.029e+06 ... 3.895e+06\n",
       "    s2:datatake_type                         <U8 'INS-NOBS'\n",
       "    constellation                            <U10 'Sentinel 2'\n",
       "    sat:orbit_state                          <U10 'descending'\n",
       "    s2:saturated_defective_pixel_percentage  float64 0.0\n",
       "    ...                                       ...\n",
       "    s2:processing_baseline                   <U5 '02.12'\n",
       "    s2:degraded_msi_data_percentage          float64 0.0\n",
       "    s2:product_type                          <U7 'S2MSI2A'\n",
       "    gsd                                      float64 10.0\n",
       "    proj:shape                               object {10980}\n",
       "    epsg                                     int64 3857
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates: (12/14)\n", " * x (x) float64 -1.091e+07 ... -1.08...\n", " * y (y) float64 4.029e+06 ... 3.895e+06\n", " s2:datatake_type \n", "#T_a3a4b_row0_col0, #T_a3a4b_row0_col2, #T_a3a4b_row3_col3, #T_a3a4b_row7_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row0_col1 {\n", " background-color: #0f76b3;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row0_col3 {\n", " background-color: #ebe6f2;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row1_col0, #T_a3a4b_row1_col1, #T_a3a4b_row1_col3, #T_a3a4b_row2_col2 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row1_col2 {\n", " background-color: #5c9fc9;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row2_col0 {\n", " background-color: #056ba7;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row2_col1 {\n", " background-color: #045788;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row2_col3 {\n", " background-color: #03446a;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row3_col0 {\n", " background-color: #0d75b3;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row3_col1 {\n", " background-color: #0568a3;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row3_col2 {\n", " background-color: #acc0dd;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row4_col0 {\n", " background-color: #2182b9;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row4_col1 {\n", " background-color: #03517e;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row4_col2 {\n", " background-color: #358fc0;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row4_col3 {\n", " background-color: #1379b5;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row5_col0 {\n", " background-color: #056caa;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row5_col1 {\n", " background-color: #045585;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row5_col2, #T_a3a4b_row7_col2 {\n", " background-color: #d6d6e9;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row5_col3 {\n", " background-color: #02395a;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row6_col0 {\n", " background-color: #2987bc;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row6_col1 {\n", " background-color: #045687;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row6_col2 {\n", " background-color: #e0dded;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row6_col3 {\n", " background-color: #4a98c5;\n", " color: #f1f1f1;\n", "}\n", "#T_a3a4b_row7_col0 {\n", " background-color: #f1ebf4;\n", " color: #000000;\n", "}\n", "#T_a3a4b_row7_col3 {\n", " background-color: #fef6fa;\n", " color: #000000;\n", "}\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 meanmaxmincount
zone    
water0.0178840.846885-0.44312945139.000000
trees0.7432800.900511-0.277344339387.000000
grass0.5837490.877947-0.140070323999.000000
flooded veg0.5475170.860692-0.3342881573.000000
crops0.5100500.882642-0.252886241772.000000
scrub0.5777870.879797-0.375246336952.000000
built area0.4940270.878849-0.388387198272.000000
bare0.0875790.704829-0.3750674776.000000
\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantile_stats = (\n", " zonal_stats(\n", " zones=landcover_data,\n", " values=ndvi,\n", " stats_funcs=[\"mean\", \"max\", \"min\", \"count\"],\n", " )\n", " .compute()\n", " .set_index(\"zone\")\n", " .rename(landcover_labels)\n", ")\n", "quantile_stats.style.background_gradient()" ] }, { "cell_type": "markdown", "id": "59ab8b09-0eb5-4722-b124-a8766f8212a0", "metadata": {}, "source": [ "Unsurprisingly, landcover classes like \"trees\" and \"grass\" have relatively high NDVI values. Classes like \"water\" and \"bare\" are relatively low." ] }, { "cell_type": "markdown", "id": "ec2b88c2-6af6-47ea-9eaa-bfcfe81a684e", "metadata": {}, "source": [ "### Compute zonal cross-tabulation statistics\n", "\n", "[xrspatial.zonal.crosstab](https://xarray-spatial.org/reference/_autosummary/xrspatial.zonal.crosstab.html) calculates cross-tabulated areas between the zone raster and value raster. This function requires a 2D zone raster and a categorical value raster of 2D or 3D data. \n", "\n", "The `crosstab` function calculates different cross-tabulation statistics. It has an `agg` parameter to define which aggregation method to use. It returns a DataFrame where each row represents a zone from the zone raster and each column represents a category from the value raster. \n", "\n", "This example uses the NASADEM elevation zones as its zone raster and the Esri Land Cover data as its categorical value raster. The resulting DataFrame will show the percentage of each land cover category for each of the five elevation zones." ] }, { "cell_type": "code", "execution_count": 12, "id": "61164b07-7c0f-4977-a4a5-08806353985f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'Elevation (NASADEM)' (y: 1338, x: 1115)>\n",
       "dask.array<getitem, shape=(1338, 1115), dtype=float64, chunksize=(1338, 1115), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "    band        <U9 'elevation'\n",
       "  * x           (x) float64 -1.091e+07 -1.091e+07 ... -1.08e+07 -1.08e+07\n",
       "  * y           (y) float64 4.029e+06 4.029e+06 ... 3.895e+06 3.895e+06\n",
       "    proj:epsg   int64 4326\n",
       "    proj:shape  object {3601}\n",
       "    title       <U9 'Elevation'\n",
       "    epsg        int64 3857\n",
       "Attributes:\n",
       "    spec:        RasterSpec(epsg=3857, bounds=(-10909400, 3895100, -10797900,...\n",
       "    crs:         epsg:3857\n",
       "    transform:   | 100.00, 0.00,-10909400.00|\\n| 0.00,-100.00, 4028900.00|\\n|...\n",
       "    resolution:  100
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates:\n", " band " ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "num_zones = 5\n", "elevation_cmap = ListedColormap(plt.get_cmap(\"Set1\").colors[:num_zones])\n", "\n", "quantile_zones = equal_interval(\n", " nasadem_data, k=num_zones, name=\"Zones (Classified Elevation - NASADEM)\"\n", ")\n", "fig, ax = plt.subplots(figsize=(8, 8))\n", "img = quantile_zones.plot.imshow(cmap=elevation_cmap, ax=ax)\n", "img.colorbar.set_ticks([0.4, 1.2, 2.0, 2.8, 3.6])\n", "img.colorbar.set_ticklabels(range(5))\n", "\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "3fd7b0d5-9f04-4ff1-8c49-769ff8514b7b", "metadata": {}, "source": [ "Finally, calculate the cross-tabulation statistics and display a table demonstrating how the land cover categories are distributed over each of the five elevation zones. Set values for `zone_ids` and `cat_ids` to indicate the zones and the categories of interest. In this example, we'll use all the available zones and categories." ] }, { "cell_type": "code", "execution_count": 14, "id": "fcfb1a9a-3e25-4e4f-8640-55d24964f8ba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 watertreesgrassflooded vegcropsscrubbuilt areabare
zone        
0.00000019.27079616.61685310.5779620.3952117.4216787.08858538.4398630.189053
1.0000002.02724923.44893122.1477680.10960720.12937413.82865417.7747410.533675
2.0000002.07503320.59966523.6546040.09097916.61100027.9119118.7590150.297792
3.0000000.59674127.80403721.2788630.02931914.01357228.3217617.8648910.090816
4.0000000.21621634.70270319.1065180.0031806.20667732.2829897.0333860.448331
\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "crosstab = (\n", " zonal_crosstab(\n", " zones=quantile_zones,\n", " values=landcover_data,\n", " agg=\"percentage\",\n", " )\n", " .rename(columns=landcover_labels)\n", " .compute()\n", " .set_index(\"zone\")\n", ")\n", "\n", "crosstab.style.background_gradient()" ] }, { "cell_type": "markdown", "id": "59f06173-b1ae-41a6-9799-90eaf9223e39", "metadata": {}, "source": [ "### Next steps: analyze different datasets\n", "\n", "The [Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog) includes petabytes of environmental monitoring data. All data sets are available in consistent, analysis-ready formats. You can access them through APIs as well as directly via [Azure Storage](https://docs.microsoft.com/en-us/azure/storage/). \n", "\n", "Try using [xrspatial.zonal.stats](https://xarray-spatial.org/reference/_autosummary/xrspatial.zonal.stats.html) and [xrspatial.zonal.crosstab](https://xarray-spatial.org/reference/_autosummary/xrspatial.zonal.crosstab.html) with different datasets from the [Data Catalog](https://planetarycomputer.microsoft.com/catalog). For example, try using the land cover categories from the [Esri 10m Land Cover](https://www.arcgis.com/home/item.html?id=d6642f8a4f6d4685a24ae2dc0c73d4ac) dataset as zonal raster. Or try using the [Map of Biodiversity Importance (MoBI)](https://planetarycomputer.microsoft.com/dataset/mobi) dataset as a values raster.\n", "\n", "There are also [other zonal functions in xarray spatial](https://xarray-spatial.org/reference/zonal.html) to use with your datasets." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "3e1b9c4dc4544d6e9a382d26ba13ddba": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "VBoxModel", "state": { "layout": "IPY_MODEL_d487752ac5c6435ab5645ff570a29832" } }, "d487752ac5c6435ab5645ff570a29832": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }