{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Cloudless Mosaic\n", "\n", "This tutorial constructs a *cloudless mosaic* (also known as a composite) from a time series of satellite images. We'll see the following:\n", "\n", "* Find a time series of images at a particular point on Earth\n", "* Stack those images together into a single array\n", "* Compute the cloudless mosaic by taking a median\n", "* Visualize the results\n", "\n", "This example uses [Sentinel-2 Level-2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a) data. The techniques used here apply equally well to other remote-sensing datasets." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", "\n", "import rasterio.features\n", "import stackstac\n", "import pystac_client\n", "import planetary_computer\n", "\n", "import xrspatial.multispectral as ms\n", "\n", "from dask_gateway import GatewayCluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a Dask cluster\n", "\n", "We're going to process a large amount of data. To cut down on the execution time, we'll use a Dask cluster to do the computation in parallel, adaptively scaling to add and remove workers as needed. See [Scale With Dask](../quickstarts/scale-with-dask.ipynb) for more on using Dask." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://pcc-staging.westeurope.cloudapp.azure.com/compute/services/dask-gateway/clusters/staging.5d235feb3f08434baf1868f21aeb3e99/status\n" ] } ], "source": [ "cluster = GatewayCluster() # Creates the Dask Scheduler. Might take a minute.\n", "\n", "client = cluster.get_client()\n", "\n", "cluster.adapt(minimum=4, maximum=24)\n", "print(cluster.dashboard_link)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Discover data\n", "\n", "In this example, we define our area of interest as a GeoJSON object. It's near Redmond, Washington." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "area_of_interest = {\n", " \"type\": \"Polygon\",\n", " \"coordinates\": [\n", " [\n", " [-122.27508544921875, 47.54687159892238],\n", " [-121.96128845214844, 47.54687159892238],\n", " [-121.96128845214844, 47.745787772920934],\n", " [-122.27508544921875, 47.745787772920934],\n", " [-122.27508544921875, 47.54687159892238],\n", " ]\n", " ],\n", "}\n", "bbox = rasterio.features.bounds(area_of_interest)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `pystac_client` we can search the Planetary Computer's STAC endpoint for items matching our query parameters." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "138\n" ] } ], "source": [ "stac = pystac_client.Client.open(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n", " modifier=planetary_computer.sign_inplace,\n", ")\n", "\n", "search = stac.search(\n", " bbox=bbox,\n", " datetime=\"2016-01-01/2020-12-31\",\n", " collections=[\"sentinel-2-l2a\"],\n", " query={\"eo:cloud_cover\": {\"lt\": 25}},\n", ")\n", "\n", "items = search.item_collection()\n", "print(len(items))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So 138 items match our search requirements, over space, time, and cloudiness. Those items will still have *some* clouds over portions of the scenes, though. To create our cloudless mosaic, we'll load the data into an [xarray](https://xarray.pydata.org/en/stable/) DataArray using [stackstac](https://stackstac.readthedocs.io/) and then reduce the time-series of images down to a single image." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'stackstac-6f1865b36c5ddc90ada70893ec944f5f' (time: 138,\n",
       "                                                                band: 3,\n",
       "                                                                y: 1099, x: 1099)>\n",
       "dask.array<where, shape=(138, 3, 1099, 1099), dtype=float64, chunksize=(1, 1, 1099, 1099), chunktype=numpy.ndarray>\n",
       "Coordinates: (12/46)\n",
       "  * time                                     (time) datetime64[ns] 2016-02-08...\n",
       "    id                                       (time) <U54 'S2A_MSIL2A_20160208...\n",
       "  * band                                     (band) <U5 'red' 'green' 'blue'\n",
       "  * x                                        (x) float64 4.999e+05 ... 6.097e+05\n",
       "  * y                                        (y) float64 5.3e+06 ... 5.19e+06\n",
       "    s2:processing_baseline                   (time) <U5 '03.00' ... '02.12'\n",
       "    ...                                       ...\n",
       "    proj:bbox                                object {5190240.0, 609780.0, 499...\n",
       "    gsd                                      float64 10.0\n",
       "    common_name                              (band) <U5 'red' 'green' 'blue'\n",
       "    center_wavelength                        (band) float64 0.665 0.56 0.49\n",
       "    full_width_half_max                      (band) float64 0.038 0.045 0.098\n",
       "    epsg                                     int64 32610\n",
       "Attributes:\n",
       "    spec:        RasterSpec(epsg=32610, bounds=(499900, 5190200, 609800, 5300...\n",
       "    crs:         epsg:32610\n",
       "    transform:   | 100.00, 0.00, 499900.00|\\n| 0.00,-100.00, 5300100.00|\\n| 0...\n",
       "    resolution:  100
" ], "text/plain": [ "\n", "dask.array\n", "Coordinates: (12/46)\n", " * time (time) datetime64[ns] 2016-02-08...\n", " id (time) 0, other=np.nan) # sentinel-2 uses 0 as nodata\n", " .assign_coords(band=lambda x: x.common_name.rename(\"band\")) # use common names\n", ")\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the data matching our query isn't too large we can persist it in distributed memory. Once in memory, subsequent operations will be much faster." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data = data.persist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Median composite\n", "\n", "Using normal xarray operations, we can [compute the median](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.median.html) over the time dimension. Under the assumption that clouds are transient, the composite shouldn't contain (many) clouds, since they shouldn't be the median pixel value at that point over many images.\n", "\n", "This will be computed in parallel on the cluster (make sure to open the Dask Dashboard using the link printed out above)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "median = data.median(dim=\"time\").compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize the data, we'll use xarray-spatial's `true_color` method to convert to red/green/blue values." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "image = ms.true_color(*median) # expects red, green, blue DataArrays" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "fig, ax = plt.subplots(figsize=(8, 8))\n", "\n", "ax.set_axis_off()\n", "image.plot.imshow(ax=ax);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Monthly composite\n", "\n", "Now suppose we don't want to combine images from different parts of the year (for example, we might not want to combine images from January that often include snow with images from July). Again using standard xarray syntax, we can create set of per-month composites by grouping by month and then taking the median." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "monthly = data.groupby(\"time.month\").median().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's convert each of those arrays to a true-color image and plot the results as a grid." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "images = [ms.true_color(*x) for x in monthly]\n", "images = xr.concat(images, dim=\"time\")\n", "\n", "g = images.plot.imshow(x=\"x\", y=\"y\", rgb=\"band\", col=\"time\", col_wrap=3, figsize=(6, 8))\n", "for ax in g.axes.flat:\n", " ax.set_axis_off()\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more\n", "\n", "To learn more about using the the Planetary Computer's STAC API, see [Reading data from the STAC API](../quickstarts/reading-stac.ipynb). To learn more about Dask, see [Scaling with Dask](../quickstarts/scale-with-dask.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "9d2a40c95b594a728dda48ca2f35cce7": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "VBoxModel", "state": { "layout": "IPY_MODEL_b0634804558d48b080ce68b1c6209869" } }, "b0634804558d48b080ce68b1c6209869": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {} } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }