{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Discover and Read SAR Data\n", "\n", "\n", "\n", "This notebook demonstrates how to access radar data in a SpatioTemporal Asset Catalog (STAC) Catalogue using the `pystac` library. In this example, we use Sentinel-1 data from the EODC (earth observation data and high performance computing service provider based in Vienna) STAC catalog. In the further process, we will learn how to query a STAC catalog, select specific items, and display the metadata and the actual image.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "import folium\n", "import pystac_client\n", "from odc import stac as odc_stac" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## Data Discovery\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "eodc_catalog = pystac_client.Client.open(\"https://stac.eodc.eu/api/v1\")\n", "\n", "eodc_catalog" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "The URL `https://stac.eodc.eu/api/v1`, served over Hypertext Transfer Protocol (HTTP), is a STAC-compliant API endpoint (specific URL address where an API service is available) that leads to the EODC Catalogue. Besides EODC's, other catalogues can be found on [STAC Index](https://stacindex.org/catalogs), such as United States Geological Survey (USGS) Landsat imagery, Sentinel Hub, Copernicus Data Space Ecosystem, and so on. Briefly spoken, STAC can be used to search, discover, and access metadata of these datasets with the same code. The EODC Catalogue can be accessed on the web via this [link](https://services.eodc.eu/browser/#/?.language=en) as well.\n", "\n", "Each STAC catalog, composed by different providers, has many collections. To get all collections of a catalog, we can print all of them and their ids, which are used to fetch them from the catalog.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "collections = eodc_catalog.get_collections()\n", "\n", "# length of string of collection.id, for pretty print\n", "max_length = max(len(collection.id) for collection in collections)\n", "\n", "for collection in eodc_catalog.get_collections():\n", " print(f\"{collection.id.ljust(max_length)}: {collection.title}\")" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "To get a specific collection from the catalog, we can use the `client.get_collection()` method and provide the collection name. We can then display its description, id, temporal and spatial extent, license, etc. In this notebook, we will work with the Sentinel-1 sigma naught 20m collection.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "colllection_id = \"SENTINEL1_SIG0_20M\"\n", "\n", "collection = eodc_catalog.get_collection(colllection_id)\n", "collection" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "Each collection has multiple items. An item is one spatio-temporal instance in the collection, for instance a satellite image. If items are needed for a specific timeframe or for a specific region of interest, we can define this as a query.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "time_range = \"2022-10-01/2022-10-07\" # a closed range\n", "# time_range = \"2022-01\" # whole month, same can be done for a year and a day\n", "# time_range = \"2022-01-01/..\" # up to the current date, an open range\n", "# time_range = \"2022-01-01T05:34:46\" # a specific time instance" ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "A spatial region of interest can be defined in different ways. One option is to define a simple bounding box:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "latmin, latmax = 46.3, 49.3 # South to North\n", "lonmin, lonmax = 13.8, 17.8 # West to East\n", "\n", "bounding_box = [lonmin, latmin, lonmax, latmax]" ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "If the region of interest is not rectangular, we can also define a polygon:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": {}, "outputs": [], "source": [ "# GEOJSON can be created on geojson.io\n", "\n", "# This specific area of interest is a rectangle, but since it is\n", "# a closed polygon it seems like it has five nodes\n", "\n", "area_of_interest = {\n", " \"coordinates\": [\n", " [\n", " [17.710928010825853, 49.257630084442496],\n", " [13.881798300915221, 49.257630084442496],\n", " [13.881798300915221, 46.34747715326259],\n", " [17.710928010825853, 46.34747715326259],\n", " [17.710928010825853, 49.257630084442496],\n", " ]\n", " ],\n", " \"type\": \"Polygon\",\n", "}" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "Using our previously loaded STAC catalog, we can now search for items fullfilling our query. In this example we are using the bounding box. If we want to use an area of interest specified in the geojson format - one has to use the intersects parameter as documented in the comment below.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "search = eodc_catalog.search(\n", " collections=colllection_id, # can also be a list of several collections\n", " bbox=bounding_box, # search by bounding box\n", " # intersects=area_of_interest, # GeoJSON search\n", " datetime=time_range,\n", " # max_items=1 # number of max items to load\n", ")\n", "\n", "# If we comment everything besides colllection_id, we will load whole\n", "# collection for available region and time_range\n", "\n", "items_eodc = search.item_collection()\n", "print(f\"On EODC we found {len(items_eodc)} items for the given search query\")" ] }, { "cell_type": "markdown", "id": "16", "metadata": {}, "source": [ "Now, we can fetch a single item, in this case a Sentinel-1 image, from the query results. A good practice is to always check what metadata the data provider has stored on the item level. This can be done by looking into the item properties.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "item = items_eodc[0]\n", "item.properties" ] }, { "cell_type": "markdown", "id": "18", "metadata": {}, "source": [ "For now, let's display only the vertical-vertical (VV) polarized band of the item and some information about the data.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "19", "metadata": {}, "outputs": [], "source": [ "item.assets[\"VV\"].extra_fields.get(\"raster:bands\")[0]" ] }, { "cell_type": "markdown", "id": "20", "metadata": {}, "source": [ "In the EODC STAC catalogue an item can conveniently be displayed using its thumbnail.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "item.assets[\"thumbnail\"].href" ] }, { "cell_type": "markdown", "id": "22", "metadata": {}, "source": [ "Now we will plot the data on a map using the thumbnail and the python package [folium](https://python-visualization.github.io/folium/latest/user_guide.html). This is an easy way to quickly check how the data found by a search query looks on a map.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "23", "metadata": {}, "outputs": [], "source": [ "map = folium.Map(\n", " location=[(latmin + latmax) / 2, (lonmin + lonmax) / 2],\n", " zoom_start=7,\n", " zoom_control=False,\n", " scrollWheelZoom=False,\n", " dragging=False,\n", ")\n", "\n", "folium.GeoJson(area_of_interest, name=\"Area of Interest\").add_to(map)\n", "\n", "for item in items_eodc:\n", " # url leading to display of an item, can also be used as hyperlink\n", " image_url = item.assets[\"thumbnail\"].href\n", " bounds = item.bbox\n", " folium.raster_layers.ImageOverlay(\n", " image=image_url,\n", " bounds=[[bounds[1], bounds[0]], [bounds[3], bounds[2]]],\n", " ).add_to(map)\n", "\n", "folium.LayerControl().add_to(map)\n", "\n", "map" ] }, { "cell_type": "markdown", "id": "24", "metadata": {}, "source": [ "*Figure 1: Map of study area. Blue rectangle is the area covered by the discovered data.*\n", "\n", "## Data Reading\n", "\n", "STAC can also be a useful tool for the discovery of data, however it only loads metadata. This saves memory, but if one would like to do further analysis, the data has to be loaded into memory or downloaded on disk.\n", "\n", "In the following, we will demonstrate this with the library `odc-stac`. Here we can define what data will loaded as `bands`; in this case VV sigma naught. Moreover we can resample the data by providing any coordinate reference system (CRS) and resolution as well as a method for resampling of continuos data (e.g. bilinear resampling). In the example below we use the EQUI7 Grid of Europe and a 20 meter sampling. This is the native format of sigma naught stored at EODC, so there will be no actual resampling. Note, also, that resampling is not advisable for this data, as it is provided on a logarithmic scale. More about this in the notebook \"Backscattering Coefficients\".\n", "\n", "*The chunks argument is an advancement method for performing parallel computations on the data. We will not cover this in further detail.*\n" ] }, { "cell_type": "code", "execution_count": null, "id": "25", "metadata": {}, "outputs": [], "source": [ "bands = \"VV\" # Vertical-vertical polarized\n", "crs = \"EPSG:27704\" # Coordinate Reference System: EQUI7 Grid of Europe\n", "res = 20 # 20 meter\n", "chunks = {\"time\": 1, \"latitude\": 1000, \"longitude\": 1000}\n", "sig0_dc = odc_stac.load(\n", " items_eodc,\n", " bands=bands,\n", " crs=crs,\n", " resolution=res,\n", " bbox=bounding_box,\n", " chunks=chunks,\n", " resampling=\"bilinear\",\n", ")" ] }, { "cell_type": "markdown", "id": "26", "metadata": {}, "source": [ "Let's have a look at the VV polarized band of the dataset.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "27", "metadata": {}, "outputs": [], "source": [ "sig0_dc.VV" ] }, { "cell_type": "markdown", "id": "28", "metadata": {}, "source": [ "As we can see, the data is stored as a `xarray` DataArray. Xarray is a convenient package for multidimensional labeled arrays, like temperature, humidity, pressure, different bands of satellite imagery, and so on. [The link](https://docs.xarray.dev/en/stable/index.html) provides detailed documentation. In a later notebook we will explore some more of the functionality of `xarray`. As we can see in the coordinates, the data here consists of 21 time steps.\n", "\n", "In general, data from STAC is \"lazily\" loaded, which means that the structure of the DataArray is constructed, but the data is not loaded yet. It is loaded only at instance when it is needed, for example, for plotting, computations, and so on.\n", "\n", "Since the DataArray has currently a size of almost 18 GiB, we will subset it to the region of Vienna.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "29", "metadata": {}, "outputs": [], "source": [ "# Create a bounding box covering the region of Vienna\n", "latmin_smaller, latmax_smaller = 48, 48.4\n", "lonmin_smaller, lonmax_smaller = 16, 16.5\n", "\n", "smaller_bounding_box = [\n", " [latmin_smaller, lonmin_smaller],\n", " [latmax_smaller, lonmax_smaller],\n", "]\n", "\n", "map = folium.Map(\n", " location=[\n", " (latmin_smaller + latmax_smaller) / 2,\n", " (lonmin_smaller + lonmax_smaller) / 2,\n", " ],\n", " zoom_start=8,\n", " zoom_control=False,\n", " scrollWheelZoom=False,\n", " dragging=False,\n", ")\n", "\n", "folium.GeoJson(area_of_interest, name=\"Area of Interest\").add_to(map)\n", "\n", "folium.Rectangle(\n", " bounds=smaller_bounding_box,\n", " color=\"red\",\n", ").add_to(map)\n", "\n", "for item in items_eodc:\n", " image_url = item.assets[\"thumbnail\"].href\n", " bounds = item.bbox\n", " folium.raster_layers.ImageOverlay(\n", " image=image_url,\n", " bounds=[[bounds[1], bounds[0]], [bounds[3], bounds[2]]],\n", " ).add_to(map)\n", "\n", "folium.LayerControl().add_to(map)\n", "\n", "map" ] }, { "cell_type": "markdown", "id": "30", "metadata": {}, "source": [ "*Figure 2: Map of study area. Blue rectangle is the area covered by the discovered data. Red rectangle covers the selected data.*\n", "\n", "Create a new dataset with the smaller bounding box covering the region of Vienna. We will leave out the arguments for resampling and directly use the native format as defined in the metadata.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "31", "metadata": {}, "outputs": [], "source": [ "sig0_dc = odc_stac.load(\n", " items_eodc,\n", " bands=bands,\n", " bbox=[lonmin_smaller, latmin_smaller, lonmax_smaller, latmax_smaller],\n", " chunks=chunks,\n", ")" ] }, { "cell_type": "markdown", "id": "32", "metadata": {}, "source": [ "Due to the way the data is acquired and stored, some items include \"no data\" areas. In our case, no data has the value -9999, but this can vary from data provider to data provider. This information can usually be found in the metadata. Furthermore, to save memory, data is often stored as integer (e.g. 25) and not in float (e.g. 2.5) format. For this reason, the backscatter values are often multiplied by a scale factor, in this case factor 10.\n", "\n", "As Sentinel-1 satellites overpasses Austria every few days, only some time steps of the dataset will have physical data. As a final step, we will now decode the data and create a plot of two consecutive Sentinel-1 acquisitions of Vienna.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "33", "metadata": {}, "outputs": [], "source": [ "# Retrieve the scale factor and NoData value from the metadata. raster:bands is\n", "# a STAC raster extension\n", "scale = item.assets[\"VV\"].extra_fields.get(\"raster:bands\")[0][\"scale\"]\n", "nodata = item.assets[\"VV\"].extra_fields.get(\"raster:bands\")[0][\"nodata\"]\n", "\n", "# Decode data with the NoData value and the scale factor\n", "sig0_dc = sig0_dc.where(sig0_dc != nodata) / scale\n", "\n", "# We should remove unnecessary dates when there was no data\n", "# (no satellite overpass)\n", "sig0_dc = sig0_dc.dropna(dim=\"time\")" ] }, { "cell_type": "code", "execution_count": null, "id": "34", "metadata": {}, "outputs": [], "source": [ "sig0_dc.VV.plot(col=\"time\", robust=True, cmap=\"Greys_r\", aspect=1, size=10)" ] }, { "cell_type": "markdown", "id": "35", "metadata": {}, "source": [ "*Figure 3: Sentinel-1 microwave backscatter image for two timeslices.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }