{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Catalog\n", "Use the [Descartes Labs Catalog](https://docs.descarteslabs.com/guides/catalog_v2.html) to discover existing raster products, search the images contained in them and manage your own products and images. The Catalog Python client is mainly for discovering and managing data. For data analysis and rastering, use [Scenes](https://docs.descarteslabs.com/guides/scenes.html).\n", "\n", "You can run the following cells using `Shift-Enter`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Note__: The Catalog Python object-oriented client provides the functionality previously covered by the more low-level, now deprecated Metadata and Catalog Python clients.\n", "\n", "## Concepts\n", "The Descartes Labs Catalog is a repository for georeferenced images. Commonly, these images are either acquired by Earth observation platforms such as satellites or derived from other georeferenced images. The catalog is modeled on the following core concepts, each of which is represented by its own class in the API.\n", "\n", "### Images\n", "An image (represented by class `Image` in the API) contains data for a shape on Earth, as specified by its georeferencing. An image references one or more files (commonly TIFF or JPEG files) that contain the binary data conforming to the band declaration of its product.\n", "\n", "### Bands\n", "A band (represented by class `Band`) is a 2-dimensional slice of raster data in an image. A product must have at least one band and all images in the product must conform to the declared band structure. For example, an optical sensor will commonly have bands that correspond to the red, blue and green visible light spectrum, which you could raster together to create an RGB image.\n", "\n", "### Products\n", "A product (represented by class `Product`) is a collection of images that share the same band structure. 
Images in a product can generally be used jointly in a data analysis, as they are expected to have been uniformly processed with respect to data correction, georegistration and so on. For example, you can composite multiple images from a product to run an algorithm over a large geographic region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching the catalog\n", "All objects support the same search interface. Let’s look at two of the most commonly searched for types of objects: products and images.\n", "\n", "### Finding products\n", "`Product.search()` is the entry point for searching products. It returns a query builder that you can use to refine your search and can iterate over to retrieve search results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count all products with some data before 2016 using `filter()`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "124" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import Product, properties as p\n", "search = Product.search().filter(p.start_datetime < \"2016-01-01\")\n", "search.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can apply multiple filters. To restrict this search to products with data after 2000:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "50" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search = search.filter(p.end_datetime > \"2000-01-01\")\n", "search.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of these, get the 3 products with the oldest data, using `sort()` and `limit()`. 
The search is not executed until you start retrieving results by iterating over it:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e7e4fc5b0afd45d32c623697cb5a4437d3f09ade:daily-weather:cfsr-v1\n", "fcd57a7bf668c5b65d49c5e32130a2c0a5281322:daily-weather:cfsr-v1:dev-v0\n", "daily-weather:gsod-interpolated:v0\n" ] } ], "source": [ "oldest_search = search.sort(\"start_datetime\").limit(3)\n", "for result in oldest_search:\n", "    print(result.id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All attributes are documented in the Product API reference, which also spells out which ones can be used to filter or sort." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lookup by id and object relationships\n", "If you know a product’s id, look it up directly with `Product.get()`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "Product: Landsat 8 Collection 1 Real-Time\n", " id: landsat:LC08:01:RT:TOAR" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "landsat8_collection1 = Product.get(\"landsat:LC08:01:RT:TOAR\")\n", "landsat8_collection1" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Wherever there are relationships between objects, expect methods such as `bands()` to find related objects. This shows the first four bands of the Landsat 8 product we looked up:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\n", " SpectralBand: coastal-aerosol\n", " id: landsat:LC08:01:RT:TOAR:coastal-aerosol\n", " product: landsat:LC08:01:RT:TOAR, \n", " SpectralBand: blue\n", " id: landsat:LC08:01:RT:TOAR:blue\n", " product: landsat:LC08:01:RT:TOAR, \n", " SpectralBand: green\n", " id: landsat:LC08:01:RT:TOAR:green\n", " product: landsat:LC08:01:RT:TOAR, \n", " SpectralBand: red\n", " id: landsat:LC08:01:RT:TOAR:red\n", " product: landsat:LC08:01:RT:TOAR]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(landsat8_collection1.bands().limit(4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`bands()` returns a search object that can be further refined. 
This shows all class bands of this Landsat 8 product, sorted by name:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\n", " ClassBand: qa_cirrus\n", " id: landsat:LC08:01:RT:TOAR:qa_cirrus\n", " product: landsat:LC08:01:RT:TOAR, \n", " ClassBand: qa_cloud\n", " id: landsat:LC08:01:RT:TOAR:qa_cloud\n", " product: landsat:LC08:01:RT:TOAR, \n", " ClassBand: qa_cloud_shadow\n", " id: landsat:LC08:01:RT:TOAR:qa_cloud_shadow\n", " product: landsat:LC08:01:RT:TOAR, \n", " ClassBand: qa_saturated\n", " id: landsat:LC08:01:RT:TOAR:qa_saturated\n", " product: landsat:LC08:01:RT:TOAR, \n", " ClassBand: qa_snow\n", " id: landsat:LC08:01:RT:TOAR:qa_snow\n", " product: landsat:LC08:01:RT:TOAR, \n", " ClassBand: valid-cloudfree\n", " id: landsat:LC08:01:RT:TOAR:valid-cloudfree\n", " product: landsat:LC08:01:RT:TOAR]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import BandType\n", "list(landsat8_collection1.bands().filter(p.type == BandType.CLASS).sort(\"name\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding images\n", "### Image filters\n", "Search images by the most common attributes - by product, intersecting with a geometry and by a date range:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import Image, properties as p\n", "geometry = {\n", " \"type\": \"Polygon\",\n", " \"coordinates\": [[\n", " [2.915496826171875, 42.044193618165224],\n", " [2.838592529296875, 41.92475971933975],\n", " [3.043212890625, 41.929868314485795],\n", " [2.915496826171875, 42.044193618165224]\n", " ]]\n", " }\n", "\n", "search = Product.get(\"landsat:LC08:01:RT:TOAR\").images()\n", "search = search.intersects(geometry)\n", "search = 
search.filter((p.acquired > \"2017-01-01\") & (p.acquired < \"2018-01-01\"))\n", "search.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are other attributes useful to filter by, documented in the API reference for Image. For example exclude images with too much cloud cover:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search = search.filter(p.cloud_fraction < 0.2)\n", "search.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filtering by `cloud_fraction` is only reasonable when the product sets this attribute on images. Images that don’t set the attribute are excluded from the filter.\n", "\n", "The created timestamp is added to all objects in the catalog when they are created and is immutable. Restrict the search to results created before some time in the past, to make sure that the image results are stable:\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datetime import datetime\n", "search = search.filter(p.created < datetime(2019, 1, 1))\n", "search.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that for all timestamps we can use datetime instances or strings that can reasonably be parsed as a timestamp. If a timestamp has no explicit timezone, it’s assumed to be in UTC." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image summaries\n", "Any query for images supports a summary via the `summary()` method, which returns a `SummaryResult` with aggregate statistics beyond just the number of results:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "Summary for 490473 images:\n", " - Total bytes: 57,212,159,637,881\n", " - Products: landsat:LC08:01:T1:TOAR" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import Image, properties as p\n", "search = Image.search().filter(p.product_id == \"landsat:LC08:01:T1:TOAR\")\n", "search.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These summaries can also be bucketed by time intervals with `summary_interval()` to create a time series:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\n", " Summary for 9872 images:\n", " - Total bytes: 1,230,379,744,242\n", " - Interval start: 2017-01-01 00:00:00+00:00, \n", " Summary for 10185 images:\n", " - Total bytes: 1,288,400,404,886\n", " - Interval start: 2017-02-01 00:00:00+00:00, \n", " Summary for 12426 images:\n", " - Total bytes: 1,556,107,514,684\n", " - Interval start: 2017-03-01 00:00:00+00:00, \n", " Summary for 12492 images:\n", " - Total bytes: 1,476,030,969,986\n", " - Interval start: 2017-04-01 00:00:00+00:00, \n", " Summary for 13768 images:\n", " - Total bytes: 1,571,780,442,608\n", " - Interval start: 2017-05-01 00:00:00+00:00]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "search.summary_interval(interval=\"month\", start_datetime=\"2017-01-01\", end_datetime=\"2017-06-01\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Managing products\n", "### Creating and updating a product\n", "Before uploading images to the catalog, you need to create a 
product and declare its bands. The only required attributes are a unique id, passed in the constructor, and a name:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'descarteslabs:guide-example-product_7005'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import Product\n", "import random \n", "\n", "# We append a random number to the product ID so users who run this example multiple times do not get a \"Product with that ID exists\" error\n", "product_id = \"guide-example-product_\" + str(random.randint(1000,9999))\n", "product = Product(id=product_id)\n", "product.name = \"Example product\"\n", "product.save()\n", "product.id" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2020, 1, 7, 18, 51, 47, 437294, tzinfo=)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "product.created" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`save()` saves the product to the catalog in the cloud. Note that you get to choose an id for your product but it must be unique within your organization (you get an exception if it’s not). This code example is assuming the user is in the “descarteslabs” organization. The id is prefixed with the organization id on save to enforce global uniqueness and uniqueness within an organization. If you are not part of an organization the prefix will be your unique user id.\n", "\n", "Every object has a read-only `created` attribute with the timestamp from when it was first saved.\n", "\n", "There are a few more attributes that you can set (see the [Product API reference](https://docs.descarteslabs.com/descarteslabs/catalog/readme.html)). You can update the product to define the timespan that it covers. 
This is as simple as assigning attributes and then saving again:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2012, 1, 1, 0, 0, tzinfo=)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "product.start_datetime = \"2012-01-01\"\n", "product.end_datetime = \"2015-01-01\"\n", "product.save()\n", "product.start_datetime" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2020, 1, 7, 18, 52, 47, 97574, tzinfo=)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "product.modified" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A read-only `modified` attribute exists on all objects and is updated on every save.\n", "\n", "Note that all timestamp attributes are represented as datetime instances in UTC. You may assign strings to timestamp attributes if they can be reasonably parsed as timestamps. Once the object is saved, the attributes will appear as parsed datetime instances. If a timestamp has no explicit timezone, it’s assumed to be in UTC.\n", "\n", "### Creating bands\n", "Before adding any images to a product, you should create bands that declare the structure of the data shared among all images in the product."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'descarteslabs:guide-example-product_7005:blue'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import SpectralBand, DataType, Resolution, ResolutionUnit\n", "band = SpectralBand(name=\"blue\", product=product)\n", "band.data_type = DataType.UINT16\n", "band.data_range = (0, 10000)\n", "band.display_range = (0, 4000)\n", "band.resolution = Resolution(unit=ResolutionUnit.METERS, value=60)\n", "band.band_index = 0\n", "band.save()\n", "band.id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A band is uniquely identified by its name and product. The full id of the band is composed of the product id and the name.\n", "\n", "The band defines where its data is found in the files attached to images in the product. In this example, `band_index = 0` indicates that blue is the first band in the image file, and that first band is expected to be represented by unsigned 16-bit integers (`DataType.UINT16`).\n", "\n", "This band is specifically a `SpectralBand`, with pixel values representing measurements somewhere in the visible/NIR/SWIR electro-optical wavelength spectrum, so you can also set additional attributes to locate it on the spectrum:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# These values are in nanometers (nm)\n", "band.wavelength_nm_min = 452\n", "band.wavelength_nm_max = 512\n", "band.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bands are created and updated in the same way as products and all other Catalog objects."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Band types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It’s common for many products to have an alpha band, which masks pixels in the image that don’t have valid data:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from descarteslabs.catalog import MaskBand\n", "alpha = MaskBand(name=\"alpha\", product=product)\n", "alpha.is_alpha = True\n", "alpha.data_type = DataType.UINT16\n", "alpha.resolution = band.resolution\n", "alpha.band_index = 1\n", "alpha.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the `alpha` band is created as a `MaskBand`, which is by definition a binary band with a data range from 0 to 1, so there is no need to set the data_range and display_range attributes.\n", "\n", "Setting `is_alpha` to `True` enables special behavior for this band during rastering. If this band appears as the last band in a raster operation (such as `SceneCollection.mosaic` or `SceneCollection.stack` in the scenes client), pixels with a value of 0 in this band will be treated as transparent.\n", "\n", "There are five band types, which may have some attributes specific to them. The type of a band does not necessarily affect how it is rastered; it mainly conveys useful information about the data it contains.\n", "\n", "* `SpectralBand`: A band that lies somewhere on the visible/NIR/SWIR electro-optical wavelength spectrum. Specific attributes: wavelength_nm_center, wavelength_nm_min, wavelength_nm_max, wavelength_nm_fwhm\n", "\n", "* `MicrowaveBand`: A band that lies in the microwave spectrum, often from SAR or passive radar sensors. Specific attributes: frequency, bandwidth\n", "\n", "* `MaskBand`: A binary band where by convention a 0 means masked and 1 means non-masked. The data_range and display_range for masks are implicitly [0, 1]. 
Specific attributes: is_alpha\n", "\n", "* `ClassBand`: A band that maps a finite set of values that may not be continuous to classification categories (e.g. a land use classification). A visualization with straight pixel values is typically not useful, so commonly a colormap is used. Specific attributes: colormap, colormap_name, class_labels\n", "\n", "* `GenericBand`: A generic type for bands that are not represented by the other band types, e.g., mapping physical values like temperature or angles. Specific attributes: colormap, colormap_name, physical_range, physical_range_unit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Managing images\n", "Apart from searching and discovering data available to you, the main use case of the catalog is to let you upload new images. In the following example, we will upload data with a single band representing the blue light spectrum. In addition to the example below, users can [upload image](https://docs.descarteslabs.com/descarteslabs/catalog/readme.html) files in GeoTIFF format through the catalog." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Uploading ndarrays\n", "Often, when creating derived products (for example, running a classification model on existing data), you’ll have a NumPy array (often referred to as an “ndarray”) in memory instead of a file written to disk. In that case, you can use ```Image.upload_ndarray```. This method behaves like ```Image.upload```, with one key difference: you must provide georeferencing attributes for the ndarray.\n", "\n", "Georeferencing attributes are used to map between geospatial coordinates (such as latitude and longitude) and their corresponding pixel coordinates in the array. 
The required attributes are:\n", "\n", "* An affine geotransform in GDAL format (the ```geotrans``` attribute)\n", "\n", "* A coordinate reference system definition, preferably as an EPSG code (the ```cs_code``` attribute) or alternatively as a string in PROJ.4 or WKT format (the ```projection``` attribute)\n", "\n", "If the ndarray you’re uploading was rastered through the platform, this information is easy to get. When rastering, you also receive a dictionary of metadata that includes both of these parameters. With ```Scene.ndarray```, you have to set ```raster_info=True```; with ```Raster.ndarray```, it’s always returned.\n", "\n", "The following example puts these pieces together. This extracts the ```blue``` band from a Landsat 8 scene at a lower resolution and uploads it to our product:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'success'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from descarteslabs.catalog import Image, OverviewResampler\n", "from descarteslabs.scenes import Scene\n", "scene, geoctx = Scene.from_id(\"landsat:LC08:01:T1:TOAR:meta_LC08_L1TP_163068_20181025_20181025_01_T1_v1\")\n", "ndarray, raster_meta = scene.ndarray(\n", "    \"blue\",\n", "    geoctx.assign(resolution=60),\n", "    # return georeferencing info we need to re-upload\n", "    raster_info=True\n", ")\n", "\n", "image2 = Image(product=product, name=\"scene2\")\n", "image2.acquired = \"2012-01-02\"\n", "upload2 = image2.upload_ndarray(\n", "    ndarray,\n", "    raster_meta=raster_meta,\n", "    # create overviews for 120m and 240m resolution\n", "    overviews=[2, 4],\n", "    overview_resampler=OverviewResampler.AVERAGE\n", ")\n", "\n", "upload2.wait_for_completion()\n", "upload2.status\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": 
".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }