{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Using the Radiant MLHub API\n",
"\n",
"The Radiant MLHub API gives access to open Earth imagery training data for machine learning applications. You can learn more about the repository at the [Radiant MLHub site](https://mlhub.earth) and about the organization behind it at the [Radiant Earth Foundation site](https://radiant.earth).\n",
"\n",
"This Jupyter notebook, which you may copy and adapt for any use, shows basic examples of how to use the API. Full documentation for the API is available at [docs.mlhub.earth](docs.mlhub.earth).\n",
"\n",
"We'll show you how to set up your authentication, see the list of available collections and datasets, and retrieve the items (the data contained within them) from those collections. \n",
"\n",
"All collections in the Radiant MLHub repository are cataloged using [STAC](https://stacspec.org/). Collections that include labels/annotations are additionally described using the [Label Extension](https://github.com/stac-extensions/label)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Authentication\n",
"\n",
"Access to the Radiant MLHub API requires an API key. To get your API key, go to [mlhub.earth](https://mlhub.earth/) and click the \"Sign in / Register\" button in the top right to log in. If you have not used Radiant MLHub before, you will need to sign up and create a new account; otherwise, just sign in. Once you have signed in, click on your user avatar in the top right and select the \"Settings & API keys\" from the dropdown menu.\n",
"\n",
"In the **API Keys** section of this page, you will be able to create new API key(s). *Do not share* your API key with others as this may pose a security risk.\n",
"\n",
"Next, we will create a `MLHUB_API_KEY` variable that `pystac-client` will use later use to add our API key to all requests:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MLHub API Key: ································································\n"
]
}
],
"source": [
"import getpass\n",
"\n",
"MLHUB_API_KEY = getpass.getpass(prompt=\"MLHub API Key: \")\n",
"MLHUB_ROOT_URL = \"https://api.radiant.earth/mlhub/v1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we connect to the Radiant MLHub API using our API key:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import itertools as it\n",
"import requests\n",
"import shutil\n",
"import tempfile\n",
"import os.path\n",
"from pprint import pprint\n",
"from urllib.parse import urljoin\n",
"\n",
"from pystac_client import Client\n",
"from pystac import ExtensionNotImplemented\n",
"from pystac.extensions.scientific import ScientificExtension\n",
"\n",
"client = Client.open(\n",
" MLHUB_ROOT_URL, parameters={\"key\": MLHUB_API_KEY}, ignore_conformance=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List datasets\n",
"\n",
"A **dataset** in the Radiant MLHub API is a JSON object that represents a group of STAC Collections that belong together. A typical datasets will include 1 Collection of source imagery and 1 Collection of labels, but this is not always the case. Some datasets are comprised of a single Collection with both labels and source imagery, others may contain multiple source imagery or label Collections, and others may contain only labels.\n",
"\n",
"*Datasets are not a STAC entity* and therefore we must work with them by making direct requests to the API rather than using `pystac-client`.\n",
"\n",
"We start by creating a `requests.Session` instance so that we can include the API key in all of our requests."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class MLHubSession(requests.Session):\n",
" def __init__(self, *args, api_key=None, **kwargs):\n",
" super().__init__(*args, **kwargs)\n",
" self.params.update({\"key\": api_key})\n",
"\n",
" def request(self, method, url, *args, **kwargs):\n",
" url_prefix = MLHUB_ROOT_URL.rstrip(\"/\") + \"/\"\n",
" url = urljoin(url_prefix, url)\n",
" return super().request(method, url, *args, **kwargs)\n",
"\n",
"\n",
"session = MLHubSession(api_key=MLHUB_API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we list the available datasets using the `/datasets` endpoint"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Datasets: 27\n",
"-----\n",
"idiv_asia_crop_type: A crop type dataset for consistent land cover classification in Central Asia\n",
"dlr_fusion_competition_germany: A Fusion Dataset for Crop Type Classification in Germany\n",
"ref_fusion_competition_south_africa: A Fusion Dataset for Crop Type Classification in Western Cape, South Africa\n",
"bigearthnet_v1: BigEarthNet\n",
"microsoft_chesapeake: Chesapeake Land Cover\n",
"ref_african_crops_kenya_02: CV4A Kenya Crop Type Competition\n",
"ref_african_crops_uganda_01: Dalberg Data Insights Crop Type Uganda\n",
"rti_rwanda_crop_type: Drone Imagery Classification Training Dataset for Crop Types in Rwanda\n",
"ref_african_crops_tanzania_01: Great African Food Company Crop Type Tanzania\n",
"landcovernet_v1: LandCoverNet\n",
"nasa_marine_debris: Marine Debris Dataset for Object Detection in Planetscope Imagery\n",
"open_cities_ai_challenge: Open Cities AI Challenge Dataset\n",
"ref_african_crops_kenya_01: PlantVillage Crop Type Kenya\n",
"su_african_crops_ghana: Semantic Segmentation of Crop Type in Ghana\n",
"su_african_crops_south_sudan: Semantic Segmentation of Crop Type in South Sudan\n",
"sen12floods: SEN12-FLOOD : A SAR and Multispectral Dataset for Flood Detection\n",
"ts_cashew_benin: Smallholder Cashew Plantations in Benin\n",
"ref_south_africa_crops_competition_v1: South Africa Crop Type Competition\n",
"spacenet1: SpaceNet 1\n",
"spacenet2: SpaceNet 2\n",
"spacenet3: SpaceNet 3\n",
"spacenet4: SpaceNet 4\n",
"spacenet5: SpaceNet 5\n",
"spacenet6: SpaceNet 6\n",
"spacenet7: SpaceNet 7\n",
"nasa_tropical_storm_competition: Tropical Cyclone Wind Estimation Competition\n",
"su_sar_moisture_content_main: Western USA Live Fuel Moisture\n"
]
}
],
"source": [
"response = session.get(\"/datasets\")\n",
"datasets = response.json()\n",
"\n",
"dataset_limit = 30\n",
"\n",
"print(f\"Total Datasets: {len(datasets)}\")\n",
"print(\"-----\")\n",
"for dataset in it.islice(datasets, dataset_limit):\n",
" dataset_id = dataset[\"id\"]\n",
" dataset_title = dataset[\"title\"] or \"No Title\"\n",
" print(f\"{dataset_id}: {dataset_title}\")\n",
"if len(datasets) > dataset_limit:\n",
" print(\"...\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at the Kenya Crop Type dataset."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'bbox': {'coordinates': [[[[34.203204542, 0.16702187],\n",
" [34.203204542, 0.167033875],\n",
" [34.022068532, 0.167033875],\n",
" [34.022068532, 0.441545516],\n",
" [34.022094367, 0.441545516],\n",
" [34.022094367, 0.716046625],\n",
" [34.203292802, 0.716046625],\n",
" [34.203292802, 0.716002363],\n",
" [34.384429981, 0.716002363],\n",
" [34.384429981, 0.441486489],\n",
" [34.38436343, 0.441486489],\n",
" [34.38436343, 0.16702187],\n",
" [34.203204542, 0.16702187]]]],\n",
" 'type': 'MultiPolygon'},\n",
" 'citation': 'Radiant Earth Foundation (2020) \"CV4A Competition Kenya Crop '\n",
" 'Type Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] '\n",
" 'https://doi.org/10.34911/RDNT.DW605X',\n",
" 'collections': [{'id': 'ref_african_crops_kenya_02_labels',\n",
" 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',\n",
" 'types': ['labels']},\n",
" {'id': 'ref_african_crops_kenya_02_source',\n",
" 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',\n",
" 'types': ['source_imagery']}],\n",
" 'creator_contact': {'contact': 'ml@radiant.earth',\n",
" 'creator': '[Radiant Earth '\n",
" 'Foundation](https://www.radiant.earth/), '\n",
" '[PlantVillage](https://plantvillage.psu.edu/)'},\n",
" 'date_added': '2020-06-17T00:00:00+00:00',\n",
" 'date_modified': '2020-06-17T00:00:00+00:00',\n",
" 'description': 'This dataset was produced as part of the [Crop Type Detection '\n",
" 'competition](https://zindi.africa/competitions/iclr-workshop-challenge-2-radiant-earth-computer-vision-for-crop-recognition) '\n",
" 'at the [Computer Vision for Agriculture (CV4A) '\n",
" 'Workshop](https://www.cv4gc.org/cv4a2020/) at the ICLR 2020 '\n",
" 'conference. The objective of the competition was to create a '\n",
" 'machine learning model to classify fields by crop type from '\n",
" 'images collected during the growing season by the Sentinel-2 '\n",
" 'satellites.\\n'\n",
" '
\\n'\n",
" 'The ground reference data were collected by the PlantVillage '\n",
" 'team, and Radiant Earth Foundation curated the training '\n",
" 'dataset after inspecting and selecting more than 4,000 fields '\n",
" 'from the original ground reference data. The dataset has been '\n",
" 'split into training and test sets (3,286 in the train and '\n",
" '1,402 in the test).\\n'\n",
" '
\\n'\n",
" 'The dataset is cataloged in four tiles. These tiles are '\n",
" 'smaller than the original Sentinel-2 tile that has been '\n",
" 'clipped and chipped to the geographical area that labels have '\n",
" 'been collected.\\n'\n",
" '
\\n'\n",
" 'Each tile has a) 13 multi-band observations throughout the '\n",
" 'growing season. Each observation includes 12 bands from '\n",
" 'Sentinel-2 L2A product, and a cloud probability layer. The '\n",
" 'twelve bands are [B01, B02, B03, B04, B05, B06, B07, B08, '\n",
" 'B8A, B09, B11, B12]. The cloud probability layer is a product '\n",
" 'of the Sentinel-2 atmospheric correction algorithm (Sen2Cor) '\n",
" 'and provides an estimated cloud probability (0-100%) per '\n",
" 'pixel. All of the bands are mapped to a common 10 m spatial '\n",
" 'resolution grid.; b) A raster layer indicating the crop ID '\n",
" 'for the fields in the training set; and c) A raster layer '\n",
" 'indicating field IDs for the fields (both training and test '\n",
" 'sets). Fields with a crop ID of 0 are the test fields.',\n",
" 'documentation_link': 'https://radiantearth.blob.core.windows.net/mlhub/kenya-crop-challenge/Documentation.pdf',\n",
" 'doi': '10.34911/rdnt.dw605x',\n",
" 'id': 'ref_african_crops_kenya_02',\n",
" 'publications': [{'author_name': 'Hannah Kerner, Catherine Nakalembe and '\n",
" 'Inbal Becker-Reshef',\n",
" 'author_url': None,\n",
" 'title': 'Field-Level Crop Type Classification with k '\n",
" 'Nearest Neighbors: A Baseline for a New Kenya '\n",
" 'Smallholder Dataset',\n",
" 'url': 'https://arxiv.org/abs/2004.03023'}],\n",
" 'registry': 'https://mlhub.earth/ref_african_crops_kenya_02',\n",
" 'status': 'ready',\n",
" 'tags': ['crop type', 'segmentation', 'sentinel-2', 'agriculture'],\n",
" 'title': 'CV4A Kenya Crop Type Competition',\n",
" 'tools_applications': None,\n",
" 'tutorials': [{'author_name': 'Hamed Alemohammad',\n",
" 'author_url': 'https://www.linkedin.com/in/hamedalemohammad/',\n",
" 'title': 'A Guide to Access the data on Radiant MLHub',\n",
" 'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-download-data.ipynb'},\n",
" {'author_name': 'Hamed Alemohammad',\n",
" 'author_url': 'https://www.linkedin.com/in/hamedalemohammad/',\n",
" 'title': 'A Guide to load and visualize the data in Python',\n",
" 'url': 'https://github.com/radiantearth/mlhub-tutorials/blob/main/notebooks/2020%20CV4A%20Crop%20Type%20Challenge/cv4a-crop-challenge-load-data.ipynb'},\n",
" {'author_name': 'Devis Peressutti',\n",
" 'author_url': 'https://sites.google.com/site/devisperessutti/home',\n",
" 'title': 'CV4A ICLR 2020 Starter Notebooks',\n",
" 'url': 'https://github.com/sentinel-hub/cv4a-iclr-2020-starter-notebooks'}]}\n"
]
}
],
"source": [
"crop_dataset = next(\n",
" dataset for dataset in datasets if dataset[\"id\"] == \"ref_african_crops_kenya_02\"\n",
")\n",
"pprint(crop_dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the metadata includes and ID and title, citation information, a bounding box for the dataset, and list of collections included in the dataset. If we take a closer look at the `collections` list, we can see that each collection has an `id` and a `type`. We can use the `type` to figure out whether a collection contains labels, source imagery, or both, and we can use the ID to fetch that dataset (see below)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'id': 'ref_african_crops_kenya_02_labels',\n",
" 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',\n",
" 'types': ['labels']},\n",
" {'id': 'ref_african_crops_kenya_02_source',\n",
" 'license': '[CC-BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)',\n",
" 'types': ['source_imagery']}]\n"
]
}
],
"source": [
"pprint(crop_dataset[\"collections\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List data collections\n",
"\n",
"A **collection** in the Radiant MLHub API is a [STAC Collection](https://github.com/radiantearth/stac-spec/tree/master/collection-spec) representing a group of resources (represented as [STAC Items](https://github.com/radiantearth/stac-spec/tree/master/item-spec) and their associated assets) covering a given spatial and temporal extent. A Radiant MLHub collection may contain resources representing training labels, source imagery, or (rarely) both.\n",
"\n",
"Use the `client.list_collections` function to list all available collections and view their properties. The following cell uses the `client.list_collections` function to print the ID, license (if available), and citation (if available) for the first 20 available collections."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID: ref_african_crops_uganda_01_source\n",
"License: CC-BY-SA-4.0\n",
"Citation: Bocquet, C., & Dalberg Data Insights. (2019) \"Dalberg Data Insights Uganda Crop Classification\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X\n",
"\n",
"ID: microsoft_chesapeake_landsat_leaf_on\n",
"License: CC-PDDC\n",
"Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).\n",
"\n",
"ID: sen12floods_s2_source\n",
"License: CC-BY-4.0\n",
"Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, \"SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection \", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.\n",
"\n",
"ID: sn2_AOI_2_Vegas\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: sn4_AOI_6_Atlanta\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: su_sar_moisture_content\n",
"License: CC-BY-NC-ND-4.0\n",
"Citation: Rao, K., Williams, A.P., Fortin, J. & Konings, A.G. (2020). SAR-enhanced mapping of live fuel moisture content. Remote Sens. Environ., 245.\n",
"\n",
"ID: ref_african_crops_tanzania_01_source\n",
"License: CC-BY-SA-4.0\n",
"Citation: Great African Food Company (2019) \"Great African Food Company Tanzania Ground Reference Crop Type Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.5VX40R\n",
"\n",
"ID: ref_south_africa_crops_competition_v1_train_source_s2\n",
"License: CC-BY-4.0\n",
"Citation: Western Cape Department of Agriculture, Radiant Earth Foundation (2021) \"Crop Type Classification Dataset for Western Cape, South Africa\", Version 1.0, Radiant MLHub, [Date Accessed] https://doi.org/10.34911/rdnt.j0co8q\n",
"\n",
"ID: sn2_AOI_4_Shanghai\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: sn3_AOI_3_Paris\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: sn3_AOI_4_Shanghai\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: sn5_AOI_8_Mumbai\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: ref_african_crops_uganda_01_labels\n",
"License: CC-BY-SA-4.0\n",
"Citation: Bocquet, C., & Dalberg Data Insights. (2019) \"Dalberg Data Insights Uganda Crop Classification\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.EII04X\n",
"\n",
"ID: nasa_tropical_storm_competition_train_labels\n",
"License: CC-BY-4.0\n",
"Citation: M. Maskey, R. Ramachandran, I. Gurung, B. Freitag, M. Ramasubramanian, J. Miller\"Tropical Cyclone Wind Estimation Competition Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.xs53up\n",
"\n",
"ID: sen12floods_s1_labels\n",
"License: CC-BY-4.0\n",
"Citation: Clément Rambour, Nicolas Audebert, Elise Koeniguer, Bertrand Le Saux, Michel Crucianu, Mihai Datcu, September 14, 2020, \"SEN12-FLOOD : a SAR and Multispectral Dataset for Flood Detection \", IEEE Dataport, doi: https://dx.doi.org/10.21227/w6xz-s898.\n",
"\n",
"ID: microsoft_chesapeake_landsat_leaf_off\n",
"License: CC-PDDC\n",
"Citation: Robinson C, Hou L, Malkin K, Soobitsky R, Czawlytko J, Dilkina B, Jojic N. Large Scale High-Resolution Land Cover Mapping with Multi-Resolution Data. Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019).\n",
"\n",
"ID: sn3_AOI_2_Vegas\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: sn7_test_source\n",
"License: CC-BY-SA-4.0\n",
"Citation: N/A\n",
"\n",
"ID: su_african_crops_ghana_source_s1\n",
"License: CC-BY-SA-4.0\n",
"Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) \"Semantic Segmentation of Crop Type in Ghana Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.ry138p\n",
"\n",
"ID: su_african_crops_south_sudan_source_planet\n",
"License: CC-BY-SA-4.0\n",
"Citation: Rustowicz R., Cheong R., Wang L., Ermon S., Burke M., Lobell D. (2020) \"Semantic Segmentation of Crop Type in South Sudan Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/rdnt.v6kx6n\n",
"\n"
]
}
],
"source": [
"collections = client.get_collections()\n",
"for c in it.islice(collections, 20):\n",
" collection_id = c.id\n",
" license = c.license or \"N/A\"\n",
" try:\n",
" sci = ScientificExtension.ext(c)\n",
" citation = sci.citation or \"N/A\"\n",
" except ExtensionNotImplemented:\n",
" citation = \"N/A\"\n",
"\n",
" print(f\"ID: {collection_id}\\nLicense: {license}\\nCitation: {citation}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Collection objects have many other properties besides the ones shown above. The cell below prints the `ref_african_crops_kenya_01_labels` collection object in its entirety."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'type': 'Collection',\n",
" 'id': 'ref_african_crops_kenya_01_labels',\n",
" 'stac_version': '1.0.0',\n",
" 'description': 'African Crops Kenya',\n",
" 'links': [{'rel': 'items',\n",
" 'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels/items',\n",
" 'type': 'application/geo+json'},\n",
" {'rel': 'parent',\n",
" 'href': 'http://api.radiant.earth/mlhub/v1/',\n",
" 'type': 'application/json'},\n",
" {'rel': ,\n",
" 'href': 'https://api.radiant.earth/mlhub/v1',\n",
" 'type': ,\n",
" 'title': 'Radiant MLHub API'},\n",
" {'rel': 'self',\n",
" 'href': 'http://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels',\n",
" 'type': 'application/json'}],\n",
" 'stac_extensions': ['https://stac-extensions.github.io/scientific/v1.0.0/schema.json'],\n",
" 'sci:doi': '10.34911/rdnt.u41j87',\n",
" 'providers': [{'name': 'Radiant Earth Foundation',\n",
" 'roles': ['licensor', 'host', 'processor'],\n",
" 'url': 'https://radiant.earth'}],\n",
" 'sci:citation': 'PlantVillage (2019) \"PlantVillage Kenya Ground Reference Crop Type Dataset\", Version 1.0, Radiant MLHub. [Date Accessed] https://doi.org/10.34911/RDNT.U41J87',\n",
" 'extent': {'spatial': {'bbox': [[34.18191992149459,\n",
" 0.4724181558451209,\n",
" 34.3714943155646,\n",
" 0.7144217206851109]]},\n",
" 'temporal': {'interval': [['2018-04-10T00:00:00Z',\n",
" '2020-03-13T00:00:00Z']]}},\n",
" 'license': 'CC-BY-SA-4.0'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kenya_crops_labels = next(\n",
" c for c in collections if c.id == \"ref_african_crops_kenya_01_labels\"\n",
")\n",
"kenya_crops_labels.to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download Data Archives\n",
"\n",
"A typical workflow for downloading assets from a STAC Catalog would involve looping through all Items and downloading the associated assets. However, the ML training datasets published through Radiant MLHub can sometimes have thousands or hundreds of thousands of Items, making this workflow be *very* time-consuming for larger datasets. For faster access to the assets for an entire dataset, MLHub provides TAR archives of all collections that can be downloaded using the `/archive/{collection_id}` endpoint. \n",
"\n",
"We will use the `MLHubSession` instance we created above to ensure that our API key is sent with each request."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Create a temporary directory\n",
"tmp_dir = tempfile.mkdtemp()\n",
"archive_path = os.path.join(tmp_dir, \"ref_african_crops_kenya_01_labels.tar.gz\")\n",
"\n",
"# Fetch the archive and save to disk\n",
"response = session.get(\n",
" \"/archive/ref_african_crops_kenya_01_labels\", allow_redirects=True\n",
")\n",
"with open(archive_path, \"wb\") as dst:\n",
" dst.write(response.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we clean up the temporary directory"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"shutil.rmtree(tmp_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next Steps\n",
"\n",
"This tutorial was a quick introduction to working with the Radiant MLHub API in a notebook. For more, see:\n",
"\n",
"- [Reading Data from the STAC API](./reading-stac.ipynb)\n",
"- [How to use the Radiant MLHub API to browse and download the LandCoverNet dataset](https://github.com/microsoft/PlanetaryComputerExamples/blob/main/tutorials/radiant-mlhub-landcovernet.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}