{ "cells": [ { "cell_type": "markdown", "id": "8ea8b39f-23cb-47c9-aa0c-a88f5ede4885", "metadata": {}, "source": [ "## Bulk STAC item queries with GeoParquet\n", "\n", "In addition to its [STAC API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/), the Planetary Computer provides access to STAC items as [GeoParquet datasets](https://github.com/opengeospatial/geoparquet). These Parquet datasets are useful for \"bulk\" workloads, where a search would return a very large number of items or would require many separate queries to get the desired result. In general, the Parquet datasets are produced with a lag relative to what's available through the STAC API, so most use cases, including any that need recently added assets, should use our [STAC API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/).\n", "\n", "This example shows how to load STAC items from a Parquet dataset into a [geopandas](https://geopandas.readthedocs.io/) GeoDataFrame. A similar workflow is possible with R's [sfarrow](https://wcjochem.github.io/sfarrow/index.html) package, or with any other library that can read [GeoParquet](https://github.com/opengeospatial/geoparquet#current-implementations--examples)." ] }, { "cell_type": "code", "execution_count": 1, "id": "f6a65f00-2d8b-4d2a-b92b-804990a3ebe4", "metadata": {}, "outputs": [], "source": [ "import dask.dataframe as dd\n", "import geopandas\n", "import planetary_computer\n", "import pystac_client\n", "import pandas as pd\n", "\n", "pd.options.display.max_columns = 8" ] }, { "cell_type": "markdown", "id": "9d445f67-fbf2-4c20-b8ae-7bd7c65308e1", "metadata": {}, "source": [ "### Loading STAC Items\n", "\n", "Each STAC collection that provides a GeoParquet dataset has a collection-level asset under the `geoparquet-items` key." ] }, { "cell_type": "code", "execution_count": 2, "id": "f24ab006-2d86-41a6-b728-2e0bcb2a2511", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " type stac_version stac_extensions \\\n", "0 Feature 1.0.0 [https://stac-extensions.github.io/projection/... \n", "1 Feature 1.0.0 [https://stac-extensions.github.io/projection/... \n", "2 Feature 1.0.0 [https://stac-extensions.github.io/projection/... \n", "3 Feature 1.0.0 [https://stac-extensions.github.io/projection/... \n", "4 Feature 1.0.0 [https://stac-extensions.github.io/projection/... \n", "\n", " id ... end_datetime \\\n", "0 12Q-2017 ... 2018-01-01 00:00:00+00:00 \n", "1 15R-2017 ... 2018-01-01 00:00:00+00:00 \n", "2 16M-2017 ... 2018-01-01 00:00:00+00:00 \n", "3 20L-2022 ... 2023-01-01 00:00:00+00:00 \n", "4 20M-2019 ... 2020-01-01 00:00:00+00:00 \n", "\n", " proj:transform \\\n", "0 [10.0, 0.0, 178910.0, 0.0, -10.0, 2657470.0] \n", "1 [10.0, 0.0, 194773.70566898846, 0.0, -10.0, 35... \n", "2 [10.0, 0.0, 166023.6435927535, 0.0, -10.0, 999... \n", "3 [10.0, 0.0, 169256.89710350422, 0.0, -10.0, 91... \n", "4 [10.0, 0.0, 166023.6435927521, 0.0, -10.0, 999... \n", "\n", " start_datetime io:supercell_id \n", "0 2017-01-01 00:00:00+00:00 12Q \n", "1 2017-01-01 00:00:00+00:00 15R \n", "2 2017-01-01 00:00:00+00:00 16M \n", "3 2022-01-01 00:00:00+00:00 20L \n", "4 2019-01-01 00:00:00+00:00 20M \n", "\n", "[5 rows x 18 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "catalog = pystac_client.Client.open(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1/\",\n", " modifier=planetary_computer.sign_inplace,\n", ")\n", "\n", "\n", "asset = catalog.get_collection(\"io-lulc-9-class\").assets[\"geoparquet-items\"]\n", "\n", "df = geopandas.read_parquet(\n", " asset.href, storage_options=asset.extra_fields[\"table:storage_options\"]\n", ")\n", "df.head()" ] }, { "cell_type": "markdown", "id": "e9d44e67-3eb5-4118-89b6-adcab840410c", "metadata": {}, "source": [ "Now we can do things like look at the count of each `proj:epsg` code." 
] }, { "cell_type": "code", "execution_count": 3, "id": "c44499b8-9da7-43f5-8d3e-f842a26c1f8f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "proj:epsg\n", "32616 60\n", "32638 60\n", "32650 60\n", "32648 60\n", "32617 60\n", " ..\n", "32714 12\n", "32710 11\n", "32708 10\n", "32711 6\n", "32727 6\n", "Name: count, Length: 120, dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"proj:epsg\"].value_counts()" ] }, { "cell_type": "markdown", "id": "3e3c06fd-0f47-4626-8ffe-aa1ec866e5ce", "metadata": {}, "source": [ "Or filter the items to a specific code and plot the footprints." ] }, { "cell_type": "code", "execution_count": 4, "id": "e33c53aa-bbab-4753-8c7b-550d1a81d870", "metadata": {}, "outputs": [], "source": [ "subset = df.loc[df[\"proj:epsg\"] == 32651, [\"io:tile_id\", \"geometry\"]]" ] }, { "cell_type": "code", "execution_count": 5, "id": "65c06109-11d7-460c-8182-12a600217644", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import contextily\n", "\n", "ax = subset.plot(figsize=(4, 20), color=\"none\", edgecolor=\"yellow\")\n", "contextily.add_basemap(\n", " ax, crs=df.crs.to_string(), source=contextily.providers.Esri.NatGeoWorldMap\n", ")\n", "\n", "ax.set_axis_off()" ] }, { "cell_type": "markdown", "id": "aa7e885e-3187-4134-bc94-53e6c1b2e977", "metadata": {}, "source": [ "### Schemas\n", "\n", "Each parquet dataset has a unique schema, reflecting the unique properties captured in each collection. But there are some general patterns.\n", "\n", "1. Each dataset has a column for the properties required on a STAC item (`type`, `stac_version`, `stac_extensions`, `id`, `geometry`, `bbox`, `links`, `assets`, and `collection`).\n", "2. All fields under `properties` are lifted to the top-level, including datetime-related fields like `datetime`, `start_datetime`, `end_datetime`, common metadata (e.g. `platform`) and extension fields (e.g. `proj:bbox`, ...).\n", "3. Dynamic datasets, where new items are regularly added, are partitioned by time. 
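The first two patterns can be illustrated with a small pure-Python sketch that turns one STAC item dictionary into one flat row. The `item_to_row` helper and the example item are hypothetical and only mirror the layout described above. The real datasets also store `geometry` as a GeoParquet geometry column rather than as a GeoJSON dictionary.

```python
def item_to_row(item):
    # Hypothetical helper: copy the top-level STAC fields (type, id, assets, ...)
    row = {key: value for key, value in item.items() if key != 'properties'}
    # Lift every entry under properties (datetime, platform, proj:epsg, ...) to the top level
    row.update(item.get('properties', {}))
    return row


# Made-up example item, trimmed to a few fields
item = {
    'type': 'Feature',
    'stac_version': '1.0.0',
    'id': 'example-item',
    'collection': 'example-collection',
    'properties': {
        'datetime': '2017-01-01T00:00:00Z',
        'platform': 'example-platform',
        'proj:epsg': 32616,
    },
}

row = item_to_row(item)
print(sorted(row))
```

After flattening, every property is an ordinary column, which is why `df['proj:epsg']` works directly in the cells above.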
\n", "\n", "### Partitioning\n", "\n", "Depending on the number of STAC items in the collection and whether new items are still being added, the Parquet dataset may be split into multiple files by time.\n", "\n", "For example, the `io-lulc-9-class` collection is not partitioned and has just a single file:" ] }, { "cell_type": "code", "execution_count": 6, "id": "5b654d8e-787f-4b3a-975d-2670e7d24f00", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['items/io-lulc-9-class.parquet']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import adlfs\n", "\n", "fs = adlfs.AzureBlobFileSystem(**asset.extra_fields[\"table:storage_options\"])\n", "fs.ls(\"items/io-lulc-9-class.parquet\") # Not partitioned, single result" ] }, { "cell_type": "markdown", "id": "df56667e-78a3-4d29-82d0-9fbc1253e5d2", "metadata": {}, "source": [ "Compare that to `sentinel-2-l2a`, which is partitioned by week." ] }, { "cell_type": "code", "execution_count": 7, "id": "3dd96e99-637c-4d2e-9414-f5896517968e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['items/sentinel-2-l2a.parquet/part-0001_2015-06-29T10:25:31+00:00_2015-07-06T10:25:31+00:00.parquet',\n", " 'items/sentinel-2-l2a.parquet/part-0002_2015-07-06T10:25:31+00:00_2015-07-13T10:25:31+00:00.parquet',\n", " 'items/sentinel-2-l2a.parquet/part-0003_2015-07-13T10:25:31+00:00_2015-07-20T10:25:31+00:00.parquet',\n", " 'items/sentinel-2-l2a.parquet/part-0004_2015-07-20T10:25:31+00:00_2015-07-27T10:25:31+00:00.parquet',\n", " 'items/sentinel-2-l2a.parquet/part-0005_2015-07-27T10:25:31+00:00_2015-08-03T10:25:31+00:00.parquet']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fs.ls(\"items/sentinel-2-l2a.parquet\")[:5]" ] }, { "cell_type": "markdown", "id": "6b6eb7d9-e380-4074-b574-78a91474ee43", "metadata": {}, "source": [ "To work with a partitioned dataset, you can use a library like dask or dask-geopandas." 
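The file names listed above appear to encode each partition's start and end time, so you can pick out the files that overlap a time range before reading anything. The sketch below assumes the `part-NNNN_<start>_<end>.parquet` pattern seen in that listing and treats each partition's end time as exclusive. Both of these are assumptions inferred from the five names shown, not a documented contract. The helper names `partition_range` and `overlapping_partitions` are hypothetical.

```python
from datetime import datetime, timezone


def partition_range(path):
    # Hypothetical helper: parse part-NNNN_<start>_<end>.parquet into two datetimes
    name = path.rsplit('/', 1)[-1]
    stem = name[: -len('.parquet')]  # assumes every file ends in .parquet
    _, start, end = stem.split('_')
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


def overlapping_partitions(paths, start, end):
    # Keep the files whose time span overlaps the query window (end times treated as exclusive)
    selected = []
    for path in paths:
        part_start, part_end = partition_range(path)
        if part_start < end and start < part_end:
            selected.append(path)
    return selected


paths = [
    'items/sentinel-2-l2a.parquet/part-0001_2015-06-29T10:25:31+00:00_2015-07-06T10:25:31+00:00.parquet',
    'items/sentinel-2-l2a.parquet/part-0002_2015-07-06T10:25:31+00:00_2015-07-13T10:25:31+00:00.parquet',
    'items/sentinel-2-l2a.parquet/part-0003_2015-07-13T10:25:31+00:00_2015-07-20T10:25:31+00:00.parquet',
]
window_start = datetime(2015, 7, 8, tzinfo=timezone.utc)
window_end = datetime(2015, 7, 10, tzinfo=timezone.utc)
print(overlapping_partitions(paths, window_start, window_end))
```

With a two-day window inside the second week, only the `part-0002` file is selected, so only that file would need to be read.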
] }, { "cell_type": "code", "execution_count": 8, "id": "30b0f6f0-9860-4c96-b01b-a3975f67e8b7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " type stac_version stac_extensions \\\n", "0 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "1 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "2 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "3 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "4 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "\n", " id ... \\\n", "0 S2A_MSIL2A_20150704T101006_R022_T35XQA_2021041... ... \n", "1 S2A_MSIL2A_20150704T101006_R022_T32TMM_2021041... ... \n", "2 S2A_MSIL2A_20150704T101006_R022_T32TMN_2021041... ... \n", "3 S2A_MSIL2A_20150704T101006_R022_T36WWC_2021041... ... \n", "4 S2A_MSIL2A_20150704T101006_R022_T36WWD_2021041... ... \n", "\n", " s2:high_proba_clouds_percentage s2:reflectance_conversion_factor \\\n", "0 92.546540 0.967449 \n", "1 0.048035 0.967449 \n", "2 0.011238 0.967449 \n", "3 65.812266 0.967449 \n", "4 97.629422 0.967449 \n", "\n", " s2:medium_proba_clouds_percentage s2:saturated_defective_pixel_percentage \n", "0 4.807670 0.0 \n", "1 0.051376 0.0 \n", "2 0.022928 0.0 \n", "3 19.050561 0.0 \n", "4 1.861097 0.0 \n", "\n", "[5 rows x 42 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "asset = catalog.get_collection(\"sentinel-2-l2a\").assets[\"geoparquet-items\"]\n", "\n", "s2l2a = dd.read_parquet(\n", " asset.href, storage_options=asset.extra_fields[\"table:storage_options\"]\n", ")\n", "s2l2a.head()" ] }, { "cell_type": "markdown", "id": "1a64eba5-b2e1-45f9-a379-e6424ad06c41", "metadata": {}, "source": [ "You can perform filtering operations on the entire collection." ] }, { "cell_type": "code", "execution_count": 9, "id": "eb0ef08d-5f9f-41d8-a3b1-c3be86504ea2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " type stac_version stac_extensions \\\n", "27 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "56 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "68 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "77 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "80 Feature 1.0.0 [https://stac-extensions.github.io/eo/v1.0.0/s... \n", "\n", " id ... \\\n", "27 S2A_MSIL2A_20150704T101006_R022_T32RMS_2021041... ... \n", "56 S2A_MSIL2A_20150704T101006_R022_T31PFP_2021041... ... \n", "68 S2A_MSIL2A_20150704T101006_R022_T31QGU_2021041... ... \n", "77 S2A_MSIL2A_20150704T101006_R022_T31SGT_2021041... ... \n", "80 S2A_MSIL2A_20150704T101006_R022_T31QHC_2021041... ... \n", "\n", " s2:high_proba_clouds_percentage s2:reflectance_conversion_factor \\\n", "27 0.000000 0.967449 \n", "56 2.169701 0.967449 \n", "68 0.211487 0.967449 \n", "77 1.710171 0.967449 \n", "80 0.000000 0.967449 \n", "\n", " s2:medium_proba_clouds_percentage s2:saturated_defective_pixel_percentage \n", "27 0.000000 0.0 \n", "56 1.014810 0.0 \n", "68 0.171659 0.0 \n", "77 1.712733 0.0 \n", "80 0.000000 0.0 \n", "\n", "[5 rows x 42 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = (s2l2a[\"eo:cloud_cover\"] < 10) & (s2l2a[\"s2:nodata_pixel_percentage\"] > 90)\n", "keep = s2l2a[mask]\n", "keep.head()" ] }, { "cell_type": "markdown", "id": "ec2b6360-9912-46eb-aacb-d4fad891c313", "metadata": {}, "source": [ "When you compute the results, the computation will run in parallel. See [Scale with Dask](https://planetarycomputer.microsoft.com/docs/quickstarts/scale-with-dask/) for more.\n", "\n", "As mentioned earlier, the different collections have different properties, and so have different columns in the DataFrame." 
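Parallel execution is possible here because a row-wise filter such as `mask` can be evaluated on each partition independently, with the pieces concatenated afterwards. The pure-Python sketch below illustrates only that idea, using a thread pool over made-up partitions. It is not how dask schedules work, and `filter_partition` and `parallel_filter` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_partition(rows):
    # Same predicate as the mask above: low cloud cover and a high nodata percentage
    return [
        row
        for row in rows
        if row['eo:cloud_cover'] < 10 and row['s2:nodata_pixel_percentage'] > 90
    ]


def parallel_filter(partitions, max_workers=4):
    # Filter every partition concurrently, then concatenate the results in partition order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        filtered = list(pool.map(filter_partition, partitions))
    return [row for part in filtered for row in part]


# Made-up partitions: an id plus the two columns the filter needs
partitions = [
    [
        {'id': 'a', 'eo:cloud_cover': 5.0, 's2:nodata_pixel_percentage': 95.0},
        {'id': 'b', 'eo:cloud_cover': 50.0, 's2:nodata_pixel_percentage': 99.0},
    ],
    [
        {'id': 'c', 'eo:cloud_cover': 1.0, 's2:nodata_pixel_percentage': 10.0},
        {'id': 'd', 'eo:cloud_cover': 2.0, 's2:nodata_pixel_percentage': 91.0},
    ],
]

print([row['id'] for row in parallel_filter(partitions)])
```

Because no partition needs to see another partition's rows, adding more partitions adds more independent work rather than more coordination.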
] }, { "cell_type": "code", "execution_count": 10, "id": "84bbb9b8-40f0-4b9e-9a00-67156a058c96", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['type',\n", " 'stac_version',\n", " 'stac_extensions',\n", " 'id',\n", " 'geometry',\n", " 'bbox',\n", " 'links',\n", " 'assets',\n", " 'collection',\n", " 'datetime',\n", " 'platform',\n", " 'proj:epsg',\n", " 'instruments',\n", " 's2:mgrs_tile',\n", " 'constellation',\n", " 's2:granule_id',\n", " 'eo:cloud_cover',\n", " 's2:datatake_id',\n", " 's2:product_uri',\n", " 's2:datastrip_id',\n", " 's2:product_type',\n", " 'sat:orbit_state',\n", " 's2:datatake_type',\n", " 's2:generation_time',\n", " 'sat:relative_orbit',\n", " 's2:water_percentage',\n", " 's2:mean_solar_zenith',\n", " 's2:mean_solar_azimuth',\n", " 's2:processing_baseline',\n", " 's2:snow_ice_percentage',\n", " 's2:vegetation_percentage',\n", " 's2:thin_cirrus_percentage',\n", " 's2:cloud_shadow_percentage',\n", " 's2:nodata_pixel_percentage',\n", " 's2:unclassified_percentage',\n", " 's2:dark_features_percentage',\n", " 's2:not_vegetated_percentage',\n", " 's2:degraded_msi_data_percentage',\n", " 's2:high_proba_clouds_percentage',\n", " 's2:reflectance_conversion_factor',\n", " 's2:medium_proba_clouds_percentage',\n", " 's2:saturated_defective_pixel_percentage']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s2l2a.columns.tolist()" ] }, { "cell_type": "markdown", "id": "26ecde40-e42c-4584-8693-5e5714879c98", "metadata": {}, "source": [ "Different collections are partitioned at different frequencies, depending on their update cadence, the number of STAC items, and the size of each STAC item. To check whether a dataset is partitioned, look for an `msft:partition_info` property on the asset. Its `partition_frequency` value is a [pandas offset alias](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)." 
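For example, `W-MON` is the pandas alias for a weekly frequency anchored on Mondays. The partition boundaries in the earlier file listing (2015-06-29 and 2015-07-06, for instance) do fall on Mondays. The standard-library sketch below generates Monday anchors by rolling forward to the first Monday on or after a start date. To the best of our understanding it matches `pandas.date_range(start, periods=n, freq='W-MON')` for a midnight start. Note that `monday_anchors` is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta


def monday_anchors(start, periods):
    # weekday() returns 0 for Monday, so this rolls start forward to the next Monday (or keeps it)
    first = start + timedelta(days=(7 - start.weekday()) % 7)
    # Step forward one week at a time from that first Monday
    return [first + timedelta(weeks=i) for i in range(periods)]


for anchor in monday_anchors(datetime(2015, 7, 1), 3):
    print(anchor.date().isoformat(), anchor.strftime('%A'))
```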
] }, { "cell_type": "code", "execution_count": 11, "id": "ea8f3bfd-0a5e-4a5d-9e44-9f3260095541", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'is_partitioned': True, 'partition_frequency': 'W-MON'}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "asset.extra_fields[\"msft:partition_info\"]" ] }, { "cell_type": "markdown", "id": "3af4fb34-4250-4bc3-b6c0-5482d59d8572", "metadata": {}, "source": [ "### Expanding nested fields" ] }, { "cell_type": "markdown", "id": "20f226fe-96f0-4bcc-9f9d-a3dba988a8c1", "metadata": {}, "source": [ "STAC items are deeply nested data structures, while libraries like pandas are mostly designed for flat, non-nested data. Consider the `assets` column: each value is a dictionary mapping asset keys to asset objects (which include an `href` and other properties)." ] }, { "cell_type": "code", "execution_count": 12, "id": "d3cf24f0-a794-4864-981d-5fb4abd9387e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 {'data': {'file:size': 53208880, 'file:values'...\n", "1 {'data': {'file:size': 114187155, 'file:values...\n", "2 {'data': {'file:size': 53981476, 'file:values'...\n", "3 {'data': {'file:size': 165601021, 'file:values...\n", "4 {'data': {'file:size': 97175834, 'file:values'...\n", "Name: assets, dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"assets\"].head()" ] }, { "cell_type": "markdown", "id": "b1259961-9d29-4600-9f07-c27147ea5682", "metadata": {}, "source": [ "The [json_normalize](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html) function can expand this single column of nested data into many columns, one per nested field (for example `data.href` and `data.file:size`):" ] }, { "cell_type": "code", "execution_count": 13, "id": "dee39f6e-0123-496b-becc-15da5aee98fd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " data.file:size data.file:values \\\n", "0 53208880 [{'summary': 'No Data', 'values': [0]}, {'summ... \n", "1 114187155 [{'summary': 'No Data', 'values': [0]}, {'summ... \n", "2 53981476 [{'summary': 'No Data', 'values': [0]}, {'summ... \n", "3 165601021 [{'summary': 'No Data', 'values': [0]}, {'summ... \n", "4 97175834 [{'summary': 'No Data', 'values': [0]}, {'summ... \n", "\n", " data.href \\\n", "0 https://ai4edataeuwest.blob.core.windows.net/i... \n", "1 https://ai4edataeuwest.blob.core.windows.net/i... \n", "2 https://ai4edataeuwest.blob.core.windows.net/i... \n", "3 https://ai4edataeuwest.blob.core.windows.net/i... \n", "4 https://ai4edataeuwest.blob.core.windows.net/i... \n", "\n", " data.raster:bands ... \\\n", "0 [{'nodata': 0, 'spatial_resolution': 10}] ... \n", "1 [{'nodata': 0, 'spatial_resolution': 10}] ... \n", "2 [{'nodata': 0, 'spatial_resolution': 10}] ... \n", "3 [{'nodata': 0, 'spatial_resolution': 10}] ... \n", "4 [{'nodata': 0, 'spatial_resolution': 10}] ... \n", "\n", " tilejson.href tilejson.roles \\\n", "0 https://planetarycomputer.microsoft.com/api/da... [tiles] \n", "1 https://planetarycomputer.microsoft.com/api/da... [tiles] \n", "2 https://planetarycomputer.microsoft.com/api/da... [tiles] \n", "3 https://planetarycomputer.microsoft.com/api/da... [tiles] \n", "4 https://planetarycomputer.microsoft.com/api/da... 
[tiles] \n", "\n", " tilejson.title tilejson.type \n", "0 TileJSON with default rendering application/json \n", "1 TileJSON with default rendering application/json \n", "2 TileJSON with default rendering application/json \n", "3 TileJSON with default rendering application/json \n", "4 TileJSON with default rendering application/json \n", "\n", "[5 rows x 15 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "assets = pd.json_normalize(df[\"assets\"].head())\n", "assets" ] }, { "cell_type": "markdown", "id": "f2c2b7f8-c8bd-4fff-ac30-fe481405d4f5", "metadata": {}, "source": [ "And the [explode](https://pandas.pydata.org/docs/reference/api/pandas.Series.explode.html) method will transform each element of a list-like value to a row:" ] }, { "cell_type": "code", "execution_count": 14, "id": "80aba9c8-9f82-4f9b-90f6-5180f83fef54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'summary': 'No Data', 'values': [0]}, {'summ...\n", "1 [{'summary': 'No Data', 'values': [0]}, {'summ...\n", "2 [{'summary': 'No Data', 'values': [0]}, {'summ...\n", "3 [{'summary': 'No Data', 'values': [0]}, {'summ...\n", "4 [{'summary': 'No Data', 'values': [0]}, {'summ...\n", "Name: data.file:values, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "assets[\"data.file:values\"]" ] }, { "cell_type": "code", "execution_count": 15, "id": "ca1df1b1-aa63-4c0f-ae97-b6fc890ff642", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 {'summary': 'No Data', 'values': [0]}\n", "0 {'summary': 'Water', 'values': [1]}\n", "0 {'summary': 'Trees', 'values': [2]}\n", "0 {'summary': 'Flooded vegetation', 'values': [4]}\n", "0 {'summary': 'Crops', 'values': [5]}\n", "0 {'summary': 'Built area', 'values': [7]}\n", "0 {'summary': 'Bare ground', 'values': [8]}\n", "0 {'summary': 'Snow/ice', 'values': [9]}\n", "0 {'summary': 'Clouds', 'values': 
[10]}\n", "0 {'summary': 'Rangeland', 'values': [11]}\n", "1 {'summary': 'No Data', 'values': [0]}\n", "1 {'summary': 'Water', 'values': [1]}\n", "1 {'summary': 'Trees', 'values': [2]}\n", "1 {'summary': 'Flooded vegetation', 'values': [4]}\n", "1 {'summary': 'Crops', 'values': [5]}\n", "1 {'summary': 'Built area', 'values': [7]}\n", "1 {'summary': 'Bare ground', 'values': [8]}\n", "1 {'summary': 'Snow/ice', 'values': [9]}\n", "1 {'summary': 'Clouds', 'values': [10]}\n", "1 {'summary': 'Rangeland', 'values': [11]}\n", "2 {'summary': 'No Data', 'values': [0]}\n", "2 {'summary': 'Water', 'values': [1]}\n", "2 {'summary': 'Trees', 'values': [2]}\n", "2 {'summary': 'Flooded vegetation', 'values': [4]}\n", "2 {'summary': 'Crops', 'values': [5]}\n", "2 {'summary': 'Built area', 'values': [7]}\n", "2 {'summary': 'Bare ground', 'values': [8]}\n", "2 {'summary': 'Snow/ice', 'values': [9]}\n", "2 {'summary': 'Clouds', 'values': [10]}\n", "2 {'summary': 'Rangeland', 'values': [11]}\n", "3 {'summary': 'No Data', 'values': [0]}\n", "3 {'summary': 'Water', 'values': [1]}\n", "3 {'summary': 'Trees', 'values': [2]}\n", "3 {'summary': 'Flooded vegetation', 'values': [4]}\n", "3 {'summary': 'Crops', 'values': [5]}\n", "3 {'summary': 'Built area', 'values': [7]}\n", "3 {'summary': 'Bare ground', 'values': [8]}\n", "3 {'summary': 'Snow/ice', 'values': [9]}\n", "3 {'summary': 'Clouds', 'values': [10]}\n", "3 {'summary': 'Rangeland', 'values': [11]}\n", "4 {'summary': 'No Data', 'values': [0]}\n", "4 {'summary': 'Water', 'values': [1]}\n", "4 {'summary': 'Trees', 'values': [2]}\n", "4 {'summary': 'Flooded vegetation', 'values': [4]}\n", "4 {'summary': 'Crops', 'values': [5]}\n", "4 {'summary': 'Built area', 'values': [7]}\n", "4 {'summary': 'Bare ground', 'values': [8]}\n", "4 {'summary': 'Snow/ice', 'values': [9]}\n", "4 {'summary': 'Clouds', 'values': [10]}\n", "4 {'summary': 'Rangeland', 'values': [11]}\n", "Name: data.file:values, dtype: object" ] }, "execution_count": 15, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "assets[\"data.file:values\"].explode()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }
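Both operations are easy to picture on plain dictionaries. The sketch below is a simplified, hypothetical stand-in for `pandas.json_normalize` (nested dictionaries become dotted column names such as `data.file:size`) and for `Series.explode` (one output row per list element). It skips many options the real functions support, and the example asset is made up.

```python
def flatten(record, prefix='', sep='.'):
    # Recursively turn nested dicts into dotted keys; lists are left untouched
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode(rows, column):
    # Emit one row per element of a list-valued column, repeating the other fields
    out = []
    for row in rows:
        values = row.get(column)
        if isinstance(values, list):
            # An empty list becomes a single row with None, loosely mirroring the NaN pandas produces
            for value in values or [None]:
                out.append({**row, column: value})
        else:
            out.append(row)
    return out


# Made-up assets dictionary shaped like the ones above
assets = {
    'data': {
        'href': 'https://example.com/data.tif',
        'file:size': 1234,
        'file:values': [
            {'summary': 'No Data', 'values': [0]},
            {'summary': 'Water', 'values': [1]},
        ],
    }
}

row = flatten(assets)
print(sorted(row))
for exploded in explode([row], 'data.file:values'):
    print(exploded['data.file:values'])
```

Flattening first and exploding second mirrors the order used in the cells above: `json_normalize` produces the `data.file:values` column, and `explode` then turns each class entry into its own row.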