{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8ea8b39f-23cb-47c9-aa0c-a88f5ede4885",
   "metadata": {},
   "source": [
    "## Bulk STAC item queries with GeoParquet\n",
    "\n",
    "In addition to its [STAC API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/), the Planetary Computer also provides access to STAC items as [GeoParquet datasets](https://github.com/opengeospatial/geoparquet). These Parquet datasets are suited to \"bulk\" workloads, where a search might return a very large number of items or where you would need many separate queries to get the result you want. The Parquet datasets are generally produced with a lag relative to what's available through the STAC API, so most use cases, including those that need recently added assets, should use the [STAC API](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/).\n",
    "\n",
    "This example shows how to load STAC items from a Parquet dataset into a [geopandas](https://geopandas.readthedocs.io/) GeoDataFrame. A similar workflow would be possible with R's [sfarrow](https://wcjochem.github.io/sfarrow/index.html) package, or any other library that can read [GeoParquet](https://github.com/opengeospatial/geoparquet#current-implementations--examples)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f6a65f00-2d8b-4d2a-b92b-804990a3ebe4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import dask.dataframe as dd\n",
    "import geopandas\n",
    "import planetary_computer\n",
    "import pystac_client\n",
    "import pandas as pd\n",
    "\n",
    "pd.options.display.max_columns = 8"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d445f67-fbf2-4c20-b8ae-7bd7c65308e1",
   "metadata": {},
   "source": [
    "### Loading STAC Items\n",
    "\n",
    "Each STAC collection providing a geoparquet dataset has a collection-level asset under the `geoparquet-items` key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f24ab006-2d86-41a6-b728-2e0bcb2a2511",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>stac_version</th>\n",
       "      <th>stac_extensions</th>\n",
       "      <th>id</th>\n",
       "      <th>...</th>\n",
       "      <th>end_datetime</th>\n",
       "      <th>proj:transform</th>\n",
       "      <th>start_datetime</th>\n",
       "      <th>io:supercell_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/projection/...</td>\n",
       "      <td>12Q-2017</td>\n",
       "      <td>...</td>\n",
       "      <td>2018-01-01 00:00:00+00:00</td>\n",
       "      <td>[10.0, 0.0, 178910.0, 0.0, -10.0, 2657470.0]</td>\n",
       "      <td>2017-01-01 00:00:00+00:00</td>\n",
       "      <td>12Q</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/projection/...</td>\n",
       "      <td>15R-2017</td>\n",
       "      <td>...</td>\n",
       "      <td>2018-01-01 00:00:00+00:00</td>\n",
       "      <td>[10.0, 0.0, 194773.70566898846, 0.0, -10.0, 35...</td>\n",
       "      <td>2017-01-01 00:00:00+00:00</td>\n",
       "      <td>15R</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/projection/...</td>\n",
       "      <td>16M-2017</td>\n",
       "      <td>...</td>\n",
       "      <td>2018-01-01 00:00:00+00:00</td>\n",
       "      <td>[10.0, 0.0, 166023.6435927535, 0.0, -10.0, 999...</td>\n",
       "      <td>2017-01-01 00:00:00+00:00</td>\n",
       "      <td>16M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/projection/...</td>\n",
       "      <td>20L-2022</td>\n",
       "      <td>...</td>\n",
       "      <td>2023-01-01 00:00:00+00:00</td>\n",
       "      <td>[10.0, 0.0, 169256.89710350422, 0.0, -10.0, 91...</td>\n",
       "      <td>2022-01-01 00:00:00+00:00</td>\n",
       "      <td>20L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/projection/...</td>\n",
       "      <td>20M-2019</td>\n",
       "      <td>...</td>\n",
       "      <td>2020-01-01 00:00:00+00:00</td>\n",
       "      <td>[10.0, 0.0, 166023.6435927521, 0.0, -10.0, 999...</td>\n",
       "      <td>2019-01-01 00:00:00+00:00</td>\n",
       "      <td>20M</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 18 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      type stac_version                                    stac_extensions  \\\n",
       "0  Feature        1.0.0  [https://stac-extensions.github.io/projection/...   \n",
       "1  Feature        1.0.0  [https://stac-extensions.github.io/projection/...   \n",
       "2  Feature        1.0.0  [https://stac-extensions.github.io/projection/...   \n",
       "3  Feature        1.0.0  [https://stac-extensions.github.io/projection/...   \n",
       "4  Feature        1.0.0  [https://stac-extensions.github.io/projection/...   \n",
       "\n",
       "         id  ...              end_datetime  \\\n",
       "0  12Q-2017  ... 2018-01-01 00:00:00+00:00   \n",
       "1  15R-2017  ... 2018-01-01 00:00:00+00:00   \n",
       "2  16M-2017  ... 2018-01-01 00:00:00+00:00   \n",
       "3  20L-2022  ... 2023-01-01 00:00:00+00:00   \n",
       "4  20M-2019  ... 2020-01-01 00:00:00+00:00   \n",
       "\n",
       "                                      proj:transform  \\\n",
       "0       [10.0, 0.0, 178910.0, 0.0, -10.0, 2657470.0]   \n",
       "1  [10.0, 0.0, 194773.70566898846, 0.0, -10.0, 35...   \n",
       "2  [10.0, 0.0, 166023.6435927535, 0.0, -10.0, 999...   \n",
       "3  [10.0, 0.0, 169256.89710350422, 0.0, -10.0, 91...   \n",
       "4  [10.0, 0.0, 166023.6435927521, 0.0, -10.0, 999...   \n",
       "\n",
       "             start_datetime io:supercell_id  \n",
       "0 2017-01-01 00:00:00+00:00             12Q  \n",
       "1 2017-01-01 00:00:00+00:00             15R  \n",
       "2 2017-01-01 00:00:00+00:00             16M  \n",
       "3 2022-01-01 00:00:00+00:00             20L  \n",
       "4 2019-01-01 00:00:00+00:00             20M  \n",
       "\n",
       "[5 rows x 18 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "catalog = pystac_client.Client.open(\n",
    "    \"https://planetarycomputer.microsoft.com/api/stac/v1/\",\n",
    "    modifier=planetary_computer.sign_inplace,\n",
    ")\n",
    "\n",
    "\n",
    "asset = catalog.get_collection(\"io-lulc-9-class\").assets[\"geoparquet-items\"]\n",
    "\n",
    "df = geopandas.read_parquet(\n",
    "    asset.href, storage_options=asset.extra_fields[\"table:storage_options\"]\n",
    ")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9d44e67-3eb5-4118-89b6-adcab840410c",
   "metadata": {},
   "source": [
    "Now we can, for example, count the items for each `proj:epsg` code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c44499b8-9da7-43f5-8d3e-f842a26c1f8f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "proj:epsg\n",
       "32616    60\n",
       "32638    60\n",
       "32650    60\n",
       "32648    60\n",
       "32617    60\n",
       "         ..\n",
       "32714    12\n",
       "32710    11\n",
       "32708    10\n",
       "32711     6\n",
       "32727     6\n",
       "Name: count, Length: 120, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"proj:epsg\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e3c06fd-0f47-4626-8ffe-aa1ec866e5ce",
   "metadata": {},
   "source": [
    "Or filter the items to a specific code and plot the footprints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e33c53aa-bbab-4753-8c7b-550d1a81d870",
   "metadata": {},
   "outputs": [],
   "source": [
    "subset = df.loc[df[\"proj:epsg\"] == 32651, [\"io:tile_id\", \"geometry\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "65c06109-11d7-460c-8182-12a600217644",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"https://ai4edatasetspublicassets.blob.core.windows.net/assets/notebook-output/quickstarts-stac-geoparquet.ipynb/5.png\"/>"
      ],
      "text/plain": [
       "<Figure size 400x2000 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import contextily\n",
    "\n",
    "ax = subset.plot(figsize=(4, 20), color=\"none\", edgecolor=\"yellow\")\n",
    "contextily.add_basemap(\n",
    "    ax, crs=df.crs.to_string(), source=contextily.providers.Esri.NatGeoWorldMap\n",
    ")\n",
    "\n",
    "ax.set_axis_off()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa7e885e-3187-4134-bc94-53e6c1b2e977",
   "metadata": {},
   "source": [
    "### Schemas\n",
    "\n",
    "Each parquet dataset has a unique schema, reflecting the unique properties captured in each collection. But there are some general patterns.\n",
    "\n",
    "1. Each dataset has a column for the properties required on a STAC item (`type`, `stac_version`, `stac_extensions`, `id`, `geometry`, `bbox`, `links`, `assets`, and `collection`).\n",
    "2. All fields under `properties` are lifted to the top-level, including datetime-related fields like `datetime`, `start_datetime`, `end_datetime`, common metadata (e.g. `platform`) and extension fields (e.g. `proj:bbox`, ...).\n",
    "3. Dynamic datasets, where new items are regularly added, are partitioned by time.\n",
    "\n",
    "### Partitioning\n",
    "\n",
    "Depending on the number of STAC items in the collection and whether or not new items are being added, the Parquet dataset may be split into multiple files by time.\n",
    "\n",
    "For example, the `io-lulc-9-class` collection is not partitioned and has just a single file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "5b654d8e-787f-4b3a-975d-2670e7d24f00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['items/io-lulc-9-class.parquet']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import adlfs\n",
    "\n",
    "fs = adlfs.AzureBlobFileSystem(**asset.extra_fields[\"table:storage_options\"])\n",
    "fs.ls(\"items/io-lulc-9-class.parquet\")  # Not partitioned, single result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df56667e-78a3-4d29-82d0-9fbc1253e5d2",
   "metadata": {},
   "source": [
    "Compare that to `sentinel-2-l2a`, which is partitioned by week."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3dd96e99-637c-4d2e-9414-f5896517968e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['items/sentinel-2-l2a.parquet/part-0001_2015-06-29T10:25:31+00:00_2015-07-06T10:25:31+00:00.parquet',\n",
       " 'items/sentinel-2-l2a.parquet/part-0002_2015-07-06T10:25:31+00:00_2015-07-13T10:25:31+00:00.parquet',\n",
       " 'items/sentinel-2-l2a.parquet/part-0003_2015-07-13T10:25:31+00:00_2015-07-20T10:25:31+00:00.parquet',\n",
       " 'items/sentinel-2-l2a.parquet/part-0004_2015-07-20T10:25:31+00:00_2015-07-27T10:25:31+00:00.parquet',\n",
       " 'items/sentinel-2-l2a.parquet/part-0005_2015-07-27T10:25:31+00:00_2015-08-03T10:25:31+00:00.parquet']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fs.ls(\"items/sentinel-2-l2a.parquet\")[:5]"
   ]
  },
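  {
   "cell_type": "markdown",
   "id": "sketch-partition-file-bounds",
   "metadata": {},
   "source": [
    "The file names in the listing above appear to encode each partition's time range as `part-<number>_<start>_<end>.parquet`. As a minimal sketch, assuming that naming pattern holds (it is inferred from the listing, not a documented contract), you could pick out the files that overlap a time window before reading anything. The `partition_bounds` and `overlapping` helpers are hypothetical names used only for illustration.\n",
    "\n",
    "```python\n",
    "from datetime import datetime\n",
    "\n",
    "# Paths copied from the listing above (first three partitions only).\n",
    "paths = [\n",
    "    \"items/sentinel-2-l2a.parquet/part-0001_2015-06-29T10:25:31+00:00_2015-07-06T10:25:31+00:00.parquet\",\n",
    "    \"items/sentinel-2-l2a.parquet/part-0002_2015-07-06T10:25:31+00:00_2015-07-13T10:25:31+00:00.parquet\",\n",
    "    \"items/sentinel-2-l2a.parquet/part-0003_2015-07-13T10:25:31+00:00_2015-07-20T10:25:31+00:00.parquet\",\n",
    "]\n",
    "\n",
    "\n",
    "def partition_bounds(path):\n",
    "    \"\"\"Parse the (start, end) datetimes assumed to be encoded in a file name.\"\"\"\n",
    "    name = path.rsplit(\"/\", 1)[-1][: -len(\".parquet\")]\n",
    "    _, start, end = name.split(\"_\")\n",
    "    return datetime.fromisoformat(start), datetime.fromisoformat(end)\n",
    "\n",
    "\n",
    "def overlapping(paths, start, end):\n",
    "    \"\"\"Keep the paths whose time range intersects [start, end).\"\"\"\n",
    "    selected = []\n",
    "    for path in paths:\n",
    "        part_start, part_end = partition_bounds(path)\n",
    "        if part_start < end and part_end > start:\n",
    "            selected.append(path)\n",
    "    return selected\n",
    "\n",
    "\n",
    "window_start = datetime.fromisoformat(\"2015-07-10T00:00:00+00:00\")\n",
    "window_end = datetime.fromisoformat(\"2015-07-15T00:00:00+00:00\")\n",
    "selected = overlapping(paths, window_start, window_end)\n",
    "```\n",
    "\n",
    "With this window, `selected` holds the `part-0002` and `part-0003` files, so only those two partitions would need to be read."
   ]
  },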
  {
   "cell_type": "markdown",
   "id": "6b6eb7d9-e380-4074-b574-78a91474ee43",
   "metadata": {},
   "source": [
    "To work with a partitioned dataset, you can use a library like dask or dask-geopandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "30b0f6f0-9860-4c96-b01b-a3975f67e8b7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>stac_version</th>\n",
       "      <th>stac_extensions</th>\n",
       "      <th>id</th>\n",
       "      <th>...</th>\n",
       "      <th>s2:high_proba_clouds_percentage</th>\n",
       "      <th>s2:reflectance_conversion_factor</th>\n",
       "      <th>s2:medium_proba_clouds_percentage</th>\n",
       "      <th>s2:saturated_defective_pixel_percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T35XQA_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>92.546540</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>4.807670</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T32TMM_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>0.048035</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>0.051376</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T32TMN_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>0.011238</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>0.022928</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T36WWC_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>65.812266</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>19.050561</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T36WWD_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>97.629422</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>1.861097</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 42 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      type stac_version                                    stac_extensions  \\\n",
       "0  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "1  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "2  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "3  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "4  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "\n",
       "                                                  id  ...  \\\n",
       "0  S2A_MSIL2A_20150704T101006_R022_T35XQA_2021041...  ...   \n",
       "1  S2A_MSIL2A_20150704T101006_R022_T32TMM_2021041...  ...   \n",
       "2  S2A_MSIL2A_20150704T101006_R022_T32TMN_2021041...  ...   \n",
       "3  S2A_MSIL2A_20150704T101006_R022_T36WWC_2021041...  ...   \n",
       "4  S2A_MSIL2A_20150704T101006_R022_T36WWD_2021041...  ...   \n",
       "\n",
       "  s2:high_proba_clouds_percentage s2:reflectance_conversion_factor  \\\n",
       "0                       92.546540                         0.967449   \n",
       "1                        0.048035                         0.967449   \n",
       "2                        0.011238                         0.967449   \n",
       "3                       65.812266                         0.967449   \n",
       "4                       97.629422                         0.967449   \n",
       "\n",
       "  s2:medium_proba_clouds_percentage s2:saturated_defective_pixel_percentage  \n",
       "0                          4.807670                                     0.0  \n",
       "1                          0.051376                                     0.0  \n",
       "2                          0.022928                                     0.0  \n",
       "3                         19.050561                                     0.0  \n",
       "4                          1.861097                                     0.0  \n",
       "\n",
       "[5 rows x 42 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "asset = catalog.get_collection(\"sentinel-2-l2a\").assets[\"geoparquet-items\"]\n",
    "\n",
    "s2l2a = dd.read_parquet(\n",
    "    asset.href, storage_options=asset.extra_fields[\"table:storage_options\"]\n",
    ")\n",
    "s2l2a.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a64eba5-b2e1-45f9-a379-e6424ad06c41",
   "metadata": {},
   "source": [
    "You can perform filtering operations on the entire collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "eb0ef08d-5f9f-41d8-a3b1-c3be86504ea2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>stac_version</th>\n",
       "      <th>stac_extensions</th>\n",
       "      <th>id</th>\n",
       "      <th>...</th>\n",
       "      <th>s2:high_proba_clouds_percentage</th>\n",
       "      <th>s2:reflectance_conversion_factor</th>\n",
       "      <th>s2:medium_proba_clouds_percentage</th>\n",
       "      <th>s2:saturated_defective_pixel_percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T32RMS_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T31PFP_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>2.169701</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>1.014810</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T31QGU_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>0.211487</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>0.171659</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T31SGT_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>1.710171</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>1.712733</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>Feature</td>\n",
       "      <td>1.0.0</td>\n",
       "      <td>[https://stac-extensions.github.io/eo/v1.0.0/s...</td>\n",
       "      <td>S2A_MSIL2A_20150704T101006_R022_T31QHC_2021041...</td>\n",
       "      <td>...</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.967449</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 42 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       type stac_version                                    stac_extensions  \\\n",
       "27  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "56  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "68  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "77  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "80  Feature        1.0.0  [https://stac-extensions.github.io/eo/v1.0.0/s...   \n",
       "\n",
       "                                                   id  ...  \\\n",
       "27  S2A_MSIL2A_20150704T101006_R022_T32RMS_2021041...  ...   \n",
       "56  S2A_MSIL2A_20150704T101006_R022_T31PFP_2021041...  ...   \n",
       "68  S2A_MSIL2A_20150704T101006_R022_T31QGU_2021041...  ...   \n",
       "77  S2A_MSIL2A_20150704T101006_R022_T31SGT_2021041...  ...   \n",
       "80  S2A_MSIL2A_20150704T101006_R022_T31QHC_2021041...  ...   \n",
       "\n",
       "   s2:high_proba_clouds_percentage s2:reflectance_conversion_factor  \\\n",
       "27                        0.000000                         0.967449   \n",
       "56                        2.169701                         0.967449   \n",
       "68                        0.211487                         0.967449   \n",
       "77                        1.710171                         0.967449   \n",
       "80                        0.000000                         0.967449   \n",
       "\n",
       "   s2:medium_proba_clouds_percentage s2:saturated_defective_pixel_percentage  \n",
       "27                          0.000000                                     0.0  \n",
       "56                          1.014810                                     0.0  \n",
       "68                          0.171659                                     0.0  \n",
       "77                          1.712733                                     0.0  \n",
       "80                          0.000000                                     0.0  \n",
       "\n",
       "[5 rows x 42 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mask = (s2l2a[\"eo:cloud_cover\"] < 10) & (s2l2a[\"s2:nodata_pixel_percentage\"] > 90)\n",
    "keep = s2l2a[mask]\n",
    "keep.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec2b6360-9912-46eb-aacb-d4fad891c313",
   "metadata": {},
   "source": [
    "When you compute the results, the computation will run in parallel. See [Scale with Dask](https://planetarycomputer.microsoft.com/docs/quickstarts/scale-with-dask/) for more.\n",
    "\n",
    "As mentioned earlier, the different collections have different properties, and so have different columns in the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "84bbb9b8-40f0-4b9e-9a00-67156a058c96",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['type',\n",
       " 'stac_version',\n",
       " 'stac_extensions',\n",
       " 'id',\n",
       " 'geometry',\n",
       " 'bbox',\n",
       " 'links',\n",
       " 'assets',\n",
       " 'collection',\n",
       " 'datetime',\n",
       " 'platform',\n",
       " 'proj:epsg',\n",
       " 'instruments',\n",
       " 's2:mgrs_tile',\n",
       " 'constellation',\n",
       " 's2:granule_id',\n",
       " 'eo:cloud_cover',\n",
       " 's2:datatake_id',\n",
       " 's2:product_uri',\n",
       " 's2:datastrip_id',\n",
       " 's2:product_type',\n",
       " 'sat:orbit_state',\n",
       " 's2:datatake_type',\n",
       " 's2:generation_time',\n",
       " 'sat:relative_orbit',\n",
       " 's2:water_percentage',\n",
       " 's2:mean_solar_zenith',\n",
       " 's2:mean_solar_azimuth',\n",
       " 's2:processing_baseline',\n",
       " 's2:snow_ice_percentage',\n",
       " 's2:vegetation_percentage',\n",
       " 's2:thin_cirrus_percentage',\n",
       " 's2:cloud_shadow_percentage',\n",
       " 's2:nodata_pixel_percentage',\n",
       " 's2:unclassified_percentage',\n",
       " 's2:dark_features_percentage',\n",
       " 's2:not_vegetated_percentage',\n",
       " 's2:degraded_msi_data_percentage',\n",
       " 's2:high_proba_clouds_percentage',\n",
       " 's2:reflectance_conversion_factor',\n",
       " 's2:medium_proba_clouds_percentage',\n",
       " 's2:saturated_defective_pixel_percentage']"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2l2a.columns.tolist()"
   ]
  },
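  {
   "cell_type": "markdown",
   "id": "sketch-flatten-stac-item",
   "metadata": {},
   "source": [
    "The column list above follows the layout described under Schemas: the top-level STAC item fields come first, followed by everything that lives under `properties` in the item's JSON. The sketch below illustrates that mapping on a hypothetical, heavily trimmed item. The `flatten_item` helper is an illustrative name, and this is not necessarily how the Parquet files are actually produced.\n",
    "\n",
    "```python\n",
    "def flatten_item(item):\n",
    "    \"\"\"Lift every field under `properties` to the top level of the row.\"\"\"\n",
    "    row = {key: value for key, value in item.items() if key != \"properties\"}\n",
    "    row.update(item.get(\"properties\", {}))\n",
    "    return row\n",
    "\n",
    "\n",
    "# Hypothetical item with made-up values, for illustration only.\n",
    "item = {\n",
    "    \"type\": \"Feature\",\n",
    "    \"id\": \"example-item\",\n",
    "    \"collection\": \"example-collection\",\n",
    "    \"properties\": {\n",
    "        \"datetime\": \"2015-07-04T10:10:06Z\",\n",
    "        \"eo:cloud_cover\": 4.2,\n",
    "    },\n",
    "}\n",
    "\n",
    "row = flatten_item(item)\n",
    "columns = list(row)\n",
    "```\n",
    "\n",
    "Here `columns` contains `type`, `id`, and `collection` followed by `datetime` and `eo:cloud_cover`, with no `properties` key left, which is why fields such as `eo:cloud_cover` can be used directly as DataFrame columns."
   ]
  },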
  {
   "cell_type": "markdown",
   "id": "26ecde40-e42c-4584-8693-5e5714879c98",
   "metadata": {},
   "source": [
    "Different collections will be partitioned by different frequencies, depending on the update cadence, number of STAC items, and size of each STAC item. Look for an `msft:partition_info` property on the asset to check if the dataset is partitioned. The `partition_frequency` is a [pandas Offset alias](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ea8f3bfd-0a5e-4a5d-9e44-9f3260095541",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'is_partitioned': True, 'partition_frequency': 'W-MON'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "asset.extra_fields[\"msft:partition_info\"]"
   ]
  },
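  {
   "cell_type": "markdown",
   "id": "sketch-offset-alias-edges",
   "metadata": {},
   "source": [
    "To see what an offset alias such as `W-MON` means in practice, the sketch below resolves it with pandas and generates a few consecutive weekly edges. The `partition_info` dictionary is a hand-written copy of the output above, and the start date is an assumption chosen to match the first Sentinel-2 partition. The real file boundaries in the earlier listing fall at 10:25:31 rather than midnight, so this illustrates the alias only, not the exact partition edges.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from pandas.tseries.frequencies import to_offset\n",
    "\n",
    "# Hand-written copy of the metadata shown above.\n",
    "partition_info = {\"is_partitioned\": True, \"partition_frequency\": \"W-MON\"}\n",
    "\n",
    "# Resolve the alias to a pandas offset object (weekly, anchored on Mondays).\n",
    "offset = to_offset(partition_info[\"partition_frequency\"])\n",
    "\n",
    "# Generate four consecutive edges, starting from Monday 2015-06-29.\n",
    "edges = pd.date_range(\"2015-06-29\", periods=4, freq=offset, tz=\"UTC\")\n",
    "edge_dates = [edge.strftime(\"%Y-%m-%d\") for edge in edges]\n",
    "```\n",
    "\n",
    "Each pair of neighbouring values in `edge_dates` spans one week, which matches the week-long ranges in the Sentinel-2 file names."
   ]
  },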
  {
   "cell_type": "markdown",
   "id": "3af4fb34-4250-4bc3-b6c0-5482d59d8572",
   "metadata": {},
   "source": [
    "### Expanding nested fields"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20f226fe-96f0-4bcc-9f9d-a3dba988a8c1",
   "metadata": {},
   "source": [
    "STAC items are highly nested data structures, while libraries like pandas were mostly designed for working with non-nested data types. Consider a column like `assets`, which is a dictionary mapping asset keys to asset objects (which include an `href` and other properties)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d3cf24f0-a794-4864-981d-5fb4abd9387e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    {'data': {'file:size': 53208880, 'file:values'...\n",
       "1    {'data': {'file:size': 114187155, 'file:values...\n",
       "2    {'data': {'file:size': 53981476, 'file:values'...\n",
       "3    {'data': {'file:size': 165601021, 'file:values...\n",
       "4    {'data': {'file:size': 97175834, 'file:values'...\n",
       "Name: assets, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"assets\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1259961-9d29-4600-9f07-c27147ea5682",
   "metadata": {},
   "source": [
    "The [`pandas.json_normalize`](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html) function can be used to expand this single column of nested data into many columns, one per field of each asset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "dee39f6e-0123-496b-becc-15da5aee98fd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data.file:size</th>\n",
       "      <th>data.file:values</th>\n",
       "      <th>data.href</th>\n",
       "      <th>data.raster:bands</th>\n",
       "      <th>...</th>\n",
       "      <th>tilejson.href</th>\n",
       "      <th>tilejson.roles</th>\n",
       "      <th>tilejson.title</th>\n",
       "      <th>tilejson.type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>53208880</td>\n",
       "      <td>[{'summary': 'No Data', 'values': [0]}, {'summ...</td>\n",
       "      <td>https://ai4edataeuwest.blob.core.windows.net/i...</td>\n",
       "      <td>[{'nodata': 0, 'spatial_resolution': 10}]</td>\n",
       "      <td>...</td>\n",
       "      <td>https://planetarycomputer.microsoft.com/api/da...</td>\n",
       "      <td>[tiles]</td>\n",
       "      <td>TileJSON with default rendering</td>\n",
       "      <td>application/json</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>114187155</td>\n",
       "      <td>[{'summary': 'No Data', 'values': [0]}, {'summ...</td>\n",
       "      <td>https://ai4edataeuwest.blob.core.windows.net/i...</td>\n",
       "      <td>[{'nodata': 0, 'spatial_resolution': 10}]</td>\n",
       "      <td>...</td>\n",
       "      <td>https://planetarycomputer.microsoft.com/api/da...</td>\n",
       "      <td>[tiles]</td>\n",
       "      <td>TileJSON with default rendering</td>\n",
       "      <td>application/json</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>53981476</td>\n",
       "      <td>[{'summary': 'No Data', 'values': [0]}, {'summ...</td>\n",
       "      <td>https://ai4edataeuwest.blob.core.windows.net/i...</td>\n",
       "      <td>[{'nodata': 0, 'spatial_resolution': 10}]</td>\n",
       "      <td>...</td>\n",
       "      <td>https://planetarycomputer.microsoft.com/api/da...</td>\n",
       "      <td>[tiles]</td>\n",
       "      <td>TileJSON with default rendering</td>\n",
       "      <td>application/json</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>165601021</td>\n",
       "      <td>[{'summary': 'No Data', 'values': [0]}, {'summ...</td>\n",
       "      <td>https://ai4edataeuwest.blob.core.windows.net/i...</td>\n",
       "      <td>[{'nodata': 0, 'spatial_resolution': 10}]</td>\n",
       "      <td>...</td>\n",
       "      <td>https://planetarycomputer.microsoft.com/api/da...</td>\n",
       "      <td>[tiles]</td>\n",
       "      <td>TileJSON with default rendering</td>\n",
       "      <td>application/json</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>97175834</td>\n",
       "      <td>[{'summary': 'No Data', 'values': [0]}, {'summ...</td>\n",
       "      <td>https://ai4edataeuwest.blob.core.windows.net/i...</td>\n",
       "      <td>[{'nodata': 0, 'spatial_resolution': 10}]</td>\n",
       "      <td>...</td>\n",
       "      <td>https://planetarycomputer.microsoft.com/api/da...</td>\n",
       "      <td>[tiles]</td>\n",
       "      <td>TileJSON with default rendering</td>\n",
       "      <td>application/json</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   data.file:size                                   data.file:values  \\\n",
       "0        53208880  [{'summary': 'No Data', 'values': [0]}, {'summ...   \n",
       "1       114187155  [{'summary': 'No Data', 'values': [0]}, {'summ...   \n",
       "2        53981476  [{'summary': 'No Data', 'values': [0]}, {'summ...   \n",
       "3       165601021  [{'summary': 'No Data', 'values': [0]}, {'summ...   \n",
       "4        97175834  [{'summary': 'No Data', 'values': [0]}, {'summ...   \n",
       "\n",
       "                                           data.href  \\\n",
       "0  https://ai4edataeuwest.blob.core.windows.net/i...   \n",
       "1  https://ai4edataeuwest.blob.core.windows.net/i...   \n",
       "2  https://ai4edataeuwest.blob.core.windows.net/i...   \n",
       "3  https://ai4edataeuwest.blob.core.windows.net/i...   \n",
       "4  https://ai4edataeuwest.blob.core.windows.net/i...   \n",
       "\n",
       "                           data.raster:bands  ...  \\\n",
       "0  [{'nodata': 0, 'spatial_resolution': 10}]  ...   \n",
       "1  [{'nodata': 0, 'spatial_resolution': 10}]  ...   \n",
       "2  [{'nodata': 0, 'spatial_resolution': 10}]  ...   \n",
       "3  [{'nodata': 0, 'spatial_resolution': 10}]  ...   \n",
       "4  [{'nodata': 0, 'spatial_resolution': 10}]  ...   \n",
       "\n",
       "                                       tilejson.href tilejson.roles  \\\n",
       "0  https://planetarycomputer.microsoft.com/api/da...        [tiles]   \n",
       "1  https://planetarycomputer.microsoft.com/api/da...        [tiles]   \n",
       "2  https://planetarycomputer.microsoft.com/api/da...        [tiles]   \n",
       "3  https://planetarycomputer.microsoft.com/api/da...        [tiles]   \n",
       "4  https://planetarycomputer.microsoft.com/api/da...        [tiles]   \n",
       "\n",
       "                    tilejson.title     tilejson.type  \n",
       "0  TileJSON with default rendering  application/json  \n",
       "1  TileJSON with default rendering  application/json  \n",
       "2  TileJSON with default rendering  application/json  \n",
       "3  TileJSON with default rendering  application/json  \n",
       "4  TileJSON with default rendering  application/json  \n",
       "\n",
       "[5 rows x 15 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
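   ],
   "source": [
    "# Flatten the nested asset dictionaries into dotted column names such as \"data.href\".\n",
    "# pandas is already imported as pd in the first code cell of this notebook.\n",
    "assets = pd.json_normalize(df[\"assets\"].head())\n",
    "assets"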
   ]
  },
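  {
   "cell_type": "markdown",
   "id": "f2c2b7f8-c8bd-4fff-ac30-fe481405d4f5",
   "metadata": {},
   "source": [
    "Some of the resulting columns, such as `data.file:values`, still hold a list in each row. The [explode](https://pandas.pydata.org/docs/reference/api/pandas.Series.explode.html) method transforms each element of a list-like value into its own row, repeating the index of the row it came from:"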
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "80aba9c8-9f82-4f9b-90f6-5180f83fef54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    [{'summary': 'No Data', 'values': [0]}, {'summ...\n",
       "1    [{'summary': 'No Data', 'values': [0]}, {'summ...\n",
       "2    [{'summary': 'No Data', 'values': [0]}, {'summ...\n",
       "3    [{'summary': 'No Data', 'values': [0]}, {'summ...\n",
       "4    [{'summary': 'No Data', 'values': [0]}, {'summ...\n",
       "Name: data.file:values, dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assets[\"data.file:values\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ca1df1b1-aa63-4c0f-ae97-b6fc890ff642",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0               {'summary': 'No Data', 'values': [0]}\n",
       "0                 {'summary': 'Water', 'values': [1]}\n",
       "0                 {'summary': 'Trees', 'values': [2]}\n",
       "0    {'summary': 'Flooded vegetation', 'values': [4]}\n",
       "0                 {'summary': 'Crops', 'values': [5]}\n",
       "0            {'summary': 'Built area', 'values': [7]}\n",
       "0           {'summary': 'Bare ground', 'values': [8]}\n",
       "0              {'summary': 'Snow/ice', 'values': [9]}\n",
       "0               {'summary': 'Clouds', 'values': [10]}\n",
       "0            {'summary': 'Rangeland', 'values': [11]}\n",
       "1               {'summary': 'No Data', 'values': [0]}\n",
       "1                 {'summary': 'Water', 'values': [1]}\n",
       "1                 {'summary': 'Trees', 'values': [2]}\n",
       "1    {'summary': 'Flooded vegetation', 'values': [4]}\n",
       "1                 {'summary': 'Crops', 'values': [5]}\n",
       "1            {'summary': 'Built area', 'values': [7]}\n",
       "1           {'summary': 'Bare ground', 'values': [8]}\n",
       "1              {'summary': 'Snow/ice', 'values': [9]}\n",
       "1               {'summary': 'Clouds', 'values': [10]}\n",
       "1            {'summary': 'Rangeland', 'values': [11]}\n",
       "2               {'summary': 'No Data', 'values': [0]}\n",
       "2                 {'summary': 'Water', 'values': [1]}\n",
       "2                 {'summary': 'Trees', 'values': [2]}\n",
       "2    {'summary': 'Flooded vegetation', 'values': [4]}\n",
       "2                 {'summary': 'Crops', 'values': [5]}\n",
       "2            {'summary': 'Built area', 'values': [7]}\n",
       "2           {'summary': 'Bare ground', 'values': [8]}\n",
       "2              {'summary': 'Snow/ice', 'values': [9]}\n",
       "2               {'summary': 'Clouds', 'values': [10]}\n",
       "2            {'summary': 'Rangeland', 'values': [11]}\n",
       "3               {'summary': 'No Data', 'values': [0]}\n",
       "3                 {'summary': 'Water', 'values': [1]}\n",
       "3                 {'summary': 'Trees', 'values': [2]}\n",
       "3    {'summary': 'Flooded vegetation', 'values': [4]}\n",
       "3                 {'summary': 'Crops', 'values': [5]}\n",
       "3            {'summary': 'Built area', 'values': [7]}\n",
       "3           {'summary': 'Bare ground', 'values': [8]}\n",
       "3              {'summary': 'Snow/ice', 'values': [9]}\n",
       "3               {'summary': 'Clouds', 'values': [10]}\n",
       "3            {'summary': 'Rangeland', 'values': [11]}\n",
       "4               {'summary': 'No Data', 'values': [0]}\n",
       "4                 {'summary': 'Water', 'values': [1]}\n",
       "4                 {'summary': 'Trees', 'values': [2]}\n",
       "4    {'summary': 'Flooded vegetation', 'values': [4]}\n",
       "4                 {'summary': 'Crops', 'values': [5]}\n",
       "4            {'summary': 'Built area', 'values': [7]}\n",
       "4           {'summary': 'Bare ground', 'values': [8]}\n",
       "4              {'summary': 'Snow/ice', 'values': [9]}\n",
       "4               {'summary': 'Clouds', 'values': [10]}\n",
       "4            {'summary': 'Rangeland', 'values': [11]}\n",
       "Name: data.file:values, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assets[\"data.file:values\"].explode()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}