{
"cells": [
{
"cell_type": "markdown",
"id": "b5ad9932-76e1-4a4a-bb34-afebaf20ccd8",
"metadata": {},
"source": [
"## Accessing Forest Inventory and Analysis data with the Planetary Computer STAC API\n",
"\n",
"This notebook demonstrates accessing [Forest Inventory and Analysis](https://planetarycomputer.microsoft.com/dataset/fia) (FIA) data from the Planetary Computer STAC API.\n",
"\n",
"The Forest Inventory and Analysis collection contains many tables, and each table corresponds to one STAC item in the [FIA collection](http://planetarycomputer.microsoft.com/api/stac/v1/collections/fia). In this example, we'll use a few of the tables to estimate the total amount of aboveground carbon, in tons, per US county.\n",
"\n",
"This example builds on the [plot estimation](https://rfia.netlify.app/courses/plt_est/) example from the [rfia](https://rfia.netlify.app/) package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b11609a3-e138-47c9-bcc7-7bae11405bca",
"metadata": {},
"outputs": [],
"source": [
"from cartopy import crs as ccrs\n",
"import dask_gateway\n",
"import dask_geopandas\n",
"import dask.dataframe as dd\n",
"import geopandas\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import pystac_client\n",
"import planetary_computer"
]
},
{
"cell_type": "markdown",
"id": "be0c70f3-1124-44c5-96bc-d22107062384",
"metadata": {},
"source": [
"The `tree` table below is relatively large, so we'll process it in parallel on a Dask cluster. The example will still run without a cluster; it will just take longer."
]
},
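{
"cell_type": "code",
"execution_count": null,
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000001",
"metadata": {},
"outputs": [],
"source": [
"# Connect to a Dask cluster. This is the typical dask-gateway pattern on the\n",
"# Planetary Computer Hub; the worker count here is illustrative.\n",
"cluster = dask_gateway.GatewayCluster()\n",
"client = cluster.get_client()\n",
"cluster.scale(16)\n",
"cluster"
]
},
{
"cell_type": "markdown",
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000002",
"metadata": {},
"source": [
"Next, we'll open the Planetary Computer STAC catalog and read the `tree` and `plot` tables from the FIA collection as Dask dataframes. The column selections below are a sketch, inferred from the columns used later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000003",
"metadata": {},
"outputs": [],
"source": [
"catalog = pystac_client.Client.open(\n",
"    \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n",
"    modifier=planetary_computer.sign_inplace,\n",
")\n",
"fia = catalog.get_collection(\"fia\")\n",
"\n",
"tree_asset = fia.get_item(\"tree\").assets[\"data\"]\n",
"tree = dd.read_parquet(\n",
"    tree_asset.href,\n",
"    storage_options=tree_asset.extra_fields[\"table:storage_options\"],\n",
"    columns=[\"PLT_CN\", \"CONDID\", \"TREE\", \"DRYBIO_AG\", \"CARBON_AG\", \"TPA_UNADJ\"],\n",
")\n",
"\n",
"plot_asset = fia.get_item(\"plot\").assets[\"data\"]\n",
"plot = dd.read_parquet(\n",
"    plot_asset.href,\n",
"    storage_options=plot_asset.extra_fields[\"table:storage_options\"],\n",
"    columns=[\"CN\", \"STATECD\", \"COUNTYCD\"],\n",
")"
]
},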
{
"cell_type": "code",
"execution_count": 6,
"id": "3e8cca5f-d068-4ea8-baff-45dcb485ed53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dask DataFrame Structure:\n",
"                PLT_CN CONDID  TREE DRYBIO_AG CARBON_AG TPA_UNADJ    CN STATECD COUNTYCD     bio  carbon\n",
"npartitions=160\n",
"                 int64  int64 int64   float64   float64   float64 int64   int64    int64 float64 float64\n",
"                   ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"...                ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"                   ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"Dask Name: assign, 1921 tasks"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = tree.merge(plot.assign(PLT_CN=plot.CN).compute(), on=\"PLT_CN\").assign(\n",
" bio=lambda df: df.DRYBIO_AG * df.TPA_UNADJ / 2000,\n",
" carbon=lambda df: df.CARBON_AG * df.TPA_UNADJ / 2000,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "efb42525-1800-4612-9c7a-1480c5da3ce0",
"metadata": {},
"source": [
"### Compute per-county summaries\n",
"\n",
"The `df` dataframe now includes the state and county FIPS codes, along with the (adjusted) aboveground carbon and biomass. We'll group by the state and county codes and sum the aboveground carbon and biomass."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "423396fb-c05f-43db-bd1e-df0762a363f2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 259 ms, sys: 36.5 ms, total: 296 ms\n",
"Wall time: 35.2 s\n"
]
},
{
"data": {
"text/plain": [
"            bio        carbon STATE COUNTY\n",
"0   9130.179946   4565.090000    01    001\n",
"1  27163.826785  13565.771340    01    003\n",
"2  15567.843806   7774.593220    01    005\n",
"3  16643.533828   8321.766954    01    007\n",
"4   8742.498366   4364.876867    01    009"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"result = (\n",
" df.groupby([\"STATECD\", \"COUNTYCD\"])[[\"bio\", \"carbon\"]]\n",
" .sum()\n",
" .compute()\n",
" .reset_index()\n",
" .assign(\n",
" STATE=lambda df: df[\"STATECD\"].astype(\"string\").str.pad(2, fillchar=\"0\"),\n",
" COUNTY=lambda df: df[\"COUNTYCD\"].astype(\"string\").str.pad(3, fillchar=\"0\"),\n",
" )\n",
" .drop(columns=[\"STATECD\", \"COUNTYCD\"])\n",
")\n",
"result.head()"
]
},
{
"cell_type": "markdown",
"id": "5652d76a-fc0e-4de2-86b2-63c884471d39",
"metadata": {},
"source": [
"### Plot the results\n",
"\n",
"Now we'll make a choropleth of the results. We just need to join in the actual geographic boundaries of the counties, which we can read with `dask-geopandas`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aaa3d6e1-7731-4446-906e-5e8d2c63c5af",
"metadata": {},
"outputs": [],
"source": [
"census_item = catalog.get_collection(\"us-census\").get_item(\n",
" \"2020-cb_2020_us_county_500k\"\n",
")\n",
"\n",
"counties = (\n",
" dask_geopandas.read_parquet(\n",
" census_item.assets[\"data\"].href,\n",
" storage_options=census_item.assets[\"data\"].extra_fields[\n",
" \"table:storage_options\"\n",
" ],\n",
" columns=[\"STATEFP\", \"COUNTYFP\", \"geometry\"],\n",
" ).rename(columns={\"STATEFP\": \"STATE\", \"COUNTYFP\": \"COUNTY\"})\n",
").compute()"
]
},
{
"cell_type": "markdown",
"id": "4cad86c0-c87b-46f2-be90-c284935467b3",
"metadata": {},
"source": [
"Finally, we'll slice the data down to the continental United States (the dataset covers Hawaii, Alaska, and several territories)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ce9de40f-c33c-4d54-8963-7ddb15fa185a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"