{
"cells": [
{
"cell_type": "markdown",
"id": "b5ad9932-76e1-4a4a-bb34-afebaf20ccd8",
"metadata": {},
"source": [
"## Accessing Forest Inventory and Analysis data with the Planetary Computer STAC API\n",
"\n",
"This notebook demonstrates accessing [Forest Inventory and Analysis](https://planetarycomputer.microsoft.com/dataset/fia) (FIA) data from the Planetary Computer STAC API.\n",
"\n",
"The Forest Inventory and Analysis collection contains many tables, and each table corresponds to one STAC item in the [FIA collection](http://planetarycomputer.microsoft.com/api/stac/v1/collections/fia). In this example, we'll use a few of the tables to estimate the total amount of aboveground carbon, in tons, per US county.\n",
"\n",
"This example builds on the [plot estimation](https://rfia.netlify.app/courses/plt_est/) example from the [rfia](https://rfia.netlify.app/) package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b11609a3-e138-47c9-bcc7-7bae11405bca",
"metadata": {},
"outputs": [],
"source": [
"from cartopy import crs as ccrs\n",
"import dask_gateway\n",
"import dask_geopandas\n",
"import dask.dataframe as dd\n",
"import geopandas\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"import pystac_client\n",
"import planetary_computer"
]
},
{
"cell_type": "markdown",
"id": "be0c70f3-1124-44c5-96bc-d22107062384",
"metadata": {},
"source": [
"The `tree` table below is relatively large, so we'll process it in parallel on a Dask cluster. The example will still run without a cluster; it will just take longer."
]
},
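{
"cell_type": "code",
"execution_count": null,
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000001",
"metadata": {},
"outputs": [],
"source": [
"# Connect to a Dask cluster. This is the typical dask-gateway pattern on the\n",
"# Planetary Computer Hub; the worker count here is illustrative.\n",
"cluster = dask_gateway.GatewayCluster()\n",
"client = cluster.get_client()\n",
"cluster.scale(16)\n",
"cluster"
]
},
{
"cell_type": "markdown",
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000002",
"metadata": {},
"source": [
"Next, we'll open the Planetary Computer STAC catalog and read the `tree` and `plot` tables from the FIA collection as Dask dataframes. The column selections below are a sketch, inferred from the columns used later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0a1b2c3-1111-4a2b-9c3d-000000000003",
"metadata": {},
"outputs": [],
"source": [
"catalog = pystac_client.Client.open(\n",
"    \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n",
"    modifier=planetary_computer.sign_inplace,\n",
")\n",
"fia = catalog.get_collection(\"fia\")\n",
"\n",
"tree_asset = fia.get_item(\"tree\").assets[\"data\"]\n",
"tree = dd.read_parquet(\n",
"    tree_asset.href,\n",
"    storage_options=tree_asset.extra_fields[\"table:storage_options\"],\n",
"    columns=[\"PLT_CN\", \"CONDID\", \"TREE\", \"DRYBIO_AG\", \"CARBON_AG\", \"TPA_UNADJ\"],\n",
")\n",
"\n",
"plot_asset = fia.get_item(\"plot\").assets[\"data\"]\n",
"plot = dd.read_parquet(\n",
"    plot_asset.href,\n",
"    storage_options=plot_asset.extra_fields[\"table:storage_options\"],\n",
"    columns=[\"CN\", \"STATECD\", \"COUNTYCD\"],\n",
")"
]
},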
{
"cell_type": "code",
"execution_count": 6,
"id": "3e8cca5f-d068-4ea8-baff-45dcb485ed53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Dask DataFrame Structure:\n",
"                PLT_CN CONDID  TREE DRYBIO_AG CARBON_AG TPA_UNADJ    CN STATECD COUNTYCD     bio  carbon\n",
"npartitions=160\n",
"                 int64  int64 int64   float64   float64   float64 int64   int64    int64 float64 float64\n",
"                   ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"...                ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"                   ...    ...   ...       ...       ...       ...   ...     ...      ...     ...     ...\n",
"Dask Name: assign, 1921 tasks"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = tree.merge(plot.assign(PLT_CN=plot.CN).compute(), on=\"PLT_CN\").assign(\n",
" bio=lambda df: df.DRYBIO_AG * df.TPA_UNADJ / 2000,\n",
" carbon=lambda df: df.CARBON_AG * df.TPA_UNADJ / 2000,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "efb42525-1800-4612-9c7a-1480c5da3ce0",
"metadata": {},
"source": [
"### Compute per-county summaries\n",
"\n",
"The `df` dataframe now includes the state and county FIPS codes, along with the (adjusted) aboveground carbon and biomass. We'll group by the state and county codes and sum the aboveground carbon and biomass."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "423396fb-c05f-43db-bd1e-df0762a363f2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 259 ms, sys: 36.5 ms, total: 296 ms\n",
"Wall time: 35.2 s\n"
]
},
{
"data": {
"text/plain": [
"            bio        carbon STATE COUNTY\n",
"0   9130.179946   4565.090000    01    001\n",
"1  27163.826785  13565.771340    01    003\n",
"2  15567.843806   7774.593220    01    005\n",
"3  16643.533828   8321.766954    01    007\n",
"4   8742.498366   4364.876867    01    009"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"result = (\n",
" df.groupby([\"STATECD\", \"COUNTYCD\"])[[\"bio\", \"carbon\"]]\n",
" .sum()\n",
" .compute()\n",
" .reset_index()\n",
" .assign(\n",
" STATE=lambda df: df[\"STATECD\"].astype(\"string\").str.pad(2, fillchar=\"0\"),\n",
" COUNTY=lambda df: df[\"COUNTYCD\"].astype(\"string\").str.pad(3, fillchar=\"0\"),\n",
" )\n",
" .drop(columns=[\"STATECD\", \"COUNTYCD\"])\n",
")\n",
"result.head()"
]
},
{
"cell_type": "markdown",
"id": "5652d76a-fc0e-4de2-86b2-63c884471d39",
"metadata": {},
"source": [
"### Plot the results\n",
"\n",
"Now we'll make a choropleth of the results. We just need to join in the actual geographic boundaries of the counties, which we can read with `dask-geopandas`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aaa3d6e1-7731-4446-906e-5e8d2c63c5af",
"metadata": {},
"outputs": [],
"source": [
"census_item = catalog.get_collection(\"us-census\").get_item(\n",
" \"2020-cb_2020_us_county_500k\"\n",
")\n",
"\n",
"counties = (\n",
" dask_geopandas.read_parquet(\n",
" census_item.assets[\"data\"].href,\n",
" storage_options=census_item.assets[\"data\"].extra_fields[\n",
" \"table:storage_options\"\n",
" ],\n",
" columns=[\"STATEFP\", \"COUNTYFP\", \"geometry\"],\n",
" ).rename(columns={\"STATEFP\": \"STATE\", \"COUNTYFP\": \"COUNTY\"})\n",
").compute()"
]
},
{
"cell_type": "markdown",
"id": "4cad86c0-c87b-46f2-be90-c284935467b3",
"metadata": {},
"source": [
"Finally, we'll slice the data down to the continental United States (the dataset covers Hawaii, Alaska, and several territories)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ce9de40f-c33c-4d54-8963-7ddb15fa185a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"