{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Enhanced Intake-ESM Catalog Demo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"This notebook compares an [Intake-ESM](https://intake-esm.readthedocs.io/en/stable/) catalog with an enhanced version that includes additional attributes, such as variable long names. Both catalogs inventory the [NCAR Community Earth System Model (CESM) Large Ensemble (LENS) data hosted on AWS S3](https://doi.org/10.26024/wt24-5j82)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"| Concepts | Importance | Notes |\n",
"| --- | --- | --- |\n",
"| [Intro to Pandas](https://foundations.projectpythia.org/core/pandas/pandas) | Necessary | |\n",
"\n",
"- **Time to learn**: 10 minutes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import intake\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Original Intake-ESM Catalog"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At import time, the `intake-esm` plugin is available in `intake`’s registry as `esm_datastore` and can be accessed with the `intake.open_esm_datastore()` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cat_url_orig = 'https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json'\n",
"coll_orig = intake.open_esm_datastore(cat_url_orig)\n",
"coll_orig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a summary representation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(coll_orig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In an Intake-ESM catalog object, the `esmcat` attribute holds the parsed catalog specification and provides many useful attributes and methods. For example, we can get the collection's description:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coll_orig.esmcat.description"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also get the URL pointing to the catalog's underlying tabular representation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coll_orig.esmcat.catalog_file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a CSV file ... let's take a peek."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_orig = pd.read_csv(coll_orig.esmcat.catalog_file)\n",
"df_orig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, we can skip that step, since an ESM catalog object exposes a `df` attribute that returns the catalog table as a pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_orig = coll_orig.df\n",
"df_orig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print a sorted list of the unique values in selected columns, along with how many there are:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compute the unique values once; indexing by column name gives that column's values\n",
"uniques = coll_orig.unique()\n",
"for col in ['component', 'frequency', 'experiment', 'variable']:\n",
"    unique_vals = sorted(uniques[col])\n",
"    print(f'{col}: {unique_vals}  count: {len(unique_vals)}\\n')"
]
},
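{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `coll_orig.df` is an ordinary pandas DataFrame, a similar summary can also be built with plain pandas. The sketch below assumes every column holds scalar strings (a catalog with list-valued columns would need extra handling). Its rows are hypothetical values that only mimic the catalog's column names; they are not real LENS entries. Swapping `df_demo` for `coll_orig.df` should give the same kind of summary for the real catalog."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical rows that mimic a few catalog columns (not real LENS entries)\n",
"df_demo = pd.DataFrame({\n",
"    'component': ['atm', 'atm', 'ocn'],\n",
"    'frequency': ['daily', 'monthly', 'monthly'],\n",
"    'experiment': ['RCP85', '20C', 'RCP85'],\n",
"    'variable': ['FLNS', 'FLNS', 'SST'],\n",
"})\n",
"\n",
"# Sorted unique values per column, plus how many there are\n",
"summary = {col: sorted(df_demo[col].unique()) for col in df_demo.columns}\n",
"for col, vals in summary.items():\n",
"    print(f'{col}: {vals}  count: {len(vals)}')"
]
},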
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finding Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you already know a variable's short name, you can find which data are available for it. For example, for `FLNS` (net longwave flux at the surface):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = coll_orig.search(variable='FLNS').df\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can narrow the search further to a specific frequency and experiment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = coll_orig.search(variable='FLNS', frequency='daily', experiment='RCP85').df\n",
"df"
]
},
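{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each `search()` call returns a new catalog whose `df` holds only the matching rows. For exact string matches, this filtering resembles combining boolean masks in pandas. The function `exact_search` below is a hypothetical helper, not part of intake-esm, and it handles only exact equality, whereas intake-esm's `search()` also accepts lists of values and regular expressions. The rows are made-up examples that mimic the catalog's columns, including a hypothetical `path` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical catalog-like rows (not real LENS entries)\n",
"df_demo = pd.DataFrame({\n",
"    'variable': ['FLNS', 'FLNS', 'FLNS', 'SST'],\n",
"    'frequency': ['daily', 'monthly', 'daily', 'daily'],\n",
"    'experiment': ['RCP85', 'RCP85', '20C', 'RCP85'],\n",
"    'path': ['a.zarr', 'b.zarr', 'c.zarr', 'd.zarr'],\n",
"})\n",
"\n",
"def exact_search(df, **query):\n",
"    # Hypothetical helper: keep rows where every queried column equals its value\n",
"    mask = pd.Series(True, index=df.index)\n",
"    for col, value in query.items():\n",
"        mask &= df[col] == value\n",
"    return df[mask]\n",
"\n",
"result = exact_search(df_demo, variable='FLNS', frequency='daily', experiment='RCP85')\n",
"result"
]
},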
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The enhanced catalog's URL is the original URL with `-enhanced` appended to `aws-cesm1-le`.\n",
"\n",
"**Warning:** The `long_name` values are not CF Standard Names, but rather are those documented at \n",
"the NCAR LENS website. For interoperability, the `long_name` column should be replaced by a `cf_name` column, plus possibly an attribute column to disambiguate if needed.\n",
"