{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Enhanced Intake-ESM Catalog Demo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "This notebook compares an [Intake-ESM](https://intake-esm.readthedocs.io/en/stable/) catalog with an enhanced version that includes additional attributes. Both catalogs inventory the [NCAR Community Earth System Model (CESM) Large Ensemble (LENS) data hosted on AWS S3](https://doi.org/10.26024/wt24-5j82)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| [Intro to Pandas](https://foundations.projectpythia.org/core/pandas/pandas.html) | Necessary | |\n", "\n", "- **Time to learn**: 10 minutes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import intake\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Original Intake-ESM Catalog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At import time, the `intake-esm` plugin is available in `intake`’s registry as `esm_datastore` and can be accessed with the `intake.open_esm_datastore()` function. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cat_url_orig = 'https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json'\n", "coll_orig = intake.open_esm_datastore(cat_url_orig)\n", "coll_orig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a summary representation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(coll_orig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In an Intake-ESM catalog object, the `esmcat` attribute provides many useful attributes and functions. For example, we can get the collection's description:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coll_orig.esmcat.description" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also get the URL pointing to the catalog's underlying tabular representation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coll_orig.esmcat.catalog_file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's a CSV file, so let's take a peek." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_orig = pd.read_csv(coll_orig.esmcat.catalog_file)\n", "df_orig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, we can save a step, since an ESM catalog object provides a `df` property that also returns a DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_orig = coll_orig.df\n", "df_orig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print a sorted list of the unique values in selected columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for col in ['component', 'frequency', 'experiment', 'variable']:\n", "    # Sorted unique values found in this catalog column\n", "    unique_vals = sorted(coll_orig.unique()[col])\n", "    count = len(unique_vals)\n", "    print(col + ':', unique_vals, 'count:', count, '\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you know what a variable name means, you can find what data are available for that variable. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = coll_orig.search(variable='FLNS').df\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can narrow the filter to a specific frequency and experiment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = coll_orig.search(variable='FLNS', frequency='daily', experiment='RCP85').df\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
`-enhanced` appended to `aws-cesm1-le`\n", "\n", "**Warning**\n", "\n", "The `long_name`s are not CF Standard Names, but rather are those documented at \n",
"the NCAR LENS website. For interoperability, the `long_name` column should be replaced by a `cf_name` column and possibly an `attribute` column to disambiguate if needed.\n",
"