{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Load CMIP6 Data with Intake ESM\n", "\n", "[Intake ESM](https://intake-esm.readthedocs.io/en/latest/) is an experimental new package that aims to provide a higher-level interface to searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions [on GitHub](https://github.com/NCAR/intake-esm/issues)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2020-10-08T16:04:36.895969Z", "iopub.status.busy": "2020-10-08T16:04:36.894813Z", "iopub.status.idle": "2020-10-08T16:04:39.088783Z", "shell.execute_reply": "2020-10-08T16:04:39.087985Z" } }, "outputs": [], "source": [ "import xarray as xr\n", "xr.set_options(display_style='html')\n", "import intake\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intake ESM works by parsing an [ESM Collection Spec](https://github.com/NCAR/esm-collection-spec/) and converting it to an [intake catalog](https://intake.readthedocs.io/en/latest). The collection spec is stored in a JSON file. Here we open it with `intake.open_esm_datastore`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2020-10-08T16:04:39.096608Z", "iopub.status.busy": "2020-10-08T16:04:39.095524Z", "iopub.status.idle": "2020-10-08T16:04:45.399408Z", "shell.execute_reply": "2020-10-08T16:04:45.399947Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3417: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False.\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] }, { "data": { "text/html": [ "
<p>pangeo-cmip6 catalog with 4749 dataset(s) from 294376 asset(s):</p>\n",
"<table>\n",
"<thead><tr><th></th><th>unique</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>activity_id</th><td>15</td></tr>\n",
"<tr><th>institution_id</th><td>34</td></tr>\n",
"<tr><th>source_id</th><td>79</td></tr>\n",
"<tr><th>experiment_id</th><td>107</td></tr>\n",
"<tr><th>member_id</th><td>213</td></tr>\n",
"<tr><th>table_id</th><td>30</td></tr>\n",
"<tr><th>variable_id</th><td>392</td></tr>\n",
"<tr><th>grid_label</th><td>10</td></tr>\n",
"<tr><th>zstore</th><td>294376</td></tr>\n",
"<tr><th>dcpp_init_year</th><td>60</td></tr>\n",
"<tr><th>version</th><td>529</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cat_url = \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n", "col = intake.open_esm_datastore(cat_url)\n", "col" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now query the collection with its `search` method, filtering on column values (here `experiment_id`, `table_id`, `variable_id`, and `grid_label`). The result is a new catalog containing only the matching entries, which we can inspect as a pandas DataFrame through its `df` attribute." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2020-10-08T16:04:45.405359Z", "iopub.status.busy": "2020-10-08T16:04:45.404610Z", "iopub.status.idle": "2020-10-08T16:04:45.527844Z", "shell.execute_reply": "2020-10-08T16:04:45.527090Z" } }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>activity_id</th><th>institution_id</th><th>source_id</th><th>experiment_id</th><th>member_id</th><th>table_id</th><th>variable_id</th><th>grid_label</th><th>zstore</th><th>dcpp_init_year</th><th>version</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r1i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td><td>20190429</td></tr>\n",
"<tr><th>1</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r2i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td><td>20190429</td></tr>\n",
"<tr><th>2</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r3i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td><td>20190429</td></tr>\n",
"<tr><th>3</th><td>CMIP</td><td>CCCma</td><td>CanESM5</td><td>historical</td><td>r10i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...</td><td>NaN</td><td>20190429</td></tr>\n",
"<tr><th>4</th><td>CMIP</td><td>CCCma</td><td>CanESM5</td><td>historical</td><td>r10i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...</td><td>NaN</td><td>20190429</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>133</th><td>ScenarioMIP</td><td>IPSL</td><td>IPSL-CM6A-LR</td><td>ssp585</td><td>r4i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58...</td><td>NaN</td><td>20191122</td></tr>\n",
"<tr><th>134</th><td>ScenarioMIP</td><td>IPSL</td><td>IPSL-CM6A-LR</td><td>ssp585</td><td>r6i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58...</td><td>NaN</td><td>20191121</td></tr>\n",
"<tr><th>135</th><td>ScenarioMIP</td><td>MIROC</td><td>MIROC-ES2L</td><td>ssp585</td><td>r1i1p1f2</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MIROC/MIROC-ES2L/ssp585...</td><td>NaN</td><td>20190823</td></tr>\n",
"<tr><th>136</th><td>ScenarioMIP</td><td>MPI-M</td><td>MPI-ESM1-2-LR</td><td>ssp585</td><td>r10i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...</td><td>NaN</td><td>20190710</td></tr>\n",
"<tr><th>137</th><td>ScenarioMIP</td><td>MPI-M</td><td>MPI-ESM1-2-LR</td><td>ssp585</td><td>r1i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...</td><td>NaN</td><td>20190710</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>138 rows × 11 columns</p>
" ], "text/plain": [ " activity_id institution_id source_id experiment_id member_id \\\n", "0 CMIP CCCma CanESM5-CanOE historical r1i1p2f1 \n", "1 CMIP CCCma CanESM5-CanOE historical r2i1p2f1 \n", "2 CMIP CCCma CanESM5-CanOE historical r3i1p2f1 \n", "3 CMIP CCCma CanESM5 historical r10i1p1f1 \n", "4 CMIP CCCma CanESM5 historical r10i1p2f1 \n", ".. ... ... ... ... ... \n", "133 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r4i1p1f1 \n", "134 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r6i1p1f1 \n", "135 ScenarioMIP MIROC MIROC-ES2L ssp585 r1i1p1f2 \n", "136 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r10i1p1f1 \n", "137 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r1i1p1f1 \n", "\n", " table_id variable_id grid_label \\\n", "0 Oyr o2 gn \n", "1 Oyr o2 gn \n", "2 Oyr o2 gn \n", "3 Oyr o2 gn \n", "4 Oyr o2 gn \n", ".. ... ... ... \n", "133 Oyr o2 gn \n", "134 Oyr o2 gn \n", "135 Oyr o2 gn \n", "136 Oyr o2 gn \n", "137 Oyr o2 gn \n", "\n", " zstore dcpp_init_year \\\n", "0 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "1 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "2 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "3 gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN \n", "4 gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN \n", ".. ... ... \n", "133 gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN \n", "134 gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN \n", "135 gs://cmip6/ScenarioMIP/MIROC/MIROC-ES2L/ssp585... NaN \n", "136 gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN \n", "137 gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN \n", "\n", " version \n", "0 20190429 \n", "1 20190429 \n", "2 20190429 \n", "3 20190429 \n", "4 20190429 \n", ".. ... 
\n", "133 20191122 \n", "134 20191121 \n", "135 20190823 \n", "136 20190710 \n", "137 20190710 \n", "\n", "[138 rows x 11 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',\n", " grid_label='gn')\n", "cat.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intake can open the datasets automatically with xarray. Intake ESM also contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated xarray datasets; for example, the ensemble members of each model and experiment are combined along a new `member_id` dimension." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2020-10-08T16:04:45.622809Z", "iopub.status.busy": "2020-10-08T16:04:45.546399Z", "iopub.status.idle": "2020-10-08T16:04:52.516190Z", "shell.execute_reply": "2020-10-08T16:04:52.516740Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--> The keys in the returned dictionary of datasets are constructed as follows:\n", "\t'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'\n" ] }, { "data": { "text/html": [ "\n", "
\n", " \n", " \n", " 100.00% [18/18 00:05<00:00]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',\n", " 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',\n", " 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',\n", " 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',\n", " 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',\n", " 'ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.Oyr.gn',\n", " 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',\n", " 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',\n", " 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',\n", " 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',\n", " 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',\n", " 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',\n", " 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',\n", " 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',\n", " 'CMIP.CCCma.CanESM5.historical.Oyr.gn',\n", " 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',\n", " 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',\n", " 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})\n", "list(dset_dict.keys())" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2020-10-08T16:04:52.536255Z", "iopub.status.busy": "2020-10-08T16:04:52.535542Z", "iopub.status.idle": "2020-10-08T16:04:52.573472Z", "shell.execute_reply": "2020-10-08T16:04:52.574157Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset>\n",
       "Dimensions:             (bnds: 2, i: 360, j: 291, lev: 45, member_id: 35, time: 165, vertices: 4)\n",
       "Coordinates:\n",
       "  * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359\n",
       "  * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290\n",
       "    latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>\n",
       "  * lev                 (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03\n",
       "    lev_bnds            (lev, bnds) float64 dask.array<chunksize=(45, 2), meta=np.ndarray>\n",
       "    longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>\n",
       "  * time                (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:0...\n",
       "    time_bnds           (time, bnds) object dask.array<chunksize=(165, 2), meta=np.ndarray>\n",
       "  * member_id           (member_id) <U9 'r10i1p1f1' 'r10i1p2f1' ... 'r9i1p2f1'\n",
       "Dimensions without coordinates: bnds, vertices\n",
       "Data variables:\n",
       "    o2                  (member_id, time, lev, j, i) float32 dask.array<chunksize=(1, 12, 45, 291, 360), meta=np.ndarray>\n",
       "    vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>\n",
       "    vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>\n",
       "Attributes:\n",
       "    variant_label:               r9i1p2f1\n",
       "    branch_method:               Spin-up documentation\n",
       "    source:                      CanESM5 (2019): \\naerosol: interactive\\natmo...\n",
       "    sub_experiment_id:           none\n",
       "    cmor_version:                3.4.0\n",
       "    institution_id:              CCCma\n",
       "    experiment:                  all-forcing simulation of the recent past\n",
       "    mip_era:                     CMIP6\n",
       "    parent_source_id:            CanESM5\n",
       "    parent_activity_id:          CMIP\n",
       "    nominal_resolution:          100 km\n",
       "    parent_time_units:           days since 1850-01-01 0:0:0.0\n",
       "    source_type:                 AOGCM\n",
       "    branch_time_in_child:        0.0\n",
       "    activity_id:                 CMIP\n",
       "    grid_label:                  gn\n",
       "    experiment_id:               historical\n",
       "    grid:                        ORCA1 tripolar grid, 1 deg with refinement t...\n",
       "    forcing_index:               1\n",
       "    CCCma_model_hash:            Unknown\n",
       "    source_id:                   CanESM5\n",
       "    YMDH_branch_time_in_child:   1850:01:01:00\n",
       "    external_variables:          areacello volcello\n",
       "    references:                  Geophysical Model Development Special issue ...\n",
       "    CCCma_parent_runid:          p2-pictrl\n",
       "    realm:                       ocnBgchem\n",
       "    product:                     model-output\n",
       "    institution:                 Canadian Centre for Climate Modelling and An...\n",
       "    table_id:                    Oyr\n",
       "    realization_index:           9\n",
       "    YMDH_branch_time_in_parent:  5950:01:01:00\n",
       "    frequency:                   yr\n",
       "    creation_date:               2019-05-30T08:58:45Z\n",
       "    title:                       CanESM5 output prepared for CMIP6\n",
       "    Conventions:                 CF-1.7 CMIP-6.2\n",
       "    status:                      2019-10-25;created;by nhn2@columbia.edu\n",
       "    CCCma_runid:                 p2-his09\n",
       "    parent_mip_era:              CMIP6\n",
       "    data_specs_version:          01.00.29\n",
       "    parent_experiment_id:        piControl\n",
       "    version:                     v20190429\n",
       "    license:                     CMIP6 model data produced by The Government ...\n",
       "    variable_id:                 o2\n",
       "    further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...\n",
       "    history:                     2019-05-02T13:53:53Z ;rewrote data to be con...\n",
       "    sub_experiment:              none\n",
       "    tracking_id:                 hdl:21.14100/41426118-701c-482b-ae16-82932e4...\n",
       "    contact:                     ec.cccma.info-info.ccmac.ec@canada.ca\n",
       "    branch_time_in_parent:       1496500.0\n",
       "    initialization_index:        1\n",
       "    intake_esm_varname:          ['o2']\n",
       "    table_info:                  Creation Date:(20 February 2019) MD5:374fbe5...\n",
       "    intake_esm_dataset_key:      CMIP.CCCma.CanESM5.historical.Oyr.gn
" ], "text/plain": [ "\n", "Dimensions: (bnds: 2, i: 360, j: 291, lev: 45, member_id: 35, time: 165, vertices: 4)\n", "Coordinates:\n", " * i (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359\n", " * j (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290\n", " latitude (j, i) float64 dask.array\n", " * lev (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03\n", " lev_bnds (lev, bnds) float64 dask.array\n", " longitude (j, i) float64 dask.array\n", " * time (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:0...\n", " time_bnds (time, bnds) object dask.array\n", " * member_id (member_id) \n", " vertices_latitude (j, i, vertices) float64 dask.array\n", " vertices_longitude (j, i, vertices) float64 dask.array\n", "Attributes:\n", " variant_label: r9i1p2f1\n", " branch_method: Spin-up documentation\n", " source: CanESM5 (2019): \\naerosol: interactive\\natmo...\n", " sub_experiment_id: none\n", " cmor_version: 3.4.0\n", " institution_id: CCCma\n", " experiment: all-forcing simulation of the recent past\n", " mip_era: CMIP6\n", " parent_source_id: CanESM5\n", " parent_activity_id: CMIP\n", " nominal_resolution: 100 km\n", " parent_time_units: days since 1850-01-01 0:0:0.0\n", " source_type: AOGCM\n", " branch_time_in_child: 0.0\n", " activity_id: CMIP\n", " grid_label: gn\n", " experiment_id: historical\n", " grid: ORCA1 tripolar grid, 1 deg with refinement t...\n", " forcing_index: 1\n", " CCCma_model_hash: Unknown\n", " source_id: CanESM5\n", " YMDH_branch_time_in_child: 1850:01:01:00\n", " external_variables: areacello volcello\n", " references: Geophysical Model Development Special issue ...\n", " CCCma_parent_runid: p2-pictrl\n", " realm: ocnBgchem\n", " product: model-output\n", " institution: Canadian Centre for Climate Modelling and An...\n", " table_id: Oyr\n", " realization_index: 9\n", " YMDH_branch_time_in_parent: 5950:01:01:00\n", " frequency: yr\n", " creation_date: 2019-05-30T08:58:45Z\n", " title: CanESM5 output prepared for CMIP6\n", " 
Conventions: CF-1.7 CMIP-6.2\n", " status: 2019-10-25;created;by nhn2@columbia.edu\n", " CCCma_runid: p2-his09\n", " parent_mip_era: CMIP6\n", " data_specs_version: 01.00.29\n", " parent_experiment_id: piControl\n", " version: v20190429\n", " license: CMIP6 model data produced by The Government ...\n", " variable_id: o2\n", " further_info_url: https://furtherinfo.es-doc.org/CMIP6.CCCma.C...\n", " history: 2019-05-02T13:53:53Z ;rewrote data to be con...\n", " sub_experiment: none\n", " tracking_id: hdl:21.14100/41426118-701c-482b-ae16-82932e4...\n", " contact: ec.cccma.info-info.ccmac.ec@canada.ca\n", " branch_time_in_parent: 1496500.0\n", " initialization_index: 1\n", " intake_esm_varname: ['o2']\n", " table_info: Creation Date:(20 February 2019) MD5:374fbe5...\n", " intake_esm_dataset_key: CMIP.CCCma.CanESM5.historical.Oyr.gn" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']\n", "ds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }