{ "cells": [ { "cell_type": "markdown", "id": "3dbac58d-0638-4e3a-a594-efca35d34a7e", "metadata": {}, "source": [ "## Creating a cross-model ensemble using STAC" ] }, { "cell_type": "markdown", "id": "e0df9241-0c9d-4739-90f5-35b3f3b94092", "metadata": {}, "source": [ "This tutorial builds a cross-collection ensemble of GDPCIR bias-corrected and downscaled data, and plots a single-variable time series for the ensemble." ] }, { "cell_type": "code", "execution_count": 1, "id": "f6c291ba-5b61-41d9-bde9-3e49afddebaf", "metadata": {}, "outputs": [], "source": [ "# required to locate and authenticate with the stac collection\n", "import planetary_computer\n", "import pystac_client\n", "\n", "# required to load a zarr array using xarray\n", "import xarray as xr\n", "\n", "# optional imports used in this notebook\n", "import pandas as pd\n", "from dask.diagnostics import ProgressBar\n", "from tqdm.auto import tqdm" ] }, { "cell_type": "markdown", "id": "815bee61-c752-465b-ace8-48d7a6bff367", "metadata": {}, "source": [ "### Understanding the GDPCIR collections\n", "\n", "The [CIL-GDPCIR datasets](https://planetarycomputer.microsoft.com/dataset/group/cil-gdpcir) are grouped into two collections, depending on the license the data are provided under.\n", "\n", "- [CIL-GDPCIR-CC0](https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc0) - provided in the public domain under a [CC0 1.0 Universal Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/)\n", "- [CIL-GDPCIR-CC-BY](https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc-by) - provided under a [CC Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/)\n", "\n", "Note that the first group, CC0, places no restrictions on the data. CC-BY 4.0 requires citations of the climate models these datasets are derived from. 
See the [ClimateImpactLab/downscaleCMIP6 README](https://github.com/ClimateImpactLab/downscaleCMIP6) for the citation information for each GCM.\n", "\n", "Also, note that none of the descriptions of these licenses on this page, in this repository, or associated with this repository constitute legal advice. We are highlighting some of the key terms of these licenses, but this information should not be considered a replacement for the actual license terms, which are provided on the Creative Commons website at the links above.\n", "\n", "### Structure of the STAC collection\n", "\n", "The data assets in this collection are a set of [Zarr](https://zarr.readthedocs.io/) groups which can be opened by tools like [xarray](https://xarray.pydata.org/). Each Zarr group contains a single data variable (either `pr`, `tasmax`, or `tasmin`). The Planetary Computer provides a single STAC item per experiment, and each STAC item has one asset per data variable.\n", "\n", "Altogether, the collection is just over 21TB, with 247,997 individual files. The STAC collection is here to help search and make sense of this huge archive!\n", "\n", "For example, let's take a look at the CC0 collection:" ] }, { "cell_type": "code", "execution_count": 2, "id": "3dd5d503-3e05-4248-97bf-137cd3e61448", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
<xarray.Dataset> Size: 3TB\n", "Dimensions: (lat: 720, lon: 1440, model: 20, time: 31390)\n", "Coordinates:\n", " * lat (lat) float64 6kB -89.88 -89.62 -89.38 -89.12 ... 89.38 89.62 89.88\n", " * lon (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.4 179.6 179.9\n", " * time (time) object 251kB 2015-01-01 12:00:00 ... 2100-12-31 12:00:00\n", " * model (model) object 160B 'GFDL-ESM4' 'NorESM2-MM' ... 'BCC-CSM2-MR'\n", "Data variables:\n", " tasmax (model, time, lat, lon) float32 3TB dask.array<chunksize=(1, 365, 360, 360), meta=np.ndarray>\n", "Attributes: (12/25)\n", " contact: climatesci@rhg.com\n", " dc6_bias_correction_method: Quantile Delta Method (QDM)\n", " dc6_citation: Please refer to https://github.com/ClimateI...\n", " dc6_data_version: v20211231\n", " dc6_dataset_name: Rhodium Group/Climate Impact Lab Global Dow...\n", " dc6_description: The prefix dc6 is the project-specific abbr...\n", " ... ...\n", " realization_index: 1\n", " realm: atmos\n", " sub_experiment: none\n", " sub_experiment_id: none\n", " table_id: day\n", " variable_id: tasmax
<xarray.DataArray 'tasmax' (model: 20, time: 5, lat: 58, lon: 64)> Size: 1MB\n", "array([[[[285.73315, 285.4604 , 285.3989 , ..., 288.46887, 288.45538,\n", " 288.3411 ],\n", " [285.4088 , 284.85898, 284.64523, ..., 288.25568, 288.40067,\n", " 288.26636],\n", " [285.042 , 284.65338, 284.3192 , ..., 288.09857, 288.33298,\n", " 288.17307],\n", " ...,\n", " [255.6336 , 255.90742, 255.76608, ..., 270.9189 , 271.3525 ,\n", " 271.49866],\n", " [255.53918, 256.2673 , 257.10635, ..., 271.33408, 270.98483,\n", " 271.3151 ],\n", " [254.87746, 255.66376, 256.50848, ..., 271.04498, 271.05704,\n", " 271.30338]],\n", "\n", " [[285.91162, 285.58484, 286.10092, ..., 292.35254, 292.3309 ,\n", " 292.34427],\n", " [285.565 , 284.94022, 285.75223, ..., 292.1405 , 292.2511 ,\n", " 292.22943],\n", " [285.25647, 284.79587, 285.2175 , ..., 291.90317, 292.11487,\n", " 292.16623],\n", "...\n", " [263.99496, 263.98923, 265.9909 , ..., 272.94186, 270.91608,\n", " 271.09778],\n", " [263.7273 , 264.36838, 267.27963, ..., 271.01 , 270.0902 ,\n", " 270.34082],\n", " [262.84192, 263.58743, 266.7099 , ..., 270.6768 , 270.26273,\n", " 270.4232 ]],\n", "\n", " [[289.9362 , 289.80225, 288.88925, ..., 291.64294, 291.30453,\n", " 291.20703],\n", " [289.503 , 289.25168, 288.7057 , ..., 291.51035, 291.37308,\n", " 291.30908],\n", " [289.054 , 288.8851 , 287.89902, ..., 291.3675 , 291.34402,\n", " 291.34195],\n", " ...,\n", " [265.43088, 265.4783 , 268.89062, ..., 270.09174, 269.90125,\n", " 270.14105],\n", " [264.06845, 264.98584, 268.8448 , ..., 270.29214, 269.33145,\n", " 269.47906],\n", " [263.4161 , 264.3746 , 268.24796, ..., 270.58902, 269.15533,\n", " 269.333 ]]]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float64 464B 31.12 31.38 31.62 31.88 ... 44.88 45.12 45.38\n", " * lon (lon) float64 512B 129.6 129.9 130.1 130.4 ... 144.9 145.1 145.4\n", " * time (time) object 40B 2020-01-01 12:00:00 ... 2020-01-05 12:00:00\n", " * model (model) object 160B 'GFDL-ESM4' 'NorESM2-MM' ... 
'BCC-CSM2-MR'\n", "Attributes:\n", " cell_measures: area: areacella\n", " interp_method: conserve_order2\n", " long_name: Daily Maximum Near-Surface Air Temperature\n", " standard_name: air_temperature\n", " units: K\n", " comment: maximum near-surface (usually, 2 meter) air temperature (...