{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Cloud CMIP6 Public Data: Basic Python Example\n",
    "\n",
    "This notebooks shows how to query the catalog and load the data using python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "import pandas as pd\n",
    "import xarray as xr\n",
    "import fsspec\n",
    "import gcsfs\n",
    "\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina' \n",
    "plt.rcParams['figure.figsize'] = 12, 6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Browse Catalog\n",
    "\n",
    "The data catatalog is stored as a CSV file. Here we read it with Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The columns of the dataframe correspond to the CMI6 controlled vocabulary. A beginners' guide to these terms is available in [this document](https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q). \n",
    "\n",
    "Here we filter the data to find monthly surface air temperature for historical experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ta = df.query(\"activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'\")\n",
    "df_ta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we do further filtering to find just the models from NCAR."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ta_ncar = df_ta.query('institution_id == \"NCAR\"')\n",
    "df_ta_ncar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data\n",
    "\n",
    "Now we will load a single store using fsspec, zarr, and xarray."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the path to a specific zarr store (the first one from the dataframe above)\n",
    "zstore = df_ta_ncar.zstore.values[-1]\n",
    "print(zstore)\n",
    "\n",
    "# create a mutable-mapping-style interface to the store\n",
    "fs = gcsfs.GCSFileSystem(token='anon', session_kwargs={'trust_env': True})\n",
    "mapper = fs.get_mapper(zstore)\n",
    "\n",
    "# open it using xarray and zarr\n",
    "ds = xr.open_zarr(mapper, consolidated=True)\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot a map from a specific date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds.tas.sel(time='1950-01').squeeze().plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a timeseries of global-average surface air temperature. For this we need the area weighting factor for each gridpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_area = df.query(\"variable_id == 'areacella' & source_id == 'CESM2'\")\n",
    "ds_area = xr.open_zarr(fs.get_mapper(df_area.zstore.values[0]), consolidated=True)\n",
    "ds_area"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "total_area = ds_area.areacella.sum(dim=['lon', 'lat'])\n",
    "ta_timeseries = (ds.tas * ds_area.areacella).sum(dim=['lon', 'lat']) / total_area\n",
    "ta_timeseries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the data are loaded lazily, as Dask arrays. Here we trigger computation explicitly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%time ta_timeseries.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ta_timeseries.plot(label='monthly')\n",
    "ta_timeseries.rolling(time=12).mean().plot(label='12 month rolling mean')\n",
    "plt.legend()\n",
    "plt.title('Global Mean Surface Air Temperature')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}