{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NCAR CESM High Resolution Ocean Simulation\n",
    "\n",
    "In this notebook, we load and analyze data described in the following paper:\n",
    "\n",
    "Small, R. J., et al. (2014), _A new synoptic scale resolving global climate simulation using the Community Earth System Model_, J. Adv. Model. Earth Syst., 6, 1065–1094.\n",
    "<https://doi.org/10.1002/2014MS000363>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import numpy as np\n",
    "import xgcm\n",
    "import gcsfs\n",
    "from matplotlib import pyplot as plt\n",
    "plt.rcParams['figure.figsize'] = (14,8)\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from intake import open_catalog\n",
    "\n",
    "cat = open_catalog(\"https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load High Resolution Ocean Surface Fields\n",
    "\n",
    "Stored on Google Cloud Storage in Zarr format. Accessed using intake."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = cat.ocean.CESM_POP.CESM_POP_hires_control.to_dask()\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Transform POP Model Dimensions\n",
    "\n",
    "The POP models uses the same dimensions (`nlat`, `nlon`) to describe all data variables,\n",
    "despite the fact that these variables live at different points on the grid cell (T point vs. U point).\n",
    "The true positions of the variables are encoded in the attribute `grid_loc`:\n",
    "\n",
    "    grid_loc 1: dimensionality\n",
    "    grid_loc 2: horizontal location in x: 1 = TLONG; 2 = ULONG\n",
    "    grid_loc 3: horizontal location in y: 1 = TLAT; 2 = ULAT\n",
    "    grid_loc 4: vertical grid: 0 = surface, 1 = z_t, 2 = z_w, 3 = z_w_bot, 4 = z_t_150m\n",
    "\n",
    "This choice is not optimal for using xarray and [xgcm](http://xgcm.readthedocs.org/),\n",
    "since it implies that those variables can be directly added / multiplied, etc.\n",
    "So we need to modify the dataset to fix this, by parsing `grid_loc` and introducing new coordinates\n",
    "(`nlat_u`, `nlat_t`, `nlon_u`, `nlon_t`).\n",
    "\n",
    "That's what the code below does. These routines are also archived in a standalone gist:\n",
    "<https://gist.github.com/rabernat/933bc785c99828352f343e48d0da6e22>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dims_from_grid_loc(grid_loc):\n",
    "    grid_loc = str(grid_loc)\n",
    "    ndim = int(grid_loc[0])\n",
    "    x_loc_key = int(grid_loc[1])\n",
    "    y_loc_key = int(grid_loc[2])\n",
    "    z_loc_key = int(grid_loc[3])\n",
    "    \n",
    "    x_loc = {1: 'nlon_t', 2: 'nlon_u'}[x_loc_key]\n",
    "    y_loc = {1: 'nlat_t', 2: 'nlat_u'}[y_loc_key]\n",
    "    z_loc = {0: 'surface', 1: 'z_t', 2: 'z_w', 3: 'z_w_bot', 4: 'z_t_150m'}[z_loc_key]\n",
    "    \n",
    "    if ndim == 3:\n",
    "        return z_loc, y_loc, x_loc\n",
    "    elif ndim == 2:\n",
    "        return y_loc, x_loc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def label_coord_grid_locs(ds):\n",
    "    grid_locs = {'ANGLE': '2220', 'ANGLET': '2110',\n",
    "                 'DXT': '2110', 'DXU': '2220',\n",
    "                 'DYT': '2110', 'DYU': '2220',\n",
    "                 'HT': '2110', 'HU': '2220',\n",
    "                 'HTE': '2210', 'HTN': '2120',\n",
    "                 'HUS': '2210', 'HUW': '2120',\n",
    "                 'KMT': '2110', 'KMU': '2220',\n",
    "                 'REGION_MASK': '2110',\n",
    "                 'TAREA': '2110', 'TLAT': '2110', 'TLONG': '2110',\n",
    "                 'UAREA': '2220', 'ULAT': '2220', 'ULONG': '2220'}\n",
    "    ds_new = ds.copy()\n",
    "    for vname, grid_loc in grid_locs.items():\n",
    "        ds_new[vname].attrs['grid_loc'] = grid_loc\n",
    "    return ds_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create some actual dimension coordinates\n",
    "def add_pop_dims_to_dataset(ds):\n",
    "    ds_new = ds.copy()\n",
    "    ds_new['nlon_u'] = xr.Variable(('nlon_u'), np.arange(len(ds.nlon)) + 1, {'axis': 'X', 'c_grid_axis_shift': 0.5})\n",
    "    ds_new['nlat_u'] = xr.Variable(('nlat_u'), np.arange(len(ds.nlat)) + 1, {'axis': 'Y', 'c_grid_axis_shift': 0.5})\n",
    "    ds_new['nlon_t'] = xr.Variable(('nlon_t'), np.arange(len(ds.nlon)) + 0.5, {'axis': 'X'})\n",
    "    ds_new['nlat_t'] = xr.Variable(('nlat_t'), np.arange(len(ds.nlat)) + 0.5, {'axis': 'Y'})\n",
    "    \n",
    "    # add metadata to z grid\n",
    "    ds_new['z_t'].attrs.update({'axis': 'Z'})\n",
    "    ds_new['z_w'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': -0.5})\n",
    "    ds_new['z_w_top'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': -0.5})\n",
    "    ds_new['z_w_bot'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': 0.5})\n",
    "    \n",
    "    return ds_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def relabel_pop_dims(ds):\n",
    "    ds_new = label_coord_grid_locs(ds)\n",
    "    ds_new = add_pop_dims_to_dataset(ds_new)\n",
    "    for vname in ds_new.variables:\n",
    "        if 'grid_loc' in ds_new[vname].attrs:\n",
    "            da = ds_new[vname]\n",
    "            dims_orig = da.dims\n",
    "            new_spatial_dims = dims_from_grid_loc(da.attrs['grid_loc'])\n",
    "            if dims_orig[0] == 'time':\n",
    "                dims = ('time',) + new_spatial_dims\n",
    "            else:\n",
    "                dims = new_spatial_dims\n",
    "            ds_new[vname] = xr.Variable(dims, da.data, da.attrs, da.encoding, fastpath=True)\n",
    "    return ds_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_new = relabel_pop_dims(ds)\n",
    "ds_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### Create and XGCM Grid object to manipulate the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grid = xgcm.Grid(ds_new)\n",
    "grid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# here we strip the coordinates out of the dataset\n",
    "# this makes the calculation below work better\n",
    "ds_coords = ds_new.coords.to_dataset().reset_coords().drop('time')\n",
    "ds_raw = ds_new.reset_coords(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculate Vorticity\n",
    "\n",
    "We now use xgcm to calculate vorticity using the B-grid discretization described in the POP manual.\n",
    "\n",
    "$$ \\zeta = \\frac{1}{\\Delta_y} \\delta_x \\overline{\\Delta_y u_y}^y -  \n",
    "           \\frac{1}{\\Delta_x} \\delta_y \\overline{\\Delta_x u_x}^x $$\n",
    "           \n",
    "This all happens lazily (no data is actually loaded yet)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "zeta = ( grid.diff(grid.interp(ds_raw.V1_1 * ds_coords.DYU, 'Y'), 'X')\n",
    "       - grid.diff(grid.interp(ds_raw.U1_1 * ds_coords.DXU, 'X'), 'Y') ) / ds_coords.TAREA\n",
    "zeta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: this calculation creates over a million dask chunks\n",
    "len(zeta.data.dask)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot a Vorticity Snapshot\n",
    "\n",
    "First we use the standard matplotlib approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "zeta[0].load().plot(vmax=1e-4, figsize=(15,8))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data is too high-resolution to visualize that way. We are aliasing fine structure\n",
    "\n",
    "### Visualize with Holoviews and Datashader\n",
    "\n",
    "Here we create an interactive browser for the data that dynamically resamples the image resolution as we zoom in and out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import holoviews as hv\n",
    "import datashader\n",
    "from holoviews.operation.datashader import regrid, shade, datashade\n",
    "hv.extension('bokeh', width=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hv_ds = hv.Dataset(zeta.rename('zeta'))\n",
    "im = hv_ds.to(hv.Image, kdims=[\"nlon_t\", \"nlat_t\"], dynamic=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%opts Image [width=800 height=400 colorbar=True] (cmap='RdBu_r')\n",
    "\n",
    "regrid(im)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vorticity Histogram with Dask\n",
    "\n",
    "Let's calculate the PDF of vorticity for many timesteps.\n",
    "\n",
    "First we create a dask cluster"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dask.distributed import Client, progress\n",
    "from dask_gateway import Gateway\n",
    "\n",
    "gateway = Gateway()\n",
    "\n",
    "cluster = gateway.new_cluster()\n",
    "cluster.scale(20)\n",
    "cluster"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(👆don't forget to look at the dashboard!)\n",
    "\n",
    "...and connect to it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "client = Client(cluster)\n",
    "client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because xarray doesn't have out-of-core histogram capability, we operate directly on the lower level dask arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dask.array as dsa"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ntimesteps = 600\n",
    "h, bins = dsa.histogram(zeta[:ntimesteps].fillna(0.).data, bins=101, range=(-2e-3, 2e-3))\n",
    "hc = h.persist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.bar(bins[:-1], np.log10(hc), width=(bins[1]-bins[0]))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}