{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NCAR CESM High Resolution Ocean Simulation\n", "\n", "In this notebook, we load and analyze data described in the following paper:\n", "\n", "Small, R. J., et al. (2014), _A new synoptic scale resolving global climate simulation using the Community Earth System Model_, J. Adv. Model. Earth Syst., 6, 1065โ€“1094.\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import numpy as np\n", "import xgcm\n", "import gcsfs\n", "from matplotlib import pyplot as plt\n", "plt.rcParams['figure.figsize'] = (14,8)\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from intake import open_catalog\n", "\n", "cat = open_catalog(\"https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load High Resolution Ocean Surface Fields\n", "\n", "Stored on Google Cloud Storage in Zarr format. Accessed using intake." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = cat.ocean.CESM_POP.CESM_POP_hires_control.to_dask()\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform POP Model Dimensions\n", "\n", "The POP models uses the same dimensions (`nlat`, `nlon`) to describe all data variables,\n", "despite the fact that these variables live at different points on the grid cell (T point vs. U point).\n", "The true positions of the variables are encoded in the attribute `grid_loc`:\n", "\n", " grid_loc 1: dimensionality\n", " grid_loc 2: horizontal location in x: 1 = TLONG; 2 = ULONG\n", " grid_loc 3: horizontal location in y: 1 = TLAT; 2 = ULAT\n", " grid_loc 4: vertical grid: 0 = surface, 1 = z_t, 2 = z_w, 3 = z_w_bot, 4 = z_t_150m\n", "\n", "This choice is not optimal for using xarray and [xgcm](http://xgcm.readthedocs.org/),\n", "since it implies that those variables can be directly added / multiplied, etc.\n", "So we need to modify the dataset to fix this, by parsing `grid_loc` and introducing new coordinates\n", "(`nlat_u`, `nlat_t`, `nlon_u`, `nlon_t`).\n", "\n", "That's what the code below does. These routines are also archived in a standalone gist:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def dims_from_grid_loc(grid_loc):\n", " grid_loc = str(grid_loc)\n", " ndim = int(grid_loc[0])\n", " x_loc_key = int(grid_loc[1])\n", " y_loc_key = int(grid_loc[2])\n", " z_loc_key = int(grid_loc[3])\n", " \n", " x_loc = {1: 'nlon_t', 2: 'nlon_u'}[x_loc_key]\n", " y_loc = {1: 'nlat_t', 2: 'nlat_u'}[y_loc_key]\n", " z_loc = {0: 'surface', 1: 'z_t', 2: 'z_w', 3: 'z_w_bot', 4: 'z_t_150m'}[z_loc_key]\n", " \n", " if ndim == 3:\n", " return z_loc, y_loc, x_loc\n", " elif ndim == 2:\n", " return y_loc, x_loc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def label_coord_grid_locs(ds):\n", " grid_locs = {'ANGLE': '2220', 'ANGLET': '2110',\n", " 'DXT': '2110', 'DXU': '2220',\n", " 'DYT': '2110', 'DYU': '2220',\n", " 'HT': '2110', 'HU': '2220',\n", " 'HTE': '2210', 'HTN': '2120',\n", " 'HUS': '2210', 'HUW': '2120',\n", " 'KMT': '2110', 'KMU': '2220',\n", " 'REGION_MASK': '2110',\n", " 'TAREA': '2110', 'TLAT': '2110', 'TLONG': '2110',\n", " 'UAREA': '2220', 'ULAT': '2220', 'ULONG': '2220'}\n", " ds_new = ds.copy()\n", " for vname, grid_loc in grid_locs.items():\n", " ds_new[vname].attrs['grid_loc'] = grid_loc\n", " return ds_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create some actual dimension coordinates\n", "def add_pop_dims_to_dataset(ds):\n", " ds_new = ds.copy()\n", " ds_new['nlon_u'] = xr.Variable(('nlon_u'), np.arange(len(ds.nlon)) + 1, {'axis': 'X', 'c_grid_axis_shift': 0.5})\n", " ds_new['nlat_u'] = xr.Variable(('nlat_u'), np.arange(len(ds.nlat)) + 1, {'axis': 'Y', 'c_grid_axis_shift': 0.5})\n", " ds_new['nlon_t'] = xr.Variable(('nlon_t'), np.arange(len(ds.nlon)) + 0.5, {'axis': 'X'})\n", " ds_new['nlat_t'] = xr.Variable(('nlat_t'), np.arange(len(ds.nlat)) + 0.5, {'axis': 'Y'})\n", " \n", " # add metadata to z grid\n", " ds_new['z_t'].attrs.update({'axis': 'Z'})\n", " ds_new['z_w'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': -0.5})\n", " ds_new['z_w_top'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': -0.5})\n", " ds_new['z_w_bot'].attrs.update({'axis': 'Z', 'c_grid_axis_shift': 0.5})\n", " \n", " return ds_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def relabel_pop_dims(ds):\n", " ds_new = label_coord_grid_locs(ds)\n", " ds_new = add_pop_dims_to_dataset(ds_new)\n", " for vname in ds_new.variables:\n", " if 'grid_loc' in ds_new[vname].attrs:\n", " da = ds_new[vname]\n", " dims_orig = da.dims\n", " new_spatial_dims = dims_from_grid_loc(da.attrs['grid_loc'])\n", " if dims_orig[0] == 'time':\n", " dims = ('time',) + new_spatial_dims\n", " else:\n", " dims = new_spatial_dims\n", " ds_new[vname] = xr.Variable(dims, da.data, da.attrs, da.encoding, fastpath=True)\n", " return ds_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_new = relabel_pop_dims(ds)\n", "ds_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Create and XGCM Grid object to manipulate the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid = xgcm.Grid(ds_new)\n", "grid" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# here we strip the coordinates out of the dataset\n", "# this makes the calculation below work better\n", "ds_coords = ds_new.coords.to_dataset().reset_coords().drop('time')\n", "ds_raw = ds_new.reset_coords(drop=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate Vorticity\n", "\n", "We now use xgcm to calculate vorticity using the B-grid discretization described in the POP manual.\n", "\n", "$$ \\zeta = \\frac{1}{\\Delta_y} \\delta_x \\overline{\\Delta_y u_y}^y - \n", " \\frac{1}{\\Delta_x} \\delta_y \\overline{\\Delta_x u_x}^x $$\n", " \n", "This all happens lazily (no data is actually loaded yet)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zeta = ( grid.diff(grid.interp(ds_raw.V1_1 * ds_coords.DYU, 'Y'), 'X')\n", " - grid.diff(grid.interp(ds_raw.U1_1 * ds_coords.DXU, 'X'), 'Y') ) / ds_coords.TAREA\n", "zeta" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: this calculation creates over a million dask chunks\n", "len(zeta.data.dask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot a Vorticity Snapshot\n", "\n", "First we use the standard matplotlib approach." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zeta[0].load().plot(vmax=1e-4, figsize=(15,8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is too high-resolution to visualize that way. We are aliasing fine structure\n", "\n", "### Visualize with Holoviews and Datashader\n", "\n", "Here we create an interactive browser for the data that dynamically resamples the image resolution as we zoom in and out." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import holoviews as hv\n", "import datashader\n", "from holoviews.operation.datashader import regrid, shade, datashade\n", "hv.extension('bokeh', width=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hv_ds = hv.Dataset(zeta.rename('zeta'))\n", "im = hv_ds.to(hv.Image, kdims=[\"nlon_t\", \"nlat_t\"], dynamic=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts Image [width=800 height=400 colorbar=True] (cmap='RdBu_r')\n", "\n", "regrid(im)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vorticity Histogram with Dask\n", "\n", "Let's calculate the PDF of vorticity for many timesteps.\n", "\n", "First we create a dask cluster" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dask.distributed import Client, progress\n", "from dask_gateway import Gateway\n", "\n", "gateway = Gateway()\n", "\n", "cluster = gateway.new_cluster()\n", "cluster.scale(20)\n", "cluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(๐Ÿ‘†don't forget to look at the dashboard!)\n", "\n", "...and connect to it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = Client(cluster)\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because xarray doesn't have out-of-core histogram capability, we operate directly on the lower level dask arrays." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import dask.array as dsa" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ntimesteps = 600\n", "h, bins = dsa.histogram(zeta[:ntimesteps].fillna(0.).data, bins=101, range=(-2e-3, 2e-3))\n", "hc = h.persist()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.bar(bins[:-1], np.log10(hc), width=(bins[1]-bins[0]))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 2 }