{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CM2.6 Ocean Model Analysis\n", "\n", "This notebook shows how to load and analyze ocean data from the GFDL [CM2.6](https://www.gfdl.noaa.gov/cm2-6/) high-resolution climate simulation.\n", "\n", "![CM2.6 SST](https://www.gfdl.noaa.gov/wp-content/uploads/ih/2012/06/cm2.6.png)\n", "\n", "Right now, the only available outputs are the 5-day 3D fields of horizontal velocity, temperature, and salinity. We hope to add more in the future.\n", "\n", "Thanks to [Stephen Griffies](https://www.gfdl.noaa.gov/stephen-griffies-homepage/) for providing the data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import xarray as xr\n", "import matplotlib.pyplot as plt\n", "import holoviews as hv\n", "import datashader\n", "import intake\n", "from holoviews.operation.datashader import regrid, shade, datashade\n", "\n", "hv.extension('bokeh', width=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create and Connect to a Dask Distributed Cluster\n", "\n", "The next cell launches a cluster of virtual machines in the cloud." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dask.distributed import Client, progress\n", "from dask_gateway import Gateway\n", "\n", "gateway = Gateway()\n", "cluster = gateway.new_cluster()\n", "cluster.scale(40)\n", "cluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "👆 Don't forget to click the link above to open the cluster dashboard." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = Client(cluster)\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load CM2.6 Data\n", "\n", "This data is stored in [xarray-zarr](http://xarray.pydata.org/en/latest/io.html#zarr) format in Google Cloud Storage.\n", "This format is optimized for parallel distributed reads from within the cloud environment.\n", "\n", "It may take up to a minute to initialize the dataset when you run this cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from intake import open_catalog\n", "\n", "cat = open_catalog(\"https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Can also select GFDL_CM2_6_one_percent_ocean\n", "ds = cat.ocean.GFDL_CM2_6.GFDL_CM2_6_control_ocean.to_dask()\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize Temperature Data with Holoviews and Datashader\n", "\n", "The cells below show how to interactively explore the dataset.\n", "\n", "_**Warning**: it takes ~10-20 seconds to render each image after moving the sliders. Please be patient. 
There is an open [GitHub issue](https://github.com/bokeh/datashader/issues/598) about improving the performance of datashader with this sort of dataset._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hv_ds = hv.Dataset(ds['temp'])\n", "qm = hv_ds.to(hv.QuadMesh, kdims=[\"xt_ocean\", \"yt_ocean\"], dynamic=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts QuadMesh [width=800 height=500 colorbar=True] (cmap='magma') \n", "regrid(qm, precompute=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make an Expensive Calculation\n", "\n", "Here we make a big reduction by taking the time and zonal mean of the temperature. This demonstrates how the cluster distributes the reads from storage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp_zonal_mean = ds.temp.mean(dim=('time', 'xt_ocean'))\n", "temp_zonal_mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This next cell may take a while, depending on the size of your cluster. On a cluster of 40 workers, it took ~12 minutes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time temp_zonal_mean.load()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(16,8))\n", "temp_zonal_mean.plot.contourf(yincrease=False, levels=np.arange(-2,30))\n", "plt.title('Naive Zonal Mean Temperature')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 2 }