{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Analysis of Gridded Ensemble Precipitation and Temperature Estimates over the Contiguous United States\n", "====\n", "\n", "For this example, we'll open a 9 members of a 100 member ensemble of precipitation and temperature data. The analysis we do below is quite simple but the problem is a good illustration of an IO bound task. \n", "\n", "Link to dataset: https://www.earthsystemgrid.org/dataset/gridded_precip_and_temp.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import xarray as xr\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to Dask Distributed Cluster" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dask.distributed import Client, progress\n", "# HPC\n", "# client = Client(scheduler_file='/glade/scratch/jhamman/scheduler.json')\n", "# client\n", "\n", "from dask_kubernetes import KubeCluster\n", "cluster = KubeCluster(n_workers=10)\n", "cluster" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = Client(cluster)\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Open Dataset\n", "\n", "We try storing a traditional atmospheric dataset on the cloud with two approaches\n", "\n", "1. Hosting raw netCDF files from FUSE mounted file system based on the [gcsfs](gcsfs.readthedocs.io/en/latest/) Python project\n", "2. Storing the data in an easier-to-read but new format, [Zarr](http://zarr.readthedocs.io/en/stable/)\n", "3. Loading an [Intake](https://intake.readthedocs.io/en/latest/) catalog file from GCS, specifying the location of the zarr data.\n", "\n", "The dataset has dimensions of time, latitude, longitude, and ensmemble member. Each format is self-describing." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Load with FUSE File system\n", "# ds = xr.open_mfdataset('/gcs/newmann-met-ensemble-netcdf/conus_ens_00*.nc',\n", "# engine='netcdf4', concat_dim='ensemble', chunks={'time': 50})\n", "\n", "### Load with Cloud object storage\n", "import gcsfs\n", "\n", "fs = gcsfs.GCSFileSystem(project='pangeo-181919', token='anon', access='read_only')\n", "gcsmap = gcsfs.mapping.GCSMap('pangeo-data/gmet_v1.zarr',\n", " gcs=fs, check=False, create=False)\n", "ds = xr.open_zarr(gcsmap)\n", "\n", "### Load with intake catalog service\n", "# import intake\n", "# cat = intake.Catalog('https://raw.githubusercontent.com/pangeo-data/pangeo/master/gce/catalog.yaml')\n", "# ds = cat.gmet_v1.read_chunked()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print dataset\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Figure: Elevation and domain mask\n", "\n", "A quick plot of the mask to give us an idea of our spatial domain" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if 'ensemble' in ds['elevation'].dims:\n", " elevation = ds['elevation'].isel(ensemble=0).persist()\n", "else:\n", " elevation = ds['elevation'].persist()\n", "elevation = elevation.where(elevation > 0)\n", "elevation.plot(figsize=(10, 6))\n", "plt.title('Domain Elevation')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intra-ensemble range\n", "\n", "We calculate the intra-ensemble range for all the mean daily temperature in this dataset. This gives us a sense of uncertainty." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "temp_mean = ds['t_mean'].mean(dim='time')\n", "spread = (temp_mean.max(dim='ensemble')\n", " - temp_mean.min(dim='ensemble'))\n", "spread" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calling compute\n", "The expressions above didn't actually compute anything. They just build the task graph. To do the computations, we call the `compute` or `persist` methods:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spread = spread.persist()\n", "progress(spread)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Figure: Intra-ensemble range\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spread\n", "spread.attrs['units'] = 'degC'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "spread.plot(robust=True, figsize=(10, 6))\n", "plt.title('Intra-ensemble range in mean annual temperature')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Average seasonal snowfall\n", "\n", "We can compute a crude estimate of average seasonal snowfall using the temperature and precipitation variables in our dataset. Here, we'll look at the first 4 ensemble members and make some maps of the seasonal total snowfall in each ensemble member." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da_snow = ds['pcp'].where(ds['t_mean'] < 0.).resample(time='QS-Mar').sum('time')\n", "seasonal_snow = da_snow.isel(ensemble=slice(0, 4)).groupby('time.season').mean('time').persist()\n", "progress(seasonal_snow)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# properly sort the seasons\n", "seasonal_snow = seasonal_snow.sel(season=['DJF', 'MAM','JJA', 'SON'])\n", "seasonal_snow.attrs['units'] = 'mm/season'\n", "seasonal_snow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Figure: Average seasonal snowfall totals " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seasonal_snow.plot.pcolormesh(col='season', row='ensemble', cmap='Blues', robust=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract a time series of annual maximum precipitation events over a region\n", "\n", "In the previous two examples, we've mostly reduced the time and/or ensemble dimension. Here, we'll do a reduction operation on the spatial dimension to look at a time series of extreme precipitation events near Austin, TX (30.2672° N, 97.7431° W)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "buf = 0.25 # look at Austin +/- 0.25 deg\n", "\n", "ds_tx = ds.sel(lon=slice(-97.7431-buf, -97.7431+buf), lat=slice(30.2672-buf, 30.2672+buf))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcp_ann_max = ds_tx['pcp'].resample(time='AS').max('time')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pcp_ann_max_ts = pcp_ann_max.max(('lat', 'lon')).persist()\n", "progress(pcp_ann_max_ts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Figure: Timeseries of maximum precipitation near Austin, Tx." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax = pcp_ann_max_ts.transpose().to_pandas().plot(title='Maximum precipitation near Austin, Tx', legend=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }