{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "# Concatenating a gridded rainfall reanalysis dataset into a time series"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Context\n",
    "### Purpose\n",
    "To load and extract a region of interest from a gridded rainfall reanalysis dataset, and concatenate into a time series using the [Iris package](https://scitools-iris.readthedocs.io/en/stable/).\n",
    "\n",
    "### Preprocessing description\n",
    "Time series data allows us to carry out a wide range of analyses including but not limited to trend, seasonality, anomaly detection and causality. As most of the climatological datasets are gridded, we provide a general tool to preprocess them into time series. The example global dataset from NCEP/NCAR reanalysis has a fairly low resolution (T62 Gaussian grid or approximately 1.9 * 1.9 degrees lat/long) which allows easy execution {cite:p}`QuantifyingCausalPathwaysofTeleconnections`. It is openly available with a variety of atmospheric variables at near surface levels in daily and monthly frequencies as well as long-term monthly mean in NetCDF format, which is described in and can be obtained from the [NOAA Physical Sciences Laboratory](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.derived.surfaceflux.html).\n",
    "\n",
    "This notebook uses a single sample data file for global daily precipitation rate (monthly mean) included with the notebook.\n",
    "\n",
    "### Highlights\n",
    "* Data for the entire globe is loaded and plotted using Iris\n",
    "* Seasonal means are created by aggregating the data\n",
    "* The Indonesian Borneo region is extracted and plotted\n",
    "* The area-averaged time series for Indonesian Borneo region is created \n",
    "* A particular season and timeframe are extracted from the time series\n",
    "\n",
    "### Contributions\n",
    "\n",
    "#### Dataset originator/creator\n",
    "* NOAA National Center for Environmental Prediction (creator)\n",
    "\n",
    "#### Dataset authors\n",
    "* Eugenia Kalnay, Director, NCEP Environmental Modeling Center\n",
    "\n",
    ":::{note}\n",
    "NCEP-NCAR Reanalysis 1 data {cite:p}`TheNCEPNCAR40YearReanalysisProject` provided by the NOAA PSL, Boulder, Colorado, USA, from their website at https://psl.noaa.gov\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load libraries"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "pycharm": {
     "is_executing": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "source": [
    "import os\n",
    "import iris\n",
    "import iris.quickplot as qplt\n",
    "import iris.coord_categorisation as coord_cat\n",
    "\n",
    "import cf_units\n",
    "import nc_time_axis\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import urllib.request\n",
    "\n",
    "import holoviews as hv\n",
    "import geoviews as gv\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(action='ignore')\n",
    "\n",
    "%matplotlib inline\n",
    "hv.extension('bokeh')"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set project structure"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "notebook_folder = './notebook'\n",
    "if not os.path.exists(notebook_folder):\n",
    "    os.makedirs(notebook_folder)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieve and/or load a sample data file"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "filepath = 'https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis.derived/surface_gauss/'\n",
    "filename = 'prate.sfc.mon.mean.nc'\n",
    "\n",
    "if not os.path.exists(os.path.join(notebook_folder,filename)):\n",
    "    urllib.request.urlretrieve(os.path.join(filepath,filename), os.path.join(notebook_folder, filename))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Load monthly precipitation data into an iris cube\n",
    "precip = iris.load_cube(os.path.join(notebook_folder, filename), 'Monthly Mean of Precipitation Rate')\n",
    "precip.coord('latitude').guess_bounds()\n",
    "precip.coord('longitude').guess_bounds()\n",
    "print(precip)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From `print(precip)` we have an idea whether the metadata is complete and where the possible gaps are. In case the iris cube does not contain a unit, we can set it as follows:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Set unit of precipitation data, if the cube does not contain it\n",
    "unit_to_add = 'kg m-2 s-1'\n",
    "if precip.units == 'unknown':\n",
    "    precip.units = cf_units.Unit (unit_to_add)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualisation\n",
    "\n",
    "Here we use the `Iris quickplot` wrapper to `matplotlib` (static plot with limited interactivity), and `holoviews` to interactive plotting the gridded data with added coastline."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Static plot"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true,
    "tags": [
     "hide-input"
    ]
   },
   "source": [
    "# Plot data of the first time step using iris quickplot with pcolormesh \n",
    "qplt.pcolormesh(precip[0])\n",
    "plt.gca().coastlines(color='white')\n",
    "plt.show()"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive plot"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true,
    "tags": [
     "hide-input"
    ]
   },
   "source": [
    "# Declare some options\n",
    "options = dict(width=600, height=350, yaxis='left', colorbar=True,\n",
    "               toolbar='above', cmap='viridis', infer_projection=True, tools=['hover'])\n",
    "\n",
    "# Create a geoviews dataset object\n",
    "rainfall_ds = gv.Dataset(precip[0], ['longitude', 'latitude'], 'Monthly Mean Of Precipitation Rate (kg m-2 s-1)',\n",
    "                             group='Monthly mean of precipitation rate')\n",
    "\n",
    "plot_rainfall = rainfall_ds.to.image().opts(**options) * gv.feature.coastline().opts(line_color='white')\n",
    "plot_rainfall"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create seasonal means\n",
    "\n",
    "Here we construct seasonal means from the monthly data for each grid, for the purpose of extracting a particular season of interest later on."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Add auxiliary coordinates to the cube to indicate each season\n",
    "coord_cat.add_season(precip, 'time', name='clim_season')\n",
    "coord_cat.add_season_year(precip, 'time', name='season_year')"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "print(precip)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Aggregate by season\n",
    "annual_seasonal_mean = precip.aggregated_by(\n",
    "    ['clim_season', 'season_year'],\n",
    "    iris.analysis.MEAN)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Check this worked\n",
    "for season, year in zip(\n",
    "    annual_seasonal_mean.coord('clim_season')[:10].points,\n",
    "    annual_seasonal_mean.coord('season_year')[:10].points):\n",
    "    print(season + ' ' + str(year))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extract Borneo region\n",
    "\n",
    "Here we extract our area of study which covers the Indonesian Borneo region, as specified by Melendy et al. (2014) (available at https://daac.ornl.gov/CMS/guides/CMS_LiDAR_Indonesia.html). "
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Create a constraint for the latitude and Longitude extents\n",
    "Borneo_lat = iris.Constraint(latitude=lambda v: v > -4.757 and v <= 3.211 )\n",
    "Borneo_lon = iris.Constraint(longitude=lambda v: v > 107.815 and v <= 117.987 )\n",
    "\n",
    "# Extract data based on the spatial extent\n",
    "Borneo =  annual_seasonal_mean.extract(Borneo_lat & Borneo_lon) \n",
    "print(Borneo)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Plot data of the first season in the study region using iris quickplot with pcolormesh\n",
    "qplt.pcolormesh(Borneo[0]) \n",
    "plt.gca().coastlines(color='white')\n",
    "plt.show()"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create area-averaged time series\n",
    "\n",
    "To construct a seasonal rainfall time series for the study region, we first compute the areal average rainfall. Note that due to the spherical nature of the planet Earth, the area of every grid-box is not the same, therefore we need to perform the weighted mean based on the weights by area."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Create area-weights array\n",
    "grid_area_weights = iris.analysis.cartography.area_weights(Borneo)\n",
    "\n",
    "# Perform the area-weighted mean using the computed grid-box areas.\n",
    "Borneo_mean = Borneo.collapsed(['latitude', 'longitude'],\n",
    "                                iris.analysis.MEAN,\n",
    "                                weights=grid_area_weights)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then extract the temporal timescale of interest (Boreal Summers from 1950 - 2019)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "jja_constraint = iris.Constraint(clim_season='jja')\n",
    "year_constraint = iris.Constraint(season_year=lambda v: v > 1949 and v <= 2019 )\n",
    "\n",
    "Borneo_jja = Borneo_mean.extract(jja_constraint & year_constraint)\n",
    "print(Borneo_jja)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we use the `Iris quickplot` wrapper to `matplotlib pyplot` (static with limited interactivity) and `holoviews` to interactive plotting the time series generated."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Static plot"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "source": [
    "# Plot time series using iris quickplot\n",
    "qplt.plot(Borneo_jja)\n",
    "plt.title('Borneo JJA Precipitation')\n",
    "plt.show()"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive plot"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "source": [
    "# As holoviews does not support direct plotting of a non-gridded cube object, we need to decompose the cube into its x- and y-axes.\n",
    "time = Borneo_jja.coord('season_year').points\n",
    "data = Borneo_jja.data\n",
    "\n",
    "# Create a holoviews time series object\n",
    "Borneo_jja_dynamic = hv.Curve((time, data), 'Time', 'Monthly Mean Of Precipitation Rate (kg m-2 s-1)')\n",
    "\n",
    "# Show the plot and declare some options\n",
    "plot_timeseries = Borneo_jja_dynamic.opts(width=600, height=350, yaxis='left', tools=['hover'], title=\"Borneo JJA Precipitation\")\n",
    "plot_timeseries"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save as a new NetCDF file"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "iris.save(Borneo_jja, os.path.join(notebook_folder, 'Borneo_precip_mean.nc'))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "This notebook has demonstrated the use of the `Iris` package to easily load, plot and manipulate gridded environmental NetCDF data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Citing this Notebook\n",
    "\n",
    "Please see [CITATION.cff](https://github.com/eds-book/ea34568e-d86e-4720-be2f-3f826f66a26c/blob/main/CITATION.cff) for the full citation information. The citation file can be exported to APA or BibTex formats (learn more [here](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional information\n",
    "\n",
    "**Review**: This notebook has been reviewed by one or more members of the Environmental Data Science book community. The open review is available [here](https://github.com/eds-book-gallery/ea34568e-d86e-4720-be2f-3f826f66a26c/pull/1).\n",
    "\n",
    "**License**: The code in this notebook is licensed under the MIT License. The Environmental Data Science book is licensed under the Creative Commons by Attribution 4.0 license. See further details [here](https://github.com/alan-turing-institute/environmental-ds-book/blob/main/LICENSE).\n",
    "\n",
    "**Contact**: If you have any suggestion or report an issue with this notebook, feel free to [create an issue](https://github.com/alan-turing-institute/environmental-ds-book/issues/new/choose) or send a direct message to [environmental.ds.book@gmail.com](mailto:environmental.ds.book@gmail.com)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "source": [
    "from datetime import date\n",
    "\n",
    "print('Notebook repository version: v2025.6.0')\n",
    "print(f'Last tested: {date.today()}')"
   ],
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}