{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<p><font size=\"6\"><b>CASE - Sea Surface Temperature data</b></font></p>\n",
    "\n",
    "\n",
    "> *DS Python for GIS and Geoscience*  \n",
    "> *October, 2020*\n",
    ">\n",
    "> *© 2020, Joris Van den Bossche and Stijn Van Hoey. Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "For this use case, we focus on the [Extended Reconstructed Sea Surface Temperature (ERSST)](https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v4), a widely used and trusted gridded compilation of historical Sea Surface Temperature (SST).\n",
    "\n",
    "> The Extended Reconstructed Sea Surface Temperature (ERSST) dataset is a global monthly sea surface temperature dataset derived from the International Comprehensive Ocean–Atmosphere Dataset (ICOADS). It is produced on a 2° × 2° grid with spatial completeness enhanced using statistical methods. This monthly analysis begins in January 1854 continuing to the present and includes anomalies computed with respect to a 1971–2000 monthly climatology. \n",
    "\n",
    "\n",
    "\n",
    "First we download the dataset. We will use the [NOAA Extended Reconstructed Sea Surface Temperature (ERSST)](https://psl.noaa.gov/thredds/catalog/Datasets/noaa.ersst/catalog.html?dataset=Datasets/noaa.ersst/sst.mnmean.v4.nc) v4 product. Download the data from this link: https://psl.noaa.gov/thredds/fileServer/Datasets/noaa.ersst/sst.mnmean.v4.nc and store it in the same folder as the notebook as `sst.mnmean.v4.nc`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Reading in the data set, ignoring the `time_bnds` variable: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "data = './sst.mnmean.v4.nc'\n",
    "ds = xr.open_dataset(data, drop_variables=['time_bnds'], engine=\"h5netcdf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "For this use case, we will focus on the years after 1960, so we slice the data from 1960 and load the data into our computer memory. By only loading the data after the initial slice, we make sure to only load into memory the data we specifically need:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "ds = ds.sel(time=slice('1960', '2018')).load()  # load into memory\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The data with the extension `nc` is a NetCDF format. NetCDF (Network Common Data Format) is the most widely used format for distributing geoscience data. NetCDF is maintained by the [Unidata](https://www.unidata.ucar.edu/) organization. Check the [netcdf website](https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#whatisit) for more information. Xarray was designed to make reading netCDF files in python as easy, powerful, and flexible as possible. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "__Note:__ As the data is in a [OPeNDAP server](https://en.wikipedia.org/wiki/OPeNDAP), we could also load the NETCDF data directly without downloading anything. This would require us to add the `netcdf4` package in our conda environment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Exploratory data analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The data contains a single data variable `sst` and has 3 dimensions: lon, lat and time each described by a coordinate. Let's first get some insight in the structure and content of the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "\n",
    "- What is the total amount of elements/values in the xarray data set?\n",
    "- How many elements are there in the different dimensions   \n",
    "- The metadata of a netcdf file is also interpreted by xarray. Are the attributes on the xarray.Dataset `ds` the same as the attributes of the `sst` data itself?\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "    \n",
    "- The number of elements or `size` of an array is an attribute of an xarray.DataArray and not of a xarray.Dataset\n",
    "- Also the `shape` of an array is an attribute of an xarray.DataArray. A xarray.Dataset has the `dims` attribute to query dimension sizes\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature1.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature2.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature3.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature4.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature5.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "---------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As we work with a single data variable, we will introduce a new variable `sst` which is the `xarray.DataArray` of the SST values. Note that we only keep the attributes on the xarray.DataArray level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "sst = ds[\"sst\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "sst"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "\n",
    "Make an image plot of the SST in the first month of the data set, January 1960. Adjust the range of the colorbar and switch to the `coolwarm` colormap.\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "    \n",
    "- xarray can interpret a date string in the [ISO 8601](https://nl.wikipedia.org/wiki/ISO_8601) format as a date, e.g. `2020-01-01`.\n",
    "- adjust ranges of the colorbar with `vmin` and `vmax`.\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature6.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "__Note__ \n",
    "xaray uses xarray.plot.pcolormesh() as the default two-dimensional plot method because it is more flexible than xarray.plot.imshow(). However, for large arrays, imshow can be much faster than pcolormesh. If speed is important to you and you are plotting a regular mesh, consider using imshow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "    \n",
    "How did the SST evolve in time for a specific location on the earth? Make a line plot of the SST at lon=300, lat=50 as a function of time.\n",
    "    \n",
    "Do you recognize the seasonality of the data?\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "    \n",
    "- Use the `sel` for both the lon/lat selection.\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature7.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "\n",
    "What is the evolution of the SST as function of the month of the year?\n",
    "    \n",
    "Calculate the average SST with respect to the _month of the year_ for all positions in the data set and store the result as a variable `ds_mm`.\n",
    "\n",
    "Use the `ds_mm` variable to make a plot: For longitude `164`, make a comparison in between the monthly average at latitude `-23.4` versus latitude `23.4`. Use a line plot with in the x-axis the month of the year and in the y-axis the average SST.\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "    \n",
    "- Use the `sel` for both the lon/lat selection.\n",
    "- If the exact values are not in the coordinate, you can use the `method=\"nearest\"` inside a selection.\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature8.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature9.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "    \n",
    "How does the zonal mean climatology for each month of the year changes with the latitude? \n",
    "\n",
    "Reuse the `ds_mm` from the previous exercise or recalculate the average SST with respect to the _month of the year_ for all positions in the data set and store the result as a variable `ds_mm`.\n",
    "    \n",
    "To check the mean climatology (aggregating over the longitudes) as a function of the latitude for each month in the year, calculate the average SST over the `lon` dimension from `ds_mm`. Plot the result as an image with the month of the year in the x-axis and the latitude in the y-axis. \n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "\n",
    "- You do not need another `groupby`, but need to calculate a reduction along a dimension.\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature10.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature11.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature12.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "    \n",
    "How different is the mean climatology in between January and July?\n",
    "\n",
    "Reuse the `ds_mm` variable from the previous exercises or recalculate the average SST with respect to the _month of the year_ for all positions in the data set and store the result as a variable `ds_mm`.\n",
    "    \n",
    "Calculate the difference of the mean climatology between January an July and plot the result as an image (map) with the longitude of the year in the x-axis and the latitude in the y-axis. \n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "\n",
    "- You can subtract xarray just as Numpy arrays. You do not need another `groupby`, but only selections from the `ds_mm` variable. \n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature13.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Calculate the residual by removing climatology"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "To understand how the SST temperature evolved as a function of time during the last decades, we want to remove this climatology from the dataset and examine the residual, called the anomaly, which is the interesting part from a climate perspective. \n",
    "\n",
    "We will do this by subtracting the monthly average from the values of that specific month. Hence, subtract the average January value over the years from the January data, subtract the average February value over the years from the February data,...\n",
    "\n",
    "Removing the seasonal climatology is an example of a transformation: it operates over a group, but does not change the size of the dataset as we do the operation on each element (`x - x.mean()`) \n",
    "\n",
    "This is not the same as the aggregations (e.g. `average`) we applied on each of the groups earlier. When using `groupby`, a calculation to the group can be applied and just like in Pandas, these calculations can either be:\n",
    "\n",
    "- _aggregation_: reduces the size of the group\n",
    "- _transformation_: preserves the groups full size\n",
    "\n",
    "One way to consider is that we `apply` a function to each of the groups. For our anomaly calculation we want to do a _transformation_ and apply the following function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "def remove_time_mean(x):\n",
    "    \"\"\"Subtract each value by the mean over time\"\"\"\n",
    "    return x - x.mean(dim='time')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We can `apply` this function to each of the groups:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "sst = ds[\"sst\"]\n",
    "ds_anom = sst.groupby('time.month').apply(remove_time_mean)\n",
    "ds_anom"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "In other words: \n",
    "\n",
    "> subtract each element by the average over time of all elements of the month the element belongs to"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Xarray makes these sorts of transformations easy by supporting groupby arithmetic. This concept is easiest explained by applying it for our application:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "gb = sst.groupby('time.month')  # make groups (in this example each month of the year is a group) \n",
    "ds_anom = gb - gb.mean(dim='time')  # subtract each element of the group/month by the mean of that group/month over time \n",
    "ds_anom"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Now we can view the climate signal without the overwhelming influence of the seasonal cycle:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "ds_anom.sel(lon=300, lat=50).plot.line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Check the difference between Jan. 1 2018 and Jan. 1 1960 to see where the evolution in time is the largest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "(ds_anom.sel(time='2018-01-01') - ds_anom.sel(time='1960-01-01')).plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "    \n",
    "Compute the _five-year median_ of the `ds_anom` variable for the location `lon=300`, `lat=50` as well as the 12 month rolling median of the same data set. Store the output as respectively `ds_anom_resample` and `ds_anom_rolling`.\n",
    "    \n",
    "Make a line plot as a function of time for the location `lon=300`, `lat=50` of the original `ds_anom` data, the resampled data and the rolling median data.\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "\n",
    "- If you only need a single location, do the slicing (selecting) first instead of calculating them for all positions.\n",
    "- Use the `resample` and the `rolling` functions.\n",
    "\n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature14.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature15.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature16.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Make projection aware maps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The previous maps were the default outputs of xarray without specification of the spatial context. For reporting these plots are not appropriate. We can use the [cartopy](https://scitools.org.uk/cartopy/docs/latest/) package to adjust our Matplotlib axis to make them spatially aware.\n",
    "\n",
    "To make sure the data of xarray can be integrated in a cartopy plot, the crucial element is to define the `transform` argument to to control which coordinate system that the given data is in. You can add the transform keyword with an appropriate `cartopy.crs.CRS` instance from the `import cartopy.crs` module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "import cartopy.crs as ccrs\n",
    "import cartopy\n",
    "\n",
    "map_proj = ccrs.Robinson()  # Define the projection\n",
    "\n",
    "fig, ax = plt.subplots(figsize = (16,9), subplot_kw={\"projection\": map_proj})\n",
    "ax.gridlines()\n",
    "ax.coastlines()\n",
    "\n",
    "sst.sel(time=\"1960-01-01\").plot(ax=ax, vmin=-2, vmax=30,  center=5,\n",
    "                                cmap='coolwarm', transform = ccrs.PlateCarree(), # tranform argument\n",
    "                                cbar_kwargs={'shrink':0.75})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "    \n",
    "Make a plot of the `ds_anom` variable of 2018-01-01 with cartopy.\n",
    "    \n",
    "- Use the `ccrs.Orthographic` with the central lon/lat on -20, 5\n",
    "- Add coastlines and gridlines to the plot    \n",
    "   \n",
    "   \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature17.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Spatial aggregate per basin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Apart from aggregations as a function of time, also spatial aggregations using other (spatial) data sets can be achieved. In the next section, we want to compute the average SST over different ocean basins. The http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NODC/.WOA09/.Masks/.basin/ is a data set that contains the main ocean basins in lon/lat:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "basin = xr.open_dataset(\"./data/basin.nc\")\n",
    "basin = basin.rename({'X': 'lon', 'Y': 'lat'})\n",
    "basin[\"basin\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The name of the basins are included in the attributes of the data set. Using Pandas, we can create a mapping in between the basin names and the index used in the basin data set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "basin_names = basin[\"basin\"].attrs['CLIST'].split('\\n')\n",
    "basin_s = pd.Series(basin_names, index=np.arange(1, len(basin_names)+1))\n",
    "basin_s = basin_s.rename('basin')\n",
    "basin_s.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We will use this mapping from identifier to label later in the analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The basin data set provides multiple Z levels. We are interested in the division on surface level (0.0):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "basin_surface = basin[\"basin\"].sel(Z=0.0).drop_vars(\"Z\")\n",
    "basin_surface.plot(vmax=10, cmap='tab10')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The next step is to align both data sets. For this application, using the 'nearest' available data point will work to map both data sets with each other. Xarray provides the function `interp_like` to interpolate the `basin_surface` to the `sst` variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "basin_surface_interp = basin_surface.interp_like(sst, method='nearest')\n",
    "basin_surface_interp.plot(vmax=10, cmap='tab10')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**EXERCISE**:\n",
    "\n",
    "Compute the mean SST (over all dimensions) for each of the basins in the `basin_surface` variable starting from the `sst` variable. \n",
    "\n",
    "Next, we want to plot a horizontal bar chart with the SST for each bar chart. To do so:\n",
    "    \n",
    "- Convert the output to Pandas DataFrame.\n",
    "- Combine the output with the `basin_s` variable by merging on the index (identifiers of the basin names).\n",
    "- Create a horizontal barplot of the average temperature for each of the basins using the resulting dataframe.\n",
    "   \n",
    "<details>\n",
    "\n",
    "<summary>Hints</summary>\n",
    "\n",
    "- Use a `groupby` with the `basin_surface_interp` as input.\n",
    "- Joining and merging of tables? See the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html). \n",
    "\n",
    "</details>    \n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature18.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature19.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature20.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature21.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "clear_cell": true,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# %load _solutions/case-sea-surface-temperature22.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "-------\n",
    "\n",
    "Acknowledgements to https://earth-env-data-science.github.io/lectures/xarray/xarray-part2.html"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Nbtutor - export exercises",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}