{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
CASE - Sea Surface Temperature data
\n", "\n", "\n", "> *DS Python for GIS and Geoscience* \n", "> *October, 2020*\n", ">\n", "> *© 2020, Joris Van den Bossche and Stijn Van Hoey. Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "For this use case, we focus on the [Extended Reconstructed Sea Surface Temperature (ERSST)](https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v4), a widely used and trusted gridded compilation of historical Sea Surface Temperature (SST).\n", "\n", "> The Extended Reconstructed Sea Surface Temperature (ERSST) dataset is a global monthly sea surface temperature dataset derived from the International Comprehensive Ocean–Atmosphere Dataset (ICOADS). It is produced on a 2° × 2° grid with spatial completeness enhanced using statistical methods. This monthly analysis begins in January 1854 continuing to the present and includes anomalies computed with respect to a 1971–2000 monthly climatology. \n", "\n", "\n", "\n", "First we download the dataset. We will use the [NOAA Extended Reconstructed Sea Surface Temperature (ERSST)](https://psl.noaa.gov/thredds/catalog/Datasets/noaa.ersst/catalog.html?dataset=Datasets/noaa.ersst/sst.mnmean.v4.nc) v4 product. Download the data from this link: https://psl.noaa.gov/thredds/fileServer/Datasets/noaa.ersst/sst.mnmean.v4.nc and store it in the same folder as the notebook as `sst.mnmean.v4.nc`." 
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Read in the dataset, ignoring the `time_bnds` variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "data = './sst.mnmean.v4.nc'\n", "ds = xr.open_dataset(data, drop_variables=['time_bnds'], engine=\"h5netcdf\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "For this use case, we will focus on the years from 1960 onwards, so we slice the data to the period 1960-2018 and load the result into memory. By loading only after the initial slice, we make sure that only the data we actually need ends up in memory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "ds = ds.sel(time=slice('1960', '2018')).load() # load into memory\n", "ds" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Files with the `.nc` extension are in the NetCDF format. NetCDF (Network Common Data Form) is one of the most widely used formats for distributing geoscience data. NetCDF is maintained by the [Unidata](https://www.unidata.ucar.edu/) organization. Check the [NetCDF website](https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#whatisit) for more information. Xarray was designed to make reading NetCDF files in Python as easy, powerful, and flexible as possible. " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "__Note:__ As the data is hosted on an [OPeNDAP server](https://en.wikipedia.org/wiki/OPeNDAP), we could also load the NetCDF data directly without downloading anything. 
This would require us to add the `netcdf4` package to our conda environment." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Exploratory data analysis" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The data contains a single data variable `sst` and has three dimensions (`lon`, `lat` and `time`), each described by a coordinate. Let's first get some insight into the structure and content of the data." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
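To make the structure described above concrete without needing the downloaded file, here is a minimal sketch that builds a small synthetic dataset with the same layout: one `sst` variable on `time`, `lat` and `lon` dimensions, using a 2-degree grid. The values are random and purely illustrative, and the names `toy` and `first_month` are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the ERSST file: random values on a 2-degree grid
lat = np.arange(-88, 90, 2)   # 89 latitudes from -88 to 88
lon = np.arange(0, 360, 2)    # 180 longitudes from 0 to 358
time = pd.date_range('1960-01-01', periods=24, freq='MS')  # 24 monthly steps

rng = np.random.default_rng(0)
values = rng.normal(15, 5, size=(time.size, lat.size, lon.size))

toy = xr.Dataset(
    {'sst': (('time', 'lat', 'lon'), values)},
    coords={'time': time, 'lat': lat, 'lon': lon},
)

# Inspect the structure: data variables, dimension sizes and dimension order
print(list(toy.data_vars))
print(dict(toy.sizes))
print(toy['sst'].dims)

# Selecting a single time step leaves a 2D (lat, lon) field
first_month = toy['sst'].isel(time=0)
print(first_month.shape)
```

The same attributes (`data_vars`, `sizes`, `dims`) and the `isel` method can be applied to the real `ds` object loaded earlier.
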