{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "

Xarray

\n", "\n", "\n", "> *DS Python for GIS and Geoscience* \n", "> *October, 2020*\n", ">\n", "> *© 2020, Joris Van den Bossche and Stijn Van Hoey. Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import rasterio\n", "from rasterio.plot import plotting_extent, reshape_as_image" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "By this moment you probably already know how to read data files with rasterio:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "data_file = \"./data/gent/raster/2020-09-17_Sentinel_2_L1C_True_color.tiff\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "with rasterio.open(data_file) as src:\n", " # extract data, metadata and extent into memory\n", " gent_profile = src.profile\n", " gent_data = src.read([1, 2, 3], out_dtype=float, masked=False)\n", " gent_ext = plotting_extent(src)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "plt.imshow(gent_data[0, :, :], extent=gent_ext, cmap=\"Reds\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Rasterio...\n", "\n", "__Benefits__\n", " - Direct link with Numpy data types\n", " - Rasterio supports important GIS transformations (clip, mask, warp, merge, transformation,...)\n", " - Only load a subset of a large data set into memory\n", "\n", "__Drawbacks__:\n", " - Coordinate information is decoupled from the data itself (keep track and organize the extent and meta data) \n", " - Make sure to keep track of what each dimension represents (y-first, as arrays are organized along rows first)\n", " - Functionality overlap with GDAL (and sometimes installation issues)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Meet `xarray`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "import xarray as xr" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent = xr.open_rasterio(data_file)\n", "gent" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "plt.imshow(gent.sel(band=1), cmap=\"Reds\");" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Xarray brings its own plotting methods, but relies on Matplotlib as well for the actual plotting:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "ax = gent.sel(band=1).plot.imshow(cmap=\"Reds\", figsize=(12, 5)) # robust=True\n", "# ax.axes.set_aspect('equal')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "As a preview, plot the intersection of the data at x coordinate closest to 400000 for each band:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent.sel(x=400_000, method='nearest').plot.line(col='band')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "But first, let's have a look at the data again:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The output of xarray is a bit different to what we've previous seen. Let's go through the different elements:\n", "\n", "- It is a `xarray.DataArray`, one of the main data types provided by xarray\n", "- It has 3 __dimensions__:\n", " - `band`: 3 bands (RGB)\n", " - `y`: the y coordinates of the data set\n", " - `x`: the x coordinates of the data set\n", "- Each of these dimensions are defined by a __coordinate__ (1D) array\n", "- Other metadata provided by the `tiff` are stored in the __`Attributes`__\n", "\n", "Looking to the data itself (click on the icons on the right), we can see this is still a Numpy array" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "#gent.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent = gent.assign_coords(band=(\"band\", [\"R\", \"G\", \"B\"]))\n", "gent" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Hence, we can __name dimensions__ and also extract (slice) data using these names..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent.sel(band='R')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Using xarray:\n", "\n", "- Data stored as a Numpy arrays\n", "- Dimensions do have a name\n", "- The coordinates of each of the dimensions can represent coordinates, categories, dates,... instead of just an index\n", " " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**REMEMBER**:
\n", "\n", "The [`xarray` package](xarray.pydata.org/en/stable/) introduces __labels__ in the form of dimensions, coordinates and attributes on top of raw numPy-like arrays. Xarray is inspired by and borrows heavily from Pandas. \n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Numpy with labels..." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Recap the NDVI exercise of the Numpy notebook, using a stacked version of the 4th and 8th Sentinel band:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array = xr.open_rasterio(\"./data/gent/raster/2020-09-17_Sentinel_2_L1C_B0408.tiff\")\n", "xr_array" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In Numpy, we would do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "b48_bands = xr_array.values # 0 is band 4 and 1 is band 8\n", "b48_bands.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "ndvi_np = (b48_bands[1] - b48_bands[0])/(b48_bands[0] + b48_bands[1]) # or was it b48_bands[0] - b48_bands[1] ?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "plt.imshow(ndvi_np, cmap=\"YlGn\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In __xarray__:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "xr_array = xr_array.assign_coords(band=(\"band\", [\"b4\", \"b8\"]))\n", "xr_data = xr_array.to_dataset(dim=\"band\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "ndvi_xr = (xr_data[\"b8\"] - xr_data[\"b4\"])/(xr_data[\"b8\"] + xr_data[\"b4\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "plt.imshow(ndvi_xr, cmap=\"YlGn\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The result is the same, but no more struggling on what index is representing which variable!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "np.allclose(ndvi_xr.data, ndvi_np)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can keep the result together with the other data variables by adding a new variable to the data, in a very similar way as we created a new column in Pandas:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"ndvi\"] = ndvi_xr\n", "xr_data" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "You already encountered `xarray.DataArray`, but now we created a `xarray.Dataset`:\n", "\n", "- A `xarray.Dataset` is the second main data type provided by xarray\n", "- It has 2 __dimensions__:\n", " - `y`: the y coordinates of the data set\n", " - `x`: the x coordinates of the data set\n", "- Each of these dimensions are defined by a __coordinate__ (1D) array\n", "- It has 3 __Data variables__: `band_4`, `band_8` and `ndvi` that share the same coordinates\n", "- Other metadata provided by the `tiff` are stored in the __`Attributes`__\n", "\n", "Looking to the data itself (click on the icons on the right), we can see each of the _Data variables_ is a Numpy ndarrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "type(xr_data[\"b4\"].data)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "And also the coordinates that describe a dimension are Numpy ndarrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "type(xr_data.coords[\"x\"].values)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "__Selecting data__" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Xarray’s labels make working with multidimensional data much easier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "xr_array = xr.open_rasterio(\"./data/gent/raster/2020-09-17_Sentinel_2_L1C_B0408.tiff\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Rename the coordinates of the band dimension:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "xr_array = xr_array.assign_coords(band=(\"band\", [\"b4\", \"b8\"]))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We could use the Numpy style of data slicing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array[0]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "However, it is often much more powerful to use xarray’s `.sel()` method to use label-based indexing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(band=\"b4\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can select a specific set of coordinate values as a __list__ and take the value that is most near to the given value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(x=[406803, 410380, 413958], method=\"nearest\") # .sel(band=\"b4\").plot.line(hue=\"x\");" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Sometimes, a specific range is required. The `.sel()` method also supports __slicing__, so we can select band 4 and slice a subset of the data along the x direction:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(x=slice(400_000, 420_000), band=\"b4\").plot.imshow()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "__Note__ Switch in between `Array` and `Datasets` as you like, it won't hurt your computer memory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "xr_data = xr_array.to_dataset(dim=\"band\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "#xr_data.to_array() # dim=\"band\"" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Reduction" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Just like in numpy, we can reduce xarray DataArrays along any number of axes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b4\"].mean(axis=0).dims" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b4\"].mean(axis=1).dims" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "But we have __dimensions with labels__, so rather than performing reductions on axes (as in Numpy), we can perform them on __dimensions__. This turns out to be a huge convenience:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b4\"].mean(dim=\"x\").dims" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Calculate minimum or quantile values for each of the bands separately:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.min(dim=[\"x\", \"y\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.quantile([0.1, 0.5, 0.9], dim=[\"x\", \"y\"])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Element-wise computation" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Xarray DataArrays and Datasets work seamlessly with arithmetic operators and numpy array functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b4\"] /10." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "np.log(xr_data[\"b8\"])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "As we seen in the example of the NDVI, we can combine multiple xarray datasets in arithemetic operations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b8\"] + xr_data[\"b4\"]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Broadcasting" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Perfoming an operation on arrays with differenty coordinates will result in automatic broadcasting:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data.x.shape, xr_data[\"b8\"].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"b8\"] + xr_data.x # Note, this calculaton does not make much sense, but illustrates broadcasting" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Plotting" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Similar to Pandas, there is a `plot` method, which can be used for different plot types:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "xr_array = xr.open_rasterio(\"./data/gent/raster/2020-09-17_Sentinel_2_L1C_B0408.tiff\")\n", "xr_array = xr_array.assign_coords(band=(\"band\", [\"b4\", \"b8\"]))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "It supports both 2 dimensional (e.g. line) as 3 (e.g. imshow, pcolormesh) dimensional plots. When just using `plot`, xarray will do a _best guess_ on how to plot the data. However being explicit `plot.line`, `plot.imshow`, `plot.pcolormesh`, `plot.scatter`,... gives you more control." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(band=\"b4\").plot(); # add .line() -> ValueError: For 2D inputs, please specify either hue, x or y." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(x=420000, method=\"nearest\").plot.line(hue=\"band\");" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "`facetting` splits the data in subplots according to a dimension, e.g. `band`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_array.sel(x=420000, method=\"nearest\").plot.line(col=\"band\"); # row=\"band\"" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Use the `robust` option when there is a lack of visual difference. This will use the 2nd and 98th percentiles of the data to compute the color limits. The arrows on the color bar indicate that the colors include data points outside the bounds." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "ax = xr_array.sel(band=\"b4\").plot(cmap=\"Reds\", robust=True, figsize=(12, 5))\n", "ax.axes.set_aspect('equal')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Compare data variables within a `xarray Dataset`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data = xr_array.to_dataset(dim=\"band\")\n", "xr_data.plot.scatter(x=\"b4\", y=\"b8\", s=2)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Calculating and plotting the NDVI in three classes illustrates the options of the `imshow` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr_data[\"ndvi\"] = (xr_data[\"b8\"] - xr_data[\"b4\"])/(xr_data[\"b8\"] + xr_data[\"b4\"])\n", "xr_data[\"ndvi\"].plot.imshow(levels=[-1, 0, 0.3, 1.], colors=[\"gray\", \"yellowgreen\", \"g\"])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Let's practice!" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The data set for the following exercises is from [Argo floats](https://argo.ucsd.edu/), an international collaboration that collects high-quality temperature and salinity profiles from the upper 2000m of the ice-free global ocean and currents from intermediate depths.\n", "\n", "These data do not represent full coverage image data (like remote sensing images), but measurements of salinity and temperature as a function of water `level` (related to the pressure). Each measurements happens at a given `date` on a given location (`lon`/`lat`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "import xarray as xr\n", "argo = xr.load_dataset(\"./data/argo_float.nc\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The bold font (or * symbol in plain text output version) in the coordinate representation above indicates that x and y are 'dimension coordinates' (they describe the coordinates associated with data variable axes) while band is a 'non-dimension coordinates'. We can make any variable a non-dimension coordinate." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Let's plot the coordinates of the available measurements and add a background map using [contextly](https://contextily.readthedocs.io/en/latest/index.html):" ] }, { "cell_type": "raw", "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "source": [ "import contextily as cx\n", "from pyproj import CRS\n", "crs = CRS.from_epsg(4326)\n", "\n", "fig, ax = plt.subplots(figsize=(6, 6))\n", "argo.plot.scatter(x='lon', y='lat', ax=ax, color='red', s=2)\n", "\n", "# Custom adjustments of the limits, as we are in the middle of the ocean\n", "xmin, xmax = ax.get_xlim()\n", "ymin, ymax = ax.get_ylim()\n", "ax.set_xlim(xmin*1.5, xmax*0.5)\n", "ax.set_ylim(ymin*0.5, ymax*1.5)\n", "\n", "cx.add_basemap(ax, crs=crs.to_string())" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Add a new variable to the `argo` data set, called `temperature_kelvin`, by converting the temperature to Kelvin. \n", " \n", "Degrees Kelvin = degrees celsius + 273.\n", " \n", "
\n", " \n", "Hints\n", "\n", "* Remember that xarray works as Numpy and relies on the same broadcasting rules.\n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "# %load _solutions/13-xarray1.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "The water level classes define different water depths. The pressure is a proxy for the water depth. Verify the relationship between the pressure and the level using a scatter plot. Does a larger value for the level represent deeper water depths or not?\n", " \n", "
Hints\n", " \n", "* If you get the error `ValueError: Dataset.plot cannot be called directly. Use an explicit plot method, e.g. ds.plot.scatter(...)`, read the message and do what it says.\n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray2.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Assume that buoyancy is defined by the following formula:\n", " \n", "$$g \\cdot ( 2\\times 10^{-4} \\cdot T - 7\\times 10^{-4} \\cdot P )$$\n", "\n", "With:\n", "- $g$ = 9.8\n", "- $T$ = temperature\n", "- $P$ = pressure\n", "\n", "Calculate the buoyancy and add it as a new variable `buoyancy` to the `argo` data set. \n", "\n", "Make a 2D (image) plot with the x-axis the date, the y-axis the water level and the color intensity the buoyancy. As the level represents the depth of the water, it makes more sense to have 0 (surface) at the top of the y-axis: switch the y-axis direction.\n", " \n", "
Hints\n", "\n", "* Remember that xarray works as Numpy and relies on the same broadcasting rules.\n", "* The `imshow` method does not work on irregular intervals. Matplotlib and xarray also have `pcolormesh`. \n", "* Look for options [in the xarray documentation](http://xarray.pydata.org/en/stable/plotting.html#other-axes-kwargs) to control the axis direction. (The `ax.invert_yaxis()` Matplotlib function is not supported for pcolormesh)\n", " \n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "# %load _solutions/13-xarray3.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray4.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray5.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Make a line plot of the salinity as a function of time at level 10\n", " \n", "
Hints\n", "\n", "Break it down into different steps and chain the individual steps:\n", " \n", "* From the argo data set, select the variable `salinity`. This is similar to selecting a column in Pandas.\n", "* Next, use the `sel` method to select the `level=10`\n", "* Next, use the `plot.line()` method.\n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray6.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "- Make a line plot of the temperature as a function of time for the levels 10, 20 and 30 at the same graph \n", "- Make a second line plot with each of the 3 levels (10, 20, 30) in a different subplot. \n", " \n", "
Hints\n", "\n", "Break it down into different steps and chain these individual steps:\n", " \n", "* From the argo data set, select the variable `temperature`. This is similar to selecting a column in Pandas.\n", "* Next, use the `sel` method to select the levels 10, 20 and 30.\n", "* Next, use the `plot.line()` method, but make sure the `hue` changes for each level\n", " \n", "For the subplots, check the [facetting documentation](http://xarray.pydata.org/en/stable/plotting.html#faceting) of xarray. \n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray7.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray8.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "You wonder how the temperature evolves with increasing latitude and what the effect is of the depth (level):\n", "\n", "- Create a scatter plot of the `level` as a function of the `temperature` colored by the `latitude`. \n", " \n", "- As a further exploration step, pick a subset of levels 1, 10, 25, and 50 and create a second scatter plot with in the x-axis the latitude of the measurement and in the y-axis the temperature. To compare the effect of the different levels, give each level a separate subplot next to each other.\n", " \n", "
Hints\n", "\n", "* In a scatter plot, the color or hue can be linked to a variable.\n", "* From the argo data set, use the `sel` method to select the levels 1, 10, 25, and 50.\n", "* For the second scatter plot, but make sure the `col` changes for each `level` and define which variables need to go to which axis.\n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray9.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray10.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Make an image plot of the temperature as a function of time. Divide the colormap in 3 discrete categories:\n", " \n", "* x < 5\n", "* 5 < x < 15\n", "* x > 15\n", " \n", "Choose a custom colormap and adjust the label of the colorbar to `'Temperature (°C)'`\n", " \n", "
\n", " \n", "Hints\n", "\n", "- Check the help of the `plot` function or the [xarray documentation](http://xarray.pydata.org/en/stable/plotting.html#discrete-colormaps) on discrete colormaps.\n", "- Adjustments to the colorbar settings can be defined with the `cbar_kwargs` as a dict. Adjust the `label` of the colorbar. \n", "\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray11.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Calculate the average salinity and temperature as a function of level over the measurements taken between 2012-10-01 and 2012-12-01. \n", "\n", "Make a separate line plot for each of them. Define the figure and 2 subplots first with Matplotlib. \n", " \n", "
Hints\n", "\n", "* xarray supports to query dates using a string representation.\n", "* Use the `slice` operator within the `sel` to select a range of the data.\n", "* Whereas in Numpy we used `axis` in reduction functions, xarray uses the `dim` name.\n", "* Also for line plots you can define which dimension should be on the x-axis and which on the y-axis by providing the name. \n", "* Use `fig, (ax0, ax1) = plt.subplots(1, 2)` to create subplots.\n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "# %load _solutions/13-xarray12.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true, "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# %load _solutions/13-xarray13.py" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Pandas for multiple dimensions..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true }, "outputs": [], "source": [ "argo = xr.load_dataset(\"./data/argo_float.nc\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "If we are interested in the _average over time_ for each of the levels, we can use a reducton function to get the averages of each of the variables at the same time:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.mean(dim=[\"date\"])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "But if we wanted the _average for each month of the year_ per level, we would first have to __split__ the data set in a group for each month of the year, __apply__ the average function on each of the months and __combine__ the data again. \n", "\n", "We already learned about the [split-apply-combine](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html) approach when using Pandas. The syntax of Xarray’s groupby is almost identical to Pandas! " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "First, extract the month of the year (1-> 12) from each of the date coordinates:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.date.dt.month # The coordinates is a Pandas datetime index" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can use these arrays in a groupby operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.groupby(argo.date.dt.month)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Xarray also offers a more concise syntax when the variable you're grouping on is already present in the dataset. This is identical to the previous line:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.groupby(\"date.month\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next, we apply an aggregation function _for each of the months_ over the `date` dimension in order to end up with: _for each month of the year, the average (over time) for each of the levels_:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.groupby(\"date.month\").mean(dim=\"date\") #[\"temperature\"].sel(level=1).to_series().plot.barh()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Another (alike) operation - specifically for time series data - is to `resample` the data to another time-aggregation. For example, resample to monthly (`1M`) or yearly (`1Y`) median values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.resample(date=\"1M\").median() # 1Y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo[\"salinity\"].sel(level=1).plot.line(x=\"date\");\n", "argo[\"salinity\"].resample(date=\"1M\").median().sel(level=1).plot.line(x=\"date\"); # 1Y" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "A similar, but different functionality is `rolling` to calculate rolling window aggregates:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo.rolling(level=10, center=True).std()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "argo[\"salinity\"].sel(date='2012-10-31').plot.line(y=\"level\", yincrease=False, color=\"grey\");\n", "argo[\"salinity\"].sel(date='2012-10-31').rolling(level=10, center=True).median().plot.line(y=\"level\", yincrease=False, linewidth=3, color=\"crimson\");\n", "plt.legend(), plt.title(\"\");" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "
\n", "\n", "**REMEMBER**:
\n", "\n", "The [xarray `groupby`](http://xarray.pydata.org/en/stable/groupby.html) with the same syntax as Pandas implements the __split-apply-combine__ strategy. Also [`resample`](http://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations) and [`rolling`](http://xarray.pydata.org/en/stable/computation.html?highlight=rolling#rolling-window-operations) are available in xarray.\n", " \n", "__Note:__ Xarray adds a [`groupby_bins`](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.groupby_bins.html#xarray.Dataset.groupby_bins) convenience function for binned groups (instead of each value).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "---------------\n", "\n", "__Note:__ Values are only read from disk when needed. For example, the following statement only reads the coordinate information and the metadata. The data itself is not yet loaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "gent = xr.open_rasterio(data_file)\n", "gent" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "`load()` will explicitly load the data into memory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": true, "editable": true, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "xr.open_rasterio(data_file).load()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Acknowledgements and great thanks to https://earth-env-data-science.github.io for the inspiration, data and examples." ] } ], "metadata": { "celltoolbar": "Nbtutor - export exercises", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }