{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading Zarr Data\n", "\n", "[Zarr](https://zarr.readthedocs.io/en/stable/) is a storage format for chunked, N-dimensional arrays. It works well with object storage systems like [Azure Blob Storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) and open-source libraries like [xarray](https://xarray.pydata.org/en/stable/). It's widely used in the geosciences, especially within the [Pangeo](https://pangeo.io/) community.\n", "\n", "This example loads [Daymet](https://aka.ms/ai4edata-daymet) data that are stored in Zarr format into an xarray Dataset. Daymet provides gridded weather data for North America. We'll look at [daily frequency data covering Hawaii](https://planetarycomputer.microsoft.com/dataset/daymet-daily-hi).\n", "\n", "The [STAC Collections](../reading-stac) provided by the Planetary Computer contain assets with links to the root of the Zarr store." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pystac_client\n", "import planetary_computer\n", "\n", "catalog = pystac_client.Client.open(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1/\",\n", " modifier=planetary_computer.sign_inplace,\n", ")\n", "collection = catalog.get_collection(\"daymet-daily-hi\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll access the data using the `zarr-abfs` asset, which uses [adlfs](https://github.com/fsspec/adlfs) and [fsspec](https://filesystem-spec.readthedocs.io/) to load the data from Azure Blob Storage." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "asset = collection.assets[\"zarr-abfs\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Zarr assets provided by the Planetary Computer implement the [xarray-assets](https://github.com/stac-extensions/xarray-assets) extension. 
The extension's fields specify the required and recommended keywords to use when loading the data with [`fsspec`-based filesystems](https://filesystem-spec.readthedocs.io/en/latest/) and xarray." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['xarray:open_kwargs', 'xarray:storage_options']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(asset.extra_fields.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, the dataset should be opened with consolidated metadata." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset>\n",
       "Dimensions:                  (time: 14965, y: 584, x: 284, nv: 2)\n",
       "Coordinates:\n",
       "    lat                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>\n",
       "    lon                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>\n",
       "  * time                     (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...\n",
       "  * x                        (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06\n",
       "  * y                        (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05\n",
       "Dimensions without coordinates: nv\n",
       "Data variables:\n",
       "    dayl                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    lambert_conformal_conic  int16 ...\n",
       "    prcp                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    srad                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    swe                      (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    time_bnds                (time, nv) datetime64[ns] dask.array<chunksize=(365, 2), meta=np.ndarray>\n",
       "    tmax                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    tmin                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    vp                       (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    yearday                  (time) int16 dask.array<chunksize=(365,), meta=np.ndarray>\n",
       "Attributes:\n",
       "    Conventions:       CF-1.6\n",
       "    Version_data:      Daymet Data Version 4.0\n",
       "    Version_software:  Daymet Software Version 4.0\n",
       "    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...\n",
       "    references:        Please see http://daymet.ornl.gov/ for current informa...\n",
       "    source:            Daymet Software Version 4.0\n",
       "    start_year:        1980
" ], "text/plain": [ "\n", "Dimensions: (time: 14965, y: 584, x: 284, nv: 2)\n", "Coordinates:\n", " lat (y, x) float32 dask.array\n", " lon (y, x) float32 dask.array\n", " * time (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...\n", " * x (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06\n", " * y (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05\n", "Dimensions without coordinates: nv\n", "Data variables:\n", " dayl (time, y, x) float32 dask.array\n", " lambert_conformal_conic int16 ...\n", " prcp (time, y, x) float32 dask.array\n", " srad (time, y, x) float32 dask.array\n", " swe (time, y, x) float32 dask.array\n", " time_bnds (time, nv) datetime64[ns] dask.array\n", " tmax (time, y, x) float32 dask.array\n", " tmin (time, y, x) float32 dask.array\n", " vp (time, y, x) float32 dask.array\n", " yearday (time) int16 dask.array\n", "Attributes:\n", " Conventions: CF-1.6\n", " Version_data: Daymet Data Version 4.0\n", " Version_software: Daymet Software Version 4.0\n", " citation: Please see http://daymet.ornl.gov/ for current Daymet ...\n", " references: Please see http://daymet.ornl.gov/ for current informa...\n", " source: Daymet Software Version 4.0\n", " start_year: 1980" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray as xr\n", "\n", "ds = xr.open_zarr(\n", " asset.href,\n", " **asset.extra_fields[\"xarray:open_kwargs\"],\n", " storage_options=asset.extra_fields[\"xarray:storage_options\"]\n", ")\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point we can load the data, aggregate it, and plot it." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import warnings\n", "import matplotlib.pyplot as plt\n", "\n", "warnings.simplefilter(\"ignore\", RuntimeWarning)\n", "fig, ax = plt.subplots(figsize=(12, 12))\n", "ds.sel(time=\"2009\")[\"tmax\"].mean(dim=\"time\").plot.imshow(ax=ax, cmap=\"inferno\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more\n", "\n", "The xarray [User Guide](https://xarray.pydata.org/en/stable/io.html#zarr) has more information on reading Zarr data. For more about the Daymet dataset, see [here](http://aka.ms/ai4edata-daymet)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }