{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "
\n", "\"Unidata\n", "
\n", "\n", "

xarray: Data Access

\n", "

Unidata AMS 2021 Student Conference

\n", "\n", "
\n", "
\n", "\n", "---\n", "\n", "Using [xarray](http://xarray.pydata.org/en/stable/), you can work with a variety of multi-dimensional array-based data formats and access methods. NetCDF is the first and best option as a well-supported, metadata-rich format, but GRIB and others are also supported. In this notebook, we will be demonstrating various ways to access data in xarray.\n", "\n", "
\"HTML
\n", "\n", "\n", "### Focuses\n", "\n", "- Open a local NetCDF file into xarray and preview its contents\n", "- Open a remote dataset from a THREDDS server using OPENDAP and look through its metadata\n", "- Open a local GRIB file using cfgrib and learn how to work with the multiple datasets that often arise from a single GRIB file\n", "\n", "\n", "### Objectives\n", "\n", "1. [Access local NetCDF data](#1.-Access-local-NetCDF-data)\n", "1. [Access remote OPENDAP data](#2.-Access-remote-OPENDAP-data)\n", "1. [Access local GRIB data](#3.-Access-local-GRIB-data)\n", "1. [Access data using Siphon](#4.-Access-data-using-Siphon)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cfgrib\n", "from datetime import datetime, timedelta\n", "from siphon.catalog import TDSCatalog\n", "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Access local NetCDF data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Xarray's data model strongly resembles that of NetCDF, so it is no surprise that NetCDF format data is well-supported in xarray. First, let's try opening a local NetCDF file. All you need to do is pass your (relative or absolute) file path to `xr.open_dataset`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "radar_data = xr.open_dataset(\"../../instructors/practice_files/spc_torn_2015_588559.nc\")\n", "\n", "radar_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how xarray identifies all the dimensions, coordinates, data variables and global attributes that are in your NetCDF file. Also, if you're running this yourself, notice the speed of this operation. By default, this operation is just loading in the coordinate and attribute metadata, and not the data variables themselves (which are instead only loaded when they are accessed/used). This is called *lazy loading*, and is one of the key features that helps make xarray easy to work with.\n", "\n", "Let's load in some of those data and preview what it looks like! (For more information on the `.isel` part of the line of code below, be sure to check out the [xarray: indexing training](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/xarray_indexing.ipynb)!)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "radar_data['reflectivity'].isel(time=2).plot.imshow()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From here, your xarray Dataset is ready to work with (see all the other things you can do with xarray at [its documentation](http://xarray.pydata.org/en/stable/) or [with the other training notebooks linked below](#See-also))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Access remote OPENDAP data\n", "\n", "What if you don't have your data locally, but instead hosted remotely, on say a THREDDS server? Xarray can still help you here! Instead of specifying a filepath to `xr.open_dataset`, xarray also natively handles OPENDAP URLs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gfs_data = xr.open_dataset(\n", " \"https://thredds.unidata.ucar.edu/thredds/dodsC/casestudies/python-gallery/\"\n", " \"GFS_20101026_1200.nc\"\n", ")\n", "\n", "gfs_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice here how it pulls all of the metadata (dimensions, coordinates, and attributes), but doesn't actually actually download the data variable contents yet. Again, this is the benefit of xarray's lazy-loading approach. Feel free to click around the Dataset HTML object above to see what is contained in this GFS dataset.\n", "\n", "One additional word of caution with OPENDAP: sometimes when you copy and past an OPENDAP link from a THREDDS server, you'll get a string that looks like this:\n", " \n", "```\n", "https://thredds.unidata.ucar.edu/thredds/dodsC/casestudies/python-gallery/NARR_19930313_1800.nc.html\n", "```\n", "\n", "Xarray doesn't handle that `.html`, so you'll have to remove it before you run the Jupyter cell in your code. See how in the example above that has also been done for you.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Access local GRIB data\n", "\n", "On its own, xarray does not natively support GRIB file formats. However, it (fairly seamlessly) integrates with other packages that extend its supported file types to include GRIB. The package we'll be using is called `cfgrib`.\n", "\n", "While you are able to use `xr.open_dataset` with GRIB files when cfgrib is installed by specifying the `engine='cfgrib'` keyword argument, because of how cfgrib works, this will often require you to specify custom parameters (e.g., `filter_by_keys`) to resolve conflicting coordinates. Instead, it is much easier (so long as you have enough capacity to load your entire file into memory) to just use `cfgrib.open_datasets`, which opens all the data in the GRIB file into separate datasets to resolve any conflicts. You can then search through that list of datasets to find the fields that you need.\n", "\n", "Here, we have a GRIB file that produces just one dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mrms_datasets = cfgrib.open_datasets(\n", " \"../../instructors/practice_files/\"\n", " \"MRMS_MergedReflectivityQCComposite_00.50_20150607-060039.grib2\",\n", " backend_kwargs={'indexpath':''} # don't use cached idx file, which often gives warning\n", ")\n", "\n", "mrms_datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See how we have just one dataset in this list, so, to access it, we index into that list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mrms_data = mrms_datasets[0]\n", "mrms_data['paramId_0'].plot.imshow(vmin=0, vmax=70)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Access data with Siphon\n", "\n", "Siphon can also be used to access remote data and load it into xarray. Two methods are shown below: first, using `use_xarray=True` in the `remote_access` method, and second, using the `xr.backends.NetCDF4DataStore` wrapper to handle other NetCDF4-like data loading methods.\n", "\n", "For more details on using Siphon, please glance through all the [Siphon remote access trainings available on the course website](https://unidata.github.io/pyaos-ams-2021/resources.html#data-access).\n", "\n", "### `use_xarray=True`\n", "\n", "See the [remote access with Siphon training notebook](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb), from which this example comes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catUrl = \"https://thredds-test.unidata.ucar.edu/thredds/catalog/casestudies/harvey/model/gfs/GFS_Global_0p5deg_20170825_1800.grib2/catalog.xml\"\n", "datasetName = \"GFS_Global_0p5deg_20170825_1800.grib2\"\n", "catalog = TDSCatalog(catUrl)\n", "ds = catalog.datasets[datasetName]\n", "\n", "dataset = ds.remote_access(use_xarray=True)\n", "dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Backend wrapper\n", "\n", "See the [siphon subset notebook](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-Subset.ipynb), from which this example comes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "top_cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')\n", "models_cat = top_cat.catalog_refs[0].follow() # follow reaturns a handle to the specified dataset\n", "gfs_cat = models_cat.catalog_refs['GFS Quarter Degree Forecast'].follow()\n", "\n", "ds = gfs_cat.latest\n", "ncss = ds.subset()\n", "\n", "query = ncss.query()\n", "query.lonlat_point(lon=-105, lat=40) # set coordinates of point of interest.\n", "now = datetime.utcnow() # get current time\n", "query.time_range(now, now + timedelta(days=1)) # create time range of 24 hours\n", "query.variables('Temperature_surface') # request surface temperature variable\n", "query.accept('netcdf4') # return data as a netCDF4 object\n", "\n", "grid_data = xr.open_dataset(xr.backends.NetCDF4DataStore(ncss.get_data(query))) # wrap NetCDF4-like object so that xarray accepts it\n", "grid_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "There's a lot more to xarray than just opening files! Be sure to take a look at the other xarray training notebooks linked below:\n", "\n", "- [Indexing with xarray](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/xarray_indexing.ipynb)\n", "- [Interpolation and regridding with xarray](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/xarray_interpolation.ipynb)\n", "- [Xarray aggregation operations](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/xarray_aggregations.ipynb)\n", "- [Calculations in xarray](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/xarray_calculations.ipynb)\n", "- [MetPy with xarray](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/analysis/metpy_and_xarray.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:pyaos-ams-2021]", "language": "python", "name": "conda-env-pyaos-ams-2021-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }