<a name="top"></a>
<div style="width:1000 px">

<div style="float:right; width:98 px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>THREDDS Catalogs: The Basics</h1>
<h3>Unidata AMS 2021 Student Conference</h3>

<div style="clear:both"></div>
</div>

---

<div style="float:right; width:250 px"><img src="../../instructors/images/siphon_tds_intro_preview.png" alt="HTML view of a TDS Catalog" style="height: 300px;"></div>

### Focuses
* Become familiar with THREDDS Catalogs and the THREDDS Data Server (TDS)
* Browse THREDDS Catalogs using Siphon
* Show metadata and available datasets contained within a THREDDS Catalog
* List the data access methods associated with a dataset

### Objectives
1. [Read a THREDDS Catalog](#read)
1. [Moving from one THREDDS Catalog to another](#follow)
1. [Working with a TDS Catalog Dataset](#dataset)
<br>
<br>
<br>
---

### Imports

The main Python package we will use to work with THREDDS Catalogs is called Siphon.
Siphon can read THREDDS Catalogs, which are XML documents that do one or more of the following:
1. reference other THREDDS Catalogs
1. expose metadata about a dataset
1. describe how to access a dataset

The XML documents themselves can be written by hand, but often they are generated by a server, such as the THREDDS Data Server.
They may be read locally from an XML file, or remotely over HTTP.
Siphon greatly simplifies the process of reading and using the information contained in the XML, allowing users to "siphon off data" from a variety of sources.

In [None]:
from siphon.catalog import TDSCatalog

---
## Read a THREDDS Catalog from a TDS <a name="read" />

For this notebook, we will use the Unidata demonstration TDS.
If you visit the server <https://thredds.ucar.edu/thredds/catalog/catalog.html> in your browser, you will see something like the image at the top of this notebook.
The page you see is actually a product of the TDS, and is generated by the server (we call this an HTML view of the catalog).
If you change the last part of the URL from `.html` to `.xml` (that is, <https://thredds.ucar.edu/thredds/catalog/catalog.xml>), you will see the actual THREDDS Catalog in your browser, which looks similar to this:

~~~xml
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink"   name="Unidata THREDDS Data Server" version="1.0.1">
  <dataset name="Realtime data from IDD">
    <catalogRef xlink:href="idd/forecastModels.xml" xlink:title="Forecast Model Data" name=""/>
    <catalogRef xlink:href="idd/forecastProdsAndAna.xml" xlink:title="Forecast Products and Analyses" name=""/>
    <catalogRef xlink:href="idd/obsData.xml" xlink:title="Observation Data" name=""/>
    <catalogRef xlink:href="idd/radars.xml" xlink:title="Radar Data" name=""/>
    <catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name=""/>
  </dataset>
  <dataset name="Other Unidata Data">
    <catalogRef xlink:href="casestudies/catalog.xml" xlink:title="Unidata case studies" name=""/>
  </dataset>
</catalog>
~~~

This catalog tells us that there are other catalogs containing data from forecast models, radar data, satellite data, etc.
In this sense, a catalog can point to other catalogs, creating a tree-like structure in which the datasets are organized.
This will vary from server to server, as needs vary across organizations and groups.
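
To make the idea of "a catalog pointing to other catalogs" concrete, the sketch below parses a trimmed copy of the sample XML above using only the Python standard library. It is an illustration of the document structure, not a description of how Siphon is implemented; the function name `list_catalog_refs` and the variable `SAMPLE_CATALOG` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces that appear in the sample catalog shown above.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# A trimmed copy of the sample catalog (illustration only).
SAMPLE_CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         name="Unidata THREDDS Data Server" version="1.0.1">
  <dataset name="Realtime data from IDD">
    <catalogRef xlink:href="idd/radars.xml" xlink:title="Radar Data" name=""/>
    <catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name=""/>
  </dataset>
  <dataset name="Other Unidata Data">
    <catalogRef xlink:href="casestudies/catalog.xml" xlink:title="Unidata case studies" name=""/>
  </dataset>
</catalog>
"""


def list_catalog_refs(xml_text):
    """Return {title: href} for every catalogRef element, in document order."""
    root = ET.fromstring(xml_text)
    refs = {}
    for ref in root.iter(f"{{{THREDDS_NS}}}catalogRef"):
        title = ref.get(f"{{{XLINK_NS}}}title")
        href = ref.get(f"{{{XLINK_NS}}}href")
        refs[title] = href
    return refs


refs = list_catalog_refs(SAMPLE_CATALOG)
for title, href in refs.items():
    print(f"{title} -> {href}")
```

Each `catalogRef` carries a human-readable title and a relative link to another catalog, which is the information Siphon exposes in a friendlier form.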

We can use Siphon to read in this remote catalog programmatically, without the need for a web browser:

In [None]:
catalog = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/catalog.xml')

Now that we have read in the THREDDS Catalog from the THREDDS Data Server, we can investigate what information it holds.
The names of the other catalogs it points to are held in the `catalog_refs` instance attribute, which can be accessed as follows:

In [None]:
catalog.catalog_refs

There are more things you can do once you have read a THREDDS Catalog using Siphon, but for now we'll leave it at this.

<div class="alert alert-block alert-info">
<b>Note: Not all TDS catalogs are intended to be browsed directly:</b> Occasionally, the TDS is used purely as "middleware", and the catalogs are not set up for users to browse directly.
An example of this would be the catalogs produced by the TDS serving data for the <b>N</b>orth <b>A</b>merica component of the <b>Co</b>ordinated <b>R</b>egional <b>D</b>ownscaling <b>Ex</b>periment (<a href="https://na-cordex.org/index.html">NA-CORDEX</a>).
The intent of the data providers is for users to search for datasets using the <a href="https://na-cordex.org/index.html">NA-CORDEX search page</a> on the <a href="https://www.earthsystemgrid.org/">NCAR Climate Data Gateway</a>, which lets users search for datasets by variable type, experiment, driver, model, etc.
Although the NA-CORDEX datasets are hosted on a TDS, they are all contained in one catalog, and their names are defined using a combination of the parameters used in the NA-CORDEX search page.
For example, one of the over 62,000 NA-CORDEX datasets is named <i>cordex.raw.NAM-44.seas.RCA4.EC-EARTH.rcp26.prec.v20180914</i>.
When you see a THREDDS Catalog in which the datasets have opaque names like this, that's your clue that the catalogs are probably not intended to be browsed directly by users, but rather accessed through another service (such as the NA-CORDEX search interface on the Climate Data Gateway).
</div>

<a href="#top">Top</a>

---

## Reading a referenced catalog  <a name="follow" />

If we'd like to see what is available in the `'Satellite Data'` catalog, we can use the `.follow()` method to read in the new catalog, and look at the `.catalog_refs` instance attribute of the new catalog:

In [None]:
satellite_catalog = catalog.catalog_refs['Satellite Data'].follow()
satellite_catalog.catalog_refs

The URL of the new catalog is in the `catalog_url` instance attribute, and can be accessed as follows:

In [None]:
satellite_catalog.catalog_url
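
The `catalogRef` entries in the sample XML shown earlier use relative links such as `idd/satellite.xml`. A relative link has to be resolved against the URL of the catalog that contains it. The sketch below shows that resolution with the standard library; it illustrates the general idea only and is not taken from Siphon's source. The variable names are invented for the example.

```python
from urllib.parse import urljoin

# URL of the top-level catalog read earlier in this notebook.
parent_url = "https://thredds.ucar.edu/thredds/catalog/catalog.xml"

# Relative href copied from the sample catalog XML shown earlier.
relative_href = "idd/satellite.xml"

# Standard URL resolution: the last path segment of the parent URL
# (catalog.xml) is replaced by the relative reference.
resolved = urljoin(parent_url, relative_href)
print(resolved)
```

If you are running the notebook live, comparing `resolved` with `satellite_catalog.catalog_url` is a quick way to check your understanding of how the catalogs link together.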

Any datasets described by the catalog are contained in the `datasets` instance attribute:

In [None]:
satellite_catalog.datasets

The `[]` indicates there are no datasets contained within the catalog.
We can continue to work our way down through the catalog structure until we reach a catalog that contains a dataset.

In [None]:
goes_east_grb_catalog = satellite_catalog.catalog_refs['GOES East GOES Rebroadcast (GRB)'].follow()
print(goes_east_grb_catalog.catalog_url)
print('  catalogs: {}'.format(goes_east_grb_catalog.catalog_refs))
print('  datasets: {}\n'.format(goes_east_grb_catalog.datasets))

abi_catalog = goes_east_grb_catalog.catalog_refs['ABI'].follow()
print(abi_catalog.catalog_url)
print('  catalogs: {}'.format(abi_catalog.catalog_refs))
print('  datasets: {}\n'.format(abi_catalog.datasets))

conus_catalog = abi_catalog.catalog_refs['CONUS'].follow()
print(conus_catalog.catalog_url)
print('  catalogs: {}'.format(conus_catalog.catalog_refs))
print('  datasets: {}\n'.format(conus_catalog.datasets))

channel01_catalog = conus_catalog.catalog_refs['Channel01'].follow()
print(channel01_catalog.catalog_url)
print('  catalogs: {}'.format(channel01_catalog.catalog_refs))
print('  datasets: {}\n'.format(channel01_catalog.datasets))

date_catalog = channel01_catalog.catalog_refs['20210110'].follow()
print(date_catalog.catalog_url)
print('  catalogs: {}'.format(date_catalog.catalog_refs))
print('  datasets: {}\n'.format(date_catalog.datasets))
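
The cell above repeats the same "look up a reference by name, then follow it" step five times. One possible way to express that pattern is a small helper like the one sketched below. `follow_path` is a hypothetical name, not part of Siphon; it only relies on the `catalog_refs` attribute and the `follow()` method used above.

```python
def follow_path(catalog, names):
    """Follow a sequence of catalog reference names, starting from `catalog`.

    Works with any object that has a `catalog_refs` mapping whose values
    offer a `follow()` method, which is the interface used in the cell above.
    """
    current = catalog
    for name in names:
        if name not in current.catalog_refs:
            available = list(current.catalog_refs)
            raise KeyError(f"{name!r} not found; available references: {available}")
        current = current.catalog_refs[name].follow()
    return current


# Example use (needs network access and the `catalog` object read earlier):
# date_catalog = follow_path(
#     catalog,
#     ['Satellite Data', 'GOES East GOES Rebroadcast (GRB)',
#      'ABI', 'CONUS', 'Channel01', '20210110'],
# )
```

Raising an error that lists the available names makes it easier to spot a misspelled reference or a date catalog that has already been removed from the server.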

We used the `follow()` method several times before finally reaching a catalog with datasets.
Normally, it is easiest to browse the catalogs of a TDS using a web browser in order to find a dataset collection that you might be interested in using.
Once you have found a dataset you are interested in, you can use the URL from your browser to begin working in Python using Siphon.
For this collection of data (channel 1 of the Advanced Baseline Imager instrument on the GOES East satellite, CONUS domain), the catalog <https://thredds.ucar.edu/thredds/catalog/satellite/goes/east/grb/ABI/CONUS/Channel01/catalog.xml> looks like a good place to start, as it points to catalogs named by date (`yyyyMMdd`).
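
When copying a URL from the browser, it will typically end in `.html` (the HTML view) rather than `.xml` (the catalog itself). The sketch below shows one way to make that swap in code, assuming the only difference between the two URLs is the file extension, as described earlier in this notebook. `to_xml_catalog_url` is a hypothetical helper written for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xml_catalog_url(url):
    """Swap a trailing `.html` in the URL path for `.xml`; leave other URLs unchanged."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".xml"
    return urlunsplit(parts._replace(path=path))


# HTML view of the Channel01 catalog, as it would appear in a browser.
browser_url = (
    "https://thredds.ucar.edu/thredds/catalog/satellite/goes/east/grb/"
    "ABI/CONUS/Channel01/catalog.html"
)
xml_url = to_xml_catalog_url(browser_url)
print(xml_url)
```

Working on the parsed path rather than the whole string means a query string or fragment, if present, is left untouched.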

<div class="alert alert-block alert-danger">
<b>Real-time data availability:</b> In general, the datasets available on the demonstration TDS managed by Unidata are updated in real time.
Data are removed from the server after a certain period of time, typically between three days and one month (depending on the size of the data files).
This collection contains, roughly, the most recent 14 days of data.
</div>
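
Because old data are removed, a hard-coded date such as `'20210110'` will eventually stop working. A possible workaround is to pick the most recent date-named entry from whatever the catalog currently lists. The sketch below assumes the reference names follow the `yyyyMMdd` pattern mentioned above; `latest_date_name` and the example names are hypothetical and stand in for `list(channel01_catalog.catalog_refs)`.

```python
from datetime import datetime


def latest_date_name(names):
    """Return the most recent name that parses as yyyyMMdd, or None if there is none."""
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, "%Y%m%d"), name))
        except ValueError:
            continue  # skip entries that are not named by date
    return max(dated)[1] if dated else None


# Hypothetical reference names; on a live server, use
# list(channel01_catalog.catalog_refs) instead.
example_names = ["20210108", "20210109", "20210110", "not-a-date"]
print(latest_date_name(example_names))
```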

As mentioned at the beginning of this notebook, catalogs can expose metadata about a dataset.
The `metadata` instance attribute holds any metadata defined by the catalog, such as `dataFormat`, `documentation`, etc.
For example, the metadata associated with `date_catalog` looks like:

In [None]:
date_catalog.metadata

The amount of metadata contained within a catalog depends on how much effort has been put into curating the collection.

<a href="#top">Top</a>

---

## Working with a TDS Catalog Dataset <a name="dataset" />
Once we have found a catalog with datasets, we can access one of the datasets using its name:

In [None]:
dataset = date_catalog.datasets['OR_ABI-L1b-RadC-M6C01_G16_s20210100156163_e20210100158536_c20210100158591.nc']

Now that we have a dataset, we can list the ways in which it can be accessed using the `access_urls` instance attribute:

In [None]:
dataset.access_urls

Each service provides a unique way of accessing the metadata or actual data contained within the dataset.
Other Siphon notebooks explore ways in which the services can be used, but at this point, you are ready to begin your data analysis journey!
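
Since `access_urls` maps service names to URLs, a script can choose among the services a server happens to offer. The sketch below shows one way to do that with an ordered list of preferences. The helper `pick_access_url`, the service names, and the URLs are placeholders chosen for illustration; the services actually available vary from server to server, so check the output of `dataset.access_urls` first.

```python
def pick_access_url(access_urls, preferred):
    """Return (service, url) for the first preferred service that is offered."""
    for service in preferred:
        if service in access_urls:
            return service, access_urls[service]
    raise KeyError(
        f"none of {list(preferred)} found; available: {list(access_urls)}"
    )


# Hypothetical stand-in for dataset.access_urls (names and URLs are placeholders).
example_access_urls = {
    "HTTPServer": "https://example.org/thredds/fileServer/some/file.nc",
    "OPENDAP": "https://example.org/thredds/dodsC/some/file.nc",
}

service, url = pick_access_url(
    example_access_urls, ["CdmRemote", "OPENDAP", "HTTPServer"]
)
print(service, url)
```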

<a href="#top">Top</a>

---

## See also
* [Siphon  documentation](https://unidata.github.io/siphon/latest/index.html)
* [Siphon TDSCatalog class documentation](https://unidata.github.io/siphon/latest/api/catalog.html#siphon.catalog.TDSCatalog)

## Related Notebooks
* [Siphon (catalog filtering)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-catalog-filtering.ipynb)
* [Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)
* [Siphon (remote_open)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteOpen.ipynb)
* [Siphon (subset)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-Subset.ipynb)

## Example TDS instances
* [Unidata Demonstration TDS](https://thredds.ucar.edu/thredds/catalog/catalog.html)
* [Department of Atmospheric and Oceanic Sciences - University of Wisconsin – Madison](https://thredds.aos.wisc.edu/thredds/catalog/catalog.html)
* [Northwest Knowledge Network (NKN) - University of Idaho](https://www.reacchpna.org/thredds/catalog/catalog.html)
* [Coastal Data Information Program (CDIP) - University of California San Diego](https://thredds.cdip.ucsd.edu/thredds/catalog/catalog.html)
* NOAA Servers
  * [National Centers for Environmental Information (NCEI)](https://www.ncei.noaa.gov/thredds/catalog/catalog.html)
  * [Center for Satellite Applications and Research (STAR)](https://www.star.nesdis.noaa.gov/thredds/catalog/catalog.html)
  * [Environmental Research Division (ERD) - Southwest Fisheries Science Center](https://oceanwatch.pfeg.noaa.gov/thredds/catalog.html)
  * [Center for Operational Oceanographic Products and Services (CO-OPS)](https://opendap.co-ops.nos.noaa.gov/thredds/catalog/catalog.html)
* NASA Servers
  * [Jet Propulsion Laboratory (JPL) Physical Oceanography Distributed Active Archive Center (PO.DAAC)](https://thredds.jpl.nasa.gov/thredds/catalog/catalog.html)
  * [Oak Ridge National Laboratory (ORNL) DAAC](https://thredds.daac.ornl.gov/thredds/catalogs/ornldaac/ornldaac.html)
* USGS Servers
  * [Center for Integrated Data Analytics](https://cida.usgs.gov/thredds/catalog/catalog.html)
  * [Woods Hole Coastal and Marine Science Center](https://geoport.whoi.edu/thredds/bathy_catalog.html) (Topography/Bathymetry)


<a href="#top">Top</a>

---