{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# THREDDS Catalogs: Filtering Datasets\n", "\n", "### Unidata AMS 2021 Student Conference\n", "\n", "---\n", "
\n", "\n", "### Focuses\n", "* Filtering THREDDS Catalog datasets based on time\n", "\n", "### Objectives\n", "1. [Find the Dataset within a THREDDS Catalog nearest to a given time](#filter_nearest)\n", "1. [Find the Datasets within a THREDDS Catalog inside a given time range](#filter_range)\n", "\n", "---\n", "\n", "### Imports\n", "\n", "Often a dataset's name will contain useful information, such as the dataset's associated date and time.\n", "Siphon includes methods to help filter THREDDS Catalog datasets based on this type of naming convention, which makes finding the most recent dataset somewhat easier.\n", "We will use the built-in Python library `datetime` to set the desired dates/times used by the filter methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from siphon.catalog import TDSCatalog\n", "from datetime import datetime, timedelta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Find the Dataset within a THREDDS Catalog closest to a given time\n", "\n", "Building on the skills developed in the [THREDDS Catalogs: The Basics](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-catalog-basics.ipynb) notebook, we'll start by reading the catalog associated with the [Unidata NEXRAD Level3 n0r 1km composite THREDDS Catalog](https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.html) and looking at what is available:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n0r_composite_catalog = TDSCatalog('https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.xml')\n", "print(n0r_composite_catalog.catalog_url)\n", "print(' catalogs: {}'.format(n0r_composite_catalog.catalog_refs))\n", "print(' datasets: {}\\n'.format(n0r_composite_catalog.datasets))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once 
again we see this THREDDS Catalog points to other catalogs named by date (`yyyyMMdd`).\n", "Let's say we'd like to choose the catalog for today's date and list the datasets.\n", "However, \"today's date\" will change depending on when you run this notebook.\n", "No problem - `datetime` to the rescue!\n", "First, we can ask `datetime` to give us the current _datetime_ object.\n", "We use `datetime.utcnow()` rather than `datetime.now()` because the dates and times in the catalog and dataset names are in UTC, not local time:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Names on the server use UTC, so ask for the current time in UTC\n", "current_date_time = datetime.utcnow()\n", "print(' year: {}'.format(current_date_time.year))\n", "print(' month: {}'.format(current_date_time.month))\n", "print(' day: {}'.format(current_date_time.day))\n", "print(' hour: {}'.format(current_date_time.hour))\n", "print(' minute: {}'.format(current_date_time.minute))\n", "print(' second: {}'.format(current_date_time.second))\n", "print(' microsecond: {}'.format(current_date_time.microsecond))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we would like the catalog associated with the current date (`yyyyMMdd`), we can use the _datetime_ object to build the appropriate string:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog_name_for_today = current_date_time.strftime('%Y%m%d')\n", "print('Read the catalog named {}'.format(catalog_name_for_today))\n", "catalog_for_today = n0r_composite_catalog.catalog_refs[catalog_name_for_today].follow()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `strftime` method takes a format string (in this case, `'%Y%m%d'`).\n", "This tells the `datetime` library how the date/time should look when converted into a string.\n", "`%Y` means use a four-digit year, `%m` means use a two-digit month, and `%d` means use a two-digit day.\n", "A full description of all of the possible format codes can be found in the [Python `datetime` 
documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).\n", "\n", "Now that we have the catalog containing the data for today, let's examine the names of the first few datasets to get an idea of how they are named:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datasets_for_today = catalog_for_today.datasets\n", "datasets_for_today[0:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset names should look like `Level3_Composite_n0r_1km_20210109_2355.gini`.\n", "As we can see, the date and time are encoded within the filename.\n", "When the filename contains a date and time in the form `yyyyMMdd_HHmm`, Siphon can automatically filter for times using the `filter_time_nearest` or `filter_time_range` methods.\n", "These methods accept _datetime_ objects to describe the time (or times) you desire.\n", "For example, to find the dataset closest to the current date and time, we can use `filter_time_nearest` with `datetime.utcnow()` (the times in the dataset names are in UTC), like so:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "current_time = datetime.utcnow()\n", "most_recent_dataset = datasets_for_today.filter_time_nearest(current_time)\n", "print('Current date and time (UTC): {}'.format(current_time))\n", "print('Most Recent Dataset: {}'.format(most_recent_dataset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Find the Datasets within a THREDDS Catalog inside a given time range\n", "\n", "We can use the `filter_time_range` method to find the datasets associated with the past hour by supplying start and end times:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "one_hour_ago = current_time - timedelta(hours=1)\n", "datasets_from_last_hour = datasets_for_today.filter_time_range(one_hour_ago, current_time)\n", "print(' Start time: {}'.format(one_hour_ago))\n", "print(' 
End time: {}'.format(current_time))\n", "print(' Datasets between start and end times: {}'.format(datasets_from_last_hour))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These two filter methods make finding datasets within a catalog easier in simple cases where the date and time are part of the dataset names, like the ones described in this notebook.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "* [`datetime` format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes)\n", "* [Siphon documentation](https://unidata.github.io/siphon/latest/index.html)\n", "* [Siphon `filter_time_nearest` documentation](https://unidata.github.io/siphon/latest/api/catalog.html#siphon.catalog.DatasetCollection.filter_time_nearest)\n", "* [Siphon `filter_time_range` documentation](https://unidata.github.io/siphon/latest/api/catalog.html#siphon.catalog.DatasetCollection.filter_time_range)\n", "\n", "## Related Notebooks\n", "* [Siphon (catalog basics)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-catalog-basics.ipynb)\n", "* [Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)\n", "* [Siphon (remote_open)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteOpen.ipynb)\n", "* [Siphon (subset)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-Subset.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:pyaos-ams-2021]", "language": "python", "name": "conda-env-pyaos-ams-2021-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }