<a name="top"></a>
<div style="width:1000 px">

<div style="float:right; width:98 px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Siphon (subset)</h1>
<h3>Unidata AMS 2021 Student Conference</h3>

<div style="clear:both"></div>
</div>

---

This notebook will demonstrate how to use Siphon to subset and download data using the NetcdfSubset service (NCSS). NCSS supports coordinate-based subsetting, i.e. selecting data by latitude, longitude, time, etc.
<div style="float:right; width:250 px"><img src="../../instructors/images/siphon_subset_preview.png" alt="plot of requested subset of data" style="height: 300px;"></div>


### Focuses
* Use an NCSS client to view metadata of a dataset
* Build NCSS queries
    * Query point data
    * Query grid data
* Download data subsets by lat, lon, and time


### Objectives
1. [Find a dataset in a TDS Catalog](#1.-Find-a-dataset-in-a-TDS-Catalog)
1. [Create an NCSS client and access metadata](#2.-Create-an-NCSS-client-and-access-metadata)
1. [Use NCSS to query and subset data at a single point](#3.-Use-NCSS-to-query-and-subset-data-at-a-single-point)
1. [Use NCSS to query and subset data for a gridded region](#4.-Use-NCSS-to-query-and-subset-data-for-a-gridded-region)

---

### Imports
Before beginning, let's import the packages to be used throughout this training:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from siphon.catalog import TDSCatalog
from datetime import datetime, timedelta

---

## 1. Find a dataset in a TDS Catalog


Our first step is to find a dataset that we'd like to access and subset.  
In this example, we'll use the latest [`GFS Quarter Degree Forecast`](https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.html) dataset from the Unidata THREDDS catalog.

Let's start with the top level catalog:

In [None]:
top_cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')

And then navigate down two levels to the GFS catalog:

In [None]:
models_cat = top_cat.catalog_refs[0].follow() # follow() returns the catalog that the reference points to
gfs_cat = models_cat.catalog_refs['GFS Quarter Degree Forecast'].follow()

Finally, we get a handle for our dataset using `latest`:

In [None]:
ds = gfs_cat.latest
ds.name

We can now view the access protocols available for our dataset.

In [None]:
list(ds.access_urls)

This list includes the `NetcdfSubset` service (or NCSS), which is the service we'll be using to subset and download our data.

<a href="#top">Top</a>

---

## 2. Create an NCSS client and access metadata

To use the NetcdfSubset service, we first call `subset` to get an NCSS client.

In [None]:
ncss = ds.subset()

With this client, we can view the variables in our dataset.

In [None]:
list(ncss.variables)

We can also access the metadata, which is returned as an [NCSSDataset object](https://unidata.github.io/siphon/latest/api/ncssdataset.html#siphon.ncss_dataset.NCSSDataset).

In [None]:
metadata = ncss.metadata
# print metadata
print("time span: " + str(metadata.time_span))
print("\naccept list: " + str(metadata.accept_list))
print("\nlat_lon_box: " + str(metadata.lat_lon_box))

We will use this metadata to create our subset query in the next section.
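
As a rough illustration of how the metadata can inform a query, the sketch below clips a requested time window to a dataset's time span. It assumes the span is available as two ISO 8601 strings ending in `Z`; the strings used here are made up, and `clip_time_range` is a hypothetical helper, not part of Siphon.

```python
from datetime import datetime

ISO_Z = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format, e.g. 2021-01-10T12:00:00Z


def clip_time_range(start, end, span_begin, span_end):
    """Return the part of [start, end] inside the dataset span, or None.

    Hypothetical helper: start/end are naive UTC datetimes, span_begin/span_end
    are ISO 8601 strings in the assumed format above.
    """
    begin = datetime.strptime(span_begin, ISO_Z)
    finish = datetime.strptime(span_end, ISO_Z)
    clipped_start = max(start, begin)
    clipped_end = min(end, finish)
    if clipped_start > clipped_end:
        return None  # the request does not overlap the dataset at all
    return clipped_start, clipped_end


# Made-up span and request, for illustration only
window = clip_time_range(
    datetime(2021, 1, 10, 6), datetime(2021, 1, 11, 6),
    "2021-01-10T12:00:00Z", "2021-01-26T12:00:00Z",
)
print(window)
```

A check like this can catch a request that starts before the first forecast time or runs past the last one before it is sent to the server.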

<a href="#top">Top</a>

---

## 3. Use NCSS to query and subset data at a single point
We can now use our NCSS client to create a query for the data we want.

In this example, we'll request a subset of data containing the next 24 hours of forecast at a single point.  
First, we create a query object.

In [None]:
query = ncss.query()

Next, we populate the query to request the data we want.

In [None]:
query.lonlat_point(lon=-105, lat=40) # set coordinates of point of interest.
now = datetime.utcnow() # get current time
query.time_range(now, now + timedelta(days=1)) # create time range of 24 hours
query.variables('Temperature_surface') # request surface temperature variable
query.accept('netcdf4') # return data as a netCDF4 object
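
A populated query is ultimately sent to the server as URL parameters. The sketch below builds a comparable request by hand, assuming the NCSS parameter names `var`, `longitude`, `latitude`, `time_start`, `time_end`, and `accept`. The base URL is a placeholder, the start time is fixed for reproducibility, and the exact string Siphon produces may differ.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Placeholder endpoint; the real one comes from the dataset's access URLs
base_url = "https://example.com/thredds/ncss/grid/some/dataset"

start = datetime(2021, 1, 10, 12)  # fixed time so the result is reproducible
params = {
    "var": "Temperature_surface",                         # variable to return
    "longitude": -105,                                    # point of interest
    "latitude": 40,
    "time_start": start.isoformat(),                      # start of time range
    "time_end": (start + timedelta(days=1)).isoformat(),  # 24 hours later
    "accept": "netcdf4",                                  # response format
}

# urlencode percent-escapes characters such as ':' in the timestamps
request_url = base_url + "?" + urlencode(params)
print(request_url)
```

Seeing the request this way makes it clear that each query method only sets one or more key/value pairs; nothing is downloaded until the data is requested.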

Once our query is fully populated, we can request the data.

In [None]:
point_data = ncss.get_data(query)
list(point_data.variables)

Finally, let's plot our returned data.

In [None]:
temp = point_data.variables['Temperature_surface'][:] # get surface temperature data
time = point_data.variables['time'][:] # get time data
plt.plot(time, temp, 'k-'); # plot data
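
The raw arrays plot numeric time offsets against temperatures in the dataset's native units. The sketch below shows one way to make the axes more readable, assuming the time units look like `Hour since 2021-01-10T12:00:00Z` and the temperature is in kelvin (check each variable's `units` attribute before relying on this). Both helpers are hypothetical, and the input values are made up.

```python
from datetime import datetime, timedelta


def offsets_to_datetimes(offsets, units):
    """Convert hour offsets to datetimes (hypothetical helper).

    Assumes units of the form 'Hour since 2021-01-10T12:00:00Z'.
    """
    _, reference = units.split(" since ")
    origin = datetime.strptime(reference, "%Y-%m-%dT%H:%M:%SZ")
    return [origin + timedelta(hours=float(h)) for h in offsets]


def kelvin_to_celsius(values):
    """Convert a flat sequence of temperatures from kelvin to degrees Celsius."""
    return [float(v) - 273.15 for v in values]


# Made-up values standing in for flattened `time` and `temp` arrays
times = offsets_to_datetimes([0, 3, 6], "Hour since 2021-01-10T12:00:00Z")
temps_c = kelvin_to_celsius([273.15, 280.0, 285.65])
print(times[-1], temps_c)
```

The converted lists can be passed to `plt.plot` in place of the raw arrays.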

<a href="#top">Top</a>

---

## 4. Use NCSS to query and subset data for a gridded region
We can also request data for a region using a bounding box.

We start by creating a query object, just as before.

In [None]:
query = ncss.query()

We will populate this query with the same values as before, except that we'll use `lonlat_box` instead of `lonlat_point` and request a single time instead of a time range.

In [None]:
query.lonlat_box(east=-80, west=-90, south=35, north=45) # set bounding coordinates
query.time(now + timedelta(days=1))
query.variables('Temperature_surface')
query.accept('netcdf4')

Again, we request the data using `get_data`.

In [None]:
grid_data = ncss.get_data(query)
list(grid_data.variables)

And plot the surface temperature forecast in our region of interest for the requested time, 24 hours from now.

In [None]:
temp = grid_data.variables['Temperature_surface']
lat = grid_data.variables['lat']
lon = grid_data.variables['lon']
plt.pcolormesh(lon[:], lat[:], temp[0], shading='auto');
plt.title(temp.name);

Try creating your own NCSS query to request different subsets of data, e.g. different regions, different times...
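
As a starting point for experimenting with regions, the sketch below builds bounding-box arguments around a center point. `box_around` is a hypothetical helper, not part of Siphon; it clamps latitude to the poles but does not handle boxes that cross the antimeridian.

```python
def box_around(lon, lat, half_width):
    """Bounding box of +/- half_width degrees around a point (hypothetical helper)."""
    return {
        "west": lon - half_width,
        "east": lon + half_width,
        "south": max(lat - half_width, -90),  # clamp at the south pole
        "north": min(lat + half_width, 90),   # clamp at the north pole
    }


box = box_around(lon=-85, lat=40, half_width=5)
print(box)
# With a query object from this notebook: query.lonlat_box(**box)
```

With these inputs the helper reproduces the box used in this section, so changing the center or half-width is an easy way to explore other regions.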

<a href="#top">Top</a>

---

## See also

For more information on Siphon and using NCSS read the [docs](https://unidata.github.io/siphon/latest/api/ncss.html).

You can also read more about the NetcdfSubset service [here](https://www.unidata.ucar.edu/software/tds/current/reference/NetcdfSubsetServiceReference.html).


<a href="#top">Top</a>

---