<a name="top"></a>
<div style="width:1000px">

<div style="float:right; width:98px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Siphon (remote_open)</h1>
<h3>Unidata AMS 2021 Student Conference</h3>

<div style="clear:both"></div>
</div>

---

This notebook demonstrates Siphon's `remote_open` method, which opens a remote dataset from a TDS catalog for random access. `remote_open` returns a file-like object that can be read much like a local file to get at the raw data.
<div style="float:right; width:250px"><img src="../../instructors/images/siphon_remote_open_preview.png" alt="raw GRIB data read using remote open" style="height: 300px;"></div>


### Focuses
* Open remote datasets on the TDS
* Use the returned object to read the dataset as raw bytes
* Interface with the dataset as if stored in a local file

### Objectives
1. [Find a dataset in a TDS Catalog](#1.-Find-a-dataset-in-a-TDS-Catalog)
1. [Open the dataset using remote_open](#2.-Open-the-dataset-using-remote_open)
1. [Read the returned object like a local file](#3.-Read-the-returned-object-like-a-local-file)

---

### Imports
Before beginning, let's import the packages to be used throughout this training:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from siphon.catalog import TDSCatalog

---

## 1. Find a dataset in a TDS Catalog


Before we use `remote_open`, we need to find a dataset that we'd like to access.  
As an example, we'll use this [dataset](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2) from the NOAA NCEI THREDDS catalog.

To access a dataset, we need to know two things:
* the URL of the catalog where the dataset lives
* the dataset name  

The dataset name can be found on the [dataset HTML page](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2), e.g. "nam_218_20210104_0600_006.grb2".  
The catalog URL is the dataset page URL with everything from ".html" onward (including the `?dataset=...` query string) replaced by ".xml".

In [None]:
catUrl = "https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml"
datasetName = "nam_218_20210104_0600_006.grb2"
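
The URL rewrite described above can also be done in code. The sketch below is only an illustration: `catalog_xml_url` is a hypothetical helper (not part of Siphon), and taking the dataset name from the last path segment of the page URL is an assumption that happens to hold for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def catalog_xml_url(dataset_page_url):
    """Hypothetical helper: convert a dataset HTML page URL to its catalog XML URL."""
    parts = urlsplit(dataset_page_url)
    if not parts.path.endswith(".html"):
        raise ValueError("expected a URL whose path ends in .html")
    # Swap the .html suffix for .xml
    xml_path = parts.path[: -len(".html")] + ".xml"
    # Drop the query string (?dataset=...) and any fragment
    return urlunsplit((parts.scheme, parts.netloc, xml_path, "", ""))


page = (
    "https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/"
    "catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2"
)
cat_url = catalog_xml_url(page)
# Assumption: the dataset name is the last "/"-separated piece of the page URL
dataset_name = page.rsplit("/", 1)[-1]
print(cat_url)
print(dataset_name)
```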

Next, we access the catalog using the catalog URL:

In [None]:
catalog = TDSCatalog(catUrl)

And then select our dataset using the dataset name:

In [None]:
ds = catalog.datasets[datasetName]
ds.name

We can now view the access protocols available for our dataset.

In [None]:
list(ds.access_urls)

The list of services available for this dataset includes `HTTPServer`, which we'll need to open the dataset using `remote_open`.
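
If you want to check for a service before opening a dataset, something like the following may help. It is a sketch under assumptions: `url_for_service` is a hypothetical helper, and a plain dictionary with made-up `example.invalid` URLs stands in for `ds.access_urls`, which is assumed here to map service names to URLs.

```python
def url_for_service(access_urls, service="HTTPServer"):
    """Hypothetical helper: return the URL for a service, ignoring case."""
    available = {name.lower(): url for name, url in access_urls.items()}
    try:
        return available[service.lower()]
    except KeyError:
        raise LookupError(
            f"{service} not offered; available services: {sorted(access_urls)}"
        ) from None


# Stand-in for ds.access_urls (placeholder URLs, not a real server)
example_urls = {
    "OPENDAP": "https://example.invalid/thredds/dodsC/file.grb2",
    "HTTPServer": "https://example.invalid/thredds/fileServer/file.grb2",
}
http_url = url_for_service(example_urls)
print(http_url)
```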

<a href="#top">Top</a>

---

## 2. Open the dataset using `remote_open`

We'll now use Siphon's `remote_open` to obtain a file-like object representing the dataset.

In [None]:
data_file = ds.remote_open()
data_file

We now have an object that we can read much like a local file.

In [None]:
data = data_file.readline()
data

*Note:* When we use `remote_open` to read a dataset, we are reading raw data from a file-like object rather than parsed, formatted data. The `b` prefix on the output indicates a `bytes` object, not a text string.
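
To make the difference between bytes and text concrete, here is a small standalone example. The byte string is made up for illustration and is not read from the dataset.

```python
raw = b"GRIB\x00\x00\x00\x02"  # made-up sample bytes, not real dataset contents

print(type(raw))  # a bytes object, not a str
print(raw[:4])    # slicing bytes gives bytes
print(raw[7])     # indexing a single position gives an int

# Bytes become text only when decoded with a suitable encoding
text = raw[:4].decode("ascii")
print(text)

# The remaining bytes viewed as a list of integers
print(list(raw[4:]))
```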

<a href="#top">Top</a>

---

## 3. Read the returned object like a local file
We can now read our dataset using random access.

We can read a line, as we did in the previous section, or we can read a specified number of bytes.

In [None]:
data = data_file.read(100)
data

We can change our position in the file using `seek`, similar to moving a cursor in a file. The position is given as a byte offset from the start of the file.

In [None]:
data_file.seek(0) # move "cursor" to start of file
print(data_file.read(4)) # print first 4 bytes
data_file.seek(50) # move "cursor" to byte 50
print(data_file.read(10)) # print 10 more bytes
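
`seek` can also move relative to the current position or to the end of the file, and `tell` reports where the cursor is. The sketch below uses a local `io.BytesIO` filled with 100 known bytes as a stand-in for `data_file`, so it runs without a network connection.

```python
import io
import os

f = io.BytesIO(bytes(range(100)))  # local stand-in: bytes 0, 1, ..., 99

f.seek(10)                  # absolute: move to byte 10
first = f.read(5)           # reads bytes 10-14 and advances the cursor
pos_after_read = f.tell()   # cursor now sits at byte 15

f.seek(5, os.SEEK_CUR)      # relative: skip ahead 5 bytes, to byte 20
relative = f.read(1)

f.seek(-4, os.SEEK_END)     # from the end: 4 bytes before the end
tail = f.read()             # read everything that remains

size = f.seek(0, os.SEEK_END)  # seek returns the new position: the total size
print(first, pos_after_read, relative, tail, size)
```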

And we can read the data directly into a byte array.

In [None]:
b = bytearray(100) # create a byte array of length 100
data_file.readinto(b) # read 100 bytes into the byte array
b[:]
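
`readinto` returns the number of bytes it actually read, which can be fewer than the length of the array near the end of the file. A small sketch, again using a local `io.BytesIO` as a stand-in for `data_file`:

```python
import io

f = io.BytesIO(b"abcdefghij")  # local stand-in holding 10 bytes
buf = bytearray(8)

n1 = f.readinto(buf)  # fills all 8 slots
n2 = f.readinto(buf)  # only 2 bytes remain; the rest of buf keeps old data
chunk = bytes(buf[:n2])  # keep only the bytes that were just read
n3 = f.readinto(buf)  # at end of file, nothing is read

print(n1, n2, chunk, n3)
```

Slicing with the returned count, as in `buf[:n2]`, avoids mistaking leftover bytes from an earlier read for new data.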

Calling `getbuffer` returns a readable and writable view (a `memoryview`) of the dataset's bytes as held in local memory, without copying them.

In [None]:
b = data_file.getbuffer()
b

We can use the memory buffer to make local writes. Writing to the buffer changes the contents of `data_file` in memory, but does not write to the remote file.

In [None]:
data_file.seek(100) # move "cursor" position to byte 100
b[100:110] = b"helloworld" # we include the `b` before "helloworld" to tell Python to interpret it as bytes
data_file.seek(100) # return "cursor" to byte 100
n = data_file.read(10) # read back the written bytes
n
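
One caveat when working with such a view on an `io.BytesIO` object: while the view exists, the underlying object cannot be resized, so ordinary `write` calls fail until the view is released. The sketch below shows this on a small local `io.BytesIO` rather than on `data_file`.

```python
import io

f = io.BytesIO(b"0123456789")  # local stand-in
view = f.getbuffer()

view[2:5] = b"abc"   # in-place edit; the replacement must have the same length
f.seek(0)
contents = f.read()  # the edit is visible through the file object

try:
    f.seek(0, io.SEEK_END)
    f.write(b"more")  # rejected while the view is still exported
    resized = True
except BufferError:
    resized = False

view.release()        # give the buffer back
f.seek(0, io.SEEK_END)
f.write(b"more")      # now the write succeeds
final = f.getvalue()
print(contents, resized, final)
```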

We have opened a remote dataset and read parts of it using random access! Use `remote_open` when you want access to the raw data in a dataset, e.g., if you have Python code to read bytes in a particular format.

*Note:* Without some prior knowledge about the format of the dataset, `remote_open` is not an effective method of parsing data. Since we are reading a raw file object, we need to know the layout of the data and the data types (e.g. ints, floats, etc.). To read a dataset as a netCDF object, use [`remote_access`](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_access).
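
As an illustration of what "knowing the layout" means, the sketch below decodes the 16-byte indicator section that begins a GRIB edition 2 message, using Python's `struct` module. It assumes the GRIB2 Section 0 layout (4-byte `GRIB` marker, 2 reserved bytes, 1-byte discipline, 1-byte edition, 8-byte big-endian total length). `parse_grib2_indicator` is a hypothetical helper, and the header it reads here is synthetic, not taken from the dataset above.

```python
import io
import struct

# Assumed GRIB2 Section 0 layout: marker, 2 reserved bytes, discipline,
# edition, total message length (big-endian, unsigned 64-bit)
GRIB2_INDICATOR = struct.Struct(">4s2xBBQ")


def parse_grib2_indicator(fobj):
    """Hypothetical helper: decode a GRIB2 indicator section from a file-like object."""
    header = fobj.read(GRIB2_INDICATOR.size)
    if len(header) < GRIB2_INDICATOR.size:
        raise ValueError("not enough bytes for a GRIB2 indicator section")
    magic, discipline, edition, total_length = GRIB2_INDICATOR.unpack(header)
    if magic != b"GRIB":
        raise ValueError("missing GRIB marker")
    return {"discipline": discipline, "edition": edition, "total_length": total_length}


# Synthetic 16-byte header for demonstration only
fake = io.BytesIO(b"GRIB" + b"\x00\x00" + bytes([0, 2]) + (16).to_bytes(8, "big"))
info = parse_grib2_indicator(fake)
print(info)
```

With a real file, the same function could be called on the object returned by `remote_open` after `seek(0)`, provided the file really is GRIB edition 2.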

<a href="#top">Top</a>

---

## See also

For more information on Siphon and `remote_open`, see the [Siphon docs](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_open).

You may also be interested in reading more about the [file-like object](https://docs.python.org/3/library/io.html#io.BytesIO) returned by `remote_open`.

### Related notebooks
[Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)

<a href="#top">Top</a>

---