\n",
"\n",
"---\n",
"\n",
"This notebook demonstrates the Siphon `remote_open` function, which opens a TDS Catalog remote dataset for random access. The `remote_open` method returns a file-like object that can be used similarly to a local file to read raw data.\n",
"\n",
"\n",
"\n",
"### Focuses\n",
"* Open remote datasets on the TDS\n",
"* Use the returned object to read the dataset as raw bytes\n",
"* Interface with the dataset as if stored in a local file\n",
"\n",
"### Objectives\n",
"1. [Find a dataset in a TDS Catalog](#1.-Find-a-dataset-in-a-TDS-Catalog)\n",
"1. [Open the dataset using remote_open](#2.-Open-the-dataset-using-remote_open)\n",
"1. [Read the returned object like a local file](#3.-Read-the-returned-object-like-a-local-file)\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Imports\n",
"Before beginning, let's import the packages to be used throughout this training:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from siphon.catalog import TDSCatalog"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Find a dataset in a TDS Catalog\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we use `remote_open`, we need to find a dataset that we'd like to access. \n",
"As an example, we'll use this [dataset](https://thredds-test.unidata.ucar.edu/thredds/catalog/casestudies/harvey/model/gfs/GFS_Global_0p5deg_20170825_1800.grib2/catalog.html?dataset=casestudies/harvey/model/gfs/GFS_Global_0p5deg_20170825_1800.grib2) from the NOAA NCEI THREDDS catalog."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access a dataset, we need to know two things:\n",
"* the url of the catalog where the dataset lives\n",
"* the dataset name \n",
"\n",
"The dataset name can be found on the [dataset HTML page](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2), e.g. \"nam_218_20210104_0600_006.grb2\". \n",
"The catalog URL is the URL of the dataset page up to \".html\", replacing \".html\" with \".xml\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"catUrl=\"https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml\"\n",
"datasetName=\"nam_218_20210104_0600_006.grb2\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we access the catalog using the catalog URL:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"catalog = TDSCatalog(catUrl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then select our dataset using the dataset name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = catalog.datasets[datasetName]\n",
"ds.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now view the access protocols available for our dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list(ds.access_urls)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list of services available for this dataset includes `HTTPServer`, which we'll need to open the dataset using `remote_open`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Open the dataset using `remote_open`\n",
"\n",
"We'll now use Siphon's `remote_open` to obtain a file-like object representing the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_file = ds.remote_open()\n",
"data_file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have an object that we can read similar to a local file. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data_file.readline()\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Note:* When we use `remote_open` to read a dataset, we are reading raw data from a file-like object, rather than formatted data. The `b` at the start of the data indicates that the string should be interpreted as bytes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Read the returned object like a local file\n",
"We can now read our dataset using random access."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can read a line, as we did in the previous section, or we can read a specified number of bytes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data_file.read(100)\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can change the our position in the file using `seek`, similar to moving a cursor in a file. The position is given as bytes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_file.seek(0) # move \"cursor\" to start of file\n",
"print(data_file.read(4)) # print first 4 bytes\n",
"data_file.seek(50) # move \"cursor\" to byte 50\n",
"print(data_file.read(10)) # print 10 more bytes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can read the data directly into a byte array."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = bytearray(100) # create a byte array of length 100\n",
"data_file.readinto(b) # read 100 bytes into the byte array\n",
"b[:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `getbuffer` returns the location in memory where the dataset is being stored locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = data_file.getbuffer()\n",
"b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the memory buffer to make local writes. Write to the buffer will change the contents of `data_file` in memory, but will not write to the remote file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_file.seek(100) # move \"cursor\" position to byte 100\n",
"b[100:110] = b\"helloworld\"; # we include the `b` before \"helloword\" to tell Python to interpret it as bytes\n",
"data_file.seek(100) # return \"cursor\" to byte 100\n",
"n = data_file.read(10) # read back the written bytes\n",
"n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have opened a remote dataset and read parts of it using random access! Use `remote_open` when you want access to the raw data in a dataset, e.g., if you have Python code to read bytes in a particular format.\n",
"\n",
"*Note:* Without some prior knowledge about the format of the dataset, `remote_open` is not an effective method of parsing data. Since we are reading a raw file object, we need to know layout of the data and the data types (e.g. ints, floats, etc.). To read a dataset as a netCDF object, use [`remote_access`](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_access)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## See also\n",
"\n",
"For more information on Siphon and `remote_open`, see the [Siphon docs](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_open).\n",
"\n",
"You may also be interested in reading more about the [file-like object](https://docs.python.org/3/library/io.html#io.BytesIO) returned by `remote_open`.\n",
"\n",
"### Related notebooks\n",
"[Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Top\n",
"\n",
"---"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:pyaos-ams-2021]",
"language": "python",
"name": "conda-env-pyaos-ams-2021-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}