{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Siphon (remote_open)\n", "\n", "### Unidata AMS 2021 Student Conference\n", "
\n", "\n", "---\n", "\n", "This notebook demonstrates the Siphon `remote_open` function, which opens a TDS Catalog remote dataset for random access. The `remote_open` method returns a file-like object that can be used similarly to a local file to read raw data.\n", "
\n", "\n", "\n", "### Focuses\n", "* Open remote datasets on the TDS\n", "* Use the returned object to read the dataset as raw bytes\n", "* Interface with the dataset as if stored in a local file\n", "\n", "### Objectives\n", "1. [Find a dataset in a TDS Catalog](#1.-Find-a-dataset-in-a-TDS-Catalog)\n", "2. [Open the dataset using remote_open](#2.-Open-the-dataset-using-remote_open)\n", "3. [Read the returned object like a local file](#3.-Read-the-returned-object-like-a-local-file)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports\n", "Before beginning, let's import the packages to be used throughout this training:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from siphon.catalog import TDSCatalog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Find a dataset in a TDS Catalog\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we use `remote_open`, we need to find a dataset that we'd like to access. \n", "As an example, we'll use this [dataset](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2) from the NOAA NCEI THREDDS catalog." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access a dataset, we need to know two things:\n", "* the URL of the catalog where the dataset lives\n", "* the dataset name \n", "\n", "The dataset name can be found on the [dataset HTML page](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2), e.g. \"nam_218_20210104_0600_006.grb2\". 
\n", "The catalog URL is the part of the dataset page URL before the \"?\", with \".html\" replaced by \".xml\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catUrl = \"https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml\"\n", "datasetName = \"nam_218_20210104_0600_006.grb2\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we access the catalog using the catalog URL:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "catalog = TDSCatalog(catUrl)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then select our dataset using the dataset name:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = catalog.datasets[datasetName]\n", "ds.name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now view the access protocols available for our dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(ds.access_urls)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The list of services available for this dataset includes `HTTPServer`, which we'll need to open the dataset using `remote_open`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Open the dataset using `remote_open`\n", "\n", "We'll now use Siphon's `remote_open` to obtain a file-like object representing the dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_file = ds.remote_open()\n", "data_file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have an object that we can read much like a local file. 
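
As a rough local illustration of what file-like means here, the sketch below reads the same bytes from a real local file and from an `io.BytesIO` object (the in-memory type linked in the See also section). The `payload` bytes and the temporary file are made up for the example and are not part of the dataset; no network access is needed.

```python
import io
import os
import tempfile

# Hypothetical payload standing in for the bytes of a remote dataset.
payload = b'GRIB' + bytes(range(16))

# Write the payload to a real local file so the two objects can be compared.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    local_path = tmp.name

# In-memory stand-in for the object returned by remote_open.
in_memory = io.BytesIO(payload)

with open(local_path, 'rb') as local_file:
    # The same calls work on both objects and return the same bytes.
    first_local = local_file.read(4)
    first_memory = in_memory.read(4)
    local_file.seek(10)
    in_memory.seek(10)
    later_local = local_file.read(3)
    later_memory = in_memory.read(3)

os.remove(local_path)  # clean up the temporary file

print(first_local == first_memory)  # True
print(later_local == later_memory)  # True
```

The only difference that matters in this sketch is where the bytes live: on disk for the local file, in memory for the stand-in.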
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = data_file.readline()\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* When we use `remote_open` to read a dataset, we are reading raw data from a file-like object, rather than formatted data. The `b` at the start of the data indicates that the string should be interpreted as bytes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Read the returned object like a local file\n", "We can now read our dataset using random access." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can read a line, as we did in the previous section, or we can read a specified number of bytes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = data_file.read(100)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can change the our position in the file using `seek`, similar to moving a cursor in a file. The position is given as bytes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_file.seek(0) # move \"cursor\" to start of file\n", "print(data_file.read(4)) # print first 4 bytes\n", "data_file.seek(50) # move \"cursor\" to byte 50\n", "print(data_file.read(10)) # print 10 more bytes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can read the data directly into a byte array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = bytearray(100) # create a byte array of length 100\n", "data_file.readinto(b) # read 100 bytes into the byte array\n", "b[:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling `getbuffer` returns the location in memory where the dataset is being stored locally." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = data_file.getbuffer()\n", "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the memory buffer to make local writes. Writing to the buffer changes the contents of `data_file` in memory, but does not write to the remote file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_file.seek(100) # move \"cursor\" position to byte 100\n", "b[100:110] = b\"helloworld\" # the `b` prefix before \"helloworld\" tells Python to treat it as bytes\n", "data_file.seek(100) # return \"cursor\" to byte 100\n", "n = data_file.read(10) # read back the written bytes\n", "n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have opened a remote dataset and read parts of it using random access! Use `remote_open` when you want access to the raw data in a dataset, e.g., if you have Python code to read bytes in a particular format.\n", "\n", "*Note:* Without some prior knowledge about the format of the dataset, `remote_open` is not an effective method of parsing data. Since we are reading a raw file object, we need to know the layout of the data and the data types (e.g. ints, floats, etc.). 
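
As a minimal sketch of what knowing the layout buys you, the example below decodes a fixed 16-byte header with Python's `struct` module. It assumes the file starts with a GRIB edition 2 indicator section (4-byte `GRIB` marker, 2 reserved bytes, discipline, edition, 8-byte total message length, all big-endian). The header bytes are synthetic stand-ins for the first bytes of `data_file`, and the field values are invented.

```python
import io
import struct

# Assumed layout of the 16-byte GRIB edition 2 indicator section (big-endian):
# 4-byte marker, 2 reserved bytes, discipline, edition, 8-byte message length.
INDICATOR = struct.Struct('>4sHBBQ')

# Synthetic header (invented values) standing in for the start of data_file.
fake_file = io.BytesIO(INDICATOR.pack(b'GRIB', 0, 0, 2, 1234) + bytes(8))

fake_file.seek(0)  # start of the file
header = fake_file.read(INDICATOR.size)  # read exactly 16 bytes
magic, _reserved, discipline, edition, length = INDICATOR.unpack(header)

print(magic, discipline, edition, length)
```

With the real dataset, the same `INDICATOR.unpack` call could be applied to `data_file.read(16)` after `data_file.seek(0)`, provided the file really begins with that section.
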
To read a dataset as a netCDF object, use [`remote_access`](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_access)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "For more information on Siphon and `remote_open`, see the [Siphon docs](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_open).\n", "\n", "You may also be interested in reading more about the [file-like object](https://docs.python.org/3/library/io.html#io.BytesIO) returned by `remote_open`.\n", "\n", "### Related notebooks\n", "[Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top\n", "\n", "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:pyaos-ams-2021]", "language": "python", "name": "conda-env-pyaos-ams-2021-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }