{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"top\"></a>\n",
    "<div style=\"width:1000 px\">\n",
    "\n",
    "<div style=\"float:right; width:98 px; height:98px;\">\n",
    "<img src=\"https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png\" alt=\"Unidata Logo\" style=\"height: 98px;\">\n",
    "</div>\n",
    "\n",
    "<h1>Siphon (remote_open)</h1>\n",
    "<h3>Unidata AMS 2021 Student Conference</h3>\n",
    "\n",
    "<div style=\"clear:both\"></div>\n",
    "</div>\n",
    "\n",
    "---\n",
    "\n",
    "This notebook demonstrates the Siphon `remote_open` function, which opens a TDS Catalog remote dataset for random access. The `remote_open` method returns a file-like object that can be used similarly to a local file to read raw data.\n",
    "<div style=\"float:right; width:250 px\"><img src=\"../../instructors/images/siphon_remote_open_preview.png\" alt=\"raw GRIB data read using remote open\" style=\"height: 300px;\"></div>\n",
    "\n",
    "\n",
    "### Focuses\n",
    "* Open remote datasets on the TDS\n",
    "* Use the returned object to read the dataset as raw bytes\n",
    "* Interface with the dataset as if stored in a local file\n",
    "\n",
    "### Objectives\n",
    "1. [Find a dataset in a TDS Catalog](#1.-Find-a-dataset-in-a-TDS-Catalog)\n",
    "1. [Open the dataset using remote_open](#2.-Open-the-dataset-using-remote_open)\n",
    "1. [Read the returned object like a local file](#3.-Read-the-returned-object-like-a-local-file)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports\n",
    "Before beginning, let's import the packages to be used throughout this training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from siphon.catalog import TDSCatalog"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Find a dataset in a TDS Catalog\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we use `remote_open`, we need to find a dataset that we'd like to access.  \n",
    "As an example, we'll use this [dataset](https://thredds-test.unidata.ucar.edu/thredds/catalog/casestudies/harvey/model/gfs/GFS_Global_0p5deg_20170825_1800.grib2/catalog.html?dataset=casestudies/harvey/model/gfs/GFS_Global_0p5deg_20170825_1800.grib2) from the NOAA NCEI THREDDS catalog."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access a dataset, we need to know two things:\n",
    "* the url of the catalog where the dataset lives\n",
    "* the dataset name  \n",
    "\n",
    "The dataset name can be found on the [dataset HTML page](https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.html?dataset=model-namanl/202101/20210104/nam_218_20210104_0600_006.grb2), e.g. \"nam_218_20210104_0600_006.grb2\".  \n",
    "The catalog URL is the URL of the dataset page up to \".html\", replacing \".html\" with \".xml\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "catUrl=\"https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml\"\n",
    "datasetName=\"nam_218_20210104_0600_006.grb2\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we access the catalog using the catalog URL:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "catalog = TDSCatalog(catUrl)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then select our dataset using the dataset name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = catalog.datasets[datasetName]\n",
    "ds.name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now view the access protocols available for our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "list(ds.access_urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The list of services available for this dataset includes `HTTPServer`, which we'll need to open the dataset using `remote_open`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Open the dataset using `remote_open`\n",
    "\n",
    "We'll now use Siphon's `remote_open` to obtain a file-like object representing the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_file = ds.remote_open()\n",
    "data_file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have an object that we can read similar to a local file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = data_file.readline()\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note:* When we use `remote_open` to read a dataset, we are reading raw data from a file-like object, rather than formatted data. The `b` at the start of the data indicates that the string should be interpreted as bytes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Read the returned object like a local file\n",
    "We can now read our dataset using random access."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can read a line, as we did in the previous section, or we can read a specified number of bytes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = data_file.read(100)\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change the our position in the file using `seek`, similar to moving a cursor in a file. The position is given as bytes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_file.seek(0) # move \"cursor\" to start of file\n",
    "print(data_file.read(4)) # print first 4 bytes\n",
    "data_file.seek(50) # move \"cursor\" to byte 50\n",
    "print(data_file.read(10)) # print 10 more bytes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can read the data directly into a byte array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = bytearray(100) # create a byte array of length 100\n",
    "data_file.readinto(b) # read 100 bytes into the byte array\n",
    "b[:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling `getbuffer` returns the location in memory where the dataset is being stored locally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = data_file.getbuffer()\n",
    "b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the memory buffer to make local writes. Write to the buffer will change the contents of `data_file` in memory, but will not write to the remote file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_file.seek(100) # move \"cursor\" position to byte 100\n",
    "b[100:110] = b\"helloworld\"; # we include the `b` before \"helloword\" to tell Python to interpret it as bytes\n",
    "data_file.seek(100) # return \"cursor\" to byte 100\n",
    "n = data_file.read(10) # read back the written bytes\n",
    "n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have opened a remote dataset and read parts of it using random access! Use `remote_open` when you want access to the raw data in a dataset, e.g., if you have Python code to read bytes in a particular format.\n",
    "\n",
    "*Note:* Without some prior knowledge about the format of the dataset, `remote_open` is not an effective method of parsing data. Since we are reading a raw file object, we need to know layout of the data and the data types (e.g. ints, floats, etc.). To read a dataset as a netCDF object, use [`remote_access`](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_access)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## See also\n",
    "\n",
    "For more information on Siphon and `remote_open`, see the [Siphon docs](https://unidata.github.io/siphon/latest/api/catalog.html?highlight=remote%20open#siphon.catalog.Dataset.remote_open).\n",
    "\n",
    "You may also be interested in reading more about the [file-like object](https://docs.python.org/3/library/io.html#io.BytesIO) returned by `remote_open`.\n",
    "\n",
    "### Related notebooks\n",
    "[Siphon (remote_access)](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-RemoteAccess.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pyaos-ams-2021]",
   "language": "python",
   "name": "conda-env-pyaos-ams-2021-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}