{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Demo notebook for accessing NSRDB data on Azure\n", "\n", "This notebook provides an example of accessing National Solar Radiation Database (NSRDB) data from blob storage on Azure. The data are stored as one HDF5 file per year.\n", "\n", "NSRDB data are stored in the East US Azure region, so this notebook will run most efficiently on Azure compute located in East US. We recommend that substantial computation depending on NSRDB data also be situated in East US. You don't want to download hundreds of terabytes to your laptop! If you are using these data for environmental science applications, consider applying for an [AI for Earth grant](http://aka.ms/ai4egrants) to support your compute requirements.\n", "\n", "This notebook was adapted from the [NREL NSRDB/HSDS example](https://github.com/NREL/hsds-examples/blob/master/notebooks/03_NSRDB_introduction.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports and constants" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "import pandas as pd\n", "import planetary_computer\n", "\n", "from adlfs import AzureBlobFileSystem\n", "\n", "# Year to investigate and plot\n", "year = 2015\n", "\n", "# Storage resources\n", "storage_account_name = 'nrel'\n", "folder = 'nrel-nsrdb/v3'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List the data files\n", "\n", "We can use `adlfs` to list available files (one per year):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 23 annual files:\n", "nrel-nsrdb/v3/nsrdb_1998.h5\n", "nrel-nsrdb/v3/nsrdb_1999.h5\n", "nrel-nsrdb/v3/nsrdb_2000.h5\n", "nrel-nsrdb/v3/nsrdb_2001.h5\n", "nrel-nsrdb/v3/nsrdb_2002.h5\n", "nrel-nsrdb/v3/nsrdb_2003.h5\n", "nrel-nsrdb/v3/nsrdb_2004.h5\n", "nrel-nsrdb/v3/nsrdb_2005.h5\n", 
"nrel-nsrdb/v3/nsrdb_2006.h5\n", "nrel-nsrdb/v3/nsrdb_2007.h5\n", "...\n" ] } ], "source": [ "fs = AzureBlobFileSystem(\n", " account_name=storage_account_name,\n", " credential=planetary_computer.sas.get_token(\"nrel\", \"nrel-nsrdb\").token\n", ")\n", "annual_files = fs.glob(folder + '/*.h5')\n", "print('Found {} annual files:'.format(len(annual_files)))\n", "for path in annual_files[:10]:\n", " print(path)\n", "\n", "print('...')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Open one data file with xarray" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset>\n",
       "Dimensions:                   (phony_dim_0: 17520, phony_dim_1: 2018392,\n",
       "                               phony_dim_2: 2)\n",
       "Dimensions without coordinates: phony_dim_0, phony_dim_1, phony_dim_2\n",
       "Data variables: (12/28)\n",
       "    air_temperature           (phony_dim_0, phony_dim_1) int8 ...\n",
       "    alpha                     (phony_dim_0, phony_dim_1) int16 ...\n",
       "    aod                       (phony_dim_0, phony_dim_1) int16 ...\n",
       "    asymmetry                 (phony_dim_0, phony_dim_1) int16 ...\n",
       "    cld_opd_dcomp             (phony_dim_0, phony_dim_1) int16 ...\n",
       "    cld_reff_dcomp            (phony_dim_0, phony_dim_1) int16 ...\n",
       "    ...                        ...\n",
       "    surface_albedo            (phony_dim_0, phony_dim_1) int16 ...\n",
       "    surface_pressure          (phony_dim_0, phony_dim_1) int16 ...\n",
       "    time_index                (phony_dim_0) |S30 ...\n",
       "    total_precipitable_water  (phony_dim_0, phony_dim_1) int16 ...\n",
       "    wind_direction            (phony_dim_0, phony_dim_1) int16 ...\n",
       "    wind_speed                (phony_dim_0, phony_dim_1) int16 ...\n",
       "Attributes:\n",
       "    Version:  3.0.6
" ], "text/plain": [ "\n", "Dimensions: (phony_dim_0: 17520, phony_dim_1: 2018392,\n", " phony_dim_2: 2)\n", "Dimensions without coordinates: phony_dim_0, phony_dim_1, phony_dim_2\n", "Data variables: (12/28)\n", " air_temperature (phony_dim_0, phony_dim_1) int8 ...\n", " alpha (phony_dim_0, phony_dim_1) int16 ...\n", " aod (phony_dim_0, phony_dim_1) int16 ...\n", " asymmetry (phony_dim_0, phony_dim_1) int16 ...\n", " cld_opd_dcomp (phony_dim_0, phony_dim_1) int16 ...\n", " cld_reff_dcomp (phony_dim_0, phony_dim_1) int16 ...\n", " ... ...\n", " surface_albedo (phony_dim_0, phony_dim_1) int16 ...\n", " surface_pressure (phony_dim_0, phony_dim_1) int16 ...\n", " time_index (phony_dim_0) |S30 ...\n", " total_precipitable_water (phony_dim_0, phony_dim_1) int16 ...\n", " wind_direction (phony_dim_0, phony_dim_1) int16 ...\n", " wind_speed (phony_dim_0, phony_dim_1) int16 ...\n", "Attributes:\n", " Version: 3.0.6" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "file = fs.open(f\"nrel-nsrdb/v3/nsrdb_{year}.h5\")\n", "ds = xr.open_dataset(file, backend_kwargs={\"phony_dims\": \"sort\"}, engine=\"h5netcdf\")\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Explore metadata" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(17520, 2018392)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Datasets are stored in 2D grids of size [time x location]\n", "dset = ds['ghi']\n", "dset.shape" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 00:30:00',\n", " '2015-01-01 01:00:00', '2015-01-01 01:30:00',\n", " '2015-01-01 02:00:00', '2015-01-01 02:30:00',\n", " '2015-01-01 03:00:00', '2015-01-01 03:30:00',\n", " '2015-01-01 04:00:00', '2015-01-01 04:30:00',\n", " ...\n", " '2015-12-31 19:00:00', '2015-12-31 
19:30:00',\n", " '2015-12-31 20:00:00', '2015-12-31 20:30:00',\n", " '2015-12-31 21:00:00', '2015-12-31 21:30:00',\n", " '2015-12-31 22:00:00', '2015-12-31 22:30:00',\n", " '2015-12-31 23:00:00', '2015-12-31 23:30:00'],\n", " dtype='datetime64[ns]', length=17520, freq=None)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract datetime index for datasets\n", "time_index = pd.to_datetime(ds['time_index'][...].astype(str))\n", "time_index # temporal resolution is 30min" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "[2018392 values with dtype=[('latitude', '" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = pd.DataFrame({\"ghi\": subset, \"lat\": CA.latitude, \"lon\": CA.longitude}).plot.scatter(x=\"lon\", y=\"lat\", c=\"ghi\", cmap=\"viridis\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 4 }