{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Xarray Tips and Tricks\n", "\n", "\n", "*This material is adapted from [Earth and Environmental Data Science](https://earth-env-data-science.github.io/intro.html) by Ryan Abernathey (Columbia University)*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a multi-file dataset from an OpenDAP server\n", "\n", "One thing we love about xarray is the `open_mfdataset` function, which combines many netCDF files into a single xarray Dataset.\n", "\n", "But what if the files are stored on a remote server and accessed over OpenDAP? An example can be found in NOAA's NCEP Reanalysis catalog:\n", "\n", "https://www.esrl.noaa.gov/psd/thredds/catalog/Datasets/ncep.reanalysis/surface/catalog.html\n", "\n", "The dataset is split into separate files for each variable and year. For example, a single file for surface air temperature looks like:\n", "\n", "[http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1948.nc](https://psl.noaa.gov/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.1948.nc.html)\n", "\n", "We can't just call\n", "\n", "    open_mfdataset(\"http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.*.nc\")\n", "\n", "because wildcard expansion doesn't work with OpenDAP endpoints. The solution is to manually create a list of files to open." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [ "scroll-output" ] }, "outputs": [ { "data": { "text/plain": [ "['http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2015.nc',\n", " 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2016.nc',\n", " 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2017.nc',\n", " 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2018.nc']" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import xarray as xr\n", "%matplotlib inline\n", "\n", "base_url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995'\n", "\n", "files = [f'{base_url}.{year}.nc' for year in range(2015, 2019)]\n", "files" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [ "scroll-output", "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset>\n", "Dimensions: (lat: 73, lon: 144, time: 5844)\n", "Coordinates:\n", " * time (time) datetime64[ns] 2015-01-01 ... 2018-12-31T18:00:00\n", " * lat (lat) float32 90.0 87.5 85.0 82.5 80.0 ... -82.5 -85.0 -87.5 -90.0\n", " * lon (lon) float32 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5\n", "Data variables:\n", " air (time, lat, lon) float32 dask.array<chunksize=(1460, 73, 144), meta=np.ndarray>\n", "Attributes:\n", " Conventions: COARDS\n", " title: 4x daily NMC reanalysis (2014)\n", " history: created 2013/12 by Hoop (netCDF2.3)\n", " description: Data is from NMC initialized reanalysis\\...\n", " platform: Model\n", " dataset_title: NCEP-NCAR Reanalysis 1\n", " References: http://www.psl.noaa.gov/data/gridded/dat...\n", " DODS_EXTRA.Unlimited_Dimension: time
\n", "
<xarray.Dataset>\n", "Dimensions: (lat: 89, lon: 180, nbnds: 2, time: 708)\n", "Coordinates:\n", " * lat (lat) float32 88.0 86.0 84.0 82.0 ... -82.0 -84.0 -86.0 -88.0\n", " * lon (lon) float32 0.0 2.0 4.0 6.0 8.0 ... 352.0 354.0 356.0 358.0\n", " * time (time) datetime64[ns] 1960-01-01 1960-02-01 ... 2018-12-01\n", "Dimensions without coordinates: nbnds\n", "Data variables:\n", " time_bnds (time, nbnds) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36\n", " sst (time, lat, lon) float32 -1.8 -1.8 -1.8 -1.8 ... nan nan nan nan\n", "Attributes: (12/38)\n", " climatology: Climatology is based on 1971-2000 SST, X...\n", " description: In situ data: ICOADS2.5 before 2007 and ...\n", " keywords_vocabulary: NASA Global Change Master Directory (GCM...\n", " keywords: Earth Science > Oceans > Ocean Temperatu...\n", " instrument: Conventional thermometers\n", " source_comment: SSTs were observed by conventional therm...\n", " ... ...\n", " license: No constraints on data access or use\n", " comment: SSTs were observed by conventional therm...\n", " summary: ERSST.v5 is developed based on v4 after ...\n", " dataset_title: NOAA Extended Reconstructed SST V5\n", " data_modified: 2021-03-07\n", " DODS_EXTRA.Unlimited_Dimension: time