{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring netCDF Files\n", "\n", "This notebook provides discussion, examples, and best practices for working with netCDF files in Python.\n", "Topics include:\n", "\n", "* The [`netcdf4-python`](http://http://unidata.github.io/netcdf4-python/) library\n", "* The [`salishsea_tools.nc_tools`](http://salishsea-meopar-tools.readthedocs.org/en/latest/SalishSeaTools/salishsea-tools.html#module-nc_tools) code module\n", "* Reading netCDF files into Python data structures\n", "* Exploring netCDF dataset dimensions, variables, and attributes\n", "* Working with netCDF variable data as [NumPy](http://www.numpy.org/) arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is a companion to the [Exploring netCDF Datasets from ERDDAP](https://nbviewer.jupyter.org/github/SalishSeaCast/tools/blob/master/analysis_tools/Exploring%20netCDF%20Files.ipynb) notebook.\n", "That notebooks focusses on reading data from netCDF datasets stored on\n", "[ERDDAP](http://coastwatch.pfeg.noaa.gov/erddap/) servers on the Internet.\n", "The Salish Sea project maintains an ERDDAP server at https://salishsea.eos.ubc.ca/erddap/.\n", "\n", "Creating netCDF files and working with their attribute metadata is documented elsewhere:\n", "http://salishsea-meopar-docs.readthedocs.org/en/latest/code-notes/salishsea-nemo/nemo-forcing/netcdf4.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [`netcdf4-python`](http://unidata.github.io/netcdf4-python/) library\n", "does all of the heavy lifting to let us work with netCDF files and their data.\n", "Follow the link to get to the library documentation.\n", "The [salishsea_tools.nc_tools](http://salishsea-meopar-tools.readthedocs.org/en/latest/SalishSeaTools/salishsea-tools.html#module-nc_tools) code module provides some shortcut functions for exploring netCDF datasets.\n", "Let's go ahead and import those two packages,\n", "We'll also import `numpy` because we're going to use it later and it's good Python form\n", "to keep all of our imports at the top of the file.\n", "\n", "This notebook assumes that you are working in Python 3.\n", "If you don't have a Python 3 environment set up,\n", "please see our\n", "[Anaconda Python Distribution](http://salishsea-meopar-docs.readthedocs.org/en/latest/work_env/anaconda_python.html)\n", "docs for instructions on how to set one up." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import netCDF4 as nc\n", "import numpy as np\n", "\n", "from salishsea_tools import nc_tools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that:\n", "\n", "* By convention, we alias `netCDF4` to `nc` and `numpy` to `np`\n", "so that we don't have to type as much\n", "* For the same reason we use the `from ... import ...` form to get `nc_tools`\n", "so that we can avoid typing `salishsea_tools.nc_tools` everywhere" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`netCDF` provides a `Dataset` object that allows us to load the contents\n", "of a netCDF file into a Python data structure by simply passing in the\n", "path and file name.\n", "Let's explore the Salish Sea NEMO model bathymetry data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "grid = nc.Dataset('../../NEMO-forcing/grid/bathy_meter_SalishSea2.nc')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:**\n", "\n", "`netCDF4.Dataset` can also access datasets stored on [ERDDAP](http://coastwatch.pfeg.noaa.gov/erddap/) servers\n", "like https://salishsea.eos.ubc.ca/erddap/.\n", "If you don't have local access to the `tools/NEMO-forcing/grid/bathy_meter_SalishSea2.nc`,\n", "please see the [Exploring netCDF Datasets from ERDDAP](https://nbviewer.jupyter.org/github/SalishSeaCast/tools/blob/master/analysis_tools/Exploring%20netCDF%20Files.ipynb) notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "netCDF files are organized around 4 big concepts:\n", "\n", "* groups\n", "* dimensions\n", "* variables\n", "* attributes\n", "\n", "NEMO doesn't use netCDF groups, so we'll ignore them.\n", "\n", "`nc_tools` provides useful (convenience) functions to look at the other 3." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ": name = 'y', size = 898\n", "\n", ": name = 'x', size = 398\n", "\n" ] } ], "source": [ "nc_tools.show_dimensions(grid)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "odict_keys(['nav_lon', 'nav_lat', 'Bathymetry'])\n" ] } ], "source": [ "nc_tools.show_variables(grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have a dataset that has 2 dimensions called `y` and `x`\n", "of size 898 and 398, respectively,\n", "and 3 variables called `nav_lon`, `nav_lat`, and `Bathymetry`.\n", "We'll see how the dimensions and variables are related,\n", "and how to work with the data in the variables in a moment,\n", "but first, let's look at the dataset attributes:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file format: NETCDF4\n", "Conventions: CF-1.6\n", "title: Salish Sea NEMO Bathymetry\n", "institution: Dept of Earth, Ocean & Atmospheric Sciences, University of British Columbia\n", "references: https://bitbucket.org/salishsea/nemo-forcing/src/tip/grid/bathy_meter_SalishSea.nc\n", "comment: Based on 1_bathymetry_seagrid_WestCoast.nc file from 2-Oct-2013 WCSD_PREP tarball provided by J-P Paquin.\n", "source: \n", "https://github.com/SalishSeaCast/tools/blob/master/bathymetry/SalishSeaBathy.ipynb\n", "https://github.com/SalishSeaCast/tools/blob/master/bathymetry/SmoothMouthJdF.ipynb\n", "\n", "history: \n", "[2013-10-30 13:18] Created netCDF4 zlib=True dataset.\n", "[2013-10-30 15:22] Set depths between 0 and 4m to 4m and those >428m to 428m.\n", "[2013-10-31 17:10] Algorithmic smoothing.\n", "[2013-11-21 19:53] Reverted to pre-smothing dataset (repo rev 3b301b5b9b6d).\n", "[2013-11-21 20:14] Updated dataset and variable attributes to CF-1.6 conventions & project standards.\n", "[2013-11-21 20:47] Removed east end of Jervis Inlet and Toba Inlet region due to deficient source bathymetry data in Cascadia dataset.\n", "[2013-11-21 21:52] Algorithmic smoothing.\n", "[2014-01-01 14:44] Smoothed mouth of Juan de Fuca\n", "\n" ] } ], "source": [ "nc_tools.show_dataset_attrs(grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "netCDF attributes are metadata.\n", "In the cast of the dataset attributes they tell us about the dataset as a whole:\n", "how, when, and by whom it was created, how it has been modified, etc.\n", "The meanings of the various attributes and the conventions for them that we use\n", "in the Salish Sea MEOPAR project are documented [elsewhere](http://salishsea-meopar-docs.readthedocs.org/en/latest/code-notes/salishsea-nemo/nemo-forcing/netcdf4.html).\n", "Variables also have attributes and `nc_tools` provides a function to display them too:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "float64 nav_lon(y, x)\n", " units: degrees east\n", " valid_range: [-126.40029144 -121.31835175]\n", " long_name: Longitude\n", "unlimited dimensions: \n", "current shape = (898, 398)\n", "filling on, default _FillValue of 9.969209968386869e+36 used\n", "\n" ] } ], "source": [ "nc_tools.show_variable_attrs(grid, 'nav_lon')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tells us a whole lot of useful information about the longitude data values in\n", "our bathymetry dataset, for instance:\n", "\n", "* They are 64-bit floating point values\n", "* They are associated with the `y` and `x` dimensions, in that order\n", "* The units are degrees measured eastward (presumably from the Greenwich meridian)\n", "* etc.\n", "\n", "You can list as many variable names as you want in the `show_variable_attrs()` call\n", "to get information about several variables at once.\n", "If you don't provide any variables names,\n", "you get the attributes of all of the variables in the dataset:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "float64 nav_lon(y, x)\n", " units: degrees east\n", " valid_range: [-126.40029144 -121.31835175]\n", " long_name: Longitude\n", "unlimited dimensions: \n", "current shape = (898, 398)\n", "filling on, default _FillValue of 9.969209968386869e+36 used\n", "\n", "\n", "float64 nav_lat(y, x)\n", " units: degrees north\n", " valid_range: [ 46.85966492 51.10480118]\n", " long_name: Latitude\n", "unlimited dimensions: \n", "current shape = (898, 398)\n", "filling on, default _FillValue of 9.969209968386869e+36 used\n", "\n", "\n", "float64 Bathymetry(y, x)\n", " _FillValue: 0.0\n", " least_significant_digit: 1\n", " units: m\n", " valid_range: [ 0. 428.]\n", " long_name: Depth\n", " positive: down\n", "unlimited dimensions: \n", "current shape = (898, 398)\n", "filling on\n" ] } ], "source": [ "nc_tools.show_variable_attrs(grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can go further exploring and working with the variables we need to\n", "associate them with Python variables names.\n", "We do that by accessing them by name in the `variables` attribute of our `Dataset` object.\n", "`variables` is a Python `dict`.\n", "We can use any Python variable names we like, so let's shorten them\n", "(being careful not to sacrifice readability for ease of typing):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "lons = grid.variables['nav_lon']\n", "lats = grid.variables['nav_lat']\n", "bathy = grid.variables['Bathymetry']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having done that, we can now access the attributes of our variables\n", "using dotted notation:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('m', array([ 0., 428.]))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bathy.units, bathy.valid_range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our variables are instances of the `netCDF.Variable` object.\n", "In addition to their attributes, they carry a bunch of other\n", "useful properties and methods that you can read about in the netCDF4-python docs.\n", "Perhaps more importantly the data associated with the variables\n", "are stored as NumPy arrays.\n", "So, we can use NumPy indexing and slicing to access the data values.\n", "For instance, to get the latitudes and longitudes of the 4 corners of the domain:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(898, 398)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lats.shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Latitudes and longitudes of domain corners:\n", " 0, 0: 46.859664917 -123.42943573\n", " 0, x-max: 47.6009216309 -121.318351746\n", " y-max, 0: 50.3899269104 -126.400291443\n", " y-max, x-max: 51.104801178 -124.34198761\n" ] } ], "source": [ "print('Latitudes and longitudes of domain corners:')\n", "pt = (0, 0)\n", "print(' 0, 0: ', lats[pt], lons[pt])\n", "pt = (0, lats.shape[1] - 1)\n", "print(' 0, x-max: ', lats[pt], lons[pt])\n", "pt = (lats.shape[0] - 1, 0)\n", "print(' y-max, 0: ', lats[pt], lons[pt])\n", "pt = (lats.shape[0] - 1, lats.shape[1] - 1)\n", "print(' y-max, x-max:', lats[pt], lons[pt])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also access the entire variable data array, or subsets of it using slicing.\n", "The `[:]` slice notation is a convenient shorthand that means \"the entire array\"." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 46.85966492, 46.86154556, 46.86342621, ..., 47.59721375,\n", " 47.59906769, 47.60092163],\n", " [ 46.86278915, 46.86481476, 46.86677933, ..., 47.60125732,\n", " 47.60311127, 47.60496521],\n", " [ 46.86606979, 46.86814499, 46.87015915, ..., 47.60529709,\n", " 47.60715485, 47.60900879],\n", " ..., \n", " [ 50.38191605, 50.38397598, 50.38602448, ..., 51.09400177,\n", " 51.09560776, 51.09720612],\n", " [ 50.38591766, 50.38798523, 50.39004135, ..., 51.09781265,\n", " 51.0994072 , 51.10100174],\n", " [ 50.38992691, 50.39200592, 50.39406967, ..., 51.10162354,\n", " 51.10321808, 51.10480118]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lats[:]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[-122.884552 , -122.87927246, -122.87399292, -122.86871338,\n", " -122.86342621, -122.85814667, -122.85286713],\n", " [-122.88778687, -122.88250732, -122.87722778, -122.87194824,\n", " -122.8666687 , -122.86138916, -122.85610962],\n", " [-122.89102936, -122.88574982, -122.88047028, -122.87519073,\n", " -122.86991119, -122.86463165, -122.85934448]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lons[42:45, 128:135]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(array([[-123.42943573, -123.42411804],\n", " [-123.43196869, -123.42677307]]), array([[ 51.0994072 , 51.10100174],\n", " [ 51.10321808, 51.10480118]]))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lons[:2, :2], lats[-2:, -2:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the zero and maximum dimension values may be omitted\n", "for slices that extend to the ends of array dimensions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some cases, like our bathymetry depths, \n", "the netCDF variable has a `_FillingValue` attribute value that is equal\n", "to values in the variable data.\n", "In that case the data are represented by a [NumPy Masked Array](http://docs.scipy.org/doc/numpy/reference/maskedarray.html) with the\n", "mask applied there the data values equal the `_FillingValue`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "masked_array(data =\n", " [[-- -- -- ..., -- -- --]\n", " [-- -- -- ..., -- -- --]\n", " [-- -- -- ..., -- -- --]\n", " ..., \n", " [-- -- -- ..., -- -- --]\n", " [-- -- -- ..., -- -- --]\n", " [-- -- -- ..., -- -- --]],\n", " mask =\n", " [[ True True True ..., True True True]\n", " [ True True True ..., True True True]\n", " [ True True True ..., True True True]\n", " ..., \n", " [ True True True ..., True True True]\n", " [ True True True ..., True True True]\n", " [ True True True ..., True True True]],\n", " fill_value = 0.0)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bathy[:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can test to see if a variables data is masked like this:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ma.is_masked(bathy[:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Masked arrays are useful because require less storage than a comparable\n", "size fully populated array.\n", "Also, when masked arrays are plotted the maked values are all plotted\n", "in the same colour (white by default).\n", "We'll see in other example notebooks how this allows us to very easily \n", "plot our bathymetry in a meaningfully way,\n", "and use it,\n", "or other values to mask velocity component, salinity, etc. results so\n", "that they show values only in the water areas of the domain." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }