# Explore MOHID hdf5 Files

This notebook is an initial exploration of hdf5 output files produced by MOHID.

MOHID output file types explored:

* Lagrangian oil particle tracks

Python libraries explored:

* [PyNIO](https://www.pyngl.ucar.edu/Nio.shtml)
* [h5netcdf](https://github.com/shoyer/h5netcdf)
* [h5py](http://docs.h5py.org/en/stable/index.html)
* [PyTables](https://www.pytables.org/index.html)

Summary:

* PyNIO fails to open the files
* h5netcdf successfully opens the files but can't handle groups that contain variables
that don't have associated dimension scales
* h5py successfully opens the files and allows access to all of their contents,
but its interface is low-level: groups and datasets are accessed like nested dicts
* PyTables successfully opens the files and allows access to all of their contents
via an object interface.
The PyTables tutorial also brought the `h5ls` tool to my attention.
* The results are organized as Eulerian snapshots,
in general both by 2D and 3D slabs of the hydrodynamics model (NEMO or FVCOM) grid,
and by particle.

Next step:

* Write code to transform PyTables `tables.File` objects into `xarray.Dataset` objects.
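
As a starting point for that next step, here is a minimal sketch of stacking per-time-step snapshots into a single `xarray.DataArray`. It assumes the snapshots have already been read into NumPy arrays (for example via PyTables) and that the time stamps are available as `datetime` objects. The function name `stack_snapshots` and the placeholder dimension names are hypothetical; nothing here is defined by MOHID or PyTables, and the sample data is synthetic.

```python
import datetime

import numpy as np
import xarray as xr


def stack_snapshots(snapshots, times, name, units=None):
    """Stack equally shaped snapshot arrays along a new leading time axis.

    The non-time dimension names are placeholders (dim_0, dim_1, ...)
    because the meaning of each grid axis has not been confirmed.
    """
    data = np.stack(snapshots)  # shape: (n_times, *snapshot_shape)
    dims = ("time",) + tuple(f"dim_{i}" for i in range(data.ndim - 1))
    time_values = np.asarray(times, dtype="datetime64[ns]")
    attrs = {} if units is None else {"units": units}
    return xr.DataArray(
        data, dims=dims, coords={"time": time_values}, name=name, attrs=attrs
    )


# Synthetic stand-ins for two 2D snapshots (not real MOHID output)
snapshots = [np.zeros((4, 3), dtype="float32"), np.ones((4, 3), dtype="float32")]
times = [datetime.datetime(2015, 4, 8, 0, 30), datetime.datetime(2015, 4, 8, 1, 30)]
conc = stack_snapshots(snapshots, times, "OilConcentration_2D", units="ppm")
dataset = conc.to_dataset()
```

Stacking along a leading `time` axis is one possible layout; it keeps each snapshot contiguous and lets several fields that share the same time stamps be merged into one `xarray.Dataset` later.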

In [2]:
!ls -lh ../../SalishSeaShihan/results/Lagrangian*.hdf5

-rw-r--r-- 1 doug doug 200M Oct 11 07:42 ../../SalishSeaShihan/results/Lagrangian_7_nested_5.hdf5
-rw-r--r-- 1 doug doug 355M Oct 11 07:47 ../../SalishSeaShihan/results/Lagrangian_7_st_georgia_nested.hdf5


## PyNIO

In [1]:
import Nio

In [None]:
h5file = Nio.open_file('../../SalishSeaShihan/results/Lagrangian_7_st_georgia_nested.hdf5', format='hdf5')
h5file

...crashed the kernel :-(

In [None]:
h5file = Nio.open_file('../../SalishSeaShihan/results/Lagrangian_7_nested_5.hdf5')


...never finished :-(

## h5netcdf

In [1]:
import h5netcdf

In [69]:
h5file = h5netcdf.File('../../SalishSeaShihan/results/Lagrangian_7_nested_5.hdf5')
h5file

<h5netcdf.File 'Lagrangian_7_nested_5.hdf5' (mode r+)>
Dimensions:
Groups:
    Grid
    Results
    Time
Variables:
Attributes:

In [11]:
h5file['Grid']

ValueError: variable '/Grid/Bathymetry' has no dimension scale associated with axis 0

[Dimension scales](https://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html)
seem to be optional for HDF5, but required for netCDF4.
The h5py docs provide [some additional clues](http://docs.h5py.org/en/stable/high/dims.html).
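
To make the idea of a dimension scale concrete, here is a minimal h5py sketch that creates a small in-memory file, marks two datasets as scales, and attaches them to the axes of a third. The file name, dataset names, and shapes are made up for illustration, and `make_scale` assumes a reasonably recent h5py (2.10 or later). Whether adding scales like this would be enough for h5netcdf to read the MOHID files has not been tested here.

```python
import h5py
import numpy as np

# In-memory file (nothing is written to disk); all names are illustrative only
with h5py.File(
    "dimension_scales_demo.hdf5", "w", driver="core", backing_store=False
) as demo:
    bathymetry = demo.create_dataset(
        "Bathymetry", data=np.zeros((3, 2), dtype="float32")
    )
    y = demo.create_dataset("y", data=np.arange(3, dtype="float32"))
    x = demo.create_dataset("x", data=np.arange(2, dtype="float32"))

    # Flag the coordinate datasets as dimension scales
    y.make_scale("y")
    x.make_scale("x")

    # Attach one scale to each axis of the data variable
    bathymetry.dims[0].attach_scale(y)
    bathymetry.dims[1].attach_scale(x)

    # Number of scales attached to each axis, and the path of the first one
    scales_per_axis = [len(dim) for dim in bathymetry.dims]
    axis_0_scale_name = bathymetry.dims[0][0].name
```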

In [70]:
h5file['Results']

<h5netcdf.Group '/Results' (3 members)>
Dimensions:
Groups:
    Number
    OilSpill
    Percentage Contaminated
Variables:
Attributes:
    Minimum: 9900000000000000.0
    Maximum: -9900000000000000.0

In [71]:
h5file['Time']

ValueError: variable '/Time/Time_00001' has no dimension scale associated with axis 0

## h5py

In [8]:
import h5py

In [73]:
h5file = h5py.File('../../SalishSeaShihan/results/Lagrangian_7_nested_5.hdf5', mode='r')
h5file

<HDF5 file "Lagrangian_7_nested_5.hdf5" (mode r+)>

In [74]:
for k in h5file: print(k)

Grid
Results
Time


In [75]:
h5file['Grid']

<HDF5 group "/Grid" (9 members)>

In [77]:
for k in h5file['Grid']: print(k)

Bathymetry
ConnectionX
ConnectionY
Define Cells
Latitude
Longitude
OpenPoints
VerticalZ
WaterPoints3D


In [78]:
h5file['Results']

<HDF5 group "/Results" (3 members)>

In [79]:
for k in h5file['Results']: print(k)

Number
OilSpill
Percentage Contaminated


In [80]:
h5file['Time']

<HDF5 group "/Time" (167 members)>

In [81]:
for k in h5file['Time']: print(k)

Time_00001
Time_00002
Time_00003
Time_00004
Time_00005
Time_00006
Time_00007
Time_00008
Time_00009
Time_00010
Time_00011
Time_00012
Time_00013
Time_00014
Time_00015
Time_00016
Time_00017
Time_00018
Time_00019
Time_00020
Time_00021
Time_00022
Time_00023
Time_00024
Time_00025
Time_00026
Time_00027
Time_00028
Time_00029
Time_00030
Time_00031
Time_00032
Time_00033
Time_00034
Time_00035
Time_00036
Time_00037
Time_00038
Time_00039
Time_00040
Time_00041
Time_00042
Time_00043
Time_00044
Time_00045
Time_00046
Time_00047
Time_00048
Time_00049
Time_00050
Time_00051
Time_00052
Time_00053
Time_00054
Time_00055
Time_00056
Time_00057
Time_00058
Time_00059
Time_00060
Time_00061
Time_00062
Time_00063
Time_00064
Time_00065
Time_00066
Time_00067
Time_00068
Time_00069
Time_00070
Time_00071
Time_00072
Time_00073
Time_00074
Time_00075
Time_00076
Time_00077
Time_00078
Time_00079
Time_00080
Time_00081
Time_00082
Time_00083
Time_00084
Time_00085
Time_00086
Time_00087
Time_00088
Time_00089
Time_00090
Time_00091

In [82]:
h5file['Time']['Time_00001']

<HDF5 dataset "Time_00001": shape (6,), type "<f4">

In [84]:
h5file['Time']['Time_00001'][:]

array([2015.,    4.,    8.,    0.,   30.,    0.], dtype=float32)

i.e. the date/time `2015-04-08 00:30:00` stored as six 32-bit floats
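
Assuming the elements of each `Time_NNNNN` array are always year, month, day, hour, minute, second in that order (which is what this one sample suggests), the array can be turned into a `datetime` as sketched below. The helper name `mohid_time_to_datetime` is hypothetical.

```python
from datetime import datetime


def mohid_time_to_datetime(values):
    """Convert a 6-element (year, month, day, hour, minute, second) sequence."""
    # Values are stored as floats, so cast each field to int first;
    # any fractional seconds would be dropped by this cast.
    year, month, day, hour, minute, second = (int(v) for v in values)
    return datetime(year, month, day, hour, minute, second)


# Values copied from the Time_00001 output above
stamp = mohid_time_to_datetime([2015.0, 4.0, 8.0, 0.0, 30.0, 0.0])
print(stamp.isoformat(sep=" "))  # prints 2015-04-08 00:30:00
```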

In [85]:
h5file['Results']['Number']

<HDF5 group "/Results/Number" (167 members)>

In [86]:
h5file.attrs

<Attributes of HDF5 object at 139899974438056>

In [87]:
for attr in h5file.attrs: print(attr)

In [88]:
for attr, value in h5file['Grid'].attrs.items(): print(attr, value)

Minimum -9900000000000000.0
Maximum -9900000000000000.0


In [89]:
for attr, value in h5file['Grid']['Bathymetry'].attrs.items(): print(attr, value)

Minimum -9900000000000000.0
Maximum -9900000000000000.0
Units b'm'
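
Note that the `Units` attribute comes back as a byte string (`b'm'`). A small helper like the following sketch copies an attribute mapping into a plain dict and decodes any bytes values. The name `decode_attrs` is hypothetical, and the sketch assumes the strings are ASCII or UTF-8; a plain dict stands in for the real attribute object.

```python
def decode_attrs(attrs):
    """Return a plain dict copy of attrs with bytes values decoded to str."""
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        decoded[key] = value
    return decoded


# Plain dict standing in for h5file['Grid']['Bathymetry'].attrs
sample = {
    "Minimum": -9900000000000000.0,
    "Maximum": -9900000000000000.0,
    "Units": b"m",
}
clean = decode_attrs(sample)
```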


In [90]:
for attr, value in h5file['Results']['Number'].attrs.items(): print(attr, value)

Minimum -9900000000000000.0
Maximum -9900000000000000.0


In [35]:
h5file['Results']['Number']

<HDF5 group "/Results/Number" (167 members)>

In [91]:
for n in h5file['Results']['Number']: print(n)

Number_00001
Number_00002
Number_00003
Number_00004
Number_00005
Number_00006
Number_00007
Number_00008
Number_00009
Number_00010
Number_00011
Number_00012
Number_00013
Number_00014
Number_00015
Number_00016
Number_00017
Number_00018
Number_00019
Number_00020
Number_00021
Number_00022
Number_00023
Number_00024
Number_00025
Number_00026
Number_00027
Number_00028
Number_00029
Number_00030
Number_00031
Number_00032
Number_00033
Number_00034
Number_00035
Number_00036
Number_00037
Number_00038
Number_00039
Number_00040
Number_00041
Number_00042
Number_00043
Number_00044
Number_00045
Number_00046
Number_00047
Number_00048
Number_00049
Number_00050
Number_00051
Number_00052
Number_00053
Number_00054
Number_00055
Number_00056
Number_00057
Number_00058
Number_00059
Number_00060
Number_00061
Number_00062
Number_00063
Number_00064
Number_00065
Number_00066
Number_00067
Number_00068
Number_00069
Number_00070
Number_00071
Number_00072
Number_00073
Number_00074
Number_00075
Number_00076
Number_00077

In [92]:
h5file['Results']['Number']['Number_00001']

<HDF5 dataset "Number_00001": shape (20, 380, 210), type "<f4">

In [93]:
for attr in h5file['Results']['OilSpill'].attrs: print(attr)

Minimum
Maximum


In [94]:
for name in h5file['Results']['OilSpill']: print(name)

Data_2D
Data_3D


In [95]:
for attr, value in h5file['Results']['OilSpill']['Data_2D'].attrs.items(): print(attr, value)

Minimum 9900000000000000.0
Maximum -9900000000000000.0


In [96]:
for name in h5file['Results']['OilSpill']['Data_2D']: print(name)

Beaching Time
Oil Arrival Time
OilConcentration_2D
Thickness_2D


In [97]:
h5file['Results']['OilSpill']['Data_2D']['OilConcentration_2D']

<HDF5 group "/Results/OilSpill/Data_2D/OilConcentration_2D" (167 members)>

In [98]:
h5file['Results']['OilSpill']['Data_2D']['OilConcentration_2D']['OilConcentration_2D_00001']

<HDF5 dataset "OilConcentration_2D_00001": shape (380, 210), type "<f4">

In [99]:
for attr, value in h5file['Results']['OilSpill']['Data_2D']['OilConcentration_2D']['OilConcentration_2D_00001'].attrs.items(): print(attr, value)

Minimum -9900000000000000.0
Maximum -9900000000000000.0
Units b'ppm'


In [100]:
for attr, value in h5file['Results']['OilSpill']['Data_3D'].attrs.items(): print(attr, value)

Minimum 9900000000000000.0
Maximum -9900000000000000.0


In [101]:
for name in h5file['Results']['OilSpill']['Data_3D']: print(name)

Dissolution_3D
OilConcentration_3D
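
Drilling down one level at a time gets tedious. Because h5py groups behave like nested dicts, a short recursive generator can list every path in one pass. This sketch relies only on duck typing (anything with an `.items()` method is treated as a group), so it is demonstrated here on plain dicts rather than on the real file. `walk` is a hypothetical helper; h5py's own `Group.visit` and `Group.visititems` methods are an alternative.

```python
def walk(node, path=""):
    """Yield (path, leaf) for every non-group object below node."""
    for name, child in node.items():
        child_path = f"{path}/{name}"
        if hasattr(child, "items"):  # dict-like, so treat it as a group
            yield from walk(child, child_path)
        else:
            yield child_path, child


# Plain dicts standing in for a tiny slice of the file hierarchy;
# the leaf values are placeholders, not real data
tree = {
    "Grid": {"Bathymetry": None},
    "Results": {
        "OilSpill": {
            "Data_2D": {"OilConcentration_2D": {"OilConcentration_2D_00001": None}}
        }
    },
    "Time": {"Time_00001": None},
}
paths = [path for path, _ in walk(tree)]
print(paths)
```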


## PyTables

In [54]:
import tables

In [102]:
h5file = tables.open_file('../../SalishSeaShihan/results/Lagrangian_7_nested_5.hdf5', mode='r')
for group in h5file.walk_groups(): print(group)

/ (RootGroup) ''
/Grid (Group) ''
/Results (Group) ''
/Time (Group) ''
/Results/Number (Group) ''
/Results/OilSpill (Group) ''
/Results/Percentage Contaminated (Group) ''
/Results/OilSpill/Data_2D (Group) ''
/Results/OilSpill/Data_3D (Group) ''
/Results/OilSpill/Data_3D/Dissolution_3D (Group) ''
/Results/OilSpill/Data_3D/OilConcentration_3D (Group) ''
/Results/OilSpill/Data_2D/Beaching Time (Group) ''
/Results/OilSpill/Data_2D/Oil Arrival Time (Group) ''
/Results/OilSpill/Data_2D/OilConcentration_2D (Group) ''
/Results/OilSpill/Data_2D/Thickness_2D (Group) ''
/Grid/OpenPoints (Group) ''
/Grid/VerticalZ (Group) ''


In [103]:
for node in h5file.walk_nodes('/Results/Number'): print(node)

/Results/Number (Group) ''
/Results/Number/Number_00001 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00002 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00003 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00004 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00005 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00006 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00007 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00008 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00009 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00010 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00011 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00012 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00013 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00014 (CArray(20, 380, 210), zlib(6)) ''
/Results/Number/Number_00015 (CArray(20, 380, 210), zlib(6)) ''
/Results/Numb

In [104]:
h5file.root.Results.Number.Number_00001.attrs

/Results/Number/Number_00001._v_attrs (AttributeSet), 3 attributes:
   [Maximum := -9900000000000000.0,
    Minimum := -9900000000000000.0,
    Units := b'a']

In [105]:
h5file.root.Results.OilSpill.Data_3D.OilConcentration_3D.OilConcentration_3D_00001.attrs

/Results/OilSpill/Data_3D/OilConcentration_3D/OilConcentration_3D_00001._v_attrs (AttributeSet), 3 attributes:
   [Maximum := -9900000000000000.0,
    Minimum := -9900000000000000.0,
    Units := b'Kg/m3']

In [106]:
h5file.root.Results.OilSpill.Data_3D.OilConcentration_3D.OilConcentration_3D_00001

/Results/OilSpill/Data_3D/OilConcentration_3D/OilConcentration_3D_00001 (CArray(20, 380, 210), zlib(6)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (20, 380, 210)