{ "metadata": { "name": "", "signature": "sha256:ad9f4c68ddba2ca1959cc0ca7feb3d42145cc3ed37c3d5a0376de78a6bc2049c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Python & HDF5 (Hierarchical Data Format)

\n", "\n", "

Link to Kyle's [Bitbucket Repo](https://bitbucket.org/yingkaisha/python-in-remote-sensing/src/tip/_libs/) and some [testing data-sets](https://bitbucket.org/yingkaisha/python-in-remote-sensing/src/tip/_data/_demos/)

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Links to HDF5 & Python\n", "\n", "

\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
To use HDF5 & Python, you'll need to install 2 libraries/packages from the Anaconda distribuion.
\n", "\n", "

\n", "Type the following commands into the terminal prompt.\n", "

\n", "\n", "```shell\n", "conda install hdf5\n", "conda install h5py\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

The HDF5 File Structure

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "The 3 components of an HDF5 file (files appended with a \".h5\").\n", "

\n", "\n", "
    \n", "
  1. Datasets
  2. \n", "\n", "
  3. Groups
  4. \n", "\n", "
  5. Attributes
  6. \n", "\n", "
\n", "\n", "

\n", "Primary benefits of HDF5-files:\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Example of creating HDF5 files (taken from book 'Python and HDF5')

\n", "
Creation of HDF5 files shows the organizational structure...
\n", "\n", "```Python\n", "import h5py \n", "\n", ">>> f = h5py.File(\"weather_data.h5\")\n", "\n", ">>> f[\"/15/Temperature\"] = temperature_station15\n", ">>> f[\"/15/Temperature\"].attrs[\"dt\"] = 10.0\n", ">>> f[\"/15/Temperature\"].attrs[\"startTime\"] = 1375204299\n", "\n", ">>> f[\"/15/Wind\"] = wind\n", ">>> f[\"/15/Wind\"].attrs[\"dt\"] = 5.0\n", "\n", ">>> f[\"/20/Temperature\"] = temperature_station20\n", "```\n", "

\n", "In the above code-chunk:\n", "

\n", "

\n", "\n", "

\n", "Similarly, accessing the metadata attributes can be done in the following way...\n", "

\n", "\n", "```python\n", ">>> dataset = f[\"15/Temperature\"]\n", ">>> for key, value in dataset.attr.iteritems():\n", " print \"%s: %s\" % (key, value)\n", " \n", " dt: 10.0\n", " start_time = 1375204299\n", "```\n", "\n", "

\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
h5dump() module from Kyle's Bitbucket Repo
\n", "```python\n", ">>> def h5dump(filename):\n", " if isinstance(filename,types.StringType):\n", " infile = h5py.File(filename,'r')\n", " elif isinstance(filename,h5py._hl.files.File):\n", " infile=filename\n", " else:\n", " raise IOError, \"need an h5 filename or h5.File instance\"\n", " infile.visititems(print_attrs)\n", " print('-------------------')\n", " print(\"attributes for the root file\")\n", " print('-------------------')\n", " for key,value in infile.attrs.items():\n", " print(\"attribute name: \",key,\"--- value: \",value)\n", " return Non\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Load example.h5 into the notebook
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import h5py as h5\n", "from h5lib import h5dump, print_attrs\n", "\n", "f = h5.File('example.h5')\n", "\n", "print '\\nHDF5 file \\'example.h5\\' just loaded:\\n\\n%r' % f\n", "print '\\nWe see that the file has been opened in read-mode, hence the \\'mode +r\\'...'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "HDF5 file 'example.h5' just loaded:\n", "\n", "\n", "\n", "We see that the file has been opened in read-mode, hence the 'mode +r'...\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Data-type of loaded .h5 file
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print '\\nThe loaded .h5 file is of type...\\n%s' % type(f)\n", "print '\\nLoaded file is of type \\'File\\'.'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "The loaded .h5 file is of type...\n", "\n", "\n", "Loaded file is of type 'File'.\n" ] } ], "prompt_number": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Viewing the metadata of HDF5 files

\n", "\n", "

\n", "There are a few ways of viewing the stored metadata...\n", "

\n", "\n", "

One is to use h5dump() (contribution from Kyle). This routine essentially prints out all of the material stored within the .h5 file.

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Using h5dump on \"example.h5\"
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "h5dump(f)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "item name: Example SDS i2\">\n", " HDF4_OBJECT_TYPE: SDS\n", " HDF4_OBJECT_NAME: Example SDS\n", " HDF4_REF_NUM: 2\n", "item name: Example Vdata \n", " TITLE: Example Vdata\n", " CLASS: TABLE\n", " VERSION: 1.0\n", " FIELD_0_NAME: Idx\n", " FIELD_1_NAME: Temp\n", " FIELD_2_NAME: Dewpt\n", " HDF4_OBJECT_TYPE: Vdata\n", " HDF4_OBJECT_NAME: Example Vdata\n", " HDF4_REF_NUM: 11\n", "item name: Example Vdata_t \n", "item name: HDF4_DIMGROUP \n", "item name: MonthlyRain \n", " HDF4_OBJECT_TYPE: Vgroup\n", " HDF4_OBJECT_NAME: MonthlyRain\n", " HDF4_REF_NUM: 12\n", "item name: MonthlyRain/Data Fields \n", " HDF4_OBJECT_TYPE: Vgroup\n", " HDF4_OBJECT_NAME: Data Fields\n", " HDF4_REF_NUM: 13\n", "item name: MonthlyRain/Data Fields/RrLandRain f4\">\n", " HDF4_OBJECT_TYPE: SDS\n", " HDF4_OBJECT_NAME: RrLandRain\n", " HDF4_REF_NUM: 16\n", " DIMENSION_NAMELIST: ['/HDF4_DIMGROUP/YDim:MonthlyRain' '/HDF4_DIMGROUP/XDim:MonthlyRain']\n", "item name: MonthlyRain/Data Fields/TbOceanRain f4\">\n", " HDF4_OBJECT_TYPE: SDS\n", " DIMENSION_NAMELIST: ['/HDF4_DIMGROUP/YDim:MonthlyRain' '/HDF4_DIMGROUP/XDim:MonthlyRain']\n", " HDF4_OBJECT_NAME: TbOceanRain\n", " HDF4_REF_NUM: 15\n", "item name: MonthlyRain/Grid Attributes \n", " HDF4_OBJECT_TYPE: Vgroup\n", " HDF4_OBJECT_NAME: Grid Attributes\n", " HDF4_REF_NUM: 14\n", "-------------------\n", "attributes for the root file\n", "-------------------\n", "attribute name: HDFEOSVersion_GLOSDS --- value: HDFEOS_V2.16\n", "attribute name: StructMetadata.0_GLOSDS --- value: GROUP=SwathStructure\n", "END_GROUP=SwathStructure\n", "GROUP=GridStructure\n", "\tGROUP=GRID_1\n", "\t\tGridName=\"MonthlyRain\"\n", "\t\tXDim=72\n", "\t\tYDim=28\n", "\t\tUpperLeftPointMtrs=(0.000000,70000000.000000)\n", "\t\tLowerRightMtrs=(360000000.000000,-70000000.000000)\n", "\t\tProjection=GCTP_GEO\n", "\t\tGROUP=Dimension\n", "\t\tEND_GROUP=Dimension\n", "\t\tGROUP=DataField\n", "\t\t\tOBJECT=DataField_1\n", "\t\t\t\tDataFieldName=\"TbOceanRain\"\n", "\t\t\t\tDataType=DFNT_FLOAT32\n", "\t\t\t\tDimList=(\"YDim\",\"XDim\")\n", "\t\t\tEND_OBJECT=DataField_1\n", "\t\t\tOBJECT=DataField_2\n", "\t\t\t\tDataFieldName=\"RrLandRain\"\n", "\t\t\t\tDataType=DFNT_FLOAT32\n", "\t\t\t\tDimList=(\"YDim\",\"XDim\")\n", "\t\t\tEND_OBJECT=DataField_2\n", "\t\tEND_GROUP=DataField\n", "\t\tGROUP=MergedFields\n", "\t\tEND_GROUP=MergedFields\n", "\tEND_GROUP=GRID_1\n", "END_GROUP=GridStructure\n", "GROUP=PointStructure\n", "END_GROUP=PointStructure\n", "END\n", "\n" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n",
      "

\n", "The loaded .h5 file is a organized as an [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict) data-container (recall [Python dictionaries](https://docs.python.org/2/tutorial/datastructures.html)).\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Getting all \"items\" (e.g. key-value pairs, or all groups and datasets) stored in the .h5 file
\n", "

\n", "Accessing the .items() method of an .h5 file will give you all of the stored datasets and groups.\n", "

" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print '\\nStored groups and datasets within .h5 file:\\n'.upper()\n", "for each_item in f.items():\n", " print each_item" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "STORED GROUPS AND DATASETS WITHIN .H5 FILE:\n", "\n", "(u'Example SDS', i2\">)\n", "(u'Example Vdata', )\n", "(u'Example Vdata_t', )\n", "(u'HDF4_DIMGROUP', )\n", "(u'MonthlyRain', )\n" ] } ], "prompt_number": 63 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n",
      "

\n", "If only the keys are desired, then simply use the .keys() method.\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Example
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print '\\nPrinting out all the keys in the imported hdf5 file.\\n'\n", "for n, each_key in enumerate(f.keys()):\n", " print 'Key %d:\\t%s' % (n + 1, each_key)\n", "\n", "print '\\n'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Printing out all the keys in the imported hdf5 file.\n", "\n", "Key 1:\tExample SDS\n", "Key 2:\tExample Vdata\n", "Key 3:\tExample Vdata_t\n", "Key 4:\tHDF4_DIMGROUP\n", "Key 5:\tMonthlyRain\n", "\n", "\n" ] } ], "prompt_number": 71 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Accessing metadata of a specific dataset or group
\n", "\n", "

\n", "Can use the print_attrs() routine (thanks Kyle). A look at the routine...\n", "

\n", "\n", "```python\n", ">>> def print_attrs(name, obj):\n", " print(\"item name: \",name,repr(obj))\n", " for key, val in obj.attrs.iteritems():\n", " print(\" %s: %s\" % (key, val))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Example 1: Metadata of a dataset
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print_attrs('Example SDS', f['Example SDS'])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "item name: Example SDS i2\">\n", " HDF4_OBJECT_TYPE: SDS\n", " HDF4_OBJECT_NAME: Example SDS\n", " HDF4_REF_NUM: 2\n" ] } ], "prompt_number": 72 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Example 2: Metadata of a group
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print_attrs('MonthlyRain', f['MonthlyRain'])\n", "\n", "print '''\\n\n", "Here, \n", " - we see 'MonthlyRain' is a group belonging to the group 'Vgroup'.\n", " - ...also see there are 2 additional members (or subgroups) attached to \n", " this group.\n", " - the subgroups can be accessed with the .keys() method.\n", "\\n'''" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "item name: MonthlyRain \n", " HDF4_OBJECT_TYPE: Vgroup\n", " HDF4_OBJECT_NAME: MonthlyRain\n", " HDF4_REF_NUM: 12\n", "\n", "\n", "Here, \n", " - we see 'MonthlyRain' is a group belonging to the group 'Vgroup'.\n", " - ...also see there are 2 additional members (or subgroups) attached to \n", " this group.\n", " - the subgroups can be accessed with the .keys() method.\n", "\n", "\n" ] } ], "prompt_number": 83 }, { "cell_type": "code", "collapsed": false, "input": [ "for n, each_key in enumerate(f['MonthlyRain']):\n", " print 'Key %d:\\t%s' % (n+1, each_key)\n", "\n", "print '''\\n\n", "We can check/confirm what object type each group/dataset is, by using the 'type()' \n", "function.\\n\n", "'''" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Key 1:\tData Fields\n", "Key 2:\tGrid Attributes\n", "\n", "\n", "We can check/confirm what object type each group/dataset is, by using the 'type()' \n", "function.\n", "\n", "\n" ] } ], "prompt_number": 90 }, { "cell_type": "code", "collapsed": false, "input": [ "print 'f[\\'MonthlyRain\\'] is a of type:\\n\\n%s\\n' % type(f['MonthlyRain'])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f['MonthlyRain'] is a of type:\n", "\n", "\n", "\n" ] } ], "prompt_number": 92 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Accessing the Data Fields subgroup in f['MonthlyRain'] & accessing its metadata
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print_attrs('Data Fields Metadata', f['MonthlyRain']['Data Fields'])\n", "\n", "print '\\nHere we see 2 members of the Data Fields group, so we can access \\n\\\n", "additional fields with the .keys() method.\\n'\n", "\n", "print 'Additional groups/datasets:\\n'\n", "for n, each_key in enumerate(f['MonthlyRain']['Data Fields']):\n", " print 'Key %d:\\t%s' % (n+1, each_key)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "item name: Data Fields Metadata \n", " HDF4_OBJECT_TYPE: Vgroup\n", " HDF4_OBJECT_NAME: Data Fields\n", " HDF4_REF_NUM: 13\n", "\n", "Here we see 2 members of the Data Fields group, so we can access \n", "additional fields with the .keys() method.\n", "\n", "Additional groups/datasets:\n", "\n", "Key 1:\tRrLandRain\n", "Key 2:\tTbOceanRain\n" ] } ], "prompt_number": 98 }, { "cell_type": "code", "collapsed": false, "input": [ "type(f['MonthlyRain']['Data Fields']['TbOceanRain'])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 97, "text": [ "h5py._hl.dataset.Dataset" ] } ], "prompt_number": 97 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Can access the numerical values of specific datasets by using the .value() method.
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print '\\n', f['MonthlyRain']['Data Fields']['TbOceanRain'].value" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "[[ -1. -1. -1. ..., -1. -1. -1.]\n", " [ -1. -1. -1. ..., -1. -1. -1.]\n", " [ 93. -1. -1. ..., 58. -1. -1.]\n", " ..., \n", " [ -1. -1. -1. ..., -1. -1. -1.]\n", " [ -1. -1. -1. ..., -1. -1. -1.]\n", " [ -1. -1. -1. ..., -1. -1. -1.]]\n" ] } ], "prompt_number": 104 } ], "metadata": {} } ] }