{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.10. Manipulating large arrays with HDF5 and PyTables" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import tables as tb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an HDF5 file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a new, empty HDF5 file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f = tb.open_file('myfile.h5', 'w')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a new top-level group named \"experiment1\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.create_group('/', 'experiment1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also add some metadata to this group." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.set_node_attr('/experiment1', 'date', '2014-09-01')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this group, we create a 1000 x 1000 array named \"array1\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.random.rand(1000, 1000)\n", "f.create_array('/experiment1', 'array1', x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need to close the file to commit the changes to disk."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading an HDF5 file" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f = tb.open_file('myfile.h5', 'r')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can retrieve an attribute by giving the group path and the attribute name." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.get_node_attr('/experiment1', 'date')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access any node in the file through attributes of `f.root`. IPython's tab completion is particularly useful here when exploring a file interactively." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = f.root.experiment1.array1\n", "type(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array can be indexed like a NumPy array, but an important distinction is that it is stored on disk rather than in system memory. Indexing or slicing it reads the selected data from disk into memory, so it is more efficient to access only the parts of the array that are needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.array_equal(x[0,:], y[0,:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to get a node from its absolute path, which is useful when this path is only known at runtime."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.get_node('/experiment1/array1')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clean-up." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n", "os.remove('myfile.h5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }