{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Streaming: Processing Unlimited Frames On-Disk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A key feature of trackpy is the ability to process an unlimited number of frames.\n",
    "\n",
    "For feature-finding, this is straightforward: a frame is loaded, features are located, the locations are saved to disk, and the memory is cleared for the next frame. For linking, the problem is more challenging, but trackpy handles all this complexity for you, using as little memory as possible throughout.\n",
    "\n",
    "When data sets become large, beginner-friendly file formats like CSV or Excel become impractical. We recommend using the [HDF5 file format](https://support.hdfgroup.org/HDF5/), which trackpy can read and write out of the box. (HDF5 is [widely used](http://en.wikipedia.org/wiki/Hierarchical_Data_Format); you can be sure it will be around for many, many years to come.)\n",
    "\n",
    "If you have some other format in mind, see the end of this tutorial, where we explain how to extend trackpy's interface to support other formats.\n",
    "\n",
    "## PyTables\n",
    "\n",
    "You need PyTables, which is normally included with the Anaconda distribution. If you find that you don't have it, you can install it using conda. Type this command into a Terminal or Command Prompt:\n",
    "\n",
    "    conda install pytables\n",
    "    \n",
    "## Locate Features, Streaming Results into an HDF5 File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import trackpy as tp\n",
    "import pims"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "@pims.pipeline\n",
    "def gray(image):\n",
    "    return image[:, :, 1]  # Take just the green channel.\n",
    "\n",
    "images = gray(pims.open('../sample_data/bulk_water/*.png'))\n",
    "images = images[:10]  # We'll take just the first 10 frames for demo purposes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For this demo, we'll first remove the file if it already exists.\n",
    "!rm -f data.h5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `locate` inside a loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:  # This opens an HDF5 file. Data will be stored and retrieved by frame number.\n",
    "    for image in images:\n",
    "        features = tp.locate(image, 11, invert=True)  # Find the features in a given frame.\n",
    "        s.put(features)  # Save the features to the file before continuing to the next frame."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, equivalently, we can use `batch`, which accepts the store as `output`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Frame 9: 573 features\n"
     ]
    }
   ],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:\n",
    "    tp.batch(images, 11, invert=True, output=s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can get the data for a given frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>x</th>\n",
       "      <th>mass</th>\n",
       "      <th>size</th>\n",
       "      <th>ecc</th>\n",
       "      <th>signal</th>\n",
       "      <th>raw_mass</th>\n",
       "      <th>ep</th>\n",
       "      <th>frame</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>5.509524</td>\n",
       "      <td>497.075000</td>\n",
       "      <td>72.537360</td>\n",
       "      <td>2.870416</td>\n",
       "      <td>0.029523</td>\n",
       "      <td>2.158850</td>\n",
       "      <td>10115.0</td>\n",
       "      <td>0.768330</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>5.652962</td>\n",
       "      <td>295.981547</td>\n",
       "      <td>266.747504</td>\n",
       "      <td>2.322843</td>\n",
       "      <td>0.247683</td>\n",
       "      <td>16.407260</td>\n",
       "      <td>10670.0</td>\n",
       "      <td>0.078022</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>6.350493</td>\n",
       "      <td>68.049288</td>\n",
       "      <td>236.523604</td>\n",
       "      <td>2.351310</td>\n",
       "      <td>0.044731</td>\n",
       "      <td>10.362480</td>\n",
       "      <td>10878.0</td>\n",
       "      <td>0.058368</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>6.405941</td>\n",
       "      <td>336.590347</td>\n",
       "      <td>209.322095</td>\n",
       "      <td>1.996594</td>\n",
       "      <td>0.127966</td>\n",
       "      <td>15.111950</td>\n",
       "      <td>10551.0</td>\n",
       "      <td>0.096638</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>6.899098</td>\n",
       "      <td>432.617521</td>\n",
       "      <td>363.723046</td>\n",
       "      <td>2.855660</td>\n",
       "      <td>0.466993</td>\n",
       "      <td>14.334764</td>\n",
       "      <td>10838.0</td>\n",
       "      <td>0.061340</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          y           x        mass      size       ecc     signal  raw_mass  \\\n",
       "0  5.509524  497.075000   72.537360  2.870416  0.029523   2.158850   10115.0   \n",
       "1  5.652962  295.981547  266.747504  2.322843  0.247683  16.407260   10670.0   \n",
       "2  6.350493   68.049288  236.523604  2.351310  0.044731  10.362480   10878.0   \n",
       "3  6.405941  336.590347  209.322095  1.996594  0.127966  15.111950   10551.0   \n",
       "4  6.899098  432.617521  363.723046  2.855660  0.466993  14.334764   10838.0   \n",
       "\n",
       "         ep  frame  \n",
       "0  0.768330      2  \n",
       "1  0.078022      2  \n",
       "2  0.058368      2  \n",
       "3  0.096638      2  \n",
       "4  0.061340      2  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:\n",
    "    frame_2_results = s.get(2)\n",
    "    \n",
    "frame_2_results.head()  # Display the first few rows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or dump all the data, if your machine has enough memory to hold it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>x</th>\n",
       "      <th>mass</th>\n",
       "      <th>size</th>\n",
       "      <th>ecc</th>\n",
       "      <th>signal</th>\n",
       "      <th>raw_mass</th>\n",
       "      <th>ep</th>\n",
       "      <th>frame</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>4.750000</td>\n",
       "      <td>103.668564</td>\n",
       "      <td>192.862485</td>\n",
       "      <td>2.106615</td>\n",
       "      <td>0.066390</td>\n",
       "      <td>10.808405</td>\n",
       "      <td>10714.0</td>\n",
       "      <td>0.073666</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>5.249231</td>\n",
       "      <td>585.779487</td>\n",
       "      <td>164.659302</td>\n",
       "      <td>2.962674</td>\n",
       "      <td>0.078936</td>\n",
       "      <td>4.222033</td>\n",
       "      <td>10702.0</td>\n",
       "      <td>0.075116</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>5.785986</td>\n",
       "      <td>294.792544</td>\n",
       "      <td>244.624615</td>\n",
       "      <td>2.244542</td>\n",
       "      <td>0.219217</td>\n",
       "      <td>15.874846</td>\n",
       "      <td>10686.0</td>\n",
       "      <td>0.077141</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>5.869369</td>\n",
       "      <td>338.173423</td>\n",
       "      <td>187.458282</td>\n",
       "      <td>2.046201</td>\n",
       "      <td>0.185333</td>\n",
       "      <td>13.088304</td>\n",
       "      <td>10554.0</td>\n",
       "      <td>0.099201</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>6.746377</td>\n",
       "      <td>310.584169</td>\n",
       "      <td>151.486558</td>\n",
       "      <td>3.103294</td>\n",
       "      <td>0.053342</td>\n",
       "      <td>4.475355</td>\n",
       "      <td>10403.0</td>\n",
       "      <td>0.147430</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          y           x        mass      size       ecc     signal  raw_mass  \\\n",
       "0  4.750000  103.668564  192.862485  2.106615  0.066390  10.808405   10714.0   \n",
       "1  5.249231  585.779487  164.659302  2.962674  0.078936   4.222033   10702.0   \n",
       "2  5.785986  294.792544  244.624615  2.244542  0.219217  15.874846   10686.0   \n",
       "3  5.869369  338.173423  187.458282  2.046201  0.185333  13.088304   10554.0   \n",
       "4  6.746377  310.584169  151.486558  3.103294  0.053342   4.475355   10403.0   \n",
       "\n",
       "         ep  frame  \n",
       "0  0.073666      0  \n",
       "1  0.075116      0  \n",
       "2  0.077141      0  \n",
       "3  0.099201      0  \n",
       "4  0.147430      0  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:\n",
    "    all_results = s.dump()\n",
    "    \n",
    "all_results.head()  # Display the first few rows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can dump the first N frames using `s.dump(N)`.\n",
    "\n",
    "## Link Trajectories, Streaming From and Updating the HDF5 File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Frame 9: 573 trajectories present.\n"
     ]
    }
   ],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:\n",
    "    for linked in tp.link_df_iter(s, 3, neighbor_strategy='KDTree'):\n",
    "        s.put(linked)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
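    "The original data is overwritten: each stored frame now includes a `particle` column identifying the trajectory that each feature belongs to."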
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>x</th>\n",
       "      <th>mass</th>\n",
       "      <th>size</th>\n",
       "      <th>ecc</th>\n",
       "      <th>signal</th>\n",
       "      <th>raw_mass</th>\n",
       "      <th>ep</th>\n",
       "      <th>frame</th>\n",
       "      <th>particle</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>5.509524</td>\n",
       "      <td>497.075000</td>\n",
       "      <td>72.537360</td>\n",
       "      <td>2.870416</td>\n",
       "      <td>0.029523</td>\n",
       "      <td>2.158850</td>\n",
       "      <td>10115.0</td>\n",
       "      <td>0.768330</td>\n",
       "      <td>2</td>\n",
       "      <td>535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>5.652962</td>\n",
       "      <td>295.981547</td>\n",
       "      <td>266.747504</td>\n",
       "      <td>2.322843</td>\n",
       "      <td>0.247683</td>\n",
       "      <td>16.407260</td>\n",
       "      <td>10670.0</td>\n",
       "      <td>0.078022</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>6.350493</td>\n",
       "      <td>68.049288</td>\n",
       "      <td>236.523604</td>\n",
       "      <td>2.351310</td>\n",
       "      <td>0.044731</td>\n",
       "      <td>10.362480</td>\n",
       "      <td>10878.0</td>\n",
       "      <td>0.058368</td>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>6.405941</td>\n",
       "      <td>336.590347</td>\n",
       "      <td>209.322095</td>\n",
       "      <td>1.996594</td>\n",
       "      <td>0.127966</td>\n",
       "      <td>15.111950</td>\n",
       "      <td>10551.0</td>\n",
       "      <td>0.096638</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>6.899098</td>\n",
       "      <td>432.617521</td>\n",
       "      <td>363.723046</td>\n",
       "      <td>2.855660</td>\n",
       "      <td>0.466993</td>\n",
       "      <td>14.334764</td>\n",
       "      <td>10838.0</td>\n",
       "      <td>0.061340</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          y           x        mass      size       ecc     signal  raw_mass  \\\n",
       "0  5.509524  497.075000   72.537360  2.870416  0.029523   2.158850   10115.0   \n",
       "1  5.652962  295.981547  266.747504  2.322843  0.247683  16.407260   10670.0   \n",
       "2  6.350493   68.049288  236.523604  2.351310  0.044731  10.362480   10878.0   \n",
       "3  6.405941  336.590347  209.322095  1.996594  0.127966  15.111950   10551.0   \n",
       "4  6.899098  432.617521  363.723046  2.855660  0.466993  14.334764   10838.0   \n",
       "\n",
       "         ep  frame  particle  \n",
       "0  0.768330      2       535  \n",
       "1  0.078022      2         2  \n",
       "2  0.058368      2         8  \n",
       "3  0.096638      2         3  \n",
       "4  0.061340      2         6  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with tp.PandasHDFStore('data.h5') as s:\n",
    "    frame_2_results = s.get(2)\n",
    "    \n",
    "frame_2_results.head()  # Display the first few rows."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Framewise Data Interfaces\n",
    "\n",
    "### Built-in interfaces\n",
    "\n",
    "Trackpy provides three built-in interfaces. They can be used interchangeably, but each offers different performance advantages.\n",
    "\n",
    "* [`PandasHDFStore`](https://github.com/soft-matter/trackpy/blob/c309b83296c720b8b9fd9633259c2b9ea6740eeb/trackpy/framewise_data.py#L103-L161) -- fastest for a small (~100) number of frames\n",
    "* [`PandasHDFStoreBig`](https://github.com/soft-matter/trackpy/blob/c309b83296c720b8b9fd9633259c2b9ea6740eeb/trackpy/framewise_data.py#L164-L227) -- fastest for a medium or large number of frames\n",
    "* [`PandasHDFStoreSingleNode`](https://github.com/soft-matter/trackpy/blob/c309b83296c720b8b9fd9633259c2b9ea6740eeb/trackpy/framewise_data.py#L230-L320) -- optimizes HDF queries that access multiple frames (advanced)\n",
    "\n",
    "### Writing your own interface\n",
    "\n",
    "Trackpy implements a generic interface that could be used to store and retrieve particle tracking data in any file format. We hope that it can make it easier for researchers who use different file formats to exchange data. Any in-house format could be accessed using the same simple interface demonstrated above.\n",
    "\n",
    "At present, the interface is implemented only for HDF5 files. To extend it to any format, write a class subclassing [`trackpy.FramewiseData`](https://github.com/soft-matter/trackpy/blob/c309b83296c720b8b9fd9633259c2b9ea6740eeb/trackpy/framewise_data.py#L14-L86). This custom class must implement the methods `put`, `get`, `close`, and `__iter__` and the properties `max_frame` and `t_column`. Refer to the built-in classes in [framewise_data.py](https://github.com/soft-matter/trackpy/blob/master/trackpy/framewise_data.py) for examples to work from."
   ]
  }
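  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a minimal, standalone sketch of the shape such a class might take. It is an illustration only, under several assumptions: `JSONFrameStore` is a hypothetical name, it does not subclass `trackpy.FramewiseData`, and it exchanges plain lists of dicts rather than DataFrames so that it runs with the standard library alone. It writes one JSON file per frame, so only a single frame is held in memory at a time. A real implementation would subclass `trackpy.FramewiseData` and should be checked against the built-in classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "\n",
    "class JSONFrameStore:\n",
    "    # Hypothetical framewise store: one JSON file per frame in a directory.\n",
    "    # It mirrors the method names listed above but is NOT a trackpy class.\n",
    "\n",
    "    def __init__(self, directory, t_column='frame'):\n",
    "        self.directory = directory\n",
    "        self._t_column = t_column\n",
    "        os.makedirs(directory, exist_ok=True)\n",
    "\n",
    "    @property\n",
    "    def t_column(self):\n",
    "        # Name of the field that holds the frame number.\n",
    "        return self._t_column\n",
    "\n",
    "    def _path(self, frame_no):\n",
    "        return os.path.join(self.directory, 'frame_%d.json' % frame_no)\n",
    "\n",
    "    @property\n",
    "    def frames(self):\n",
    "        # Frame numbers recovered from the file names, in ascending order.\n",
    "        names = os.listdir(self.directory)\n",
    "        return sorted(int(n[len('frame_'):-len('.json')]) for n in names\n",
    "                      if n.startswith('frame_') and n.endswith('.json'))\n",
    "\n",
    "    @property\n",
    "    def max_frame(self):\n",
    "        return max(self.frames)\n",
    "\n",
    "    def put(self, rows):\n",
    "        # All rows must belong to a single frame; write one frame at a time.\n",
    "        frame_numbers = {row[self.t_column] for row in rows}\n",
    "        if len(frame_numbers) != 1:\n",
    "            raise ValueError('put() expects rows from exactly one frame')\n",
    "        with open(self._path(frame_numbers.pop()), 'w') as f:\n",
    "            json.dump(rows, f)\n",
    "\n",
    "    def get(self, frame_no):\n",
    "        with open(self._path(frame_no)) as f:\n",
    "            return json.load(f)\n",
    "\n",
    "    def __iter__(self):\n",
    "        # Yield frames one by one so that only one is in memory at a time.\n",
    "        for frame_no in self.frames:\n",
    "            yield self.get(frame_no)\n",
    "\n",
    "    def close(self):\n",
    "        pass  # Nothing to release: each file is closed right after use.\n",
    "\n",
    "    def __enter__(self):\n",
    "        return self\n",
    "\n",
    "    def __exit__(self, exc_type, exc_value, traceback):\n",
    "        self.close()\n",
    "\n",
    "\n",
    "# Usage: store two made-up frames in a temporary directory, then stream them back.\n",
    "with JSONFrameStore(tempfile.mkdtemp()) as store:\n",
    "    store.put([{'x': 1.0, 'y': 2.0, 'frame': 0}])\n",
    "    store.put([{'x': 1.5, 'y': 2.5, 'frame': 1}, {'x': 9.0, 'y': 9.0, 'frame': 1}])\n",
    "    sizes = [len(rows) for rows in store]  # Number of features in each frame.\n",
    "    last = store.max_frame"
   ]
  }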
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "trackpy-examples",
   "language": "python",
   "name": "trackpy-examples"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}