{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%autosave 10"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "javascript": [
        "IPython.notebook.set_autosave_interval(10000)"
       ],
       "metadata": {},
       "output_type": "display_data"
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Autosaving every 10 seconds\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Intro\n",
      "\n",
      "- Particle accelerator, close to speed to light. Synchotron.\n",
      "- Data analysis during and after run.\n",
      "- Many users, time constraints.\n",
      "- Java-based code, General Data Acquisition (GDA)\n",
      "    - But want scientists to script/extend it.\n",
      "    - Jython.\n",
      "- Every increasing throughput of data.\n",
      "    - 2007: 10MB/s\n",
      "    - 2009: 60MB/s\n",
      "    - 2011: 150MB/s\n",
      "    - 2013: 600MB/s\n",
      "    - 2015: 6000MB/s\n",
      "    - doubling every 7.5 months!\n",
      "    - peaks at 1TB/day right now\n",
      "\n",
      "##\u00a0Data storage\n",
      "\n",
      "- 1PB near-line, 0.5PB on-line.\n",
      "- 200M+ files.\n",
      "- High performance parallel file systems *hate* lots of small files.\n",
      "- So they moved bigger files as HDF5, but it hates ASCII files.\n",
      "\n",
      "##\u00a0Big data\n",
      "\n",
      "- Volume, variety, veracity, velocity\n",
      "\n",
      "##\u00a0Tooling\n",
      "\n",
      "- Excel, MATLAB, etc., all assume data fits onto laptop\n",
      "- These tools do not scale to big data, at least at reasonable price.\n",
      "\n",
      "## Python\n",
      "\n",
      "- Free, easily distributable\n",
      "- Already used some via Jython\n",
      "- But how to spread it?\n",
      "    - Extend their existing acquisition tools to spin off new analysis tools\n",
      "- Use PyDev and Eclipse heavily\n",
      "    - Spun off as \"Dawn\" product\n",
      "- Use `scisoftpy` to tie PyDev with HDF5 storage.\n",
      "    - Like `matplotlib` but in Eclipse, do e.g. line fitting\n",
      "- MATLAB/IDL requires expensive support, but with Python easier to support.\n",
      "    - Easier to create sustainable software that survives over time.\n",
      "\n",
      "##\u00a0Optimization\n",
      "\n",
      "- Need a magnetic array to wiggle light beam as it travels\n",
      "- But magnets have imperfections.\n",
      "- Use Python to optimize an objective function.\n",
      "- Originally, Python was 1k slower than Fortran, because direct port.\n",
      "- Then NumPy, 10 times slower, but much cleaner than Fortran.\n",
      "- It became obvious how to improve caching, then speed same.\n",
      "- Eventually Python is 100 times faster.\n",
      "- Then instead of simulated annealing, used Artificial Immune systems\n",
      "    - Global.\n",
      "    - Slower.\n",
      "    - Parallelization, very simple, embarassingly parallel.\n",
      "            - numpy with threads.\n",
      "            - 25 machines, 200 CPUs.\n",
      "\n",
      "## Data reduction / processing\n",
      "\n",
      "- A lot of pre-existing FORTRAN code, but only single core\n",
      "- Want to create a data pipeline that parallelises.\n",
      "- Python to glue together data pipeline.\n",
      "    - Not so much Python to do core processing, but DIALS is a project to do this.\n",
      "\n",
      "## Tomography (Current Implementation)\n",
      "\n",
      "- Need large TIFFs to do processing, single GPU, uses CUDA\n",
      "- Store in HDF5, use Python to write out TIFFs, multi GPUs.\n",
      "- Going forward\n",
      "    - `zeromq` as transport, `blosc` as compression\n",
      "    - can still use CUDA via Python.\n",
      "\n",
      "##\u00a0multiprocessing + MPI \"profiling\"\n",
      "\n",
      "- Drew a multi-thread multi-thread image of profile, time series.\n",
      "- Based on log messages, not fine grained like `cProfile`\n",
      "- But discovered `h5py` holds the GIL\n",
      "- `google.visualization.DataTable()` JavaScript, use Python to parse logs, `jinja2` to produce HTML\n",
      "- Much better than grepping!\n",
      "\n",
      "## Summary\n",
      "\n",
      "- Python is the best tool to let scientists to become developers.\n",
      "- Rate of change of techniques and equipment is increasing.\n",
      "    - Python supports this.\n",
      "    \n",
      "http://www.dawnsci.org\n",
      "http://www.diamond.ac.uk"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}