{
 "metadata": {
  "name": "Tutorial4"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "File Input and Output\n",
      "=====================\n",
      "\n",
      "In this tutorial, we discuss saving an BELT model to disk for later use.  We repeat the same calculations as in the previous tutorials, but this time save the model to disk for later use.  \n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import pandas as pd\n",
      "import matplotlib.pyplot as plt\n",
      "from fitensemble import belt, example_loader\n",
      "\n",
      "predictions, measurements, uncertainties = example_loader.load_alanine_numpy()\n",
      "phi, psi, assignments, indicators = example_loader.load_alanine_dihedrals()\n",
      "\n",
      "num_samples = 20000  # Generate 20,000 MCMC samples\n",
      "thin = 25  # Subsample (i.e. thin) the MCMC traces by 25X to ensure independent samples\n",
      "burn = 5000  # Discard the first 5000 samples as \"burn-in\"\n",
      "\n",
      "regularization_strength = 3.0  # How strongly do we prefer a \"uniform\" ensemble (the \"raw\" MD)? "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Saving to HDF5\n",
      "--------------\n",
      "\n",
      "The following code builds a model, just like before.  However, we now pass a filename argument when we begin the MCMC sampling.  This tells PyMC to save the results to disk as an HDF5 database.  This is useful for situations where the entire MCMC trace cannot fit in system memory.  "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "belt_model = belt.MaxEntBELT(predictions, measurements, uncertainties, regularization_strength)\n",
      "\n",
      "pymc_filename = \"trialanine.h5\"\n",
      "belt_model.sample(num_samples, thin=thin, burn=burn,filename=pymc_filename)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        " \r",
        "[****************100%******************]  25000 of 25000 complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The HDF5 database file trialanine.h5 contains all components necessary to work with your BELT model.  This includes\n",
      "\n",
      "* `predictions`\n",
      "* `measurements`\n",
      "* `uncertainties`\n",
      "* The MCMC trace\n",
      "\n",
      "Loading from HDF5\n",
      "-----------------\n",
      "\n",
      "To load an HDF5 file from disk, use the load() function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "belt_model = belt.MaxEntBELT.load(\"./trialanine.h5\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we did previously, this model can be used to calculate the MAP conformational populations (`p`) or a trace (`state_pops_trace`) of an arbitrary observable:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "p = belt_model.accumulate_populations()\n",
      "state_pops_trace = belt_model.trace_observable(indicators.T)\n",
      "\n",
      "state_pops = state_pops_trace.mean(0)\n",
      "pd.DataFrame([state_pops],columns=[r\"$PP_{II}$\",r\"$\\beta$\",r\"$\\alpha_r$\",r\"$\\alpha_l$\"],index=[\"BELT\"])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>$PP_{II}$</th>\n",
        "      <th>$\\beta$</th>\n",
        "      <th>$\\alpha_r$</th>\n",
        "      <th>$\\alpha_l$</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>BELT</th>\n",
        "      <td> 0.556701</td>\n",
        "      <td> 0.389624</td>\n",
        "      <td> 0.050868</td>\n",
        "      <td> 0.002806</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "      $PP_{II}$   $\\beta$  $\\alpha_r$  $\\alpha_l$\n",
        "BELT   0.556701  0.389624    0.050868    0.002806"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As before, we see the conformational populations of our BELT ensemble."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}