{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using `PairID` to extract PSA data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we will use the convenience class `PSAIdentifier` to extract Hausdorff pair analysis data generated by PSA." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "# Suppress FutureWarning about element-wise comparison to None\n", "# Occurs when calling PSA plotting functions\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up input data for `PSA` using `MDAnalysis`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from MDAnalysis import Universe\n", "from MDAnalysis.analysis.psa import PSAnalysis\n", "from pair_id import PairID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize lists for the methods on which to perform PSA. PSA will be performed for four different simulations methods with three runs for each: **DIMS**, **FRODA**, **rTMD-F**, and **rTMD-S**. Also initialize a `PSAIdentifier` object to keep track of the data corresponding to comparisons between pairs of simulations." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "method_names = ['DIMS','FRODA','rTMD-F','rTMD-S']\n", "labels = [] # Heat map labels (not plotted in this example)\n", "simulations = [] # List of simulation topology/trajectory filename pairs\n", "universes = [] # List of MDAnalysis Universes representing simulations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each method, get the topology and each of three total trajectories (per method). Each simulation is represented as a `(topology, trajectory)` pair of file names, which is appended to a master list of simulations." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for method in method_names:\n", " # Note: DIMS uses the PSF topology format\n", " topname = 'top.psf' if 'DIMS' in method or 'TMD' in method else 'top.pdb'\n", " pathname = 'fitted_psa.dcd'\n", " method_dir = 'methods/{}'.format(method)\n", " if method is not 'LinInt':\n", " for run in xrange(1, 4): # 3 runs per method\n", " run_dir = '{}/{:03n}'.format(method_dir, run)\n", " topology = '{}/{}'.format(method_dir, topname)\n", " trajectory = '{}/{}'.format(run_dir, pathname)\n", " labels.append(method + '(' + str(run) + ')')\n", " simulations.append((topology, trajectory))\n", " else: # only one LinInt trajectory\n", " topology = '{}/{}'.format(method_dir, topname)\n", " trajectory = '{}/{}'.format(method_dir, pathname)\n", " labels.append(method)\n", " simulations.append((topology, trajectory))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate a list of universes from the list of simulations." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for sim in simulations:\n", " universes.append(Universe(*sim))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform a path similarity analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize a PSA comparison from the universe list using a C$_\\alpha$ trajectory representation." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "psa_hpa = PSAnalysis(universes, path_select='name CA', labels=labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate PSA `Path`s from the trajectories" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "psa_hpa.generate_paths()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform a Hausdorff pairs analysis on all of the `Path`s" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "psa_hpa.run_pairs_analysis(hausdorff_pairs=True, neighbors=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract specific data from PSA" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "identifier = PairID()\n", "for name in method_names:\n", " identifier.add_sim(name, [1,2,3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the PSA ID for the second DIMS simulation (DIMS 2) and third rTMD-F simulation (rTMD-F 3)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pid = identifier.get_pair_id('DIMS 2', 'rTMD-F 3')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the PSA ID to locate the Hausdorff analysis data for the DIMS 2/rTMD-F 3 comparison:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the indices of the frames for the DIMS 2 and rTMD-F 3 paths corresponding to the Hausdorff pair" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'distance': 1.8656396, 'frames': (48, 97)}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "psa_hpa.hausdorff_pairs[pid]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(48, 97)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "psa_hpa.hausdorff_pairs[pid]['frames']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the rmsd separating the Hausdorff pair (this is the Hausdorff distance!)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1.8656396" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "psa_hpa.hausdorff_pairs[pid]['distance']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the indices of the nearest neighbor frames for the DIMS 2 and rTMD-F 3 paths" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(array([ 3, 3, 7, 8, 13, 13, 17, 17, 19, 22, 22, 22, 28,\n", " 29, 30, 36, 36, 36, 37, 42, 42, 43, 49, 51, 52, 56,\n", " 57, 57, 57, 57, 63, 63, 68, 68, 68, 69, 76, 80, 80,\n", " 83, 84, 84, 89, 90, 90, 91, 91, 93, 97, 102, 102, 105,\n", " 105, 109, 110, 110, 115, 116, 116, 116, 116, 116, 117, 125, 127,\n", " 127, 127, 129, 129, 129, 129, 130, 129, 133, 133, 133, 133, 133,\n", " 133, 134, 133, 133, 137, 138, 134, 138, 138, 138, 138, 139, 138, 139]),\n", " array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 3,\n", " 3, 3, 3, 3, 3, 3, 4, 4, 4, 10, 10, 10, 10, 10, 10, 12, 12,\n", " 14, 14, 14, 14, 15, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,\n", " 17, 17, 17, 17, 22, 22, 26, 29, 29, 29, 29, 29, 29, 29, 29, 34, 34,\n", " 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 42, 42,\n", " 42, 42, 42, 42, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 52, 52,\n", " 54, 58, 59, 61, 61, 61, 61, 61, 87, 87, 87, 87, 87, 87, 87, 87, 87,\n", " 87, 89, 89, 89, 89, 89, 89, 89, 89, 89, 91, 91, 90, 90, 90, 90, 90,\n", " 90, 90, 91, 91]))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "psa_hpa.nearest_neighbors[pid]['frames']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the nearest neighbor rmsds for the paths" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(array([ 0.61047804, 0.72486514, 0.78072226, 0.83813328, 0.93167013,\n", " 1.01148224, 1.06094921, 1.11567676, 1.15830064, 1.18572509,\n", " 1.19752169, 1.24201703, 1.28473079, 1.3059119 , 1.32284486,\n", " 1.34145117, 1.36143172, 1.36866331, 1.45567334, 1.54358423,\n", " 1.5964545 , 1.63075888, 1.64360261, 1.68257856, 1.72496045,\n", " 1.71841514, 1.71702886, 1.72773194, 1.76599729, 1.72124612,\n", " 1.77787733, 1.76189709, 1.75216496, 1.74139595, 1.7261765 ,\n", " 1.73064542, 1.76115847, 1.7780937 , 1.8280741 , 1.81312251,\n", " 1.81970263, 1.79710293, 1.77227151, 1.7905525 , 1.76451468,\n", " 1.77083576, 1.80586505, 1.83057499, 1.86563957, 1.85736096,\n", " 1.86288309, 1.82903266, 1.7895925 , 1.78329408, 1.74680781,\n", " 1.72502816, 1.72101748, 1.66805542, 1.63964784, 1.60171819,\n", " 1.57724261, 1.5326587 , 1.55232775, 1.51061881, 1.47200549,\n", " 1.44320154, 1.38229811, 1.31344652, 1.28408015, 1.24469936,\n", " 1.19692671, 1.17440617, 1.11309803, 1.0645541 , 1.02498984,\n", " 0.99263138, 0.93730855, 0.91037399, 0.86864704, 0.82913756,\n", " 0.79435873, 0.76110363, 0.73129284, 0.68911695, 0.64772373,\n", " 0.60788435, 0.58289891, 0.55806595, 0.52546072, 0.4935227 ,\n", " 0.4636465 , 0.45959264], dtype=float32),\n", " array([ 0.63109344, 0.62719834, 0.61601019, 0.61047804, 0.62155843,\n", " 0.63865268, 0.66227907, 0.67063749, 0.69626302, 0.72733957,\n", " 0.75623101, 0.7901932 , 0.83100504, 0.85794991, 0.88361055,\n", " 0.90336204, 0.92149943, 0.93565702, 0.97083294, 0.99374866,\n", " 1.03374887, 1.06611073, 1.08965945, 1.13067162, 1.17012823,\n", " 1.20783472, 1.22551477, 1.23616135, 1.23949754, 1.25503623,\n", " 1.27730811, 1.29305518, 1.31983471, 1.32525945, 1.33354521,\n", " 1.33722591, 1.33787072, 1.35226214, 1.3751452 , 1.39359438,\n", " 1.39420414, 1.39986932, 1.40463161, 1.41709375, 1.4386462 ,\n", " 1.45734608, 1.47568536, 1.49810481, 1.51763892, 1.53081644,\n", " 1.55413353, 1.57935131, 1.6103493 , 1.65109909, 1.68054557,\n", " 1.69345903, 1.7044816 , 1.71702886, 1.72613454, 1.72539341,\n", " 1.72467172, 1.72730267, 1.72880113, 1.72190535, 1.73856974,\n", " 1.75418186, 1.7464515 , 1.73684514, 1.7261765 , 1.72830319,\n", " 1.73841536, 1.74597847, 1.74871075, 1.74822044, 1.75148296,\n", " 1.74700916, 1.74772978, 1.75728726, 1.77193272, 1.77165914,\n", " 1.7767179 , 1.78658187, 1.79900873, 1.78998554, 1.78027713,\n", " 1.78107965, 1.77923036, 1.77341914, 1.7738812 , 1.76738906,\n", " 1.76451468, 1.76607394, 1.77978122, 1.7792269 , 1.79973316,\n", " 1.80040133, 1.80270863, 1.81065071, 1.82419837, 1.83073819,\n", " 1.83595932, 1.82261348, 1.80349576, 1.79456735, 1.77086103,\n", " 1.74088705, 1.70968091, 1.68400657, 1.67217898, 1.64538348,\n", " 1.62183571, 1.56865823, 1.50714755, 1.4620024 , 1.39992738,\n", " 1.34150422, 1.28506386, 1.23358619, 1.1904099 , 1.13146091,\n", " 1.08869743, 1.04396403, 0.98481226, 0.92711079, 0.8842231 ,\n", " 0.83658326, 0.77844107, 0.72915876, 0.68647271, 0.63805151,\n", " 0.60030681, 0.56835544, 0.54370099, 0.50965744, 0.48632449,\n", " 0.47754329, 0.47594741, 0.46904811, 0.45977202, 0.45959264], dtype=float32))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "psa_hpa.nearest_neighbors[pid]['distances']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the `pandas` `DataFrame` containing the set of simulations" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sim ID
NameRun ID
DIMS10
21
32
FRODA13
24
35
rTMD-F16
27
38
rTMD-S19
210
311
\n", "
" ], "text/plain": [ " Sim ID\n", "Name Run ID \n", "DIMS 1 0\n", " 2 1\n", " 3 2\n", "FRODA 1 3\n", " 2 4\n", " 3 5\n", "rTMD-F 1 6\n", " 2 7\n", " 3 8\n", "rTMD-S 1 9\n", " 2 10\n", " 3 11" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = identifier.data\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the simulation IDs for DIMS simulations 1, 2, and 3" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Name Run ID\n", "DIMS 1 0\n", " 2 1\n", " 3 2\n", "Name: Sim ID, dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[('DIMS',[1,2,3]), 'Sim ID']" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }