{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PYROSETTA.DISTRIBUTED - RosettaScripts/Python Interface Integration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Integration Components\n", "The python software ecosystem relies on a small set of shared core interfaces utilizing primitive language-native data structures, pure function invocation, and object serialization to provide loosely coupled interoperability between independent software components. Our component, the `pyrosetta.distributed` namespace, utilizes established elements of the Rosetta internal architecture: the Pose model & score representation, RosettaScript protocols, and Pose serialization. \n", "\n", "The adoption of a small set of core interfaces supports integration with an array of scientific computing tools, including support for interactive development environments, common record-oriented data formats, statistical analysis and machine learning packages, and multiple distributed computing packages. The pyrosetta.distributed package provides example integrations with several preferred packages for data analysis (Pandas), distributed computing (Dask), and interactive development (Jupyter Notebook), but is loosely coupled to allow later integration with additional libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2019 [Rosetta PyRosetta4.conda.linux.CentOS.python37.Release 2019.22+release.d8f9b4a90a8f2caa32948bacdb6e551591facd5f 2019-05-30T13:47:16] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. 
Created in JHU by Sergey Lyskov and PyRosetta Team.\n" ] } ], "source": [ "import pyrosetta.distributed\n", "\n", "# Distributed components perform default initialization on demand, but\n", "# custom initialization can be requested via\n", "pyrosetta.distributed.maybe_init()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Structures (pyrosetta.distributed.packed_pose)\n", "“Primitive” datatypes form a primary interface between many python libraries and, though not strictly defined, typically include the built-in scalar types (string, int, bool, float, ...), key-value dicts, and lists. Libraries operating on more complex user-defined classes often expose routines interconverting to and from primitive datatypes, and primitive datatypes can be efficiently serialized in multiple formats.\n", "For interaction between Rosetta protocol components and external libraries, we developed the `pyrosetta.distributed.packed_pose` namespace. This implements an isomorphism between the Pose object and dict-like records of the molecular model and scores. The Pose class represents a mutable, full-featured molecular model with a non-trivial memory footprint. A Pose may be inexpensively interconverted to a compact binary encoding via recently developed cereal-based serialization in the suite. This serialized format is used to implement the `PackedPose` class, an immutable record containing model scores and the encoded model, which is isomorphic to a dict-based record. Adaptor functions within the packed_pose namespace freely adapt between collections of Pose (`packed_pose.to_pose`), PackedPose (`packed_pose.to_packed`), dict-records (`packed_pose.to_dict`) and pandas.DataFrame objects. 
(Fig 2.A)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "dict_keys(['pickled_pose'])" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pyrosetta.distributed.packed_pose as packed_pose\n", "import pyrosetta.distributed.io as io\n", "import requests\n", "import pandas\n", "\n", "ubq = io.pose_from_pdbstring(requests.get(\"https://files.rcsb.org/download/1UBQ.pdb\").text)\n", "\n", "# Packed pose structures interconvert between multiple datatypes.\n", "display(ubq)\n", "display(packed_pose.to_pose(ubq))\n", "display(packed_pose.to_dict(ubq).keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A dict-record and DataFrame interface provides zero-friction integration with a wide variety of data analysis tools and storage formats. For example, the record-oriented format can be passed through statsmodels or scikit-learn based filtering and analysis and written to any json-encoded text file, avro record-oriented storage, or parquet column-oriented storage. The pyrosetta.distributed.io namespace implements functions that mirror the pyrosetta.io namespace, providing conversion between PackedPose and the PDB, MMCIF & Rosetta silent-file formats.\n", "Critically, the PackedPose record format can also be transparently serialized, stored with a minimal memory footprint, and transmitted between processes in a distributed computing context. This allows a distributed system to process PackedPose records as plain data, storing and transmitting a large number of model decoys while only unpacking a small working set into heavyweight Pose objects. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " pickled_pose\n", "0 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...\n", "1 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...\n", "2 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...\n", "3 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...\n", "4 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2..." ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Collections of packed pose structures interconvert to pandas DataFrame.\n", "\n", "frame_poses = pandas.DataFrame.from_records([packed_pose.to_dict(ubq) for _ in range(5)])\n", "display(frame_poses)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "packed_poses = packed_pose.to_packed(frame_poses)\n", "display(packed_poses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Protocol Components (pyrosetta.distributed.tasks)\n", "\n", "RosettaScripts uses an XML-based DSL to tersely encode molecular modeling protocols with a pipeline-like dataflow. The rosetta_scripts interpreter functions by parsing, XSD-validating and initializing a single RosettaScripts protocol. It then applies this protocol to input structures repeatedly to produce simulation output. Recent work has expanded support for more complex dataflow, including multi-stage operations and additional logic; however, RosettaScripts is not intended to be a general purpose programming language. \n", "\n", "The pyrosetta.distributed.tasks namespace encapsulates the RosettaScripts interface, allowing the DSL to be utilized within python processes. Protocol components are represented as ‘task’ objects containing an XML encoded script. Task objects are serializable via the standard pickle interface, and they use a simple caching strategy to perform on-demand initialization of the underlying protocol object as needed for task application." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "SingleoutputRosettaScriptsTask(protocol_xml = '\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n ')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "ScorePoseTask(patch = None, weights = None)" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "{'fa_atr': -397.6465926658618,\n", " 'fa_rep': 103.70704606947386,\n", " 'fa_sol': 242.95183729178729,\n", " 'fa_intra_rep': 355.46866408199486,\n", " 'fa_intra_sol_xover4': 16.826406860919942,\n", " 'lk_ball_wtd': -8.755571649079277,\n", " 'fa_elec': -113.09090558288852,\n", " 'pro_close': 1.906104764589372,\n", " 'hbond_sr_bb': -18.828056617518506,\n", " 'hbond_lr_bb': -23.131565839644882,\n", " 'hbond_bb_sc': -7.389119588161401,\n", " 'hbond_sc': -1.5490919363291988,\n", " 'dslf_fa13': 0.0,\n", " 'omega': 4.283688243517373,\n", " 'fa_dun': 412.2840241807293,\n", " 'p_aa_pp': -21.346309331921773,\n", " 'yhh_planarity': 0.0,\n", " 'ref': 11.884429999999998,\n", " 'rama_prepro': -16.216376041300332,\n", " 'total_score': 32.67775729376015}" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pyrosetta.distributed.tasks.score as score\n", "import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", "\n", "# A blank RosettaScripts task\n", "blank_task = rosetta_scripts.SingleoutputRosettaScriptsTask(\"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \"\"\")\n", "display(blank_task)\n", "\n", "# A simple scoring task\n", "score_task = score.ScorePoseTask()\n", "display(score_task)\n", "\n", "# The results of filters and scores are available as the PackedPose \"scores\"\n", "scored_ubq = score_task(ubq)\n", "display(scored_ubq.scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task components accept any valid pose-equivalent data structure and return immutable PackedPose data 
structures by (1) deserializing the input into a short-lived Pose object, (2) applying the parsed protocol to the Pose and (3) serializing the resulting model as a PackedPose. Two task classes, `SingleoutputRosettaScriptsTask` and `MultioutputRosettaScriptsTask`, define either a one-to-one function returning a single output, or a one-to-many protocol component returning a lazy iterator of outputs. All tasks operate as “pure functions”, returning a modified copy rather than directly manipulating input data structures. (Fig 2.B)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "relaxed score: -234.77631712424798\n", "delta score: -267.45407441800813\n" ] } ], "source": [ "relax_task = rosetta_scripts.SingleoutputRosettaScriptsTask(\"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\"\"\")\n", "\n", "# Protocol execution does not change the input pose.\n", "# A modified copy is returned.\n", "relaxed_ubq = relax_task(scored_ubq)\n", "\n", "print(f\"relaxed score: {relaxed_ubq.scores['total_score']}\")\n", "print(f\"delta score: {relaxed_ubq.scores['total_score'] - scored_ubq.scores['total_score']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interactive Analysis and Notebook-based Computing\n", "Notebook-based interactive analysis, typified by the Jupyter project,18 has become a dominant tool in modern data science software development. In this model, data, code, output, and visualization are combined in a single document which is viewed and edited through a browser-based interface to a remote execution environment.\n", "\n", "To facilitate interactive analysis, we extended the PyRosetta Pose interface to expose total, residue one-body, and residue-pair two-body terms of the Rosetta score function as NumPy structured arrays. 
Combined with the pandas.DataFrame representation offered in pyrosetta.distributed.packed_pose, this provides an expressive interface for interactive model analysis and selection." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype([('fa_atr', '" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Pose energies are available under the energies *_energies_array accessor functions.\n", "\n", "source_energies = scored_ubq.pose.energies()\n", "relaxed_energies = relaxed_ubq.pose.energies()\n", "display(relaxed_energies.residue_onebody_energies_array().dtype)\n", "\n", "source_frame = pandas.DataFrame.from_records(source_energies.residue_total_energies_array())\n", "relaxed_frame = pandas.DataFrame.from_records(relaxed_energies.residue_total_energies_array())\n", "\n", "delta = relaxed_frame - source_frame\n", "delta.index.name=\"residue index\"\n", "delta[[\"total_score\"]].plot(title=\"Delta score via relax.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also integrated existing documentation into the pyrosetta.distributed.docs namespace to allow introspection-based exploration of Mover and Filter " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['BuriedUnsatHbonds',\n", " 'CalculatorFilter',\n", " 'ChainBreak',\n", " 'ChainCountFilter',\n", " 'ChainExists']" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "INFORMATION ABOUT FILTER \"ChainBreak\":\n", "\n", "DESCRIPTION:\n", "\n", "Measures the number of chainBreaks in the pose\n", "\n", "USAGE:\n", "\n", "\n", "\n", "\n", "OPTIONS:\n", "\n", "\"ChainBreak\" tag:\n", "\n", "\tthreshold (int,\"1\"): Number of chainbreaks allowed\n", "\n", "\tchain_num (int,\"1\"): which chain should we check for\n", "\n", "\ttolerance (real,\"0.13\"): the allowed angstrom deviation from the mean optimal bond length\n", "\n", "\tname (string): The name given to this instance.\n", "\n", "\tconfidence (real,\"1.0\"): Probability that the pose will be filtered out if it does not pass this Filter\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pyrosetta.distributed.docs as docs\n", 
"display(dir(docs.filters)[15:20])\n", "display(docs.filters.ChainBreak)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RosettaScripts components. Existing tools for web-based biomolecular visualization, such as `py3dmol` and `NGLview` extend this interface to a fully-featured biomolecular simulation, analysis, and visualization environment. (Fig 5)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n


\n
\n", "text/html": [ "
\n", "


\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import py3Dmol\n", "view = py3Dmol.view(linked=False, width=600, height=600)\n", "view.addModel(io.to_pdbstring(relaxed_ubq), \"pdb\")\n", "view.setStyle({'stick':{}})\n", "view.addStyle({'cartoon':{}})\n", "view.zoomTo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multithreaded and Distributed Execution\n", "Remote notebook execution has the distinct advantage of allowing a user to access computational resources far beyond the capabilities of a single workstation. By using tools such as Dask via the integrations described above, a remote notebook interface can be used to manage a distributed simulation spanning hundreds of cores for rapid model analysis, and it offers a viable alternative to traditional batch-based computing for some classes of simulation. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "import dask\n", "import dask.distributed\n", "\n", "# Establish a single-node cluster of worker processes and connect a client to it.\n", "# See dask.distributed documentation for multi-node cluster tools.\n", "cluster = dask.distributed.Client(dask.distributed.LocalCluster())\n", "print(cluster)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rosetta-based simulations frequently involve execution of a large number of independent Monte Carlo sampling trajectories that all begin from a single starting structure; in other words, they are “embarrassingly” or “trivially” parallel. The Rosetta suite implements a job distribution framework to manage I/O and task scheduling for parallelizable workloads of this type; this allows the rosetta_scripts interpreter to operate as a single process or within MPI, BOINC, and other distributed computing frameworks. 
Semantics of the RosettaScripts language have also evolved to incorporate non-trivial forms of parallelism, including support for multi-stage scatter/gather protocols. Though fully functional, this framework is optimized for operation as a standalone application and does not provide straightforward integration with third-party tools or generalized program logic.\n", "\n", "The combination of immutable data structures and pure function interfaces implemented in the pyrosetta.distributed namespace provides an alternative approach to job parallelization by integrating RosettaScripts as a submodule that is compatible with dask.distributed and other task-based distributed computing frameworks. Because it relies on standard python primitives, the `pyrosetta.distributed` namespace is not tightly coupled to a single execution engine. Single-node scheduling may be managed via the standard `multiprocessing` or `concurrent.futures` interfaces, providing a zero-dependency solution for small-scale sampling or analysis tasks. Execution via MPI-based HPC deployments may be managed via the `mpi4py` interface.\n", "\n", "To support effective distributed execution, the pyrosetta.distributed namespace is intended to be installed via a build configuration of PyRosetta, provided by conda packages described above, supporting multithreaded execution. This variant utilizes existing work establishing thread-safety in the suite, and it releases the CPython global interpreter lock when calling compiled Rosetta interfaces. This enables multi-core concurrent execution of independent modeling trajectories via python-managed threads, and it allows python-level operations such as network I/O and process heartbeats to occur concurrently with long-running Rosetta API calls."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \\'\\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n\\')-bc7d516c-6955-4e22-a22e-48a279f4c541'),\n", " Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \\'\\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n\\')-5885becc-e258-42e7-bdba-8acf92ec2727'),\n", " Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \\'\\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n \\\\n\\')-2f0c494e-0885-4f39-9f30-236c4c42115a')]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# A \"delayed\" task is distributed on the worker clusters\n", "delayed_relax = dask.delayed(rosetta_scripts.SingleoutputRosettaScriptsTask(\"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\"\"\"))\n", "relax_tasks = [delayed_relax(ubq) for _ in range(64)]\n", "display(relax_tasks[:3])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Persist, beginning computation on the distributed cluster.\n", "relax_tasks, = dask.persist(relax_tasks)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "top - 21:42:52 up 3:06, 0 users, load average: 0.12, 0.08, 0.30\n", "Tasks: 636 total, 7 running, 629 sleeping, 0 stopped, 0 zombie\n", "%Cpu(s): 1.0 us, 0.0 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st\n", "KiB Mem : 19383412+total, 18696489+free, 3417512 used, 3451716 buff/cache\n", "KiB Swap: 0 total, 0 free, 0 used. 
18897763+avail Mem \n", "\n", " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND\n", "20246 lexaf 20 0 1614576 442568 198956 R 81.2 0.2 0:01.59 python\n", "20250 lexaf 20 0 1615360 441368 198936 R 81.2 0.2 0:01.59 python\n", "20252 lexaf 20 0 1613520 439736 198764 S 81.2 0.2 0:01.60 python\n", "20254 lexaf 20 0 1610652 437384 198760 S 81.2 0.2 0:01.60 python\n", "20260 lexaf 20 0 1614328 442196 198952 R 81.2 0.2 0:01.58 python\n", "20248 lexaf 20 0 1616860 439768 198928 R 75.0 0.2 0:01.59 python\n", "20256 lexaf 20 0 1613540 441372 198936 R 75.0 0.2 0:01.58 python\n", "20258 lexaf 20 0 1614572 442544 198892 R 75.0 0.2 0:01.58 python\n", " 9338 root 20 0 0 0 0 S 18.8 0.0 0:11.11 socknal_sd+\n", " 9372 root 20 0 0 0 0 S 6.2 0.0 0:00.24 ptlrpcd_02+\n", "11121 root 20 0 0 0 0 S 6.2 0.0 0:00.29 ldlm_bl_04\n", "14254 root 20 0 0 0 0 S 6.2 0.0 0:00.15 ldlm_bl_08\n", "20173 lexaf 20 0 3570384 912656 235400 S 6.2 0.5 0:22.89 ZMQbg/1\n" ] } ], "source": [ "# Multi-threaded worker processes begin a distributed relax.\n", "!top -bn1 | head -n 20" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Compute, pulling results from workers when completed.\n", "relax_results, = dask.compute(relax_tasks)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

64 rows × 21 columns

\n", "
" ], "text/plain": [ " fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 \\\n", "0 -414.533548 87.017880 240.935466 171.634621 13.561212 \n", "1 -417.616036 89.665402 250.540932 172.235600 14.865754 \n", "2 -416.037696 86.559167 241.219787 171.872161 13.114300 \n", "3 -413.652532 86.523066 241.708757 173.253910 13.964998 \n", "4 -405.057865 83.600478 250.550884 164.492798 14.157138 \n", ".. ... ... ... ... ... \n", "59 -412.849655 82.278599 256.142183 170.457006 12.826256 \n", "60 -417.564965 85.638752 251.586095 167.494238 14.068854 \n", "61 -406.718309 81.164953 250.565289 167.861354 14.206585 \n", "62 -405.338717 81.308925 242.362518 170.440913 13.111082 \n", "63 -421.355828 88.145221 253.151362 167.472356 14.056122 \n", "\n", " lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb ... \\\n", "0 -8.316931 -138.802121 0.137294 -21.396509 -24.637263 ... \n", "1 -8.941395 -143.729947 0.115414 -22.292130 -25.138046 ... \n", "2 -9.171195 -133.330778 0.140058 -21.338510 -24.602856 ... \n", "3 -8.175880 -139.544256 0.134523 -21.332893 -24.553072 ... \n", "4 -10.255674 -143.107407 0.130500 -21.819339 -24.785927 ... \n", ".. ... ... ... ... ... ... \n", "59 -8.202753 -141.328995 0.075621 -21.420823 -24.372237 ... \n", "60 -7.520947 -146.342724 0.112407 -21.603107 -24.666775 ... \n", "61 -7.785318 -145.653755 0.138510 -21.896157 -25.440506 ... \n", "62 -8.250842 -138.932886 0.111431 -21.123840 -25.344347 ... \n", "63 -8.470014 -146.658022 0.105385 -21.711494 -24.673435 ... \n", "\n", " hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity \\\n", "0 -9.842593 0.0 14.882225 141.846081 -28.810877 0.001671 \n", "1 -11.709581 0.0 13.564104 147.703490 -26.779853 0.005146 \n", "2 -9.847375 0.0 14.890717 137.678186 -28.645664 0.001622 \n", "3 -9.870090 0.0 14.423985 139.206573 -28.524840 0.004676 \n", "4 -10.575684 0.0 10.606302 132.080317 -26.993373 0.009503 \n", ".. ... ... ... ... ... ... 
\n", "59 -9.589471 0.0 14.844322 123.350891 -27.365344 0.001342 \n", "60 -12.079606 0.0 15.722080 128.969237 -27.488374 0.003522 \n", "61 -10.826315 0.0 13.309296 127.432946 -27.171071 0.001769 \n", "62 -13.252032 0.0 14.103101 133.353189 -26.221917 0.000495 \n", "63 -12.078374 0.0 15.877242 132.712699 -27.756211 0.006868 \n", "\n", " ref rama_prepro total_score \\\n", "0 11.88443 -25.225484 -236.875258 \n", "1 11.88443 -24.831541 -233.748825 \n", "2 11.88443 -25.542342 -236.950268 \n", "3 11.88443 -25.556470 -237.413003 \n", "4 11.88443 -24.741133 -232.844306 \n", ".. ... ... ... \n", "59 11.88443 -28.595809 -241.388086 \n", "60 11.88443 -25.949225 -250.929621 \n", "61 11.88443 -25.857702 -244.112752 \n", "62 11.88443 -23.497958 -238.356824 \n", "63 11.88443 -26.570178 -249.290683 \n", "\n", " pickled_pose \n", "0 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "1 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "2 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "3 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "4 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", ".. ... \n", "59 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "60 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "61 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "62 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "63 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2... \n", "\n", "[64 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 \\\n", "count 64.000000 64.000000 64.000000 64.000000 64.000000 \n", "mean -412.398387 84.666632 248.952723 170.862718 13.487279 \n", "std 5.092030 2.458225 4.847316 5.344470 0.701480 \n", "min -421.896957 79.929049 240.367470 161.357663 12.286200 \n", "25% -416.177728 82.952851 245.247657 167.170257 12.843353 \n", "50% -412.698555 84.353729 249.709019 170.461304 13.616494 \n", "75% -407.814401 86.110133 251.954614 174.033605 14.086546 \n", "max -400.185799 90.215249 257.739908 185.313015 14.865754 \n", "\n", " lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb \\\n", "count 64.000000 64.000000 64.000000 64.000000 64.000000 \n", "mean -8.242926 -141.890999 0.120264 -21.631658 -24.815269 \n", "std 0.826497 3.770203 0.045677 0.416049 0.448841 \n", "min -10.388430 -150.225241 0.059318 -22.643724 -25.658065 \n", "25% -8.705077 -144.755154 0.092470 -21.936929 -25.122118 \n", "50% -8.236696 -142.395310 0.112472 -21.694169 -24.795306 \n", "75% -7.673382 -139.732880 0.132240 -21.382692 -24.576877 \n", "max -6.772219 -131.305415 0.338816 -20.472891 -22.606144 \n", "\n", " hbond_bb_sc hbond_sc dslf_fa13 omega fa_dun p_aa_pp \\\n", "count 64.000000 64.000000 64.0 64.000000 64.000000 64.000000 \n", "mean -12.999627 -10.933493 0.0 14.296867 131.351531 -27.525743 \n", "std 1.457342 1.333651 0.0 2.064210 7.313366 0.896768 \n", "min -16.354001 -14.217683 0.0 8.937707 115.768251 -29.941382 \n", "25% -13.891951 -12.075459 0.0 12.979226 125.540210 -27.927236 \n", "50% -13.215612 -10.685479 0.0 14.532800 131.981932 -27.447900 \n", "75% -11.952388 -9.853509 0.0 15.829100 136.166153 -26.838487 \n", "max -9.671601 -7.805468 0.0 18.443958 147.703490 -26.136842 \n", "\n", " yhh_planarity ref rama_prepro total_score \n", "count 64.000000 6.400000e+01 64.000000 64.000000 \n", "mean 0.012438 1.188443e+01 -25.539245 -241.352149 \n", "std 0.063118 1.790399e-15 1.268865 5.138987 \n", "min 0.000002 1.188443e+01 
-29.497474 -253.163619 \n", "25% 0.001317 1.188443e+01 -26.209107 -244.884977 \n", "50% 0.002730 1.188443e+01 -25.549406 -241.091948 \n", "75% 0.006090 1.188443e+01 -24.869320 -237.144055 \n", "max 0.507180 1.188443e+01 -21.722543 -230.463111 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "relax_result_frame = pandas.DataFrame.from_records(packed_pose.to_dict(relax_results))\n", "display(relax_result_frame)\n", "display(relax_result_frame.describe())" ] } ], "metadata": { "kernelspec": { "display_name": "Python [rosetta_pydata_integration] *", "language": "python", "name": "conda-env-rosetta_pydata_integration-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }