{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);\n", "content is available [on GitHub](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Distributed analysis example: exhaustive ddG PSSM\n", "\n", "## Notes\n", "This tutorial walks through generating an exhaustive ddG PSSM (position-specific scoring matrix) in PyRosetta, using the PyData stack for analysis and distributed computing.\n", "\n", "## Citation\n", "[Integration of the Rosetta Suite with the Python Software Stack via reproducible packaging and core programming interfaces for distributed simulation](https://doi.org/10.1002/pro.3721)\n", "\n", "Alexander S. Ford, Brian D. Weitzner, Christopher D. Bahl\n", "\n", "## Manual\n", "Documentation for the `pyrosetta.distributed` namespace is available at https://nbviewer.jupyter.org/github/proteininnovation/Rosetta-PyData_Integration/blob/master/distributed_overview.ipynb" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import logging\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "import seaborn\n", "import matplotlib" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import Bio.SeqUtils\n", "import Bio.Data.IUPACData as IUPACData" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import pyrosetta\n", "import pyrosetta.distributed.io as io\n", "import pyrosetta.distributed.packed_pose as packed_pose\n", "import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", "import pyrosetta.distributed.tasks.score as score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create test pose, initialize Rosetta, and pack" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "input_protocol = \"\"\"\n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, 
"metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'extra_options': '-out:levels all:warning'}\n", "INFO:pyrosetta.rosetta:Found rosetta database at: /home/lexaf/.conda/envs/rosetta_pydata_integration/lib/python3.7/site-packages/pyrosetta/database; using it....\n", "INFO:pyrosetta.rosetta:PyRosetta-4 2019 [Rosetta PyRosetta4.conda.linux.CentOS.python37.Release 2019.22+release.d8f9b4a90a8f2caa32948bacdb6e551591facd5f 2019-05-30T13:47:16] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2019 [Rosetta PyRosetta4.conda.linux.CentOS.python37.Release 2019.22+release.d8f9b4a90a8f2caa32948bacdb6e551591facd5f 2019-05-30T13:47:16] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n" ] } ], "source": [ "input_relax = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)\n", "# Syntax check via setup\n", "input_relax.setup()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "raw_input_pose = score.ScorePoseTask()(io.pose_from_sequence(\"TESTESTEST\"))\n", "input_pose = input_relax(raw_input_pose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform exhaustive point mutation and pack" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def mutate_residue(input_pose, res_index, new_aa, res_label=None):\n", "    import pyrosetta.rosetta.core.pose as pose\n", "\n", "    work_pose = packed_pose.to_pose(input_pose)\n", "\n", "    # Annotate structure with reslabel, for use in downstream protocol\n", "    # Add parameters as score, for use in downstream analysis\n", "    if res_label:\n", "        work_pose.pdb_info().add_reslabel(res_index, res_label)\n", "        pose.setPoseExtraScore(work_pose, \"mutation_index\", res_index)\n", "        pose.setPoseExtraScore(work_pose, \"mutation_aa\", new_aa)\n", "\n", "    # Convert a one-letter code to the upper-case three-letter code\n", "    if len(new_aa) == 1:\n", "        new_aa = str.upper(Bio.SeqUtils.seq3(new_aa))\n", "    assert new_aa in map(str.upper, IUPACData.protein_letters_3to1)\n", "\n", "    protocol = \"\"\"\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \"\"\" % locals()\n", "\n", "    return rosetta_scripts.SingleoutputRosettaScriptsTask(protocol)(work_pose)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "refine = \"\"\"\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \"\"\"\n", "\n", "refine_mutation = rosetta_scripts.SingleoutputRosettaScriptsTask(refine)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Mutation and pack" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Job distribution via `multiprocessing`" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:mutating\n" ] } ], "source": [ "from multiprocessing import Pool\n", "import itertools\n", "\n", "# Quiet Rosetta logging while the worker pool runs\n", "with pyrosetta.distributed.utility.log.LoggingContext(logging.getLogger(\"rosetta\"), level=logging.WARNING):\n", "    with Pool() as p:\n", "        # One job per (residue position, amino acid) pair; Rosetta residue indices are 1-based\n", "        work = [\n", "            (input_pose, i, aa, \"mutation\")\n", "            for i, aa in itertools.product(range(1, len(packed_pose.to_pose(input_pose).residues) + 1), IUPACData.protein_letters)\n", "        ]\n", "        logging.info(\"mutating\")\n", "        mutations = p.starmap(mutate_residue, work)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Job distribution via `dask`" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:refining\n" ] } ], "source": [ "import dask.distributed\n", "cluster = dask.distributed.LocalCluster(n_workers=2, threads_per_worker=2)\n", "client = dask.distributed.Client(cluster)\n", "\n", "# Submit one refinement task per mutant, then block until all results are available\n", "refinement_tasks = [client.submit(refine_mutation, mutant) for mutant in mutations]\n", "logging.info(\"refining\")\n", "refinements = [task.result() for task in refinement_tasks]\n", "\n", "client.close()\n", "cluster.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis of delta score" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [], "source": [ "result_frame = pandas.DataFrame.from_records(packed_pose.to_dict(refinements))\n", "# Score change of each refined mutant relative to the relaxed input pose\n", "result_frame[\"delta_total_score\"] = result_frame[\"total_score\"] - input_pose.scores[\"total_score\"]\n", "result_frame[\"mutation_index\"] = list(map(int, result_frame[\"mutation_index\"]))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] 
}, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matplotlib.rcParams['figure.figsize'] = [24.0, 8.0]\n", "# Rows: mutant amino acid; columns: residue position; values: score change\n", "seaborn.heatmap(\n", "    result_frame.pivot(index=\"mutation_aa\", columns=\"mutation_index\", values=\"delta_total_score\"),\n", "    cmap=\"RdBu_r\", center=0, vmax=50)" ] } ], "metadata": { "_draft": { "nbviewer_url": "https://gist.github.com/a3118b776957d03bc4c789493bb541fe" }, "gist": { "data": { "description": "pyrosetta_distributed_mutation_demo.ipynb", "public": true }, "id": "a3118b776957d03bc4c789493bb541fe" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }