{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [PyRosettaCluster Tutorial 1B. Reproduce simple protocol](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.07-PyRosettaCluster-Reproduce-simple-protocol.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 3. Multiple decoys](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.09-PyRosettaCluster-Multiple-decoys.ipynb) >

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PyRosettaCluster Tutorial 2. Multiple protocols\n", "\n", "PyRosettaCluster Tutorial 2 is an example of using multiple user-provided PyRosetta protocols with `PyRosettaCluster`. Unlike Rosetta's `MultiplePoseMover`, which executes multiple protocols serially, `PyRosettaCluster` executes multiple protocols in parallel (provided the cluster has more than one distributed worker). The user defines the order in which the protocols execute through the order of the `protocols` list passed to `PyRosettaCluster.distribute()`. Each `Pose` or `PackedPose` object returned from the first user-provided PyRosetta protocol is automatically passed to the second user-provided PyRosetta protocol, and so on. That is, `protocol1` returns a `Pose` object, which is then used as input for `protocol2`; `protocol2` returns a new `Pose` object, which is then used as input for `protocol3`; and so on. Only the `Pose` objects returned by the final protocol are written to disk, unless the user specifies `PyRosettaCluster(..., save_all=True, ...)`, in which case all intermediate decoys are also written to disk. Each decoy contains all of the information needed to reproduce it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the `--serialization` flag or by installing PyRosetta from the RosettaCommons conda channel.\n", "\n", "**Please see Chapter 16.00 for setup instructions.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Import packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import bz2\n", "import glob\n", "import json\n", "import logging\n", "import os\n", "import pyrosetta\n", "import pyrosetta.distributed.io as io\n", "import pyrosetta.distributed.viewer as viewer\n", "\n", "from pyrosetta.distributed.cluster import PyRosettaCluster\n", "\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Initialize a compute cluster using `dask`:\n", "\n", "See Tutorial 1A to review:\n", "1. Click the \"Dask\" tab in Jupyter Lab (arrow, left)\n", "2. Click the \"+ NEW\" button to launch a new compute cluster (arrow, lower)\n", "3. Once the cluster has started, click the brackets to \"inject client code\" for the cluster into your notebook\n", "\n", "Inject client code here, then run the cell:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "Client\n", "Cluster\n", "Workers: 4\n", "Cores: 4\n", "Memory: 16.63 GB\n", "
" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    from dask.distributed import Client\n", "\n", "    client = Client(\"tcp://127.0.0.1:40329\")\n", "else:\n", "    client = None\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Define the user-provided PyRosetta protocols:\n", "\n", "User-provided PyRosetta protocols may return `Pose` or `PackedPose` objects to be passed on to the next protocol. Protocols that do not return a `Pose` or `PackedPose` object (for example, protocols that return `None`) are also allowed; in such cases, the subsequent protocol receives an empty `PackedPose` object." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def protocol1(packed_pose_in, **kwargs):\n", "    \"\"\"\n", "    Repacks the input `PackedPose` object, which can be (a) input to the function\n", "    automatically via the 'packed_pose_in' argument or (b) accessed through the 's'\n", "    `kwargs` keyword argument, depending on the order in which the protocol is\n", "    specified in the PyRosettaCluster.distribute() method.\n", "\n", "    Args:\n", "        packed_pose_in: A `PackedPose` object to be repacked.
Optional.\n", "        **kwargs: PyRosettaCluster keyword arguments.\n", "\n", "    Returns:\n", "        A `PackedPose` object.\n", "    \"\"\"\n", "    import pyrosetta\n", "    import pyrosetta.distributed.io as io\n", "    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", "\n", "    logging.info(\n", "        \"Now executing protocol number '{0}' called '{1}'.\".format(\n", "            kwargs[\"PyRosettaCluster_protocol_number\"],\n", "            kwargs[\"PyRosettaCluster_protocol_name\"]\n", "        )\n", "    )\n", "\n", "    if packed_pose_in is None:\n", "        logging.info(\"Generating `packed_pose_in` from `kwargs['s']`.\")\n", "        packed_pose_in = io.pose_from_file(kwargs[\"s\"])\n", "    else:\n", "        logging.info(\"Using `packed_pose_in` from `args`.\")\n", "\n", "    xml = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "    \"\"\"\n", "\n", "    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())\n", "\n", "def protocol2(packed_pose_in, **kwargs):\n", "    \"\"\"\n", "    Performs sequence design (Thr24Ser) on an input pose.\n", "\n", "    Args:\n", "        packed_pose_in: A `PackedPose` object to be designed.\n", "        **kwargs: PyRosettaCluster keyword arguments.\n", "\n", "    Returns:\n", "        A `PackedPose` object.\n", "    \"\"\"\n", "    import pyrosetta\n", "    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", "\n", "    xml = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "    \"\"\"\n", "\n", "    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. 
Define the user-provided kwargs:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def create_tasks():\n", " yield {\n", " \"options\": \"-ex1\",\n", " \"extra_options\": \"-out:level 300 -multithreading:total_threads 1\",\n", " \"set_logging_handler\": \"interactive\",\n", " \"s\": os.path.join(os.getcwd(), \"inputs\", \"1QYS.pdb\"),\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Launch the original simulation using `distribute()`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-run:constant_seed 1 -multithreading:total_threads 1', 'extra_options': '-mute all', 'silent': True}\n", "INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/jklima/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....\n", "INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n" ] } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", " output_path = os.path.join(os.getcwd(), \"outputs_2\")\n", "\n", " PyRosettaCluster(\n", " tasks=create_tasks,\n", " client=client,\n", " scratch_dir=output_path,\n", " output_path=output_path,\n", " ).distribute(protocols=[protocol1, protocol2, protocol1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. 
Visualize the resultant decoy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gather the input and output decoys from disk into memory:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    input_file = os.path.join(os.getcwd(), \"inputs\", \"1QYS.pdb\")\n", "    output_file = glob.glob(os.path.join(output_path, \"decoys\", \"*\", \"*.pdb.bz2\"))[0]\n", "\n", "    packed_poses = []\n", "    for pdbfile in [input_file, output_file]:\n", "        if pdbfile.endswith(\".bz2\"):\n", "            with open(pdbfile, \"rb\") as f:\n", "                packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))\n", "        elif pdbfile.endswith(\".pdb\"):\n", "            with open(pdbfile, \"r\") as f:\n", "                packed_poses.append(io.pose_from_pdbstring(f.read()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original Top7 (PDB ID: 1QYS) decoy and the designed Top7 decoy, with the T24S mutation highlighted, are shown below using the `pyrosetta.distributed.viewer` visualizer:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "695e14f8f1004f1083c4fb84ac869c77", "version_major": 2, 
"version_minor": 0 }, "text/plain": [ "interactive(children=(IntSlider(value=0, continuous_update=False, description='Decoys', max=1), Output()), _do…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ ".view(i=0)>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    resi_24 = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector(\"24A\")\n", "\n", "    view = viewer.init(packed_poses, window_size=(800, 600))\n", "    view.add(viewer.setStyle())\n", "    view.add(viewer.setStyle(colorscheme=\"whiteCarbon\", radius=0.25))\n", "    view.add(viewer.setStyle(residue_selector=resi_24, colorscheme=\"magentaCarbon\", radius=0.5))\n", "    view.add(viewer.setHydrogenBonds())\n", "    view.add(viewer.setHydrogens(polar_only=True))\n", "    view.add(viewer.setDisulfides(radius=0.25))\n", "    view()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Congrats! \n", "You have successfully run `PyRosettaCluster` with multiple user-provided PyRosetta protocols!" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks]", "language": "python", "name": "pyrosetta.notebooks" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }