{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 4. Ligand params](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.10-PyRosettaCluster-Ligand-params.ipynb) >

\"Open" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PyRosettaCluster Tutorial 3. Multiple decoys\n", "\n", "PyRosettaCluster Tutorial 3 demonstrates how multiple tasks (specified by `kwargs`) may be run several times (specified by `nstruct`). Additionally, user-provided PyRosetta protocols may `yield` or `return` multiple `Pose` or `PackedPose` objects to be efficiently parallelized on the user's compute resources." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook requires the PyRosetta distributed layer which is obtained by building PyRosetta with the `--serialization` flag or installing PyRosetta from the RosettaCommons conda channel \n", "\n", "**Please see Chapter 16.00 for setup instructions**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Import packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import bz2\n", "import glob\n", "import logging\n", "import os\n", "import pyrosetta\n", "import pyrosetta.distributed.io as io\n", "import pyrosetta.distributed.viewer as viewer\n", "\n", "from pyrosetta.distributed.cluster import PyRosettaCluster\n", "\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Initialize a compute cluster using `dask`\n", "\n", "See Tutorial 1A for review:\n", "1. Click the \"Dask\" tab in Jupyter Lab (arrow, left)\n", "2. Click the \"+ NEW\" button to launch a new compute cluster (arrow, lower)\n", "3. Once the cluster has started, click the brackets to \"inject client code\" for the cluster into your notebook\n", "\n", "Inject client code here, then run the cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " from dask.distributed import Client\n", "\n", " client = Client(\"tcp://127.0.0.1:40329\")\n", "else:\n", " client = None\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Define the user-provided PyRosetta protocols that returns multiple `Pose` or `PackedPose` objects:\n", "\n", "PyRosettaCluster automatically passes returned or yielded `Pose` or `PackedPose` objects through the user-provided PyRosetta protocols. If a protocol produces `n` poses, the subsequent protocol runs `n` times, once for each pose. By default, the `Pose` and `PackedPose` objects returned by the final protocol are written to disk.\n", "\n", "Multiple `Pose` and `PackedPose` objects may be yielded iteratively, or returned in a `list` or `tuple`:\n", "\n", "To `yield` multiple poses:\n", "```\n", "for _ in range(n_results):\n", " yield backrub(ppose.pose.clone())\n", "```\n", "\n", "Note: `yield` does not add the yielded object to the queue for parallelization until all objects are yielded.\n", "\n", "To `return` multiple poses in a `list`:\n", "```\n", "return list_of_poses\n", "```\n", "\n", "To `return` multiple poses in a `tuple`:\n", "```\n", "return pose1, pose2, pose3\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def protocol1(packed_pose_in=None, **kwargs):\n", " \"\"\"\n", " Performs backrub on a `PackedPose` object, which may be (a) input \n", " to the function or (b) accessed through the 's' `kwargs` keyword\n", " argument.\n", " \n", " Args:\n", " packed_pose: A `PackedPose` object. Optional.\n", " **kwargs: PyRosettaCluster keyword arguments.\n", "\n", " Returns:\n", " Multiple `PackedPose` objects.\n", " \"\"\"\n", " import pyrosetta\n", " import pyrosetta.distributed.io as io\n", " import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", " \n", " if packed_pose_in == None:\n", " packed_pose_in = io.pose_from_file(kwargs[\"s\"])\n", " \n", " xml = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \"\"\"\n", " backrub = rosetta_scripts.SingleoutputRosettaScriptsTask(xml)\n", "\n", " n_results = 3\n", " for _ in range(n_results):\n", " yield backrub(packed_pose_in.pose.clone())\n", "\n", "\n", "def protocol2(packed_pose_in, **kwargs):\n", " \"\"\"\n", " Performs sequence design using 'ALLAAxc' resfile command on input \n", " `kwargs['resnums']` residue numbers on the input `PackedPose` object.\n", " \n", " Args:\n", " packed_pose_in: A `PackedPose` object to be designed.\n", " **kwargs: PyRosettaCluster keyword arguments.\n", "\n", " Returns:\n", " A `PackedPose` object that has been designed.\n", " \"\"\"\n", " import pyrosetta\n", " import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n", "\n", " xml = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \"\"\".format(resnums=kwargs[\"resnums\"])\n", " \n", " return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Define the user-provided tasks as `kwargs`:\n", "\n", "Returning a list of dictionaries or yielding dictionaries allows the user to run through the chain of user-provided PyRosetta protocols multiple times with different inputs, and the unique `kwargs` can be accessed within each user-provided PyRosetta protocol." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_of_options = {\n", " \"-out:level\": \"300\",\n", " \"-multithreading:total_threads\": \"1\",\n", "}\n", "\n", "def create_tasks():\n", " for resnum in range(22, 28):\n", " yield {\n", " \"options\": \"-ex1\",\n", " \"extra_options\": dict_of_options,\n", " \"set_logging_handler\": \"interactive\",\n", " \"s\": os.path.join(os.getcwd(), \"inputs\", \"1QYS.pdb\"),\n", " \"resnums\": str(resnum) + \"A\",\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Launch the original simulation using `distribute()`:\n", "\n", "We also will use the `PyRosettaCluster` `nstruct` attribute, which is an `int` object specifying the number of repeats of the first user-provided PyRosetta protocol." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " output_path = os.path.join(os.getcwd(), \"outputs_3\")\n", "\n", " PyRosettaCluster(\n", " tasks=create_tasks,\n", " client=client,\n", " scratch_dir=output_path,\n", " output_path=output_path,\n", " nstruct=2,\n", " ).distribute(protocols=[protocol1, protocol2, protocol1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!\n", "\n", "Initially there are `12` simulations running in parallel: `6` tasks from the `create_tasks` generator, with each task executed using `nstruct=2`. After `protocol1` runs to completion, more `PackedPose` objects are added to the queue. After `protocol2` runs to completion, more `PackedPose` objects are added to the queue. The process continues until all tasks are run through the chain of user-provided PyRosetta protocols." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Visualize the resulting decoys:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gather output decoys from disk into memory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " results = glob.glob(os.path.join(output_path, \"decoys\", \"*\", \"*.pdb.bz2\"))\n", " packed_poses = []\n", " for i, bz2file in enumerate(results, start=1):\n", " with open(bz2file, \"rb\") as f:\n", " packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))\n", " logging.info(\"Percent done loading: {0:0.1f} %\".format((i * 100.) / len(results)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your designed Top7 (PDB ID: 1QYS) decoys are visualized below with residue numbers designed during the simulation shown.\n", "\n", "There are 108 resulting decoys: 6 (`kwargs`) x 2 (`nstruct`) x 3 (`protocol1`) x 1 (`protocol2`) x 3 (`protocol1`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " assert 6 * 2 * 3 * 1 * 3 == len(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " resis = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector(\"22A,23A,24A,25A,26A,27A\")\n", "\n", " view = viewer.init(packed_poses, window_size=(800, 600))\n", " view.add(viewer.setStyle())\n", " view.add(viewer.setStyle(residue_selector=resis, colorscheme=\"whiteCarbon\", radius=0.35))\n", " view.add(viewer.setHydrogenBonds())\n", " view.add(viewer.setHydrogens(polar_only=True))\n", " view()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Congrats! \n", "You have successfully run a multiple-protocol PyRosetta trajectory with `PyRosettaCluster`!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 4. Ligand params](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.10-PyRosettaCluster-Ligand-params.ipynb) >

\"Open" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks]", "language": "python", "name": "pyrosetta.notebooks" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }