{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on GitHub](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [PyRosettaCluster Tutorial 1A. Simple protocol](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.06-PyRosettaCluster-Simple-protocol.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) >"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PyRosettaCluster Tutorial 1B. Reproduce simple protocol\n", "\n", "PyRosettaCluster Tutorial 1B uses the `pyrosetta.distributed.cluster` Python module to reproduce a decoy generated by a PyRosetta simulation previously run in PyRosettaCluster Tutorial 1A, using only an input `.pdb` file and the original user-provided PyRosetta protocol(s).\n", "\n", "In PyRosettaCluster Tutorial 1A, you used `PyRosettaCluster` to apply a PyRosetta protocol to an input `.pdb` file, and generated several output `.pdb` files. Each output `.pdb` file contains the information needed to reproduce it exactly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the `--serialization` flag or by installing PyRosetta from the RosettaCommons conda channel.\n", "\n", "**Please see Chapter 16.00 for setup instructions**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 
Import packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import bz2\n", "import json\n", "import glob\n", "import logging\n", "import os\n", "import pandas as pd\n", "import pyrosetta\n", "import pyrosetta.distributed.io as io\n", "import pyrosetta.distributed.viewer as viewer\n", "\n", "from pyrosetta.distributed.cluster import PyRosettaCluster, reproduce\n", "\n", "logging.basicConfig(level=logging.INFO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Initialize a compute cluster using `dask` \n", "\n", "See Tutorial 1A to review:\n", "1. Click the \"Dask\" tab in Jupyter Lab (arrow, left)\n", "2. Click the \"+ NEW\" button to launch a new compute cluster (arrow, lower)\n", "\n", "3. Once the cluster has started, click the brackets to \"inject client code\" for the cluster into your notebook\n", "\n", "Inject client code here, then run the cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    from dask.distributed import Client\n", "\n", "    # Scheduler address from the injected client code; yours will differ.\n", "    client = Client(\"tcp://127.0.0.1:40329\")\n", "else:\n", "    client = None\n", "client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Re-define or import the original user-provided PyRosetta protocol:\n", "\n", "The purpose of the `sha1` attribute of `PyRosettaCluster` is to ensure that you have committed all of your changes to your git repository before executing the original simulation. The original `sha1` attribute of `PyRosettaCluster` was captured in the output decoy `.pdb` file, so when you run the `reproduce` function, it ensures that you have checked out the same git SHA1 hash before reproducing the simulation. In this way, `my_protocol` remains statically captured at the git SHA1 hash from the original simulation. 
However, you may always update `my_protocol`, commit your changes to your git repository, and re-run the simulation, because the `sha1` attribute of `PyRosettaCluster` automatically detects the new git SHA1 hash in your git repository." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    from additional_scripts.my_protocols import my_protocol\n", "    client.upload_file(\"additional_scripts/my_protocols.py\") # This sends a local file up to all worker nodes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Reproduce the original decoy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simulation in Tutorial 1A generated four decoys (because `nstruct=4` in the original simulation). Let's say we'd like to reproduce the decoy with the lowest energy. First, let's inspect the results with the `pandas` library:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    original_results = glob.glob(os.path.join(os.getcwd(), \"outputs_1A\", \"decoys\", \"*\", \"*.pdb.bz2\"))\n", "\n", "    data = {}\n", "    for original_result in original_results:\n", "        with open(original_result, \"rb\") as f:\n", "            pdbstring = bz2.decompress(f.read()).decode()\n", "            # Scan from the end of the file for the PyRosettaCluster REMARK line and keep its scores.\n", "            for line in reversed(pdbstring.split(\"\\n\")):\n", "                remark = \"REMARK PyRosettaCluster: \"\n", "                if line.startswith(remark):\n", "                    data[original_result] = json.loads(line.split(remark)[-1])[\"scores\"]\n", "                    break\n", "\n", "    df = pd.DataFrame().from_records(data).T\n", "    df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now locate the decoy with the lowest Rosetta `total_score` to reproduce:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    decoy_to_reproduce = df.sort_values(by=\"total_score\", ascending=True).index[0]\n", "    decoy_to_reproduce" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Launch the reproduction simulation using `reproduce()`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reproducing the decoy is accomplished with the `reproduce()` function of the `pyrosetta.distributed.cluster` module. This function requires the `.pdb` or `.pdb.bz2` file to reproduce, passed as `input_file`. Alternatively, a `scorefile` with full simulation records and a `decoy_name` may be provided to `reproduce()` instead of the `.pdb` or `.pdb.bz2` file. The user-provided PyRosetta protocol(s) must be defined or imported and passed to `reproduce()` as the `protocols` argument. The user is responsible for supplying the same protocol that was used in the original simulation! Additionally, any supplied `instance_kwargs` will override the corresponding `PyRosettaCluster` instance attributes from the `input_file` or `scorefile`. This may be useful when, for example, you want to change your cluster configuration while reproducing a decoy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    output_path = os.path.join(os.getcwd(), \"outputs_1B\")\n", "\n", "    reproduce(\n", "        input_file=decoy_to_reproduce,\n", "        input_packed_pose=None, # Optional, if you used the `input_packed_pose` attribute of `PyRosettaCluster` in the original simulation\n", "        client=client, # Optional\n", "        instance_kwargs={\"output_path\": output_path, \"nstruct\": 1}, # Specify new output path, and set `nstruct` to 1 to reproduce the decoy only once.\n", "        protocols=[my_protocol],\n", "    )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. 
Visualize the reproduced decoy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    reproduced_results = glob.glob(os.path.join(output_path, \"decoys\", \"*\", \"*.pdb.bz2\"))\n", "    assert len(reproduced_results) == 1\n", "    with open(reproduced_results[0], \"rb\") as f:\n", "        reproduced_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    view = viewer.init(reproduced_packed_pose, window_size=(800, 600))\n", "    view.add(viewer.setStyle())\n", "    view.add(viewer.setStyle(colorscheme=\"whiteCarbon\", radius=0.25))\n", "    view.add(viewer.setHydrogenBonds())\n", "    view.add(viewer.setHydrogens(polar_only=True))\n", "    view.add(viewer.setDisulfides(radius=0.25))\n", "    view()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Optionally, perform sanity checks to confirm that the reproduced decoy is identical to the original decoy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyRosetta trajectories are _deterministic_ given the input random number generator seed(s)!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    with open(decoy_to_reproduce, \"rb\") as f:\n", "        original_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())\n", "    original_pose = original_packed_pose.pose\n", "    reproduced_pose = reproduced_packed_pose.pose" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Assert that the sequences are identical:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    assert original_pose.sequence() == reproduced_pose.sequence()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Assert that the `total_score`s are identical:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    scorefxn = pyrosetta.create_score_function(\"ref2015.wts\")\n", "    assert scorefxn(original_pose) == scorefxn(reproduced_pose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Assert that the C$_{\\alpha}$–C$_{\\alpha}$ root-mean-square deviation (RMSD) is `0.0` Å:\n", "\n", "Note: There is no need to first superimpose the `original_pose` and `reproduced_pose` because they were both generated starting from the same `input_packed_pose`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    assert pyrosetta.rosetta.core.scoring.CA_rmsd(original_pose, reproduced_pose) == 0.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Congrats! \n", "You have successfully reproduced a PyRosetta simulation using the `pyrosetta.distributed.cluster` module!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [PyRosettaCluster Tutorial 1A. 
Simple protocol](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.06-PyRosettaCluster-Simple-protocol.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) >"
] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks]", "language": "python", "name": "pyrosetta.notebooks" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }