<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 4. Ligand params](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.10-PyRosettaCluster-Ligand-params.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.09-PyRosettaCluster-Multiple-decoys.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# PyRosettaCluster Tutorial 3. Multiple decoys

PyRosettaCluster Tutorial 3 demonstrates how multiple tasks (specified by `kwargs`) may be run several times (specified by `nstruct`). Additionally, user-provided PyRosetta protocols may `yield` or `return` multiple `Pose` or `PackedPose` objects to be efficiently parallelized on the user's compute resources.

*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`.

*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.

*Note:* This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the `--serialization` flag or by installing PyRosetta from the RosettaCommons conda channel.

**Please see Chapter 16.00 for setup instructions**

*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook.

### 1. Import packages

In [1]:
import bz2
import glob
import logging
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer

from pyrosetta.distributed.cluster import PyRosettaCluster

logging.basicConfig(level=logging.INFO)

### 2. Initialize a compute cluster using `dask`

See Tutorial 1A for review:
1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>
3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

Inject client code here, then run the cell:

In [2]:
if not os.getenv("DEBUG"):
    from dask.distributed import Client

    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:40329 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 4 |
| | Memory: 16.63 GB |


### 3. Define the user-provided PyRosetta protocols that return multiple `Pose` or `PackedPose` objects:

PyRosettaCluster automatically passes returned or yielded `Pose` or `PackedPose` objects through the user-provided PyRosetta protocols. If a protocol produces `n` poses, the subsequent protocol runs `n` times, once for each pose. By default, the `Pose` and `PackedPose` objects returned by the final protocol are written to disk.
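
The fan-out described above can be illustrated with a minimal pure-Python sketch. It is not how PyRosettaCluster schedules work internally; plain strings stand in for `PackedPose` objects, and `run_chain`, `branch3`, and `single` are hypothetical names used only to show how the number of results grows along a chain of protocols.

```python
def run_chain(inputs, protocols):
    """Pass every result of one protocol into the next, one call per result."""
    current = list(inputs)
    for protocol in protocols:
        next_round = []
        for item in current:
            # Each protocol call may produce several results.
            next_round.extend(protocol(item))
        current = next_round
    return current


def branch3(label):
    # Stand-in for a protocol that yields three results.
    for i in range(3):
        yield f"{label}.{i}"


def single(label):
    # Stand-in for a protocol that returns a single result.
    return [f"{label}*"]


final = run_chain(["pose"], [branch3, single, branch3])
print(len(final))  # 1 input x 3 x 1 x 3 results
print(final[0])
```

With one input and protocols producing 3, 1, and 3 results per call, the chain ends with 9 results, each of which would correspond to one decoy written to disk.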

Multiple `Pose` and `PackedPose` objects may be yielded iteratively, or returned in a `list` or `tuple`:

To `yield` multiple poses:
```
for _ in range(n_results):
    yield backrub(ppose.pose.clone())
```

Note: yielded objects are not added to the queue for parallelization until the protocol has finished yielding all of its objects.

To `return` multiple poses in a `list`:
```
return list_of_poses
```

To `return` multiple poses in a `tuple`:
```
return pose1, pose2, pose3
```
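
As a rough illustration of why these three forms are interchangeable, the sketch below flattens each kind of output into a plain list. `collect_results` is a hypothetical helper written for this example, not part of the PyRosettaCluster API, and strings stand in for poses. Note that calling `list()` on a generator exhausts it, which mirrors the note above: nothing is passed on until every object has been yielded.

```python
import types


def collect_results(output):
    """Hypothetical helper: flatten a protocol's output into a list."""
    if output is None:
        return []
    if isinstance(output, (list, tuple, types.GeneratorType)):
        # list() consumes a generator completely before returning.
        return list(output)
    # A single object is wrapped so callers always receive a list.
    return [output]


def yields_three():
    for i in range(3):
        yield f"pose{i}"


def returns_tuple():
    return "pose1", "pose2", "pose3"


def returns_one():
    return "pose"


for fn in (yields_three, returns_tuple, returns_one):
    print(fn.__name__, collect_results(fn()))
```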

In [3]:
def protocol1(packed_pose_in=None, **kwargs):
    """
    Performs backrub on a `PackedPose` object, which may be (a) passed
    in to the function or (b) loaded from the file path given by the
    's' `kwargs` keyword argument.
    
    Args:
        packed_pose_in: A `PackedPose` object. Optional.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        Multiple `PackedPose` objects.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts
    
    if packed_pose_in is None:
        packed_pose_in = io.pose_from_file(kwargs["s"])
        
    xml = """
        <ROSETTASCRIPTS>
          <MOVERS>
            <Backrub name="backrub" pivot_residues="22A,23A,24A,25A,26A,27A"/>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="backrub"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    backrub = rosetta_scripts.SingleoutputRosettaScriptsTask(xml)

    n_results = 3
    for _ in range(n_results):
        yield backrub(packed_pose_in.pose.clone())


def protocol2(packed_pose_in, **kwargs):
    """
    Performs sequence design using 'ALLAAxc' resfile command on input 
    `kwargs['resnums']` residue numbers on the input `PackedPose` object.
    
    Args:
        packed_pose_in: A `PackedPose` object to be designed.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object that has been designed.
    """
    import pyrosetta
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    xml = """
        <ROSETTASCRIPTS>
          <RESIDUE_SELECTORS>
            <Index name="my_resnums" resnums="{resnums}" />
            <Not name="not_my_resnums" selector="my_resnums" />
          </RESIDUE_SELECTORS>
          <TASKOPERATIONS>
            <ResfileCommandOperation name="design" command="ALLAAxc" residue_selector="my_resnums"/>
            <OperateOnResidueSubset name="prevent_repacking" selector="not_my_resnums">
              <PreventRepackingRLT/>
            </OperateOnResidueSubset>
          </TASKOPERATIONS>
          <MOVERS>
            <PackRotamersMover name="design_mover" task_operations="design,prevent_repacking"/>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="design_mover"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """.format(resnums=kwargs["resnums"])
    
    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(packed_pose_in.pose.clone())

### 4. Define the user-provided tasks as `kwargs`:

Returning a list of dictionaries or yielding dictionaries allows the user to run through the chain of user-provided PyRosetta protocols multiple times with different inputs, and the unique `kwargs` can be accessed within each user-provided PyRosetta protocol.

In [4]:
dict_of_options = {
    "-out:level": "300",
    "-multithreading:total_threads": "1",
}

def create_tasks():
    for resnum in range(22, 28):
        yield {
            "options": "-ex1",
            "extra_options": dict_of_options,
            "set_logging_handler": "interactive",
            "s": os.path.join(os.getcwd(), "inputs", "1QYS.pdb"),
            "resnums": str(resnum) + "A",
        }
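
Before launching a simulation, it can help to materialize the task generator and check what each task contains. The sketch below uses `create_tasks_preview`, a hypothetical trimmed-down copy of `create_tasks` that keeps only the `s` and `resnums` keys, so it runs without PyRosetta; the `input_dir` argument is likewise an assumption for illustration.

```python
import os


def create_tasks_preview(input_dir="inputs"):
    """Hypothetical, trimmed-down copy of create_tasks for inspection only."""
    for resnum in range(22, 28):
        yield {
            "s": os.path.join(input_dir, "1QYS.pdb"),
            "resnums": str(resnum) + "A",
        }


tasks = list(create_tasks_preview())
print(len(tasks))                            # one task per residue number
print([task["resnums"] for task in tasks])   # residues 22A through 27A
```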

### 5. Launch the original simulation using `distribute()`:

We will also use the `PyRosettaCluster` `nstruct` attribute, an `int` object specifying how many times the first user-provided PyRosetta protocol is repeated for each task.

In [5]:
if not os.getenv("DEBUG"):
    output_path = os.path.join(os.getcwd(), "outputs_3")

    PyRosettaCluster(
        tasks=create_tasks,
        client=client,
        scratch_dir=output_path,
        output_path=output_path,
        nstruct=2,
    ).distribute(protocols=[protocol1, protocol2, protocol1])

INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'options': '-run:constant_seed 1 -multithreading:total_threads 1', 'extra_options': '-mute all', 'silent': True}
INFO:pyrosetta.rosetta:Found rosetta database at: /shared/home/jklima/.conda/envs/jupyterlab/lib/python3.7/site-packages/pyrosetta/database; using it....
INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python37.Release 2020.15+release.3121c734db02d2b62dd1974dcb8daface3f50057 2020-04-10T09:29:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.


While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!

Initially there are `12` simulations running in parallel: `6` tasks from the `create_tasks` generator, with each task executed using `nstruct=2`. After `protocol1` runs to completion, more `PackedPose` objects are added to the queue. After `protocol2` runs to completion, more `PackedPose` objects are added to the queue. The process continues until all tasks are run through the chain of user-provided PyRosetta protocols.
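
The growth of the queue can be checked with simple arithmetic. The sketch below uses only the numbers stated in this tutorial (6 tasks, `nstruct=2`, `protocol1` yielding 3 poses per call, `protocol2` returning 1); the variable names are illustrative.

```python
n_tasks = 6
nstruct = 2
# (protocol name, poses produced per call), in the order passed to distribute()
outputs_per_call = [("protocol1", 3), ("protocol2", 1), ("protocol1", 3)]

running = n_tasks * nstruct
stage_counts = [running]
print("initial simulations:", running)
for name, n_out in outputs_per_call:
    # Every pose entering a protocol produces n_out poses leaving it.
    running *= n_out
    stage_counts.append(running)
    print("after", name, "->", running, "poses")
```

The final entry of `stage_counts` is the number of decoys expected on disk once the whole chain has finished.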

### 6. Visualize the resulting decoys:

Gather output decoys from disk into memory:

In [8]:
if not os.getenv("DEBUG"):
    results = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))
    packed_poses = []
    for i, bz2file in enumerate(results, start=1):
        with open(bz2file, "rb") as f:
            packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
        logging.info("Percent done loading: {0:0.1f} %".format((i * 100.) / len(results)))

INFO:root:Percent done loading: 0.9 %
INFO:root:Percent done loading: 1.9 %
INFO:root:Percent done loading: 2.8 %
INFO:root:Percent done loading: 3.7 %
INFO:root:Percent done loading: 4.6 %
INFO:root:Percent done loading: 5.6 %
INFO:root:Percent done loading: 6.5 %
INFO:root:Percent done loading: 7.4 %
INFO:root:Percent done loading: 8.3 %
INFO:root:Percent done loading: 9.3 %
INFO:root:Percent done loading: 10.2 %
INFO:root:Percent done loading: 11.1 %
INFO:root:Percent done loading: 12.0 %
INFO:root:Percent done loading: 13.0 %
INFO:root:Percent done loading: 13.9 %
INFO:root:Percent done loading: 14.8 %
INFO:root:Percent done loading: 15.7 %
INFO:root:Percent done loading: 16.7 %
INFO:root:Percent done loading: 17.6 %
INFO:root:Percent done loading: 18.5 %
INFO:root:Percent done loading: 19.4 %
INFO:root:Percent done loading: 20.4 %
INFO:root:Percent done loading: 21.3 %
INFO:root:Percent done loading: 22.2 %
INFO:root:Percent done loading: 23.1 %
INFO:root:Percent done loading: 24.
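
As an alternative to `open` followed by `bz2.decompress`, the standard library's `bz2.open` can read a compressed file directly as text. The sketch below is a self-contained round trip on a temporary file with placeholder text rather than a real decoy; the string it recovers is the kind of input that `io.pose_from_pdbstring` receives in the cell above.

```python
import bz2
import os
import tempfile

# Placeholder text standing in for the contents of a decoy PDB file.
pdb_text = "REMARK placeholder PDB text\nEND\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.pdb.bz2")
    # Write the text through the bz2 compressor.
    with bz2.open(path, "wt") as f:
        f.write(pdb_text)
    # Read it back, decompressing and decoding in one step.
    with bz2.open(path, "rt") as f:
        recovered = f.read()

print(recovered == pdb_text)
```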

Your designed Top7 (PDB ID: 1QYS) decoys are visualized below, with the residues designed during the simulation shown.

There are 108 resulting decoys: 6 (`kwargs`) x 2 (`nstruct`) x 3 (`protocol1`) x 1 (`protocol2`) x 3 (`protocol1`)

In [10]:
if not os.getenv("DEBUG"):
    assert 6 * 2 * 3 * 1 * 3 == len(results)

In [11]:
if not os.getenv("DEBUG"):
    resis = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector("22A,23A,24A,25A,26A,27A")

    view = viewer.init(packed_poses, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(residue_selector=resis, colorscheme="whiteCarbon", radius=0.35))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view()

interactive(children=(IntSlider(value=0, continuous_update=False, description='Decoys', max=107), Output()), _…

<function pyrosetta.distributed.viewer.core.Viewer.show.<locals>.view(i=0)>

### Congrats! 
You have successfully run a multiple-protocol, multiple-decoy PyRosetta trajectory with `PyRosettaCluster`!
