<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [PyRosettaCluster Tutorial 1A. Simple protocol](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.06-PyRosettaCluster-Simple-protocol.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [PyRosettaCluster Tutorial 2. Multiple protocols](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.08-PyRosettaCluster-Multiple-protocols.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.07-PyRosettaCluster-Reproduce-simple-protocol.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# PyRosettaCluster Tutorial 1B. Reproduce simple protocol

PyRosettaCluster Tutorial 1B uses the `pyrosetta.distributed.cluster` Python module to reproduce a decoy generated by the PyRosetta simulation previously run in PyRosettaCluster Tutorial 1A, using only that decoy's `.pdb` file as input and the original user-provided PyRosetta protocol(s).

In PyRosettaCluster Tutorial 1A, you used `PyRosettaCluster` to apply a PyRosetta protocol to an input `.pdb` file, and generated several output `.pdb` files. Each output `.pdb` file contains information needed to exactly reproduce it.

*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`. If so, the visualization in section 6 may not display, but the remaining cells are unaffected.

*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.

*Note:* This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the `--serialization` flag or by installing PyRosetta from the RosettaCommons conda channel.

**Please see Chapter 16.00 for setup instructions**

*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook.

### 1. Import packages

In [None]:
import bz2
import json
import glob
import logging
import os
import pandas as pd
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer

from pyrosetta.distributed.cluster import PyRosettaCluster, reproduce

logging.basicConfig(level=logging.INFO)

### 2. Initialize a compute cluster using `dask` 

See Tutorial 1A to review:
1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>
3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

Inject client code here, then run the cell:

In [None]:
if not os.getenv("DEBUG"):
    from dask.distributed import Client

    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client

### 3. Re-define or import the original user-provided PyRosetta protocol:

The purpose of the `sha1` attribute of `PyRosettaCluster` is to ensure that you have committed all of your changes to your git repository before executing the original simulation. The original `sha1` attribute was captured in the output decoy `.pdb` file, so when you run the `reproduce` function, it ensures that you have checked out the same git SHA1 hash before reproducing the simulation. In this way, `my_protocol` remains statically captured at the git SHA1 hash from the original simulation. However, you may always update `my_protocol`, commit your changes to your git repository, and re-run the simulation as a new run, because the `sha1` attribute of `PyRosettaCluster` automatically detects the new git SHA1 hash in your git repository.
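
If you want to check by hand which commit is currently checked out before calling `reproduce`, something like the following may help. This is only a sketch and is not how `PyRosettaCluster` performs its own check: it assumes the `git` command-line tool is installed, and `current_git_sha1` and `sha1_matches` are hypothetical helper names introduced here for illustration. How you obtain the recorded hash from the decoy is left out.

```python
import subprocess


def current_git_sha1(repo_dir="."):
    """Return the full SHA1 hash of the commit currently checked out in `repo_dir`."""
    completed = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError if the git command fails (e.g. outside a repository)
    )
    return completed.stdout.strip()


def sha1_matches(recorded_sha1, checked_out_sha1):
    """Hypothetical helper: compare two hashes, allowing one to be an abbreviated prefix."""
    a = recorded_sha1.strip().lower()
    b = checked_out_sha1.strip().lower()
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))
```

For example, `sha1_matches(recorded, current_git_sha1())` would tell you whether the working copy sits at the recorded commit, where `recorded` is whatever hash you noted from the original run.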

In [None]:
if not os.getenv("DEBUG"):
    from additional_scripts.my_protocols import my_protocol
    client.upload_file("additional_scripts/my_protocols.py") # This sends a local file up to all worker nodes.

### 4. Reproduce the original decoy:

The simulation in Tutorial 1A generated four decoys (because `nstruct=4` in the original simulation). Let's say we'd like to reproduce the decoy with the lowest energy. First, let's inspect the results with the `pandas` library:

In [None]:
if not os.getenv("DEBUG"):
    original_results = glob.glob(os.path.join(os.getcwd(), "outputs_1A", "decoys", "*", "*.pdb.bz2"))

    data = {}
    for original_result in original_results:
        with open(original_result, "rb") as f:
            pdbstring = bz2.decompress(f.read()).decode()
            for line in reversed(pdbstring.split("\n")):
                remark = "REMARK PyRosettaCluster: "
                if line.startswith(remark):
                    data[original_result] = json.loads(line.split(remark)[-1])["scores"]
                    break

    df = pd.DataFrame.from_dict(data, orient="index")  # One row per decoy, one column per score term
    df
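
The parsing step in the cell above can be factored into a small reusable function. The sketch below mirrors that logic (scan the file from the end for the `REMARK PyRosettaCluster: ` line and decode its JSON payload); `read_record` and `read_record_from_file` are hypothetical names, and the example string is synthetic data rather than a real decoy.

```python
import bz2
import json

REMARK_PREFIX = "REMARK PyRosettaCluster: "


def read_record(pdbstring):
    """Return the JSON record from the last PyRosettaCluster REMARK line, or None if absent."""
    for line in reversed(pdbstring.splitlines()):
        if line.startswith(REMARK_PREFIX):
            return json.loads(line[len(REMARK_PREFIX):])
    return None


def read_record_from_file(path):
    """Decompress a `.pdb.bz2` file and return its PyRosettaCluster record, or None."""
    with open(path, "rb") as f:
        return read_record(bz2.decompress(f.read()).decode())


# Synthetic stand-in for the text of a decoy file (not real simulation output)
example = "END\n" + REMARK_PREFIX + json.dumps({"scores": {"total_score": -12.5}})
record = read_record(example)
print(record["scores"]["total_score"])
```

Returning `None` when no matching line is found makes it easy to skip files that were not written by `PyRosettaCluster`.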

Now locate the decoy with the lowest Rosetta `total_score` to reproduce:

In [None]:
if not os.getenv("DEBUG"):
    decoy_to_reproduce = df.sort_values(by="total_score", ascending=True).index[0]
    decoy_to_reproduce
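
The same selection can be made without `pandas` by taking the minimum over the dictionary of scores directly. This is a minimal sketch: the file names and score values below are made up for illustration and do not come from Tutorial 1A.

```python
# Hypothetical scores keyed by decoy file name, shaped like the `data` dictionary built above
scores_by_decoy = {
    "decoy_a.pdb.bz2": {"total_score": -101.2},
    "decoy_b.pdb.bz2": {"total_score": -98.7},
    "decoy_c.pdb.bz2": {"total_score": -105.4},
    "decoy_d.pdb.bz2": {"total_score": -99.9},
}

# Lower Rosetta energies are better, so pick the key with the smallest `total_score`
best = min(scores_by_decoy, key=lambda name: scores_by_decoy[name]["total_score"])
print(best)
```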

### 5. Launch the reproduction simulation using `reproduce()`:

Reproducing the decoy is accomplished with the `reproduce()` function of the `pyrosetta.distributed.cluster` module. This function requires the `.pdb` or `.pdb.bz2` file to reproduce, passed as `input_file`. Alternatively, a `scorefile` with full simulation records and a `decoy_name` may be provided to `reproduce()` instead of the `.pdb` or `.pdb.bz2` file. The user-provided PyRosetta protocol(s) must be defined or imported and passed to `reproduce()` as the `protocols` argument. The user is responsible for supplying the same protocol(s) that were used in the original simulation! Additionally, any supplied `instance_kwargs` will override the corresponding `PyRosettaCluster` instance attributes recorded in the `input_file` or `scorefile`. This may be useful when, for example, you want to change your cluster configuration while reproducing a decoy.
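
To make the two ways of identifying a decoy concrete, here is a sketch of a helper that assembles the keyword arguments for `reproduce()`. `build_reproduce_kwargs` is a hypothetical name, and treating `input_file` and `scorefile`/`decoy_name` as mutually exclusive is an assumption of this sketch rather than documented behavior. The path, decoy name, and placeholder protocol are invented for illustration.

```python
def build_reproduce_kwargs(protocols, input_file=None, scorefile=None,
                           decoy_name=None, instance_kwargs=None):
    """Hypothetical helper: assemble keyword arguments for `reproduce()`."""
    if input_file is not None:
        if scorefile is not None or decoy_name is not None:
            raise ValueError("Give either `input_file` or `scorefile` plus `decoy_name`, not both.")
        source = {"input_file": input_file}
    elif scorefile is not None and decoy_name is not None:
        source = {"scorefile": scorefile, "decoy_name": decoy_name}
    else:
        raise ValueError("Give `input_file`, or both `scorefile` and `decoy_name`.")
    return {
        **source,
        "protocols": list(protocols),
        "instance_kwargs": dict(instance_kwargs or {}),
    }


def my_protocol(packed_pose, **kwargs):
    """Placeholder standing in for the real protocol from the original simulation."""
    return packed_pose


kwargs = build_reproduce_kwargs(
    protocols=[my_protocol],
    scorefile="path/to/scores.json",  # Hypothetical path
    decoy_name="name_of_decoy",       # Hypothetical decoy name
    instance_kwargs={"nstruct": 1},
)
# reproduce(**kwargs)  # Run this only in a session where PyRosetta and `reproduce` are imported
```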

In [None]:
if not os.getenv("DEBUG"):
    output_path = os.path.join(os.getcwd(), "outputs_1B")

    reproduce(
        input_file=decoy_to_reproduce,
        input_packed_pose=None, # Optional, if you used the `input_packed_pose` attribute of `PyRosettaCluster` in the original simulation
        client=client, # Optional
        instance_kwargs={"output_path": output_path, "nstruct": 1}, # Specify new output path, and set `nstruct` to 1 to reproduce the decoy only once. 
        protocols=[my_protocol],
    )

### 6. Visualize the reproduced decoy:

In [None]:
if not os.getenv("DEBUG"):
    reproduced_results = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))
    assert len(reproduced_results) == 1
    with open(reproduced_results[0], "rb") as f:
        reproduced_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())

In [None]:
if not os.getenv("DEBUG"):
    view = viewer.init(reproduced_packed_pose, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(colorscheme="whiteCarbon", radius=0.25))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view.add(viewer.setDisulfides(radius=0.25))
    view()

### 7. Optionally, perform sanity checks to confirm that the reproduced decoy is identical to the original decoy:

PyRosetta trajectories are _deterministic_ given the input random number generator seed(s)!
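
The principle can be illustrated with Python's standard `random` module. This is only an analogy, not PyRosetta code: `toy_trajectory` is a hypothetical function in which a seeded generator drives a random walk, so the same seed always yields the same sequence of steps.

```python
import random


def toy_trajectory(seed, steps=5):
    """Return a random walk whose steps are fully determined by `seed`."""
    rng = random.Random(seed)  # Independent generator, so global random state is not touched
    position = 0.0
    trajectory = []
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)
        trajectory.append(position)
    return trajectory


# The same seed reproduces the walk exactly; a different seed gives a different walk
assert toy_trajectory(seed=1234) == toy_trajectory(seed=1234)
assert toy_trajectory(seed=1234) != toy_trajectory(seed=4321)
```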

In [None]:
if not os.getenv("DEBUG"):
    with open(decoy_to_reproduce, "rb") as f:
        original_packed_pose = io.pose_from_pdbstring(bz2.decompress(f.read()).decode())
    original_pose = original_packed_pose.pose
    reproduced_pose = reproduced_packed_pose.pose

#### Assert that the sequences are identical:

In [None]:
if not os.getenv("DEBUG"):
    assert original_pose.sequence() == reproduced_pose.sequence()

#### Assert that the `total_score`s are identical:

In [None]:
if not os.getenv("DEBUG"):
    scorefxn = pyrosetta.create_score_function("ref2015.wts")
    assert scorefxn(original_pose) == scorefxn(reproduced_pose)

#### Assert that the C$_{\alpha}$–C$_{\alpha}$ root-mean-square deviation (RMSD) is `0.0` Å:

Note: There is no need to first superimpose the `original_pose` and `reproduced_pose` because they were both generated starting from the same `input_packed_pose`.

In [None]:
if not os.getenv("DEBUG"):
    assert pyrosetta.rosetta.core.scoring.CA_rmsd(original_pose, reproduced_pose) == 0.0
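
To see what an RMSD of `0.0` Å without superposition means, the quantity can be computed directly from PDB text using only the standard library. This is a sketch under stated assumptions, not the implementation of `CA_rmsd`: it assumes fixed-column PDB `ATOM` records, pairs C-alpha atoms in file order, and applies no superposition. The helper names and the synthetic coordinates are invented for illustration.

```python
import math


def ca_coordinates(pdbstring):
    """Extract (x, y, z) for every C-alpha atom from fixed-column PDB ATOM records."""
    coords = []
    for line in pdbstring.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def ca_rmsd_no_superposition(pdb_a, pdb_b):
    """Root-mean-square deviation over paired C-alpha atoms, without superposition."""
    a, b = ca_coordinates(pdb_a), ca_coordinates(pdb_b)
    if not a or len(a) != len(b):
        raise ValueError("Both structures need the same non-zero number of C-alpha atoms.")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def atom_line(serial, resseq, x, y, z):
    """Format a synthetic C-alpha ATOM record with coordinates in the standard columns."""
    return "ATOM  {:>5d}  CA  ALA A{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(serial, resseq, x, y, z)


# Synthetic two-residue structures: identical, and shifted by 2 along z
pdb_a = "\n".join([atom_line(1, 1, 0.0, 0.0, 0.0), atom_line(2, 2, 1.0, 0.0, 0.0)])
pdb_b = pdb_a
pdb_c = "\n".join([atom_line(1, 1, 0.0, 0.0, 2.0), atom_line(2, 2, 1.0, 0.0, 2.0)])

print(ca_rmsd_no_superposition(pdb_a, pdb_b))  # 0.0
print(ca_rmsd_no_superposition(pdb_a, pdb_c))  # 2.0
```

Because no superposition is applied, a rigid shift shows up as a non-zero RMSD, which is why an exact `0.0` is a strong indication that the two decoys have identical coordinates.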

### Congrats! 
You have successfully reproduced a PyRosetta simulation using the `pyrosetta.distributed.cluster` module!
