<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [PyRosettaCluster Tutorial 3. Multiple decoys](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.09-PyRosettaCluster-Multiple-decoys.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Command Reference](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/A.00-Appendix-A.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.10-PyRosettaCluster-Ligand-params.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# PyRosettaCluster Tutorial 4. Ligand params

PyRosettaCluster Tutorial 4 shows how to use a non-canonical residue or ligand `.params` file with `PyRosettaCluster`. If a structure contains a ligand that requires a `.params` file, then PyRosetta must be initialized with that file in the notebook before jobs are distributed with `PyRosettaCluster`. For reproducibility outside of `PyRosettaCluster`, PyRosetta should always be initialized with a constant seed.

*Warning*: This notebook uses `pyrosetta.distributed.viewer` code, which runs in `jupyter notebook` and might not run if you're using `jupyterlab`.

*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.

*Note:* This Jupyter notebook requires the PyRosetta distributed layer, which is obtained by building PyRosetta with the `--serialization` flag or by installing PyRosetta from the RosettaCommons conda channel.

**Please see Chapter 16.00 for setup instructions**

*Note:* This Jupyter notebook is intended to be run within **Jupyter Lab**, but may still be run as a standalone Jupyter notebook.

### 1. Import packages

In [1]:
import bz2
import glob
import logging
import os
import pyrosetta
import pyrosetta.distributed.io as io
import pyrosetta.distributed.viewer as viewer

from pyrosetta.distributed.cluster import PyRosettaCluster

logging.basicConfig(level=logging.INFO)

### 2. Initialize a compute cluster using `dask`:

See Tutorial 1A for review:
1. Click the "Dask" tab in Jupyter Lab <i>(arrow, left)</i>
2. Click the "+ NEW" button to launch a new compute cluster <i>(arrow, lower)</i>
3. Once the cluster has started, click the brackets to "inject client code" for the cluster into your notebook

Inject client code here, then run the cell:

In [2]:
if not os.getenv("DEBUG"):
    from dask.distributed import Client

    client = Client("tcp://127.0.0.1:40329")
else:
    client = None
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:40329 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 4 |
| | Memory: 16.63 GB |


### 3. Define ligand `.params` file(s) and initialize PyRosetta with a constant seed:

The `-run:constant_seed 1` flag sets a default constant seed of `1111111` and is necessary for reproducing your simulation. Initialization is required before distributing jobs that return a `Pose` or `PackedPose` containing ligand or non-canonical residues. If you do not properly initialize PyRosetta within the Jupyter notebook, the notebook kernel may die and the job distribution may fail.

In [None]:
if not os.getenv("DEBUG"):
    params = os.path.join(os.getcwd(), "inputs", "TPA.am1-bcc.fa.params")
    pyrosetta.distributed.init(f"-extra_res_fa {params} -run:constant_seed 1 -multithreading:total_threads 1")
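
A missing or mistyped `.params` path is an easy way to end up with an improperly initialized PyRosetta. The sketch below assembles the same flag string as the cell above, but first checks that each `.params` file exists. The helper name `build_init_flags` and its arguments are hypothetical, not part of PyRosetta, and the sketch assumes the paths contain no spaces.

```python
import os


def build_init_flags(params_files,
                     seed_flag="-run:constant_seed 1",
                     extra_flags="-multithreading:total_threads 1"):
    """Hypothetical helper: build the PyRosetta init flag string used in this tutorial.

    Raises FileNotFoundError if any .params file is missing, so the problem
    surfaces before PyRosetta is initialized.
    """
    missing = [path for path in params_files if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"Missing .params file(s): {missing}")
    # -extra_res_fa takes one or more space-separated file paths.
    extra_res = " ".join(params_files)
    return f"-extra_res_fa {extra_res} {seed_flag} {extra_flags}"


# Intended usage (requires PyRosetta, so it is left commented out here):
# params = os.path.join(os.getcwd(), "inputs", "TPA.am1-bcc.fa.params")
# pyrosetta.distributed.init(build_init_flags([params]))
```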

### 4. Define the user-provided PyRosetta protocol:

In [4]:
def protocol1(packed_pose_in, **kwargs):
    """
    Relax residue 1X (i.e. the ligand).
    
    Args:
        packed_pose_in: A `PackedPose` object. Optional; unused here because
            the input pose is loaded from the file path in `kwargs["s"]`.
        **kwargs: PyRosettaCluster keyword arguments.

    Returns:
        A `PackedPose` object.
    """
    import pyrosetta
    import pyrosetta.distributed.io as io
    import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

    xml = """
        <ROSETTASCRIPTS>
          <RESIDUE_SELECTORS>
            <Index name="ligand_selector" resnums="1X"/>
            <Not name="not_ligand_selector" selector="ligand_selector"/>
          </RESIDUE_SELECTORS>
          <TASKOPERATIONS>
            <ResfileCommandOperation name="repack_ligand" command="NATAA" residue_selector="ligand_selector"/>
            <OperateOnResidueSubset name="prevent_repacking" selector="not_ligand_selector">
              <PreventRepackingRLT/>
            </OperateOnResidueSubset>
          </TASKOPERATIONS>
          <MOVERS>
            <FastRelax name="relax" task_operations="repack_ligand,prevent_repacking">
              <MoveMap bb="0" chi="0" jump="1">
                <ResidueSelector selector="ligand_selector" chi="1" bb="1" bondangle="0" bondlength="0"/>
              </MoveMap>
            </FastRelax>
          </MOVERS>
          <PROTOCOLS>
            <Add mover="relax"/>
          </PROTOCOLS>
        </ROSETTASCRIPTS>
        """
    
    return rosetta_scripts.SingleoutputRosettaScriptsTask(xml)(io.pose_from_file(kwargs["s"]))
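
Because the protocol embeds its RosettaScripts XML as a Python string, a typo in a selector, task operation, or mover name only shows up once a job runs on a worker. The following sketch uses only the standard library to catch some of these mistakes earlier. It assumes the script is well-formed XML laid out like the one above, and it checks only syntax and name cross-references, not Rosetta's own schema. The function name `check_script_references` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_script_references(xml_text):
    """Parse a RosettaScripts XML string and verify simple name cross-references."""
    root = ET.fromstring(xml_text.strip())  # raises ET.ParseError on malformed XML
    if root.tag != "ROSETTASCRIPTS":
        raise ValueError(f"Unexpected root element: {root.tag}")

    def names(section):
        # Collect the `name` attribute of every element defined in a section.
        node = root.find(section)
        return set() if node is None else {child.get("name") for child in node}

    selectors = names("RESIDUE_SELECTORS")
    task_ops = names("TASKOPERATIONS")
    movers = names("MOVERS")

    # Every mover added under PROTOCOLS must be defined under MOVERS.
    for add in root.findall("PROTOCOLS/Add"):
        if add.get("mover") not in movers:
            raise ValueError(f"Undefined mover: {add.get('mover')}")

    # Every task operation listed by a mover must be defined under TASKOPERATIONS.
    for mover in root.findall("MOVERS/*"):
        listed = mover.get("task_operations", "")
        for name in filter(None, (n.strip() for n in listed.split(","))):
            if name not in task_ops:
                raise ValueError(f"Undefined task operation: {name}")

    return {"selectors": selectors, "task_operations": task_ops, "movers": movers}
```

Calling this on the `xml` string before handing it to `SingleoutputRosettaScriptsTask` is cheap, but passing the check does not guarantee that Rosetta will accept the script.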

### 5. Launch the original simulation using `distribute()`:

In [5]:
if not os.getenv("DEBUG"):
    my_task = {
        "options": "-ex1",
        "extra_options": f"-out:level 300 -multithreading:total_threads 1 -extra_res_fa {params}",
        "s": os.path.join(os.getcwd(), "inputs", "test_lig.pdb"), 
    }

    output_path = os.path.join(os.getcwd(), "outputs_4")

    PyRosettaCluster(
        tasks=my_task,
        client=client,
        scratch_dir=output_path,
        output_path=output_path,
    ).distribute(protocols=[protocol1])

While jobs are running, you may monitor their progress using the dask dashboard diagnostics within Jupyter Lab!
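
The simulation above passes a single task dictionary. If several ligand-containing structures share the same `.params` file, one way to extend this is to build one task dictionary per input file, as sketched below. The generator name `create_tasks` is hypothetical, and the sketch assumes that the `tasks` argument also accepts a list of such dictionaries.

```python
def create_tasks(pdb_files, params_file):
    """Hypothetical generator: yield one task dictionary per input PDB file.

    Each dictionary mirrors `my_task` above; only the "s" entry changes.
    """
    for pdb_file in pdb_files:
        yield {
            "options": "-ex1",
            "extra_options": (
                "-out:level 300 -multithreading:total_threads 1 "
                f"-extra_res_fa {params_file}"
            ),
            "s": pdb_file,
        }


# Intended usage (assumption: `tasks` accepts a list of dictionaries):
# tasks = list(create_tasks(["inputs/test_lig.pdb"], params))
# PyRosettaCluster(tasks=tasks, client=client, ...).distribute(protocols=[protocol1])
```

Every task repeats the `-extra_res_fa` flag in `extra_options`, just as `my_task` does.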

### 6. Visualize the resultant decoy:

Gather the input and output decoys from disk into memory:

In [9]:
if not os.getenv("DEBUG"):
    input_file = os.path.join(os.getcwd(), "inputs", "test_lig.pdb")
    output_files = glob.glob(os.path.join(output_path, "decoys", "*", "*.pdb.bz2"))

    packed_poses = []
    for pdbfile in [input_file] + output_files:
        if pdbfile.endswith(".bz2"):
            with open(pdbfile, "rb") as f:
                packed_poses.append(io.pose_from_pdbstring(bz2.decompress(f.read()).decode()))
        elif pdbfile.endswith(".pdb"):
            with open(pdbfile, "r") as f:
                packed_poses.append(io.pose_from_pdbstring(f.read()))
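
The two branches above differ only in how the PDB text is read from disk. As a sketch, that step can be factored into a small helper that returns the PDB string for either file type, using `bz2.open` in text mode from the standard library. The helper name `read_pdbstring` is hypothetical.

```python
import bz2


def read_pdbstring(path):
    """Hypothetical helper: return the PDB text from a plain or bz2-compressed file."""
    if path.endswith(".bz2"):
        # Text mode ("rt") decompresses and decodes in one step.
        with bz2.open(path, "rt") as f:
            return f.read()
    with open(path, "r") as f:
        return f.read()


# Intended usage (requires PyRosetta, so it is left commented out here):
# packed_poses = [io.pose_from_pdbstring(read_pdbstring(p))
#                 for p in [input_file] + output_files]
```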

View the poses in memory:

In [10]:
if not os.getenv("DEBUG"):
    chX = pyrosetta.rosetta.core.select.residue_selector.ChainSelector("X")

    view = viewer.init(packed_poses, window_size=(800, 600))
    view.add(viewer.setStyle())
    view.add(viewer.setStyle(residue_selector=chX, colorscheme="magentaCarbon", radius=0.35))
    view.add(viewer.setHydrogenBonds())
    view.add(viewer.setHydrogens(polar_only=True))
    view()

interactive(children=(IntSlider(value=0, continuous_update=False, description='Decoys', max=1), Output()), _do…

<function pyrosetta.distributed.viewer.core.Viewer.show.<locals>.view(i=0)>

### Congrats! 
You have successfully executed a PyRosetta simulation that modifies a ligand residue with `PyRosettaCluster`!
