{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example of Using PyRosetta with GNU Parallel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Note:* In this tutorial, we will write a simple PyRosetta script to disk, and demonstrate how to run it in parallel using GNU parallel. This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.\n",
"\n",
"**Please see setup instructions in Chapter 16.00**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"logging.basicConfig(level=logging.INFO)\n",
"import os\n",
"import sys\n",
"\n",
"if 'google.colab' in sys.modules:\n",
" print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n",
" sys.exit(0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%file outputs/mutate_residue_from_command_line.py\n",
"import argparse\n",
"import os\n",
"import pyrosetta\n",
"import uuid\n",
"\n",
"\n",
"__author__ = \"My Name\"\n",
"__email__ = \"name@email.com\"\n",
"\n",
"\n",
"def main(target=None, new_res=None):\n",
" \"\"\"Example function to run my custom PyRosetta script.\"\"\"\n",
" \n",
" # Initialize PyRosetta with custom flags\n",
" pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n",
"\n",
" # Setup pose\n",
" pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n",
"\n",
" # Setup directory structure\n",
" main_dir = os.getcwd()\n",
" unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n",
" if not os.path.isdir(unique_dir):\n",
" os.mkdir(unique_dir)\n",
" os.chdir(unique_dir)\n",
"\n",
" # Create scorefunction\n",
" scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n",
"\n",
" # PyRosetta design protocol\n",
" keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n",
" res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n",
" )\n",
" keep_chA.apply(pose)\n",
" \n",
" mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n",
" target=target, new_res=new_res\n",
" )\n",
" mutate.apply(pose)\n",
"\n",
" mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n",
" mm.set_bb(True)\n",
" mm.set_chi(True)\n",
" min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n",
" min_mover.set_movemap(mm)\n",
" min_mover.score_function(scorefxn)\n",
" min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n",
" min_mover.cartesian(True)\n",
" min_mover.tolerance(0.01)\n",
" min_mover.max_iter(200)\n",
" min_mover.apply(pose)\n",
"\n",
" total_score = scorefxn(pose)\n",
" \n",
" # Setup outputs\n",
" pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n",
" \n",
" pyrosetta.dump_pdb(pose, pdb_output_filename)\n",
" \n",
" # Append scores to scorefile\n",
" pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n",
" pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n",
" scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n",
" nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n",
" )\n",
" os.chdir(main_dir)\n",
"\n",
" \n",
"if __name__ == \"__main__\":\n",
" \n",
" # Declare parser object for managing input options\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\"-t\", \"--target\", type=int,\n",
" help=\"Target residue to mutate as integer.\")\n",
" parser.add_argument(\"-r\", \"--new_res\", type=str,\n",
" help=\"Three letter amino acid code to which to mutate target.\")\n",
" args = parser.parse_args()\n",
" \n",
" # Run protocol\n",
" main(target=args.target, new_res=args.new_res)\n"
]
},
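{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, test the script once before launching many copies of it in parallel. The next cell is a minimal sanity check, not part of the original protocol; it assumes `inputs/5JG9.clean.pdb` is available relative to this notebook's working directory and that PyRosetta is importable in the current environment. A single run may take a few minutes on one core."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
"    # Run the script once with a single target/residue pair as a sanity check\n",
"    !{sys.executable} outputs/mutate_residue_from_command_line.py -t 2 -r ALA"
]
},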
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will run this script in parallel several different ways to demonstrate different types of job submission styles."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Parallelize script in an interactive session:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On your laptop, you have access to as many cores as are on your machine.\n",
"On a high-performance computing cluster with SLURM scheduling, you first have to request as many cores as you want to run on in an interactive login session:\n",
"\n",
">qlogin -c 8 --mem=16g\n",
"\n",
"will reserve 8 CPU cores and 16 GB of RAM for you and start your session on a node that has available resources.\n",
"\n",
"Then, we need to write a run file to disc specifying our input parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"outputs/run_file_parallel_interactive.txt\", \"w\") as f:\n",
" for target in [2, 4, 6, 8, 10]:\n",
" for new_res in [\"ALA\", \"TRP\"]:\n",
" f.write(\"{0} outputs/mutate_residue_from_command_line.py -t {1} -r {2}\\n\".format(sys.executable, target, new_res))"
]
},
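{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, we can print the run file we just wrote and confirm that each line is a complete, standalone command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cat outputs/run_file_parallel_interactive.txt"
]
},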
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: it's always a good idea to run just one command first to make sure there aren't any errors in your script!\n",
"\n",
"**Note**: if you don't specify the correct python executable, activate the correct conda environment in your interactive session first.\n",
"\n",
"Now submit `outputs/run_file_parallel_interactive.txt` to GNU parallel from the command line in your interactive session:\n",
"\n",
">cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice &\n",
"\n",
"**Note**: The `parallel` exectuable is usually located at `/usr/bin/parallel` but the full path may differ on your computer. For installation info, visit: https://www.gnu.org/software/parallel/"
]
},
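{
"cell_type": "markdown",
"metadata": {},
"source": [
"If GNU parallel is installed on the machine running this notebook, you can also launch the same run file from a notebook cell instead of a terminal. This is an optional sketch that assumes the `parallel` executable is on your `PATH` and that 8 cores are available to your session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
"    # Pipe the run file into GNU parallel, running at most 8 jobs at a time\n",
"    !cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice"
]
},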
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Parallelize script on a high-performance computing cluster with Slurm scheduling (non-interactive submission):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use GNU parallel again, but this time there is no need to pre-request server allocations. We can submit jobs to the Slurm scheduler from directly within this Jupyter Notebook or from command line!\n",
"\n",
"Useful background information:\n",
" - \"Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. ... As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.\" Read further: https://slurm.schedmd.com/overview.html\n",
"\n",
" - With the Slurm scheduler we will use the `sbatch` command, therefore it may be useful to review the available options:\n",
"https://slurm.schedmd.com/sbatch.html\n",
"\n",
"First, write a SLURM submission script to disk specifying the job requirements. In this example, our conda environment is called `pyrosetta-bootcamp`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"outputs/sbatch_parallel.sh\", \"w\") as f:\n",
" f.write(\"#!/bin/bash \\n\") # Bash script\n",
" f.write(\"#SBATCH -p short \\n\") # Specify \"short\" partition/queue\n",
" f.write(\"#SBATCH -n 8 \\n\") # Specify eight cores\n",
" f.write(\"#SBATCH -N 1 \\n\") # Specify one node\n",
" f.write(\"#SBATCH --mem=16g \\n\") # Specify 16GB RAM over eight cores\n",
" f.write(\"#SBATCH -o sbatch_parallel.log \\n\") # Specify output log filename\n",
" f.write(\"conda activate pyrosetta-bootcamp \\n\") # Activate conda environment\n",
" f.write(\"cat outputs/run_file_parallel_interactive.txt | /usr/bin/parallel -j 8 --no-notice \\n\") # Submit jobs to GNU parallel"
]
},
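{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, inspect the submission script we just wrote before handing it to the scheduler:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cat outputs/sbatch_parallel.sh"
]
},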
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, submit `outputs/sbatch_parallel.sh` to the SLURM scheduler with the `sbatch` command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
" !sbatch outputs/sbatch_parallel.sh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then periodically check on the status of our jobs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!squeue -u $USER"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Submit jobs individually to the SLURM scheduler:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time, we submit each job individually using the `sbatch` command directly on our PyRosetta script.\n",
"\n",
"**Warning**: do not submit more than ~1000 jobs with this method or risk clogging the SLURM scheduler!\n",
"\n",
"First, copy your python executable and paste it on the first line of the PyRosetta script after `#!`, followed by `#SBATCH` commands for each job:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sys.executable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%file outputs/mutate_residue_from_sbatch.py\n",
"#!/home/klimaj/anaconda3/envs/pyrosetta-bootcamp/bin/python\n",
"#SBATCH -p short\n",
"#SBATCH -n 1\n",
"#SBATCH --mem=2g\n",
"#SBATCH -o sbatch.log\n",
"\n",
"import argparse\n",
"import os\n",
"import pyrosetta\n",
"import uuid\n",
"\n",
"\n",
"__author__ = \"My Name\"\n",
"__email__ = \"name@email.com\"\n",
"\n",
"\n",
"def main(target=None, new_res=None):\n",
" \"\"\"Example function to run my custom PyRosetta script.\n",
" \"\"\"\n",
" \n",
" # Initialize PyRosetta with custom flags\n",
" pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n",
"\n",
" # Setup pose\n",
" pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n",
" \n",
" # Setup directory structure\n",
" main_dir = os.getcwd()\n",
" unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n",
" if not os.path.isdir(unique_dir):\n",
" os.mkdir(unique_dir)\n",
" os.chdir(unique_dir)\n",
"\n",
" # Create scorefunction\n",
" scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n",
"\n",
" # PyRosetta design protocol\n",
" keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n",
" res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n",
" )\n",
" keep_chA.apply(pose)\n",
" \n",
" mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n",
" target=target, new_res=new_res\n",
" )\n",
" mutate.apply(pose)\n",
"\n",
" mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n",
" mm.set_bb(True)\n",
" mm.set_chi(True)\n",
" min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n",
" min_mover.set_movemap(mm)\n",
" min_mover.score_function(scorefxn)\n",
" min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n",
" min_mover.cartesian(True)\n",
" min_mover.tolerance(0.01)\n",
" min_mover.max_iter(200)\n",
" min_mover.apply(pose)\n",
"\n",
" total_score = scorefxn(pose)\n",
" \n",
" # Setup outputs\n",
" pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n",
" \n",
" pyrosetta.dump_pdb(pose, pdb_output_filename)\n",
" \n",
" # Append scores to scorefile\n",
" pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n",
" pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n",
" scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n",
" nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n",
" )\n",
" os.chdir(main_dir)\n",
"\n",
" \n",
"if __name__ == \"__main__\":\n",
" \n",
" # Declare parser object for managing input options\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\"-t\", \"--target\", type=int,\n",
" help=\"Target residue to mutate as integer.\")\n",
" parser.add_argument(\"-r\", \"--new_res\", type=str,\n",
" help=\"Three letter amino acid code to which to mutate target.\")\n",
" args = parser.parse_args()\n",
" \n",
" # Run protocol\n",
" main(target=args.target, new_res=args.new_res)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, loop over your input parameters submitting the PyRosetta scripts to the scheduler using the `sbatch` command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
" for target in [2, 4, 6, 8, 10]:\n",
" for new_res in [\"ALA\", \"TRP\"]:\n",
" !sbatch ./outputs/mutate_residue_from_sbatch.py -t $target -r $new_res"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then periodically check on the status of our jobs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!squeue -u $USER"
]
},
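{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the jobs have finished, the decoys and score files are scattered across the `outputs/testing_*` directories created by each run. The cell below is a minimal sketch for collecting the scores into one list; it assumes the jobs completed successfully and that each `score.fasc` file contains one JSON record per line (because we passed `json_format=True` to `output_scorefile`). The key names `decoy` and `total_score` used below are assumptions about the score record format, so adapt them to whatever keys your score files actually contain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import json\n",
"import os\n",
"\n",
"if not os.getenv(\"DEBUG\"):\n",
"    records = []\n",
"    # Each run wrote its own score.fasc inside a unique outputs/testing_* directory\n",
"    for scorefile in glob.glob(os.path.join(\"outputs\", \"testing_*\", \"score.fasc\")):\n",
"        with open(scorefile) as f:\n",
"            for line in f:\n",
"                line = line.strip()\n",
"                if line:\n",
"                    records.append(json.loads(line))\n",
"    print(\"Collected {0} score records\".format(len(records)))\n",
"    # Print the lowest-scoring decoys; key names are assumptions about the record format\n",
"    for record in sorted(records, key=lambda r: float(r.get(\"total_score\", 0.0)))[:5]:\n",
"        print(record.get(\"decoy\"), record.get(\"total_score\"))"
]
},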
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:PyRosetta.notebooks]",
"language": "python",
"name": "conda-env-PyRosetta.notebooks-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}