{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example of Using PyRosetta with GNU Parallel"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Note:* In this tutorial, we will write a simple PyRosetta script to disk, and demonstrate how to run it in parallel using GNU parallel. This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.\n",
"\n",
"**Please see setup instructions in Chapter 16.00**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"logging.basicConfig(level=logging.INFO)\n",
"import os\n",
"import sys\n",
"\n",
"if 'google.colab' in sys.modules:\n",
" print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n",
" sys.exit(0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%file outputs/mutate_residue_from_command_line.py\n",
"import argparse\n",
"import os\n",
"import pyrosetta\n",
"import uuid\n",
"\n",
"\n",
"__author__ = \"My Name\"\n",
"__email__ = \"name@email.com\"\n",
"\n",
"\n",
"def main(target=None, new_res=None):\n",
" \"\"\"Example function to run my custom PyRosetta script.\"\"\"\n",
" \n",
" # Initialize PyRosetta with custom flags\n",
" pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n",
"\n",
" # Setup pose\n",
" pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n",
"\n",
" # Setup directory structure\n",
" main_dir = os.getcwd()\n",
" unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n",
" if not os.path.isdir(unique_dir):\n",
" os.mkdir(unique_dir)\n",
" os.chdir(unique_dir)\n",
"\n",
" # Create scorefunction\n",
" scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n",
"\n",
" # PyRosetta design protocol\n",
" keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n",
" res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n",
" )\n",
" keep_chA.apply(pose)\n",
" \n",
" mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n",
" target=target, new_res=new_res\n",
" )\n",
" mutate.apply(pose)\n",
"\n",
" mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n",
" mm.set_bb(True)\n",
" mm.set_chi(True)\n",
" min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n",
" min_mover.set_movemap(mm)\n",
" min_mover.score_function(scorefxn)\n",
" min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n",
" min_mover.cartesian(True)\n",
" min_mover.tolerance(0.01)\n",
" min_mover.max_iter(200)\n",
" min_mover.apply(pose)\n",
"\n",
" total_score = scorefxn(pose)\n",
" \n",
" # Setup outputs\n",
" pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n",
" \n",
" pyrosetta.dump_pdb(pose, pdb_output_filename)\n",
" \n",
" # Append scores to scorefile\n",
" pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n",
" pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n",
" scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n",
" nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n",
" )\n",
" os.chdir(main_dir)\n",
"\n",
" \n",
"if __name__ == \"__main__\":\n",
" \n",
" # Declare parser object for managing input options\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\"-t\", \"--target\", type=int,\n",
" help=\"Target residue to mutate as integer.\")\n",
" parser.add_argument(\"-r\", \"--new_res\", type=str,\n",
" help=\"Three letter amino acid code to which to mutate target.\")\n",
" args = parser.parse_args()\n",
" \n",
" # Run protocol\n",
" main(target=args.target, new_res=args.new_res)\n"
]
},
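{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, test the script once before launching many copies of it in parallel. The next cell is a minimal sanity check, not part of the original protocol; it assumes `inputs/5JG9.clean.pdb` is available relative to this notebook's working directory and that PyRosetta is importable in the current environment. A single run may take a few minutes on one core."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
"    # Run the script once with a single target/residue pair as a sanity check\n",
"    !{sys.executable} outputs/mutate_residue_from_command_line.py -t 2 -r ALA"
]
},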
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will run this script in parallel several different ways to demonstrate different types of job submission styles."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Parallelize script in an interactive session:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On your laptop, you have access to as many cores as are on your machine.\n",
"On a high-performance computing cluster with SLURM scheduling, you first have to request as many cores as you want to run on in an interactive login session:\n",
"\n",
">qlogin -c 8 --mem=16g\n",
"\n",
"will reserve 8 CPU cores and 16 GB of RAM for you and start your session on a node that has available resources.\n",
"\n",
"Then, we need to write a run file to disc specifying our input parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"outputs/run_file_parallel_interactive.txt\", \"w\") as f:\n",
" for target in [2, 4, 6, 8, 10]:\n",
" for new_res in [\"ALA\", \"TRP\"]:\n",
" f.write(\"{0} outputs/mutate_residue_from_command_line.py -t {1} -r {2}\\n\".format(sys.executable, target, new_res))"
]
},
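{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, we can print the run file we just wrote and confirm that each line is a complete, standalone command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cat outputs/run_file_parallel_interactive.txt"
]
},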
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: it's always a good idea to run just one command first to make sure there aren't any errors in your script!\n",
"\n",
"**Note**: if you don't specify the correct python executable, activate the correct conda environment in your interactive session first.\n",
"\n",
"Now submit `outputs/run_file_parallel_interactive.txt` to GNU parallel from the command line in your interactive session:\n",
"\n",
">cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice &\n",
"\n",
"**Note**: The `parallel` exectuable is usually located at `/usr/bin/parallel` but the full path may differ on your computer. For installation info, visit: https://www.gnu.org/software/parallel/"
]
},
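{
"cell_type": "markdown",
"metadata": {},
"source": [
"If GNU parallel is installed on the machine running this notebook, you can also launch the same run file from a notebook cell instead of a terminal. This is an optional sketch that assumes the `parallel` executable is on your `PATH` and that 8 cores are available to your session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
"    # Pipe the run file into GNU parallel, running at most 8 jobs at a time\n",
"    !cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice"
]
},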
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Parallelize script on a high-performance computing cluster with Slurm scheduling (non-interactive submission):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use GNU parallel again, but this time there is no need to pre-request server allocations. We can submit jobs to the Slurm scheduler from directly within this Jupyter Notebook or from command line!\n",
"\n",
"Useful background information:\n",
" - \"Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. ... As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.\" Read further: https://slurm.schedmd.com/overview.html\n",
"\n",
" - With the Slurm scheduler we will use the `sbatch` command, therefore it may be useful to review the available options:\n",
"https://slurm.schedmd.com/sbatch.html\n",
"\n",
"First, write a SLURM submission script to disk specifying the job requirements. In this example, our conda environment is called `pyrosetta-bootcamp`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"outputs/sbatch_parallel.sh\", \"w\") as f:\n",
" f.write(\"#!/bin/bash \\n\") # Bash script\n",
" f.write(\"#SBATCH -p short \\n\") # Specify \"short\" partition/queue\n",
" f.write(\"#SBATCH -n 8 \\n\") # Specify eight cores\n",
" f.write(\"#SBATCH -N 1 \\n\") # Specify one node\n",
" f.write(\"#SBATCH --mem=16g \\n\") # Specify 16GB RAM over eight cores\n",
" f.write(\"#SBATCH -o sbatch_parallel.log \\n\") # Specify output log filename\n",
" f.write(\"conda activate pyrosetta-bootcamp \\n\") # Activate conda environment\n",
" f.write(\"cat outputs/run_file_parallel_interactive.txt | /usr/bin/parallel -j 8 --no-notice \\n\") # Submit jobs to GNU parallel"
]
},
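{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, inspect the submission script we just wrote before handing it to the scheduler:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cat outputs/sbatch_parallel.sh"
]
},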
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, submit `outputs/sbatch_parallel.sh` to the SLURM scheduler with the `sbatch` command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
" !sbatch outputs/sbatch_parallel.sh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then periodically check on the status of our jobs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!squeue -u $USER"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Submit jobs individually to the SLURM scheduler:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time, we submit each job individually using the `sbatch` command directly on our PyRosetta script.\n",
"\n",
"**Warning**: do not submit more than ~1000 jobs with this method or risk clogging the SLURM scheduler!\n",
"\n",
"First, copy your python executable and paste it on the first line of the PyRosetta script after `#!`, followed by `#SBATCH` commands for each job:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sys.executable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%file outputs/mutate_residue_from_sbatch.py\n",
"#!/home/klimaj/anaconda3/envs/pyrosetta-bootcamp/bin/python\n",
"#SBATCH -p short\n",
"#SBATCH -n 1\n",
"#SBATCH --mem=2g\n",
"#SBATCH -o sbatch.log\n",
"\n",
"import argparse\n",
"import os\n",
"import pyrosetta\n",
"import uuid\n",
"\n",
"\n",
"__author__ = \"My Name\"\n",
"__email__ = \"name@email.com\"\n",
"\n",
"\n",
"def main(target=None, new_res=None):\n",
" \"\"\"Example function to run my custom PyRosetta script.\n",
" \"\"\"\n",
" \n",
" # Initialize PyRosetta with custom flags\n",
" pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n",
"\n",
" # Setup pose\n",
" pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n",
" \n",
" # Setup directory structure\n",
" main_dir = os.getcwd()\n",
" unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n",
" if not os.path.isdir(unique_dir):\n",
" os.mkdir(unique_dir)\n",
" os.chdir(unique_dir)\n",
"\n",
" # Create scorefunction\n",
" scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n",
"\n",
" # PyRosetta design protocol\n",
" keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n",
" res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n",
" )\n",
" keep_chA.apply(pose)\n",
" \n",
" mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n",
" target=target, new_res=new_res\n",
" )\n",
" mutate.apply(pose)\n",
"\n",
" mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n",
" mm.set_bb(True)\n",
" mm.set_chi(True)\n",
" min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n",
" min_mover.set_movemap(mm)\n",
" min_mover.score_function(scorefxn)\n",
" min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n",
" min_mover.cartesian(True)\n",
" min_mover.tolerance(0.01)\n",
" min_mover.max_iter(200)\n",
" min_mover.apply(pose)\n",
"\n",
" total_score = scorefxn(pose)\n",
" \n",
" # Setup outputs\n",
" pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n",
" \n",
" pyrosetta.dump_pdb(pose, pdb_output_filename)\n",
" \n",
" # Append scores to scorefile\n",
" pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n",
" pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n",
" scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n",
" nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n",
" )\n",
" os.chdir(main_dir)\n",
"\n",
" \n",
"if __name__ == \"__main__\":\n",
" \n",
" # Declare parser object for managing input options\n",
" parser = argparse.ArgumentParser()\n",
" parser.add_argument(\"-t\", \"--target\", type=int,\n",
" help=\"Target residue to mutate as integer.\")\n",
" parser.add_argument(\"-r\", \"--new_res\", type=str,\n",
" help=\"Three letter amino acid code to which to mutate target.\")\n",
" args = parser.parse_args()\n",
" \n",
" # Run protocol\n",
" main(target=args.target, new_res=args.new_res)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, loop over your input parameters submitting the PyRosetta scripts to the scheduler using the `sbatch` command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.getenv(\"DEBUG\"):\n",
" for target in [2, 4, 6, 8, 10]:\n",
" for new_res in [\"ALA\", \"TRP\"]:\n",
" !sbatch ./outputs/mutate_residue_from_sbatch.py -t $target -r $new_res"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then periodically check on the status of our jobs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!squeue -u $USER"
]
},
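{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the jobs have finished, the decoys and score files are scattered across the `outputs/testing_*` directories created by each run. The cell below is a minimal sketch for collecting the scores into one list; it assumes the jobs completed successfully and that each `score.fasc` file contains one JSON record per line (because we passed `json_format=True` to `output_scorefile`). The key names `decoy` and `total_score` used below are assumptions about the score record format, so adapt them to whatever keys your score files actually contain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import json\n",
"import os\n",
"\n",
"if not os.getenv(\"DEBUG\"):\n",
"    records = []\n",
"    # Each run wrote its own score.fasc inside a unique outputs/testing_* directory\n",
"    for scorefile in glob.glob(os.path.join(\"outputs\", \"testing_*\", \"score.fasc\")):\n",
"        with open(scorefile) as f:\n",
"            for line in f:\n",
"                line = line.strip()\n",
"                if line:\n",
"                    records.append(json.loads(line))\n",
"    print(\"Collected {0} score records\".format(len(records)))\n",
"    # Print the lowest-scoring decoys; key names are assumptions about the record format\n",
"    for record in sorted(records, key=lambda r: float(r.get(\"total_score\", 0.0)))[:5]:\n",
"        print(record.get(\"decoy\"), record.get(\"total_score\"))"
]
},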
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:PyRosetta.notebooks]",
"language": "python",
"name": "conda-env-PyRosetta.notebooks-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}