{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);\n", "content is available [on GitHub](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.04-dask.delayed-Via-Slurm.ipynb) >" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example of Using PyRosetta with GNU Parallel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* In this tutorial, we will write a simple PyRosetta script to disk and demonstrate how to run it in parallel using GNU parallel. This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.\n", "\n", "**Please see setup instructions in Chapter 15.00**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import logging\n", "logging.basicConfig(level=logging.INFO)\n", "import os\n", "import sys\n", "\n", "if 'google.colab' in sys.modules:\n", "    print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n", "    sys.exit(0)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing outputs/mutate_residue_from_command_line.py\n" ] } ], "source": [ "%%file outputs/mutate_residue_from_command_line.py\n", "import argparse\n", "import os\n", "import pyrosetta\n", "import uuid\n", "\n", "\n", "__author__ = \"My Name\"\n", "__email__ = \"name@email.com\"\n", "\n", "\n", "def main(target=None, new_res=None):\n", "    \"\"\"Example function to run my custom PyRosetta script.\"\"\"\n", "\n", "    # Initialize PyRosetta with custom flags\n", "    pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n", "\n", "    # Setup pose\n", "    pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n", "\n", "    # Setup directory structure: each job writes into its own unique directory\n", "    main_dir = os.getcwd()\n", "    unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n", "    if not os.path.isdir(unique_dir):\n", "        os.mkdir(unique_dir)\n", "    os.chdir(unique_dir)\n", "\n", "    # Create scorefunction\n", "    scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n", "\n", "    # PyRosetta design protocol\n", "    keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n", "        res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n", "    )\n", "    keep_chA.apply(pose)\n", "\n", "    mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n", "        target=target, new_res=new_res\n", "    )\n", "    mutate.apply(pose)\n", "\n", "    mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n", "    mm.set_bb(True)\n", "    mm.set_chi(True)\n", "    min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n", "    min_mover.set_movemap(mm)\n", "    min_mover.score_function(scorefxn)\n", "    min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n", "    min_mover.cartesian(True)\n", "    min_mover.tolerance(0.01)\n", "    min_mover.max_iter(200)\n", "    min_mover.apply(pose)\n", "\n", "    total_score = scorefxn(pose)\n", "\n", "    # Setup outputs, e.g. 5JG9.clean_2_ALA.pdb\n", "    pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res)]) + \".pdb\"\n", "\n", "    pyrosetta.dump_pdb(pose, pdb_output_filename)\n", "\n", "    # Append scores to scorefile\n", "    pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n", "        pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n", "        scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n", "        nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n", "    )\n", "    os.chdir(main_dir)\n", "\n", "\n", "if __name__ == \"__main__\":\n", "\n", "    # Declare parser object for managing input options\n", "    parser = argparse.ArgumentParser()\n", "    parser.add_argument(\"-t\", \"--target\", type=int,\n", "                        help=\"Target residue to mutate as integer.\")\n", "    parser.add_argument(\"-r\", \"--new_res\", type=str,\n", "                        help=\"Three letter amino acid code to which to mutate target.\")\n", "    args = parser.parse_args()\n", "\n", "    # Run protocol\n", "    main(target=args.target, new_res=args.new_res)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will run this script in parallel several different ways to 
demonstrate different job submission styles." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Parallelize script in an interactive session:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On your laptop, you have access to as many cores as are on your machine.\n", "On a high-performance computing cluster with Slurm scheduling, you first have to request as many cores as you want to run on in an interactive login session:\n", "\n", ">qlogin -c 8 --mem=16g\n", "\n", "will reserve 8 CPU cores and 16 GB of RAM for you and start your session on a node that has available resources.\n", "\n", "Then, we need to write a run file to disk specifying our input parameters, one command per line:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "with open(\"outputs/run_file_parallel_interactive.txt\", \"w\") as f:\n", "    for target in [2, 4, 6, 8, 10]:\n", "        for new_res in [\"ALA\", \"TRP\"]:\n", "            f.write(\"{0} outputs/mutate_residue_from_command_line.py -t {1} -r {2}\\n\".format(sys.executable, target, new_res))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: it's always a good idea to run just one command first to make sure there aren't any errors in your script!\n", "\n", "**Note**: the run file above uses the full path of the current Python executable (`sys.executable`). If you write a bare `python` instead, activate the correct conda environment in your interactive session first.\n", "\n", "Now submit `outputs/run_file_parallel_interactive.txt` to GNU parallel from the command line in your interactive session:\n", "\n", ">cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice &\n", "\n", "**Note**: The `parallel` executable is usually located at `/usr/bin/parallel`, but the full path may differ on your computer. For installation info, visit: https://www.gnu.org/software/parallel/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 
Parallelize script on a high-performance computing cluster with Slurm scheduling (non-interactive submission):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use GNU parallel again, but this time there is no need to request an allocation in advance. We can submit jobs to the Slurm scheduler directly from within this Jupyter notebook or from the command line!\n", "\n", "Useful background information:\n", " - \"Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. ... As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.\" Read further: https://slurm.schedmd.com/overview.html\n", "\n", " - With the Slurm scheduler we will use the `sbatch` command, so it may be useful to review the available options:\n", "https://slurm.schedmd.com/sbatch.html\n", "\n", "First, write a Slurm submission script to disk specifying the job requirements. 
In this example, our conda environment is called `pyrosetta-bootcamp`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "with open(\"outputs/sbatch_parallel.sh\", \"w\") as f:\n", "    f.write(\"#!/bin/bash \\n\") # Bash script\n", "    f.write(\"#SBATCH -p short \\n\") # Specify \"short\" partition/queue\n", "    f.write(\"#SBATCH -n 8 \\n\") # Specify eight cores\n", "    f.write(\"#SBATCH -N 1 \\n\") # Specify one node\n", "    f.write(\"#SBATCH --mem=16g \\n\") # Specify 16GB RAM over eight cores\n", "    f.write(\"#SBATCH -o sbatch_parallel.log \\n\") # Specify output log filename\n", "    f.write(\"conda activate pyrosetta-bootcamp \\n\") # Activate conda environment\n", "    f.write(\"cat outputs/run_file_parallel_interactive.txt | /usr/bin/parallel -j 8 --no-notice \\n\") # Submit jobs to GNU parallel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, submit `outputs/sbatch_parallel.sh` to the Slurm scheduler with the `sbatch` command:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Submitted batch job 12942806\r\n" ] } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    !sbatch outputs/sbatch_parallel.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then periodically check on the status of our jobs:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\r\n", " 12942806 short sbatch.s klimaj PD 0:00 1 (Priority)\r\n", " 12931193 interacti zsh klimaj R 2:17:02 1 dig1\r\n", " 12931183 interacti jupyter- klimaj R 2:17:37 1 dig5\r\n", " 12935447 interacti zsh klimaj R 58:46 1 dig1\r\n" ] } ], "source": [ "!squeue -u $USER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. 
Submit jobs individually to the Slurm scheduler:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, we submit each job individually using the `sbatch` command directly on our PyRosetta script.\n", "\n", "**Warning**: do not submit more than ~1000 jobs with this method, or you risk clogging the Slurm scheduler!\n", "\n", "First, copy the path to your Python executable and paste it on the first line of the PyRosetta script after `#!`, followed by the `#SBATCH` directives for each job:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/home/klimaj/anaconda3/envs/pyrosetta-dev/bin/python'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sys.executable" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing outputs/mutate_residue_from_sbatch.py\n" ] } ], "source": [ "%%file outputs/mutate_residue_from_sbatch.py\n", "#!/home/klimaj/anaconda3/envs/pyrosetta-bootcamp/bin/python\n", "#SBATCH -p short\n", "#SBATCH -n 1\n", "#SBATCH --mem=2g\n", "#SBATCH -o sbatch.log\n", "\n", "import argparse\n", "import os\n", "import pyrosetta\n", "import uuid\n", "\n", "\n", "__author__ = \"My Name\"\n", "__email__ = \"name@email.com\"\n", "\n", "\n", "def main(target=None, new_res=None):\n", "    \"\"\"Example function to run my custom PyRosetta script.\n", "    \"\"\"\n", "\n", "    # Initialize PyRosetta with custom flags\n", "    pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n", "\n", "    # Setup pose\n", "    pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n", "\n", "    # Setup directory structure: each job writes into its own unique directory\n", "    main_dir = os.getcwd()\n", "    unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n", "    if not os.path.isdir(unique_dir):\n", "        os.mkdir(unique_dir)\n", "    os.chdir(unique_dir)\n", "\n", "    # Create scorefunction\n", "    scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n", "\n", "    # PyRosetta design protocol\n", "    keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n", "        res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n", "    )\n", "    keep_chA.apply(pose)\n", "\n", "    mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n", "        target=target, new_res=new_res\n", "    )\n", "    mutate.apply(pose)\n", "\n", "    mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n", "    mm.set_bb(True)\n", "    mm.set_chi(True)\n", "    min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n", "    min_mover.set_movemap(mm)\n", "    min_mover.score_function(scorefxn)\n", "    min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n", "    min_mover.cartesian(True)\n", "    min_mover.tolerance(0.01)\n", "    min_mover.max_iter(200)\n", "    min_mover.apply(pose)\n", "\n", "    total_score = scorefxn(pose)\n", "\n", "    # Setup outputs, e.g. 5JG9.clean_2_ALA.pdb\n", "    pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res)]) + \".pdb\"\n", "\n", "    pyrosetta.dump_pdb(pose, pdb_output_filename)\n", "\n", "    # Append scores to scorefile\n", "    pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n", "        pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n", "        scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n", "        nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n", "    )\n", "    os.chdir(main_dir)\n", "\n", "\n", "if __name__ == \"__main__\":\n", "\n", "    # Declare parser object for managing input options\n", "    parser = argparse.ArgumentParser()\n", "    parser.add_argument(\"-t\", \"--target\", type=int,\n", "                        help=\"Target residue to mutate as integer.\")\n", "    parser.add_argument(\"-r\", \"--new_res\", type=str,\n", "                        help=\"Three letter amino acid code to which to mutate target.\")\n", "    args = parser.parse_args()\n", "\n", "    # Run protocol\n", "    main(target=args.target, new_res=args.new_res)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, loop over your input parameters, submitting the PyRosetta script to the scheduler with the `sbatch` command once per parameter combination:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Submitted batch job 12946283\n", "Submitted batch job 12946284\n", "Submitted batch job 12946285\n", "Submitted batch job 12946286\n", "Submitted batch job 12946287\n", "Submitted batch job 12946288\n", "Submitted batch job 12946289\n", "Submitted batch job 12946290\n", "Submitted batch job 12946291\n", "Submitted batch job 12946292\n" ] } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", "    for target in [2, 4, 6, 8, 10]:\n", "        for new_res in [\"ALA\", \"TRP\"]:\n", "            !sbatch ./outputs/mutate_residue_from_sbatch.py -t $target -r $new_res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then periodically check on the status of our jobs:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\r\n", " 12946286 short mutate_r klimaj PD 0:00 1 (Resources)\r\n", " 12946287 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12946288 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12946289 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12946290 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12946291 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12946292 short mutate_r klimaj PD 0:00 1 (Priority)\r\n", " 12931193 interacti zsh klimaj R 2:19:28 1 dig1\r\n", " 12931183 interacti jupyter- klimaj R 2:20:03 1 dig5\r\n", " 12935447 interacti zsh klimaj R 1:01:12 1 dig1\r\n", " 12946283 short mutate_r klimaj R 0:00 1 dig73\r\n", " 12946284 short mutate_r klimaj R 0:00 1 dig87\r\n", " 12946285 short mutate_r klimaj R 0:00 1 dig100\r\n" ] } ], "source": [ "!squeue -u $USER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< 
[Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.04-dask.delayed-Via-Slurm.ipynb) >" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks] *", "language": "python", "name": "conda-env-PyRosetta.notebooks-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }