{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >

\"Open" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example of Using PyRosetta with GNU Parallel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* In this tutorial, we will write a simple PyRosetta script to disk, and demonstrate how to run it in parallel using GNU parallel. This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment.\n", "\n", "**Please see setup instructions in Chapter 16.00**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import logging\n", "logging.basicConfig(level=logging.INFO)\n", "import os\n", "import sys\n", "\n", "if 'google.colab' in sys.modules:\n", " print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n", " sys.exit(0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%file outputs/mutate_residue_from_command_line.py\n", "import argparse\n", "import os\n", "import pyrosetta\n", "import uuid\n", "\n", "\n", "__author__ = \"My Name\"\n", "__email__ = \"name@email.com\"\n", "\n", "\n", "def main(target=None, new_res=None):\n", " \"\"\"Example function to run my custom PyRosetta script.\"\"\"\n", " \n", " # Initialize PyRosetta with custom flags\n", " pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n", "\n", " # Setup pose\n", " pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n", "\n", " # Setup directory structure\n", " main_dir = os.getcwd()\n", " unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n", " if not os.path.isdir(unique_dir):\n", " os.mkdir(unique_dir)\n", " os.chdir(unique_dir)\n", "\n", " # Create scorefunction\n", " scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n", "\n", " # PyRosetta design protocol\n", " keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n", " res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n", " )\n", " keep_chA.apply(pose)\n", " \n", " mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n", " target=target, new_res=new_res\n", " )\n", " mutate.apply(pose)\n", "\n", " mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n", " mm.set_bb(True)\n", " mm.set_chi(True)\n", " min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n", " min_mover.set_movemap(mm)\n", " min_mover.score_function(scorefxn)\n", " min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n", " min_mover.cartesian(True)\n", " min_mover.tolerance(0.01)\n", " min_mover.max_iter(200)\n", " min_mover.apply(pose)\n", "\n", " total_score = scorefxn(pose)\n", " \n", " # Setup outputs\n", " pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n", " \n", " pyrosetta.dump_pdb(pose, pdb_output_filename)\n", " \n", " # Append scores to scorefile\n", " pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n", " pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n", " scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n", " nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n", " )\n", " os.chdir(main_dir)\n", "\n", " \n", "if __name__ == \"__main__\":\n", " \n", " # Declare parser object for managing input options\n", " parser = argparse.ArgumentParser()\n", " parser.add_argument(\"-t\", \"--target\", type=int,\n", " help=\"Target residue to mutate as integer.\")\n", " parser.add_argument(\"-r\", \"--new_res\", 
type=str,\n", " help=\"Three letter amino acid code to which to mutate target.\")\n", " args = parser.parse_args()\n", " \n", " # Run protocol\n", " main(target=args.target, new_res=args.new_res)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will run this script in parallel in several different ways to demonstrate different types of job submission styles." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Parallelize script in an interactive session:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On your laptop, you have access to as many cores as are on your machine.\n", "On a high-performance computing cluster with SLURM scheduling, you first have to request as many cores as you want to run on in an interactive login session:\n", "\n", ">qlogin -c 8 --mem=16g\n", "\n", "will reserve 8 CPU cores and 16 GB of RAM for you and start your session on a node that has available resources.\n", "\n", "Then, we need to write a run file to disk specifying our input parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"outputs/run_file_parallel_interactive.txt\", \"w\") as f:\n", " for target in [2, 4, 6, 8, 10]:\n", " for new_res in [\"ALA\", \"TRP\"]:\n", " f.write(\"{0} outputs/mutate_residue_from_command_line.py -t {1} -r {2}\\n\".format(sys.executable, target, new_res))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: it's always a good idea to run just one command first to make sure there aren't any errors in your script!\n", "\n", "**Note**: if the run file does not specify the correct Python executable, activate the correct conda environment in your interactive session first.\n", "\n", "Now submit `outputs/run_file_parallel_interactive.txt` to GNU parallel from the command line in your interactive session:\n", "\n", ">cat outputs/run_file_parallel_interactive.txt | parallel -j 8 --no-notice &\n", "\n", "**Note**: The `parallel` executable is usually located at `/usr/bin/parallel`, but the full path may differ on your computer. For installation info, visit: https://www.gnu.org/software/parallel/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Parallelize script on a high-performance computing cluster with Slurm scheduling (non-interactive submission):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use GNU parallel again, but this time there is no need to pre-request server allocations. We can submit jobs to the Slurm scheduler directly from within this Jupyter Notebook or from the command line!\n", "\n", "Useful background information:\n", " - \"Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. ... As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.\" Read further: https://slurm.schedmd.com/overview.html\n", "\n", " - With the Slurm scheduler we will use the `sbatch` command; it may therefore be useful to review the available options:\n", "https://slurm.schedmd.com/sbatch.html\n", "\n", "First, write a SLURM submission script to disk specifying the job requirements. 
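Partition (queue) names differ between clusters, so before filling in the `#SBATCH -p` line below you may want to list the partitions available to you (this assumes the standard Slurm `sinfo` command is available on your cluster):\n", "\n", ">sinfo -s\n", "\n", "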
In this example, our conda environment is called `pyrosetta-bootcamp`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"outputs/sbatch_parallel.sh\", \"w\") as f:\n", " f.write(\"#!/bin/bash \\n\") # Bash script\n", " f.write(\"#SBATCH -p short \\n\") # Specify \"short\" partition/queue\n", " f.write(\"#SBATCH -n 8 \\n\") # Specify eight cores\n", " f.write(\"#SBATCH -N 1 \\n\") # Specify one node\n", " f.write(\"#SBATCH --mem=16g \\n\") # Specify 16GB RAM over eight cores\n", " f.write(\"#SBATCH -o sbatch_parallel.log \\n\") # Specify output log filename\n", " f.write(\"conda activate pyrosetta-bootcamp \\n\") # Activate conda environment\n", " f.write(\"cat outputs/run_file_parallel_interactive.txt | /usr/bin/parallel -j 8 --no-notice \\n\") # Submit jobs to GNU parallel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, submit `outputs/sbatch_parallel.sh` to the SLURM scheduler with the `sbatch` command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " !sbatch outputs/sbatch_parallel.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then periodically check on the status of our jobs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!squeue -u $USER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Submit jobs individually to the SLURM scheduler:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, we submit each job individually using the `sbatch` command directly on our PyRosetta script.\n", "\n", "**Warning**: do not submit more than ~1000 jobs with this method or risk clogging the SLURM scheduler!\n", "\n", "First, copy your python executable and paste it on the first line of the PyRosetta script after `#!`, followed by `#SBATCH` commands for each job:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sys.executable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%file outputs/mutate_residue_from_sbatch.py\n", "#!/home/klimaj/anaconda3/envs/pyrosetta-bootcamp/bin/python\n", "#SBATCH -p short\n", "#SBATCH -n 1\n", "#SBATCH --mem=2g\n", "#SBATCH -o sbatch.log\n", "\n", "import argparse\n", "import os\n", "import pyrosetta\n", "import uuid\n", "\n", "\n", "__author__ = \"My Name\"\n", "__email__ = \"name@email.com\"\n", "\n", "\n", "def main(target=None, new_res=None):\n", " \"\"\"Example function to run my custom PyRosetta script.\n", " \"\"\"\n", " \n", " # Initialize PyRosetta with custom flags\n", " pyrosetta.init(\"-ignore_unrecognized_res 1 -renumber_pdb 1 -mute all\")\n", "\n", " # Setup pose\n", " pose = pyrosetta.pose_from_file(\"inputs/5JG9.clean.pdb\")\n", " \n", " # Setup directory structure\n", " main_dir = os.getcwd()\n", " unique_dir = os.path.join(\"outputs\", \"testing_\" + uuid.uuid4().hex)\n", " if not os.path.isdir(unique_dir):\n", " os.mkdir(unique_dir)\n", " os.chdir(unique_dir)\n", "\n", " # Create scorefunction\n", " scorefxn = pyrosetta.create_score_function(\"ref2015_cart\")\n", "\n", " # PyRosetta design protocol\n", " keep_chA = pyrosetta.rosetta.protocols.grafting.simple_movers.KeepRegionMover(\n", " res_start=str(pose.chain_begin(1)), res_end=str(pose.chain_end(1))\n", " )\n", " keep_chA.apply(pose)\n", " \n", " mutate = pyrosetta.rosetta.protocols.simple_moves.MutateResidue(\n", " 
target=target, new_res=new_res\n", " )\n", " mutate.apply(pose)\n", "\n", " mm = pyrosetta.rosetta.core.kinematics.MoveMap()\n", " mm.set_bb(True)\n", " mm.set_chi(True)\n", " min_mover = pyrosetta.rosetta.protocols.minimization_packing.MinMover()\n", " min_mover.set_movemap(mm)\n", " min_mover.score_function(scorefxn)\n", " min_mover.min_type(\"lbfgs_armijo_nonmonotone\")\n", " min_mover.cartesian(True)\n", " min_mover.tolerance(0.01)\n", " min_mover.max_iter(200)\n", " min_mover.apply(pose)\n", "\n", " total_score = scorefxn(pose)\n", " \n", " # Setup outputs\n", " pdb_output_filename = \"_\".join([\"5JG9.clean\", str(target), str(new_res), \".pdb\"])\n", " \n", " pyrosetta.dump_pdb(pose, pdb_output_filename)\n", " \n", " # Append scores to scorefile\n", " pyrosetta.toolbox.py_jobdistributor.output_scorefile(\n", " pose=pose, pdb_name=\"5JG9.clean\", current_name=pdb_output_filename,\n", " scorefilepath=\"score.fasc\", scorefxn=scorefxn,\n", " nstruct=1, native_pose=None, additional_decoy_info=None, json_format=True\n", " )\n", " os.chdir(main_dir)\n", "\n", " \n", "if __name__ == \"__main__\":\n", " \n", " # Declare parser object for managing input options\n", " parser = argparse.ArgumentParser()\n", " parser.add_argument(\"-t\", \"--target\", type=int,\n", " help=\"Target residue to mutate as integer.\")\n", " parser.add_argument(\"-r\", \"--new_res\", type=str,\n", " help=\"Three letter amino acid code to which to mutate target.\")\n", " args = parser.parse_args()\n", " \n", " # Run protocol\n", " main(target=args.target, new_res=args.new_res)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, loop over your input parameters submitting the PyRosetta scripts to the scheduler using the `sbatch` command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " for target in [2, 4, 6, 8, 10]:\n", " for new_res in [\"ALA\", \"TRP\"]:\n", " !sbatch ./outputs/mutate_residue_from_sbatch.py -t $target -r $new_res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then periodically check on the status of our jobs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!squeue -u $USER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.02-PyData-miniprotein-design.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Examples Using the `dask` Module](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.04-dask.delayed-Via-Slurm.ipynb) >

\"Open" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks]", "language": "python", "name": "conda-env-PyRosetta.notebooks-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }