{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [Example of Using PyRosetta with GNU Parallel](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.03-GNU-Parallel-Via-Slurm.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Part I: Parallelized Global Ligand Docking with `pyrosetta.distributed`](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/16.05-Ligand-Docking-dask.ipynb) >
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Examples Using the `dask` Module" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We can make use of the `dask` library to parallelize code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook uses parallelization and is **not** meant to be executed within a Google Colab environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* This Jupyter notebook requires the PyRosetta distributed layer which is obtained by building PyRosetta with the `--serialization` flag or installing PyRosetta from the RosettaCommons conda channel \n", "\n", "**Please see Chapter 16.00 for setup instructions**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import dask\n", "import dask.array as da\n", "import graphviz\n", "import logging\n", "logging.basicConfig(level=logging.INFO)\n", "import numpy as np\n", "import os\n", "import pyrosetta\n", "import pyrosetta.distributed\n", "import pyrosetta.distributed.dask\n", "import pyrosetta.distributed.io as io\n", "import random\n", "import sys\n", "\n", "from dask.distributed import Client, LocalCluster, progress\n", "from dask_jobqueue import SLURMCluster\n", "from IPython.display import Image\n", "\n", "if 'google.colab' in sys.modules:\n", " print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n", " sys.exit(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize PyRosetta within this Jupyter notebook using custom command line PyRosetta flags:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:pyrosetta.distributed:maybe_init performing pyrosetta initialization: {'extra_options': '-out:level 100 -ignore_unrecognized_res 1 -ignore_waters 0 -detect_disulf 0', 'silent': True}\n", 
"INFO:pyrosetta.rosetta:Found rosetta database at: /home/klimaj/anaconda3/envs/PyRosetta.notebooks/lib/python3.7/site-packages/pyrosetta/database; using it....\n", "INFO:pyrosetta.rosetta:PyRosetta-4 2020 [Rosetta PyRosetta4.conda.linux.CentOS.python37.Release 2020.02+release.22ef835b4a2647af94fcd6421a85720f07eddf12 2020-01-05T17:31:56] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n" ] } ], "source": [ "flags = \"\"\"-out:level 100\n", "-ignore_unrecognized_res 1\n", " -ignore_waters 0 \n", " -detect_disulf 0 # Do not automatically detect disulfides\n", "\"\"\" # These can be unformatted for user convenience, but no spaces in file paths!\n", "pyrosetta.distributed.init(flags)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are running this example on a high-performance computing (HPC) cluster with SLURM scheduling, use the `SLURMCluster` class described below. For more information, visit https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html. **Note**: If you are running this example on a HPC cluster with a job scheduler other than SLURM, `dask_jobqueue` also works with other job schedulers: http://jobqueue.dask.org/en/latest/api.html\n", "\n", "The `SLURMCluster` class in the `dask_jobqueue` module is very useful! 
In this case, we are requesting four workers using `cluster.scale(4)`, and specifying each worker to have:\n", "- one thread per worker with `cores=1`\n", "- one process per worker with `processes=1`\n", "- one CPU per task per worker with `job_cpu=1`\n", "- a total of 4GB memory per worker with `memory=\"4GB\"`\n", "- its job submitted to the \"short\" queue/partition of the SLURM scheduler with `queue=\"short\"`\n", "- a maximum job walltime of just under 3 hours using `walltime=\"02:59:00\"`\n", "- output dask files directed to `local_directory`\n", "- SLURM log files directed to a specified file path and file name (along with any other SLURM directives) via the `job_extra` option\n", "- pre-initialization with the same custom command line PyRosetta flags used in this Jupyter notebook, using the `extra=pyrosetta.distributed.dask.worker_extra(init_flags=flags)` option\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "if not os.getenv(\"DEBUG\"):\n", " scratch_dir = os.path.join(\"/net/scratch\", os.environ[\"USER\"])\n", " cluster = SLURMCluster(\n", " cores=1,\n", " processes=1,\n", " job_cpu=1,\n", " memory=\"4GB\",\n", " queue=\"short\",\n", " walltime=\"02:59:00\",\n", " local_directory=scratch_dir,\n", " job_extra=[\"-o {}\".format(os.path.join(scratch_dir, \"slurm-%j.out\"))],\n", " extra=pyrosetta.distributed.dask.worker_extra(init_flags=flags)\n", " )\n", " cluster.scale(4)\n", " client = Client(cluster)\n", "else:\n", " cluster = None\n", " client = None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: The actual sbatch script submitted to the SLURM scheduler under the hood was:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#!/usr/bin/env bash\n", "\n", "#SBATCH -J dask-worker\n", "#SBATCH -p short\n", "#SBATCH -n 1\n", "#SBATCH --cpus-per-task=1\n", "#SBATCH --mem=4G\n", "#SBATCH -t 02:59:00\n", "#SBATCH -o 
/net/scratch/klimaj/slurm-%j.out\n", "\n", "JOB_ID=${SLURM_JOB_ID%;*}\n", "\n", "/home/klimaj/anaconda3/envs/PyRosetta.notebooks/bin/python -m distributed.cli.dask_worker tcp://172.16.131.107:19949 --nthreads 1 --memory-limit 4.00GB --name name --nanny --death-timeout 60 --local-directory /net/scratch/klimaj --preload pyrosetta.distributed.dask.worker ' -out:level 100 -ignore_unrecognized_res 1 -ignore_waters 0 -detect_disulf 0'\n", "\n" ] } ], "source": [ "if not os.getenv(\"DEBUG\"):\n", " print(cluster.job_script())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Otherwise, if you are running this example locally on your laptop, you can still spawn workers and take advantage of the `dask` module:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# cluster = LocalCluster(n_workers=1, threads_per_worker=1)\n", "# client = Client(cluster)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open the `dask` dashboard, which shows diagnostic information about the current state of your cluster and helps track progress, identify performance issues, and debug failures:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table><tr><td><h3>Client</h3></td><td><h3>Cluster</h3></td></tr></table>
" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:PyRosetta.notebooks]", "language": "python", "name": "conda-env-PyRosetta.notebooks-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }
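Once a `client` is connected to a cluster (whether the `SLURMCluster` above or a `LocalCluster`), work is typically dispatched by building a lazy task graph and then computing it. The following is a minimal sketch, not part of the original notebook: `score` is a hypothetical stand-in for an expensive per-structure computation (e.g., scoring one pose with PyRosetta on a worker), and it is run here on dask's local threaded scheduler so the sketch works even without a SLURM allocation.

```python
import dask

@dask.delayed
def score(i):
    # Hypothetical stand-in for an expensive per-structure computation.
    return i * i

# Lazy: building the list does not execute anything yet.
tasks = [score(i) for i in range(8)]

# Execute the task graph; with the threaded scheduler this runs locally.
results = dask.compute(*tasks, scheduler="threads")
print(sum(results))  # prints 140
```

Note that once a `dask.distributed.Client` is active, it registers itself as the default scheduler, so the same `dask.compute(*tasks)` call (without the `scheduler=` override) would execute the graph on the cluster's workers instead.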