{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "COLLABORATORS = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NOTEBOOK_HEADER-->\n",
    "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);\n",
    "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Running Rosetta in Parellel](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.00-Running-PyRosetta-in-Parellel.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.02-PyData-miniprotein-design.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.01-PyData-ddG-pssm.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Distributed analysis example: exhaustive ddG PSSM\n",
    "\n",
    "## Notes\n",
    "This tutorial will walk you through how to generate an exhaustive ddG PSSM in PyRosetta using the PyData stack for analysis and distributed computing.\n",
    "\n",
    "This Jupyter notebook uses parallelization and is not meant to be executed within a Google Colab environment.\n",
    "\n",
    "## Setup\n",
    "Please see setup instructions in Chapter 15.00\n",
    "\n",
    "## Citation\n",
    "[Integration of the Rosetta Suite with the Python Software Stack via reproducible packaging and core programming interfaces for distributed simulation](https://doi.org/10.1002/pro.3721)\n",
    "\n",
    "Alexander S. Ford, Brian D. Weitzner, Christopher D. Bahl\n",
    "\n",
    "## Manual\n",
    "Documentation for the `pyrosetta.distributed` namespace can be found here: https://nbviewer.jupyter.org/github/proteininnovation/Rosetta-PyData_Integration/blob/master/distributed_overview.ipynb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "if 'google.colab' in sys.modules:\n",
    "   print(\"This Jupyter notebook uses parallelization and is therefore not set up for the Google Colab environment.\")\n",
    "   sys.exit(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "logging.basicConfig(level=logging.INFO)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas\n",
    "import seaborn\n",
    "import matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import Bio.SeqUtils\n",
    "import Bio.Data.IUPACData as IUPACData"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pyrosetta\n",
    "import pyrosetta.distributed.io as io\n",
    "import pyrosetta.distributed.packed_pose as packed_pose\n",
    "import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts\n",
    "import pyrosetta.distributed.tasks.score as score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os,sys,platform\n",
    "\n",
    "platform.python_version()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create test pose, initialize rosetta and pack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_protocol = \"\"\"\n",
    "<ROSETTASCRIPTS>\n",
    "  <TASKOPERATIONS>\n",
    "    <RestrictToRepacking name=\"only_pack\"/>\n",
    "  </TASKOPERATIONS>\n",
    "\n",
    "  <MOVERS>\n",
    "    <PackRotamersMover name=\"pack\" task_operations=\"only_pack\" />\n",
    "  </MOVERS>\n",
    "  \n",
    "  <PROTOCOLS>\n",
    "    <Add mover=\"pack\"/>\n",
    "  </PROTOCOLS>\n",
    "</ROSETTASCRIPTS>\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_relax = rosetta_scripts.SingleoutputRosettaScriptsTask(input_protocol)\n",
    "# Syntax check via setup\n",
    "input_relax.setup()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_input_pose = score.ScorePoseTask()(io.pose_from_sequence(\"TESTESTEST\"))\n",
    "input_pose = input_relax(raw_input_pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform exhaustive point mutation and pack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mutate_residue(input_pose, res_index, new_aa, res_label = None):\n",
    "    import pyrosetta.rosetta.core.pose as pose\n",
    "    \n",
    "    work_pose = packed_pose.to_pose(input_pose)\n",
    "    \n",
    "    # Annotate strucure with reslabel, for use in downstream protocol\n",
    "    # Add parameters as score, for use in downstream analysis\n",
    "    if res_label:\n",
    "        work_pose.pdb_info().add_reslabel(res_index, res_label)\n",
    "        pose.setPoseExtraScore(work_pose, \"mutation_index\", res_index)\n",
    "        pose.setPoseExtraScore(work_pose, \"mutation_aa\", new_aa)\n",
    "    \n",
    "    if len(new_aa) == 1:\n",
    "        new_aa = str.upper(Bio.SeqUtils.seq3(new_aa))\n",
    "    assert new_aa in map(str.upper, IUPACData.protein_letters_3to1)\n",
    "    \n",
    "    protocol = \"\"\"\n",
    "<ROSETTASCRIPTS>\n",
    "    <MOVERS>\n",
    "        <MutateResidue name=\"mutate\" new_res=\"%(new_aa)s\" target=\"%(res_index)i\" />\n",
    "    </MOVERS>\n",
    "    <PROTOCOLS>\n",
    "        <Add mover_name=\"mutate\"/>\n",
    "    </PROTOCOLS>\n",
    "</ROSETTASCRIPTS>\n",
    "    \"\"\" % locals()\n",
    "    \n",
    "    return rosetta_scripts.SingleoutputRosettaScriptsTask(protocol)(work_pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "refine = \"\"\"\n",
    "<ROSETTASCRIPTS>\n",
    "\n",
    "  <RESIDUE_SELECTORS>\n",
    "    <ResiduePDBInfoHasLabel name=\"mutation\" property=\"mutation\" />\n",
    "    <Not name=\"not_neighbor\">\n",
    "      <Neighborhood selector=\"mutation\" distance=\"12.0\" />\n",
    "    </Not>\n",
    "  </RESIDUE_SELECTORS>\n",
    "  \n",
    "  <TASKOPERATIONS>\n",
    "    <RestrictToRepacking name=\"only_pack\"/>\n",
    "    <OperateOnResidueSubset name=\"only_repack_neighbors\" selector=\"not_neighbor\" >\n",
    "      <PreventRepackingRLT/>\n",
    "    </OperateOnResidueSubset>\n",
    "  </TASKOPERATIONS>\n",
    "\n",
    "  <MOVERS>\n",
    "    <PackRotamersMover name=\"pack_area\" task_operations=\"only_pack,only_repack_neighbors\" />\n",
    "  </MOVERS>\n",
    "  \n",
    "  <PROTOCOLS>\n",
    "    <Add mover=\"pack_area\"/>\n",
    "  </PROTOCOLS>\n",
    "</ROSETTASCRIPTS>\n",
    "    \"\"\"\n",
    "    \n",
    "refine_mutation = rosetta_scripts.SingleoutputRosettaScriptsTask(refine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Mutation and pack"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Job distribution via `multiprocessing`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from multiprocessing import Pool\n",
    "import itertools\n",
    "with pyrosetta.distributed.utility.log.LoggingContext(logging.getLogger(\"rosetta\"), level=logging.WARN):\n",
    "    with Pool() as p:\n",
    "        work = [\n",
    "            (input_pose, i, aa, \"mutation\")\n",
    "            for i, aa in itertools.product(range(1, len(packed_pose.to_pose(input_pose).residues) + 1), IUPACData.protein_letters)\n",
    "        ]\n",
    "        logging.info(\"mutating\")\n",
    "        mutations = p.starmap(mutate_residue, work)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Job distribution via `dask`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dask.distributed\n",
    "cluster = dask.distributed.LocalCluster(n_workers=2, threads_per_worker=2)\n",
    "client = dask.distributed.Client(cluster)\n",
    "\n",
    "refinement_tasks = [client.submit(refine_mutation, mutant) for mutant in mutations]\n",
    "logging.info(\"refining\")\n",
    "refinements = [task.result() for task in refinement_tasks]\n",
    "\n",
    "client.close()\n",
    "cluster.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analysis of delta score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_frame = pandas.DataFrame.from_records(packed_pose.to_dict(refinements))\n",
    "result_frame[\"delta_total_score\"] = result_frame[\"total_score\"] - input_pose.scores[\"total_score\"] \n",
    "result_frame[\"mutation_index\"] = list(map(int, result_frame[\"mutation_index\"]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "matplotlib.rcParams['figure.figsize'] = [24.0, 8.0]\n",
    "seaborn.heatmap(\n",
    "    result_frame.pivot(\"mutation_aa\", \"mutation_index\", \"delta_total_score\"),\n",
    "    cmap=\"RdBu_r\", center=0, vmax=50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Running Rosetta in Parellel](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.00-Running-PyRosetta-in-Parellel.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Distributed computation example: miniprotein design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.02-PyData-miniprotein-design.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/15.01-PyData-ddG-pssm.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  }
 ],
 "metadata": {
  "_draft": {
   "nbviewer_url": "https://gist.github.com/a3118b776957d03bc4c789493bb541fe"
  },
  "gist": {
   "data": {
    "description": "pyrosetta_distributed_mutation_demo.ipynb",
    "public": true
   },
   "id": "a3118b776957d03bc4c789493bb541fe"
  },
  "kernelspec": {
   "display_name": "Python [conda env:PyRosetta.notebooks] *",
   "language": "python",
   "name": "conda-env-PyRosetta.notebooks-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}