{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NOTEBOOK_HEADER-->\n",
    "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);\n",
    "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Side-Chain Packing](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Side-chain-packing.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Docking](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/07.00-Docking.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Design\n",
    "Keywords: generate_resfile_from_pdb(), generate_resfile_from_pose(), create_packer_task(), mutate_residue()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Notebook setup\n",
    "import sys\n",
    "if 'google.colab' in sys.modules:\n",
    "    !pip install pyrosettacolabsetup\n",
    "    import pyrosettacolabsetup\n",
    "    pyrosettacolabsetup.setup()\n",
    "    print (\"Notebook is set for PyRosetta use in Colab.  Have fun!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Make sure you are in the directory with the pdb files:**\n",
    "\n",
    "`cd google_drive/My\\ Drive/student-notebooks/`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.init: \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: \u001b[0mRosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37\n",
      "\u001b[0mcore.init: \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database\n",
      "\u001b[0mcore.init: \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=114166398 seed_offset=0 real_seed=114166398\n",
      "\u001b[0mcore.init.random: \u001b[0mRandomGenerator:init: Normal mode, seed=114166398 RG_type=mt19937\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.\n",
      "  from rosetta.core.scoring import *\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: \u001b[0mFinished initializing fa_standard residue type set.  Created 696 residue types\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: \u001b[0mTotal time to initialize 1.06092 seconds.\n",
      "\u001b[0mcore.import_pose.import_pose: \u001b[0mFile '1YY8.clean.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CG  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CD  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NE  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CZ  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NH1 on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NH2 on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CG  on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CD  on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OE1 on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NE2 on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 23 88\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 23 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 88 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 23 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 88 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 134 194\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 134 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 194 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 134 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 194 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 235 308\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 235 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 308 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 235 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 308 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 359 415\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 359 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 415 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 359 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 415 CYD\n",
      "\u001b[0mcore.pack.pack_missing_sidechains: \u001b[0mpacking residue number 18 because of missing atom number 6 atom name  CG\n",
      "\u001b[0mcore.pack.pack_missing_sidechains: \u001b[0mpacking residue number 214 because of missing atom number 6 atom name  CG\n",
      "\u001b[0mcore.pack.task: \u001b[0mPacker task: initialize from command line()\n",
      "\u001b[0mcore.scoring.ScoreFunctionFactory: \u001b[0mSCOREFUNCTION: \u001b[32mref2015\u001b[0m\n",
      "\u001b[0mcore.scoring.etable: \u001b[0mStarting energy table calculation\n",
      "\u001b[0mcore.scoring.etable: \u001b[0msmooth_etable: changing atr/rep split to bottom of energy well\n",
      "\u001b[0mcore.scoring.etable: \u001b[0msmooth_etable: spline smoothing lj etables (maxdis = 6)\n",
      "\u001b[0mcore.scoring.etable: \u001b[0msmooth_etable: spline smoothing solvation etables (max_dis = 6)\n",
      "\u001b[0mcore.scoring.etable: \u001b[0mFinished calculating energy tables.\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/all.ramaProb\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/prepro.ramaProb\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.all.txt\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.gly.txt\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.pro.txt\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.valile.txt\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA_n\n",
      "\u001b[0mcore.scoring.P_AA: \u001b[0mshapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop\n",
      "\u001b[0mbasic.io.database: \u001b[0mDatabase file opened: scoring/score_functions/elec_cp_reps.dat\n",
      "\u001b[0mcore.scoring.elec.util: \u001b[0mRead 40 countpair representative atoms\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: \u001b[0mshapovalov_lib_fixes_enable option is true.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: \u001b[0mshapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: \u001b[0mBinary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: \u001b[0mUsing Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: \u001b[0mDunbrack 2010 library took 0.428411 seconds to load from binary\n",
      "\u001b[0mcore.pack.pack_rotamers: \u001b[0mbuilt 43 rotamers at 2 positions.\n",
      "\u001b[0mcore.pack.interaction_graph.interaction_graph_factory: \u001b[0mInstantiating DensePDInteractionGraph\n",
      "\u001b[0mcore.scoring.ScoreFunctionFactory: \u001b[0mSCOREFUNCTION: \u001b[32mref2015\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# From previous section:\n",
    "from pyrosetta import *\n",
    "from pyrosetta.teaching import *\n",
    "pyrosetta.init()\n",
    "pose = pose_from_pdb(\"inputs/1YY8.clean.pdb\")\n",
    "start_pose = Pose()\n",
    "start_pose.assign(pose)\n",
    "scorefxn = get_fa_scorefxn()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Design calculations can be accomplished simply by packing side chains with a rotamer set that includes all amino acid types. That is, when the Monte Carlo routine swaps rotamers, it could replace the existing side chain with another conformation of the same residue or some conformation of a different residue type. Trial mutations are accepted or rejected with the Metropolis criterion, and the standard full-atom energy function is supplemented by a reference energy term, `ref`, which represents the relative energies of each residue type in an unfolded peptide."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Design operations are easiest to specify through a data file called a “resfile.” You can create a resfile for a given pdb file or pose using the following toolbox methods:\n",
    "\n",
    "\n",
    "```\n",
    "from pyrosetta.toolbox import generate_resfile_from_pdb\n",
    "generate_resfile_from_pdb(\"1YY8.clean.pdb\", \"1YY8.resfile\")\n",
    "from pyrosetta.toolbox import generate_resfile_from_pose\n",
    "generate_resfile_from_pose(pose, \"1YY8.resfile\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-c34d0bd1a81b4a7d",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.import_pose.import_pose: \u001b[0mFile '1YY8.clean.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CG  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CD  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NE  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CZ  on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NH1 on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NH2 on residue ARG 18\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CG  on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  CD  on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OE1 on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  NE2 on residue GLN:NtermProteinFull 214\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 23 88\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 23 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 88 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 23 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 88 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 134 194\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 134 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 194 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 134 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 194 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 235 308\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 235 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 308 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 235 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 308 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mFound disulfide between residues 359 415\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 359 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 415 CYS\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 359 CYD\n",
      "\u001b[0mcore.conformation.Conformation: \u001b[0mcurrent variant for 415 CYD\n",
      "\u001b[0mcore.pack.pack_missing_sidechains: \u001b[0mpacking residue number 18 because of missing atom number 6 atom name  CG\n",
      "\u001b[0mcore.pack.pack_missing_sidechains: \u001b[0mpacking residue number 214 because of missing atom number 6 atom name  CG\n",
      "\u001b[0mcore.pack.task: \u001b[0mPacker task: initialize from command line()\n",
      "\u001b[0mcore.scoring.ScoreFunctionFactory: \u001b[0mSCOREFUNCTION: \u001b[32mref2015\u001b[0m\n",
      "\u001b[0mcore.pack.pack_rotamers: \u001b[0mbuilt 43 rotamers at 2 positions.\n",
      "\u001b[0mcore.pack.interaction_graph.interaction_graph_factory: \u001b[0mInstantiating DensePDInteractionGraph\n"
     ]
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "from pyrosetta.toolbox import generate_resfile_from_pdb\n",
    "generate_resfile_from_pdb(\"inputs/1YY8.clean.pdb\", \"1YY8.resfile\")\n",
    "from pyrosetta.toolbox import generate_resfile_from_pose\n",
    "generate_resfile_from_pose(pose, \"1YY8.resfile\")\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inside the resfile you will see:\n",
    "\n",
    "```\n",
    "NATAA\n",
    "USE_INPUT_SC\n",
    "start\n",
    "```\n",
    "\n",
    "This means that all of the native amino acids will remain the same, but we will allow repacking to other rotamers. You can change \"NATAA\" to any of the phrases in the table below.\n",
    "\n",
    "Also, under \"start\", you can add exceptions for certain amino acids. Let's do an example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Name |  Definition  |\n",
    "|------|------|\n",
    "|  NATRO | use native amino acid and native rotamer (does not repack)|\n",
    "|  NATAA | use native amino acid but allow repacking to other rotamers|\n",
    "|  PIKAA ILV | use only the following amino acids (ILV) and allow repacking between them|\n",
    "|  ALLAA | use all amino acids and all repacking|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In your terminal:\n",
    "\n",
    "\n",
    "Edit the resfile to force residue 49 to be glutamic acid (`49 A PIKAA E`) and ensure all other residues cannot be redesigned (change `NATAA` to `NATRO`). Save the file as `1YY8-K49E.resfile`.\n",
    "\n",
    "Your resfile should look like this:\n",
    "```\n",
    "NATRO\n",
    "USE_INPUT_SC\n",
    "start\n",
    "49 A PIKAA E\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Back here in the notebook:\n",
    "\n",
    "Create a new task for design from the resfile:\n",
    "```\n",
    "from pyrosetta.rosetta.core.pack.task import TaskFactory, parse_resfile\n",
    "task_design = TaskFactory.create_packer_task(pose)\n",
    "parse_resfile(pose, task_design, \"1YY8-K49E.resfile\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-1754b479780ad2e4",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "### BEGIN SOLUTION\n",
    "from pyrosetta.rosetta.core.pack.task import TaskFactory, parse_resfile\n",
    "task_design = TaskFactory.create_packer_task(pose)\n",
    "parse_resfile(pose, task_design, \"expected_outputs/1YY8-K49E.resfile\")\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Score the original `start_pose` conformation from the 1YY8 pdb for reference. Create a new `PackResiduesMover` with the `task_design` and use it to mutate residue 49 to glutamic acid. Confirm you mutated the residue by printing residue 49.\n",
    "\n",
    "__Question:__ What is the predicted Δ*G* of the mutation? Is this a stabilizing mutation?\n",
    "\n",
    "```\n",
    "pose.assign(start_pose)\n",
    "pack_mover = PackRotamersMover(scorefxn, task_design)\n",
    "print(pose.residue(49))\n",
    "pack_mover.apply(pose)\n",
    "print(pose.residue(49))\n",
    "print(scorefxn(pose) - scorefxn(start_pose))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-9e50dd452de9681a",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Residue 49: LYS (LYS, K):\n",
      "Base: LYS\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA\n",
      " Variant types:\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    H    HA \n",
      " Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ \n",
      "Atom Coordinates:\n",
      "   N  : 32.097, 27.128, 7.923\n",
      "   CA : 31.663, 28.176, 7.007\n",
      "   C  : 30.939, 27.471, 5.878\n",
      "   O  : 31.191, 26.303, 5.597\n",
      "   CB : 32.852, 28.971, 6.449\n",
      "   CG : 33.743, 28.165, 5.512\n",
      "   CD : 35.058, 28.866, 5.167\n",
      "   CE : 35.795, 28.002, 4.134\n",
      "   NZ : 37.148, 28.45, 3.923\n",
      "   H  : 32.6902, 26.3872, 7.57742\n",
      "   HA : 31.0202, 28.8679, 7.55234\n",
      "  1HB : 32.4846, 29.8416, 5.90508\n",
      "  2HB : 33.4654, 29.335, 7.27346\n",
      "  1HG : 33.9859, 27.2076, 5.97453\n",
      "  2HG : 33.2119, 27.9736, 4.58014\n",
      "  1HD : 34.8472, 29.8571, 4.76287\n",
      "  2HD : 35.6568, 28.9812, 6.07042\n",
      "  1HE : 35.8169, 26.9678, 4.47438\n",
      "  2HE : 35.2623, 28.0373, 3.18377\n",
      "  1HZ : 37.5965, 27.8576, 3.23856\n",
      "  2HZ : 37.1393, 29.4035, 3.58859\n",
      "  3HZ : 37.6581, 28.4036, 4.79339\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n",
      "\u001b[0mcore.pack.pack_rotamers: \u001b[0mbuilt 6 rotamers at 1 positions.\n",
      "\u001b[0mcore.pack.interaction_graph.interaction_graph_factory: \u001b[0mInstantiating PDInteractionGraph\n",
      "Residue 49: GLU (GLU, E):\n",
      "Base: GLU\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS POLAR CHARGED NEGATIVE_CHARGE METALBINDING ALPHA_AA L_AA\n",
      " Variant types:\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    H    HA \n",
      " Side-chain atoms:  CB   CG   CD   OE1  OE2 1HB  2HB  1HG  2HG \n",
      "Atom Coordinates:\n",
      "   N  : 32.097, 27.128, 7.923\n",
      "   CA : 31.663, 28.176, 7.007\n",
      "   C  : 30.939, 27.471, 5.878\n",
      "   O  : 31.191, 26.303, 5.597\n",
      "   CB : 32.841, 28.9963, 6.4766\n",
      "   CG : 33.8415, 28.199, 5.65185\n",
      "   CD : 34.9945, 29.0322, 5.16552\n",
      "   OE1: 35.322, 29.9961, 5.8151\n",
      "   OE2: 35.5487, 28.7043, 4.1428\n",
      "   H  : 32.6902, 26.3872, 7.57742\n",
      "   HA : 31.0202, 28.8679, 7.55234\n",
      "  1HB : 32.4672, 29.8096, 5.85432\n",
      "  2HB : 33.3783, 29.4442, 7.31266\n",
      "  1HG : 34.2295, 27.3826, 6.26034\n",
      "  2HG : 33.3267, 27.7648, 4.79599\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n",
      "5.00381001783191\n"
     ]
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "pose.assign(start_pose)\n",
    "pack_mover = PackRotamersMover(scorefxn, task_design)\n",
    "print(pose.residue(49))\n",
    "pack_mover.apply(pose)\n",
    "print(pose.residue(49))\n",
    "print(scorefxn(pose) - scorefxn(start_pose))\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the residue reference energy term (`ref`) in the scoring function.\n",
    "\n",
    "__Question:__ What is this value before and after you mutated the residue? What does this energy represent?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In your terminal:\n",
    "\n",
    "Create a new resfile that allows residue 49 to be designed freely (`49 A ALLAA`) and call it `1YY8-K49All.resfile`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Back here in the notebook:\n",
    "\n",
    "Create a new `PackerTask` and `PackRotamersMover` and apply it.\n",
    "\n",
    "__Question:__ What residue does Rosetta choose? Why?\n",
    "\n",
    "```\n",
    "pose.assign(start_pose)\n",
    "task_design_all = TaskFactory.create_packer_task(pose)\n",
    "parse_resfile(pose, task_design_all, \"1YY8-K49All.resfile\")\n",
    "pack_mover_all = PackRotamersMover(scorefxn, task_design_all)\n",
    "pack_mover_all.apply(pose)\n",
    "print(pose.residue(49))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "cell-ecdd3a3881511698",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.pack.pack_rotamers: \u001b[0mbuilt 96 rotamers at 1 positions.\n",
      "\u001b[0mcore.pack.interaction_graph.interaction_graph_factory: \u001b[0mInstantiating PDInteractionGraph\n",
      "Residue 49: LYS (LYS, K):\n",
      "Base: LYS\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA POLAR CHARGED POSITIVE_CHARGE METALBINDING SIDECHAIN_AMINE ALPHA_AA L_AA\n",
      " Variant types:\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    H    HA \n",
      " Side-chain atoms:  CB   CG   CD   CE   NZ  1HB  2HB  1HG  2HG  1HD  2HD  1HE  2HE  1HZ  2HZ  3HZ \n",
      "Atom Coordinates:\n",
      "   N  : 32.097, 27.128, 7.923\n",
      "   CA : 31.663, 28.176, 7.007\n",
      "   C  : 30.939, 27.471, 5.878\n",
      "   O  : 31.191, 26.303, 5.597\n",
      "   CB : 32.8374, 29.0038, 6.48277\n",
      "   CG : 33.8145, 28.2273, 5.61012\n",
      "   CD : 34.9712, 29.1063, 5.15855\n",
      "   CE : 36.0034, 28.3064, 4.37754\n",
      "   NZ : 35.4575, 27.8019, 3.08838\n",
      "   H  : 32.6902, 26.3872, 7.57742\n",
      "   HA : 31.0202, 28.8679, 7.55234\n",
      "  1HB : 32.4591, 29.8421, 5.89706\n",
      "  2HB : 33.3952, 29.4171, 7.32334\n",
      "  1HG : 34.2106, 27.3804, 6.17181\n",
      "  2HG : 33.2943, 27.8464, 4.73181\n",
      "  1HD : 34.5922, 29.9104, 4.52621\n",
      "  2HD : 35.453, 29.5497, 6.02994\n",
      "  1HE : 36.869, 28.9339, 4.17067\n",
      "  2HE : 36.3331, 27.4565, 4.97492\n",
      "  1HZ : 36.171, 27.2776, 2.60211\n",
      "  2HZ : 34.6642, 27.2026, 3.26944\n",
      "  3HZ : 35.1659, 28.5822, 2.51737\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n"
     ]
    }
   ],
   "source": [
    "### BEGIN SOLUTION\n",
    "pose.assign(start_pose)\n",
    "task_design_all = TaskFactory.create_packer_task(pose)\n",
    "parse_resfile(pose, task_design_all, \"expected_outputs/1YY8-K49All.resfile\")\n",
    "pack_mover_all = PackRotamersMover(scorefxn, task_design_all)\n",
    "pack_mover_all.apply(pose)\n",
    "print(pose.residue(49))\n",
    "### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create your own resfile that will restrict residue 49 to only negatively charged residues using the resfile line `49 A PIKAA DE` and re-apply the design mover.\n",
    "\n",
    "__Question:__ Now what residue is chosen? What is the new residue energy, and why (physically) is it less favorable than the last design?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s try to make this design more favorable. Select several surrounding residues for design, and set them also to enable mutations to any residue. Call the design mover again.\n",
    "\n",
    "__Question:__ Now what do you find?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It should be noted that PyRosetta includes a handy toolbox method mutate_residue() that will change a specified residue in a given pose into another. However, the rotamer of this new residue will not be optimized. For example:\n",
    "\n",
    "```\n",
    "from pyrosetta.toolbox import mutate_residue\n",
    "pose.assign(start_pose)\n",
    "print(pose.residue(49))\n",
    "mutate_residue(pose, 49, 'E')\n",
    "print(pose.residue(49))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Programming Exercises\n",
    "\n",
    "\n",
    "- *Refinement and discrimination*. Download the “single misfold” decoy set from the Decoys ’R Us repository at dd.compbio.washington.edu/ddownload.cgi?misfold. (Documentation for this project is at dd.compbio.washington.edu.) This repository has a single “correct” and “incorrect” predicted structure for several proteins. For this exercise, analyze pdbs 2CI2 and 2CRO; each has two “incorrect” structures offered. (Technical note: These decoys have an empty occupancy field in the PDB *ATOM* records; a value of 1 needs to be added before Rosetta will load these structures.)\n",
    "\n",
    "    Write a program that will calculate and output the score for each decoy (i) as is from the PDB file, (ii) after packing only, (iii) after minimization only, and (iv) after packing and minimizing. For each of the four cases, compare the scores of the “correct” structure with those of the “incorrect” structure. Which schemes successfully discriminate the correct structures?\n",
    "\n",
    "\n",
    "- Write a refinement protocol that will iterate between side-chain packing, small and shear moves, and minimization. Where is the best place to position the Monte Carlo acceptance test? Test your protocol by making 10 independently-refined structures for the correct and incorrect decoys of 2CRO from the Decoys ’R Us single misfold set. Is this protocol able to discriminate the correct decoy? Submit your code.\n",
    "\n",
    "\n",
    "- HIV-1 protease is a major drug target for antiretroviral therapies. Protease inhibitors are designed from substrate peptide mimics. We will attempt to take a natural substrate peptide of HIV-1 protease and design it for improved binding — potentially to serve as a good template for drug design. Use PDB file 1KJG for the following analysis.\n",
    "    \n",
    "    \n",
    "    - Turn on side-chain packing for the protease active site (residues 8, 23, 25, 29, 30, 32, 45, 47, 50, 53, 82, and 84 of both chains A and B) and for the peptide (residues 2–9 on chain P; all of these numbers follow the PDB numbering).\n",
    "\n",
    "\n",
    "    - Repack the above side chains and then energy minimize those same side chains with the backbone fixed. Generate 10 decoys and record the energies for each decoy. This will represent the reference state: the wild-type peptide bound to the protease.\n",
    "\n",
    "\n",
    "    - For residue 2 of the peptide (chain P), allow repacking to any of the 20 amino acid residues, while leaving the packing and side-chain minimization the same as in step b. Generate 10 decoys and record the energies. These will represent single mutants at that residue position.\n",
    "\n",
    "\n",
    "    - Repeat step c for each of the other 8 residues in the substrate peptide.\n",
    "    \n",
    "    \n",
    "    - Take the lowest energy for each mutation position and compare it to the lowest energy for the wild type. Do single mutants at any of these positions improve the energy over the wild type? Which ones? By how much? Which energy components are mostly responsible?\n",
    "    \n",
    "\n",
    "    - Which peptide residue positions are easiest to improve? Which positions are the hardest?\n",
    "\n",
    "\n",
    "    - Are there any other trends? Hydrophobic vs. polar, bulky residues vs. small residues, etc.?\n",
    "\n",
    "\n",
    "    - Altman et al. (Proteins 2008) found, using their own computational design algorithm, that the most favorable sequences were a triple mutant E3D/T4I/V6L, a single mutant T4V, and a single mutant E3Q. How do their results compare with yours?\n",
    "\n",
    "\n",
    "    - Natural substrates are often sub-optimal binders. Why would this be advantageous?\n",
    "\n",
    "\n",
    "- Effect of backbone conformation on design. HIV-1 protease is promiscuous, meaning it can cleave a wide range of peptides beyond the ten natural substrates of the virus. Let’s examine the preferences of the enzyme through Rosetta design calculations.\n",
    "\n",
    "    - Download HIV-1 protease in complex with CA-P2 peptide (1F7A). Select the eight peptide residues for unrestricted design and let Rosetta redesign the substrate sequence. What is the new sequence and how does it compare to the original? What percent of the original sequence was optimal for its structure?\n",
    "\n",
    "\n",
    "    - Download HIV-1 protease in complex with RT-RH peptide (1KJG). (Note that the enzyme is the same here, but it is crystallized with a different substrate.) Again, design the eight substrate residues with Rosetta. What percent of this substrate sequence is optimal for this crystal structure? ____%\n",
    "\n",
    "\n",
    "    - How do the designed sequences of (a) and (b) compare? Why should they be the same? Why would they not be the same? What are the implications for the field of computational protein design?\n",
    "\n",
    "\n",
    "- Write a program which iterates between design of all residues of a protein and refinement via small, shear, and minimization moves.\n",
    "\n",
    "\n",
    "## Thought Question\n",
    "\n",
    "What is the thermodynamic meaning of the ref energy term, and what does it correspond to physically?\n",
    "During evolution, the genome sequence may mutate to cause protein sequence changes. Alternately, one could consider the difference in evolutionary propensities for each residue type. How could you derive reference energies from sequence data, and what would that mean? \n",
    "\n",
    "\n",
    "How do Kuhlman & Baker fit the reference energies in their 2000 PNAS paper?\n",
    "\n",
    "\n",
    "## References\n",
    "\n",
    "\n",
    "- S. C. Lovell et al., “The penultimate rotamer library,” Proteins 40, 389-408 (2000).\n",
    "\n",
    "\n",
    "- R. L. Dunbrack & F. E. Cohen, “Bayesian statistical analysis of protein side-chain rotamer preferences,” Protein Sci. 6, 1661-1681 (1997)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Side-Chain Packing](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Side-chain-packing.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Docking](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/07.00-Docking.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Create Assignment",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "284.444px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}