{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "COLLABORATORS = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NOTEBOOK_HEADER-->\n",
    "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n",
    "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [RosettaCarbohydrates: Trees, Selectors and Movers](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RNA in PyRosetta](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/14.00-RNA-Basics.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.02-Glycan-Modeling-and-Design.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RosettaCarbohydrates: Modeling and Design\n",
    "Keywords: carbohydrate, glycan, glucose, mannose, sugar, design, prediction\n",
    "\n",
    "## Overview\n",
    "Here, you will learn how to model glycans and design optimal glycosylation positions in a protein.\n",
    "\n",
    "We will be using the RosettaCarbohydrate framework to build and model glycans.  The `GlycanModeler`, which is our main method for modeling glycans, will be published in 2020.  We will be using some custom glycan options to load pdbs. \n",
    "First, one needs the `-include_sugars` option, which will tell Rosetta to load sugars and add the sugar_bb energy term to a default scorefunction.  This scoreterm is like rama for the sugar dihedrals which connect each sugar residue. \n",
    "\n",
    "\t\t-include_sugars\n",
    "\n",
    "\n",
    "When loading structures from the PDB that include glycans, we use these options. This includes an option to write out the structures in pdb format instead of the Rosetta format (which is actually better).  Again, this is included in the config/flags files you will be using.\n",
    "\n",
    "\t\t-maintain_links\n",
    "\t\t-auto_detect_glycan_connections\n",
    "\t\t-alternate_3_letter_codes pdb_sugar\n",
    "\t\t-write_glycan_pdb_codes\n",
    "\n",
    "\n",
    "More information on working with glycans can be found at this page: [Working With Glycans](https://www.rosettacommons.org/docs/latest/application_documentation/carbohydrates/WorkingWithGlycans)\n",
    "\n",
    "## Algorithm\n",
    "  \n",
    "The `GlycanModeler` essentially builds glycans from the root (The first residue of the Tree) out to the trees in a way that simulates a tree growing.  It uses a notion of a 'layer' where the layer is defined as the number of residues to the glycan root (with the glycan root being layer 0).  Within modeling, all glycan residues other than the ones being optimized are 'virtualized'.  In Rosetta, the term 'Virtual' means that these residues are present, but not scored.  (It should be noted that it is now possible to turn any residues Virtual and back to Real using two movers: `ConvertVirtualToRealMover` and `ConvertRealToVirtualMover`. )\n",
    "\n",
    "Within the modeling application, sampling of glycan DOFs is done through the `GlycanSampler`.  The sampler attempts to sample the large amount of DOFs available to a glycan tree.  The GlycanSampler is a `WeightedRandomSampler`, which is a container of highly specific sampling strategies, where each strategy is weighted by a particular probability.  At each apply, the mover selects one of these samplers using the probability set to it. This is the same way the SnugDock algorithm for antibody modeling works. \n",
    "\n",
    "Sampling is always scaled with the number of glycan residues that you are modeling, so run-time will increase proportionally as well. \n",
    "If you are modeling a huge viral particle with lots of glycans, one can use quench mode, which will optimize each glycan individually. \n",
    "Tpyically for these cases, multiple rounds of glycan modeling is desired. \n",
    "\n",
    "\n",
    "### GlycanSampler Major components\n",
    "\n",
    "Some of these components were covered in the previous tutorial.\n",
    "\n",
    "1. __Glycan Conformers__\n",
    "\n",
    "\tThese conformers have been generated through an in-depth bioinformatic analysis of the PDB using adaptive kernal density estimates and are unique for each linkage type including glycan residues connected to ASN residues.  A conformer is a specific conformation of all of the backbone dihedrals of a particular glycan linkage. Essentialy glycan 'fragments' for a particular type of linkage.\n",
    "\n",
    "\n",
    "2. __SugarBB Sampling__ \n",
    "\n",
    "\tThis sampling is done through turning the `sugar_bb` energy term into a set of probabilities using the -log(e) function.  This allows us to sample on the QM derived torsonal potentials during modeling. \n",
    "\n",
    "\n",
    "3. __Random Sampling and Shear Moves__\n",
    "\n",
    "\tWe sample random torsions at +/- 15 , +/- 45, +/- 90 degrees, each at decreasing probabilities at a 4:2:1 ratio of sampling Small,Medium,Large. \n",
    "\tShear sampling is done where torsions are set for two residues in order to reduce downsteam effects and allow 'flipping' of the glycan torsions.\n",
    "\n",
    "\n",
    "4. __Minimization__\n",
    "\t\n",
    "\tWe Minimize Sugar residues by randomly selecting a residue from what is set to model, and selecting all residues out to the tree that are not virtualized. This reduces computational time that would otherwise restrict the total number of glycan residues we could model at once.\n",
    "    \n",
    "\n",
    "5. __Packing__\n",
    "\n",
    "\tOf the residues set to optimize, we chooses a random residue and pack that residue and all residues out to the tree that are not virtualized. We pack the sugar residues (OH and constituents) and any neighboring protein sidechains. TaskOperations may be set to allow design of protein residues during this.  We do packing this way to once again reduce total computational time.\n",
    "\n"
   ]
  },
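  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 4:2:1 random torsion sampling from component 3 can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the `GlycanSampler` code: it assumes each move is exactly plus or minus the step size (the real sampler may instead draw from within that range), and `perturb_torsion` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Step sizes (degrees) and their 4:2:1 relative weights, as described above.\n",
    "STEP_SIZES = {'small': 15.0, 'medium': 45.0, 'large': 90.0}\n",
    "STEP_WEIGHTS = {'small': 4, 'medium': 2, 'large': 1}\n",
    "\n",
    "def perturb_torsion(angle, rng):\n",
    "    # Pick a step size by weight, then a random direction.\n",
    "    # Assumption: the move is exactly +/- the chosen step size.\n",
    "    kinds = list(STEP_SIZES)\n",
    "    kind = rng.choices(kinds, weights=[STEP_WEIGHTS[k] for k in kinds], k=1)[0]\n",
    "    delta = rng.choice([-1.0, 1.0]) * STEP_SIZES[kind]\n",
    "    # Wrap the result into [-180, 180).\n",
    "    return (angle + delta + 180.0) % 360.0 - 180.0\n",
    "\n",
    "rng = random.Random(1)\n",
    "print([perturb_torsion(170.0, rng) for _ in range(5)])\n",
    "```\n",
    "\n",
    "Small moves are proposed four times as often as large ones, so most steps refine the current torsion while occasional large steps allow escapes."
   ]
  },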
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Notebook setup\n",
    "import sys\n",
    "if 'google.colab' in sys.modules:\n",
    "    !pip install pyrosettacolabsetup\n",
    "    import pyrosettacolabsetup\n",
    "    pyrosettacolabsetup.setup()\n",
    "    print (\"Notebook is set for PyRosetta use in Colab.  Have fun!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Make sure you are in the directory with the pdb files:**\n",
    "\n",
    "`cd google_drive/My\\ Drive/student-notebooks/`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# General Setup and Inputs\n",
    "\n",
    "You will be using a few different inputs.  We will be designing in glycosylation spots in order to block antibody binding at a highly curved epitope, and we will be loading a human structure from the PDB that has internal glycans.   \n",
    "\n",
    "\n",
    "## Notes for Tutorial Shortening\n",
    "\n",
    "\n",
    "Typically, the value of `-glycan_sampler_rounds` is set to 25 (which typically is enough) and nstruct is about 5-10k per input structure. You may increase glycan_sampler_rounds to 100 and then decrease output to 1-2500 nstruct in order to have the same level of sampling, which will result in very good models as well.  Since this is denovo modeling of glycans, more nstruct is almost always better. For some tutorials, we may decrease this value below our optimal value in order to shorten the length of the tutorial.\n",
    "\n",
    "\n",
    "## General Notes\n",
    "\n",
    "We will use a flags file for all common options in this tutorial.  Note that instead of passing this flag on init, you can instead put it into your working directory or a particular place in your home directory and rename it common. \n",
    "    \n",
    "See this page for more info on using rosetta with custom config files: <https://www.rosettacommons.org/docs/latest/rosetta_basics/running-rosetta-with-options#common-options-and-default-user-configuration>\n",
    "\n",
    "All tutorials have generated output in output_files and their approximate time to finish on a single (core i7) processor.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Python\n",
    "from pyrosetta import *\n",
    "from pyrosetta.rosetta import *\n",
    "from pyrosetta.teaching import *\n",
    "init('@inputs/glycans/common_glycans @inputs/glycans/pdb_flags @inputs/glycans/map_flags')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial\n",
    "\n",
    "GlycanModeling is done through the RosettaScripts interface.  Each tutorial has you copying a base XML and adding/modifying specific components to achieve a goal.  ALL of these movers are available as components in PyRosetta - however, setup is much more difficult and time consuming.  So for now, we will rely on the RS interface, but \n",
    "\n",
    "## Tutorial A: Epitope Blocking, De-novo Glycan Modeling\n",
    "\n",
    "Here, we will start with the antigen known as Bee Hyaluronidase, from PDB ID 2J88.  The PDB file has an antibody bound to it as a HIGHLY immunogenic site. We would like to block this in order to begin to use this enzyme for therapy as Hyaluronidase can be effective in breaking down sugars in the extracellular matrix, allowing certain larger drugs to get to regions of interest.  The antibody is renumbered into the AHo numbering scheme that we used in the RAbD tutorial, and it has been relaxed with constraints into the Rosetta energy function. \n",
    "\n",
    "We will be designing in at least one optimal glycan at the most immunogenic site.\n",
    "Note that a prototocol called SugarCoat is in development that will scan regions of interest for potential ideal glycosylation, however, one can certainly do this manually as we do below. \n",
    "\n",
    "### A 1. Designing in a Glycosylation Site: \n",
    "\n",
    "CreateGlycanSequonMover` and `CreateSequenceMotifMover`\n",
    "\n",
    "A sugar glycosylation site is known as a `Sequon`.  The glycan sequon is made up of three protein residues which are recognized by the GlycosylTransferase Enzyme during translation in the ER.  This enzyme adds the root of nascent glycan onto a protein.  In this case, we use the sequon for ASN glycosylation.  The sequon is as follows: `N[^P][S/T]`.  The `[^P]` notation means that any residue other than P can be there.  The `[S/T]` notation means that either S or T is recognized.  This notation can be used to directly create Motifs in proteins using the `CreateSequenceMotifMover` and associated `SequenceMotifTaskOperation`. Documentation for these is available here:\n",
    "\n",
    "<https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/xsd/mover_CreateSequenceMotifMover_type>\n",
    "\t<https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/xsd/to_SequenceMotifTaskOperation_type>\n",
    "\n",
    "The create GlycanSequonMover can also be used for glycosylation of different AA than ASN.\n",
    "\n",
    "#### A1.1 Design using a typical sequon\n",
    "\n",
    "Before we begin, take a look at the complex either by PyMol or use the PyMolMover as you have previously. \n",
    "The file is `inputs/glycans/2j88_complex.pdb`\n",
    "        \n",
    "Where can we introduce a glycan to block binding?\n",
    "        \n",
    "        \n",
    "Where do you think the optimal glycan position would be for this particular antibody?  Take a look at the xml.  Is this the position we are targeting?  Typically, we may want to allow some backbone movement in our sequon.  The full glycan scanning protocol can be found in an input file, simple_glycan_scanner_manual.xml, where we relax the motif residues with constraints, add the sequon, and then relax again, comparing the energy between them to get the full energetic contributions of the sequon on the structure.  In order to reduce the run time in these tutorials, we will be removing this going forward.\n",
    "\n",
    "The XML syntax is below: \n",
    "            \n",
    "     <CreateGlycanSequeonMover name=\"motif_creator\" residue_selector=\"select\"/>\n",
    "            \n",
    "Go ahead and run the xml (`inputs/glycans/tutA11.xml`) using what you have learned previously, or run the code below (about 15 seconds)\n",
    "\n",
    "Note that the xml uses `SimpleMetrics` to output a variety of metrics that are in pose.scores or output into a scorefile.\n",
    "\n",
    "Here is the full XML.  We will only use part of it in code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```xml\n",
    "<ROSETTASCRIPTS>\n",
    "    <SCOREFXNS>\n",
    "    </SCOREFXNS>\n",
    "    <RESIDUE_SELECTORS>\n",
    "        <Index name=\"select\" resnums=\"%%start%%\" />\n",
    "        <Index name=\"motif\" resnums=\"%%start%%-%%end%%\" />\n",
    "        <Neighborhood name=\"nbrhood\" selector=\"motif\"/>\n",
    "        <Not name=\"others\" selector=\"nbrhood\" />\n",
    "    </RESIDUE_SELECTORS>\n",
    "    <SIMPLE_METRICS>\n",
    "        <TimingProfileMetric name=\"timing\"/>\n",
    "        <TotalEnergyMetric name=\"total_energy_delta\" use_native=\"1\" custom_type=\"native_delta\"/>\n",
    "        <TotalEnergyMetric name=\"total_energy\" use_native=\"0\"/>\n",
    "        <CompositeEnergyMetric name=\"composite_energy\" use_native=\"1\"/>\n",
    "        <SelectedResiduesMetric name=\"selection\" residue_selector=\"select\"/>\n",
    "\t\t<SelectedResiduesMetric name=\"rosetta_sele\" residue_selector=\"select\" rosetta_numbering=\"1\"/>\n",
    "\t\t<SelectedResiduesPyMOLMetric name=\"pymol_selection\" residue_selector=\"select\" />\n",
    "        <SelectedResiduesPyMOLMetric name=\"region\" residue_selector=\"nbrhood\" />\n",
    "        <SequenceMetric name=\"sequence\" residue_selector=\"motif\" />\n",
    "        <SasaMetric name=\"sasa\" residue_selector=\"select\" />\n",
    "        <RMSDMetric name=\"rmsd\" use_native=\"1\" rmsd_type=\"rmsd_protein_bb_heavy\"/>\n",
    "    </SIMPLE_METRICS>\n",
    "    <MOVERS>\n",
    "        <CreateGlycanSequonMover name=\"motif_creator\" residue_selector=\"select\" pack_neighbors=\"1\"/>\n",
    "        <RunSimpleMetrics name=\"selections\" metrics=\"rosetta_sele,pymol_selection,sequence\" />\n",
    "        <RunSimpleMetrics name=\"pre_metrics\" metrics=\"sasa,total_energy\" prefix=\"pre_\" />\n",
    "        <RunSimpleMetrics name=\"post_sequon_metrics\" metrics=\"total_energy_delta,sequence,sasa,timing,rmsd\" prefix=\"post-sequon_\" />\n",
    "    </MOVERS>\n",
    "    <PROTOCOLS>\n",
    "        <Add mover=\"selections\" />\n",
    "        <Add mover=\"pre_metrics\" />\n",
    "        <Add mover=\"motif_creator\"/>\n",
    "        <Add mover=\"post_sequon_metrics\"/>\n",
    "    </PROTOCOLS>\n",
    "</ROSETTASCRIPTS>\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rosetta.protocols.carbohydrates import *\n",
    "from rosetta.core.select.residue_selector import *\n",
    "from rosetta.core.simple_metrics.metrics import *\n",
    "from rosetta.core.simple_metrics.composite_metrics import *\n",
    "from rosetta.core.simple_metrics.per_residue_metrics import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pose_complex = pose_from_pdb(\"inputs/glycans/2j88_complex.pdb\")\n",
    "pose_complex_original = pose_complex.clone()\n",
    "\n",
    "pose_antigen = pose_from_pdb(\"inputs/glycans/2j88_antigen.pdb\")\n",
    "pose_antigen_original = pose_antigen.clone()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start = pose_antigen.pdb_info().pdb2pose(\"A\", 143)\n",
    "end = pose_antigen.pdb_info().pdb2pose(\"A\", 145)\n",
    "select = ResidueIndexSelector()\n",
    "select.set_index(start)\n",
    "\n",
    "motif = ResidueIndexSelector()\n",
    "motif.set_index_range(start, end)\n",
    "\n",
    "sequon_creator = CreateGlycanSequonMover(select)\n",
    "sequon_creator.apply(pose_antigen)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "seq_metric = SequenceMetric(motif)\n",
    "print(\"original\", seq_metric.calculate(pose_antigen_original))\n",
    "print(\"designed\", seq_metric.calculate(pose_antigen))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Did we design in the sequon?  (`N[^P][S/T]`)\n",
    "\n",
    "What do our scores look like?\n",
    "Note we are packing neighbors here, so our score does not totally say that we actually reduced the score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "score = get_score_function()\n",
    "print(\"original\", score.score(pose_antigen_original))\n",
    "print(\"designed\", score.score(pose_antigen))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A1.2 Design using the `N[^P][T]` motif\n",
    "\n",
    "\n",
    "This motif has been shown to have higher occupancy of the glycosation site with glycans in the resulting protein.  Glycosylation is not 100% in some cases at some positions for (currently) unknown reasons, but it has been shown that a THR at the +2 site results in higher occupancy. If we were creating a drug, we can use chromatography during protein isolation to choose peaks which include our glycan. \n",
    "\n",
    "Here, we are using the [-] notation as to not actually design the second position.  We will use what is in the native protein here. Again, you may run the XML (`inputs/glycans/tutA12.xml`) within PyRosetta replacing start and end with: `start=143A end=145A`, or run the code below. \n",
    "\n",
    "The key XML line is here:\n",
    "    <CreateSequenceMotifMover name=\"create_sequon\" residue_selector=\"p1\" motif=\"N[-]T\"/>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rosetta.protocols.calc_taskop_movers import *\n",
    "\n",
    "general_motif_creator = CreateSequenceMotifMover(select)\n",
    "general_motif_creator.set_motif(\"N[-]T\")\n",
    "pose_antigen_d1 = pose_antigen.clone()\n",
    "\n",
    "general_motif_creator.apply(pose_antigen)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Was the sequon successfully designed?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"original\", seq_metric.calculate(pose_antigen_original))\n",
    "print(\"designed1\", seq_metric.calculate(pose_antigen_d1))\n",
    "print(\"designed2\", seq_metric.calculate(pose_antigen))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"original\", score.score(pose_antigen_original))\n",
    "print(\"designed1\", score.score(pose_antigen_d1))\n",
    "print(\"designed2\", score.score(pose_antigen))  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A2. Adding a man5 glycan: \n",
    "\n",
    "__SimpleGlycosylateMover__\n",
    "\n",
    "Now, we will expand on our first tutorial by glycosylating afterward. We will use the common name for a man5 sugar, which is a high-mannose branching glycan of 7 sugar residues (and 5 mannoses).  You can use a few common names to make glycosylation easier, or an IUPAC string, or a file that has the IUPAC string in the first name of the file. Common names include man5,man7,man9 and a few others.  You can find these in\n",
    "\n",
    "\tdatabase/chemical/carbohydrates/common_glycans\n",
    "\n",
    "The IUPAC nomenclature of the man5 is as follows:  \n",
    "\n",
    "\n",
    "\t\ta-D-Manp-(1->3)-[a-D-Manp-(1->3)-[a-D-Manp-(1->6)]-a-D-Manp-(1->6)]\n",
    "\t\t                                          -b-D-Manp-(1->4)-b-D-GlcpNAc-(1->4)-b-D-GlcpNAc-\n",
    "\n",
    "\n",
    "\n",
    "More information on IUPAC nomenclature of sugar trees is here: <http://www.chem.qmul.ac.uk/iupac/2carb>.  There is also a very detailed README in the common glycan directory for your reference.\n",
    "\n",
    "Note that within the `SimpleGlycosylateMover` you may also give multiple glycans using the `glycans` option, which will randomly choose a glycan tree to use for glycosylation from the list given.  Glycosylation is not deterministic in that you always get a man5 at a particular position and is influenced by a great deal of structural biology that is not yet fully determined.  For now, since we are aiming to create a drug and purifying our result, using a man5 is sufficient. Once again, either run the xml (inputs/glycans/tutA2.xml) as before or run the code below. This takes about 15 seconds. \n",
    "\n",
    "XML syntax is as follows:\n",
    "\n",
    "\t<SimpleGlycosylateMover name=\"glycosylate\" residue_selector=\"select\" glycan=\"man5\" />\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "glycosylate = SimpleGlycosylateMover()\n",
    "glycosylate.set_glycosylation(\"man5\")\n",
    "glycosylate.set_residue_selector(select)\n",
    "glycosylate.apply(pose_antigen)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the structure using the PyMol Mover or output it.  What does the structure look like? The SimpleGlycosylateMover simply adds in glycans at ideal values for rings and backbones. \n",
    "Lets optimize this and model some glycans!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A3. Modeling glycans\n",
    "\n",
    "`GlycanResidueSelector` and the `GlycanModeler`\n",
    "\n",
    "We will run the previous tutorials in a single rosetta script where we end with modeling the glycan residues.  We use a very short run time and nstruct, so results will not be as clean as they would otherwise, but this should give you an idea of how all this works. Typically, we would model different positions of potential glycosylations, but here to save time, we will simply continue to build and model the glycan position we started with. Output files have been provided for you if you wish to use these. We will not be giving the mover a residue selector as it uses all glycans by default, but you can use the `GlycanResidueSelector` to choose specific trees or even glycan residues within those trees to model as we have seen in the previous tutorial.\n",
    "\n",
    "This takes about 380 seconds to run.  You can either use the XML (`inputs/glycans/tutA3.xml`) or the code below. Output 10 structures.\n",
    "\n",
    "XML Syntax:\n",
    "\n",
    "\t<GlycanModeler name=\"model\" layer_size=\"2\" window_size=\"1\" rounds=\"1\" refine=\"false\" />\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Here, we setup some defaults that will be set in master shortly. \n",
    "# Note that the name of the GTM will change to simply GlycanModeler within the next week or two.\n",
    "\n",
    "modeler = GlycanTreeModeler()\n",
    "modeler.set_hybrid_protocol(True)\n",
    "modeler.set_use_shear(True)\n",
    "modeler.set_use_gaussian_sampling(True)\n",
    "modeler.apply(pose_antigen)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run this code for a total of 10 nstruct.  Compare energies and take a look at the lowest one. Scores and pdbs are also available in `expected_outputs/glycans` with prefix `tutA3`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How does it look?  Load the native (complexed structure) into pymol as well.  Would this glycan block this particular antibody? Where else could we place a glycan?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tutorial B: Using Glycan Density\n",
    "\n",
    "In this tutorial we will load a pdb directly into Rosetta with sugars already present. The config for this has been provided for you.  We will exclusively be using the XML interface to PyRosetta for these tutorials, however, using the knowledge gained in the Density and Symmetry tutorials, all of this is available code-wise as well, but the XML interface greatly simplifies this task.\n",
    "\n",
    "\n",
    "The glycan tree that we will be working with is 5 residues long.  I use coot to look at density maps. \n",
    "Density maps were generated by downloading the cif file from the PDB and using PHENIX maps and default `maps.params`. This command was used to generate them.  This is covered in the Working-With-Density tutorial:\n",
    "\n",
    "\t\tphenix.maps inputs/glycans/4do4.pdb inputs/glycans/4do4-sf.cif \n",
    "\n",
    "The density map generated is too large to be distributed with the rest of the tuorial, so I have uploaded it to Google Drive for you to download. <https://drive.google.com/open?id=1h569jpwLxyHu7iHLG8eu2Q9_B-Q9e_C9> Please download and place it in your working directory, if it's not already there.\n",
    "\n",
    "The tutorial will use a density map that is small enough to fit into the repo, but you are welcome to use 4d04 instead with the downloaded map.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### B1. Calculating Density Fit\n",
    "\n",
    "\n",
    "Although a structure may be solved with high resolution, not all solved residues may fit the density well.  A structure from the PDB is still a model afterall, informed through experimentation.  This is especially true of glycan residues, which are fairly mobile.  Crystal contacts of neighboring proteins help to reduce the movemment of glycans and may help to induce a state that can be solved more easily given high-resolution density.  In this tutorial, we will be using Rosetta to determine how well a residue fits into the given density.  There are methods to do this in the coot program, but we want to be able to do this for any structure in a streamlined way - especially if we need to calculate RMSDs on only well-fitting glycan residues.  The methods we will be employing in Rosetta are based on Frank Dimaio's work with Rosetta density.  \n",
    "\n",
    "To do this, we will once again be employing the SimpleMetric system.  In this case, we use the `PerResidueDensityFitMetric`, which is a PerResidueRealMetric.  This type of SimpleMetric calculates a particular value for each residue given a residue selector.  Very useful here. \n",
    "\n",
    "We will also be employing the DensityFitResidueSelector, which uses the metric.  Since this is a fairly slow metric, we will use in-built functionality for using our calculated values from the metric, which are stored in the pose.  We will then use the SelectedResidueCountMetric to determine how many residues have great fit.  In later tutorials, we will be using the RMSDMetric with this selector in order to calculate RMSD on well-fitting glycan residues. \n",
    "\n",
    "\tResidues higher than .8 are great fit to density.\n",
    "\tResidues between .6 - .8 are good fit to density\n",
    "\tResidues below .4 fit to density are BAD fits\n",
    "\n",
    "\n",
    "The XML is `inputs/glycans/tutB1.xml`\n",
    "\n",
    "The key XML lines are:\n",
    "\n",
    "\n",
    "\t<PerResidueDensityFitMetric name=\"fit_native\" residue_selector=\"tree\" output_as_pdb_nums=\"1\" \n",
    "\t\t\t                                                     sliding_window_size=\"1\" match_res=\"1\"/>\n",
    "\t<DensityFitResidueSelector name=\"fits8\" den_fit_metric=\"fit_native\" cutoff=\".8\" use_cache=\"1\" \n",
    "\t\t\t                                                                  fail_on_missing_cache=\"1\"/>\n",
    "\t<SelectedResidueCountMetric name=\"n_fits8\" custom_type=\"fit8\" residue_selector=\"fits8\"/>\n",
    "\n",
    "Run the xml and while it is running, take a look at the XML (runtime is about 80 seconds).  It is fairly complicated and we will be building on it during the rest of these tutorials.  Note that we first define the density metric, and then we use it within the selector.  At the bottom, we add these to our set of native_metrics.  What other metrics are we using?\n"
   ]
  },
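  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cutoffs above can be read as a simple classification rule. The sketch below is plain Python and independent of Rosetta; the fit values and residue labels are made up, `classify_density_fit` is a hypothetical helper, and the handling of the 0.4-0.6 band and of exact boundary values is an assumption, since the text above does not specify them.\n",
    "\n",
    "```python\n",
    "def classify_density_fit(fit):\n",
    "    # Thresholds follow the tutorial text. The 0.4-0.6 band is not named there,\n",
    "    # and the boundary handling is an assumption made for this sketch.\n",
    "    if fit > 0.8:\n",
    "        return 'great'\n",
    "    if fit >= 0.6:\n",
    "        return 'good'\n",
    "    if fit < 0.4:\n",
    "        return 'bad'\n",
    "    return 'unclassified'\n",
    "\n",
    "# Hypothetical per-residue fit values keyed by PDB residue label.\n",
    "fits = {'201A': 0.91, '202A': 0.72, '203A': 0.55, '204A': 0.31}\n",
    "for label, fit in fits.items():\n",
    "    print(label, fit, classify_density_fit(fit))\n",
    "\n",
    "# Residues at or above a cutoff, similar in spirit to what a density-fit selector returns.\n",
    "well_fitting = [label for label, fit in fits.items() if fit >= 0.6]\n",
    "print(well_fitting)\n",
    "```\n",
    "\n",
    "Restricting later RMSD calculations to the well-fitting residues keeps poorly resolved sugars from dominating the comparison."
   ]
  },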
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "branch=\"200A\"\n",
    "map_file=\"inputs/1jnd_2mFo-DFc_map.ccp4\"\n",
    "symmdef=\"inputs/1jnd_crys.symm\"\n",
    "\n",
    "xml = f'''\n",
    "<ROSETTASCRIPTS>\n",
    "    <SCOREFXNS>\n",
    "    </SCOREFXNS>\n",
    "\n",
    "    NEEDED FOR CACHING density fit info\n",
    "    <RESIDUE_SELECTORS>\n",
    "        <Glycan name=\"tree\" branch=\"{branch}\" include_root=\"0\" />\n",
    "    </RESIDUE_SELECTORS>\n",
    "    <SIMPLE_METRICS>\n",
    "        <PerResidueDensityFitMetric name=\"fit_native\" residue_selector=\"tree\" output_as_pdb_nums=\"1\" sliding_window_size=\"1\" match_res=\"1\"/>\n",
    "    </SIMPLE_METRICS>\n",
    "    END Density Fit Setup\n",
    "\n",
    "    <RESIDUE_SELECTORS>\n",
    "        \n",
    "        <Index name=\"root\" resnums=\"{branch}\" />\n",
    "        <GlycanLayerSelector name=\"first_layer\" start=\"0\" end=\"1\"/>\n",
    "        <And name=\"layer01\" selectors=\"tree,first_layer\" />\n",
    "        <DensityFitResidueSelector name=\"fits8\" den_fit_metric=\"fit_native\" cutoff=\".8\" use_cache=\"1\" fail_on_missing_cache=\"1\" prefix=\"native_\"/>\n",
    "        <DensityFitResidueSelector name=\"fits6\" den_fit_metric=\"fit_native\" cutoff=\".6\" use_cache=\"1\" fail_on_missing_cache=\"1\" prefix=\"native_\"/>\n",
    "        <DensityFitResidueSelector name=\"fits4\" den_fit_metric=\"fit_native\" cutoff=\".4\" use_cache=\"1\" fail_on_missing_cache=\"1\" prefix=\"native_\"/>\n",
    "    </RESIDUE_SELECTORS>\n",
    "\n",
    "    <SIMPLE_METRICS>\n",
    "\n",
    "        <TimingProfileMetric name=\"timing\" />\n",
    "\n",
    "        <SelectedResidueCountMetric name=\"n_tree\" custom_type=\"tree_size\" residue_selector=\"tree\"/>\n",
    "        <SelectedResidueCountMetric name=\"n_fits8\" custom_type=\"fit8\" residue_selector=\"fits8\"/>\n",
    "        <SelectedResidueCountMetric name=\"n_fits6\" custom_type=\"fit6\" residue_selector=\"fits6\"/>\n",
    "        <SelectedResidueCountMetric name=\"n_fits4\" custom_type=\"fit4\" residue_selector=\"fits4\"/>\n",
    "        <SelectedResidueCountMetric name=\"n_layer01\" custom_type=\"layer01\" residue_selector=\"layer01\"/>\n",
    "\n",
    "        <PerResidueGlycanLayerMetric   name=\"layers\"         residue_selector=\"tree\" output_as_pdb_nums=\"1\"/>\n",
    "        <SelectedResiduesPyMOLMetric   name=\"pymol_tree\"   residue_selector=\"tree\"   custom_type=\"glycans\"/>\n",
    "        <SelectedResiduesPyMOLMetric   name=\"pymol_branch\" residue_selector=\"root\"   custom_type=\"branch\"/>\n",
    "        <SelectedResiduesMetric        name=\"pdb_glycans\"  residue_selector=\"tree\"   rosetta_numbering=\"0\" custom_type=\"glycans\"/>\n",
    "        <SelectedResiduesMetric        name=\"pdb_branch\"   residue_selector=\"root\"   rosetta_numbering=\"0\" custom_type=\"branch\"/>\n",
    "\n",
    "    </SIMPLE_METRICS>\n",
    "    <MOVERS>\n",
    "        <SetupForSymmetry name=\"setup_symm\" definition=\"{symmdef}\"/>\n",
    "        <LoadDensityMap name=\"loaddens\" mapfile=\"{map_file}\"/>\n",
    "        <SetupForDensityScoring name=\"setupdens\"/>\n",
    "        <RunSimpleMetrics name=\"native_metrics\" metrics=\"fit_native\" prefix=\"native_\" />\n",
    "        <RunSimpleMetrics name=\"selections\" metrics=\"layers,pymol_tree,pymol_branch,pdb_glycans,pdb_branch\" />\n",
    "        <RunSimpleMetrics name=\"counts\"     metrics=\"n_tree,n_layer01,n_fits6,n_fits8\"/>\n",
    "        <RunSimpleMetrics name=\"timings\"    metrics=\"timing\" /> \n",
    "\n",
    "    </MOVERS>\n",
    "    <PROTOCOLS>\n",
    "        <Add mover_name=\"loaddens\"/>\n",
    "        <Add mover_name=\"setupdens\"/>\n",
    "        <Add mover_name=\"selections\"/>\n",
    "        <Add mover_name=\"native_metrics\" />\n",
    "        <Add mover_name=\"counts\"/>\n",
    "        <Add mover_name=\"timings\"/>\n",
    "    </PROTOCOLS>\n",
    "</ROSETTASCRIPTS>\n",
    "'''"
   ]
  },
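  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To help answer the question above, the next cell is a minimal sketch that lists everything declared in the `SIMPLE_METRICS`, `RESIDUE_SELECTORS`, and `MOVERS` blocks of the `xml` string.  It uses only Python's standard `xml.etree.ElementTree` module and assumes the `xml` variable from the previous cell is defined.  The helper name `list_declared` is ours and is not part of PyRosetta.  The free-text notes between tags are treated as ordinary text and ignored.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import xml.etree.ElementTree as ET\n",
    "\n",
    "\n",
    "def list_declared(xml_text, section):\n",
    "    \"\"\"Return (tag, name) pairs for the elements declared in every `section` block.\"\"\"\n",
    "    root = ET.fromstring(xml_text.strip())\n",
    "    return [\n",
    "        (child.tag, child.get(\"name\"))\n",
    "        for block in root.iter(section)\n",
    "        for child in block\n",
    "    ]\n",
    "\n",
    "\n",
    "# `xml` is the RosettaScripts string defined in the cell above.\n",
    "for section in (\"SIMPLE_METRICS\", \"RESIDUE_SELECTORS\", \"MOVERS\"):\n",
    "    print(section)\n",
    "    for tag, name in list_declared(xml, section):\n",
    "        print(f\"    {tag:<32} {name}\")\n"
   ]
  },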
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pose = pose_from_pdb(\"inputs/1jnd_refined.pdb.gz\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mover = pyrosetta.rosetta.protocols.rosetta_scripts.XmlObjects.create_from_string(xml).get_mover(\"ParsedProtocol\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mover.apply(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now take a look at the scorefile (tutB1) or at `pose.scores`.  You can use the `scorefile.py` script to output it as tabs if you would like.  How many residues have a great fit to the density (hint: look for the `fit6_selection_count` and `fit8_selection_count` data terms)?  Are there any residues that fit poorly into the density?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(pose.scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(pose.scores[\"fit6_selection_count\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(pose.scores[\"fit8_selection_count\"])"
   ]
  },
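  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the cached per-residue density fit values stored by the native_metrics mover.\n",
    "# The last underscore-separated field of each score name identifies the residue.\n",
    "for term in pose.scores:\n",
    "    if \"native_res_density_fit\" in term:\n",
    "        print(term.split(\"_\")[-1], pose.scores[term])\n"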
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### B2. Refinement into density\n",
    "\n",
    "Here, we will run a short refinement protocol into the density, using its crystal symmetry.  The protocol is short, but it will work for our purposes.  For a much longer (albeit very similar) refinement protocol of the glycan and whole protein, see Frenz et al. (referenced in the Working With Glycans tutorial).  The full protocol used in that paper is included in the input files as `cryoem_glycan_refinement.xml`.  Take a look and see how it compares to what we are doing here.  As usual, output files are available.  Runtime for all 10 structures is about 2 hours.\n",
    "\n",
    "The XML is `inputs/glycans/tutB2.xml`.  Since this takes so long to run, we will skip running it here. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the scorefile, either using the `scorefile.py` script or through pandas: `expected_outputs/glycans/tutB2_score.sc`\n",
    "\n",
    "Are the density fit scores higher?  How different are the RMSDs of the glycan residues?  Take a look at the lowest-energy structure: how different does it look?  Are any new contacts created?  Were we able to improve the density fit for some of those residues?"
   ]
  },
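  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to load the scorefile with pandas is sketched below.  It assumes the file uses the whitespace-delimited Rosetta scorefile layout, in which the header row and every data row start with `SCORE:`, and that it contains a `total_score` column; neither has been checked against this particular file.  The helper name `read_scorefile` is ours.  If your scorefile was written as JSON instead, `pd.read_json(path, lines=True)` may be the better starting point.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def read_scorefile(path):\n",
    "    \"\"\"Read the SCORE: lines of a whitespace-delimited Rosetta scorefile into a DataFrame.\"\"\"\n",
    "    with open(path) as handle:\n",
    "        # Keep the header and data rows; this skips lines such as SEQUENCE:.\n",
    "        rows = [line for line in handle if line.startswith(\"SCORE:\")]\n",
    "    table = pd.read_csv(io.StringIO(\"\".join(rows)), sep=r\"\\s+\")\n",
    "    return table.drop(columns=\"SCORE:\")\n",
    "\n",
    "\n",
    "score_path = Path(\"expected_outputs/glycans/tutB2_score.sc\")\n",
    "if score_path.exists():\n",
    "    scores = read_scorefile(score_path)\n",
    "    # Assumes a total_score column; the lowest energies sort first.\n",
    "    print(scores.sort_values(\"total_score\").head())\n",
    "else:\n",
    "    print(f\"{score_path} not found; point score_path at your copy of the expected outputs.\")\n"
   ]
  },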
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### B3. De novo building into density\n",
    "\n",
    "In this tutorial, we will once again load our crystal structure with density and symmetry.  However, we will randomize the backbone torsions and build the glycan out from scratch.  In reality, we would have some idea of what glycan we are building, and we would glycosylate the protein with the chemical motif determined by means such as mass spectrometry.  We would then model the glycan to solve the crystal structure.  With the new PackerPalette machinery in Rosetta and the ability to design glycans, we could actually build a protocol that samples chemical motifs of the glycans as we build them out into the density.  However, since this is a very large combinatorial problem, we should have some idea of what exists in the structure.\n",
    "\n",
    "\n",
    "We will first rebuild the glycan tree using the density as a guide, and then refine it further using what we learned in the previous tutorial.  Note that, like tutorial B2, this one takes a good long while (4 hours).\n",
    "\n",
    "The XML for this is `inputs/glycans/tutB3.xml`.  Once again, we will not be running this here.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the scorefile and PDBs (prefix `tutB3_`).\n",
    "\n",
    "How are our RMSDs?  Were we able to do enough sampling to get close to the native structure?  Are the energies acceptable?  Are there parts of the glycan that are closer to native than others?  Why might this be?  Is an nstruct of 10 enough?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thank you for doing this tutorial!  I hope you learned a lot and are ready to work with these crazy carbohydrates!  Cheers!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [RosettaCarbohydrates: Trees, Selectors and Movers](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RNA in PyRosetta](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/14.00-RNA-Basics.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.02-Glycan-Modeling-and-Design.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}