{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "COLLABORATORS = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NOTEBOOK_HEADER-->\n",
    "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n",
    "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [RosettaCarbohydrates](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.00-RosettaCarbohydrates-Working-with-Glycans.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RosettaCarbohydrates: Modeling and Design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.02-Glycan-Modeling-and-Design.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RosettaCarbohydrates: Trees, Selectors and Movers\n",
    "Keywords: carbohydrate, glycan, glucose, mannose, sugar, ResidueSelector, Mover\n",
    "\n",
    "## Overview\n",
    "Here, we will cover useful `ResidueSelectors` and `Movers` available in the RosettaCarbohdyrate framework.  All of these framework components form the basis for the tools you will use in the next tutorial, Glycan Modeling and Design."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Make sure you are in the directory with the pdb files:**\n",
    "\n",
    "`cd google_drive/My\\ Drive/student-notebooks/`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "Before we begin, we must import some specific machinery from Rosetta.  Much of these tools are automatically imported when we do `from pyrosetta import *`, however, some are not. You should get into the habit of importing everything you need.  This will get you comfortable with the organization of Rosetta and make it easier to find tools that are beyond the scope of these workshops."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Notebook setup\n",
    "import sys\n",
    "if 'google.colab' in sys.modules:\n",
    "    !pip install pyrosettacolabsetup\n",
    "    import pyrosettacolabsetup\n",
    "    pyrosettacolabsetup.setup()\n",
    "    print (\"Notebook is set for PyRosetta use in Colab.  Have fun!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Python\n",
    "from pyrosetta import *\n",
    "from pyrosetta.rosetta import *\n",
    "from pyrosetta.teaching import *\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Intitlialization \n",
    "\n",
    "Here, we will be opening a PDB file with glycans, so we will use `-include_sugars` and a few other options that allow us to read (most) PDB files without issue. It is always a good idea to use the `GlycanInfoMover` to double check that the glycans you are interested in are properly represented by Rosetta.  If they are not, post the issue in the Rosetta forums.\n",
    "\n",
    "Once again, more information on working with glycans can be found at this page: [Working With Glycans](https://www.rosettacommons.org/docs/latest/application_documentation/carbohydrates/WorkingWithGlycans)\n",
    "\n",
    "### PDB vs Rosetta sugar format\n",
    "\n",
    "Unfortunately, there are few standards in the PDB for how saccharide residues in `.pdb` files should be numbered and named. The Rosetta code — with the appropriate flags initialization flags, such as `-alternate_3_letter_codes pdb_sugar` tries its best to interpret `.pdb` files with sugars, but because of ambiguity and inconsistency, success is in no way ensured.  See http://www.rosettacommons.org/docs/latest/rosetta_basics/preparation/Preparing-PDB-files-for-non-peptide-polymers for more info\n",
    "\n",
    "\n",
    "To guarantee that one can model the specific saccharide system desired unabiguously, Rosetta uses a slightly modified `.pdb` format for importing carbohydrate residues. The key difference in formats involves the `HETNAM` record of the PDB format. The standard PDB `HETNAM` record line:</p>\n",
    "\n",
    "```HETNAM     GLC ALPHA-D-GLUCOSE```\n",
    "\n",
    "...means that all `GLC` 3-letter codes in the <em>entire file</em> are α-<font style=\"font-variant: small-caps\">d</font>-glucose, which is insufficient, as this \n",
    "could mean several different α-<font style=\"font-variant: small-caps\">d</font>-glucoses, depending on the ring form and on the main-chain connectivity of the glycan — and \n",
    "many, many more if one includes modified sugars! The modified Rosetta-ready PDB `HETNAM` \n",
    "record line:</p>\n",
    "\n",
    "```HETNAM     Glc A   1  ->4)-alpha-D-Glcp```\n",
    "\n",
    "...means that the `GLC` residue <em>specifically at position A1</em> requires the `->4)-alpha-D-Glcp` `ResidueType` or any of its `VariantType`s. (Note also that Rosetta uses sentence case 3-letter-codes for sugars.)</p>\n",
    "\n",
    "Rosetta will output and input with this default format. \n",
    "We use `-alternate_3_letter_codes pdb_sugar` to read in the PDB-format sugar and `-write_glycan_pdb_codes` to output the PDB format since we will be working with a structure directly from the PDB.\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "options = \"\"\"\n",
    "-ignore_unrecognized_res\n",
    "-include_sugars\n",
    "-auto_detect_glycan_connections\n",
    "-maintain_links \n",
    "-alternate_3_letter_codes pdb_sugar\n",
    "-write_glycan_pdb_codes\n",
    "-ignore_zero_occupancy false \n",
    "-load_PDB_components false\n",
    "-no_fconfig\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "init(\" \".join(options.split('\\n')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pose = pose_from_pdb(\"inputs/glycans/4do4_refined.pdb\")\n",
    "pose_original = pose.clone()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Object Exploration: GlycanTreeSet, CarbohydrateInfo, and the GlycanInfoMover\n",
    "\n",
    "Before we do anything else, lets get some information on the pose that we are working with.\n",
    "\n",
    "### GlycanTreeSet\n",
    "\n",
    "The `GlycanTreeSet` is created when glycans are added to a pose or a pose is created with glycans in it.  The `GlycanTreeSet` has information on each glycan tree and each residue's parent and child.  The tree set also has an observer attached to it, so it will auto-update itself when glycan residues are attached or removed from the pose.  The `GlycanTreeSet` is a part of the Pose's `Conformation` object.  First, lets expore this. \n",
    "\n",
    "Lets find out how many glycan trees are and their lengths. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tree_set = pose.glycan_tree_set()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(tree_set.n_trees())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, so there are 6 glycan trees in our pose!  Cool.  Lets see what the largest one is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(tree_set.get_largest_glycan_tree_length())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### GlycanTree and GlycanNode\n",
    "\n",
    "The `GlycanTreeSet` is made up of `GlycanTree` objects.  Each of these is made up of `GlycanNodes` for each residue in a tree. Lets expore these."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image('./Media/tree_set.png',width='500')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for start in tree_set.get_start_points():\n",
    "    print(start, pose.pdb_info().pose2pdb(start), pose.residue_type(start).name3(), pose.residue_type(start).name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets look at the parent of each of these glycan start points to see if they are connected to a protein, and if so, what residue they are attached to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for start in tree_set.get_start_points():\n",
    "    parent = tree_set.get_parent(start)\n",
    "    parent_naem = \"NONE\"\n",
    "    if parent != 0:\n",
    "        parent_name = pose.residue_type(parent).name3()\n",
    "    print(parent, pose.pdb_info().pose2pdb(parent), parent_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool.  So they are all connected to protein residues at an Asparigine.  Lets take a look at the first sugar. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tree1 = tree_set.get_tree(388)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"length\", tree1.size())\n",
    "print(\"root\", tree1.get_root())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for res in tree1.get_residues():\n",
    "    print(res, pose.residue_type(res).name3(), pose.residue_type(res).name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets take a closer look at that Mannose, at the end of the tree."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "node390 = tree1.get_node(390)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"n_children\", len(node390.get_children()))\n",
    "print(\"parent\", node390.get_parent())\n",
    "print(\"distance\", node390.get_distance_to_start())\n",
    "print(\"exocylic_connection\", node390.has_exocyclic_linkage())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CarbohydrateInfo\n",
    "\n",
    "Lets get a bit more information on this particular glycan residue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390 = pose.residue_type(390).carbohydrate_info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.anomeric_carbon()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.anomeric_carbon_name()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.basic_name()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.cyclic_oxygen()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.cyclic_oxygen_name()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.full_name()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.has_exocyclic_linkage_to_child_mainchain()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.is_alpha_sugar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.is_amino_sugar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.is_beta_sugar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.is_cyclic()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "info390.is_acetylated()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the `CarbohydrateInfo` object of `ResidueType` provides a great deal of information on this particular sugar.  By using the `GlycanTreeSet` and the `CarbohdrateInfo` objects, one can delineate nearly everything you wish to know about about a particular tree, glycan, and the connections of them in respect to each other and the whole pose. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GlycanInfoMover\n",
    "\n",
    "This mover essentially prints much of the connectivity information of a particular pose.  It is useful as a first-pass to get general info and to make sure that Rosetta is loading your glycan properly.\n",
    "\n",
    "Note: You will need to look at the terminal for output of this mover."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rosetta.protocols.analysis import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "glycan_info = GlycanInfoMover()\n",
    "glycan_info.apply(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Output copied below)\n",
    "\n",
    "```\n",
    "branch Point: ASN 107 124 A \n",
    "Branch Point: ASN 160 177 A \n",
    "Branch Point: ASN 368 385 A \n",
    "Carbohydrate: 388 501 A  Parent: 107 BP: 0 501 A   CON: _->4       DIS: 0 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 389 502 A  Parent: 388 BP: 0 502 A   CON: _->4       DIS: 1 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 390 503 A  Parent: 389 BP: 0 503 A   CON:            DIS: 2 ShortName: beta-D-Manp-\n",
    "Carbohydrate: 391 504 A  Parent: 160 BP: 0 504 A   CON: _->4       DIS: 0 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 392 505 A  Parent: 391 BP: 0 505 A   CON: _->4       DIS: 1 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 393 506 A  Parent: 392 BP: 1 506 A   CON: _->3,_->6  DIS: 2 ShortName: ->3)-beta-D-Manp-\n",
    "Carbohydrate: 394 507 A  Parent: 393 BP: 0 507 A   CON:            DIS: 3 ShortName: alpha-D-Manp-\n",
    "Carbohydrate: 395 508 A  Parent: 393 BP: 0 508 A   CON:            DIS: 3 ShortName: alpha-D-Manp-\n",
    "Carbohydrate: 396 509 A  Parent: 368 BP: 0 509 A   CON:            DIS: 0 ShortName: beta-D-GlcpNAc-\n",
    "Branch Point: ASN 503 124 B \n",
    "Branch Point: ASN 556 177 B \n",
    "Branch Point: ASN 764 385 B \n",
    "Carbohydrate: 797 501 B  Parent: 503 BP: 1 501 B   CON: _->4,_->6  DIS: 0 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 798 502 B  Parent: 797 BP: 0 502 B   CON:            DIS: 1 ShortName: beta-D-GlcpNAc-\n",
    "Carbohydrate: 799 503 B  Parent: 797 BP: 0 503 B   CON:            DIS: 1 ShortName: alpha-L-Fucp-\n",
    "Carbohydrate: 800 504 B  Parent: 556 BP: 0 504 B   CON: _->4       DIS: 0 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 801 505 B  Parent: 800 BP: 0 505 B   CON: _->4       DIS: 1 ShortName: ->4)-beta-D-GlcpNAc-\n",
    "Carbohydrate: 802 506 B  Parent: 801 BP: 1 506 B   CON: _->3,_->6  DIS: 2 ShortName: ->3)-beta-D-Manp-\n",
    "Carbohydrate: 803 507 B  Parent: 802 BP: 0 507 B   CON:            DIS: 3 ShortName: alpha-D-Manp-\n",
    "Carbohydrate: 804 508 B  Parent: 802 BP: 0 508 B   CON:            DIS: 3 ShortName: alpha-D-Manp-\n",
    "Carbohydrate: 805 509 B  Parent: 764 BP: 0 509 B   CON:            DIS: 0 ShortName: beta-D-GlcpNAc-\n",
    "Glycan Residues: 18\n",
    "Protein BPs: 6\n",
    "TREES\n",
    "107 124 A  Length: 3\n",
    "160 177 A  Length: 5\n",
    "368 385 A  Length: 1\n",
    "503 124 B  Length: 3\n",
    "556 177 B  Length: 5\n",
    "764 385 B  Length: 1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Branched Connections\n",
    "\n",
    "Now we can see all of our glycans in the pose, all of their parents, and how all of them are connected to one another. Note residue 803 - here we have two connections.  both at carbons 3 and 6.  This means we have a branched connection and that residue 802 has two children.  A branched connection is always at carbon 6, which is an exocyclic connection.  This point has 3 backbone dihedrals instead of our standard two.  Lets confirm all of that. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#This is code used to get the branch points in CarbohydrateInfoMover, converted from C++:\n",
    "def get_connections(localpose, resnum):\n",
    "    info = localpose.residue(resnum).carbohydrate_info()\n",
    "    outstring = \"\"\n",
    "    attach = \"_->\"\n",
    "\n",
    "    if info.mainchain_glycosidic_bond_acceptor():\n",
    "        outstring = attach + str(info.mainchain_glycosidic_bond_acceptor())\n",
    "    \n",
    "\n",
    "    for i in range(1, info.n_branches()+1):\n",
    "        outstring = outstring + \",\" +attach + str(info.branch_point( i ))\n",
    "    \n",
    "    return outstring;\n",
    "                   \n",
    "get_connections(pose, 802)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tree802 = tree_set.get_tree_containing_residue(802)\n",
    "node802 = tree_set.get_node(802)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"len\", tree802.size())\n",
    "print(\"children\", node802.get_children())\n",
    "print(\"exocyclic\", node802.has_exocyclic_linkage())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that 802 doesn't have an exocyclic back to it's parent - however, one of its children has the exocyclic connection back to it.  Lets find out which one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"exo_803\", tree802.get_node(803).has_exocyclic_linkage())\n",
    "print(\"exo_804\", tree802.get_node(804).has_exocyclic_linkage())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool.  So residue 804 is branched connection. Lets take a closer look."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "node804 = tree802.get_node(804)\n",
    "node803 = tree802.get_node(803)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "node802.get_mainchain_child()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MoveMapFactory vs MoveMap creation\n",
    "\n",
    "Here is something important to note.  Rosetta has a concept of the 'mainchain' as it was primarily written for proteins - that are linear in nature.  At the deep part of Rosetta, even sugars are denoted as having a 'mainchain'.  This mainchain is the 'non-branched' connections.  In this case, the mainchain continues onto residue 803, while the 'branch' goes off to residue 804.  This is __EXTREMELY__ important to be aware of as MoveMaps have seperate switches for 'branched' torsions.  In this way, you should always use the `MoveMapFactory` which does all this automatically for creating glycan Movemaps or torsions that are branched will not be turned on!!! "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After that side-note, lets confirm that there are indeed 3 torsions for the branched connection of residue 802 and 804. Remember that torsions are defined from child TO parent!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rosetta.core.pose.carbohydrates import *\n",
    "from rosetta.core.conformation.carbohydrates import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_n_glycosidic_torsions_in_res(pose.conformation(), 804)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great.  We have 3. Lets make sure our mainchild child has two."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_n_glycosidic_torsions_in_res(pose.conformation(), 803)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Awesome.  Finally, lets see how many torsions between our first glycan residue of this tree and the ASN.  Note that ASN has 3 'chi' angles before glycosylation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_n_glycosidic_torsions_in_res(pose.conformation(), tree1.get_start())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After glycosylation, this ASN chi no longer has side-chains to pack.  In the packer, they are turned off, as they are now part of the glycan backbone.  How does Rosetta know that this should be turned off?  Lets see."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "protein_res = tree802.get_node(tree802.get_start()).get_parent()\n",
    "print(protein_res, pose.residue_type(protein_res).name3())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Is Branch Point:\", pose.residue(protein_res).is_branch_point())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, Now we can see that this residue is a branch point - meaning that it once again has a mainchain connection that goes onto the the next protein residue, and a branch out to the start of the glycan.  Take a look at the rest of the glycan residues.  Which are the branch points?  Does this info match what the `GlycanInfoMover` printed?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Glycan Residue Selectors\n",
    "\n",
    "Now that we have a good idea about the glycans in our pose, lets use some residue selectors that use the underlying tools that we just learned about. \n",
    "\n",
    "### GlycanResidueSelector\n",
    "\n",
    "The most basic, but useful selector is the `GlycanResidueSelector`.  Here is the description:\n",
    "```\n",
    "A ResidueSelector for carbohydrates and individual carbohydrate trees.\n",
    "  Selects all Glycan residues if no option is given or the branch going out from the root residue. \n",
    "  Selecting from root residues allows you to choose the whole glycan branch or only tips, etc.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### All Carbohydrates\n",
    "\n",
    "First, lets select all carbohydrate residues in the pose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rosetta.core.select.residue_selector import *\n",
    "glycan_selector = GlycanResidueSelector()\n",
    "all_glycans = glycan_selector.apply(pose)\n",
    "\n",
    "def print_selection(localpose, selection):\n",
    "    for i in range(1, localpose.size()+1):\n",
    "        if selection[i]:\n",
    "            print(i, localpose.residue_type(i).name3())\n",
    "\n",
    "print_selection(pose, all_glycans)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Branch Selection\n",
    "\n",
    "Now lets select a particular glycan tree.  We can give either the start of the tree or the connecting protein residue.  By default, we do not include the root residue that we pass to the selector.  This selection is useful for modeling only a particular glycan tree (or parts of a tree) at a time.  It will select all the children and all the children of children/etc. from your selection, out to the tips. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "glycan_selector.set_select_from_branch_residue(800)\n",
    "glycan_selector.set_include_root(True)\n",
    "\n",
    "print_selection(pose, glycan_selector.apply(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool.  Now selection from the ASN, but not include the root."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "glycan_selector.set_select_from_branch_residue(556)\n",
    "glycan_selector.set_include_root(False)\n",
    "print_selection(pose, glycan_selector.apply(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We get the same results.  Awesome.  This selector can be used in modeling and design tasks in the next tutorial.   We can also pass multiple branch residues to select many parts or use the `AndSelector` as you have seen previously to combine selections.  Pass that selector to the `MoveMapFactory` when doing any minimization or relax."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GlycanLayerSelector\n",
    "\n",
    "A selector for choosing glycan residues based on their layer - as measured by the residue distance to the start of the glycan tree.\n",
    "\n",
    "If no layer is set, will select all glycan residues.\n",
    "\n",
    "This layer selector is used for modeling glycans from their roots out to their trees as you will see in the next tutorial.  This definition of 'layer' is useful due to branching and can be used to optimize specific layers at a time. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Image('./Media/tree_layers.png',width='200')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "layer_selector = GlycanLayerSelector()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Lets select just the first two layers of all the glycans.\n",
    "\n",
    "layer_selector.set_layer(0, 1)\n",
    "print_selection(pose, layer_selector.apply(pose))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Now lets select only those glycans that have larger layers\n",
    "layer_selector.set_layer_as_greater_than_or_equal_to(2)\n",
    "print_selection(pose, layer_selector.apply(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the residues 802, and children 803 and 804.  Both 803 and 804 will have the same layer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GlycanSequonSelector\n",
    "\n",
    "So we have some tools for selecting specific glycan residues we are interested in.  Now lets change tune a bit.  A Sequon is the 3 residue motif recognized by GlycosylTransferase that adds the first glycan onto a protein. There are a few sequon's that are recognized by the glycosylation machinery`[N:(not-p):(S or T)]`, and you can set all or specific ones to use in this selector (via RosettaScripts unfortunately).\n",
    "\n",
    "We'll cover this more in-depth in the next tutorial, but this selector can be useful for finding potential glycosylation sites in a pose.\n",
    "\n",
    "Note that the `ResidueInSequenceMotifSelector` is a general-purpose version of this selector. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sequon_selector = GlycanSequonsSelector()\n",
    "print_selection(pose, sequon_selector.apply(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that while 8 motifs were found in th pose, not all are glycosylated - in fact 6/8 are glycosylated from our information from the `GlycanInfoMover`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RandomGlycanFoliageSelector\n",
    "\n",
    "This is a simple selector that Selects a random carbohydrate residue from a subset or selector, then selects the rest of the glycan foliage.  Used for sampling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random_foliage = RandomGlycanFoliageSelector()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random_foliage.set_selector(glycan_selector)\n",
    "print_selection(pose, random_foliage.apply(pose))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_selection(pose, random_foliage.apply(pose))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_selection(pose, random_foliage.apply(pose))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print_selection(pose, random_foliage.apply(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Glycan Movers\n",
    "\n",
    "Lets do a quick look at some useful glycan-specific movers \n",
    "\n",
    "### LinkageConformerMover\n",
    "\n",
    "The `LinkageConformerMover` is an integral part of glycan modeling.  This mover puts a 'conformer' of glycan dihedrals into a pose that was identified through a large-scale bioinformatic analysis.  A conformer is a well-defined and well-represented set of dihedral angles for a specific linkage.  The linkage is specific for different types of sugars in the i and i+1 spot, as well as the specific the i+1 glcyan is connected to on residue i.  The mover is useful, but should not be used by itself.  You will want a MonteCarlo object, and most likely some packing and minimization to go with it. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyrosetta.rosetta.protocols.carbohydrates import *\n",
    "# Default score function and a MonteCarlo object with kT = 4.0\n",
    "score = get_score_function()\n",
    "mc = MonteCarlo(pose, score, 4.0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conformer_mover = LinkageConformerMover()"
   ]
  },
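  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Restrict sampling to layer_selector and draw dihedrals around each conformer mean\n",
    "conformer_mover.set_residue_selector(layer_selector)\n",
    "conformer_mover.set_use_gaussian_sampling(True)\n",
    "conformer_mover.set_use_conformer_population_stats(False)\n",
    "\n",
    "# Start from a fresh copy and make it the MonteCarlo reference pose\n",
    "pose = pose_original.clone()\n",
    "mc.reset(pose)\n",
    "\n",
    "for i in range(1, 750):\n",
    "    conformer_mover.apply(pose)\n",
    "    # boltzmann() applies the Metropolis criterion; it returns True if the move was accepted\n",
    "    print(score.score(pose), mc.boltzmann(pose))\n",
    "\n",
    "# Recover the lowest-energy pose seen and compare it with the starting pose\n",
    "mc.recover_low(pose)\n",
    "print(score.score(pose_original), score.score(pose))"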
   ]
  },
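  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `mc.boltzmann(pose)` call in the loop above applies the Metropolis criterion to each trial move.  As a rough, library-free illustration, the acceptance test can be sketched as follows.  This is not PyRosetta's implementation: `metropolis_accept` and its arguments are hypothetical names, and the sketch leaves out the bookkeeping that `MonteCarlo` does, such as restoring the last accepted pose after a rejection.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import random\n",
    "\n",
    "def metropolis_accept(old_score, new_score, kT, rng=random):\n",
    "    # Hypothetical helper: return True if the move is accepted.\n",
    "    delta = new_score - old_score\n",
    "    if delta <= 0:\n",
    "        # Downhill (or equal-energy) moves are always accepted.\n",
    "        return True\n",
    "    # Uphill moves are accepted with probability exp(-delta / kT).\n",
    "    return rng.random() < math.exp(-delta / kT)\n",
    "\n",
    "rng = random.Random(0)\n",
    "# A move that lowers the score is always accepted.\n",
    "print(metropolis_accept(-100.0, -105.0, 4.0, rng))\n",
    "# An uphill move of exactly kT is accepted at a rate near exp(-1).\n",
    "accepted = sum(metropolis_accept(0.0, 4.0, 4.0, rng) for _ in range(10000))\n",
    "print(accepted / 10000)\n",
    "```\n",
    "\n",
    "A larger `kT` accepts uphill moves more often, which broadens sampling at the cost of spending more time in higher-energy states."
   ]
  },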
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Did we find a better conformer? Did our energy decrease even in a pre-refined pose?  Note that with Gaussian sampling enabled, each dihedral is drawn from a Gaussian defined by the mean and standard deviation of that dihedral in the conformer.  Otherwise, only the mean is used.  This gives some variance in our conformers.\n",
    "\n",
    "Use the `RMSDMetric` with the `LayerSelector` and `GlycanLayerSelector` to calculate the RMSD of the pose relative to `pose_original`.  Take a look in PyMOL.  How much has it changed?"
   ]
  },
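  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between the two sampling modes described above can be illustrated without PyRosetta.  This is a minimal sketch with made-up numbers: the `conformer` dictionary, its mean and standard deviation values, and the `sample_conformer` helper are all hypothetical and are not taken from the Rosetta conformer data.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Hypothetical conformer: (mean, standard deviation) in degrees per dihedral.\n",
    "conformer = {'phi': (-75.0, 10.0), 'psi': (120.0, 15.0)}\n",
    "\n",
    "def sample_conformer(conformer, use_gaussian_sampling, rng):\n",
    "    # Return one set of dihedral angles for the conformer.\n",
    "    angles = {}\n",
    "    for name, (mean, sd) in conformer.items():\n",
    "        if use_gaussian_sampling:\n",
    "            # Draw around the mean using the standard deviation.\n",
    "            angles[name] = rng.gauss(mean, sd)\n",
    "        else:\n",
    "            # Otherwise, always use the mean itself.\n",
    "            angles[name] = mean\n",
    "    return angles\n",
    "\n",
    "rng = random.Random(42)\n",
    "print(sample_conformer(conformer, False, rng))  # means only, identical every time\n",
    "print(sample_conformer(conformer, True, rng))   # varies from draw to draw\n",
    "```\n",
    "\n",
    "With Gaussian sampling turned off, every application of a given conformer yields the same angles, so any variation comes only from which conformer is chosen."
   ]
  },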
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Correcting Structures\n",
    "\n",
    "#### TautomerizeAnomerMover\n",
    "\n",
    "This mover is useful when solving structures of glycans or fixing errors in PDB files.  Here is its description:\n",
    "\n",
    "A Mover class for tautomerizing from one anomer to another at a reducing end.\n",
    "\n",
    "This carbohydrate-specific Mover randomly selects a free reducing end (not a glycoside) and inverts the\n",
    "stereochemistry, swapping alpha anomers for beta and beta for alpha.  (This could be considered an extremely\n",
    "limited design case; however, reducing ends readily tautomerize in solution, in contrast to other cases, in which\n",
    "residues do not readily mutate into others!)  \n",
    "\n",
    "It is generally not certain which form is preferred (if any) in\n",
    "sugar-binding proteins, and crystal structures sometimes arbitrarily assign one anomer over another when fitting\n",
    "density, so this Mover can assure that each anomer is sampled.\n",
    "If a ResidueSelector is set, the Mover will select from the subset at random; it will not guarantee\n",
    "tautomerization of every Residue in the subset.\n",
    "\n",
    "In this case, all of our glycans are connected to the protein, so there are no free reducing ends and this mover doesn't do anything for us.  If you had free glycans, you could use it here.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "glycan_selector = GlycanResidueSelector()\n",
    "\n",
    "tautomerize_mover = TautomerizeAnomerMover()\n",
    "tautomerize_mover.selector(glycan_selector)\n",
    "\n",
    "\n",
    "pose = pose_original.clone()\n",
    "\n",
    "tautomerize_mover.apply(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### IdealizeAnomericHydrogens\n",
    "\n",
    "This mover was referenced in Frenz, et al - Automatically fixing errors in carbohydrate structures, which was cited in the previous tutorial.  It is used to idealize anomeric hydrogens, which can sometimes be quite wrong or poorly optimized in structures.  Since our input structure was refined through Cartesian relax into the crystal density, the hydrogens should already be idealized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pose = pose_original.clone()\n",
    "idealize_anomeric_hs = IdealizeAnomericHydrogens()\n",
    "idealize_anomeric_hs.apply(pose)\n",
    "\n",
    "print(score.score(pose_original), score.score(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### RingPlaneFlipMover\n",
    "\n",
    "https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/carbohydrates/RingPlaneFlipMover\n",
    "\n",
    "Based on a given ResidueSelector and limited by a MoveMap, this Mover selects applicable cyclic residues and performs a 180-degree shearing move in which the anomeric bond and the main-chain bond on the opposite side of the ring are moved in opposite directions. An \"applicable\" residue is limited to 1,4-linked aldopyranoses or 2,5-linked ketopyranoses for which both the anomeric bond and the glycosidic linkage bond are equatorial.\n",
    "\n",
    "This Mover is useful in cases — for example, when working with highly charged and sulfated heparins — where Rosetta models an oligo- or polysaccharide in such a way that the residue is sitting in the relatively correct position but is missing favorable interactions that it could make on the other side of the glycan ring. Sometimes, a simple \"ring flip\" could correct this, but the energy barrier to rotate is too high; the small moves of a ShearMover would never flip the ring around."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ring_flipper = RingPlaneFlipMover()\n",
    "\n",
    "pose = pose_original.clone()\n",
    "mc = MonteCarlo(pose, score, 4.0)\n",
    "\n",
    "for i in range(1, 200):\n",
    "    ring_flipper.apply(pose)\n",
    "    print(score.score(pose), mc.boltzmann(pose))\n",
    "\n",
    "mc.recover_low(pose)\n",
    "print(score.score(pose_original), score.score(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Glycosylation\n",
    "\n",
    "Glycosylation can be performed either by a function, as you saw in the previous tutorial, or through a mover, the `SimpleGlycosylateMover`.  This mover is covered in the next tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BB Sampling\n",
    "\n",
    "Here, we will cover a few more components of glycan sampling, without the modeling movers covered in the next section.\n",
    "\n",
    "#### GlycanTreeMinMover\n",
    "The `GlycanTreeMinMover` is useful because it randomly selects a glycan tree and a residue in that tree that is set to move through a movemap, and then minimizes the rest of the glycan foliage.  Under the hood, it uses the `RandomGlycanFoliageSelector`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "foliage_min = GlycanTreeMinMover(glycan_selector)\n",
    "foliage_min.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "foliage_min.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "foliage_min.apply(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### BBSampler: SugarBBSampler\n",
    "\n",
    "Since glycans can have 2-4+ dihedral angles, a new way to sample specific backbone residues was created.  This is the BBSampler framework. It is also integrated into the GlycanModeler covered in the next chapter.\n",
    "\n",
    "The `SugarBBSampler` works by using the `sugar_bb` energy term as probabilities for each dihedral of each linkage type and using them for sampling.\n",
    "\n",
    "Note that we could sample on omega as well, and we can give the sampler a mask using the function `set_dihedral_mask()` that tells it which residues have which dihedrals to sample.  For now, we'll just sample phi and psi."
   ]
  },
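  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to picture sampling from an energy term as a probability distribution, as described above, is to convert binned energies into Boltzmann weights and draw from them.  The sketch below is only an illustration under that assumption: the bin energies are invented, `kT` is arbitrary, and `energies_to_probabilities` and `sample_angle` are hypothetical helpers, not the `SugarBBSampler` implementation.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import random\n",
    "\n",
    "# Invented energies for a few dihedral bins (angle in degrees: energy).\n",
    "bin_energies = {-180: 3.0, -120: 1.5, -60: 0.0, 0: 2.5, 60: 1.0, 120: 4.0}\n",
    "\n",
    "def energies_to_probabilities(energies, kT=1.0):\n",
    "    # Boltzmann-weight each bin, then normalize so the weights sum to 1.\n",
    "    weights = {angle: math.exp(-e / kT) for angle, e in energies.items()}\n",
    "    total = sum(weights.values())\n",
    "    return {angle: w / total for angle, w in weights.items()}\n",
    "\n",
    "def sample_angle(probabilities, rng):\n",
    "    # Draw one bin with probability proportional to its weight.\n",
    "    angles = list(probabilities)\n",
    "    return rng.choices(angles, weights=[probabilities[a] for a in angles])[0]\n",
    "\n",
    "probs = energies_to_probabilities(bin_energies)\n",
    "rng = random.Random(7)\n",
    "draws = [sample_angle(probs, rng) for _ in range(5)]\n",
    "print(max(probs, key=probs.get))  # the lowest-energy bin is the most probable\n",
    "print(draws)\n",
    "```\n",
    "\n",
    "Low-energy bins are drawn most often, but higher-energy bins still appear occasionally, which keeps the sampling from collapsing onto a single angle."
   ]
  },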
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyrosetta.rosetta.protocols.simple_moves import *\n",
    "from pyrosetta.rosetta.protocols.simple_moves.bb_sampler import *\n",
    "from pyrosetta.rosetta.core.id import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sampler = BBDihedralSamplerMover()\n",
    "sugar_bb_phi = SugarBBSampler(phi_dihedral)\n",
    "sugar_bb_psi = SugarBBSampler(psi_dihedral)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select from branch residue 801, since there is no sugar_bb data for the ASN-glycan linkage.\n",
    "glycan_selector.set_select_from_branch_residue(801)\n",
    "glycan_selector.set_include_root(True)\n",
    "\n",
    "# Sample phi and psi on the selected residues\n",
    "sampler.add_sampler(sugar_bb_phi)\n",
    "sampler.add_sampler(sugar_bb_psi)\n",
    "sampler.set_residue_selector(glycan_selector)\n",
    "\n",
    "# Make the current pose the MonteCarlo reference before sampling\n",
    "mc.reset(pose)\n",
    "for i in range(1, 300):\n",
    "    sampler.apply(pose)\n",
    "    print(score.score(pose), mc.boltzmann(pose))\n",
    "\n",
    "# Recover the lowest-energy pose seen and compare it with the starting pose\n",
    "mc.recover_low(pose)\n",
    "print(score.score(pose_original), score.score(pose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Were we able to improve the energy just from using sugarBB on this glycan tree?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This has covered most of the current RosettaCarbohydrates components you may find useful.  The next tutorial will build on these components and use new movers that incorporate them for modeling and design."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [RosettaCarbohydrates](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.00-RosettaCarbohydrates-Working-with-Glycans.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RosettaCarbohydrates: Modeling and Design](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.02-Glycan-Modeling-and-Design.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}