<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [RosettaAntibodyDesign](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/12.02-RosettaAntibodyDesign-RAbD.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RosettaCarbohydrates: Trees, Selectors and Movers](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.00-RosettaCarbohydrates-Working-with-Glycans.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# RosettaCarbohydrates
Keywords: carbohydrate, glycan, sugar, glucose, mannose, GlycanTreeSet, saccharide, furanose, pyranose, aldose, ketose

## Overview

In this chapter, we will focus on a special subset of non-peptide oligo- and polymers — carbohydrates.</p>

Modeling carbohydrates — also known as saccharides, glycans, or simply sugars — comes with some special challenges. For one, most saccharide residues contain a ring as part of their backbone. This ring provides potentially new degrees of freedom when sampling. Additionally, carbohydrate structures are often branched, leading in Rosetta to more complicated `FoldTrees`.
    
This chapter includes a quick overview of carbohydrate nomenclature, structure, and basic interactions within Rosetta.


## Carbohydrate Chemistry Background

<div class="figure-right" style="clear:right;float:right;width:300px"><img src="./Media/Fig-pyranose_vs_furanose.png" width="300px"><small><b>Figure 1.</b> A pyranose (left) and a furanose (right).</small></div>
<p>Sugars (<b>saccharides</b>) are defined as hydroxylated aldehydes and ketones. A typical monosaccharide has an equal number of carbon and oxygen atoms. For example, glucose has the molecular formula C<sub>6</sub>H<sub>12</sub>O<sub>6</sub>.</p><p>Sugars containing more than three carbons will spontaneously cyclize in aqueous environments to form five- or six-membered hemiacetals and hemiketals. Sugars with five-membered rings are called <b>furanoses</b>; those with six-membered rings are called <b>pyranoses</b> (Fig. 1).</p>
<div class="figure-left" style="clear:left;float:left;width:300px"><img src="./Media/Fig-aldose_vs_ketose.png" width="300px"><small><b>Figure 2.</b> An aldose (left) and a ketose (right).</small></div>
<p>A sugar is classified as an <b>aldose</b> or <b>ketose</b>, depending on whether it has an aldehyde or ketone in its linear form (Fig. 2).</p>
<p>The different sugars have different names, depending on the stereochemistry at each of the carbon atoms in the molecule. For example, glucose has one set of stereochemistries, while mannose has another.</p>
<p>In addition to their full names, many individual saccharide residues have three-letter codes, just like amino acid residues do. Glucose is "Glc" and mannose is "Man".</p>

## Backbone Torsions, Residue Connections, and Side Chains

A glycan tree is made up of many sugar residues, each of which is a ring. The 'backbone' of a glycan is the connection between one residue and the next. The chemical makeup of each sugar residue in this 'linkage' affects the propensity (energy) of each backbone dihedral angle. In addition, a sugar can be attached via different carbons of the parent residue. In this way, both the chemical makeup and the attachment position affect the dihedral propensities. Typically there are two backbone dihedral angles, but there can be four or more, depending on the connection.

In IUPAC nomenclature, the dihedrals of residue N are defined as the dihedrals between N and N-1 (i.e., the parent linkage). The dihedrals of the ASN (or other glycosylated protein residue) become part of the first glycan residue connected to it. The first glycan residue connected to an ASN therefore has four torsions, while the ASN itself has none!

If you are creating a movemap for glycan residues, please use the `MoveMapFactory`, which has the IUPAC nomenclature of glycan residues built in. This allows proper DOF sampling of the backbone torsions, especially for branching glycan trees. In general, all of our samplers should use residue selectors and use the `MoveMapFactory` to build movemaps internally.

A sugar's side chains are the substituents of the glycan ring, which are typically an OH group or an acetyl group. These are sampled together at 60-degree intervals by default during packing. A finer granularity of rotamers cannot currently be handled in Rosetta, but 60 degrees seems adequate for our purposes.
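To get a feel for what 60-degree sampling implies, the sketch below enumerates a 60-degree torsion grid and counts the combinations that would arise if several side chains were enumerated jointly. The starting offset of -180 degrees and the helper names `torsion_grid` and `joint_rotamers` are assumptions made for illustration; this is not Rosetta's rotamer builder.

```python
from itertools import product

STEP_DEGREES = 60  # sampling interval quoted in the text


def torsion_grid(step=STEP_DEGREES, start=-180):
    """Return evenly spaced torsion values covering one full turn."""
    return [start + k * step for k in range(360 // step)]


def joint_rotamers(n_side_chains, step=STEP_DEGREES):
    """Enumerate every combination of grid values for n side chains."""
    return list(product(torsion_grid(step), repeat=n_side_chains))


grid = torsion_grid()
print(grid)                    # six values per side chain
print(len(joint_rotamers(3)))  # 6**3 combinations for three hydroxyl groups
```

The combinatorial growth is the practical reason a coarse grid is attractive: halving the step to 30 degrees would multiply the count for three side chains by eight.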

Within Rosetta, glycan connectivity information is stored in the `GlycanTreeSet`, which is continually updated to reflect any residue changes or additions to the pose. 
This info is always available through the function 

		pose.glycan_tree_set()

Chemical information of each glycan residue can be accessed through the CarbohydrateInfo object, which is stored in each ResidueType object: 

		pose.residue_type(i).carbohydrate_info()
      
We will cover both of these classes in the next tutorial.

## Documentation
https://www.rosettacommons.org/docs/latest/application_documentation/carbohydrates/WorkingWithGlycans


## References


**Residue centric modeling and design of saccharide and glycoconjugate structures**
Jason W. Labonte, Jared Adolf-Bryfogle, William R. Schief, Jeffrey J. Gray
_Journal of Computational Chemistry_, 11/30/2016 - <https://doi.org/10.1002/jcc.24679>


**Automatically Fixing Errors in Glycoprotein Structures with Rosetta**
Brandon Frenz, Sebastian Rämisch, Andrew J. Borst, Alexandra C. Walls, Jared Adolf-Bryfogle, William R. Schief, David Veesler, Frank DiMaio
_Structure_, 1/2/2019

## Initialization

<p>Let's use PyRosetta to compare some common monosaccharide residues and see how they differ. As usual, we start by importing the `pyrosetta` and `rosetta` namespaces.</p>

In [None]:
!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()


In [None]:
from pyrosetta import *
from pyrosetta.teaching import *
from pyrosetta.rosetta import *

First, one needs the `-include_sugars` option, which tells Rosetta to load sugars and adds the `sugar_bb` energy term to the default score function. This score term is like `rama` for the sugar dihedrals that connect each sugar residue.

In [None]:
init('-include_sugars')

When loading structures from the PDB that include glycans, we use these options. This includes an option to write out the structures in pdb format instead of the (better) Rosetta format.  We will be using these options in the next tutorial. 

		-maintain_links
		-auto_detect_glycan_connections
		-alternate_3_letter_codes pdb_sugar
		-write_glycan_pdb_codes
        -load_PDB_components false
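As a minimal sketch, the flags listed above can be collected into one options string. This assumes they are handed to `init()` as a single space-separated string, in the same way `init('-include_sugars')` is called earlier; the variable names are illustrative only.

```python
# Flags copied from the list above; joining them into one string is an
# assumption about how they would be passed to init().
glycan_flags = [
    "-include_sugars",
    "-maintain_links",
    "-auto_detect_glycan_connections",
    "-alternate_3_letter_codes pdb_sugar",
    "-write_glycan_pdb_codes",
    "-load_PDB_components false",
]

options = " ".join(glycan_flags)
print(options)

# In a fresh session one could then call, for example:
# init(options)
```

Keeping the flags in a list makes it easy to comment one out while troubleshooting how a particular PDB file loads.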

<ul><li>Set up the `PyMOLMover` for viewing structures.</li></ul>

In [None]:
pm = PyMOLMover()

## Creating Saccharides from Sequence

We will use the function `pose_from_saccharide_sequence()`, which must be imported from the `core.pose` namespace. Unlike with peptide chains, one-letter codes will not suffice when specifying saccharide chains, because there is too much information to convey; we must use at least four letters. The first three letters are the sugar's three-letter code; the fourth letter designates whether the residue is a furanose (`f`) or pyranose (`p`).

In [None]:
from pyrosetta.rosetta.core.pose import pose_from_saccharide_sequence

In [None]:
glucose = pose_from_saccharide_sequence('Glcp')
galactose = pose_from_saccharide_sequence('Galp')
mannose = pose_from_saccharide_sequence('Manp')

<ul><li>Use the `PyMOLMover` to compare the three monosaccharides in PyMOL.</li></ul>

<ul><li>At which carbons do the three sugars differ?</li></ul>

### L and D Forms

<p>Just like with peptides, saccharides come in two enantiomeric forms, labelled <font style="font-variant: small-caps">l</font> and <font style="font-variant: small-caps">d</font>. (Note the small-caps, used in print.) These can be loaded into PyRosetta using the prefixes `L-` and `D-`.</p>

In [None]:
L_glucose = pose_from_saccharide_sequence('L-Glcp')
D_glucose = pose_from_saccharide_sequence('D-Glcp')

<ul><li>Compare the two structures in PyMOL. Notice that all stereocenters are inverted between the two monosaccharides.</li></ul>

<ul><li>Which enantiomer is loaded by PyRosetta by default if <font style="font-variant: small-caps">l</font> or <font style="font-variant: small-caps">d</font> are not specified?</li></ul>
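Enantiomers are mirror images, which is why every stereocenter inverts between the two forms. The sketch below is independent of PyRosetta and uses made-up coordinates: it measures the handedness of three substituents around a center as a signed volume and shows that a reflection flips its sign. The function names are hypothetical.

```python
def signed_volume(center, a, b, c):
    """Return the signed volume spanned by three substituents around a center."""
    u = [p - q for p, q in zip(a, center)]
    v = [p - q for p, q in zip(b, center)]
    w = [p - q for p, q in zip(c, center)]
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


def mirror_x(point):
    """Reflect a point through the yz-plane."""
    return (-point[0], point[1], point[2])


# Illustrative coordinates only, not taken from a real sugar.
center = (0.0, 0.0, 0.0)
substituents = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

original = signed_volume(center, *substituents)
reflected = signed_volume(mirror_x(center), *[mirror_x(p) for p in substituents])
print(original, reflected)  # equal magnitude, opposite sign
```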

### Anomers

<p>The carbon that is at a higher oxidation state — that is, the carbon of the hemiacetal/-ketal in the cyclic form or the carbon that is the carbonyl carbon of the aldehyde or ketone in the linear form — is called the <b>anomeric carbon</b>. Because the carbonyl of an aldehyde or ketone is planar, a sugar molecule can cyclize into one of two forms, one in which the resulting hydroxyl group is pointing "up" and another in which the same hydroxyl group is pointing "down". These two <b>anomers</b> are labelled α and β.</p>
<ul><li>Create a one-residue `Pose` for both α- and β-<font style="font-variant: small-caps">d</font>-glucopyranose and use PyMOL to compare both.</li></ul>

In [None]:
alpha_D_glucose = pose_from_saccharide_sequence('a-D-Glcp')
beta_D_glucose = pose_from_saccharide_sequence('b-D-Glcp')

<ul><li>For which anomer is the C1 hydroxyl group axial to the chair conformation of the six-membered pyranose ring?</li>
<li>Which anomer of <font style="font-variant: small-caps">d</font>-glucose would you predict to be the most stable? (Hint: remember what you learned in organic chemistry about axial and equatorial substituents.)</li></ul>

### Linear Oligosaccharides & IUPAC Sequences

Oligo- and polysaccharides are composed of simple monosaccharide residues connected by acetal and ketal linkages called __glycosidic bonds__. Any of the monosaccharide's _hydroxyl_ groups can be used to form a linkage to the anomeric carbon of another monosaccharide, leading to both _linear_ and _branched_ molecules.
    
Rosetta can create both _linear_ and _branched_ oligosaccharides from an __IUPAC__ sequence. (IUPAC is the international organization dedicated to chemical nomenclature.)

<p>To properly build a linear oligosaccharide, Rosetta must know the following details about each sugar residue being created, in the following order:</p>

 - Main-chain connectivity — →2) (`->2)`), →4) (`->4)`), →6) (`->6)`), _etc._; default value  is `->4)-`
 - Anomeric form — α (`a` or `alpha`) or β (`b` or `beta`); default value is `alpha`
 - Enantiomeric form — <font style="font-variant: small-caps">l</font> (`L`) or <font style="font-variant: small-caps">d</font> (`D`); default value is `D`
 - 3-Letter code — required; uses sentence case
 - Ring form code — <i>f</i> (for a furanose/5-membered ring), <i>p</i> (for a pyranose/6-membered ring); required
  
  
Residues must be separated by hyphens. Glycosidic linkages can be specified with full IUPAC notation, _e.g._, `-(1->4)-` for “-(1→4)-”. (This means that the residue on the left connects from its C1 (anomeric) position to the hydroxyl oxygen at C4 of the residue on the right.) Rosetta will assume `-(1->` for aldoses and `-(2->` for ketoses.

Note that the standard is to write the IUPAC sequence of a saccharide chain <em>in reverse order from how the residues are numbered</em>. Let's create three new oligosaccharides from sequence.

In [None]:
maltotriose = pose_from_saccharide_sequence('a-D-Glcp-' * 3)
lactose = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-Glcp')
isomaltose = pose_from_saccharide_sequence('->6)-Glcp-' * 2)
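The per-residue fields and defaults listed above can be sketched as a small parser for a single residue token. The regular expression and the `parse_residue` helper are hypothetical and written only to illustrate the ordering and the default values; this is not the parser Rosetta uses, and it ignores full `(1->4)` linkage notation and branch brackets.

```python
import re

RESIDUE_PATTERN = re.compile(
    r"^(?:->(?P<link>\d)\)-)?"            # main-chain connectivity, e.g. ->6)-
    r"(?:(?P<anomer>alpha|beta|a|b)-)?"   # anomeric form
    r"(?:(?P<enantiomer>[LD])-)?"         # enantiomeric form
    r"(?P<code>[A-Z][a-z]{2})"            # 3-letter code in sentence case
    r"(?P<ring>[fp])"                     # ring form code
    r"(?P<modification>\w*)$"             # optional suffix such as NAc
)


def parse_residue(token):
    """Split one residue token into fields, filling in the documented defaults."""
    match = RESIDUE_PATTERN.match(token)
    if match is None:
        raise ValueError(f"unrecognized residue token: {token!r}")
    fields = match.groupdict()
    anomer = fields["anomer"] or "alpha"
    return {
        "link": int(fields["link"] or 4),
        "anomer": {"a": "alpha", "b": "beta"}.get(anomer, anomer),
        "enantiomer": fields["enantiomer"] or "D",
        "code": fields["code"],
        "ring": "furanose" if fields["ring"] == "f" else "pyranose",
        "modification": fields["modification"],
    }


for token in ("Glcp", "->6)-Glcp", "b-D-Galp", "a-L-Fucp"):
    print(token, parse_residue(token))
```

Only the three-letter code and the ring form are mandatory in the pattern, mirroring the two fields marked as required in the list.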

### General Residue Information
When you print a `Pose` containing carbohydrate residues, the sugar residues will be listed as `Z` in the sequence.

In [None]:
print("maltotriose\n", maltotriose)
print("\nisomaltose\n", isomaltose)
print("\nlactose\n", lactose)

However, you can have Rosetta print out the sequences for individual chains, using the `chain_sequence()` method. If you do this, Rosetta is smart enough to give you a distinct sequence format for saccharide chains. (You may have noticed that the default file name for a `.pdb` file created from this `Pose` will be the same sequence.)

In [None]:
print(maltotriose.chain_sequence(1))

In [None]:
print(isomaltose.chain_sequence(1))

In [None]:
print(lactose.chain_sequence(1))

<p>Again, the standard is to show the sequence of a saccharide chain in reverse order from how the residues are numbered. This is also how phi, psi, and omega are defined: from residue i+1 to residue i.</p>

In [None]:
for res in lactose.residues: print(res.seqpos(), res.name())

<p>Notice that for polysaccharides, the upstream residue is called the <b>reducing end</b>, while the downstream residue is called the <b>non-reducing end</b>.</p>

You will also see the terms parent and child being used across Rosetta. Here, for residue 2, residue 1 is the parent. For residue 1, residue 2 is the child. Due to branching, residues can have more than one child (non-reducing end), but only a single parent residue.
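The reverse-order convention and the parent relationship can be sketched for a linear chain with a few lines of plain Python. The helper `number_linear_residues` is hypothetical and handles only linear sequences in which every linkage is written out in full, such as the lactose sequence used above.

```python
import re


def number_linear_residues(iupac_sequence):
    """Map residue numbers to tokens for a linear, fully linked sequence."""
    tokens = re.split(r"-\(\d->\d\)-", iupac_sequence)
    # The rightmost token is the reducing end and becomes residue 1.
    return {i: token for i, token in enumerate(reversed(tokens), start=1)}


numbering = number_linear_residues("b-D-Galp-(1->4)-a-D-Glcp")
for seqpos, token in numbering.items():
    parent = seqpos - 1 if seqpos > 1 else None  # valid for linear chains only
    print(seqpos, token, "parent:", parent)
```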

<p>Rosetta stores carbohydrate-specific information within `ResidueType`. If you print a residue, this additional information will be displayed.</p>

In [None]:
print(glucose.residue(1))

<ul><li>Scanning the output from printing a glucose `Residue`, what is the general term for an aldose with six carbon atoms?</li></ul>

## Exploring Carbohydrate Structure

### Torsion Angles

<p>Most biopolymers have predefined, named torsion angles for their main-chain and side-chain bonds, such as φ, ψ, and ω and the various χs for amino acid residues. The same is true for saccharide residues. The torsion angles of sugars are as follows:</p>
<ul><div class="figure-right" style="clear:right;float:right;width:300px"><img src="./Media/Fig-saccharide_main_chain_torsions.png" width="300px"><small><b>Figure 3.</b> A disaccharide's main-chain torsion angles.</small></div><div class="figure-right" style="clear:right;float:right;width:150px"><img src="./Media/Fig-saccharide_ring_torsions.png" width="150px"><small><b>Figure 4.</b> A monosaccharide's internal ring torsion angles.</small></div><div class="figure-right" style="clear:right;float:right;width:150px"><img src="./Media/Fig-saccharide_side_chain_torsions.png" width="150px"><small><b>Figure 5.</b> A monosaccharide's side-chain torsion angles.</small></div>
<li>φ: The 1<sup>st</sup> glycosidic torsion back to the <em>previous</em> (<i>n</i>&minus;1) residue. The angle is defined by the cyclic oxygen, the two atoms across the bond, and the carbon at the glycosidic linkage position. For aldopyranoses, φ(<i>n</i>) is thus defined as O5(<i>n</i>)–C1(<i>n</i>)–O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1), where <i>X</i> is the position of the glycosidic linkage. For aldofuranoses, φ(<i>n</i>) is defined as O4(<i>n</i>)–C1(<i>n</i>)–O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1). For 2-ketopyranoses, φ(<i>n</i>) is defined as O6(<i>n</i>)–C2(<i>n</i>)–O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1). For 2-ketofuranoses, φ(<i>n</i>) is defined as O5(<i>n</i>)–C2(<i>n</i>)–O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1). <i>Et cetera</i>&hellip;</li>
<li>ψ: The 2<sup>nd</sup> glycosidic torsion back to the <em>previous</em> (<i>n</i>&minus;1) residue. The angle is defined by the anomeric carbon, the two atoms across the bond, and the carbon numbered one less than the glycosidic linkage position. ψ(<i>n</i>) is thus defined as C<sub>anomeric</sub>(<i>n</i>)–O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>&minus;1(<i>n</i>&minus;1), where <i>X</i> is the position of the glycosidic linkage.</li>
<li>ω: The 3<sup>rd</sup> (and any subsequent) glycosidic torsion(s) back to the <em>previous residue</em>. ω<sub>1</sub>(<i>n</i>) is defined as O<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>(<i>n</i>&minus;1)–C<i>X</i>&minus;1(<i>n</i>&minus;1)–C<i>X</i>&minus;2(<i>n</i>&minus;1), where <i>X</i> is the position of the glycosidic linkage. (This only applies to sugars with exocyclic connectivities.) The connection in Figure 3 has an exocyclic carbon, but the other potential connection points do not, so only φ and ψ would be available as backbone torsion angles for those connection points.</li>
<li>ν<sub>1</sub> – ν<sub><i>n</i></sub>: The internal ring torsion angles, where <i>n</i> is the number of atoms in the ring. ν<sub>1</sub> defines the torsion across bond C1–C2, <i>etc</i>.</li>
<li>χ<sub>1</sub> – χ<sub><i>n</i></sub>: The side-chain torsion angles, where <i>n</i> is the number of carbons in the sugar residue. The angle is defined by the carbon numbered one less than the carbon bearing the side chain, the two atoms across the bond, and the polar hydrogen. The cyclic oxygen counts as carbon 0. For an aldopyranose, χ<sub>1</sub> is thus defined by O5–C1–O1–HO1, and χ<sub>2</sub> is defined by C1–C2–O2–HO2. χ<sub>5</sub> is defined by C4–C5–C6–O6, because it rotates the exocyclic carbon rather than twists the ring. χ<sub>6</sub> is defined by C5–C6–O6–HO6.</li></ul>

Take special note of how φ, ψ, and ω are defined <em>in the reverse order</em> as the angles of the same names for amino acid residues!
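Each of the torsions defined above is a dihedral angle over four atoms. As a minimal, PyRosetta-independent sketch, the function below computes a signed dihedral from four Cartesian points using the standard `atan2` formulation. The coordinates are made up and merely stand in for the four atoms that define φ in an aldopyranose.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle (degrees) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates standing in for O5(n), C1(n), OX(n-1), CX(n-1).
o5, c1, ox, cx = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
phi = dihedral(o5, c1, ox, cx)
print(round(phi, 1))
```

Swapping in the atom quartet for ψ or ω from the definitions above gives the corresponding angle; only the choice of four atoms changes.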

The `chi()` method of `Pose` works with sugar residues in the same way that it works with amino acid residues, where the first argument is the χ subscript and the second is the residue number of the `Pose`.

In [None]:
galactose.chi(1, 1)


In [None]:
galactose.chi(2, 1)

In [None]:
galactose.chi(3, 1)

In [None]:
galactose.chi(4, 1)

In [None]:
galactose.chi(5, 1)

In [None]:
galactose.chi(6, 1)

Likewise, we can use `set_chi()` to change these torsion angles and observe the changes in 
PyMOL, setting the option to keep history to true.

In [None]:
from pyrosetta.rosetta.protocols.moves import AddPyMOLObserver

In [None]:
observer = AddPyMOLObserver(galactose, True)

In [None]:
pm.apply(galactose)

<ul><li>Perform the following torsion angle changes to galactose using `set_chi()` and observe which torsions move in PyMOL.<ul><li>Set χ<sub>1</sub> to 120&deg;.</li><li>Set χ<sub>2</sub> to 60&deg;.</li><li>Set χ<sub>3</sub> to 60&deg;.</li><li>Set χ<sub>4</sub> to 0&deg;.</li><li>Set χ<sub>5</sub> to 60&deg;.</li><li>Set χ<sub>6</sub> to &minus;60&deg;.</li></ul></li></ul>

In [None]:
galactose.set_chi(1, 1, 120)  # chi index 1, residue 1, angle in degrees

In [None]:
YOUR-CODE-HERE

### Creating Saccharides from a PDB file

The `phi()`, `set_phi()`, `psi()`, `set_psi()`, `omega()`, and `set_omega()` methods of `Pose` also work with sugars. However, since `pose_from_saccharide_sequence()` may create a `Pose` with angles that cause the residues to wrap around onto each other, let's instead reload some `Pose`s from `.pdb` files.

In [None]:
maltotriose = pose_from_file('inputs/glycans/maltotriose.pdb')
isomaltose = pose_from_file('inputs/glycans/isomaltose.pdb')

<ul><li>Now, try out the torsion angle getters and setters for the glycosidic bonds.</li></ul>

In [None]:
pm.apply(maltotriose)

In [None]:
maltotriose.phi(1)

In [None]:
maltotriose.psi(1)

In [None]:
maltotriose.phi(2)

In [None]:
maltotriose.psi(2)

In [None]:
maltotriose.omega(2)

In [None]:
maltotriose.phi(3)

In [None]:
maltotriose.psi(3)

<p>Notice how φ<sub>1</sub> and ψ<sub>1</sub> are undefined; the first residue is not connected to anything.

In [None]:
observer = AddPyMOLObserver(maltotriose, True)

In [None]:
for i in (2, 3):
    maltotriose.set_phi(i, 180)
    maltotriose.set_psi(i, 180)

**Isomaltose** is composed of (1→6) linkages, so in this case omega torsions are defined. Get and set φ<sub>2</sub>, ψ<sub>2</sub>, and ω<sub>2</sub> for isomaltose.</p>

In [None]:
observer = AddPyMOLObserver(isomaltose, True)

In [None]:
YOUR-CODE-HERE

<p>Any cyclic residue also stores its ν angles.</p>

In [None]:
pm.apply(glucose)

In [None]:
Glc1 = glucose.residue(1)

In [None]:
for i in range(1, 6): print(Glc1.nu(i))

<p>However, we generally care more about the ring conformation of a cyclic residue&rsquo;s rings; in this case, that is its only ring, which has an index of 1. (The output values here are the ideal angles, not the actual angles, which we viewed above.)</p>

In [None]:
print(Glc1.ring_conformer(1))

### RingConformers

<p>The output above warrants a brief explanation. First, what does `4C1` mean? Most of us likely remember learning about chair and boat conformations in Organic Chemistry. Do you recall how there are two distinct chair conformations that can interconvert between each other? The names for these specific conformations are <sup>4</sup>C<sub>1</sub> and <sup>1</sup>C<sub>4</sub>. The nomenclature is as follows: Superscripts to the left of the capital letter are above the plane of the ring if it is oriented such that its carbon atoms proceed in a clockwise direction when viewed from above. Subscripts to the right of the letter are below the plane of the ring. The letter itself is an abbreviation, where, for example, C indicates a chair conformation and B a boat conformation. In all, there are 38 different ideal ring conformations that any six-membered cycle can take.</p><p>`C-P parameters` refers to the Cremer&ndash;Pople parameters for this conformation (Cremer D, Pople JA. J Am Chem Soc. 1975;97:1354–1358.). C&ndash;P parameters are an alternative coordinate system used to refer to a ring conformation.</p><p>
    
Finally, a `RingConformer` in Rosetta includes the values of the ν angles. Each conformer has a unique set of angles.
    
`Pose::set_nu()` does not exist, because it would rip a ring apart. Instead, to change a ring conformation, we need to use the `set_ring_conformation()` method, which takes a `RingConformer` object. Most of the time, you will not need to adjust the ring conformers, but you should be aware of them.

We can ask a cyclic `ResidueType` for one of its `RingConformerSet`s to give us the `RingConformer` we want. (Each `RingConformerSet` includes the list of possible idealized ring conformers that such a ring can attain as well as information about the most energetically favorable one.) Then, we can set the conformation for our residue through `Pose`. (The arguments for `set_ring_conformation()` are the `Pose`’s sequence position, ring number, and the new conformer, respectively.)</p>

<ul><div class="figure-center" style="clear:middle;float:middle;width:300px"><img src="./Media/chair_figure.tif" width="300px"><medium><b>Figure 6.</b> The two chair conformations of α-d-glucopyranose. In the <sup>1</sup>C<sub>4</sub> conformation (left), all of the substituents are axial; in the <sup>4</sup>C<sub>1</sub> conformation (right), they are equatorial. <sup>4</sup>C<sub>1</sub> is the most stable conformation for the majority of the α-d-aldohexopyranoses. In this nomenclature, a superscript means that the numbered carbon is above the ring when the atoms are arranged clockwise from C1. A subscripted number indicates a carbon below the plane of the ring.</medium></div></ul>

In [None]:
ring_set = Glc1.type().ring_conformer_set(1)

In [None]:
conformer = ring_set.get_ideal_conformer_by_name('1C4')

In [None]:
glucose.set_ring_conformation(1, 1, conformer)

In [None]:
pm.apply(glucose)
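The naming convention and the figure of 38 ideal conformers can be checked with a short sketch. The per-family counts below are the standard textbook breakdown for six-membered rings and are not read from Rosetta's database; the `describe_conformer` helper is hypothetical and only handles names shaped like `4C1` and `1C4`.

```python
import re

# Counts of idealized six-membered ring conformers per family
# (standard breakdown; an assumption relative to Rosetta's own tables).
SIX_MEMBERED_FAMILIES = {
    "C": ("chair", 2),
    "B": ("boat", 6),
    "S": ("skew boat", 6),
    "H": ("half chair", 12),
    "E": ("envelope", 12),
}


def describe_conformer(name):
    """Split a name shaped like '4C1' into its three parts."""
    match = re.fullmatch(r"(\d?)([CBSHE])(\d?)", name)
    if match is None:
        raise ValueError(f"unsupported conformer name: {name!r}")
    above, letter, below = match.groups()
    return {"above": above, "shape": SIX_MEMBERED_FAMILIES[letter][0], "below": below}


total = sum(count for _, count in SIX_MEMBERED_FAMILIES.values())
print(total)
print(describe_conformer("4C1"))
print(describe_conformer("1C4"))
```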

## Modified Sugars, Branched Oligosaccharides, &amp; `.pdb` File `LINK` Records

<p>Modified sugars can also be created in Rosetta, either from sequence or from file. In the former case, simply use the proper abbreviation for the modification after the “ring form code”. For example, the abbreviation for an <i>N</i>-acetyl group is “NAc”. Note the <i>N</i>-acetyl group in the PyMOL window.</p>

In [None]:
LacNAc = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-GlcpNAc')
pm.apply(LacNAc)

<p>Rosetta can handle branched oligosaccharides as well, but when loading from a sequence, this requires the use of brackets, which is the standard IUPAC notation. For example, here is how one would load Lewis<sup>x</sup> (Le<sup>x</sup>), a common branched glyco-epitope, into Rosetta by sequence.</p>

In [None]:
Lex = pose_from_saccharide_sequence('b-D-Galp-(1->4)-[a-L-Fucp-(1->3)]-D-GlcpNAc')
pm.apply(Lex)

One can also load branched carbohydrates from a `.pdb` file. These `.pdb` files must include `LINK` records, which are a standard part of the PDB format. Open the `inputs/glycans/Lex.pdb` file and look near the top to see an example `LINK` record, which looks like this:

```LINK         O3  Glc A   1                 C1  Fuc B   1     1555   1555  1.5   ```

It tells us that there is a covalent linkage between O3 of glucose A1 and C1 of fucose B1 with a bond length of 1.5 Å. (The `1555`s indicate symmetry and are ignored by Rosetta.)
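To make the fields explicit, here is a minimal sketch that splits the example `LINK` line shown above on whitespace. Real PDB files use fixed columns, so whitespace splitting is a simplification that happens to work for this line; `parse_link_record` is a hypothetical helper, not part of PyRosetta.

```python
def parse_link_record(line):
    """Parse the simplified LINK line shown above by splitting on whitespace."""
    fields = line.split()
    if len(fields) != 12 or fields[0] != "LINK":
        raise ValueError("not a LINK record in the expected layout")
    (_, atom1, res1, chain1, num1,
     atom2, res2, chain2, num2, sym1, sym2, length) = fields
    return {
        "first": (atom1, res1, chain1, int(num1)),
        "second": (atom2, res2, chain2, int(num2)),
        "symmetry": (sym1, sym2),   # ignored by Rosetta, per the text
        "length": float(length),    # bond length in angstroms
    }


record = parse_link_record(
    "LINK         O3  Glc A   1                 C1  Fuc B   1     1555   1555  1.5   "
)
print(record)
```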

Note that if the `LINK` records are not in order, or the `HETNAM` records are not in a Rosetta format, the file will fail to load. In the next tutorial we will use auto-detection to handle this. For now, we know `Lex.pdb` will load OK.

In [None]:
Lex = pose_from_file('inputs/glycans/Lex.pdb')
pm.apply(Lex)

You may notice when viewing the structure in PyMOL that the hybridization of the carbonyl of the amido functionality of the <i>N</i>-acetyl group is wrong. This is because of an error in the model deposited in the PDB from which this file was generated. This is, unfortunately, a very common problem with sugar structures found in the PDB.  It is always useful to use http://www.glycosciences.de to identify any errors in the solution PDB structure before working with them in Rosetta.  The referenced paper, __Automatically Fixing Errors in Glycoprotein Structures with Rosetta__ can be used as a guide to fixing these.

You may also have noticed that the `inputs/glycans/Lex.pdb` file indicated in its `HETNAM` records that Glc1 was actually an <i>N</i>-acetylglucosamine (GlcNAc) with the indication `2-acetylamino-2-deoxy-`. This is optional and is helpful for human-readability, but Rosetta only needs to know the base `ResidueType` of each sugar residue; the specific `VariantType`s needed (most sugar modifications are treated as `VariantType`s) are determined automatically from the atom names in the `HETATM` records for the residue. Anything after the comma is ignored.

<ul><li>Print out the `Pose` to see how the `FoldTree` is defined.</li></ul>

In [None]:
YOUR-CODE-HERE

Note the `CHEMICAL` `Edge` (`-2`). This is Rosetta’s way of indicating a branch backbone connection. Unlike a standard `POLYMER` `Edge` (`-1`), this one tells you which atoms are involved.

<ul><li>Print out the sequence of each chain.</li></ul>

<ul><li>Now print out information about each residue in the `Pose` to see which `VariantType`s and `ResidueProperty`s are assigned to each.</li></ul>

<ul><li>What are the three `VariantType`s of residue 1?</li><li>Output the various torsion angles and make sure that you understand to which angles they correspond.</li></ul>

<p>Can you see now why φ and ψ are defined the way they are? If they were defined as in AA residues, they would not have unique definitions, since GlcNAc is a branch point. A monosaccharide can have multiple children, but it can never have more than a single parent.</p><p>Note that for this oligosaccharide χ<sub>3</sub>(1) is equivalent to ψ(3) and χ<sub>4</sub>(1) is equivalent to ψ(2). Make sure that you understand why!</p>

In [None]:
Lex.chi(3, 1), Lex.psi(3)

In [None]:
Lex.chi(4, 1), Lex.psi(2)
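The single-parent, multiple-children rule behind these equivalences can be sketched with a plain dictionary. The residue numbers and linkage positions below are inferred from the Lewis x sequence and the χ/ψ equivalences stated above, so treat them as an assumption about this particular `Pose`; the data structure itself is illustrative and is not how Rosetta stores the tree.

```python
from collections import defaultdict

# child residue -> (parent residue, oxygen position on the parent)
# Numbering assumed from the chi/psi equivalences discussed above.
PARENT_OF = {
    2: (1, 4),  # Gal attached at O4 of GlcNAc
    3: (1, 3),  # Fuc attached at O3 of GlcNAc
}


def children_of(parent_map):
    """Invert a child -> parent map into a parent -> children map."""
    children = defaultdict(list)
    for child, (parent, position) in sorted(parent_map.items()):
        children[parent].append((child, position))
    return dict(children)


tree = children_of(PARENT_OF)
for child, position in tree[1]:
    # chi_<position> of the parent and psi of the child describe the same bond
    print(f"chi({position}, 1) <-> psi({child})")
```

Because each child maps to exactly one parent, a torsion named from the child's side is always unique, whereas naming it from the parent's side would be ambiguous at a branch point.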

<p>For chemically modified sugars, χ angles are redefined at the positions where substitution has occurred. For new χs that have come into existence from the addition of new atoms and bonds, new definitions are added to new indices. For example, for Glc<i>N</i><sup>2</sup>Ac residue 1, χ<sub>C2–N2–C′–Cα′</sub> is accessed through `chi(7, 1)`.</p>

In [None]:
Lex.chi(2, 1)

In [None]:
Lex.set_chi(2, 1, 180)

In [None]:
pm.apply(Lex)

In [None]:
Lex.chi(7, 1)

In [None]:
Lex.set_chi(7, 1, 0)

In [None]:
pm.apply(Lex)

<ul><li>Play around with getting and setting the various torsion angles for Le<sup>x</sup></li></ul>

## <i>N</i>- and <i>O</i>-Linked Glycans

<p>Branching does not have to occur at sugars; a glycan can be attached to the nitrogen of an ASN or the oxygen of a SER or THR. <i>N</i>-linked glycans themselves tend to be branched structures.</p>  

We will cover more on linked glycan trees in the next tutorial through the `GlycanTreeSet` object - which is always present in a pose that has carbohydrates.

In [None]:
N_linked = pose_from_file('inputs/glycans/N-linked_14-mer_glycan.pdb')
pm.apply(N_linked)
print(N_linked)

In [None]:
for i in range(4): print(N_linked.chain_sequence(i + 1))

<ul><li>Which residue number is glycosylated above?</li></ul>

In [None]:
O_linked = pose_from_file('inputs/glycans/O_glycan.pdb')
pm.apply(O_linked)

<ul><li>Print `O_linked` and the sequence of each of its chains.</li></ul>

`set_phi()` and `set_psi()` still work when a glycan is linked to a peptide. (Below, we use `pdb_info()` to help us select the residue that we want. In this case, in the `.pdb` file, the glycan is chain B.)

In [None]:
N_linked.set_phi(N_linked.pdb_info().pdb2pose("B", 1), 180)
pm.apply(N_linked)

<ul><li>Set ψ(B1) to 0&deg; and ω(B1) to 90&deg; and view the results in PyMOL.</li></ul>

<p>Notice that in this case ψ and ω affect the side-chain torsions (χs) of the asparagine residue. This is another case where there are multiple ways of both naming and accessing the same specific torsion angles.</p><p>One can also create conjugated glycans from sequences in steps: first create the peptide portion by loading from a `.pdb` file or from sequence, and then use the `glycosylate_pose()` function (which needs to be imported first). For example, to glycosylate an ASA peptide with a single glucose at position 2 of the peptide, we perform the following:</p>

### Glycosylation by function

Here, we will glycosylate a simple peptide using the function, `glycosylate_pose`.  In the next tutorial, we will use a Mover interface to this function.

In [None]:
peptide = pose_from_sequence('ASA')

In [None]:
pm.apply(peptide)

In [None]:
from pyrosetta.rosetta.core.pose.carbohydrates import glycosylate_pose, glycosylate_pose_by_file

In [None]:
glycosylate_pose(peptide, 2, 'Glcp')
pm.apply(peptide)

Here, we used the main function to glycosylate a pose. In the next tutorial, we will use a Mover interface to do so.

<p>It is also possible to glycosylate a pose with common glycans found in the database. These files end in the `.iupac`
extension and are simply IUPAC sequences just as we have been using throughout this chapter.</p>

Here is a list of some common `.iupac` files.

```
bisected_fucosylated_N-glycan_core.iupac
bisected_N-glycan_core.iupac
common_names.txt
core_1_O-glycan.iupac
core_2_O-glycan.iupac
core_3_O-glycan.iupac
core_4_O-glycan.iupac
core_5_O-glycan.iupac
core_6_O-glycan.iupac
core_7_O-glycan.iupac
core_8_O-glycan.iupac
fucosylated_N-glycan_core.iupac
high-mannose_N-glycan_core.iupac
hybrid_bisected_fucosylated_N-glycan_core.iupac
hybrid_bisected_N-glycan_core.iupac
hybrid_fucosylated_N-glycan_core.iupac
hybrid_N-glycan_core.iupac
man5.iupac
man9.iupac
N-glycan_core.iupac
```

In [None]:
peptide = pose_from_sequence('ASA'); pm.apply(peptide)

In [None]:
glycosylate_pose_by_file(peptide, 2, 'core_5_O-glycan')
pm.apply(peptide)

## Conclusion

You now have a grasp of the basics of RosettaCarbohydrates. Please continue on to the next tutorial for more on glycan residue selection and various movers that can be of use when working with glycans.

**Chapter contributors:**

- Jared Adolf-Bryfogle (Scripps; Institute for Protein Innovation)
- Jason Labonte (Johns Hopkins; Franklin and Marshall College)
