{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n", "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [RosettaAntibodyDesign](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/12.02-RosettaAntibodyDesign-RAbD.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [RosettaCarbohydrates: Trees, Selectors and Movers](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/13.01-Glycan-Trees-Selectors-and-Movers.ipynb) >

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# RosettaCarbohydrates\n", "Keywords: carbohydrate, glycan, sugar, glucose, mannose, GlycanTreeSet, saccharide, furanose, pyranose, aldose, ketose\n", "\n", "## Overview\n", "\n", "In this chapter, we will focus on a special subset of non-peptide oligo- and polymers: carbohydrates.

\n", "\n", "Modeling carbohydrates (also known as saccharides, glycans, or simply sugars) comes with some special challenges. For one, most saccharide residues contain a ring as part of their backbone, and this ring adds degrees of freedom (ring conformations) that may need to be sampled. Additionally, carbohydrate structures are often branched, which in Rosetta leads to more complicated `FoldTrees`.\n", " \n", "This chapter includes a quick overview of carbohydrate nomenclature, structure, and basic interactions within Rosetta.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Carbohydrate Chemistry Background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Figure 1. A pyranose (left) and a furanose (right).
\n", "

Sugars (saccharides) are defined as hydroxylated aldehydes and ketones. A typical monosaccharide has equal numbers of carbon and oxygen atoms; for example, glucose has the molecular formula C₆H₁₂O₆.
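As a quick check of this rule, here is a minimal plain-Python sketch that counts the atoms in a simple molecular formula and compares the carbon and oxygen counts. The helper `atom_counts` is hypothetical (it is not part of PyRosetta). Deoxyribose (C₅H₁₀O₄) is included as a familiar exception to the equal-count rule.

```python
import re

def atom_counts(formula):
    # Split a simple formula such as C6H12O6 into element symbols and counts;
    # a missing count means a single atom
    counts = {}
    for element, number in re.findall('([A-Z][a-z]?)([0-9]*)', formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

for name, formula in [('glucose', 'C6H12O6'), ('ribose', 'C5H10O5'), ('deoxyribose', 'C5H10O4')]:
    counts = atom_counts(formula)
    # True when the molecule has as many oxygens as carbons
    print(name, counts, counts['C'] == counts['O'])
```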

Sugars containing more than three carbons will spontaneously cyclize in aqueous environments to form five- or six-membered hemiacetals and hemiketals. Sugars with five-membered rings are called furanoses; those with six-membered rings are called pyranoses (Fig. 1).
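The ring size follows from simple counting: the ring contains every carbon from the carbonyl carbon through the carbon whose hydroxyl attacks it, plus that hydroxyl's oxygen. The hypothetical helpers below (not PyRosetta functions) sketch this count and map the result to the `f`/`p` ring-form letters that Rosetta uses in saccharide sequences.

```python
def ring_size(carbonyl_carbon, hydroxyl_carbon):
    # Ring atoms: the carbons from the carbonyl carbon through the carbon
    # bearing the attacking hydroxyl, plus that hydroxyl's oxygen
    return (hydroxyl_carbon - carbonyl_carbon + 1) + 1

def ring_form(carbonyl_carbon, hydroxyl_carbon):
    # f = furanose (5-membered ring), p = pyranose (6-membered ring)
    size = ring_size(carbonyl_carbon, hydroxyl_carbon)
    return {5: 'f', 6: 'p'}.get(size, '?')

print(ring_form(1, 5))  # glucose: C1 aldehyde + O5 -> p
print(ring_form(1, 4))  # glucose: C1 aldehyde + O4 -> f
print(ring_form(2, 5))  # fructose: C2 ketone + O5 -> f
```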

\n", "
Figure 2. An aldose (left) and a ketose (right).
\n", "

A sugar is classified as an aldose or ketose, depending on whether it has an aldehyde or ketone in its linear form (Fig. 2).
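Combining the carbonyl position with the number of carbons gives the usual class names; for example, glucose is an aldohexose and fructose is a ketohexose. The sketch below assumes that the aldehyde carbon is C1 and treats any other carbonyl position as a ketone. `classify` is a hypothetical illustration, not a Rosetta call.

```python
PREFIXES = {3: 'tri', 4: 'tetr', 5: 'pent', 6: 'hex', 7: 'hept'}

def classify(n_carbons, carbonyl_carbon):
    # An aldehyde sits at C1 (aldose); a ketone sits at an internal carbon,
    # usually C2 (ketose)
    kind = 'aldo' if carbonyl_carbon == 1 else 'keto'
    return kind + PREFIXES[n_carbons] + 'ose'

print(classify(6, 1))  # glucose -> aldohexose
print(classify(6, 2))  # fructose -> ketohexose
print(classify(5, 1))  # ribose -> aldopentose
```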

\n", "

Individual sugars are named according to the stereochemistry at each of their chiral carbon atoms. For example, D-glucose and D-mannose differ only in the configuration at C2, while D-glucose and D-galactose differ only at C4.
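One way to make this concrete is to compare the R/S configurations at C2 through C5 of the open-chain D-aldohexoses. The dictionaries below hold the standard CIP descriptors. `differing_centers` is a hypothetical helper that reports which carbons differ. Sugars that differ at exactly one stereocenter are called epimers.

```python
# R/S configurations at the stereocenters C2-C5 of open-chain D-aldohexoses
D_GLUCOSE = {2: 'R', 3: 'S', 4: 'R', 5: 'R'}
D_MANNOSE = {2: 'S', 3: 'S', 4: 'R', 5: 'R'}
D_GALACTOSE = {2: 'R', 3: 'S', 4: 'S', 5: 'R'}

def differing_centers(a, b):
    # Carbons whose configuration differs between the two sugars
    return sorted(c for c in a if a[c] != b[c])

print(differing_centers(D_GLUCOSE, D_MANNOSE))    # [2]: C2 epimers
print(differing_centers(D_GLUCOSE, D_GALACTOSE))  # [4]: C4 epimers
```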

\n", "

In addition to their full names, many individual saccharide residues have three-letter codes, just like amino acid residues do. Glucose is \"Glc\" and mannose is \"Man\".

\n", "\n", "## Backbone Torsions, Residue Connections, and Side Chains\n", "\n", "A glycan tree is made up of many sugar residues, each of which contains a ring. The 'backbone' of a glycan is the connection between one residue and the next. The chemical makeup of each sugar residue in this 'linkage' affects the propensity (energy) of each backbone dihedral angle. In addition, a sugar can be attached through different carbons of its parent residue, so both the chemical makeup and the attachment position affect the dihedral propensities. Typically there are two backbone dihedral angles, but there can be four or more, depending on the connection.\n", "\n", "Following IUPAC convention, the dihedrals of residue N are the dihedrals between N and N-1 (i.e., its linkage to its parent). The dihedrals of the glycosylated protein residue (Asn, or whichever residue carries the glycan) become part of the first glycan residue attached to it. That first glycan residue therefore has four torsions, while the Asn now has none!\n", "\n", "If you are creating a MoveMap for glycan residues, please use the `MoveMapFactory`. It has the IUPAC nomenclature of glycan residues built in, which allows proper sampling of the backbone degrees of freedom, especially for branching glycan trees. In general, all of our samplers should take residue selectors and use the `MoveMapFactory` to build MoveMaps internally.\n", "\n", "A sugar's side chains are the substituents of the glycan ring, typically hydroxyl (OH) or acetyl groups. By default, these are sampled together in 60-degree increments during packing. Finer rotamer granularity cannot currently be handled in Rosetta, but 60 degrees seems adequate for our purposes.\n", "\n", "Within Rosetta, glycan connectivity information is stored in the `GlycanTreeSet`, which is continually updated to reflect any residue changes or additions to the pose. 
\n", "This info is always available through the function \n", "\n", "\t\tpose.glycan_tree_set()\n", "\n", "Chemical information of each glycan residue can be accessed through the CarbohydrateInfo object, which is stored in each ResidueType object: \n", "\n", "\t\tpose.residue_type(i).carbohydrate_info()\n", " \n", "We will cover both of these classes in the next tutorial.\n", "\n", "## Documentation\n", "https://www.rosettacommons.org/docs/latest/application_documentation/carbohydrates/WorkingWithGlycans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "\n", "

Let's use PyRosetta to compare some common monosaccharide residues and see how they differ. As usual, we start by importing the `pyrosetta` and `rosetta` namespaces.

" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "!pip install pyrosettacolabsetup\n", "import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()\n", "import pyrosetta; pyrosetta.init()\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pyrosetta import *\n", "from pyrosetta.teaching import *\n", "from pyrosetta.rosetta import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need the `-include_sugars` option, which tells Rosetta to load sugars and to add the `sugar_bb` energy term to the default score function. This score term is analogous to `rama`, but it applies to the glycosidic dihedrals that connect sugar residues. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.39+release.93456a567a8125cafdf7f8cb44400bc20b570d81 2019-09-26T14:24:44] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. 
Created in JHU by Sergey Lyskov and PyRosetta Team.\n", "\u001b[0mcore.init: \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n", "\u001b[0mcore.init: \u001b[0mReading fconfig.../Users/jadolfbr/.rosetta/flags/common\n", "\u001b[0mcore.init: \u001b[0m\n", "\u001b[0mcore.init: \u001b[0m\n", "\u001b[0mcore.init: \u001b[0mRosetta version: PyRosetta4.Release.python36.mac r233 2019.39+release.93456a567a8 93456a567a8125cafdf7f8cb44400bc20b570d81 http://www.pyrosetta.org 2019-09-26T14:24:44\n", "\u001b[0mcore.init: \u001b[0mcommand: PyRosetta -include_sugars -database /Users/jadolfbr/Library/Python/3.6/lib/python/site-packages/pyrosetta-2019.39+release.93456a567a8-py3.6-macosx-10.6-intel.egg/pyrosetta/database\n", "\u001b[0mbasic.random.init_random_generator: \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=1177525307 seed_offset=0 real_seed=1177525307\n", "\u001b[0mbasic.random.init_random_generator: \u001b[0mRandomGenerator:init: Normal mode, seed=1177525307 RG_type=mt19937\n" ] } ], "source": [ "init('-include_sugars')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When loading structures from the PDB that include glycans, we use the options below. One of them writes structures out in PDB format instead of the (better) Rosetta format. We will be using these options in the next tutorial.\n", "\n", "\t\t-maintain_links\n", "\t\t-auto_detect_glycan_connections\n", "\t\t-alternate_3_letter_codes pdb_sugar\n", "\t\t-write_glycan_pdb_codes\n", "\t\t-load_PDB_components false" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "pm = PyMOLMover()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Saccharides from Sequence\n", "\n", "We will use the function `pose_from_saccharide_sequence()`, which must be imported from the `core.pose` namespace. 
Unlike with peptide chains, one-letter codes will not suffice for specifying saccharide chains, because there is too much information to convey; we must use at least four letters. The first three letters are the sugar's three-letter code; the fourth letter designates whether the residue is a furanose (`f`) or pyranose (`p`)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from pyrosetta.rosetta.core.pose import pose_from_saccharide_sequence" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.chemical.GlobalResidueTypeSet: \u001b[0mFinished initializing fa_standard residue type set. Created 1251 residue types\n", "\u001b[0mcore.chemical.GlobalResidueTypeSet: \u001b[0mTotal time to initialize 1.25647 seconds.\n", "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n", "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n", "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n" ] } ], "source": [ "glucose = pose_from_saccharide_sequence('Glcp')\n", "galactose = pose_from_saccharide_sequence('Galp')\n", "mannose = pose_from_saccharide_sequence('Manp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### L and D Forms\n", "\n", "

Just like peptides, saccharides come in two enantiomeric forms, labelled L and D (set in small capitals in print). These can be loaded into PyRosetta using the prefixes `L-` and `D-`.
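Whereas epimers differ at a single stereocenter, enantiomers differ at all of them. (The D/L label itself is assigned from the stereocenter farthest from the carbonyl, which is C5 in a hexose.) Here is a minimal sketch, assuming the open-chain R/S descriptors of D-glucose and a hypothetical `mirror` helper:

```python
# Open-chain R/S configurations of D-glucose at C2-C5
D_GLUCOSE = {2: 'R', 3: 'S', 4: 'R', 5: 'R'}

def mirror(config):
    # The enantiomer (mirror image) inverts every stereocenter
    flip = {'R': 'S', 'S': 'R'}
    return {carbon: flip[label] for carbon, label in config.items()}

L_GLUCOSE = mirror(D_GLUCOSE)
print(L_GLUCOSE)  # {2: 'S', 3: 'R', 4: 'S', 5: 'S'}
```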

" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n" ] } ], "source": [ "L_glucose = pose_from_saccharide_sequence('L-Glcp')\n", "D_glucose = pose_from_saccharide_sequence('D-Glcp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Anomers\n", "\n", "

The carbon that is at a higher oxidation state — that is, the carbon of the hemiacetal/-ketal in the cyclic form or the carbon that is the carbonyl carbon of the aldehyde or ketone in the linear form — is called the anomeric carbon. Because the carbonyl of an aldehyde or ketone is planar, a sugar molecule can cyclize into one of two forms, one in which the resulting hydroxyl group is pointing \"up\" and another in which the same hydroxyl group is pointing \"down\". These two anomers are labelled α and β.
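Because cyclization turns the planar carbonyl carbon into a new stereocenter, each open-chain sugar gives rise to two anomers, which doubles the number of possible stereoisomers. Here is the arithmetic for an aldohexose as a small sketch (`stereoisomer_count` is a hypothetical helper):

```python
def stereoisomer_count(n_stereocenters):
    # Each stereocenter can independently be R or S
    return 2 ** n_stereocenters

linear = stereoisomer_count(4)      # open-chain aldohexose: C2-C5
cyclic = stereoisomer_count(4 + 1)  # pyranose: C1 becomes the anomeric center
print(linear, cyclic)               # 16 32
```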

\n", "" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n" ] } ], "source": [ "alpha_D_glucose = pose_from_saccharide_sequence('a-D-Glcp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear Oligosaccharides & IUPAC Sequences\n", "\n", "Oligo- and polysaccharides are composed of simple monosaccharide residues connected by acetal and ketal linkages called __glycosidic bonds__. Any of the monosaccharide's _hydroxyl_ groups can be used to form a linkage to the anomeric carbon of another monosaccharide, leading to both _linear_ and _branched_ molecules.\n", " \n", "Rosetta can create both _linear_ and _branched_ oligosaccharides from an __IUPAC__ sequence. (IUPAC is the international organization dedicated to chemical nomenclature.)\n", "\n", "

To properly build a linear oligosaccharide, Rosetta must know the following details about each sugar residue, specified in this order:

\n", "\n", " - Main-chain connectivity: →2) (`->2)`), →4) (`->4)`), →6) (`->6)`), _etc._; default value is `->4)-`\n", " - Anomeric form: α (`a` or `alpha`) or β (`b` or `beta`); default value is `alpha`\n", " - Enantiomeric form: L (`L`) or D (`D`); default value is `D`\n", " - Three-letter code: required; uses sentence case\n", " - Ring form code: f (for a furanose/5-membered ring) or p (for a pyranose/6-membered ring); required\n", " \n", " \n", "Residues must be separated by hyphens. Glycosidic linkages can be specified with full IUPAC notation, _e.g._, `-(1->4)-` for “-(1→4)-”. (This means that the residue on the left connects from its C1 (anomeric) position to the hydroxyl oxygen at C4 of the residue on the right.) Rosetta will assume `-(1->` for aldoses and `-(2->` for ketoses.
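To see how these pieces fit together, the sketch below assembles an IUPAC-style string from per-residue fields and applies the defaults listed above (alpha, D, and a linkage to O4). It is a hypothetical helper for illustration, not Rosetta's sequence parser. It handles only linear chains, assumes that the anomeric carbon is C1 unless told otherwise, and writes the residues in the reverse of their numbering, with the last-numbered residue first.

```python
ANOMER_NAMES = {'a': 'alpha', 'b': 'beta'}

def iupac_sequence(residues, long_names=False):
    # residues are listed in numbering order (position 1 first); 'link' is the
    # oxygen position on the previous residue that this residue attaches to,
    # and 'anomeric_carbon' is 1 for aldoses or 2 for ketoses
    parts = []
    for i, res in enumerate(residues):
        anomer = res.get('anomer', 'a')  # default: alpha
        if long_names:
            anomer = ANOMER_NAMES[anomer]
        label = anomer + '-' + res.get('enantiomer', 'D') + '-' + res['code'] + res['ring']
        if i > 0:
            label += '-(' + str(res.get('anomeric_carbon', 1)) + '->' + str(res.get('link', 4)) + ')'
        parts.append(label)
    # IUPAC order is the reverse of the numbering order
    return '-'.join(reversed(parts))

lactose_residues = [{'code': 'Glc', 'ring': 'p'},
                    {'code': 'Gal', 'ring': 'p', 'anomer': 'b'}]
print(iupac_sequence(lactose_residues))                   # b-D-Galp-(1->4)-a-D-Glcp
print(iupac_sequence(lactose_residues, long_names=True))  # beta-D-Galp-(1->4)-alpha-D-Glcp
```

With `long_names=True`, the output has the same form as the chain sequences that `chain_sequence()` prints for the oligosaccharides built in this notebook.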

Note that, by convention, the IUPAC sequence of a saccharide chain is written in the reverse of the order in which Rosetta numbers its residues. Let's create three new oligosaccharides from sequence." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n", "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n", "\u001b[0mcore.pose: \u001b[0m by appending by jump...\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mSetting up Glycan Trees\n", "\u001b[0mcore.conformation.carbohydrates.GlycanTreeSet: \u001b[0mFound 1 glycan trees.\n" ] } ], "source": [ "maltotriose = pose_from_saccharide_sequence('a-D-Glcp-' * 3)\n", "lactose = pose_from_saccharide_sequence('b-D-Galp-(1->4)-a-D-Glcp')\n", "isomaltose = pose_from_saccharide_sequence('->6)-Glcp-' * 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### General Residue Information\n", "When you print a `Pose` containing carbohydrate residues, the sugar residues will be listed as `Z` in the sequence."
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "maltotriose\n", " PDB file name: alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp\n", "Total residues: 3\n", "Sequence: ZZZ\n", "Fold tree:\n", "FOLD_TREE EDGE 1 3 -1 \n", "\n", "isomaltose\n", " PDB file name: alpha-D-Glcp-(1->6)-alpha-D-Glcp\n", "Total residues: 2\n", "Sequence: ZZ\n", "Fold tree:\n", "FOLD_TREE EDGE 1 2 -1 \n", "\n", "lactose\n", " PDB file name: beta-D-Galp-(1->4)-alpha-D-Glcp\n", "Total residues: 2\n", "Sequence: ZZ\n", "Fold tree:\n", "FOLD_TREE EDGE 1 2 -1 \n" ] } ], "source": [ "print(\"maltotriose\\n\", maltotriose)\n", "print(\"\\nisomaltose\\n\", isomaltose)\n", "print(\"\\nlactose\\n\", lactose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, you can have Rosetta print out the sequences for individual chains, using the `chain_sequence()` method. If you do this, Rosetta is smart enough to give you a distinct sequence format for saccharide chains. (You may have noticed that the default file name for a `.pdb` file created from this `Pose` will be the same sequence.)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "alpha-D-Glcp-(1->4)-alpha-D-Glcp-(1->4)-alpha-D-Glcp\n" ] } ], "source": [ "print(maltotriose.chain_sequence(1))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "alpha-D-Glcp-(1->6)-alpha-D-Glcp\n" ] } ], "source": [ "print(isomaltose.chain_sequence(1))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "beta-D-Galp-(1->4)-alpha-D-Glcp\n" ] } ], "source": [ "print(lactose.chain_sequence(1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Again, the sequence of a saccharide chain is written in the reverse of the order in which its residues are numbered.

The phi, psi, and omega torsions are defined in the same direction: the torsions of residue i+1 describe its linkage to residue i. " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 ->4)-alpha-D-Glcp:reducing_end\n", "2 ->4)-beta-D-Galp:non-reducing_end\n" ] } ], "source": [ "for res in lactose.residues: print(res.seqpos(), res.name())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Notice that in an oligo- or polysaccharide, the first (upstream) residue is called the reducing end, while the last (downstream) residue is called the non-reducing end.

\n", "\n", "You will also see the terms parent and child used throughout Rosetta. Here, residue 1 is the parent of residue 2, and residue 2 is the child of residue 1. Because glycans can branch, a residue can have more than one child (and a tree can have more than one non-reducing end), but each residue has only a single parent. \n", "\n", "
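Because every residue has exactly one parent, a simple parent map is enough to describe a branched tree, and the children, the reducing end, and the non-reducing ends can all be derived from it. Here is a plain-Python sketch with a hypothetical four-residue glycan (this is not the `GlycanTreeSet` API):

```python
# Hypothetical branched glycan: each residue maps to its single parent
# (None marks the reducing end); residue 1 carries two branches
parents = {1: None, 2: 1, 3: 1, 4: 3}

def children_of(parents):
    # Invert the parent map; one residue may collect several children
    kids = {res: [] for res in parents}
    for res, parent in parents.items():
        if parent is not None:
            kids[parent].append(res)
    return kids

kids = children_of(parents)
reducing_end = [res for res, parent in parents.items() if parent is None]
non_reducing_ends = [res for res, children in kids.items() if not children]
print(kids)               # {1: [2, 3], 2: [], 3: [4], 4: []}
print(reducing_end)       # [1]
print(non_reducing_ends)  # [2, 4]
```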

Rosetta stores carbohydrate-specific information within `ResidueType`. If you print a residue, this additional information will be displayed.

" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Residue 1: ->4)-alpha-D-Glcp:reducing_end:non-reducing_end (Glc, Z):\n", "Base: ->4)-alpha-D-Glcp\n", " Properties: POLYMER CARBOHYDRATE LOWER_TERMINUS UPPER_TERMINUS POLAR CYCLIC HEXOSE ALDOSE D_SUGAR PYRANOSE ALPHA_SUGAR\n", " Variant types: UPPER_TERMINUS_VARIANT LOWER_TERMINUS_VARIANT\n", " Main-chain atoms: C1 C2 C3 C4 O4 \n", " Backbone atoms: C1 C2 C3 C4 O4 C5 O5 VO5 VC1 H1 H2 H3 H4 HO4 H5 \n", " Ring atoms: C1 C2 C3 C4 C5 O5 \n", " Side-chain atoms: O1 O2 O3 C6 O6 HO1 HO2 HO3 1H6 2H6 HO6\n", "Carbohydrate Properties for this Residue:\n", " Basic Name: glucose\n", " IUPAC Name: alpha-D-glucopyranose\n", " Abbreviation: alpha-D-Glcp\n", " Classification: aldohexose\n", " Stereochemistry: D\n", " Ring Form: pyranose\n", " Anomeric Form: alpha\n", " Modifications: \n", " none\n", " Polymeric Information:\n", " Main chain connection: N/A\n", " Branch connections: none\n", "Ring Conformer: 4C1 (chair): C-P parameters (q, phi, theta): 0.55, 180, 0; nu angles (degrees): 60, -60, 60, -60, 60, -60\n", " O1 : axial\n", " O2 : equatorial\n", " O3 : equatorial\n", " O4 : equatorial\n", " C6 : equatorial\n", "Atom Coordinates:\n", " C1 : 0, 0, 0\n", " C2 : 1.55, 0, 0\n", " C3 : 2.04812, 1.44664, 0\n", " C4 : 1.50806, 2.11919, -1.26369\n", " O4 : 1.94666, 3.46908, -1.30661\n", " C5 : -0.0200415, 2.06186, -1.21358\n", " O5 : -0.475077, 0.686176, -1.1593\n", " VO5: -0.492509, 0.676579, -1.17187 (virtual)\n", " VC1: 0.031762, 0.00822503, 0.00564973 (virtual)\n", " O1 : -0.494034, 0.697555, 1.2082\n", " O2 : 2.02401, -0.669275, 1.15922\n", " O3 : 3.4779, 1.4716, 1.64563e-16\n", " C6 : -0.614146, 2.71298, -2.43962\n", " O6 : -0.225074, 4.07556, -2.53127\n", " H1 : -0.370662, -1.03564, 0.00767336\n", " H2 : 1.90812, -0.520035, -0.900727\n", " H3 : 1.67301, 1.95456, 0.900727\n", " H4 : 1.88381, 1.57916, -2.14527\n", " HO4: 1.61609, 3.94572, -0.516717\n", " H5 : -0.369153, 2.59396, -0.316372\n", " HO1: -0.167832, 1.62167, 1.20877\n", " HO2: 3.00401, -0.669275, 1.15922\n", " HO3: 3.78886, 2.40096, 5.03844e-17\n", " 1H6 : -1.71106, 2.65811, -2.3783\n", " 2H6 : -0.261365, 2.17983, -3.33478\n", " HO6: -0.621924, 4.47587, -3.33293\n", "Mirrored relative to coordinates in ResidueType: FALSE\n", "\n" ] } ], "source": [ "print(glucose.residue(1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "