{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "choice-kuwait",
   "metadata": {},
   "source": [
    "## Rotamers & Packer\n",
    "\n",
    "@Author: 吴炜坤\n",
    "\n",
    "@email:weikun.wu@xtalpi.com/weikunwu@163.com"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "interstate-contents",
   "metadata": {},
   "source": [
    "### 一、旋转异构体(Rotamer)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "material-ordering",
   "metadata": {},
   "source": [
    "最早在有机化学的课本上就着重介绍过旋转异构体的概念,经典的例子是一个乙烷分子的不同二面角状态下能量的分布曲线,由于氢原子(红色标记)存在位阻效应,在重叠式状态下能量最高,而交叉式的能量最低。当能垒足够高的时候,构象异构体之间就不能轻易互相转换,呈现出几个极小值。具有极小值势能的几个构象互为旋转异构体(Rotamer)。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "solar-timer",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/what_is_rotamer.jpg\" width = \"700\" height = \"200\" align=center /></center>\n",
    "(图片来源: google图片)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "described-fitting",
   "metadata": {},
   "source": [
    "蛋白质中的氨基酸侧链构像也呈现出离散型的分布, 对于每一个sp3-sp3杂化间为中心的$\\chi$二面角,都有g-(-60°),t(180°),g+(+60°)三种离散低能状态存在(类似乙烷分子的交叉式和重叠式)。\n",
    "\n",
    "例如,谷氨酸的$\\chi_{1}$二面角分布中,CG-CB原子形成的二面角是sp3-sp3杂化键为中心的,因此也有3种低能状态,但与上述的乙烷分子不同,CA原子上相连是C=O、NH、H三种不同的基团,这些基团有着不同的体积(位阻效应不同)。对于$\\chi_{1}$角来讲g-,t,g+三种二面角的出现概率是不相等的。换言之,谷氨酸侧链$\\chi_{1}$角的分布是依赖于蛋白骨架二面角的(backbone-dependent rotamer),因为骨架上NH、C=O基团的朝向是由$\\phi$和$\\psi$角,不同的骨架二面角具有不同的位阻效应。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "prerequisite-formation",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/residue_rotamer.jpg\" width = \"800\" height = \"200\" align=center /></center>\n",
    "(图片来源: google图片)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "quantitative-diversity",
   "metadata": {},
   "source": [
    "对于sp3-sp3杂化键为中心的$\\chi$二面角构象通常是对称的,也称为具有rotameric性质的。既然氨基酸侧链的$\\chi$角状态分布是可数的,那么将每个$\\chi$角的状态组合起来,就可以得到若干个离散的能量构象。如对于谷氨酸而言, 𝜒1 和 𝜒2 是rotameric的,分别有3种状态g-,t,g+。那组合起来便有9种状态,分别是<g-,g->, <g-,t>, <g-,g+>, <t,g->, <t,g+>, <t,t>, <g+,g->, <g+,t>, <g+,g+>。每一种Rotamer的出现概率是依赖于蛋白骨架二面角,如果将蛋白骨架二面角$\\phi$和$\\psi$以每10°x10°作为统计的单位区间,就可以得到Rotamer出现的概率分布图(共计9个)。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abroad-administrator",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/rotamerprobility.jpg\" width = \"600\" height = \"200\" align=center /></center>\n",
    "(图片来源: http://dunbrack.fccc.edu/bbdep2010/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "matched-anger",
   "metadata": {},
   "source": [
    "细心的读者可能会发现,上述例子中的谷氨酸最后一个$\\chi_{3}$二面角的中心是sp3-sp2杂化的,CG原子上连接的是羧基团平面。这种二面角是Non-rotameric的。Non-rotameric二面角往往是非对称性的,其分布状态受到先前原子二面角构象的影响。如对于谷氨酸而言,$\\chi_{1}$和$\\chi_{2}$是rotameric的,构象异构体共有9种状态,$\\chi_{3}$二面角的分布式概率依赖于9种状态进行统计的。如下图中直观展示,谷氨酸$\\chi_{1}$$\\chi_{2}$处于<g+,g->状态下,$\\chi_{3}$二面角的概率分布: "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "unavailable-cuisine",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/05PercSmoothed.GLU.Chi3Dens.gp_gm.gif\" width = \"400\" height = \"200\" align=center /></center>\n",
    "(图片来源: http://dunbrack.fccc.edu/bbdep2010/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ongoing-building",
   "metadata": {},
   "source": [
    "### 二、Rotamer Library"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "serial-anaheim",
   "metadata": {},
   "source": [
    "在之前的章节中,我们提及过Rosetta使用MCMC的方法对骨架构象进行采样。但是在全原子模型下,氨基酸残基的侧链构象的建模是尤为重要的。不同的侧链构象具有不同的能量,侧链建模本质上就是根据Rotamer能量的高低进行构象优化和搜索。Rotamer Library在这个过程中即负责记录每种Rotamer构象在特定$\\phi$和$\\psi$的bin格中的出现概率,随后将概率取-log对数得到Rotamer自身的单体能量(当然在Rosetta中处理稍微更复杂一些)。\n",
    "\n",
    "以下是Rosetta中内置的Dunbrack Backbone-dependent Rotamer Library中谷氨酸部分数据的展示:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "blocked-juvenile",
   "metadata": {},
   "source": [
    "|Resname|Phi|Psi|r1 |r2 |r3 |r4 |Probabil |chi1Val|chi2Val|chi3Val|chi4Val|chi1Sig|chi2Sig|chi3Sig|chi4Sig|\n",
    "|-------|---|---|---|---|---|---|---------|-------|-------|-------|-------|-------|-------|-------|-------|\n",
    "|GLU    |170|-60|2  |2  |1  |0  |0.125876 |-176.9 |177.5  |-1.2   |0      |9.6    |11.2   |8.4    |0      |\n",
    "|GLU    |170|-60|1  |2  |1  |0  |0.0852961|64.9   |-178.1 |-1.3   |0      |10.2   |11.3   |8.5    |0      |\n",
    "|GLU    |170|-60|2  |2  |6  |0  |0.0777195|-176.9 |177.5  |-29.6  |0      |9.6    |11.2   |8.6    |0      |\n",
    "|GLU    |170|-60|2  |2  |2  |0  |0.0774883|-176.9 |177.5  |27.1   |0      |9.6    |11.2   |8.7    |0      |\n",
    "|GLU    |170|-60|1  |2  |2  |0  |0.0641043|64.9   |-178.1 |27.9   |0      |10.2   |11.3   |8.7    |0      |\n",
    "|GLU    |170|-60|1  |2  |6  |0  |0.0597918|64.9   |-178.1 |-30.1  |0      |10.2   |11.3   |8.7    |0      |\n",
    "|GLU    |170|-60|2  |2  |3  |0  |0.0506507|-176.9 |177.5  |58     |0      |9.6    |11.2   |8.6    |0      |\n",
    "|GLU    |170|-60|1  |2  |3  |0  |0.0493846|64.9   |-178.1 |58.2   |0      |10.2   |11.3   |8.7    |0      |\n",
    "|GLU    |170|-60|1  |2  |5  |0  |0.0477243|64.9   |-178.1 |-60.9  |0      |10.2   |11.3   |8.6    |0      |\n",
    "|GLU    |170|-60|2  |2  |5  |0  |0.0459856|-176.9 |177.5  |-60.1  |0      |9.6    |11.2   |8.6    |0      |\n",
    "|GLU    |170|-60|3  |3  |1  |0  |0.0419302|-66.1  |-66    |-37.2  |0      |8.7    |11.2   |8.6    |0      |\n",
    "|GLU    |170|-60|1  |2  |4  |0  |0.0414128|64.9   |-178.1 |88.6   |0      |10.2   |11.3   |8.8    |0      |\n",
    "|GLU    |170|-60|2  |2  |4  |0  |0.0363803|-176.9 |177.5  |88.3   |0      |9.6    |11.2   |8.8    |0      |\n",
    "|GLU    |170|-60|3  |3  |2  |0  |0.0308673|-66.1  |-66    |-9.3   |0      |8.7    |11.2   |8.3    |0      |\n",
    "|GLU    |170|-60|3  |3  |6  |0  |0.0237073|-66.1  |-66    |-63.9  |0      |8.7    |11.2   |8.1    |0      |\n",
    "|GLU    |170|-60|3  |2  |1  |0  |0.0204087|-66.7  |179.1  |-5.8   |0      |8.2    |11.8   |8.4    |0      |\n",
    "|GLU    |170|-60|2  |3  |1  |0  |0.0132526|-170.5 |-83.3  |-28.7  |0      |10.9   |9.3    |8.2    |0      |\n",
    "|GLU    |170|-60|3  |2  |6  |0  |0.0130031|-66.7  |179.1  |-34    |0      |8.2    |11.8   |8.6    |0      |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "durable-amazon",
   "metadata": {},
   "source": [
    "字段的含义:\n",
    "- Phi, Psi字段记录了骨架二面角$\\phi$和$\\psi$的bin格的区间;\n",
    "- r1/r2/r3/4代表$\\chi$角的id,如果是rotameric的,r1=1、2或3时,对应的就是g-,t,g+三个区间。以第一行数据为例,谷氨酸的$\\chi_{3}$是Non-rotameric,r3=1时,对应的角度是-1.2°。但是r3 id对应的角度不是唯一的,它是根据$\\chi_{1}$和$\\chi_{2}$的id组合不同而不同的,如第11行中,r1=3,r2=3时,r3同样等于1,但是其对应的角度却是-37.2°;\n",
    "- Probabil代表了某个Rotamer出现概率,在一个$\\phi$和$\\psi$的bin格的区间中,所有Rotamer出现概率之和为1;\n",
    "- chi1Val等代表$\\chi_{1}$的二面角平均值;\n",
    "- chi1Sig等代表$\\chi_{1}$的二面角在这个bin格中的分布方差。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "durable-butler",
   "metadata": {},
   "source": [
    "在Rosetta中Rotamer的能量计算公式如下,主要由三项组成:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "suffering-symposium",
   "metadata": {},
   "source": [
    "$E_{\\text {fa_dun }}=\\sum_{r}-\\ln \\left(P\\left(\\operatorname{rot}_{r} \\mid \\phi_{r}, \\psi_{r}, a a_{r}\\right)\\right)+\\sum_{k<T_{r}} \\frac{1}{2}\\left(\\frac{\\chi_{k, r}-\\mu_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}{\\sigma_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}\\right)^{2}+$\n",
    "$-\\ln \\left(P\\left(\\chi_{T_{r}, r} \\mid \\phi_{r}, \\psi_{r}, \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)\\right)$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "earlier-auditor",
   "metadata": {},
   "source": [
    "第一项是$\\sum_{r}-\\ln \\left(P\\left(\\operatorname{rot}_{r} \\mid \\phi_{r}, \\psi_{r}, \\mathrm{aa}_{r}\\right)\\right)$,这一项其实就是对一个Rotamer的Probabil取-ln对数,可直接从数据库中获取,直接评估当前rotamer类型的标准化平均能量。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "manufactured-craft",
   "metadata": {},
   "source": [
    "第二项是$\\sum_{k<T_{r}} \\frac{1}{2}\\left(\\frac{\\chi_{k, r}-\\mu_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}{\\sigma_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}\\right)^{2}$,该项的作用是对当$\\chi$角偏离平均值时做出的能量惩罚/校正(能量升高),因为一个rotamer的$\\chi$角不可能正好等于数据库中的平均值。该项惩罚递增的速率由方差决定。这一惩罚项只对rotameric$\\chi$角有效。该函数形状类似高斯分布,$\\mu_{\\chi_{k}}$指的是chi1Val、chi2Val等为$\\chi$角的平均值,$\\sigma_{\\chi_{k}}$即为chi1Sig等值,代表方差。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hispanic-synthetic",
   "metadata": {},
   "source": [
    "第三项是$-\\ln \\left(P\\left(\\chi_{T_{r}, r} \\mid \\phi_{r}, \\psi_{r}, \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)\\right)$,该项主要对最后一个$\\chi$角起作用,主要用于处理Non-rotameric二面角。当Rotamer最后一个$\\chi$角是rotameric时,该项恒等于0。反之对最后一个$\\chi$角做出相应的能量校正。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "handled-symposium",
   "metadata": {},
   "source": [
    "综上使用三项既可以分别描述每一个$\\chi$角的能量分布曲线,也可以同时用于描述每种Rotamer构型r在整个$\\phi$和$\\psi$bin中的概率分布,如下效果:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "rocky-lunch",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/Efun.jpg\" width = \"800\" height = \"200\" align=center /></center>\n",
    "(图片来源: Meiler Lab Rosetta2020教程中的Rosetta_Energy_Function ppt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "greenhouse-louisiana",
   "metadata": {},
   "source": [
    "### 三、Rosetta Packer\n",
    "\n",
    "截止上述,我们已经完整地阐述了在Rosetta中侧链Rotamer的相关概念以及Rotamer Library和能量的计算方式。在这一节,我们将会介绍Rosetta Packer的基本算法以及具体应用方式。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fossil-forth",
   "metadata": {},
   "source": [
    "试想下如下场景: 在一个连续的两股螺旋组成的体系中,第i号位点上为精氨酸残基(R),第j号位点上是谷氨酸残基(E)。现在需要求解i和j位点上构象能量最低的状态?\n",
    "一个可能的笨方法是遍历所有i和j位点上的rotamer构象,分别计算每一组rotamer状态下的能量值,最终选取能量最低的两个rotamer即可。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "optimum-identity",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/interactiongraph.jpg\" width = \"800\" height = \"200\" align=center /></center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "valued-quantum",
   "metadata": {},
   "source": [
    "在Rosetta中,侧链优化的基本原理的确如此,每两个氨基酸位点之间,不同Rotamer组合的能量就是如此遍历计算,并储存在一个矩阵(InteractionGraph)当中。不过当氨基酸位点数量的增加时,侧链构象优化的问题将变得极为复杂(NP-hard)。Rosetta Packer算法采用的是mcmc+模拟退火的方法,配合InteractionGraph的预计算的数据,通过查询表的方式,非常快速地得到目标构象的能量值,而不需要每次都反复重新计算,该机制大幅度提升了侧链优化和搜索的速度。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "urban-vector",
   "metadata": {},
   "source": [
    "#### Packer模拟退火的C++代码\n",
    "```\n",
    "# Run simulated annealing:\n",
    "for ( int i = 1; i <= num_outer_iterations; ++i ) {\n",
    "  for ( int j = 1; j <= num_inner_iterations; ++j ) {\n",
    "     int newrot = pick_random_rotamer();\n",
    "     compute deltaE = E( newrot ) - E( oldrot )\n",
    "     if ( metropolis_accept( temp, deltaE )) {\n",
    "          accept random_rotamer;\n",
    "      }\n",
    "  }\n",
    "  decrease_temperature( temp );\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "premium-intention",
   "metadata": {},
   "source": [
    "Rotamer构象模拟退火的包括两个部分: \n",
    "\n",
    "第一个部分是外部的模拟退火控制,根据outer循环的迭代次数不断地降低温度,使得整个蛋白质的构象被“困”在能量的极小值区域。\n",
    "\n",
    "第二个部分是mcmc采样,pick_random_rotamer函数即随机选择一个位点,并随机生成该氨基酸残基的一个rotamer。接着是熟悉的Metropolis准则,如果当前构象能量下降接受新的构象,反之以一定的概率接受新构象。\n",
    "\n",
    "完成整个模拟退火过程后,将能量最低的构象作为输出。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "postal-arrangement",
   "metadata": {},
   "source": [
    "#### Packer的运行模式"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "previous-liberia",
   "metadata": {},
   "source": [
    "Packer有三种运行模式,分别是“Repacking”、“Rotamer Trial”和“Design”,这三种模式有着应用上的一定区别。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "russian-looking",
   "metadata": {},
   "source": [
    "“Repacking”直译的含义是“重新打包”,顾名思义,需要将以前侧链的状态先“忘记”,重头优化构象。具体的做法是:\n",
    "1. 将所有的侧链原子删除\n",
    "2. 逐步添加一些侧链结构,并运行模拟退火\n",
    "3. 重复这个过程,得到一系列低能量的侧链构象集合"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "friendly-nigeria",
   "metadata": {},
   "source": [
    "“Rotamer Trial”直译是“尝试Rotamer”,顾名思义,需要在当前的构象上继续优化,优点是快速。具体的做法是:\n",
    "1. 随机选择一个位点\n",
    "2. 运行模拟退火搜索能量更低的构象\n",
    "3. 如此往复,直到所有侧链能量比较高的位点都得到了优化"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "retained-sacramento",
   "metadata": {},
   "source": [
    "“Design”直译就是“序列设计”,因此这个模式下可以运行氨基酸类型发生变化。其具体的做法与“Repacking”类似,但最大的不同在于其考虑了所有被允许出现的氨基酸所有的Rotamer构象。这种模式往往需要配合骨架的Relax来避免过快收敛。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}