{ "cells": [ { "cell_type": "markdown", "id": "choice-kuwait", "metadata": {}, "source": [ "## Rotamers & Packer\n", "\n", "@Author: 吴炜坤\n", "\n", "@email:weikun.wu@xtalpi.com/weikunwu@163.com" ] }, { "cell_type": "markdown", "id": "interstate-contents", "metadata": {}, "source": [ "### 一、旋转异构体(Rotamer)\n" ] }, { "cell_type": "markdown", "id": "material-ordering", "metadata": {}, "source": [ "最早在有机化学的课本上就着重介绍过旋转异构体的概念,经典的例子是一个乙烷分子的不同二面角状态下能量的分布曲线,由于氢原子(红色标记)存在位阻效应,在重叠式状态下能量最高,而交叉式的能量最低。当能垒足够高的时候,构象异构体之间就不能轻易互相转换,呈现出几个极小值。具有极小值势能的几个构象互为旋转异构体(Rotamer)。" ] }, { "cell_type": "markdown", "id": "solar-timer", "metadata": {}, "source": [ "<center><img src=\"./img/what_is_rotamer.jpg\" width = \"700\" height = \"200\" align=center /></center>\n", "(图片来源: google图片)" ] }, { "cell_type": "markdown", "id": "described-fitting", "metadata": {}, "source": [ "蛋白质中的氨基酸侧链构像也呈现出离散型的分布, 对于每一个sp3-sp3杂化间为中心的$\\chi$二面角,都有g-(-60°),t(180°),g+(+60°)三种离散低能状态存在(类似乙烷分子的交叉式和重叠式)。\n", "\n", "例如,谷氨酸的$\\chi_{1}$二面角分布中,CG-CB原子形成的二面角是sp3-sp3杂化键为中心的,因此也有3种低能状态,但与上述的乙烷分子不同,CA原子上相连是C=O、NH、H三种不同的基团,这些基团有着不同的体积(位阻效应不同)。对于$\\chi_{1}$角来讲g-,t,g+三种二面角的出现概率是不相等的。换言之,谷氨酸侧链$\\chi_{1}$角的分布是依赖于蛋白骨架二面角的(backbone-dependent rotamer),因为骨架上NH、C=O基团的朝向是由$\\phi$和$\\psi$角,不同的骨架二面角具有不同的位阻效应。" ] }, { "cell_type": "markdown", "id": "prerequisite-formation", "metadata": {}, "source": [ "<center><img src=\"./img/residue_rotamer.jpg\" width = \"800\" height = \"200\" align=center /></center>\n", "(图片来源: google图片)" ] }, { "cell_type": "markdown", "id": "quantitative-diversity", "metadata": {}, "source": [ "对于sp3-sp3杂化键为中心的$\\chi$二面角构象通常是对称的,也称为具有rotameric性质的。既然氨基酸侧链的$\\chi$角状态分布是可数的,那么将每个$\\chi$角的状态组合起来,就可以得到若干个离散的能量构象。如对于谷氨酸而言, 𝜒1 和 𝜒2 是rotameric的,分别有3种状态g-,t,g+。那组合起来便有9种状态,分别是<g-,g->, <g-,t>, <g-,g+>, <t,g->, <t,g+>, <t,t>, <g+,g->, <g+,t>, <g+,g+>。每一种Rotamer的出现概率是依赖于蛋白骨架二面角,如果将蛋白骨架二面角$\\phi$和$\\psi$以每10°x10°作为统计的单位区间,就可以得到Rotamer出现的概率分布图(共计9个)。" ] }, { "cell_type": "markdown", "id": "abroad-administrator", "metadata": {}, "source": [ "<center><img src=\"./img/rotamerprobility.jpg\" width = \"600\" height = \"200\" align=center /></center>\n", "(图片来源: http://dunbrack.fccc.edu/bbdep2010/)" ] }, { "cell_type": "markdown", "id": "matched-anger", "metadata": {}, "source": [ "细心的读者可能会发现,上述例子中的谷氨酸最后一个$\\chi_{3}$二面角的中心是sp3-sp2杂化的,CG原子上连接的是羧基团平面。这种二面角是Non-rotameric的。Non-rotameric二面角往往是非对称性的,其分布状态受到先前原子二面角构象的影响。如对于谷氨酸而言,$\\chi_{1}$和$\\chi_{2}$是rotameric的,构象异构体共有9种状态,$\\chi_{3}$二面角的分布式概率依赖于9种状态进行统计的。如下图中直观展示,谷氨酸$\\chi_{1}$$\\chi_{2}$处于<g+,g->状态下,$\\chi_{3}$二面角的概率分布: " ] }, { "cell_type": "markdown", "id": "unavailable-cuisine", "metadata": {}, "source": [ "<center><img src=\"./img/05PercSmoothed.GLU.Chi3Dens.gp_gm.gif\" width = \"400\" height = \"200\" align=center /></center>\n", "(图片来源: http://dunbrack.fccc.edu/bbdep2010/)" ] }, { "cell_type": "markdown", "id": "ongoing-building", "metadata": {}, "source": [ "### 二、Rotamer Library" ] }, { "cell_type": "markdown", "id": "serial-anaheim", "metadata": {}, "source": [ "在之前的章节中,我们提及过Rosetta使用MCMC的方法对骨架构象进行采样。但是在全原子模型下,氨基酸残基的侧链构象的建模是尤为重要的。不同的侧链构象具有不同的能量,侧链建模本质上就是根据Rotamer能量的高低进行构象优化和搜索。Rotamer Library在这个过程中即负责记录每种Rotamer构象在特定$\\phi$和$\\psi$的bin格中的出现概率,随后将概率取-log对数得到Rotamer自身的单体能量(当然在Rosetta中处理稍微更复杂一些)。\n", "\n", "以下是Rosetta中内置的Dunbrack Backbone-dependent Rotamer Library中谷氨酸部分数据的展示:" ] }, { "cell_type": "markdown", "id": "blocked-juvenile", "metadata": {}, "source": [ "|Resname|Phi|Psi|r1 |r2 |r3 |r4 |Probabil |chi1Val|chi2Val|chi3Val|chi4Val|chi1Sig|chi2Sig|chi3Sig|chi4Sig|\n", "|-------|---|---|---|---|---|---|---------|-------|-------|-------|-------|-------|-------|-------|-------|\n", "|GLU |170|-60|2 |2 |1 |0 |0.125876 |-176.9 |177.5 |-1.2 |0 |9.6 |11.2 |8.4 |0 |\n", "|GLU |170|-60|1 |2 |1 |0 |0.0852961|64.9 |-178.1 |-1.3 |0 |10.2 |11.3 |8.5 |0 |\n", "|GLU |170|-60|2 |2 |6 |0 |0.0777195|-176.9 |177.5 |-29.6 |0 |9.6 |11.2 |8.6 |0 |\n", "|GLU |170|-60|2 |2 |2 |0 |0.0774883|-176.9 |177.5 |27.1 |0 |9.6 |11.2 |8.7 |0 |\n", "|GLU |170|-60|1 |2 |2 |0 |0.0641043|64.9 |-178.1 |27.9 |0 |10.2 |11.3 |8.7 |0 |\n", "|GLU |170|-60|1 |2 |6 |0 |0.0597918|64.9 |-178.1 |-30.1 |0 |10.2 |11.3 |8.7 |0 |\n", "|GLU |170|-60|2 |2 |3 |0 |0.0506507|-176.9 |177.5 |58 |0 |9.6 |11.2 |8.6 |0 |\n", "|GLU |170|-60|1 |2 |3 |0 |0.0493846|64.9 |-178.1 |58.2 |0 |10.2 |11.3 |8.7 |0 |\n", "|GLU |170|-60|1 |2 |5 |0 |0.0477243|64.9 |-178.1 |-60.9 |0 |10.2 |11.3 |8.6 |0 |\n", "|GLU |170|-60|2 |2 |5 |0 |0.0459856|-176.9 |177.5 |-60.1 |0 |9.6 |11.2 |8.6 |0 |\n", "|GLU |170|-60|3 |3 |1 |0 |0.0419302|-66.1 |-66 |-37.2 |0 |8.7 |11.2 |8.6 |0 |\n", "|GLU |170|-60|1 |2 |4 |0 |0.0414128|64.9 |-178.1 |88.6 |0 |10.2 |11.3 |8.8 |0 |\n", "|GLU |170|-60|2 |2 |4 |0 |0.0363803|-176.9 |177.5 |88.3 |0 |9.6 |11.2 |8.8 |0 |\n", "|GLU |170|-60|3 |3 |2 |0 |0.0308673|-66.1 |-66 |-9.3 |0 |8.7 |11.2 |8.3 |0 |\n", "|GLU |170|-60|3 |3 |6 |0 |0.0237073|-66.1 |-66 |-63.9 |0 |8.7 |11.2 |8.1 |0 |\n", "|GLU |170|-60|3 |2 |1 |0 |0.0204087|-66.7 |179.1 |-5.8 |0 |8.2 |11.8 |8.4 |0 |\n", "|GLU |170|-60|2 |3 |1 |0 |0.0132526|-170.5 |-83.3 |-28.7 |0 |10.9 |9.3 |8.2 |0 |\n", "|GLU |170|-60|3 |2 |6 |0 |0.0130031|-66.7 |179.1 |-34 |0 |8.2 |11.8 |8.6 |0 |" ] }, { "cell_type": "markdown", "id": "durable-amazon", "metadata": {}, "source": [ "字段的含义:\n", "- Phi, Psi字段记录了骨架二面角$\\phi$和$\\psi$的bin格的区间;\n", "- r1/r2/r3/4代表$\\chi$角的id,如果是rotameric的,r1=1、2或3时,对应的就是g-,t,g+三个区间。以第一行数据为例,谷氨酸的$\\chi_{3}$是Non-rotameric,r3=1时,对应的角度是-1.2°。但是r3 id对应的角度不是唯一的,它是根据$\\chi_{1}$和$\\chi_{2}$的id组合不同而不同的,如第11行中,r1=3,r2=3时,r3同样等于1,但是其对应的角度却是-37.2°;\n", "- Probabil代表了某个Rotamer出现概率,在一个$\\phi$和$\\psi$的bin格的区间中,所有Rotamer出现概率之和为1;\n", "- chi1Val等代表$\\chi_{1}$的二面角平均值;\n", "- chi1Sig等代表$\\chi_{1}$的二面角在这个bin格中的分布方差。" ] }, { "cell_type": "markdown", "id": "durable-butler", "metadata": {}, "source": [ "在Rosetta中Rotamer的能量计算公式如下,主要由三项组成:" ] }, { "cell_type": "markdown", "id": "suffering-symposium", "metadata": {}, "source": [ "$E_{\\text {fa_dun }}=\\sum_{r}-\\ln \\left(P\\left(\\operatorname{rot}_{r} \\mid \\phi_{r}, \\psi_{r}, a a_{r}\\right)\\right)+\\sum_{k<T_{r}} \\frac{1}{2}\\left(\\frac{\\chi_{k, r}-\\mu_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}{\\sigma_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}\\right)^{2}+$\n", "$-\\ln \\left(P\\left(\\chi_{T_{r}, r} \\mid \\phi_{r}, \\psi_{r}, \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)\\right)$" ] }, { "cell_type": "markdown", "id": "earlier-auditor", "metadata": {}, "source": [ "第一项是$\\sum_{r}-\\ln \\left(P\\left(\\operatorname{rot}_{r} \\mid \\phi_{r}, \\psi_{r}, \\mathrm{aa}_{r}\\right)\\right)$,这一项其实就是对一个Rotamer的Probabil取-ln对数,可直接从数据库中获取,直接评估当前rotamer类型的标准化平均能量。" ] }, { "cell_type": "markdown", "id": "manufactured-craft", "metadata": {}, "source": [ "第二项是$\\sum_{k<T_{r}} \\frac{1}{2}\\left(\\frac{\\chi_{k, r}-\\mu_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}{\\sigma_{\\chi_{k}}\\left(\\phi_{r}, \\psi_{r} \\mid \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)}\\right)^{2}$,该项的作用是对当$\\chi$角偏离平均值时做出的能量惩罚/校正(能量升高),因为一个rotamer的$\\chi$角不可能正好等于数据库中的平均值。该项惩罚递增的速率由方差决定。这一惩罚项只对rotameric$\\chi$角有效。该函数形状类似高斯分布,$\\mu_{\\chi_{k}}$指的是chi1Val、chi2Val等为$\\chi$角的平均值,$\\sigma_{\\chi_{k}}$即为chi1Sig等值,代表方差。" ] }, { "cell_type": "markdown", "id": "hispanic-synthetic", "metadata": {}, "source": [ "第三项是$-\\ln \\left(P\\left(\\chi_{T_{r}, r} \\mid \\phi_{r}, \\psi_{r}, \\operatorname{rot}_{r}, \\mathrm{aa}_{r}\\right)\\right)$,该项主要对最后一个$\\chi$角起作用,主要用于处理Non-rotameric二面角。当Rotamer最后一个$\\chi$角是rotameric时,该项恒等于0。反之对最后一个$\\chi$角做出相应的能量校正。" ] }, { "cell_type": "markdown", "id": "handled-symposium", "metadata": {}, "source": [ "综上使用三项既可以分别描述每一个$\\chi$角的能量分布曲线,也可以同时用于描述每种Rotamer构型r在整个$\\phi$和$\\psi$bin中的概率分布,如下效果:" ] }, { "cell_type": "markdown", "id": "rocky-lunch", "metadata": {}, "source": [ "<center><img src=\"./img/Efun.jpg\" width = \"800\" height = \"200\" align=center /></center>\n", "(图片来源: Meiler Lab Rosetta2020教程中的Rosetta_Energy_Function ppt)" ] }, { "cell_type": "markdown", "id": "greenhouse-louisiana", "metadata": {}, "source": [ "### 三、Rosetta Packer\n", "\n", "截止上述,我们已经完整地阐述了在Rosetta中侧链Rotamer的相关概念以及Rotamer Library和能量的计算方式。在这一节,我们将会介绍Rosetta Packer的基本算法以及具体应用方式。" ] }, { "cell_type": "markdown", "id": "fossil-forth", "metadata": {}, "source": [ "试想下如下场景: 在一个连续的两股螺旋组成的体系中,第i号位点上为精氨酸残基(R),第j号位点上是谷氨酸残基(E)。现在需要求解i和j位点上构象能量最低的状态?\n", "一个可能的笨方法是遍历所有i和j位点上的rotamer构象,分别计算每一组rotamer状态下的能量值,最终选取能量最低的两个rotamer即可。" ] }, { "cell_type": "markdown", "id": "optimum-identity", "metadata": {}, "source": [ "<center><img src=\"./img/interactiongraph.jpg\" width = \"800\" height = \"200\" align=center /></center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "id": "valued-quantum", "metadata": {}, "source": [ "在Rosetta中,侧链优化的基本原理的确如此,每两个氨基酸位点之间,不同Rotamer组合的能量就是如此遍历计算,并储存在一个矩阵(InteractionGraph)当中。不过当氨基酸位点数量的增加时,侧链构象优化的问题将变得极为复杂(NP-hard)。Rosetta Packer算法采用的是mcmc+模拟退火的方法,配合InteractionGraph的预计算的数据,通过查询表的方式,非常快速地得到目标构象的能量值,而不需要每次都反复重新计算,该机制大幅度提升了侧链优化和搜索的速度。" ] }, { "cell_type": "markdown", "id": "urban-vector", "metadata": {}, "source": [ "#### Packer模拟退火的C++代码\n", "```\n", "# Run simulated annealing:\n", "for ( int i = 1; i <= num_outer_iterations; ++i ) {\n", " for ( int j = 1; j <= num_inner_iterations; ++j ) {\n", " int newrot = pick_random_rotamer();\n", " compute deltaE = E( newrot ) - E( oldrot )\n", " if ( metropolis_accept( temp, deltaE )) {\n", " accept random_rotamer;\n", " }\n", " }\n", " decrease_temperature( temp );\n", "}\n", "```" ] }, { "cell_type": "markdown", "id": "premium-intention", "metadata": {}, "source": [ "Rotamer构象模拟退火的包括两个部分: \n", "\n", "第一个部分是外部的模拟退火控制,根据outer循环的迭代次数不断地降低温度,使得整个蛋白质的构象被“困”在能量的极小值区域。\n", "\n", "第二个部分是mcmc采样,pick_random_rotamer函数即随机选择一个位点,并随机生成该氨基酸残基的一个rotamer。接着是熟悉的Metropolis准则,如果当前构象能量下降接受新的构象,反之以一定的概率接受新构象。\n", "\n", "完成整个模拟退火过程后,将能量最低的构象作为输出。" ] }, { "cell_type": "markdown", "id": "postal-arrangement", "metadata": {}, "source": [ "#### Packer的运行模式" ] }, { "cell_type": "markdown", "id": "previous-liberia", "metadata": {}, "source": [ "Packer有三种运行模式,分别是“Repacking”、“Rotamer Trial”和“Design”,这三种模式有着应用上的一定区别。" ] }, { "cell_type": "markdown", "id": "russian-looking", "metadata": {}, "source": [ "“Repacking”直译的含义是“重新打包”,顾名思义,需要将以前侧链的状态先“忘记”,重头优化构象。具体的做法是:\n", "1. 将所有的侧链原子删除\n", "2. 逐步添加一些侧链结构,并运行模拟退火\n", "3. 重复这个过程,得到一系列低能量的侧链构象集合" ] }, { "cell_type": "markdown", "id": "friendly-nigeria", "metadata": {}, "source": [ "“Rotamer Trial”直译是“尝试Rotamer”,顾名思义,需要在当前的构象上继续优化,优点是快速。具体的做法是:\n", "1. 随机选择一个位点\n", "2. 运行模拟退火搜索能量更低的构象\n", "3. 如此往复,直到所有侧链能量比较高的位点都得到了优化" ] }, { "cell_type": "markdown", "id": "retained-sacramento", "metadata": {}, "source": [ "“Design”直译就是“序列设计”,因此这个模式下可以运行氨基酸类型发生变化。其具体的做法与“Repacking”类似,但最大的不同在于其考虑了所有被允许出现的氨基酸所有的Rotamer构象。这种模式往往需要配合骨架的Relax来避免过快收敛。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }