{ "cells": [ { "cell_type": "markdown", "id": "responsible-recipient", "metadata": {}, "source": [ "# Rosetta FoldTree\n", "\n", "@Author: 张博文\n", "@email:bowen.zhang@xtalpi.com\n", "\n", "@Proofread: 吴炜坤\n", "@email:weikun.wu@xtalpi.com" ] }, { "cell_type": "markdown", "id": "alive-analysis", "metadata": {}, "source": [ "在第一章第五节中,我们学习了有关蛋白质几何构象的处理方式的各种概念及处理方式。在蛋白质构象设计的过程中,我们需要针对这些几何关系对蛋白质结构进行计算。此时,就会涉及到计算过程中坐标的处理方式。这一章将介绍的就是Rosetta中坐标的处理方式。" ] }, { "cell_type": "code", "execution_count": 1, "id": "institutional-dividend", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n", "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n", "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54\n", "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=1689727216 seed_offset=0 real_seed=1689727216 thread_index=0\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=1689727216 RG_type=mt19937\n" ] } ], "source": [ "# 初始化pyrosetta,使用pyrosetta必须要做的一件事情\n", "from pyrosetta import *\n", "from pyrosetta.rosetta import *\n", "from pyrosetta.teaching import *\n", "init()" ] }, { "cell_type": "markdown", "id": "pointed-algebra", "metadata": {}, "source": [ "### 一、内坐标及减少自由度降低计算复杂度的处理方式" ] }, { "cell_type": "markdown", "id": "molecular-opera", "metadata": {}, "source": [ "PDB文件中,大多数的表示都是笛卡尔坐标(X,Y,Z),当改变或设置这3个值时,该原子的位置也就确定了,此时每个原子都具有三个自由度,但三个自由度对于计算来说复杂度过高。" ] }, { "cell_type": "markdown", "id": "weekly-party", "metadata": {}, "source": [ "**在进行计算的时候,坐标的表示方式是这样的:**" ] }, { "cell_type": "markdown", "id": "surprising-nelson", "metadata": {}, "source": [ "1 使用了内坐标来进行表示。内坐标定义如下:如下图所示,对于一个原子,其位置坐标通过其与临近原子的键长、键角、二面角来进行表示。此个原子来说,构象的改变是通过改变键长、键角和二面角来完成的,但此时的自由度仍为3。(二面角 dihedral,扭转角 = 180度 - 二面角,互补关系)\n", "\n", "举例: 如果想知道原子j的坐标,我们可以通过从原点和原子i的信息来计算,此处已知原点到原子i键长以及原子i到原子j的键长。此时只要再加上原点-原子i-原子j之间的键角。就可以推算出原子j的具体坐标(xyz),因此内坐标和笛卡尔坐标是可以轻易被转换的。" ] }, { "cell_type": "markdown", "id": "italic-chick", "metadata": {}, "source": [ "<center><img src=\"./imgs/freedom.jpg\" width = \"700\" height = \"400\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "id": "demographic-april", "metadata": {}, "source": [ "2 减少自由度降低计算复杂度:\n", "\n", "尽管此时对于一个原子来说自由度仍为3,但这三个自由度在蛋白质设计中的重要程度并不相同。往往在蛋白质建模的过程中,键长、键角的变化相对来说设计很微小的,**这时候若设置将此两者设置为理想值(固定值),只以二面角的变化来表示构象变化。此时自由度就降为1了!**。同时,对于特定的原子(例如大多数的氢)其二面角受其化学环境的影响是固定的,此时自由度就变为0了。" ] }, { "cell_type": "markdown", "id": "differential-information", "metadata": {}, "source": [ "### 二、FoldTree的定义和功能" ] }, { "cell_type": "markdown", "id": "d795ac0e-532f-4a2e-a1c6-4f9a05b6a7f9", "metadata": {}, "source": [ "#### 2.1 什么是FoldTree?" ] }, { "cell_type": "markdown", "id": "e46edd97-a5ed-475c-9220-23d5fe7aaf6c", "metadata": {}, "source": [ "**在Rosseta中, FoldTree即对蛋白质主链内部及主链之间表示其上下游关系的方式,它定义了在蛋白质结构中,哪些残基是上游或是母节点、哪些残基是下游或是子节点,并为其贴上了标签,并且约定当内坐标自由度发生变换时,只有FoldTree下游的氨基酸xyz坐标会发生改变,而上游氨基酸的构象保持不变。**" ] }, { "cell_type": "markdown", "id": "265e4c6f-bdea-4e1e-9aa3-2e8213ec4fd5", "metadata": {}, "source": [ "此处举一个最简单的例子: FoldTree的顺序是从N端到C端,仅有一条链。因此1号残基是此树结构的\"根\"(上游),8号残基是最末端的\"支\"(下游)。\n", "\n", "如果此时对4号残基的骨架的二面角做出改变。只有处于树结构下游(5->6->7->8)的氨基酸会发生坐标变换。" ] }, { "cell_type": "markdown", "id": "96d628d0-0fca-4b88-bb39-1c66761ef708", "metadata": {}, "source": [ "<center><img src=\"./imgs/simple_foldtree.jpg\" width = \"700\" height = \"300\" align=center /> </center>\n", "(图片来源: Meiler Lab Rosetta2020教程中的Rosetta_Energy_Function ppt)" ] }, { "cell_type": "markdown", "id": "e23e480d-1c56-4899-9008-c84c370aec48", "metadata": {}, "source": [ "FoldTree共有四种特性:\n", "1. 杠杆效应(本节介绍)\n", "2. 顺序性(本节介绍)\n", "3. 跳跃性(下一节介绍)\n", "4. 可切割性(下一节介绍)" ] }, { "cell_type": "markdown", "id": "little-raleigh", "metadata": {}, "source": [ "#### 2.2 杠杆效应现象" ] }, { "cell_type": "markdown", "id": "741f23a6-0864-41e7-9fac-9db885c501ff", "metadata": {}, "source": [ "**杠杆效应本质即当遵守如上所说约定时,当内坐标变化时,约定只对处于树下游区域的氨基酸xyz坐标进行变换所产生的现象。**" ] }, { "cell_type": "markdown", "id": "nutritional-helen", "metadata": {}, "source": [ "举例,首先得到了一个由五个残基组成的多肽。" ] }, { "cell_type": "code", "execution_count": 2, "id": "arabic-caution", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set. Created 984 residue types\n", "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.695354 seconds.\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pose = pose_from_sequence('KPALN') # 根据序列生成一个示例的多肽序列\n", "pose.dump_pdb('./data/example.pdb') # 保存该pose为pdb文件" ] }, { "cell_type": "code", "execution_count": 3, "id": "boxed-exercise", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 更改示例多肽序列的3号残基的phi角\n", "pose.set_phi(3,70) # 修改3号ALA残基的phi角\n", "pose.dump_pdb('./data/example_change.pdb') # 保存修改后的pose为pdb文件" ] }, { "cell_type": "code", "execution_count": 4, "id": "5e6965dc-965a-4289-a907-681e0ea1942b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FOLD_TREE EDGE 1 5 -1 \n" ] } ], "source": [ "# 打印foldtree信息:\n", "print(pose.fold_tree())" ] }, { "cell_type": "markdown", "id": "eleven-cabin", "metadata": {}, "source": [ "从foldtree的信息来看,目前的设置是树顺序是1->2->3->4->5,因此当3号氨基酸phi角发生变化时,3,4,5号氨基酸都属于下游区域,理应发生坐标变换。" ] }, { "cell_type": "markdown", "id": "decimal-voltage", "metadata": {}, "source": [ "<center><img src=\"./imgs/phi_change.jpg\" width = \"800\" height = \"300\" align=center /> </center>\n", "(图片来源: 知乎,Rosettta研习社)" ] }, { "cell_type": "markdown", "id": "154bfead-b2ae-439a-8a93-58cb30760ca2", "metadata": {}, "source": [ "因此在内坐标体系中,通过内坐标和笛卡尔坐标的换算,可以高效地表示由于杠杠效应带来的3N*atom_number个自由度的变化。" ] }, { "cell_type": "markdown", "id": "indoor-correspondence", "metadata": {}, "source": [ "#### 2.3 FoldTree的表示方式" ] }, { "cell_type": "markdown", "id": "handled-illinois", "metadata": {}, "source": [ "Edge是FoldTree的基本组成部分,一个FoldTree是由1个或多个Edge组成。在Pyrosetta中Fold_Tree通过fold_tree方法来进行调用" ] }, { "cell_type": "code", "execution_count": 5, "id": "understanding-mortality", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/1v74.pdb' automatically determined to be of type PDB\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom: OXT on residue LEU:CtermProteinFull 107\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom: OXT on residue LEU:CtermProteinFull 194\n" ] } ], "source": [ "# PDB:1v74的例子\n", "pose_1v74 = pose_from_pdb('./data/1v74.pdb') #读入蛋白质的pdb文件\n", "pose_1v74_foldtree = pose_1v74.fold_tree() #得到该pose的foldtree实例" ] }, { "cell_type": "code", "execution_count": 6, "id": "removed-dynamics", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PDB file name: ./data/1v74.pdb\n", " Pose Range Chain PDB Range | #Residues #Atoms\n", "\n", "0001 -- 0107 A 0591 -- 0697 | 0107 residues; 01728 atoms\n", "0108 -- 0194 B 0001 -- 0087 | 0087 residues; 01425 atoms\n", " TOTAL | 0194 residues; 03153 atoms\n", "\n" ] } ], "source": [ "# 输出PDB:1V74的信息\n", "print(pose_1v74.pdb_info())" ] }, { "cell_type": "markdown", "id": "scheduled-harvest", "metadata": {}, "source": [ "如同上面的输出所示,示例pose具有2条多肽链,分别是1-107号残基,以及108-194号残基的肽链。" ] }, { "cell_type": "code", "execution_count": 7, "id": "artificial-portugal", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1v74 Fold Tree:\n", " FOLD_TREE EDGE 1 107 -1 EDGE 1 108 1 EDGE 108 194 -1 \n" ] } ], "source": [ "# 输出PDB:1V74的fold tree表示\n", "print(\"1v74 Fold Tree:\\n\", pose_1v74_foldtree)" ] }, { "cell_type": "markdown", "id": "included-economics", "metadata": {}, "source": [ "FoldTree的表示方式一般如下。有四个字段组成:**EDGE、Start、End、Define**\n", "如上面输出所示:\n", "- **当Define字段是-1时**,代表着这一段多肽区域按照从start到end位的顺序通过共价键进行延伸。如上所示,表示1到107残基、108到194残基是以共价相连。如\"EDGE 1 107 -1\", \"EDGE 108 194 -1\"。\n", "- **当Define字段是正整数时**,代表着会在start和end位氨基酸建立虚拟链接,代表这个是一个Jump点,表示在FoldTree中,第Start号氨基酸和第End号氨基酸之间将建立**“虚拟的共价链接”**,Jump点会直接改变不同多肽链之间的上下游的关系。如上所示,分别在1和108号残基有1个jump点。如\"EDGE 1 108 1\", 1代表Jump的id序号,第一个Jump点。" ] }, { "cell_type": "markdown", "id": "asian-ranch", "metadata": {}, "source": [ "对于FoldTree的Jump,我们可以通过foldtree实例的一些方法得到他的信息。(**关于Jump点的信息我们将在3.2章节中进行介绍**)" ] }, { "cell_type": "code", "execution_count": 8, "id": "necessary-section", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The size of the foldtree (number of edges) is 3\n", "The number of the jump is 1\n" ] } ], "source": [ "print(f\"The size of the foldtree (number of edges) is {pose_1v74_foldtree.size()}\") #Foldtree 的尺寸(edge的个数)\n", "print(f\"The number of the jump is {pose_1v74_foldtree.num_jump()}\") #Foldtree 的jump点的个数" ] }, { "cell_type": "markdown", "id": "interesting-format", "metadata": {}, "source": [ "#### 2.4 自定义一个FoldTree" ] }, { "cell_type": "markdown", "id": "opening-duplicate", "metadata": {}, "source": [ "基本的操作方式:\n", "1. 实例化一个FoldTree对象; \n", "2. 使用该对象的add_edge、delete_edge进行FoldTree的编辑; \n", "3. pose的fold_tree读回创建的FoldTree对象。" ] }, { "cell_type": "code", "execution_count": 9, "id": "capable-hello", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FOLD_TREE EDGE 1 7 -1 \n" ] } ], "source": [ "# 从序列创建一个Pose对象\n", "seq_pose = pose_from_sequence('AAAAAAA') #读入PDB文件\n", "print(seq_pose.fold_tree())" ] }, { "cell_type": "markdown", "id": "broken-blond", "metadata": {}, "source": [ "可见,在创建Pose的同时FoldTree也会同时被自动创建,但这个foldtree默认是从N到C端的顺序进行创建。" ] }, { "cell_type": "code", "execution_count": 10, "id": "stainless-messenger", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FOLD_TREE \n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 设置一个foldtree对象的实例,并将它读回pose中更新foldtree\n", "ft = FoldTree() #设置一个空的foldtree对象\n", "print(ft)\n", "ft.check_fold_tree() # 检查foldtree的有效性;" ] }, { "cell_type": "markdown", "id": "molecular-colorado", "metadata": {}, "source": [ "可见,通过check_fold_tree函数,可以快速检测你的foldtree是否有效。" ] }, { "cell_type": "code", "execution_count": 11, "id": "given-statement", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#增加一个5号残基到1号残基的共价连接的edge\n", "ft.add_edge(start=5, stop=1, label=-1) # label代表egde类型,-1=多肽链,1=jump\n", "ft.check_fold_tree() # 检查foldtree的有效性;" ] }, { "cell_type": "markdown", "id": "vulnerable-bracket", "metadata": {}, "source": [ "上面我们仅定义了1-5号氨基酸的foldtree,但是我们创建的pose却有7个氨基酸。此时将这个foldtree加载到pose中会发生什么?" ] }, { "cell_type": "code", "execution_count": 12, "id": "delayed-electricity", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[31m\u001b[1m[ ERROR ]\u001b[0m Error in assigning a FoldTree to a Conformation - size mismatch.\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[31m\u001b[1m[ ERROR ]\u001b[0m Conformation of length 7: A[ALA:NtermProteinFull]AAAAAA[ALA:CtermProteinFull]\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[31m\u001b[1m[ ERROR ]\u001b[0m FoldTree of length 5: FOLD_TREE EDGE 5 1 -1\n" ] }, { "ename": "RuntimeError", "evalue": "\n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/conformation/Conformation.cc:829\n[ ERROR ] UtilityExitException\nERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 7 fold_tree nres: 5\n\n", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-12-cb53977f0fef>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mseq_pose\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfold_tree\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mft\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m#更新pose的fold_tree\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m: \n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/conformation/Conformation.cc:829\n[ ERROR ] UtilityExitException\nERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 7 fold_tree nres: 5\n\n" ] } ], "source": [ "seq_pose.fold_tree(ft) #更新pose的fold_tree" ] }, { "cell_type": "markdown", "id": "governmental-migration", "metadata": {}, "source": [ "运行结果表示,foldtree中必须完整定义整个pose的所有氨基酸上下游关系时,才是一个完整的tree!" ] }, { "cell_type": "code", "execution_count": null, "id": "written-graphics", "metadata": {}, "outputs": [], "source": [ "# 完整定义试试:\n", "ft = FoldTree() #设置一个空的foldtree对象\n", "ft.add_edge(start=5, stop=1, label=-1)\n", "ft.add_edge(start=5, stop=7, label=-1)\n", "ft.check_fold_tree() # 检查foldtree的有效性;" ] }, { "cell_type": "code", "execution_count": null, "id": "familiar-qatar", "metadata": {}, "outputs": [], "source": [ "seq_pose.fold_tree(ft) #更新pose的fold_tree\n", "print(seq_pose.fold_tree())" ] }, { "cell_type": "markdown", "id": "proof-surgery", "metadata": {}, "source": [ "#### 2.5 FoldTree的顺序性" ] }, { "cell_type": "markdown", "id": "legislative-testimony", "metadata": {}, "source": [ "在FoldTree中,Start和End字段是顺序敏感的,包括N->C顺序以及C->N顺序。不同顺序,代表EDGE内氨基酸所对应的上下游关系不同。" ] }, { "cell_type": "markdown", "id": "after-growing", "metadata": {}, "source": [ "还是以原本的的五肽为例,原本是以N->C的顺序来进行排列的。我们将其人为修改为C->N后,由于FoldTree上下游关系的改变,修改二面角后构象造成的变化也不再相同。" ] }, { "cell_type": "code", "execution_count": null, "id": "positive-settlement", "metadata": {}, "outputs": [], "source": [ "#观察原本五肽的FoldTree\n", "pose = pose_from_sequence('KPALN') # 根据一个序列得到一个五肽序列的pose\n", "pose.dump_pdb('./data/example_5_1.pdb') #保存五肽的构象\n", "print(\"The Fold Tree of the pose:\\n\", pose.fold_tree()) # 输出该pose的FoldTree" ] }, { "cell_type": "code", "execution_count": null, "id": "loving-fiction", "metadata": {}, "outputs": [], "source": [ "# 设置一个foldtree对象的实例,并将它读回pose中更新foldtree\n", "ft = FoldTree() #设置一个空的foldtree对象\n", "ft.add_edge(start=5, stop=1, label=-1) #增加一个5号残基到1号残基的共价连接的edge\n", "pose.fold_tree(ft) #更新pose的fold_tree\n", "print(pose.fold_tree()) #输出foldtree" ] }, { "cell_type": "code", "execution_count": null, "id": "authentic-patient", "metadata": {}, "outputs": [], "source": [ "pose.set_phi(3,70) #更改肽链的3号残基的phi角\n", "pose.dump_pdb('./data/example_5_1_change.pdb') #保存更改后的五肽的构象" ] }, { "cell_type": "markdown", "id": "simplified-search", "metadata": {}, "source": [ "<img src=\"./imgs/foldtree_sequence.png\" width = \"600\" height = \"300\" align=center /> \n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "id": "premier-front", "metadata": {}, "source": [ "如图,黄色的是初始的五肽,蓝色的是原本的多肽在修改3号残基的phi角后的多肽构象,可以发现出现构象改变的是4、5号残基;而在修改该多肽的FoldTree为从5到1的顺序后,发生构象变化的就是1、2号残基。而这,就是foldtree的顺序性,即在改变选定残基的构象时,随之而改变的只有选定残基下游的氨基酸的构象" ] }, { "cell_type": "code", "execution_count": null, "id": "blank-sewing", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "182px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 5 }