{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Conformation Layer\n", "\n", "@Author: 吴炜坤\n", "\n", "@email:weikun.wu@xtalpi.com weikunwu@163.com\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在前一个章节中,我们已经介绍了在Rosetta中,Residue对象是描述蛋白质的基本单元,许多独立的Residue被用于描述蛋白质的几何结构和高级构象。\n", "这些细微的构象变化的度量就是由**原子间的键长、键角,二面角等一系列的具体参数构成。**在PyRosetta中, Pose中这些几何构象的参数由Conformation对象负责记录。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 一、蛋白质的几何构象\n", "蛋白质是由多个氨基酸通过脱水缩合的方式形成肽键共价连接。因此对于天然氨基酸而言。骨架最重要的两个二面角就是phi和psi角,而omega由于肽键平面一般处于0或180°附近。对于不同的氨基酸侧链,每个Residue含有若干个chi角。这些二面角组成了Rosetta对构象采样的基本几何参数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/phipsiomega.png\" width = \"600\" height = \"200\" align=center /></center>\n", "(图片来源: https://github.com/RosettaCommons/PyRosetta.notebooks/tree/master/student-notebooks)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n", "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n", "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n", "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=265248943 seed_offset=0 real_seed=265248943 thread_index=0\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=265248943 RG_type=mt19937\n", "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set. Created 983 residue types\n", "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.67614 seconds.\n", "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/4jfx_peptide.pdb' automatically determined to be of type PDB\n" ] } ], "source": [ "# 读取多肽的PDB结构\n", "from pyrosetta import init, pose_from_pdb\n", "init()\n", "pose = pose_from_pdb('./data/4jfx_peptide.pdb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1 基本二面角几何参数\n", "从pyrosetta中获取这些基本几何参数的方式非常简单:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-148.8025511852094 157.0624491251048 157.0624491251048\n" ] } ], "source": [ "# 获取第3号氨基酸的骨架二面角:\n", "phi = pose.phi(3)\n", "psi = pose.psi(3)\n", "omega = pose.psi(3)\n", "print(phi, psi, omega)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "也可以直接获取侧链的二面角参数: \n", "\n", "pose.chi($\\chi$角编号, Residue的pose编号)即可获取。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "# 获取第三号残基的chi角信息: pose.chi(chi_id:int, residue_id:int)\n", "chi_num = len(pose.residue(3).chi_atoms())\n", "print(chi_num)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "残基中共有3个$\\chi$二面角。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "70.4233120899352\n", "81.97278776078356\n", "-2.023205011879734e-14\n" ] } ], "source": [ "# 打印每个chi二面角的\n", "for chi_id in range(1, chi_num+1):\n", " chi_angle = pose.chi(chi_id, 3)\n", " print(chi_angle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.2 调整二面角几何参数\n", "Pose对象中内置的几个函数非常方便地可以用于调整几何构象: set_phi, set_psi, set_omega, set_chi。" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 调整骨架二面角\n", "pose.set_phi(seqpos=3, setting=-150)\n", "pose.set_phi(seqpos=3, setting=170)\n", "pose.dump_pdb('./data/4jfx_peptide_conf0.pdb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "调整构象后的多肽构象直观感受:\n", "<center><img src=\"./img/conf.png\" width = \"500\" height = \"200\" align=center /></center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 调整3号氨基酸的侧链chi1角的角度;\n", "pose.set_chi(chino=1, seqpos=3, setting=60)\n", "pose.dump_pdb('./data/4jfx_peptide_chi_conf0.pdb')\n", "pose.set_chi(chino=1, seqpos=3, setting=-60)\n", "pose.dump_pdb('./data/4jfx_peptide_chi_conf1.pdb')\n", "pose.set_chi(chino=1, seqpos=3, setting=180)\n", "pose.dump_pdb('./data/4jfx_peptide_chi_conf2.pdb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/chi_conf.png\" width = \"500\" height = \"200\" align=center /></center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 二、化学键几何参数\n", "除了对构象变化依赖影响最大的二面角参数,局部的化学键键长和键角信息也储存在Conformation对象中。\n", "为了定位原子的信息,首先需要构建atom identifier对象,相当于创建一个ID卡,让Rosetta知道我们指定的原子是位于哪个氨基酸中的。通过AtomID,提供残基号,原子号,就可以创建atom identifier对象" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " atomno= 1 rsd= 3 \n", " atomno= 2 rsd= 3 \n", " atomno= 3 rsd= 3 \n", " atomno= 4 rsd= 3 \n" ] } ], "source": [ "# 获取原子间的键长、键角信息前需要构建atom identifier objects\n", "from pyrosetta.rosetta.core.id import AtomID\n", "atom1 = AtomID(atomno_in=1, rsd_in=3) # 3号残基的第一个原子\n", "atom2 = AtomID(atomno_in=2, rsd_in=3) # 3号残基的第二个原子\n", "atom3 = AtomID(atomno_in=3, rsd_in=3) # 3号残基的第三个原子\n", "atom4 = AtomID(atomno_in=4, rsd_in=3) # 3号残基的第四个原子\n", "print(atom1)\n", "print(atom2)\n", "print(atom3)\n", "print(atom4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "知道原子的ID后,就可以通过conformation对象来获取键长、键角等数据了。但一般这些参数在Rosetta中键长和键角都设定为理想值,可以极大减少蛋白质构象的采样自由度空间。但注意的是,获取的键长键角必须是有“物理连接的”。" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "键长:1.4632750937537378, 键角:1.9840915800459624\n" ] } ], "source": [ "# 通过conformation层获取键长数据\n", "bond_length = pose.conformation().bond_length(atom1, atom2)\n", "\n", "# 通过conformation层获取键角数据(弧度)\n", "bond_angle = pose.conformation().bond_angle(atom1, atom2, atom3)\n", "\n", "print(f'键长:{bond_length}, 键角:{bond_angle}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "同样原子间的键长和键角也是可以被调整的:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.5 3.4\n" ] } ], "source": [ "# 设置新的值:\n", "pose.conformation().set_bond_length(atom1, atom2, setting=1.5) # 设置键长\n", "pose.conformation().set_bond_angle(atom1, atom2, atom3, setting=3.4) # 设置键角,弧度,而非角度\n", "\n", "# 查看新的值设定情况:\n", "new_bond_length = pose.conformation().bond_length(atom1, atom2)\n", "new_bond_angle = pose.conformation().bond_angle(atom1, atom2, atom3)\n", "print(new_bond_length, new_bond_angle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 三、原子xyz坐标\n", "原子坐标的修改需要获取residue对象,并获取原子ID(atom identifier objects)。通过pose.set_xyz函数设定新的xyz坐标, 但用户一般不需要”显式“地修改原子坐标, 除非你明白这样操作的意义。此处我们沿着3个坐标轴平移所有原子3个埃的距离。" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# 原子坐标的修改(一般不需要这样操作)\n", "from pyrosetta.rosetta.numeric import xyzVector_double_t\n", "\n", "# 对所有氨基酸的所有原子的x坐标乘上一个负号:\n", "for residue_id in range(1, pose.total_residue()+1):\n", " residue = pose.residue(residue_id) # 获取residue对象\n", " for atom_id, atom in enumerate(residue.atoms()):\n", " x, y, z = atom.xyz()\n", " \n", " # 镜像处理xyz坐标:\n", " trans_xyz = xyzVector_double_t(x+3, y+3, z+3) # 平移+3埃.\n", " atom_index = AtomID(atom_id+1, residue_id) # 3号氨基酸的第x个原子的id\n", " pose.set_xyz(atom_index, trans_xyz) # 设置xyz坐标" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pose.dump_pdb('./data/trans.pdb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/trans_xyz.png\" width = \"600\" height = \"200\" align=center /></center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 四、 蛋白二级结构信息\n", "Rosetta中的二级结构信息来源于DSSP的计算。通过get_secstruct函数即可获取。" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mprotocols.DsspMover: {0} \u001b[0mLLLLLLLL\n", "LLLLLLLL\n" ] } ], "source": [ "# 通过DSSP获取二级结构信息\n", "from pyrosetta.rosetta.protocols.membrane import get_secstruct\n", "ss = ''.join(get_secstruct(pose))\n", "print(ss)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 五、理想化初始拓扑数据\n", "在晶体中,可能会存在一些非理想二面角、键长、键角等。可以通过IdealizeMover进行修复。" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mprotocols.idealize.IdealizeMover: {0} \u001b[0mtotal atompairs: 0\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mlastjumpmin: 1\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 2 N 0.075 0.000 0.000 0.000 0.000 1.416\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 2 N 0.039 4.106 5.979 2.375 4.050 0.283\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 28.336 0.283\n", "---------------------------------------------------\n", " Total weighted score: 0.283\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 3 Y 2.619 2.531 5.979 0.950 4.050 1095.920\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 3 Y 0.656 46.121 77.515 28.485 109.577 56.651\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 5665.102 56.651\n", "---------------------------------------------------\n", " Total weighted score: 56.651\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 4 V 0.657 44.944 118.620 23.737 109.577 56.780\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 4 V 0.657 44.944 118.629 23.737 109.577 56.747\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 5674.731 56.747\n", "---------------------------------------------------\n", " Total weighted score: 56.747\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 5 V 0.654 34.395 118.629 20.346 109.577 57.267\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 5 V 0.656 34.377 118.361 20.353 109.577 56.568\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 5656.769 56.568\n", "---------------------------------------------------\n", " Total weighted score: 56.568\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 6 T 0.667 27.842 118.361 15.830 109.577 60.651\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 6 T 0.655 27.895 118.453 15.832 109.577 56.600\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 5660.035 56.600\n", "---------------------------------------------------\n", " Total weighted score: 56.600\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 8 A 0.670 26.155 118.453 15.832 109.577 57.959\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 8 A 0.670 26.155 118.453 15.832 109.577 57.957\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 5795.692 57.957\n", "---------------------------------------------------\n", " Total weighted score: 57.957\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 7 Y 0.796 22.061 118.453 11.874 109.577 71.358\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 7 Y 0.796 22.060 118.455 11.875 109.577 71.337\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 7133.705 71.337\n", "---------------------------------------------------\n", " Total weighted score: 71.337\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpremin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 1 G 0.796 21.383 118.455 11.875 109.577 71.315\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpostmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 1 G 0.796 21.383 118.457 11.875 109.577 71.307\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 7130.685 71.307\n", "---------------------------------------------------\n", " Total weighted score: 71.307\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpre-finalmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 0.796 21.383 118.457 11.875 109.577 71.307\n", "\u001b[0mprotocols.idealize: {0} \u001b[0mpost-finalmin: (pos,rmsd,avg-bb,max-bb,avg-chi,max-chi,score) 0.796 21.383 118.459 11.875 109.576 71.292\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "------------------------------------------------------------\n", " Scores Weight Raw Score Wghtd.Score\n", "------------------------------------------------------------\n", " pro_close 0.500 0.000 0.000\n", " dslf_ss_dst 0.500 0.000 0.000\n", " dslf_cs_ang 2.000 0.000 0.000\n", " coordinate_constraint 0.010 7129.185 71.292\n", "---------------------------------------------------\n", " Total weighted score: 71.292\n", "\u001b[0mprotocols.idealize: {0} \u001b[0m\n", "\u001b[0mprotocols.idealize.IdealizeMover: {0} \u001b[0mRMS between original pose and idealised pose: 0.524355 CA RMSD, 0.811604 All-Atom RMSD,\n" ] } ], "source": [ "from pyrosetta.rosetta.protocols.idealize import IdealizeMover\n", "# idealized\n", "idm = IdealizeMover()\n", "idm.apply(pose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "理想化之后,的All-Atom RMSD发生了轻微的变化。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 习题与思考\n", "1. 尝试下载PDB数据库中的一些数据,统计分析它们的omega角,看看其分布的范围区间?\n", "2. 尝试统计每种氨基酸的Phi/Psi角的分布,它们是否有些什么差异?\n", "3. 不同的二面角它们产生分布差异的原因是什么?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }