{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SimpleMetrics\n", "@Author: 吴炜坤 | 槐喆\n", "\n", "@email:weikun.wu@xtalpi.com zhe.huai@xtalpi.com" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n", "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n", "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n", "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=1792538074 seed_offset=0 real_seed=1792538074 thread_index=0\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=1792538074 RG_type=mt19937\n" ] } ], "source": [ "# 初始化\n", "from pyrosetta import pose_from_pdb, init\n", "init()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. SimpleMetrics简介\n", "SimpleMetrics是2018发布一个可以快速地提取和分析结构特征/性质的组件。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/SimpleMetricsFrameworks.jpg\" width = \"800\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SimpleMetrics的计算内容:\n", "- RealMetric 这些metrics计算并返回实数。如二面角、相互作用能、总能量、RMSD、SASA、序列相似度、数据的后处理等。\n", "- StringMetric 这些metrics计算并返回字符。如氨基酸序列、二级结构、相互作用对,用于Pymol可视化的残基等。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SimpleMetrics根据计算方式分类:\n", "- Standard Metrics: 返回一种metric\n", "- Composite Metrics: 返回多种metrics\n", "- Per-Residue Metrics: 返回选择残基中每个残基的metric" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1 如何定义一个SimpleMetrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**SimpleMetrics计算器的选择**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "首先需要弄清楚,希望得到什么特征的计算。比如我想计算残基的可及溶剂表面积(SASA)时,我应该考虑我想计算的是每个残基的SASA,还是整体的SASA。以此来选定特定的SimpleMetrics计算器。以下用PerResidueSasaMetric作为demo引出用法和一些需要特别注意的细节。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**注意1: 大部分的SimpleMetrics都支持使用ResidueSelector来指定或限定计算的范围。**\n", "\n", "**注意2: 大部分的SimpleMetrics都支持使用set_output_as_pdb_nums来指定输出的编号使用PDB还是Pose编号。**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set. Created 983 residue types\n", "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.683192 seconds.\n", "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/6LZ9_H_L.pdb' automatically determined to be of type PDB\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 21 94\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 141 206\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYD\n" ] } ], "source": [ "from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector\n", "from pyrosetta.rosetta.core.simple_metrics.per_residue_metrics import PerResidueSasaMetric\n", "\n", "# 读取pose\n", "pose = pose_from_pdb('./data/6LZ9_H_L.pdb')\n", "\n", "# 定义SimpleMetrics计算器\n", "per_residue_sasa = PerResidueSasaMetric()\n", "sasa_sel = ResidueIndexSelector('16L-25L') # 比如计算L链上16-25号残基每个残基的sasa值\n", "per_residue_sasa.set_residue_selector(sasa_sel)\n", "per_residue_sasa.set_output_as_pdb_nums(True) # 输出时" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "每个Metrics都有custom_type, 也就是Metrics的名称,是可独立命名的。\n", "另外计算器的计算结果索引需要有一个“唯一”的索引号。默认来说,每种计算器都有自己的索引名。比如PerResidueSasaMetric计算器,他的索引名即为“res_sasa”。但是这个索引名是可以在apply时被重新自定义的:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "per_residue_sasa.apply(pose, prefix='per_', suffix='') # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/6LZ9_per_resi_sasa.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "以上,metrics的计算已经完成。\n", "**特别注意: 每次计算的索引号不得有重复,否则直接报错!**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "RuntimeError", "evalue": "\n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/simple_metrics/util.cc:282\n\n\nSimpleMetric error! \n The data of type PerResidueSasaMetric with data output tag per_res_sasa already exists! \nPlease use the prefix/suffix settings or set a custom_type for the metric.\n See the documentation for more:\n https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics#effective-use-of-simplemetrics.\n Note: If this was intentional, please set the override option to true in RunSimpleMetricsMover\n\n", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-4-6ebfee108fc1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# 再跑一次会怎么样?:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mper_residue_sasa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpose\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprefix\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'per_'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msuffix\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m''\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m: \n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/simple_metrics/util.cc:282\n\n\nSimpleMetric error! \n The data of type PerResidueSasaMetric with data output tag per_res_sasa already exists! \nPlease use the prefix/suffix settings or set a custom_type for the metric.\n See the documentation for more:\n https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics#effective-use-of-simplemetrics.\n Note: If this was intentional, please set the override option to true in RunSimpleMetricsMover\n\n" ] } ], "source": [ "# 再跑一次会怎么样?:\n", "per_residue_sasa.apply(pose, prefix='per_', suffix='') # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# 加个后缀名:\n", "per_residue_sasa.apply(pose, prefix='per_', suffix='_suf') # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa_suf”" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**SimpleMetricData数据的储存与提取**\n", "\n", "在pose中,数据会储存在SimpleMetricData中,因此我们需要调用 get_sm_data(pose)来获取pose中的sm cache data,然后在通过SimpleMetricData来获取具体类型的值,主要有以下几类: \n", "- get_real_metric_data()\n", "- get_string_metric_data()\n", "- get_per_residue_real_metric_data()\n", "- get_composite_real_metric_data()\n", "\n", "注意: 提取pose中Metric的data.key为字典的key值。但是我们需要知道每种metric的key值名称,如rmsd, sasa, residue_count等,直接使用切片将其取出,如不适用切片,默认返回所有值。" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n", "sm_data = get_sm_data(pose)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "根据分类,PerResidueSasaMetric的计算结果属于real_metric_data:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "map_std_string_std_map_unsigned_long_double_std_less_unsigned_long_std_allocator_std_pair_const_unsigned_long_double_t{per_res_sasa: {134:36.1789, 135:40.0819, 136:158.19, 137:5.63578, 138:53.4908, 139:1.02469, 140:44.6698, 141:1.44319, 142:146.688, 143:5.70025, }, per_res_sasa_suf: {134:36.1789, 135:40.0819, 136:158.19, 137:5.63578, 138:53.4908, 139:1.02469, 140:44.6698, 141:1.44319, 142:146.688, 143:5.70025, }}\n" ] } ], "source": [ "per_real_metric = sm_data.get_per_residue_real_metric_data()\n", "print(per_real_metric)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "计算的结果被存在一个叫map_std_string_std_map_unsigned_long_double_std_less_unsigned_long_std_allocator_std_pair_const_unsigned_long_double_t类型(真长。。)的字典中。" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "map_unsigned_long_double{134: 36.1789, 135: 40.0819, 136: 158.19, 137: 5.63578, 138: 53.4908, 139: 1.02469, 140: 44.6698, 141: 1.44319, 142: 146.688, 143: 5.70025}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 通过索引名直接获取数据:\n", "per_real_metric['per_res_sasa']" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "36.17887576545817" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 获取134号残基的SASA值。\n", "per_real_metric['per_res_sasa'][134]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. SimpleMetrics的更多应用实例\n", "为了方便大家加深理解,本章节再举几个相关的例子进行说明:\n", "- RealMetrics\n", "- StringMetrics\n", "- PerResidueRealMetrics\n", "- CompositeRealMetrics" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n", "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n", "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n", "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n", "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=-2065888904 seed_offset=0 real_seed=-2065888904 thread_index=0\n", "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=-2065888904 RG_type=mt19937\n", "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/6LZ9_H_L.pdb' automatically determined to be of type PDB\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 21 94\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 141 206\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYD\n" ] } ], "source": [ "#加载与SimpleMetrics调用有关的class.\n", "from pyrosetta import pose_from_pdb, init\n", "from pyrosetta.rosetta.core.simple_metrics.metrics import *\n", "from pyrosetta.rosetta.core.simple_metrics.per_residue_metrics import *\n", "from pyrosetta.rosetta.core.simple_metrics.composite_metrics import *\n", "from pyrosetta.rosetta.core.simple_metrics import *\n", "init()\n", "#通过第五章ResidueSelector的学习,相信对肝细胞生长因子抗体(PDB:6LZ9)已经很熟悉了,这里依然采用这个抗体,并从PDB中读入生成pose对象。\n", "pose = pose_from_pdb('./data/6LZ9_H_L.pdb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/6LZ9_init.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1 RealMetrics\n", "RealMetrics可以很方便地计算选择残基的某一物理量,并通过get_real_metric_data()返回实数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.1.1 SelectedResidueCountMetric\n", "这个Metric用于计算被ResidueSelector选择残基的数量。" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# 比如选择抗体的重链,PDB链号为\"H\":\n", "from pyrosetta.rosetta.core.select.residue_selector import ChainSelector\n", "select_heavy_chain = ChainSelector('H')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/6LZ9_h_chain.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "#下边将计算选择残基的数量,这里我们将前边提到的SimpleMetrics的名称和SimpleMetrics的前缀/后缀也分别简要说明。\n", "# expample1 SimpleMetrics的名称\n", "num = SelectedResidueCountMetric()\n", "name = 'H_chian'\n", "num.set_custom_type(name) #设置名称\n", "num.set_residue_selector(select_heavy_chain)\n", "num.apply(pose) " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "118.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n", "sm_data = get_sm_data(pose)\n", "real_metric = sm_data.get_real_metric_data()\n", "real_metric['H_chian_selection_count']" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# expample2 SimpleMetrics的前缀/后缀\n", "num = SelectedResidueCountMetric()\n", "num.set_residue_selector(select_heavy_chain)\n", "prefix = 'H_chain_'\n", "num.apply(pose, prefix)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "118.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "real_metric = sm_data.get_real_metric_data()\n", "real_metric['H_chian_selection_count']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**再次提醒说明**<br />\n", "通过比较可以发现,SimpleMetrics的名称和SimpleMetrics的前缀/后缀在apply时会有所差别,当然名称和前缀/后缀也可以同时使用。<br />\n", "每次运行这些名字都会保存,重复运行同一个Metric要指定不同的名字,否则会报错。" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PDB file name: ./data/6LZ9_H_L.pdb\n", " Pose Range Chain PDB Range | #Residues #Atoms\n", "\n", "0001 -- 0081 H 0002 -- 0082 | 0081 residues; 01283 atoms\n", "0082 -- 0082 H 0082A -- 0082A | 0001 residues; 00011 atoms\n", "0083 -- 0083 H 0082B -- 0082B | 0001 residues; 00011 atoms\n", "0084 -- 0084 H 0082C -- 0082C | 0001 residues; 00019 atoms\n", "0085 -- 0102 H 0083 -- 0100 | 0018 residues; 00271 atoms\n", "0103 -- 0103 H 0100A -- 0100A | 0001 residues; 00010 atoms\n", "0104 -- 0104 H 0100B -- 0100B | 0001 residues; 00021 atoms\n", "0105 -- 0105 H 0100C -- 0100C | 0001 residues; 00021 atoms\n", "0106 -- 0106 H 0100D -- 0100D | 0001 residues; 00010 atoms\n", "0107 -- 0107 H 0100E -- 0100E | 0001 residues; 00017 atoms\n", "0108 -- 0118 H 0101 -- 0111 | 0011 residues; 00160 atoms\n", "0119 -- 0223 L 0001 -- 0105 | 0105 residues; 01600 atoms\n", " TOTAL | 0223 residues; 03434 atoms\n", "\n" ] } ], "source": [ "# 再来看抗体的残基基本信息:\n", "print(pose.pdb_info())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**结果解读**<br />\n", "抗体中含有H和L两条链,分别为1-118和119-223。SelectedResidueCountMetric计算结果与此一致。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.1.2 SasaMetric \n", "计算选择残基的溶剂可及表面积。" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# 比如计算L链的溶剂可及表面积。\n", "select_light_chain = ChainSelector('L')\n", "sasa = SasaMetric(select_light_chain)\n", "prefix = 'L_chain_'\n", "sasa.apply(pose,prefix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/6LZ9_l_chain_sasa.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4854.614232155741" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "real_metric = sm_data.get_real_metric_data()\n", "real_metric['L_chain_sasa']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2 StringMetrics\n", "这个Metrics计算得到的结果通过get_string_metric_data来返回字符串。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.2.1 SelectedResiduesMetric(非常实用!)\n", "返回选择残基的poseID或PDBID, 可以快速获取ResidueSelector中到底都选择了哪些残基的列表!" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "#指定一段残基来给出相应的ID\n", "from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector\n", "range_selector = ResidueIndexSelector('18H-24H')\n", "selected = range_selector.apply(pose)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "selected_index = SelectedResiduesMetric(range_selector)\n", "selected_index.set_output_in_rosetta_num(True) # True: poseID, False: PDBID\n", "prefix = 'ID_'\n", "selected_index.apply(pose,prefix)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'17,18,19,20,21,22,23'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n", "sm_data = get_sm_data(pose)\n", "string_metric = sm_data.get_string_metric_data()\n", "string_metric['ID_selection']" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[17, 18, 19, 20, 21, 22, 23]\n" ] } ], "source": [ "#验证SelectedResiduesMetric返回值是否正确:\n", "index_list = [index+1 for index, i in enumerate(selected) if i == 1]\n", "print(index_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.2.2 SelectedResiduesPyMOLMetric\n", "返回pymol命令: 粘贴复制后可以用于在Pymol中选择selector中的残基。<br />\n", "这个Metric在第五章ResidueSelectors中得到了广泛使用,可以一目了然的知道选择器选择了哪些残基。" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "index_selector = ResidueIndexSelector('41H,43H,45H')\n", "pymol_selected = SelectedResiduesPyMOLMetric()\n", "pymol_selected.set_residue_selector(index_selector)\n", "prefix = 'index_select_'\n", "pymol_selected.apply(pose,prefix)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'select rosetta_sele, (chain H and resid 41,43,45)'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "string_metric = sm_data.get_string_metric_data()\n", "string_metric['index_select_pymol_selection']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/6LZ9_index.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.3 PerResidueRealMetrics\n", "这个Metrics与RealMetrics功能相同,但计算得到的是每个残基的物理量。<br />\n", "所有的PerResidueRealMetric都可以指定输出类型为PDBID或者是poseID 利用选项output_as_pdb_nums即可。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.3.1 PerResidueEnergyMetric\n", "计算每个残基的总能(或某一个能量类型如fa_atr, fa_rep, fa_elec等),返回能量的值是经过权重。通过分解总能得到每个氨基酸的能量。\n", "\n", "这个计算器有两种使用的模式:\n", "- Example1: 残基能量分解\n", "- Example2: 两个Pose之间的每个残基能量对比差计算" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "#expample1 计算H链11到20号残基的fa_rep类型能量\n", "range_sel = ResidueIndexSelector('11H-20H')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.scoring.etable: {0} \u001b[0mStarting energy table calculation\n", "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: changing atr/rep split to bottom of energy well\n", "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: spline smoothing lj etables (maxdis = 6)\n", "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: spline smoothing solvation etables (max_dis = 6)\n", "\u001b[0mcore.scoring.etable: {0} \u001b[0mFinished calculating energy tables.\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/all.ramaProb\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/prepro.ramaProb\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.all.txt\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.gly.txt\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.pro.txt\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.valile.txt\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA_n\n", "\u001b[0mcore.scoring.P_AA: {0} \u001b[0mshapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop\n", "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/elec_cp_reps.dat\n", "\u001b[0mcore.scoring.elec.util: {0} \u001b[0mRead 40 countpair representative atoms\n", "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mshapovalov_lib_fixes_enable option is true.\n", "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mshapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.\n", "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mBinary rotamer library selected: /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin\n", "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mUsing Dunbrack library binary file '/opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.\n", "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mDunbrack 2010 library took 0.17968 seconds to load from binary\n" ] } ], "source": [ "from pyrosetta.rosetta.core.scoring import fa_atr, fa_rep, fa_elec\n", "from pyrosetta import create_score_function\n", "per_residue_energy = PerResidueEnergyMetric()\n", "per_residue_energy.set_residue_selector(range_sel) \n", "per_residue_energy.set_scoretype(fa_rep)\n", "scorefxn = create_score_function('ref2015')\n", "per_residue_energy.set_scorefunction(scorefxn)\n", "per_residue_energy.apply(pose,'per_')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "map_unsigned_long_double{10: 0.141929, 11: 0.917974, 12: 1.55712, 13: 0.765261, 14: 0.137776, 15: 2.04166, 16: 0.549227, 17: 3.96559, 18: 0.523508, 19: 3.02056}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "per_real_metric = sm_data.get_per_residue_real_metric_data()\n", "per_real_metric['per_res_energy']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/3K2U_H_L.pdb' automatically determined to be of type PDB\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom: OXT on residue THR:CtermProteinFull 121\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom: OXT on residue ARG:CtermProteinFull 229\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 22 96\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 22 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 96 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 22 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 96 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 144 209\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 144 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 209 CYS\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 144 CYD\n", "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 209 CYD\n" ] } ], "source": [ "#expample2 计算两个pose间每个残基的能量差,这里需要引入另一个pose\n", "com_pose = pose_from_pdb('./data/3K2U_H_L.pdb')\n", "#3K2U与6LZ9的结构是非常相似的" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<center><img src=\"./img/3K2U_init.png\" width = \"400\" height = \"300\" align=center /> </center>\n", "(图片来源: 晶泰科技团队)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "per_residue_energy = PerResidueEnergyMetric()\n", "per_residue_energy.set_residue_selector(range_sel) \n", "per_residue_energy.set_scoretype(fa_rep)\n", "scorefxn = create_score_function('ref2015')\n", "per_residue_energy.set_scorefunction(scorefxn)\n", "per_residue_energy.set_comparison_pose(com_pose) # <-主要变化的设置在这里;\n", "per_residue_energy.apply(pose,'com_pre_')" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "map_unsigned_long_double{10: 0.0422257, 11: 0.0217457, 12: -1.09715, 13: -0.255306, 14: -1.16531, 15: 1.88455, 16: 0.385523, 17: 3.58768, 18: -1.32255, 19: 2.705}" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "per_real_metric = sm_data.get_per_residue_real_metric_data()\n", "per_real_metric['com_pre_res_energy']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.4 Composite Metrics\n", "同时返回多个物理量,并通过get_composite_real_metric_data来返回实数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.4.1 CompositeEnergyMetric\n", "计算总能量中每一项能量值,功能与TotalEnergyMetric部分重合,但是比其方便,可以一次输出所有的值。" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# 计算两个pose的H链30到50号残基的各项能量差\n", "from pyrosetta.rosetta.core.simple_metrics.composite_metrics import CompositeEnergyMetric\n", "composite_sel = ResidueIndexSelector('30H-50H')\n", "composite_energy = CompositeEnergyMetric(composite_sel)\n", "composite_energy.set_comparison_pose(com_pose) #如果不指定比较的pose,仅计算pose的各项能量\n", "scorefxn = create_score_function('ref2015')\n", "composite_energy.set_scorefunction(scorefxn)\n", "composite_energy.apply(pose,'30H-50H_')" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "map_std_string_double{dslf_fa13: 0, fa_atr: -3.88672, fa_dun: -2.83299, fa_elec: 1.73887, fa_intra_rep: -0.0201983, fa_intra_sol_xover4: -0.675227, fa_rep: -12.926, fa_sol: 7.53877, hbond_bb_sc: 0.0272416, hbond_lr_bb: -0.36422, hbond_sc: 1.24952, hbond_sr_bb: 0.455738, lk_ball_wtd: -1.62749, omega: 3.88765, p_aa_pp: -0.0911932, pro_close: -3.84744, rama_prepro: 0.450053, ref: -1.64251, yhh_planarity: 1.2478e-07}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n", "sm_data = get_sm_data(pose)\n", "composite_real_metric = sm_data.get_composite_real_metric_data()\n", "composite_real_metric['30H-50H_composite_energy']" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }