{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SimpleMetrics\n",
    "@Author: 吴炜坤 | 槐喆\n",
    "\n",
    "@email:weikun.wu@xtalpi.com zhe.huai@xtalpi.com"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n",
      "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n",
      "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n",
      "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=1792538074 seed_offset=0 real_seed=1792538074 thread_index=0\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=1792538074 RG_type=mt19937\n"
     ]
    }
   ],
   "source": [
    "# 初始化\n",
    "from pyrosetta import pose_from_pdb, init\n",
    "init()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.  SimpleMetrics简介\n",
    "SimpleMetrics是2018发布一个可以快速地提取和分析结构特征/性质的组件。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/SimpleMetricsFrameworks.jpg\" width = \"800\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SimpleMetrics的计算内容:\n",
    "- RealMetric 这些metrics计算并返回实数。如二面角、相互作用能、总能量、RMSD、SASA、序列相似度、数据的后处理等。\n",
    "- StringMetric 这些metrics计算并返回字符。如氨基酸序列、二级结构、相互作用对,用于Pymol可视化的残基等。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SimpleMetrics根据计算方式分类:\n",
    "- Standard Metrics: 返回一种metric\n",
    "- Composite Metrics: 返回多种metrics\n",
    "- Per-Residue Metrics: 返回选择残基中每个残基的metric"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.1 如何定义一个SimpleMetrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**SimpleMetrics计算器的选择**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先需要弄清楚,希望得到什么特征的计算。比如我想计算残基的可及溶剂表面积(SASA)时,我应该考虑我想计算的是每个残基的SASA,还是整体的SASA。以此来选定特定的SimpleMetrics计算器。以下用PerResidueSasaMetric作为demo引出用法和一些需要特别注意的细节。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注意1: 大部分的SimpleMetrics都支持使用ResidueSelector来指定或限定计算的范围。**\n",
    "\n",
    "**注意2: 大部分的SimpleMetrics都支持使用set_output_as_pdb_nums来指定输出的编号使用PDB还是Pose编号。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set.  Created 983 residue types\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.683192 seconds.\n",
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/6LZ9_H_L.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 21 94\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 141 206\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYD\n"
     ]
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector\n",
    "from pyrosetta.rosetta.core.simple_metrics.per_residue_metrics import PerResidueSasaMetric\n",
    "\n",
    "# 读取pose\n",
    "pose = pose_from_pdb('./data/6LZ9_H_L.pdb')\n",
    "\n",
    "# 定义SimpleMetrics计算器\n",
    "per_residue_sasa = PerResidueSasaMetric()\n",
    "sasa_sel = ResidueIndexSelector('16L-25L')  # 比如计算L链上16-25号残基每个残基的sasa值\n",
    "per_residue_sasa.set_residue_selector(sasa_sel)\n",
    "per_residue_sasa.set_output_as_pdb_nums(True)  # 输出时"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "每个Metrics都有custom_type, 也就是Metrics的名称,是可独立命名的。\n",
    "另外计算器的计算结果索引需要有一个“唯一”的索引号。默认来说,每种计算器都有自己的索引名。比如PerResidueSasaMetric计算器,他的索引名即为“res_sasa”。但是这个索引名是可以在apply时被重新自定义的:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "per_residue_sasa.apply(pose, prefix='per_', suffix='')  # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_per_resi_sasa.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上,metrics的计算已经完成。\n",
    "**特别注意: 每次计算的索引号不得有重复,否则直接报错!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "\n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/simple_metrics/util.cc:282\n\n\nSimpleMetric error! \n The data of type PerResidueSasaMetric with data output tag per_res_sasa already exists! \nPlease use the prefix/suffix settings or set a custom_type for the metric.\n  See the documentation for more:\n  https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics#effective-use-of-simplemetrics.\n Note: If this was intentional, please set the override option to true in RunSimpleMetricsMover\n\n",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-4-6ebfee108fc1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# 再跑一次会怎么样?:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mper_residue_sasa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpose\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mprefix\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'per_'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msuffix\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m''\u001b[0m\u001b[0;34m)\u001b[0m  \u001b[0;31m# 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m: \n\nFile: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/simple_metrics/util.cc:282\n\n\nSimpleMetric error! \n The data of type PerResidueSasaMetric with data output tag per_res_sasa already exists! \nPlease use the prefix/suffix settings or set a custom_type for the metric.\n  See the documentation for more:\n  https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics#effective-use-of-simplemetrics.\n Note: If this was intentional, please set the override option to true in RunSimpleMetricsMover\n\n"
     ]
    }
   ],
   "source": [
    "# 再跑一次会怎么样?:\n",
    "per_residue_sasa.apply(pose, prefix='per_', suffix='')  # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa”"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 加个后缀名:\n",
    "per_residue_sasa.apply(pose, prefix='per_', suffix='_suf')  # 使用前缀\"per_\", 后缀为“”, 此时索引名为“per_res_sasa_suf”"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**SimpleMetricData数据的储存与提取**\n",
    "\n",
    "在pose中,数据会储存在SimpleMetricData中,因此我们需要调用 get_sm_data(pose)来获取pose中的sm cache data,然后在通过SimpleMetricData来获取具体类型的值,主要有以下几类: \n",
    "- get_real_metric_data()\n",
    "- get_string_metric_data()\n",
    "- get_per_residue_real_metric_data()\n",
    "- get_composite_real_metric_data()\n",
    "\n",
    "注意: 提取pose中Metric的data.key为字典的key值。但是我们需要知道每种metric的key值名称,如rmsd, sasa, residue_count等,直接使用切片将其取出,如不适用切片,默认返回所有值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n",
    "sm_data = get_sm_data(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据分类,PerResidueSasaMetric的计算结果属于real_metric_data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "map_std_string_std_map_unsigned_long_double_std_less_unsigned_long_std_allocator_std_pair_const_unsigned_long_double_t{per_res_sasa: {134:36.1789, 135:40.0819, 136:158.19, 137:5.63578, 138:53.4908, 139:1.02469, 140:44.6698, 141:1.44319, 142:146.688, 143:5.70025, }, per_res_sasa_suf: {134:36.1789, 135:40.0819, 136:158.19, 137:5.63578, 138:53.4908, 139:1.02469, 140:44.6698, 141:1.44319, 142:146.688, 143:5.70025, }}\n"
     ]
    }
   ],
   "source": [
    "per_real_metric = sm_data.get_per_residue_real_metric_data()\n",
    "print(per_real_metric)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算的结果被存在一个叫map_std_string_std_map_unsigned_long_double_std_less_unsigned_long_std_allocator_std_pair_const_unsigned_long_double_t类型(真长。。)的字典中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "map_unsigned_long_double{134: 36.1789, 135: 40.0819, 136: 158.19, 137: 5.63578, 138: 53.4908, 139: 1.02469, 140: 44.6698, 141: 1.44319, 142: 146.688, 143: 5.70025}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过索引名直接获取数据:\n",
    "per_real_metric['per_res_sasa']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "36.17887576545817"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取134号残基的SASA值。\n",
    "per_real_metric['per_res_sasa'][134]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.  SimpleMetrics的更多应用实例\n",
    "为了方便大家加深理解,本章节再举几个相关的例子进行说明:\n",
    "- RealMetrics\n",
    "- StringMetrics\n",
    "- PerResidueRealMetrics\n",
    "- CompositeRealMetrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n",
      "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n",
      "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n",
      "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=-2065888904 seed_offset=0 real_seed=-2065888904 thread_index=0\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=-2065888904 RG_type=mt19937\n",
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/6LZ9_H_L.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 21 94\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 141 206\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYD\n"
     ]
    }
   ],
   "source": [
    "#加载与SimpleMetrics调用有关的class.\n",
    "from pyrosetta import pose_from_pdb, init\n",
    "from pyrosetta.rosetta.core.simple_metrics.metrics import *\n",
    "from pyrosetta.rosetta.core.simple_metrics.per_residue_metrics import *\n",
    "from pyrosetta.rosetta.core.simple_metrics.composite_metrics import *\n",
    "from pyrosetta.rosetta.core.simple_metrics import *\n",
    "init()\n",
    "#通过第五章ResidueSelector的学习,相信对肝细胞生长因子抗体(PDB:6LZ9)已经很熟悉了,这里依然采用这个抗体,并从PDB中读入生成pose对象。\n",
    "pose = pose_from_pdb('./data/6LZ9_H_L.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_init.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.1 RealMetrics\n",
    "RealMetrics可以很方便地计算选择残基的某一物理量,并通过get_real_metric_data()返回实数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2.1.1 SelectedResidueCountMetric\n",
    "这个Metric用于计算被ResidueSelector选择残基的数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 比如选择抗体的重链,PDB链号为\"H\":\n",
    "from pyrosetta.rosetta.core.select.residue_selector import ChainSelector\n",
    "select_heavy_chain = ChainSelector('H')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_h_chain.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "#下边将计算选择残基的数量,这里我们将前边提到的SimpleMetrics的名称和SimpleMetrics的前缀/后缀也分别简要说明。\n",
    "# expample1 SimpleMetrics的名称\n",
    "num = SelectedResidueCountMetric()\n",
    "name = 'H_chian'\n",
    "num.set_custom_type(name)  #设置名称\n",
    "num.set_residue_selector(select_heavy_chain)\n",
    "num.apply(pose) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "118.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n",
    "sm_data = get_sm_data(pose)\n",
    "real_metric = sm_data.get_real_metric_data()\n",
    "real_metric['H_chian_selection_count']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# expample2 SimpleMetrics的前缀/后缀\n",
    "num = SelectedResidueCountMetric()\n",
    "num.set_residue_selector(select_heavy_chain)\n",
    "prefix = 'H_chain_'\n",
    "num.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "118.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "real_metric = sm_data.get_real_metric_data()\n",
    "real_metric['H_chian_selection_count']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**再次提醒说明**<br />\n",
    "通过比较可以发现,SimpleMetrics的名称和SimpleMetrics的前缀/后缀在apply时会有所差别,当然名称和前缀/后缀也可以同时使用。<br />\n",
    "每次运行这些名字都会保存,重复运行同一个Metric要指定不同的名字,否则会报错。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PDB file name: ./data/6LZ9_H_L.pdb\n",
      " Pose Range  Chain    PDB Range  |   #Residues         #Atoms\n",
      "\n",
      "0001 -- 0081    H 0002  -- 0082  |   0081 residues;    01283 atoms\n",
      "0082 -- 0082    H 0082A -- 0082A |   0001 residues;    00011 atoms\n",
      "0083 -- 0083    H 0082B -- 0082B |   0001 residues;    00011 atoms\n",
      "0084 -- 0084    H 0082C -- 0082C |   0001 residues;    00019 atoms\n",
      "0085 -- 0102    H 0083  -- 0100  |   0018 residues;    00271 atoms\n",
      "0103 -- 0103    H 0100A -- 0100A |   0001 residues;    00010 atoms\n",
      "0104 -- 0104    H 0100B -- 0100B |   0001 residues;    00021 atoms\n",
      "0105 -- 0105    H 0100C -- 0100C |   0001 residues;    00021 atoms\n",
      "0106 -- 0106    H 0100D -- 0100D |   0001 residues;    00010 atoms\n",
      "0107 -- 0107    H 0100E -- 0100E |   0001 residues;    00017 atoms\n",
      "0108 -- 0118    H 0101  -- 0111  |   0011 residues;    00160 atoms\n",
      "0119 -- 0223    L 0001  -- 0105  |   0105 residues;    01600 atoms\n",
      "                           TOTAL |   0223 residues;    03434 atoms\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 再来看抗体的残基基本信息:\n",
    "print(pose.pdb_info())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**结果解读**<br />\n",
    "抗体中含有H和L两条链,分别为1-118和119-223。SelectedResidueCountMetric计算结果与此一致。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2.1.2 SasaMetric \n",
    "计算选择残基的溶剂可及表面积。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 比如计算L链的溶剂可及表面积。\n",
    "select_light_chain = ChainSelector('L')\n",
    "sasa = SasaMetric(select_light_chain)\n",
    "prefix = 'L_chain_'\n",
    "sasa.apply(pose,prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_l_chain_sasa.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4854.614232155741"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "real_metric = sm_data.get_real_metric_data()\n",
    "real_metric['L_chain_sasa']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2 StringMetrics\n",
    "这个Metrics计算得到的结果通过get_string_metric_data来返回字符串。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2.2.1 SelectedResiduesMetric(非常实用!)\n",
    "返回选择残基的poseID或PDBID, 可以快速获取ResidueSelector中到底都选择了哪些残基的列表!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "#指定一段残基来给出相应的ID\n",
    "from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector\n",
    "range_selector = ResidueIndexSelector('18H-24H')\n",
    "selected = range_selector.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "selected_index = SelectedResiduesMetric(range_selector)\n",
    "selected_index.set_output_in_rosetta_num(True) # True: poseID, False: PDBID\n",
    "prefix = 'ID_'\n",
    "selected_index.apply(pose,prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'17,18,19,20,21,22,23'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n",
    "sm_data = get_sm_data(pose)\n",
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['ID_selection']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[17, 18, 19, 20, 21, 22, 23]\n"
     ]
    }
   ],
   "source": [
    "#验证SelectedResiduesMetric返回值是否正确:\n",
    "index_list = [index+1 for index, i in enumerate(selected) if i == 1]\n",
    "print(index_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#####  2.2.2 SelectedResiduesPyMOLMetric\n",
    "返回pymol命令: 粘贴复制后可以用于在Pymol中选择selector中的残基。<br />\n",
    "这个Metric在第五章ResidueSelectors中得到了广泛使用,可以一目了然的知道选择器选择了哪些残基。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_selector = ResidueIndexSelector('41H,43H,45H')\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(index_selector)\n",
    "prefix = 'index_select_'\n",
    "pymol_selected.apply(pose,prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 41,43,45)'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['index_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_index.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.3 PerResidueRealMetrics\n",
    "这个Metrics与RealMetrics功能相同,但计算得到的是每个残基的物理量。<br />\n",
    "所有的PerResidueRealMetric都可以指定输出类型为PDBID或者是poseID 利用选项output_as_pdb_nums即可。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#####  2.3.1 PerResidueEnergyMetric\n",
    "计算每个残基的总能(或某一个能量类型如fa_atr, fa_rep, fa_elec等),返回能量的值是经过权重。通过分解总能得到每个氨基酸的能量。\n",
    "\n",
    "这个计算器有两种使用的模式:\n",
    "- Example1: 残基能量分解\n",
    "- Example2: 两个Pose之间的每个残基能量对比差计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "#expample1 计算H链11到20号残基的fa_rep类型能量\n",
    "range_sel = ResidueIndexSelector('11H-20H')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.scoring.etable: {0} \u001b[0mStarting energy table calculation\n",
      "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: changing atr/rep split to bottom of energy well\n",
      "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: spline smoothing lj etables (maxdis = 6)\n",
      "\u001b[0mcore.scoring.etable: {0} \u001b[0msmooth_etable: spline smoothing solvation etables (max_dis = 6)\n",
      "\u001b[0mcore.scoring.etable: {0} \u001b[0mFinished calculating energy tables.\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/all.ramaProb\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/rama/fd/prepro.ramaProb\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.all.txt\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.gly.txt\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.pro.txt\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/omega/omega_ppdep.valile.txt\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/P_AA_n\n",
      "\u001b[0mcore.scoring.P_AA: {0} \u001b[0mshapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop\n",
      "\u001b[0mbasic.io.database: {0} \u001b[0mDatabase file opened: scoring/score_functions/elec_cp_reps.dat\n",
      "\u001b[0mcore.scoring.elec.util: {0} \u001b[0mRead 40 countpair representative atoms\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mshapovalov_lib_fixes_enable option is true.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mshapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mBinary rotamer library selected: /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mUsing Dunbrack library binary file '/opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.\n",
      "\u001b[0mcore.pack.dunbrack.RotamerLibrary: {0} \u001b[0mDunbrack 2010 library took 0.17968 seconds to load from binary\n"
     ]
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.scoring import fa_atr, fa_rep, fa_elec\n",
    "from pyrosetta import create_score_function\n",
    "per_residue_energy = PerResidueEnergyMetric()\n",
    "per_residue_energy.set_residue_selector(range_sel) \n",
    "per_residue_energy.set_scoretype(fa_rep)\n",
    "scorefxn = create_score_function('ref2015')\n",
    "per_residue_energy.set_scorefunction(scorefxn)\n",
    "per_residue_energy.apply(pose,'per_')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "map_unsigned_long_double{10: 0.141929, 11: 0.917974, 12: 1.55712, 13: 0.765261, 14: 0.137776, 15: 2.04166, 16: 0.549227, 17: 3.96559, 18: 0.523508, 19: 3.02056}"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "per_real_metric = sm_data.get_per_residue_real_metric_data()\n",
    "per_real_metric['per_res_energy']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/3K2U_H_L.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OXT on residue THR:CtermProteinFull 121\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OXT on residue ARG:CtermProteinFull 229\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 22 96\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 22 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 96 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 22 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 96 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 144 209\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 144 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 209 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 144 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 209 CYD\n"
     ]
    }
   ],
   "source": [
    "#expample2 计算两个pose间每个残基的能量差,这里需要引入另一个pose\n",
    "com_pose = pose_from_pdb('./data/3K2U_H_L.pdb')\n",
    "#3K2U与6LZ9的结构是非常相似的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/3K2U_init.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "per_residue_energy = PerResidueEnergyMetric()\n",
    "per_residue_energy.set_residue_selector(range_sel) \n",
    "per_residue_energy.set_scoretype(fa_rep)\n",
    "scorefxn = create_score_function('ref2015')\n",
    "per_residue_energy.set_scorefunction(scorefxn)\n",
    "per_residue_energy.set_comparison_pose(com_pose)  # <-主要变化的设置在这里;\n",
    "per_residue_energy.apply(pose,'com_pre_')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "map_unsigned_long_double{10: 0.0422257, 11: 0.0217457, 12: -1.09715, 13: -0.255306, 14: -1.16531, 15: 1.88455, 16: 0.385523, 17: 3.58768, 18: -1.32255, 19: 2.705}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "per_real_metric = sm_data.get_per_residue_real_metric_data()\n",
    "per_real_metric['com_pre_res_energy']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.4 Composite Metrics\n",
    "同时返回多个物理量,并通过get_composite_real_metric_data来返回实数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2.4.1 CompositeEnergyMetric\n",
    "计算总能量中每一项能量值,功能与TotalEnergyMetric部分重合,但是比其方便,可以一次输出所有的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 计算两个pose的H链30到50号残基的各项能量差\n",
    "from pyrosetta.rosetta.core.simple_metrics.composite_metrics import CompositeEnergyMetric\n",
    "composite_sel = ResidueIndexSelector('30H-50H')\n",
    "composite_energy = CompositeEnergyMetric(composite_sel)\n",
    "composite_energy.set_comparison_pose(com_pose) #如果不指定比较的pose,仅计算pose的各项能量\n",
    "scorefxn = create_score_function('ref2015')\n",
    "composite_energy.set_scorefunction(scorefxn)\n",
    "composite_energy.apply(pose,'30H-50H_')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "map_std_string_double{dslf_fa13: 0, fa_atr: -3.88672, fa_dun: -2.83299, fa_elec: 1.73887, fa_intra_rep: -0.0201983, fa_intra_sol_xover4: -0.675227, fa_rep: -12.926, fa_sol: 7.53877, hbond_bb_sc: 0.0272416, hbond_lr_bb: -0.36422, hbond_sc: 1.24952, hbond_sr_bb: 0.455738, lk_ball_wtd: -1.62749, omega: 3.88765, p_aa_pp: -0.0911932, pro_close: -3.84744, rama_prepro: 0.450053, ref: -1.64251, yhh_planarity: 1.2478e-07}"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n",
    "sm_data = get_sm_data(pose)\n",
    "composite_real_metric = sm_data.get_composite_real_metric_data()\n",
    "composite_real_metric['30H-50H_composite_energy']"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}