{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ResidueSelectors的逻辑\n",
    "@Author: 槐喆\n",
    "@email:zhe.huai@xtalpi.com\n",
    "\n",
    "@Proofread: 吴炜坤\n",
    "@email:weikun.wu@xtalpi.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "氨基酸选择器(ResidueSelector)具有十分重要的功能。它能够从蛋白质结构(Pose)中选取并生成氨基酸子集。一旦生成了这些子集,对后续建模的逻辑操作具有重大的意义,比如可以定义设计或采样的自由度(使用ResidueSelector可以将蛋白质距离内核中心5埃范围内的氨基酸选择出来,后续进行氨基酸侧链能量最小化等结构优化),也可以配合SimpleMetrics、Filter等进行蛋白质性质或参数的统计。\n",
    "\n",
    "注: ResidueSelectors的概念比较简单也比较利于初学者理解,因此此章节学习难度较小。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 一、 ResidueSelector与vector1_bool"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在PyRosetta中,定义好ResidueSelectors后,进行apply(可以理解为执行选择的过程),我们将得到氨基酸残基的子集列表。这个列表被保存在vector1<bool>对象中,以下以具体的实例进行讲解:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org\n",
      "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n",
      "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12\n",
      "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=1885576095 seed_offset=0 real_seed=1885576095 thread_index=0\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=1885576095 RG_type=mt19937\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set.  Created 983 residue types\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.651952 seconds.\n",
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/6LZ9_H_L.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 21 94\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 21 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 94 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mFound disulfide between residues 141 206\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYS\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 141 CYD\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0mcurrent variant for 206 CYD\n"
     ]
    }
   ],
   "source": [
    "# 导入链选择器\n",
    "from pyrosetta import pose_from_pdb, init\n",
    "from pyrosetta.rosetta.core.select.residue_selector import ChainSelector\n",
    "init()\n",
    "# 从pdb中读入生成pose对象,(肝细胞生长因子抗体PDB:6LZ9)\n",
    "pose = pose_from_pdb('./data/6LZ9_H_L.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_init.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PDB file name: ./data/6LZ9_H_L.pdb\n",
      " Pose Range  Chain    PDB Range  |   #Residues         #Atoms\n",
      "\n",
      "0001 -- 0081    H 0002  -- 0082  |   0081 residues;    01283 atoms\n",
      "0082 -- 0082    H 0082A -- 0082A |   0001 residues;    00011 atoms\n",
      "0083 -- 0083    H 0082B -- 0082B |   0001 residues;    00011 atoms\n",
      "0084 -- 0084    H 0082C -- 0082C |   0001 residues;    00019 atoms\n",
      "0085 -- 0102    H 0083  -- 0100  |   0018 residues;    00271 atoms\n",
      "0103 -- 0103    H 0100A -- 0100A |   0001 residues;    00010 atoms\n",
      "0104 -- 0104    H 0100B -- 0100B |   0001 residues;    00021 atoms\n",
      "0105 -- 0105    H 0100C -- 0100C |   0001 residues;    00021 atoms\n",
      "0106 -- 0106    H 0100D -- 0100D |   0001 residues;    00010 atoms\n",
      "0107 -- 0107    H 0100E -- 0100E |   0001 residues;    00017 atoms\n",
      "0108 -- 0118    H 0101  -- 0111  |   0011 residues;    00160 atoms\n",
      "0119 -- 0223    L 0001  -- 0105  |   0105 residues;    01600 atoms\n",
      "                           TOTAL |   0223 residues;    03434 atoms\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 先来看抗体的残基基本信息:\n",
    "print(pose.pdb_info())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "抗体含有的链数量:2\n",
      "抗体含有的氨基酸数量:223\n"
     ]
    }
   ],
   "source": [
    "print(f'抗体含有的链数量:{pose.num_chains()}')\n",
    "print(f'抗体含有的氨基酸数量:{pose.total_residue()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可见抗体中,共有两条链。H链氨基酸范围是1-118,L链氨基酸范围是119-223。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "# 选择抗体的重链,PDB链号为\"H\":\n",
    "select_heavy_chain = ChainSelector('H')\n",
    "selected = select_heavy_chain.apply(pose)\n",
    "print(selected)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**结果解读**<br />\n",
    "知识点1: vector1_bool中被选择的氨基酸返回“1”,而没有被选择的氨基酸返回“0”<br /> \n",
    "知识点2: vector1_bool中是按照Pose编号进行编写的(从1开始),也就是说重链的编号从1 -> n, 轻链的编号从n+1 -> 223."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "验证选择器是否正确:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118]\n"
     ]
    }
   ],
   "source": [
    "index_list = [index+1 for index, i in enumerate(selected) if i == 1]\n",
    "print(index_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可见选择器正确选择了重链的所有氨基酸。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 二、 ResidueSelector的可视化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyRosetta中内置SelectedResiduesPyMOLMetric的函数,可以直接显示被选择的氨基酸。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics.metrics import SelectedResiduesPyMOLMetric\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(select_heavy_chain)\n",
    "prefix = 'heavy_chain_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111)'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.simple_metrics import get_sm_data\n",
    "sm_data = get_sm_data(pose)\n",
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['heavy_chain_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一步,在PyMol中的cmd对话框输入上述的选择命令;\n",
    "\n",
    "第二步,用棍棒形式呈现\n",
    "show sticks, rosetta_sele"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_heavychain.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 三、 ResidueSelector的应用实例\n",
    "\n",
    "氨基酸选择器按功能可分为三大类:\n",
    "\n",
    "- 逻辑选择器\n",
    "- 非构象依赖选择器\n",
    "- 构象依赖选择器\n",
    "\n",
    "以下我们将逐步来讲解在实战中,都有哪些氨基酸选择可以为我们所用。<br /> \n",
    "这一节主要简单示例,下一节将详细讲解不同的API。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 逻辑选择器\n",
    "第一部分是逻辑选择器,很好理解,按照逻辑分类为Not、And、Or逻辑关系,可以将**两个**选择器进行逻辑的再次选择。\n",
    "在Rosetta中,负责逻辑定义的选择器为NotResidueSelector、AndResidueSelector、OrResidueSelector。\n",
    "以下做实例说明:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 还是以之前读入的抗体pose为例。\n",
    "# 先定义选择的链Selector:\n",
    "select_heavy_chain = ChainSelector('H')\n",
    "select_light_chain = ChainSelector('L')\n",
    "select_light_chain.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化,选择轻链\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(select_light_chain)\n",
    "prefix = 'light_chain_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['light_chain_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_lightchain.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n"
     ]
    }
   ],
   "source": [
    "#example1: 选择轻链**或**重链\n",
    "from pyrosetta.rosetta.core.select.residue_selector import OrResidueSelector\n",
    "light_or_heavy = OrResidueSelector(select_heavy_chain, select_light_chain)\n",
    "residue_selector = light_or_heavy.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化 选择轻链**或**重链\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(light_or_heavy)\n",
    "prefix = 'light_or_heavy_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111) or (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['light_or_heavy_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_light_or_heavy.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example2: 选择重链**且**轻链\n",
    "from pyrosetta.rosetta.core.select.residue_selector import AndResidueSelector\n",
    "light_and_heavy = AndResidueSelector(select_heavy_chain, select_light_chain)\n",
    "residue_selector = light_and_heavy.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择重链**且**轻链\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(light_and_heavy)\n",
    "prefix = 'light_and_heavy_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, '"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['light_and_heavy_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_light_and_heavy.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "重链和轻链之间没有交集,所以选择的结果是**空集**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n"
     ]
    }
   ],
   "source": [
    "#example3: 非选择器:\n",
    "from pyrosetta.rosetta.core.select.residue_selector import NotResidueSelector\n",
    "not_heavy = NotResidueSelector(select_heavy_chain)\n",
    "residue_selector = not_heavy.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 非重链\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(not_heavy)\n",
    "prefix = 'not_heavy_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['not_heavy_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_not_heavy.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n"
     ]
    }
   ],
   "source": [
    "#example4: 选择整个Pose\n",
    "from pyrosetta.rosetta.core.select.residue_selector import TrueResidueSelector\n",
    "true = TrueResidueSelector()\n",
    "residue_selector = true.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 整个Pose\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(true)\n",
    "prefix = 'entire_pose_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111) or (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['entire_pose_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_entire_pose.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2 非构象依赖的选择器\n",
    "这类选择器的定义不依赖于具体的构象,仅仅依靠属性就可以定义。如氨基酸的序号,氨基酸的名称等。此次简单举两个例子进行说明。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.2.1 ResidueIndexSelector**\n",
    "\n",
    "通过氨基酸的具体编号定义的选择器,不仅可以使用PDB编号、Pose编号,还可以指定氨基酸的范围进行选择。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector\n",
    "# 根据具体的Pose编号选择:\n",
    "pose_index_selector = ResidueIndexSelector('40,42,44')\n",
    "residue_selector = pose_index_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 特定的残基位点\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(pose_index_selector)\n",
    "prefix = 'index_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 41,43,45)'"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['index_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_index.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example1: 根据具体的PDB编号选择, 注意需要附带上PDB链的信息。\n",
    "pdb_index_selector = ResidueIndexSelector('62H,63H,64H')\n",
    "residue_selector = pdb_index_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 根据PDB编号选择的残基\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(pdb_index_selector)\n",
    "prefix = 'pdb_index_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 62,63,64)'"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['pdb_index_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_pdb_index.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example2: 根据PDB的范围进行选择。\n",
    "range_selector = ResidueIndexSelector('42H-60H')\n",
    "residue_selector = range_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 一定范围的残基\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(range_selector)\n",
    "prefix = 'range_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60)'"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['range_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_pdb_range.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.2.2. ResidueNameSelector**\n",
    "\n",
    "通过氨基酸的具体残基名定义的选择器:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example1: 根据单个残基名进行选择:\n",
    "from pyrosetta.rosetta.core.select.residue_selector import *\n",
    "resname_selector = ResidueNameSelector('PHE')\n",
    "residue_selector = resname_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 根据残基名选择的残基\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(resname_selector)\n",
    "prefix = 'resname_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 27,79,100) or (chain L and resid 21,49,62,83,87,96,98)'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['resname_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<center><img src=\"./img/6LZ9_residuename.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example2:  根据多个残基名进行选择:\n",
    "resname_selector = ResidueNameSelector('PHE,ASN')\n",
    "residue_selector = resname_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择多个残基名对应的残基\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(resname_selector)\n",
    "prefix = 'multi_resname_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 27,54,60,76,79,100) or (chain L and resid 21,31,34,49,62,77,83,87,96,98)'"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['multi_resname_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_multi_residuename.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "#example3:  选择带修饰的氨基酸残基\n",
    "resname_selector = ResidueNameSelector('CYS')\n",
    "residue_selector = resname_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "好像出现了问题,残基选择器似乎没有正确地选择我所需要的二硫键残基。让我们打印21号残基的信息,看看出了什么问题?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Residue 21: CYS:disulfide (CYS, C):\n",
      "Base: CYS\n",
      " Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA\n",
      " Variant types: DISULFIDE\n",
      " Main-chain atoms:  N    CA   C  \n",
      " Backbone atoms:    N    CA   C    O    H    HA \n",
      " Side-chain atoms:  CB   SG  1HB  2HB \n",
      "Atom Coordinates:\n",
      "   N  : 39.126, 55.553, 42.324\n",
      "   CA : 37.869, 55.182, 41.689\n",
      "   C  : 37.774, 53.665, 41.73\n",
      "   O  : 38.654, 52.976, 41.209\n",
      "   CB : 37.81, 55.713, 40.253\n",
      "   SG : 36.265, 55.41, 39.34\n",
      "   H  : 39.995, 55.343, 41.854\n",
      "   HA : 37.051, 55.626, 42.256\n",
      "  1HB : 37.967, 56.792, 40.257\n",
      "  2HB : 38.614, 55.268, 39.667\n",
      "Mirrored relative to coordinates in ResidueType: FALSE\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(pose.residue(21))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**结果解读**<br />\n",
    "选择带二硫键的氨基酸时,使用CYS残基名并没有正确选择到对应的氨基酸,因为在Rosetta中,形成二硫键的半胱氨酸名为\n",
    "CYS:disulfide, 接下来我们尝试换个名字进行选择."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "resname_selector = ResidueNameSelector('CYS:disulfide')\n",
    "residue_selector = resname_selector.apply(pose)\n",
    "print(residue_selector)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可视化选择 带修饰的残基\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(resname_selector)\n",
    "prefix = 'ss_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 22,92) or (chain L and resid 23,88)'"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['ss_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_ss.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**结果解读**<br />\n",
    "现在可以正确选择到对应的二硫键氨基酸子集了!这些二硫键的位置是22H, 92H, 23L, 88L。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.3 构象依赖的选择器\n",
    "顾名思义,这类选择器与分子结构的具体构象有关,具体地由二面角、二级结构、氢键、邻居分子数量、相互作用界面、对称性等几个层次去进行定义。这里以NeighborhoodResidueSelector为例进行简要说明。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3.3.1. NeighborhoodResidueSelector**\n",
    "\n",
    "选择邻近残基,默认选择10埃范围内的残基。有两种用法来选择,第一种选择半径范围内所有的氨基酸,第二种为选择**邻近范围内**的氨基酸"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ################ Cloning pose and building neighbor graph ################\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m Ensure that pose is either scored or has update_residue_neighbors() called\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m before using NeighborhoodResidueSelector for maximum performance!\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ##########################################################################\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 比如选择PDB编号为H链42号氨基酸的10埃范围内所有的氨基酸(包括42号氨基酸):\n",
    "from pyrosetta.rosetta.core.select.residue_selector import NeighborhoodResidueSelector, ResidueIndexSelector\n",
    "residue1_selector = ResidueIndexSelector('42H')\n",
    "nbr_selector = NeighborhoodResidueSelector(residue1_selector, 10.0, True)  # True 代表包括42号氨基酸。\n",
    "nbr_selector.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ################ Cloning pose and building neighbor graph ################\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m Ensure that pose is either scored or has update_residue_neighbors() called\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m before using NeighborhoodResidueSelector for maximum performance!\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ##########################################################################\n"
     ]
    }
   ],
   "source": [
    "# 可视化选择PDB编号为H链42号氨基酸的10埃范围内所有的氨基酸且含42号氨基酸\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(nbr_selector)\n",
    "prefix = 'nbr_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 39,40,41,42,43,44,88,89)'"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['nbr_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_nbr.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ################ Cloning pose and building neighbor graph ################\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m Ensure that pose is either scored or has update_residue_neighbors() called\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m before using NeighborhoodResidueSelector for maximum performance!\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ##########################################################################\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 比如选择PDB编号为H链42号氨基酸10埃范围内所有的氨基酸(不包括42号氨基酸):\n",
    "nbr_selector = NeighborhoodResidueSelector(residue1_selector, 10.0, False)  # True 代表包括1号氨基酸。\n",
    "nbr_selector.apply(pose)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ################ Cloning pose and building neighbor graph ################\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m Ensure that pose is either scored or has update_residue_neighbors() called\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m before using NeighborhoodResidueSelector for maximum performance!\n",
      "\u001b[0mcore.select.residue_selector.NeighborhoodResidueSelector: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m ##########################################################################\n"
     ]
    }
   ],
   "source": [
    "# 可视化选择 PDB编号为H链42号氨基酸的10埃范围内所有的氨基酸但不含42号氨基酸\n",
    "pymol_selected = SelectedResiduesPyMOLMetric()\n",
    "pymol_selected.set_residue_selector(nbr_selector)\n",
    "prefix = 'nbr_noself_select_'\n",
    "pymol_selected.apply(pose, prefix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'select rosetta_sele, (chain H and resid 39,40,41,43,44,88,89)'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_metric = sm_data.get_string_metric_data()\n",
    "string_metric['nbr_noself_select_pymol_selection']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/6LZ9_nbr_noself.png\" width = \"400\" height = \"300\" align=center /> </center>\n",
    "(图片来源: 晶泰科技团队)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}