{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "mature-restriction",
   "metadata": {},
   "source": [
    "## Resfile System"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "monthly-september",
   "metadata": {},
   "source": [
    "@Author: 吴炜坤\n",
    "\n",
    "@email:weikun.wu@xtalpi.com/weikunwu@163.com"
   ]
  },
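  {
   "cell_type": "markdown",
   "id": "intelligent-associate",
   "metadata": {},
   "source": [
    "A resfile is an external input file that controls the Packer: it tells the Rosetta Packer which side-chain degrees of freedom are allowed at each residue of the structure. The file is read through `ReadResfile`, which produces the corresponding TaskOperation. This lets users define packing behavior quickly outside the code, without writing out complex selector + RLT combinations. The drawback of the resfile system is that every file has to be prepared by hand, so the task cannot be automated."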
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "understanding-satin",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org\n",
      "(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.\n",
      "\u001b[0mcore.init: {0} \u001b[0mChecking for fconfig files in pwd and ./rosetta/flags\n",
      "\u001b[0mcore.init: {0} \u001b[0mRosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54\n",
      "\u001b[0mcore.init: {0} \u001b[0mcommand: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0m'RNG device' seed mode, using '/dev/urandom', seed=-867162182 seed_offset=0 real_seed=-867162182 thread_index=0\n",
      "\u001b[0mbasic.random.init_random_generator: {0} \u001b[0mRandomGenerator:init: Normal mode, seed=-867162182 RG_type=mt19937\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mFinished initializing fa_standard residue type set.  Created 984 residue types\n",
      "\u001b[0mcore.chemical.GlobalResidueTypeSet: {0} \u001b[0mTotal time to initialize 0.668191 seconds.\n",
      "\u001b[0mcore.import_pose.import_pose: {0} \u001b[0mFile './data/helix.pdb' automatically determined to be of type PDB\n",
      "\u001b[0mcore.conformation.Conformation: {0} \u001b[0m\u001b[1m[ WARNING ]\u001b[0m missing heavyatom:  OXT on residue GLY:CtermProteinFull 14\n"
     ]
    }
   ],
   "source": [
    "# Initialize PyRosetta\n",
    "from pyrosetta import init, pose_from_pdb\n",
    "from pyrosetta.rosetta.core.pack.task.operation import ReadResfile\n",
    "from pyrosetta.rosetta.core.pack.task import TaskFactory\n",
    "init()\n",
    "pose = pose_from_pdb('./data/helix.pdb')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "crucial-scratch",
   "metadata": {},
   "source": [
    "### 1. Basic Format of a Resfile"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "attempted-consistency",
   "metadata": {},
   "source": [
    "Resfiles use PDB numbering and are case-sensitive. A resfile normally consists of two parts: a HEADER and a BODY.\n",
    "\n",
    "What each part does:\n",
    "- HEADER: **globally controls** how rotamers are searched;\n",
    "- BODY: specifies how rotamers are searched at explicitly listed **positions or ranges**;\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "possible-novel",
   "metadata": {},
   "source": [
    "Resfile example:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "together-nutrition",
   "metadata": {},
   "source": [
    "<center><img src=\"./img/Resfile_format.jpg\" width = \"600\" height = \"200\" align=center /></center>\n",
    "(Image source: XtalPi team)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sealed-badge",
   "metadata": {},
   "source": [
    "### 2. HEADER Syntax"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "artificial-skiing",
   "metadata": {},
   "source": [
    "#### 2.1 Global Degree-of-Freedom Control"
   ]
  },
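  {
   "cell_type": "markdown",
   "id": "least-cleaner",
   "metadata": {},
   "source": [
    "This part of the syntax controls the degrees of freedom of every residue position that is not listed in the BODY. Each such position can be put into one of three basic states: Design, Repacking, or No_repack. The commands are listed below:\n",
    "- ALLAA    ......... # allow design to all 20 amino acids\n",
    "- ALLAAxc  ......... # allow design to all amino acids **except cysteine**\n",
    "- POLAR    ......... # allow design to polar amino acids (DEHKNQRST)\n",
    "- APOLAR   ......... # allow design to nonpolar amino acids (ACFGILMPVWY)\n",
    "- NOTAA    ......... # disallow the listed amino acids (write the list contiguously, without spaces)\n",
    "- PIKAA    ......... # allow only the listed amino acids (write the list contiguously, without spaces)\n",
    "- NATAA    ......... # repack: keep the current amino acid type; only the conformation may change\n",
    "- NATRO    ......... # no repack: no conformational change is allowed\n",
    "- PROPERTY ......... # allow design only to amino acids that have the given property"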
   ]
  },
  {
   "cell_type": "markdown",
   "id": "major-finance",
   "metadata": {},
   "source": [
    "Commonly used PROPERTY values:\n",
    "- METAL: metal ions\n",
    "- POLAR: polar amino acids\n",
    "- HYDROPHOBIC: hydrophobic amino acids\n",
    "- CHARGED: charged amino acids\n",
    "- NEGATIVE_CHARGE: negatively charged amino acids\n",
    "- POSITIVE_CHARGE: positively charged amino acids\n",
    "- AROMATIC: aromatic amino acids"
   ]
  },
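  {
   "cell_type": "markdown",
   "id": "sketch-header-default",
   "metadata": {},
   "source": [
    "A minimal sketch of how one of the commands listed above might serve as the global default in a HEADER. The residue list `AVLI` is an arbitrary illustration and does not correspond to any file in `./data`:\n",
    "```\n",
    "PIKAA AVLI # default: allow only Ala, Val, Leu, Ile at positions not listed in the BODY\n",
    "START\n",
    "```\n",
    "Swapping the first line for `NOTAA C` would, under the same assumptions, exclude only cysteine instead."
   ]
  },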
  {
   "cell_type": "markdown",
   "id": "academic-gather",
   "metadata": {},
   "source": [
    "A demo resfile that uses NATRO:\n",
    "```\n",
    "NATRO\n",
    "START\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "genetic-spank",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tFALSE\tFALSE\t\n",
      "2\tFALSE\tFALSE\t\n",
      "3\tFALSE\tFALSE\t\n",
      "4\tFALSE\tFALSE\t\n",
      "5\tFALSE\tFALSE\t\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/NATRO.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "considered-lounge",
   "metadata": {},
   "source": [
    "#### Exercise\n",
    "Write more resfiles, load them with the code above, and compare how the different commands behave."
   ]
  },
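  {
   "cell_type": "markdown",
   "id": "exempt-dating",
   "metadata": {},
   "source": [
    "#### 2.2 Setting the Rotamer Sampling Density:\n",
    "\n",
    "The Rosetta Packer samples rotamers discretely; by default it uses only the central, most populated conformation of each rotamer bin. The Extra Rotamer commands increase rotamer sampling: by default, expanded sampling additionally considers conformations at +/-1 standard deviation from the mean $\\chi$. These commands only take effect on **buried** residues!\n",
    "\n",
    "To enable this, add Extra Rotamer commands to the HEADER of the resfile. Four forms are currently available:\n",
    "\n",
    "1. EX \\<chi-id> LEVEL \\<level-value> syntax\n",
    "\n",
    "    - EX \\<chi-id> selects which side-chain $\\chi$ angle gets expanded sampling; for example, EX 1 EX 2 expands sampling of both $\\chi_{1}$ and $\\chi_{2}$\n",
    "    - LEVEL \\<level-value> sets the sampling range in standard deviations of the $\\chi$ angle; if omitted, the default is +/-1 standard deviation\n",
    "\n",
    "2. EX ARO \\<chi-id> LEVEL \\<level-value> syntax\n",
    "    - EX ARO \\<chi-id> selects which side-chain $\\chi$ angle gets expanded sampling, **but only for aromatic amino acids (FHWY)!**\n",
    "    - LEVEL \\<level-value> sets the sampling range in standard deviations of the $\\chi$ angle; if omitted, the default is +/-1 standard deviation\n",
    "\n",
    "3. EX_CUTOFF \\<number of neighbors> syntax\n",
    "    - By default Rosetta does not sample extra rotamers for residues on the protein surface unless the user explicitly changes the criterion for a \"buried\" residue (for example with EX_CUTOFF 0). Rosetta counts the residues within 10 Å of the current residue; when the count exceeds the cutoff, the residue is treated as \"buried\" and receives extra rotamer sampling. The default cutoff is 18, so EX_CUTOFF is commonly set to 0 explicitly so that extra rotamers are sampled at every position.\n",
    "\n",
    "4. USE_INPUT_SC syntax\n",
    "    - Include the side-chain rotamers of the input structure in the first round of packing"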
   ]
  },
  {
   "cell_type": "markdown",
   "id": "toxic-avatar",
   "metadata": {},
   "source": [
    "The LEVEL parameter currently takes 8 values (0-7):\n",
    "- 0 ...... no extra chi angles\n",
    "- 1 ...... sample at 1 standard deviation\n",
    "- 2 ...... sample at 1/2 standard deviation\n",
    "- 3 ...... sample at two full standard deviations\n",
    "- 4 ...... sample at two 1/2 standard deviations\n",
    "- 5 ...... sample at four 1/2 standard deviations\n",
    "- 6 ...... sample at three 1/3 standard deviations\n",
    "- 7 ...... sample at six 1/4 standard deviations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sticky-retention",
   "metadata": {},
   "source": [
    "An example resfile that controls rotamer sampling density:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "random-sucking",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "start\n",
      "1 A  NATRO  EX ARO 1 EX ARO 2\n",
      "2 A  NATRO  EX ARO 1 EX ARO 2\n",
      "3 A  NATRO  EX ARO 1 EX ARO 2\n",
      "4 A  NATRO  EX ARO 1 EX ARO 2\n",
      "5 A  NATRO  EX ARO 1 EX ARO 2\n",
      "6 A  NATRO  EX ARO 1 EX ARO 2\n",
      "7 A  NATRO  EX ARO 1 EX ARO 2\n",
      "8 A  NATRO  EX ARO 1 EX ARO 2\n",
      "9 A  NATRO  EX ARO 1 EX ARO 2\n",
      "10 A  NATRO  EX ARO 1 EX ARO 2\n",
      "11 A  NATRO  EX ARO 1 EX ARO 2\n",
      "12 A  NATRO  EX ARO 1 EX ARO 2\n",
      "13 A  NATRO  EX ARO 1 EX ARO 2\n",
      "14 A  NATRO  EX ARO 1 EX ARO 2\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/EX1EX2.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "\n",
    "# Inspect the rotamer sampling level of each residue:\n",
    "print(packer_task.task_string(pose))"
   ]
  },
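  {
   "cell_type": "markdown",
   "id": "several-sunglasses",
   "metadata": {},
   "source": [
    "As shown, extra rotamers are sampled for $\\chi_{1}$ and $\\chi_{2}$ at every position. **(The printed result contains a display error: it reports `EX ARO`, but the resfile does not set ARO. The actual run is not affected.)**"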
   ]
  },
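  {
   "cell_type": "markdown",
   "id": "sketch-extra-rotamer-header",
   "metadata": {},
   "source": [
    "A hedged sketch of a HEADER that combines the Extra Rotamer commands described in this section. It has not been run against `helix.pdb`, and the chosen values (LEVEL 2, a cutoff of 0) are illustrative assumptions rather than recommendations:\n",
    "```\n",
    "NATAA          # repack every position, keeping the current amino acid type\n",
    "EX 1 LEVEL 2   # chi1: sample at 1/2 standard deviation (LEVEL 2)\n",
    "EX 2           # chi2: LEVEL omitted, so the default +/-1 standard deviation\n",
    "EX_CUTOFF 0    # treat every residue as buried for extra rotamer sampling\n",
    "USE_INPUT_SC   # also consider the input side-chain conformations\n",
    "START\n",
    "```\n",
    "Loading such a file through `ReadResfile` and printing `packer_task.task_string(pose)`, as in the cell above, is one way to check how the settings were interpreted."
   ]
  },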
  {
   "cell_type": "markdown",
   "id": "aboriginal-constitution",
   "metadata": {},
   "source": [
    "Besides global control, the HEADER can also target extra rotamer sampling at particular amino acids:\n",
    "```\n",
    "1. EX 1 EX 2\n",
    "\n",
    "2. EX ARO 2\n",
    "\n",
    "3. EX 1 LEVEL 7\n",
    "\n",
    "4. EX 1 EX ARO 1 LEVEL 4\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "closed-actress",
   "metadata": {},
   "source": [
    "#### Exercise\n",
    "Work out what each of the commands above means."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "defined-trainer",
   "metadata": {},
   "source": [
    "### 3. BODY Syntax"
   ]
  },
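  {
   "cell_type": "markdown",
   "id": "internal-blond",
   "metadata": {},
   "source": [
    "The BODY sets the rotamer degrees of freedom for specific positions or ranges. There are four forms of specification.\n",
    "- Single position\n",
    "- Position range\n",
    "- Whole chain\n",
    "- Overlapping (multiple) specifications"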
   ]
  },
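  {
   "cell_type": "markdown",
   "id": "recorded-bottle",
   "metadata": {},
   "source": [
    "#### 3.1 Single-Position Specification\n",
    "The first column is the PDB residue number (an insertion code is allowed), the second column is the PDB chain ID, and the third column holds the COMMANDS.\n",
    "\n",
    "Basic syntax:\n",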
    "```\n",
    "<PDBNUM>[<ICODE>] <CHAIN> <COMMANDS>\n",
    "```"
   ]
  },
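  {
   "cell_type": "markdown",
   "id": "wired-shareware",
   "metadata": {},
   "source": [
    "Note: ICODE is used when the PDB numbering contains insertion-code characters, as in antibodies and other specially numbered systems. It is written directly after PDBNUM, e.g. 35A, 35B. An ordinary PDBNUM contains only digits."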
   ]
  },
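  {
   "cell_type": "markdown",
   "id": "sketch-insertion-code",
   "metadata": {},
   "source": [
    "A short sketch of single-position lines that carry an insertion code. Residues `35A`/`35B` and chain `H` are hypothetical (antibody-style numbering) and do not exist in `helix.pdb`, so this fragment is for reading only:\n",
    "```\n",
    "NATRO\n",
    "START\n",
    "\n",
    "35A H NATAA    # residue 35, insertion code A, chain H: repack only\n",
    "35B H PIKAA ST # residue 35, insertion code B, chain H: allow only Ser or Thr\n",
    "```"
   ]
  },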
  {
   "cell_type": "markdown",
   "id": "reserved-miniature",
   "metadata": {},
   "source": [
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "3 A ALLAA # position 3 may be designed to any of the 20 amino acids\n",
    "4 A APOLAR # position 4 is restricted to nonpolar amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "necessary-depression",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tFALSE\tFALSE\t\n",
      "2\tFALSE\tFALSE\t\n",
      "3\tTRUE\tTRUE\tALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR\n",
      "4\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "5\tFALSE\tFALSE\t\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/position_mut.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "worth-mechanics",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### 3.2 Range Specification\n",
    "Format for a residue range:\n",
    "```\n",
    "<PDBNUM>[<ICODE>] - <PDBNUM>[<ICODE>] <CHAIN> <COMMANDS>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "expanded-section",
   "metadata": {},
   "source": [
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "1 - 5 A APOLAR # positions 1-5 of chain A are restricted to nonpolar amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "induced-memorabilia",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tTRUE\tTRUE\tALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull\n",
      "2\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "3\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "4\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "5\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/range_mut.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "traditional-reconstruction",
   "metadata": {},
   "source": [
    "#### 3.3 Chain-Level Specification\n",
    "Basic format for specifying a whole chain:\n",
    "```\n",
    "* <CHAIN> <COMMANDS>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "conscious-backing",
   "metadata": {},
   "source": [
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "* A PROPERTY HYDROPHOBIC # every position of chain A may be designed to hydrophobic canonical amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "compatible-respondent",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tTRUE\tTRUE\tPHE:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull\n",
      "2\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "3\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "4\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "5\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "6\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "7\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "8\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "9\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "10\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "11\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "12\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "13\tTRUE\tTRUE\tPHE,ILE,LEU,MET,VAL,TRP,TYR\n",
      "14\tTRUE\tTRUE\tPHE:CtermProteinFull,ILE:CtermProteinFull,LEU:CtermProteinFull,MET:CtermProteinFull,VAL:CtermProteinFull,TRP:CtermProteinFull,TYR:CtermProteinFull\n",
      "\n",
      "start\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/chain_range.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)\n",
    "\n",
    "# Inspect the rotamer sampling level of each residue:\n",
    "print(packer_task.task_string(pose))"
   ]
  },
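  {
   "cell_type": "markdown",
   "id": "personalized-brisbane",
   "metadata": {},
   "source": [
    "#### 3.4 Specifying Noncanonical Amino Acids\n",
    "Introducing noncanonical amino acids (NCAAs) in the BODY requires a special format (the 2019 release of Rosetta supports this syntax)\n",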
    "```\n",
    "<PDBNUM>[<ICODE>] <CHAIN> <COMMANDS> X[ncaa]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "proof-winner",
   "metadata": {},
   "source": [
    "The difference lies in the COMMANDS part: each noncanonical amino acid must be written as \"X[ncaa]\", where ncaa is its three-letter code.\n",
    "\n",
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "5 A PIKAA X[B36]X[A20] # allow the noncanonical amino acids B36 and A20 at position 5\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "selective-jumping",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tFALSE\tFALSE\t\n",
      "2\tFALSE\tFALSE\t\n",
      "3\tFALSE\tFALSE\t\n",
      "4\tFALSE\tFALSE\t\n",
      "5\tTRUE\tTRUE\tB36,A20\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from pyrosetta.rosetta.core.pack.palette import CustomBaseTypePackerPalette\n",
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/ncaa.resfile')\n",
    "\n",
    "# First register the NCAAs in a CustomBaseTypePackerPalette\n",
    "pp = CustomBaseTypePackerPalette()\n",
    "pp.add_type('B36')\n",
    "pp.add_type('A20')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.set_packer_palette(pp)  # attach the palette to the TaskFactory\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
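  {
   "cell_type": "markdown",
   "id": "growing-raleigh",
   "metadata": {},
   "source": [
    "#### 3.5 Overlapping Specifications\n",
    "When BODY entries overlap, the resfile is resolved in one of two ways:\n",
    "\n",
    "- Entries at the same specification level are combined **by intersection**; **if they share nothing, the result is the empty set**.\n",
    "- Across different levels, the more specific entry wins: a single position takes priority over a range, and a range over a chain"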
   ]
  },
  {
   "cell_type": "markdown",
   "id": "genetic-diary",
   "metadata": {},
   "source": [
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "1 - 5 A APOLAR # positions 1-5 of chain A: rotamer search limited to nonpolar amino acids\n",
    "3 - 5 A POLAR  # positions 3-5 of chain A: rotamer search limited to polar amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "spatial-qatar",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tTRUE\tTRUE\tALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull\n",
      "2\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "3\tFALSE\tFALSE\t\n",
      "4\tFALSE\tFALSE\t\n",
      "5\tFALSE\tFALSE\t\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/multi-logic.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "surprised-spread",
   "metadata": {},
   "source": [
    "Both entries are at the same specification level, so the allowed set for residues 3-5 is empty."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acceptable-baking",
   "metadata": {},
   "source": [
    "Another case:\n",
    "\n",
    "Example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "1 - 5 A APOLAR # positions 1-5 of chain A: rotamer search limited to nonpolar amino acids\n",
    "3 A POLAR  # position 3 of chain A: design limited to polar amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "sharing-steal",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tTRUE\tTRUE\tALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull\n",
      "2\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "3\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "4\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "5\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "6\tFALSE\tFALSE\t\n",
      "7\tFALSE\tFALSE\t\n",
      "8\tFALSE\tFALSE\t\n",
      "9\tFALSE\tFALSE\t\n",
      "10\tFALSE\tFALSE\t\n",
      "11\tFALSE\tFALSE\t\n",
      "12\tFALSE\tFALSE\t\n",
      "13\tFALSE\tFALSE\t\n",
      "14\tFALSE\tFALSE\t\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/multi-logic2.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
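  {
   "cell_type": "markdown",
   "id": "south-migration",
   "metadata": {},
   "source": [
    "Result: residue 3 is designed to polar amino acids, while residues 1-2 and 4-5 are designed to nonpolar amino acids, because a single-position entry takes priority over a range entry."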
   ]
  },
  {
   "cell_type": "markdown",
   "id": "prepared-serve",
   "metadata": {},
   "source": [
    "One more example:\n",
    "```\n",
    "NATRO\n",
    "EX 1 EX 2\n",
    "START\n",
    "\n",
    "* A POLAR  # every residue of chain A: design limited to polar amino acids\n",
    "1 - 5 A APOLAR # positions 1-5 of chain A: rotamer search limited to nonpolar amino acids\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "apart-protein",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Packer_Task\n",
      "\n",
      "Threads to request: ALL AVAILABLE\n",
      "\n",
      "resid\tpack?\tdesign?\tallowed_aas\n",
      "1\tTRUE\tTRUE\tALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull\n",
      "2\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "3\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "4\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "5\tTRUE\tTRUE\tALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR\n",
      "6\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "7\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "8\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "9\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "10\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "11\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "12\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "13\tTRUE\tTRUE\tASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR\n",
      "14\tTRUE\tTRUE\tASP:CtermProteinFull,GLU:CtermProteinFull,HIS:CtermProteinFull,HIS_D:CtermProteinFull,LYS:CtermProteinFull,ASN:CtermProteinFull,GLN:CtermProteinFull,ARG:CtermProteinFull,SER:CtermProteinFull,THR:CtermProteinFull\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Read the resfile into a ReadResfile TaskOperation\n",
    "resfile_type = ReadResfile('./data/multi-logic3.resfile')\n",
    "\n",
    "# Add the TaskOperation to a TaskFactory\n",
    "pack_tf = TaskFactory()\n",
    "pack_tf.push_back(resfile_type)\n",
    "\n",
    "# Generate the PackerTask\n",
    "packer_task = pack_tf.create_task_and_apply_taskoperations(pose)\n",
    "print(packer_task)"
   ]
  },
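  {
   "cell_type": "markdown",
   "id": "informational-bolivia",
   "metadata": {},
   "source": [
    "Result: residues 1-5 are designed to nonpolar amino acids and all remaining residues to polar amino acids, because a range entry takes priority over a chain-level entry."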
   ]
  },
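  {
   "cell_type": "markdown",
   "id": "sketch-three-level-priority",
   "metadata": {},
   "source": [
    "The precedence shown in the examples above can be put together in one file. This is a sketch that has not been run against `helix.pdb`; the particular commands are illustrative assumptions, and the outcome described in the comments is only what the priority order demonstrated in this section would predict:\n",
    "```\n",
    "NATRO\n",
    "START\n",
    "\n",
    "* A NATAA       # chain level: repack every position of chain A\n",
    "1 - 5 A APOLAR  # range level: expected to override the chain-level line for residues 1-5\n",
    "3 A PIKAA DE    # single-position level: expected to override the range line for residue 3\n",
    "```\n",
    "Printing the resulting PackerTask, as in the cells above, is a simple way to confirm whether residue 3 ends up with only ASP and GLU, residues 1-2 and 4-5 with the nonpolar set, and residues 6-14 as repack-only."
   ]
  },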
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bright-niagara",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}