## TaskOperation & PackerTask

@Author: 吴炜坤

@email:weikun.wu@xtalpi.com/weikunwu@163.com

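The previous section covered rotamers and the basic working principles of the Rosetta Packer. The Packer has three working modes: "Repacking", "Rotamer Trial", and "Design". Accordingly, the rotamer at any given position can be in one of three basic states:
- Repacking: rotamers at this position are restricted to a single amino acid type (the current one);
- Design: rotamers at this position may come from several amino acid types;
- Fixed: the rotamer at this position is not allowed to change.

In this section, readers will learn how to use TaskOperations to control the "code of conduct" of rotamers in the Packer.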

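### 1. PackerTask and TaskFactory

Think of the Packer as a "master protein builder" who installs building materials (rotamers) on a foundation (the $C_{\alpha}$ atoms of the backbone). **Before construction starts, the builder must know which materials may be installed at each foundation point**; after careful deliberation (simulated annealing), he then assembles the most perfect piece of work. To do this he needs our help: a construction blueprint. That blueprint is the PackerTask.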

<center><img src="./img/PackerTask.jpg" width = "700" height = "200" align=center /></center>
(Image source: XtalPi team)

In [1]:
# Initialize PyRosetta and read the PDB file of a helical fragment.
from pyrosetta import *
init()
pose = pose_from_pdb('./data/helix.pdb')

PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags
core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
core.init: {0} command: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=-778071707 seed_offset=0 real_seed=-778071707 thread_index=0
basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=-778071707 R

**A PackerTask is generated by a TaskFactory.**

In [2]:
# Generate packer_task from the TaskFactory
from pyrosetta.rosetta.core.pack.task import TaskFactory
tf = TaskFactory()
packer_task = tf.create_packer_task(pose)
print(packer_task)

#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	TRUE	ALA:NtermProteinFull,CYS:NtermProteinFull,ASP:NtermProteinFull,GLU:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,HIS:NtermProteinFull,HIS_D:NtermProteinFull,ILE:NtermProteinFull,LYS:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,ASN:NtermProteinFull,PRO:NtermProteinFull,GLN:NtermProteinFull,ARG:NtermProteinFull,SER:NtermProteinFull,THR:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull
2	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
3	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
4	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
5	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
6	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GL

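As the output above shows, the default PackerTask carries three kinds of information:
- the residue position number;
- the pack? and design? flags at each position, which together encode the three basic rotamer states;
- the amino acid types whose rotamers are allowed at each position.

**By default, the PackerTask allows rotamers of all 20 amino acids at every position** (HIS_D in the listing is a histidine tautomer, not an additional amino acid).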
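The printed table can also be read programmatically. The following is a minimal sketch that assumes the tab-separated layout shown above (resid, pack?, design?, allowed_aas). The helper name `classify_rows` is hypothetical, the example rows are shortened and partly made up for illustration (in particular the row with pack? set to FALSE), and the code works on the printed text only, without calling PyRosetta.

```python
def classify_rows(task_text):
    """Map each residue number in a printed PackerTask table to a state.

    Assumes tab-separated rows of the form: resid, pack?, design?, allowed_aas.
    """
    states = {}
    for line in task_text.splitlines():
        fields = line.split("\t")
        # Skip the header, blank lines and anything that is not a residue row.
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue
        resid = int(fields[0])
        can_pack = fields[1].strip() == "TRUE"
        can_design = fields[2].strip() == "TRUE"
        if not can_pack:
            states[resid] = "Fixed"
        elif can_design:
            states[resid] = "Design"
        else:
            states[resid] = "Repacking"
    return states


# Illustrative rows only; the amino acid lists are shortened.
example = (
    "resid\tpack?\tdesign?\tallowed_aas\n"
    "1\tTRUE\tFALSE\tASP:NtermProteinFull\n"
    "2\tTRUE\tTRUE\tALA,CYS,ASP\n"
    "3\tFALSE\tFALSE\t\n"
)
print(classify_rows(example))
```

Passing `str(packer_task)` instead of the example string should work in the same way, provided the printed layout matches the one assumed here.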

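### 2. What are TaskOperations?

TaskOperations literally means "task operation components". **TaskOperations are loaded into a TaskFactory, which then uses them to create the PackerTask.**

Intuitively, TaskOperations **progressively reduce the rotamer freedom at specified positions**, much like carving a sculpture: the default PackerTask initially allows rotamers of all 20 amino acids, and each TaskOperation that is loaded trims that freedom further until the user's requirements are met. One point is crucial: **once rotamer freedom has been reduced, it cannot be restored in the PackerTask.** For example, if the first TaskOperation sets position 5 to "Fixed" and a second TaskOperation then sets position 5 to "Repacking", position 5 in the PackerTask generated by the TaskFactory **is still "Fixed" (No_repack)**. In terms of the underlying logic, for each position the TaskFactory keeps the state with the least rotamer freedom among all loaded TaskOperations.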

<center><img src="./img/taskop_turnoff.png" width = "500" height = "200" align=center /></center>
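(Image source: slides provided by jadolfbr@gmail.com)

Summary:
* TaskOperations work with three levels of freedom: Design/Repack/No_repack;
* TaskOperations resemble "ice sculpting": they tell the TaskFactory how much rotamer freedom each position in the Pose has;
* Once a residue or region has been set to a lower-freedom state, it cannot be set back to a higher-freedom state.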
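The "restriction only" rule can be illustrated with a small toy model. This is a sketch in plain Python, not Rosetta's implementation: the `Freedom` enum and the `build_task` helper are hypothetical names, and each operation is simplified to a pair of (positions, freedom level).

```python
from enum import IntEnum


class Freedom(IntEnum):
    """Rotamer freedom levels, ordered from most to least restricted."""
    NO_REPACK = 0
    REPACK = 1
    DESIGN = 2


def build_task(n_residues, operations):
    """Toy TaskFactory: start from full design, then let each operation restrict."""
    task = {i: Freedom.DESIGN for i in range(1, n_residues + 1)}
    for positions, level in operations:
        for pos in positions:
            # An operation can lower the freedom of a position but never raise it.
            task[pos] = min(task[pos], level)
    return task


# Position 5 is first fixed, then a later operation asks for repacking.
ops = [({5}, Freedom.NO_REPACK), ({5, 6}, Freedom.REPACK)]
task = build_task(6, ops)
print(task[5].name, task[6].name, task[1].name)  # NO_REPACK REPACK DESIGN
```

Because each position simply keeps the minimum level, the order in which the operations are loaded does not affect the result in this model.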

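### 3. Types of TaskOperations

By construction logic, TaskOperations fall into two types:

* Residue Level TaskOperations: set the rotamer freedom of the positions chosen by a Selector (manual transmission);
* Specialized Operations: apply a predefined logic to rotamers across the whole structure (automatic transmission).

#### 3.1 Using Residue Level TaskOperations

Residue Level TaskOperations (RLTs) are generally combined with a Selector that specifies where they act. An RLT can be thought of as a **user-customized version of a Specialized Operation**.

**Note: an RLT cannot be read by the TaskFactory directly. It must first be wrapped in OperateOnResidueSubset, which produces a standard TaskOperation.**

Here is a simple example using RestrictToRepackingRLT (as the name suggests, this RLT restricts rotamers to the current amino acid type):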

In [3]:
from pyrosetta.rosetta.core.pack.task.operation import RestrictToRepackingRLT, OperateOnResidueSubset
from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector
# Select the residue positions
select_pos3 = ResidueIndexSelector('1,3,5')

# Build a TaskOperation with OperateOnResidueSubset
packing_taskop = OperateOnResidueSubset(RestrictToRepackingRLT(), select_pos3, False)

# Load the TaskOperation into the TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(packing_taskop)

# Generate the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)

#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	FALSE	ASP:NtermProteinFull
2	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
3	TRUE	FALSE	LEU
4	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
5	TRUE	FALSE	LYS
6	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
7	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
8	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
9	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
10	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
11	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
12	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,AS

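The PackerTask now shows that the selected residues 1, 3, and 5 can only be repacked, and that their only allowed amino acid type is the one already present in the pose.

Besides RestrictToRepackingRLT, several other RLTs are in common use. The seven most common ones are listed below with a short description of each:
- RestrictToRepackingRLT: sets the rotamer freedom to repacking.
- PreventRepackingRLT: sets the rotamer freedom to no_repack.
- RestrictAbsentCanonicalAASExceptNativeRLT: restricts rotamers to a given list of amino acid types, and additionally keeps rotamers of the amino acid currently at that position.
- RestrictAbsentCanonicalAASRLT: restricts rotamers to a given list of amino acid types.
- DisallowIfNonnativeRLT: restricts rotamers to amino acid types **not in the given list**, while still keeping rotamers of the amino acid currently at that position.
- IncludeCurrentRLT: makes the Packer consider the rotamers of the input Pose during the run (rather than discarding them entirely).
- ExtraRotamersGenericRLT: controls whether extra rotamers are added during rotamer sampling; without it, only the ideal rotamers are considered.

These RLTs can be combined with Selectors to build arbitrary TaskOperations. For more on the RLT API, see the chapter on the TaskOperations API.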
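The three RLTs that act on the list of allowed amino acids can be pictured as set operations. The sketch below is a toy model based only on the descriptions in the list above: the function names mirror the RLT names but are hypothetical plain-Python helpers, one-letter codes are an assumption made for brevity, and none of this calls the Rosetta API.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def restrict_absent_canonical_aas(allowed, keep):
    """Keep only the listed amino acids."""
    return allowed & set(keep)


def restrict_absent_canonical_aas_except_native(allowed, keep, native):
    """Keep only the listed amino acids, plus the native one."""
    return allowed & (set(keep) | {native})


def disallow_if_nonnative(allowed, disallow, native):
    """Remove the listed amino acids unless one of them is native."""
    return allowed - (set(disallow) - {native})


# A position whose native residue is lysine (K).
allowed = set(CANONICAL)
allowed = restrict_absent_canonical_aas_except_native(allowed, "AVIL", native="K")
print(sorted(allowed))  # ['A', 'I', 'K', 'L', 'V']
allowed = disallow_if_nonnative(allowed, "AK", native="K")
print(sorted(allowed))  # ['I', 'K', 'L', 'V']
```

Every helper returns a subset of its input, which matches the rule that TaskOperations can only remove freedom.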

#### Think about it
If the False passed to OperateOnResidueSubset is changed to True, how does the PackerTask change?

#### 3.2 Using Specialized Operations

Specialized Operations are TaskOperations that developers have preconfigured for particular use cases. They can be read by the TaskFactory directly and generally act on the whole structure.

In [4]:
from pyrosetta.rosetta.core.pack.task.operation import RestrictToRepacking

# Load the TaskOperation into the TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(RestrictToRepacking())

# Generate the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)

#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	FALSE	ASP:NtermProteinFull
2	TRUE	FALSE	GLU
3	TRUE	FALSE	LEU
4	TRUE	FALSE	GLN
5	TRUE	FALSE	LYS
6	TRUE	FALSE	TRP
7	TRUE	FALSE	VAL
8	TRUE	FALSE	GLU
9	TRUE	FALSE	GLN
10	TRUE	FALSE	ALA
11	TRUE	FALSE	GLU
12	TRUE	FALSE	ARG
13	TRUE	FALSE	ASN
14	TRUE	FALSE	GLY:CtermProteinFull



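As shown, every position is restricted to the repacking state. For more on the Specialized Operations API, see the chapter on the TaskOperations API.

### 4. Movers for packing rotamers

#### 4.1 PackRotamersMover

The sections above explained what a PackerTask does and how to use TaskOperations and a TaskFactory to generate one for a specific purpose. With this clear "blueprint" in hand, the actual "construction" on the Pose can begin. Most of the Movers that carry it out can be found under pyrosetta.rosetta.protocols.minimization_packing. Here we use the simplest one, PackRotamersMover, as an example.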

In [5]:
from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover
from pyrosetta import create_score_function
pack_mover = PackRotamersMover()
ref2015 = create_score_function('ref2015')
pose = pose_from_pdb('./data/helix.pdb')

# There is no need to pass in a PackerTask; supplying the TaskFactory is enough.
pack_mover.task_factory(pack_tf)
pack_mover.score_function(ref2015)

# Run repacking
pack_mover.apply(pose)
pose.dump_pdb('./data/repacked.pdb')

core.scoring.etable: {0} Starting energy table calculation
core.scoring.etable: {0} smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: {0} smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: {0} smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: {0} Finished calculating energy tables.
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: {0} Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.i

True

Comparing the input structure with the repacked structure shows that only the side-chain rotamers have changed, as expected.
<center><img src="./img/repacking_conf.png" width = "600" height = "200" align=center /></center>
(Image source: XtalPi team)
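To give a feel for what "simulated annealing over rotamers" means, here is a deliberately simplified sketch in plain Python. It is not the Rosetta Packer: the function `toy_pack`, the cooling schedule, and the energy function `chain_energy` are all assumptions made for illustration, and rotamers are represented by bare integer ids.

```python
import math
import random


def toy_pack(choices, energy, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Toy simulated annealing over per-position rotamer choices.

    choices[i] lists the rotamers allowed at position i (a toy PackerTask);
    a position with a single entry behaves like a Fixed position.
    """
    rng = random.Random(seed)
    state = [options[0] for options in choices]
    current = energy(state)
    best_state, best_energy = list(state), current
    for step in range(steps):
        # Cool geometrically from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(choices))
        old = state[pos]
        state[pos] = rng.choice(choices[pos])
        trial = energy(state)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        accept = trial <= current or rng.random() < math.exp((current - trial) / temperature)
        if accept:
            current = trial
            if current < best_energy:
                best_state, best_energy = list(state), current
        else:
            state[pos] = old
    return best_state, best_energy


def chain_energy(state):
    """Hypothetical energy: neighbouring rotamer ids prefer to be equal."""
    return sum((a - b) ** 2 for a, b in zip(state, state[1:]))


# Positions 1 and 3 have a single choice, so they never change.
choices = [[0], [0, 1, 2], [1], [0, 1, 2, 3]]
best_state, best_energy = toy_pack(choices, chain_energy)
print(best_state, best_energy)
```

The list of choices plays the role of the PackerTask here: the annealer can only pick from what the blueprint allows at each position.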

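#### 4.2 FastRelax

PackRotamersMover only handles the simulated annealing of side-chain conformations. In practice, however, Rosetta often alternates between the Packer and the Minimizer (MinMover). Common Movers of this combined kind are FastRelax and FastDesign, which optimize side chains and backbone together in search of lower-energy conformations.

FastRelax at a glance:

<center><img src="./img/FastRelax.jpg" width = "600" height = "200" align=center /></center>
(Image source: XtalPi team)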

In [6]:
# FastRelax
from pyrosetta.rosetta.protocols.relax import FastRelax
fastrelax = FastRelax()
fastrelax.set_scorefxn(ref2015)
fastrelax.set_default_movemap() # use the default MoveMap
fastrelax.apply(pose)
pose.dump_pdb('./data/fastrelaxed.pdb')

protocols.relax.RelaxScriptManager: {0} Reading relax scripts list from database.
core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
protocols.relax.RelaxScriptManager: {0} Looking for MonomerRelax2019.txt
protocols.relax.RelaxScriptManager: {0} repeat %%nrepeats%%
protocols.relax.RelaxScriptManager: {0} coord_cst_weight 1.0
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.040
protocols.relax.RelaxScriptManager: {0} repack
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.051
protocols.relax.RelaxScriptManager: {0} min 0.01
protocols.relax.RelaxScriptManager: {0} coord_cst_weight 0.5
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.265
protocols.relax.RelaxScriptManager: {0} repack
protocols.relax.RelaxScriptManager: {0} scale:fa_rep 0.280
protocols.relax.RelaxScriptManager: {0} min 0.01
protocols.relax.RelaxScriptManager: {0} coor

True

<center><img src="./img/fastrelaxed.png" width = "600" height = "200" align=center /></center>
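(Image source: XtalPi team)

Although the conformation has not changed dramatically, both the backbone and the side chains were optimized together.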
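The relax script echoed in the log above makes the alternation between packing and minimization visible. The sketch below copies only the script lines that appear in the truncated log and lists each repack/min step together with the fa_rep scale in effect at that point. The reading of scale:fa_rep as a scaling of the repulsive term that applies to the steps following it is an assumption, and `summarize` is a hypothetical helper, not part of PyRosetta.

```python
# Script lines as they appear in the (truncated) FastRelax log above.
script = """\
repeat %%nrepeats%%
coord_cst_weight 1.0
scale:fa_rep 0.040
repack
scale:fa_rep 0.051
min 0.01
coord_cst_weight 0.5
scale:fa_rep 0.265
repack
scale:fa_rep 0.280
min 0.01
"""


def summarize(script_text):
    """List the (step, fa_rep scale) pairs in the order they would run."""
    steps = []
    fa_rep_scale = None
    for line in script_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        command = tokens[0]
        if command == "scale:fa_rep":
            # Remember the repulsive-term scale for the steps that follow.
            fa_rep_scale = float(tokens[1])
        elif command in ("repack", "min"):
            steps.append((command, fa_rep_scale))
    return steps


for step, scale in summarize(script):
    print(f"{step:<6} fa_rep scale = {scale}")
```

In the visible part of the script, the fa_rep scale rises from one repack/min round to the next, so repulsion starts soft and is gradually strengthened.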

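#### 4.3 FastDesign

FastDesign is essentially the design version of FastRelax: it allows the sequence to be designed. Here we define a task in which positions 1, 3, 5, 7, and 9 cannot be designed, while all other positions can be designed to any of the 20 amino acids.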

In [7]:
# Set up the blueprint for FastDesign.
from pyrosetta.rosetta.protocols.denovo_design.movers import FastDesign

# Select the residue positions
no_design_pos = ResidueIndexSelector('1,3,5,7,9')

# Build a TaskOperation with OperateOnResidueSubset
design_taskop = OperateOnResidueSubset(RestrictToRepackingRLT(), no_design_pos, False)

# Load the TaskOperation into the TaskFactory
design_tf = TaskFactory()
design_tf.push_back(design_taskop)

# Generate the PackerTask
design_task = design_tf.create_task_and_apply_taskoperations(pose)
print(design_task)

#Packer_Task

Threads to request: ALL AVAILABLE

resid	pack?	design?	allowed_aas
1	TRUE	FALSE	ASP:NtermProteinFull
2	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
3	TRUE	FALSE	LEU
4	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
5	TRUE	FALSE	LYS
6	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
7	TRUE	FALSE	VAL
8	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
9	TRUE	FALSE	GLN
10	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
11	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
12	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
13	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
14	TRUE	TRUE	ALA:CtermProtein

In [8]:
# Initialize FastDesign:
fastdesign = FastDesign()
fastdesign.set_scorefxn(ref2015)
fastdesign.set_default_movemap() # use the default MoveMap
fastdesign.set_task_factory(design_tf)
fastdesign.apply(pose)
pose.dump_pdb('./data/fastdesign.pdb')

core.scoring.ScoreFunctionFactory: {0} SCOREFUNCTION: ref2015
protocols.denovo_design.movers.FastDesign: {0} #Packer_Task
protocols.denovo_design.movers.FastDesign: {0} 
protocols.denovo_design.movers.FastDesign: {0} Threads to request: ALL AVAILABLE
protocols.denovo_design.movers.FastDesign: {0} 
protocols.denovo_design.movers.FastDesign: {0} resid	pack?	design?	allowed_aas
protocols.denovo_design.movers.FastDesign: {0} 1	TRUE	FALSE	ASP:NtermProteinFull
protocols.denovo_design.movers.FastDesign: {0} 2	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
protocols.denovo_design.movers.FastDesign: {0} 3	TRUE	FALSE	LEU
protocols.denovo_design.movers.FastDesign: {0} 4	TRUE	TRUE	ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR
protocols.denovo_design.movers.FastDesign: {0} 5	TRUE	FALSE	LYS
protocols.denovo_desig

True

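Structural alignment shows that the amino acid types at some positions have changed.
<center><img src="./img/fastdesign.png" width = "500" height = "200" align=center /></center>
(Image source: XtalPi team)

### Exercise:
1. [LayerDesign] Pick a small protein with fewer than 60 residues and inspect it in PyMOL. Mark the residues buried in the interior and allow only these to be designed to hydrophobic amino acids; the remaining solvent-exposed residues may only be designed to polar amino acids. Use suitable ResidueSelectors to make the selections and create the corresponding TaskOperations, then run FastDesign and examine the design results.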