In [None]:
from htmd.ui import *
config(viewer='ngl')

# Adaptive sampling

## Stefan Doerr
Universitat Pompeu Fabra & Acellera

We demonstrate how to use the HTMD code to run adaptive sampling.

You can download the generators from the following link:

* [Generators](http://pub.htmd.org/tutorials/adaptive-sampling/generators.tar.gz).

Alternatively, you can download the generators using `wget`.

In [None]:
import os
os.system('wget -q http://pub.htmd.org/tutorials/adaptive-sampling/generators.tar.gz; tar -xf generators.tar.gz && rm generators.tar.gz')
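
Where `wget` is not available, the same download-and-unpack step can be sketched with the Python standard library alone. `fetch_and_extract` is a hypothetical helper name used only for illustration; it is not part of HTMD.

```python
import os
import tarfile
import urllib.request


def fetch_and_extract(url, dest='.'):
    """Download a tar archive, unpack it into dest, then delete the archive."""
    archive = os.path.join(dest, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)  # save the archive locally
    with tarfile.open(archive) as tar:        # compression is auto-detected
        tar.extractall(dest)
    os.remove(archive)                        # mirrors the `rm` in the shell command


# Usage (requires network access):
# fetch_and_extract('http://pub.htmd.org/tutorials/adaptive-sampling/generators.tar.gz')
```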

## Generators folder structure

In [3]:
!tree generators | head -20

generators
|-- ntl9_1ns_0
|   |-- input
|   |-- input.coor
|   |-- input.xsc
|   |-- parameters
|   |-- run.sh
|   |-- structure.pdb
|   `-- structure.psf
|-- ntl9_1ns_1
|   |-- input
|   |-- input.coor
|   |-- input.xsc
|   |-- parameters
|   |-- run.sh
|   |-- structure.pdb
|   `-- structure.psf
`-- ntl9_1ns_2
    |-- input
    |-- input.coor


## Adaptive classes

* AdaptiveMD (free exploration)
* AdaptiveGoal (exploration + exploitation)

Create a directory for each type of adaptive and copy the generators into them:

In [None]:
import shutil

os.makedirs('./adaptivemd', exist_ok=True)
os.makedirs('./adaptivegoal', exist_ok=True)
shutil.copytree('./generators', './adaptivemd/generators')
shutil.copytree('./generators', './adaptivegoal/generators')
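
`shutil.copytree` raises `FileExistsError` when the destination already exists, so running the cell above a second time fails. A minimal sketch of a re-runnable variant, assuming Python 3.8 or newer for the `dirs_exist_ok` argument; `copy_generators` is a hypothetical helper name:

```python
import os
import shutil


def copy_generators(src, run_dirs):
    """Copy the generators folder into each adaptive run directory."""
    for run_dir in run_dirs:
        os.makedirs(run_dir, exist_ok=True)
        # dirs_exist_ok=True (Python 3.8+) lets the copy be repeated safely
        shutil.copytree(src, os.path.join(run_dir, 'generators'),
                        dirs_exist_ok=True)


# Usage:
# copy_generators('./generators', ['./adaptivemd', './adaptivegoal'])
```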

## AdaptiveMD

In [5]:
os.chdir('./adaptivemd')

* Set up the queue that will be used for simulations.
* Tell it to store completed trajectories in the `data` folder, since this is where `AdaptiveMD` expects them by default.

In [6]:
queue = LocalGPUQueue()
queue.datadir = './data'

In [7]:
ad = AdaptiveMD()
ad.app = queue

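* Set `nmin` and `nmax` (the minimum and maximum number of simulations running at once) and `nepochs` (the number of epochs after which the run stops)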

In [8]:
ad.nmin = 1
ad.nmax = 3
ad.nepochs = 3

* Choose which projection to use for the construction of the Markov model

In [9]:
protsel = 'protein and name CA'
ad.projection = MetricSelfDistance(protsel)

* Set `updateperiod` (in seconds) to define how often the adaptive run polls for completed simulations and redoes the analysis

In [10]:
ad.updateperiod = 300 # execute every 5 minutes
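
The interplay of `nmin`, `nmax` and `updateperiod` can be pictured with a toy scheduler. This is an illustrative sketch, not HTMD's implementation: it assumes that at every poll new simulations are spawned only when the number of running ones has dropped to `nmin` or below, and that the batch is then topped up to `nmax`. `simulations_to_spawn` is a hypothetical name.

```python
def simulations_to_spawn(running, nmin, nmax):
    """Return how many new simulations one poll would launch (toy model)."""
    if running > nmin:
        return 0              # enough simulations still running: wait
    return nmax - running     # top the batch up to nmax


# With nmin=1 and nmax=3, the values used in this tutorial:
for running in (3, 2, 1, 0):
    print(running, '->', simulations_to_spawn(running, nmin=1, nmax=3))
```

Under these assumptions, a larger gap between `nmin` and `nmax` means fewer but larger respawn batches.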

Launch the `AdaptiveMD` run:

In [None]:
ad.run()

## AdaptiveGoal

In [12]:
os.chdir('../adaptivegoal')

* Most of the class arguments are identical to those of `AdaptiveMD`

In [13]:
adg = AdaptiveGoal()
adg.app = queue
adg.nmin = 1
adg.nmax = 3
adg.nepochs = 2
adg.generatorspath = './generators'
adg.projection = MetricSelfDistance('protein and name CA')
adg.updateperiod = 300 # execute every 5 minutes
adg.goalfunction = None # set to None just as an example

* It additionally requires the `goalfunction` argument, which defines the goal
* We can define a variety of different goal functions

## The goal function

The goal function:
* takes as input a `Molecule` object of a simulation, and
* produces as output a score for each frame of that simulation.

The higher the score, the more desirable it is to respawn from that frame.
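
The contract described above can be illustrated without HTMD. In this sketch the per-frame scores are made-up NumPy values, and `pick_respawn_frames` is a hypothetical helper that simply takes the top-scoring frames; `AdaptiveGoal` itself may combine the score with other criteria.

```python
import numpy as np


def pick_respawn_frames(scores, n):
    """Return the indices of the n highest-scoring frames, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]   # frame indices sorted by descending score
    return order[:n]


# One score per frame of a made-up five-frame simulation
frame_scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
best = pick_respawn_frames(frame_scores, n=2)
print(best)
```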

## RMSD goal function

For this goal function, we will use a crystal structure of NTL9.

You can download the structure from the following link and save it in the `adaptivegoal` directory:

* [NTL9 crystal structure](http://pub.htmd.org/tutorials/adaptive-sampling/ntl9_crystal.pdb).

Alternatively, you can download the structure using `wget`.

In [None]:
os.system('wget -q http://pub.htmd.org/tutorials/adaptive-sampling/ntl9_crystal.pdb')

We can define a simple goal function that uses the RMSD between the conformation sampled and a reference (in this case, the crystal structure), and returns a score to be evaluated by the `AdaptiveGoal` algorithm:

In [15]:
ref = Molecule('./ntl9_crystal.pdb')

def mygoalfunction(mol):
    rmsd = MetricRmsd(ref, 'protein and name CA').project(mol)
    return -rmsd  # or even 1/rmsd

adg.goalfunction = mygoalfunction

`AdaptiveGoal` ranks conformations from high to low score. Since a lower RMSD should give a higher score, the function returns the negated RMSD (the inverse, `1/rmsd`, would also work).
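
To see why both transformations are acceptable, the following sketch ranks a handful of made-up RMSD values under each one. It uses plain NumPy and only checks the ordering of frames, not how `AdaptiveGoal` uses the scores internally.

```python
import numpy as np

# Made-up RMSD values for four frames
rmsd = np.array([4.0, 1.5, 8.0, 0.5])

negated = -rmsd        # higher score for lower RMSD
inverse = 1.0 / rmsd   # also higher for lower RMSD, but on a different scale

# Rank frames from best to worst under each score
rank_negated = np.argsort(negated)[::-1]
rank_inverse = np.argsort(inverse)[::-1]

print(rank_negated, rank_inverse)
```

Both scores order the frames identically, but their magnitudes differ: `1/rmsd` grows without bound as the RMSD approaches zero, while `-rmsd` changes linearly. This may matter if the scores are used for more than ranking.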

Launch the `AdaptiveGoal` run:

In [None]:
adg.run()

## Functions with multiple arguments

The goal function can also take multiple arguments. This adds flexibility and allows on-the-fly comparisons against non-static conformations (i.e. the reference can change as the run progresses). Here, we redefine the previous goal function with multiple arguments:

In [17]:
def newgoalfunction(mol, crystal):
    rmsd = MetricRmsd(crystal, 'protein and name CA').project(mol)
    return -rmsd  # or even 1/rmsd

Now we clean the previous `AdaptiveGoal` run, and start a new one with the new goal function:

In [17]:
# clean previous run
shutil.rmtree('./input')
shutil.rmtree('./data')
shutil.rmtree('./filtered')

# run with new goal
ref = Molecule('./ntl9_crystal.pdb')
adg.goalfunction = (newgoalfunction, (ref,))
adg.run()
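
The `(function, (args,))` form can be read as "call `function` with the simulation first, followed by the extra arguments". A minimal sketch of that calling convention follows, using a hypothetical `call_goal` helper and plain numbers in place of `Molecule` objects; it is not HTMD's own dispatch code.

```python
def call_goal(goalfunction, mol):
    """Call a goal given either as a function or as (function, extra_args)."""
    if isinstance(goalfunction, tuple):
        func, extra_args = goalfunction
        return func(mol, *extra_args)   # extra arguments follow the first one
    return goalfunction(mol)


def distance_to(value, reference):
    """Toy goal: values closer to the reference score higher."""
    return -abs(value - reference)


single = call_goal(lambda value: -value, 3.0)    # plain function
multi = call_goal((distance_to, (5.0,)), 3.0)    # function plus extra arguments
print(single, multi)
```

Note the trailing comma in `(5.0,)`, as in `(ref,)` above: without it, Python treats the parentheses as grouping rather than as a one-element tuple.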

# Other goal function examples

## Secondary structure goal function

In [19]:
import numpy as np

ref = Molecule('./ntl9_crystal.pdb')

def ssGoal(mol, crystal):
    # Secondary structure of the crystal (single frame) and of every simulation frame
    crystalSS = MetricSecondaryStructure().project(crystal)[0]
    proj = MetricSecondaryStructure().project(mol)
    # Fraction of residues whose simulation SS matches the crystal SS
    ss_score = np.sum(proj == crystalSS, axis=1) / proj.shape[1]
    return ss_score

adg.goalfunction = (ssGoal, (ref,))
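
The scoring line in `ssGoal` relies on NumPy broadcasting. The sketch below reproduces it on made-up data, with arbitrary integer codes standing in for the per-residue secondary structure assignments:

```python
import numpy as np

# Arbitrary integer codes, one per residue (made-up data)
crystal_ss = np.array([0, 1, 1, 2])
proj = np.array([[0, 1, 1, 2],    # frame 0: all four residues match
                 [0, 0, 1, 2],    # frame 1: three of four match
                 [2, 2, 2, 2]])   # frame 2: one of four matches

# Broadcasting compares every frame (row) against the reference row
ss_score = np.sum(proj == crystal_ss, axis=1) / proj.shape[1]
print(ss_score)
```

Each score is the fraction of residues that agree with the reference, so it ranges from 0 to 1.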

## Contacts goal function

In [20]:
ref = Molecule('./ntl9_crystal.pdb')

def contactGoal(mol, crystal):
    # Contact map of the crystal structure (no periodic box, hence pbc=False)
    crystalCO = MetricSelfDistance('protein and name CA', pbc=False,
                                   metric='contacts',
                                   threshold=10).project(crystal)
    proj = MetricSelfDistance('protein and name CA',
                              metric='contacts',
                              threshold=10).project(mol)
    # How many crystal contacts are seen?
    co_score = np.sum(proj[:, crystalCO] == 1, axis=1)
    # True division into a new array; in-place /= fails on integer counts
    co_score = co_score / np.sum(crystalCO)
    return co_score

adg.goalfunction = (contactGoal, (ref,))
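
The contact score is the fraction of crystal contacts that are formed in each frame. The sketch below shows the boolean-mask indexing on made-up contact maps; the array names are illustrative only.

```python
import numpy as np

# Made-up contact maps: one boolean per residue pair
crystal_co = np.array([True, False, True, True])    # three crystal contacts
proj = np.array([[True, True, True, True],          # frame 0: all three formed
                 [True, False, False, False],       # frame 1: one formed
                 [False, True, False, False]])      # frame 2: none formed

# Keep only the columns that are contacts in the crystal, then count per frame
native_formed = np.sum(proj[:, crystal_co], axis=1)
# Divide into a new float array, since the counts are integers
co_score = native_formed / np.sum(crystal_co)
print(co_score)
```

Dividing the integer counts in place with `/=` raises a casting error in NumPy, which is why the result is assigned to a new array.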