{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Structural alignment of proteins by sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the field of Bioinformatics it is usual to perform a BLAST search in order to find\n",
    "similar proteins to the one of your interest. If the structures of some of the results\n",
    "are available, you might want to align them all together to see the differences. \n",
    "These structures will usually have mutations, or vary in the number of residues and atom order in the PDB files, which makes simple, full-structural alignment functions fail.\n",
    "\n",
    "Therefore, HTMD provides the function ```moleculekit.tools.sequenceStructureAlignment```, which takes two proteins and\n",
    "aligns both structures using their longest **sequence** aligment.\n",
    "\n",
    "In this example, we will use Dopamine receptor (PDB code: '3PBL') and Beta Adrenergic receptor (PDB code: '3NYA').\n",
    "Both are GPCR proteins, so they share a great fraction of their sequence. We will use this feature to align their structures."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick Example\n",
    "\n",
    "Adrenergic receptor will be used as the reference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from htmd.ui import *\n",
    "from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment\n",
    "\n",
    "# We will use adrenaline receptor as the reference/template\n",
    "\n",
    "# Load dopamine receptor and display in red\n",
    "dop_receptor = Molecule('3PBL')\n",
    "# The crystal is a dimer, so we discard one of the units\n",
    "dop_receptor.filter('protein and chain A',_logger=False) \n",
    "dop_receptor.view(style='NewCartoon',color='1',name='DopRec')  \n",
    "\n",
    "# Load adrenergic receptor and display in dark blue\n",
    "adr_receptor = Molecule('3NYA')\n",
    "adr_receptor.filter('protein')\n",
    "adr_receptor.filter('resid 0 to 342',_logger=False) # filter out the G-protein\n",
    "adr_receptor.view(style='NewCartoon',color='0',name='AdrRec')\n",
    "\n",
    "# adr_receptor acts as the template\n",
    "dop_receptor_results = sequenceStructureAlignment(dop_receptor,adr_receptor) \n",
    "# pick the top result\n",
    "dop_receptor_aligned = dop_receptor_results[0] \n",
    "\n",
    "# See the result, dopamine receptor is now aligned and displayed in green\n",
    "dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detailed Explanation\n",
    "\n",
    "First, we load the dopamine receptor. The crystal is a dimer of two receptors, so we discard one of the units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from htmd.ui import *\n",
    "from moleculekit.tools.sequencestructuralalignment import sequenceStructureAlignment\n",
    "\n",
    "dop_receptor = Molecule('3PBL')\n",
    "dop_receptor.filter('protein and chain A') # discard one of the units"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we load the beta adrenergic receptor. The G-protein from the Adrenergic receptor (residues from 343 to last) is discarded, to ensure that these region is not used to align both proteins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "adr_receptor = Molecule('3NYA')\n",
    "adr_receptor.filter('protein')\n",
    "adr_receptor.filter('resid 0 to 342') # discard the G-protein "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see the two proteins together before the alignment. The dopamine receptor is displayed in red."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dop_receptor.view(style='NewCartoon',color='1',name='DopRec') # visualize it in red \n",
    "adr_receptor.view(style='NewCartoon',color='0',name='AdrRec') # visualize it in dark blue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, both proteins are ready to be aligned, so we call the ```sequenceStructureAlignment``` function. The second protein molecule passed to the function acts as the reference, in our case, the adrenaline receptor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dop_receptor_results = sequenceStructureAlignment(dop_receptor,adr_receptor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```dop_receptor_results``` stores different structural alignments, each using a different portion of the sequence alignment.\n",
    "By default, only the best 10 alignments are stored, but you can modify this behaviour setting the parameter ```maxalignments``` to an arbitrary number.\n",
    "\n",
    "Let's look at the best result by choosing the first item in ```dop_receptor_results```. Dopamine receptor is shown in green."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dop_receptor_aligned = dop_receptor_results[0] # pick the best result \n",
    "dop_receptor_aligned.view(style='NewCartoon',color='7',name='DopAligned')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}