{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# An overview of Biobox"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Welcome! In this notebook we will overview Biobox's main features, and point to relevant publications where they were used. Biobox is developed by the Degiacomi group (www.degiacomi.org) in Durham University (UK), and is downloadable from Github, https://github.com/Degiacomi-Lab/biobox. Its API is available here: https://Degiacomi-Lab.github.io/biobox/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This overview covers the following topics:\n",
    "\n",
    "* [Selecting atoms from a (multi)PDB](#select)\n",
    "* [Protein conformations clustering](#cluster)\n",
    "* [Protein polyhedral assemblies](#polyhedra)\n",
    "* [Super-coarse grain modelling](#supercg)\n",
    "* [Density map cutoff via Collision Cross Section](#density)\n",
    "* [Calculating cross-linking distances](#xlink)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "<b>Citing Biobox</b>. If you use Biobox in your work, please cite:\n",
    "<a href=\"https://academic.oup.com/bioinformatics/article/38/4/1149/6428530\">\n",
    "    L. S. P. Rudden et al., Biobox: a toolbox for biomolecular modelling, Bioinformatics, 2021\n",
    "    </a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get started by compiling Biobox provided along with this notebook (this can take a minute). Then, let's import Biobox, numpy, matplotlib and nglview (we will need all these packages later)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.chdir('biobox')\n",
    "os.system('python setup.py install')\n",
    "os.chdir('..')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.cluster import hierarchy\n",
    "import matplotlib.pyplot as plt\n",
    "import nglview as nv\n",
    "\n",
    "import biobox as bb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selecting atoms from a (multi)PDB <a class=\"anchor\" id=\"select\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s load a protein in a PDB file, containing 20 frames of a molecular dynamics simulation of a small heat-shock protein."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M = bb.Molecule()\n",
    "M.import_pdb(\"HSP.pdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the atomic coordinates are stored in `M.coordinates`, which is a directly accessible 3D array of size `(nb_conformations, nb_atoms, 3)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(M.coordinates.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "The properties of all atoms (i.e. anything that is not coordinates) are stored in the pandas data structure `M.data`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check how many chains this structure is made of:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(np.unique(M.data[\"chain\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well that's odd, small heat shock proteins come as a dimer... As it sometimes happens, the MD engine has replaced the chain assignment with a single symbol, `X`. Let's ask Biobox to figure out how to split the protein in its two chains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M.guess_chain_split()\n",
    "print(np.unique(M.data[\"chain\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, that's better, two chains have been found and assigned. Now, let's identify only the backbone atoms of chain A. `atomselect` accepts as parameters single strings, lists or `“*”` as wildcard. After this call, `pos` contains the coordinates of all selected atoms, and `idx` their indices. Another way to select atoms, is to use the `query` method. The following call will yield the same result as the `atomselect` above. The `query` method follows the pandas query syntax, and allows to be more expressive. Any column stored in `M.data` can be addressed. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pos, idx = M.atomselect(\"A\", \"*\", [\"CA\",\"C\",\"N\",\"O\"], get_index=True)\n",
    "pos, idx = M.query('chain == \"A\" and name == [\"CA\",\"C\",\"N\",\"O\"]', get_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have identified indices of interest, we can save a subset of the initial pdb in a new one, or to create a new `Molecule` object containing only them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M.write_pdb(\"chainA.pdb\", index=idx)\n",
    "M2 = M.get_subset(idx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's have a look at what we did. We will represent in cartoon the input protein, and as a surface the part of the protein we selected and saved into a new file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view = nv.show_file(\"HSP.pdb\")\n",
    "view.add_component(\"chainA.pdb\")\n",
    "view.clear_representations(component=0)\n",
    "view.add_representation(\"cartoon\", component=0)\n",
    "view.clear_representations(component=1)\n",
    "view.add_representation(\"surface\", opacity=0.2, component=1)\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "multiple conformations may be available in the PDB file. By default, the first one is set as current. It is possible to set as current another one as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M.set_current(2)\n",
    "pos2, idx2 = M.atomselect(\"A\", \"*\", [\"CA\",\"C\",\"N\",\"O\"], get_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After this new `atomselect` call, `idx2` will be equal to `idx1` (atom selected are still the same), but `pos2` will be different from `pos` (atoms positions differ between different conformations). Unless otherwise specified, `get_subset` selects all the alternative conformations from the atoms of interest. `get_subset` can however also be instructed to select a subset of conformations. For instance, the following call will select only the conformations 0, 1 and 2 of atoms of interest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M2 = M.get_subset(idx, conformations=[0,1,2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Protein conformations clustering <a class=\"anchor\" id=\"cluster\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Biobox methods return numpy arrays. This means that you can directly benefit from data analysis tools in all major Python scientific computing packages. For instance, let's start by calculating an RMSD distance matrix of protein conformations stored in our molecule `M`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dist = M.rmsd_distance_matrix(flat=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's run a hierarchical clustering on the multi-PDB we previously loaded by first calculating an all-vs-all RMSD matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# hierarchical clustering\n",
    "hierarchical_cluster = hierarchy.linkage(dist, method='single')\n",
    "\n",
    "#get assignment of structures to clusters\n",
    "flat_clusters = hierarchy.fcluster(hierarchical_cluster, 2.0, criterion='distance')\n",
    "print(\"clustering:\", flat_clusters)\n",
    "\n",
    "# plot dendrogram\n",
    "d = hierarchy.dendrogram(hierarchical_cluster)\n",
    "plt.xlabel(\"conformation (#)\")\n",
    "plt.ylabel(\"RMSD ($\\AA$)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Protein polyhedral assemblies <a class=\"anchor\" id=\"polyhedra\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to produce several protein tetrahedral assemblies, and compare them to each other. Now, let’s create a `Multimer` arranged according to a tetrahedral symmetry. To do so, we have to load information about the tetrahedral scaffold Biobox will exploit to align six monomers. By default this information is stored in the file `classes/polyhedron_database.dat` (along with many more symmetries), though the user can import their own database. The other information we need, is what molecule we will be the tetrahedron with. Here, we will select the first conformation of our small heat-shock protein, and we center it at the origin."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "M_subunit = M.get_subset(idxs=list(range(len(M))), conformations=[0])\n",
    "M_subunit.center_to_origin()\n",
    "\n",
    "P = bb.Multimer()\n",
    "P.setup_polyhedron('Tetrahedron', M_subunit)\n",
    "P.generate_polyhedron(45, 180, 0, 0)\n",
    "P.write_pdb(\"polyhedron.pdb\", rename_chains=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, `P` contains six proteins arranged as a tetrahedron having a radius of 10 Angstrom. Every subunit is rotated with respect of its specific position on the scaffold. Rotation angles are defined with respect of the molecule’s principal axes. Here, we rotate by 180 degrees around the first principal axis, 20 around the second, and 10 around the third. Let's have a look!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view = nv.show_file(\"polyhedron.pdb\")\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s now build two new polyhedra with different radii and rotation angles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "P.generate_polyhedron(10, 180, 50, 65, add_conformation=True)\n",
    "P.generate_polyhedron(12, 185, 40, 60, add_conformation=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we set `add_conformation=True`, the atoms arrangement of the new multimers will be appended as new conformations. With `add_conformation=False` (default) the previous subunits arrangements gets overwritten. Note that assemblies’ multiple conformations are treated by appending on each subunit its different conformation. Biobox then sets on all subunits the same current position. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subunits can also be grouped, and different groups can be rotated differently. In the following example, the tetrahedron’s chains A, B, C and D, E, F form different groups that are rotated independently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "P.conn_type = np.array([0, 0, 0, 1, 1, 1])\n",
    "P.generate_polyhedron(45, np.array([180, 90]), np.array([0, 0]), np.array([0, 0]), add_conformation=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's visualize our new polyhedron:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "P.write_pdb(\"polyhedron2.pdb\", rename_chains=True)\n",
    "view = nv.show_file(\"polyhedron2.pdb\")\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that when more than one edge type is provided, rotation angles should be in the form of a numpy array having the same length as the amount of different groups in connection (values in `conn_type` are used to index the angles arrays)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Polyhedral scaffolds are constituted of vertices connected by edges. By altering the position of the vertices, the scaffolds can be deformed (e.g. useful to model near-symmetries). In Biobox, deformations are treated in terms of deformation vectors, i.e. unit-vectors indicating in which direction a vertex can move. Here, we will allow the first vertex to move radially. We will then build a tetrahedron, where this vertex is displaced from its initial position by its deformation vector, scaled by a constant (here, 20)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "P.deform = np.empty(0)\n",
    "P.add_deformation(1)\n",
    "P.conn_type = np.array([0, 0, 0, 0, 0, 0])\n",
    "P.generate_polyhedron(45, 180, 0, 0, deformation=[20], add_conformation=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at our latest polyhedron!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "P.write_pdb(\"polyhedron3.pdb\", rename_chains=True)\n",
    "view = nv.show_file(\"polyhedron3.pdb\")\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that add_deformation also accepts user-defined deformation vectors. To see how your scaffold looks like, a pdb file containing the vertices and an associated TCL script for VMD (drawing colored edges, as a function of grouping) can be produced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "P.write_poly_architecture(\"architecture\", scale=10, deformation=[5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will generate two files `architecture.pdb` and `architecture.tcl`. The initial unit-sized scaffold will scaled by 10, and the first vertex moved away radially."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we want to calculate the RMSD between the created multimers’ alpha carbons. With these lines, `dist_mat` will contain the RMSD distance matrix between the multimers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idxs = P.atomselect(\"*\", \"*\" ,\"*\", \"CA\", get_index=True)[1]\n",
    "dist_mat = P.rmsd_distance_matrix(points_indices=idxs)\n",
    "print(dist_mat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that, as for the case of `atomselect` applied to `Molecule` objects, a `query` method is also available. The same selection as the command above can be obtained with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pts, idx = P.query('name == \"CA\"', get_index=True)\n",
    "print(len(idx))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select atoms from some specific units, the following command can be issued:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "pts, idx = P.query('unit == [\"0\", \"3\", \"5\"] and name == \"CA\"', get_index=True)\n",
    "print(len(idx))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> : the polyhedra building method has been first presented in\n",
    "  <a href=\"https://www.sciencedirect.com/science/article/pii/S0969212611003273\"> A. J. Baldwin et. al., The Polydispersity of αB-Crystallin Is Rationalized by an Interconverting Polyhedral Architecture, Structure, 2011</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> : this polyhedra building method was used to build assemblies in\n",
    "  <a href=\"https://www.jbc.org/article/S0021-9258(20)31083-8/fulltext\">I. Santhanagopalan I. et al., It takes a dimer to tango: Oligomeric small heat shock proteins dissociate to capture substrate, Journal of Biological Chemisty, 2018 </a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Super-coarse grain modelling <a class=\"anchor\" id=\"supercg\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we will arrange a group of cylinders in a ring. To do so, we have first to create a single collection of points arranged like a cylinder. Unless otherwise specified (using the optional keyword radius), every point composing the `Cylinder` instance (and any other convex point cloud) will have a radius of 1.4 Angstrom. To simulate a smooth surface, one can either increase the points radius, or their density. Finally, the resulting cylinder will be rotated by 45 degrees along the x axis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "<b>Warning:</b> these models can be made of a very large number of pseudoatoms. High number of atoms are unsuitable for nglview. Here, we will use low sampling to enable representation via nglview.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cylinder_length = 10\n",
    "cylinder_radius = 40\n",
    "C = bb.Cylinder(cylinder_length, cylinder_radius, pts_density_u=np.pi/10., pts_density_h=0.5)\n",
    "C.rotate(45, 0, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now create an assembly loading ten copies of our template cylinder, arrange them in a 30 Angstrom-wide circle, and save the resulting structure into a PDB file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = bb.Assembly()\n",
    "A.load(C, 4)\n",
    "A.make_circular_symmetry(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's save this point cloud and display it with nglview."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.write_pdb(\"assembly.pdb\")\n",
    "view = nv.show_file(\"assembly.pdb\")\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now assess some of the assembly’s characteristics, for instance its height and width. This can be done by extracting all the assembly’s points coordinates in a unique numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xyz = A.get_all_xyz()\n",
    "width = np.max(xyz[:, 0]) - np.min(xyz[:, 0])\n",
    "height = np.max(xyz[:, 2]) - np.min(xyz[:, 2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative way to measure assembly dimensions, it to profit of methods in `Structure` class. Here we collapse the `Assembly` units coordinates in a single `Structure` instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "S = A.make_structure()\n",
    "print(S.get_size())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case not all the subunits of the assembly are the same, a list of subunits can be loaded. In this case, we will load a `Sphere` (and call it `S`) as well as two identical cylinders (called `C1` and `C2`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sphere_radius = 20\n",
    "cylinder_radius = 5\n",
    "cylinder_length = 50\n",
    "\n",
    "S = bb.Sphere(sphere_radius, n_sphere_point=500)\n",
    "C = bb.Cylinder(cylinder_radius, cylinder_length, pts_density_u=np.pi/10, pts_density_h=0.5)\n",
    "A2 = bb.Assembly()\n",
    "A2.load_list([S, C, C], [\"S\", \"C1\", \"C2\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we will arrange the three loaded structures so that the bases of two cylinders are in touch with the sphere, and one cylinder is rotated by 45 degrees with respect to the other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A2.translate(0, 0, -cylinder_length/2.0-sphere_radius, [\"C1\", \"C2\"])\n",
    "A2.rotate(0.0, 45.0, 180.0, [\"C2\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, translations (and rotations) can be applied to units subsets. In this case, we kept the sphere fixed, and only translated the cylinders, and then rotated just one of the two cylinders. Let's take a look at the final result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A2.write_pdb(\"assembly2.pdb\")\n",
    "view = nv.show_file(\"assembly2.pdb\")\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> :  this super-coarse grain approach was exploited to calculate the collision cross-section of curved chains of ellipsoids in Fig.3 of\n",
    "  <a href=\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4832279/\">M. A. McDowell et al., Characterisation of Shigella Spa33 and Thermotoga FliM/N reveals a new model for C-ring assembly in T3SS, Molecular Microbiology, 2015</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> :  a graphical representation of typical membrane protein arrangements was obtained combining super-coarse grain models and VMD-generated lipid bilayers, Fig.3 of  \n",
    "  <a href=\"https://pubs.acs.org/doi/abs/10.1021/jacs.5b00420\">C. Bechara and C. V. Robinson, Different Modes of Lipid Binding to Membrane Proteins Probed by Mass Spectrometry, JACS, 2015</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Density map cutoff via Collision Cross Section <a class=\"anchor\" id=\"density\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ion Mobility (IM) experiments report on a molecule’s collision cross section (CCS). Here we show how to relate IM data with a electron density 3D reconstruction obtained by Electron Microscopy (EM). \n",
    "We first import a GroEL density map `EMD-1800.mrc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "D = bb.Density()\n",
    "D.import_map(\"EMD-1080.mrc\", \"mrc\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Depending on which threshold value one selects, the resulting isosurface will have a certain volume and CCS. We now compute the map’s relationship between threshold, volume and CCS with 100 equally spaced threshold values. This might take several minutes, depending on map size (by default, a scan between minimal and maximal map intensity is performed). Obtained values will be returned in a numpy array containining as columns `[threshold, volume, CCS]`. This will also be stored in `self.properties[‘scan’]`, for future usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "<b>Warning on CCS calculations:</b> to carry out CCS calculations, the <a href=\"https://doi.org/10.1016/j.str.2015.02.010\">IMPACT</a> software must be available on your computer.\n",
    "The easiest way to ensure Biobox uses it, is to declare the environment variable IMPACTPATH, pointing at the folder where the software is installed.\n",
    "If you are running this tutorial online via Binder, CCS calculations will fail (a CCS equal to zero will be reported).\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tvc = D.threshold_vol_ccs(low=0, sampling_points=100, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s predict the density CCS using a fitted mass-based threshold, and compare it the known CCS of 20600 A^2. This requires providing the map’s resolution (here, 5.4 Angstrom) and the mass of GroEL (801 kDa). The procedure interrogates the data previously stored in `D.properties[‘scan’]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ccs_mass, fitted_mass_thresh = D.predict_ccs_from_mass(5.4, 801)\n",
    "print(\"threshold: %s\"%fitted_mass_thresh)\n",
    "print(\"CCS: %s A2\"%ccs_mass)\n",
    "error = 100 * (np.abs(ccs_mass - 20600)/20600.)\n",
    "print(\"error vs IM data: %4.2f percent\"%error)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Error should be typically less than 5%. Values greater than 8% indicate that the protein’s conformation is likely different between EM and IM. We can use `fitted_mass_thresh` to create a bead model, that can then be saved into a PDB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "D.place_points(fitted_mass_thresh)\n",
    "D.write_pdb(\"model_ccs_mass.pdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at the bead model we produced!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "view = nv.show_file(\"model_ccs_mass.pdb\")\n",
    "view.add_component(\"EMD-1080.mrc\")\n",
    "view.clear_representations()\n",
    "view.add_spacefill(component=0)\n",
    "view.add_representation('surface', color='grey', isolevel=0.5, opacity=0.5, component=1)\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> :  this method is described in\n",
    "  <a href=\"https://pubs.rsc.org/en/Content/ArticleLanding/2016/AN/C5AN01636C#!divAbstract\">M. T. Degiacomi and J. L. P. Benesch, EMnIM: software for relating ion mobility mass spectrometry and electron microscopy data, Analyst, 2016 </a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculating cross-linking distances <a class=\"anchor\" id=\"xlink\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cross-linking experiments report on the distance between the side chain of specific amino-acids. This distance, measured by a cross-linker molecule, is however not a straight line, but a “shortest solvent accessible path”."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To identify in a structure which lysines may be cross-linked, we start loading it and identifying the location of all lysines’ NZ atoms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx = M.atomselect(\"*\", \"LYS\", \"NZ\", use_resname=True, get_index=True)[1]\n",
    "print(M.get_data(idx))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To calculate the path distance between all these atoms, we must first define which protein atoms should be used for clash detection. Here, we select all backbone atoms as well as beta carbon ones. Furthermore, atoms buried in the protein core are also added (with `densify=True`). This makes the protein core more “dense”, reducing the likelihood that a path will find its way through the protein, instead of around it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "XL = bb.Xlink(M)\n",
    "XL.set_clashing_atoms(atoms=[\"CA\", \"C\", \"N\", \"O\", \"CB\"], densify=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then set up the grid used by the path detection algorithms. Here, we use a local search, using a cubic moving grid of 18 Angstrom per side. After this, the distance matrix path detection algorithm can be launched. We will use a lazy Theta* method, with flexible side chains, and path smoothing as postprocessing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "XL.setup_local_search(maxdist=18)\n",
    "distance_mat, paths = XL.distance_matrix(idx, method=\"theta\", smooth=True, flexible_sidechain=False, get_path=True)\n",
    "print(distance_mat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's have a look at the distances we measured. The data structure `paths` is a list containing the information we need!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#gather all path points and save them into a PDB file\n",
    "path_points = []\n",
    "for p in paths:\n",
    "    if len(p[1])>0:\n",
    "        path_points.append(p[1])\n",
    "\n",
    "path_points = np.vstack(tuple(path_points))\n",
    "S = bb.Structure(p=path_points)\n",
    "S.write_pdb(\"paths.pdb\")\n",
    "\n",
    "# view protein and the cross-linking distances\n",
    "view = nv.show_file(\"paths.pdb\")\n",
    "view.add_component(\"HSP.pdb\")\n",
    "view.clear_representations()\n",
    "view.add_representation(\"spacefill\", radiusScale=0.2, component=0)\n",
    "view.add_licorice('LYS', component=1)\n",
    "view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`distance_mat` is the distance matrix between all lysines, sorted according to `idx`. It will contain -1 for lysine’s linking atoms too far to be encompassed by the moving grid, and -2 for failed path detection (e.g. because a linking atom is buried)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>See also</b> :  this method is presented and benchmarked in\n",
    "  <a href=\"https://www.cell.com/structure/fulltext/S0969-2126(17)30291-5\"> M. T. Degiacomi et al., Accommodating protein dynamics in the analysis of chemical cross-links, Structure, 2017</a>.\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}