{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XmDh_5uQ5qZA"
   },
   "source": [
    "<!--NOTEBOOK_HEADER-->\n",
    "*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta.notebooks);\n",
    "content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6ak9wZiw5qZD"
   },
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Introduction to PyRosetta](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.00-Introduction-to-PyRosetta.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Working with Pose residues](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.02-Working-with-Pose-Residues.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.01-Pose-Basics.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcfvEpZw5qZE"
   },
   "source": [
    "# Pose Basics\n",
    "Keywords: pose_from_pdb(), sequence(), cleanATOM, annotated_sequence()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4PobYkjK5qZE"
   },
   "source": [
    "In this lab, we will get practice working with the `Pose` class in PyRosetta. We will load in a protein from a PDB files, use the `Pose` class to learn about the geometry of the protein, make changes to this geometry, and visualize the changes easily with `PyMOL` and PyRosetta's `PyMOLMover`.\n",
    "\n",
    "On the corresponding `Pose` lab found on the PyRosetta website, you will find various useful commands to interrogate poses; these may come in handy for the exercises.\n",
    "\n",
    "**PyRosetta Installation:**\n",
    "The following two lines will load in the PyRosetta library and load in database files. If this does not work, please notify the professor or the TA."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "U6Z8fOnU5qZF"
   },
   "outputs": [],
   "source": [
    "!pip install pyrosettacolabsetup\n",
    "import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()\n",
    "import pyrosetta; pyrosetta.init()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2Tx_dYqJ5qZG",
    "outputId": "80f2786d-81a1-4ba4-dcb3-692b6759d59f"
   },
   "outputs": [],
   "source": [
    "from pyrosetta import *\n",
    "init()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HOpuNIdr5qZH"
   },
   "source": [
    "## Loading in a PDB File ##"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lzZvNI6l5qZH"
   },
   "source": [
    "Protein Data Bank (PDB) is a text file format for describing 3D molecular structures and other information. Rosetta can read in PDB files and can output them as well. In addition to PDB, mmTF and mmCIF are a couple other file formats that are used with Rosetta.\n",
    "\n",
    "We will spend some time today looking at the crystal structure for the protein **PafA** (PDB ID: 5tj3) using Pyrosetta. PafA is an alkaline phosphatase, which removes a phosphate group from a phosphate monoester. In this structure, a modified amino acid, phosphothreonine, is used to mimic the substrate in the active site. Let's load in this structure with PyRosetta (make sure that you have the PDB file located in your current directory):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iKrhogQ55qZH"
   },
   "source": [
    "# First we need to be sure we are working in the correct folder\n",
    "This will set our current working directory to student-notebooks folder\n",
    "`!cd /content/google_drive/MyDrive/student-notebooks/`\n",
    "\n",
    "This will print the current working directory:\n",
    "`!pwd` (the bang symbol \"!\" indicates a linux system command).\n",
    "\n",
    "This will list the files in current working directory:\n",
    "`!ls`\n",
    "\n",
    "Remember to adjust your working directory each time you work in one of the PyRosetta notebooks!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "P-Wnv_TE5qZI"
   },
   "outputs": [],
   "source": [
    "!echo Starting directory info\n",
    "!pwd\n",
    "!ls\n",
    "\n",
    "!echo Student notebook directory [adjust the path depending on where you put the notebooks!]\n",
    "!cd /content/google_drive/MyDrive/student-notebooks\n",
    "!pwd\n",
    "!ls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MIx2SOcI5qZI"
   },
   "outputs": [],
   "source": [
    "# [optional|advanced] you can also install and open a \"real\" UNIX terminal and do above commands interactively:\n",
    "!pip install colab-xterm\n",
    "%load_ext colabxterm\n",
    "%xterm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BajXHir26GK6"
   },
   "source": [
    "# Now we are ready to load the PDB file\n",
    "`pose = pose_from_pdb(\"5tj3.pdb\")`\n",
    "\n",
    "Here we are inputting the PDB file using the `pose_from_pdb` method. However, we can also load this structure from the internet with `pose_from_rcsb(\"5TJ3\")`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "id": "rOGXMCD95qZI",
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ade0467986e24449fd68226324dcf1c1",
     "grade": true,
     "grade_id": "cell-690a147764ad96d7",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true
    },
    "outputId": "aae6363f-e2d5-4cc8-8224-39c5045f5822"
   },
   "outputs": [],
   "source": [
    "YOUR-CODE-HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IrtVfAdK5qZI"
   },
   "source": [
    "## What is a Pose?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9Ix8T0Cm5qZI"
   },
   "source": [
    "The Pose class includes various types of information that describe a structure. Some of the core components include the Energies, PDBInfo, and Conformation. See the Rosetta3 paper to learn more: https://www.sciencedirect.com/science/article/pii/B9780123812704000196\n",
    "\n",
    "As an example, let's use our pose to look at the sequence of 5TJ3:\n",
    "`pose.sequence()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "id": "ETWXyMI35qZJ",
    "nbgrader": {
     "cell_type": "code",
     "checksum": "62406b4883cbd9a50eff78f646befd63",
     "grade": true,
     "grade_id": "cell-61e3c7efb8ae6b94",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true
    },
    "outputId": "4e5b5865-f2db-45bb-e8eb-72a91e9fe158"
   },
   "outputs": [],
   "source": [
    "# print out the sequence of the pose\n",
    "YOUR-CODE-HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XkQoI9005qZJ"
   },
   "source": [
    "Sometimes PDB files do not conform to standards and need to be cleaned to be loaded successfully with PyRosetta. One way to make sure the file is loaded successfully is to only include the ATOM lines from the PDB file. Alternatively, you could use the cleanATOM function in pyrosetta.toolbox to achieve the same:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "6nrEcjgx5qZJ"
   },
   "outputs": [],
   "source": [
    "from pyrosetta.toolbox import cleanATOM\n",
    "cleanATOM(\"inputs/5tj3.pdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zLm27GAf5qZJ"
   },
   "source": [
    "This method will create a cleaned 5tj3.clean.pdb file for you. Lets load this into PyRosetta as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7_UQ79kU5qZJ",
    "outputId": "7835c647-f0eb-437c-dcf6-e4028cbf404c"
   },
   "outputs": [],
   "source": [
    "pose_clean = pose_from_pdb(\"inputs/5tj3.clean.pdb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fRPoxUUD5qZJ"
   },
   "source": [
    "In our case, we could load in the PDB file for 5tj3 without cleaning it. In fact, we've lost some residues when cleaning the PDB file with cleanATOM. What is the difference in the `sequence` of the `pose_clean` now, compared to before?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "id": "R63DCEB15qZK",
    "nbgrader": {
     "cell_type": "code",
     "checksum": "43194da40e47c87bdd8653b43bd4d9ff",
     "grade": true,
     "grade_id": "cell-57e06ba64f052592",
     "locked": false,
     "points": 0,
     "schema_version": 3,
     "solution": true
    },
    "outputId": "1fac119c-9367-47b0-f275-c2a31aa4d498"
   },
   "outputs": [],
   "source": [
    "# print out the sequence of the pose_clean\n",
    "YOUR-CODE-HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fg_qqSKr5qZK"
   },
   "source": [
    "With the function `annotated_sequence` below, we can start to see in more detail what the differences are. Note that non-canonical amino acids and hetatms are spelled out more explicitly now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "puQTTmzy5qZK",
    "outputId": "03f71bdf-c64c-40ee-8ba0-aff38bf0b1c9"
   },
   "outputs": [],
   "source": [
    "pose.annotated_sequence()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XjVqfcPT5qZK",
    "outputId": "f5b6247e-65c0-4909-b25d-2bb6051a7a7c"
   },
   "outputs": [],
   "source": [
    "pose_clean.annotated_sequence()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Cj7qE1dj5qZK"
   },
   "source": [
    "### Exercise 1: Inspecting pose sequences\n",
    "\n",
    "Visually inspect the sequences to find the difference(s) between the `pose_clean.sequence()` and `pose.sequence()`. Were residues removed? Which ones?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "r9t_IDfv5qZL"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ps1zeHZR5qZL"
   },
   "source": [
    "### Bonus Exercise 1: Identifying differences in sequences\n",
    "\n",
    "(Optional) Write a program to automatically find the differences between these two sequences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "VVCtpDB55qZL"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kU7vXmt15qZL"
   },
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Introduction to PyRosetta](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.00-Introduction-to-PyRosetta.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Working with Pose residues](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.02-Working-with-Pose-Residues.ipynb) ><p><a href=\"https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/02.01-Pose-Basics.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open in Google Colaboratory\"></a>"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "470px",
    "left": "48px",
    "top": "110px",
    "width": "267.984px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}