{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create molecules from scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyRNA allows you to construct easily DNA and RNA molecules. An RNA molecule will automatically convert T residues into U."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my_rna: AGGGGAUUAACCCC\n",
      "my_dna: GGTTGGATTAACCCC\n"
     ]
    }
   ],
   "source": [
    "from pyrna.features import DNA, RNA\n",
    "rna = RNA(name = 'my_rna', sequence = 'AGGGGATTAACCCC')\n",
    "print \"%s: %s\"%(rna.name, rna.sequence)\n",
    "dna = DNA(name = 'my_dna', sequence = 'GGTTGGATTAACCCC')\n",
    "print \"%s: %s\"%(dna.name, dna.sequence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "RNA and DNA molecules can return their length, are slicable and iterable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "slice: AG\n",
      "length: 14\n"
     ]
    }
   ],
   "source": [
    "print \"slice: %s\"%rna[0:2]\n",
    "print \"length: %i\"%len(rna)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can easily get a single residue:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "G\n"
     ]
    }
   ],
   "source": [
    "print rna[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sequence can be easily changed by adding a new string at the end:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AGGGGAUUAACCCCAAA\n"
     ]
    }
   ],
   "source": [
    "rna +'AAA'\n",
    "print rna.sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or by removing some residues from the end:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AGGGGAUUAACCCC\n"
     ]
    }
   ],
   "source": [
    "rna-3\n",
    "print rna.sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An RNA molecule is iterable over its primary sequence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "residue n1: A\n",
      "residue n2: G\n",
      "residue n3: G\n",
      "residue n4: G\n",
      "residue n5: G\n",
      "residue n6: A\n",
      "residue n7: U\n",
      "residue n8: U\n",
      "residue n9: A\n",
      "residue n10: A\n",
      "residue n11: C\n",
      "residue n12: C\n",
      "residue n13: C\n",
      "residue n14: C\n"
     ]
    }
   ],
   "source": [
    "for index, residue in enumerate(rna):\n",
    "    print \"residue n%i: %s\"%(index+1, residue)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create molecules from files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With PyRNA, an object pyrna.features.TertiaryStructure is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb() returns a list of such objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "h = open('../data/1ehz.pdb')\n",
    "pdb_content = h.read()\n",
    "h.close()\n",
    "\n",
    "from pyrna.parsers import parse_pdb\n",
    "tertiary_structures = parse_pdb(pdb_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "RNA molecules extracted from PDB files can contain modified residues. PyRNA converts them automatically into unmodified residues, and stores the modification in a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A\n",
      "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA\n",
      "[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]\n"
     ]
    }
   ],
   "source": [
    "for ts in tertiary_structures:\n",
    "    print ts.rna.name\n",
    "    print ts.rna.sequence\n",
    "    print ts.rna.modified_residues"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to parse a FASTA file, you have to precise the type of molecules stored. DNA molecules are faster to create since PyRNA will not try to identify modified residues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sequence of telomerase 1:\n",
      "AGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU\n",
      "\n",
      "sequence of telomerase 2:\n",
      "AUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU\n",
      "\n",
      "sequence of telomerase 3:\n",
      "UACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU\n",
      "\n"
     ]
    }
   ],
   "source": [
    "h = open('../data/telomerases.fasta')\n",
    "fasta_content = h.read()\n",
    "h.close()\n",
    "\n",
    "from pyrna.parsers import parse_fasta\n",
    "#the default type is RNA\n",
    "for rna in parse_fasta(fasta_content):\n",
    "    print \"sequence of %s:\"%rna.name\n",
    "    print \"%s\\n\"%rna.sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An object RNA will automatically convert T residues into U."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sequence as a DNA:\n",
      "TAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT\n",
      "\n",
      "sequence as an RNA:\n",
      "UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU\n"
     ]
    }
   ],
   "source": [
    "h = open('../data/ft3100_from_FANTOM3_project.fasta')\n",
    "fasta_content = h.read()\n",
    "h.close()\n",
    "\n",
    "for dna in parse_fasta(fasta_content, 'DNA'):\n",
    "    print \"sequence as a DNA:\"\n",
    "    print \"%s\\n\"%dna.sequence\n",
    "\n",
    "for rna in parse_fasta(fasta_content):\n",
    "    print \"sequence as an RNA:\"\n",
    "    print rna.sequence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DNA and RNA objects have a rich textual representation in Jupyter notebooks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>1\t<font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font>\n",
       "61\t<font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font>\n",
       "121\t<font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font>\n",
       "</pre>"
      ],
      "text/plain": [
       "<pyrna.features.RNA instance at 0x114b80fc8>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parse_fasta(fasta_content)[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create molecules from databases"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can load 3D structures directly from the Protein Databank"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pyrna.db import PDB\n",
    "pdb = PDB()\n",
    "pdb_content = pdb.get_entry('1GID')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With PyRNA, a pyrna.features.TertiaryStructure object is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb returns a list of pyrna.features.TertiaryStructure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n",
      "molecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n"
     ]
    }
   ],
   "source": [
    "from pyrna.parsers import parse_pdb\n",
    "\n",
    "for tertiary_structure in parse_pdb(pdb_content):\n",
    "    print \"molecular chain %s: %s\"%(tertiary_structure.rna.name, tertiary_structure.rna.sequence)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}