{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create molecules from scratch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyRNA lets you easily construct DNA and RNA molecules. An RNA molecule automatically converts T residues into U." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_rna: AGGGGAUUAACCCC\n", "my_dna: GGTTGGATTAACCCC\n" ] } ], "source": [ "from pyrna.features import DNA, RNA\n", "rna = RNA(name = 'my_rna', sequence = 'AGGGGATTAACCCC')\n", "print \"%s: %s\"%(rna.name, rna.sequence)\n", "dna = DNA(name = 'my_dna', sequence = 'GGTTGGATTAACCCC')\n", "print \"%s: %s\"%(dna.name, dna.sequence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RNA and DNA molecules report their length and are sliceable and iterable:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "slice: AG\n", "length: 14\n" ] } ], "source": [ "print \"slice: %s\"%rna[0:2]\n", "print \"length: %i\"%len(rna)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get a single residue:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "G\n" ] } ], "source": [ "print rna[3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sequence can be extended in place by appending a string at the end:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AGGGGAUUAACCCCAAA\n" ] } ], "source": [ "rna + 'AAA'\n", "print rna.sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or shortened by removing residues from the end:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": 
{ "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AGGGGAUUAACCCC\n" ] } ], "source": [ "rna - 3\n", "print rna.sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An RNA molecule is iterable over its primary sequence:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "residue n1: A\n", "residue n2: G\n", "residue n3: G\n", "residue n4: G\n", "residue n5: G\n", "residue n6: A\n", "residue n7: U\n", "residue n8: U\n", "residue n9: A\n", "residue n10: A\n", "residue n11: C\n", "residue n12: C\n", "residue n13: C\n", "residue n14: C\n" ] } ], "source": [ "for index, residue in enumerate(rna):\n", "    print \"residue n%i: %s\"%(index+1, residue)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create molecules from files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In PyRNA, a pyrna.features.TertiaryStructure object describes a single molecular chain. Since a PDB file can contain several molecules, the function parse_pdb() returns a list of such objects." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "h = open('../data/1ehz.pdb')\n", "pdb_content = h.read()\n", "h.close()\n", "\n", "from pyrna.parsers import parse_pdb\n", "tertiary_structures = parse_pdb(pdb_content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RNA molecules extracted from PDB files can contain modified residues. PyRNA automatically converts them into unmodified residues and records each modification as a (residue name, position) pair in the modified_residues attribute." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A\n", "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA\n", "[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]\n" ] } ], "source": [ "for ts in tertiary_structures:\n", "    print ts.rna.name\n", "    print ts.rna.sequence\n", "    print ts.rna.modified_residues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To parse a FASTA file, you have to specify the type of molecules it stores (the default is RNA). DNA molecules are faster to create since PyRNA will not try to identify modified residues." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sequence of telomerase 1:\n", "AGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU\n", "\n", "sequence of telomerase 2:\n", "AUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU\n", "\n", "sequence of telomerase 3:\n", "UACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU\n", "\n" ] } ], "source": [ "h = open('../data/telomerases.fasta')\n", "fasta_content = h.read()\n", "h.close()\n", "\n", "from pyrna.parsers import parse_fasta\n", "#the default type is RNA\n", "for rna in parse_fasta(fasta_content):\n", "    print \"sequence of %s:\"%rna.name\n", "    print \"%s\\n\"%rna.sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An RNA object will automatically 
convert T residues into U." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sequence as a DNA:\n", "TAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT\n", "\n", "sequence as an RNA:\n", "UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU\n" ] } ], "source": [ "h = open('../data/ft3100_from_FANTOM3_project.fasta')\n", "fasta_content = h.read()\n", "h.close()\n", "\n", "for dna in parse_fasta(fasta_content, 'DNA'):\n", " print \"sequence as a DNA:\"\n", " print \"%s\\n\"%dna.sequence\n", "\n", "for rna in parse_fasta(fasta_content):\n", " print \"sequence as an RNA:\"\n", " print rna.sequence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DNA and RNA objects have a rich textual representation in Jupyter notebooks. " ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
1\tUAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGG\n",
       "61\tUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCG\n",
       "121\tCAAACUGAAAAAAGGCGAUGAAUUUGCGGU\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_fasta(fasta_content)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create molecules from databases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can load 3D structures directly from the Protein Data Bank (PDB):" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyrna.db import PDB\n", "pdb = PDB()\n", "pdb_content = pdb.get_entry('1GID')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In PyRNA, a pyrna.features.TertiaryStructure object describes a single molecular chain. Since a PDB entry can contain several molecules, the function parse_pdb() returns a list of pyrna.features.TertiaryStructure objects, one per chain." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n", "molecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n" ] } ], "source": [ "from pyrna.parsers import parse_pdb\n", "\n", "for tertiary_structure in parse_pdb(pdb_content):\n", "    print \"molecular chain %s: %s\"%(tertiary_structure.rna.name, tertiary_structure.rna.sequence)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }