{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Oligonucleotides: RNA\n", "\n", "## Nucleic Acid Sequences\n", "\n", "OpenMS also supports the representation of RNA oligonucleotides using\n", "the\n", "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n", "class:" ], "id": "664f5f61-cef4-4d76-9e29-32bd6e390997" }, { "cell_type": "code", "execution_count": null, "metadata": { "linenos": "" }, "outputs": [], "source": [ "import pyopenms as oms\n", "\n", "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n", "prefix = oligo.getPrefix(4)\n", "suffix = oligo.getSuffix(4)\n", "\n", "print(oligo)\n", "print(prefix)\n", "print(suffix)\n", "print()\n", "\n", "print(\"Oligo length\", oligo.size())\n", "print(\"Total precursor mass\", oligo.getMonoWeight())\n", "print(\n", " \"y1+ ion mass of\",\n", " str(prefix),\n", " \":\",\n", " prefix.getMonoWeight(oms.NASequence.NASFragmentType.YIon, 1),\n", ")\n", "print()\n", "\n", "seq_formula = oligo.getFormula()\n", "print(\"RNA Oligo\", oligo, \"has molecular formula\", seq_formula)\n", "print(\"=\" * 35)\n", "print()\n", "\n", "isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))\n", "for iso in isotopes.getContainer():\n", " print(\"Isotope\", iso.getMZ(), \":\", iso.getIntensity())" ], "id": "590ee962-449e-43f3-92f8-7a95d65b0922" }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will output:\n", "\n", "``` output\n", "AAUGCAAUGG\n", "AAUG\n", "AUGG\n", "\n", "Oligo length 10\n", "Total precursor mass 3206.4885302061\n", "y1+ ion mass of AAUG : 1248.2298440331\n", "\n", "RNA Oligo AAUGCAAUGG has molecular formula C97H119N42O66P9\n", "===================================\n", "\n", "Isotope 3206.4885302061 : 0.25567981600761414\n", "Isotope 3207.4918850439003 : 0.31783154606819153\n", "Isotope 3208.4952398817004 : 0.23069815337657928\n", "Isotope 3209.4985947195 : 0.12306403368711472\n", "Isotope 
3210.5019495573 : 0.053163252770900726\n", "Isotope 3211.5053043951 : 0.01956319250166416\n", "```\n", "\n", "The\n", "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n", "object can also be iterated over directly in Python:" ], "id": "9175f569-0656-4d12-aba0-fc4ec972ccaa" }, { "cell_type": "code", "execution_count": null, "metadata": { "linenos": "" }, "outputs": [], "source": [ "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n", "print(\n", " \"The oligonucleotide\", str(oligo), \"consists of the following nucleotides:\"\n", ")\n", "for ribo in oligo:\n", " print(ribo.getName())" ], "id": "de6182c8-cab5-4572-b5ed-8dd2dfbc5721" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fragment Ions\n", "\n", "As shown earlier for amino acid sequences, we can also generate\n", "fragment ions:" ], "id": "c797e867-c9a9-4085-93ba-f2221b187df9" }, { "cell_type": "code", "execution_count": null, "metadata": { "linenos": "" }, "outputs": [], "source": [ "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n", "suffix = oligo.getSuffix(4)\n", "\n", "oligo.size()\n", "oligo.getMonoWeight()\n", "\n", "charge = 2\n", "mass = suffix.getMonoWeight(oms.NASequence.NASFragmentType.WIon, charge)\n", "w4_formula = suffix.getFormula(oms.NASequence.NASFragmentType.WIon, charge)\n", "mz = mass / charge\n", "\n", "print(\"=\" * 35)\n", "print(\"RNA Oligo w4++ ion\", suffix, \"has mz\", mz)\n", "print(\"RNA Oligo w4++ ion\", suffix, \"has molecular formula\", w4_formula)" ], "id": "7c328057-89fb-44d2-a8d8-88e8310cc6e2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modified Oligonucleotides\n", "\n", "Modified nucleotides can also be represented by the\n", "[Ribonucleotide](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.Ribonucleotide.html)\n", "class and are specified using a unique string identifier present in the\n", 
"[RibonucleotideDB](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.RibonucleotideDB.html)\n", "in square brackets. For example, `[m1A]` represents 1-methyl-adenosine.\n", "We can create a\n", "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n", "object by parsing a modified sequence as follows:" ], "id": "efbd7b74-3bcb-4c1f-89d1-d810520d0a1b" }, { "cell_type": "code", "execution_count": null, "metadata": { "linenos": "" }, "outputs": [], "source": [ "oligo_mod = oms.NASequence.fromString(\"A[m1A][Gm]A\")\n", "seq_formula = oligo_mod.getFormula()\n", "print(\n", " \"RNA Oligo\",\n", " oligo_mod,\n", " \"has molecular formula\",\n", " seq_formula,\n", " \"and length\",\n", " oligo_mod.size(),\n", ")\n", "print(\"=\" * 35)\n", "\n", "oligo_list = [oligo_mod[i].getOrigin() for i in range(oligo_mod.size())]\n", "print(\n", " \"RNA Oligo\",\n", " oligo_mod.toString(),\n", " \"has unmodified sequence\",\n", " \"\".join(oligo_list),\n", ")\n", "\n", "r = oligo_mod[1]\n", "r.getName()\n", "r.getHTMLCode()\n", "r.getOrigin()\n", "\n", "for i in range(oligo_mod.size()):\n", " print(oligo_mod[i].isModified())" ], "id": "ef4cd60c-a282-4c20-be89-441b75a2239a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DNA, RNA and Protein\n", "\n", "We can also work with DNA and RNA sequences in combination with the\n", "BioPython library (you can install BioPython with\n", "`pip install biopython`):\n", "\n", "``` pseudocode\n", "from Bio.Seq import Seq\n", "from Bio.Alphabet import IUPAC\n", "bsa = oms.FASTAEntry()\n", "bsa.sequence = 'ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT'\n", "bsa.description = \"BSA Bovine Albumin (partial sequence)\"\n", "bsa.identifier = \"BSA\"\n", "\n", "entries = [bsa]\n", "\n", "f = oms.FASTAFile()\n", "f.store(\"example_dna.fasta\", entries)\n", "\n", "coding_dna = Seq(bsa.sequence, IUPAC.unambiguous_dna) \n", "coding_rna 
= coding_dna.transcribe()\n", "protein_seq = coding_rna.translate()\n", "\n", "oligo = oms.NASequence.fromString(str(coding_rna))\n", "aaseq = oms.AASequence.fromString(str(protein_seq))\n", "\n", "print(\"The RNA sequence\", str(oligo), \"has mass\", oligo.getMonoWeight(), \"and \\n\"\n", " \"translates to the protein sequence\", str(aaseq), \"which has mass\", aaseq.getMonoWeight() )\n", "```" ], "id": "3eb26d6b-dc26-42e9-a35d-f9ac2b9c79a1" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": {} }