{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Oligonucleotides: RNA\n",
    "\n",
    "## Nucleic Acid Sequences\n",
    "\n",
    "OpenMS also supports the representation of RNA oligonucleotides using\n",
    "the\n",
    "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n",
    "class:"
   ],
   "id": "81d5ec8f-32ed-4df3-b425-44184466c98b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "linenos": ""
   },
   "outputs": [],
   "source": [
    "import pyopenms as oms\n",
    "\n",
    "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n",
    "prefix = oligo.getPrefix(4)\n",
    "suffix = oligo.getSuffix(4)\n",
    "\n",
    "print(oligo)\n",
    "print(prefix)\n",
    "print(suffix)\n",
    "print()\n",
    "\n",
    "print(\"Oligo length\", oligo.size())\n",
    "print(\"Total precursor mass\", oligo.getMonoWeight())\n",
    "print(\n",
    "    \"y1+ ion mass of\",\n",
    "    str(prefix),\n",
    "    \":\",\n",
    "    prefix.getMonoWeight(oms.NASequence.NASFragmentType.YIon, 1),\n",
    ")\n",
    "print()\n",
    "\n",
    "seq_formula = oligo.getFormula()\n",
    "print(\"RNA Oligo\", oligo, \"has molecular formula\", seq_formula)\n",
    "print(\"=\" * 35)\n",
    "print()\n",
    "\n",
    "isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))\n",
    "for iso in isotopes.getContainer():\n",
    "    print(\"Isotope\", iso.getMZ(), \":\", iso.getIntensity())"
   ],
   "id": "f0e575b5-a98d-422c-a83e-666467cdce0b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which will output\n",
    "\n",
    "``` output\n",
    "AAUGCAAUGG\n",
    "AAUG\n",
    "AUGG\n",
    "\n",
    "Oligo length 10\n",
    "Total precursor mass 3206.4885302061\n",
    "y1+ ion mass of AAUG : 1248.2298440331\n",
    "\n",
    "RNA Oligo AAUGCAAUGG has molecular formula C97H119N42O66P9\n",
    "===================================\n",
    "\n",
    "Isotope 3206.4885302061 : 0.25567981600761414\n",
    "Isotope 3207.4918850439003 : 0.31783154606819153\n",
    "Isotope 3208.4952398817004 : 0.23069815337657928\n",
    "Isotope 3209.4985947195 : 0.12306403368711472\n",
    "Isotope 3210.5019495573 : 0.053163252770900726\n",
    "Isotope 3211.5053043951 : 0.01956319250166416\n",
    "```\n",
    "\n",
    "The\n",
    "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n",
    "object also allows iterations directly in Python:"
   ],
   "id": "f2afcd22-5054-402a-af0a-ad1b7d89b291"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "linenos": ""
   },
   "outputs": [],
   "source": [
    "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n",
    "print(\n",
    "    \"The oligonucleotide\", str(oligo), \"consists of the following nucleotides:\"\n",
    ")\n",
    "for ribo in oligo:\n",
    "    print(ribo.getName())"
   ],
   "id": "cc4824ae-3da8-4447-b57d-a052b7aef8ea"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fragment Ions\n",
    "\n",
    "Similarly to before for amino acid sequences, we can also generate\n",
    "internal fragment ions:"
   ],
   "id": "2cf3d55c-3a34-4ac0-b346-073fcefb22c3"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "linenos": ""
   },
   "outputs": [],
   "source": [
    "oligo = oms.NASequence.fromString(\"AAUGCAAUGG\")\n",
    "suffix = oligo.getSuffix(4)\n",
    "\n",
    "oligo.size()\n",
    "oligo.getMonoWeight()\n",
    "\n",
    "charge = 2\n",
    "mass = suffix.getMonoWeight(oms.NASequence.NASFragmentType.WIon, charge)\n",
    "w4_formula = suffix.getFormula(oms.NASequence.NASFragmentType.WIon, charge)\n",
    "mz = mass / charge\n",
    "\n",
    "print(\"=\" * 35)\n",
    "print(\"RNA Oligo w4++ ion\", suffix, \"has mz\", mz)\n",
    "print(\"RNA Oligo w4++ ion\", suffix, \"has molecular formula\", w4_formula)"
   ],
   "id": "24bad687-f2cd-4c81-ab51-5063a25c2c0b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which will output\n",
    "\n",
    "``` output\n",
    "10\n",
    "3206.4885302061\n",
    "===================================\n",
    "RNA Oligo w4++ ion AUGG has mz 672.5989092135458\n",
    "RNA Oligo w4++ ion AUGG has molecular formula C39H51N17O29P4\n",
    "```\n",
    "\n",
    "## Modified Oligonucleotides\n",
    "\n",
    "Modified nucleotides can also be represented by the\n",
    "[Ribonucleotide](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.Ribonucleotide.html)\n",
    "class and are specified using a unique string identifier present in the\n",
    "[RibonucleotideDB](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.RibonucleotideDB.html)\n",
    "in square brackets. For example, `[m1A]` represents 1-methyl-adenosine.\n",
    "We can create a\n",
    "[NASequence](https://pyopenms.readthedocs.io/en/latest/apidocs/_autosummary/pyopenms/pyopenms.NASequence.html)\n",
    "object by parsing a modified sequence as follows:"
   ],
   "id": "5bebf045-8e23-4c52-a4d3-d724fe0196c5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "linenos": ""
   },
   "outputs": [],
   "source": [
    "oligo_mod = oms.NASequence.fromString(\"A[m1A][Gm]A\")\n",
    "seq_formula = oligo_mod.getFormula()\n",
    "print(\n",
    "    \"RNA Oligo\",\n",
    "    oligo_mod,\n",
    "    \"has molecular formula\",\n",
    "    seq_formula,\n",
    "    \"and length\",\n",
    "    oligo_mod.size(),\n",
    ")\n",
    "print(\"=\" * 35)\n",
    "\n",
    "oligo_list = [oligo_mod[i].getOrigin() for i in range(oligo_mod.size())]\n",
    "print(\n",
    "    \"RNA Oligo\",\n",
    "    oligo_mod.toString(),\n",
    "    \"has unmodified sequence\",\n",
    "    \"\".join(oligo_list),\n",
    ")\n",
    "\n",
    "r = oligo_mod[1]\n",
    "r.getName()\n",
    "r.getHTMLCode()\n",
    "r.getOrigin()\n",
    "\n",
    "for i in range(oligo_mod.size()):\n",
    "    print(oligo_mod[i].isModified())"
   ],
   "id": "06d0b787-9dad-45ab-a655-b48d0f23dad0"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which will output\n",
    "\n",
    "``` output\n",
    "RNA Oligo A[m1A][Gm]A has molecular formula C42H53N20O23P3 and length 4\n",
    "===================================\n",
    "RNA Oligo A[m1A][Gm]A has unmodified sequence AAGA\n",
    "'1-methyladenosine'\n",
    "'\"'\n",
    "'A'\n",
    "False\n",
    "True\n",
    "True\n",
    "False\n",
    "```\n",
    "\n",
    "## DNA, RNA and Protein\n",
    "\n",
    "We can also work with DNA and RNA sequences in combination with the\n",
    "BioPython library (you can install BioPython with\n",
    "`pip install biopython`):\n",
    "\n",
    "``` pseudocode\n",
    "from Bio.Seq import Seq\n",
    "from Bio.Alphabet import IUPAC\n",
    "bsa = oms.FASTAEntry()\n",
    "bsa.sequence = 'ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT'\n",
    "bsa.description = \"BSA Bovine Albumin (partial sequence)\"\n",
    "bsa.identifier = \"BSA\"\n",
    "\n",
    "entries = [bsa]\n",
    "\n",
    "f = oms.FASTAFile()\n",
    "f.store(\"example_dna.fasta\", entries)\n",
    "\n",
    "coding_dna = Seq(bsa.sequence, IUPAC.unambiguous_dna)    \n",
    "coding_rna = coding_dna.transcribe()\n",
    "protein_seq = coding_rna.translate()\n",
    "\n",
    "oligo = oms.NASequence.fromString(str(coding_rna))\n",
    "aaseq = oms.AASequence.fromString(str(protein_seq))\n",
    "\n",
    "print(\"The RNA sequence\", str(oligo), \"has mass\", oligo.getMonoWeight(), \"and \\n\"\n",
    "  \"translates to the protein sequence\", str(aaseq), \"which has mass\", aaseq.getMonoWeight() )\n",
    "```"
   ],
   "id": "ad951176-357c-4fbf-9ae9-4f518e44ec9e"
  }
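  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a coding-strand DNA sequence, transcription amounts to replacing every T with U, and translation then reads the RNA in triplets. The sketch below shows these two steps in plain Python so the result can be followed by eye. The helpers `transcribe` and `codons` are hypothetical names used only for illustration and are not part of Biopython or pyOpenMS.\n",
    "\n",
    "``` python\n",
    "def transcribe(coding_dna):\n",
    "    \"\"\"Coding-strand DNA to mRNA: same sequence with T replaced by U.\"\"\"\n",
    "    return coding_dna.upper().replace(\"T\", \"U\")\n",
    "\n",
    "\n",
    "def codons(rna):\n",
    "    \"\"\"Split an RNA string into complete triplets, dropping a partial tail.\"\"\"\n",
    "    usable = len(rna) - len(rna) % 3\n",
    "    return [rna[i:i + 3] for i in range(0, usable, 3)]\n",
    "\n",
    "\n",
    "rna = transcribe(\"ATGAAGTGGGTG\")  # first 12 bases of the BSA example\n",
    "print(rna)          # AUGAAGUGGGUG\n",
    "print(codons(rna))  # ['AUG', 'AAG', 'UGG', 'GUG']\n",
    "```"
   ],
   "id": "transcribe-codons-sketch"
  }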
 ],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {}
}