{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Calculating protein mass, from [Rosalind.info](https://www.rosalind.info)\n",
    "\n",
    "(Specific exercise can be found at: http://rosalind.info/problems/prtm/)\n",
    "\n",
    "## My personal interpretation\n",
    "\n",
    "1. The exercise is about calculating the molecular weight of a protein\n",
    "\n",
    "2. The protein is represented as an amino acid sequence (a string of letters)\n",
    "\n",
    "3. Molecular weights per amino acid are given in a table of monoisotopic masses\n",
    "\n",
    "4. The practical side of the exercise comes down to reading the table with masses and then translating the letters from a given sequence into numbers using the table and adding the numbers up.\n",
    "\n",
    "I think I can do this in three functions:\n",
    "\n",
    "1. Read the monoisotopic mass table and convert to a dictionary\n",
    "\n",
    "2. Read the text file with the amino acid sequence\n",
    "\n",
    "3. Take the amino acid sequence and mass table to calculate the mass"
   ]
  },
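  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plan above can be sketched on in-memory data before any files are involved. This is only a minimal sketch: the `sample_` names are made up for illustration, the table lines use a space instead of a tab for readability, and the masses are the table values for the six amino acids in the example sequence `SKADYEK`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: all names starting with sample_ are illustrative.\n",
    "sample_table_lines = [\n",
    "    \"A 71.03711\",\n",
    "    \"D 115.02694\",\n",
    "    \"E 129.04259\",\n",
    "    \"K 128.09496\",\n",
    "    \"S 87.03203\",\n",
    "    \"Y 163.06333\",\n",
    "]\n",
    "\n",
    "# Step 1: turn the table lines into a dictionary\n",
    "sample_masses = {}\n",
    "for line in sample_table_lines:\n",
    "    letter, mass = line.split()\n",
    "    sample_masses[letter] = float(mass)\n",
    "\n",
    "# Step 2: the sequence (normally read from a text file)\n",
    "sample_protein = \"SKADYEK\"\n",
    "\n",
    "# Step 3: look up each letter and add the masses up\n",
    "sample_total = sum(sample_masses[letter] for letter in sample_protein)\n",
    "print(sample_total)"
   ]
  },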
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_monoisotopic_mass_table(input_file):\n",
    "    \"\"\"\n",
    "Given a tab-separatedd input file with amino acids (as capital letters)\n",
    "in the first column, and molecular weights (as floating point numbers)\n",
    "in the second column - create a dictionary with the amino acids as keys\n",
    "and their respective weights as values.\n",
    "    \"\"\"\n",
    "    mass_dict = {}\n",
    "    \n",
    "    with open(input_file, \"r\") as read_file:\n",
    "        for line in read_file:\n",
    "            elements = line.split()\n",
    "            amino_acid = str(elements[0])\n",
    "            weight = float(elements[1])\n",
    "            \n",
    "            mass_dict[amino_acid] = weight\n",
    "            \n",
    "    return mass_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259, 'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406, 'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293, 'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203, 'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333}\n"
     ]
    }
   ],
   "source": [
    "mass_dict = read_monoisotopic_mass_table(\"data/monoisotopic_mass_table.tsv\")\n",
    "\n",
    "print(mass_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far so good, now make the second function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_amino_acid_sequence(input_file):\n",
    "    \"\"\"\n",
    "Read a text file with an amino acid sequence and\n",
    "return the sequence as string.\n",
    "    \"\"\"\n",
    "    with open(input_file, \"r\") as read_file:\n",
    "        for line in read_file:\n",
    "            amino_acids = str(line.strip())\n",
    "            #Note: the .strip() is required to remove the\n",
    "            # newline, which otherwise would be interpreted\n",
    "            # as amino acid!\n",
    "            \n",
    "    return amino_acids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SKADYEK\n"
     ]
    }
   ],
   "source": [
    "example_protein = read_amino_acid_sequence(\"data/Example_calculating_protein_mass.txt\")\n",
    "\n",
    "print(example_protein)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that works as well, time to make the final function: the one that converts the amino acid sequence to its weight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_protein_weight(protein, mass_table):\n",
    "    \"\"\"\n",
    "Given a protein sequence as string and a mass table as dictionary\n",
    "(with amino acids as keys and their respective weights as values),\n",
    "calculate the molecular weight of the protein by summing up the\n",
    "weight of each amino acid in the protein.\n",
    "    \"\"\"\n",
    "    total_weight = 0\n",
    "    \n",
    "    for amino_acid in protein:\n",
    "        weight = mass_table[amino_acid]\n",
    "        total_weight += weight\n",
    "        \n",
    "    return total_weight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "821.3919199999999"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "calculate_protein_weight(example_protein, mass_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now this answer looks good, except the rounding of the decimals is slightly different from the example on rosalind.info... Perhaps I should just round the answer to 3 decimals?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "821.392"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "round(calculate_protein_weight(example_protein, mass_dict), 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perfect! Now let me just overwrite the function to incorporate the rounding:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "def calculate_protein_weight(protein, mass_table):\n",
    "    \"\"\"\n",
    "Given a protein sequence as string and a mass table as dictionary\n",
    "(with amino acids as keys and their respective weights as values),\n",
    "calculate the molecular weight of the protein by summing up the\n",
    "weight of each amino acid in the protein.\n",
    "    \"\"\"\n",
    "    total_weight = 0\n",
    "    \n",
    "    for amino_acid in protein:\n",
    "        weight = mass_table[amino_acid]\n",
    "        total_weight += weight\n",
    "        \n",
    "    return round(total_weight, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And let's give the actual exercise a shot with this!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "103133.769\n"
     ]
    }
   ],
   "source": [
    "test_protein = read_amino_acid_sequence(\"data/rosalind_prtm.txt\")\n",
    "\n",
    "molecular_weight = calculate_protein_weight(test_protein, mass_dict)\n",
    "\n",
    "print(molecular_weight)"
   ]
  },
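  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A side note on the rounding step, as a minimal sketch: the extra digits in 821.3919199999999 come from adding binary floating-point numbers, not from the mass table. Two standard-library alternatives to `round` are `math.fsum`, which tracks the rounding error while summing, and a format specifier, which rounds only for display. The variable names below are illustrative and not part of the exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# Masses of S, K, A, D, Y, E, K, copied from the mass table\n",
    "masses = [87.03203, 128.09496, 71.03711, 115.02694, 163.06333, 129.04259, 128.09496]\n",
    "\n",
    "plain_total = sum(masses)          # left-to-right float addition\n",
    "careful_total = math.fsum(masses)  # compensates for intermediate rounding error\n",
    "\n",
    "print(plain_total)\n",
    "print(careful_total)\n",
    "print(f\"{plain_total:.3f}\")  # rounds for display only; the stored value is unchanged"
   ]
  },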
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Success!!\n",
    "\n",
    "It worked. The problem has been solved."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}