{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Introductions to constrainat-based modeling using cobrapy\n",
    "\n",
    "## Part 3: In-silico gene knockouts\n",
    "\n",
    "### Instructor:\n",
    "* Miguel Ponce de León from (Barcelona Supercomputing Center)\n",
    "* Contact: miguel.ponce@bsc.es\n",
    "\n",
    "11 December, 2020"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cobra\n",
    "from cobra.io import read_sbml_model\n",
    "\n",
    "# State the path to the file iJO1366.xml\n",
    "sbml_fname = './data/iJO1366.xml'\n",
    "\n",
    "# Read the model\n",
    "model = read_sbml_model(sbml_fname)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pick a gene of interest\n",
    "gene =  model.genes.b0720\n",
    "\n",
    "# Inspect the reactions associated to b0720\n",
    "print(\"id\\treaction_name\")\n",
    "for r in gene.reactions: \n",
    "    print(\"%s \\t%s\" % (r.id,r.name))\n",
    "\n",
    "print()\n",
    "# We can also check the genes associated to this reaction\n",
    "reaction = model.reactions.CS\n",
    "print(\"GPR:\",reaction.gene_reaction_rule)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 3.1: Single knock out study.\n",
    "\n",
    "Documentation: [https://cobrapy.readthedocs.io/en/latest/deletions.html#Knocking-out-single-genes-and-reactions](https://cobrapy.readthedocs.io/en/latest/deletions.html#Knocking-out-single-genes-and-reactions)\n",
    "\n",
    "We will use gene b0720 as an example\n",
    "\n",
    "1. COBRA can find the proper reaction to be disabled when a gene is knocked out as follows:\n",
    "\n",
    "```\n",
    "gene =  model.genes.b0720\n",
    "with model:\n",
    "    gene.knock_out()\n",
    "    ko_solution = model.optimize()\n",
    "```\n",
    "\n",
    "\n",
    "(This codes knocks out the gene b0720, recalculates the FBA and stores the new solution in ko_solution and If we perform the knockout using the \"with\" block we don't need to care about restoring the knocked out gene afterwards; it is automatically restored out of the \"with\" block..)\n",
    "\n",
    "2. Check the growth value (Hint: ko_solution.objective_values)\n",
    "3. Is the gene predicted as essential or non-essential\n",
    "4. Go to the Ecocyc database and check the in vivo experimental result for the knockout of b0720 by accessing the following link:\n",
    "* [https://ecocyc.org/gene?orgid=ECOLI&id=EG10402](https://ecocyc.org/gene?orgid=ECOLI&id=EG10402)\n",
    "\n",
    "Is b0720 essential or not?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## TODO\n",
    "## Write your code below\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Systems-wide knock out study of *E. coli*.\n",
    "    \n",
    "COBRA has a special function to run the single gene knock outs of a list of genes. \n",
    "\n",
    "The function's name is single_gene_deletion\n",
    "\n",
    "First import the function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the function single_gene_deletion\n",
    "from cobra.flux_analysis import single_gene_deletion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Then get the list of all the genes\n",
    "all_genes = [g.id for g in model.genes]\n",
    "\n",
    "# Running in silico (takes a while)\n",
    "knockout = single_gene_deletion(model, gene_list=all_genes)\n",
    "\n",
    "# This is a fix to get the gene's id as the index\n",
    "knockout['ids'] = [list(i)[0] for i in knockout.ids]\n",
    "knockout = knockout.set_index('ids')\n",
    "\n",
    "# The output of the function single_gene_deletion is a dataframe\n",
    "knockout.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We define a threshold to define whether the reduction on the biomass flux is considered lethal.\n",
    "threshold = 0.01\n",
    "\n",
    "# Use this threshold to find which set of genes' knock out reduce the predicted growth below the threshold.\n",
    "insilico_lethals = set(knockout.index[knockout.growth< threshold])\n",
    "# The set of non-essential genes are the genes with a growth value above the threshold.\n",
    "insilico_non_lethals = set(knockout.index[knockout.growth > threshold])\n",
    "\n",
    "print(\"in-silico lethals:\", len(insilico_lethals))\n",
    "print(\"in-silico non lethals:\", len(insilico_non_lethals))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we need to experimentally verify essential and non-essential gene sets.\n",
    "\n",
    "# Read the set of essential genes in vivo\n",
    "import json\n",
    "fname = './data/m9_invivo_lethals.json'\n",
    "with open(fname) as fh:\n",
    "    invivo_lethals = json.load(fh)\n",
    "    invivo_lethals = set(invivo_lethals)\n",
    "    \n",
    "# Convert the list of all model genes into a set.\n",
    "all_genes = set(all_genes)\n",
    "\n",
    "# We can use the difference to obtain the list of in vivo non-lethals\n",
    "invivo_non_lethals = all_genes - invivo_lethals\n",
    "\n",
    "# Print the size of both sets\n",
    "print(\"in-vivo lethals:\", len(invivo_lethals))\n",
    "print(\"in-vivo non lethals:\", len(invivo_non_lethals))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://en.wikipedia.org/wiki/Receiver_operating_characteristic\n",
    "\n",
    "# True Positives, genes predicted as essentials that are essentials in vivo (correctly predicted)\n",
    "TP =  insilico_lethals & invivo_lethals\n",
    "\n",
    "# True Negatives, genes predicted as NON-essentials that are NON-essential in vivo (correctly predicted)\n",
    "TN = insilico_non_lethals & invivo_non_lethals\n",
    "\n",
    "# False Positives, wrongly predicted as NON-essential genes\n",
    "FN = insilico_non_lethals & invivo_lethals\n",
    "\n",
    "# False Positives, wrongly predicted as essential genes\n",
    "FP = insilico_lethals & invivo_non_lethals\n",
    "\n",
    "# True in vivo esssential genes\n",
    "P = TP | FN\n",
    "# True in vivo NON-esssential genes\n",
    "N = TN | FP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excercise 3.3:\n",
    "Complete the following table using the values from Exercise 3.2 (*E. coli*)\n",
    "\n",
    "| In vivo \\ In silico        | in silico lethal | in silico non-lethal |\n",
    "| -------------------------- |:----------------:| --------------------:|\n",
    "| <b>in vivo lethal</b>      |               ?  |                   ?  |\n",
    "| <b>in vivo non-lehtal</b>  |               ?  |                   ?  |\n",
    "\n",
    "\n",
    "Total negatives = {{N}}\n",
    "\n",
    "### Excercise 3.4:\n",
    "Acces the following link:\n",
    "\n",
    "https://en.wikipedia.org/wiki/Sensitivity_and_specificity\n",
    "\n",
    "Get the formulas and calculate the following metrics:\n",
    "* sensitivity\n",
    "* specificity\n",
    "* precision\n",
    "* accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sensitivity, recall, hit rate, or true positive rate (TPR)\n",
    "# We computed the sensitivity as follows\n",
    "sensitivity = len(TP) / len(P) \n",
    "\n",
    "# TODO\n",
    "# complete the following code\n",
    "\n",
    "# Specificity, selectivity or true negative rate (TNR)\n",
    "specificity = None ## COMPLETE HERE \n",
    "\n",
    "# Precision or positive predictive value (PPV)\n",
    "precision = None ## COMPLETE HERE\n",
    "\n",
    "# Accuracy (ACC)\n",
    "accuracy = None ## COMPLETE HERE\n",
    "\n",
    "# Print the different values and discuss their meanings"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "venv"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "latex_metadata": {
   "affiliation": "Barcelona Supercomputing Center, Barcelona Spain",
   "authors": "Miguel Ponce de Leon, Arnau Montagud",
   "title": "Constraint-based modelling tutorial"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}