{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Growth media VMH high fat low carb diet\n",
    "\n",
    "Similar to the western-style diet we will again start by loading the diet and depleting components absorbed by the host. In this case we have no manual annotation for which components should be diluted so we will use a generic human metabolic model to find those.\n",
    "The growth medium supllied here was created the following way:\n",
    "\n",
    "Let's start by reading the diet which was downloaded from https://www.vmh.life/#diet/High%20fat,%20low%20carb. Flux is in mmol/human/day. This has to be adjusted to 1 hour. Also the VMH site has a bug where it will clip fluxes after 4 digits, so we will set values like 0.0000 to 0.0001. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reaction</th>\n",
       "      <th>flux</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EX_etoh</td>\n",
       "      <td>0.000004</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EX_h2o</td>\n",
       "      <td>6608.413339</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>EX_caro</td>\n",
       "      <td>0.000068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>EX_retinol</td>\n",
       "      <td>0.129458</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EX_thm</td>\n",
       "      <td>0.075371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>EX_lgnc</td>\n",
       "      <td>0.008570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>EX_fol</td>\n",
       "      <td>0.000005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>EX_strch1</td>\n",
       "      <td>0.002862</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>EX_i</td>\n",
       "      <td>0.000320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>EX_starch1200</td>\n",
       "      <td>0.000026</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>91 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         reaction         flux\n",
       "0         EX_etoh     0.000004\n",
       "1          EX_h2o  6608.413339\n",
       "2         EX_caro     0.000068\n",
       "3      EX_retinol     0.129458\n",
       "4          EX_thm     0.075371\n",
       "..            ...          ...\n",
       "86        EX_lgnc     0.008570\n",
       "87         EX_fol     0.000005\n",
       "88      EX_strch1     0.002862\n",
       "89           EX_i     0.000320\n",
       "90  EX_starch1200     0.000026\n",
       "\n",
       "[91 rows x 2 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "medium = pd.read_csv(\"../data/vmh_high_fat_low_carb.tsv\", index_col=False, sep=\"\\t\")\n",
    "medium.columns = [\"reaction\", \"flux\"]\n",
    "medium.reaction = medium.reaction.str.replace(\"(\\[e\\]$)|(\\(e\\)$)\", \"\", regex=True)\n",
    "medium.loc[medium.flux < 1e-4, \"flux\"] = 1e-4\n",
    "medium.flux = medium.flux / 24\n",
    "medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will try to identify components that can be taken up by human cells.\n",
    "\n",
    "## Identifying human adsorption\n",
    "\n",
    "To achieve this we will load the Recon3 human model. AGORA and Recon IDs are very similar so we should be able to match them. We just have to adjust the Recon3 ones a bit. We start by identifying all available exchanges in Recon3 and adjusting the IDs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     EX_5adtststerone\n",
       "1    EX_5adtststerones\n",
       "2             EX_5fthf\n",
       "3             EX_5htrp\n",
       "4             EX_5mthf\n",
       "dtype: object"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from cobra.io import read_sbml_model\n",
    "import pandas as pd\n",
    "\n",
    "recon3 = read_sbml_model(\"../data/Recon3D.xml.gz\")\n",
    "exchanges = pd.Series([r.id for r in recon3.exchanges])\n",
    "exchanges = exchanges.str.replace(\"__\", \"_\").str.replace(\"_e$\", \"\", regex=True)\n",
    "exchanges.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will check which ones we can find in our set and add in the dilution factors (again going with 1:10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.1    79\n",
       "1.0    12\n",
       "Name: dilution, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "medium[\"dilution\"] = 1.0\n",
    "medium.loc[medium.reaction.isin(exchanges), \"dilution\"] = 0.1\n",
    "medium.dilution.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, so 79/91 components can be adsorbed by humans. We end by filling in the additional info."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>reaction</th>\n",
       "      <th>flux</th>\n",
       "      <th>dilution</th>\n",
       "      <th>metabolite</th>\n",
       "      <th>global_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EX_etoh_m</td>\n",
       "      <td>0.000100</td>\n",
       "      <td>0.1</td>\n",
       "      <td>etoh_m</td>\n",
       "      <td>EX_etoh(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>EX_h2o_m</td>\n",
       "      <td>6608.413339</td>\n",
       "      <td>0.1</td>\n",
       "      <td>h2o_m</td>\n",
       "      <td>EX_h2o(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>EX_caro_m</td>\n",
       "      <td>0.000100</td>\n",
       "      <td>0.1</td>\n",
       "      <td>caro_m</td>\n",
       "      <td>EX_caro(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>EX_retinol_m</td>\n",
       "      <td>0.129458</td>\n",
       "      <td>0.1</td>\n",
       "      <td>retinol_m</td>\n",
       "      <td>EX_retinol(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EX_thm_m</td>\n",
       "      <td>0.075371</td>\n",
       "      <td>0.1</td>\n",
       "      <td>thm_m</td>\n",
       "      <td>EX_thm(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>EX_lgnc_m</td>\n",
       "      <td>0.008570</td>\n",
       "      <td>0.1</td>\n",
       "      <td>lgnc_m</td>\n",
       "      <td>EX_lgnc(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>EX_fol_m</td>\n",
       "      <td>0.000100</td>\n",
       "      <td>0.1</td>\n",
       "      <td>fol_m</td>\n",
       "      <td>EX_fol(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>EX_strch1_m</td>\n",
       "      <td>0.002862</td>\n",
       "      <td>0.1</td>\n",
       "      <td>strch1_m</td>\n",
       "      <td>EX_strch1(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>EX_i_m</td>\n",
       "      <td>0.000320</td>\n",
       "      <td>0.1</td>\n",
       "      <td>i_m</td>\n",
       "      <td>EX_i(e)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>EX_starch1200_m</td>\n",
       "      <td>0.000100</td>\n",
       "      <td>1.0</td>\n",
       "      <td>starch1200_m</td>\n",
       "      <td>EX_starch1200(e)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>91 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           reaction         flux  dilution    metabolite         global_id\n",
       "0         EX_etoh_m     0.000100       0.1        etoh_m        EX_etoh(e)\n",
       "1          EX_h2o_m  6608.413339       0.1         h2o_m         EX_h2o(e)\n",
       "2         EX_caro_m     0.000100       0.1        caro_m        EX_caro(e)\n",
       "3      EX_retinol_m     0.129458       0.1     retinol_m     EX_retinol(e)\n",
       "4          EX_thm_m     0.075371       0.1         thm_m         EX_thm(e)\n",
       "..              ...          ...       ...           ...               ...\n",
       "86        EX_lgnc_m     0.008570       0.1        lgnc_m        EX_lgnc(e)\n",
       "87         EX_fol_m     0.000100       0.1         fol_m         EX_fol(e)\n",
       "88      EX_strch1_m     0.002862       0.1      strch1_m      EX_strch1(e)\n",
       "89           EX_i_m     0.000320       0.1           i_m           EX_i(e)\n",
       "90  EX_starch1200_m     0.000100       1.0  starch1200_m  EX_starch1200(e)\n",
       "\n",
       "[91 rows x 5 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "medium[\"metabolite\"] = medium.reaction.str.replace(\"^EX_\", \"\", regex=True) + \"_m\"\n",
    "medium[\"global_id\"] = medium.reaction + \"(e)\"\n",
    "medium[\"reaction\"] = medium.reaction + \"_m\"\n",
    "medium.loc[medium.flux < 1e-4, \"flux\"] = 1e-4\n",
    "medium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking the growth medium against the DB\n",
    "\n",
    "But can the bacteria in our model database actually grow on this medium? Let's check and start by downbloading the AGORA model database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !wget https://zenodo.org/record/3755182/files/agora103_genus.qza?download=1 -O data/agora103_genus.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "No we we will check for growth by running the growth medium against any single model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6ce7cf36b5094dcf8aad3b1243cfc10a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from micom.workflows.db_media import check_db_medium\n",
    "\n",
    "check = check_db_medium(\"../data/agora103_genus.qza\", medium, threads=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`check` now includes the entire manifest plus two new columns: the growth rate and whether the models can grow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False    227\n",
       "Name: can_grow, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "check.can_grow.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay nothing can grow. We probably miss some important cofactor such as manganese or copper."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's complete the medium so that all taxa in Refseq can grow at a rate of at least 1e-4.\n",
    "\n",
    "## Supplementing a growth medium from a skeleton\n",
    "\n",
    "Sometimes you may start from a few componenents and will want to complete this skeleton medium to reach a certain minimum growth rate across all models in the database. This can be done with `complete_db_medium`. We can minimize either the added total flux, mass or presence of any atom. Since, we want to build a low carb diet here we will minimize the presence of added carbon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "15d4df157db44233bcaa066d3c982b60",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from micom.workflows.db_media import complete_db_medium\n",
    "\n",
    "manifest, imports = complete_db_medium(\"../data/agora103_genus.qza\", medium, growth=0.001, threads=20, max_added_import=10, weights=\"C\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True    227\n",
       "Name: can_grow, dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manifest.can_grow.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`manifest` is the amended manifest as before and `imports` contains the used import fluxes for each model. A new column in the manifest also tells us how many import were added."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    227.000000\n",
       "mean       6.678414\n",
       "std        3.959711\n",
       "min        1.000000\n",
       "25%        3.000000\n",
       "50%        7.000000\n",
       "75%        9.000000\n",
       "max       22.000000\n",
       "Name: added, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "manifest.added.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we added 7 metabolites on average (1-22)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From this we build up our new medium."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(122, 4)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fluxes = imports.max()\n",
    "fluxes = fluxes[(fluxes > 1e-6) | fluxes.index.isin(medium.reaction)]\n",
    "completed = pd.DataFrame({\n",
    "    \"reaction\": fluxes.index,\n",
    "    \"metabolite\": fluxes.index.str.replace(\"^EX_\", \"\", regex=True),\n",
    "    \"global_id\": fluxes.index.str.replace(\"_m$\", \"(e)\", regex=True),\n",
    "    \"flux\": fluxes\n",
    "})\n",
    "completed.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also export the medium as Qiime 2 artifact which can be read with `q2-micom` or the normal micom package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'../media/vmh_high_fat_low_carb_agora.qza'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from qiime2 import Artifact\n",
    "\n",
    "arti = Artifact.import_data(\"MicomMedium[Global]\", completed)\n",
    "arti.save(\"../media/vmh_high_fat_low_carb_agora.qza\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation\n",
    "\n",
    "As a last step we validate the created medium."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6ed0b090c7a84be69ab5a29a2c4ffb60",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "True    227\n",
       "Name: can_grow, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "check = check_db_medium(\"../data/agora103_genus.qza\", completed, threads=20)\n",
    "check.can_grow.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    227.000000\n",
       "mean       0.002235\n",
       "std        0.001484\n",
       "min        0.001000\n",
       "25%        0.001000\n",
       "50%        0.002030\n",
       "75%        0.002564\n",
       "max        0.006467\n",
       "Name: growth_rate, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "check.growth_rate.describe()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}