{
"cells": [
{
"cell_type": "markdown",
"id": "0d0370ab-faae-4fa3-a62b-34c8df9d50d5",
"metadata": {},
"source": [
"# Breast milk medium for AGORA\n",
"\n",
"Here will attempt to assemble a growth medium for the infant gut based on a breast milk diet. We will use the following strategy:\n",
"\n",
"1. Obtain metabolomics data for human breast milk\n",
"2. Use WHO breast milk composition data to fill in some of the abundances\n",
"3. Distribute remaining mass across the non-quantified components\n",
"4. Add in intestinal metabolites like mucins and primary bile acids\n",
"5. complete the medium so all taxa in the AGORA database can grow in it\n",
"\n",
"Let's start by reading the metabolomics data from a study on breast milk which is incidentally the only one on the metabolomics workbench. This one is from obese donors, but since we will fill in the main abundances based on the WHO we hope this will be fairly representative."
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "ae7ca5dd-27f2-4165-b712-369792344812",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"metabolite_name\n",
"zymosterol 24984.303109\n",
"xylulose 1194.383420\n",
"xylose 2586.269430\n",
"xylonolactone 2670.575130\n",
"xylitol 3394.310881\n",
" ... \n",
"1-monopalmitin 209480.132124\n",
"1-monoolein 900478.279793\n",
"1-methylgalactose 2383.512953\n",
"1,5-anhydroglucitol 7029.012953\n",
"1,2,4-benzenetriol 597.020725\n",
"Length: 124, dtype: float64"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import requests\n",
"import pandas as pd\n",
"from io import StringIO\n",
"\n",
"response = requests.get(\"https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=ProcessDownloadResults&DownloadMode=DownloadResults&StudyID=ST001322&AnalysisID=AN002198\")\n",
"abundances, metabolites = [pd.read_csv(StringIO(content), sep=\"\\t\") for content in response.text.split(\"\\n\\n\")]\n",
"abundances.columns = abundances.columns.str.strip()\n",
"abundances.set_index(\"metabolite_name\", inplace=True)\n",
"abundances = abundances.mean(axis=1)\n",
"abundances"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "04705940-459e-4ba5-95f4-d69a6965a13e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" retention index | \n",
" quantitated m/z | \n",
" Binbase ID | \n",
" PubChem ID | \n",
" spectrum | \n",
" KEGG ID | \n",
" InChI Key | \n",
" ri_type | \n",
" abundance | \n",
"
\n",
" \n",
" | metabolite_name | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | zymosterol | \n",
" 1088064 | \n",
" 129 | \n",
" 110304 | \n",
" 92746.0 | \n",
" 85:177.0 89:568.0 91:6448.0 92:1105.0 93:4713.... | \n",
" C05437 | \n",
" CGSJXLIKVBJVRY-XTGBIJOFSA-N | \n",
" Binbase | \n",
" 24984.303109 | \n",
"
\n",
" \n",
" | xylulose | \n",
" 553450 | \n",
" 173 | \n",
" 31632 | \n",
" 439205.0 | \n",
" 85:1861.0 86:702.0 87:1148.0 88:324.0 89:10095... | \n",
" C00312 | \n",
" LQXVFWRQNMEDEE-PYHARJCCSA-N | \n",
" Binbase | \n",
" 1194.383420 | \n",
"
\n",
" \n",
" | xylose | \n",
" 543267 | \n",
" 103 | \n",
" 169 | \n",
" 135191.0 | \n",
" 86:77.0 87:118.0 89:838.0 90:80.0 91:46.0 94:1... | \n",
" C00181 | \n",
" SRBFZHDQGSBBOR-IOVATXLUSA-N | \n",
" Binbase | \n",
" 2586.269430 | \n",
"
\n",
" \n",
" | xylonolactone | \n",
" 535176 | \n",
" 217 | \n",
" 1808 | \n",
" 439692.0 | \n",
" 86:6.0 88:17.0 89:5.0 101:53.0 103:590.0 104:3... | \n",
" C02266 | \n",
" XXBSUZSONOQQGK-FLRLBIABSA-N | \n",
" Binbase | \n",
" 2670.575130 | \n",
"
\n",
" \n",
" | xylitol | \n",
" 567437 | \n",
" 217 | \n",
" 5857 | \n",
" 6912.0 | \n",
" 85:22.0 87:53.0 88:95.0 89:312.0 94:15.0 99:38... | \n",
" C00379 | \n",
" HEBKCHPVOIAQTA-NGQZWQHPSA-N | \n",
" Binbase | \n",
" 3394.310881 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 1-monopalmitin | \n",
" 901749 | \n",
" 129 | \n",
" 2070 | \n",
" 14900.0 | \n",
" 85:11049.0 86:1066.0 87:1727.0 88:2503.0 89:43... | \n",
" C01885 | \n",
" QHZLMUACJMDIAE-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 209480.132124 | \n",
"
\n",
" \n",
" | 1-monoolein | \n",
" 955584 | \n",
" 129 | \n",
" 21632 | \n",
" 5283468.0 | \n",
" 85:72.0 89:187.0 91:637.0 92:23.0 93:609.0 94:... | \n",
" NaN | \n",
" RZRNAYUHWVFMIP-KTKRTIGZSA-N | \n",
" Binbase | \n",
" 900478.279793 | \n",
"
\n",
" \n",
" | 1-methylgalactose | \n",
" 664807 | \n",
" 204 | \n",
" 477 | \n",
" 2108.0 | \n",
" 85:1337.0 86:428.0 87:1347.0 88:1090.0 89:3019... | \n",
" NaN | \n",
" HOVAGTYPODGVJG-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2383.512953 | \n",
"
\n",
" \n",
" | 1,5-anhydroglucitol | \n",
" 633603 | \n",
" 217 | \n",
" 209168 | \n",
" 64960.0 | \n",
" 85:8076.0 87:8514.0 88:4105.0 89:11826.0 91:22... | \n",
" C07326 | \n",
" MPCAJMNYNOGXPB-SLPGGIOYSA-N | \n",
" Binbase | \n",
" 7029.012953 | \n",
"
\n",
" \n",
" | 1,2,4-benzenetriol | \n",
" 521803 | \n",
" 239 | \n",
" 26704 | \n",
" 10787.0 | \n",
" 85:1133.0 87:801.0 88:613.0 89:323.0 90:201.0 ... | \n",
" C02814 | \n",
" GGNQRNBDZQJCCN-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 597.020725 | \n",
"
\n",
" \n",
"
\n",
"
124 rows × 9 columns
\n",
"
"
],
"text/plain": [
" retention index quantitated m/z Binbase ID PubChem ID \\\n",
"metabolite_name \n",
"zymosterol 1088064 129 110304 92746.0 \n",
"xylulose 553450 173 31632 439205.0 \n",
"xylose 543267 103 169 135191.0 \n",
"xylonolactone 535176 217 1808 439692.0 \n",
"xylitol 567437 217 5857 6912.0 \n",
"... ... ... ... ... \n",
"1-monopalmitin 901749 129 2070 14900.0 \n",
"1-monoolein 955584 129 21632 5283468.0 \n",
"1-methylgalactose 664807 204 477 2108.0 \n",
"1,5-anhydroglucitol 633603 217 209168 64960.0 \n",
"1,2,4-benzenetriol 521803 239 26704 10787.0 \n",
"\n",
" spectrum \\\n",
"metabolite_name \n",
"zymosterol 85:177.0 89:568.0 91:6448.0 92:1105.0 93:4713.... \n",
"xylulose 85:1861.0 86:702.0 87:1148.0 88:324.0 89:10095... \n",
"xylose 86:77.0 87:118.0 89:838.0 90:80.0 91:46.0 94:1... \n",
"xylonolactone 86:6.0 88:17.0 89:5.0 101:53.0 103:590.0 104:3... \n",
"xylitol 85:22.0 87:53.0 88:95.0 89:312.0 94:15.0 99:38... \n",
"... ... \n",
"1-monopalmitin 85:11049.0 86:1066.0 87:1727.0 88:2503.0 89:43... \n",
"1-monoolein 85:72.0 89:187.0 91:637.0 92:23.0 93:609.0 94:... \n",
"1-methylgalactose 85:1337.0 86:428.0 87:1347.0 88:1090.0 89:3019... \n",
"1,5-anhydroglucitol 85:8076.0 87:8514.0 88:4105.0 89:11826.0 91:22... \n",
"1,2,4-benzenetriol 85:1133.0 87:801.0 88:613.0 89:323.0 90:201.0 ... \n",
"\n",
" KEGG ID InChI Key ri_type \\\n",
"metabolite_name \n",
"zymosterol C05437 CGSJXLIKVBJVRY-XTGBIJOFSA-N Binbase \n",
"xylulose C00312 LQXVFWRQNMEDEE-PYHARJCCSA-N Binbase \n",
"xylose C00181 SRBFZHDQGSBBOR-IOVATXLUSA-N Binbase \n",
"xylonolactone C02266 XXBSUZSONOQQGK-FLRLBIABSA-N Binbase \n",
"xylitol C00379 HEBKCHPVOIAQTA-NGQZWQHPSA-N Binbase \n",
"... ... ... ... \n",
"1-monopalmitin C01885 QHZLMUACJMDIAE-UHFFFAOYSA-N Binbase \n",
"1-monoolein NaN RZRNAYUHWVFMIP-KTKRTIGZSA-N Binbase \n",
"1-methylgalactose NaN HOVAGTYPODGVJG-UHFFFAOYSA-N Binbase \n",
"1,5-anhydroglucitol C07326 MPCAJMNYNOGXPB-SLPGGIOYSA-N Binbase \n",
"1,2,4-benzenetriol C02814 GGNQRNBDZQJCCN-UHFFFAOYSA-N Binbase \n",
"\n",
" abundance \n",
"metabolite_name \n",
"zymosterol 24984.303109 \n",
"xylulose 1194.383420 \n",
"xylose 2586.269430 \n",
"xylonolactone 2670.575130 \n",
"xylitol 3394.310881 \n",
"... ... \n",
"1-monopalmitin 209480.132124 \n",
"1-monoolein 900478.279793 \n",
"1-methylgalactose 2383.512953 \n",
"1,5-anhydroglucitol 7029.012953 \n",
"1,2,4-benzenetriol 597.020725 \n",
"\n",
"[124 rows x 9 columns]"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"metabolites.set_index(\"metabolite_name\", inplace=True)\n",
"metabolites[\"abundance\"] = abundances\n",
"metabolites"
]
},
{
"cell_type": "markdown",
"id": "5ea3b467-a849-4e2d-baf3-402d52f115d4",
"metadata": {},
"source": [
"Now we try to map it onto the AGORA database."
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "e785ce7b-6705-48a2-86a0-be5636ce254b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" metabolite | \n",
" name | \n",
" hmdb | \n",
" kegg.compound | \n",
" pubchem.compound | \n",
" inchi | \n",
" chebi | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 10fthf5glu | \n",
" 10-formyltetrahydrofolate-[Glu](5) | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" 10fthf | \n",
" 10-Formyltetrahydrofolate | \n",
" HMDB00972 | \n",
" C00234 | \n",
" 122347.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" | 2 | \n",
" 10m3hddcaACP | \n",
" 10-methyl-3-hydroxy-dodecanoyl-ACP | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" 10m3hundecACP | \n",
" 10-methyl-3-hydroxy-undecanoyl-ACP | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" | 4 | \n",
" 10m3oddcaACP | \n",
" 10-methyl-3-oxo-dodecanoyl-ACP | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" metabolite name hmdb kegg.compound \\\n",
"0 10fthf5glu 10-formyltetrahydrofolate-[Glu](5) NaN NaN \n",
"1 10fthf 10-Formyltetrahydrofolate HMDB00972 C00234 \n",
"2 10m3hddcaACP 10-methyl-3-hydroxy-dodecanoyl-ACP NaN NaN \n",
"3 10m3hundecACP 10-methyl-3-hydroxy-undecanoyl-ACP NaN NaN \n",
"4 10m3oddcaACP 10-methyl-3-oxo-dodecanoyl-ACP NaN NaN \n",
"\n",
" pubchem.compound inchi chebi \n",
"0 NaN NaN NaN \n",
"1 122347.0 NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN "
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agora_mets = pd.read_csv(\"../data/agora_metabolites.csv\")\n",
"agora_mets.head()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "277f80a1-f30e-43dc-b49a-d00234a79c5b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" metabolite | \n",
" name | \n",
" hmdb | \n",
" kegg.compound | \n",
" pubchem.compound | \n",
" inchi | \n",
" chebi | \n",
" retention index | \n",
" quantitated m/z | \n",
" Binbase ID | \n",
" PubChem ID | \n",
" KEGG ID | \n",
" InChI Key | \n",
" ri_type | \n",
" abundance | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" ala_B | \n",
" beta-alanine | \n",
" HMDB00056 | \n",
" C00099 | \n",
" NaN | \n",
" InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | \n",
" NaN | \n",
" 435564 | \n",
" 248 | \n",
" 148 | \n",
" 239.0 | \n",
" C00099 | \n",
" UCMIRNVEIXFBKS-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2.759922e+02 | \n",
"
\n",
" \n",
" | 1 | \n",
" ala_L | \n",
" L-alanine | \n",
" HMDB00161 | \n",
" C00041 | \n",
" 5950.0 | \n",
" InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | \n",
" NaN | \n",
" 243971 | \n",
" 116 | \n",
" 34178 | \n",
" 5950.0 | \n",
" C00041 | \n",
" QNAYBMKLOCPYGJ-REOHCLBHSA-N | \n",
" Binbase | \n",
" 1.915410e+04 | \n",
"
\n",
" \n",
" | 2 | \n",
" asp_L | \n",
" L-aspartate(1-) | \n",
" HMDB00191 | \n",
" C00049 | \n",
" 5960.0 | \n",
" InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | \n",
" NaN | \n",
" 480387 | \n",
" 232 | \n",
" 79 | \n",
" 5960.0 | \n",
" C00049 | \n",
" CKLJMWTZIZZHCS-REOHCLBHSA-N | \n",
" Binbase | \n",
" 2.535544e+02 | \n",
"
\n",
" \n",
" | 3 | \n",
" cit | \n",
" Citrate | \n",
" HMDB00094 | \n",
" C00158 | \n",
" 311.0 | \n",
" InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | \n",
" NaN | \n",
" 617342 | \n",
" 273 | \n",
" 288 | \n",
" 311.0 | \n",
" C00158 | \n",
" KRKNYBCHXYNGOX-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2.791619e+03 | \n",
"
\n",
" \n",
" | 4 | \n",
" ddca | \n",
" laurate | \n",
" HMDB00638 | \n",
" C02679 | \n",
" 3893.0 | \n",
" InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | \n",
" NaN | \n",
" 547906 | \n",
" 117 | \n",
" 49 | \n",
" 3893.0 | \n",
" C02679 | \n",
" POULHZVOKOAJMA-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 3.843045e+05 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 58 | \n",
" glutar | \n",
" Glutarate | \n",
" HMDB00661 | \n",
" C00489 | \n",
" 3772.0 | \n",
" InChI=1S/C5H8O4/c6-4(7)2-1-3-5(8)9/h1-3H2,(H,6... | \n",
" NaN | \n",
" 421596 | \n",
" 261 | \n",
" 16952 | \n",
" 743.0 | \n",
" C00489 | \n",
" JFCQEDHGNNZCLN-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2.172487e+02 | \n",
"
\n",
" \n",
" | 59 | \n",
" chsterol | \n",
" cholesterol | \n",
" HMDB00067 | \n",
" C00187 | \n",
" 5997.0 | \n",
" NaN | \n",
" NaN | \n",
" 1076014 | \n",
" 129 | \n",
" 87943 | \n",
" 5997.0 | \n",
" C00187 | \n",
" HVYWMOMLDIMFJA-DPAQBDIFSA-N | \n",
" Binbase | \n",
" 2.556318e+05 | \n",
"
\n",
" \n",
" | 6 | \n",
" lcts | \n",
" Lactose | \n",
" HMDB00186 | \n",
" C00243 | \n",
" 440995.0 | \n",
" NaN | \n",
" NaN | \n",
" 932179 | \n",
" 204 | \n",
" 1373 | \n",
" 6134.0 | \n",
" C01970 | \n",
" GUBGYTABKSRVRQ-DCSYEGIMSA-N | \n",
" Binbase | \n",
" 4.425044e+06 | \n",
"
\n",
" \n",
" | 8 | \n",
" raffin | \n",
" Raffinose | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1120886 | \n",
" 361 | \n",
" 3190 | \n",
" 439242.0 | \n",
" C00492 | \n",
" MUPFEKGTMRGPLJ-ZQSKZDJDSA-N | \n",
" Binbase | \n",
" 2.543718e+03 | \n",
"
\n",
" \n",
" | 12 | \n",
" hqn | \n",
" Hydroquinone | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 422583 | \n",
" 239 | \n",
" 16709 | \n",
" 785.0 | \n",
" C00530 | \n",
" QIGBRXMKCJKVMJ-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 1.679870e+02 | \n",
"
\n",
" \n",
"
\n",
"
63 rows × 15 columns
\n",
"
"
],
"text/plain": [
" metabolite name hmdb kegg.compound pubchem.compound \\\n",
"0 ala_B beta-alanine HMDB00056 C00099 NaN \n",
"1 ala_L L-alanine HMDB00161 C00041 5950.0 \n",
"2 asp_L L-aspartate(1-) HMDB00191 C00049 5960.0 \n",
"3 cit Citrate HMDB00094 C00158 311.0 \n",
"4 ddca laurate HMDB00638 C02679 3893.0 \n",
".. ... ... ... ... ... \n",
"58 glutar Glutarate HMDB00661 C00489 3772.0 \n",
"59 chsterol cholesterol HMDB00067 C00187 5997.0 \n",
"6 lcts Lactose HMDB00186 C00243 440995.0 \n",
"8 raffin Raffinose NaN NaN NaN \n",
"12 hqn Hydroquinone NaN NaN NaN \n",
"\n",
" inchi chebi retention index \\\n",
"0 InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) NaN 435564 \n",
"1 InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... NaN 243971 \n",
"2 InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... NaN 480387 \n",
"3 InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... NaN 617342 \n",
"4 InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... NaN 547906 \n",
".. ... ... ... \n",
"58 InChI=1S/C5H8O4/c6-4(7)2-1-3-5(8)9/h1-3H2,(H,6... NaN 421596 \n",
"59 NaN NaN 1076014 \n",
"6 NaN NaN 932179 \n",
"8 NaN NaN 1120886 \n",
"12 NaN NaN 422583 \n",
"\n",
" quantitated m/z Binbase ID PubChem ID KEGG ID \\\n",
"0 248 148 239.0 C00099 \n",
"1 116 34178 5950.0 C00041 \n",
"2 232 79 5960.0 C00049 \n",
"3 273 288 311.0 C00158 \n",
"4 117 49 3893.0 C02679 \n",
".. ... ... ... ... \n",
"58 261 16952 743.0 C00489 \n",
"59 129 87943 5997.0 C00187 \n",
"6 204 1373 6134.0 C01970 \n",
"8 361 3190 439242.0 C00492 \n",
"12 239 16709 785.0 C00530 \n",
"\n",
" InChI Key ri_type abundance \n",
"0 UCMIRNVEIXFBKS-UHFFFAOYSA-N Binbase 2.759922e+02 \n",
"1 QNAYBMKLOCPYGJ-REOHCLBHSA-N Binbase 1.915410e+04 \n",
"2 CKLJMWTZIZZHCS-REOHCLBHSA-N Binbase 2.535544e+02 \n",
"3 KRKNYBCHXYNGOX-UHFFFAOYSA-N Binbase 2.791619e+03 \n",
"4 POULHZVOKOAJMA-UHFFFAOYSA-N Binbase 3.843045e+05 \n",
".. ... ... ... \n",
"58 JFCQEDHGNNZCLN-UHFFFAOYSA-N Binbase 2.172487e+02 \n",
"59 HVYWMOMLDIMFJA-DPAQBDIFSA-N Binbase 2.556318e+05 \n",
"6 GUBGYTABKSRVRQ-DCSYEGIMSA-N Binbase 4.425044e+06 \n",
"8 MUPFEKGTMRGPLJ-ZQSKZDJDSA-N Binbase 2.543718e+03 \n",
"12 QIGBRXMKCJKVMJ-UHFFFAOYSA-N Binbase 1.679870e+02 \n",
"\n",
"[63 rows x 15 columns]"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kegg = pd.merge(agora_mets[agora_mets[\"kegg.compound\"].notnull()], metabolites, left_on=\"kegg.compound\", right_on=\"KEGG ID\")\n",
"name = pd.merge(agora_mets, metabolites, left_on=agora_mets.name.str.lower(), right_on=metabolites.index)\n",
"merged = pd.concat([kegg, name]).drop_duplicates(subset=[\"metabolite\"]).drop(columns=[\"spectrum\", \"key_0\"])\n",
"merged"
]
},
{
"cell_type": "markdown",
"id": "4bb0bde7-f19c-43ad-a84a-8ba66c430259",
"metadata": {},
"source": [
"Now we will in some abundances with data from the WHO (https://archive.unu.edu/unupress/food/8F174e/8F174E04.htm). We do this for 1l of milk for now since that is pretty much the largest amount a baby drinks per day. We also add in some carbon sources that are present in the gut (mucin cores and primary bile acids). Note that we won't be using the metabolomics abundances here since those are relative data not absolute ones (are under peak). So higher values *between* metabolites dont't necessarily mean that one is more abundant than the other."
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "39701fc4-6957-4de3-9ca0-7bac948fa383",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" hmdb | \n",
" kegg.compound | \n",
" pubchem.compound | \n",
" inchi | \n",
" chebi | \n",
" retention index | \n",
" quantitated m/z | \n",
" Binbase ID | \n",
" PubChem ID | \n",
" KEGG ID | \n",
" InChI Key | \n",
" ri_type | \n",
" abundance | \n",
" mmol_per_litre | \n",
"
\n",
" \n",
" | metabolite | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | ala_B | \n",
" beta-alanine | \n",
" HMDB00056 | \n",
" C00099 | \n",
" NaN | \n",
" InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | \n",
" NaN | \n",
" 435564.0 | \n",
" 248.0 | \n",
" 148.0 | \n",
" 239.0 | \n",
" C00099 | \n",
" UCMIRNVEIXFBKS-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 275.992228 | \n",
" 1.000 | \n",
"
\n",
" \n",
" | ala_L | \n",
" L-alanine | \n",
" HMDB00161 | \n",
" C00041 | \n",
" 5950.0 | \n",
" InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | \n",
" NaN | \n",
" 243971.0 | \n",
" 116.0 | \n",
" 34178.0 | \n",
" 5950.0 | \n",
" C00041 | \n",
" QNAYBMKLOCPYGJ-REOHCLBHSA-N | \n",
" Binbase | \n",
" 19154.101036 | \n",
" 1.000 | \n",
"
\n",
" \n",
" | asp_L | \n",
" L-aspartate(1-) | \n",
" HMDB00191 | \n",
" C00049 | \n",
" 5960.0 | \n",
" InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | \n",
" NaN | \n",
" 480387.0 | \n",
" 232.0 | \n",
" 79.0 | \n",
" 5960.0 | \n",
" C00049 | \n",
" CKLJMWTZIZZHCS-REOHCLBHSA-N | \n",
" Binbase | \n",
" 253.554404 | \n",
" 1.000 | \n",
"
\n",
" \n",
" | cit | \n",
" Citrate | \n",
" HMDB00094 | \n",
" C00158 | \n",
" 311.0 | \n",
" InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | \n",
" NaN | \n",
" 617342.0 | \n",
" 273.0 | \n",
" 288.0 | \n",
" 311.0 | \n",
" C00158 | \n",
" KRKNYBCHXYNGOX-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2791.619171 | \n",
" 1.000 | \n",
"
\n",
" \n",
" | ddca | \n",
" laurate | \n",
" HMDB00638 | \n",
" C02679 | \n",
" 3893.0 | \n",
" InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | \n",
" NaN | \n",
" 547906.0 | \n",
" 117.0 | \n",
" 49.0 | \n",
" 3893.0 | \n",
" C02679 | \n",
" POULHZVOKOAJMA-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 384304.502591 | \n",
" 1.000 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | gncore2_rl | \n",
" released GlcNAc-alpha-1,4-Core 2 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
"
\n",
" \n",
" | core7 | \n",
" Core 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
"
\n",
" \n",
" | gchola | \n",
" glycocholate | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
"
\n",
" \n",
" | tchola | \n",
" taurocholate | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
"
\n",
" \n",
" | o2 | \n",
" Oxygen | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 0.001 | \n",
"
\n",
" \n",
"
\n",
"
84 rows × 15 columns
\n",
"
"
],
"text/plain": [
" name hmdb kegg.compound \\\n",
"metabolite \n",
"ala_B beta-alanine HMDB00056 C00099 \n",
"ala_L L-alanine HMDB00161 C00041 \n",
"asp_L L-aspartate(1-) HMDB00191 C00049 \n",
"cit Citrate HMDB00094 C00158 \n",
"ddca laurate HMDB00638 C02679 \n",
"... ... ... ... \n",
"gncore2_rl released GlcNAc-alpha-1,4-Core 2 NaN NaN \n",
"core7 Core 7 NaN NaN \n",
"gchola glycocholate NaN NaN \n",
"tchola taurocholate NaN NaN \n",
"o2 Oxygen NaN NaN \n",
"\n",
" pubchem.compound \\\n",
"metabolite \n",
"ala_B NaN \n",
"ala_L 5950.0 \n",
"asp_L 5960.0 \n",
"cit 311.0 \n",
"ddca 3893.0 \n",
"... ... \n",
"gncore2_rl NaN \n",
"core7 NaN \n",
"gchola NaN \n",
"tchola NaN \n",
"o2 NaN \n",
"\n",
" inchi chebi \\\n",
"metabolite \n",
"ala_B InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) NaN \n",
"ala_L InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... NaN \n",
"asp_L InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... NaN \n",
"cit InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... NaN \n",
"ddca InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... NaN \n",
"... ... ... \n",
"gncore2_rl NaN NaN \n",
"core7 NaN NaN \n",
"gchola NaN NaN \n",
"tchola NaN NaN \n",
"o2 NaN NaN \n",
"\n",
" retention index quantitated m/z Binbase ID PubChem ID KEGG ID \\\n",
"metabolite \n",
"ala_B 435564.0 248.0 148.0 239.0 C00099 \n",
"ala_L 243971.0 116.0 34178.0 5950.0 C00041 \n",
"asp_L 480387.0 232.0 79.0 5960.0 C00049 \n",
"cit 617342.0 273.0 288.0 311.0 C00158 \n",
"ddca 547906.0 117.0 49.0 3893.0 C02679 \n",
"... ... ... ... ... ... \n",
"gncore2_rl NaN NaN NaN NaN NaN \n",
"core7 NaN NaN NaN NaN NaN \n",
"gchola NaN NaN NaN NaN NaN \n",
"tchola NaN NaN NaN NaN NaN \n",
"o2 NaN NaN NaN NaN NaN \n",
"\n",
" InChI Key ri_type abundance \\\n",
"metabolite \n",
"ala_B UCMIRNVEIXFBKS-UHFFFAOYSA-N Binbase 275.992228 \n",
"ala_L QNAYBMKLOCPYGJ-REOHCLBHSA-N Binbase 19154.101036 \n",
"asp_L CKLJMWTZIZZHCS-REOHCLBHSA-N Binbase 253.554404 \n",
"cit KRKNYBCHXYNGOX-UHFFFAOYSA-N Binbase 2791.619171 \n",
"ddca POULHZVOKOAJMA-UHFFFAOYSA-N Binbase 384304.502591 \n",
"... ... ... ... \n",
"gncore2_rl NaN NaN NaN \n",
"core7 NaN NaN NaN \n",
"gchola NaN NaN NaN \n",
"tchola NaN NaN NaN \n",
"o2 NaN NaN NaN \n",
"\n",
" mmol_per_litre \n",
"metabolite \n",
"ala_B 1.000 \n",
"ala_L 1.000 \n",
"asp_L 1.000 \n",
"cit 1.000 \n",
"ddca 1.000 \n",
"... ... \n",
"gncore2_rl 1.000 \n",
"core7 1.000 \n",
"gchola 1.000 \n",
"tchola 1.000 \n",
"o2 0.001 \n",
"\n",
"[84 rows x 15 columns]"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged.set_index(\"metabolite\", inplace=True)\n",
"merged.loc[\"lcts\", \"mmol_per_litre\"] = 70/180*1000\n",
"merged.loc[\"chsterol\", \"mmol_per_litre\"] = 0.16/386 * 1000\n",
"merged.loc[\"ca2\", \"mmol_per_litre\"] = 0.3/40 * 1000\n",
"merged.loc[\"ppi\", \"mmol_per_litre\"] = 0.14/174 * 1000\n",
"merged.loc[\"na1\", \"mmol_per_litre\"] = 0.15/35 * 1000\n",
"merged.loc[\"k\", \"mmol_per_litre\"] = 0.55/39 * 1000\n",
"merged.loc[\"cl\", \"mmol_per_litre\"] = 0.43/35 * 1000\n",
"\n",
"# mucin\n",
"for met in agora_mets.loc[agora_mets.metabolite.str.contains(\"core\"), \"metabolite\"]:\n",
" merged.loc[met, \"mmol_per_litre\"] = 1\n",
" merged.loc[met, \"name\"] = agora_mets.loc[agora_mets.metabolite == met, \"name\"].values\n",
"\n",
"# primary BAs\n",
"for met in [\"gchola\", \"tchola\"]:\n",
" merged.loc[met, \"mmol_per_litre\"] = 1\n",
" merged.loc[met, \"name\"] = agora_mets.loc[agora_mets.metabolite == met, \"name\"].values\n",
" \n",
"# anaerobic\n",
"merged.loc[\"o2\", [\"mmol_per_litre\", \"name\"]] = [0.001, \"Oxygen\"]\n",
"\n",
"merged.loc[merged.mmol_per_litre.isnull(), \"mmol_per_litre\"] = 1\n",
"merged"
]
},
{
"cell_type": "markdown",
"id": "9b29ce1f-503d-481a-8708-9ffa5eeed308",
"metadata": {},
"source": [
"Now we will try to identify components that can be taken up by human cells.\n",
"\n",
"## Identifying human adsorption\n",
"\n",
"To achieve this we will load the Recon3 human model. AGORA and Recon IDs are very similar so we should be able to match them. We just have to adjust the Recon3 ones a bit. We start by identifying all available exchanges in Recon3 and adjusting the IDs."
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "39b6cb06-f94c-4d48-b247-aa1a3df0ecd8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 EX_5adtststerone\n",
"1 EX_5adtststerones\n",
"2 EX_5fthf\n",
"3 EX_5htrp\n",
"4 EX_5mthf\n",
"dtype: object"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from cobra.io import read_sbml_model\n",
"import pandas as pd\n",
"\n",
"recon3 = read_sbml_model(\"../data/Recon3D.xml.gz\")\n",
"exchanges = pd.Series([r.id for r in recon3.exchanges])\n",
"exchanges = exchanges.str.replace(\"__\", \"_\").str.replace(\"_e$\", \"\", regex=True)\n",
"exchanges.head()"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "e754c6ff-c5e4-4a80-92a2-a483b11eb01c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.1 71\n",
"1.0 13\n",
"Name: dilution, dtype: int64"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"medium = merged.reset_index().copy()\n",
"medium[\"reaction\"] = \"EX_\" + medium.metabolite\n",
"medium[\"dilution\"] = 1.0\n",
"medium.loc[medium.reaction.isin(exchanges), \"dilution\"] = 0.1\n",
"medium.dilution.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "8d406683-118c-456f-92d4-74b6df8ac94a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" metabolite | \n",
" name | \n",
" hmdb | \n",
" kegg.compound | \n",
" pubchem.compound | \n",
" inchi | \n",
" chebi | \n",
" retention index | \n",
" quantitated m/z | \n",
" Binbase ID | \n",
" PubChem ID | \n",
" KEGG ID | \n",
" InChI Key | \n",
" ri_type | \n",
" abundance | \n",
" mmol_per_litre | \n",
" reaction | \n",
" dilution | \n",
" global_id | \n",
" flux | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" ala_B_m | \n",
" beta-alanine | \n",
" HMDB00056 | \n",
" C00099 | \n",
" NaN | \n",
" InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) | \n",
" NaN | \n",
" 435564.0 | \n",
" 248.0 | \n",
" 148.0 | \n",
" 239.0 | \n",
" C00099 | \n",
" UCMIRNVEIXFBKS-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 275.992228 | \n",
" 1.000 | \n",
" EX_ala_B_m | \n",
" 0.1 | \n",
" EX_ala_B(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 1 | \n",
" ala_L_m | \n",
" L-alanine | \n",
" HMDB00161 | \n",
" C00041 | \n",
" 5950.0 | \n",
" InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... | \n",
" NaN | \n",
" 243971.0 | \n",
" 116.0 | \n",
" 34178.0 | \n",
" 5950.0 | \n",
" C00041 | \n",
" QNAYBMKLOCPYGJ-REOHCLBHSA-N | \n",
" Binbase | \n",
" 19154.101036 | \n",
" 1.000 | \n",
" EX_ala_L_m | \n",
" 0.1 | \n",
" EX_ala_L(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 2 | \n",
" asp_L_m | \n",
" L-aspartate(1-) | \n",
" HMDB00191 | \n",
" C00049 | \n",
" 5960.0 | \n",
" InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... | \n",
" NaN | \n",
" 480387.0 | \n",
" 232.0 | \n",
" 79.0 | \n",
" 5960.0 | \n",
" C00049 | \n",
" CKLJMWTZIZZHCS-REOHCLBHSA-N | \n",
" Binbase | \n",
" 253.554404 | \n",
" 1.000 | \n",
" EX_asp_L_m | \n",
" 0.1 | \n",
" EX_asp_L(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 3 | \n",
" cit_m | \n",
" Citrate | \n",
" HMDB00094 | \n",
" C00158 | \n",
" 311.0 | \n",
" InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... | \n",
" NaN | \n",
" 617342.0 | \n",
" 273.0 | \n",
" 288.0 | \n",
" 311.0 | \n",
" C00158 | \n",
" KRKNYBCHXYNGOX-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 2791.619171 | \n",
" 1.000 | \n",
" EX_cit_m | \n",
" 0.1 | \n",
" EX_cit(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 4 | \n",
" ddca_m | \n",
" laurate | \n",
" HMDB00638 | \n",
" C02679 | \n",
" 3893.0 | \n",
" InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... | \n",
" NaN | \n",
" 547906.0 | \n",
" 117.0 | \n",
" 49.0 | \n",
" 3893.0 | \n",
" C02679 | \n",
" POULHZVOKOAJMA-UHFFFAOYSA-N | \n",
" Binbase | \n",
" 384304.502591 | \n",
" 1.000 | \n",
" EX_ddca_m | \n",
" 0.1 | \n",
" EX_ddca(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 79 | \n",
" gncore2_rl_m | \n",
" released GlcNAc-alpha-1,4-Core 2 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
" EX_gncore2_rl_m | \n",
" 1.0 | \n",
" EX_gncore2_rl(e) | \n",
" 1.0000 | \n",
"
\n",
" \n",
" | 80 | \n",
" core7_m | \n",
" Core 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
" EX_core7_m | \n",
" 0.1 | \n",
" EX_core7(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 81 | \n",
" gchola_m | \n",
" glycocholate | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
" EX_gchola_m | \n",
" 0.1 | \n",
" EX_gchola(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 82 | \n",
" tchola_m | \n",
" taurocholate | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.000 | \n",
" EX_tchola_m | \n",
" 0.1 | \n",
" EX_tchola(e) | \n",
" 0.1000 | \n",
"
\n",
" \n",
" | 83 | \n",
" o2_m | \n",
" Oxygen | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 0.001 | \n",
" EX_o2_m | \n",
" 0.1 | \n",
" EX_o2(e) | \n",
" 0.0001 | \n",
"
\n",
" \n",
"
\n",
"
84 rows × 20 columns
\n",
"
"
],
"text/plain": [
" metabolite name hmdb kegg.compound \\\n",
"0 ala_B_m beta-alanine HMDB00056 C00099 \n",
"1 ala_L_m L-alanine HMDB00161 C00041 \n",
"2 asp_L_m L-aspartate(1-) HMDB00191 C00049 \n",
"3 cit_m Citrate HMDB00094 C00158 \n",
"4 ddca_m laurate HMDB00638 C02679 \n",
".. ... ... ... ... \n",
"79 gncore2_rl_m released GlcNAc-alpha-1,4-Core 2 NaN NaN \n",
"80 core7_m Core 7 NaN NaN \n",
"81 gchola_m glycocholate NaN NaN \n",
"82 tchola_m taurocholate NaN NaN \n",
"83 o2_m Oxygen NaN NaN \n",
"\n",
" pubchem.compound inchi chebi \\\n",
"0 NaN InChI=1S/C3H7NO2/c4-2-1-3(5)6/h1-2,4H2,(H,5,6) NaN \n",
"1 5950.0 InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5... NaN \n",
"2 5960.0 InChI=1S/C4H7NO4/c5-2(4(8)9)1-3(6)7/h2H,1,5H2,... NaN \n",
"3 311.0 InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10... NaN \n",
"4 3893.0 InChI=1S/C12H24O2/c1-2-3-4-5-6-7-8-9-10-11-12(... NaN \n",
".. ... ... ... \n",
"79 NaN NaN NaN \n",
"80 NaN NaN NaN \n",
"81 NaN NaN NaN \n",
"82 NaN NaN NaN \n",
"83 NaN NaN NaN \n",
"\n",
" retention index quantitated m/z Binbase ID PubChem ID KEGG ID \\\n",
"0 435564.0 248.0 148.0 239.0 C00099 \n",
"1 243971.0 116.0 34178.0 5950.0 C00041 \n",
"2 480387.0 232.0 79.0 5960.0 C00049 \n",
"3 617342.0 273.0 288.0 311.0 C00158 \n",
"4 547906.0 117.0 49.0 3893.0 C02679 \n",
".. ... ... ... ... ... \n",
"79 NaN NaN NaN NaN NaN \n",
"80 NaN NaN NaN NaN NaN \n",
"81 NaN NaN NaN NaN NaN \n",
"82 NaN NaN NaN NaN NaN \n",
"83 NaN NaN NaN NaN NaN \n",
"\n",
" InChI Key ri_type abundance mmol_per_litre \\\n",
"0 UCMIRNVEIXFBKS-UHFFFAOYSA-N Binbase 275.992228 1.000 \n",
"1 QNAYBMKLOCPYGJ-REOHCLBHSA-N Binbase 19154.101036 1.000 \n",
"2 CKLJMWTZIZZHCS-REOHCLBHSA-N Binbase 253.554404 1.000 \n",
"3 KRKNYBCHXYNGOX-UHFFFAOYSA-N Binbase 2791.619171 1.000 \n",
"4 POULHZVOKOAJMA-UHFFFAOYSA-N Binbase 384304.502591 1.000 \n",
".. ... ... ... ... \n",
"79 NaN NaN NaN 1.000 \n",
"80 NaN NaN NaN 1.000 \n",
"81 NaN NaN NaN 1.000 \n",
"82 NaN NaN NaN 1.000 \n",
"83 NaN NaN NaN 0.001 \n",
"\n",
" reaction dilution global_id flux \n",
"0 EX_ala_B_m 0.1 EX_ala_B(e) 0.1000 \n",
"1 EX_ala_L_m 0.1 EX_ala_L(e) 0.1000 \n",
"2 EX_asp_L_m 0.1 EX_asp_L(e) 0.1000 \n",
"3 EX_cit_m 0.1 EX_cit(e) 0.1000 \n",
"4 EX_ddca_m 0.1 EX_ddca(e) 0.1000 \n",
".. ... ... ... ... \n",
"79 EX_gncore2_rl_m 1.0 EX_gncore2_rl(e) 1.0000 \n",
"80 EX_core7_m 0.1 EX_core7(e) 0.1000 \n",
"81 EX_gchola_m 0.1 EX_gchola(e) 0.1000 \n",
"82 EX_tchola_m 0.1 EX_tchola(e) 0.1000 \n",
"83 EX_o2_m 0.1 EX_o2(e) 0.0001 \n",
"\n",
"[84 rows x 20 columns]"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"medium[\"metabolite\"] = medium.reaction.str.replace(\"^EX_\", \"\", regex=True) + \"_m\"\n",
"medium[\"global_id\"] = medium.reaction + \"(e)\"\n",
"medium[\"reaction\"] = medium.reaction + \"_m\"\n",
"medium[\"flux\"] = medium.mmol_per_litre * medium.dilution\n",
"medium.loc[medium.flux < 1e-4, \"flux\"] = 1e-4\n",
"medium"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "40324d7e-d26e-4c30-b8e8-53565d37d07e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" metabolite | \n",
" name | \n",
" hmdb | \n",
" kegg.compound | \n",
" pubchem.compound | \n",
" inchi | \n",
" chebi | \n",
" retention index | \n",
" quantitated m/z | \n",
" Binbase ID | \n",
" PubChem ID | \n",
" KEGG ID | \n",
" InChI Key | \n",
" ri_type | \n",
" abundance | \n",
" mmol_per_litre | \n",
" reaction | \n",
" dilution | \n",
" global_id | \n",
" flux | \n",
"
\n",
" \n",
" \n",
" \n",
" | 60 | \n",
" lcts_m | \n",
" Lactose | \n",
" HMDB00186 | \n",
" C00243 | \n",
" 440995.0 | \n",
" NaN | \n",
" NaN | \n",
" 932179.0 | \n",
" 204.0 | \n",
" 1373.0 | \n",
" 6134.0 | \n",
" C01970 | \n",
" GUBGYTABKSRVRQ-DCSYEGIMSA-N | \n",
" Binbase | \n",
" 4.425044e+06 | \n",
" 388.888889 | \n",
" EX_lcts_m | \n",
" 0.1 | \n",
" EX_lcts(e) | \n",
" 38.888889 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" metabolite name hmdb kegg.compound pubchem.compound inchi chebi \\\n",
"60 lcts_m Lactose HMDB00186 C00243 440995.0 NaN NaN \n",
"\n",
" retention index quantitated m/z Binbase ID PubChem ID KEGG ID \\\n",
"60 932179.0 204.0 1373.0 6134.0 C01970 \n",
"\n",
" InChI Key ri_type abundance mmol_per_litre \\\n",
"60 GUBGYTABKSRVRQ-DCSYEGIMSA-N Binbase 4.425044e+06 388.888889 \n",
"\n",
" reaction dilution global_id flux \n",
"60 EX_lcts_m 0.1 EX_lcts(e) 38.888889 "
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"medium[medium.metabolite == \"lcts_m\"]"
]
},
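{
"cell_type": "markdown",
"id": "flux-scaling-sketch",
"metadata": {},
"source": [
"As a quick sanity check of the `flux` column, here is a minimal sketch of the scaling in plain Python. The helper `to_flux` is hypothetical and only mirrors the two steps used above (multiply by the dilution, then raise very small fluxes to `1e-4`). The input values are taken from the lactose and oxygen rows of the table.\n",
"\n",
"```python\n",
"def to_flux(mmol_per_litre, dilution, floor=1e-4):\n",
"    # flux = concentration * dilution; fluxes below the floor are raised to it\n",
"    return max(mmol_per_litre * dilution, floor)\n",
"\n",
"lactose = to_flux(388.888889, 0.1)  # abundant component, scaled by the dilution\n",
"oxygen = to_flux(0.001, 0.1)        # trace component, ends up at the floor\n",
"print(round(lactose, 6), oxygen)\n",
"```\n",
"\n",
"Both values should agree with the `flux` column shown above for `lcts_m` and `o2_m`."
]
},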
{
"cell_type": "markdown",
"id": "0a6988f4-2b0d-4cc4-a8cc-8ec8d012db00",
"metadata": {},
"source": [
"## Checking the growth medium against the DB\n",
"\n",
"But can the bacteria in our model database actually grow on this medium? Let's check, starting by downloading the AGORA model database."
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "3951839c-39df-43b8-a818-086a4836b744",
"metadata": {},
"outputs": [],
"source": [
"# !wget \"https://zenodo.org/record/3755182/files/agora103_genus.qza?download=1\" -O ../data/agora103_genus.qza"
]
},
{
"cell_type": "markdown",
"id": "84bf2ceb-3802-43b6-88f3-ca8801792ac9",
"metadata": {},
"source": [
"Now we will check for growth by running the growth medium against every single model in the database."
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "472052d5-7964-47ba-820d-4031c13876b9",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ce707336aab3437f8c05b0c03f287d3f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from micom.workflows.db_media import check_db_medium\n",
"\n",
"check = check_db_medium(\"../data/agora103_genus.qza\", medium, threads=20)"
]
},
{
"cell_type": "markdown",
"id": "cb44d8fc-ef54-4d41-bd37-8848e3ca0a90",
"metadata": {},
"source": [
"`check` now includes the entire manifest plus two new columns: the growth rate and whether the models can grow."
]
},
{
"cell_type": "code",
"execution_count": 117,
"id": "6b2ca71d-cebe-481d-9d03-573265643587",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 227\n",
"Name: can_grow, dtype: int64"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"check.can_grow.value_counts()"
]
},
{
"cell_type": "markdown",
"id": "0b6b51ac-783e-423d-8f8e-552d58d8926a",
"metadata": {},
"source": [
"Okay, nothing can grow. We are probably missing some important cofactors such as manganese or copper."
]
},
{
"cell_type": "markdown",
"id": "5d6ed642-9d73-4c1d-8250-c16683ac85fe",
"metadata": {},
"source": [
"Let's complete the medium so that all taxa in AGORA can grow at a rate of at least 0.01.\n",
"\n",
"## Supplementing a growth medium from a skeleton\n",
"\n",
"Sometimes you may start from a few components and want to complete this skeleton medium so that all models in the database reach a certain minimum growth rate. This can be done with `complete_db_medium`. We can minimize either the total added flux, the added mass, or the presence of any atom. Here we minimize the added mass (`weights=\"mass\"`) and allow each added import a flux of at most 10."
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "522df358-d9b4-432c-a1d0-08526336715d",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0655fcb78e4e43288488ee3263758fc1",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from micom.workflows.db_media import complete_db_medium\n",
"\n",
"manifest, imports = complete_db_medium(\"../data/agora103_genus.qza\", medium, growth=0.01, threads=20, max_added_import=10, weights=\"mass\")"
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "486c0222-8dd7-4d57-a22f-8884a0e9b4c7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True 225\n",
"False 2\n",
"Name: can_grow, dtype: int64"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"manifest.can_grow.value_counts()"
]
},
{
"cell_type": "markdown",
"id": "55ea8013-4bec-4a0a-bfa6-0f977aca8d55",
"metadata": {},
"source": [
"`manifest` is the amended manifest as before and `imports` contains the import fluxes used for each model. A new column in the manifest, `added`, also tells us how many imports were added."
]
},
{
"cell_type": "code",
"execution_count": 120,
"id": "c1d5cf19-ebd8-47b8-856e-9cc3af8e33fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 225.000000\n",
"mean 16.511111\n",
"std 6.705615\n",
"min 6.000000\n",
"25% 12.000000\n",
"50% 15.000000\n",
"75% 21.000000\n",
"max 38.000000\n",
"Name: added, dtype: float64"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"manifest.added.describe()"
]
},
{
"cell_type": "markdown",
"id": "2063b039-37c2-4186-b998-b5bff2a9a3fd",
"metadata": {},
"source": [
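"From this we build up our new medium. For each import we take the maximum flux across all models and keep it if it is larger than 1e-6 or if it was already part of the original medium."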
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "af4512bd-e2ad-462e-9af2-c012b82e5ee7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(182, 4)"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fluxes = imports.max()\n",
"fluxes = fluxes[(fluxes > 1e-6) | fluxes.index.isin(medium.reaction)]\n",
"completed = pd.DataFrame({\n",
" \"reaction\": fluxes.index,\n",
" \"metabolite\": fluxes.index.str.replace(\"^EX_\", \"\", regex=True),\n",
" \"global_id\": fluxes.index.str.replace(\"_m$\", \"(e)\", regex=True),\n",
" \"flux\": fluxes\n",
"})\n",
"completed.shape"
]
},
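{
"cell_type": "markdown",
"id": "aggregate-imports-sketch",
"metadata": {},
"source": [
"To make the aggregation step concrete, here is a minimal sketch of the same logic in plain Python on made-up numbers. The model names, the fluxes and the `skeleton` set are hypothetical and only illustrate the rule: take the largest import per reaction, keep it if it exceeds `1e-6` or was in the original medium, and derive the metabolite and global IDs from the reaction ID.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# Hypothetical per-model import fluxes (model -> {exchange reaction: flux}).\n",
"imports = {\n",
"    'model_a': {'EX_lcts_m': 38.9, 'EX_mn2_m': 0.002, 'EX_cu2_m': 0.0},\n",
"    'model_b': {'EX_lcts_m': 12.0, 'EX_mn2_m': 0.0005, 'EX_cu2_m': 1e-9},\n",
"}\n",
"skeleton = {'EX_lcts_m', 'EX_o2_m'}  # reactions of the original (skeleton) medium\n",
"\n",
"# Largest import seen for each reaction across all models.\n",
"fluxes = {}\n",
"for per_model in imports.values():\n",
"    for rid, flux in per_model.items():\n",
"        fluxes[rid] = max(fluxes.get(rid, 0.0), flux)\n",
"\n",
"# Keep reactions with a non-negligible flux or that were in the skeleton.\n",
"completed = [\n",
"    {\n",
"        'reaction': rid,\n",
"        'metabolite': re.sub(r'^EX_', '', rid),\n",
"        'global_id': re.sub(r'_m$', '(e)', rid),\n",
"        'flux': flux,\n",
"    }\n",
"    for rid, flux in fluxes.items()\n",
"    if flux > 1e-6 or rid in skeleton\n",
"]\n",
"```\n",
"\n",
"In this toy example the copper import is dropped because its largest flux stays below `1e-6` and it was not part of the skeleton. Lactose is kept with the larger of its two fluxes."
]
},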
{
"cell_type": "markdown",
"id": "c03ca14d-8cf1-43b9-b027-099a74c4380d",
"metadata": {},
"source": [
"Let's also export the medium as a QIIME 2 artifact, which can be read with `q2-micom` or the normal micom package."
]
},
{
"cell_type": "code",
"execution_count": 122,
"id": "206feb9b-77f6-4ecc-94f8-b48f11d605df",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'../media/breast_milk_agora.qza'"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from qiime2 import Artifact\n",
"\n",
"arti = Artifact.import_data(\"MicomMedium[Global]\", completed)\n",
"arti.save(\"../media/breast_milk_agora.qza\")"
]
},
{
"cell_type": "markdown",
"id": "fad8a4c5-8ed2-4780-a3e2-ce7d4886e87a",
"metadata": {},
"source": [
"## Validation\n",
"\n",
"As a last step we validate the created medium."
]
},
{
"cell_type": "code",
"execution_count": 123,
"id": "34d01cfe-eb00-486c-9c32-9f7ec6ccffd3",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "41e8128482ed4bc9aba7839572b74e7b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"True 227\n",
"Name: can_grow, dtype: int64"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"check = check_db_medium(\"../data/agora103_genus.qza\", completed, threads=20)\n",
"check.can_grow.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 124,
"id": "5811022a-dcd1-4e70-a3fa-d38bdbf1d82d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 227.000000\n",
"mean 0.019931\n",
"std 0.011181\n",
"min 0.000048\n",
"25% 0.010000\n",
"50% 0.019846\n",
"75% 0.025641\n",
"max 0.062186\n",
"Name: growth_rate, dtype: float64"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"check.growth_rate.describe()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}