{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Calculating Feature Log-Ratios Directly using Qarcoal\n", "\n", "Occasionally we might be only interested in the log-ratios between two features and not the ranks. In this case, it is useful to have a way to skip the step of running DEICODE/Songbird. This also has the advantage of allowing programmatic generation (through CLI or Python) of log-ratios for further visualization/analysis. We can perform this action using **Qarcoal**.\n", "\n", "We will use the same dataset featured in the Qurro DEICODE tutorial (`deicode_example.ipynb`).\n", "\n", "## Requirements\n", "\n", "This notebook relies on QIIME 2, DEICODE, and Qurro all being installed. You should be in a QIIME 2 conda environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setting Up\n", "\n", "In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Clear the output directory so we can write these files there\n", "!rm -rf output\n", "# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist\n", "# (if it does already exist, -p ensures that an error won't be thrown)\n", "!mkdir -p output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Using Qarcoal Through QIIME 2\n", "\n", "Currently, Qarcoal can only be used through QIIME 2. However, we are working on a standalone version -- so stay tuned.\n", "\n", "### 1.A. Import the feature table into a QIIME 2 artifact" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mImported ../DEICODE_sleep_apnea/input/qiita_10422_table.biom as BIOMV210DirFmt to output/qiita_10422_table.biom.qza\u001b[0m\r\n", "\u001b[0m" ] } ], "source": [ "!qiime tools import \\\n", " --input-path ../DEICODE_sleep_apnea/input/qiita_10422_table.biom \\\n", " --output-path output/qiita_10422_table.biom.qza \\\n", " --type FeatureTable[Frequency]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.B. Run Qarcoal!\n", "Now, we can run Qarcoal through Qiime2 on our imported BIOM table. This produces one output: a table of samples with their associated log-ratios of selected features. We will use `g__Allobaculum` as our numerator string and `g__Coprococcus` as our denominator string for demonstration.\n", "\n", "Note that Qarcoal requires the taxonomy file to be a `.qza` artifact rather than a `.tsv` file." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: \u001b[94mqiime qurro qarcoal\u001b[0m [OPTIONS]\r\n", "\r\n", " Compute the log-ratio of two specified feature strings by searching\r\n", " taxonomy for incidence of each string, summing all relevant feature counts\r\n", " for each sample, and taking the natural log of the numerator sum divided\r\n", " by denominator sum.\r\n", "\r\n", "\u001b[1mInputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--i-table\u001b[0m ARTIFACT \u001b[32mFeatureTable[Frequency]\u001b[0m\r\n", " Feature table describing the abundances of the\r\n", " features in samples. \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m\u001b[4m--i-taxonomy\u001b[0m ARTIFACT \u001b[32mFeatureData[Taxonomy]\u001b[0m\r\n", " Taxonomy information to be used for selecting\r\n", " features in log-ratio. \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mParameters\u001b[0m:\r\n", " \u001b[94m\u001b[4m--p-num-string\u001b[0m TEXT Numerator string to search for in taxonomy.\r\n", " \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m\u001b[4m--p-denom-string\u001b[0m TEXT Denominator string to search for in taxonomy.\r\n", " \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m--m-samples-to-use-file\u001b[0m METADATA...\r\n", " (multiple Sample metadata file. If provided, log-ratios will\r\n", " arguments will be only be calculated from sample labels present in this\r\n", " merged) file. \u001b[35m[optional]\u001b[0m\r\n", " \u001b[94m--p-allow-shared-features\u001b[0m / \u001b[94m--p-no-allow-shared-features\u001b[0m\r\n", " Boolean value denoting handling of features shared\r\n", " between numerator and denominator. If False, an error\r\n", " is raised if features are shared. If True, shared\r\n", " features are retained in log-ratio computation.\r\n", " \u001b[35m[default: False]\u001b[0m\r\n", "\u001b[1mOutputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--o-qarcoal-log-ratios\u001b[0m ARTIFACT \u001b[32mSampleData[LogRatios]\u001b[0m\r\n", " \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mMiscellaneous\u001b[0m:\r\n", " \u001b[94m--output-dir\u001b[0m PATH Output unspecified results to a directory\r\n", " \u001b[94m--verbose\u001b[0m / \u001b[94m--quiet\u001b[0m Display verbose output to stdout and/or stderr\r\n", " during execution of this action. Or silence output if\r\n", " execution is successful (silence is golden).\r\n", " \u001b[94m--example-data\u001b[0m PATH Write example data and exit.\r\n", " \u001b[94m--citations\u001b[0m Show citations and exit.\r\n", " \u001b[94m--help\u001b[0m Show this message and exit.\r\n" ] } ], "source": [ "!qiime qurro qarcoal --help" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mSaved SampleData[LogRatios] to: output/allobaculum_coprococcus_log_ratios.qza\u001b[0m\r\n", "\u001b[0m" ] } ], "source": [ "!qiime qurro qarcoal \\\n", " --i-table output/qiita_10422_table.biom.qza \\\n", " --i-taxonomy ../DEICODE_sleep_apnea/input/taxonomy.qza \\\n", " --p-num-string g__Allobaculum \\\n", " --p-denom-string g__Coprococcus \\\n", " --o-qarcoal-log-ratios output/allobaculum_coprococcus_log_ratios.qza" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just ran Qarcoal! The output log-ratios for each sample are contained in the `output/allobaculum_coprococcus_log_ratios.qza` artifact that was just produced. You can do lots of things with this artifact, including loading it programmatically into Python using QIIME 2's Artifact API, running `qiime metadata tabulate` on it to produce a sortable table showing the log-ratios, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Verifying Qarcoal's output against Qurro's output in Python\n", "We're going to load the log-ratios we just produced in Qarcoal into Python using the Artifact API. This will let us demonstrate that Qarcoal's log-ratios match up with the same log-ratios Qurro would produce for the same data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from qiime2 import Artifact, Metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.A. Load Qarcoal Log-Ratios in Python using QIIME 2's Artifact API" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Num_SumDenom_Sumlog_ratio
Sample-ID
10422.21.F.49.0659.0-4.293499
10422.22.F.85.03266.0-6.481883
10422.24.F.536.02028.0-4.031286
10422.17.F.98.050.0-1.832581
10422.18.F.129.0141.0-2.751535
\n", "
" ], "text/plain": [ " Num_Sum Denom_Sum log_ratio\n", "Sample-ID \n", "10422.21.F.4 9.0 659.0 -4.293499\n", "10422.22.F.8 5.0 3266.0 -6.481883\n", "10422.24.F.5 36.0 2028.0 -4.031286\n", "10422.17.F.9 8.0 50.0 -1.832581\n", "10422.18.F.12 9.0 141.0 -2.751535" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qarcoal_log_ratios = Artifact.load(\"output/allobaculum_coprococcus_log_ratios.qza\")\n", "qarcoal_log_ratios_df = qarcoal_log_ratios.view(pd.DataFrame)\n", "qarcoal_log_ratios_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.B. Run DEICODE\n", "\n", "In order to run Qurro, we need some sort of feature rankings (at least right now). So we're going to run DEICODE.\n", "\n", "We're going to do this through the Artifact API, but you could just as easily run the following through the command line. (Please see the DEICODE example notebook in this repository, and the [DEICODE documentation](https://github.com/biocore/DEICODE), for more details on using DEICODE.)\n", "\n", "**NOTE:** By default, DEICODE filters your input feature table. We will override this by setting both `min-feature-count` and `min-sample-count` to 0. If you want to match the DEICODE default filtering settings, filter your feature table to match DEICODE and pass a QIIME2 Metadata file containing the sample IDs to Qarcoal with the `--m-samples-to-use-file` flag." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from qiime2.plugins import deicode\n", "\n", "table = Artifact.load(\"output/qiita_10422_table.biom.qza\")\n", "\n", "ordination, dist_matrix = deicode.actions.rpca(\n", " table = table,\n", " min_sample_count = 0,\n", " min_feature_count = 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.C. Run Qurro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then input the ordination into Qurro, save the visualization, and compare our results." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'output/qurro_viz.qzv'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from qiime2.plugins import qurro\n", "\n", "metadata = Metadata.load(\"../DEICODE_sleep_apnea/input/qiita_10422_metadata.tsv\")\n", "taxonomy = Metadata.load(\"../DEICODE_sleep_apnea/input/taxonomy.tsv\")\n", "\n", "qurro_viz = qurro.actions.loading_plot(\n", " ranks = ordination,\n", " table = table,\n", " sample_metadata = metadata,\n", " feature_metadata = taxonomy)\n", "\n", "qurro_viz.visualization.save(\"output/qurro_viz.qzv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.D. Compute the same log-ratio as before (`g__Allobaculum` to `g__Coprococcus`) in Qurro\n", "\n", "Open the visualization [here](https://view.qiime2.org/) and type in `g__Allobaculum` in the numerator search bar and `g__Coprococcus` in the denominator search bar. Make sure you select the option to filter features from \"Taxon\" rather than \"Feature ID.\"\n", "\n", "(This screenshot is from Qurro v0.4.0, so future versions of Qurro might look a bit different.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![img](imgs/qurro_feature_search.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click the `Export sample data` button and save the resulting `sample_plot_data.tsv` file to the `input/` directory. (**NOTE:** We've provided a pre-downloaded version of this file in the `input/` directory so that you can run this entire notebook automatically, but feel free to overwrite the `input/sample_plot_data.tsv` file with your own.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.E. Comparing the output of Qurro and Qarcoal!\n", "\n", "We can now load the Qurro results and compare them with the Qarcoal results to make sure they match." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Current_Natural_Log_Ratioageage.1
Sample ID
10422.21.F.3-6.08298011.011.0
10422.28.F.94.65613514.014.0
10422.24.F.10-4.96439714.514.5
10422.21.F.10NaN14.514.5
10422.27.F.82.93725913.513.5
\n", "
" ], "text/plain": [ " Current_Natural_Log_Ratio age age.1\n", "Sample ID \n", "10422.21.F.3 -6.082980 11.0 11.0\n", "10422.28.F.9 4.656135 14.0 14.0\n", "10422.24.F.10 -4.964397 14.5 14.5\n", "10422.21.F.10 NaN 14.5 14.5\n", "10422.27.F.8 2.937259 13.5 13.5" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qurro_log_ratios_df = pd.read_csv(\"input/sample_plot_data.tsv\", sep=\"\\t\", index_col=0)\n", "qurro_log_ratios_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the Qurro results have at least one `NaN` log-ratio (it's actually a `null` in the TSV file, but Pandas treats it as a `NaN`). This just means that for this sample, the log-ratio could not be calculated due to 0s.\n", "\n", "We can drop samples with these invalid log-ratios from the DataFrame using [`pd.isna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html) and some [mildly fancy indexing](https://stackoverflow.com/questions/46054318/tilde-sign-in-python-dataframe)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Current_Natural_Log_Ratioageage.1
Sample ID
10422.21.F.3-6.08298011.011.0
10422.28.F.94.65613514.014.0
10422.24.F.10-4.96439714.514.5
10422.27.F.82.93725913.513.5
10422.19.F.4-6.41181811.511.5
\n", "
" ], "text/plain": [ " Current_Natural_Log_Ratio age age.1\n", "Sample ID \n", "10422.21.F.3 -6.082980 11.0 11.0\n", "10422.28.F.9 4.656135 14.0 14.0\n", "10422.24.F.10 -4.964397 14.5 14.5\n", "10422.27.F.8 2.937259 13.5 13.5\n", "10422.19.F.4 -6.411818 11.5 11.5" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qurro_log_ratios_df = qurro_log_ratios_df[~qurro_log_ratios_df[\"Current_Natural_Log_Ratio\"].isna()]\n", "qurro_log_ratios_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we can get a preliminary sense of how well the two methods coincide by looking at the number of samples present." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qurro_log_ratios_df.shape[0] == qarcoal_log_ratios_df.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's a good sign, but let's be more rigorous and make sure the samples are the same." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(qurro_log_ratios_df.index) == set(qarcoal_log_ratios_df.index)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's make sure the log-ratios themselves are the same. Note that Qurro calculates log-ratios using Javascript, while Qarcoal uses Python. As a result, the individual values may differ very slightly due to implementation of the logarithm function. We will use `np.allclose` to check that the two are equal within a tolerance." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from numpy import allclose\n", "\n", "qurro_values = qurro_log_ratios_df.sort_index()['Current_Natural_Log_Ratio'].to_numpy()\n", "qarcoal_values = qarcoal_log_ratios_df.sort_index()['log_ratio'].to_numpy()\n", "\n", "allclose(qurro_values, qarcoal_values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Success! Our Qarcoal-generated log-ratios are approximately equal to our Qurro-generated ones.\n", "\n", "We hope you find Qarcoal useful, and please contact us or open an issue if you having questions or suggestions about using Qarcoal." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }