{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Songbird feature differentials with Qurro\n", "In this example, we use data from the Red Sea metagenome dataset. This particular data was obtained from [Songbird's GitHub repository in its `data/redsea` folder](https://github.com/biocore/songbird/tree/master/data/redsea), and is associated with the following paper:\n", "\n", "Thompson, L. R., Williams, G. J., Haroon, M. F., Shibl, A., Larsen, P., Shorenstein, J., ... & Stingl, U. (2017). Metagenomic covariation along densely sampled environmental gradients in the Red Sea. _The ISME Journal, 11_(1), 138.\n", "\n", "## 2022 note on running Songbird (and also running Qurro)\n", "\n", "A lot has changed since we published these tools in 2019 and 2020! Notably, as of writing, the [pandas](https://pandas.pydata.org/) versions (and, as a result, the QIIME 2 versions) required by Qurro (version 0.8.0 and higher) and by Songbird are incompatible:\n", "\n", "| Tool | Required `pandas` version | Required QIIME 2 version |\n", "| ---- | ---- | ---- |\n", "| Qurro | `>= 1` | `>= 2020.11` |\n", "| Songbird | `< 1` | `>= 2019.7, <= 2020.6` |\n", "\n", "This means that installing Qurro and Songbird into the same conda environment is not feasible. However, it's possible to install them into separate conda environments; the differentials output by Songbird are still completely compatible with Qurro.\n", "\n", "__To get around this issue for the purposes of this tutorial__, we will run Songbird from within one QIIME 2 conda environment (version `2020.6`) and run Qurro from within another QIIME 2 conda environment (version `2022.2`). (Getting Jupyter and conda to play nicely can be a bit of a pain, but the `nb_conda_kernels` package should help make it easier to switch between conda environments within a notebook. 
That being said, it'll probably be easier to replicate these analyses outside of a Jupyter notebook.)\n", "\n", "For the most up-to-date details about how to install and run Songbird, please see [its documentation](https://github.com/biocore/songbird/).\n", "\n", "## Requirements\n", "This notebook relies on two QIIME 2 conda environments being installed, as discussed above: one containing Songbird, and one containing Qurro. See above for details on the exact versions required." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setting up\n", "In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Clear the output directory so we can write these files there\n", "!rm -rf output\n", "# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist\n", "# (if it does already exist, -p ensures that an error won't be thrown)\n", "!mkdir -p output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Using Songbird and Qurro through QIIME 2 (using two conda environments)\n", "\n", "### 1. A. Using Songbird through QIIME 2 (`>= 2019.7, <= 2020.6`)\n", "\n", "__This should be run from a QIIME 2 conda environment in which Songbird (but not Qurro) is installed.__\n", "\n", "If you just installed Songbird, it's advised that you run `qiime dev refresh-cache` on your system afterwards in order to get QIIME 2 to \"find\" its QIIME 2 plugin.\n", "\n", "In order to use this dataset's BIOM table in QIIME 2, we need to import it as a `FeatureTable[Frequency]` QIIME 2 artifact." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mImported input/redsea.biom as BIOMV210DirFmt to output/redsea.biom.qza\u001b[0m\r\n" ] } ], "source": [ "!qiime tools import \\\n", " --input-path input/redsea.biom \\\n", " --output-path output/redsea.biom.qza \\\n", " --type FeatureTable[Frequency]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can run Songbird through QIIME 2 on our imported BIOM table. This produces three output files, but the main one we care about for Qurro is the `FeatureData[Differential]` artifact (which will be stored in `output/differentials.qza`). This artifact contains **feature differentials**: as Songbird's documentation puts it, these correspond to \"...the ordering of the coefficients within a covariate.\"\n", "\n", "Please see [Songbird's documentation](https://github.com/biocore/songbird/) for more information about how it works and how its output files are formatted.\n", "\n", "#### Why these hyperparameters?\n", "These hyperparameters (in particular, `epochs` and `differential-prior`) were selected based on experimentation with Tensorboard. See Songbird's [FAQs](https://github.com/biocore/songbird/#faqs) for details on how to use Tensorboard and select these sorts of hyperparameters for your own datasets (this is important, but the question of how to do this is beyond the scope of this tutorial)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mQIIME is caching your current deployment for improved performance. 
This may take a few moments and should only happen once per deployment.\u001b[0m\n", "\u001b[32mSaved FeatureData[Differential] to: output/differentials.qza\u001b[0m\n", "\u001b[32mSaved SampleData[SongbirdStats] to: output/regression-stats.qza\u001b[0m\n", "\u001b[32mSaved PCoAResults % Properties('biplot') to: output/regression-biplot.qza\u001b[0m\n" ] } ], "source": [ "!qiime songbird multinomial \\\n", " --i-table output/redsea.biom.qza \\\n", " --m-metadata-file input/redsea_metadata.txt \\\n", " --p-formula \"Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate\" \\\n", " --p-epochs 10000 \\\n", " --p-differential-prior 0.5 \\\n", " --o-differentials output/differentials.qza \\\n", " --o-regression-stats output/regression-stats.qza \\\n", " --o-regression-biplot output/regression-biplot.qza" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. B. Using Qurro through QIIME 2 (`>= 2020.11`)\n", "\n", "__At this point, you should switch to a newer QIIME 2 environment with which Qurro will be compatible.__\n", "\n", "Since our \"feature rankings\" are the (sorted) feature differentials that Songbird just produced, we'll use the `qiime qurro differential-plot` command." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: \u001b[94mqiime qurro differential-plot\u001b[0m [OPTIONS]\r\n", "\r\n", " Generates an interactive visualization of feature differentials in tandem\r\n", " with a visualization of the log-ratios of selected features' sample\r\n", " abundances.\r\n", "\r\n", "\u001b[1mInputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--i-ranks\u001b[0m ARTIFACT \u001b[32mFeatureData[Differential]\u001b[0m\r\n", " Feature differentials. \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m\u001b[4m--i-table\u001b[0m ARTIFACT \u001b[32mFeatureTable[Frequency]\u001b[0m\r\n", " A BIOM table describing the abundances of the ranked\r\n", " features in samples. 
Note that empty samples and\r\n", " features will be removed from the Qurro visualization.\r\n", " \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mParameters\u001b[0m:\r\n", " \u001b[94m\u001b[4m--m-sample-metadata-file\u001b[0m METADATA...\r\n", " (multiple Sample metadata. In Qurro visualizations, you can use\r\n", " arguments will sample metadata fields to change the x-axis and colors\r\n", " be merged) in the sample plot. \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m--m-feature-metadata-file\u001b[0m METADATA...\r\n", " (multiple Feature metadata (for example, if your features are\r\n", " arguments will ASVs or OTUs, this could be taxonomy). You can use\r\n", " be merged) feature metadata fields to filter features in the rank\r\n", " plot when selecting log-ratios. \u001b[35m[optional]\u001b[0m\r\n", " \u001b[94m--p-extreme-feature-count\u001b[0m INTEGER\r\n", " If specified, Qurro will only use this many \"extreme\"\r\n", " features from both ends of all of the rankings. This is\r\n", " useful when dealing with huge datasets (e.g. with BIOM\r\n", " tables exceeding 1 million entries), for which running\r\n", " Qurro normally might take a long amount of time or\r\n", " crash due to memory limits. Note that the automatic\r\n", " removal of empty samples and features from the table\r\n", " will be done *after* this filtering step. \u001b[35m[optional]\u001b[0m\r\n", " \u001b[94m--p-debug\u001b[0m / \u001b[94m--p-no-debug\u001b[0m\r\n", " If this flag is used, Qurro will output debug\r\n", " messages. 
Note that you'll also need to use the\r\n", " --verbose option to see these messages.\r\n", " \u001b[35m[default: False]\u001b[0m\r\n", "\u001b[1mOutputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--o-visualization\u001b[0m VISUALIZATION\r\n", " \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mMiscellaneous\u001b[0m:\r\n", " \u001b[94m--output-dir\u001b[0m PATH Output unspecified results to a directory\r\n", " \u001b[94m--verbose\u001b[0m / \u001b[94m--quiet\u001b[0m Display verbose output to stdout and/or stderr during\r\n", " execution of this action. Or silence output if\r\n", " execution is successful (silence is golden).\r\n", " \u001b[94m--example-data\u001b[0m PATH Write example data and exit.\r\n", " \u001b[94m--citations\u001b[0m Show citations and exit.\r\n", " \u001b[94m--help\u001b[0m Show this message and exit.\r\n" ] } ], "source": [ "!qiime qurro differential-plot --help" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mSaved Visualization to: output/qurro_plot_q2.qzv\u001b[0m\r\n", "\u001b[0m" ] } ], "source": [ "!qiime qurro differential-plot \\\n", " --i-ranks output/differentials.qza \\\n", " --i-table output/redsea.biom.qza \\\n", " --m-sample-metadata-file input/redsea_metadata.txt \\\n", " --m-feature-metadata-file input/feature_metadata.txt \\\n", " --verbose \\\n", " --o-visualization output/qurro_plot_q2.qzv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! Now, we've created a QZV file (describing a Qurro visualization) at `output/qurro_plot_q2.qzv`. You can view this visualization in one of the following ways:\n", " 1. Upload the QZV file to [view.qiime2.org](https://view.qiime2.org).\n", " 2. View the QZV file using `qiime tools view`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
Using Songbird and Qurro as standalone tools (again, using two conda environments)\n", "We don't need to use Songbird and Qurro through QIIME 2; if you want, you can run these tools outside of QIIME 2. Although this means you don't have access to some of QIIME 2's functionality (e.g. provenance tracking, or artifact semantic types), the results you get should be roughly the same. (We say \"roughly\" because some of the machine learning methods used by Songbird involve randomness.)\n", "\n", "As with the QIIME 2 examples above, Songbird and Qurro are incompatible -- they have conflicting dependencies. We recommend using conda, so that you can install Songbird and Qurro into two separate environments (and switch between these as needed).\n", "\n", "### 2. A. Using Songbird as a standalone tool\n", "\n", "__This should be run from a conda environment in which Songbird (but not Qurro) is installed.__" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/bin/songbird:191: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.\n", "\n", "2022-07-05 21:04:30.548298: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA\n", "2022-07-05 21:04:30.572787: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1999965000 Hz\n", "2022-07-05 21:04:30.573710: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55feb226f620 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n", "2022-07-05 21:04:30.573769: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/bin/songbird:194: The name tf.set_random_seed is deprecated. 
Please use tf.compat.v1.set_random_seed instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:70: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use `tf.random.categorical` instead.\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:81: The name tf.random_normal is deprecated. Please use tf.random.normal instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:86: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01.\n", "Instructions for updating:\n", "The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/tensorflow_core/python/ops/distributions/normal.py:160: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.\n", "Instructions for updating:\n", "The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:95: Multinomial.__init__ (from tensorflow.python.ops.distributions.multinomial) is deprecated and will be removed after 2019-01-01.\n", "Instructions for updating:\n", "The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). 
You should update all references to use `tfp.distributions` instead of `tf.distributions`.\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:110: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:116: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:124: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:125: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:127: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:131: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n", "\n", "WARNING:tensorflow:From /home/marcus/anaconda3/envs/q2-2020.6/lib/python3.6/site-packages/songbird/multinomial.py:163: The name tf.train.Saver is deprecated. 
Please use tf.compat.v1.train.Saver instead.\n", "\n", " 0%| | 0/80000 [00:00