{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing DEICODE feature loadings with Qurro\n", "In this example, we use data from [this Qiita study](https://qiita.ucsd.edu/study/description/10422). It's associated with the following paper:\n", "\n", "Tripathi, A., Melnik, A. V., Xue, J., Poulsen, O., Meehan, M. J., Humphrey, G., ... & Haddad, G. (2018). Intermittent hypoxia and hypercapnia, a hallmark of obstructive sleep apnea, alters the gut microbiome and metabolome. _mSystems, 3_(3), e00020-18.\n", "\n", "## Requirements\n", "This notebook relies on QIIME 2, DEICODE, q2-emperor, and Qurro all being installed. You should be in a QIIME 2 conda environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setting up\n", "In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Clear the output directory so we can write these files there\n", "!rm -rf output\n", "# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist\n", "# (if it does already exist, -p ensures that an error won't be thrown)\n", "!mkdir -p output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Using DEICODE and Qurro through QIIME 2\n", "You can use DEICODE and Qurro inside or outside of QIIME 2. In this section, we'll use DEICODE and Qurro from within QIIME 2; in the next section, we'll use these tools outside of QIIME 2.\n", "\n", "If you just installed DEICODE or Qurro, it's advised that you run `qiime dev refresh-cache` on your system afterwards in order to get QIIME 2 to \"find\" these tools' QIIME 2 plugins.\n", "\n", "### 1. A. 
Using DEICODE through QIIME 2\n", "In order to use this dataset's BIOM table in QIIME 2, we need to import it as a `FeatureTable[Frequency]` QIIME 2 artifact." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mImported input/qiita_10422_table.biom as BIOMV210DirFmt to output/qiita_10422_table.biom.qza\u001b[0m\r\n", "\u001b[0m" ] } ], "source": [ "!qiime tools import \\\n", " --input-path input/qiita_10422_table.biom \\\n", " --output-path output/qiita_10422_table.biom.qza \\\n", " --type FeatureTable[Frequency]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can run DEICODE through QIIME 2 on our imported BIOM table. This produces two output files: a biplot and a distance matrix. (We're going to use Qurro to visualize the feature loadings contained in the biplot output file.)\n", "\n", "Please see [DEICODE's official documentation](https://library.qiime2.org/plugins/deicode) for more information about how it works and how its output files are formatted." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mQIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.\u001b[0m\n", "\u001b[32mSaved PCoAResults % Properties('biplot') to: output/ordination.qza\u001b[0m\n", "\u001b[32mSaved DistanceMatrix to: output/dist_matrix.qza\u001b[0m\n", "\u001b[0m" ] } ], "source": [ "!qiime deicode auto-rpca \\\n", " --i-table output/qiita_10422_table.biom.qza \\\n", " --o-biplot output/ordination.qza \\\n", " --o-distance-matrix output/dist_matrix.qza" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. A. I. 
Optional: Visualizing the DEICODE biplot in [Emperor](https://docs.qiime2.org/2019.1/plugins/available/emperor/)\n", "This step isn't required if you just want to use DEICODE with Qurro. However, it provides some interesting context about the biplot that DEICODE just generated.\n", "\n", "To quote the DEICODE documentation linked above:\n", "> Biplots are exploratory visualization tools that allow us to represent the features (i.e. taxonomy or OTUs) that strongly influence the principal component axis as arrows. The interpretation of the compositional biplot differs slightly from classical biplot interpretation [...] The important features with regard to sample clusters are not a single arrow but [...] the log ratio between features represented by arrows pointing in different directions." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mSaved Visualization to: output/biplot.qzv\u001b[0m\r\n", "\u001b[0m" ] } ], "source": [ "!qiime emperor biplot \\\n", " --i-biplot output/ordination.qza \\\n", " --m-sample-metadata-file input/qiita_10422_metadata.tsv \\\n", " --m-feature-metadata-file input/taxonomy.tsv \\\n", " --o-visualization output/biplot.qzv \\\n", " --p-number-of-features 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `biplot.qzv` file we just generated can be visualized in Emperor (either using `qiime tools view` or by uploading it to [view.qiime2.org](https://view.qiime2.org)). As mentioned above, arrows in the biplot represent features; you can try changing the `--p-number-of-features` parameter to adjust how many arrows are shown in the biplot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. B. Using Qurro through QIIME 2\n", "Since our \"feature rankings\" are the (sorted) feature loadings within the biplot DEICODE just produced, we'll use the `qiime qurro loading-plot` command." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: \u001b[94mqiime qurro loading-plot\u001b[0m [OPTIONS]\r\n", "\r\n", " Generates an interactive visualization of feature loadings in tandem with\r\n", " a visualization of the log-ratios of selected features' sample abundances.\r\n", "\r\n", "\u001b[1mInputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--i-ranks\u001b[0m ARTIFACT \u001b[32mPCoAResults % Properties('biplot')\u001b[0m\r\n", " A biplot containing feature loadings. \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m\u001b[4m--i-table\u001b[0m ARTIFACT \u001b[32mFeatureTable[Frequency]\u001b[0m\r\n", " A BIOM table describing the abundances of the ranked\r\n", " features in samples. Note that empty samples and\r\n", " features will be removed from the Qurro visualization.\r\n", " \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mParameters\u001b[0m:\r\n", " \u001b[94m\u001b[4m--m-sample-metadata-file\u001b[0m METADATA...\r\n", " (multiple Sample metadata. In Qurro visualizations, you can use\r\n", " arguments will sample metadata fields to change the x-axis and colors\r\n", " be merged) in the sample plot. \u001b[35m[required]\u001b[0m\r\n", " \u001b[94m--m-feature-metadata-file\u001b[0m METADATA...\r\n", " (multiple Feature metadata (for example, if your features are\r\n", " arguments will ASVs or OTUs, this could be taxonomy). You can use\r\n", " be merged) feature metadata fields to filter features in the rank\r\n", " plot when selecting log-ratios. \u001b[35m[optional]\u001b[0m\r\n", " \u001b[94m--p-extreme-feature-count\u001b[0m INTEGER\r\n", " If specified, Qurro will only use this many \"extreme\"\r\n", " features from both ends of all of the rankings. This is\r\n", " useful when dealing with huge datasets (e.g. 
with BIOM\r\n", " tables exceeding 1 million entries), for which running\r\n", " Qurro normally might take a long amount of time or\r\n", " crash due to memory limits. Note that the automatic\r\n", " removal of empty samples and features from the table\r\n", " will be done *after* this filtering step. \u001b[35m[optional]\u001b[0m\r\n", " \u001b[94m--p-debug\u001b[0m / \u001b[94m--p-no-debug\u001b[0m\r\n", " If this flag is used, Qurro will output debug\r\n", " messages. Note that you'll also need to use the\r\n", " --verbose option to see these messages.\r\n", " \u001b[35m[default: False]\u001b[0m\r\n", "\u001b[1mOutputs\u001b[0m:\r\n", " \u001b[94m\u001b[4m--o-visualization\u001b[0m VISUALIZATION\r\n", " \u001b[35m[required]\u001b[0m\r\n", "\u001b[1mMiscellaneous\u001b[0m:\r\n", " \u001b[94m--output-dir\u001b[0m PATH Output unspecified results to a directory\r\n", " \u001b[94m--verbose\u001b[0m / \u001b[94m--quiet\u001b[0m Display verbose output to stdout and/or stderr during\r\n", " execution of this action. 
Or silence output if\r\n", " execution is successful (silence is golden).\r\n", " \u001b[94m--example-data\u001b[0m PATH Write example data and exit.\r\n", " \u001b[94m--citations\u001b[0m Show citations and exit.\r\n", " \u001b[94m--help\u001b[0m Show this message and exit.\r\n" ] } ], "source": [ "!qiime qurro loading-plot --help" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "689 feature(s) in the BIOM table were not present in the feature rankings.\n", "These feature(s) have been removed from the visualization.\n", "\u001b[32mSaved Visualization to: output/qurro_plot_q2.qzv\u001b[0m\n", "\u001b[0m" ] } ], "source": [ "!qiime qurro loading-plot \\\n", " --i-ranks output/ordination.qza \\\n", " --i-table output/qiita_10422_table.biom.qza \\\n", " --m-sample-metadata-file input/qiita_10422_metadata.tsv \\\n", " --m-feature-metadata-file input/taxonomy.tsv \\\n", " --verbose \\\n", " --o-visualization output/qurro_plot_q2.qzv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! Now, we've created a QZV file (describing a Qurro visualization) at `output/qurro_plot_q2.qzv`. As with the `biplot.qzv` file created in step 1.A.I. above, you can view this visualization in one of the following ways:\n", " 1. Upload the QZV file to [view.qiime2.org](https://view.qiime2.org).\n", " 2. View the QZV file using `qiime tools view`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Using DEICODE and Qurro as standalone tools\n", "We don't need to use DEICODE and Qurro through QIIME 2; if you want, you can run these tools outside of QIIME 2. Although this means you don't have access to some of QIIME 2's functionality (e.g. provenance tracking, or artifact semantic types), the results you get should be the same.\n", "\n", "### 2. A. 
Using DEICODE as a standalone tool" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "!deicode auto-rpca \\\n", " --in-biom input/qiita_10422_table.biom \\\n", " --output-dir output/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DEICODE has now generated `ordination.txt` and `distance-matrix.tsv` files in the output directory. We can use the `ordination.txt` file with Qurro (when run outside of QIIME 2).\n", "\n", "### 2. B. Using Qurro as a standalone tool\n", "When we used Qurro through QIIME 2, we had to specify the `loading-plot` command in order to let the Qurro QIIME 2 plugin know we were working with feature loadings.\n", "\n", "Now that we're running Qurro outside of QIIME 2, we don't need to specify this; Qurro can accept either feature differentials or feature loadings as input." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: qurro [OPTIONS]\r\n", "\r\n", " Generates a visualization of feature rankings and log-ratios.\r\n", "\r\n", " The resulting visualization contains two plots. The first plot shows how\r\n", " features are ranked, and the second plot shows the log-ratio of \"selected\"\r\n", " features' abundances within samples.\r\n", "\r\n", " The visualization is interactive, so which features are \"selected\" to\r\n", " construct log-ratios -- as well as various other properties of the\r\n", " visualization -- can be changed by the user.\r\n", "\r\n", "Options:\r\n", " -r, --ranks TEXT Either feature differentials (contained in a\r\n", " TSV file, where each row describes a feature\r\n", " and each column describes a differential\r\n", " field) or a scikit-bio OrdinationResults\r\n", " file for a biplot (containing feature\r\n", " loadings). When sorted numerically,\r\n", " differentials and feature loadings alike\r\n", " provide 'rankings.' 
[required]\r\n", "\r\n", " -t, --table TEXT A BIOM table describing the abundances of\r\n", " the ranked features in samples. Note that\r\n", " empty samples and features will be removed\r\n", " from the Qurro visualization. [required]\r\n", "\r\n", " -sm, --sample-metadata TEXT Sample metadata, formatted as a TSV file\r\n", " (where each row describes a sample and each\r\n", " column describes a 'metadata' field, and the\r\n", " first column contains sample IDs). In Qurro\r\n", " visualizations, you can use sample metadata\r\n", " fields to change the x-axis and colors in\r\n", " the sample plot. [required]\r\n", "\r\n", " -fm, --feature-metadata TEXT Feature metadata, formatted as a TSV file\r\n", " (where each row describes a feature and each\r\n", " column describes a 'metadata' field, and the\r\n", " first column contains feature IDs). In Qurro\r\n", " visualizations, you can use feature metadata\r\n", " fields to filter features in the rank plot\r\n", " when selecting log-ratios.\r\n", "\r\n", " -o, --output-dir TEXT Directory to write the HTML/JS/... files\r\n", " defining a Qurro visualization to. If this\r\n", " directory already exists, files/directories\r\n", " already within it will be overwritten if\r\n", " necessary. Note that you need to keep the\r\n", " files in this directory together -- moving\r\n", " the index.html file in this directory to\r\n", " another location, without also moving the\r\n", " JS/etc. files, will break the visualization.\r\n", " [required]\r\n", "\r\n", " -x, --extreme-feature-count INTEGER\r\n", " If specified, Qurro will only use this many\r\n", " \"extreme\" features from both ends of all of\r\n", " the rankings. This is useful when dealing\r\n", " with huge datasets (e.g. with BIOM tables\r\n", " exceeding 1 million entries), for which\r\n", " running Qurro normally might take a long\r\n", " amount of time or crash due to memory\r\n", " limits. 
Note that the automatic removal of\r\n", " empty samples and features from the table\r\n", " will be done *after* this filtering step.\r\n", "\r\n", " --debug If this flag is used, Qurro will output\r\n", " debug messages.\r\n", "\r\n", " --version Show the version and exit.\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!qurro --help" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "689 feature(s) in the BIOM table were not present in the feature rankings.\n", "These feature(s) have been removed from the visualization.\n", "Successfully generated a visualization in the folder output/qurro_plot_standalone/.\n" ] } ], "source": [ "!qurro \\\n", " --ranks output/ordination.txt \\\n", " --table input/qiita_10422_table.biom \\\n", " --sample-metadata input/qiita_10422_metadata.tsv \\\n", " --feature-metadata input/taxonomy.tsv \\\n", " --output-dir output/qurro_plot_standalone/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We just generated a Qurro visualization in the folder `output/qurro_plot_standalone/`. This visualization is analogous to the QZV file we generated above using QIIME 2. You can view this visualization by just opening up `output/qurro_plot_standalone/index.html` in a modern web browser.\n", "\n", "That's it! If you have any more questions about using Qurro, feel free to contact us (see the Qurro README for contact information)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }