{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Qurro with Arbitrary Compositional Data\n", "\n", "Although Qurro was initially designed for use with microbiome sequencing data, it can totally be used on any sort of compositional data. The main challenge is just getting your data formatted properly.\n", "\n", "We're going to demonstrate this by creating a Qurro visualization from \"color composition data for 22 abstract paintings.\" These data were taken from Table 1 of [Aitchison and Greenacre (2002)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1467-9876.00275).\n", "\n", "## Requirements\n", "\n", "This notebook relies on [Qurro](https://github.com/biocore/qurro) and [seaborn](https://seaborn.pydata.org/) being installed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Setting up\n", "\n", "In this section, we replace the output directory with an empty directory. This just lets us run this notebook multiple times, without any tools complaining about overwriting files." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:14.148850Z", "iopub.status.busy": "2022-07-05T21:46:14.147867Z", "iopub.status.idle": "2022-07-05T21:46:15.345873Z", "shell.execute_reply": "2022-07-05T21:46:15.343311Z" } }, "outputs": [], "source": [ "# Clear the output directory so we can write these files there\n", "!rm -rf output\n", "# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist\n", "# (if it does already exist, -p ensures that an error won't be thrown)\n", "!mkdir -p output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Getting the input data ready\n", "\n", "At minimum, three files are needed to generate a Qurro visualization. This section goes into detail on each of these three files, and what they look like for the color composition data.\n", "\n", "### 1.1. 
Feature Table\n", "This is a table of abundance data detailing the frequencies of _features_ in _samples_. Qurro expects this table to be in the [BIOM format](http://biom-format.org/), but fortunately converting TSV files to BIOM [isn't too bad](http://biom-format.org/documentation/biom_conversion.html).\n", "\n", "#### 1.1.1. Wait, hold on, what do you mean by \"features\" and \"samples\"?\n", "In the color composition data, we consider each of the 22 paintings as a *sample*, and each color (e.g. `Red`) as a *feature*.\n", "\n", "#### 1.1.2. Viewing the example file\n", "We've provided a TSV file **`input/color-table.tsv`** containing the color composition data for the 22 paintings. Notice how the columns are samples, and the rows are features." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:15.354818Z", "iopub.status.busy": "2022-07-05T21:46:15.354129Z", "iopub.status.idle": "2022-07-05T21:46:15.750792Z", "shell.execute_reply": "2022-07-05T21:46:15.750251Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>FeatureID</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th><th>...</th><th>13</th><th>14</th><th>15</th><th>16</th><th>17</th><th>18</th><th>19</th><th>20</th><th>21</th><th>22</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Black</th><td>0.125</td><td>0.143</td><td>0.147</td><td>0.164</td><td>0.197</td><td>0.157</td><td>0.153</td><td>0.115</td><td>0.178</td><td>0.164</td><td>...</td><td>0.155</td><td>0.126</td><td>0.199</td><td>0.163</td><td>0.136</td><td>0.184</td><td>0.169</td><td>0.146</td><td>0.200</td><td>0.135</td></tr>\n",
"<tr><th>White</th><td>0.243</td><td>0.224</td><td>0.231</td><td>0.209</td><td>0.151</td><td>0.256</td><td>0.232</td><td>0.249</td><td>0.167</td><td>0.183</td><td>...</td><td>0.251</td><td>0.273</td><td>0.170</td><td>0.196</td><td>0.185</td><td>0.152</td><td>0.207</td><td>0.240</td><td>0.172</td><td>0.225</td></tr>\n",
"<tr><th>Blue</th><td>0.153</td><td>0.111</td><td>0.058</td><td>0.120</td><td>0.132</td><td>0.072</td><td>0.101</td><td>0.176</td><td>0.048</td><td>0.158</td><td>...</td><td>0.091</td><td>0.045</td><td>0.080</td><td>0.107</td><td>0.162</td><td>0.110</td><td>0.111</td><td>0.141</td><td>0.059</td><td>0.217</td></tr>\n",
"<tr><th>Red</th><td>0.031</td><td>0.051</td><td>0.129</td><td>0.047</td><td>0.033</td><td>0.116</td><td>0.062</td><td>0.025</td><td>0.143</td><td>0.027</td><td>...</td><td>0.085</td><td>0.156</td><td>0.076</td><td>0.054</td><td>0.020</td><td>0.039</td><td>0.057</td><td>0.038</td><td>0.120</td><td>0.019</td></tr>\n",
"<tr><th>Yellow</th><td>0.181</td><td>0.159</td><td>0.133</td><td>0.178</td><td>0.188</td><td>0.153</td><td>0.170</td><td>0.176</td><td>0.118</td><td>0.186</td><td>...</td><td>0.161</td><td>0.131</td><td>0.158</td><td>0.144</td><td>0.193</td><td>0.165</td><td>0.156</td><td>0.184</td><td>0.136</td><td>0.187</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows × 22 columns</p>\n",
"</div>
" ], "text/plain": [ " 1 2 3 4 5 6 7 8 9 \\\n", "FeatureID \n", "Black 0.125 0.143 0.147 0.164 0.197 0.157 0.153 0.115 0.178 \n", "White 0.243 0.224 0.231 0.209 0.151 0.256 0.232 0.249 0.167 \n", "Blue 0.153 0.111 0.058 0.120 0.132 0.072 0.101 0.176 0.048 \n", "Red 0.031 0.051 0.129 0.047 0.033 0.116 0.062 0.025 0.143 \n", "Yellow 0.181 0.159 0.133 0.178 0.188 0.153 0.170 0.176 0.118 \n", "\n", " 10 ... 13 14 15 16 17 18 19 20 \\\n", "FeatureID ... \n", "Black 0.164 ... 0.155 0.126 0.199 0.163 0.136 0.184 0.169 0.146 \n", "White 0.183 ... 0.251 0.273 0.170 0.196 0.185 0.152 0.207 0.240 \n", "Blue 0.158 ... 0.091 0.045 0.080 0.107 0.162 0.110 0.111 0.141 \n", "Red 0.027 ... 0.085 0.156 0.076 0.054 0.020 0.039 0.057 0.038 \n", "Yellow 0.186 ... 0.161 0.131 0.158 0.144 0.193 0.165 0.156 0.184 \n", "\n", " 21 22 \n", "FeatureID \n", "Black 0.200 0.135 \n", "White 0.172 0.225 \n", "Blue 0.059 0.217 \n", "Red 0.120 0.019 \n", "Yellow 0.136 0.187 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from qurro._metadata_utils import read_metadata_file\n", "table = read_metadata_file(\"input/color-table.tsv\")\n", "table.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1.3. 
Converting from TSV to BIOM\n", "We need to convert this TSV file to a BIOM file that can be used with Qurro:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:15.753021Z", "iopub.status.busy": "2022-07-05T21:46:15.752869Z", "iopub.status.idle": "2022-07-05T21:46:16.335861Z", "shell.execute_reply": "2022-07-05T21:46:16.333833Z" } }, "outputs": [], "source": [ "!biom convert \\\n", " -i input/color-table.tsv \\\n", " --to-json \\\n", " -o output/color-table.biom" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1.4. Summarizing the newly created BIOM file\n", "The ` | head -4` below just means \"only show the first four lines of the output summary.\" Note that the summary reports six observations (features): `table.head()` above only showed the first five rows of the table." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:16.345322Z", "iopub.status.busy": "2022-07-05T21:46:16.343469Z", "iopub.status.idle": "2022-07-05T21:46:16.919444Z", "shell.execute_reply": "2022-07-05T21:46:16.917542Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Num samples: 22\r\n", "Num observations: 6\r\n", "Total count: 21\r\n", "Table density (fraction of non-zero values): 1.000\r\n" ] } ], "source": [ "!biom summarize-table -i output/color-table.biom | head -4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2. 
Sample Metadata\n", "\n", "This is a file containing descriptive information about samples, where each sample has a row in the file and each sample metadata field has a column in the file. Qurro expects this to be a [TSV file](https://en.wikipedia.org/wiki/Tab-separated_values).\n", "\n", "#### 1.2.1. What sort of \"metadata\" do we have for the color composition data?\n", "We don't have much, honestly. Just from Table 1 in [Aitchison and Greenacre (2002)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1467-9876.00275), all we really know about a given painting is its color composition.\n", "\n", "For illustrative purposes (we need _some_ sort of sample metadata to run Qurro), we've added `proportion_blue`, `proportion_black`, etc. columns to the sample metadata, as well as a `data_source` column which is just `AitchisonGreenacre2002` for all samples. These columns are obviously a bit silly; if we were super interested in studying _why_ certain paintings seem different, you could imagine us taking the time to investigate and then adding in more useful metadata columns like `artist`, `date painted`, `canvas height`, etc.\n", "\n", "#### 1.2.2. Viewing the example file\n", "We've provided an example TSV file, **`input/color-sample-metadata.tsv`**, containing the sample metadata for the color composition data. This file is suitable as-is for use in Qurro as sample metadata.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:16.927027Z", "iopub.status.busy": "2022-07-05T21:46:16.926512Z", "iopub.status.idle": "2022-07-05T21:46:16.967815Z", "shell.execute_reply": "2022-07-05T21:46:16.967201Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>SampleID</th><th>proportion_black</th><th>proportion_white</th><th>proportion_blue</th><th>proportion_red</th><th>proportion_yellow</th><th>proportion_other</th><th>data_source</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>0.125</td><td>0.243</td><td>0.153</td><td>0.031</td><td>0.181</td><td>0.266</td><td>AitchisonGreenacre2002</td></tr>\n",
"<tr><th>2</th><td>0.143</td><td>0.224</td><td>0.111</td><td>0.051</td><td>0.159</td><td>0.313</td><td>AitchisonGreenacre2002</td></tr>\n",
"<tr><th>3</th><td>0.147</td><td>0.231</td><td>0.058</td><td>0.129</td><td>0.133</td><td>0.303</td><td>AitchisonGreenacre2002</td></tr>\n",
"<tr><th>4</th><td>0.164</td><td>0.209</td><td>0.120</td><td>0.047</td><td>0.178</td><td>0.282</td><td>AitchisonGreenacre2002</td></tr>\n",
"<tr><th>5</th><td>0.197</td><td>0.151</td><td>0.132</td><td>0.033</td><td>0.188</td><td>0.299</td><td>AitchisonGreenacre2002</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " proportion_black proportion_white proportion_blue proportion_red \\\n", "SampleID \n", "1 0.125 0.243 0.153 0.031 \n", "2 0.143 0.224 0.111 0.051 \n", "3 0.147 0.231 0.058 0.129 \n", "4 0.164 0.209 0.120 0.047 \n", "5 0.197 0.151 0.132 0.033 \n", "\n", " proportion_yellow proportion_other data_source \n", "SampleID \n", "1 0.181 0.266 AitchisonGreenacre2002 \n", "2 0.159 0.313 AitchisonGreenacre2002 \n", "3 0.133 0.303 AitchisonGreenacre2002 \n", "4 0.178 0.282 AitchisonGreenacre2002 \n", "5 0.188 0.299 AitchisonGreenacre2002 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metadata = read_metadata_file(\"input/color-sample-metadata.tsv\")\n", "metadata.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3. Feature Rankings\n", "\n", "By \"feature rankings,\" we usually mean either the feature loadings in a biplot or \"differentials.\" Please see Qurro's paper ([preprint here](https://www.biorxiv.org/content/10.1101/2019.12.17.880047v1)) for more details on what these terms mean.\n", "\n", "In the next section we're going to generate a biplot for the color composition abundance data using Aitchison PCA, and use the feature loadings in that biplot as the feature rankings." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Generating and visualizing a compositional biplot\n", "\n", "We generate the biplot using Aitchison PCA, wherein we take the [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) of the [center log-ratio transform](https://en.wikipedia.org/wiki/Compositional_data#Center_logratio_transform) of the feature table.\n", "\n", "As you can see, this looks pretty similar to the biplot figures of this data shown in [Aitchison and Greenacre (2002)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1467-9876.00275). Some of the axes are inverted compared to that paper's biplots (i.e. 
here `Red` points to the right and `Blue` points to the left, whereas in the 2002 paper it's the opposite), but the interpretation should be the same.\n", "\n", "(One fun tidbit: if you're wondering why painting `20` here seems incorrectly placed compared to the A&G 2002 paper, it's because there's a small error in some of that paper's figures! See [here](https://github.com/biocore/qurro/pull/282#issuecomment-594256858) for details.)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:16.970486Z", "iopub.status.busy": "2022-07-05T21:46:16.970313Z", "iopub.status.idle": "2022-07-05T21:46:18.346441Z", "shell.execute_reply": "2022-07-05T21:46:18.345998Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from plotting_helper import apca, draw_painting_biplot\n", "\n", "# Perform Aitchison PCA\n", "ordination = apca(table.astype(float))\n", "\n", "# Style and draw the biplot, using the first and second principal components\n", "# https://github.com/jupyter/notebook/issues/3523#issuecomment-534379015\n", "%matplotlib inline\n", "draw_painting_biplot(ordination, \"Axis 1\", \"Axis 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Viewing the loadings from the biplot\n", "When we used Aitchison PCA above, we got a scikit-bio [OrdinationResults](http://scikit-bio.org/docs/latest/generated/skbio.stats.ordination.OrdinationResults.html) object. This contains the sample and feature loadings underlying the biplot that was generated, as well as some additional information. (If you're interested in more details, we encourage you to check out the `plotting_helper.py` code provided in this folder.)\n", "\n", "#### 2.1.1. Feature Loadings" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.348586Z", "iopub.status.busy": "2022-07-05T21:46:18.348359Z", "iopub.status.idle": "2022-07-05T21:46:18.354924Z", "shell.execute_reply": "2022-07-05T21:46:18.354555Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>FeatureID</th><th>Axis 1</th><th>Axis 2</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Black</th><td>0.064761</td><td>-0.544208</td></tr>\n",
"<tr><th>White</th><td>0.020050</td><td>0.724314</td></tr>\n",
"<tr><th>Blue</th><td>-0.541021</td><td>0.119259</td></tr>\n",
"<tr><th>Red</th><td>0.822854</td><td>0.130236</td></tr>\n",
"<tr><th>Yellow</th><td>-0.153330</td><td>0.028251</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Axis 1 Axis 2\n", "FeatureID \n", "Black 0.064761 -0.544208\n", "White 0.020050 0.724314\n", "Blue -0.541021 0.119259\n", "Red 0.822854 0.130236\n", "Yellow -0.153330 0.028251" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ordination.features.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1.2. Sample Loadings" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.356940Z", "iopub.status.busy": "2022-07-05T21:46:18.356770Z", "iopub.status.idle": "2022-07-05T21:46:18.362609Z", "shell.execute_reply": "2022-07-05T21:46:18.362116Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>SampleID</th><th>Axis 1</th><th>Axis 2</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>-0.201856</td><td>0.229034</td></tr>\n",
"<tr><th>2</th><td>-0.030162</td><td>0.070389</td></tr>\n",
"<tr><th>3</th><td>0.287573</td><td>0.124102</td></tr>\n",
"<tr><th>4</th><td>-0.064628</td><td>-0.006208</td></tr>\n",
"<tr><th>5</th><td>-0.159943</td><td>-0.367294</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Axis 1 Axis 2\n", "SampleID \n", "1 -0.201856 0.229034\n", "2 -0.030162 0.070389\n", "3 0.287573 0.124102\n", "4 -0.064628 -0.006208\n", "5 -0.159943 -0.367294" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ordination.samples.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2. Export the ordination information to a file\n", "This will enable us to use the feature loadings contained in this file as feature rankings in Qurro." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.365121Z", "iopub.status.busy": "2022-07-05T21:46:18.364917Z", "iopub.status.idle": "2022-07-05T21:46:18.370805Z", "shell.execute_reply": "2022-07-05T21:46:18.370205Z" } }, "outputs": [ { "data": { "text/plain": [ "'output/apca-ordination.txt'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ordination.write(\"output/apca-ordination.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3. Optional: merge the sample loadings into the sample metadata\n", "As we mentioned before, we don't really have a lot of information about these paintings. One thing we do have now, though, are loadings in the biplot for each sample. You can imagine visualizing these loadings in relation to a selected log-ratio—for example, as shown in the bottom four sub-figures of Fig. 5 in [Martino et al. 2019](https://msystems.asm.org/content/msys/4/1/e00016-19).\n", "\n", "Here we're going to merge these loadings with our previous metadata to generate an augmented metadata file, and we'll use that augmented metadata file in Qurro." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.373142Z", "iopub.status.busy": "2022-07-05T21:46:18.372949Z", "iopub.status.idle": "2022-07-05T21:46:18.390260Z", "shell.execute_reply": "2022-07-05T21:46:18.389692Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th>SampleID</th><th>proportion_black</th><th>proportion_white</th><th>proportion_blue</th><th>proportion_red</th><th>proportion_yellow</th><th>proportion_other</th><th>data_source</th><th>Axis 1</th><th>Axis 2</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>0.125</td><td>0.243</td><td>0.153</td><td>0.031</td><td>0.181</td><td>0.266</td><td>AitchisonGreenacre2002</td><td>-0.201856</td><td>0.229034</td></tr>\n",
"<tr><th>2</th><td>0.143</td><td>0.224</td><td>0.111</td><td>0.051</td><td>0.159</td><td>0.313</td><td>AitchisonGreenacre2002</td><td>-0.030162</td><td>0.070389</td></tr>\n",
"<tr><th>3</th><td>0.147</td><td>0.231</td><td>0.058</td><td>0.129</td><td>0.133</td><td>0.303</td><td>AitchisonGreenacre2002</td><td>0.287573</td><td>0.124102</td></tr>\n",
"<tr><th>4</th><td>0.164</td><td>0.209</td><td>0.120</td><td>0.047</td><td>0.178</td><td>0.282</td><td>AitchisonGreenacre2002</td><td>-0.064628</td><td>-0.006208</td></tr>\n",
"<tr><th>5</th><td>0.197</td><td>0.151</td><td>0.132</td><td>0.033</td><td>0.188</td><td>0.299</td><td>AitchisonGreenacre2002</td><td>-0.159943</td><td>-0.367294</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " proportion_black proportion_white proportion_blue proportion_red \\\n", "SampleID \n", "1 0.125 0.243 0.153 0.031 \n", "2 0.143 0.224 0.111 0.051 \n", "3 0.147 0.231 0.058 0.129 \n", "4 0.164 0.209 0.120 0.047 \n", "5 0.197 0.151 0.132 0.033 \n", "\n", " proportion_yellow proportion_other data_source Axis 1 \\\n", "SampleID \n", "1 0.181 0.266 AitchisonGreenacre2002 -0.201856 \n", "2 0.159 0.313 AitchisonGreenacre2002 -0.030162 \n", "3 0.133 0.303 AitchisonGreenacre2002 0.287573 \n", "4 0.178 0.282 AitchisonGreenacre2002 -0.064628 \n", "5 0.188 0.299 AitchisonGreenacre2002 -0.159943 \n", "\n", " Axis 2 \n", "SampleID \n", "1 0.229034 \n", "2 0.070389 \n", "3 0.124102 \n", "4 -0.006208 \n", "5 -0.367294 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_metadata = metadata.merge(\n", " ordination.samples,\n", " how=\"left\",\n", " left_index=True,\n", " right_index=True,\n", " suffixes=(False, False)\n", ")\n", "merged_metadata.to_csv(\"output/merged-metadata.tsv\", sep=\"\\t\")\n", "merged_metadata.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Running Qurro\n", "\n", "Now that we have everything ready, we can finally use Qurro with this data.\n", "\n", "### 3.1. Listing the available command-line options" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.392380Z", "iopub.status.busy": "2022-07-05T21:46:18.392196Z", "iopub.status.idle": "2022-07-05T21:46:18.888167Z", "shell.execute_reply": "2022-07-05T21:46:18.887080Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: qurro [OPTIONS]\r\n", "\r\n", " Generates a visualization of feature rankings and log-ratios.\r\n", "\r\n", " The resulting visualization contains two plots. 
The first plot shows how\r\n", " features are ranked, and the second plot shows the log-ratio of \"selected\"\r\n", " features' abundances within samples.\r\n", "\r\n", " The visualization is interactive, so which features are \"selected\" to\r\n", " construct log-ratios -- as well as various other properties of the\r\n", " visualization -- can be changed by the user.\r\n", "\r\n", "Options:\r\n", " -r, --ranks TEXT Either feature differentials (contained in a\r\n", " TSV file, where each row describes a feature\r\n", " and each column describes a differential\r\n", " field) or a scikit-bio OrdinationResults\r\n", " file for a biplot (containing feature\r\n", " loadings). When sorted numerically,\r\n", " differentials and feature loadings alike\r\n", " provide 'rankings.' [required]\r\n", "\r\n", " -t, --table TEXT A BIOM table describing the abundances of\r\n", " the ranked features in samples. Note that\r\n", " empty samples and features will be removed\r\n", " from the Qurro visualization. [required]\r\n", "\r\n", " -sm, --sample-metadata TEXT Sample metadata, formatted as a TSV file\r\n", " (where each row describes a sample and each\r\n", " column describes a 'metadata' field, and the\r\n", " first column contains sample IDs). In Qurro\r\n", " visualizations, you can use sample metadata\r\n", " fields to change the x-axis and colors in\r\n", " the sample plot. [required]\r\n", "\r\n", " -fm, --feature-metadata TEXT Feature metadata, formatted as a TSV file\r\n", " (where each row describes a feature and each\r\n", " column describes a 'metadata' field, and the\r\n", " first column contains feature IDs). In Qurro\r\n", " visualizations, you can use feature metadata\r\n", " fields to filter features in the rank plot\r\n", " when selecting log-ratios.\r\n", "\r\n", " -o, --output-dir TEXT Directory to write the HTML/JS/... files\r\n", " defining a Qurro visualization to. 
If this\r\n", " directory already exists, files/directories\r\n", " already within it will be overwritten if\r\n", " necessary. Note that you need to keep the\r\n", " files in this directory together -- moving\r\n", " the index.html file in this directory to\r\n", " another location, without also moving the\r\n", " JS/etc. files, will break the visualization.\r\n", " [required]\r\n", "\r\n", " -x, --extreme-feature-count INTEGER\r\n", " If specified, Qurro will only use this many\r\n", " \"extreme\" features from both ends of all of\r\n", " the rankings. This is useful when dealing\r\n", " with huge datasets (e.g. with BIOM tables\r\n", " exceeding 1 million entries), for which\r\n", " running Qurro normally might take a long\r\n", " amount of time or crash due to memory\r\n", " limits. Note that the automatic removal of\r\n", " empty samples and features from the table\r\n", " will be done *after* this filtering step.\r\n", "\r\n", " --debug If this flag is used, Qurro will output\r\n", " debug messages.\r\n", "\r\n", " --version Show the version and exit.\r\n", " --help Show this message and exit.\r\n" ] } ], "source": [ "!qurro --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2. Generating a Qurro visualization\n", "Our inputs will be the following three files:\n", "\n", " - **Feature table:** The BIOM table we generated in section 1.1.3 above.\n", " \n", " \n", " - **Sample metadata:** The merged metadata file we generated in section 2.3 above.\n", " \n", " \n", " - **Feature rankings:** The feature loadings we exported in section 2.2 above." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2022-07-05T21:46:18.892221Z", "iopub.status.busy": "2022-07-05T21:46:18.891899Z", "iopub.status.idle": "2022-07-05T21:46:19.418979Z", "shell.execute_reply": "2022-07-05T21:46:19.417045Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Successfully generated a visualization in the folder output/qurro-viz/.\r\n" ] } ], "source": [ "!qurro \\\n", " --table output/color-table.biom \\\n", " --sample-metadata output/merged-metadata.tsv \\\n", " --ranks output/apca-ordination.txt \\\n", " --output-dir output/qurro-viz/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3. Interacting with the Qurro visualization\n", "\n", "The command you just ran generated a folder (`output/qurro-viz/`) containing the Qurro visualization. To view the visualization, open the `index.html` file contained within this folder in a web browser. You should see something like this:\n", "\n", "![](imgs/1.png)\n", "\n", "The top-left of the screen contains the **rank plot**: a plot showing the loadings for each feature for a selected axis or principal component. The top-right of the screen contains the **sample plot**: a plot that will show how a selected log-ratio of features looks for all of the samples.\n", "\n", "Things look pretty blank right now, since nothing is selected. Let's fix that!\n", "\n", "#### 3.3.1. Selecting a log-ratio\n", "\n", "One thing that's clear from looking at the biplot visualization we generated earlier is that `Red` and `Blue` seem to differentiate samples along Axis 1. Looking at the rank plot for Axis 1 confirms this -- check out how the magnitudes of the Axis 1 feature loadings for `Red` and `Blue` are larger than those of the other colors.\n", "\n", "So, let's try seeing how the `Red`:`Blue` log-ratio looks in Qurro. 
To select a log-ratio of individual features, you can just click on the rank plot -- the first click sets the new numerator and the second click sets the new denominator. In this case, we're going to click on the rightmost bar (`Red`), and then the leftmost bar (`Blue`).\n", "\n", "![](imgs/2.png)\n", "\n", "#### 3.3.2. Adjusting the sample plot\n", "\n", "The sample plot just looks like a bunch of noise! Mostly, this is because the way the sample plot x-axis is set up doesn't make sense: it's set to `proportion_black` (which we don't really have a reason to expect would be associated with the `Red`:`Blue` log-ratio), and its scale type is set to `Categorical` (even though `proportion_black` is a quantitative field). If we set the sample plot x-axis to `proportion_red`, and adjust some of the other sample plot controls, we get a much more useful visualization:\n", "\n", "![](imgs/3.png)\n", "\n", "So we can see from here that the `Red`:`Blue` log-ratio is strongly correlated with the proportion of `Red` in a given painting. Hopefully this makes sense! All the sample plot is showing is that `ln(r / b)` is correlated with `r`, which is expected: for a fixed `b`, `ln(r / b)` increases as `r` increases.\n", "\n", "But we have some other things we can try out.\n", "\n", "#### 3.3.3. Relating feature log-ratios to sample loadings\n", "\n", "Remember the sample loadings we merged into the metadata a while back? We can use those here to reproduce the sort of figures shown in Fig. 5 of [Martino et al. 2019](https://msystems.asm.org/content/msys/4/1/e00016-19).\n", "\n", "We know that `Red` and `Blue` differentiate samples along Axis 1, so let's look at how the Axis 1 sample loadings are correlated with the `Red`:`Blue` log-ratio. We already have that log-ratio selected, so all we need to do is change the sample plot x-axis field from `proportion_red` to `Axis 1`:\n", "\n", "![](imgs/4.png)\n", "\n", "That's cool. 
We can see that the `Red`:`Blue` log-ratio is highly correlated with the Axis 1 sample loadings of paintings, which confirms our observations from looking at the biplot visualization.\n", "\n", "#### 3.3.4. Trying additional log-ratios\n", "\n", "As an exercise for the reader: try switching the rank plot's `Feature Loading` to Axis 2, then try selecting the log-ratio of `White`:`Black`. How does this look when we view samples' Axis 1 sample loadings? How does this look when we view samples' Axis 2 sample loadings? (The Axis 2 view is shown below.) What differences do you see, and why do you think they arise?\n", "\n", "![](imgs/5.png)\n", "\n", "### 3.4. Finishing up; additional reading\n", "\n", "Qurro has some other functionality that we haven't covered here. We encourage you to check out the interface for yourself!\n", "\n", "There are a lot of ways to visualize compositional data, and a lot of ways to use Qurro in conjunction with a compositional biplot. **Our hope is that you interpret this document as less of a strict guide and more as a starting point for how to use Qurro**. In particular, [Aitchison and Greenacre (2002)](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1467-9876.00275), where we got this color composition data from, goes into considerable depth on how to interpret compositional biplots -- we highly recommend checking that paper out to get a sense of what these results \"mean.\"\n", "\n", "Thanks for reading this tutorial! As always, please feel free to [open an issue](https://github.com/biocore/qurro/) in Qurro's repository (or ask a question on the [QIIME 2 forums](https://forum.qiime2.org/)) if you have any questions, comments, or suggestions." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 2 }