{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building plots from `sourmash compare` output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Running this notebook\n", "\n", "You can run this notebook interactively via mybinder; click on this button:\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/latest?filepath=doc%2Fplotting-compare.ipynb)\n", "\n", "A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under \"Tutorials and notebooks\".\n", "\n", "You can also get this notebook from the [doc/ subdirectory of the sourmash GitHub repository](https://github.com/dib-lab/sourmash/tree/latest/doc). See [binder/environment.yml](https://github.com/dib-lab/sourmash/blob/latest/binder/environment.yml) for installation dependencies.\n", "\n", "### What is this?\n", "\n", "This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can use Shift-Enter to run cells, and double-click on code cells to edit them.\n", "\n", "Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running `sourmash compare` and generating figures in Python\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to generate a similarity matrix with `sourmash compare`. (If you want to generate it programmatically instead, it is just a square `numpy` array.)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "== This is sourmash version 3.3.2.dev9+g462bc387. ==\n", "== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "loaded 1 sigs from '../tests/test-data/demo/SRR2060939_1.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR2060939_2.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR2241509_1.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR2255622_1.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR453566_1.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR453569_1.sig'\n", "loaded 1 sigs from '../tests/test-data/demo/SRR453570_1.sig'\n", "loaded 7 signatures total.\n", "\n", "0-SRR2060939_1.fa...\t[1. 0.356 0.078 0.086 0. 0. 0. ]\n", "1-SRR2060939_2.fa...\t[0.356 1. 0.072 0.078 0. 0. 0. ]\n", "2-SRR2241509_1.fa...\t[0.078 0.072 1. 0.074 0. 0. 0. ]\n", "3-SRR2255622_1.fa...\t[0.086 0.078 0.074 1. 0. 0. 0. ]\n", "4-SRR453566_1.fas...\t[0. 0. 0. 0. 1. 0.382 0.364]\n", "5-SRR453569_1.fas...\t[0. 0. 0. 0. 0.382 1. 0.386]\n", "6-SRR453570_1.fas...\t[0. 0. 0. 0. 0.364 0.386 1. ]\n", "min similarity in matrix: 0.000\n", "saving labels to: compare-demo.labels.txt\n", "saving comparison matrix to: compare-demo\n" ] } ], "source": [ "!sourmash compare ../tests/test-data/demo/*.sig -o compare-demo" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline\n", "# import the `fig` module from sourmash:\n", "from sourmash import fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sourmash.fig` module contains code to load the similarity matrix and associated labels:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "matrix, labels = fig.load_matrix_and_labels('compare-demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `matrix` is a `numpy` array and `labels` is a list with one label per row/column (by default, the names of the files the signatures were computed from)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matrix:\n", " [[1. 0.356 0.078 0.086 0. 0. 0. ]\n", " [0.356 1. 0.072 0.078 0. 0. 0. ]\n", " [0.078 0.072 1. 0.074 0. 0. 0. ]\n", " [0.086 0.078 0.074 1. 0. 0. 0. ]\n", " [0. 0. 0. 0. 1. 0.382 0.364]\n", " [0. 0. 0. 0. 0.382 1. 0.386]\n", " [0. 0. 0. 0. 0.364 0.386 1. ]]\n", "labels: ['SRR2060939_1.fastq.gz', 'SRR2060939_2.fastq.gz', 'SRR2241509_1.fastq.gz', 'SRR2255622_1.fastq.gz', 'SRR453566_1.fastq.gz', 'SRR453569_1.fastq.gz', 'SRR453570_1.fastq.gz']\n" ] } ], "source": [ "print('matrix:\\n', matrix)\n", "print('labels:', labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `plot_composite_matrix` function builds a composite plot (a dendrogram next to a heatmap of the matrix) and returns the matplotlib figure, along with the labels and matrix reordered to match the clustering:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "f, reordered_labels, reordered_matrix = fig.plot_composite_matrix(matrix, labels)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reordered matrix:\n", " [[1. 0.382 0.364 0. 0. 0. 0. ]\n", " [0.382 1. 0.386 0. 0. 0. 0. ]\n", " [0.364 0.386 1. 0. 0. 0. 0. ]\n", " [0. 0. 0. 1. 0.356 0.078 0.086]\n", " [0. 0. 0. 0.356 1. 0.072 0.078]\n", " [0. 0. 0. 0.078 0.072 1. 0.074]\n", " [0. 0. 0. 0.086 0.078 0.074 1. ]]\n", "reordered labels: ['SRR2255622_1.fastq.gz', 'SRR2241509_1.fastq.gz', 'SRR2060939_2.fastq.gz', 'SRR2060939_1.fastq.gz', 'SRR453570_1.fastq.gz', 'SRR453569_1.fastq.gz', 'SRR453566_1.fastq.gz']\n" ] } ], "source": [ "print('reordered matrix:\\n', reordered_matrix)\n", "print('reordered labels:', reordered_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Customizing plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to customize the plots, see the code for `plot_composite_matrix` in [sourmash/fig.py](https://github.com/dib-lab/sourmash/blob/latest/sourmash/fig.py), which is reproduced below; you can modify that copy in place, for example to [use custom dendrogram colors](https://stackoverflow.com/questions/38153829/custom-cluster-colors-of-scipy-dendrogram-in-python-link-color-func)." 
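] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of the custom dendrogram colors mentioned above (illustrative code, not part of sourmash), the next cell builds the same single-linkage tree that `plot_composite_matrix` uses and passes a `link_color_func` callback to `scipy.cluster.hierarchy.dendrogram`; scipy calls it with the id of each merged cluster (n + row of the linkage matrix) and uses the returned color string for that link. The matrix and labels are copied from the `sourmash compare` output above; the two-cluster cut with `fcluster`, the colors, and the hypothetical `link_color` helper are assumptions chosen for illustration. The same `link_color_func=` argument could be added to the `sch.dendrogram` call in the copy of `plot_composite_matrix` below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.cluster.hierarchy as sch\n", "\n", "# Similarity matrix and labels copied from the `sourmash compare` output above\n", "# (inside this notebook you could pass `matrix` and `labels` instead).\n", "D = np.array([\n", "    [1.0, 0.356, 0.078, 0.086, 0.0, 0.0, 0.0],\n", "    [0.356, 1.0, 0.072, 0.078, 0.0, 0.0, 0.0],\n", "    [0.078, 0.072, 1.0, 0.074, 0.0, 0.0, 0.0],\n", "    [0.086, 0.078, 0.074, 1.0, 0.0, 0.0, 0.0],\n", "    [0.0, 0.0, 0.0, 0.0, 1.0, 0.382, 0.364],\n", "    [0.0, 0.0, 0.0, 0.0, 0.382, 1.0, 0.386],\n", "    [0.0, 0.0, 0.0, 0.0, 0.364, 0.386, 1.0],\n", "])\n", "names = ['SRR2060939_1.fastq.gz', 'SRR2060939_2.fastq.gz', 'SRR2241509_1.fastq.gz',\n", "         'SRR2255622_1.fastq.gz', 'SRR453566_1.fastq.gz', 'SRR453569_1.fastq.gz',\n", "         'SRR453570_1.fastq.gz']\n", "\n", "# Same linkage call as in plot_composite_matrix below.\n", "Y = sch.linkage(D, method='single')\n", "n = len(names)\n", "\n", "# Assumption: cut the tree into two flat clusters and give each one a color.\n", "groups = sch.fcluster(Y, t=2, criterion='maxclust')\n", "palette = {1: 'tab:blue', 2: 'tab:orange'}\n", "\n", "# Link id k >= n is the cluster created in row k - n of Y; record its leaves.\n", "leaves_under = {i: [i] for i in range(n)}\n", "for row, (a, b, _, _) in enumerate(Y):\n", "    leaves_under[n + row] = leaves_under[int(a)] + leaves_under[int(b)]\n", "\n", "def link_color(k):\n", "    # Hypothetical helper: color a link by its flat cluster; links joining clusters are grey.\n", "    member_groups = {int(groups[i]) for i in leaves_under[k]}\n", "    if len(member_groups) == 1:\n", "        return palette[member_groups.pop()]\n", "    return 'grey'\n", "\n", "dendro_fig, ax = plt.subplots(figsize=(6, 4))\n", "Z = sch.dendrogram(Y, orientation='left', labels=names,\n", "                   link_color_func=link_color, ax=ax)\n", "ax.set_xticks([])" 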
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# `error` and `notify` are sourmash's logging helpers; `pylab` is also provided by %pylab inline above.\n", "from sourmash.logging import error, notify\n", "from matplotlib import pylab\n", "import scipy.cluster.hierarchy as sch\n", "\n", "def plot_composite_matrix(D, labeltext, show_labels=True, show_indices=True,\n", "                          vmax=1.0, vmin=0.0, force=False):\n", "    \"\"\"Build a composite plot showing dendrogram + distance matrix/heatmap.\n", "    Returns a matplotlib figure.\"\"\"\n", "    if D.max() > 1.0 or D.min() < 0.0:\n", "        error('This matrix doesn\\'t look like a distance matrix - min value {}, max value {}', D.min(), D.max())\n", "        if not force:\n", "            raise ValueError(\"not a distance matrix\")\n", "        else:\n", "            notify('force is set; scaling to [0, 1]')\n", "            D -= D.min()\n", "            D /= D.max()\n", "\n", "    if show_labels:\n", "        show_indices = True\n", "\n", "    fig = pylab.figure(figsize=(11, 8))\n", "    ax1 = fig.add_axes([0.09, 0.1, 0.2, 0.6])\n", "\n", "    # plot dendrogram\n", "    Y = sch.linkage(D, method='single')  # centroid\n", "\n", "    dendrolabels = labeltext\n", "    if not show_labels:\n", "        dendrolabels = [str(i) for i in range(len(labeltext))]\n", "\n", "    Z1 = sch.dendrogram(Y, orientation='left', labels=dendrolabels,\n", "                        no_labels=not show_indices)\n", "    ax1.set_xticks([])\n", "\n", "    xstart = 0.45\n", "    width = 0.45\n", "    if not show_labels:\n", "        xstart = 0.315\n", "    scale_xstart = xstart + width + 0.01\n", "\n", "    # plot matrix\n", "    axmatrix = fig.add_axes([xstart, 0.1, width, 0.6])\n", "\n", "    # (this reorders D by the clustering in Z1)\n", "    idx1 = Z1['leaves']\n", "    D = D[idx1, :]\n", "    D = D[:, idx1]\n", "\n", "    # show matrix\n", "    im = axmatrix.matshow(D, aspect='auto', origin='lower',\n", "                          cmap=pylab.cm.YlGnBu, vmin=vmin, vmax=vmax)\n", "    axmatrix.set_xticks([])\n", "    axmatrix.set_yticks([])\n", "\n", "    # Plot colorbar.\n", "    axcolor = fig.add_axes([scale_xstart, 0.1, 0.02, 0.6])\n", "    pylab.colorbar(im, cax=axcolor)\n", "\n", "    return fig" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "_ = plot_composite_matrix(matrix, labels)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }