{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Some sourmash command line examples!\n", "\n", "[sourmash](https://sourmash.readthedocs.io/en/latest/) is research software from the Lab for Data Intensive Biology at UC Davis. It implements MinHash and modulo hash.\n", "\n", "Below are some examples of using sourmash. They are computed in a Jupyter Notebook so you can run them yourself if you like!\n", "\n", "Sourmash works on *signature files*, which are just saved collections of hashes.\n", "\n", "Let's try it out!\n", "\n", "### Running this notebook.\n", "\n", "You can run this notebook interactively via mybinder; click on this button:\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dib-lab/sourmash/latest?filepath=doc%2Fsourmash-examples.ipynb)\n", "\n", "A rendered version of this notebook is available at [sourmash.readthedocs.io](https://sourmash.readthedocs.io) under \"Tutorials and notebooks\".\n", "\n", "You can also get this notebook from the [doc/ subdirectory of the sourmash github repository](https://github.com/dib-lab/sourmash/tree/latest/doc). See [binder/environment.yaml](https://github.com/dib-lab/sourmash/blob/latest/binder/environment.yml) for installation dependencies.\n", "\n", "### What is this?\n", "\n", "This is a Jupyter Notebook using Python 3. If you are running this via [binder](https://mybinder.org), you can use Shift-ENTER to run cells, and double click on code cells to edit them.\n", "\n", "Contact: C. Titus Brown, ctbrown@ucdavis.edu. Please [file issues on GitHub](https://github.com/dib-lab/sourmash/issues/) if you have any questions or comments!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute scaled signatures" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Ksetting num_hashes to 0 because --scaled is set\n", "\u001b[Kcomputing signatures for files: genomes/akkermansia.fa, genomes/shew_os185.fa, genomes/shew_os223.fa\n", "\u001b[KComputing signature for ksizes: [21, 31, 51]\n", "\u001b[KComputing only nucleotide (and not protein) signatures.\n", "\u001b[KComputing a total of 3 signature(s).\n", "\u001b[K... reading sequences from genomes/akkermansia.fa\n", "\u001b[Kcalculated 3 signatures for 1 sequences in genomes/akkermansia.fa\n", "\u001b[Ksaved 3 signature(s). Note: signature license is CC0.\n", "\u001b[K... reading sequences from genomes/shew_os185.fa\n", "\u001b[Kcalculated 3 signatures for 1 sequences in genomes/shew_os185.fa\n", "\u001b[Ksaved 3 signature(s). Note: signature license is CC0.\n", "\u001b[K... reading sequences from genomes/shew_os223.fa\n", "\u001b[Kcalculated 3 signatures for 1 sequences in genomes/shew_os223.fa\n", "\u001b[Ksaved 3 signature(s). Note: signature license is CC0.\n" ] } ], "source": [ "!rm -f *.sig\n", "!sourmash compute -k 21,31,51 --scaled=1000 genomes/*.fa --name-from-first -f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This outputs three signature files, each containing three signatures (one calculated at k=21, one at k=31, and one at k=51)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "akkermansia.fa.sig shew_os185.fa.sig shew_os223.fa.sig\r\n" ] } ], "source": [ "ls *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use these signature files for various comparisons." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search multiple signatures with a query\n", "\n", "The below command queries all of the signature files in the directory with the `shew_os223` signature and finds the best Jaccard similarity:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded query: NC_011663.1 Shewanella baltica... (k=31, DNA)\n", "\u001b[Kloaded 3 signatures. \n", "\n", "2 matches:\n", "similarity match\n", "---------- -----\n", "100.0% NC_011663.1 Shewanella baltica OS223, complete genome\n", " 22.8% NC_009665.1 Shewanella baltica OS185, complete genome\n" ] } ], "source": [ "!sourmash search -k 31 shew_os223.fa.sig *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below command uses Jaccard containment instead of Jaccard similarity:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded query: NC_011663.1 Shewanella baltica... (k=31, DNA)\n", "\u001b[Kloaded 3 signatures. \n", "\n", "2 matches:\n", "similarity match\n", "---------- -----\n", "100.0% NC_011663.1 Shewanella baltica OS223, complete genome\n", " 37.3% NC_009665.1 Shewanella baltica OS185, complete genome\n" ] } ], "source": [ "!sourmash search -k 31 shew_os223.fa.sig *.sig --containment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performing all-by-all queries\n", "\n", "We can also compare all three signatures:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[Kdownsampling to scaled value of 1000\n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and produce a similarity matrix that we can use for plotting:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[Kdownsampling to scaled value of 1000\n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n", "\u001b[Ksaving labels to: genome_compare.mat.labels.txt\n", "\u001b[Ksaving distance matrix to: genome_compare.mat\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig -o genome_compare.mat" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloading comparison matrix from genome_compare.mat...\n", "\u001b[K...got 3 x 3 matrix.\n", "\u001b[Kloading labels from genome_compare.mat.labels.txt\n", "\u001b[Ksaving histogram of matrix values => genome_compare.mat.hist.png\n", "\u001b[Kwrote dendrogram to: genome_compare.mat.dendro.png\n", "\u001b[Kwrote numpy distance matrix to: genome_compare.mat.matrix.png\n", "0\tCP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome\n", "1\tNC_009665.1 Shewanella baltica OS185, complete genome\n", "2\tNC_011663.1 Shewanella baltica OS223, complete genome\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABEwAAAMgCAYAAAA5kPcVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAIABJREFUeJzs3X9sX9V9N/DP/Rpik1CnVBnOj3nKuq7rgJJAGJYL7Fklj2ytUvWPdllLGxa1VG3JRPGqUtYSU6oROrook5ouKiWjW0Gk4uk2GCgVzZZntKTLGhppk6ARY2kyiA15GDM4JGb29/nD4D4+CSTxuc61b16v6P7h6+/3fI8tLkre933PKZrNZjMAAAAAGNeoegIAAAAA043ABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAAAAEgITAAAAgITABAAAACAhMAEAAABICEwAAABmuH/6p3+KFStWxMKFC6Moivjbv/3b475n+/btcfHFF0dra2u87W1vi7vuumvqJwoziMAEAABghhsaGoolS5bExo0bT+j1//Ef/xHvfe97493vfnfs3r07PvOZz8THP/7x+N73vjfFM4WZo2g2m82qJwEAAEA5iqKIv/mbv4n3v//9r/uaG264IR588MH4t3/7t/Fzv//7vx8vvPBCbN269VRME6a9M6qeAGNGR0fjmWeeiTe96U1RFEXV04Faazab8eKLL8bChQuj0Zj6op3rGwBmlpP9u8Lhw4djeHh4SuaR/t2htbU1Wltbs8fesWNH9PT0TDi3fPny+MxnPpM9NtSFwGSaeOaZZ6Kzs7PqacBpZf/+/fGLv/iLU/45rm8AmJlO5O8Khw8fjrPmnBMxerj0zz/77LPjpZdemnCur68vbr755uyx+/v7o6OjY8K5jo6OGBwcjJdffjnOOuus7M+AmU5gMk286U1vioix/ym3t7dXPBuot8HBwejs7By/7qbaz6/v/xPt7Wefks+E49n9f5+segowwZVXfKfqKcC45ugrMXzg/hP6u8Lw8HDE6OFoXfC+iMaZ5U1i9JV46cD9R/37oIx2CXBiBCbTxGtVu/b2doEJnCKn6vGYn1/fZwtMmDbOHp5d9RRggqLMf2hCSU7m7wpFS2up/x03i7FHgabq3wfz58+PgYGBCecGBgaivb1duwReZZccAACA00x3d3ds27ZtwrmHH344uru7K5oRTD8CEwAAgExFFFFEo8Tj5JqwL730UuzevTt2794dEWPbBu/evTv27dsXERE33nhjrFq1avz1n/zkJ+Opp56Kz33uc/HEE0/E17/+9fjOd74T119/fXm/FJjhBCYAAAAz3I9//OO46KKL4qKLLoqIiN7e3rjoooti7dq1ERFx4MCB8fAkIuKXf/mX48EHH4yHH344lixZEn/2Z38W3/zmN2P58uWVzB+mI2uYAAAAZCqKRhRFifejT3Ks3/qt34pms/m637/rrruO+Z6f/OQnJzszOG1omAAAAAAkNEwAAAAyVd0wAcrnKgQAAABIaJgAAABkKooiiuLkdrY5zoDljQVMioYJAAAAQELDBAAAIFsjyr0f7d42VM1VCAAAAJDQMAEAAMhklxyoH4EJAABAJoEJ1I+rEAAAACChYQIAAJCpiEYUFn2FWnEVAgAAACQ0TAAAADJZwwTqx1UIAAAAkNAwAQAAyFREyQ0T97ahcq5CAAAAgISGCQAAQK6iKLVh0iyK0sYCJkfDBAAAACChYQIAAJCpePVPmeMB1dIwAQAAAEhomAAAAGQqinJ3ySl3xx1gMlyFAAAAAAkNEwAAgEwaJlA/AhMAAIBMAhOoH1chAAAAQELDBAAAIFsjyr0f7d42VM1VCAAAAJDQMAEAAMhkDROoH1chAAAAQELDBAAAIJOGCdSPqxAAAAAgoWECAACQqYhGFCXejy5zLGByXIUAAAAACQ0TAACATEVRlLyGSVHaWMDkaJgAAAAAJDRMAAAAMo01TMprhWiYQPU0TAAAAAASGiYAAACZiqJR8hom7m1D1QQmAAAAmWwrDPXjKgQAAABIaJgAAABk8kgO1I+rEAAAACChYQIAAJBJwwTqx1UIAAAAkNAwAQAAyGSXHKgfVyEAAABAQsMEAAAgV9EYO8ocD6iUqxAAAAAgoWECAACQyS45UD+uQgAAAICEhgkAAECmoiiiKIpSxwOqpWECAAAAkNAwAQAAyFREEUWJ96OL0DCBqglMAAAAcpW86KtthaF6rkIAAACAhIYJAABArqIYO8ocD6iUhgkAAABAQsMEAAAgVxHl3o4eLXEsYFI0TAAAAAASGiYAAAC5rGECtaNhAgAAAJDQMAEAAMilYQK1o2ECAAAAkNAwAQAAyNWIcm9Hu7UNlXMZAgAAACQ0TAAAAHIVRTStYQK1omECAAAAkBCYABzHv/zLv8SaNWvi/PPPjzlz5sQv/dIvxe/93u/Fnj17qp4aADBdFFNwAJXySA7AcXzlK1+JH/7wh/HBD34wLrzwwujv74+vfe1rcfHFF8ePfvSjuOCCC6qeIgBQtUYxdpQ5HlApgQnAcfT29sY999wTs2bNGj+3cuXKeOc73xm33XZbfPvb365wdgAAwFQQmAAcx7ve9a6jzv3qr/5qnH/++fH4449XMCMAYNopinIXarXoK1ROYEIlms2IQ4eqngWnq6Gh/DGazWYMDAzE+eefnz8YAAAw7QhMOOWazYjLL4949NGqZwKTd/fdd8fTTz8dt9xyS9VTAQCmg7IXalUwgcrZJYdT7tAhYQkz2xNPPBHXXnttdHd3x9VXX131dAAAgCmgYUKlBgYi5sypehacbgYHIxYunNx7+/v7473vfW/MnTs37rvvvmhpaSl3cgDAzGSXHKgdgQmVmjNHYMKpNzIyuff993//d/zu7/5uvPDCC/HII4/EwsmmLgAAwLQnMAE4AYcPH44VK1bEnj174vvf/36cd955VU8JAJhO7JIDtSMwATiOkZGRWLlyZezYsSP+7u/+Lrq7u6ueEgAAMMUEJgDH8Ud/9Edx//33x4oVK+L555+Pb3/72xO+/5GPfKSimQEA04ZdcqB2BCYAx7F79+6IiHjggQfigQceOOr7AhMAAKgfgQnAcWzfvr3qKQAA051dcqB2GlVPAAAAAGC60TABAADIZQ0TqB2BCQAAQKZmFNEscSvgpsQEKueRHAAAAICEwAQAACDXa4u+lnmcpI0bN8bixYujra0turq6YufOnW/4+g0bNsSv/dqvxVlnnRWdnZ1x/fXXx+HDhyf7G4DaEZgAAADMcFu2bIne3t7o6+uLxx57LJYsWRLLly+PZ5999pivv+eee+Lzn/989PX1xeOPPx533nlnbNmyJf74j//4FM8cpi+BCQAAQK5iCo6TsH79+rjmmmti9erVcd5558WmTZti9uzZsXnz5mO+/tFHH43LLrssPvzhD8fixYvjyiuvjA996EPHbaXA6URgAgAAME0NDg5OOI4cOXLUa4aHh2PXrl3R09Mzfq7RaERPT0/s2LHjmOO+613vil27do0HJE899VQ89NBD8Z73vGdqfhCYgeySAwAAkKsoxo4yx4uIzs7OCaf7+vri5ptvnnDu4MGDMTIyEh0dHRPOd3R0xBNPPHHM4T/84Q/HwYMH4/LLL49msxn/8z//E5/85Cc9kgP/H4EJAADANLV///5ob28f/7q1tbWUcbdv3x633nprfP3rX4+urq548skn47rrrosvf/nLcdNNN5XyGTDTCUwAAAByTXJnmzccLyLa29snBCbHMm/evGhpaYmBgYEJ5wcGBmL+/PnHfM9NN90UH/3oR+PjH/94RES8853vjKGhofjEJz4RX/jCF6LRsHoDuAoAAABmsFmzZsWyZcti27Zt4+dGR0dj27Zt0d3dfcz3HDp06KhQpKWlJSIims3m1E0WZhANEwAAgFyT2NnmuOOdhN7e3rj66qvjkksuiUsvvTQ2bNgQQ0NDsXr16oiIWLVqVSxatCjWrVsXERErVqyI9evXx0UXXTT+SM5NN90UK1asGA9O4HQnMAEAAJjhVq5cGc8991ysXbs2+vv7Y+nSpbF169bxhWD37ds3oVHyxS9+MYqiiC9+8Yvx9NNPxy/8wi/EihUr4k/+5E+q+hFg2hGYAAAA5JqiXXJOxpo1a2LNmjXH/N727dsnfH3GGWdEX19f9PX1TWZ2cFqwhgkAAABAQsMEAAAg1zRomADlEpgAAADkakS5/X3PAkDlXIYAAAAACQ0TAACAXEWU/EhOeUMBk6NhAgAAAJDQMAEAAMhVRLmtEA0TqJyGCQAAAEBCwwQAACBTs1FEs1FeLaTMsYDJ0TABAAAASGiYAAAA5CqKknfJ0TCBqmmYAAAAACQ0TAAAAHLZJQdqR8MEAAAAIKFhAgAAkKsoIsrc2cYaJlA5DRMAAACAhIYJAABALrvkQO0ITAAAAHJZ9BVqxyM5AAAAAAkNEwAAgFyNkhd9LXMsYFI0TAAAAAASGiYAAAC5NEygdjRMAAAAABIaJgAAAJmaxdhR5nhAtTRMAAAAABIaJgAAALmsYQK1o2ECAAAAkNAwAQAAyFUUY0eZ4wGV0jABAAAASGiYAAAA5LKGCdSOhgkAAABAQsMEAAAgVyPKvR3t1jZUTmACAACQy6KvUDtySwAAAICEhgkAAEAui75C7WiYAAAAACQ0TAAAADI1o4hmieuONEPDBKqmYQIAAACQ0DABAADIZVthqB2XIQAAAEBCwwQAACCXXXKgdjRMAAAAABIaJgAAALmKYuwoczygUhomAAAAAAkNEwAAgFzWMIHa0TABAAAASGiYAAAA5CpePcocD6iUwAQAACBTs1FEs8THaMocC5gcj+QAAAAAJDRMAAAAcln0FWpHwwQAAAAgoWECAACQqyjGjjLHAyqlYQIAAACQ0DABAADI1Yhyb0e7tQ2VcxkCAAAAJDRMZphmM+LQoapnkWdoqOoZAABAyYooeQ2T8oYCJkdgMoM0mxGXXx7x6KNVzwQAAADqTWAygxw6JCwBoByXXXx31VOACX742FVVTwHGvfTiofhfb/3fJ/emooho2CUH6kRgMkMNDETMmVP1LCZnaCiio6PqWQAAAMDrE5jMUHPmzNzABAAAaqdRcsOkzLGASbFLDgAAAEBCwwQAACBTsyiiWeK6I2WOBUyOwAQAACBXI8rt73sWACrnMgQAAABIaJgAAADkKopytwL2SA5UTsMEAAAAIKFhAgAAkMu2wlA7GiYAAAAACQ0TAACAXBomUDsaJgAAAAAJDRMAAIBcxatHmeMBldIwAQAAAEhomAAAAGRqNopolrjuSJljAZOjYQIAAACQ0DABAADIVRRjR5njAZXSMAEAAABIaJgAAADkahRjR5njAZUSmAAAAOSyrTDUjkdyAAAAABIaJgAAAJkaRUSjzNvRGiZQOQ0TAAAAgISGCQAAQCa7CkP9aJgAAAAAJDRMAAAAMmmYQP1omAAAANTAxo0bY/HixdHW1hZdXV2xc+fON3z9Cy+8ENdee20sWLAgWltb4+1vf3s89NBDp2i2MP1pmAAAAGQqiiKKEmshJzvWli1bore3NzZt2hRdXV2xYcOGWL58efz0pz+Nc88996jXDw8Px2//9m/HueeeG/fdd18sWrQofvazn8Wb3/zmsn4EmPEEJgAAADPc+vXr45prronVq1dHRMSmTZviwQcfjM2bN8fnP//5o16/efPmeP755+PRRx+NM888MyIiFi9efCqnDNOeR3IAAAAyvbaGSZnHiRoeHo5du3ZFT0/P+LlGoxE9PT2xY8eOY77n/vvvj+7u7rj22mujo6MjLrjggrj11ltjZGQk91cBtaFhAgAAME0NDg5O+Lq1tTVaW1snnDt48GCMjIxER0fHhPMdHR3xxBNPHHPcp556Kv7hH/4hrrrqqnjooYfiySefjE9/+tPxyiuvRF9fX7k/BMxQGiYAAACZpqph0tnZGXPnzh0/1q1bV8p8R0dH49xzz41vfOMbsWzZsli5cmV84QtfiE2bNpUyPtSBhgkAAMA0tX///mhvbx//Om2XRETMmzcvWlpaYmBgYML5gYGBmD9//jHHXbBgQZx55pnR0tIyfu7Xf/3Xo7+/P4aHh2PWrFkl/QQwc2mYAAAA5GpEFCUer/1Lrb29fcJxrMBk1qxZsWzZsti2bdv4udHR0di2bVt0d3cfc7qXXXZZPPnkkzE6Ojp+bs+ePbFgwQJhCbxKYAIAAJCpykVfIyJ6e3vjjjvuiG9961vx+OOPx6c+9akYGhoa3zVn1apVceONN46//lOf+lQ8//zzcd1118WePXviwQcfjFtvvTWuvfbaMn8tMKN5JAcAAGCGW7lyZTz33HOxdu3a6O/vj6VLl8bWrVvHF4Ldt29fNBo/v1/e2dkZ3/ve9+L666+PCy+8MBYtWhTXXXdd3HDDDVX9CDDtCEwAAAAyNYqxoyzNSYy1Zs2aWLNmzTG/t3379qPOdXd3x49+9KOT/yA4TXgkBwAAACChYQIAAJBpMuuOHG88oFoaJgAAAAAJDRMAAIBMGiZQPxomAAAAAAkNEwAAgExFUURRYi2kzLGAydEwAQAAAEhomAAAAGQqGmNHmeMB1XIZAgAAACQ0TAAAADLZJQfqR8MEAAAAIKFhAgAAkEnDBOpHYAIAAJCpiJIDk/KGAibJIzkAAAAACQ0TAACATI1i7ChLU8UEKqdhAgAAAJDQMAEAAMhk0VeoHw0TAAAAgISGCQAAQCYNE6gfDRMAAACAhIYJAABApqJRRFHiNjlljgVMjoYJAAAAQELDBAAAIJM1TKB+NEwAAAAAEhomAAAAmTRMoH40TAAAAAASGiYAAAC5Sm6YhIYJVE5gAgAAkKlRjB1ljgdUyyM5AAAAAAkNEwAAgEwWfYX60TABAAAASGiYAAAAZCoaY0eZ4wHVchkCAAAAJDRMAAAAMlnDBOpHwwQAAAAgoWECAACQqSiKKEqshZQ5FjA5GiYAJ+Cll16Kvr6++J3f+Z14y1veEkVRxF133VX1tAAAgCkiMAE4AQcPHoxbbrklHn/88ViyZEnV0wEAppnX1jAp8wCq5ZEcgBOwYMGCOHDgQMyfPz9+/OMfx2/8xm9UPSUAAGAKCUwATkBra2vMnz+/6mkAANOUXXKgfgQmVGpoqOoZcDry3x0AAHA8AhMq1dFR9QwAACCfhgnUj0VfOeVmz4647LKqZwEAAOVpFOUfQLU0TDjliiLikUciDh2qeiacrgYHIxYurHoWAADAdCYwoRJFETFnTtWz4HQ1MlL1DACAumlEua0QjwJA9VyHAAAAAAkNEwAAgEyNohmNolnqeEC1BCYAJ+hrX/tavPDCC/HMM89ERMQDDzwQ//mf/xkREX/4h38Yc+fOrXJ6AABAiQQmACfoq1/9avzsZz8b//q73/1ufPe7342IiI985CMCEwA4jZW9s41dcqB6AhOAE7R3796qpwAAAJwiAhMAAIBMRZS7o4aCCVTPLjkAAAAACQ0TAACATHbJgfrRMAEAAABIaJgAAABksksO1I+GCQAAAEBCwwQAACBTI8q9G+3ONlRPYAIAAJDJIzlQP4JLAAAAgISGCQAAQKaiaEZR4lbAZY4FTI6GCQAAAEBCwwQAACCTNUygfjRMAAAAABIaJgAAAJlsKwz14zoEAAAASGiYAAAAZGoUzWiUuLNNmWMBk6NhAgAAAJDQMAEAAMhklxyoHw0TAAAAgISGCQAAQKYiyr0brWAC1dMwAQAAAEhomAAAAGSyhgnUj8AEAAAgk22FoX48kgMAAACQ0DABAADI5JEcqB8NEwAAAICEhgkAAECmRpR7N9qdbaie6xAAAAAgoWECAACQyS45UD8aJgAAAAAJDRMAAIBMdsmB+tEwAQAAAEhomAAAAGTSMIH60TABAAAASGiYAAAAZGpEuXej3dmG6rkOAQAAABICEwAAgExF0YxGiUdRNE96Dhs3bozFixdHW1tbdHV1xc6dO0/offfee28URRHvf//7T/ozoc4EJgAAAJleW/S1zONkbNmyJXp7e6Ovry8ee+yxWLJkSSxfvjyeffbZN3zf3r1747Of/WxcccUVGT891JPABAAAYIZbv359XHPNNbF69eo477zzYtOmTTF79uzYvHnz675nZGQkrrrqqvjSl74Ub33rW0/hbGFmEJgAAABkakzBERExODg44Thy5MhRnz08PBy7du2Knp6en8+n0Yienp7YsWPH6875lltuiXPPPTc+9rGP5fzoUFsCEwAAgGmqs7Mz5s6dO36sW7fuqNccPHgwRkZGoqOjY8L5jo6O6O/vP+a4P/jBD+LOO++MO+64Y0rmDXVgW2EAAIBMjTj5dUeON15ExP79+6O9vX38fGtra/bYL774Ynz0ox+NO+64I+bNm5c9HtSVwAQAAGCaam9vnxCYHMu8efOipaUlBgYGJpwfGBiI+fPnH/X6f//3f4+9e/fGihUrxs+Njo5GRMQZZ5wRP/3pT+NXfuVXSpg9zGweyQEAAMhUvLoVcJnHiZo1a1YsW7Ystm3bNn5udHQ0tm3bFt3d3Ue9/h3veEf867/+a+zevXv8eN/73hfvfve7Y/fu3dHZ2VnK7wRmOg0TAACAGa63tzeuvvrquOSSS+LSSy+NDRs2xNDQUKxevToiIlatWhWLFi2KdevWRVtbW1xwwQUT3v/mN785IuKo83A6E5gAAABkahQlr2FykmOtXLkynnvuuVi7dm309/fH0qVLY+vWreMLwe7bty8aDQ8YwMkQmAAAANTAmjVrYs2aNcf83vbt29/wvXfddVf5E4IZTmACAACQqRHlLhCpCwLVcx0CAAAAJDRMAAAAMjWKZjROYmebExkPqJaGCQAAAEBCwwQAACBT1bvkAOUTmAAAAGQqSg5MCoEJVM4jOQAAAAAJDRMAAIBMLa8eZY4HVEvDBAAAACChYQIAAJDJtsJQPxomAAAAAAkNEwAAgEy2FYb60TABAAAASGiYAAAAZNIwgfrRMAEAAABIaJgAAABkainGjjLHA6qlYQIAAACQ0DABAADIZA0TqB8NEwD73hf9AAAP9klEQVQAAICEhgkAAECmRtGMRtEsdTygWgITAACATEXJj+QUHsmBynkkBwAAACChYQIAAJCp5dWjzPGAammYAAAAACQ0TAAAADLZVhjqR2ACcIqce96tUTTOrHoaEBERL+/7UtVTAJi2Bme9VPUUgGlAYAIAAJDJtsJQP9YwAQAAAEhomAAAAGRqKcaOMscDqqVhAgAAAJDQMAEAAMhklxyoHw0TAAAAgISGCQAAQCYNE6gfDRMAAACAhIYJAABApkaU3DApbyhgkgQmAAAAmRpFM1qKZqnjAdUSXAIAAAAkNEwAAAAyNaLcu9HubEP1XIcAAAAACQ0TAACATLYVhvrRMAEAAABIaJgAAABk0jCB+tEwAQAAAEhomAAAAGRqKSJaimap4wHV0jABAAAASGiYAAAAZLKGCdSPhgkAAABAQsMEAAAgk4YJ1I+GCQAAAEBCwwQAACCThgnUj8AEAAAgU6ModytggQlUzyM5AAAAAAkNEwAAgEyNohmNolnqeEC1NEwAAAAAEhomAAAAmRpR7t1od7aheq5DAAAAgISGCQAAQCbbCkP9aJgAAAAAJDRMAAAAMrUUY0eZ4wHV0jABAAAASGiYAAAAZGoUzWgUzVLHA6qlYQIAAACQ0DABAADIZJccqB8NEwAAAICEhgkAAEAmDROoH4EJAABApkaUW9/3KABUz3UIAAAAkNAwAQAAyFVEFGU+RuORHKichgkAAABAQsMEAAAgUxHllkIUTKB6GiYAAAAACQ0TAACATEXJa5iUuh4KMCkaJgAAAAAJDRMAAIBMjSj3brQ721A91yEAAABAQsMEAAAgU1E0oyiapY4HVEvDBAAAACChYQIAAJCpePUoczygWhomAAAAAAkNEwAAgExFRBQl1kI0TKB6AhMAAIBMHsmB+vFIDgAAAEBCwwQAACBToxg7yhwPqJaGCQAAAEBCwwQAACCTNUygfjRMAAAAamDjxo2xePHiaGtri66urti5c+frvvaOO+6IK664Is4555w455xzoqen5w1fD6cjgQkAAECmoij/OBlbtmyJ3t7e6Ovri8ceeyyWLFkSy5cvj2efffaYr9++fXt86EMfin/8x3+MHTt2RGdnZ1x55ZXx9NNPl/DbgHoQmAAAAMxw69evj2uuuSZWr14d5513XmzatClmz54dmzdvPubr77777vj0pz8dS5cujXe84x3xzW9+M0ZHR2Pbtm2neOYwfQlMAAAAMhVTcEREDA4OTjiOHDly1GcPDw/Hrl27oqenZ/xco9GInp6e2LFjxwnN/9ChQ/HKK6/EW97ylpP90aG2BCYAAADTVGdnZ8ydO3f8WLdu3VGvOXjwYIyMjERHR8eE8x0dHdHf339Cn3PDDTfEwoULJ4QucLqzSw4AAECmqdolZ//+/dHe3j5+vrW1tcRPGXPbbbfFvffeG9u3b4+2trbSx4eZSmACAAAwTbW3t08ITI5l3rx50dLSEgMDAxPODwwMxPz589/wvV/96lfjtttui+9///tx4YUXZs8X6sQjOQAAAJkaRfnHiZo1a1YsW7ZswoKtry3g2t3d/brv+9M//dP48pe/HFu3bo1LLrkk58eHWtIwAQAAmOF6e3vj6quvjksuuSQuvfTS2LBhQwwNDcXq1asjImLVqlWxaNGi8TVQvvKVr8TatWvjnnvuicWLF4+vdXL22WfH2WefXdnPAdOJwAQAACDTVK1hcqJWrlwZzz33XKxduzb6+/tj6dKlsXXr1vGFYPft2xeNxs8fMPiLv/iLGB4ejg984AMTxunr64ubb745c/ZQDwITAACAXEUziqJZ6ngna82aNbFmzZpjfm/79u0Tvt67d+8kJgWnF2uYAAAAACQ0TAAAADJV/UgOUD4NEwAAAICEwATgBBw5ciRuuOGGWLhwYZx11lnR1dUVDz/8cNXTAgCmiaIo/wCqJTABOAF/8Ad/EOvXr4+rrroq/vzP/zxaWlriPe95T/zgBz+oemoAAMAUsIYJwHHs3Lkz7r333rj99tvjs5/9bERErFq1Ki644IL43Oc+F48++mjFMwQAqtaIcu9Gu7MN1XMdAhzHfffdFy0tLfGJT3xi/FxbW1t87GMfix07dsT+/fsrnB0AADAVNExmqKGhqmcAM9fJXj8/+clP4u1vf3u0t7dPOH/ppZdGRMTu3bujs7OzrOkBADNQ2euOWMMEqicwmaE6OqqeAZw+Dhw4EAsWLDjq/GvnnnnmmVM9JQAAYIp5JGcGmT074rLLqp4FnH5efvnlaG1tPep8W1vb+PcBgNNbMQUHUC0NkxmkKCIeeSTi0KGqZwIz2+BgxMKFJ/76s846K44cOXLU+cOHD49/HwAAqBeByQxTFBFz5lQ9C5jZRkZO7vULFiyIp59++qjzBw4ciIiIhSeTvgAAtWQNE6gfj+QAHMfSpUtjz549MTg4OOH8P//zP49/HwAAqBeBCcBxfOADH4iRkZH4xje+MX7uyJEj8Zd/+ZfR1dVlhxwAwBomUEMeyQE4jq6urvjgBz8YN954Yzz77LPxtre9Lb71rW/F3r17484776x6egDANNAoxo4yxwOqJTABOAF/9Vd/FTfddFP89V//dfzXf/1XXHjhhfH3f//38Zu/+ZtVTw0AAJgCAhOAE9DW1ha333573H777VVPBQCYhsp+jEbBBKpnDRMAAACAhIYJAABApqJoRlE0Sx0PqJaGCQAAAEBCwwQAACCTNUygfjRMAAAAABIaJgAAAJmKYuwoczygWhomAAAAAAkNEwAAgEzWMIH60TABAAAASGiYAAAAZGpEuXej3dmG6rkOAQAAABIaJgAAALlK3iXHIiZQPYEJAABANsu+Qt14JAcAAAAgoWECAACQqXj1T5njAdXSMAEAAABIaJgAAABkKopGFEV596PLHAuYHFchAAAAQELDBAAAIJtdcqBuNEwAAAAAEhomAAAAmcb6JWXukgNUTcMEAAAAIKFhAgAAkM0aJlA3GiYAAAAACQ0TAACATEXRiKIo7350mWMBk+MqBAAAAEhomAAAAGSzhgnUjcAEAAAgU/HqnzLHA6rlkRwAAACAhIYJAABAJg0TqB8NEwAAAICEhgkAAEC2RpR7P9q9baiaqxAAAAAgoWECAACQqSiKKIoS1zApcSxgcjRMAAAAABIaJgAAANmKV48yxwOqpGECAAAA/L/27t81ij8N4PgzKySLRgMiZokGbAQVRdFCYnNNIJbpxEYQ8R8IWCiJEewFIYJNLIUgXwhXiCBpDQT8UVjYCVrcRq2y921yaK64xCPP5b6YzCSbrK9X2GaYzM4WEzbvefazJCZMAAAASipWfqo8HtBeJkwAAAAAEhMmAAAApdWi2vvR7m1Du7kKAQAAABITJgAAACVZwwQ6j2ACAABQUlEUURQVBpMKjwVsjo/kAAAAACQmTAAAAEorVh5VHg9oJxMmAAAAAIkJEwAAgJL+s+RrdfejLfoK7WfCBAAAACAxYQIAAFCaNUyg05gwAQAAAEhMmAAAAJRUFEUURXVTIVUeC9gcEyYAAAAAiQkTAACA0qxhAp3GhAkAAABAYsIEAACgpCJqUVR4P7rKYwGb4yoEAAAASEyYAAAAlGYNE+g0ggkAAEBJxcpPlccD2stHcgAAAAASEyYAAAAlFUURRVHhhEmFxwI2x4QJAAAAQGLCBAAAoLRaVHs/2r1taDdXIQAAAEBiwgQAAKAk35IDnceECQAAAEBiwgQAAKC0YuVR5fGAdjJhAgAAAJCYMAEAACipKIooigrXMKnwWMDmmDABAADoAI8ePYpjx45FvV6Pixcvxvz8/F/u/+zZszhx4kTU6/U4c+ZMPH/+fJvOFHYHwQQAAKC02hY8ft309HSMjo7GxMREvHnzJs6ePRvDw8Px5cuXdfd/9epVXL16NW7cuBFv376NkZGRGBkZiffv32/0hUPHEkwAAAB2uQcPHsTNmzfj+vXrcerUqXj8+HHs3bs3njx5su7+Dx8+jMuXL8etW7fi5MmTcf/+/Th//nxMTk5u85nDzmUNkx1ieXk5IiIWFxfbfCbQ+Vavs9XrbqutPs/yj39ty/PBr1hc/Ge7TwFgx1r9G7mR9wqtxT+jqPCbbVqLf66cy9r/D7q7u6O7u3vNtqWlpXj9+nXcvn3757ZarRZDQ0MxNze37vHn5uZidHR0zbbh4eGYmZmp4vShIwgmO0Sr1YqIiIGBgTafCfw+Wq1W9Pb2bsvzREQs/ePvW/5c8Kt6e/9o9ykA7Hi/8l6hq6srGo1GDAz8rfLn7+np+Z//DyYmJuLevXtrtn379i2+f/8efX19a7b39fXFhw8f1j12s9lcd/9ms1n+xKFDCCY7RH9/f3z+/Dn2799vRWzYYsvLy9FqtaK/v39bns/1DQC7y0beK9Tr9fj48WMsLS1tyXnk9w55ugTYOoLJDlGr1eLo0aPtPg34bWzHZMkq1zcA7D4bea9Qr9ejXq9v4dn8tUOHDsWePXtiYWFhzfaFhYVoNBrr/k6j0djQ/vA7sugrAADALtbV1RUXLlyI2dnZn9t+/PgRs7OzMTg4uO7vDA4Ortk/IuLly5f/d3/4He25lz8ABwAAwK5y4MCBGB8fj4GBgeju7o7x8fF49+5dTE1NRU9PT1y7di3m5+djaGgoIiKOHDkSY2NjsW/fvjh48GBMTk7G9PR0TE1NxeHDh9v8amBn8JEcAACAXe7KlSvx9evXuHv3bjSbzTh37ly8ePHi58Kunz59ilrtvx8wuHTpUjx9+jTGxsbizp07cfz48ZiZmYnTp0+36yXAjlMsb9f3agIAAADsEtYwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEgEEwAAAIBEMAEAAABIBBMAAACARDABAAAASAQTAAAAgEQwAQAAAEj+DSWbqrPCL8a0AAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "!sourmash plot genome_compare.mat\n", "\n", "from IPython.display import Image\n", "Image(filename='genome_compare.mat.matrix.png') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and for the R aficionados, you can output a CSV version of the matrix:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kloaded 3 signatures total. \n", "\u001b[Kdownsampling to scaled value of 1000\n", "\u001b[K\n", "0-CP001071.1 Akke...\t[1. 0. 0.]\n", "1-NC_009665.1 She...\t[0. 1. 0.228]\n", "2-NC_011663.1 She...\t[0. 0.228 1. ]\n", "min similarity in matrix: 0.000\n" ] } ], "source": [ "!sourmash compare -k 31 *.sig --csv genome_compare.csv" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome\",\"NC_009665.1 Shewanella baltica OS185, complete genome\",\"NC_011663.1 Shewanella baltica OS223, complete genome\"\r", "\r\n", "1.0,0.0,0.0\r\n", "0.0,1.0,0.22846441947565543\r\n", "0.0,0.22846441947565543,1.0\r\n" ] } ], "source": [ "!cat genome_compare.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is now a file that you can load into R and examine - see [our documentation](https://sourmash.readthedocs.io/en/latest/other-languages.html) on that." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## working with metagenomes\n", "\n", "Let's make a fake metagenome:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Ksetting num_hashes to 0 because --scaled is set\n", "\u001b[Kcomputing signatures for files: fake-metagenome.fa\n", "\u001b[KComputing signature for ksizes: [31]\n", "\u001b[KComputing only nucleotide (and not protein) signatures.\n", "\u001b[KComputing a total of 1 signature(s).\n", "\u001b[K... reading sequences from fake-metagenome.fa\n", "\u001b[Kcalculated 1 signatures for 3 sequences in fake-metagenome.fa\n", "\u001b[Ksaved 1 signature(s). Note: signature license is CC0.\n" ] } ], "source": [ "!cat genomes/*.fa > fake-metagenome.fa\n", "!sourmash compute -k 31 --scaled=1000 fake-metagenome.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `sourmash gather` command to see what's in it:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K== This is sourmash version 2.0.0a12.dev48+ga92289b. ==\n", "\u001b[K== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==\n", "\n", "\u001b[Kselect query k=31 automatically.\n", "\u001b[Kloaded query: fake-metagenome.fa... (k=31, DNA)\n", "\u001b[Kloaded 3 signatures. \n", "\n", "\n", "overlap p_query p_match\n", "--------- ------- -------\n", "499.0 kbp 38.4% 100.0% CP001071.1 Akkermansia muciniphila AT...\n", "494.0 kbp 38.0% 100.0% NC_009665.1 Shewanella baltica OS185,...\n", "490.0 kbp 23.6% 62.7% NC_011663.1 Shewanella baltica OS223,...\n", "\n", "found 3 matches total;\n", "the recovered matches hit 100.0% of the query\n", "\n" ] } ], "source": [ "!sourmash gather fake-metagenome.fa.sig shew*.sig akker*.sig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other pointers\n", "\n", "[Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)\n", "\n", "[Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)\n", "\n", "[Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)\n", "\n", "## A full list of notebooks\n", "\n", "[An introduction to k-mers for genome comparison and analysis](kmers-and-minhash.ipynb)\n", "\n", "[Some sourmash command line examples!](sourmash-examples.ipynb)\n", "\n", "[Working with private collections of signatures.](sourmash-collections.ipynb)\n", "\n", "[Using the LCA_Database API.](using-LCA-database-API.ipynb)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }