{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## (📗) ipyrad Cookbook: `tetrad` species tree inference\n", "\n", "The `ipyrad.analysis` package includes a module and command-line tool called `tetrad` that can be used to infer quartets from very large SNP alignments, and to join the quartets into a species tree for large numbers of samples following the methodology outlined by Chifman and Kubatko (2014) and implemented in `SVDQuartets`. \n", "\n", "The `tetrad` approach varies in several ways: (1) it uses the same mode of parallelization as `ipyrad` and is therefore easy to distribute work across multiple nodes of a large HPC cluster; (2) it can be easily implemented from the command-line tool or in a jupyter-notebook without having to write commands into the large sequence file as a nexus block; (3) it implements a strategy to perform bootstrap re-sampling of RAD loci, and of SNPs within RAD loci; (4) it calculates admixture statistics while it runs (i.e., ABBA-BABA); (5) [coming soon] advanced sampling methods for large trees when the number of quartets is too large to sample. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook setup\n", "This is a Jupyter notebook, a reproducible and executable document. The code in this notebook is Python (2.7), and should be executed in a jupyter-notebook like this one. Execute each cell in order to reproduce our entire analysis. We make use of the [ipyparallel](http://ipyparallel.rtfd.io) Python library to distribute STRUCTURE jobs across processers in parallel. If that is confusing, see our [tutorial]() on using ipcluster with jupyter. The example data set used in this analysis is from the [empirical example ipyrad tutorial](http://ipyrad.readthedocs.io/pedicularis_.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Required software\n", "All software required for this notebook can be installed using conda." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install ipyrad -c ipyrad\n", "## conda install toytree -c eaton-lab" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyrad as ip\n", "import ipyrad.analysis as ipa\n", "import ipyparallel as ipp\n", "import toytree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Connect to an ipyparallel cluster\n", "`tetrad` uses a program called `ipcluster` to distribute work across a computing cluster. When using the command-line program `tetrad` will automatically start up an `ipcluster` instance, however, when using the API we leave it to the use to start the `ipcluster` instance so that you have more fine-tuned control over the parallelization. For this notebook we assume that you will have started an ipcluster instance running. You can start one by running the commented command below in a separate terminal. This will be connected to when you call the `.run()` command further below. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## \n", "## ipcluster start --n=40\n", "##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup `tetrad` tree inference\n", "The first step is to create a named `tetrad` Class object, which requires a minimum of two argument, a name and a sequence file. The sequence that you will typically want to enter is the `'.snps.phy'` file from `ipyrad`, which is a phylip formatted file with *all* SNPs. You can also pass it the `'.snps.map'` file, which tells `tetrad` how to the SNPS are linked within loci, so that a single SNP can be randomly sampled in each bootstrap replicate. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading seq array [13 taxa x 196435 bp]\n", "max unlinked SNPs per quartet (nloci): 39613\n" ] } ], "source": [ "## create a tetrad Class object\n", "tet = ipa.tetrad(\n", " name='tutorial', \n", " data=\"./analysis-ipyrad/pedicularis_outfiles/pedicularis.snps.phy\",\n", " mapfile=\"./analysis-ipyrad/pedicularis_outfiles/pedicularis.snps.map\",\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A large number of additional paremeter settings are available that you can set when you create the `tetrad` Class object, or which you can set afterwards, as we do here. Let's set the `'nboots'` parameter and the `'method'` parameter which tells `tetrad` how to sample quartets. Use tab-completion after the `tet` variable below to see what other options are available. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## set additional parameters\n", "tet.nboots = 10\n", "tet.method = \"all\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Infer the tree\n", "The `'run()'` command distributes the computation across your cluster, print progress, and blocks until the job in complete. The results of the analysis will be written to file and also stored in your `tetrad` Class object, and can be accessed from its `.stats` and `.trees` attributes. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "host compute node: [4 cores] on oud\n", "inferring 715 induced quartet trees\n", "[####################] 100% initial tree | 0:00:04 | \n", "[####################] 100% boot 10 | 0:00:26 | \n" ] } ], "source": [ "tet.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw the tree\n", "Use the `toytree.tree()` function to generate a tree plot of the unrooted consensus tree with bootstrap support values. The command `'draw()'` returns a number of objects, the first of which is the `canvas`. To save the plot as an SVG image use the `toyplot.svg.render()` function like below. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "boots ~/Documents/ipyrad/tests/analysis-tetrad/tutorial.boots\n", "cons ~/Documents/ipyrad/tests/analysis-tetrad/tutorial.cons\n", "nhx ~/Documents/ipyrad/tests/analysis-tetrad/tutorial.nhx\n", "tree ~/Documents/ipyrad/tests/analysis-tetrad/tutorial.tree" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## the newick tree files produced by tetrad\n", "tet.trees" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
33588_przewalskii32082_przewalskii30686_cyathophylla29154_superba41954_cyathophylloides41478_cyathophylloides33413_thamno35236_rex30556_thamno35855_rex40578_rex38362_rex39618_rexidx: 1\n", "name: 1\n", "dist: 100\n", "support: 100100idx: 2\n", "name: 2\n", "dist: 100\n", "support: 100100idx: 3\n", "name: 3\n", "dist: 100\n", "support: 100100idx: 4\n", "name: 4\n", "dist: 100\n", "support: 100100idx: 5\n", "name: 5\n", "dist: 100\n", "support: 100100idx: 6\n", "name: 6\n", "dist: 100\n", "support: 100100idx: 7\n", "name: 7\n", "dist: 91\n", "support: 9191idx: 8\n", "name: 8\n", "dist: 82\n", "support: 8282idx: 9\n", "name: 9\n", "dist: 100\n", "support: 100100idx: 10\n", "name: 10\n", "dist: 100\n", "support: 100100
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
33588_przewalskii32082_przewalskii30686_cyathophylla29154_superba41954_cyathophylloides41478_cyathophylloides33413_thamno35236_rex30556_thamno35855_rex40578_rex38362_rex39618_rexidx: 1\n", "name: 1\n", "dist: 100\n", "support: 10028idx: 2\n", "name: 2\n", "dist: 100\n", "support: 10056idx: 3\n", "name: 3\n", "dist: 100\n", "support: 10018idx: 4\n", "name: 4\n", "dist: 100\n", "support: 10018idx: 5\n", "name: 5\n", "dist: 100\n", "support: 10048idx: 6\n", "name: 6\n", "dist: 100\n", "support: 10048idx: 7\n", "name: 7\n", "dist: 91\n", "support: 9128idx: 8\n", "name: 8\n", "dist: 82\n", "support: 8256idx: 9\n", "name: 9\n", "dist: 100\n", "support: 10018idx: 10\n", "name: 10\n", "dist: 100\n", "support: 10018
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## create a toytree object from the newick file path\n", "tre = toytree.tree(tet.trees.nhx)\n", "\n", "## draw unrooted consensus tree with support values\n", "canvas, axes = tre.draw(\n", " width=300,\n", " node_labels=tre.get_node_values(\"support\"),\n", " );\n", "\n", "## draw unrooted consensus tree with the number of quartets \n", "## that can possibly be informative about any given split\n", "## given the 'true' tree.\n", "canvas, axes = tre.draw(\n", " width=300,\n", " node_labels=tre.get_node_values(\"quartets_total\"),\n", " );" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## save the tree figure in [format]\n", "import toyplot.svg\n", "toyplot.svg.render(canvas, \"analysis-tetrad/pedic-tree.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checkpointed analysis\n", "If you want to add more bootstrap replicates later simply increase the the `'nboots'` attribute and execute `'run()'` again. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "host compute node: [4 cores] on oud\n", "initial tree already inferred\n", "[####################] 100% boot 50 | 0:01:43 | \n" ] } ], "source": [ "tet.nboots = 50\n", "tet.run()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
33588_przewalskii32082_przewalskii30686_cyathophylla29154_superba41954_cyathophylloides41478_cyathophylloides33413_thamno35236_rex30556_thamno35855_rex40578_rex38362_rex39618_rexidx: 1\n", "name: 1\n", "dist: 100\n", "support: 100100idx: 2\n", "name: 2\n", "dist: 100\n", "support: 100100idx: 3\n", "name: 3\n", "dist: 100\n", "support: 100100idx: 4\n", "name: 4\n", "dist: 100\n", "support: 100100idx: 5\n", "name: 5\n", "dist: 100\n", "support: 100100idx: 6\n", "name: 6\n", "dist: 100\n", "support: 100100idx: 7\n", "name: 7\n", "dist: 94\n", "support: 9494idx: 8\n", "name: 8\n", "dist: 84\n", "support: 8484idx: 9\n", "name: 9\n", "dist: 100\n", "support: 100100idx: 10\n", "name: 10\n", "dist: 100\n", "support: 100100
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## load in the tree\n", "tre = toytree.tree(tet.trees.nhx)\n", "\n", "## draw the unrooted consensus tree with support values\n", "canvas, axes = tre.draw(\n", " width=300,\n", " node_labels=tre.get_node_values(\"support\"),\n", " );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load existing tetrad object\n", "Whenever you execute the `'run()'` command your `tetrad` Class object will be saved in your working directory as a JSON file (`{name}.tet.json`). You can load an existing `tetrad` object back into memory by using the `load` argument when you create a `tetrad` object. The default working directory (`workdir`) is `'./analysis-tetrad'` unless you change it. " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading seq array [13 taxa x 196435 bp]\n", "max unlinked SNPs per quartet (nloci): 39613\n" ] }, { "data": { "text/html": [ "
33588_przewalskii32082_przewalskii30686_cyathophylla29154_superba41954_cyathophylloides41478_cyathophylloides33413_thamno35236_rex30556_thamno35855_rex40578_rex38362_rex39618_rexidx: 1\n", "name: 1\n", "dist: 100\n", "support: 100100idx: 2\n", "name: 2\n", "dist: 100\n", "support: 100100idx: 3\n", "name: 3\n", "dist: 100\n", "support: 100100idx: 4\n", "name: 4\n", "dist: 100\n", "support: 100100idx: 5\n", "name: 5\n", "dist: 100\n", "support: 100100idx: 6\n", "name: 6\n", "dist: 100\n", "support: 100100idx: 7\n", "name: 7\n", "dist: 94\n", "support: 9494idx: 8\n", "name: 8\n", "dist: 84\n", "support: 8484idx: 9\n", "name: 9\n", "dist: 100\n", "support: 100100idx: 10\n", "name: 10\n", "dist: 100\n", "support: 100100
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## load an old tetrad object \n", "## (the json file will be found from {workdir}/{name}.tet.json)\n", "oldtet = ipa.tetrad(\n", " name=\"tutorial\", \n", " workdir=\"analysis-tetrad/\", \n", " load=True)\n", "\n", "## draw the tree from it\n", "tre = toytree.tree(oldtet.trees.nhx)\n", "tre.draw(width=300, node_labels=tre.get_node_values(\"support\"));" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }