{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Inferring species trees with *tetrad*\n", "\n", "When you install _ipyrad_ a number of analysis tools are installed as well. One of these is the program __tetrad__, which applies the theory of phylogenetic invariants (Lake 1987) to infer quartet trees from a SNP alignment, and then uses the software wQMC to join the quartets into a species tree. This combined approach was first developed by Chifman and Kubatko (2014) in the software *SVDquartets*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required software" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install ipyrad -c ipyrad\n", "## conda install toytree -c eaton-lab" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import ipyparallel as ipp\n", "import toytree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to a cluster" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "connected to 4 cores\n" ] } ], "source": [ "## connect to a running ipyparallel cluster (e.g., started with 'ipcluster start')\n", "ipyclient = ipp.Client()\n", "print(\"connected to {} cores\".format(len(ipyclient)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run tetrad" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading seq array [13 taxa x 14159 bp]\n", "max unlinked SNPs per quartet (nloci): 2777\n" ] } ], "source": [ "## initiate a tetrad object\n", "tet = ipa.tetrad(\n", " name=\"pedic-full\",\n", " seqfile=\"analysis-ipyrad/pedic-full_outfiles/pedic-full.snps.phy\",\n", " mapfile=\"analysis-ipyrad/pedic-full_outfiles/pedic-full.snps.map\",\n", " nboots=100,\n", " )" ] }, { "cell_type": "code", 
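"execution_count": null, "metadata": {}, "outputs": [], "source": [ "## (illustration) tetrad infers a tree for every possible 4-taxon subset (quartet).\n", "## For the 13 taxa loaded above that is 13-choose-4 = 715 quartets, which\n", "## matches the number of induced quartet trees reported when tetrad runs.\n", "from math import factorial\n", "nquartets = factorial(13) // (factorial(4) * factorial(13 - 4))\n", "print(nquartets)" ] }, { "cell_type": "code",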
"execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "host compute node: [4 cores] on oud\n", "inferring 715 induced quartet trees\n", "[####################] 100% initial tree | 0:00:06 | \n", "[####################] 100% boot 100 | 0:01:00 | \n" ] } ], "source": [ "## run tetrad on the cluster\n", "tet.run(ipyclient=ipyclient)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the tree" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[toytree HTML rendering stripped of markup; the drawing showed an unrooted tree of the 13 samples (przewalskii, cyathophylla, superba, cyathophylloides, thamno, and rex taxa) with bootstrap support values of 100 on most nodes and 81, 48, and 35 on three internal nodes]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## plot the resulting unrooted tree with bootstrap supports on nodes\n", "import toytree\n", "tre = toytree.tree(tet.trees.nhx)\n", "canvas, axes = tre.draw(\n", "    width=350, \n", "    node_labels=tre.get_node_values(\"support\"),\n", "    )" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "## save the canvas returned by tre.draw() above as a pdf\n", "import toyplot.pdf\n", "toyplot.pdf.render(canvas, \"analysis-tetrad/tetrad-tree.pdf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What does *tetrad* do differently from *SVDquartets*?\n", "\n", "Not much, currently, but we have plans to expand it. Importantly, however, the code is open source, meaning that anybody can read it and contribute to it, which is not the case for PAUP\\*. *tetrad* is also easy to install with conda, and therefore easy to set up on an HPC cluster or a local machine, and it can be parallelized across an arbitrarily large number of compute nodes while retaining a small memory footprint." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }