{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

ipyrad-analysis toolkit: tetrad

\n", "\n", "The `tetrad` tool is a framework for inferring a species tree topology using quartet-joining on large unlinked SNP data sets. It is particularly optimized for RAD-seq type datasets that are likely to involve a lot of missing data. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load libraries" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad -c conda-forge -c bioconda\n", "# conda install tetrad -c conda-forge" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toytree\n", "import ipcoal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate a random tree with 20 tips and crown age of 10M generations" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
r0r1r2r3r4r5r6r7r8r9r10r11r12r13r14r15r16r17r18r190500000010000000
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tree = toytree.rtree.bdtree(ntips=20, seed=555)\n", "tree = tree.mod.node_scale_root_height(10e6) \n", "tree.draw(scalebar=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate SNPs with missing data and write to database (.seqs.hdf5)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wrote 259208 SNPs to /tmp/test-tet-miss50.snps.hdf5\n" ] } ], "source": [ "# init simulator with one diploid sample from each tip\n", "model = ipcoal.Model(tree, Ne=1e6, nsamples=2, recomb=0)\n", "\n", "# simulate sequence data on 10K loci\n", "model.sim_loci(10000, 50)\n", "\n", "# add missing data (50%)\n", "model.apply_missing_mask(0.5)\n", "\n", "# write results to database file\n", "model.write_snps_to_hdf5(name=\"test-tet-miss50\", outdir='/tmp', diploid=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Infer tetrad tree" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "SNPS = \"/tmp/test-tet-miss50.snps.hdf5\"" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading snps array [20 taxa x 259208 snps]\n", "max unlinked SNPs per quartet [nloci]: 10000\n", "quartet sampler [full]: 4845 / 4845\n" ] } ], "source": [ "tet = ipa.tetrad(\n", " data=SNPS,\n", " name=\"test-tet-miss50\",\n", " workdir=\"/tmp\",\n", " nboots=10, \n", " nquartets=1e6,\n", ")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parallel connection | latituba: 8 cores\n", "initializing quartet sets database\n", "[####################] 100% 0:00:14 | full tree * | mean SNPs/qrt: 3167 \n", "[####################] 100% 0:00:12 | boot rep. 1 | mean SNPs/qrt: 3163 \n", "[####################] 100% 0:00:12 | boot rep. 2 | mean SNPs/qrt: 3161 \n", "[####################] 100% 0:00:12 | boot rep. 3 | mean SNPs/qrt: 3170 \n", "[####################] 100% 0:00:12 | boot rep. 4 | mean SNPs/qrt: 3144 \n", "[####################] 100% 0:00:11 | boot rep. 5 | mean SNPs/qrt: 3172 \n", "[####################] 100% 0:00:11 | boot rep. 6 | mean SNPs/qrt: 3191 \n", "[####################] 100% 0:00:11 | boot rep. 7 | mean SNPs/qrt: 3144 \n", "[####################] 100% 0:00:11 | boot rep. 8 | mean SNPs/qrt: 3153 \n", "[####################] 100% 0:00:11 | boot rep. 9 | mean SNPs/qrt: 3180 \n", "[####################] 100% 0:00:11 | boot rep. 10 | mean SNPs/qrt: 3128 \n" ] } ], "source": [ "tet.run(auto=True, force=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw the inferred tetrad tree" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
10010070100100100100100100100100100100100100100100100r3r2r0r1r4r5r7r6r8r9r11r10r12r13r15r14r16r18r17r19
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tre = toytree.tree(tet.trees.cons)\n", "rtre = tre.root([\"r19\", \"r18\", \"r17\"])\n", "rtre.draw(ts='d', use_edge_lengths=False, node_labels=\"support\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Does this tree match the *true* species tree?" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rfdist = rtre.treenode.robinson_foulds(tree.treenode)[0]\n", "rfdist == 0" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }