{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cookbook: TreeMix\n", "\n", "The program [TreeMix](https://bitbucket.org/nygcresearch/treemix/wiki/Home) by [Pickrell & Pritchard (2012)](http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002967) is used to infer population splits and admixture from allele frequency data. From the TreeMix documentation: \"In the underlying model, the modern-day populations in a species are related to a common ancestor via a graph of ancestral populations. We use the allele frequencies in the modern populations to infer the structure of this graph.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required software" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install -c ipyrad ipyrad\n", "## conda install -c ipyrad treemix\n", "## conda install -c eaton-lab toytree" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toytree\n", "import toyplot\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad 0.7.1\n", "toytree 0.1.3\n", "toyplot 0.15.0-dev\n", "TreeMix v. 1.12\r\n" ] } ], "source": [ "print 'ipyrad', ipa.__version__\n", "print 'toytree', toytree.__version__\n", "print 'toyplot', toyplot.__version__\n", "! treemix --version | grep 'TreeMix v. '" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the populations\n", "These two dictionaries will be used to parse out SNPs from the ipyrad SNPs output file (`.snps.phy`), to filter the SNPs for inclusion in the data set based on missing data (minmap), and to group them into populations (imap). " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## a dictionary mapping sample names to 'species' names\n", "imap = {\n", " \"prz\": [\"32082_przewalskii\", \"33588_przewalskii\"],\n", " \"cys\": [\"41478_cyathophylloides\", \"41954_cyathophylloides\"],\n", " \"cya\": [\"30686_cyathophylla\"],\n", " \"sup\": [\"29154_superba\"],\n", " \"cup\": [\"33413_thamno\"],\n", " \"tha\": [\"30556_thamno\"],\n", " \"rck\": [\"35236_rex\"],\n", " \"rex\": [\"35855_rex\", \"40578_rex\"],\n", " \"lip\": [\"39618_rex\", \"38362_rex\"], \n", " }\n", "\n", "## optional: loci will be filtered if they do not have data for at\n", "## least N samples in each species. Minimums cannot be <1.\n", "minmap = {\n", " \"prz\": 2,\n", " \"cys\": 2,\n", " \"cya\": 1,\n", " \"sup\": 1,\n", " \"cup\": 1,\n", " \"tha\": 1, \n", " \"rck\": 1,\n", " \"rex\": 2,\n", " \"lip\": 2,\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a Treemix object\n", "The format for this object is similar to many of the other `ipyrad.analysis` objects. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "t = ipa.treemix(\n", " name=\"test\",\n", " data=\"analysis-ipyrad/pedicularis_outfiles/pedicularis.snps.phy\",\n", " imap=imap,\n", " minmap=minmap,\n", " )" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "binary treemix \n", "bootstrap 0 \n", "climb 0 \n", "cormig 0 \n", "g (None, None) \n", "global_ 1 \n", "k 0 \n", "m 1 \n", "root prz \n", "se 0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## you can set additional parameter args here\n", "t.params.m = 1\n", "t.params.root = \"prz\"\n", "t.params.global_ = 1\n", "#t.params.se = 1\n", "t.params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate the treemix input file" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ntaxa 13; nSNPs total 194653; nSNPs written 12247\n" ] } ], "source": [ "## write treemix input files so you can call treemix from the command line\n", "s = t.write_output_file()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The command string\n", "This shows the command string that corresponds to the parameter settings in the Treemix object. You can see that the input file (-i) is the string we enetered in the data field above, and the output prefix (-o) corresponds to the default working directory and the name field that we provided above. In addition, the argument (-m 1) is added because we added that to the params dictionary. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "treemix -i /home/deren/Documents/ipyrad/tests/analysis-treemix/test.treemix.in.gz -o /home/deren/Documents/ipyrad/tests/analysis-treemix/test -m 1 -root prz -global\n" ] } ], "source": [ "## the command string\n", "print t.command" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true, "scrolled": false }, "outputs": [], "source": [ "## you can run the command in a notebook by using bash one-liners (!)\n", "! $t.command > analysis-treemix/treemix.log" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run treemix jobs\n", "Alternatively, you can use the `.run()` command of the treemix object to run treemix. This is more convenient because the results will automatically be parsed by the treemix object so that they are easily accessible for downstream plotting. In the loop below we run treemix over a range of migration parameters (-m) and with 4 replicates per setting. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## a dictionary for storing treemix objects\n", "tdict = {}\n", "\n", "## iterate over values of m\n", "for rep in xrange(4):\n", " for mig in xrange(4):\n", " \n", " ## create new treemix object copy\n", " name = \"mig-{}-rep-{}\".format(mig, rep)\n", " tmp = t.copy(name)\n", "\n", " ## set params on new object\n", " tmp.params.m = mig\n", " \n", " ## run treemix analysis\n", " tmp.run()\n", " \n", " ## store the treemix object\n", " tdict[name] = tmp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessible results" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## choose a treemix object from the above analysis\n", "t = tdict['mig-2-rep-3']" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cov ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.cov.gz\n", "covse ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.covse.gz\n", "edges ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.edges.gz\n", "llik ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.llik\n", "modelcov ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.modelcov.gz\n", "treeout ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.treeout.gz\n", "vertices ~/Documents/ipyrad/tests/analysis-treemix/mig-2-rep-3.vertices.gz" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access output files produced by treemix\n", "t.files" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'((((tha:0.0528677,(lip:0.0360458,(rex:0.015358,rck:0.0400938):0.0013585):0.000729372):0.0061529,cup:0.0402529):0.0346747,((sup:0.0493299,cya:0.0485717):0.0144389,cys:0.0693095):0.016622):0.00587823,prz:0.309518);'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access the newick string representation of the tree\n", "t.results.tree" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(12, 0.0402529, 13, 0.0528677, 0.087311),\n", " (8, 0.309518, 16, 0.0400938, 0.0459799)]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access a list of admixture edges\n", "t.results.admixture" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.267703 , -0.0258312 , -0.0268198 , -0.0267356 , -0.0364621 ,\n", " -0.0393228 , -0.038564 , -0.0365603 , -0.037407 ],\n", " [-0.0258312 , 0.0819627 , 0.0116645 , 0.0117488 , -0.0145997 ,\n", " -0.0174604 , -0.0167016 , -0.0146979 , -0.0160852 ],\n", " [-0.0268198 , 0.0116645 , 0.0744446 , 0.025199 , -0.0155884 ,\n", " -0.0184491 , -0.0176903 , -0.0156866 , -0.0170739 ],\n", " [-0.0267356 , 0.0117488 , 0.025199 , 0.0738549 , -0.0155041 ,\n", " -0.0183649 , -0.0176061 , -0.0156024 , -0.0169897 ],\n", " [-0.0364621 , -0.0145997 , -0.0155884 , -0.0155041 , 0.0496971 ,\n", " 0.00940488, 0.00734218, 0.00934589, 0.00636426],\n", " [-0.0393228 , -0.0174604 , -0.0184491 , -0.0183649 , 0.00940488,\n", " 0.0531333 , 0.0100971 , 0.0121008 , 0.00886101],\n", " [-0.038564 , -0.0167016 , -0.0176903 , -0.0176061 , 0.00734218,\n", " 0.0100971 , 0.0481683 , 0.0141262 , 0.0108281 ],\n", " [-0.0365603 , -0.0146979 , -0.0156866 , -0.0156024 , 0.00934589,\n", " 0.0121008 , 0.0141262 , 0.0328464 , 0.0141279 ],\n", " [-0.037407 , -0.0160852 , -0.0170739 , -0.0169897 , 0.00636426,\n", " 0.00886101, 0.0108281 , 0.0141279 , 0.0473745 ]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access the covariance matrix\n", "t.results.modelcov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### You can generate plots in R\n", "Follow the directions in the Treemix tutorial for plotting results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Or, you can generate plots in Python using Toytree\n", "The code to produce tree plots with admixture edges in Toytree is still in development and will change to be more user friendly in the near future. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def _get_admix_point(tre, idx, dist):\n", " ## parent coordinates\n", " px, py = tre.verts[idx]\n", " ## child coordinates\n", " cx, cy = tre.verts[tre.tree.search_nodes(idx=idx)[0].up.idx]\n", " ## angle of hypotenuse\n", " theta = np.arctan((px-cx) / (py-cy))\n", " ## new coords along the hypot angle\n", " horz = np.sin(theta) * dist\n", " vert = np.cos(theta) * dist\n", " \n", " ## change x\n", " a = tre.verts[idx, 0]\n", " b = tre.verts[idx, 1] \n", " a -= abs(horz)\n", " if py < cy:\n", " b += abs(vert)\n", " else:\n", " b -= abs(vert)\n", " return a, b" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def treemix_plot(tmp, axes):\n", " \n", " ## create a toytree object from the treemix tree result\n", " tre = toytree.tree(newick=tmp.results.tree)\n", " tre.draw(\n", " axes=axes,\n", " use_edge_lengths=True,\n", " tree_style='c',\n", " tip_labels_align=True,\n", " edge_align_style={\"stroke-width\": 1}\n", " );\n", "\n", " ## get coords \n", " for admix in tmp.results.admixture:\n", " ## parse admix event\n", " pidx, pdist, cidx, cdist, weight = admix\n", " a = _get_admix_point(tre, pidx, pdist)\n", " b = _get_admix_point(tre, cidx, cdist)\n", "\n", " ## add line for admixture edge\n", " mark = axes.plot(\n", " a = (a[0], b[0]),\n", " b = (a[1], b[1]),\n", " style={\"stroke-width\": 10*weight,\n", " \"stroke-opacity\": 0.95, \n", " \"stroke-linecap\": \"round\"}\n", " )\n", "\n", " ## add points at admixture sink\n", " axes.scatterplot(\n", " a = (b[0]),\n", " b = (b[1]),\n", " size=8,\n", " title=\"weight: {}\".format(weight),\n", " )\n", "\n", " ## add scale bar for edge lengths\n", " axes.y.show=False\n", " axes.x.ticks.show=True\n", " axes.x.label.text = \"Drift parameter\"\n", " return axes\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw a single result" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.15.0-dev'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toyplot.__version__" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
przcyssupcyathacupliprexrckweight: 0.137873-0.3-0.2-0.10.0Drift parameter
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## select a result\n", "tmp = tdict[\"mig-1-rep-0\"]\n", "\n", "## draw the tree similar to the Treemix plotting R code\n", "canvas = toyplot.Canvas(width=350, height=350)\n", "axes = canvas.cartesian(padding=25, margin=75)\n", "axes = treemix_plot(tmp, axes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw replicate runs" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
przcyssupcyacupthaliprckrex-0.2-0.10.0Drift parameterprzcyssupcyacupthaliprckrex-0.2-0.10.0Drift parameterprzcyssupcyacupthaliprexrck-0.2-0.10.0Drift parameterprzcyscyasupcupthaliprexrck-0.2-0.10.0Drift parameterprzcyssupcyathacupliprexrckweight: 0.137873-0.3-0.2-0.10.0Drift parameterprzcyscyasupthacupliprexrckweight: 0.137961-0.3-0.2-0.10.0Drift parameterprzcyscyasupthacupliprckrexweight: 0.137873-0.3-0.2-0.10.0Drift parameterprzcyscyasupthacupliprckrexweight: 0.137873-0.3-0.2-0.10.0Drift parameterprzcyscyasupcupthaliprexrckweight: 0.0460636weight: 0.139281-0.3-0.2-0.10.0Drift parameterprzcyssupcyacupthaliprexrckweight: 0.0963147weight: 0.0459799-0.3-0.2-0.10.0Drift parameterprzcyscyasupcupthaliprexrckweight: 0.0460636weight: 0.139281-0.3-0.2-0.10.0Drift parameterprzcyssupcyacupthaliprexrckweight: 0.087311weight: 0.0459799-0.3-0.2-0.10.0Drift parameterprzsupcyacyscupthaliprexrckweight: 0.141451weight: 0.457816weight: 0.0459704-0.3-0.2-0.10.0Drift parameterprzsupcyacysthacupliprexrckweight: 0.141451weight: 0.457816weight: 0.0459704-0.3-0.2-0.10.0Drift parameterprzsupcyacysthacupliprexrckweight: 0.141451weight: 0.457816weight: 0.0459704-0.3-0.2-0.10.0Drift parameterprzsupcyacyscupthaliprexrckweight: 0.141451weight: 0.457816weight: 0.0459704-0.3-0.2-0.10.0Drift parameter
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "canvas = toyplot.Canvas(width=800, height=1200)\n", "idx = 0\n", "for mig in range(4):\n", " for rep in range(4):\n", " tmp = tdict[\"mig-{}-rep-{}\".format(mig, rep)]\n", " ax = canvas.cartesian(grid=(4, 4, idx), padding=25, margin=(25, 50, 100, 25))\n", " ax = treemix_plot(tmp, ax)\n", " idx += 1" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
prz0.267703-0.0270258-0.0280964-0.0242687-0.0364397-0.0391614-0.038536-0.0367678-0.037407sup-0.02702580.07444460.0251990.0111396-0.0156012-0.0181699-0.0177945-0.015368-0.0168238cya-0.02809640.0251990.07385490.0122737-0.0152837-0.0181586-0.0174872-0.0153056-0.0169961cys-0.02426870.01113960.01227370.0819627-0.0148089-0.018092-0.0170124-0.0149584-0.0162356cup-0.0364397-0.0156012-0.0152837-0.01480890.04969710.009404880.007483780.009634720.0059131tha-0.0391614-0.0181699-0.0181586-0.0180920.009404880.05313330.01019190.01232250.00852938lip-0.038536-0.0177945-0.0174872-0.01701240.007483780.01019190.04816830.01346840.0115176rex-0.0367678-0.015368-0.0153056-0.01495840.009634720.01232250.01346840.03284640.0141279rck-0.037407-0.0168238-0.0169961-0.01623560.00591310.008529380.01151760.01412790.0473745przsupcyacyscupthaliprexrck-0.040.040.110.190.27
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## grab names from the tree\n", "tre = toytree.tree(tmp.results.tree)\n", "\n", "lnames = toyplot.locator.Explicit(\n", " locations=range(len(tre.get_tip_labels())),\n", " labels=tre.get_tip_labels(),\n", ")\n", "\n", "## get a colormap and plot the matrix\n", "cmap = toyplot.color.diverging.map(\"BlueRed\", tmp.results.cov.min(), tmp.results.cov.max())\n", "canvas, table = toyplot.matrix(\n", " (tmp.results.cov, cmap),\n", " width=400, \n", " height=400, \n", " bshow=True,\n", " tshow=False,\n", " llocator=lnames,\n", " blocator=lnames, \n", " );\n", "\n", "## add a color scale\n", "tlocs = np.linspace(tmp.results.cov.min(), tmp.results.cov.max(), 5)\n", "tlabs = [\"{:.2f}\".format(i) for i in tlocs]\n", "canvas.color_scale(cmap, \n", " x1=100, x2=300, y1=50, y2=50, \n", " ticklocator=toyplot.locator.Explicit(\n", " locations=tlocs, \n", " labels=tlabs,\n", " ));\n", " " ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }