{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ipyrad-analysis toolkit: RAxML\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RAxML is the most popular tool for inferring phylogenetic trees using maximum likelihood. It is fast even for very large data sets. The documentation for raxml is huge, and there are many options. However, I tend to use the same small number of options very frequently, which motivated me to write the `ipa.raxml()` tool to automate the process of generating RAxml command line strings, running them, and accessing the resulting tree files. The simplicity of this tool makes it easy to incorporate into other more complex tools, for example, to infer tress in sliding windows along the genome using the `ipa.treeslider` tool.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required software" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad -c bioconda\n", "# conda install raxml -c bioconda\n", "# conda install toytree -c eaton-lab" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toytree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Short Tutorial:\n", "\n", "The raxml tool takes a *phylip* formatted file as input. In addition you can set a number of analysis options either when you init the tool, or afterwards by accessing the `.params` dictionary. You can view the raxml command string that is generated from the input arguments and you can call `.run()` to start the tree inference. \n", "This example takes about 3-5 minutes to run on my laptop for a data set with 13 samples and ~1.2M SNPs and about 14% missing data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# the path to your phylip formatted output file\n", "phyfile = \"../min10_outfiles/min10.phy\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raxmlHPC-PTHREADS-SSE3 -f a -T 4 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-raxml -s /home/deren/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.phy -p 54321 -N 10 -x 12345\n", "job test finished successfully\n" ] } ], "source": [ "# init raxml object with input data and (optional) parameter options\n", "rax = ipa.raxml(data=phyfile, T=4, N=10)\n", "\n", "# print the raxml command string for prosperity\n", "print(rax.command)\n", "\n", "# run the command, (options: block until finishes; overwrite existing)\n", "rax.run(block=True, force=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw the inferred tree\n", "After inferring a tree you can then visualize it in a notebook using `toytree`. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
38362_rex39618_rex35236_rex40578_rex35855_rex30556_thamno33413_thamno41954_cyathophylloides41478_cyathophylloides30686_cyathophylla29154_superba33588_przewalskii32082_przewalskii100100100100100100100100100100100100
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# (optional) draw your tree in the notebook\n", "import toytree\n", "\n", "# load from the .trees attribute of the raxml object, or from the saved tree file\n", "tre = toytree.tree(rax.trees.bipartitions)\n", "\n", "# draw the tree\n", "rtre = tre.root(wildcard=\"prz\")\n", "rtre.draw(tip_labels_align=True, node_labels=\"support\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default several parameters are pre-set in the raxml object. To remove those parameters from the command string you can set them to None. Additionally, you can build complex raxml command line strings by adding almost any parameter to the raxml object init, like below. You probably can't do everythin in raxml using this tool, it's only meant as a convenience. You can always of course just write the raxml command line string by hand instead." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "N 10 \n", "T 4 \n", "binary raxmlHPC-PTHREADS-SSE3\n", "f a \n", "m GTRGAMMA \n", "n test \n", "p 54321 \n", "s ~/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.phy\n", "w ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml\n", "x 12345 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# init raxml object\n", "rax = ipa.raxml(data=phyfile, T=4, N=10)\n", "\n", "# parameter dictionary for a raxml object\n", "rax.params" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "bestTree ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml/RAxML_bestTree.test\n", "bipartitions ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml/RAxML_bipartitions.test\n", "bipartitionsBranchLabels ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml/RAxML_bipartitionsBranchLabels.test\n", "bootstrap ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml/RAxML_bootstrap.test\n", "info ~/Documents/ipyrad/newdocs/cookbook/analysis-raxml/RAxML_info.test" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# paths to output files produced by raxml inference\n", "rax.trees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cookbook\n", "\n", "Most frequently used: perform 100 rapid bootstrap analyses followed by 10 rapid hill-climbing ML searches from random starting trees under the GTRGAMMA substitution model. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raxmlHPC-PTHREADS-SSE3 -f a -T 20 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-raxml -s /home/deren/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.phy -p 54321 -N 100 -x 12345\n" ] } ], "source": [ "rax = ipa.raxml(\n", " data=phyfile,\n", " name=\"test\",\n", " workdir=\"analysis-raxml\",\n", " m=\"GTRGAMMA\",\n", " T=20,\n", " f=\"a\",\n", " N=100,\n", ")\n", "print(rax.command)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another common option: Perform N rapid hill-climbing ML analyses from random starting trees, with no bootstrap replicates." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raxmlHPC-PTHREADS-SSE3 -f d -T 20 -m GTRGAMMA -n test -w /home/deren/Documents/ipyrad/newdocs/cookbook/analysis-raxml -s /home/deren/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.phy -p 54321 -N 10\n" ] } ], "source": [ "rax = ipa.raxml(\n", " data=phyfile,\n", " name=\"test\",\n", " workdir=\"analysis-raxml\",\n", " m=\"GTRGAMMA\",\n", " T=20,\n", " f=\"d\",\n", " N=10,\n", " x=None,\n", ")\n", "print(rax.command)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generating new phylip alignments\n", "See the ipyrad-analysis `window_extracter` tool for creating new alignments by applying filtering or consensus calling to include/exclude taxa or loci/sites from analyses with options to minimize th amount of missing data in alignments. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's next?\n", "\n", "If you have reference mapped data then you should see the `.treeslider()` tool to infer trees in sliding windows along scaffolds; or the `.window_extracter()` tool to extract, filter, and concatenate RAD loci within a given window (e.g., near some known gene)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }