{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Cookbook: *RAxML* analyses in a notebook\n", "\n", "As part of the `ipyrad.analysis` toolkit we've created convenience functions for easily running common *RAxML* commands. This can be useful when you want to run all of your analyses in a clean, streamlined way in a Jupyter notebook to create a completely reproducible study. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install software\n", "There are many ways to install raxml, the simplest of which is to use conda. This will install several raxml binaries into your conda path. To call a different version of *raxml*, change the `binary` parameter. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install ipyrad -c ipyrad\n", "## conda install toytree -c eaton-lab\n", "## conda install raxml -c bioconda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a raxml Class object\n", "Create a raxml object, which comes with a set of default parameters. The only required argument to initialize the object is a phylip-formatted sequence file. In this example I provide a name and working directory as well. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toyplot\n", "import toytree" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rax = ipa.raxml(\n", "    data=\"./analysis-ipyrad/aligntest_outfiles/aligntest.phy\",\n", "    name=\"aligntest\",\n", "    workdir=\"analysis-raxml\",\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional options\n", "You can also modify many of the other command line arguments to raxml by changing values in the params dictionary of your raxml object. 
These values could also have been set when you initialized the object. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## set some other params\n", "rax.params.N = 10      ## number of bootstrap replicates (-N)\n", "rax.params.T = 2       ## number of threads (-T)\n", "rax.params.o = None    ## outgroup (-o); None leaves it unset\n", "#rax.params.o = [\"32082_przewalskii\", \"33588_przewalskii\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print the command string \n", "It is good practice to always print the command string so that you know exactly what was called for your analysis and have it documented. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raxmlHPC-PTHREADS-SSE3 -f a -T 2 -m GTRGAMMA -N 10 -x 12345 -p 54321 -n aligntest -w /home/deren/Documents/ipyrad/tests/analysis-raxml -s /home/deren/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy\n" ] } ], "source": [ "print(rax.command)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the job\n", "This will start the job running. We haven't made a progress bar yet but we will add one soon. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "job aligntest finished successfully\n" ] } ], "source": [ "rax.run(force=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Access results\n", "One of the reasons it is so convenient to run your raxml jobs this way is that the result files are easily accessible from your raxml object. 
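
The paths listed by `rax.trees` in the next cell follow the `RAxML_<type>.<name>` naming pattern inside the working directory. As a rough sketch of that lookup, and not ipyrad's own code, the same mapping can be rebuilt from a working directory and a run name. The helper `find_raxml_outputs` is hypothetical and only assumes the file naming visible in this notebook's output.

```python
import glob
import os


def find_raxml_outputs(workdir, name):
    # Hypothetical helper (not part of ipyrad): map each result type to its path.
    # Assumes files are named RAxML_<type>.<name>, e.g. RAxML_bestTree.aligntest
    folder = os.path.abspath(os.path.expanduser(workdir))
    pattern = os.path.join(folder, 'RAxML_*.' + name)
    results = {}
    for path in sorted(glob.glob(pattern)):
        # The text between 'RAxML_' and the first '.' is the result type.
        kind = os.path.basename(path)[len('RAxML_'):].split('.', 1)[0]
        results[kind] = path
    return results
```

For the run above, `find_raxml_outputs('analysis-raxml', 'aligntest')` should return keys such as `bestTree` and `bipartitions` once the job has finished.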
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "bestTree ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.aligntest\n", "bipartitions ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.aligntest\n", "bipartitionsBranchLabels ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitionsBranchLabels.aligntest\n", "bootstrap ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bootstrap.aligntest\n", "info ~/Documents/ipyrad/tests/analysis-raxml/RAxML_info.aligntest" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rax.trees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot the results\n", "Here we use toytree to plot the bootstrap results. \n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[Figure: toytree plot of the rooted aligntest tree; tip labels: 3L_0, 3K_0, 3I_0, 3J_0, 2H_0, 2G_0, 2F_0, 2E_0, 1D_0, 1C_0, 1B_0, 1A_0; every labeled node shows a bootstrap support of 100]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tre = toytree.tree(rax.trees.bipartitions)\n", "tre.root(wildcard=\"3\")\n", "tre.draw(\n", " height=300,\n", " width=300,\n", " node_labels=tre.get_node_values(\"support\"),\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [optional] Submit raxml jobs to run on a cluster\n", "Using the ipyparallel library you can submit raxml jobs to run in parallel on cluster in a load-balanced fashion. You can then tell the notebook to wait until all jobs are finished before progressing in the notebook to draw trees, etc. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Start an ipyparallel cluster\n", "In a separate terminal start an `ipcluster` instance and tell it how many engines to start. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "##\n", "## ipcluster start --n=20\n", "##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a Client connected to the cluster" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyparallel as ipp\n", "ipyclient = ipp.Client()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create several raxml objects for different data sets" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rax1 = ipa.raxml(\n", " data=\"~/Documents/ipyrad/tests/analysis-ipyrad/pedic_outfiles/pedic.phy\", \n", " name=\"rax1\", T=4, N=100)\n", "\n", "rax2 = ipa.raxml(\n", " data=\"~/Documents/ipyrad/tests/analysis-ipyrad/aligntest_outfiles/aligntest.phy\", \n", " name=\"rax2\", T=4, N=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Submit jobs to run on the cluster queue. 
" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "job rax1 submitted to cluster\n", "job rax2 submitted to cluster\n" ] } ], "source": [ "rax1.run(ipyclient=ipyclient, force=True)\n", "rax2.run(ipyclient=ipyclient, force=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait for jobs to finish" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## you can query each job while it's running\n", "rax1.async.ready()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## or just block until all jobs on ipyclient are finished\n", "ipyclient.wait()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot trees when jobs are finished\n", "Here we will draw a slighly more complex tree figure that combines two trees onto a single canvas." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[Figure: toytree plot of the rax1 (pedic) tree; tip labels: 32082_przewalskii, 33588_przewalskii, 29154_superba, 30686_cyathophylla, 41478_cyathophylloides, 41954_cyathophylloides, 33413_thamno, 30556_thamno, 40578_rex, 35855_rex, 35236_rex, 38362_rex, 39618_rex]
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
[Figure: toytree plot of the rax2 (aligntest) tree; tip labels: 3L_0, 3K_0, 3I_0, 3J_0, 2H_0, 2G_0, 2F_0, 2E_0, 1D_0, 1C_0, 1B_0, 1A_0]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## load trees and add to axes\n", "tre1 = toytree.tree(rax1.trees.bipartitions)\n", "tre1.root(wildcard=\"prz\")\n", "tre1.draw(width=300);\n", "\n", "tre2 = toytree.tree(rax2.trees.bipartitions)\n", "tre2.root(wildcard=\"3\")\n", "tre2.draw(width=300);" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }