{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DaCosta & Sorenson (2016) single-end dd-RAD data set\n", "Here we demonstrate a denovo assembly for an empirical RAD data set using the ipyrad Python API. This example was run on a workstation with 20 cores available and takes about <10 minutes to run completely, but can be run on even a laptop in about less than an hour.\n", "\n", "We will use the Lagonosticta and Vidua data set from [DaCosta & Sorenson 2016](https://www.ncbi.nlm.nih.gov/pubmed/26279345). This data set is composed of single-end 101bp reads from a ddRAD-seq library prepared with the SbfI and EcoRI enzymes and is available on NCBI by its study accession SRP059199. At the end of this notebook we also demonstrate the use of ipyrad.analysis tools to run downstream analyses on this data set. \n", "\n", "The figure below from this paper shows the general workflow in which two fairly distinct clades were sequenced together but then analyzed separately.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup (software and data files)\n", "If you haven't done so yet, start by installing `ipyrad` using conda (see ipyrad installation instructions) as well as the packages in the cell below. This is easiest to do in a terminal. Then open a jupyter-notebook, like this one, and follow along with the tutorial by copying and executing the code in the cells, and adding your own documentation between them using markdown. Feel free to modify parameters to see their effects on the downstream results. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install ipyrad -c ipyrad\n", "## conda install toytree -c eaton-lab\n", "## conda install entrez-direct -c bioconda\n", "## conda install sratools -c bioconda" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## imports\n", "import ipyrad as ip\n", "import ipyrad.analysis as ipa\n", "import ipyparallel as ipp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In contrast to the ipyrad CLI, the ipyrad API gives users much more fine-scale control over the parallelization of their analysis, but this also requires learning a little bit about the library that we use to do this, called `ipyparallel`. This library is designed for use with jupyter-notebooks to allow massive-scale multi-processing while working *interactively*. \n", "\n", "Understanding the nuts and bolts of it might take a little while, but it is fairly easy to get started using it, especially in the way it is integrated with ipyrad. To start a parallel client to you must run the command-line program '`ipcluster`'. This will essentially start a number of independent Python processes (kernels) which we can then send bits of work to do. The cluster can be stopped and restarted independently of this notebook, which is convenient for working on a cluster where connecting to many cores is not always immediately available. \n", "\n", "Open a terminal and type the following command to start an ipcluster instance with N engines. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## ipcluster start --n=20" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## connect to cluster\n", "ipyclient = ipp.Client()\n", "ipyclient.ids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the data set (Finches)\n", "These data are archived on the NCBI sequence read archive (SRA) under accession id SRP059199. For convenience, the data are also hosted at a public Dropbox link which is a bit easier to access. Run the code below to download and decompress the fastq data files, which will save them into a directory called fastqs-Finches/. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[####################] 100% Downloading fastq files | 0:00:52 | \n", "24 fastq files downloaded to /home/deren/Documents/ipyrad/tests/fastqs-Finches\n" ] } ], "source": [ "## download the Pedicularis data set from NCBI\n", "sra = ipa.sratools(accession=\"SRP059199\", workdir=\"fastqs-Finches\")\n", "sra.run(force=True, ipyclient=ipyclient)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create an Assembly object\n", "This object stores the parameters of the assembly and the organization of data files. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Assembly: Finches\n" ] } ], "source": [ "## you must provide a name for the Assembly\n", "data = ip.Assembly(\"Finches\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set parameters for the Assembly. This will raise an error if any of the parameters are not allowed because they are the wrong type, or out of the allowed range. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 assembly_name Finches \n", "1 project_dir ./analysis-ipyrad/Finches \n", "2 raw_fastq_path \n", "3 barcodes_path \n", "4 sorted_fastq_path ./fastqs-Finches/*.fastq.gz \n", "5 assembly_method denovo \n", "6 reference_sequence \n", "7 datatype ddrad \n", "8 restriction_overhang ('CCTGCAGG', 'AATTC') \n", "9 max_low_qual_bases 5 \n", "10 phred_Qscore_offset 33 \n", "11 mindepth_statistical 6 \n", "12 mindepth_majrule 6 \n", "13 maxdepth 10000 \n", "14 clust_threshold 0.85 \n", "15 max_barcode_mismatch 0 \n", "16 filter_adapters 2 \n", "17 filter_min_trim_len 35 \n", "18 max_alleles_consens 2 \n", "19 max_Ns_consens (5, 5) \n", "20 max_Hs_consens (5, 5) \n", "21 min_samples_locus 4 \n", "22 max_SNPs_locus (20, 20) \n", "23 max_Indels_locus (8, 8) \n", "24 max_shared_Hs_locus 0.5 \n", "25 trim_reads (0, 0, 0, 0) \n", "26 trim_loci (0, 0, 0, 0) \n", "27 output_formats ('p', 's', 'v', 'n', 'k', 'u') \n", "28 pop_assign_file \n" ] } ], "source": [ "## set parameters\n", "data.set_params(\"project_dir\", \"analysis-ipyrad/Finches\")\n", "data.set_params(\"sorted_fastq_path\", \"fastqs-Finches/*.fastq.gz\")\n", "data.set_params(\"datatype\", \"ddrad\")\n", "data.set_params(\"restriction_overhang\", (\"CCTGCAGG\", \"AATTC\"))\n", "data.set_params(\"clust_threshold\", \"0.85\")\n", "data.set_params(\"filter_adapters\", \"2\")\n", "data.set_params(\"max_Hs_consens\", (5, 5))\n", "data.set_params(\"output_formats\", \"psvnkua\")\n", "\n", "## see/print all parameters\n", "data.get_params()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assemble the data set\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assembly: Finches\n", "[####################] 100% loading reads | 0:00:03 | s1 | \n", "[####################] 100% processing reads | 0:01:08 | s2 | \n" ] } ], "source": [ "## run steps 1 & 2 of the assembly\n", "data.run(\"12\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statereads_rawreads_passed_filter
Anomalospiza_imberbis2935241889028
Clytospiza_monteiri212208791168704
Lagonosticta_larvata21001743944653
Lagonosticta_rara2992534934386
Lagonosticta_rhodopareia21020850961277
Lagonosticta_rubricata_congica21064587997387
Lagonosticta_rubricata_rubricata210797011009532
Lagonosticta_rufopicta2898117846834
Lagonosticta_sanguinodorsalis21034739980499
Lagonosticta_senegala_rendalli2834688786701
Lagonosticta_senegala_rhodopsis2792799749508
Vidua_chalybeata_amauropteryx210316741028936
Vidua_chalybeata_neumanni2709824678382
Vidua_fischeri2998161935631
Vidua_hypocherina2889818844395
Vidua_interjecta2741675708862
Vidua_macroura_arenosa2939649891987
Vidua_macroura_macroura2729322690496
Vidua_obtusa2809186763098
Vidua_orientalis2862619824073
Vidua_paradisaea2833981791532
Vidua_purpurascens2927116883478
Vidua_raricola2956686907898
Vidua_regia21012887965657
\n", "
" ], "text/plain": [ " state reads_raw reads_passed_filter\n", "Anomalospiza_imberbis 2 935241 889028\n", "Clytospiza_monteiri 2 1220879 1168704\n", "Lagonosticta_larvata 2 1001743 944653\n", "Lagonosticta_rara 2 992534 934386\n", "Lagonosticta_rhodopareia 2 1020850 961277\n", "Lagonosticta_rubricata_congica 2 1064587 997387\n", "Lagonosticta_rubricata_rubricata 2 1079701 1009532\n", "Lagonosticta_rufopicta 2 898117 846834\n", "Lagonosticta_sanguinodorsalis 2 1034739 980499\n", "Lagonosticta_senegala_rendalli 2 834688 786701\n", "Lagonosticta_senegala_rhodopsis 2 792799 749508\n", "Vidua_chalybeata_amauropteryx 2 1031674 1028936\n", "Vidua_chalybeata_neumanni 2 709824 678382\n", "Vidua_fischeri 2 998161 935631\n", "Vidua_hypocherina 2 889818 844395\n", "Vidua_interjecta 2 741675 708862\n", "Vidua_macroura_arenosa 2 939649 891987\n", "Vidua_macroura_macroura 2 729322 690496\n", "Vidua_obtusa 2 809186 763098\n", "Vidua_orientalis 2 862619 824073\n", "Vidua_paradisaea 2 833981 791532\n", "Vidua_purpurascens 2 927116 883478\n", "Vidua_raricola 2 956686 907898\n", "Vidua_regia 2 1012887 965657" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access the stats of the assembly (so far) from the .stats attribute\n", "data.stats" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assembly: Finches\n", "[####################] 100% dereplicating | 0:00:03 | s3 | \n", "[####################] 100% clustering | 0:00:43 | s3 | \n", "[####################] 100% building clusters | 0:00:05 | s3 | \n", "[####################] 100% chunking | 0:00:01 | s3 | \n", "[####################] 100% aligning | 0:05:02 | s3 | \n", "[####################] 100% concatenating | 0:00:04 | s3 | \n", "[####################] 100% inferring [H, E] | 0:00:35 | s4 | \n", "[####################] 100% calculating depths | 0:00:03 | s5 | \n", "[####################] 100% chunking clusters | 0:00:02 | s5 | \n", "[####################] 100% consens calling | 0:01:24 | s5 | \n" ] } ], "source": [ "## run steps 3-5 (within-sample steps) of the assembly\n", "data.run(\"345\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Branch to create separate data sets for Vidua and Lagonistica" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## create data set with only Vidua samples + outgroup\n", "subs = [i for i in data.samples if \"Vidua\" in i] +\\\n", " [i for i in data.samples if \"Anomalo\" in i]\n", "vidua = data.branch(\"vidua\", subsamples=subs)\n", "\n", "## create data set with only Lagonostica sampes + outgroup\n", "subs = [i for i in data.samples if \"Lagon\" in i] +\\\n", " [i for i in data.samples if \"Clyto\" in i]\n", "lagon = data.branch(\"lagon\", subsamples=subs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assemble each data set through final steps\n", "Or, if you are pressed for time, you can choose just one of the Assemblies going forward. If you do, you may want to choose Vidua since that is the one we use in the downstream analysis tools at the end of this notebook. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assembly: vidua\n", "[####################] 100% concat/shuffle input | 0:00:01 | s6 | \n", "[####################] 100% clustering across | 0:00:08 | s6 | \n", "[####################] 100% building clusters | 0:00:02 | s6 | \n", "[####################] 100% aligning clusters | 0:00:07 | s6 | \n", "[####################] 100% database indels | 0:00:03 | s6 | \n", "[####################] 100% indexing clusters | 0:00:01 | s6 | \n", "[####################] 100% building database | 0:00:04 | s6 | \n" ] } ], "source": [ "vidua.run(\"6\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assembly: lagon\n", "[####################] 100% concat/shuffle input | 0:00:01 | s6 | \n", "[####################] 100% clustering across | 0:00:06 | s6 | \n", "[####################] 100% building clusters | 0:00:01 | s6 | \n", "[####################] 100% aligning clusters | 0:00:04 | s6 | \n", "[####################] 100% database indels | 0:00:03 | s6 | \n", "[####################] 100% indexing clusters | 0:00:01 | s6 | \n", "[####################] 100% building database | 0:00:03 | s6 | \n" ] } ], "source": [ "lagon.run(\"6\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Branch to create several final data sets with different parameter settings\n", "\n", "Here we use nested for-loops to iterate over assemblies and parameter values. \n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assembly: vidua_min4\n", "[####################] 100% filtering loci | 0:00:09 | s7 | \n", "[####################] 100% building loci/stats | 0:00:01 | s7 | \n", "[####################] 100% building vcf file | 0:00:04 | s7 | \n", "[####################] 100% writing vcf file | 0:00:00 | s7 | \n", "[####################] 100% building arrays | 0:00:07 | s7 | \n", "[####################] 100% writing outfiles | 0:00:01 | s7 | \n", "Outfiles written to: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min4_outfiles\n", "\n", "Assembly: vidua_min10\n", "[####################] 100% filtering loci | 0:00:02 | s7 | \n", "[####################] 100% building loci/stats | 0:00:02 | s7 | \n", "[####################] 100% building vcf file | 0:00:02 | s7 | \n", "[####################] 100% writing vcf file | 0:00:00 | s7 | \n", "[####################] 100% building arrays | 0:00:07 | s7 | \n", "[####################] 100% writing outfiles | 0:00:01 | s7 | \n", "Outfiles written to: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min10_outfiles\n", "\n", "Assembly: lagon_min4\n", "[####################] 100% filtering loci | 0:00:02 | s7 | \n", "[####################] 100% building loci/stats | 0:00:01 | s7 | \n", "[####################] 100% building vcf file | 0:00:01 | s7 | \n", "[####################] 100% writing vcf file | 0:00:00 | s7 | \n", "[####################] 100% building arrays | 0:00:06 | s7 | \n", "[####################] 100% writing outfiles | 0:00:01 | s7 | \n", "Outfiles written to: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/lagon_min4_outfiles\n", "\n", "Assembly: lagon_min10\n", "[####################] 100% filtering loci | 0:00:02 | s7 | \n", "[####################] 100% building loci/stats | 0:00:01 | s7 | \n", "[####################] 100% building vcf file | 0:00:02 | s7 | \n", "[####################] 100% writing vcf file | 0:00:00 | s7 | \n", "[####################] 100% building arrays | 0:00:06 | s7 | \n", "[####################] 100% writing outfiles | 0:00:01 | s7 | \n", "Outfiles written to: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/lagon_min10_outfiles\n", "\n" ] } ], "source": [ "## iterate over data set and parameters\n", "for assembly in [vidua, lagon]:\n", " for min_sample in [4, 10]:\n", " \n", " ## create new assembly, apply new name and parameters\n", " newname = \"{}_min{}\".format(assembly.name, min_sample)\n", " newdata = assembly.branch(newname)\n", " newdata.set_params(\"min_samples_locus\", min_sample)\n", " \n", " ## run step 7 \n", " newdata.run(\"7\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View final stats\n", "The .stats attribute shows a stats summary for each sample, and a number of stats dataframes can be accessed for each step from the .stats_dfs attribute of the Assembly. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading Assembly: vidua_min4\n", "from saved path: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min4.json\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statereads_rawreads_passed_filterclusters_totalclusters_hidepthhetero_esterror_estreads_consens
Anomalospiza_imberbis69352418890281911687700.0075620.0038238584
Vidua_chalybeata_amauropteryx6103167410289361967486200.0030110.0006748535
Vidua_chalybeata_neumanni67098246783822205390350.0052030.0033468879
Vidua_fischeri69981619356312630997330.0049100.0025789595
Vidua_hypocherina68898188443952380695390.0041990.0026989406
Vidua_interjecta67416757088622238393450.0066170.0028649180
Vidua_macroura_arenosa693964989198726536103850.0096330.0036439994
Vidua_macroura_macroura67293226904962244995190.0073300.0026589350
Vidua_obtusa680918676309826641100390.0060430.0038289834
Vidua_orientalis68626198240732512798210.0064610.0027309666
Vidua_paradisaea68339817915322628696360.0066480.0040849360
Vidua_purpurascens69271168834782021293350.0046030.0024899196
Vidua_raricola695668690789827028103030.0065000.00254510115
Vidua_regia610128879656572635597980.0042250.0024829657
\n", "
" ], "text/plain": [ " state reads_raw reads_passed_filter \\\n", "Anomalospiza_imberbis 6 935241 889028 \n", "Vidua_chalybeata_amauropteryx 6 1031674 1028936 \n", "Vidua_chalybeata_neumanni 6 709824 678382 \n", "Vidua_fischeri 6 998161 935631 \n", "Vidua_hypocherina 6 889818 844395 \n", "Vidua_interjecta 6 741675 708862 \n", "Vidua_macroura_arenosa 6 939649 891987 \n", "Vidua_macroura_macroura 6 729322 690496 \n", "Vidua_obtusa 6 809186 763098 \n", "Vidua_orientalis 6 862619 824073 \n", "Vidua_paradisaea 6 833981 791532 \n", "Vidua_purpurascens 6 927116 883478 \n", "Vidua_raricola 6 956686 907898 \n", "Vidua_regia 6 1012887 965657 \n", "\n", " clusters_total clusters_hidepth hetero_est \\\n", "Anomalospiza_imberbis 19116 8770 0.007562 \n", "Vidua_chalybeata_amauropteryx 19674 8620 0.003011 \n", "Vidua_chalybeata_neumanni 22053 9035 0.005203 \n", "Vidua_fischeri 26309 9733 0.004910 \n", "Vidua_hypocherina 23806 9539 0.004199 \n", "Vidua_interjecta 22383 9345 0.006617 \n", "Vidua_macroura_arenosa 26536 10385 0.009633 \n", "Vidua_macroura_macroura 22449 9519 0.007330 \n", "Vidua_obtusa 26641 10039 0.006043 \n", "Vidua_orientalis 25127 9821 0.006461 \n", "Vidua_paradisaea 26286 9636 0.006648 \n", "Vidua_purpurascens 20212 9335 0.004603 \n", "Vidua_raricola 27028 10303 0.006500 \n", "Vidua_regia 26355 9798 0.004225 \n", "\n", " error_est reads_consens \n", "Anomalospiza_imberbis 0.003823 8584 \n", "Vidua_chalybeata_amauropteryx 0.000674 8535 \n", "Vidua_chalybeata_neumanni 0.003346 8879 \n", "Vidua_fischeri 0.002578 9595 \n", "Vidua_hypocherina 0.002698 9406 \n", "Vidua_interjecta 0.002864 9180 \n", "Vidua_macroura_arenosa 0.003643 9994 \n", "Vidua_macroura_macroura 0.002658 9350 \n", "Vidua_obtusa 0.003828 9834 \n", "Vidua_orientalis 0.002730 9666 \n", "Vidua_paradisaea 0.004084 9360 \n", "Vidua_purpurascens 0.002489 9196 \n", "Vidua_raricola 0.002545 10115 \n", "Vidua_regia 0.002482 9657 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vm4 = ip.load_json(\"analysis-ipyrad/Finches/vidua_min4.json\")\n", "vm4.stats" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading Assembly: lagon_min4\n", "from saved path: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/lagon_min4.json\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
statereads_rawreads_passed_filterclusters_totalclusters_hidepthhetero_esterror_estreads_consens
Clytospiza_monteiri61220879116870432301102160.0084510.0036119775
Lagonosticta_larvata6100174394465325409105570.0082310.00276410378
Lagonosticta_rara699253493438629094100160.0087810.0028099799
Lagonosticta_rhodopareia610208509612772776799470.0087520.0031919714
Lagonosticta_rubricata_congica6106458799738726398101380.0101140.0030079887
Lagonosticta_rubricata_rubricata6107970110095322230687230.0088990.0030638456
Lagonosticta_rufopicta689811784683425233100230.0066090.0037899802
Lagonosticta_sanguinodorsalis610347399804992771690460.0094350.0040158677
Lagonosticta_senegala_rendalli68346887867012136399600.0073500.0027769747
Lagonosticta_senegala_rhodopsis67927997495082692193790.0060130.0031819194
\n", "
" ], "text/plain": [ " state reads_raw reads_passed_filter \\\n", "Clytospiza_monteiri 6 1220879 1168704 \n", "Lagonosticta_larvata 6 1001743 944653 \n", "Lagonosticta_rara 6 992534 934386 \n", "Lagonosticta_rhodopareia 6 1020850 961277 \n", "Lagonosticta_rubricata_congica 6 1064587 997387 \n", "Lagonosticta_rubricata_rubricata 6 1079701 1009532 \n", "Lagonosticta_rufopicta 6 898117 846834 \n", "Lagonosticta_sanguinodorsalis 6 1034739 980499 \n", "Lagonosticta_senegala_rendalli 6 834688 786701 \n", "Lagonosticta_senegala_rhodopsis 6 792799 749508 \n", "\n", " clusters_total clusters_hidepth \\\n", "Clytospiza_monteiri 32301 10216 \n", "Lagonosticta_larvata 25409 10557 \n", "Lagonosticta_rara 29094 10016 \n", "Lagonosticta_rhodopareia 27767 9947 \n", "Lagonosticta_rubricata_congica 26398 10138 \n", "Lagonosticta_rubricata_rubricata 22306 8723 \n", "Lagonosticta_rufopicta 25233 10023 \n", "Lagonosticta_sanguinodorsalis 27716 9046 \n", "Lagonosticta_senegala_rendalli 21363 9960 \n", "Lagonosticta_senegala_rhodopsis 26921 9379 \n", "\n", " hetero_est error_est reads_consens \n", "Clytospiza_monteiri 0.008451 0.003611 9775 \n", "Lagonosticta_larvata 0.008231 0.002764 10378 \n", "Lagonosticta_rara 0.008781 0.002809 9799 \n", "Lagonosticta_rhodopareia 0.008752 0.003191 9714 \n", "Lagonosticta_rubricata_congica 0.010114 0.003007 9887 \n", "Lagonosticta_rubricata_rubricata 0.008899 0.003063 8456 \n", "Lagonosticta_rufopicta 0.006609 0.003789 9802 \n", "Lagonosticta_sanguinodorsalis 0.009435 0.004015 8677 \n", "Lagonosticta_senegala_rendalli 0.007350 0.002776 9747 \n", "Lagonosticta_senegala_rhodopsis 0.006013 0.003181 9194 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm4 = ip.load_json(\"analysis-ipyrad/Finches/lagon_min4.json\")\n", "lm4.stats" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "\r\n", "## The number of loci caught by each filter.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_filters\r\n", "\r\n", " total_filters applied_order retained_loci\r\n", "total_prefiltered_loci 13949 0 13949\r\n", "filtered_by_rm_duplicates 1645 1645 12304\r\n", "filtered_by_max_indels 233 233 12071\r\n", "filtered_by_max_snps 268 9 12062\r\n", "filtered_by_max_shared_het 83 55 12007\r\n", "filtered_by_min_sample 3583 3310 8697\r\n", "filtered_by_max_alleles 724 251 8446\r\n", "total_filtered_loci 8446 0 8446\r\n", "\r\n", "\r\n", "## The number of loci recovered for each Sample.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_samples\r\n", "\r\n", " sample_coverage\r\n", "Anomalospiza_imberbis 3921\r\n", "Vidua_chalybeata_amauropteryx 5701\r\n", "Vidua_chalybeata_neumanni 6548\r\n", "Vidua_fischeri 6296\r\n", "Vidua_hypocherina 6377\r\n", "Vidua_interjecta 6590\r\n", "Vidua_macroura_arenosa 6548\r\n", "Vidua_macroura_macroura 6396\r\n", "Vidua_obtusa 6898\r\n", "Vidua_orientalis 6746\r\n", "Vidua_paradisaea 6699\r\n", "Vidua_purpurascens 6674\r\n", "Vidua_raricola 6991\r\n", "Vidua_regia 6267\r\n", "\r\n", "\r\n", "## The number of loci for which N taxa have data.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_loci\r\n", "\r\n", " locus_coverage sum_coverage\r\n", "1 0 0\r\n", "2 0 0\r\n", "3 0 0\r\n", "4 915 915\r\n", "5 480 1395\r\n", "6 444 1839\r\n", "7 334 2173\r\n", "8 368 2541\r\n", "9 348 2889\r\n", "10 400 3289\r\n", "11 423 3712\r\n", "12 760 4472\r\n", "13 1895 6367\r\n", "14 2079 8446\r\n", "\r\n", "\r\n", "## The distribution of SNPs (var and pis) per locus.\r\n", "## var = Number of loci with n variable sites (pis + autapomorphies)\r\n", "## pis = Number of loci with n parsimony informative site (minor allele in >1 sample)\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_snps\r\n", "\r\n", " var sum_var pis sum_pis\r\n", "0 941 0 3168 0\r\n", "1 1169 1169 2220 2220\r\n", "2 1144 3457 1454 5128\r\n", "3 1096 6745 813 7567\r\n", "4 855 10165 384 9103\r\n", "5 723 13780 200 10103\r\n", "6 629 17554 94 10667\r\n", "7 516 21166 63 11108\r\n", "8 391 24294 26 11316\r\n", "9 293 26931 9 11397\r\n", "10 221 29141 7 11467\r\n", "11 151 30802 3 11500\r\n", "12 115 32182 2 11524\r\n", "13 74 33144 1 11537\r\n", "14 46 33788 2 11565\r\n", "15 34 34298 0 11565\r\n", "16 20 34618 0 11565\r\n", "17 11 34805 0 11565\r\n", "18 11 35003 0 11565\r\n", "19 5 35098 0 11565\r\n", "20 1 35118 0 11565\r\n", "\r\n", "\r\n", "## Final Sample stats summary\r\n", "\r\n", " state reads_raw reads_passed_filter clusters_total clusters_hidepth hetero_est error_est reads_consens loci_in_assembly\r\n", "Anomalospiza_imberbis 7 935241 889028 19116 8770 0.008 3.823e-03 8584 3921\r\n", "Vidua_chalybeata_amauropteryx 7 1031674 1028936 19674 8620 0.003 6.740e-04 8535 5701\r\n", "Vidua_chalybeata_neumanni 7 709824 678382 22053 9035 0.005 3.346e-03 8879 6548\r\n", "Vidua_fischeri 7 998161 935631 26309 9733 0.005 2.578e-03 9595 6296\r\n", "Vidua_hypocherina 7 889818 844395 23806 9539 0.004 2.698e-03 9406 6377\r\n", "Vidua_interjecta 7 741675 708862 22383 9345 0.007 2.864e-03 9180 6590\r\n", "Vidua_macroura_arenosa 7 939649 891987 26536 10385 0.010 3.643e-03 9994 6548\r\n", "Vidua_macroura_macroura 7 729322 690496 22449 9519 0.007 2.658e-03 9350 6396\r\n", "Vidua_obtusa 7 809186 763098 26641 10039 0.006 3.828e-03 9834 6898\r\n", "Vidua_orientalis 7 862619 824073 25127 9821 0.006 2.730e-03 9666 6746\r\n", "Vidua_paradisaea 7 833981 791532 26286 9636 0.007 4.084e-03 9360 6699\r\n", "Vidua_purpurascens 7 927116 883478 20212 9335 0.005 2.489e-03 9196 6674\r\n", "Vidua_raricola 7 956686 907898 27028 10303 0.007 2.545e-03 10115 6991\r\n", "Vidua_regia 7 1012887 965657 26355 9798 0.004 2.482e-03 9657 6267" ] } ], "source": [ "## or read the full stats file as a bash command (cat)\n", "!cat $vm4.stats_files.s7" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "\r\n", "## The number of loci caught by each filter.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_filters\r\n", "\r\n", " total_filters applied_order retained_loci\r\n", "total_prefiltered_loci 14533 0 14533\r\n", "filtered_by_rm_duplicates 1717 1717 12816\r\n", "filtered_by_max_indels 282 282 12534\r\n", "filtered_by_max_snps 264 14 12520\r\n", "filtered_by_max_shared_het 84 65 12455\r\n", "filtered_by_min_sample 5209 4689 7766\r\n", "filtered_by_max_alleles 797 268 7498\r\n", "total_filtered_loci 7498 0 7498\r\n", "\r\n", "\r\n", "## The number of loci recovered for each Sample.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_samples\r\n", "\r\n", " sample_coverage\r\n", "Clytospiza_monteiri 4776\r\n", "Lagonosticta_larvata 6057\r\n", "Lagonosticta_rara 5737\r\n", "Lagonosticta_rhodopareia 6061\r\n", "Lagonosticta_rubricata_congica 6183\r\n", "Lagonosticta_rubricata_rubricata 4741\r\n", "Lagonosticta_rufopicta 5500\r\n", "Lagonosticta_sanguinodorsalis 5512\r\n", "Lagonosticta_senegala_rendalli 5491\r\n", "Lagonosticta_senegala_rhodopsis 5309\r\n", "\r\n", "\r\n", "## The number of loci for which N taxa have data.\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_loci\r\n", "\r\n", " locus_coverage sum_coverage\r\n", "1 0 0\r\n", "2 0 0\r\n", "3 0 0\r\n", "4 1141 1141\r\n", "5 794 1935\r\n", "6 840 2775\r\n", "7 752 3527\r\n", "8 911 4438\r\n", "9 1359 5797\r\n", "10 1701 7498\r\n", "\r\n", "\r\n", "## The distribution of SNPs (var and pis) per locus.\r\n", "## var = Number of loci with n variable sites (pis + autapomorphies)\r\n", "## pis = Number of loci with n parsimony informative site (minor allele in >1 sample)\r\n", "## ipyrad API location: [assembly].stats_dfs.s7_snps\r\n", "\r\n", " var sum_var pis sum_pis\r\n", "0 313 0 2301 0\r\n", "1 640 640 1969 1969\r\n", "2 806 2252 1296 4561\r\n", "3 895 4937 835 7066\r\n", "4 812 8185 521 9150\r\n", "5 808 12225 255 10425\r\n", "6 705 16455 151 11331\r\n", "7 617 20774 73 11842\r\n", "8 516 24902 41 12170\r\n", "9 375 28277 25 12395\r\n", "10 277 31047 16 12555\r\n", "11 196 33203 8 12643\r\n", "12 197 35567 3 12679\r\n", "13 119 37114 2 12705\r\n", "14 84 38290 0 12705\r\n", "15 54 39100 1 12720\r\n", "16 42 39772 1 12736\r\n", "17 17 40061 0 12736\r\n", "18 16 40349 0 12736\r\n", "19 3 40406 0 12736\r\n", "20 6 40526 0 12736\r\n", "\r\n", "\r\n", "## Final Sample stats summary\r\n", "\r\n", " state reads_raw reads_passed_filter clusters_total clusters_hidepth hetero_est error_est reads_consens loci_in_assembly\r\n", "Clytospiza_monteiri 7 1220879 1168704 32301 10216 0.008 0.004 9775 4776\r\n", "Lagonosticta_larvata 7 1001743 944653 25409 10557 0.008 0.003 10378 6057\r\n", "Lagonosticta_rara 7 992534 934386 29094 10016 0.009 0.003 9799 5737\r\n", "Lagonosticta_rhodopareia 7 1020850 961277 27767 9947 0.009 0.003 9714 6061\r\n", "Lagonosticta_rubricata_congica 7 1064587 997387 26398 10138 0.010 0.003 9887 6183\r\n", "Lagonosticta_rubricata_rubricata 7 1079701 1009532 22306 8723 0.009 0.003 8456 4741\r\n", "Lagonosticta_rufopicta 7 898117 846834 25233 10023 0.007 0.004 9802 5500\r\n", "Lagonosticta_sanguinodorsalis 7 1034739 980499 27716 9046 0.009 0.004 8677 5512\r\n", "Lagonosticta_senegala_rendalli 7 834688 786701 21363 9960 0.007 0.003 9747 5491\r\n", "Lagonosticta_senegala_rhodopsis 7 792799 749508 26921 9379 0.006 0.003 9194 5309" ] } ], "source": [ "## the same full stats for lagon\n", "!cat $lm4.stats_files.s7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis tools\n", "Thee is a lot more information about analysis tools in the ipyrad documentation. But here I'll show just a quick example of how you can easily access the data files for these assemblies and use them in downstream analysis software. The ipyrad analysis tools include convenient wrappers to make it easier to parallelize analyses of RAD-seq data. You should still read the full tutorial of the software you are using to understand the full scope of the parameters involved and their impacts, but once you understand that, the ipyrad analysis tools provide an easy way to setup up scripts to sample different distributions of SNPs and to run many replicates in parallel. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import ipyrad.analysis as ipa" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading Assembly: vidua_min4\n", "from saved path: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min4.json\n", "loading Assembly: vidua_min10\n", "from saved path: ~/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min10.json\n" ] } ], "source": [ "## you can re-load assemblies at a later time from their JSON file\n", "min4 = ip.load_json(\"analysis-ipyrad/Finches/vidua_min4.json\")\n", "min10 = ip.load_json(\"analysis-ipyrad/Finches/vidua_min10.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### *RAxML* -- ML concatenation tree inference" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install raxml -c bioconda\n", "## conda install toytree -c eaton-lab" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## create a raxml analysis object for the min13 data sets\n", "rax = ipa.raxml(\n", " name=min10.name, \n", " data=min10.outfiles.phy,\n", " workdir=\"analysis-raxml\",\n", " T=20,\n", " N=100, \n", " o=[i for i in min10.samples if \"Ano\" in i],\n", " )" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raxmlHPC-PTHREADS-SSE3 -f a -T 20 -m GTRGAMMA -N 100 -x 12345 -p 54321 -n vidua_min10 -w /home/deren/Documents/ipyrad/tests/analysis-raxml -s /home/deren/Documents/ipyrad/tests/analysis-ipyrad/Finches/vidua_min10_outfiles/vidua_min10.phy -o Anomalospiza_imberbis\n", "job vidua_min10 finished successfully\n" ] } ], "source": [ "## print the raxml command and call it\n", "print rax.command\n", "rax.run(force=True)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "bestTree ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bestTree.vidua_min10\n", "bipartitions ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitions.vidua_min10\n", "bipartitionsBranchLabels ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bipartitionsBranchLabels.vidua_min10\n", "bootstrap ~/Documents/ipyrad/tests/analysis-raxml/RAxML_bootstrap.vidua_min10\n", "info ~/Documents/ipyrad/tests/analysis-raxml/RAxML_info.vidua_min10" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access the resulting tree files\n", "rax.trees" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaidx: 1\n", "name: 1\n", "dist: 0.0202770172748\n", "support: 100100idx: 2\n", "name: 2\n", "dist: 0.00497052067772\n", "support: 100100idx: 3\n", "name: 3\n", "dist: 0.000609155416378\n", "support: 100100idx: 4\n", "name: 4\n", "dist: 0.00079846260878\n", "support: 100100idx: 5\n", "name: 5\n", "dist: 0.000828930900595\n", "support: 100100idx: 6\n", "name: 6\n", "dist: 0.000759628385216\n", "support: 9696idx: 7\n", "name: 7\n", "dist: 0.00392468254923\n", "support: 100100idx: 8\n", "name: 8\n", "dist: 0.00369545217793\n", "support: 100100idx: 9\n", "name: 9\n", "dist: 0.00054103209537\n", "support: 100100idx: 10\n", "name: 10\n", "dist: 0.00167952306397\n", "support: 100100idx: 11\n", "name: 11\n", "dist: 0.00093349964352\n", "support: 100100idx: 12\n", "name: 12\n", "dist: 0.000395638998582\n", "support: 9999
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## plot a tree in the notebook with toytree\n", "import toytree\n", "tre = toytree.tree(rax.trees.bipartitions)\n", "tre.root(wildcard=\"Ano\")\n", "tre.draw(\n", " width=350,\n", " height=400,\n", " node_labels=tre.get_node_values(\"support\"),\n", " #use_edge_lengths=True,\n", " );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### *tetrad* -- quartet tree inference " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loading seq array [14 taxa x 35118 bp]\n", "max unlinked SNPs per quartet (nloci): 7505\n" ] } ], "source": [ "## create a tetrad analysis object\n", "tet = ipa.tetrad(\n", " name=min4.name, \n", " seqfile=min4.outfiles.snpsphy,\n", " mapfile=min4.outfiles.snpsmap,\n", " nboots=100,\n", " )" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "host compute node: [40 cores] on tinus\n", "inferring 1001 induced quartet trees\n", "[####################] 100% initial tree | 0:00:04 | \n", "[####################] 100% boot 100 | 0:01:35 | \n" ] } ], "source": [ "## run tree inference\n", "tet.run(ipyclient)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "boots ~/Documents/ipyrad/tests/analysis-tetrad/vidua_min4.boots\n", "cons ~/Documents/ipyrad/tests/analysis-tetrad/vidua_min4.cons\n", "nhx ~/Documents/ipyrad/tests/analysis-tetrad/vidua_min4.nhx\n", "tree ~/Documents/ipyrad/tests/analysis-tetrad/vidua_min4.tree" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## access tree files\n", "tet.trees" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_interjectaVidua_orientalisVidua_paradisaeaVidua_obtusaVidua_hypocherinaVidua_macroura_arenosaVidua_macroura_macrouraVidua_regiaVidua_fischeriVidua_purpurascensVidua_chalybeata_amauropteryxVidua_raricolaVidua_chalybeata_neumanniidx: 1\n", "name: added-node\n", "dist: 100\n", "support: 100100idx: 2\n", "name: 9\n", "dist: 100\n", "support: 100100idx: 3\n", "name: 10\n", "dist: 100\n", "support: 100100idx: 4\n", "name: 11\n", "dist: 100\n", "support: 100100idx: 5\n", "name: 5\n", "dist: 97\n", "support: 9797idx: 6\n", "name: 6\n", "dist: 75\n", "support: 7575idx: 7\n", "name: 7\n", "dist: 100\n", "support: 100100idx: 8\n", "name: 8\n", "dist: 100\n", "support: 100100idx: 9\n", "name: 1\n", "dist: 81\n", "support: 8181idx: 10\n", "name: 2\n", "dist: 100\n", "support: 100100idx: 11\n", "name: 3\n", "dist: 100\n", "support: 100100idx: 12\n", "name: 4\n", "dist: 54\n", "support: 5454
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## plot results (just like above, but unrooted by default)\n", "import toytree\n", "qtre = toytree.tree(tet.trees.nhx)\n", "qtre.root(wildcard=\"Ano\")\n", "qtre.draw(\n", " width=350,\n", " height=400,\n", " node_labels=tre.get_node_values(\"support\"),\n", " );" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricola
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## draw a cloud-tree to see variation among bootstrap trees\n", "## note that the trees are UNROOTED here, but tips are in the \n", "## same order in all trees.\n", "boots = toytree.multitree(tet.trees.boots, fixed_order=tre.get_tip_labels())\n", "boots.draw_cloudtree(orient='right', edge_style={\"opacity\": 0.05});" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### *STRUCTURE* -- population cluster inference" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install structure clumpp -c ipyrad" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## create a structure analysis object for the no-outgroup data set\n", "struct = ipa.structure(\n", " name=min10.name, \n", " data=min10.outfiles.str, \n", " mapfile=min10.outfiles.snpsmap, \n", ")\n", "\n", "## set params for analysis (should be longer in real analyses)\n", "struct.mainparams.burnin=1000\n", "struct.mainparams.numreps=8000" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "submitted 10 structure jobs [vidua_min10-K-2]\n", "submitted 10 structure jobs [vidua_min10-K-4]\n", "submitted 10 structure jobs [vidua_min10-K-6]\n", "submitted 10 structure jobs [vidua_min10-K-8]\n" ] } ], "source": [ "## run structure across 10 random replicates of sampled unlinked SNPs\n", "for kpop in [2, 4, 6, 8]:\n", " struct.run(kpop=kpop, nreps=10, ipyclient=ipyclient)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## wait for all of these jobs to finish\n", "ipyclient.wait()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use Clumpp to permute replicates" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## these options make it run faster\n", "struct.clumppparams.m = 3 ## use largegreedy algorithm\n", "struct.clumppparams.greedy_option = 2 ## test nrepeat possible orders\n", "struct.clumppparams.repeats = 10000 ## number of repeats" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "mean scores across 10 replicates.\n", "mean scores across 10 replicates.\n", "mean scores across 10 replicates.\n", "mean scores across 10 replicates.\n" ] } ], "source": [ "## collect results\n", "tables = {}\n", "for kpop in [2, 4, 6, 8]:\n", " tables[kpop] = struct.get_clumpp_table(kpop)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Plot the results as a barplot\n", "Usually a next step in a structure analysis would be do some kind of statistical analysis to compare models and identify K values that fit the data well. " ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## order of bars will be taken from ladderized tree above\n", "myorder = tre.get_tip_labels()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricola051015
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricola051015
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricola051015
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Anomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricolaAnomalospiza_imberbisVidua_orientalisVidua_interjectaVidua_obtusaVidua_paradisaeaVidua_hypocherinaVidua_macroura_macrouraVidua_macroura_arenosaVidua_fischeriVidua_regiaVidua_chalybeata_amauropteryxVidua_purpurascensVidua_chalybeata_neumanniVidua_raricola051015
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## import toyplot (packaged with toytree) \n", "import toyplot\n", "\n", "## plot bars for each K-value (mean of 10 reps)\n", "for kpop in [2, 4, 6, 8]:\n", " table = tables[kpop]\n", " table = table.ix[myorder]\n", " \n", " ## plot barplot w/ hover\n", " canvas, axes, mark = toyplot.bars(\n", " table, \n", " title=[[i] for i in table.index.tolist()],\n", " width=400, \n", " height=200, \n", " yshow=False, \n", " style={\"stroke\": toyplot.color.near_black},\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### *TREEMIX* -- ML tree & admixture co-inference" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## conda install treemix -c ipyrad" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## group taxa into 'populations'\n", "imap = {\n", " 'orient': ['Vidua_orientalis'],\n", " 'interj': ['Vidua_interjecta'],\n", " 'obtusa': ['Vidua_obtusa'],\n", " 'paradi': ['Vidua_paradisaea'],\n", " 'hypoch': ['Vidua_hypocherina'],\n", " 'macrou': ['Vidua_macroura_macroura', 'Vidua_macroura_arenosa'],\n", " 'fische': ['Vidua_fischeri'],\n", " 'regia' : ['Vidua_regia'],\n", " 'chalyb': ['Vidua_chalybeata_amauropteryx', 'Vidua_chalybeata_neumanni'],\n", " 'purpur': ['Vidua_purpurascens'],\n", " 'rarico': ['Vidua_raricola'],\n", " #'outgro': ['Anomalospiza_imberbis'],\n", "}\n", "\n", "## optional: loci will be filtered if they do not have data for at\n", "## least N samples in each species. Minimums cannot be <1.\n", "minmap = {\n", " 'orient': 1,\n", " 'interj': 1,\n", " 'obtusa': 1,\n", " 'paradi': 1,\n", " 'hypoch': 1,\n", " 'macrou': 2,\n", " 'fische': 1,\n", " 'regia' : 1,\n", " 'chalyb': 2,\n", " 'purpur': 1,\n", " 'rarico': 1,\n", " #'outgro': 1,\n", " }" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## create a treemix analysis object\n", "tmix = ipa.treemix(\n", " name=min10.name, \n", " data=min10.outfiles.snpsphy,\n", " imap=imap,\n", " minmap=minmap, \n", " )\n", "\n", "## set params on treemix object\n", "tmix.params.m = 1\n", "tmix.params.root = \"interj,orient,paradi,obtusa\"\n", "tmix.params.global_ = 1" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ntaxa 14; nSNPs total 35118; nSNPs written 16364\n" ] } ], "source": [ "## you can simply write the input files and run them externally\n", "## or, as we show below, use the .run() command to run them here.\n", "tmix.write_output_file()" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## a dictionary for storing treemix objects\n", "tdict = {}\n", "\n", "## iterate over values of m\n", "for rep in xrange(4):\n", " for mig in xrange(4):\n", " \n", " ## create new treemix object copy\n", " name = \"mig-{}-rep-{}\".format(mig, rep)\n", " tmp = tmix.copy(name)\n", "\n", " ## set params on new object\n", " tmp.params.m = mig\n", " \n", " ## run treemix analysis\n", " tmp.run()\n", " \n", " ## store the treemix object\n", " tdict[name] = tmp" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
orientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalyb-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalyb-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalyb-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalyb-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalybweight: 0.491017-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalybweight: 0.491017-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalybweight: 0.491017-0.10-0.050.000.05Drift parameterorientinterjparadiobtusamacrouhypochfischeregiararicopurpurchalybweight: 0.491017-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouregiafischeraricopurpurchalybweight: 0.460687weight: 0.0121953-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouregiafischeraricopurpurchalybweight: 0.460687weight: 0.0121953-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouregiafischeraricopurpurchalybweight: 0.460687weight: 0.0121953-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouregiafischeraricopurpurchalybweight: 0.460687weight: 0.0121953-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouraricochalybpurpurregiafischeweight: 0.0132624weight: 0.442482weight: 0.278645-0.15-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouraricochalybpurpurregiafischeweight: 0.0132624weight: 0.442482weight: 0.278645-0.15-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouraricochalybpurpurregiafischeweight: 0.0132624weight: 0.442482weight: 0.278645-0.15-0.10-0.050.000.05Drift parameterorientinterjobtusaparadihypochmacrouraricochalybpurpurregiafischeweight: 0.0132624weight: 0.442482weight: 0.278645-0.15-0.10-0.050.000.05Drift parameter
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import toyplot\n", "import numpy as np\n", "\n", "canvas = toyplot.Canvas(width=800, height=1200)\n", "idx = 0\n", "for mig in range(4):\n", " for rep in range(4):\n", " tmp = tdict[\"mig-{}-rep-{}\".format(mig, rep)]\n", " ax = canvas.cartesian(grid=(4, 4, idx), padding=25, margin=(25, 50, 100, 25))\n", " ax = tmp.draw(ax)\n", " idx += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### baba: D-statistic (ABBA-BABA) admixture inference/hypothesis testing" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## create a baba analysis object\n", "bb = ipa.baba(\n", " data=min4.outfiles.loci,\n", " newick=\"analysis-raxml/RAxML_bestTree.vidua_min10\" \n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "168 tests generated from tree\n" ] } ], "source": [ "## this will generate tests from the tree, using constraints.\n", "bb.generate_tests_from_tree(\n", " constraint_exact=False,\n", " constraint_dict={\n", " \"p4\": ['Anomalospiza_imberbis'],\n", " 'p3': ['Vidua_macroura_macroura', 'Vidua_macroura_arenosa'],\n", " })" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[####################] 100% calculating D-stats | 0:00:24 | \n" ] } ], "source": [ "## run inference and significance testing on tests\n", "bb.run(ipyclient=ipyclient)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dstatbootmeanbootstdZABBABABAnloci
25-0.195-1.952e-010.0712.75143.25064.2503207
220.2122.116e-010.0782.72961.00039.6253143
1-0.186-1.816e-010.0692.70144.65665.0943262
13-0.192-1.888e-010.0742.60841.87561.7503186
15-0.212-2.081e-010.0842.52839.62561.0003143
........................
120-0.009-2.353e-030.1040.08534.75035.3752717
1570.0045.400e-030.0850.04746.81246.4383222
00.0024.923e-030.0710.03058.31258.0623268
76-0.003-6.335e-040.0940.02736.50036.6882765
700.0031.563e-030.0950.02736.68836.5002765
\n", "

168 rows × 7 columns

\n", "
" ], "text/plain": [ " dstat bootmean bootstd Z ABBA BABA nloci\n", "25 -0.195 -1.952e-01 0.071 2.751 43.250 64.250 3207\n", "22 0.212 2.116e-01 0.078 2.729 61.000 39.625 3143\n", "1 -0.186 -1.816e-01 0.069 2.701 44.656 65.094 3262\n", "13 -0.192 -1.888e-01 0.074 2.608 41.875 61.750 3186\n", "15 -0.212 -2.081e-01 0.084 2.528 39.625 61.000 3143\n", ".. ... ... ... ... ... ... ...\n", "120 -0.009 -2.353e-03 0.104 0.085 34.750 35.375 2717\n", "157 0.004 5.400e-03 0.085 0.047 46.812 46.438 3222\n", "0 0.002 4.923e-03 0.071 0.030 58.312 58.062 3268\n", "76 -0.003 -6.335e-04 0.094 0.027 36.500 36.688 2765\n", "70 0.003 1.563e-03 0.095 0.027 36.688 36.500 2765\n", "\n", "[168 rows x 7 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## sorted results for the tests performed \n", "bb.results_table.sort_values(by=\"Z\", ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### summary\n", "The top results show greatest support for admixture between V. macroura and V. paradisaea, though the signal is not very strong (Z=2.7)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'p1': ['Vidua_paradisaea'],\n", " 'p2': ['Vidua_orientalis', 'Vidua_interjecta'],\n", " 'p3': ['Vidua_macroura_arenosa'],\n", " 'p4': ['Anomalospiza_imberbis']}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## the test that had the most significant result: (BABA)\n", "bb.tests[25]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'p1': ['Vidua_orientalis'],\n", " 'p2': ['Vidua_paradisaea'],\n", " 'p3': ['Vidua_macroura_macroura'],\n", " 'p4': ['Anomalospiza_imberbis']}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## next best (ABBA)\n", "bb.tests[22]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }