{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Canarium GBS: Concordance analysis\n",
    "### *Federman et al.*\n",
    "\n",
    "This notebook provides all code necessary to reproduce the BUCKy concordance analyses. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Required software"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## conda install ipyrad -c ipyrad\n",
    "## conda install bucky -c ipyrad\n",
    "## conda install mrbayes -c biobuilds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ipyrad v.0.7.20\n"
     ]
    }
   ],
   "source": [
    "import ipyrad as ip\n",
    "import ipyrad.analysis as ipa\n",
    "print \"ipyrad v.{}\".format(ip.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connect to cluster"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "host compute node: [40 cores] on sacra\n"
     ]
    }
   ],
   "source": [
    "import ipyparallel as ipp\n",
    "ipyclient = ipp.Client()\n",
    "ip.cluster_info(ipyclient)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concordance factors\n",
    "One sample was selected from each major clade that was recovered in every phyogenetic analysis (see notebook 2). Samples were selected that had the most data available and did not appear to be admixed. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "bucky_samples = [\n",
    "    \"D14269\",      ## outgroup          o\n",
    "    \"SF328\",       ## betamponae        1A\n",
    "    \"D13052\",      ## pilicarpum        1B\n",
    "    \"SF276\",       ## pulchebracteatum  1C  \n",
    "    \"D14483\",      ## multinervis       2A\n",
    "    \"D14505\",      ## velutinifolium    2B\n",
    "    \"D14478\",      ## multiflorum       2C\n",
    "    \"D12950\",      ## longistipulatum   3A\n",
    "    \"SF224\",       ## obtusifolium      3B\n",
    "    \"D13097\",      ## compressum        3C\n",
    "    ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## initiate a bucky object\n",
    "b = ipa.bucky(\n",
    "    name=\"ten-clades\",\n",
    "    data=\"analysis-ipyrad/Canarium-min10_outfiles/Canarium-min10.alleles.loci\",\n",
    "    workdir=\"analysis-bucky\",\n",
    "    samples=bucky_samples,\n",
    "    minsnps=2,\n",
    "    seed=12345,\n",
    "    mb_mcmc_burnin=1000000,\n",
    "    mb_mcmc_ngen=4000000,\n",
    "    mb_mcmc_sample_freq=4000,\n",
    "    bucky_alpha=[0.1, 1.0, 10.0],\n",
    "    bucky_nchains=4,\n",
    "    bucky_nreps=4,\n",
    "    bucky_niter=int(1e6),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "wrote 750 nexus files to ~/Documents/Canarium/analysis-bucky/ten-clades\n",
      "[####################] 100% [mb] infer gene-tree posteriors | 2:13:22 |  \n",
      "[####################] 100% [mbsum] sum replicate runs      | 0:00:01 |  \n",
      "[####################] 100% [bucky] infer CF posteriors     | 3:16:00 |  \n"
     ]
    }
   ],
   "source": [
    "## run full analysis [write-nex, mb, mbsum, bucky]\n",
    "b.run(ipyclient=ipyclient, force=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## BUCKy results "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "translate\r\n",
      " 1 D14269,\r\n",
      " 2 SF224,\r\n",
      " 3 D12950,\r\n",
      " 4 SF328,\r\n",
      " 5 D13097,\r\n",
      " 6 D13052,\r\n",
      " 7 D14483,\r\n",
      " 8 SF276,\r\n",
      " 9 D14478,\r\n",
      " 10 D14505;\r\n",
      "\r\n",
      "Population Tree:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Primary Concordance Tree Topology:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Population Tree, With Branch Lengths In Estimated Coalescent Units:\r\n",
      "((((1:10.000,(4:10.000,(6:10.000,8:10.000):0.617):1.512):0.462,((2:10.000,5:10.000):0.161,3:10.000):0.567):0.394,7:10.000):0.103,9:10.000,10:10.000);\r\n",
      "\r\n",
      "Primary Concordance Tree with Sample Concordance Factors:\r\n",
      "((((1:1.000,(4:1.000,(6:1.000,8:1.000):0.587):0.771):0.446,((2:1.000,5:1.000):0.375,3:1.000):0.392):0.338,7:1.000):0.305,9:1.000,10:1.000);\r\n",
      "\r\n",
      "Four-way partitions in the Population Tree: sample-wide CF, coalescent units and Ties(if present)\r\n",
      "{1,4,6,7,8,9,10; 3|2; 5}\t0.432, 0.161,  \r\n",
      "{1,2,3,5,7,9,10; 4|6; 8}\t0.640, 0.617,  \r\n",
      "{1; 2,3,5,7,9,10|4; 6,8}\t0.853, 1.512,  \r\n",
      "{1,2,3,4,5,6,8; 7|9; 10}\t0.399, 0.103,  \r\n",
      "{1,4,6,8; 7,9,10|2,5; 3}\t0.622, 0.567,  \r\n",
      "{1; 4,6,8|2,3,5; 7,9,10}\t0.580, 0.462,  \r\n",
      "{1,4,6,8; 2,3,5|7; 9,10}\t0.550, 0.394,  \r\n",
      "\r\n",
      "Splits in the Primary Concordance Tree: sample-wide and genome-wide mean CF (95% credibility), SD of mean sample-wide CF across runs\r\n",
      "{1,2,3,5,7,9,10|4,6,8} 0.771(0.737,0.816) 0.770(0.720,0.825)\t0.023\r\n",
      "{1,2,3,4,5,7,9,10|6,8} 0.587(0.445,0.693) 0.586(0.443,0.700)\t0.066\r\n",
      "{1,4,6,8|2,3,5,7,9,10} 0.446(0.401,0.495) 0.446(0.389,0.505)\t0.018\r\n",
      "{1,4,6,7,8,9,10|2,3,5} 0.392(0.328,0.464) 0.392(0.316,0.475)\t0.039\r\n",
      "{1,3,4,6,7,8,9,10|2,5} 0.375(0.281,0.443) 0.375(0.278,0.451)\t0.033\r\n",
      "{1,2,3,4,5,6,8|7,9,10} 0.338(0.279,0.412) 0.337(0.269,0.418)\t0.026\r\n",
      "{1,2,3,4,5,6,7,8|9,10} 0.305(0.241,0.371) 0.304(0.231,0.380)\t0.029\r\n",
      "\r\n",
      "Splits NOT in the Primary Concordance Tree but with estimated CF > 0.050:\r\n",
      "{1,2,4,6,7,8,9,10|3,5} 0.267(0.180,0.355) 0.267(0.174,0.362)\t0.044\r\n",
      "{1,2,3,4,5,6,8,10|7,9} 0.220(0.160,0.280) 0.219(0.152,0.289)\t0.022\r\n",
      "{1,2,3,4,5,6,8,9|7,10} 0.194(0.124,0.247) 0.194(0.120,0.256)\t0.026\r\n",
      "{1,2,3,5,7,8,9,10|4,6} 0.187(0.137,0.276) 0.187(0.130,0.280)\t0.025\r\n",
      "{1,4,5,6,7,8,9,10|2,3} 0.150(0.087,0.217) 0.150(0.080,0.225)\t0.039\r\n",
      "{1,2,3,5,6,7,9,10|4,8} 0.131(0.051,0.225) 0.131(0.047,0.230)\t0.047\r\n",
      "{1,2,3,5|4,6,7,8,9,10} 0.082(0.037,0.125) 0.082(0.034,0.132)\t0.025\r\n"
     ]
    }
   ],
   "source": [
    "## figures were made by hand from parsing the resulting in \n",
    "## the \"Splits in the Primary Concordance Tree\" section.\n",
    "! head -n 50 ./analysis-bucky/ten-clades/CF-a1.0.concordance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "translate\r\n",
      " 1 D14269,\r\n",
      " 2 SF224,\r\n",
      " 3 D12950,\r\n",
      " 4 SF328,\r\n",
      " 5 D13097,\r\n",
      " 6 D13052,\r\n",
      " 7 D14483,\r\n",
      " 8 SF276,\r\n",
      " 9 D14478,\r\n",
      " 10 D14505;\r\n",
      "\r\n",
      "Population Tree:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Primary Concordance Tree Topology:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Population Tree, With Branch Lengths In Estimated Coalescent Units:\r\n",
      "((((1:10.000,(4:10.000,(6:10.000,8:10.000):0.495):1.514):0.438,((2:10.000,5:10.000):0.170,3:10.000):0.546):0.395,7:10.000):0.140,9:10.000,10:10.000);\r\n",
      "\r\n",
      "Primary Concordance Tree with Sample Concordance Factors:\r\n",
      "((((1:1.000,(4:1.000,(6:1.000,8:1.000):0.540):0.775):0.445,((2:1.000,5:1.000):0.372,3:1.000):0.394):0.329,7:1.000):0.314,9:1.000,10:1.000);\r\n",
      "\r\n",
      "Four-way partitions in the Population Tree: sample-wide CF, coalescent units and Ties(if present)\r\n",
      "{1,4,6,7,8,9,10; 3|2; 5}\t0.438, 0.170,  \r\n",
      "{1,2,3,5,7,9,10; 4|6; 8}\t0.594, 0.495,  \r\n",
      "{1; 2,3,5,7,9,10|4; 6,8}\t0.853, 1.514,  \r\n",
      "{1,2,3,4,5,6,8; 7|9; 10}\t0.420, 0.140,  \r\n",
      "{1,4,6,8; 7,9,10|2,5; 3}\t0.614, 0.546,  \r\n",
      "{1; 4,6,8|2,3,5; 7,9,10}\t0.570, 0.438,  \r\n",
      "{1,4,6,8; 2,3,5|7; 9,10}\t0.551, 0.395,  \r\n",
      "\r\n",
      "Splits in the Primary Concordance Tree: sample-wide and genome-wide mean CF (95% credibility), SD of mean sample-wide CF across runs\r\n",
      "{1,2,3,5,7,9,10|4,6,8} 0.775(0.735,0.819) 0.775(0.722,0.826)\t0.018\r\n",
      "{1,2,3,4,5,7,9,10|6,8} 0.540(0.479,0.604) 0.540(0.468,0.616)\t0.010\r\n",
      "{1,4,6,8|2,3,5,7,9,10} 0.445(0.409,0.489) 0.445(0.394,0.501)\t0.006\r\n",
      "{1,4,6,7,8,9,10|2,3,5} 0.394(0.337,0.471) 0.394(0.327,0.478)\t0.020\r\n",
      "{1,3,4,6,7,8,9,10|2,5} 0.372(0.312,0.447) 0.372(0.300,0.457)\t0.026\r\n",
      "{1,2,3,4,5,6,8|7,9,10} 0.329(0.271,0.389) 0.329(0.260,0.400)\t0.026\r\n",
      "{1,2,3,4,5,6,7,8|9,10} 0.314(0.232,0.381) 0.314(0.225,0.392)\t0.030\r\n",
      "\r\n",
      "Splits NOT in the Primary Concordance Tree but with estimated CF > 0.050:\r\n",
      "{1,2,4,6,7,8,9,10|3,5} 0.281(0.171,0.427) 0.281(0.164,0.437)\t0.078\r\n",
      "{1,2,3,5,7,8,9,10|4,6} 0.204(0.161,0.281) 0.204(0.152,0.284)\t0.018\r\n",
      "{1,2,3,4,5,6,8,10|7,9} 0.191(0.137,0.265) 0.191(0.130,0.272)\t0.020\r\n",
      "{1,2,3,4,5,6,8,9|7,10} 0.165(0.117,0.209) 0.165(0.112,0.217)\t0.015\r\n",
      "{1,2,3,5,6,7,9,10|4,8} 0.150(0.099,0.203) 0.150(0.092,0.212)\t0.026\r\n",
      "{1,4,5,6,7,8,9,10|2,3} 0.143(0.081,0.212) 0.143(0.076,0.218)\t0.035\r\n",
      "{1,2,3,5|4,6,7,8,9,10} 0.084(0.043,0.128) 0.084(0.039,0.136)\t0.023\r\n"
     ]
    }
   ],
   "source": [
    "## figures were made by hand from parsing the resulting in \n",
    "## the \"Splits in the Primary Concordance Tree\" section.\n",
    "! head -n 50 ./analysis-bucky/ten-clades/CF-a0.1.concordance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "translate\r\n",
      " 1 D14269,\r\n",
      " 2 SF224,\r\n",
      " 3 D12950,\r\n",
      " 4 SF328,\r\n",
      " 5 D13097,\r\n",
      " 6 D13052,\r\n",
      " 7 D14483,\r\n",
      " 8 SF276,\r\n",
      " 9 D14478,\r\n",
      " 10 D14505;\r\n",
      "\r\n",
      "Population Tree:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Primary Concordance Tree Topology:\r\n",
      "((((1,(4,(6,8))),((2,5),3)),7),9,10);\r\n",
      "\r\n",
      "Population Tree, With Branch Lengths In Estimated Coalescent Units:\r\n",
      "((((1:10.000,(4:10.000,(6:10.000,8:10.000):0.472):1.619):0.438,((2:10.000,5:10.000):0.200,3:10.000):0.552):0.464,7:10.000):0.060,9:10.000,10:10.000);\r\n",
      "\r\n",
      "Primary Concordance Tree with Sample Concordance Factors:\r\n",
      "((((1:1.000,(4:1.000,(6:1.000,8:1.000):0.545):0.796):0.443,((2:1.000,5:1.000):0.398,3:1.000):0.415):0.375,7:1.000):0.280,9:1.000,10:1.000);\r\n",
      "\r\n",
      "Four-way partitions in the Population Tree: sample-wide CF, coalescent units and Ties(if present)\r\n",
      "{1,4,6,7,8,9,10; 3|2; 5}\t0.454, 0.200,  \r\n",
      "{1,2,3,5,7,9,10; 4|6; 8}\t0.584, 0.472,  \r\n",
      "{1; 2,3,5,7,9,10|4; 6,8}\t0.868, 1.619,  \r\n",
      "{1,2,3,4,5,6,8; 7|9; 10}\t0.372, 0.060,  \r\n",
      "{1,4,6,8; 7,9,10|2,5; 3}\t0.616, 0.552,  \r\n",
      "{1; 4,6,8|2,3,5; 7,9,10}\t0.570, 0.438,  \r\n",
      "{1,4,6,8; 2,3,5|7; 9,10}\t0.581, 0.464,  \r\n",
      "\r\n",
      "Splits in the Primary Concordance Tree: sample-wide and genome-wide mean CF (95% credibility), SD of mean sample-wide CF across runs\r\n",
      "{1,2,3,5,7,9,10|4,6,8} 0.796(0.769,0.824) 0.786(0.745,0.826)\t0.008\r\n",
      "{1,2,3,4,5,7,9,10|6,8} 0.545(0.479,0.632) 0.539(0.463,0.629)\t0.010\r\n",
      "{1,4,6,8|2,3,5,7,9,10} 0.443(0.371,0.515) 0.437(0.357,0.520)\t0.021\r\n",
      "{1,4,6,7,8,9,10|2,3,5} 0.415(0.375,0.475) 0.410(0.355,0.479)\t0.010\r\n",
      "{1,3,4,6,7,8,9,10|2,5} 0.398(0.337,0.471) 0.394(0.324,0.474)\t0.024\r\n",
      "{1,2,3,4,5,6,8|7,9,10} 0.375(0.321,0.428) 0.370(0.307,0.434)\t0.024\r\n",
      "{1,2,3,4,5,6,7,8|9,10} 0.280(0.209,0.356) 0.277(0.198,0.361)\t0.044\r\n",
      "\r\n",
      "Splits NOT in the Primary Concordance Tree but with estimated CF > 0.050:\r\n",
      "{1,2,4,6,7,8,9,10|3,5} 0.280(0.208,0.361) 0.278(0.198,0.364)\t0.032\r\n",
      "{1,2,3,5,7,8,9,10|4,6} 0.225(0.171,0.285) 0.223(0.161,0.292)\t0.016\r\n",
      "{1,2,3,4,5,6,8,10|7,9} 0.217(0.141,0.289) 0.215(0.134,0.296)\t0.038\r\n",
      "{1,2,3,4,5,6,8,9|7,10} 0.211(0.152,0.265) 0.209(0.145,0.271)\t0.014\r\n",
      "{1,2,3,5,6,7,9,10|4,8} 0.149(0.065,0.236) 0.148(0.062,0.236)\t0.010\r\n",
      "{1,4,5,6,7,8,9,10|2,3} 0.129(0.081,0.175) 0.128(0.078,0.180)\t0.014\r\n",
      "{1,2,3,5|4,6,7,8,9,10} 0.082(0.044,0.135) 0.081(0.040,0.140)\t0.027\r\n"
     ]
    }
   ],
   "source": [
    "## figures were made by hand from parsing the resulting in \n",
    "## the \"Splits in the Primary Concordance Tree\" section.\n",
    "! head -n 50 ./analysis-bucky/ten-clades/CF-a10.0.concordance"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}