{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook 5: \n",
    "\n",
    "This is an IPython notebook. Most of the code is composed of bash scripts, indicated by %%bash at the top of the cell, otherwise it is IPython code. This notebook includes code to download, assemble and analyze a published RADseq data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": []
    }
   ],
   "source": [
    "### Notebook 5\n",
    "### Data set 5 (Heliconius)\n",
    "### Authors: Nadeau et al. (2013)\n",
    "### Data Location: ERP000991"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the sequence data\n",
    "Below I read in EraRunTable.txt for this project which contains all of the information we need to download the data. \n",
    "\n",
    "+ Project ERA: ERP000991"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "## make a new directory for this analysis\n",
    "mkdir -p empirical_5/fastq/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### sub-sampling\n",
    "The authors used only a subset of the data that are on the archive for their phylogenetic analyses so I will choose the same 54 samples here which are listed in Table S1 of their publication. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "54"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subsamples = ['ERS070268','ERS070269','ERS070270','ERS070271','ERS070272',\n",
    "              'ERS070273','ERS070274','ERS070275','ERS070276','ERS070277',\n",
    "              'ERS070246','ERS070247','ERS070248','ERS070249','ERS070245',\n",
    "              'ERS070252','ERS070253','ERS070254','ERS070256','ERS070255',\n",
    "              'ERS074398','ERS074399','ERS074400','ERS074401','ERS074402',\n",
    "              'ERS074403','ERS074404','ERS074405','ERS074406','ERS074407',\n",
    "              'ERS074408','ERS074409','ERS074410','ERS074411','ERS074412',\n",
    "              'ERS074413','ERS074414','ERS074415','ERS074416','ERS074417',\n",
    "              'ERS074418','ERS074419','ERS074420','ERS074421','ERS074422',\n",
    "              'ERS074423','ERS074424','ERS074425','ERS074426','ERS074427',\n",
    "              'ERS074428','ERS074429','ERS070250','ERS070251']\n",
    "\n",
    "len(subsamples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### For each ERS (individuals) get all of the ERR (sequence file accessions)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  study_accession secondary_study_accession sample_accession  \\\n",
      "0       PRJEB2743                 ERP000991     SAMEA1322899   \n",
      "1       PRJEB2743                 ERP000991     SAMEA1322899   \n",
      "2       PRJEB2743                 ERP000991     SAMEA1322873   \n",
      "3       PRJEB2743                 ERP000991     SAMEA1322873   \n",
      "4       PRJEB2743                 ERP000991     SAMEA1322940   \n",
      "\n",
      "  secondary_sample_accession experiment_accession run_accession  tax_id  \\\n",
      "0                  ERS070236            ERX030872     ERR053824   33444   \n",
      "1                  ERS070236            ERX030873     ERR053825   33444   \n",
      "2                  ERS070237            ERX030874     ERR053826   33444   \n",
      "3                  ERS070237            ERX030875     ERR053827   33444   \n",
      "4                  ERS070238            ERX030876     ERR053828   33444   \n",
      "\n",
      "       scientific_name              instrument_model library_layout  \\\n",
      "0  Heliconius elevatus  Illumina Genome Analyzer IIx         PAIRED   \n",
      "1  Heliconius elevatus  Illumina Genome Analyzer IIx         PAIRED   \n",
      "2  Heliconius elevatus  Illumina Genome Analyzer IIx         PAIRED   \n",
      "3  Heliconius elevatus  Illumina Genome Analyzer IIx         PAIRED   \n",
      "4  Heliconius elevatus  Illumina Genome Analyzer IIx         PAIRED   \n",
      "\n",
      "                                           fastq_ftp  \\\n",
      "0  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053824/...   \n",
      "1  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053825/...   \n",
      "2  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053826/...   \n",
      "3  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053827/...   \n",
      "4  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053828/...   \n",
      "\n",
      "                                        fastq_galaxy  \\\n",
      "0  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053824/...   \n",
      "1  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053825/...   \n",
      "2  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053826/...   \n",
      "3  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053827/...   \n",
      "4  ftp.sra.ebi.ac.uk/vol1/fastq/ERR053/ERR053828/...   \n",
      "\n",
      "                                       submitted_ftp  \\\n",
      "0  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...   \n",
      "1  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...   \n",
      "2  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...   \n",
      "3  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...   \n",
      "4  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...   \n",
      "\n",
      "                                    submitted_galaxy  cram_index_ftp  \\\n",
      "0  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...             NaN   \n",
      "1  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...             NaN   \n",
      "2  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...             NaN   \n",
      "3  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...             NaN   \n",
      "4  ftp.sra.ebi.ac.uk/vol1/ERA066/ERA066292/fastq/...             NaN   \n",
      "\n",
      "   cram_index_galaxy  \n",
      "0                NaN  \n",
      "1                NaN  \n",
      "2                NaN  \n",
      "3                NaN  \n",
      "4                NaN  \n"
     ]
    }
   ],
   "source": [
    "## IPython code\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import urllib2\n",
    "import os\n",
    "\n",
    "## open the SRA run table from github url\n",
    "url = \"https://raw.githubusercontent.com/\"+\\\n",
    "      \"dereneaton/RADmissing/master/empirical_9_EraRunTable.txt\"\n",
    "intable = urllib2.urlopen(url)\n",
    "indata = pd.read_table(intable, sep=\"\\t\")\n",
    "\n",
    "## print first few rows\n",
    "print indata.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def wget_download_ERR(ERR, outdir, outname):\n",
    "    \"\"\" Python function to get sra data from ncbi and write to\n",
    "    outdir with a new name using bash call wget \"\"\"\n",
    "    \n",
    "    ## get output name\n",
    "    output = os.path.join(outdir, outname+\".fastq.gz\")\n",
    "\n",
    "    ## create a call string \n",
    "    call = \"wget -q -r -nH --cut-dirs=9 -O \"+output+\" \"+\\\n",
    "           \"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/\"+\\\n",
    "           \"{}/{}/{}_1.fastq.gz\".format(ERR[:6], ERR, ERR)\n",
    "    \n",
    "    ## call bash script\n",
    "    ! $call "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we pass the SRR number and the sample name to the `wget_download` function so that the files are saved with their sample names. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Heliconius timareta ssp. JD-2011   \tERS070246\tERR053848\n",
      "Heliconius timareta ssp. JD-2011   \tERS070246\tERR053849\n",
      "Heliconius timareta ssp. JD-2011   \tERS070246\tERR053850\n",
      "Heliconius timareta ssp. JD-2011   \tERS070247\tERR053851\n",
      "Heliconius timareta ssp. JD-2011   \tERS070247\tERR053852\n",
      "Heliconius timareta ssp. JD-2011   \tERS070247\tERR053853\n",
      "Heliconius timareta ssp. JD-2011   \tERS070245\tERR053854\n",
      "Heliconius timareta ssp. JD-2011   \tERS070245\tERR053855\n",
      "Heliconius timareta ssp. JD-2011   \tERS070248\tERR053856\n",
      "Heliconius timareta ssp. JD-2011   \tERS070248\tERR053857\n",
      "Heliconius timareta ssp. JD-2011   \tERS070249\tERR053858\n",
      "Heliconius timareta ssp. JD-2011   \tERS070249\tERR053859\n",
      "Heliconius timareta timareta       \tERS070250\tERR053860\n",
      "Heliconius timareta timareta       \tERS070250\tERR053861\n",
      "Heliconius timareta timareta       \tERS070250\tERR053862\n",
      "Heliconius timareta timareta       \tERS070251\tERR053863\n",
      "Heliconius timareta timareta       \tERS070251\tERR053864\n",
      "Heliconius timareta timareta       \tERS070251\tERR053865\n",
      "Heliconius hecale felix            \tERS070252\tERR053866\n",
      "Heliconius hecale felix            \tERS070252\tERR053867\n",
      "Heliconius hecale felix            \tERS070253\tERR053868\n",
      "Heliconius hecale felix            \tERS070253\tERR053869\n",
      "Heliconius hecale felix            \tERS070254\tERR053870\n",
      "Heliconius hecale felix            \tERS070254\tERR053871\n",
      "Heliconius hecale felix            \tERS070255\tERR053872\n",
      "Heliconius hecale felix            \tERS070255\tERR053873\n",
      "Heliconius hecale felix            \tERS070255\tERR053874\n",
      "Heliconius hecale felix            \tERS070256\tERR053875\n",
      "Heliconius hecale felix            \tERS070256\tERR053876\n",
      "Heliconius hecale felix            \tERS070256\tERR053877\n",
      "Heliconius melpomene aglaope       \tERS070268\tERR053897\n",
      "Heliconius melpomene aglaope       \tERS070268\tERR053898\n",
      "Heliconius melpomene aglaope       \tERS070269\tERR053899\n",
      "Heliconius melpomene aglaope       \tERS070269\tERR053900\n",
      "Heliconius melpomene aglaope       \tERS070270\tERR053901\n",
      "Heliconius melpomene aglaope       \tERS070270\tERR053902\n",
      "Heliconius melpomene aglaope       \tERS070271\tERR053903\n",
      "Heliconius melpomene aglaope       \tERS070271\tERR053904\n",
      "Heliconius melpomene aglaope       \tERS070272\tERR053905\n",
      "Heliconius melpomene aglaope       \tERS070272\tERR053906\n",
      "Heliconius melpomene amaryllis     \tERS070273\tERR053907\n",
      "Heliconius melpomene amaryllis     \tERS070273\tERR053908\n",
      "Heliconius melpomene amaryllis     \tERS070274\tERR053909\n",
      "Heliconius melpomene amaryllis     \tERS070274\tERR053910\n",
      "Heliconius melpomene amaryllis     \tERS070275\tERR053911\n",
      "Heliconius melpomene amaryllis     \tERS070275\tERR053912\n",
      "Heliconius melpomene amaryllis     \tERS070276\tERR053913\n",
      "Heliconius melpomene amaryllis     \tERS070276\tERR053914\n",
      "Heliconius melpomene amaryllis     \tERS070277\tERR053915\n",
      "Heliconius melpomene amaryllis     \tERS070277\tERR053916\n",
      "Heliconius heurippa                \tERS074398\tERR055402\n",
      "Heliconius heurippa                \tERS074399\tERR055403\n",
      "Heliconius heurippa                \tERS074400\tERR055404\n",
      "Heliconius heurippa                \tERS074401\tERR055405\n",
      "Heliconius heurippa                \tERS074402\tERR055406\n",
      "Heliconius heurippa                \tERS074403\tERR055407\n",
      "Heliconius cydno cordula           \tERS074404\tERR055408\n",
      "Heliconius cydno cordula           \tERS074405\tERR055409\n",
      "Heliconius cydno cordula           \tERS074406\tERR055410\n",
      "Heliconius cydno cordula           \tERS074407\tERR055411\n",
      "Heliconius cydno cordula           \tERS074408\tERR055412\n",
      "Heliconius cydno cordula           \tERS074409\tERR055413\n",
      "Heliconius cydno chioneus          \tERS074410\tERR055414\n",
      "Heliconius cydno chioneus          \tERS074411\tERR055415\n",
      "Heliconius cydno weymeri           \tERS074412\tERR055416\n",
      "Heliconius cydno weymeri           \tERS074413\tERR055417\n",
      "Heliconius cydno cydnides          \tERS074414\tERR055418\n",
      "Heliconius cydno cydnides          \tERS074415\tERR055419\n",
      "Heliconius melpomene melpomene     \tERS074416\tERR055420\n",
      "Heliconius melpomene melpomene     \tERS074417\tERR055421\n",
      "Heliconius melpomene melpomene     \tERS074418\tERR055422\n",
      "Heliconius melpomene melpomene     \tERS074419\tERR055423\n",
      "Heliconius melpomene melpomene     \tERS074420\tERR055424\n",
      "Heliconius melpomene melpomene     \tERS074421\tERR055425\n",
      "Heliconius melpomene melpomene     \tERS074422\tERR055426\n",
      "Heliconius melpomene melpomene     \tERS074423\tERR055427\n",
      "Heliconius melpomene melpomene     \tERS074424\tERR055428\n",
      "Heliconius melpomene melpomene     \tERS074425\tERR055429\n",
      "Heliconius melpomene rosina        \tERS074426\tERR055430\n",
      "Heliconius melpomene rosina        \tERS074427\tERR055431\n",
      "Heliconius melpomene ecuadorensis  \tERS074428\tERR055432\n",
      "Heliconius melpomene ecuadorensis  \tERS074429\tERR055433\n"
     ]
    }
   ],
   "source": [
    "for ID, ERS, ERR in zip(indata.scientific_name, \n",
    "                        indata.secondary_sample_accession,\n",
    "                        indata.run_accession):\n",
    "    if ERS in subsamples:\n",
    "        name = ID.split()[1]\n",
    "        name += \"_\"+ERS+\"_\"+ERR\n",
    "        print \"{:<35}\\t{}\\t{}\".format(ID, ERS, ERR)\n",
    "        wget_download_ERR(ERR, \"empirical_5/fastq/\", name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Note:\n",
    "The data look kind of weird because there are a lot of As in the beginning. I figured out it is just because the sequences are sorted alphabetically."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Merge technical replicates\n",
    "This study includes several technical replicates per sequenced individuals, which we combine into a single file for each individual here. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make a new directory for merged files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "mkdir -p empirical_5/fastq/merged"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "melpomene_ERS070276 merged\n",
      "melpomene_ERS070277 merged\n",
      "melpomene_ERS070274 merged\n",
      "melpomene_ERS070275 merged\n",
      "melpomene_ERS070272 merged\n",
      "melpomene_ERS070273 merged\n",
      "melpomene_ERS070270 merged\n",
      "melpomene_ERS070271 merged\n",
      "timareta_ERS070247 merged\n",
      "timareta_ERS070246 merged\n",
      "timareta_ERS070245 merged\n",
      "timareta_ERS070249 merged\n",
      "timareta_ERS070248 merged\n",
      "cydno_ERS074413 merged\n",
      "cydno_ERS074412 merged\n",
      "cydno_ERS074411 merged\n",
      "cydno_ERS074410 merged\n",
      "cydno_ERS074415 merged\n",
      "cydno_ERS074414 merged\n",
      "hecale_ERS070253 merged\n",
      "hecale_ERS070252 merged\n",
      "hecale_ERS070256 merged\n",
      "hecale_ERS070255 merged\n",
      "hecale_ERS070254 merged\n",
      "melpomene_ERS074418 merged\n",
      "melpomene_ERS074419 merged\n",
      "melpomene_ERS074416 merged\n",
      "melpomene_ERS074417 merged\n",
      "melpomene_ERS070269 merged\n",
      "melpomene_ERS070268 merged\n",
      "timareta_ERS070250 merged\n",
      "timareta_ERS070251 merged\n"
     ]
    }
   ],
   "source": [
    "##IPython code\n",
    "import glob\n",
    "import os\n",
    "\n",
    "## get all fastq files\n",
    "taxa = [i.split(\"/\")[-1].rsplit('_',1)[0] for i in \\\n",
    "        glob.glob(\"empirical_5/fastq/*.gz\")]\n",
    "\n",
    "## iterate over individuals and create merge file\n",
    "for taxon in set(taxa):\n",
    "    print taxon, \"merged\"\n",
    "    reps = glob.glob(\"empirical_5/fastq/\"+taxon+\"*\")\n",
    "\n",
    "    if len(reps) > 1:      \n",
    "        ## get replicate files\n",
    "        with open(\"empirical_5/fastq/merged/\"+taxon+\".fastq.gz\", 'wb') as out:\n",
    "            for rep in reps:\n",
    "                with open(rep, 'rb') as tempin:\n",
    "                    out.write(tempin.read())\n",
    "    else:\n",
    "        dirs, ff = os.path.split(reps[0])\n",
    "        os.rename(os.path.join(dirs,ff), os.path.join(dirs,\"merged\",taxon+\".fastq.gz\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make a params file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pyRAD 3.0.63\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "pyrad --version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\tnew params.txt file created\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "## remove old params file if it exists\n",
    "rm params.txt \n",
    "\n",
    "## create a new default params file\n",
    "pyrad -n "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Note: \n",
    "The data here are from Illumina Casava <1.8, so the phred scores are offset by 64 instead of 33, so we use that in the params file below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "## substitute new parameters into file\n",
    "sed -i '/## 1. /c\\empirical_5/           ## 1. working directory ' params.txt\n",
    "sed -i '/## 6. /c\\TGCAG                  ## 6. cutters ' params.txt\n",
    "sed -i '/## 7. /c\\20                     ## 7. N processors      ' params.txt\n",
    "sed -i '/## 9. /c\\6                      ## 9. NQual             ' params.txt\n",
    "sed -i '/## 10./c\\.85                    ## 10. clust threshold  ' params.txt\n",
    "sed -i '/## 12./c\\4                      ## 12. MinCov           ' params.txt\n",
    "sed -i '/## 13./c\\10                     ## 13. maxSH            ' params.txt\n",
    "sed -i '/## 14./c\\empirical_5_m4          ## 14. output name      ' params.txt\n",
    "sed -i '/## 18./c\\empirical_5/fastq/merged/*.gz ## 18. data location    ' params.txt\n",
    "sed -i '/## 20./c\\64                     ## 20. phred offset     ' params.txt\n",
    "sed -i '/## 29./c\\2,2                    ## 29. trim overhang    ' params.txt\n",
    "sed -i '/## 30./c\\p,n,s                  ## 30. output formats   ' params.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==** parameter inputs for pyRAD version 3.0.63  **======================== affected step ==\r\n",
      "empirical_5/           ## 1. working directory \r\n",
      "./*.fastq.gz              ## 2. Loc. of non-demultiplexed files (if not line 18)  (s1)\r\n",
      "./*.barcodes              ## 3. Loc. of barcode file (if not line 18)             (s1)\r\n",
      "vsearch                   ## 4. command (or path) to call vsearch (or usearch)    (s3,s6)\r\n",
      "muscle                    ## 5. command (or path) to call muscle                  (s3,s7)\r\n",
      "TGCAG                  ## 6. cutters \r\n",
      "20                     ## 7. N processors      \r\n",
      "6                         ## 8. Mindepth: min coverage for a cluster              (s4,s5)\r\n",
      "6                      ## 9. NQual             \r\n",
      ".85                    ## 10. clust threshold  \r\n",
      "rad                       ## 11. Datatype: rad,gbs,pairgbs,pairddrad,(others:see docs)(all)\r\n",
      "4                      ## 12. MinCov           \r\n",
      "10                     ## 13. maxSH            \r\n",
      "empirical_5            ## 14. output name      \r\n",
      "==== optional params below this line ===================================  affected step ==\r\n",
      "                       ## 15.opt.: select subset (prefix* only selector)            (s2-s7)\r\n",
      "                       ## 16.opt.: add-on (outgroup) taxa (list or prefix*)         (s6,s7)\r\n",
      "                       ## 17.opt.: exclude taxa (list or prefix*)                   (s7)\r\n",
      "empirical_5/fastq/merged/*.gz ## 18. data location    \r\n",
      "                       ## 19.opt.: maxM: N mismatches in barcodes (def= 1)          (s1)\r\n",
      "64                     ## 20. phred offset     \r\n",
      "                       ## 21.opt.: filter: def=0=NQual 1=NQual+adapters. 2=strict   (s2)\r\n",
      "                       ## 22.opt.: a priori E,H (def= 0.001,0.01, if not estimated) (s5)\r\n",
      "                       ## 23.opt.: maxN: max Ns in a cons seq (def=5)               (s5)\r\n",
      "                       ## 24.opt.: maxH: max heterozyg. sites in cons seq (def=5)   (s5)\r\n",
      "                       ## 25.opt.: ploidy: max alleles in cons seq (def=2;see docs) (s4,s5)\r\n",
      "                       ## 26.opt.: maxSNPs: (def=100). Paired (def=100,100)         (s7)\r\n",
      "                       ## 27.opt.: maxIndels: within-clust,across-clust (def. 3,99) (s3,s7)\r\n",
      "                       ## 28.opt.: random number seed (def. 112233)              (s3,s6,s7)\r\n",
      "2,2                    ## 29. trim overhang    \r\n",
      "p,n,s                  ## 30. output formats   \r\n",
      "                       ## 31.opt.: maj. base call at depth>x<mindepth (def.x=mindepth) (s5)\r\n",
      "                       ## 32.opt.: keep trimmed reads (def=0). Enter min length.    (s2)\r\n",
      "                       ## 33.opt.: max stack size (int), def= max(500,mean+2*SD)    (s3)\r\n",
      "                       ## 34.opt.: minDerep: exclude dereps with <= N copies, def=1 (s3)\r\n",
      "                       ## 35.opt.: use hierarchical clustering (def.=0, 1=yes)      (s6)\r\n",
      "                       ## 36.opt.: repeat masking (def.=1='dust' method, 0=no)      (s3,s6)\r\n",
      "                       ## 37.opt.: vsearch max threads per job (def.=6; see docs)   (s3,s6)\r\n",
      "==== optional: list group/clade assignments below this line (see docs) ==================\r\n"
     ]
    }
   ],
   "source": [
    "cat params.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Assemble in _pyrad_\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "pyrad -p params.txt -s 234567 >> log.txt 2>&1  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "sed -i '/## 12./c\\2                    ## 12. MinCov           ' params.txt\n",
    "sed -i '/## 14./c\\empirical_5_m2       ## 14. output name      ' params.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "pyrad -p params.txt -s 7 >> log.txt 2>&1  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results\n",
    "We are interested in the relationship between the amount of input (raw) data between any two samples, the average coverage they recover when clustered together, and the phylogenetic distances separating samples.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Raw data amounts\n",
    "The average number of raw reads per sample is 1.36M."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "count          54.000000\n",
      "mean      5315416.333333\n",
      "std       2877440.775877\n",
      "min       2022986.000000\n",
      "25%       3310234.000000\n",
      "50%       3784971.000000\n",
      "75%       8453907.500000\n",
      "max      12011939.000000\n",
      "Name: passed.total, dtype: float64\n",
      "\n",
      "most raw data in sample:\n",
      "15    hecale_ERS070255\n",
      "Name: sample , dtype: object\n"
     ]
    }
   ],
   "source": [
    "## read in the data\n",
    "s2dat = pd.read_table(\"empirical_5/stats/s2.rawedit.txt\", header=0, nrows=55)\n",
    "\n",
    "## print summary stats\n",
    "print s2dat[\"passed.total\"].describe()\n",
    "\n",
    "## find which sample has the most raw data\n",
    "maxraw = s2dat[\"passed.total\"].max()\n",
    "print \"\\nmost raw data in sample:\"\n",
    "print s2dat['sample '][s2dat['passed.total']==maxraw]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Look at distributions of coverage\n",
    "pyrad v.3.0.63 outputs depth information for each sample which I read in here and plot. First let's ask which sample has the highest depth of coverage. The std of coverages is pretty low in this data set compared to several others. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "summary of means\n",
      "==================\n",
      "count    54.000000\n",
      "mean     47.842574\n",
      "std      19.705225\n",
      "min      15.745000\n",
      "25%      38.288250\n",
      "50%      44.435500\n",
      "75%      56.646000\n",
      "max      92.441000\n",
      "Name: dpt.me, dtype: float64\n",
      "\n",
      "summary of std\n",
      "==================\n",
      "count     54.000000\n",
      "mean     165.924389\n",
      "std       76.682339\n",
      "min       34.403000\n",
      "25%      118.026500\n",
      "50%      150.211500\n",
      "75%      215.882000\n",
      "max      381.038000\n",
      "Name: dpt.sd, dtype: float64\n",
      "\n",
      "summary of proportion lowdepth\n",
      "==================\n",
      "count    54.000000\n",
      "mean      0.359732\n",
      "std       0.135457\n",
      "min       0.151041\n",
      "25%       0.257660\n",
      "50%       0.347190\n",
      "75%       0.466101\n",
      "max       0.669051\n",
      "dtype: float64\n",
      "\n",
      "highest coverage in sample:\n",
      "53    timareta_ERS070251\n",
      "Name: taxa, dtype: object\n"
     ]
    }
   ],
   "source": [
    "## read in the s3 results\n",
    "s5dat = pd.read_table(\"empirical_5/stats/s3.clusters.txt\", header=0, nrows=54)\n",
    "\n",
    "## print summary stats\n",
    "print \"summary of means\\n==================\"\n",
    "print s5dat['dpt.me'].describe()\n",
    "\n",
    "## print summary stats\n",
    "print \"\\nsummary of std\\n==================\"\n",
    "print s5dat['dpt.sd'].describe()\n",
    "\n",
    "## print summary stats\n",
    "print \"\\nsummary of proportion lowdepth\\n==================\"\n",
    "print pd.Series(1-s5dat['d>5.tot']/s5dat[\"total\"]).describe()\n",
    "\n",
    "## find which sample has the greatest depth of retained loci\n",
    "max_hiprop = (s5dat[\"d>5.tot\"]/s5dat[\"total\"]).max()\n",
    "print \"\\nhighest coverage in sample:\"\n",
    "print s5dat['taxa'][s5dat['d>5.tot']/s5dat[\"total\"]==max_hiprop]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "44.0298150232 117.15674992\n"
     ]
    }
   ],
   "source": [
    "## print mean and std of coverage for the highest coverage sample\n",
    "with open(\"empirical_5/clust.85/timareta_ERS070251.depths\", 'rb') as indat:\n",
    "    depths = np.array(indat.read().strip().split(\",\"), dtype=int)\n",
    "    \n",
    "print depths.mean(), depths.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot the coverage for the sample with highest mean coverage\n",
    "Green shows the loci that were discarded and orange the loci that were retained. The majority of data were discarded for being too low of coverage. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<toyplot.mark.BarMagnitudes at 0x7ff22c0e08d0>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/html": [
       "<div align=\"center\" class=\"toyplot\" id=\"t5798f7236e6242d0a3c185b5b5125fb5\"><svg height=\"300.0px\" id=\"t00cd2eff4f1847728f2e0235cbd309c9\" preserveAspectRatio=\"xMidyMid meet\" style=\"background-color:transparent;fill:rgb(16.1%,15.3%,14.1%);fill-opacity:1.0;font-family:helvetica;font-size:12px;opacity:1.0;stroke:rgb(16.1%,15.3%,14.1%);stroke-opacity:1.0;stroke-width:1.0\" viewBox=\"0 0 350.0 300.0\" width=\"350.0px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:toyplot=\"http://www.sandia.gov/toyplot\"><g class=\"toyplot-axes-Cartesian\" id=\"te67b2392830c4b3394d4080a5b7b3753\"><clipPath id=\"t8b705a34a1a7489b8cd0bf82b4e133f6\"><rect height=\"200.0\" width=\"250.0\" x=\"50.0\" y=\"50.0\"></rect></clipPath><g class=\"toyplot-coordinate-events\" clip-path=\"url(#t8b705a34a1a7489b8cd0bf82b4e133f6)\" style=\"cursor:crosshair\"><rect height=\"200.0\" style=\"pointer-events:all;visibility:hidden\" width=\"250.0\" x=\"50.0\" y=\"50.0\"></rect><g class=\"toyplot-mark-BarMagnitudes\" id=\"t39dd5cedfd294f08aebab77b0106054f\" style=\"stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\"><g class=\"toyplot-Series\"><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"60.0\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"180.0\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"67.666666666666671\" y=\"60.0\"></rect><rect class=\"toyplot-Datum\" height=\"68.882387361029828\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"75.333333333333329\" y=\"171.11761263897017\"></rect><rect class=\"toyplot-Datum\" height=\"37.00409596255119\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"83.0\" y=\"202.99590403744881\"></rect><rect class=\"toyplot-Datum\" height=\"26.506729081334129\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"90.666666666666657\" y=\"213.49327091866587\"></rect><rect class=\"toyplot-Datum\" height=\"22.15330602691634\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"98.333333333333329\" y=\"217.84669397308366\"></rect><rect class=\"toyplot-Datum\" height=\"22.083089526038606\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"106.0\" y=\"217.91691047396139\"></rect><rect class=\"toyplot-Datum\" height=\"22.012873025160928\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"113.66666666666667\" y=\"217.98712697483907\"></rect><rect class=\"toyplot-Datum\" height=\"21.591574019894693\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"121.33333333333333\" y=\"218.40842598010531\"></rect><rect class=\"toyplot-Datum\" height=\"24.47045055588066\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"129.0\" y=\"215.52954944411934\"></rect><rect class=\"toyplot-Datum\" height=\"24.01404330017553\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"136.66666666666666\" y=\"215.98595669982447\"></rect><rect class=\"toyplot-Datum\" height=\"25.558806319485058\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"144.33333333333331\" y=\"214.44119368051494\"></rect><rect class=\"toyplot-Datum\" height=\"25.34815681685194\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"152.0\" y=\"214.65184318314806\"></rect><rect class=\"toyplot-Datum\" height=\"22.118197776477473\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"159.66666666666669\" y=\"217.88180222352253\"></rect><rect class=\"toyplot-Datum\" height=\"27.068461088355775\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"167.33333333333334\" y=\"212.93153891164422\"></rect><rect class=\"toyplot-Datum\" height=\"24.926857811585734\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"175.0\" y=\"215.07314218841427\"></rect><rect class=\"toyplot-Datum\" height=\"25.067290813341145\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"182.66666666666666\" y=\"214.93270918665885\"></rect><rect class=\"toyplot-Datum\" height=\"24.751316559391483\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"190.33333333333334\" y=\"215.24868344060852\"></rect><rect class=\"toyplot-Datum\" height=\"24.821533060269161\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"198.0\" y=\"215.17846693973084\"></rect><rect class=\"toyplot-Datum\" height=\"26.366296079578717\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"205.66666666666666\" y=\"213.63370392042128\"></rect><rect class=\"toyplot-Datum\" height=\"27.875950848449378\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"213.33333333333331\" y=\"212.12404915155062\"></rect><rect class=\"toyplot-Datum\" height=\"27.068461088355775\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"221.0\" y=\"212.93153891164422\"></rect><rect class=\"toyplot-Datum\" height=\"25.488589818607352\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"228.66666666666666\" y=\"214.51141018139265\"></rect><rect class=\"toyplot-Datum\" height=\"27.033352837916908\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"236.33333333333334\" y=\"212.96664716208309\"></rect><rect class=\"toyplot-Datum\" height=\"26.963136337039202\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"244.0\" y=\"213.0368636629608\"></rect><rect class=\"toyplot-Datum\" height=\"27.66530134581626\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666288\" x=\"251.66666666666669\" y=\"212.33469865418374\"></rect><rect class=\"toyplot-Datum\" height=\"26.541837331772967\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"259.33333333333331\" y=\"213.45816266822703\"></rect><rect class=\"toyplot-Datum\" height=\"28.648332358104142\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"267.0\" y=\"211.35166764189586\"></rect><rect class=\"toyplot-Datum\" height=\"56.734932709186666\" style=\"fill:rgb(40%,76.1%,64.7%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666288\" x=\"274.66666666666669\" y=\"183.26506729081333\"></rect></g></g><g class=\"toyplot-mark-BarMagnitudes\" id=\"t513008bdcad84e41a38058b7cc2e3d12\" style=\"stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\"><g class=\"toyplot-Series\"><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"60.0\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"67.666666666666671\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"75.333333333333329\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"83.0\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"90.666666666666657\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"0.0\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"98.333333333333329\" y=\"240.0\"></rect><rect class=\"toyplot-Datum\" height=\"22.083089526038606\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"106.0\" y=\"217.91691047396139\"></rect><rect class=\"toyplot-Datum\" height=\"22.012873025160928\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"113.66666666666667\" y=\"217.98712697483907\"></rect><rect class=\"toyplot-Datum\" height=\"21.591574019894693\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666714\" x=\"121.33333333333333\" y=\"218.40842598010531\"></rect><rect class=\"toyplot-Datum\" height=\"24.47045055588066\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"129.0\" y=\"215.52954944411934\"></rect><rect class=\"toyplot-Datum\" height=\"24.01404330017553\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"136.66666666666666\" y=\"215.98595669982447\"></rect><rect class=\"toyplot-Datum\" height=\"25.558806319485058\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"144.33333333333331\" y=\"214.44119368051494\"></rect><rect class=\"toyplot-Datum\" height=\"25.34815681685194\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"152.0\" y=\"214.65184318314806\"></rect><rect class=\"toyplot-Datum\" height=\"22.118197776477473\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"159.66666666666669\" y=\"217.88180222352253\"></rect><rect class=\"toyplot-Datum\" height=\"27.068461088355775\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"167.33333333333334\" y=\"212.93153891164422\"></rect><rect class=\"toyplot-Datum\" height=\"24.926857811585734\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"175.0\" y=\"215.07314218841427\"></rect><rect class=\"toyplot-Datum\" height=\"25.067290813341145\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"182.66666666666666\" y=\"214.93270918665885\"></rect><rect class=\"toyplot-Datum\" height=\"24.751316559391483\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"190.33333333333334\" y=\"215.24868344060852\"></rect><rect class=\"toyplot-Datum\" height=\"24.821533060269161\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"198.0\" y=\"215.17846693973084\"></rect><rect class=\"toyplot-Datum\" height=\"26.366296079578717\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"205.66666666666666\" y=\"213.63370392042128\"></rect><rect class=\"toyplot-Datum\" height=\"27.875950848449378\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"213.33333333333331\" y=\"212.12404915155062\"></rect><rect class=\"toyplot-Datum\" height=\"27.068461088355775\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"221.0\" y=\"212.93153891164422\"></rect><rect class=\"toyplot-Datum\" height=\"25.488589818607352\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"228.66666666666666\" y=\"214.51141018139265\"></rect><rect class=\"toyplot-Datum\" height=\"27.033352837916908\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666572\" x=\"236.33333333333334\" y=\"212.96664716208309\"></rect><rect class=\"toyplot-Datum\" height=\"26.963136337039202\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"244.0\" y=\"213.0368636629608\"></rect><rect class=\"toyplot-Datum\" height=\"27.66530134581626\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666288\" x=\"251.66666666666669\" y=\"212.33469865418374\"></rect><rect class=\"toyplot-Datum\" height=\"26.541837331772967\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"259.33333333333331\" y=\"213.45816266822703\"></rect><rect class=\"toyplot-Datum\" height=\"28.648332358104142\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666856\" x=\"267.0\" y=\"211.35166764189586\"></rect><rect class=\"toyplot-Datum\" height=\"56.734932709186666\" style=\"fill:rgb(98.8%,55.3%,38.4%);fill-opacity:1.0;opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.0\" width=\"7.6666666666666288\" x=\"274.66666666666669\" y=\"183.26506729081333\"></rect></g></g></g><g class=\"toyplot-coordinates\" style=\"visibility:hidden\"><rect height=\"14.0\" style=\"fill:rgb(100%,100%,100%);fill-opacity:1.0;opacity:0.75;stroke:none\" width=\"90.0\" x=\"200.0\" y=\"60.0\"></rect><text style=\"alignment-baseline:middle;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" x=\"245.0\" y=\"67.0\"></text></g><line style=\"\" x1=\"60.0\" x2=\"282.3333333333333\" y1=\"250.0\" y2=\"250.0\"></line><g><text style=\"alignment-baseline:middle;baseline-shift:-80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" x=\"60.0\" y=\"250.0\">0</text><text style=\"alignment-baseline:middle;baseline-shift:-80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" x=\"136.66666666666666\" y=\"250.0\">10</text><text style=\"alignment-baseline:middle;baseline-shift:-80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" x=\"213.33333333333331\" y=\"250.0\">20</text><text style=\"alignment-baseline:middle;baseline-shift:-80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" x=\"290.0\" y=\"250.0\">30</text></g><text style=\"alignment-baseline:middle;baseline-shift:-200%;font-weight:bold;stroke:none;text-anchor:middle\" x=\"175.0\" y=\"250.0\">Depth of coverage (N reads)</text><line style=\"\" x1=\"50.0\" x2=\"50.0\" y1=\"60.0\" y2=\"240.0\"></line><g><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 240.0)\" x=\"50.0\" y=\"240.0\">0</text><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 204.8917495611469)\" x=\"50.0\" y=\"204.8917495611469\">1000</text><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 169.78349912229373)\" x=\"50.0\" y=\"169.78349912229373\">2000</text><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 134.6752486834406)\" x=\"50.0\" y=\"134.6752486834406\">3000</text><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 99.56699824458748)\" x=\"50.0\" y=\"99.56699824458748\">4000</text><text style=\"alignment-baseline:middle;baseline-shift:80%;font-size:10px;font-weight:normal;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 64.45874780573435)\" x=\"50.0\" y=\"64.45874780573435\">5000</text></g><text style=\"alignment-baseline:middle;baseline-shift:200%;font-weight:bold;stroke:none;text-anchor:middle\" transform=\"rotate(-90, 50.0, 150.0)\" x=\"50.0\" y=\"150.0\">N loci</text><text style=\"alignment-baseline:middle;baseline-shift:100%;font-size:14px;font-weight:bold;stroke:none;text-anchor:middle\" x=\"175.0\" y=\"50.0\">dataset5/sample=timareta_ERS070251</text></g></svg><div class=\"toyplot-controls\"><ul class=\"toyplot-mark-popup\" onmouseleave=\"this.style.visibility='hidden'\" style=\"background:rgba(0%,0%,0%,0.75);border:0;border-radius:6px;color:white;cursor:default;list-style:none;margin:0;padding:5px;position:fixed;visibility:hidden\"><li class=\"toyplot-mark-popup-title\" style=\"color:lightgray;cursor:default;padding:5px;list-style:none;margin:0;\"></li><li class=\"toyplot-mark-popup-save-csv\" onmouseout=\"this.style.color='white';this.style.background='steelblue'\" onmouseover=\"this.style.color='steelblue';this.style.background='white'\" style=\"border-radius:3px;padding:5px;list-style:none;margin:0;\">Save as .csv</li></ul><script>\n",
       "(function()\n",
       "{\n",
       "  if(window.CSS !== undefined && window.CSS.supports !== undefined)\n",
       "  {\n",
       "    if(!window.CSS.supports(\"alignment-baseline\", \"middle\"))\n",
       "    {\n",
       "      var re = /\\s*alignment-baseline\\s*:\\s*([^;\\s]*)\\s*/;\n",
       "      var text = document.querySelectorAll(\"#t5798f7236e6242d0a3c185b5b5125fb5 text\");\n",
       "      for(var i = 0; i != text.length; ++i)\n",
       "      {\n",
       "        var match = re.exec(text[i].attributes.style.value);\n",
       "        if(match)\n",
       "        {\n",
       "          if(match[1] == \"middle\")\n",
       "          {\n",
       "            var style = getComputedStyle(text[i]);\n",
       "            var font_size = style.fontSize.substr(0, style.fontSize.length - 2);\n",
       "            var dy = text[i].dy.baseVal.length ? text[i].dy.baseVal[0].value : 0;\n",
       "            dy += 0.4 * font_size;\n",
       "            text[i].setAttribute(\"dy\", dy);\n",
       "          }\n",
       "        }\n",
       "      }\n",
       "    }\n",
       "    if(!window.CSS.supports(\"baseline-shift\", \"0\"))\n",
       "    {\n",
       "      var re = /\\s*baseline-shift\\s*:\\s*([^;\\s]*)\\s*/;\n",
       "      var text = document.querySelectorAll(\"#t5798f7236e6242d0a3c185b5b5125fb5 text\");\n",
       "      for(var i = 0; i != text.length; ++i)\n",
       "      {\n",
       "        var match = re.exec(text[i].attributes.style.value);\n",
       "        if(match)\n",
       "        {\n",
       "          var style = getComputedStyle(text[i]);\n",
       "          var font_size = style.fontSize.substr(0, style.fontSize.length - 2);\n",
       "          var percent = 0.01 * match[1].substr(0, match[1].length-1);\n",
       "          var dy = text[i].dy.baseVal.length ? text[i].dy.baseVal[0].value : 0;\n",
       "          dy -= percent * font_size\n",
       "          text[i].setAttribute(\"dy\", dy);\n",
       "        }\n",
       "      }\n",
       "    }\n",
       "  }\n",
       "})();\n",
       "</script><script>\n",
       "(function()\n",
       "{\n",
       "  var data_tables = [{\"data\": [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 5127.0, 1962.0, 1054.0, 755.0, 631.0, 629.0, 627.0, 615.0, 697.0, 684.0, 728.0, 722.0, 630.0, 771.0, 710.0, 714.0, 705.0, 707.0, 751.0, 794.0, 771.0, 726.0, 770.0, 768.0, 788.0, 756.0, 816.0, 1616.0]], \"title\": \"Bar Data\", \"names\": [\"left\", \"right\", \"baseline\", \"magnitude0\"], \"id\": \"t39dd5cedfd294f08aebab77b0106054f\", \"filename\": \"toyplot\"}, {\"data\": [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 629.0, 627.0, 615.0, 697.0, 684.0, 728.0, 722.0, 630.0, 771.0, 710.0, 714.0, 705.0, 707.0, 751.0, 794.0, 771.0, 726.0, 770.0, 768.0, 788.0, 756.0, 816.0, 1616.0]], \"title\": \"Bar Data\", \"names\": [\"left\", \"right\", \"baseline\", \"magnitude0\"], \"id\": \"t513008bdcad84e41a38058b7cc2e3d12\", \"filename\": \"toyplot\"}];\n",
       "\n",
       "  function save_csv(data_table)\n",
       "  {\n",
       "    var uri = \"data:text/csv;charset=utf-8,\";\n",
       "    uri += data_table.names.join(\",\") + \"\\n\";\n",
       "    for(var i = 0; i != data_table.data[0].length; ++i)\n",
       "    {\n",
       "      for(var j = 0; j != data_table.data.length; ++j)\n",
       "      {\n",
       "        if(j)\n",
       "          uri += \",\";\n",
       "        uri += data_table.data[j][i];\n",
       "      }\n",
       "      uri += \"\\n\";\n",
       "    }\n",
       "    uri = encodeURI(uri);\n",
       "\n",
       "    var link = document.createElement(\"a\");\n",
       "    if(typeof link.download != \"undefined\")\n",
       "    {\n",
       "      link.href = uri;\n",
       "      link.style = \"visibility:hidden\";\n",
       "      link.download = data_table.filename + \".csv\";\n",
       "\n",
       "      document.body.appendChild(link);\n",
       "      link.click();\n",
       "      document.body.removeChild(link);\n",
       "    }\n",
       "    else\n",
       "    {\n",
       "      window.open(uri);\n",
       "    }\n",
       "  }\n",
       "\n",
       "  function open_popup(data_table)\n",
       "  {\n",
       "    return function(e)\n",
       "    {\n",
       "      var popup = document.querySelector(\"#t5798f7236e6242d0a3c185b5b5125fb5 .toyplot-mark-popup\");\n",
       "      popup.querySelector(\".toyplot-mark-popup-title\").innerHTML = data_table.title;\n",
       "      popup.querySelector(\".toyplot-mark-popup-save-csv\").onclick = function() { popup.style.visibility = \"hidden\"; save_csv(data_table); }\n",
       "      popup.style.left = (e.clientX - 50) + \"px\";\n",
       "      popup.style.top = (e.clientY - 20) + \"px\";\n",
       "      popup.style.visibility = \"visible\";\n",
       "      e.stopPropagation();\n",
       "      e.preventDefault();\n",
       "    }\n",
       "\n",
       "  }\n",
       "\n",
       "  for(var i = 0; i != data_tables.length; ++i)\n",
       "  {\n",
       "    var data_table = data_tables[i];\n",
       "    var event_target = document.querySelector(\"#\" + data_table.id);\n",
       "    event_target.oncontextmenu = open_popup(data_table);\n",
       "  }\n",
       "})();\n",
       "</script><script>\n",
       "(function()\n",
       "{\n",
       "  var axes = {\"te67b2392830c4b3394d4080a5b7b3753\": {\"x\": [{\"domain\": {\"bounds\": {\"max\": Infinity, \"min\": -Infinity}, \"max\": 30.0, \"min\": 0}, \"range\": {\"bounds\": {\"max\": Infinity, \"min\": -Infinity}, \"max\": 290.0, \"min\": 60.0}, \"scale\": \"linear\"}], \"y\": [{\"domain\": {\"bounds\": {\"max\": Infinity, \"min\": -Infinity}, \"max\": 5127.0, \"min\": 0.0}, \"range\": {\"bounds\": {\"max\": -Infinity, \"min\": Infinity}, \"max\": 60.0, \"min\": 240.0}, \"scale\": \"linear\"}]}};\n",
       "\n",
       "  function sign(x)\n",
       "  {\n",
       "    return x < 0 ? -1 : x > 0 ? 1 : 0;\n",
       "  }\n",
       "\n",
       "  function _mix(a, b, amount)\n",
       "  {\n",
       "    return ((1.0 - amount) * a) + (amount * b);\n",
       "  }\n",
       "\n",
       "  function _log(x, base)\n",
       "  {\n",
       "    return Math.log(Math.abs(x)) / Math.log(base);\n",
       "  }\n",
       "\n",
       "  function _in_range(a, x, b)\n",
       "  {\n",
       "    var left = Math.min(a, b);\n",
       "    var right = Math.max(a, b);\n",
       "    return left <= x && x <= right;\n",
       "  }\n",
       "\n",
       "  function to_domain(projection, range)\n",
       "  {\n",
       "    for(var i = 0; i != projection.length; ++i)\n",
       "    {\n",
       "      var segment = projection[i];\n",
       "      if(_in_range(segment.range.bounds.min, range, segment.range.bounds.max))\n",
       "      {\n",
       "        if(segment.scale == \"linear\")\n",
       "        {\n",
       "          var amount = (range - segment.range.min) / (segment.range.max - segment.range.min);\n",
       "          return _mix(segment.domain.min, segment.domain.max, amount)\n",
       "        }\n",
       "        else if(segment.scale[0] == \"log\")\n",
       "        {\n",
       "          var amount = (range - segment.range.min) / (segment.range.max - segment.range.min);\n",
       "          var base = segment.scale[1];\n",
       "          return sign(segment.domain.min) * Math.pow(base, _mix(_log(segment.domain.min, base), _log(segment.domain.max, base), amount));\n",
       "        }\n",
       "      }\n",
       "    }\n",
       "  }\n",
       "\n",
       "  // Compute mouse coordinates relative to a DOM object, with thanks to d3js.org, where this code originated.\n",
       "  function d3_mousePoint(container, e)\n",
       "  {\n",
       "    if (e.changedTouches) e = e.changedTouches[0];\n",
       "    var svg = container.ownerSVGElement || container;\n",
       "    if (svg.createSVGPoint) {\n",
       "      var point = svg.createSVGPoint();\n",
       "      point.x = e.clientX, point.y = e.clientY;\n",
       "      point = point.matrixTransform(container.getScreenCTM().inverse());\n",
       "      return [point.x, point.y];\n",
       "    }\n",
       "    var rect = container.getBoundingClientRect();\n",
       "    return [e.clientX - rect.left - container.clientLeft, e.clientY - rect.top - container.clientTop];\n",
       "  };\n",
       "\n",
       "  function display_coordinates(e)\n",
       "  {\n",
       "    var dom_axes = e.currentTarget.parentElement;\n",
       "    var data = axes[dom_axes.id];\n",
       "\n",
       "    point = d3_mousePoint(e.target, e);\n",
       "    var x = Number(to_domain(data[\"x\"], point[0])).toFixed(2);\n",
       "    var y = Number(to_domain(data[\"y\"], point[1])).toFixed(2);\n",
       "\n",
       "    var coordinates = dom_axes.querySelectorAll(\".toyplot-coordinates\");\n",
       "    for(var i = 0; i != coordinates.length; ++i)\n",
       "    {\n",
       "      coordinates[i].style.visibility = \"visible\";\n",
       "      coordinates[i].querySelector(\"text\").textContent = \"x=\" + x + \" y=\" + y;\n",
       "    }\n",
       "  }\n",
       "\n",
       "  function clear_coordinates(e)\n",
       "  {\n",
       "    var dom_axes = e.currentTarget.parentElement;\n",
       "    var coordinates = dom_axes.querySelectorAll(\".toyplot-coordinates\");\n",
       "    for(var i = 0; i != coordinates.length; ++i)\n",
       "      coordinates[i].style.visibility = \"hidden\";\n",
       "  }\n",
       "\n",
       "  for(var axes_id in axes)\n",
       "  {\n",
       "    var event_target = document.querySelector(\"#\" + axes_id + \" .toyplot-coordinate-events\");\n",
       "    event_target.onmousemove = display_coordinates;\n",
       "    event_target.onmouseout = clear_coordinates;\n",
       "  }\n",
       "})();\n",
       "</script></div></div>"
      ],
      "text/plain": [
       "<toyplot.canvas.Canvas at 0x7ff22c0e0210>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import toyplot\n",
    "import toyplot.svg\n",
    "import numpy as np\n",
    "\n",
    "## read in the depth information for this sample\n",
    "with open(\"empirical_5/clust.85/timareta_ERS070251.depths\", 'rb') as indat:\n",
    "    depths = np.array(indat.read().strip().split(\",\"), dtype=int)\n",
    "    \n",
    "## make a barplot in Toyplot\n",
    "canvas = toyplot.Canvas(width=350, height=300)\n",
    "axes = canvas.axes(xlabel=\"Depth of coverage (N reads)\", \n",
    "                   ylabel=\"N loci\", \n",
    "                   label=\"dataset5/sample=timareta_ERS070251\")\n",
    "\n",
    "## select the loci with depth > 5 (kept)\n",
    "keeps = depths[depths>5]\n",
    "\n",
    "## plot kept and discarded loci\n",
    "edat = np.histogram(depths, range(30)) # density=True)\n",
    "kdat = np.histogram(keeps, range(30)) #, density=True)\n",
    "axes.bars(edat)\n",
    "axes.bars(kdat)\n",
    "\n",
    "#toyplot.svg.render(canvas, \"empirical_5_depthplot.svg\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Print final stats table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r\n",
      "\r\n",
      "125876      ## loci with > minsp containing data\r\n",
      "119819      ## loci with > minsp containing data & paralogs removed\r\n",
      "119819      ## loci with > minsp containing data & paralogs removed & final filtering\r\n",
      "\r\n",
      "## number of loci recovered in final data set for each taxon.\r\n",
      "taxon\tnloci\r\n",
      "cydno_ERS074404    \t37350\r\n",
      "cydno_ERS074405    \t38572\r\n",
      "cydno_ERS074406    \t37734\r\n",
      "cydno_ERS074407    \t37863\r\n",
      "cydno_ERS074408    \t37843\r\n",
      "cydno_ERS074409    \t38464\r\n",
      "cydno_ERS074410    \t33024\r\n",
      "cydno_ERS074411    \t30593\r\n",
      "cydno_ERS074412    \t22874\r\n",
      "cydno_ERS074413    \t22699\r\n",
      "cydno_ERS074414    \t22996\r\n",
      "cydno_ERS074415    \t22966\r\n",
      "hecale_ERS070252   \t38275\r\n",
      "hecale_ERS070253   \t38129\r\n",
      "hecale_ERS070254   \t38378\r\n",
      "hecale_ERS070255   \t38013\r\n",
      "hecale_ERS070256   \t37917\r\n",
      "heurippa_ERS074398 \t38458\r\n",
      "heurippa_ERS074399 \t38354\r\n",
      "heurippa_ERS074400 \t37843\r\n",
      "heurippa_ERS074401 \t38376\r\n",
      "heurippa_ERS074402 \t38231\r\n",
      "heurippa_ERS074403 \t38201\r\n",
      "melpomene_ERS070268\t40055\r\n",
      "melpomene_ERS070269\t40284\r\n",
      "melpomene_ERS070270\t39727\r\n",
      "melpomene_ERS070271\t39908\r\n",
      "melpomene_ERS070272\t38809\r\n",
      "melpomene_ERS070273\t40151\r\n",
      "melpomene_ERS070274\t36786\r\n",
      "melpomene_ERS070275\t40068\r\n",
      "melpomene_ERS070276\t39807\r\n",
      "melpomene_ERS070277\t37965\r\n",
      "melpomene_ERS074416\t33006\r\n",
      "melpomene_ERS074417\t36365\r\n",
      "melpomene_ERS074418\t36483\r\n",
      "melpomene_ERS074419\t35993\r\n",
      "melpomene_ERS074420\t38888\r\n",
      "melpomene_ERS074421\t38023\r\n",
      "melpomene_ERS074422\t37885\r\n",
      "melpomene_ERS074423\t38333\r\n",
      "melpomene_ERS074424\t30796\r\n",
      "melpomene_ERS074425\t33266\r\n",
      "melpomene_ERS074426\t35757\r\n",
      "melpomene_ERS074427\t37341\r\n",
      "melpomene_ERS074428\t32355\r\n",
      "melpomene_ERS074429\t31705\r\n",
      "timareta_ERS070245 \t40166\r\n",
      "timareta_ERS070246 \t38704\r\n",
      "timareta_ERS070247 \t37990\r\n",
      "timareta_ERS070248 \t39932\r\n",
      "timareta_ERS070249 \t40204\r\n",
      "timareta_ERS070250 \t38882\r\n",
      "timareta_ERS070251 \t37739\r\n",
      "\r\n",
      "\r\n",
      "## nloci = number of loci with data for exactly ntaxa\r\n",
      "## ntotal = number of loci for which at least ntaxa have data\r\n",
      "ntaxa\tnloci\tsaved\tntotal\r\n",
      "1\t-\r\n",
      "2\t-\t\t-\r\n",
      "3\t-\t\t-\r\n",
      "4\t26561\t*\t119819\r\n",
      "5\t13321\t*\t93258\r\n",
      "6\t6556\t*\t79937\r\n",
      "7\t5141\t*\t73381\r\n",
      "8\t4286\t*\t68240\r\n",
      "9\t3795\t*\t63954\r\n",
      "10\t3316\t*\t60159\r\n",
      "11\t2938\t*\t56843\r\n",
      "12\t2831\t*\t53905\r\n",
      "13\t2541\t*\t51074\r\n",
      "14\t2294\t*\t48533\r\n",
      "15\t2163\t*\t46239\r\n",
      "16\t1941\t*\t44076\r\n",
      "17\t1816\t*\t42135\r\n",
      "18\t1740\t*\t40319\r\n",
      "19\t1691\t*\t38579\r\n",
      "20\t1663\t*\t36888\r\n",
      "21\t1568\t*\t35225\r\n",
      "22\t1426\t*\t33657\r\n",
      "23\t1412\t*\t32231\r\n",
      "24\t1368\t*\t30819\r\n",
      "25\t1170\t*\t29451\r\n",
      "26\t1217\t*\t28281\r\n",
      "27\t1130\t*\t27064\r\n",
      "28\t1122\t*\t25934\r\n",
      "29\t1032\t*\t24812\r\n",
      "30\t1028\t*\t23780\r\n",
      "31\t1011\t*\t22752\r\n",
      "32\t1003\t*\t21741\r\n",
      "33\t947\t*\t20738\r\n",
      "34\t922\t*\t19791\r\n",
      "35\t881\t*\t18869\r\n",
      "36\t838\t*\t17988\r\n",
      "37\t806\t*\t17150\r\n",
      "38\t811\t*\t16344\r\n",
      "39\t802\t*\t15533\r\n",
      "40\t849\t*\t14731\r\n",
      "41\t800\t*\t13882\r\n",
      "42\t849\t*\t13082\r\n",
      "43\t864\t*\t12233\r\n",
      "44\t902\t*\t11369\r\n",
      "45\t959\t*\t10467\r\n",
      "46\t746\t*\t9508\r\n",
      "47\t794\t*\t8762\r\n",
      "48\t903\t*\t7968\r\n",
      "49\t1171\t*\t7065\r\n",
      "50\t1834\t*\t5894\r\n",
      "51\t480\t*\t4060\r\n",
      "52\t611\t*\t3580\r\n",
      "53\t969\t*\t2969\r\n",
      "54\t2000\t*\t2000\r\n",
      "\r\n",
      "\r\n",
      "## nvar = number of loci containing n variable sites (pis+autapomorphies).\r\n",
      "## sumvar = sum of variable sites (SNPs).\r\n",
      "## pis = number of loci containing n parsimony informative sites.\r\n",
      "## sumpis = sum of parsimony informative sites.\r\n",
      "\tnvar\tsumvar\tPIS\tsumPIS\r\n",
      "0\t10137\t0\t29807\t0\r\n",
      "1\t9141\t9141\t16094\t16094\r\n",
      "2\t8326\t25793\t11449\t38992\r\n",
      "3\t7494\t48275\t8901\t65695\r\n",
      "4\t7121\t76759\t7571\t95979\r\n",
      "5\t6775\t110634\t6655\t129254\r\n",
      "6\t6705\t150864\t5800\t164054\r\n",
      "7\t6231\t194481\t5399\t201847\r\n",
      "8\t6035\t242761\t4762\t239943\r\n",
      "9\t5580\t292981\t4234\t278049\r\n",
      "10\t5326\t346241\t3797\t316019\r\n",
      "11\t4909\t400240\t3255\t351824\r\n",
      "12\t4535\t454660\t2658\t383720\r\n",
      "13\t4217\t509481\t2376\t414608\r\n",
      "14\t3864\t563577\t1919\t441474\r\n",
      "15\t3381\t614292\t1423\t462819\r\n",
      "16\t3056\t663188\t1085\t480179\r\n",
      "17\t2665\t708493\t844\t494527\r\n",
      "18\t2365\t751063\t594\t505219\r\n",
      "19\t2096\t790887\t407\t512952\r\n",
      "20\t1780\t826487\t297\t518892\r\n",
      "21\t1596\t860003\t168\t522420\r\n",
      "22\t1319\t889021\t129\t525258\r\n",
      "23\t1131\t915034\t85\t527213\r\n",
      "24\t912\t936922\t39\t528149\r\n",
      "25\t706\t954572\t32\t528949\r\n",
      "26\t563\t969210\t17\t529391\r\n",
      "27\t465\t981765\t9\t529634\r\n",
      "28\t394\t992797\t5\t529774\r\n",
      "29\t277\t1000830\t7\t529977\r\n",
      "30\t228\t1007670\t1\t530007\r\n",
      "31\t152\t1012382\t0\t530007\r\n",
      "32\t118\t1016158\t0\t530007\r\n",
      "33\t88\t1019062\t0\t530007\r\n",
      "34\t48\t1020694\t0\t530007\r\n",
      "35\t28\t1021674\t0\t530007\r\n",
      "36\t13\t1022142\t0\t530007\r\n",
      "37\t15\t1022697\t0\t530007\r\n",
      "38\t7\t1022963\t0\t530007\r\n",
      "39\t5\t1023158\t0\t530007\r\n",
      "40\t4\t1023318\t0\t530007\r\n",
      "41\t3\t1023441\t0\t530007\r\n",
      "42\t3\t1023567\t0\t530007\r\n",
      "43\t1\t1023610\t0\t530007\r\n",
      "44\t1\t1023654\t0\t530007\r\n",
      "45\t0\t1023654\t0\t530007\r\n",
      "46\t1\t1023700\t0\t530007\r\n",
      "47\t0\t1023700\t0\t530007\r\n",
      "48\t1\t1023748\t0\t530007\r\n",
      "49\t0\t1023748\t0\t530007\r\n",
      "50\t1\t1023798\t0\t530007\r\n",
      "total var= 1023798\r\n",
      "total pis= 530007\r\n",
      "sampled unlinked SNPs= 109682\r\n",
      "sampled unlinked bi-allelic SNPs= 89479\r\n"
     ]
    }
   ],
   "source": [
    "cat empirical_5/stats/empirical_5_m4.stats\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "218875      ## loci with > minsp containing data\n",
      "212818      ## loci with > minsp containing data & paralogs removed\n",
      "212818      ## loci with > minsp containing data & paralogs removed & final filtering\n",
      "\n",
      "## number of loci recovered in final data set for each taxon.\n",
      "taxon\tnloci\n",
      "cydno_ERS074404    \t40114\n",
      "cydno_ERS074405    \t41446\n",
      "cydno_ERS074406    \t40560\n",
      "cydno_ERS074407    \t40694\n",
      "cydno_ERS074408    \t40758\n",
      "cydno_ERS074409    \t41320\n",
      "cydno_ERS074410    \t35272\n",
      "cydno_ERS074411    \t32803\n",
      "cydno_ERS074412    \t51705\n",
      "cydno_ERS074413    \t47291\n",
      "cydno_ERS074414    \t52449\n",
      "cydno_ERS074415    \t52917\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "head -n 20 empirical_5/stats/empirical_5_m2.stats\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Infer ML phylogeny in _raxml_ as an unrooted tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "## raxml argumement w/ ...\n",
    "raxmlHPC-PTHREADS-AVX -f a -m GTRGAMMA -N 100 -x 12345 -p 12345 -T 20 \\\n",
    "                      -w /home/deren/Documents/RADmissing/empirical_5/ \\\n",
    "                      -n empirical_5_m4 -s empirical_5/outfiles/empirical_5_m4.phy\n",
    "                      "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "## raxml argumement w/ ...\n",
    "raxmlHPC-PTHREADS-AVX -f a -m GTRGAMMA -N 100 -x 12345 -p 12345 -T 20 \\\n",
    "                      -w /home/deren/Documents/RADmissing/empirical_5/ \\\n",
    "                      -n empirical_5_m2 -s empirical_5/outfiles/empirical_5_m2.phy\n",
    "                      "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "This is RAxML version 8.0.16 released by Alexandros Stamatakis on March 21 2014.\n",
      "\n",
      "With greatly appreciated code contributions by:\n",
      "Andre Aberer      (HITS)\n",
      "Simon Berger      (HITS)\n",
      "Alexey Kozlov     (HITS)\n",
      "Nick Pattengale   (Sandia)\n",
      "Wayne Pfeiffer    (SDSC)\n",
      "Akifumi S. Tanabe (NRIFS)\n",
      "David Dao         (KIT)\n",
      "Charlie Taylor    (UF)\n",
      "\n",
      "\n",
      "Alignment has 546632 distinct alignment patterns\n",
      "\n",
      "Proportion of gaps and completely undetermined characters in this alignment: 70.04%\n",
      "\n",
      "RAxML rapid bootstrapping and subsequent ML search\n"
     ]
    }
   ],
   "source": [
    "%%bash \n",
    "head -n 20 empirical_5/RAxML_info.empirical_5_m4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "This is RAxML version 8.0.16 released by Alexandros Stamatakis on March 21 2014.\n",
      "\n",
      "With greatly appreciated code contributions by:\n",
      "Andre Aberer      (HITS)\n",
      "Simon Berger      (HITS)\n",
      "Alexey Kozlov     (HITS)\n",
      "Nick Pattengale   (Sandia)\n",
      "Wayne Pfeiffer    (SDSC)\n",
      "Akifumi S. Tanabe (NRIFS)\n",
      "David Dao         (KIT)\n",
      "Charlie Taylor    (UF)\n",
      "\n",
      "\n",
      "Alignment has 338778 distinct alignment patterns\n",
      "\n",
      "Proportion of gaps and completely undetermined characters in this alignment: 80.98%\n",
      "\n",
      "RAxML rapid bootstrapping and subsequent ML search\n"
     ]
    }
   ],
   "source": [
    "%%bash \n",
    "head -n 20 empirical_5/RAxML_info.empirical_5_m2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get phylo distance (GTRgamma dist)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%load_ext rpy2.ipython"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1] 0.05029949\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%R\n",
    "library(ape)\n",
    "ltre <- read.tree()\n",
    "mean(cophenetic.phylo(ltre))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}