{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "wd = '/software/fstats_tutorial/'\n", "dd = '/data//fstats_tutorial/'\n", "setwd(wd)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "options(repr.matrix.max.cols=10, repr.matrix.max.rows=10)\n", "options(repr.plot.width=22, repr.plot.height=22)\n", "#setwd('~peter/fstats_tutorial')\n", "suppressPackageStartupMessages({\n", " library(admixtools)\n", " library(tidyverse)\n", " library(gplots)\n", " library(glue)\n", " source(glue(\"{wd}/scripts/analysis.R\"))\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 0. Resources\n", "- [`Admixtools`](https://github.com/DReichLab/AdmixTools)\n", "- [`Admixtools 2`](https://github.com/uqrmaie1/admixtools)\n", "- [`admixr`](https://github.com/bodkan/admixr)\n", "- [Patterson et al. (2012)](http://www.genetics.org/content/192/3/1065)\n", "- [Bhatia et al. (2013)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759727/)\n", "- [Peter (2016)](http://www.genetics.org/content/202/4/1485)\n", "- [Peter (2022)](https://doi.org/10.1098/rstb.2020.0413)\n", "- [Petr et al. (2019)](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz030/5298728)\n", "\n", "\n", "# 1. Setting up\n", "To get your own copy of this tutorial, you can copy it on the server from my folder:\n", "\n", "```bash\n", "cp -r /software/fstats_tutorial/ .\n", "cp -r /data/fstats_tutorial/ .\n", "```\n", "\n", "If you work in jupyter, there is no need to do that and you can access it directly from the folders. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the `R`-package `admixtools` for computation. This package is a fast implementation of $F$-statistics that works well if there is little missing data. It allows for precomputation of statistics, which saves us a lot of time. 
For this tutorial, I have already prepared the data, but it can easily be regenerated using the following lines of code." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#admixtools::extract_f2('data/world/worldfoci2', outdir='fdata/worldfoci2', blgsize = 0.05)\n", "#admixtools::extract_f2('data/world/ancient', outdir='fdata/ancient', blgsize = 0.05)\n", "#admixtools::extract_f2('data/europe/westeurasian1', outdir='fdata/westeurasia1', blgsize = 0.05)\n", "#admixtools::extract_f2('data/europe/westeurasian2', outdir='fdata/westeurasia2', blgsize = 0.05)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Data\n", "For this tutorial, we will be using data from the ancient DNA compendium by David Reich's lab, which can be downloaded from https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data. This data has a unique ascertainment\n", "scheme, so we need to keep ascertainment bias in mind throughout. \n", "\n", "To save time, I subset the data and pre-computed the $F_2$-statistics that we will be using throughout. We will be using three distinct data sets: one focused on Western Eurasian diversity (`europe`), one representing global human diversity (`world`), and one designed to investigate archaic ancestry (`ancient`):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
ind | sex | pop |
---|---|---|
<chr> | <chr> | <chr> |
Adg-185 | F | Adygei |
Adg-192 | M | Adygei |
Adg-194 | M | Adygei |
Adg-222 | M | Adygei |
Adg-224 | M | Adygei |
⋮ | ⋮ | ⋮ |
EIV004_2 | M | Spanish |
EIV012_2 | F | Spanish |
EIV013_2 | F | Spanish |
EIV014_2 | M | Spanish |
EIV015_2 | F | Spanish |
pop1 | pop2 | est | se |
---|---|---|---|
<chr> | <chr> | <dbl> | <dbl> |
AA | Ami | 0.04942118 | 0.0003878861 |
AA | Atayal | 0.05464286 | 0.0004206249 |
AA | Basque | 0.03370974 | 0.0003165638 |
AA | BedouinB | 0.03139797 | 0.0003027369 |
AA | Biaka | 0.01256080 | 0.0001514436 |
⋮ | ⋮ | ⋮ | ⋮ |
Surui | Ulchi | 0.04615105 | 0.0005242951 |
Surui | Yoruba | 0.08793567 | 0.0005744330 |
Tubalar | Ulchi | 0.01139411 | 0.0001463168 |
Tubalar | Yoruba | 0.04703516 | 0.0003102144 |
Ulchi | Yoruba | 0.05555142 | 0.0003697745 |
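
Because every other statistic in this tutorial is derived from pairwise $F_2$ values like the ones above, it can help to see the identities written out. The following is a minimal base-R sketch, assuming the standard definitions from Patterson et al. (2012). The population labels and numbers are invented for illustration, and `f3_from_f2` and `f4_from_f2` are hypothetical helpers, not `admixtools` functions.

```r
# Toy symmetric f2 matrix for four hypothetical populations (values invented).
pops <- c("A", "B", "C", "D")
f2 <- matrix(0, 4, 4, dimnames = list(pops, pops))
f2["A", "B"] <- 0.010; f2["A", "C"] <- 0.030; f2["A", "D"] <- 0.050
f2["B", "C"] <- 0.032; f2["B", "D"] <- 0.052; f2["C", "D"] <- 0.040
f2 <- f2 + t(f2)  # mirror the upper triangle into the lower triangle

# f3(P1; P2, P3) = (f2(P1,P2) + f2(P1,P3) - f2(P2,P3)) / 2
f3_from_f2 <- function(f2, p1, p2, p3) {
  (f2[p1, p2] + f2[p1, p3] - f2[p2, p3]) / 2
}

# f4(P1, P2; P3, P4) = (f2(P1,P4) + f2(P2,P3) - f2(P1,P3) - f2(P2,P4)) / 2
f4_from_f2 <- function(f2, p1, p2, p3, p4) {
  (f2[p1, p4] + f2[p2, p3] - f2[p1, p3] - f2[p2, p4]) / 2
}

f3_from_f2(f2, "A", "B", "C")
f4_from_f2(f2, "A", "B", "C", "D")
```

Note that `f3_from_f2(f2, "A", "B", "B")` reduces to `f2["A", "B"]`, which is a quick sanity check on any table of $F_3$ estimates.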
pop1 | pop2 | pop3 | est | se | z | p |
---|---|---|---|---|---|---|
<chr> | <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> |
AA | Yoruba | Basque | -0.004777214 | 0.0001110123 | -43.03319 | 0 |
AA | Yoruba | Sardinian | -0.004585963 | 0.0001110581 | -41.29336 | 0 |
AA | Yoruba | Georgian | -0.004319089 | 0.0001057450 | -40.84436 | 0 |
AA | Yoruba | Kalash | -0.004086874 | 0.0001085978 | -37.63311 | 0 |
AA | Yoruba | Brahui | -0.003913767 | 0.0000985930 | -39.69619 | 0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
AA | Yoruba | Mbuti | 0.002164048 | 8.005551e-05 | 27.03184 | 6.245131e-161 |
AA | Yoruba | Biaka | 0.002384887 | 7.849669e-05 | 30.38201 | 9.494679e-203 |
AA | Yoruba | Mandenka | 0.002521384 | 7.111747e-05 | 35.45380 | 2.534922e-275 |
AA | Yoruba | Yoruba | 0.003055913 | 7.271039e-05 | 42.02856 | 0.000000e+00 |
AA | Yoruba | Esan | 0.003063343 | 7.891989e-05 | 38.81585 | 0.000000e+00 |
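
The `z` and `p` columns above can be recovered from `est` and `se` alone. Here is a small base-R sketch using two rows copied from the table. The two-sided normal p-value is an assumption about the convention used, although it matches the reported values to the printed precision.

```r
# Two rows copied from the table above (pop3 = Basque and pop3 = Mbuti).
f3_rows <- data.frame(
  pop3 = c("Basque", "Mbuti"),
  est  = c(-0.004777214, 0.002164048),
  se   = c(0.0001110123, 8.005551e-05)
)

# z-score: the estimate divided by its standard error.
f3_rows$z <- f3_rows$est / f3_rows$se

# Two-sided p-value under a standard normal (assumed convention).
f3_rows$p <- 2 * pnorm(-abs(f3_rows$z))

f3_rows
```

A strongly negative z-score for $F_3$(AA; Yoruba, X) is the classic signal that AA is admixed between populations related to Yoruba and X.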
qpf4ratio {admixtools} | R Documentation |
Estimate admixture proportions via f4 ratios

qpf4ratio(data, pops, boot = FALSE, verbose = FALSE)

Argument | Description |
---|---|
data | Input data in one of three forms: |
pops | A vector of 5 populations or a five column population matrix. The following ratios will be computed: |
boot | If |
verbose | Print progress updates |

qpf4ratio returns a data frame with f4 ratios
pop1 | pop2 | pop3 | pop4 | pop5 | alpha | se | z |
---|---|---|---|---|---|---|---|
<chr> | <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <dbl> |
Altai_Neanderthal.DG | Primate_Chimp | French | Yoruba | Vindija_Neanderthal.DG | 0.02067599 | 0.002513743 | 8.225182 |
Altai_Neanderthal.DG | Primate_Chimp | Papuan | Yoruba | Vindija_Neanderthal.DG | 0.03070703 | 0.003833748 | 8.009663 |
Altai_Neanderthal.DG | Primate_Chimp | Han | Yoruba | Vindija_Neanderthal.DG | 0.02205858 | 0.002959204 | 7.454227 |
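
To read these estimates as admixture proportions with uncertainty, one can turn `alpha` and `se` into approximate confidence intervals. This is a minimal base-R sketch using the three rows above. The normal approximation behind the interval is an assumption, and the data frame name `f4r` is only illustrative.

```r
# Rows copied from the f4-ratio table above; pop3 is the population that varies.
f4r <- data.frame(
  pop3  = c("French", "Papuan", "Han"),
  alpha = c(0.02067599, 0.03070703, 0.02205858),
  se    = c(0.002513743, 0.003833748, 0.002959204)
)

# z-score as reported in the table: estimate over standard error.
f4r$z <- f4r$alpha / f4r$se

# Approximate 95% interval, assuming the estimate is roughly normal.
f4r$lower <- f4r$alpha - qnorm(0.975) * f4r$se
f4r$upper <- f4r$alpha + qnorm(0.975) * f4r$se

f4r
```

Under that approximation, all three intervals exclude zero, and the estimates fall in the range of roughly 2 to 3 percent.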