{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# F Statistics\n", "## F3 Statistics\n", "\n", "F3 statistics are a useful analytical tool for understanding population relationships. Like F4 and F2 statistics, F3 statistics measure allele frequency correlations between populations; all three were introduced by Nick Patterson and colleagues in [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037).\n", "\n", "F3 statistics are used for two purposes: i) as a test of whether a target population (C) is admixed between two source populations (A and B), and ii) to measure the shared drift of two test populations (A and B) relative to an outgroup (C).\n", "\n", "In both cases, the F3 statistic is defined as the product of the allele frequency differences between population C and populations A and B, respectively, averaged over sites:\n", "\n", "$$F3(A,B;C)=\\langle(c−a)(c−b)\\rangle$$\n", "\n", "Here, $\\langle\\cdot\\rangle$ denotes the average over all genotyped sites, and $a$, $b$ and $c$\n", "denote the allele frequencies at a given site in the three populations A, B and C." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Admixture F3 Statistics" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "It can be shown that a negative value of this statistic provides unambiguous evidence that population C is admixed, with ancestry from populations related to A and B, as in the following phylogeny (taken from Figure 1 of [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037)):\n", "\n", "[Figure: admixture F3 phylogeny, Figure 1 of Patterson 2012]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Intuitively, an F3 statistic becomes negative if the allele frequency of the target population C is, on average, intermediate between the allele frequencies of A and B. Consider as an extreme example a genomic site where a=0, b=1 and c=0.5. Then we have (c−a)(c−b)=−0.25, which is negative. So if the statistic as a whole (the average over all sites) is negative, c must be intermediate between a and b at many sites, which suggests admixture between the two sources." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "***Note:*** If an F3 statistic is *not* negative, this does *not* prove that there is no admixture! For example, strong genetic drift in C after the admixture event adds a positive contribution to the statistic that can mask the negative signal." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We will use this statistic to test whether Finns are admixed between East and West Eurasian sources, using several different Eastern and Western sources. As Western sources we use French, Icelandic, Lithuanian and Norwegian, and as Eastern sources we use Nganasan and one of the ancient groups analysed in this workshop, *Bolshoy Oleni Ostrov*: individuals from the Kola Peninsula in northern Russia who lived about 3,500 years ago.\n", "\n", "We use the software `qp3Pop` from [AdmixTools](https://github.com/DReichLab/AdmixTools), which, like `smartpca`, takes a parameter file:\n", "\n", "\tgenotypename: input genotype file (in eigenstrat format)\n", "\tsnpname: input snp file (in eigenstrat format)\n", "\tindivname: input indiv file (in eigenstrat format)\n", "\tpopfilename: a file with three populations (A, B and C) on each line\n", "\tinbreed: YES" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The last option, `inbreed: YES`, is necessary when analysing pseudo-diploid ancient data, as we do here.\n", "\n", "To prepare the `popfilename`, create a new text file using Jupyter with the following content:\n", "\n", "\tNganasan French Finnish\n", "\tNganasan Icelandic Finnish\n", "\tNganasan Lithuanian Finnish\n", "\tNganasan Norwegian Finnish\n", "\tBolshoyOleniOstrov French Finnish\n", "\tBolshoyOleniOstrov Icelandic Finnish\n", "\tBolshoyOleniOstrov Lithuanian Finnish\n", "\tBolshoyOleniOstrov Norwegian Finnish" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "***Exercise:*** Prepare the parameter file with the input data as in the PCA session (see Principal Components Analysis (PCA)) 
and then run `qp3Pop -p PARAMETER_FILE`, where `PARAMETER_FILE` should be replaced by the name of your parameter file. As genotype data, use the files called `/data/popgen_course/HumanOrigins_FennoScandian_small.*`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results appear in the output, which you can view in the notebook. The crucial part should look like this:\n", "\n", "\tSource 1\tSource 2\tTarget\tf_3\tstd. err\tZ\tSNPs\n", "\tresult:\tNganasan\tFrench\tFinnish\t-0.004539\t0.000510\t-8.894\t442567\n", "\tresult:\tNganasan\tIcelandic\tFinnish\t-0.005297\t0.000563\t-9.404\t427954\n", "\tresult:\tNganasan\tLithuanian\tFinnish\t-0.005062\t0.000590\t-8.574\t426231\n", "\tresult:\tNganasan\tNorwegian\tFinnish\t-0.004744\t0.000569\t-8.332\t428161\n", "\tresult:\tBolshoyOleniOstrov\tFrench\tFinnish\t-0.002814\t0.000444\t-6.341\t402958\n", "\tresult:\tBolshoyOleniOstrov\tIcelandic\tFinnish\t-0.002590\t0.000486\t-5.323\t386418\n", "\tresult:\tBolshoyOleniOstrov\tLithuanian\tFinnish\t-0.001523\t0.000536\t-2.840\t384134\n", "\tresult:\tBolshoyOleniOstrov\tNorwegian\tFinnish\t-0.001553\t0.000502\t-3.092\t386203\n", "\n", "After the `result:` label, the first three columns give the populations A and B (sources) and C (target). They are followed by the f3 statistic, which is negative in all cases tested here, its standard error, a Z score and the number of SNPs used to compute the statistic.\n", "\n", "The Z score is key: it gives the deviation of the f3 statistic from zero in units of the standard error. As a general rule, a Z score of -3 or lower (that is, more negative) indicates a significant rejection of the null hypothesis that the statistic is not negative. Here, all statistics pass this threshold except F3(BolshoyOleniOstrov, Lithuanian; Finnish), whose Z score of -2.840 is close to it. Together, these results provide strong evidence that Finns carry a mixture of East and West Eurasian ancestry. Note that the statistic does not tell us when this admixture happened!" 
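, "

To see how the Z score relates to the other columns, here is a minimal Python sketch that recomputes Z as the f3 value divided by its standard error for four rows copied from the output above, and applies the rule-of-thumb threshold of -3. The helper `z_score` is illustrative only and is not part of AdmixTools.

```python
# Four rows copied from the qp3Pop output above:
# (source 1, source 2, target, f3, std. err)
rows = [
    ('Nganasan', 'French', 'Finnish', -0.004539, 0.000510),
    ('BolshoyOleniOstrov', 'Icelandic', 'Finnish', -0.002590, 0.000486),
    ('BolshoyOleniOstrov', 'Lithuanian', 'Finnish', -0.001523, 0.000536),
    ('BolshoyOleniOstrov', 'Norwegian', 'Finnish', -0.001553, 0.000502),
]


def z_score(f3, stderr):
    # Deviation of f3 from zero, in units of its standard error
    return f3 / stderr


for a, b, c, f3, se in rows:
    z = z_score(f3, se)
    # Rule of thumb from the text: Z <= -3 counts as significantly negative
    verdict = 'significantly negative' if z <= -3 else 'not significant at Z <= -3'
    print(f'F3({a}, {b}; {c}) = {f3:.6f}, Z = {z:.2f}: {verdict}')
```

Because the printed f3 values and standard errors are rounded, the recomputed Z scores match the reported ones only approximately (for example, -8.90 instead of -8.894 for the first row).
"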
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## F4 Statistics\n", "\n", "A different way to test for admixture is with “F4 statistics” (or the closely related “D statistics”), also introduced in [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037).\n", "\n", "Like F3 statistics (see above), F4 statistics are defined in terms of correlations of allele frequency differences, but they involve four populations instead of three. Specifically, we define\n", "\n", "$$F4(A,B;C,D)=\\langle(a−b)(c−d)\\rangle.$$\n", "\n", "To understand this statistic, consider the following tree:\n", "\n", "[Figure: F4 phylogeny]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tree, without any additional admixture, the allele frequency difference between A and B should be independent of the allele frequency difference between C and D. In that case, the expected value of F4(A, B; C, D) is zero, so the estimate should not differ significantly from zero. However, if there was gene flow between C or D on one side and A or B on the other, the statistic should differ from zero. Specifically, if the statistic is significantly negative, it implies gene flow between either B and C, or A and D. If it is significantly positive, it implies gene flow between A and C, or B and D. (For example, gene flow between B and D makes b and d co-vary, so (a−b) and (c−d) tend to deviate in the same direction and their product is positive on average.)\n", "\n", "A common way to use this statistic is to put a divergent outgroup as population A, for which we know for sure that there was no admixture with either C or D. With this setup, we can then test for gene flow between B and D (if the statistic is positive), or B and C (if it is negative).\n", "\n", "Here, we use this statistic to test for East Asian admixture in Finns, similarly to the admixture F3 test above. We will use the `qpDstat` program from [AdmixTools](https://github.com/DReichLab/AdmixTools) for this. Again, we need to prepare a population list file, this time with four populations (A, B, C, D) per line. 
I suggest you open a new file and fill it with:\n", "\n", "\tMbuti Nganasan French Finnish\n", "\tMbuti Nganasan Icelandic Finnish\n", "\tMbuti Nganasan Lithuanian Finnish\n", "\tMbuti Nganasan Norwegian Finnish\n", "\tMbuti BolshoyOleniOstrov French Finnish\n", "\tMbuti BolshoyOleniOstrov Icelandic Finnish\n", "\tMbuti BolshoyOleniOstrov Lithuanian Finnish\n", "\tMbuti BolshoyOleniOstrov Norwegian Finnish" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can then reference this file in a parameter file similar to the one prepared for `qp3Pop` above:\n", "\n", "\tgenotypename: input genotype file (in eigenstrat format)\n", "\tsnpname: input snp file (in eigenstrat format)\n", "\tindivname: input indiv file (in eigenstrat format)\n", "\tpopfilename: a file with four populations (A, B, C and D) on each line\n", "\tf4mode: YES\n", "\n", "Note that the “inbreed” option cannot be used here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Exercise:*** Prepare the parameter file as suggested above and then run `qpDstat -p PARAMETER_FILE`, where `PARAMETER_FILE` should be replaced by the name of your parameter file. This will take 5-6 minutes." 
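, "

While `qpDstat` is running, you can build some intuition for the sign of F4 statistics with a toy calculation. The following minimal Python sketch uses made-up allele frequencies simulated with NumPy (not the workshop data). As a simplification, the four populations drift independently from a common ancestor (a star-shaped tree, which also gives an expected F4 of zero), and in a second scenario D receives 30% of its ancestry from B. The helpers `f4` and `drift` are illustrative only; `f4` simply implements the definition of F4(A,B;C,D) given above.

```python
import numpy as np


def f4(fa, fb, fc, fd):
    # Illustrative helper: average over sites of (a - b) * (c - d)
    return float(np.mean((fa - fb) * (fc - fd)))


rng = np.random.default_rng(1)
n_sites = 10000

# Made-up ancestral allele frequencies (toy data, not the workshop data)
ancestral = rng.uniform(0.2, 0.8, n_sites)


def drift(freqs, sd=0.05):
    # Independent drift away from the given frequencies, kept within [0, 1]
    return np.clip(freqs + rng.normal(0.0, sd, freqs.size), 0.0, 1.0)


a = drift(ancestral)  # plays the role of the outgroup (like Mbuti above)
b = drift(ancestral)
c = drift(ancestral)
d_no_flow = drift(ancestral)

# Scenario with gene flow between B and D: D gets 30% of its ancestry from B
d_flow = 0.7 * d_no_flow + 0.3 * b

f4_without_flow = f4(a, b, c, d_no_flow)
f4_with_flow = f4(a, b, c, d_flow)
print(f'F4 without gene flow: {f4_without_flow:.6f}')
print(f'F4 with B-D gene flow: {f4_with_flow:.6f}')
```

Under these assumptions, the first value is close to zero, while the second is clearly positive. This matches the rule above: with an outgroup in position A, a positive F4(A, B; C, D) points to gene flow between B and D.
"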
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results should look like this (some header lines omitted):\n", "\n", "\tresult:\tMbuti\tNganasan French Finnish 0.002363 19.016 29254 27852 593124\n", "\tresult:\tMbuti\tNganasan Icelandic Finnish 0.001721 11.926 28915 27894 593124\n", "\tresult:\tMbuti\tNganasan Lithuanian Finnish 0.001368 9.664 28745 27933 593124\n", "\tresult:\tMbuti\tNganasan Norwegian Finnish 0.001685 11.663 28933 27934 593124\n", "\tresult:\tMbuti\tBolshoyOleniOstrov French Finnish 0.001962 16.737 27249 26175 547486\n", "\tresult:\tMbuti\tBolshoyOleniOstrov Icelandic Finnish 0.001084 7.776 26876 26282 547486\n", "\tresult:\tMbuti\tBolshoyOleniOstrov Lithuanian Finnish 0.000554 3.942 26683 26380 547486\n", "\tresult:\tMbuti\tBolshoyOleniOstrov Norwegian Finnish 0.000952 6.707 26873 26351 547486" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, the key columns are columns 2 to 5, which give A, B, C and D, and columns 6 and 7, which give the F4 statistic and the Z score for its deviation from zero.\n", "\n", "As you can see, the Z score is positive and greater than 3 in all cases, indicating a significant deviation from zero. With Mbuti as the outgroup in position A, this implies gene flow between Nganasan and Finns, and between BolshoyOleniOstrov and Finns, relative to French, Icelandic, Lithuanian or Norwegian." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outgroup F3 Statistics\n", "\n", "Outgroup F3 statistics are a special way of using F3 statistics. 
The definition is the same as for admixture F3 statistics, but instead of a target C and two source populations A and B, we now specify an outgroup C and two test populations A and B.\n", "\n", "To get an intuition for this statistic, consider the following tree:\n", "\n", "[Figure: outgroup F3 phylogeny]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this scenario, the statistic F3(A, B; C) measures the branch length from C to the common ancestor of A and B (coloured red in the figure). The statistic is therefore a measure of how closely the two populations A and B are related to each other, as seen from a distant outgroup. It is thus a similarity measure: the higher the statistic, the more genetically similar A and B are to one another.\n", "\n", "We can use this statistic, for example, to measure genetic affinity to East Asia by computing F3(Han, X; Mbuti), where Mbuti, a distant African population, acts as the outgroup, Han denotes Han Chinese, and X denotes the various European populations that we want to test.\n", "\n", "As before, start by preparing a list of population triples to be measured. 
I suggest the following list:\n", "\n", "\tHan Chuvash Mbuti\n", "\tHan Albanian Mbuti\n", "\tHan Armenian Mbuti\n", "\tHan Bulgarian Mbuti\n", "\tHan Czech Mbuti\n", "\tHan Druze Mbuti\n", "\tHan English Mbuti\n", "\tHan Estonian Mbuti\n", "\tHan Finnish Mbuti\n", "\tHan French Mbuti\n", "\tHan Georgian Mbuti\n", "\tHan Greek Mbuti\n", "\tHan Hungarian Mbuti\n", "\tHan Icelandic Mbuti\n", "\tHan Italian_North Mbuti\n", "\tHan Italian_South Mbuti\n", "\tHan Lithuanian Mbuti\n", "\tHan Maltese Mbuti\n", "\tHan Mordovian Mbuti\n", "\tHan Norwegian Mbuti\n", "\tHan Orcadian Mbuti\n", "\tHan Russian Mbuti\n", "\tHan Sardinian Mbuti\n", "\tHan Scottish Mbuti\n", "\tHan Sicilian Mbuti\n", "\tHan Spanish_North Mbuti\n", "\tHan Spanish Mbuti\n", "\tHan Ukrainian Mbuti\n", "\tHan Levanluhta Mbuti\n", "\tHan BolshoyOleniOstrov Mbuti\n", "\tHan ChalmnyVarre Mbuti\n", "\tHan Saami.DG Mbuti" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which cycles through many European populations, including the ancient individuals from Chalmny Varre, Bolshoy Oleni Ostrov and Levänluhta." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Exercise:*** Copy this list into a file, prepare a parameter file for `qp3Pop` similar to the one for admixture F3 statistics above, and run `qp3Pop` with it. Note that here you don't need the line beginning with `inbreed`. This will take up to 10 minutes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should get the following (header lines omitted):\n", "\n", "\t Source 1 Source 2 Target f_3 std. 
err Z SNPs\n", "\tresult: Han Chuvash Mbuti 0.233652 0.002072 112.782 502678\n", "\tresult: Han Albanian Mbuti 0.215629 0.002029 106.291 501734\n", "\tresult: Han Armenian Mbuti 0.213724 0.001963 108.882 504370\n", "\tresult: Han Bulgarian Mbuti 0.216193 0.001979 109.266 504310\n", "\tresult: Han Czech Mbuti 0.218060 0.002002 108.939 504089\n", "\tresult: Han Druze Mbuti 0.209551 0.001919 109.205 510853\n", "\tresult: Han English Mbuti 0.216959 0.001973 109.954 504161\n", "\tresult: Han Estonian Mbuti 0.220730 0.002019 109.332 503503\n", "\tresult: Han Finnish Mbuti 0.223447 0.002044 109.345 502217\n", "\tresult: Han French Mbuti 0.216623 0.001969 110.012 509613\n", "\tresult: Han Georgian Mbuti 0.214295 0.001935 110.721 503598\n", "\tresult: Han Greek Mbuti 0.215203 0.001984 108.465 507475\n", "\tresult: Han Hungarian Mbuti 0.217894 0.001999 109.004 507409\n", "\tresult: Han Icelandic Mbuti 0.218683 0.002015 108.553 504655\n", "\tresult: Han Italian_North Mbuti 0.215332 0.001978 108.854 507589\n", "\tresult: Han Italian_South Mbuti 0.211787 0.002271 93.265 492400\n", "\tresult: Han Lithuanian Mbuti 0.219615 0.002032 108.098 503681\n", "\tresult: Han Maltese Mbuti 0.210359 0.001956 107.542 503985\n", "\tresult: Han Mordovian Mbuti 0.223469 0.002008 111.296 503441\n", "\tresult: Han Norwegian Mbuti 0.218873 0.002023 108.197 504621\n", "\tresult: Han Orcadian Mbuti 0.217773 0.002014 108.115 504993\n", "\tresult: Han Russian Mbuti 0.223993 0.001995 112.274 506525\n", "\tresult: Han Sardinian Mbuti 0.213230 0.001980 107.711 508413\n", "\tresult: Han Scottish Mbuti 0.218489 0.002039 107.145 499784\n", "\tresult: Han Sicilian Mbuti 0.212272 0.001975 107.486 505477\n", "\tresult: Han Spanish_North Mbuti 0.215885 0.002029 106.383 500853\n", "\tresult: Han Spanish Mbuti 0.213869 0.001975 108.297 513648\n", "\tresult: Han Ukrainian Mbuti 0.218716 0.002007 108.950 503981\n", "\tresult: Han Levanluhta Mbuti 0.236252 0.002383 99.123 263049\n", "\tresult: Han BolshoyOleniOstrov 
Mbuti 0.247814 0.002177 113.849 457102\n", "\tresult: Han ChalmnyVarre Mbuti 0.233499 0.002304 101.345 366220\n", "\tresult: Han Saami.DG Mbuti 0.236198 0.002274 103.852 489038" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it’s time to plot these results using Python. Copy the results (all lines from the output beginning with “result:”) into a text file named \"f3_outgroup_stats_Han.txt\", and load it into a pandas DataFrame (the first column only holds the `result:` label, hence the column name `dummy`):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Columns: result label, A, B, C, F3, standard error, Z score, number of SNPs\n", "d = pd.read_csv(\"f3_outgroup_stats_Han.txt\",\n", "                sep=r\"\\s+\",\n", "                names=[\"dummy\", \"A\", \"B\", \"C\", \"F3\", \"StdErr\", \"Z\", \"SNPS\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check that it worked:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " dummy A B C F3 StdErr Z \\\n", "0 result: Han Chuvash Mbuti 0.233652 0.002072 112.782 \n", "1 result: Han Albanian Mbuti 0.215629 0.002029 106.291 \n", "2 result: Han Armenian Mbuti 0.213724 0.001963 108.882 \n", "3 result: Han Bulgarian Mbuti 0.216193 0.001979 109.266 \n", "4 result: Han Czech Mbuti 0.218060 0.002002 108.939 \n", "5 result: Han Druze Mbuti 0.209551 0.001919 109.205 \n", "6 result: Han English Mbuti 0.216959 0.001973 109.954 \n", "7 result: Han Estonian Mbuti 0.220730 0.002019 109.332 \n", "8 result: Han Finnish Mbuti 0.223447 0.002044 109.345 \n", "9 result: Han French Mbuti 0.216623 0.001969 110.012 \n", "10 result: Han Georgian Mbuti 0.214295 0.001935 110.721 \n", "11 result: Han Greek Mbuti 0.215203 0.001984 108.465 \n", "12 result: Han Hungarian Mbuti 0.217894 0.001999 109.004 \n", "13 result: Han Icelandic Mbuti 0.218683 0.002015 108.553 \n", "14 result: Han Italian_North Mbuti 0.215332 0.001978 108.854 \n", "15 result: Han Italian_South Mbuti 0.211787 0.002271 93.265 \n", "16 result: Han Lithuanian Mbuti 0.219615 0.002032 108.098 \n", "17 result: Han Maltese Mbuti 0.210359 0.001956 107.542 \n", "18 result: Han Mordovian Mbuti 0.223469 0.002008 111.296 \n", "19 result: Han Norwegian Mbuti 0.218873 0.002023 108.197 \n", "20 result: Han Orcadian Mbuti 0.217773 0.002014 108.115 \n", "21 result: Han Russian Mbuti 0.223993 0.001995 112.274 \n", "22 result: Han Sardinian Mbuti 0.213230 0.001980 107.711 \n", "23 result: Han Scottish Mbuti 0.218489 0.002039 107.145 \n", "24 result: Han Sicilian Mbuti 0.212272 0.001975 107.486 \n", "25 result: Han Spanish_North Mbuti 0.215885 0.002029 106.383 \n", "26 result: Han Spanish Mbuti 0.213869 0.001975 108.297 \n", "27 result: Han Ukrainian Mbuti 0.218716 0.002007 108.950 \n", "28 result: Han Levanluhta Mbuti 0.236252 0.002383 99.123 \n", "29 result: Han BolshoyOleniOstrov Mbuti 0.247814 0.002177 113.849 \n", "30 result: Han ChalmnyVarre Mbuti 0.233499 0.002304 101.345 \n", "31 result: 
Han Saami.DG Mbuti 0.236198 0.002274 103.852 \n", "\n", " SNPS \n", "0 502678 \n", "1 501734 \n", "2 504370 \n", "3 504310 \n", "4 504089 \n", "5 510853 \n", "6 504161 \n", "7 503503 \n", "8 502217 \n", "9 509613 \n", "10 503598 \n", "11 507475 \n", "12 507409 \n", "13 504655 \n", "14 507589 \n", "15 492400 \n", "16 503681 \n", "17 503985 \n", "18 503441 \n", "19 504621 \n", "20 504993 \n", "21 506525 \n", "22 508413 \n", "23 499784 \n", "24 505477 \n", "25 500853 \n", "26 513648 \n", "27 503981 \n", "28 263049 \n", "29 457102 \n", "30 366220 \n", "31 489038 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Sort by F3 so that the populations are ordered along the y axis\n", "d_sorted = d.sort_values(by=\"F3\")\n", "y = range(len(d_sorted))\n", "plt.figure(figsize=(6, 8))\n", "# Take the error bars from the sorted table too, so that each bar belongs to its own point\n", "plt.errorbar(d_sorted[\"F3\"], y, xerr=d_sorted[\"StdErr\"], fmt='o')\n", "plt.yticks(y, d_sorted[\"B\"]);\n", "plt.xlabel(\"F3(Han, Test; Mbuti)\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the ancient samples and modern Saami, together with the Chuvash, show the highest allele sharing with present-day East Asians (represented by Han) among the tested populations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outgroup F3 Statistics Scatter Plot\n", "\n", "The plot above shows an intriguing cline of differential relatedness to Han among Europeans. For example, would you have guessed that Icelanders are closer to Han than Armenians are? This is surprising, and it shows that European ancestry has a complex relationship to East Asian ancestry. To understand this better, you can read [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037), which makes some intriguing observations. Patterson and colleagues applied admixture F3 statistics to many populations worldwide and summarised some of the population triples with the most negative F3 statistics in the following table:\n", "\n", "[Table: population triples with the most negative F3 statistics, from Patterson 2012]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many interesting results here, but one of the most striking is F3(Sardinian, Karitiana; French), which is highly significantly negative. This statistic implies that French are admixed between populations related to Sardinians and to Karitiana, a Native American population from Brazil. How is that possible? 
We can, of course, rule out recent Native American gene flow back into Europe.\n", "\n", "Patterson and colleagues explained this finding by hypothesising an ancient admixture event involving a Siberian population that contributed ancestry to both Europeans and Native Americans. They termed this population the “Ancient North Eurasians (ANE)” and suggested the following admixture graph:\n", "\n", "[Figure: admixture graph proposed in Patterson 2012]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the idea is that modern Central Europeans, such as French, are admixed between Southern Europeans (Sardinians) and ANE. The Ancient North Eurasians are a classic example of a “ghost” population: a population that no longer exists in unmixed form and of which no individual had been directly sampled.\n", "\n", "Amazingly, two years after the publication of [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037), the ANE ghost population was actually found: in 2014, [Raghavan et al.](https://www.nature.com/articles/nature12736) published a paper titled “Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans”. They showed that a 24,000-year-old boy (called MA1) from the site of Mal’ta in Siberia has close genetic affinity with both Europeans and, in particular, Native Americans, just as proposed in [Patterson 2012](http://www.genetics.org/content/early/2012/09/06/genetics.112.145037).\n", "\n", "The affinities are summarised nicely in this figure from [Raghavan et al.](https://www.nature.com/articles/nature12736):\n", "\n", "[Figure: genetic affinities of MA1, from Raghavan et al. 2014]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, so we now know that ancestry related to Native Americans contributed to present-day Europeans. Could that explain the affinity of our ancient samples and Saami to Han Chinese? To test this, we will run the same outgroup F3 statistics as above, but this time with MA1 instead of Han as the test population. 
Specifically, we run the following population triples in `qp3Pop`:\n", "\n", "\tMA1_HG.SG Chuvash Mbuti\n", "\tMA1_HG.SG Albanian Mbuti\n", "\tMA1_HG.SG Armenian Mbuti\n", "\tMA1_HG.SG Bulgarian Mbuti\n", "\tMA1_HG.SG Czech Mbuti\n", "\tMA1_HG.SG Druze Mbuti\n", "\tMA1_HG.SG English Mbuti\n", "\tMA1_HG.SG Estonian Mbuti\n", "\tMA1_HG.SG Finnish Mbuti\n", "\tMA1_HG.SG French Mbuti\n", "\tMA1_HG.SG Georgian Mbuti\n", "\tMA1_HG.SG Greek Mbuti\n", "\tMA1_HG.SG Hungarian Mbuti\n", "\tMA1_HG.SG Icelandic Mbuti\n", "\tMA1_HG.SG Italian_North Mbuti\n", "\tMA1_HG.SG Italian_South Mbuti\n", "\tMA1_HG.SG Lithuanian Mbuti\n", "\tMA1_HG.SG Maltese Mbuti\n", "\tMA1_HG.SG Mordovian Mbuti\n", "\tMA1_HG.SG Norwegian Mbuti\n", "\tMA1_HG.SG Orcadian Mbuti\n", "\tMA1_HG.SG Russian Mbuti\n", "\tMA1_HG.SG Sardinian Mbuti\n", "\tMA1_HG.SG Scottish Mbuti\n", "\tMA1_HG.SG Sicilian Mbuti\n", "\tMA1_HG.SG Spanish_North Mbuti\n", "\tMA1_HG.SG Spanish Mbuti\n", "\tMA1_HG.SG Ukrainian Mbuti\n", "\tMA1_HG.SG Levanluhta Mbuti\n", "\tMA1_HG.SG BolshoyOleniOstrov Mbuti\n", "\tMA1_HG.SG ChalmnyVarre Mbuti\n", "\tMA1_HG.SG Saami.DG Mbuti\n", "\n", "Here, `MA1_HG.SG` is the somewhat cryptic population name for the MA1 individual." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Exercise:*** Follow the same protocol as above: copy the list into a file, prepare a parameter file for `qp3Pop` with that population triple list, and run `qp3Pop`. Copy the results (all lines beginning with “result:”) into a file named \"f3_outgroup_stats_MA1.txt\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test how the relationship to Han Chinese correlates with the relationship to MA1, we will now plot the two statistics against each other in a scatter plot. First, we have to merge the two outgroup F3 datasets. 
Here is the code, which also loads the two files into DataFrames called `outgroupf3dat_Han` and `outgroupf3dat_MA1`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Same whitespace-separated layout as before: result label, A, B, C, F3, std. err, Z, SNPs\n", "cols = [\"dummy\", \"A\", \"B\", \"C\", \"F3\", \"stderr\", \"Z\", \"nSNPs\"]\n", "outgroupf3dat_Han = pd.read_csv(\"f3_outgroup_stats_Han.txt\", sep=r\"\\s+\", names=cols)\n", "outgroupf3dat_MA1 = pd.read_csv(\"f3_outgroup_stats_MA1.txt\", sep=r\"\\s+\", names=cols)\n", "\n", "# Join on the test population (column B); the other shared columns get a _Han or _MA1 suffix\n", "outgroupf3dat_merged = outgroupf3dat_Han.merge(outgroupf3dat_MA1, on=\"B\", suffixes=(\"_Han\", \"_MA1\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we check that everything worked:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
17result:HanMalteseMbuti0.2103590.001956107.542503985result:MA1_HG.SGMbuti0.2302000.002259101.903347725
18result:HanMordovianMbuti0.2234690.002008111.296503441result:MA1_HG.SGMbuti0.2452840.002346104.571350058
19result:HanNorwegianMbuti0.2188730.002023108.197504621result:MA1_HG.SGMbuti0.2439300.002301106.031350182
20result:HanOrcadianMbuti0.2177730.002014108.115504993result:MA1_HG.SGMbuti0.2436140.002320105.008351053
21result:HanRussianMbuti0.2239930.001995112.274506525result:MA1_HG.SGMbuti0.2452120.002298106.698355953
22result:HanSardinianMbuti0.2132300.001980107.711508413result:MA1_HG.SGMbuti0.2319670.002264102.449355548
23result:HanScottishMbuti0.2184890.002039107.145499784result:MA1_HG.SGMbuti0.2445980.002434100.512339441
24result:HanSicilianMbuti0.2122720.001975107.486505477result:MA1_HG.SGMbuti0.2311410.002260102.297351028
25result:HanSpanish_NorthMbuti0.2158850.002029106.383500853result:MA1_HG.SGMbuti0.2384790.00242698.319341661
26result:HanSpanishMbuti0.2138690.001975108.297513648result:MA1_HG.SGMbuti0.2353860.002257104.293361951
27result:HanUkrainianMbuti0.2187160.002007108.950503981result:MA1_HG.SGMbuti0.2435510.002345103.881348948
28result:HanLevanluhtaMbuti0.2362520.00238399.123263049result:MA1_HG.SGMbuti0.2476400.00303081.728174148
29result:HanBolshoyOleniOstrovMbuti0.2478140.002177113.849457102result:MA1_HG.SGMbuti0.2560410.00262497.561305851
30result:HanChalmnyVarreMbuti0.2334990.002304101.345366220result:MA1_HG.SGMbuti0.2496190.00286287.212239594
31result:HanSaami.DGMbuti0.2361980.002274103.852489038result:MA1_HG.SGMbuti0.2515300.00262295.922326072
\n", "
" ], "text/plain": [ " dummy_Han A_Han B C_Han F3_Han stderr_Han Z_Han \\\n", "0 result: Han Chuvash Mbuti 0.233652 0.002072 112.782 \n", "1 result: Han Albanian Mbuti 0.215629 0.002029 106.291 \n", "2 result: Han Armenian Mbuti 0.213724 0.001963 108.882 \n", "3 result: Han Bulgarian Mbuti 0.216193 0.001979 109.266 \n", "4 result: Han Czech Mbuti 0.218060 0.002002 108.939 \n", "5 result: Han Druze Mbuti 0.209551 0.001919 109.205 \n", "6 result: Han English Mbuti 0.216959 0.001973 109.954 \n", "7 result: Han Estonian Mbuti 0.220730 0.002019 109.332 \n", "8 result: Han Finnish Mbuti 0.223447 0.002044 109.345 \n", "9 result: Han French Mbuti 0.216623 0.001969 110.012 \n", "10 result: Han Georgian Mbuti 0.214295 0.001935 110.721 \n", "11 result: Han Greek Mbuti 0.215203 0.001984 108.465 \n", "12 result: Han Hungarian Mbuti 0.217894 0.001999 109.004 \n", "13 result: Han Icelandic Mbuti 0.218683 0.002015 108.553 \n", "14 result: Han Italian_North Mbuti 0.215332 0.001978 108.854 \n", "15 result: Han Italian_South Mbuti 0.211787 0.002271 93.265 \n", "16 result: Han Lithuanian Mbuti 0.219615 0.002032 108.098 \n", "17 result: Han Maltese Mbuti 0.210359 0.001956 107.542 \n", "18 result: Han Mordovian Mbuti 0.223469 0.002008 111.296 \n", "19 result: Han Norwegian Mbuti 0.218873 0.002023 108.197 \n", "20 result: Han Orcadian Mbuti 0.217773 0.002014 108.115 \n", "21 result: Han Russian Mbuti 0.223993 0.001995 112.274 \n", "22 result: Han Sardinian Mbuti 0.213230 0.001980 107.711 \n", "23 result: Han Scottish Mbuti 0.218489 0.002039 107.145 \n", "24 result: Han Sicilian Mbuti 0.212272 0.001975 107.486 \n", "25 result: Han Spanish_North Mbuti 0.215885 0.002029 106.383 \n", "26 result: Han Spanish Mbuti 0.213869 0.001975 108.297 \n", "27 result: Han Ukrainian Mbuti 0.218716 0.002007 108.950 \n", "28 result: Han Levanluhta Mbuti 0.236252 0.002383 99.123 \n", "29 result: Han BolshoyOleniOstrov Mbuti 0.247814 0.002177 113.849 \n", "30 result: Han ChalmnyVarre Mbuti 0.233499 0.002304 
101.345 \n", "31 result: Han Saami.DG Mbuti 0.236198 0.002274 103.852 \n", "\n", " nSNPs_Han dummy_MA1 A_MA1 C_MA1 F3_MA1 stderr_MA1 Z_MA1 \\\n", "0 502678 result: MA1_HG.SG Mbuti 0.243818 0.002349 103.781 \n", "1 501734 result: MA1_HG.SG Mbuti 0.236494 0.002296 103.008 \n", "2 504370 result: MA1_HG.SG Mbuti 0.231399 0.002264 102.229 \n", "3 504310 result: MA1_HG.SG Mbuti 0.237498 0.002281 104.103 \n", "4 504089 result: MA1_HG.SG Mbuti 0.243224 0.002328 104.457 \n", "5 510853 result: MA1_HG.SG Mbuti 0.226740 0.002197 103.193 \n", "6 504161 result: MA1_HG.SG Mbuti 0.243135 0.002317 104.941 \n", "7 503503 result: MA1_HG.SG Mbuti 0.247065 0.002362 104.619 \n", "8 502217 result: MA1_HG.SG Mbuti 0.245684 0.002379 103.266 \n", "9 509613 result: MA1_HG.SG Mbuti 0.240235 0.002269 105.886 \n", "10 503598 result: MA1_HG.SG Mbuti 0.232645 0.002253 103.243 \n", "11 507475 result: MA1_HG.SG Mbuti 0.236566 0.002280 103.757 \n", "12 507409 result: MA1_HG.SG Mbuti 0.241720 0.002313 104.483 \n", "13 504655 result: MA1_HG.SG Mbuti 0.244488 0.002386 102.481 \n", "14 507589 result: MA1_HG.SG Mbuti 0.236407 0.002273 104.002 \n", "15 492400 result: MA1_HG.SG Mbuti 0.230839 0.002767 83.427 \n", "16 503681 result: MA1_HG.SG Mbuti 0.246864 0.002403 102.718 \n", "17 503985 result: MA1_HG.SG Mbuti 0.230200 0.002259 101.903 \n", "18 503441 result: MA1_HG.SG Mbuti 0.245284 0.002346 104.571 \n", "19 504621 result: MA1_HG.SG Mbuti 0.243930 0.002301 106.031 \n", "20 504993 result: MA1_HG.SG Mbuti 0.243614 0.002320 105.008 \n", "21 506525 result: MA1_HG.SG Mbuti 0.245212 0.002298 106.698 \n", "22 508413 result: MA1_HG.SG Mbuti 0.231967 0.002264 102.449 \n", "23 499784 result: MA1_HG.SG Mbuti 0.244598 0.002434 100.512 \n", "24 505477 result: MA1_HG.SG Mbuti 0.231141 0.002260 102.297 \n", "25 500853 result: MA1_HG.SG Mbuti 0.238479 0.002426 98.319 \n", "26 513648 result: MA1_HG.SG Mbuti 0.235386 0.002257 104.293 \n", "27 503981 result: MA1_HG.SG Mbuti 0.243551 0.002345 103.881 \n", "28 263049 
result: MA1_HG.SG Mbuti 0.247640 0.003030 81.728 \n", "29 457102 result: MA1_HG.SG Mbuti 0.256041 0.002624 97.561 \n", "30 366220 result: MA1_HG.SG Mbuti 0.249619 0.002862 87.212 \n", "31 489038 result: MA1_HG.SG Mbuti 0.251530 0.002622 95.922 \n", "\n", " nSNPs_MA1 \n", "0 350484 \n", "1 344332 \n", "2 349612 \n", "3 349800 \n", "4 349553 \n", "5 359004 \n", "6 349321 \n", "7 348861 \n", "8 347208 \n", "9 357842 \n", "10 349082 \n", "11 355261 \n", "12 355340 \n", "13 350287 \n", "14 354999 \n", "15 321217 \n", "16 348656 \n", "17 347725 \n", "18 350058 \n", "19 350182 \n", "20 351053 \n", "21 355953 \n", "22 355548 \n", "23 339441 \n", "24 351028 \n", "25 341661 \n", "26 361951 \n", "27 348948 \n", "28 174148 \n", "29 305851 \n", "30 239594 \n", "31 326072 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outgroupf3dat_merged" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can make a scatter plot:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10, 10))\n", "plt.scatter(x=outgroupf3dat_merged[\"F3_Han\"], y=outgroupf3dat_merged[\"F3_MA1\"])\n", "plt.xlabel(\"F3(Test, Han; Mbuti)\");\n", "plt.ylabel(\"F3(Test, MA1; Mbuti)\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plot is not very useful yet, however, because we cannot tell which point belongs to which population. We can use matplotlib's `plt.annotate` function to add a text label to each point:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(10, 10))\n", "plt.scatter(x=outgroupf3dat_merged[\"F3_Han\"], y=outgroupf3dat_merged[\"F3_MA1\"])\n", "for _, row in outgroupf3dat_merged.iterrows():\n", "    plt.annotate(row[\"B\"], (row[\"F3_Han\"], row[\"F3_MA1\"]))\n", "plt.xlabel(\"F3(Test, Han; Mbuti)\");\n", "plt.ylabel(\"F3(Test, MA1; Mbuti)\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result shows that the affinity to East Asians in most European populations can indeed be explained by MA1-related ancestry: most European populations show a linear relationship between their affinity to Han and their affinity to MA1. This does not hold, however, for our ancient samples from Fennoscandia or for the modern Saami and Chuvash, who show extra affinity to Han that is not explained by MA1 ([Lazaridis et al. 2014](https://www.nature.com/articles/nature13673)).\n", "\n", "Why affinity to MA1 and affinity to Han are connected at all is not trivial to explain. The most probable explanation involves \"Basal Eurasian\" ancestry, which is anti-correlated with MA1-related ancestry in Europe and which pushes populations with more \"Basal Eurasian\" ancestry further away from Han. See [Lazaridis et al. 2014](https://www.nature.com/articles/nature13673) for more details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }