{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "Set tests are a powerful approach for association testing between groups of genetic variants and quantitative traits.\n", "In this tutorial we demonstrate how to use set tests within the LIMIX framework to test for association (mtSet)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# mtSet\n", "\n", "Multi Trait Set test is an implementation of efficient set test algorithms for testing for associations between multiple genetic variants and multiple traits.\n", "mtSet can account for confounding factors such as relatedness and can be used for analysis of single traits.\n", "mtSet can be used both with the command line interface using the limix scripts (`mtSet_preprocess`, `mtSet_analyze`, `mtSet_postprocess`, `mtSet_simPheno`) or within python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Start\n", "\n", "In the following, we give a brief example on how to use mtSet. As a case study, we use a subset of the genotypes from the 1000 project [1] and simulated phenotypes.\n", "\n", "All commands can be found in `_demos/runmtSet.sh`. In the following, we give a short summary of the individual steps. A demo for running mtSet-PC can be found in `_demos/runmtSetPC.sh` and it is not showcased here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "0. Our software depends on [Plink](https://www.cog-genomics.org/plink2) version 1.9 (or greater) for preprocessing. Please, make sure you have it before proceeding.\n", "\n", "1. Download and install Limix\n", "```bash\n", "git clone --depth 1 https://github.com/PMBio/limix.git\n", "pushd limix\n", "python setup.py install\n", "popd\n", "```\n", "\n", "2. Download tutorial\n", "```bash\n", "git clone --depth 1 https://github.com/PMBio/limix-tutorials.git\n", "cd limix-tutorials/mtSet\n", "python download_examples.py\n", "cd data\n", "mkdir out\n", "ls 1000g/\n", "```\n", "\n", "3. Set some handy shell variables\n", "```bash\n", "BFILE=1000g/chrom22_subsample20_maf0.10\n", "CFILE=out/chrom22\n", "PFILE=1000g/pheno\n", "WFILE=out/windows\n", "NFILE=out/null\n", "WSIZE=30000\n", "RESDIR=out/results\n", "OUTFILE=out/final\n", "```\n", "\n", "4. Preprocess and phenotype simulation\n", "```bash\n", "# Kinship matrix estimation\n", "mtSet_preprocess --compute_covariance --bfile $BFILE --cfile $CFILE \n", "# Fitting the null model and assigning the markers to windows\n", "mtSet_preprocess --precompute_windows --fit_null --bfile $BFILE --cfile $CFILE --pfile $PFILE --wfile $WFILE --nfile $NFILE --window_size $WSIZE --plot_windows\n", "```\n", "\n", "5. Analysing true genotypes\n", "```bash\n", "mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100\n", "```\n", "\n", "6. Analysing permuted genotypes\n", "```bash\n", "for i in `seq 0 10`; do\n", " mtSet_analyze --bfile $BFILE --cfile $CFILE --pfile $PFILE --nfile $NFILE --wfile $WFILE --minSnps 4 --resdir $RESDIR --start_wnd 0 --end_wnd 100 --perm $i\n", "done\n", "```\n", "\n", "7. Postprocess\n", "```bash\n", "mtSet_postprocess --resdir $RESDIR --outfile $OUTFILE --manhattan_plot\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing\n", "\n", "Before getting started, we have to compute the sample-to-sample genetic covariance matrix, assign the markers to windows and estimate the trait-to-trait covariance matrices on the null model.\n", "\n", "### Computing the Covariance Matrix\n", "The covariance matrix can be pre-computed as follows:\n", "\n", "```bash\n", "mtSet_preprocess --compute_covariance --plink_path plink_path --bfile bfile --cfile cfile\n", "```\n", "\n", "where\n", "* __plink\\_path__ (default: plink) is a pointer to the [plink software](https://www.cog-genomics.org/plink2) (Version 1.9 or greater must be installed).\n", " If not set, a python covariance reader is employed.\n", " We strongly recommend using the plink reader for large datasets.\n", "* __bfile__ is the base name of of the binary bed file (__bfile__.bed, __bfile__.bim, __bfile__.fam are required).\n", "* __cfile__ is the base name of the output file.\n", " The relatedness matrix will be written to __cfile__.cov while the identifiers of the individuals are written to the file __cfile__.cov.id.\n", " The eigenvalue decomposition of the matrix is saved in the files __cfile__.cov.eval (eigenvalues) and __cfile__.cov.evec (eigenvectors).\n", " If __cfile__ is not specified, the files will be exported to the current directory with the following filenames __bfile__.cov, __bfile__.cov.id, __bfile__.cov.eval, __bfile__.cov.evec.\n", "\n", "### Precomputing the Principal Components \n", "The principal components can be pre-computed as follows:\n", "\n", "```bash\n", "mtSet_preprocess --compute_PCs k --plink_path plink_path --ffile ffile --bfile bfile\n", "```\n", "\n", "where\n", "* __k__ is the number of top principal components that are saved\n", "* __plink\\_path__ (default: plink) is a pointer to the [plink software](https://www.cog-genomics.org/plink2) (Version 1.9 or greater must be installed).\n", " If not set, a python genotype reader is employed.\n", " We strongly recommend using the plink reader for large datasets.\n", "* __ffile__ is the name of the fixed effects file, to which the principal components are written to.\n", "* __bfile__ is the base name of of the binary bed file (__bfile__.bed, __bfile__.bim, __bfile__.fam are required).\n", "\n", "\n", "### Fitting the null model\n", "\n", "To efficiently apply mtSet, it is neccessary to compute the null model beforehand.\n", "This can be done with the following command:\n", "\n", "```bash\n", "mtSet_preprocess --fit_null --bfile bfile --cfile cfile --nfile nfile --pfile pfile --ffile ffile --trait_idx trait_idx\n", "```\n", "\n", "where\n", "* __bfile__ is the base name of of the binary bed file (_bfile_.bed,_bfile_.bim,_bfile_.fam are required).\n", "* __cfile__ is the base name of the covariance file and its eigen decomposition (__cfile__.cov, __cfile__.cov.eval and __cfile__.cov.evec).\n", "If __cfile__ is not set, the relatedness component is omitted from the model.\n", "* __nfile__ is the base name of the output file.\n", "The estimated parameters are saved in __nfile__.p0, the negative log likelihood ratio in __nfile__.nll0, the trait-to-trait genetic covariance matrix in __nfile__.cg0 and the trait-to-trait residual covariance matrix in __nfile__.cn0. \n", "* __pfile__ is the base name of the phenotype file.\n", "* __ffile__ is the name of the file containing the covariates. Each covariate is saved in one column\n", "* __trait\\_idx__ can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance, `--trait_idx 3,4` selects the phenotypes saved in the forth and fifth column (indexing starts with zero).\n", "\n", "Notice that phenotypes are standardized prior to model fitting.\n", "\n", "### Precomputing the windows\n", "For applying our set test, the markers have to be assigned to windows. We provide a method that splits the genome in windows of fixed sizes:\n", "\n", "```bash\n", "mtSet_preprocess --precompute_windows --bfile bfile --wfile wfile --window_size window_size --plot_windows\n", "```\n", "\n", "where\n", "* __bfile__ is the base name of of the binary bed file (__bfile__.bim is required).\n", "* __window\\_size__ is the size of the window (in basepairs). The default value is 30kb.\n", "* __wfile__ is the base name of the output file.\n", " If not specified, the file is saved as __bfile__.window\\_size.wnd in the current folder.\n", " Each window is stored in one line having the following format: index, chromosome, start position, stop position, index of startposition and number of SNPs.\n", "* __plot\\_windows__ if the flag is set, a histogram over the number of markers within a window is generated and saved as __wfile__.pdf.\n", "\n", "### Merging the preprocessing steps\n", "\n", "Here, we provided the commands to execute the three preprocessing operations individually. However, it is also possible to combine all steps in a single command:\n", "\n", "```bash\n", "mtSet_preprocess --compute_covariance --fit_null --precompute_windows ...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Phenotype simulation\n", "\n", "Our software package also includes a command-line simulator that allows to generate phenotypes with a wide range of different genetic architectures.\n", "In brief, the simulator assumes a linear-additive model, considering the contribution of a randomly selected (causal) genetic region for the set component, polygenic background effects from all remaining genome-wide variants, a contribution from unmeasured factors and iid observation noise.\n", "For a detailed description of the simulation procedure, we refer to the Supplementary Methods. \n", "\n", "The simulator requires as input the genotypes and the relatedness component:\n", "\n", "```bash\n", "mtSet_simPheno --bfile bfile --cfile cfile --pfile pfile\n", "```\n", "\n", "where\n", "* __bfile__ is the name of of the binary bed file (__bfile__.bed, __bfile__.bim, __bfile__.fam are required).\n", "* __cfile__ is the name of the covariance matrix file (__cfile__.cov, __cfile__.cov.id are required).\n", "If none is specified, the covariance matrix is expected to be in the current folder, having the same filename as the bed file.\n", "* __pfile__ is the name of the output file (__pfile__.phe, __pfile__.region).\n", "The file __pfile__.phe contains the phenotypic values (each sample is saved in one row, each trait in one column).\n", "The file __pfile__.region contains the randomly selected causal region (chromsom, start position, end position). \n", "If __pfile__ is not specified, the files are saved in the current folder having an automatic generated filename containing the bed filename and the values of all simulation parameters.\n", "\n", "By changing the following parameters different genetic architectures can be simulated and, in particular, the simulation experiments of our paper can be reproduced.\n", "\n", "\n", "| Option | Default | Datatype | Explanation |\n", "| ------------- |:-------------:|:--------:| --------|\n", "| `--seed` | 0 | int | seed for random number generator |\n", "| `--nTraits` | 4 | int | number of simulated phenotypes |\n", "| `--windowSize` | 1.5e4 | int | size of causal region |\n", "| `--vTotR` | 0.05 | float | variance explained by the causal region |\n", "| `--nCausalR` | 10 | int | number of causal variants in the region |\n", "| `--pCommonR` | 0.8 | float | percentage of shared causal variants |\n", "| `--vTotBg` | 0.4 | float | variance explained by the polygenic background effects |\n", "| `--pHidden` | 0.6 | float | residual variance explained by hidden confounders (in %) |\n", "| `--pCommon` | 0.8 | float | background and residual signal that is shared across traits (in %) |\n", "| `--chrom` | None | int | specifies the chromosome of the causal region |\n", "| `--minPos` | None | int | specifies the min. chromosomal position of the causal region (in basepairs) |\n", "| `--maxPos` | None | int | specifies the max. chromosomal position of the causal region (in basepairs) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running analysis\n", "\n", "Once the preprocessing step has been used to obtain the genetic relatedness matrix, to fit the null model and to identify the genetic regions to be considered in the analysis, the set test can be run by the following analysis script:\n", "\n", "```bash\n", "mtSet_analyze --bfile bfile --cfile cfile --pfile pfile --nfile nfile --wfile wfile --ffile ffile --minSnps minSnps --start_wnd start_wnd --end_wnd end_wnd --resdir rdir --trait_idx traitIdx\n", "```\n", "\n", "where\n", "\n", "- __bfile__ is the base name of of the binary bed file (__bfile__.bed, __bfile__.bim, __bfile__.fam are required).\n", "- __cfile__ is the base name of the covariance matrix file. The script requires the files: __cfile__.cov containing the the genetic relatedness matrix, cfile.cov.id containing the corresponding sample identifiers, __cfile__.cov.eval and __cfile__.cov.evec containing the eigenvalues and eigenvectors of the matrix. If cfile is not set, the relatedness component is omitted from the model.\n", "- __pfile__ is the base name of the phenotype file. The script requires the file __pfile__.phe containing the phenotype data.\n", "- __nfile__ is the base name of the null model file. The script requires the file __nfile__.p0 containing the optimal null model parameters. If covariates are set, it also requires the file __nfile__.f0.\n", "- __wfile__ is the base name of the file containing the windows to be considered in the set test. The script requires the file __wfile__.wnd.\n", "- __ffile__ is the name of the file containing the covariates. Each covariate is saved in one column.\n", "- __perm__ is the seed used when permuting the genotypes. If the option is not specified then no permutation is considered.\n", "- __start\\_wnd__ is the index of the start window\n", "- __end\\_wnd__ is the index of the end window\n", "- __minSnps__ if set only windows containing at least minSnps are considered in the analysis\n", "rdir is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named - __start\\_wnd\\_endwnd__.res and contains results in the following format: window index, chromosome, start position, stop position, index of start position, number of SNPs and log likelihood ratio.\n", "- __rdir__ is the directory to which the results are exported. The results are exported in the folder rdir/perm if a permutation seed has been set, otherwise in the folder rdir/test. The output file is named __start\\_wnd\\_end\\_wnd__.res and contains results in the following format: window index, chromosome, start position, stop position, index of startposition, number of SNPs and log likelihood ratio.\n", "- __trait\\_idx__ can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the phenotypes have to be seperated by commas. For instance __trait\\_idx__ 3,4 selects the phenotypes saved in the forth and fifth column (indexing starts with zero).\n", "Notice that phenotypes are standardized prior to model fitting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Postprocessing\n", "\n", "After running mtSet, the following script can be used to merge the result files and estimate the p-values (p-values are obtained by a parametric fit of the test statistics): \n", "\n", "```bash\n", "mtSet_postprocess --resdir resdir --outfile outfile --manhattan_plot\n", "```\n", "\n", "where \n", "* __resdir__ is a pointer to the folder containing the result files of the analysis.\n", "* __outfile__ is the prefix of the two output files.\n", "__outfile__.perm lists the test statistics (first column) and p-values (second column) of the permutated windows\n", "__outfile__.test contains the (index, chromosome, start position, stop position, SNP index, number of SNPs, test statistics and p-value) of each window. Each window is saved in one row.\n", "* __manhattan\\_plot__ is a flag. If set, a manhattan plot is saved in __outfile__.manhattan.jpg (default: False)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Within Python\n", "\n", "This part of the tutorial shows how to use mtSet within python.\n", "For a tutorial on how to use mtSet from the command line using the limix scripts (mtSet_preprocess, mtSet_analyze, mtSet_postprocess, mtSet_simPheno) please refer to sections [Processing](#Processing), [Phenotype simulation](#Phenotype-simulation), [Running analysis](#Running-analysis), and [Postprocessing](#Postprocessing).\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# activiate inline plotting\n", "%matplotlib inline\n", "\n", "from download_examples import get_1000G_mtSet\n", "import scipy as sp\n", "import scipy.linalg\n", "import limix" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# loading 1000G genotypes for mtSet demo\n", "get_1000G_mtSet()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# base name for bed, bim and fam\n", "bfile = './data/1000g/chrom22_subsample20_maf0.10'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split genotypes into regions" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from limix.mtSet.core import plink_reader" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# import genotype positions\n", "bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))\n", "chrom = bim[:, 0].astype(float)\n", "pos = bim[:, -1].astype(float)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# uses splitter to split the genotypes\n", "from limix.mtSet.core.splitter import Splitter\n", "split = Splitter(pos=pos,chrom=chrom)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method `splitGeno` allows to define the regions that will then considered for the analysis with mtSet.\n", "Information relative to the calculated regions can be cached in an external file by activating the cache option (see below).\n", "\n", "| Argument | Default | Datatype | Explanation |\n", "| ------------- |:-------------:|:--------:| --------|\n", "| __method__ | 'slidingWindow' | str | Uses a sliding window approach to define regions (a region-based approach will be availabe soon) |\n", "| __size__ | 5E+04 (50kb) | float | Window size. Pace is set at half the size of the window |\n", "| __minSnps__ | 1 | int | Windows with number of SNPs lower that this threshold are not considered |\n", "| __maxSnps__ | sp.inf | int | Windows with number of SNPs higher that this threshold are not considered |\n", "| __cache__ | False | bool | If true, it activates the caching |\n", "| __out\\_dir__ | './cache' | str | outdir of the cache file |\n", "| __fname__ | None | str | Name of the file |\n", "| __rewrite__ | False | bool | If true and the cache file already exists, the cache file is overwritten |" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1380 windows\n" ] } ], "source": [ "split.splitGeno(cache=True, fname='regions.h5', minSnps=4)\n", "print '%d windows' % split.nWindows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply\n", "\n", "In this section we showcase how to construct the mtSet class that will then be used for the set test analysis. We showcase both the full mtSet that models relatedness as random effect by the means of an individual-to-individual covariance matrix and the approximated model (mtSetPC) that models relatedness as fixed effect using principal component from the covariance." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# import phenotype and sample relatedness\n", "pheno_file = './data/1000g/pheno.phe'\n", "sample_relatedness_file = './data/1000g/chrom22.cov'\n", "Y = sp.loadtxt(pheno_file)\n", "R = sp.loadtxt(sample_relatedness_file)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# compute eigenvalues and eigenvectors of the sample relatedness matrix\n", "S_R, U_R = scipy.linalg.eigh(R) # these are needed for the full mtSet model" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# caculate fixed effects with rel\n", "F = U_R[:, ::-1][:, :10] # it considered the first 10 PCs\n", "F = sp.concatenate([F, sp.ones((F.shape[0], 1))], 1) # add an intercept term" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "if 0:\n", " # use full mtSet implementation\n", " # (relatedness is modeled as random effect by the means of\n", " # an individual-to-individual covariance matrix)\n", " mtSet = limix.MTSet(Y, S_R=S_R, U_R=U_R)\n", "else:\n", " # use mtSetPC\n", " # (relatedness s modelled as fixed effect\n", " # using principal component from the covariance)\n", " mtSet = limix.MTSet(Y, F=F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Null model\n", "\n", "If the analysis is parallelized across different set of regions and permutations,\n", "it might be convenient to cache the results from the optimization of the null model\n", "(as the null model need to be optimized only once).\n", "\n", "| Argument | Default | Datatype | Explanation |\n", "| ------------- |:-------------:|:--------:| --------|\n", "| __cache__ | False | bool | If true, it activates the caching |\n", "| __out\\_dir__ | './cache' | str | outdir of the cache file |\n", "| __fname__ | None | str | Name of the file |\n", "| __rewrite__ | False | bool | If true and the cache file already exists, the cache file is overwritten |" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "RV = mtSet.fitNull()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned dictionary contains:\n", "* __B__: value of the optimized effect sizes\n", "* __Cg__: value of the genetic trait-to-trait covariance\n", "* __Cn__: value of the residual trait-to-trait covariance\n", "* __conv__: bool that indicates convergence of the optimization\n", "* __time__: time elpased for optimizing the parameters\n", "* __NLL0__: negative log likelihood of the null model\n", "* __LMLgrad__: norm of the gradient of the negative log likelihood dividived by the number of parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# read fam\n", "bim = plink_reader.readBIM(bfile,usecols=(0,1,2,3))\n", "fam = plink_reader.readFAM(bfile,usecols=(0,1))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".. window 0 - (22, 16025000-16075000) - 21 snps\n", ".. window 1 - (22, 16050000-16100000) - 23 snps\n", ".. window 2 - (22, 16125000-16175000) - 7 snps\n", ".. window 3 - (22, 16225000-16275000) - 9 snps\n", ".. window 4 - (22, 16250000-16300000) - 16 snps\n", ".. window 5 - (22, 16275000-16325000) - 12 snps\n", ".. window 6 - (22, 16325000-16375000) - 5 snps\n", ".. window 7 - (22, 16350000-16400000) - 5 snps\n", ".. window 8 - (22, 16475000-16525000) - 8 snps\n", ".. window 9 - (22, 16500000-16550000) - 7 snps\n", ".. window 10 - (22, 16550000-16600000) - 5 snps\n", ".. window 11 - (22, 16600000-16650000) - 13 snps\n", ".. window 12 - (22, 16625000-16675000) - 18 snps\n", ".. window 13 - (22, 16650000-16700000) - 10 snps\n", ".. window 14 - (22, 16825000-16875000) - 46 snps\n", ".. window 15 - (22, 16850000-16900000) - 75 snps\n", ".. window 16 - (22, 16875000-16925000) - 68 snps\n", ".. window 17 - (22, 16900000-16950000) - 62 snps\n", ".. window 18 - (22, 16925000-16975000) - 43 snps\n", ".. window 19 - (22, 16950000-17000000) - 22 snps\n", ".. window 20 - (22, 16975000-17025000) - 34 snps\n", ".. window 21 - (22, 17000000-17050000) - 73 snps\n", ".. window 22 - (22, 17025000-17075000) - 80 snps\n", ".. window 23 - (22, 17050000-17100000) - 71 snps\n", ".. window 24 - (22, 17075000-17125000) - 61 snps\n", ".. window 25 - (22, 17100000-17150000) - 64 snps\n", ".. window 26 - (22, 17125000-17175000) - 42 snps\n", ".. window 27 - (22, 17150000-17200000) - 9 snps\n", ".. window 28 - (22, 17175000-17225000) - 24 snps\n", ".. window 29 - (22, 17200000-17250000) - 39 snps\n", ".. window 30 - (22, 17225000-17275000) - 26 snps\n", ".. window 31 - (22, 17250000-17300000) - 55 snps\n", ".. window 32 - (22, 17275000-17325000) - 109 snps\n", ".. window 33 - (22, 17300000-17350000) - 115 snps\n", ".. window 34 - (22, 17325000-17375000) - 92 snps\n", ".. window 35 - (22, 17350000-17400000) - 77 snps\n", ".. window 36 - (22, 17375000-17425000) - 92 snps\n", ".. window 37 - (22, 17400000-17450000) - 97 snps\n", ".. window 38 - (22, 17425000-17475000) - 68 snps\n", ".. window 39 - (22, 17450000-17500000) - 54 snps\n", ".. window 40 - (22, 17475000-17525000) - 61 snps\n", ".. window 41 - (22, 17500000-17550000) - 83 snps\n", ".. window 42 - (22, 17525000-17575000) - 81 snps\n", ".. window 43 - (22, 17550000-17600000) - 72 snps\n", ".. window 44 - (22, 17575000-17625000) - 88 snps\n", ".. window 45 - (22, 17600000-17650000) - 99 snps\n", ".. window 46 - (22, 17625000-17675000) - 158 snps\n", ".. window 47 - (22, 17650000-17700000) - 156 snps\n", ".. window 48 - (22, 17675000-17725000) - 96 snps\n", ".. window 49 - (22, 17700000-17750000) - 89 snps\n", ".. window 50 - (22, 17725000-17775000) - 65 snps\n", ".. window 51 - (22, 17750000-17800000) - 77 snps\n", ".. window 52 - (22, 17775000-17825000) - 111 snps\n", ".. window 53 - (22, 17800000-17850000) - 102 snps\n", ".. window 54 - (22, 17825000-17875000) - 73 snps\n", ".. window 55 - (22, 17850000-17900000) - 69 snps\n", ".. window 56 - (22, 17875000-17925000) - 129 snps\n", ".. window 57 - (22, 17900000-17950000) - 156 snps\n", ".. window 58 - (22, 17925000-17975000) - 120 snps\n", ".. window 59 - (22, 17950000-18000000) - 91 snps\n", ".. window 60 - (22, 17975000-18025000) - 93 snps\n", ".. window 61 - (22, 18000000-18050000) - 88 snps\n", ".. window 62 - (22, 18025000-18075000) - 68 snps\n", ".. window 63 - (22, 18050000-18100000) - 112 snps\n", ".. window 64 - (22, 18075000-18125000) - 113 snps\n", ".. window 65 - (22, 18100000-18150000) - 73 snps\n", ".. window 66 - (22, 18125000-18175000) - 71 snps\n", ".. window 67 - (22, 18150000-18200000) - 58 snps\n", ".. window 68 - (22, 18175000-18225000) - 74 snps\n", ".. window 69 - (22, 18200000-18250000) - 75 snps\n", ".. window 70 - (22, 18225000-18275000) - 56 snps\n", ".. window 71 - (22, 18250000-18300000) - 77 snps\n", ".. window 72 - (22, 18275000-18325000) - 103 snps\n", ".. window 73 - (22, 18300000-18350000) - 88 snps\n", ".. window 74 - (22, 18325000-18375000) - 67 snps\n", ".. window 75 - (22, 18350000-18400000) - 63 snps\n", ".. window 76 - (22, 18375000-18425000) - 65 snps\n", ".. window 77 - (22, 18400000-18450000) - 116 snps\n", ".. window 78 - (22, 18425000-18475000) - 153 snps\n", ".. window 79 - (22, 18450000-18500000) - 124 snps\n", ".. window 80 - (22, 18475000-18525000) - 125 snps\n", ".. window 81 - (22, 18500000-18550000) - 111 snps\n", ".. window 82 - (22, 18525000-18575000) - 65 snps\n", ".. window 83 - (22, 18550000-18600000) - 50 snps\n", ".. window 84 - (22, 18575000-18625000) - 45 snps\n", ".. window 85 - (22, 18600000-18650000) - 80 snps\n", ".. window 86 - (22, 18625000-18675000) - 74 snps\n", ".. window 87 - (22, 18650000-18700000) - 17 snps\n", ".. window 88 - (22, 18675000-18725000) - 11 snps\n", ".. window 89 - (22, 18700000-18750000) - 19 snps\n", ".. window 90 - (22, 18725000-18775000) - 14 snps\n", ".. window 91 - (22, 18750000-18800000) - 11 snps\n", ".. window 92 - (22, 18775000-18825000) - 8 snps\n", ".. window 93 - (22, 18800000-18850000) - 11 snps\n", ".. window 94 - (22, 18825000-18875000) - 14 snps\n", ".. window 95 - (22, 18850000-18900000) - 42 snps\n", ".. window 96 - (22, 18875000-18925000) - 102 snps\n", ".. window 97 - (22, 18900000-18950000) - 101 snps\n", ".. window 98 - (22, 18925000-18975000) - 93 snps\n", ".. window 99 - (22, 18950000-19000000) - 78 snps\n" ] } ], "source": [ "n_wnds = 100 # only hundred windows are considered\n", "LLR = sp.zeros(n_wnds) # vector with test statistics of the n_wnds regions\n", "for wnd_i in range(n_wnds):\n", " wnd_pos = split.wnd_pos[wnd_i]\n", " nSnps = split.nSnps[wnd_i]\n", " idx_wnd_start = split.idx_wnd_start[wnd_i]\n", " print('.. window %d - (%d, %d-%d) - %d snps' % (wnd_i, wnd_pos[0], wnd_pos[1], wnd_pos[2], nSnps))\n", " \n", " Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']\n", "\n", " # multi trait set test fit\n", " RV = mtSet.optimize(Xr)\n", " LLR[wnd_i] = RV['LLR'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned dictionary from _mtSet.optimize_ contains:\n", "* __Cr__: value of the region-term trait-to-trait covariance\n", "* __Cg__: value of the genetic trait-to-trait covariance\n", "* __Cn__: value of the residual trait-to-trait covariance\n", "* __variances__: n_traits-by-3 matrix of variance explained by the three contributions (region, background, noise) for the traits\n", "* __conv__: bool that indicates convergence of the optimization\n", "* __time__: time elpased for optimizing the parameters\n", "* __NLLAlt__: negative log likelihood of the alternative model\n", "* __LLR__: test statistics\n", "* __LMLgrad__: norm of the gradient of the negative log likelihood dividived by the number of parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### P-values\n", "\n", "P values are obtained from a relatively small number of genome-wide permutations, fitting a parametric model to the null distribution. Here we showcase the permutation procedure by considering 10 permutations for the 10 regions analyzed." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "permutation 0\n", "permutation 1\n", "permutation 2\n", "permutation 3\n", "permutation 4\n", "permutation 5\n", "permutation 6\n", "permutation 7\n", "permutation 8\n", "permutation 9\n" ] } ], "source": [ "n_perms = 10\n", "LLR_null = [] # in this list test statistics from permutations will be stored\n", "for perm_i in range(n_perms):\n", " \n", " #1. generate permutation\n", " print 'permutation %d' % perm_i\n", " sp.random.seed(perm_i)\n", " perm_idxs = sp.random.permutation(Y.shape[0])\n", " \n", " #2. scan on the 100 regions\n", " for wnd_i in range(n_wnds):\n", " wnd_pos = split.wnd_pos[wnd_i]\n", " nSnps = split.nSnps[wnd_i]\n", " idx_wnd_start = split.idx_wnd_start[wnd_i]\n", " Xr = plink_reader.readBED(bfile, useMAFencoding=True, start = idx_wnd_start, nSNPs = nSnps, bim=bim , fam=fam)['snps']\n", " Xr = Xr[perm_idxs, :] # permute samples in region term\n", " RV = mtSet.optimize(Xr)\n", " LLR_null.append(RV['LLR'][0])\n", "LLR_null = sp.array(LLR_null)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parametric fit to the ditribution of the test statistics under the null and the consequent conversion of the observed test statistics in P values is performed by the module `limix.stats.chi2mixture` as shown below." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from limix.stats.chi2mixture import Chi2mixture\n", "c2m = Chi2mixture(tol=4e-3)\n", "c2m.estimate_chi2mixture(LLR_null)\n", "pv = c2m.sf(LLR)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXMAAAENCAYAAAD9koUjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHGBJREFUeJzt3X2QXNV55/Hf05oBS5qRZMbsCIyNOrEwVlgFK4AXZOM2\n8TKEyEnK3i3Dhs3Y63Ut2ajirVJ2k61yit6t1L5V5I0TV0E58To4lcSpOHbWO5Ug2MVtFy8ugREv\n8oBCoEWwkaassZFmRjLMqM/+0S/cmemeubf7vvXp76eKotV95/Y599x++vS5zznXnHMCAPS3QtYF\nAAD0jmAOAB4gmAOABwjmAOABgjkAeIBgDgAeCBXMzewKMztiZk80/n/azH496cIBAMKxqHnmZlaQ\n9D1J73HOvZxIqQAAkXQzzPJBSS8QyAEgP7oJ5h+V9OdxFwQA0L1IwyxmNizpFUm7nHM/SKxUAIBI\nhiJu/3OSvtMukJsZi7wAQBecc9brPqIOs9yuNYZYnHPe/nfXXXdlXgbqR/0GsX4+1825+PrAoYO5\nmW1S/eLnV2N7dwBALEIPszjnzkq6OMGyAAC6xAzQkEqlUtZFSBT1628+18/nusUp8qShjjsyc3GO\n/wDAIDAzuQwugAIAcohgDgAeIJgDgAcI5shMtVrV5OR+TUzcocnJ/apWq1kXCehbXABFJqrVqvbt\nu1OzsztVKAyrVlvU2Njzmpq6R8ViMeviAanhAij6Wrl8sBXIJalQGNbs7E6VywczLhnQnwjmyMTJ\nk6+2AnlToTCsmZnTGZUI6G8Ec2Ri+/ZtqtUWlz1Xqy1qfHxrRiUC+htj5sgEY+ZAXVxj5gRzZKZa\nrapcPqiZmdMaH9+qcvkAgRwDh2AOAB4gmwUA0EIwBwAPEMwBwAMEcwDwAMEcADxAMAcADxDMAcAD\nBHMA8ADBHAA8QDAHAA+EDuZmttXM/tLMnjWz75rZe5IsGAAgvKEI235W0t845/65mQ1J2pRQmQAA\nEYVaaMvMtkg64pz7yTW2YaEtAIgo7YW2ipJOmdkXzewJM/u8mW3s9c0BAPEIO8wyJGmPpF9zzj1u\nZr8n6bck3RXcqFwutx6XSiWVSqV4SgkAnqhUKqpUKrHvN+wwy7ikR51zP9H493sl/aZz7kOBbRhm\nAYCIUh1mcc7NSHrZzK5oPPWzkqZ7fXMAQDxC32nIzH5a0h9JGpb0oqSPO+dOB16nZw4AEXHbOADw\nALeNAwC0EMwBwAMEcwDwAMEcADxAMAcADxDMAcADBHMA8ADBHAA8QDAHAA8QzJG5arWqycn9mpi4\nQ5OT+1WtVrMuEtB3mM6PTFWrVe3bd6dmZ3eqUBhWrbaosbHnNTV1j4rFYtbFAxLHdH54oVw+2Ark\nklQoDGt2dqfK5YMZlwzoLwRzZOrkyVdbgbypUBjWzMzpDn8BoB2COTK1ffs21WqLy56r1RY1Pr41\noxIB/Ykxc2SKMXMMOtYzhzeq1arK5YOamTmt8fGtKpcPEMgxMAjmAOABslkAAC0EcwDwAMEcADxA\nMEeqmLoPJIMLoEgNaYjAalwARd9h6j6QnKGwG5rZcUmnJdUkLTrnrkuqUPBTfer+xcueq0/dP5VR\niQB/ROmZ1ySVnHPvJpCjG0zdB5ITJZhbxO2BZcrlAxobe74V0Jtj5uXygYxLBvS/KMHZSXrAzB4z\ns08mVSD4q1gsamrqHk1MSLt3n9LEhLj4CcQk9Ji5pL3OuRNmdrHqQf1Z59xDwQ3K5XLrcalUUqlU\niqWQ8EexWNS9934u62IAmalUKqpUKrHvt6vURDO7S9Kcc+4zgedITQSAiFJNTTSzTWY20ni8WdLN\nko72+uYAgHiEHWYZl/Q1M3ONv/lT59z9yRULABAFM0ABIEPMAAUAtBDMAcADBHMA8ADBHAA8QDAH\nAA8QzAHAAwRzAPAAwRwAPEAwBwAPEMwBwAMEcwDwAMEcADxAMAcADxDMAcADBHMA8ADBHAA8QDAH\nAA8QzAHAAwRzAPAAwRwAPEAwBwAPEMwBwAMEcwDwQOhgbmYFM3vCzL6eZIEAANFF6Zl/StJ0UgUB\nAHQvVDA3s8sk3Srpj5ItDgCgG2F75v9T0r+X5BIsCwCgS0PrbWBmPy9pxjn3pJmVJFmnbcvlcutx\nqVRSqVTqvYQA4JFKpaJKpRL7fs25tTvbZvZfJN0haUnSRkmjkr7qnPuVFdu59fYFAFjOzOSc69hJ\nDr2fKAHYzN4v6YBz7hfavEYwB4CI4grm5JkDgAci9czX3BE9cwCIjJ45AKCFYA4AHiCYA4AHCOYA\n4AGCOQB4gGAOAB4gmAOABwjmAOABgjkAeIBgDgAeIJgDgAcI5gDgAYI5AHiAYA4AHiCYA4AHCOYA\n4AGCOQB4gGAOAB4gmAOABwjmAOABgjkAeIBgDgAeIJgDgAeGwmxkZhdK+pakCxp/8xXn3H9KsmAA\ngPDMORduQ7NNzrmzZrZB0sOSft05dzjwugu7LwBAnZnJOWe97if0MItz7mzj4YWq986J3ACQE6GD\nuZkVzOyIpJOSHnDOPZZcsQAAUYQaM5ck51xN0rvNbIukvzazXc656eA25XK59bhUKqlUKsVUTADw\nQ6VSUaVSiX2/ocfMl/2R2W9LWnDOfSbwHGPmABBRqmPmZvYWM9vaeLxR0j+V9Fyvbw4AiEfYYZZL\nJN1rZgXVvwD+wjn3N8kVCwAQRVfDLG13xDALAESWemoiACC/COYA4AGCOQB4gGAOAB4gmKOtarWq\nycn9mpi4Q5OT+1WtVrMuEoA1kM2CVarVqvbtu1OzsztVKAyrVlvU2Njzmpq6R8ViMeviAV4hmwWJ\nKZcPtgK5JBUKw5qd3aly+WDGJQPQCcEcq5w8+WorkDcVCsOamTmdUYkArIdgjlW2b9+mWm1x2XO1\n2qLGx7dmVCIA62HMHKswZg6kJ64xc4I52qpWqyqXD2pm5rTGx7eqXD5AIAcSQDAHAA+QzQIAaCGY\nA4AHCOYA4AGCOQB4gGCOFtZjAfoX2SyQRG45kBWyWRCrXtZjoUcPZI9gDkndr8fS7NEfOiQ988zF\nOnRI2rfvTgI6kDKCOSR1vx4LKywC+UAwhySpXD6gsbHnWwG9OWZeLh9Y8+9YYRHIB4I5JEnFYlFT\nU/doYkLavfuUJiYU6uInKywC+RAqm8XMLpP0JUnjkmqS/tA59/srtiGbZQCRBQP0JtWFtsxsu6Tt\nzrknzWxE0nck/aJz7rnANgTzAcUKi0D3Ml010cz+WtIfOOf+X+A5gjnQZ5pfxCdPvqrt27fxRZyB\nzIK5me2QVJF0lXNuPvA8wRzoIwyR5UNcwXwo4puOSPqKpE8FA3lTuVxuPS6VSiqVSj0WDz6g95c/\n1WpVN998m+bmrm1lI9Vq5/TSS6/q/e//qD7wgetop4RUKhVVKpXY9xu6Z25mQ5KmJP2tc+6zbV6n\nZ45V6P3lT7NNXnrpR9qyZa8kaWnpjM6efVIjI9fTTinLYjr//5I03S6QA50wqSh/mm1itqGVVnru\n3HQrkEu0Uz8KFczNbK+kX5Z0k5kdMbMnzOyWZIsGHzCpKH+abbJx4y7Nzz/aCOiOdupzoYK5c+5h\n59wG59zVzrl3O+f2OOfuS7pw6H9MKsqfZpsMDW3Rpk1Xa2HhO1pcPNX37ZTGgm95XlSOJXCRKMbM\n86ddm4yOPq1CYUinT+/qy3ZK4zxL6j0yzTNvuyOCOTpgUlH+tGsTSX3bTpOT+3XokJYNFdVqi5qY\nkO6993O5fg+COQA0TEzcoWeeuXjV87t3n9J99/1Jrt+Dm1MAQEMa12byfv2HnjmAvseYOcEcgCea\n1wGq1e/rxIlXdOmlb9OOHdt7HvsPzmDevLkgs4IWFs7Hdl2BYA4AK8Tde06jx8+YOYBI8pwjHZe4\nZxz30wzmSAttAehPy3uYF+uppxb1+ON39k0eeScrF3E7fvykCoW3LdumPpP1VFf7PXToURUKN/a8\nvzTQMwcGQD/1MMP+gmh+QR06JD3zzMU6dEg6enS654yT4H4XFi6ItL8sf/0QzIEB0C9r5LQL0Pv2\n3dk2KLb7gioUrtH5849EvjF5p/0uX79m7f1FKXsSCObIlUEY181C3nOkm6L8gmj3BTU8PKZ3vevy\nyDcm77Tf4Po1tdq31txf1r9+GDNHbvg6rpsH5fIBPf746qyMcvmerIu2TD2QLp9l2WmMevv2bXrq\nqcVV0+uLxbe2ptd3c2OUlfsdGtqizZt/Zt1p+1HKngR65siNrHs2PisWi5qauid0jzWrX0hRfkGU\nywc0NvZ8xyGQboc91ttvHGVPAnnm6EoSt4JLY30NrC/LlS6jvvdai7j1sjBWN4vDdXvcmDSUEu5f\nuVpSH/Y0Vr7D+rJuh7hW2Uyrc9DrDNFMbug8aBjDbW+t4ZBePuz9Mq7ru6zHfovFYixfGp3G1OMc\n9lgZI7JcB54x8zUwhtteUmluUcd1kYysx37jEnXsO8p1gua2N9744dzECHrma8i6h5JXSfZ44uqV\noXu+/EJqdg7qQzanGsMencfew/4KD267sHChRkfbdWzSjxH0zNfgSw8lbt1e7Ud/8OkXUrNzcN99\nf6J77/1cxzpE+RW+fFvLTYygZ74GX3oovWh3cWd+fknvfOe4zOYDF3r688OO9gbtF1KUX+HBbZsz\nREdGrs88RngdzHvNRInyM81HwZ+TtdqFWlg4otHRG/ryhr/AWqIMHQa3bc4QnZ9/TBdd5PSBD1yX\nWYzwNjUx6VzZQUhZDKaozc19W5s3/0yq6WqDcIyRD1HiRdyxJdU8czP7gqR9kmacc7s7bJOrYJ5k\nrmyWkyrSFMzTnZt7VKOj16/aJqkJPYNyjJGtbnPE48qFl9LPM/+ipD+Q9KVe3zAtSWaiJJVnnTfL\nf3rWL/QkmbMbNCjHGJ0l/cuslxzxPF5TCJXN4px7SNKPEi5LrJLMROmX5UR7Fcxa2bhxl+bmelta\nNIpBOcZoL43lZH2bR+JtamKS6XODkrIYTFHbs+d1TUxcpb1751NJVxuUY4z20gi0vnUYYs1mKZfL\nrcelUkmlUinO3UeSZCbKIKUsZvVzcpCOMVZLY8JeGtP926lUKqpUKrHvN3Q2i5ldLun/9MsF0KTF\neQEE7XGMB1cai33l5SJ76qsmmtkO1YP5P+7w+kAFc7RHOiHikFagzUOHIe3UxD+TVJI0JmlG0l3O\nuS+u2Gbgg/mgB7K89HTghzwE2jSwnnnOEMiW/zReWjqjc+em5dx5XXKJ6f77vzwwxwFYaa2OHsE8\nZ7Je0D8PmpOMlpbO6OzZJ1etVzFIX2xA03odvbiCeV+kJvbDHdv7Nc0pzmPbTCc8d266Fcil/s/f\n7UbwuH74w7+ij3zkY7k+f5PUD5/fJKS95nnug3k3kweyOHn6MS867okZzdx+58735RdbXILH9ciR\nC3Xo0FE9/PBIYpNf8iyNyT95FKz3q69emMrnIdZgnkQQjTp5IKuTpx/X+I57YkYzt/+SS/KzxnMW\ngsf13Lnp1kqT0uD9SvFtlmVYWax5HmswTyKIRh2+yOrk6ccF/ZMYGioWi7r//i/33RdbnJYfV5dK\nryyvQxn9OvzYq2C9m2ueJ/15iDWYJxFEVw5fLC2d0Zkzj2h6+ljbkzbLkyfsXU3yIqmhoX78YovT\n8uOafK9srV+jWQf5fhx+jEOw3sE1z4eGHk7s8xBrNssll/y71r+bqWmbN7+uiYnru84RXX6DhHPr\n3iAhL1kl/ZBzTjplMsKes5JiOUc6nfM33HBGx47NZNq+g3qORal3LlMTx8d/rZVjHGdqWjMwPvjg\nw1pc3LtmoI7j5Ok1EPfTCTwoEzPSFjyumzbZqnWyJcV2jgTXnQ/asKGy7uclSc1j8OKL39PJkyd0\n6aVv044d2wfmHAv72cplMN+16+bGHau/k8hdaTqdtCtvkNBLgIojEOfl1wHSF7YjEOc50mlfF1xw\nWEtLe1dtn9QNRYL6qUOTtVzmmTfHSTdvfj2Rceuw42+9jF3HcQF1UC/65FGaY8ZRMqniPEc6ZVJd\ne+2uzMar405EiLsds76WkIRYg3kziE5MXJ/ISZRG+l8cH7JBveiTlrAfxLTTVKMEsLjOkeYvgYsu\n2qjR0ce0c+fLrQtsv/u7d2WWVRTnl1Xc7ehr7nsik4aSCrppZEnE8SHrx5zzvGsG8Pe975d03XUf\nCfVBTDtNNUoAi+McCQalF14oam7uWp06Ndca2skyqyjODk3c7ehr7ntia7P064W1uMb6+rX+eRRs\nkyjXY8JeY4lLmHHwbm8g3O37ZSXOMfO42zHt82I9ad/QObI83vA0jLjuUNSv9c+j5T2pTpNw3rgD\nTTNgTk9/V7Xa6myO5G5CvfbdkaLeQHi9i6lp3I2nW3He6WvlHYHq2XJHNT3tNDm5P/KXYFZ3GEoa\nqyamIA8553koQ7eCPam5uW+3eubtltmV1DHHe3FxVrXa47rqql2Jpcit9YssSk86TM92rfzy0dEt\nsbd1VudQ1LkmUfaXh0ybXKYmEsxXy8OJk4cy9GLlOulnzz6pjRt/SufOfXfVXIYrr7xEDz88sqoX\nNzr6Yy0tbdCGDd0HgV6t9fP+7rv/87JAOT8/v6weUrg5FaOjT6tQGNLp07tirWfS59B6XxRR5ppE\neb88DIMSzPtE3OOaYXpHK8dljxw5qnPnbsjl2GoYKwPJ4uKszp79pkZHPxQ6tzrrCTRStJmar732\nTW3c+MFV+1hvTkWYL4E4yx7H8YvyRZG38e445DLPHKulnaLVbvnVH/xgqK/z3ldmZdx66xZdc821\nbevk3GLbLAqz4cyPQacMFrPCquyKxcVNXc2pmJ9fir2e1WpV3/jG4Vj22y6tNIuUTh8RzBOWdopW\nu+VXzTb0/QdgZdDasWN72wXYlpZqOn/+kZ4n0CQxqaRTqmC7ALxp01V6/fWH2qYurlW2uINds3Pw\nwx/2vmBYp87I8eMnU03p9BXBPGFxnnxhevntll9NawnONAWP69LSGS0sHNHIyLUyu0nSVVpcfLDr\nCTRJTippNzu5XQAuFDaqVNq9KvBLay81HXewa3YONm26atk5tLg4q8XFB3X8+MnQX3adOiOvvPJy\n6C+KQV+Rcy2MmacgrostYcYtg9t0yvwYG3td11xztebnl/ousyUo6kWx5vbV6vd14sQruvTSt2ls\nbJPMCpqfX2rlfR8+/NSy/QUvoo6MvCn2BaOijBlHyWWP4+JecIy6eQ7Vaj+W2WsaGbkp0sXQTuPd\n73jHcc3Onu3bC/S94gLoAGp3IXBlqp3UOTUvyWyHLEW5KNYpzS34eGHhcY2OXi9J62bPxJ3NsV4A\nzsNEqDNnHtHIyLXrfnmG2Vfz78rlA7nJLklb6sHczG6R9HuqD818wTn331e8HimYx5Gz2s+5090K\n9i6fffalVqpdMLA3e5sLC+dXLb+aVLZDlqJkWnT65bLe45UzT5u99Ysucrrmmitbvftez8NO53Tc\nqXlRyrPyV8PKTJvgvQtuuOGnOh6LMJ0R3z+/7aQazM2sIOnvJP2spFckPSbpNufcc4FtQgfzuNYc\nTzN3ulKpqFQqxb7fbrXLvQ7Ta+zUs7vssqM6fPiBtIofqzDnQrP9lk9AerTVAw8+Dh7PZi+90+tx\nTGJZrx533/1p/eqv/s6ak2Y+/emP6fbbb4/rkK4qV6f0x6jHolNnZK3jFtdnL6+dv7RTE6+T9Lxz\n7iXn3KKkL0v6xW7fNI6FbtJeLKdSqSSy324FL3SeOzfdCuRSd6ldr73WH2mK7YS5KNZsv863dHvj\ncfA2X2anGs+/8XrweMd5w+ZO5/QnPvEbreeHhrZo8+Z3r7oF2bFjxyK/X1grL9oGLyZHPRbNfRWL\nb20F8k7bNsXx2fN1pcSgsMH8rZJeDvz7e43nuhJH7vWgrxm+PCiFv2lwp2yHUun6pIucqLBr2Afr\nv3HjLs3NPbLqsVTPJrn88m2amvqixsae15vetDOQzRE83vHdsLnTOT03tzx1cWhoi7ZsuUG7dr0z\nk3vNBr88l9+7IPyxSPvz6+tKiUGZpCbGkQs76JMHlgfl8DnAnXqxb37zm1MqebaC9d+z53VNTFyl\nvXvnlz0OHpcbb3yfpqbu0a23btGePW/X6OhjGh0926F3X9ftedjpnB4dHcrdud7+3gXhj0Xan99B\n6PyFHTP/J5LKzrlbGv/+LUkueBHUzEhlAYAupHkBdIOkY6pfAD0h6bCk251zz/ZaAABA70KtZ+6c\nO29m+yXdrzdSEwnkAJATsU0aAgBkZ90LoGb2BTObMbOn19imZGZHzOyomX0j8PwtZvacmf2dmf1m\nXIWOU4/1O25mTzVeO5xOiaNZr35m9huN8j9hZs+Y2ZKZbWu81vftt079fGi/LWb2dTN7slG/jwVe\ny3X79Vg3H9pum5l9tVGPb5vZrsBr0dvOObfmf5LeK+lqSU93eH2rpO9Kemvj329p/L8g6e8lXS5p\nWNKTkq5c7/3S/q/b+jUevyjpzVnXoZf6rdh2n6T/61P7daqfL+0n6T9K+q+Nx2+RNKv68Gnu26/b\nunnUdv9D0m83Hr+z18/euj1z59xDkn60xib/QtJfOee+39i+eQPCWCcaJaWH+kmSKecrT4aoX9Dt\nkv688diX9gsK1k/yo/2cpNHG41FJs865JfVB+/VQN8mPttsl6cHGtsck7TCzi9Vl28VxMK6QdJGZ\nfcPMHjOzf9l4PtaJRhnqVD+pfrI90Hj+kxmVLxZmtlHSLZL+qvGUL+0nqW39JD/a73OSdpnZK5Ke\nkvSpxvM+tF+nukl+tN1Tkj4sSWZ2naS3S7pMXbZdqGyWEPvYI+kmSZslPWpmj8aw37xoWz/n3N9L\n2uucO9H4Nn3AzJ5tfBv3ow9Jesg592rWBUlIu/r50H4Tko44524ys59UvR67sy5UTNrWzTk3Lz/a\n7r9J+qyZPSHpGUlHJJ3vdmdx9My/J+mQc+7HzrlZSd+S9NOSvq/6N03TZY3n+k2n+sk5d6Lx/x9I\n+prqP4/61W1aPgThS/s1rayfL+33cUlflSTn3AuSqpKulB/t16luXrSdc27OOfevnHN7nHOTkv6R\n6tcCumq7sMHcGv+1878lvdfMNpjZJknvkfSs6isrvsPMLjezC1T/MH095PulLXL9zGyTmY1Ikplt\nlnSzpKOplDa6teonM9sq6f2q17XJl/ZrWz+P2u8lSR+UJDMbV31Y8EX1T/tFrpsvbWdmW81suPH4\nk5K+2fjV0VXbrTvMYmZ/JqkkaczM/kHSXZIuUH06/+edc8+Z2SFJT6v+E+Hzzrnpxt/mfqJRt/Uz\ns6Kkr1l9GYMhSX/qnLs/m1p0tl79Gpv9kuq/Ps41/871yUSxbusnaVx+tN/vSPrjQPrbf3DO/bDx\nt7luv27r5tFn712S7jWzmuoZc5+Quv/sMWkIADyQ69QeAEA4BHMA8ADBHAA8QDAHAA8QzAGgC+st\npLVi28/YGwu+HTOzH8ZeHrJZACA6M3uvpHlJX3LOhZ5120g7vNo596/jLA89cwDoQruFtMzsJ8zs\nbxtrxnzTzK5o86crF3yLRRxrswAA6j4v6d84515oLJ51t+q325QkmdnbJe1QY7XEOBHMASAGjaUF\nbpD0l2bWnMI/vGKz2yR9xSUwvk0wB4B4FCT9yDm3Z41tbpP0b5N6cwBAd1oLaTnn5iRVzeyftV4M\nLEdsZldK2uac+3YSBSGYA0AXGgtpPSLpCjP7BzP7uKRflvQJq9+39KikXwj8yUdVv2tQMuUhNREA\n+h89cwDwAMEcADxAMAcADxDMAcADBHMA8ADBHAA8QDAHAA8QzAHAA/8fprD2VZQohBUAAAAASUVO\nRK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#makes a manhattan plot\n", "wnd_chrom = split.wnd_pos[:n_wnds,0]\n", "wnd_start = split.wnd_pos[:n_wnds,1]\n", "wnd_end = split.wnd_pos[:n_wnds,2]\n", "import pylab as pl\n", "pl.plot(wnd_start, -sp.log10(pv), 'o', color='MidnightBlue')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Developers\n", "\n", "- Francesco Paolo Casale ()\n", "- Barbara Rakitsch ()\n", "- Danilo Horta ()\n", "- Oliver Stegle ()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References\n", "\n", "[1] Genomes Project, C. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%javascript\n", "$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda root]", "language": "python", "name": "conda-root-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": true }, "toc_position": { "height": "1211px", "left": "0px", "right": "auto", "top": "106px", "width": "215px" } }, "nbformat": 4, "nbformat_minor": 0 }