{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Resources\n", "\n", "### New installs\n", "/projects/ps-jcvi2/schork-lab/tools/plink-1.07-x86_64/plink --noweb\n", "/projects/ps-jcvi2/schork-lab/tools/admixture_linux-1.3.0/admixture\n", "\n", "### Data sets\n", "/projects/ps-yeolab/biom262_2016/data\n", "\n", "### New environment settings\n", "```export PATH=\"/projects/ps-jcvi2/schork-lab/tools/plink-1.07-x86_64:$PATH\"``` \n", "```export PATH=\"/projects/ps-yeolab/biom262_2016/tools/admixture:$PATH\"```\n", "\n", "\n", "### Pulled data from HAPMAP database\n", "* data on CEU, CHB, and YRI\n", "* extract 3 files with different numbers of SNPs\n", "* clean them\n", "* determine IBS sharing distance #IBS vs. IBD?\n", "* use MDS (PCA) to plot distances in 2D\n", "* color them by population\n", "* inspect for clustering\n", "* use GAMOVA for determining PVE\n", "\n", "* * *\n", "\n", "# Introduce PLINK" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "--2016-02-23 15:19:09-- http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-x86_64.zip\n", "Resolving pngu.mgh.harvard.edu... 155.52.206.11\n", "Connecting to pngu.mgh.harvard.edu|155.52.206.11|:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 3971138 (3.8M) [application/zip]\n", "Saving to: ‘plink-1.07-x86_64.zip’\n", "\n", " 0K .......... .......... .......... .......... .......... 1% 110K 35s\n", " 50K .......... .......... .......... .......... .......... 2% 521K 21s\n", " 100K .......... .......... .......... .......... .......... 3% 522K 16s\n", " 150K .......... .......... .......... .......... .......... 5% 558K 14s\n", " 200K .......... .......... .......... .......... .......... 6% 512K 12s\n", " 250K .......... .......... .......... .......... .......... 7% 28.1M 10s\n", " 300K .......... .......... .......... .......... .......... 9% 510K 9s\n", " 350K .......... .......... .......... .......... .......... 10% 8.26M 8s\n", " 400K .......... .......... .......... .......... .......... 11% 527K 8s\n", " 450K .......... .......... .......... .......... .......... 12% 6.15M 7s\n", " 500K .......... .......... .......... .......... .......... 14% 33.8M 6s\n", " 550K .......... .......... .......... .......... .......... 15% 526K 6s\n", " 600K .......... .......... .......... .......... .......... 16% 8.27M 6s\n", " 650K .......... .......... .......... .......... .......... 18% 9.21M 5s\n", " 700K .......... .......... .......... .......... .......... 19% 530K 5s\n", " 750K .......... .......... .......... .......... .......... 20% 624K 5s\n", " 800K .......... .......... .......... .......... .......... 21% 3.74M 5s\n", " 850K .......... .......... .......... .......... .......... 23% 618K 5s\n", " 900K .......... .......... .......... .......... .......... 24% 3.04M 4s\n", " 950K .......... .......... .......... .......... .......... 25% 11.5M 4s\n", " 1000K .......... .......... .......... .......... .......... 27% 653K 4s\n", " 1050K .......... .......... .......... .......... .......... 28% 2.73M 4s\n", " 1100K .......... .......... .......... .......... .......... 29% 10.8M 4s\n", " 1150K .......... .......... .......... .......... .......... 30% 675K 4s\n", " 1200K .......... .......... .......... .......... .......... 32% 2.32M 3s\n", " 1250K .......... .......... .......... .......... .......... 33% 13.9M 3s\n", " 1300K .......... .......... .......... .......... .......... 34% 10.3M 3s\n", " 1350K .......... .......... .......... .......... .......... 36% 601K 3s\n", " 1400K .......... .......... .......... .......... .......... 37% 4.03M 3s\n", " 1450K .......... .......... .......... .......... .......... 38% 12.5M 3s\n", " 1500K .......... .......... .......... .......... .......... 39% 11.4M 3s\n", " 1550K .......... .......... .......... .......... .......... 41% 535K 3s\n", " 1600K .......... .......... .......... .......... .......... 42% 12.5M 2s\n", " 1650K .......... .......... .......... .......... .......... 43% 553K 2s\n", " 1700K .......... .......... .......... .......... .......... 45% 12.4M 2s\n", " 1750K .......... .......... .......... .......... .......... 46% 10.7M 2s\n", " 1800K .......... .......... .......... .......... .......... 47% 10.1M 2s\n", " 1850K .......... .......... .......... .......... .......... 48% 11.6M 2s\n", " 1900K .......... .......... .......... .......... .......... 50% 551K 2s\n", " 1950K .......... .......... .......... .......... .......... 51% 12.0M 2s\n", " 2000K .......... .......... .......... .......... .......... 52% 10.4M 2s\n", " 2050K .......... .......... .......... .......... .......... 54% 603K 2s\n", " 2100K .......... .......... .......... .......... .......... 55% 4.48M 2s\n", " 2150K .......... .......... .......... .......... .......... 56% 9.36M 2s\n", " 2200K .......... .......... .......... .......... .......... 58% 11.3M 2s\n", " 2250K .......... .......... .......... .......... .......... 59% 679K 2s\n", " 2300K .......... .......... .......... .......... .......... 60% 9.51M 1s\n", " 2350K .......... .......... .......... .......... .......... 61% 2.42M 1s\n", " 2400K .......... .......... .......... .......... .......... 63% 13.7M 1s\n", " 2450K .......... .......... .......... .......... .......... 64% 10.2M 1s\n", " 2500K .......... .......... .......... .......... .......... 65% 11.7M 1s\n", " 2550K .......... .......... .......... .......... .......... 67% 781K 1s\n", " 2600K .......... .......... .......... .......... .......... 68% 1.59M 1s\n", " 2650K .......... .......... .......... .......... .......... 69% 10.1M 1s\n", " 2700K .......... .......... .......... .......... .......... 70% 8.53M 1s\n", " 2750K .......... .......... .......... .......... .......... 72% 566K 1s\n", " 2800K .......... .......... .......... .......... .......... 73% 12.0M 1s\n", " 2850K .......... .......... .......... .......... .......... 74% 10.5M 1s\n", " 2900K .......... .......... .......... .......... .......... 76% 13.2M 1s\n", " 2950K .......... .......... .......... .......... .......... 77% 550K 1s\n", " 3000K .......... .......... .......... .......... .......... 78% 11.8M 1s\n", " 3050K .......... .......... .......... .......... .......... 79% 11.7M 1s\n", " 3100K .......... .......... .......... .......... .......... 81% 9.25M 1s\n", " 3150K .......... .......... .......... .......... .......... 82% 13.2M 1s\n", " 3200K .......... .......... .......... .......... .......... 83% 523K 1s\n", " 3250K .......... .......... .......... .......... .......... 85% 9.19M 0s\n", " 3300K .......... .......... .......... .......... .......... 86% 12.7M 0s\n", " 3350K .......... .......... .......... .......... .......... 87% 7.75M 0s\n", " 3400K .......... .......... .......... .......... .......... 88% 13.9M 0s\n", " 3450K .......... .......... .......... .......... .......... 90% 14.5M 0s\n", " 3500K .......... .......... .......... .......... .......... 91% 520K 0s\n", " 3550K .......... .......... .......... .......... .......... 92% 11.1M 0s\n", " 3600K .......... .......... .......... .......... .......... 94% 546K 0s\n", " 3650K .......... .......... .......... .......... .......... 95% 8.92M 0s\n", " 3700K .......... .......... .......... .......... .......... 96% 565K 0s\n", " 3750K .......... .......... .......... .......... .......... 97% 15.0M 0s\n", " 3800K .......... .......... .......... .......... .......... 99% 10.2M 0s\n", " 3850K .......... .......... ........ 100% 8.08M=3.0s\n", "\n", "2016-02-23 15:19:12 (1.26 MB/s) - ‘plink-1.07-x86_64.zip’ saved [3971138/3971138]\n", "\n" ] } ], "source": [ "%%bash\n", "wget \"http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-x86_64.zip\"" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Archive: plink-1.07-x86_64.zip\n", " creating: plink-1.07-x86_64/\n", " inflating: plink-1.07-x86_64/plink \n", " inflating: plink-1.07-x86_64/COPYING.txt \n", " inflating: plink-1.07-x86_64/test.map \n", " inflating: plink-1.07-x86_64/test.ped \n", " inflating: plink-1.07-x86_64/gPLINK.jar \n", " inflating: plink-1.07-x86_64/README.txt \n" ] } ], "source": [ "%%bash\n", "\n", "unzip -o plink-1.07-x86_64.zip" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 6: /Users/apple/Desktop/BIOM262/plink-1.07-x86_64/plink: cannot execute binary file\n" ] } ], "source": [ "%%bash\n", "\n", "#inspect files\n", "#extract random 0.1% of SNPs\n", "\n", "export pwd=`pwd`\n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3 --thin 0.00001 --recode --out hapmap3_thin" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 3: /plink-1.07-x86_64/plink: No such file or directory\n" ] } ], "source": [ "%%bash\n", "\n", "#extract SNPs on chr 21 and recode\n", "export pwd=`pwd`\n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3 --chr 21 --make-bed --out hapmap3_chr21" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 3: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 4: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 5: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 6: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 7: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 8: /plink-1.07-x86_64/plink: No such file or directory\n" ] } ], "source": [ "%%bash\n", "\n", "#prune for LD\n", "export pwd=`pwd`\n", "$pwd/plink-1.07-x86_64/plink --noweb --file hapmap3_thin --indep-pairwise 50 5 0.1 \n", "$pwd/plink-1.07-x86_64/plink --noweb --file hapmap3_thin --extract plink.prune.in --make-bed --out hapmap3_thin_pruned \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3_chr21 --indep-pairwise 50 5 0.1 \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3_chr21 --extract plink.prune.in --make-bed --out hapmap3_chr21_pruned \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3/hapmap3 --indep-pairwise 50 5 0.1 \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3/hapmap3 --extract plink.prune.in --make-bed --out hapmap3_pruned \n", "#why prune? (ask while running...)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 3: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 4: /plink-1.07-x86_64/plink: No such file or directory\n", "bash: line 5: /plink-1.07-x86_64/plink: No such file or directory\n" ] } ], "source": [ "%%bash\n", "\n", "#create IBS matrix\n", "export pwd=`pwd`\n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3_thin_pruned --cluster --distance-matrix --out hapmap3_thin_pruned \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3_chr21_pruned --cluster --distance-matrix --out hapmap3_chr21_pruned \n", "$pwd/plink-1.07-x86_64/plink --noweb --bfile hapmap3_pruned --cluster --distance-matrix --out hapmap3_pruned " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }