{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function, division" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `PyPlink`\n", "\n", "`PyPlink` is a Python module to read and write binary Plink files. Here are small examples for `PyPlink`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from pyplink import PyPlink" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents\n", "\n", "* [**Reading binary pedfile**](#Reading-binary-pedfile)\n", " * [Getting the demo data](#Getting-the-demo-data)\n", " * [Reading the binary file](#Reading-the-binary-file)\n", " * [Getting dataset information](#Getting-dataset-information)\n", " * [Iterating over all markers](#Iterating-over-all-markers)\n", " * [*Additive format*](#iterating_over_all_additive)\n", " * [*Nucleotide format*](#iterating_over_all_nuc)\n", " * [Iterating over selected markers](#Iterating-over-selected-markers)\n", " * [*Additive format*](#iterating_over_selected_additive)\n", " * [*Nucleotide format*](#iterating_over_selected_nuc)\n", " * [Extracting a single marker](#Extracting-a-single-marker)\n", " * [*Additive format*](#extracting_additive)\n", " * [*Nucleotide format*](#extracting_nuc)\n", " * [Misc example](#Misc-example)\n", " * [*Extracting a subset of markers and samples*](#Extracting-a-subset-of-markers-and-samples)\n", " * [*Counting the allele frequency of markers*](#Counting-the-allele-frequency-of-markers)\n", "\n", "\n", "* [**Writing binary pedfile**](#Writing-binary-pedfile)\n", " * [SNP-major format](#SNP-major-format)\n", " * [INDIVIDUAL-major-format](#INDIVIDUAL-major-format)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading binary pedfile\n", "\n", "### Getting the demo data\n", "\n", "The [`Plink`](http://pngu.mgh.harvard.edu/~purcell/plink/) softwares provides a testing dataset on the [resources page](http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml). It contains the 270 samples from the HapMap project (release 23) on build GRCh36/hg18." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import zipfile\n", "try:\n", " from urllib.request import urlretrieve\n", "except ImportError:\n", " from urllib import urlretrieve\n", "\n", "# Downloading the demo data from Plink webset\n", "urlretrieve(\n", " \"http://pngu.mgh.harvard.edu/~purcell/plink/dist/hapmap_r23a.zip\",\n", " \"hapmap_r23a.zip\",\n", ")\n", "\n", "# Extracting the archive content\n", "with zipfile.ZipFile(\"hapmap_r23a.zip\", \"r\") as z:\n", " z.extractall(\".\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading the binary file\n", "\n", "To read a binary file, `PyPlink` only requires the prefix of the files." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pedfile = PyPlink(\"hapmap_r23a\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting dataset information" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "270 samples and 4,098,136 markers\n" ] } ], "source": [ "print(\"{:,d} samples and {:,d} markers\".format(\n", " pedfile.get_nb_samples(),\n", " pedfile.get_nb_markers(),\n", "))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fidiidfathermothergenderstatus
0NA18524NA18524001-9
1NA18526NA18526002-9
2NA18529NA18529002-9
3NA18532NA18532002-9
4NA18537NA18537002-9
\n", "
" ], "text/plain": [ " fid iid father mother gender status\n", "0 NA18524 NA18524 0 0 1 -9\n", "1 NA18526 NA18526 0 0 2 -9\n", "2 NA18529 NA18529 0 0 2 -9\n", "3 NA18532 NA18532 0 0 2 -9\n", "4 NA18537 NA18537 0 0 2 -9" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_samples = pedfile.get_fam()\n", "all_samples.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chromposcma1a2
snp
rs1039974914516200C
rs294942014525700T
rs40303031724340AG
rs403030017251500C
rs385595217768900A
\n", "
" ], "text/plain": [ " chrom pos cm a1 a2\n", "snp \n", "rs10399749 1 45162 0 0 C\n", "rs2949420 1 45257 0 0 T\n", "rs4030303 1 72434 0 A G\n", "rs4030300 1 72515 0 0 C\n", "rs3855952 1 77689 0 0 A" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_markers = pedfile.get_bim()\n", "all_markers.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterating over all markers\n", "\n", "\n", "#### Additive format\n", "\n", "Cycling through genotypes as `-1`, `0`, `1` and `2` values, where `-1` is unknown, `0` is homozygous (major allele), `1` is heterozygous, and `2` is homozygous (minor allele)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rs10399749\n", "[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0\n", " 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 -1\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 -1 0 0 0\n", " 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 0 0 0 0 0 0\n", " -1 0 -1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 -1 0 -1 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n" ] } ], "source": [ "for marker_id, genotypes in pedfile:\n", " print(marker_id)\n", " print(genotypes)\n", " break" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rs10399749\n", "[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0\n", " 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 -1\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 -1 0 0 0\n", " 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 0 0 0 0 0 0\n", " -1 0 -1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 -1 0 -1 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n" ] } ], "source": [ "for marker_id, genotypes in pedfile.iter_geno():\n", " print(marker_id)\n", " print(genotypes)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Nucleotide format\n", "\n", "Cycling through genotypes as `A`, `C`, `G` and `T` values (where `00` is unknown)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rs10399749\n", "['CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " '00' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' '00' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' '00' 'CC' '00' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC'\n", " 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC' 'CC']\n" ] } ], "source": [ "for marker_id, genotypes in pedfile.iter_acgt_geno():\n", " print(marker_id)\n", " print(genotypes)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterating over selected markers\n", "\n", "\n", "#### Additive format\n", "\n", "Cycling through genotypes as `-1`, `0`, `1` and `2` values, where `-1` is unknown, `0` is homozygous (major allele), `1` is heterozygous, and `2` is homozygous (minor allele)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rs7092431\n", "[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", "\n", "rs9943770\n", "[ 0 0 0 2 0 2 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0\n", " 1 0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1\n", " 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0\n", " 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0\n", " 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 1 -1 0 0 0 1 2 1 1 0 1 1 2 0 0 0 1 0 0 0 0 1 1 0 2\n", " 1 1 1 0 1 1 -1 1 0 0 1 0 0 2 0 1 2 0 1 1 1 2 0 1 0\n", " 0 0 0 0 0 0 2 1 2 1 1 2 1 1 1 1 0 1 1 2 1 0 -1 0 1\n", " 1 1 0 1 0 1 -1 2 1 0 0 0 1 1 1 2 0 0 1 1]\n", "\n", "rs1587483\n", "[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 1 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", "\n" ] } ], "source": [ "markers = [\"rs7092431\", \"rs9943770\", \"rs1587483\"]\n", "for marker_id, genotypes in pedfile.iter_geno_marker(markers):\n", " print(marker_id)\n", " print(genotypes, end=\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Nucleotide format\n", "\n", "Cycling through genotypes as `A`, `C`, `G` and `T` values (where `00` is unknown)." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rs7092431\n", "['GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' '00'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG']\n", "\n", "rs9943770\n", "['AA' 'AA' 'AA' 'GG' 'AA' 'GG' 'AA' 'GA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'GA' 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'GA' 'AA' 'AA' 'GA' 'GA' 'GA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA'\n", " 'AA' 'GA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GA' '00' 'AA' 'AA' 'AA'\n", " 'GA' 'GG' 'GA' 'GA' 'AA' 'GA' 'GA' 'GG' 'AA' 'AA' 'AA' 'GA' 'AA' 'AA' 'AA'\n", " 'AA' 'GA' 'GA' 'AA' 'GG' 'GA' 'GA' 'GA' 'AA' 'GA' 'GA' '00' 'GA' 'AA' 'AA'\n", " 'GA' 'AA' 'AA' 'GG' 'AA' 'GA' 'GG' 'AA' 'GA' 'GA' 'GA' 'GG' 'AA' 'GA' 'AA'\n", " 'AA' 'AA' 'AA' 'AA' 'AA' 'AA' 'GG' 'GA' 'GG' 'GA' 'GA' 'GG' 'GA' 'GA' 'GA'\n", " 'GA' 'AA' 'GA' 'GA' 'GG' 'GA' 'AA' '00' 'AA' 'GA' 'GA' 'GA' 'AA' 'GA' 'AA'\n", " 'GA' '00' 'GG' 'GA' 'AA' 'AA' 'AA' 'GA' 'GA' 'GA' 'GG' 'AA' 'AA' 'GA' 'GA']\n", "\n", "rs1587483\n", "['GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' '00' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' '00' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'CG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' '00' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG'\n", " 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG' 'GG']\n", "\n" ] } ], "source": [ "markers = [\"rs7092431\", \"rs9943770\", \"rs1587483\"]\n", "for marker_id, genotypes in pedfile.iter_acgt_geno_marker(markers):\n", " print(marker_id)\n", " print(genotypes, end=\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting a single marker\n", "\n", "\n", "#### Additive format\n", "\n", "Cycling through genotypes as `-1`, `0`, `1` and `2` values, where `-1` is unknown, `0` is homozygous (major allele), `1` is heterozygous, and `2` is homozygous (minor allele)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 0, 0, 0, 0, -1, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0,\n", " 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", " -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], dtype=int8)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pedfile.get_geno_marker(\"rs7619974\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Nucleotide format\n", "\n", "Cycling through genotypes as `A`, `C`, `G` and `T` values (where `00` is unknown)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array(['CC', 'CC', 'CC', 'CC', 'CC', '00', 'CC', 'CC', '00', '00', 'CC',\n", " 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', 'CC', '00', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " '00', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', '00', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC',\n", " 'CC', 'CC', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',\n", " '00', '00', '00', '00', '00', '00'], \n", " dtype=' Note that `PyPlink` only writes the `BED` file. The user is required to create the `FAM` and `BIM` files." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# The genotypes for 3 markers and 10 samples\n", "all_genotypes = [\n", " [0, 0, 0, 1, 0, 0, -1, 2, 1, 0],\n", " [0, 0, 1, 1, 0, 0, 0, 1, 2, 0],\n", " [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],\n", "]\n", "\n", "# Writing the BED file using PyPlink\n", "with PyPlink(\"test_output\", \"w\") as pedfile:\n", " for genotypes in all_genotypes:\n", " pedfile.write_genotypes(genotypes)\n", "\n", "# Writing a dummy FAM file\n", "with open(\"test_output.fam\", \"w\") as fam_file:\n", " for i in range(10):\n", " print(\"family_{}\".format(i+1), \"sample_{}\".format(i+1), \"0\", \"0\", \"0\", \"-9\",\n", " sep=\" \", file=fam_file)\n", "\n", "# Writing a dummy BIM file\n", "with open(\"test_output.bim\", \"w\") as bim_file:\n", " for i in range(3):\n", " print(\"1\", \"marker_{}\".format(i+1), \"0\", i+1, \"A\", \"T\",\n", " sep=\"\\t\", file=bim_file)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Checking the content of the newly created binary files\n", "pedfile = PyPlink(\"test_output\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fidiidfathermothergenderstatus
0family_1sample_1000-9
1family_2sample_2000-9
2family_3sample_3000-9
3family_4sample_4000-9
4family_5sample_5000-9
5family_6sample_6000-9
6family_7sample_7000-9
7family_8sample_8000-9
8family_9sample_9000-9
9family_10sample_10000-9
\n", "
" ], "text/plain": [ " fid iid father mother gender status\n", "0 family_1 sample_1 0 0 0 -9\n", "1 family_2 sample_2 0 0 0 -9\n", "2 family_3 sample_3 0 0 0 -9\n", "3 family_4 sample_4 0 0 0 -9\n", "4 family_5 sample_5 0 0 0 -9\n", "5 family_6 sample_6 0 0 0 -9\n", "6 family_7 sample_7 0 0 0 -9\n", "7 family_8 sample_8 0 0 0 -9\n", "8 family_9 sample_9 0 0 0 -9\n", "9 family_10 sample_10 0 0 0 -9" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pedfile.get_fam()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chromposcma1a2
snp
marker_1110AT
marker_2120AT
marker_3130AT
\n", "
" ], "text/plain": [ " chrom pos cm a1 a2\n", "snp \n", "marker_1 1 1 0 A T\n", "marker_2 1 2 0 A T\n", "marker_3 1 3 0 A T" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pedfile.get_bim()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "marker_1 [ 0 0 0 1 0 0 -1 2 1 0]\n", "marker_2 [0 0 1 1 0 0 0 1 2 0]\n", "marker_3 [0 0 0 0 1 1 0 0 0 1]\n" ] } ], "source": [ "for marker, genotypes in pedfile:\n", " print(marker, genotypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The newly created binary files are compatible with Plink." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "@----------------------------------------------------------@\n", "| PLINK! | v1.07 | 10/Aug/2009 |\n", "|----------------------------------------------------------|\n", "| (C) 2009 Shaun Purcell, GNU General Public License, v2 |\n", "|----------------------------------------------------------|\n", "| For documentation, citation & bug-report instructions: |\n", "| http://pngu.mgh.harvard.edu/purcell/plink/ |\n", "@----------------------------------------------------------@\n", "\n", "Skipping web check... [ --noweb ] \n", "Writing this text to log file [ plink.log ]\n", "Analysis started: Fri Jan 20 09:20:55 2017\n", "\n", "Options in effect:\n", "\t--noweb\n", "\t--bfile test_output\n", "\t--freq\n", "\n", "Reading map (extended format) from [ test_output.bim ] \n", "3 markers to be included from [ test_output.bim ]\n", "Reading pedigree information from [ test_output.fam ] \n", "10 individuals read from [ test_output.fam ] \n", "0 individuals with nonmissing phenotypes\n", "Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)\n", "Missing phenotype value is also -9\n", "0 cases, 0 controls and 10 missing\n", "0 males, 0 females, and 10 of unspecified sex\n", "Warning, found 10 individuals with ambiguous sex codes\n", "These individuals will be set to missing ( or use --allow-no-sex )\n", "Writing list of these individuals to [ plink.nosex ]\n", "Reading genotype bitfile from [ test_output.bed ] \n", "Detected that binary PED file is v1.00 SNP-major mode\n", "Before frequency and genotyping pruning, there are 3 SNPs\n", "10 founders and 0 non-founders found\n", "Writing allele frequencies (founders-only) to [ plink.frq ] \n", "\n", "Analysis finished: Fri Jan 20 09:20:55 2017\n", "\n" ] } ], "source": [ "from subprocess import Popen, PIPE\n", "\n", "# Computing frequencies\n", "proc = Popen([\"plink\", \"--noweb\", \"--bfile\", \"test_output\", \"--freq\"],\n", " stdout=PIPE, stderr=PIPE)\n", "outs, errs = proc.communicate()\n", "print(outs.decode(), end=\"\")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " CHR SNP A1 A2 MAF NCHROBS\n", " 1 marker_1 A T 0.2222 18\n", " 1 marker_2 A T 0.25 20\n", " 1 marker_3 A T 0.15 20\n" ] } ], "source": [ "with open(\"plink.frq\", \"r\") as i_file:\n", " print(i_file.read(), end=\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### *INDIVIDUAL-major* format\n", "\n", "The following examples shows how to write a binary file using the `PyPlink` module. The *INDIVIDUAL-major* format means that the binary file is written one sample at a time.\n", "\n", "**Files in *INDIVIDUAL-major* format is not readable by `PyPlink`.** You need to convert it using *Plink*.\n", "\n", "> Note that `PyPlink` only writes the `BED` file. The user is required to create the `FAM` and `BIM` files." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# The genotypes for 3 markers and 10 samples (INDIVIDUAL-major)\n", "all_genotypes = [\n", " [ 0, 0, 0],\n", " [ 0, 0, 0],\n", " [ 0, 1, 0],\n", " [ 1, 1, 0],\n", " [ 0, 0, 1],\n", " [ 0, 0, 1],\n", " [-1, 0, 0],\n", " [ 2, 1, 0],\n", " [ 1, 2, 0],\n", " [ 0, 0, 1],\n", "]\n", "\n", "# Writing the BED file using PyPlink\n", "with PyPlink(\"test_output_2\", \"w\", bed_format=\"INDIVIDUAL-major\") as pedfile:\n", " for genotypes in all_genotypes:\n", " pedfile.write_genotypes(genotypes)\n", "\n", "# Writing a dummy FAM file\n", "with open(\"test_output_2.fam\", \"w\") as fam_file:\n", " for i in range(10):\n", " print(\"family_{}\".format(i+1), \"sample_{}\".format(i+1), \"0\", \"0\", \"0\", \"-9\",\n", " sep=\" \", file=fam_file)\n", "\n", "# Writing a dummy BIM file\n", "with open(\"test_output_2.bim\", \"w\") as bim_file:\n", " for i in range(3):\n", " print(\"1\", \"marker_{}\".format(i+1), \"0\", i+1, \"A\", \"T\",\n", " sep=\"\\t\", file=bim_file)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "@----------------------------------------------------------@\n", "| PLINK! | v1.07 | 10/Aug/2009 |\n", "|----------------------------------------------------------|\n", "| (C) 2009 Shaun Purcell, GNU General Public License, v2 |\n", "|----------------------------------------------------------|\n", "| For documentation, citation & bug-report instructions: |\n", "| http://pngu.mgh.harvard.edu/purcell/plink/ |\n", "@----------------------------------------------------------@\n", "\n", "Skipping web check... [ --noweb ] \n", "Writing this text to log file [ plink_2.log ]\n", "Analysis started: Fri Jan 20 09:21:10 2017\n", "\n", "Options in effect:\n", "\t--noweb\n", "\t--bfile test_output_2\n", "\t--freq\n", "\t--out plink_2\n", "\n", "Reading map (extended format) from [ test_output_2.bim ] \n", "3 markers to be included from [ test_output_2.bim ]\n", "Reading pedigree information from [ test_output_2.fam ] \n", "10 individuals read from [ test_output_2.fam ] \n", "0 individuals with nonmissing phenotypes\n", "Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)\n", "Missing phenotype value is also -9\n", "0 cases, 0 controls and 10 missing\n", "0 males, 0 females, and 10 of unspecified sex\n", "Warning, found 10 individuals with ambiguous sex codes\n", "These individuals will be set to missing ( or use --allow-no-sex )\n", "Writing list of these individuals to [ plink_2.nosex ]\n", "Reading genotype bitfile from [ test_output_2.bed ] \n", "Detected that binary PED file is v1.00 individual-major mode\n", "Before frequency and genotyping pruning, there are 3 SNPs\n", "Converting data to SNP-major format\n", "10 founders and 0 non-founders found\n", "Writing allele frequencies (founders-only) to [ plink_2.frq ] \n", "\n", "Analysis finished: Fri Jan 20 09:21:10 2017\n", "\n" ] } ], "source": [ "from subprocess import Popen, PIPE\n", "\n", "# Computing frequencies\n", "proc = Popen([\"plink\", \"--noweb\", \"--bfile\", \"test_output_2\", \"--freq\", \"--out\", \"plink_2\"],\n", " stdout=PIPE, stderr=PIPE)\n", "outs, errs = proc.communicate()\n", "print(outs.decode(), end=\"\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " CHR SNP A1 A2 MAF NCHROBS\n", " 1 marker_1 A T 0.2222 18\n", " 1 marker_2 A T 0.25 20\n", " 1 marker_3 A T 0.15 20\n" ] } ], "source": [ "with open(\"plink_2.frq\", \"r\") as i_file:\n", " print(i_file.read(), end=\"\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }