{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading files into LIMIX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, LIMIX uses the hdf5 file format (http://www.hdfgroup.org) to handle genotype and phenotype data. This file format is flexible and supported by a number of data analysis tools in R (e.g. [rhdf5](http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)), Python (e.g. [h5py](http://www.h5py.org) or [pandas](http://pandas.pydata.org)) and Perl (e.g. [perl hdf5](http://search.cpan.org/~chm/PDL-IO-HDF5-0.6501/hdf5.pd)).\n", "\n", "There is also a growing list of bioinformatics tools and pipelines that build on hdf5:\n", "* [h5vc](http://www.bioconductor.org/packages/release/bioc/html/h5vc.html)\n", "* [biohdf](http://www.hdfgroup.org/projects/biohdf/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Limix file converter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Limix offers a simple conversion tool, limix_converter, which converts [plink](http://pngu.mgh.harvard.edu/~purcell/plink/) binary files (.bed), CSV files and 0,1,2 genotype files into hdf5. The 0,1,2 files can be generated using [VCFtools](http://vcftools.sourceforge.net).\n", "\n", "## Importing of genotype data\n", "\n", "### Reading a plink file:\n", "> limix_converter --outfile=./my_file.hdf5 --plink=./my_file\n", "\n", "Note that the .bed ending is omitted. If the file my_file.hdf5 already exists, its genotype group (but not the phenotypes) is deleted. An example plink file is included in the tutorial folder in \"data/importer/genotype.(bed/bfam/bim)\".\n", "\n", "### Reading a VCF file:\n", "VCF files must first be converted into a G012 file.
This can be achieved via [vcftools](http://vcftools.sourceforge.net):\n", "> vcftools --vcf INFILE --012 --out OUTFILE\n", "\n", "If the vcf file is .gz compressed, you need to call\n", "\n", "> vcftools --gzvcf INFILE --012 --out OUTFILE\n", "\n", "Subsequently, the file can be imported into a LIMIX hdf5 file using:\n", "\n", "> limix_converter --outfile=./my_file.hdf5 --g012=./OUTFILE\n", "\n", "Note again that the file endings are omitted. VCFtools writes several files during the export (OUTFILE.012, OUTFILE.012.indv and OUTFILE.012.pos), and both limix_converter and vcftools assume that any file ending is omitted.\n", "An example vcf file is included in the tutorial folder in \"data/importer/vcf_sample.vcf.gz\".\n", "\n", "\n", "## Importing of phenotype data\n", "\n", "### Reading a phenotype CSV file:\n", "> limix_converter --outfile=./my_file.hdf5 --csv=./phenotype_sample.csv\n", "\n", "Note that the phenotype file is expected to be in the format [samples (rows) x phenotypes (columns)], including column headers (phenotype IDs) and row headers (sample IDs).\n", "An example CSV file is included in the tutorial folder in \"data/importer/phenotype.csv\".\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }