{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading files into LIMIX"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Internally, LIMIX uses the hdf5 file format (http://www.hdfgroup.org) to handle genotype and phenotype data. This file format is flexible and supported by a number of data analysis tools, including R (e.g. [rhdf5](http://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)) and python (e.g. [h5py](http://www.h5py.org), [pandas](http://pandas.pydata.org) or [perl hdf5](http://search.cpan.org/~chm/PDL-IO-HDF5-0.6501/hdf5.pd)).\n",
    "\n",
    "There is also a growing list of Bioinformatics tools and pipelines that build on hdf5:\n",
    "* [h5vc](http://www.bioconductor.org/packages/release/bioc/html/h5vc.html)\n",
    "* [biohdf](http://www.hdfgroup.org/projects/biohdf/)"
   ]
  },
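  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the format is generic, an HDF5 file can be inspected with any of these tools. The following is a minimal sketch using h5py and NumPy (both assumed to be installed): it writes a tiny file and lists its contents. The group and dataset names (`genotype`, `matrix`) and the values are hypothetical and do not describe the exact layout that LIMIX writes.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import h5py\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical file name and layout, for illustration only.\n",
    "path = os.path.join(tempfile.mkdtemp(), 'example.hdf5')\n",
    "\n",
    "# Write a small [samples x variants] matrix into a 'genotype' group.\n",
    "with h5py.File(path, 'w') as f:\n",
    "    geno = f.create_group('genotype')\n",
    "    geno.create_dataset('matrix', data=np.array([[0, 1, 2], [2, 0, 1]], dtype='uint8'))\n",
    "\n",
    "# Reopen read-only, collect the name of every group and dataset,\n",
    "# and load the matrix back into memory.\n",
    "with h5py.File(path, 'r') as f:\n",
    "    names = []\n",
    "    f.visit(names.append)\n",
    "    matrix = f['genotype/matrix'][:]\n",
    "\n",
    "print(names)\n",
    "print(matrix.shape)\n",
    "```\n",
    "\n",
    "The same `visit` call can be pointed at any existing HDF5 file to see which groups and datasets it actually contains."
   ]
  },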
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##Limix file converter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Limix offers a simple conversion tool, which can be used to convert [plink]((http://pngu.mgh.harvard.edu/~purcell/plink/)binary files (.bed), csv files and 0,1,2 files, which can be generated using [VCFtools](http://vcftools.sourceforge.net).\n",
    "\n",
    "## Importing of genotype data\n",
    "\n",
    "### Reading a plink file:\n",
    ">  limix_converter --outfile=./my_file.hdf5 --plink=./my_file \n",
    "\n",
    "Note, the .bed ending is ommited. If the file my_file.hdf5 already exists, the genoytpe group (not the phenotypes) is deleted. An example plink file is included in the tutorial folder in \"data/importer/genotype.(bed/bfam/bim)\n",
    "\n",
    "### Reading a VCF file:\n",
    "VCF files need first to be converted into a G012 file. This can be achieved via [vcftools](http://vcftools.sourceforge.net):\n",
    ">  vcftools --vcf INFILE --012 --out OUTFILE\n",
    "\n",
    "If the vcf file is .gz compressed, you need to call\n",
    "\n",
    ">  vcftools --vcfgz INFILE --012 --out OUTFILE\n",
    "\n",
    "Subsequently, the file can be imported into a LIMX hdf5 file, using:\n",
    "\n",
    ">  limix_converter --outfile=./my_file.hdf5 --g012=./OUTFILE \n",
    "\n",
    "Note again that the endings are ommited. VCFtools will require several files in the export statement and both limix_converter and vcftools assume that any file ending is ommitted.\n",
    "An example vcf file is included in the tutorial folder in \"data/importer/vcf_sample.vcf.gz. \n",
    "\n",
    "\n",
    "## Importing of phenotype data\n",
    "\n",
    "### Reading a phentoype CSV file:\n",
  limix_converter --outfile=./my_file.hdf5 -csv">
    ">  limix_converter --outfile=./my_file.hdf5 --csv=./phenotype_sample.csv\n",
    "\n",
    "Note, the phenotype file is expected to be in the format [samples (rows) x phenotypes (columns)], including column headers (phenotype IDs) and rowheader (sample IDs). \n",
    "An example CSV file is included in the tutorials folder in \"data/importer/phenotype.csv\".\n"
   ]
  },
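  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The phenotype layout described above can be sketched with Python's standard `csv` module. This is only an illustration under stated assumptions: the helper `write_phenotype_csv`, the sample and phenotype IDs, and the values are hypothetical, and leaving the corner cell above the sample IDs empty is an assumption rather than a documented requirement of the converter.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "\n",
    "def write_phenotype_csv(path, phenotype_ids, rows):\n",
    "    # Write a [samples x phenotypes] table with column and row headers.\n",
    "    with open(path, 'w') as handle:\n",
    "        writer = csv.writer(handle, lineterminator='\\n')\n",
    "        # Assumption: the corner cell above the sample IDs is left empty.\n",
    "        writer.writerow([''] + list(phenotype_ids))\n",
    "        for sample_id, values in rows:\n",
    "            # Every sample needs exactly one value per phenotype column.\n",
    "            if len(values) != len(phenotype_ids):\n",
    "                raise ValueError('wrong number of values for ' + sample_id)\n",
    "            writer.writerow([sample_id] + list(values))\n",
    "\n",
    "\n",
    "# Hypothetical IDs and values, for illustration only.\n",
    "csv_path = os.path.join(tempfile.mkdtemp(), 'phenotype_example.csv')\n",
    "write_phenotype_csv(csv_path, ['height', 'weight'],\n",
    "                    [('sample_1', [1.72, 68.0]), ('sample_2', [1.65, 59.5])])\n",
    "\n",
    "# Read the file back to check what was written.\n",
    "with open(csv_path) as handle:\n",
    "    csv_text = handle.read()\n",
    "print(csv_text)\n",
    "```\n",
    "\n",
    "Comparing such a file against the bundled example phenotype CSV is a quick way to confirm that the headers are in the expected positions before running the converter."
   ]
  },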
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}