######################################################################

README for GIPS Test Project

updated: 15 July, 2014

######################################################################


=====================================
README file
=====================================
This file provides information specific to the GIPS Test project. It helps users
check whether GIPS is set up correctly and understand the GIPS workflow.



=====================================
Directory contents
=====================================
This test directory includes several sub-directories and data files. 
.
├── Data
│   ├── sample1
│   │   ├── sample1.sam
│   │   └── sample1.vcf
│   ├── sample2
│   │   ├── sample2.sam
│   │   └── sample2.vcf
│   └── sample3
│       ├── sample3.sam
│       └── sample3.vcf
├── Ref
│   ├── Test.fa
│   ├── Test.fa.fai
│   └── Test.gff3
├── Script
│   └── caller_script
├── Working
│   ├── Archive
│   └── temporary
├── README
└── PROJECT.ini

9 directories, 12 files

It is recommended to store alignment results (i.e. SAM files) 
in the “Data” directory; store reference data, such as genome annotation or 
sequence, in the “Ref” directory; and store user scripts in the “Script” directory.
In this test, sequence alignment/map results (SAM files) were simulated by ART[1,2].

The “Working” directory is intended for storing intermediate results such as 
simulated sequencing read mappings and called variants, as well as archived GIPS results.
On each GIPS run, any existing GIPS result is automatically moved into “Working/Archive”.
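
The layout above can be checked before the first run. The following is a minimal
sketch, not part of GIPS: the function name check_project is hypothetical, and the
list of entries is taken from the directory tree shown in this section.

```shell
# check_project DIR
# Report every expected directory or file that is missing under DIR.
# Returns 0 if the layout is complete, 1 otherwise.
check_project() {
    dir=$1
    status=0
    # Directories from the tree in this README.
    for d in Data Ref Script Working/Archive Working/temporary; do
        [ -d "$dir/$d" ] || { echo "missing directory: $dir/$d" >&2; status=1; }
    done
    # Project configuration and reference files from the tree in this README.
    for f in PROJECT.ini Ref/Test.fa Ref/Test.fa.fai Ref/Test.gff3; do
        [ -f "$dir/$f" ] || { echo "missing file: $dir/$f" >&2; status=1; }
    done
    return "$status"
}
```

For example, “check_project /path/to/Test” prints one line per missing entry and
returns a non-zero status if anything is missing.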


====================================================
Essential setup before running GIPS 
====================================================
Most settings are already configured in PROJECT.ini. However, users must still
set the following parameters:
SNPEFF
CANDIDATE_CRITERIA


About CANDIDATE_CRITERIA
------------------------
The minimum number of phenotype-exhibiting samples in which a candidate gene is expected
to harbor variants. Defaults to the total number of phenotype-exhibiting samples,
in which case GIPS reports only genes that harbor variants in all of those samples.


About SNPEFF
---------------
GIPS does not use SnpEff to filter variants. Instead, GIPS filters variants
based on the SNP annotations that SnpEff creates.
GIPS invokes SnpEff itself at run time.

If SnpEff is already installed, set the SnpEff folder path in PROJECT.ini
so that GIPS knows where the SnpEff components can be found.

If not, SnpEff is available at http://snpeff.sourceforge.net/index.html [3].
Users can download SnpEff with wget, for example:

     cd /path/to/Test
     mkdir Software
     cd Software
     wget -c http://iweb.dl.sourceforge.net/project/snpeff/snpEff_v3_4_core.zip
     unzip snpEff_v3_4_core.zip

After installing SnpEff, set the SNPEFF parameter in PROJECT.ini to /path/to/snpeff
(not /path/to/snpeff/snpeff.jar).
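
A quick way to confirm that the value is a folder and not the jar itself is sketched
below. This is not part of GIPS: the function name check_snpeff_dir is hypothetical,
and the sketch assumes the jar inside the folder is named snpEff.jar, as in the
SnpEff distribution. Adjust the name if your installation differs.

```shell
# check_snpeff_dir PATH
# Returns 0 if PATH is a directory that contains snpEff.jar (assumed file name),
# 1 otherwise. PATH is the value intended for the SNPEFF parameter.
check_snpeff_dir() {
    if [ ! -d "$1" ]; then
        echo "not a directory: $1 (SNPEFF must be the folder, not the jar)" >&2
        return 1
    fi
    if [ ! -f "$1/snpEff.jar" ]; then
        echo "snpEff.jar not found in: $1" >&2
        return 1
    fi
    return 0
}
```

For example, “check_snpeff_dir /path/to/snpeff” returns 0 only when the folder
exists and holds the jar.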


=====================================
About caller script
=====================================
A variant caller script that uses samtools is provided, “Script/samtools_Q13q30_script”.
Users must make sure that the samtools components (samtools, bcftools and vcfutils.pl)
are in the executable search path.
Otherwise, users need to modify the script so that the samtools components
can be invoked correctly.
In addition, make sure the script itself is executable, e.g.
“chmod 755 /path/to/Test/Script/samtools_Q13q30_script”.
Users who wish to modify the script are advised to make a copy of the
original and modify the copy.
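
Whether the three components are in the search path can be checked with the shell
built-in “command -v”. The following is a minimal sketch; the function name
missing_tools is hypothetical.

```shell
# missing_tools NAME...
# Print the name of every given tool that cannot be found in the search path,
# one per line. Prints nothing if all tools are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call:
#   missing_tools samtools bcftools vcfutils.pl
```

An empty result means all three components can be invoked by name.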

Important: 
--------------------------
GIPS will invoke the variant caller script to call variants from simulated sequencing 
read mappings, for estimation of variant calling sensitivity. For this purpose, 

  1, The script must be executable.
  2, The script must take two command line parameters.  
     "$1" is a sequence alignment file, input, in SAM format.  
     "$2" is a variant file, output, in VCF format.
  3, GIPS has a 'temporary' directory in the “Working” folder, 
     which stores intermediate files (e.g. BAM files, BCF files). If the script
     produces intermediate results, it is recommended to dump these intermediate
     results into “Working/temporary”.
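
The three requirements above can be illustrated with a skeleton caller script. This
is only a sketch under stated assumptions, not the script shipped in “Script”:
the environment variables REF and TMP_DIR and their defaults are hypothetical, the
thresholds -Q13 -q30 are inferred from the script name, and the commands use the
legacy samtools 0.1.x syntax (newer releases use “samtools sort -o” and
“bcftools call” instead).

```shell
#!/bin/sh
# call_variants IN.sam OUT.vcf
#   $1: sequence alignment file, input, SAM format
#   $2: variant file, output, VCF format
call_variants() {
    if [ "$#" -ne 2 ]; then
        echo "usage: call_variants IN.sam OUT.vcf" >&2
        return 2
    fi
    sam=$1
    vcf=$2
    if [ ! -r "$sam" ]; then
        echo "cannot read input: $sam" >&2
        return 1
    fi
    # Hypothetical settings: reference sequence and intermediate-file directory.
    ref=${REF:-Ref/Test.fa}
    tmp=${TMP_DIR:-Working/temporary}
    base=$tmp/$(basename "$sam" .sam)

    # SAM -> BAM -> sorted BAM; intermediates go to the temporary directory.
    samtools view -bS "$sam" > "$base.bam" &&
    samtools sort "$base.bam" "$base.sorted" &&
    # Call variants (assumed thresholds: base quality 13, mapping quality 30).
    samtools mpileup -uf "$ref" -Q13 -q30 "$base.sorted.bam" |
        bcftools view -vcg - > "$vcf"
}

# In a stand-alone script, the last line would be:
#   call_variants "$@"
```

Keeping the argument checks at the top lets the script fail early with a clear
message when it is invoked with anything other than the two expected parameters.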




=====================================
Run GIPS with Test data
=====================================
#   estimate the variant calling sensitivity (VCS) for each sample
 java -jar GIPS.jar -T vcs -p Test
#   run a full GIPS workflow
 java -jar GIPS.jar -T gips -p Test
#   show help
 java -jar GIPS.jar -h



=========
Reference
=========
[1] Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., 
    Abecasis G., Durbin R. and 1000 Genome Project Data Processing 
    Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. 
    Bioinformatics, 25, 2078-9.
[2] Huang W., Li L., Myers J.R. and Marth G.T. (2012) ART: a next-generation 
    sequencing read simulator. Bioinformatics, 28(4), 593-594.
[3] Cingolani P., Platts A., Wang le L., Coon M., Nguyen T., Wang L., Land S.J., 
    Lu X. and Ruden D.M. (2012) A program for annotating and predicting the 
    effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of 
    Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin), 6(2), 80-92. 
    PMID: 22728672
[4] Landrum M.J., et al. (2013) ClinVar: public archive of relationships 
    among sequence variation and human phenotype. Nucleic Acids 
    Research, gkt1113.



=====
Notes
=====
For more information, see:
  https://code.google.com/p/gips/
If you have questions, please contact:
  sxzhuxu(at)126.com or xinchen(at)zju.edu.cn



