# hicstuff command line interface demo

## Preparing the data

If using bowtie2, the genome must first be indexed with `bowtie2-build`:

```bash
bowtie2-build genome.fa genome
```
The input reads can be in fastq format, or in name-sorted BAM format if already aligned to the genome.
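
Before launching the pipeline, it can help to confirm that the index is complete. The function below is a minimal sketch and not part of hicstuff: `check_bt2_index` is a hypothetical helper name. It relies on `bowtie2-build` writing six files per index: `<prefix>.1.bt2` to `<prefix>.4.bt2`, plus `<prefix>.rev.1.bt2` and `<prefix>.rev.2.bt2` (with a `.bt2l` extension for large genomes).

```bash
# Hypothetical helper (not part of hicstuff): succeed only if all six
# bowtie2 index files exist for the given prefix, in either the small
# (.bt2) or the large (.bt2l) index format.
check_bt2_index() {
    local prefix="$1" ext suffix found
    for ext in bt2 bt2l; do
        found=1
        for suffix in 1 2 3 4 rev.1 rev.2; do
            # Stop checking this extension as soon as one file is missing
            [ -f "${prefix}.${suffix}.${ext}" ] || { found=0; break; }
        done
        if [ "$found" -eq 1 ]; then
            return 0
        fi
    done
    echo "No complete bowtie2 index found for prefix '${prefix}'" >&2
    return 1
}
```

With the index built above, `check_bt2_index genome` should return 0.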


## Generating matrices

The pipeline command can be used to generate the Hi-C contact map from the input reads.

```bash
hicstuff pipeline --no-cleanup \
 --enzyme DpnII \
 --distance-law \
 --filter \
 --threads 12 \
 --plot \
 --iterative \
 --genome genome \
 --output output/ \
 --prefix demo \
 forward.fq \
 reverse.fq
```
For instance, the command above will create a directory named "output", containing the output files with the prefix "demo". The output directory will contain two subdirectories: "tmp", containing all temporary files, and "plots", containing figures generated at different stages of the pipeline. With `--iterative`, reads will be truncated to 20bp and aligned to the genome by iterative extension. The process is parallelized on 12 threads. With `--filter`, Hi-C pairs will also be filtered to exclude uninformative religation events.

## Output files
The output files should look like this:
```
output
├── demo.chr.tsv
├── demo.frags.tsv
├── demo.hicstuff_20190423185220.log
├── demo.mat.tsv
├── demo.distance_law.txt
├── plots
│   ├── event_distance.pdf
│   ├── event_distribution.pdf
│   └── frags_hist.pdf
└── tmp
    ├── demo.for.bam
    ├── demo.genome.fasta
    ├── demo.rev.bam
    ├── demo.valid_idx_filtered.pairs
    ├── demo.valid_idx.pairs
    └── demo.valid.pairs
```

There are 3 main output files in the base `output` directory: the contact matrix (`demo.mat.tsv`), the info_contigs file (`demo.chr.tsv`) and the fragments_list (`demo.frags.tsv`). If the `--distance-law` argument is enabled, there is one more file: the raw distance law table (`demo.distance_law.txt`). The `tmp` directory contains the fasta genome extracted from the bowtie2 index, the alignments in BAM format and all temporary files in .pairs format.
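
A quick way to confirm that a run produced the three main files is sketched below. `check_hicstuff_outputs` is a hypothetical helper, not a hicstuff command. It only tests that each file exists and is not empty, and does not validate the contents.

```bash
# Hypothetical helper (not part of hicstuff): check that the three main
# pipeline outputs exist and are non-empty.
# Usage: check_hicstuff_outputs <output_dir> <prefix>
check_hicstuff_outputs() {
    local outdir="$1" prefix="$2" missing=0 f
    for f in "${prefix}.mat.tsv" "${prefix}.chr.tsv" "${prefix}.frags.tsv"; do
        if [ ! -s "${outdir}/${f}" ]; then
            echo "missing or empty: ${outdir}/${f}" >&2
            missing=$((missing + 1))
        fi
    done
    # Return status is 0 only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

For the run shown above, this would be called as `check_hicstuff_outputs output demo`.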


## Generating the distance law

The distance law is the probability of contact between two fragments as a function of the genomic distance separating them. There are two ways to compute it with hicstuff. The first is to run the full pipeline with the `--distance-law` option, as done above. The `--centromeres` option can be added to compute the distance law separately on each chromosome arm. This produces a raw distance law table, without any processing of the data, which can then be processed with the `distancelaw` command.

The second way is to use the command distancelaw with the pairs file as input:

```bash
hicstuff distancelaw --average \
 --big-arm-only \
 --centromeres centromeres.txt \
 --frags output/demo.frags.tsv \
 --inf 3000 \
 --outputfile-img output/demo_distance_law.svg \
 --labels labels.txt \
 --sup 500000 \
 --pairs output/tmp/demo.valid_idx_filtered.pairs
```

For instance, this will create an image of the distance law computed from the input pairs file. The distance law will be the average of the distance laws of all arms bigger than 500kb. The log-spaced bins used to plot it have a base of 1.1 by default. The x axis will range from 3kb to 500kb.
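
To illustrate what log-spaced binning with base 1.1 means, here is a rough sketch that counts intrachromosomal pairs per distance bin with `awk`. It is not hicstuff's implementation and does not reproduce its table format or any normalization. `raw_distance_counts` is a hypothetical name, and the sketch assumes the pairs file has `#`-prefixed header lines followed by whitespace-separated columns starting with readID, chr1, pos1, chr2, pos2.

```bash
# Hypothetical sketch (not hicstuff's algorithm): count pairs on the same
# chromosome in log-spaced distance bins.
# Usage: raw_distance_counts <pairs_file> [base]
# Output columns: bin index, approximate lower bound of the bin (bp), count
raw_distance_counts() {
    local pairs="$1" base="${2:-1.1}"
    awk -v base="$base" '
        /^#/ { next }                       # skip header lines
        $2 == $4 {                          # keep pairs on the same chromosome
            d = $5 - $3
            if (d < 0) d = -d               # absolute distance
            if (d == 0) next                # log(0) is undefined
            bin = int(log(d) / log(base))   # largest k with base^k <= d
            counts[bin]++
        }
        END {
            for (bin in counts)
                printf "%d\t%d\t%d\n", bin, int(base ^ bin), counts[bin]
        }
    ' "$pairs" | sort -n
}
```

With the demo output, `raw_distance_counts output/tmp/demo.valid_idx_filtered.pairs` would print one line per non-empty bin. Dividing each count by the bin width and by the total number of pairs would be one way to turn these counts into a contact probability, but that step is left out here.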

## Visualizing the matrix

The `view` command can be used to visualize the output Hi-C matrix.

```bash
hicstuff view --binning 5kb --normalize --frags output/demo.frags.tsv output/demo.mat.tsv
```

This will show an interactive heatmap using matplotlib. To save the heatmap to a file instead, add `--output output/demo.png`.

Note that `hicstuff view` has many other options for processing the matrix to improve the signal.

## Converting output

The default output files of `hicstuff pipeline` can be converted into a cool file or a bedgraph2d file using the command `hicstuff convert`. For example, to generate the file `cool_output/demo.cool`:

```bash
hicstuff convert --frags output/demo.frags.tsv \
 --chroms output/demo.chr.tsv \
 --out cool_output \
 --prefix demo \
 --from GRAAL \
 --to cool output/demo.mat.tsv
```

## Rebinning existing files
Files previously produced by `hicstuff pipeline` can be rebinned at a lower resolution using the `hicstuff rebin` command.
This will generate a new matrix, a new fragments_list file and a new info_contigs file, all with an updated number of bins:

```bash
hicstuff rebin -f output/demo.frags.tsv \
 -c output/demo.chr.tsv \
 --out rebin_1kb \
 --binning 1kb output/demo.mat.tsv
```
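
When several resolutions are needed, the same command can be wrapped in a loop. The sketch below is only an illustration: `rebin_at`, the `DRY_RUN` variable, the `rebin_<resolution>` directory names and the list of resolutions are all assumptions, while the `hicstuff rebin` options are the ones used above. By default it only prints the commands, and it runs them when `DRY_RUN=0` is set.

```bash
# Hypothetical wrapper: rebin the demo output at one resolution.
# Prints the command unless DRY_RUN=0 is set in the environment.
rebin_at() {
    res="$1"
    # Build the command as positional parameters so it can be printed or run
    set -- hicstuff rebin -f output/demo.frags.tsv -c output/demo.chr.tsv \
        --out "rebin_${res}" --binning "$res" output/demo.mat.tsv
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example resolutions, chosen for illustration only
for res in 1kb 5kb 10kb; do
    rebin_at "$res"
done
```

Each resolution gets its own output directory, so the results of one run do not overwrite another.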