ISB-CGC Community Notebooks
Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!


Title: How to use Kallisto to quantify genes in 10X scRNA-seq

Author: David L Gibbs

Created: 2019-08-07

Purpose: Demonstrate how to take 10X FASTQ files and produce the gene quantification matrix

Notes:

In this notebook, we use the 10X Genomics FASTQ files that we generated earlier to quantify gene expression per cell with Kallisto and Bustools.

This notebook is assumed to be running inside Google Cloud. A notebook started there is already authenticated, can read and write to cloud storage (buckets) for free, and gets fast data transfers. To start a notebook, log into your Google Cloud Console, open the main 'hamburger' menu, and find 'AI Platform' near the bottom. Select Notebooks and you'll have an interface to start either an R or a Python notebook.

## Resources:

Bustools paper:
https://www.ncbi.nlm.nih.gov/pubmed/31073610

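Kallisto and Bustools getting started tutorial:
https://www.kallistobus.tools/getting_started_explained.html

BUStools Python notebook (10x hgmm 6k, v2 chemistry):
https://github.com/BUStools/BUS_notebooks_python/blob/master/dataset-notebooks/10x_hgmm_6k_v2chem_python/10x_hgmm_6k_v2chem.ipynb

Kallisto getting started:
https://pachterlab.github.io/kallisto/starting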

In [None]:
cd /home/jupyter/

## Software install

In [None]:
!git clone https://github.com/pachterlab/kallisto.git

In [None]:
cd kallisto/

In [None]:
ls -lha

In [None]:
!sudo apt --yes install autoconf cmake

In [None]:
!mkdir build

In [None]:
cd build

In [None]:
!sudo cmake ..

!sudo make

!sudo make install

In [None]:
!kallisto

In [None]:
cd ../..

In [None]:
!git clone https://github.com/BUStools/bustools.git

In [None]:
cd bustools/

In [None]:
# we need the devel version due to a bug that stopped compilation ...
# (the checkout has to run inside the cloned bustools directory)
!git checkout devel

In [None]:
!git status

In [None]:
!mkdir build

In [None]:
cd build

In [None]:
!sudo cmake ..

!sudo make

!sudo make install

In [None]:
cd ../..

In [None]:
!bustools

## Reference Gathering

In [None]:
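# A `cd` chained after `mkdir` on one line runs in a subshell and does not change
# the notebook's working directory, so the downloads below land in /home/jupyter,
# which is where the later cells expect to find them.
!mkdir -p kallisto_bustools_getting_started/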

In [None]:
!wget ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

In [None]:
!wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz

## Barcode whitelist

In [None]:
# Version 3 chemistry
!wget https://github.com/BUStools/getting_started/releases/download/species_mixing/10xv3_whitelist.txt

In [None]:
# Version 2 chemistry
!wget https://github.com/bustools/getting_started/releases/download/getting_started/10xv2_whitelist.txt

## Gene map utility

In [None]:
!wget https://raw.githubusercontent.com/BUStools/BUS_notebooks_python/master/utils/transcript2gene.py

In [None]:
!gunzip Homo_sapiens.GRCh38.96.gtf.gz

In [None]:
!python transcript2gene.py --use_version < Homo_sapiens.GRCh38.96.gtf > transcripts_to_genes.txt

In [None]:
!head transcripts_to_genes.txt
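To make the mapping step less of a black box, here is a minimal Python sketch of how a transcript-to-gene table can be derived from the `transcript` records of a GTF file. It is not the code of `transcript2gene.py`, and the exact columns that script writes may differ. The helper names (`gtf_line`, `transcripts_to_genes`) and the example IDs are made up for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_line(feature, attributes):
    """Build one tab-separated, 9-column GTF line (illustrative helper)."""
    return "\t".join(["1", "havana", feature, "100", "900", ".", "+", ".", attributes])


# Made-up records, only for demonstration; real Ensembl IDs look different.
EXAMPLE_GTF = [
    "#!genome-build GRCh38",
    gtf_line("gene", 'gene_id "ENSG0001"; gene_version "2"; gene_name "GENEA";'),
    gtf_line("transcript", 'gene_id "ENSG0001"; gene_version "2"; '
             'transcript_id "ENST0001"; transcript_version "1"; gene_name "GENEA";'),
    gtf_line("exon", 'gene_id "ENSG0001"; gene_version "2"; '
             'transcript_id "ENST0001"; transcript_version "1"; gene_name "GENEA";'),
    gtf_line("transcript", 'gene_id "ENSG0002"; gene_version "5"; '
             'transcript_id "ENST0002"; transcript_version "3"; gene_name "GENEB";'),
]


def transcripts_to_genes(gtf_lines, use_version=True):
    """Return (transcript, gene, gene_name) tuples, one per transcript record."""
    rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript records carry the mapping we want
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid, gid = attrs["transcript_id"], attrs["gene_id"]
        if use_version:
            # Append the version suffix when the GTF provides one.
            if "transcript_version" in attrs:
                tid += "." + attrs["transcript_version"]
            if "gene_version" in attrs:
                gid += "." + attrs["gene_version"]
        rows.append((tid, gid, attrs.get("gene_name", "")))
    return rows


mapping = transcripts_to_genes(EXAMPLE_GTF)
for row in mapping:
    print("\t".join(row))
```

Versioned IDs matter here because the Ensembl cDNA FASTA names transcripts with a version suffix, so the mapping has to use the same form for the later count step to match them.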

## Data

In [None]:
mkdir data

In [None]:
!gsutil -m cp gs://your-bucket/bamtofastq_S1_* data

In [None]:
mkdir output

In [None]:
cd /home/jupyter

In [None]:
ls -lha data

## Indexing

In [None]:
!kallisto index -i Homo_sapiens.GRCh38.cdna.all.idx -k 31 Homo_sapiens.GRCh38.cdna.all.fa.gz

## Kallisto

In [None]:
!kallisto bus -i Homo_sapiens.GRCh38.cdna.all.idx -o output -x 10xv3 -t 8 \
data/bamtofastq_S1_L005_R1_001.fastq.gz data/bamtofastq_S1_L005_R2_001.fastq.gz \
data/bamtofastq_S1_L005_R1_002.fastq.gz data/bamtofastq_S1_L005_R2_002.fastq.gz \
data/bamtofastq_S1_L005_R1_003.fastq.gz data/bamtofastq_S1_L005_R2_003.fastq.gz \
data/bamtofastq_S1_L005_R1_004.fastq.gz data/bamtofastq_S1_L005_R2_004.fastq.gz \
data/bamtofastq_S1_L005_R1_005.fastq.gz data/bamtofastq_S1_L005_R2_005.fastq.gz \
data/bamtofastq_S1_L005_R1_006.fastq.gz data/bamtofastq_S1_L005_R2_006.fastq.gz \
data/bamtofastq_S1_L005_R1_007.fastq.gz data/bamtofastq_S1_L005_R2_007.fastq.gz 
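Typing out every R1/R2 pair by hand is error-prone. The following Python sketch shows one way to build the ordered file list (R1 then R2 for each chunk) from a list of file names. It assumes the `..._R1_001.fastq.gz` naming pattern used above; the function name `paired_fastqs` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <prefix>_R1_<chunk>.fastq.gz / <prefix>_R2_<chunk>.fastq.gz
FASTQ_RE = re.compile(r"^(?P<prefix>.+)_(?P<read>R[12])_(?P<chunk>\d+)\.fastq\.gz$")


def paired_fastqs(filenames):
    """Return file names ordered as R1, R2, R1, R2, ... sorted by prefix and chunk."""
    chunks = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # ignore files that do not follow the pattern
        chunks[(match["prefix"], match["chunk"])][match["read"]] = name

    ordered = []
    for key in sorted(chunks):
        pair = chunks[key]
        if set(pair) != {"R1", "R2"}:
            raise ValueError(f"incomplete read pair for {key}")
        ordered.extend([pair["R1"], pair["R2"]])
    return ordered


example_names = [
    "data/bamtofastq_S1_L005_R2_002.fastq.gz",
    "data/bamtofastq_S1_L005_R1_001.fastq.gz",
    "data/bamtofastq_S1_L005_R2_001.fastq.gz",
    "data/bamtofastq_S1_L005_R1_002.fastq.gz",
]
fastq_args = paired_fastqs(example_names)
print(" ".join(fastq_args))
```

In a notebook, the real list could come from `sorted(glob.glob("data/*.fastq.gz"))`, and the joined string can then be appended to the `kallisto bus` command line.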


## Bustools

In [None]:
cd /home/jupyter/output/

In [None]:
!mkdir genecount
!mkdir tmp
!mkdir eqcount

In [None]:
!bustools correct -w ../10xv3_whitelist.txt -o output.correct.bus output.bus
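The general idea behind whitelist correction can be sketched in a few lines of Python. A barcode that is on the whitelist is kept, and a barcode that is exactly one substitution away from a single whitelist entry is replaced by that entry. This is an illustration of the concept under those assumptions, not the bustools implementation, and the toy 4-nt whitelist is made up.

```python
def correct_barcode(barcode, whitelist):
    """Return the whitelisted barcode for `barcode`, or None if it cannot be rescued.

    Exact matches are kept. Otherwise every single-base substitution is tried;
    the barcode is corrected only when exactly one whitelist entry is reachable.
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, original_base in enumerate(barcode):
        for base in "ACGT":
            if base == original_base:
                continue
            candidate = barcode[:i] + base + barcode[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Ambiguous (more than one neighbour) or hopeless (none) barcodes are dropped.
    return candidates.pop() if len(candidates) == 1 else None


toy_whitelist = {"AAAA", "CCCC", "AATT"}  # made-up barcodes for demonstration
for observed in ["AAAA", "AAAC", "AAAT", "GGGG"]:
    print(observed, "->", correct_barcode(observed, toy_whitelist))
```

In this toy example `AAAT` is rejected because it is one mismatch away from both `AAAA` and `AATT`, so there is no unique correction.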

In [None]:
!bustools sort -t 8 -o output.correct.sort.bus output.correct.bus

In [None]:
!bustools text -o output.correct.sort.txt output.correct.sort.bus
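The text dump is handy for quick sanity checks. As a hedged sketch, the Python below counts distinct UMIs per barcode, assuming each line holds whitespace-separated barcode, UMI, equivalence class, and count columns in that order (check the first few lines of your own file before relying on this). The example records are made up.

```python
from collections import defaultdict

# Made-up records in the assumed column order: barcode, UMI, equivalence class, count
EXAMPLE_BUS_TEXT = """\
AAACCCAAGAAACACT\tAAAAAAAAAAAA\t10\t1
AAACCCAAGAAACACT\tAAAAAAAAAAAA\t11\t2
AAACCCAAGAAACACT\tCCCCCCCCCCCC\t10\t1
AAACCCAAGAAACCAT\tGGGGGGGGGGGG\t42\t1
"""


def umis_per_barcode(lines):
    """Count the distinct UMIs observed for each barcode."""
    umis = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        barcode, umi = fields[0], fields[1]
        umis[barcode].add(umi)
    return {barcode: len(seen) for barcode, seen in umis.items()}


umi_counts = umis_per_barcode(EXAMPLE_BUS_TEXT.splitlines())
for barcode, n_umis in sorted(umi_counts.items()):
    print(barcode, n_umis)
```

For a real run, the same function can be fed an open file handle, for example `open("output.correct.sort.txt")`, since it only iterates over lines.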

In [None]:
!bustools count -o eqcount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus

In [None]:
!bustools count -o genecount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt --genecounts output.correct.sort.bus
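Once the count step has finished, a quick look at the matrix is a useful check. The sketch below reads a Matrix Market coordinate file with the standard library only and sums the values per row. It assumes the count step wrote such a file (hypothetically `genecount/output.mtx`) and that rows correspond to barcodes; compare the reported shape with the line counts of the accompanying barcode and gene lists to confirm the orientation. The small matrix here is made up.

```python
import io

# A tiny made-up matrix in Matrix Market coordinate format (1-based indices).
EXAMPLE_MTX = """\
%%MatrixMarket matrix coordinate real general
% rows assumed to be barcodes, columns assumed to be genes
3 2 3
1 1 2
1 2 3
3 1 1
"""


def row_totals(mtx_lines):
    """Return ((n_rows, n_cols), per-row sums) for a coordinate-format matrix."""
    shape = None
    totals = None
    for line in mtx_lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # banner and comment lines
        parts = line.split()
        if shape is None:
            # The first non-comment line is the size line: rows, columns, entries.
            n_rows, n_cols = int(parts[0]), int(parts[1])
            shape = (n_rows, n_cols)
            totals = [0.0] * n_rows
            continue
        row, value = int(parts[0]), float(parts[2])
        totals[row - 1] += value  # convert the 1-based row index to 0-based
    return shape, totals


shape, totals = row_totals(io.StringIO(EXAMPLE_MTX))
print("shape:", shape)
print("per-row totals:", totals)
```

For larger matrices, `scipy.io.mmread` does the same parsing and returns a sparse matrix, which is the more practical choice for downstream analysis.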

In [None]:
!gzip output.bus
!gzip output.correct.bus

## Copying out results

In [None]:
cd /home/jupyter

In [None]:
!gsutil -m cp -r output gs://my-output-bucket/my-results