{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "t5DCDUTabzzY" }, "source": [ "ISB-CGC Community Notebooks\n", "Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Title: How to use Kallisto to quantify genes in 10X scRNA-seq\n", "\n", "Author: David L Gibbs\n", "\n", "Created: 2019-08-07\n", "\n", "Purpose: Demonstrate how to use 10X fastq files and produce the gene quantification matrix\n", "\n", "Notes:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we're going to use the 10X genomics fastq files that we generated earlier, to quantify gene expression per cell using Kallisto and Bustools." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is assumed that this notebook is running INSIDE THE CLOUD! By starting up a Jupyter notebook, you are already authenticated, can read and write to cloud storage (buckets) for free, and data transfers are super fast. To start up a notebook, log into your Google Cloud Console, use the main 'hamburger' menu to find the 'AI platform' near the bottom. Select Notebooks and you'll have an interface to start either an R or Python notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bustools paper:\n", "https://www.ncbi.nlm.nih.gov/pubmed/31073610" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://www.kallistobus.tools/getting_started_explained.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/BUStools/BUS_notebooks_python/blob/master/dataset-notebooks/10x_hgmm_6k_v2chem_python/10x_hgmm_6k_v2chem.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://pachterlab.github.io/kallisto/starting" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd /home/jupyter/" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "E7qiNoHdb9vh" }, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "pJ2wnujWwlFb" }, "source": [ "## Software install" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 136 }, "colab_type": "code", "id": "w5UxLaCSwm9_", "outputId": "1ccaa519-e295-4f61-999a-fef686950c94" }, "outputs": [], "source": [ "!git clone https://github.com/pachterlab/kallisto.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd kallisto/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls -lha" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 850 }, "colab_type": "code", "id": "1MAgAT2Kw7GS", "outputId": "e877a752-aea9-44bd-e9b5-cb7bd927c0cd" }, "outputs": [], "source": [ "!sudo apt --yes install autoconf cmake" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "5qUPqfMrxbih" }, "outputs": [], "source": [ "!mkdir build" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "nN3bP7X2xtFC", "outputId": "2acf0f2d-99a7-4db9-c5d4-c14ac07bf402" }, "outputs": [], "source": [ "cd build" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 2009 }, "colab_type": "code", "id": "9zMImaArxA_h", "outputId": "9a2843a9-0862-4697-9396-e951d4ce3df3" }, "outputs": [], "source": [ "!sudo cmake ..\n", "\n", "!sudo make\n", "\n", "!sudo make install" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 323 }, "colab_type": "code", "id": "kZRc9jLTxsig", "outputId": "7fe3990b-6c91-4df5-a0b9-98c5077b759b" }, "outputs": [], "source": [ "!kallisto" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd ../.." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 136 }, "colab_type": "code", "id": "hDAdMXUoymbG", "outputId": "e8347ef1-01d4-421d-8af7-2f70409666ca" }, "outputs": [], "source": [ "!git clone https://github.com/BUStools/bustools.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# we need the devel version due to a bug that stopped compilation ...\n", "!git checkout devel" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!git status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd bustools/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ZyTY539G0Pf3" }, "outputs": [], "source": [ "!mkdir build" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "pqRkGsdA1afB", "outputId": "519845e6-45db-4fdd-8273-7026be71acb2" }, "outputs": [], "source": [ "cd build" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 765 }, "colab_type": "code", "id": "_95UWvCb0UFK", "outputId": "25594773-fd95-4b27-bf95-5dce80db3a3b" }, "outputs": [], "source": [ "!sudo cmake ..\n", "\n", "!sudo make\n", "\n", "!sudo make install" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "BjD5mz5B0pxu", "outputId": "c26a4c88-480a-4b4f-9d9c-8a348a837bbc" }, "outputs": [], "source": [ "cd ../.." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 255 }, "colab_type": "code", "id": "Dwhu4Z5c0k1k", "outputId": "3c05bf3d-11a5-4523-bdf7-013036470a4a" }, "outputs": [], "source": [ "!bustools" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab_type": "text", "id": "Q5qi1JRAuobI" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "tU_SHwE5utez" }, "source": [ "## Reference Gathering" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "qchHsWTnsrVM" }, "outputs": [], "source": [ "mkdir kallisto_bustools_getting_started/; cd kallisto_bustools_getting_started/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 272 }, "colab_type": "code", "id": "zypqbJ8JsraW", "outputId": "73957f0d-cf91-45e7-f6b4-3df790455a15" }, "outputs": [], "source": [ "!wget ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 272 }, "colab_type": "code", "id": "Zno4t5dUsrda", "outputId": "ab9f1835-7563-41ad-fa6f-09a50acf3ecb" }, "outputs": [], "source": [ "!wget ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 187 }, "colab_type": "code", "id": "i6ogJW9isrgM", "outputId": "c29204e8-5254-47de-f186-5edf4e6862db" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Y_0E9g3Du1k5" }, "source": [ "## Barcode whitelist" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 187 }, "colab_type": "code", "id": "RSp31-dSuiCu", "outputId": "4a908e89-4819-40d9-f645-59570f7401ae" }, "outputs": [], "source": [ "# Version 3 chemistry\n", "!wget https://github.com/BUStools/getting_started/releases/download/species_mixing/10xv3_whitelist.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Version 2 chemistry\n", "!wget https://github.com/bustools/getting_started/releases/download/getting_started/10xv2_whitelist.txt" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5Cd9ggS9u-au" }, "source": [ "## Gene map utility" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "colab_type": "code", "id": "I1AwPwzbv3xP", "outputId": "07d6eaf6-0ba4-449d-e6db-f7a858164de4" }, "outputs": [], "source": [ "!wget https://raw.githubusercontent.com/BUStools/BUS_notebooks_python/master/utils/transcript2gene.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!gunzip Homo_sapiens.GRCh38.96.gtf.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "9F9t9InPuiO5" }, "outputs": [], "source": [ "!python transcript2gene.py --use_version < Homo_sapiens.GRCh38.96.gtf > transcripts_to_genes.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!head transcripts_to_genes.txt" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "JRrZ9P-k1397" }, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mkdir data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "id": "h2LG71Vl2olI", "outputId": "64e6d28a-81cf-48a5-94ed-9f1bdcd9e404" }, "outputs": [], "source": [ "!gsutil -m cp gs://your-bucket/bamtofastq_S1_* data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mkdir output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd /home/jupyter" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls -lha data" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "krdOGVfL3qVU" }, "source": [ "## Indexing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 255 }, "colab_type": "code", "id": "7uwhAtw33hlL", "outputId": "3b012501-f71f-4127-be94-a6dd0a99c5c4" }, "outputs": [], "source": [ "!kallisto index -i Homo_sapiens.GRCh38.cdna.all.idx -k 31 Homo_sapiens.GRCh38.cdna.all.fa.gz" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yHqoOdsB6D5H" }, "source": [ "## Kallisto" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 187 }, "colab_type": "code", "id": "APhJDd4e6CFH", "outputId": "3f2ad505-4d5e-4c16-b89f-04c17e66bdb6" }, "outputs": [], "source": [ "!kallisto bus -i Homo_sapiens.GRCh38.cdna.all.idx -o output -x 10xv3 -t 8 \\\n", "data/bamtofastq_S1_L005_R1_001.fastq.gz data/bamtofastq_S1_L005_R2_001.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_002.fastq.gz data/bamtofastq_S1_L005_R2_002.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_003.fastq.gz data/bamtofastq_S1_L005_R2_003.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_004.fastq.gz data/bamtofastq_S1_L005_R2_004.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_005.fastq.gz data/bamtofastq_S1_L005_R2_005.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_006.fastq.gz data/bamtofastq_S1_L005_R2_006.fastq.gz \\\n", "data/bamtofastq_S1_L005_R1_007.fastq.gz data/bamtofastq_S1_L005_R2_007.fastq.gz \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bustools" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "wMj9XJy6CwSk", "outputId": "09a57b2c-d93a-439f-df83-f9e3e8bec0da" }, "outputs": [], "source": [ "cd /home/jupyter/output/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "6xRptPRHDHRw" }, "outputs": [], "source": [ "!mkdir genecount;\n", "!mkdir tmp;\n", "!mkdir eqcount" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bustools correct -w ../10xv3_whitelist.txt -o output.correct.bus output.bus" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bustools sort -t 8 -o output.correct.sort.bus output.correct.bus" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bustools text -o output.correct.sort.txt output.correct.sort.bus" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bustools count -o eqcount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!bustools count -o genecount/output -g ../transcripts_to_genes.txt -e matrix.ec -t transcripts.txt --genecounts output.correct.sort.bus" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!gzip output.bus\n", "!gzip output.correct.bus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Copyting out results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cd /home/jupyter" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!gsutil -m cp -r output gs://my-output-bucket/my-results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "collapsed_sections": [ "wYNMaJPlMEBA", "pJ2wnujWwlFb", "5Cd9ggS9u-au", "qwDda_5W6nrQ", "OYtFzZhkcGfb", "SpRXyhrdgpP6", "ZuBS1BbkhLsV", "bziQTgqPm1iJ" ], "name": "CRUK_G02_Normal_Kallisto_Run.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }