{ "cells": [ { "cell_type": "markdown", "id": "concrete-bruce", "metadata": {}, "source": [ "# Annotating cell types in human single-cell RNA-seq data with CellO\n", "\n", "This Jupyter notebook implements the STAR Protocol for using CellO to annotate human single-cell RNA-seq data." ] }, { "cell_type": "markdown", "id": "restricted-neighbor", "metadata": {}, "source": [ "### Before we begin\n", "\n", "We will download a single-cell RNA-seq dataset of human lung tissue from GEO, produced by Laughney et al. (2020)." ] }, { "cell_type": "code", "execution_count": 1, "id": "yellow-ministry", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CompletedProcess(args='curl -O ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3516nnn/GSM3516673/suppl/GSM3516673_MSK_LX682_NORMAL_dense.csv.gz', returncode=0)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import subprocess\n", "\n", "GEO_DATASET_URL = 'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3516nnn/GSM3516673/suppl/GSM3516673_MSK_LX682_NORMAL_dense.csv.gz'\n", "\n", "# Download the gzipped dense expression matrix into the current directory;\n", "# check=True raises an error if curl exits with a non-zero status\n", "subprocess.run(f'curl -O {GEO_DATASET_URL}', shell=True, check=True)" ] }, { "cell_type": "markdown", "id": "exotic-cheese", "metadata": {}, "source": [ "### Steps 1-2: Install CellO and its dependencies\n", "\n", "Steps 1-2 install CellO and its dependencies and verify that they are installed correctly. We will install CellO within an Anaconda environment. Make sure that Anaconda is installed, then run the following commands:\n", "\n", "```\n", "conda activate\n", "conda create -y -n cello_env python=3.7 graphviz\n", "conda activate cello_env\n", "pip install pygraphviz leidenalg cello-classify\n", "```" ] }, { "cell_type": "markdown", "id": "interesting-raise", "metadata": {}, "source": [ "### Step 3: Import the necessary Python packages" ] }, { "cell_type": "code", "execution_count": 3, "id": "female-sense", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import scanpy as sc\n", "from anndata import AnnData\n", "import cello" ] }, { "cell_type": "markdown", "id": "expensive-sessions", "metadata": {}, "source": [ "### Step 4: Load the expression matrix using pandas and Scanpy" ] }, { "cell_type": "code", "execution_count": 4, "id": "heated-court", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/envs/cello_env/lib/python3.7/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.\n", " warnings.warn(\"Transforming to str index.\", ImplicitModificationWarning)\n" ] }, { "data": { "text/html": [ "
| \n", " | TSPAN6 | \n", "DPM1 | \n", "SCYL3 | \n", "C1ORF112 | \n", "FGR | \n", "CFH | \n", "FUCA2 | \n", "GCLC | \n", "NFYA | \n", "STPG1 | \n", "... | \n", "RP11-24F11.5 | \n", "RP5-958B11.1 | \n", "WDFY4.1 | \n", "RP11-244E17.1 | \n", "RP11-57A19.7 | \n", "RP11-419I17.1 | \n", "RP3-454G6.2 | \n", "AC013271.5 | \n", "RP11-122G18.12 | \n", "RP5-937E21.8 | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 120703408789411 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "3 | \n", "0 | \n", "3 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 120703408793835 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 120703409145716 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 120703409339181 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 120703409379676 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| ... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
| 241114576481206 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 241114577287974 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 241114589031734 | \n", "0 | \n", "3 | \n", "0 | \n", "0 | \n", "3 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 241114589096668 | \n", "0 | \n", "2 | \n", "0 | \n", "0 | \n", "2 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
| 241114608782195 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4061 rows × 18804 columns
\n", "