{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"In this tutorial, we demonstrate how to use PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding) to analyze a 31,000 cell 27-day time course of embryoid body (EB) differentiation. You can run and edit this notebook at https://colab.research.google.com/github/KrishnaswamyLab/PHATE/blob/master/Python/tutorial/EmbryoidBody.ipynb. Running the tutorial should take approximately 15 minutes excluding the t-SNE comparison, or 25 minutes including the comparison.\n",
"\n",
"We review the following steps:\n",
"\n",
"[1. Loading 10X data](#loading) \n",
"[2. Preprocessing: Filtering, Normalizing, and Transforming](#preprocessing) \n",
"[3. Embedding Data Using PHATE](#embedding) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Time course of human embryoid body differentation\n",
"\n",
"Low passage H1 hESCs were maintained on Matrigel-coated dishes in DMEM/F12-N2B27 media supplemented with FGF2. For EB formation, cells were treated with Dispase, dissociated into small clumps and plated in non-adherent plates in media supplemented with 20% FBS,\n",
"45\n",
"which was prescreened for EB differentiation. Samples were collected during 3-day intervals during a 27 day-long differentiation timecourse. An undifferentiated hESC sample was also included (Figure S7D). Induction of key germ layer markers in these EB cultures was validated by qPCR (data not shown). For single cell analyses, EB cultures were dissociated, FACS sorted to remove doublets and dead cells and processed on a 10x genomics instrument to generate cDNA libraries, which were then sequenced. Small scale sequencing determined that we have successfully collected data on approximately 31,000 cells equally distributed throughout the timecourse.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Install PHATE\n",
"\n",
"If you have not already installed PHATE and `scprep`, we can install them from the notebook. You may need to restart the kernel/runtime after installation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install --user --upgrade phate scprep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## 1. Loading 10X data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading Data from Mendeley Datasets\n",
"\n",
"The EB dataset is publically available as `scRNAseq.zip` at Mendelay Datasets at
\n",
"\n",
"| 0 | RP11-34P13.3 (ENSG00000243485) | FAM138A (ENSG00000237613) | OR4F5 (ENSG00000186092) | RP11-34P13.7 (ENSG00000238009) | RP11-34P13.8 (ENSG00000239945) | RP11-34P13.14 (ENSG00000239906) | RP11-34P13.9 (ENSG00000241599) | FO538757.3 (ENSG00000279928) | FO538757.2 (ENSG00000279457) | AP006222.2 (ENSG00000228463) | ... | AC007325.2 (ENSG00000277196) | BX072566.1 (ENSG00000277630) | AL354822.1 (ENSG00000278384) | AC023491.2 (ENSG00000278633) | AC004556.1 (ENSG00000276345) | AC233755.2 (ENSG00000277856) | AC233755.1 (ENSG00000275063) | AC240274.1 (ENSG00000271254) | AC213203.1 (ENSG00000277475) | FAM231B (ENSG00000268674) |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| AAACATACCAGAGG-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACATTGAAAGCA-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACATTGAAGTGA-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACATTGGAGGTG-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACATTGGTTTCT-1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"\n",
"5 rows × 33694 columns\n",
"\n",
"| | RP11-34P13.3 (ENSG00000243485) | FAM138A (ENSG00000237613) | OR4F5 (ENSG00000186092) | RP11-34P13.7 (ENSG00000238009) | RP11-34P13.8 (ENSG00000239945) | RP11-34P13.14 (ENSG00000239906) | RP11-34P13.9 (ENSG00000241599) | FO538757.3 (ENSG00000279928) | FO538757.2 (ENSG00000279457) | AP006222.2 (ENSG00000228463) | ... | AC007325.2 (ENSG00000277196) | BX072566.1 (ENSG00000277630) | AL354822.1 (ENSG00000278384) | AC023491.2 (ENSG00000278633) | AC004556.1 (ENSG00000276345) | AC233755.2 (ENSG00000277856) | AC233755.1 (ENSG00000275063) | AC240274.1 (ENSG00000271254) | AC213203.1 (ENSG00000277475) | FAM231B (ENSG00000268674) |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| AAACATTGAAAGCA-1_Day 00-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACCGTGCAGAAA-1_Day 00-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACCGTGGAAGGC-1_Day 00-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACGCACCGGTAT-1_Day 00-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"| AAACGCACCTATTC-1_Day 00-03 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n",
"\n",
"5 rows × 33694 columns
\n", "