{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Blastocyst Development in Mice: Single Cell TaqMan Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### presented at the EBI BioPreDyn Course 'The Systems Biology Modelling Cycle'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Max Zwiessele, Oliver Stegle, Neil Lawrence 12th May 2014 University of Sheffield and EBI." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we follow Buettner and Theis (2012) and use the GP-LVM to analyze single cell data from Guo et al. (2010). They performed qPCR TaqMan array measurements on single cells from the developing blastocyst in mouse. The data is taken from the early stages of development, when the blastocyst is forming. By the 32-cell stage the cells have already separated into the trophectoderm (TE), which goes on to form the placenta, and the inner cell mass (ICM). The ICM further differentiates into the epiblast (EPI)---which gives rise to the endoderm, mesoderm and ectoderm---and the primitive endoderm (PE), which develops into the amniotic sac. Guo et al. selected 48 genes for expression measurement. They labelled the resulting cells, and their labels are included as an aid to visualization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the paper, they first visualized their data using principal component analysis. The first two principal components fail to separate the domains. This is perhaps because the principal components are dominated by the variation among cells at the 64-cell stage. This in turn may be because the data set contains more cells from that stage, or because the natural variation across cells at the 64-cell stage is greater. Both are probable causes of the dominance of these cells in the first two principal components. The first thing we do is plot the principal coordinates of the cells on the first two dominant directions. 
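As a minimal sketch of what projecting onto the first two dominant directions means, the block below computes principal coordinates with a plain SVD in NumPy. It is an illustration under stated assumptions, not the code used later in this notebook: the helper `pca_project` and the array `Y_demo` are hypothetical names, and `Y_demo` is random synthetic data standing in for the 437 x 48 expression matrix.

```python
import numpy as np

def pca_project(Y, n_components=2):
    # Hypothetical helper: project the rows of Y onto the leading principal directions.
    # Centre each column (gene) so the SVD captures variance rather than the mean.
    Yc = Y - Y.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    # Coordinates of each row (cell) on the retained directions.
    X = Yc @ Vt[:n_components].T
    # Fraction of the total variance carried by each retained direction.
    explained = (s ** 2) / np.sum(s ** 2)
    return X, explained[:n_components]

# Synthetic stand-in data (NOT the Guo et al. measurements): 437 cells, 48 genes.
rng = np.random.default_rng(0)
Y_demo = rng.standard_normal((437, 48))
X_demo, ratio_demo = pca_project(Y_demo)
```

Scattering `X_demo[:, 0]` against `X_demo[:, 1]`, coloured by cell label, would give the kind of plot described above; on the real data the explained-variance ratios indicate how much structure two components can capture.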
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This Notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we will perform PCA on the original data, showing that the different regimes do not separate. Then, we follow Buettner and Theis (2012) in applying the GP-LVM to the data. There is a slight pathology in the result, one which they fixed by using priors that were dependent on the developmental stage. We then show how the Bayesian GP-LVM doesn't exhibit those pathologies and gives nice results that seems to show the lineage of the cells. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we perform some set up." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pods, GPy, itertools\n", "%matplotlib inline\n", "from matplotlib import pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we load in the data. We've provided a convenience function for loading in the data with GPy. It is loaded in as a `pandas` DataFrame. This allows us to summarize it with the `describe` attribute." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
| | Actb | Ahcy | Aqp3 | Atp12a | Bmp4 | Cdx2 | Creb312 | Cebpa | Dab2 | DppaI | ... | Sox2 | Sall4 | Sox17 | Snail | Sox13 | Tcfap2a | Tcfap2c | Tcf23 | Utf1 | Tspan8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | ... | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 | 4.370000e+02 |
| mean | 2.089245e-08 | 2.400458e-08 | 3.011442e-08 | 2.009153e-08 | 1.416476e-08 | 2.661327e-08 | 1.828375e-08 | 2.329519e-08 | 2.993135e-08 | 2.077803e-08 | ... | 2.180778e-08 | 2.146453e-08 | 2.077803e-08 | 2.585812e-08 | 2.473684e-08 | 2.670481e-08 | 2.009153e-08 | 2.231121e-08 | 2.263158e-08 | 2.606407e-08 |
| std | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | ... | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 | 1.001146e+00 |
| min | -2.997659e+00 | -2.140030e+00 | -1.768643e+00 | -2.355953e+00 | -4.420191e+00 | -1.972546e+00 | -2.493296e+00 | -2.290915e+00 | -1.641875e+00 | -1.755517e+00 | ... | -2.246728e+00 | -2.015028e+00 | -2.960914e+00 | -1.964500e+00 | -2.033562e+00 | -1.886886e+00 | -2.238215e+00 | -2.089612e+00 | -1.825355e+00 | -2.035440e+00 |
| 25% | -7.169942e-01 | -7.796685e-01 | -7.171038e-01 | -9.337937e-01 | -2.516838e-01 | -6.446426e-01 | -7.717112e-01 | -8.545864e-01 | -6.703301e-01 | -1.018770e+00 | ... | -7.734272e-01 | -9.343944e-01 | -6.515803e-01 | -7.255418e-01 | -7.276985e-01 | -8.376303e-01 | -9.886838e-01 | -9.197137e-01 | -9.236930e-01 | -7.673579e-01 |
| 50% | 9.227372e-02 | -1.782182e-01 | -1.842611e-01 | 2.928931e-01 | 2.146358e-01 | -1.353097e-01 | 2.772327e-01 | -6.806050e-02 | -2.415102e-01 | 1.097815e-01 | ... | 7.709900e-03 | 8.293864e-02 | -3.701860e-02 | -1.972392e-01 | -1.336842e-01 | -2.656403e-01 | 2.828122e-01 | -1.836380e-02 | -1.127644e-01 | -2.462695e-01 |
| 75% | 6.808323e-01 | 6.245760e-01 | 3.312916e-01 | 8.234601e-01 | 6.738631e-01 | 4.854397e-01 | 8.213037e-01 | 9.125241e-01 | 4.854736e-01 | 9.259312e-01 | ... | 8.608111e-01 | 8.312436e-01 | 6.177259e-01 | 1.082358e+00 | 5.329677e-01 | 9.967453e-01 | 8.099583e-01 | 1.022974e+00 | 8.257577e-01 | 6.422915e-01 |
| max | 2.490650e+00 | 3.564713e+00 | 2.627702e+00 | 1.687127e+00 | 1.569268e+00 | 3.427747e+00 | 1.826346e+00 | 1.729923e+00 | 2.780331e+00 | 1.671790e+00 | ... | 1.709987e+00 | 4.489121e+00 | 3.565258e+00 | 1.655467e+00 | 2.231074e+00 | 2.104293e+00 | 4.187598e+00 | 1.808619e+00 | 2.208022e+00 | 2.270316e+00 |

8 rows × 48 columns
\n", "