{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayesian Learning of GP-LVM" ], "id": "7d6e509f-be10-43cc-af73-cb0ac24dffb0" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install GPy" ], "id": "a607eac4-1b70-467c-8ad3-939b99b773ab" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GPy: A Gaussian Process Framework in Python\n", "\n", "Gaussian processes are a flexible tool for non-parametric analysis with\n", "uncertainty. The GPy software was started in Sheffield to provide an\n", "easy-to-use interface to GPs, one which allows the user to focus on the\n", "modelling rather than the mathematics.\n", "\n", "Figure: GPy is a BSD licensed software code base for implementing\n", "Gaussian process models in Python. It is designed for teaching and\n", "modelling. We welcome contributions, which can be made through the GitHub\n", "repository.\n", "\n", "GPy is a BSD licensed software code base for implementing Gaussian\n", "process models in Python. This allows GPs to be combined with a wide\n", "variety of software libraries.\n", "\n", "The software itself is available on\n", "[GitHub](https://github.com/SheffieldML/GPy) and the team welcomes\n", "contributions.\n", "\n", "The aim for GPy is to be a probabilistic-style programming language,\n", "i.e., you specify the model rather than the algorithm. As well as a\n", "large range of covariance functions, the software allows for non-Gaussian\n", "likelihoods, multivariate outputs, dimensionality reduction and\n", "approximations for larger data sets.\n", "\n", "The documentation for GPy can be found\n", "[here](https://gpy.readthedocs.io/en/latest/).\n", "\n", "- The GP-LVM provides probabilistic non-linear dimensionality reduction.\n", "- How do we select the dimensionality?\n", "- We need to estimate the marginal likelihood.\n", "- In the standard GP-LVM it increases with increasing latent dimension $q$.\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "**Bayesian GP-LVM**\n", "\n", "- Start with a standard GP-LVM.\n", "- Apply the standard latent variable approach:\n", "    - Define a Gaussian prior over the latent space, $\mathbf{Z}$.\n", "    - Integrate out $\mathbf{Z}$.\n", "    - Unfortunately, the integration is intractable.\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "id": "46146fd0-1631-4f57-8921-3434c8681c34" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Standard Variational Approach Fails\n", "\n", "- The standard variational bound has the form: $$\n", " \mathcal{L}= \left\langle\log p(\mathbf{ y}|\mathbf{Z})\right\rangle_{q(\mathbf{Z})} - \text{KL}\left( q(\mathbf{Z})\,\|\,p(\mathbf{Z}) \right)\n", " $$\n", "\n", "The standard variational approach would require the expectation of\n", "$\log p(\mathbf{ y}|\mathbf{Z})$ under $q(\mathbf{Z})$: $$\n", " \begin{align}\n", " \log p(\mathbf{ y}|\mathbf{Z}) = & -\frac{1}{2}\mathbf{ y}^\top\left(\mathbf{K}_{\mathbf{ f}, \mathbf{ f}}+\sigma^2\mathbf{I}\right)^{-1}\mathbf{ y}\\ & -\frac{1}{2}\log \det\left(\mathbf{K}_{\mathbf{ f}, \mathbf{ f}}+\sigma^2 \mathbf{I}\right) -\frac{n}{2}\log 2\pi\n", " \end{align}\n", " $$ But this is extremely difficult to compute because\n", "$\mathbf{K}_{\mathbf{ f}, \mathbf{ f}}$ depends on $\mathbf{Z}$ and\n", "appears inside the inverse." ], "id": "d552ac58-e9d5-4dee-a5da-7dc7d22265b8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variational Bayesian GP-LVM\n", "\n", "The alternative approach is to consider the collapsed variational bound\n", "(used for low-rank (sparse is a misnomer) Gaussian process\n", "approximations): $$\n", " p(\mathbf{ y})\geq \prod_{i=1}^nc_i \int \mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle,\sigma^2\mathbf{I}\right)p(\mathbf{ u}) \text{d}\mathbf{ u}\n", " $$ $$\n", " p(\mathbf{ y}|\mathbf{Z})\geq \prod_{i=1}^nc_i \int \mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle_{p(\mathbf{ f}|\mathbf{ u}, \mathbf{Z})},\sigma^2\mathbf{I}\right)p(\mathbf{ u}) \text{d}\mathbf{ u}\n", " $$ $$\n", " \int p(\mathbf{ y}|\mathbf{Z})p(\mathbf{Z}) \text{d}\mathbf{Z}\geq \int \prod_{i=1}^nc_i \mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle_{p(\mathbf{ f}|\mathbf{ u}, \mathbf{Z})},\sigma^2\mathbf{I}\right) p(\mathbf{Z})\text{d}\mathbf{Z}\,p(\mathbf{ u}) \text{d}\mathbf{ u}\n", " $$\n", "\n", "To integrate across $\mathbf{Z}$ we apply the lower bound to the inner\n", "integral.\n", "
$$\n", " \begin{align}\n", " \log \int \prod_{i=1}^nc_i \mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle_{p(\mathbf{ f}|\mathbf{ u}, \mathbf{Z})},\sigma^2\mathbf{I}\right) p(\mathbf{Z})\text{d}\mathbf{Z}\geq & \left\langle\sum_{i=1}^n\log c_i\right\rangle_{q(\mathbf{Z})}\\ & +\left\langle\log\mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle_{p(\mathbf{ f}|\mathbf{ u}, \mathbf{Z})},\sigma^2\mathbf{I}\right)\right\rangle_{q(\mathbf{Z})}\\& - \text{KL}\left( q(\mathbf{Z})\,\|\,p(\mathbf{Z}) \right) \n", " \end{align}\n", " $$\n", "\n", "- This lower bound is analytically tractable for Gaussian $q(\mathbf{Z})$ and\n", " some covariance functions.\n", "\n", "- Need expectations under $q(\mathbf{Z})$ of: $$\n", " \log c_i = -\frac{1}{2\sigma^2} \left[k_{i, i} - \mathbf{ k}_{i, \mathbf{ u}}^\top \mathbf{K}_{\mathbf{ u}, \mathbf{ u}}^{-1} \mathbf{ k}_{i, \mathbf{ u}}\right]\n", " $$ and $$\n", " \log \mathcal{N}\left(\mathbf{ y}|\left\langle\mathbf{ f}\right\rangle_{p(\mathbf{ f}|\mathbf{ u},\mathbf{Z})},\sigma^2\mathbf{I}\right) = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\left\|\mathbf{ y} - \mathbf{K}_{\mathbf{ f}, \mathbf{ u}}\mathbf{K}_{\mathbf{ u},\mathbf{ u}}^{-1}\mathbf{ u}\right\|^2\n", " $$\n", "\n", "- This requires the expectations $$\n", " \left\langle\mathbf{K}_{\mathbf{ f},\mathbf{ u}}\right\rangle_{q(\mathbf{Z})}\n", " $$ and $$\n", " \left\langle\mathbf{K}_{\mathbf{ f},\mathbf{ u}}\mathbf{K}_{\mathbf{ u},\mathbf{ u}}^{-1}\mathbf{K}_{\mathbf{ u},\mathbf{ f}}\right\rangle_{q(\mathbf{Z})}\n", " $$ which can be computed analytically for some covariance functions\n", " (Damianou et al., 2016) or through sampling (Damianou, 2015;\n", " Salimbeni and Deisenroth, 2017)." ], "id": "3979aaea-6d71-4cfc-8b68-fda7ca79a756" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manifold Relevance Determination" ], "id": "d824b88c-4f1b-4f8e-b5f1-9adb73285e0b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Modeling Multiple ‘Views’\n", "\n", "- A single latent space models correlations between two different data\n", " sources, e.g., images & text, or image & pose.\n", "\n", "- Shared latent spaces: Shon et al. (n.d.); Navaratnam et al. (2007); Ek\n", " et al. (2008b)\n", "\n", "- Effective when the ‘views’ are correlated.\n", "\n", "- But not all information is shared between both ‘views’.\n", "\n", "- PCA applied to the concatenated data vs. CCA applied to the two views." ], "id": "3b169b09-8007-4ea6-bfe8-112f4290404d" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Shared-Private Factorization\n", "\n", "- In real scenarios, the ‘views’ are neither fully independent, nor\n", " fully correlated.\n", "- Shared models\n", "    - either allow information relevant to a single view to be mixed\n", "      in the shared signal,\n", "    - or are unable to model such private information.\n", "- Solution: Model shared and private information Virtanen et al.\n", " (n.d.),Ek et al. 
(2008a),Leen and Fyfe (2006),Klami and Kaski\n", " (n.d.),Klami and Kaski (2008),Tucker (1958)" ], "id": "40b2ff92-4002-4d8a-90bb-59d5d2dca792" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ " \\begin{center}\n", " \\begin{tikzpicture}\n", " % Define nodes\n", " \\draw node[latent] (XS1) {$\\mathbf{Z}^{(1)}$};\n", " \\draw node[obs, below right=of XS1] (Y1) {$\\dataMatrix^{(1)}$};\n", " \\draw node[latent, above right=of Y1] (X) {$\\latentMatrix$};\n", " \\draw node[obs, below right=of X] (Y2) {$\\dataMatrix^{(2)}$};\n", " \\draw node[latent, above right=of Y2] (XS2) {$\\mathbf{Z}^{(2)}$};\n", " \n", " % Connect the nodes\n", " \\draw [->] (XS1) to (Y1);%\n", " \\draw [->] (XS2) to (Y2);%\n", " \\draw [->] (X) to (Y1);%\n", " \\draw [->] (X) to (Y2);%\n", " \n", " \\end{tikzpicture}\n", " \\end{center}" ], "id": "07bf16d7-8e48-40ae-b358-e95b6bf79ef2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Probabilistic CCA is case when dimensionality of $\\mathbf{Z}$\n", " matches $\\mathbf{Y}^{(i)}$ (cf Inter Battery Factor Analysis Tucker\n", " (1958)).\n", "\n", "" ], "id": "368c35cf-d592-44b6-bd12-f8c888f039ba" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Manifold Relevance Determination\n", "\n", "\\[edit\\]\n", "\n", " \\def\\layersep{2cm}\n", " \\begin{center}\n", " \\begin{tikzpicture}[node distance=\\layersep]\n", "\n", "= \\[text width=4em, text centered\\] % Draw the input layer nodes / in\n", "{1,…,8} % This is the same as writing / in {1/1,2/2,3/3,4/4} (Y-) at (,\n", "0) {$y_\\x$};\n", "\n", " % Draw the hidden layer nodes\n", " \\foreach \\name / \\x in {1,...,6}\n", " \\path[xshift=1cm]\n", " node[latent] (X-\\name) at (\\x cm, \\layersep) {$\\latentScalar_\\x$};\n", "\n", " % Connect every node in the latent layer with every node in the\n", " % data layer.\n", " \\foreach \\source in {1,...,6}\n", " \\foreach \\dest in {1,...,8}\n", " \\draw[->] (X-\\source) -- (Y-\\dest);\n", "\n", "\n", "\n", " % Annotate the layers\n", " \\node[annot,left of=X-1, node distance=1cm] (ls) {Latent space};\n", " \\node[annot,left of=Y-1, node distance=1cm] (ds) {Data space};\n", "\n", "\\end{tikzpicture} \\end{center}\n", "\n", "\n", "" ], "id": "d8620917-37b4-4756-a08a-28bb6626261f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Shared GP-LVM\n", "\n", " \\def\\layersep{2cm}\n", " \\begin{center}\n", " \\begin{tikzpicture}[node distance=\\layersep]\n", "\n", "= \\[text width=4em, text centered\\] % Draw the input layer nodes / in\n", "{1,…,4} % This is the same as writing / in {1/1,2/2,3/3,4/4} (Y-) at (,\n", "0) {$y^{(1)}_\\x$};\n", "\n", " \\foreach \\name / \\x in {1,...,4}\n", " % This is the same as writing \\foreach \\name / \\x in {1/1,2/2,3/3,4/4}\n", " \\node[obs] (Z-\\name) at (\\x+5, 0) {$\\dataScalar^{(2)}_\\x$};\n", "\n", " % Draw the hidden layer nodes\n", " \\foreach \\name / \\x in {1,...,6}\n", " \\path[xshift=2cm]\n", " node[latent] (X-\\name) at (\\x cm, \\layersep) {$\\latentScalar_\\x$};\n", "\n", " % Connect every node in the latent layer with every node in the\n", " % data layer.\n", " \\foreach \\source in {1,...,6}\n", " \\foreach \\dest in {1,...,4}\n", " \\draw[->] (X-\\source) -- (Y-\\dest);\n", "\n", " \\foreach \\source in {1,...,6}\n", " \\foreach \\dest in {1,...,4}\n", " \\draw[->] (X-\\source) -- (Z-\\dest);\n", "\n", "\n", " % Annotate the layers\n", " \\node[annot,left of=X-1, node distance=1cm] (ls) {Latent space};\n", " \\node[annot,left of=Y-1, node distance=1cm] (ds) 
{Data space};\n", "\n", "\\end{tikzpicture} Separate ARD parameters for mappings to\n", "$\\mathbf{Y}^{(1)}$ and $\\mathbf{Y}^{(2)}$. \\end{center}" ], "id": "de789956-99a7-453e-a991-4ffc094e173f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Manifold Relevance Determination Results\n", "\n", "\\[edit\\]\n", "\n", "\n", "\n", "Figure: The Yale Faces data set shows different people under\n", "different lighting conditions\n", "\n", "\n", "\n", "Figure: ARD Demonstrates not all of the latent space is used.\n", "\n", "\n", "\n", "Figure: Other applications include inferring pose from silhouette" ], "id": "7714e6cf-fd5b-4451-96e7-04a5c70cb78e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.lib.display import YouTubeVideo\n", "YouTubeVideo('UvLI8o8z4IU')" ], "id": "09c6a9d3-4dc9-4bd1-87f0-9f000d3eec93" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Figure: A short video description of the Manifold Relevance\n", "Determination method as published at ICML 2012\n", "\n", "" ], "id": "ee2a6dd5-eded-4464-bd54-6d66ee4e9f26" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started and Downloading Data\n", "\n", "\\[edit\\]" ], "id": "9d1ea359-9330-4670-ac51-39a7faa45c7b" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import GPy\n", "import string" ], "id": "ac11fd88-b977-43fb-939d-ed655772b466" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code is for plotting and to prepare the bigger models for\n", "later useage. If you are interested, you can have a look, but this is\n", "not essential." ], "id": "9968e37f-f29f-4171-b80b-a0be83b97c71" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "import mlai.plot as plot\n", "import mlai" ], "id": "25576a98-537a-4238-9a5a-3405b93ea92c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "colors = [\"#3FCC94\", \"#DD4F23\", \"#C6D63B\", \"#D44271\", \n", " \"#E4A42C\", \"#4F9139\", \"#6DDA4C\", \"#85831F\", \n", " \"#B36A29\", \"#CF4E4A\"]" ], "id": "5268add6-2947-449e-8d66-2813ea2d6f7c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_model(X, which_dims, labels):\n", " fig, ax = plt.subplots(figsize=plot.big_figsize)\n", " X = X[:,which_dims]\n", " ulabs = []\n", " for lab in labels:\n", " if not lab in ulabs:\n", " ulabs.append(lab)\n", " pass\n", " pass\n", " for i, lab in enumerate(ulabs):\n", " ax.scatter(*X[labels==lab].T,marker='o',color=colors[i],label=lab)\n", " pass\n", " pass" ], "id": "80e345dd-856b-4ab6-bb0a-ec4b9c6bd678" }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this lab, we’ll use a data set containing all handwritten digits\n", "from $0 \\cdots 9$ handwritten, provided by de Campos et al. (2009). We\n", "will only use some of the digits for the demonstrations in this lab\n", "class, but you can edit the code below to select different subsets of\n", "the digit data as you wish." 
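, "\n", "The data is loaded with the `pods` library (which is also used later in this notebook), so make sure it is imported before the data set is fetched in the next cell." ], "id": "a1e3d7c0-9b2f-4d4e-8a61-2f0c5d9e1a01" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods"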
], "id": "1ccfb942-2513-42e8-ab4c-e0645eda4762" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "which = [0,1,2,6,7,9] # which digits to work on\n", "data = pods.datasets.decampos_digits(which_digits=which)\n", "Y = data['Y']\n", "labels = data['str_lbls']" ], "id": "39a19f04-72cd-48f3-9a2e-ecb90bf2dc71" }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can try to plot some of the digits using `plt.matshow` (the digit\n", "images have size `16x16`)." ], "id": "65108859-5f2e-4f7d-a3f8-dc5a66c698c0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayesian GPLVM\n", "\n", "In the GP-LVM we use a point estimate of the input\n", "$\mathbf{Z}$. This estimate is derived through maximum likelihood or\n", "through a maximum a posteriori (MAP) approach. Ideally, we would like to\n", "also estimate a distribution over the input $\mathbf{Z}$. In the\n", "Bayesian GPLVM we approximate the true distribution\n", "$p(\mathbf{Z}|\mathbf{Y})$ by a variational approximation\n", "$q(\mathbf{Z})$ and integrate $\mathbf{Z}$ out (Titsias and Lawrence,\n", "2010).\n", "\n", "Approximating the posterior in this way allows us to optimize a lower\n", "bound on the marginal likelihood. Handling the uncertainty in a\n", "principled way allows the model to make an assessment of whether a\n", "particular latent dimension is required, or whether the variation is better\n", "explained by noise. This allows the algorithm to switch off latent\n", "dimensions. The switching off can take some time though, so below in\n", "Section 6 we provide a pre-learnt model, but to complete Section 6\n", "you’ll need to be working in the IPython console instead of the\n", "notebook.\n", "\n", "For the moment we’ll run a short experiment applying the Bayesian GP-LVM\n", "with an exponentiated quadratic covariance function." ], "id": "2410d42f-fd1b-4e59-89ea-d39157401ec2" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Model optimization\n", "# normalize the data to zero mean and unit standard deviation\n", "Yn = Y - Y.mean()\n", "Yn /= Yn.std()\n", "\n", "input_dim = 5 # How many latent dimensions to use\n", "kern = GPy.kern.RBF(input_dim,ARD=True) # ARD kernel\n", "m = GPy.models.BayesianGPLVM(Yn, input_dim=input_dim, kernel=kern, num_inducing=25)\n", "\n", "# initialize noise as 1% of variance in data\n", "#m.likelihood.variance = m.Y.var()/100.\n", "m.optimize(messages=1)" ], "id": "a8b7efe4-6f61-47b2-b37f-f5fd21d92160" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plotting the model\n", "plot_model(m.X.mean, m.rbf.lengthscale.argsort()[:2], labels.flatten())\n", "plt.legend()\n", "m.kern.plot_ARD()\n", "# Saving the model:\n", "m.pickle('digit_bgplvm_rbf.pickle')" ], "id": "d58008b3-2f13-4395-9471-4f56fbdc7acc" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we are now also considering the uncertainty in the model, this\n", "optimization can take some time. However, you are free to interrupt the\n", "optimization at any point by selecting `Kernel->Interrupt` from the notebook\n", "menu. This will leave you with the model `m` in its current state, and\n", "you can plot and look into the model parameters." ], "id": "07ce1317-c2fe-4041-bccc-dfc898264bc0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preoptimized Model\n", "\n", "A good way of working with latent variable models is to interact with\n", "the latent dimensions, generating data.\n", "
This is a little bit tricky in\n", "the notebook, so below in section 6 we provide code for setting up an\n", "interactive demo in the standard IPython shell. If you are working on\n", "your own machine you can try this now. Otherwise continue with section\n", "5." ], "id": "8d5ab560-01dd-4313-b4ce-6d31e9a7e011" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiview Learning: Manifold Relevance Determination\n", "\n", "In Manifold Relevance Determination we try to find one latent space,\n", "common to $K$ observed output sets (modalities)\n", "$\{\mathbf{Y}_{k}\}_{k=1}^{K}$. Each modality is associated with a\n", "separate set of ARD parameters so that it switches off different parts\n", "of the whole latent space and, therefore, $\mathbf{Z}$ is softly\n", "segmented into parts that are private to some modalities, or shared\n", "between all of them. Can you explain what happens in the following example?\n", "(A sketch of how such a model can be constructed directly on two data\n", "views of your own appears after the interactive-demo description below.)\n", "\n", "Again, you can stop the optimizer at any point and explore the result\n", "obtained with the training so far:" ], "id": "8e2390fc-1408-4ef5-ab6f-ee6b375a5750" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = GPy.examples.dimensionality_reduction.mrd_simulation(optimize = False, plot=False)\n", "m.optimize(messages = True, max_iters=3e3, optimizer = 'bfgs')" ], "id": "fd8a0b7a-bfc3-4332-800d-04ed7031fdbd" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_ = m.X.plot()\n", "m.plot_scales()" ], "id": "d4b18258-dea4-4f84-a294-1c63fb4622de" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive Demo: For Use Outside the Notebook\n", "\n", "The module below loads a pre-optimized Bayesian GPLVM model (like the\n", "one you just trained) and allows you to interact with the latent space.\n", "Three interactive figures pop up: the latent space, the ARD scales and a\n", "sample in the output space (corresponding to the currently selected latent\n", "point of the other figure). You can sample with the mouse from the\n", "latent space and obtain samples in the output space. You can select\n", "different latent dimensions to vary by clicking on the corresponding\n", "scales with the left and right mouse buttons. This will also cause the\n", "latent space to be projected onto the selected latent dimensions in the\n", "other figure."
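, "\n", "Before moving on to the interactive demo code below, here is the sketch promised in the Manifold Relevance Determination section above: a minimal example, not part of the original lab, of how an MRD model can be constructed directly on two observed views rather than through `GPy.examples`. The two views `Y1` and `Y2` are random placeholders, and the exact `GPy.models.MRD` arguments may need adjusting for your GPy version." ], "id": "a1e3d7c0-9b2f-4d4e-8a61-2f0c5d9e1a02" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import GPy\n", "\n", "# two placeholder views sharing the same rows (samples)\n", "Y1 = np.random.randn(100, 8)\n", "Y2 = np.random.randn(100, 12)\n", "\n", "# one shared latent space; each view gets its own ARD parameters, so latent\n", "# dimensions can be switched off per view (shared vs. private structure)\n", "q = 5\n", "m_views = GPy.models.MRD([Y1, Y2], input_dim=q, num_inducing=20,\n", "                         kernel=GPy.kern.RBF(q, ARD=True))\n", "m_views.optimize(messages=True, max_iters=100)\n", "m_views.plot_scales()  # per-view ARD scales show shared and private dimensions"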
], "id": "729e381e-f8bb-4e2e-bdd2-fe9633822e55" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import urllib.request, os, sys" ], "id": "1fecdc8f-069c-4a26-b465-508124c50d75" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_path = 'digit_bgplvm_demo.pickle' # local name for model file\n", "status = \"\"\n", "\n", "if not os.path.exists(model_path): # only download the model if it is not already cached\n", "    url = 'http://staffwww.dcs.sheffield.ac.uk/people/M.Zwiessele/gpss/lab3/digit_bgplvm_demo.pickle'\n", "    with open(model_path, 'wb') as f:\n", "        u = urllib.request.urlopen(url)\n", "        file_size = int(u.headers['Content-Length'])\n", "        print(\"Downloading: %s\" % model_path)\n", "\n", "        file_size_dl = 0\n", "        block_sz = 8192\n", "        while True:\n", "            buff = u.read(block_sz)\n", "            if not buff:\n", "                break\n", "            file_size_dl += len(buff)\n", "            f.write(buff)\n", "            sys.stdout.write(\" \"*len(status) + \"\\r\")\n", "            status = r\"{:7.3f}/{:.3f}MB [{: >7.2%}]\".format(file_size_dl/1e6, file_size/1e6, float(file_size_dl)/file_size)\n", "            sys.stdout.write(status)\n", "            sys.stdout.flush()\n", "        sys.stdout.write(\" \"*len(status) + \"\\r\")\n", "        print(status)\n", "else:\n", "    print(\"Already cached, delete '%s' to download it again.\" % model_path)" ], "id": "caeb3c5f-1423-460f-93c3-cb32bdaedb8d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pickle\n", "# the demo model was pickled with Python 2, so a compatible encoding is needed\n", "with open('./digit_bgplvm_demo.pickle', 'rb') as f:\n", "    m = pickle.load(f, encoding='latin1')" ], "id": "e1d20731-1b65-4b45-9a76-d27b91721713" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare for plotting of this model. If you are running on a web server the\n", "interactive plotting will not work, so you can skip the interactive code\n", "block and run it on your own machine later."
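, "\n", "If the interactive demo is not available, the following minimal, non-interactive sketch (not part of the original lab) maps a single latent point back to data space with `m.predict` and displays the result as a `16x16` image. It assumes the pre-trained digit model `m` loaded above; the latent point used is simply the posterior mean of the first training point." ], "id": "a1e3d7c0-9b2f-4d4e-8a61-2f0c5d9e1a03" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# use the posterior mean of one training point as a latent location\n", "z = np.asarray(m.X.mean[0:1, :])\n", "# predictive mean in data space for this latent point\n", "y_pred, y_var = m.predict(z)\n", "plt.matshow(y_pred.reshape(16, 16), cmap='gray')"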
], "id": "f9e93f19-a6aa-453b-bf38-1f2a81134071" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = pb.figure('Latent Space & Scales', figsize=(16,6))\n", "ax_latent = fig.add_subplot(121)\n", "ax_scales = fig.add_subplot(122)\n", "\n", "fig_out = pb.figure('Output', figsize=(1,1))\n", "ax_image = fig_out.add_subplot(111)\n", "fig_out.tight_layout(pad=0)\n", "\n", "data_show = GPy.plotting.matplot_dep.visualize.image_show(m.Y[0:1, :], dimensions=(16, 16), transpose=0, invert=0, scale=False, axes=ax_image)\n", "lvm_visualizer = GPy.plotting.matplot_dep.visualize.lvm_dimselect(m.X.mean.copy(), m, data_show, ax_latent, ax_scales, labels=labels.flatten())" ], "id": "7d0098bc-7d39-464d-a2e0-222b2674019b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Observations\n", "\n", "Confirm the following observations by interacting with the demo:\n", "\n", "- We tend to obtain more “strange” outputs when sampling from latent\n", " space areas away from the training inputs.\n", "- When sampling from the two dominant latent dimensions (the ones\n", " corresponding to large scales) we differentiate between all digits.\n", " Also note that projecting the latent space into the two dominant\n", " dimensions better separates the classes.\n", "- When sampling from less dominant latent dimensions the outputs vary\n", " in a more subtle way.\n", "\n", "You can also run the dimensionality reduction example" ], "id": "e20e4011-2ee3-4e33-9c00-b267d4d709d8" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "GPy.examples.dimensionality_reduction.bgplvm_simulation()" ], "id": "52ef32b7-80ba-498b-b02c-219eba98d870" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Questions\n", "\n", "- Can you see a difference in the ARD parameters to the non Bayesian\n", " GPLVM?\n", "- How does the Bayesian GPLVM allow the ARD parameters of the RBF\n", " kernel magnify the two first dimensions?\n", "- Is Bayesian GPLVM better in differentiating between different kinds\n", " of digits?\n", "- Why does the starting noise variance have to be lower then the\n", " variance of the observed values?\n", "- How come we use the lowest variance when using a linear kernel, but\n", " the highest lengtscale when using an RBF kernel?" ], "id": "01bfb372-5c15-4f8a-b823-f8133522d19f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data for Blastocyst Development in Mice: Single Cell TaqMan Arrays\n", "\n", "Now we analyze some single cell data from Guo et al. (2010). Tey\n", "performed qPCR TaqMan array on single cells from the developing\n", "blastocyst in mouse. The data is taken from the early stages of\n", "development when the Blastocyst is forming. At the 32 cell stage the\n", "data is already separated into the trophectoderm (TE) which goes onto\n", "form the placenta and the inner cellular mass (ICM). The ICM further\n", "differentiates into the epiblast (EPI)—which gives rise to the endoderm,\n", "mesoderm and ectoderm—and the primitive endoderm (PE) which develops\n", "into the amniotic sack. Guo et al. (2010) selected 48 genes for\n", "expression measurement. They labelled the resulting cells and their\n", "labels are included as an aide to visualization.\n", "\n", "They first visualized their data using principal component analysis. 
In\n", "the first two principal components this fails to separate the domains.\n", "This is perhaps because the principal components are dominated by the\n", "variation in the 64 cell systems. This in turn may be because there are\n", "more cells from the data set in that regime, and it may be because the\n", "natural variation is greater. We first recreate their visualization\n", "using principal component analysis.\n", "\n", "In this notebook we will perform PCA on the original data, showing that\n", "the different regimes do not separate.\n", "\n", "Next we load in the data. We’ve provided a convenience function for\n", "loading in the data with `pods`. It is loaded in as a `pandas`\n", "DataFrame. This allows us to summarize it with the `describe` method." ], "id": "b3b516dc-7fe5-4971-af8f-2870c6f3e388" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ], "id": "765fd2fa-865e-401c-9f23-498ef6a96686" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pods.datasets.singlecell()\n", "Y = data['Y']\n", "Y.describe()" ], "id": "edc5eb0c-4f3d-4f94-af72-8e427467289a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Principal Component Analysis\n", "\n", "Now we follow Guo et al. (2010) in performing PCA on the data. Rather\n", "than calling a ‘PCA routine’, here we break the algorithm down into its\n", "steps: compute the data covariance, compute the eigenvalues and\n", "eigenvectors and sort according to magnitude of eigenvalue. Because we\n", "want to visualize the data, we’ve chosen to compute the eigenvectors of\n", "the *inner product matrix* rather than the covariance matrix. This\n", "allows us to plot the eigenvectors directly. However, this is less\n", "efficient (in this case because the number of genes is smaller than the\n", "number of data) than computing the eigendecomposition of the covariance\n", "matrix." ], "id": "bf34b36e-6ac0-42da-8923-69f01388aa97" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ], "id": "b39b923d-1547-4a84-9c0b-7b954ed31e6a" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# obtain a centred version of the data\n", "centredY = Y - Y.mean()\n", "# compute the inner product matrix\n", "C = np.dot(centredY,centredY.T)\n", "# perform the eigendecomposition\n", "V, U = np.linalg.eig(C)\n", "# sort eigenvalues and eigenvectors in descending order of eigenvalue\n", "ind = V.argsort()\n", "ev = V[ind[::-1]]\n", "U = U[:, ind[::-1]]" ], "id": "473c5413-99ac-43ee-96bd-649452a2fcec" }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize the result, we now construct a simple helper function. This\n", "will ensure that the plots have the same legends as the GP-LVM plots we\n", "use below."
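, "\n", "Before constructing the plotting helper, it is worth checking how much of the variance the leading components capture. A small check using the eigenvalues `ev` computed above (even for a symmetric matrix `np.linalg.eig` can return a complex dtype through round-off, so we take the real part):" ], "id": "a1e3d7c0-9b2f-4d4e-8a61-2f0c5d9e1a04" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# proportion of the total variance captured by each of the first five components\n", "ev_real = np.real(ev)\n", "print(ev_real[:5] / ev_real.sum())"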
], "id": "aef1b692-8ad2-4a85-ba85-a5d10f67cceb" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GPy\n", "import matplotlib.pyplot as plt" ], "id": "2ed2dbcf-2722-44a3-b9af-85fce0b17ef1" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_labels(ax, x, y, labels, symbols):\n", " \"\"\"A small helper function for plotting with labels\"\"\"\n", " # make sure labels are in order of input:\n", " ulabels = []\n", " for lab in labels:\n", " if not lab in ulabels:\n", " ulabels.append(lab)\n", " for i, label in enumerate(ulabels):\n", " symbol = symbols[i % len(symbols)]\n", " ind = labels == label\n", " ax.plot(x[ind], y[ind], symbol)\n", " ax.legend(ulabels)" ], "id": "16baa8b6-59d2-497a-b0fc-dde157cae201" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PCA Result\n", "\n", "Now, using the helper function we can plot the results with appropriate\n", "labels." ], "id": "d86cabb2-0024-43ef-9ed0-8948b57922ab" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai" ], "id": "34bc8056-8e26-4476-82e7-40c4a488f5cd" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "plot_labels(ax, U[:, 0], U[:, 1], data['labels'], '<>^vsd')\n", "\n", "mlai.write_figure('singlecell-data-pca.svg', directory='./datasets')" ], "id": "88703d09-b21c-4a2a-a4dd-0e4415278f17" }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: First two principal compoents of the Guo et al. (2010)\n", "blastocyst development data." ], "id": "d36be001-7ed5-482b-8499-24bd93812238" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayesian GP-LVM\n", "\n", "\\[edit\\]\n", "\n", "Here we show the new code that uses the Bayesian GP-LVM to fit the data.\n", "This means we can automatically determine the dimensionality of the\n", "model whilst fitting a non-linear dimensionality reduction. The\n", "approximations we use also mean that it is faster than the original\n", "GP-LVM." ], "id": "5f2f6c2f-51bd-43b4-9174-41d29f9255d3" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import GPy" ], "id": "29a77fc5-48a1-4fbb-bd66-8da251931ca4" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel=GPy.kern.RBF(5,ARD=1)+GPy.kern.Bias(5)\n", "model = GPy.models.BayesianGPLVM(Y.values, 5, num_inducing=15, kernel=kernel)\n", "model.optimize(messages=True)" ], "id": "e9520e0d-b66d-4971-a507-2e78409147b2" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai" ], "id": "20c59b9d-6732-48d5-b6e1-91ba7c62463c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "model.plot_latent(ax=ax, labels=data['labels'], marker='<>^vsd')\n", "\n", "mlai.write_figure('singlecell-bayes-gplvm.svg', directory='./gplvm')" ], "id": "b4fedf96-c50b-4fd4-af31-d418d8191d06" }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: Visualisation of the Guo et al. (2010) blastocyst development\n", "data with the Bayesian GP-LVM." 
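, "\n", "The `plot_latent` call above uses GPy’s built-in plotting. As an alternative sketch (not part of the original notebook, and assuming the RBF part of the additive kernel is reachable as `model.kern.rbf`, the usual naming in GPy), you can pick out the two most relevant latent dimensions from the ARD lengthscales yourself and reuse the `plot_labels` helper from the PCA section:" ], "id": "a1e3d7c0-9b2f-4d4e-8a61-2f0c5d9e1a05" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# smaller RBF lengthscale = more relevant latent dimension under ARD\n", "lengthscales = np.asarray(model.kern.rbf.lengthscale)\n", "top2 = np.argsort(lengthscales)[:2]\n", "latent_mean = np.asarray(model.X.mean)\n", "\n", "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "plot_labels(ax, latent_mean[:, top2[0]], latent_mean[:, top2[1]],\n", "            data['labels'], '<>^vsd')"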
], "id": "3d50ddb9-2c2a-496f-88f6-0b00ba8b6085" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "model.kern.plot_ARD(ax=ax)\n", "\n", "mlai.write_figure('singlecell-bayes-gplvm-ard.svg', directory='./gplvm')" ], "id": "88eb4792-d165-4a91-93bf-d5454aceefe3" }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Figure: The ARD parameters of the Bayesian GP-LVM for the Guo et al.\n", "(2010) blastocyst development data.\n", "\n", "This gives a really nice result. Broadly speaking two latent dimensions\n", "dominate the representation. When we visualize using these two\n", "dimensions we can see the entire cell phylogeny laid out nicely in the\n", "two dimensions." ], "id": "d6d393fc-da9b-4fc8-a5f9-15ad92a6d5f3" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thanks!\n", "\n", "For more information on these subjects and more you might want to check\n", "the following resources.\n", "\n", "- twitter: [@lawrennd](https://twitter.com/lawrennd)\n", "- podcast: [The Talking Machines](http://thetalkingmachines.com)\n", "- newspaper: [Guardian Profile\n", " Page](http://www.theguardian.com/profile/neil-lawrence)\n", "- blog:\n", " [http://inverseprobability.com](http://inverseprobability.com/blog.html)" ], "id": "b678e7c9-b1a1-40b8-9813-1cf99895a0d1" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ], "id": "9ba4614d-9c00-4969-aab8-197ffd460dd0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Damianou, A., 2015. Deep Gaussian processes and variational propagation\n", "of uncertainty (PhD thesis). University of Sheffield.\n", "\n", "Damianou, A., Titsias, M.K., Lawrence, N.D., 2016. Variational inference\n", "for latent variables and uncertain inputs in Gaussian processes. Journal\n", "of Machine Learning Research 17.\n", "\n", "de Campos, T.E., Babu, B.R., Varma, M., 2009. Character recognition in\n", "natural images, in: Proceedings of the Fourth International Conference\n", "on Computer Vision Theory and Applications - Volume 2: VISAPP,\n", "(VISIGRAPP 2009). INSTICC; SciTePress, pp. 273–280.\n", "\n", "\n", "Ek, C.H., Rihan, J., Torr, P.H.S., Rogez, G., Lawrence, N.D., 2008a.\n", "Ambiguity modeling in latent spaces, in: Popescu-Belis, A.,\n", "Stiefelhagen, R. (Eds.), Machine Learning for Multimodal Interaction\n", "(MLMI 2008), LNCS. Springer-Verlag, pp. 62–73.\n", "\n", "Ek, C.H., Torr, P.H.S., Lawrence, N.D., 2008b. Gaussian process latent\n", "variable models for human pose estimation, in: Popescu-Belis, A.,\n", "Renals, S., Bourlard, H. (Eds.), Machine Learning for Multimodal\n", "Interaction (MLMI 2007), LNCS. Springer-Verlag, Brno, Czech Republic,\n", "pp. 132–143. \n", "\n", "Guo, G., Huss, M., Tong, G.Q., Wang, C., Sun, L.L., Clarke, N.D.,\n", "Robsonemail, P., 2010. Resolution of cell fate decisions revealed by\n", "single-cell gene expression analysis from zygote to blastocyst.\n", "Developmental Cell 18, 675–685.\n", "\n", "\n", "Klami, A., Kaski, S., n.d. Local dependent components analysis.\n", "\n", "Klami, A., Kaski, S., 2008. Probabilistic approach to detecting\n", "dependencies between data sets. Neurocomputing 72, 39–46.\n", "\n", "Leen, G., Fyfe, C., 2006. A Gaussian process latent variable model\n", "formulation of canonical correlation analysis, in: European Symposium on\n", "Artificial Neural Networks. Bruges (Belgium).\n", "\n", "Navaratnam, R., Fitzgibbon, A., Cipolla, R., 2007. 
The joint manifold\n", "model for semi-supervised multi-valued regression, in: IEEE\n", "International Conference on Computer Vision (ICCV). IEEE Computer\n", "Society Press.\n", "\n", "Salimbeni, H., Deisenroth, M., 2017. [Doubly stochastic variational\n", "inference for deep Gaussian\n", "processes](http://papers.nips.cc/paper/7045-doubly-stochastic-variational-inference-for-deep-gaussian-processes.pdf),\n", "in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R.,\n", "Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information\n", "Processing Systems 30. Curran Associates, Inc., pp. 4591–4602.\n", "\n", "Shon, A.P., Grochow, K., Hertzmann, A., Rao, R.P.N., n.d. Learning\n", "shared latent structure for image synthesis and robotic imitation.\n", "\n", "Titsias, M.K., Lawrence, N.D., 2010. Bayesian Gaussian process latent\n", "variable model. pp. 844–851.\n", "\n", "Tucker, L.R., 1958. An inter-battery method of factor analysis.\n", "Psychometrika 23, 111–136.\n", "\n", "Virtanen, S., Klami, A., Kaski, S., n.d. Bayesian CCA via group\n", "sparsity." ], "id": "d6205f57-7716-4aea-9103-4deb80d1a03d" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": {} }