{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dimensionality Reduction: Latent Variable Modelling\n", "\n", "### 17th November 2015 Neil D. Lawrence\n", "\n", "So far in our classes we have focussed mainly on regression problems, which are examples of supervised learning. We have considered the relationship between the likelihood and the objective function and we have shown how we can find paramters by maximizing the likelihood (equivalent to minimizing the objective function) and in the last session we saw how we can *marginalize* the parameters in a process known as Bayesian inference.\n", "\n", "Now we are going to turn to a different form of learning, commonly known as *unsupervised* learning. In unsupervised learning our data isn't necessarily labelled in any form, but we want models that give us a better understanding of the data. We've actually seen an example of this already with [*matrix factorization* for collaborative filtering](./week2.ipynb), which we introduces in the context of *objective functions*. Now we will introduce a more probabilistic approach to such models, specifically we are interested in *latent variable* modelling.\n", "\n", "## Latent Variables\n", "\n", "Latent means hidden, and hidden variables are simply *unobservable* variables. The idea of a latent variable is crucial to the concept of artificial intelligence, machine learning and experimental design. A latent variable could take many forms. We might observe a man walking along a road with a large bag of clothes and we might *infer* that the man is walking to the laundrette. Our observations are a highly complex data space, the response in our eyes is processed through our visual cortex, the combination of the indidivuals limb movememnts and the direction they are walking in all conflate in our heads to cause us to infer that (perhaps) the individual is going to the laundrette. We don't *know* that the man is walking to the laundrette, but we have a model of the world that suggests that it's a likely outcome for the very complex data. In some ways the latent variable can be seen as a *compression* of this very complex scene. If I were writing a book, I might write that \"A man tripped over whilst walking to the laundrette\". In the reader's mind an image of a man, perhaps laden with dirty clothes, may occur. All these ideas come from our expectations of the world around us. We can make further inference about the man, some of it perhaps plausible others less so. The man may be going to the laundrette because his washing machine is broken, or because he doesn't have a large enough flat to have a washing machine, or because he's carrying a duvet, or because he doesn't like ironing. All of these may *increase* in probability given our observation, but they are still *latent* variables. Unless we follow the man back to his appartment, or start making other enquirires about the man, we don't know the true answer. \n", "\n", "It's clear that to do inference about any complex system we *must* include latent variables. Latent variables are extremely powerful. In robotics, they are used to represent the *state* of the robot. The state of the robot may include its position (in x, y coordinates) its speed, its direction of facing. How are *these* variables unknown to the robot? Well the robot only posesses *sensors*, it can make observations of the nearest object in a certain direction, and it may have a map of its environment. 
If we represent the state of the robot as its position on a map, it may be uncertain of that position. If you go walking or running in the hills around Sheffield, you can take a very high quality Ordnance Survey map with you. However, unless you are a really excellent orienteer, when you are far from any given landmark, you will probably be *uncertain* about your true position on the map. These states are also latent variables. \n", "\n", "In statistical analysis of experiments you try to control for each aspect of the experiment, in particular by *randomization*. So if I'm interested in the ability of a particular fertilizer to improve the yield of a particular plant, I may design an experiment where I apply the fertilizer to some plants (the treatment group) and withhold the fertilizer from others (the control group). I then test to see whether the yield from the treatment group is better (or worse) than that from the control group. I may find that I have an excellent yield for the treatment group. However, what if I'd (unknowingly) planted all my treatment plants in a sunny part of the field, and all the control plants in a shady part of the field? That would also be a latent variable, in this case known as a *confounder*. In statistical experimental design *randomization* is used to attempt to eliminate the correlated effects of these confounders: you aim to ensure that if these confounders *do* exist their effects are not correlated with treatment and control. This is known as a [randomized control trial](http://en.wikipedia.org/wiki/Randomized_controlled_trial). \n", "\n", "Greek philosophers worried a great deal about what was knowable and what was unknowable. Adherents of [philosophical Skepticism](http://en.wikipedia.org/wiki/Skepticism) were inspired by the idea that since your senses sometimes give you contradictory information, they cannot be trusted, and in extreme cases they chose to *ignore* their senses. This is an acknowledgement that very often the true state of the world cannot be known with precision. Unfortunately, these philosophers didn't have a good understanding of probability, so they were unable to encapsulate their ideas through a *degree* of belief.\n", "\n", "We often use language to express the compression of a complex behavior or pattern in a simpler way; for example we talk about motives as a useful distillation of a perhaps very complex pattern of behavior. In physics we use principles of causation and simple laws to describe the world around us. Such motives or underlying principles are difficult to observe directly; our conclusions about them emerge over a period of time by observing indirect consequences of the latent variables.\n", "\n", "Epistemic uncertainty allows us to deal with these worries by associating our degree of belief about the state of the world with a probability distribution. This core idea underpins state space modelling, probabilistic graphical models and the wider field of latent variable modelling. In this session we are going to explore the idea in a simple linear system and see how it relates to *factor analysis* and *principal component analysis*.\n", "\n", "## Your Personality\n", "\n", "At the beginning of the 20th century there was a great deal of interest amongst psychologists in formalizing patterns of thought. The approach they used became known as factor analysis. The principle is that we observe a potentially high dimensional vector of characteristics about an individual.
To formalize this, social scientists designed questionnaires. We can envisage many questions that we may ask, but the assumption is that underlying these questions there are only a few traits that dictate the behavior. These models are known as latent trait models and the analysis is sometimes known as factor analysis. The idea is that there are a few characteristic traits that we are looking to discern. These traits or factors can be extracted by assimilating the high dimensional characteristics of the individual into a few latent factors. \n", "\n", "### Factor Analysis Model\n", "\n", "This leads us to consider the following model: if we are given a high dimensional vector of features (perhaps questionnaire answers) associated with an individual, $\\mathbf{y}$, we assume that these features are actually generated from a low dimensional vector of latent traits, or latent variables, which determine the personality.\n", "$$\n", "\\mathbf{y} = \\mathbf{f}(\\mathbf{x}) + \\boldsymbol{\\epsilon}\n", "$$\n", "where $\\mathbf{f}(\\mathbf{x})$ is a *vector valued* function that is dependent on the latent traits and $\\boldsymbol{\\epsilon}$ is some corrupting noise. For simplicity, we assume that the function is given by a *linear* relationship,\n", "$$\n", "\\mathbf{f}(\\mathbf{x}) = \\mathbf{W}\\mathbf{x}\n", "$$\n", "where we have introduced a matrix $\\mathbf{W}$ that is sometimes referred to as the *factor loadings*, but which we also immediately see is related to our *multivariate linear regression* models from the [previous session on linear regression](./week3.ipynb). That is because our vector valued function is of the form\n", "$$\n", "\\mathbf{f}(\\mathbf{x}) = \\begin{bmatrix} f_1(\\mathbf{x}) \\\\ f_2(\\mathbf{x}) \\\\ \\vdots \\\\ f_p(\\mathbf{x})\\end{bmatrix}\n", "$$\n", "where there are $p$ features associated with the individual. If we consider any of these functions individually we have a prediction function that looks like a regression model,\n", "$$\n", "f_j(\\mathbf{x}) = \\mathbf{w}_{j, :}^\\top \\mathbf{x},\n", "$$\n", "for each element of the vector valued function, where $\\mathbf{w}_{j, :}$ is the $j$th row of the matrix $\\mathbf{W}$. In that context each row of $\\mathbf{W}$ is a vector of *regression weights*. This is a multiple input and multiple output regression: our inputs (or covariates) have dimensionality greater than one and our outputs (or response variables) also have dimensionality greater than one. Just as in a standard regression, we are assuming that we don't observe the function directly (note that this *also* makes the function a *type* of latent variable), but we observe some corrupted variant of the function, where the corruption is given by $\\boldsymbol{\\epsilon}$. Just as in linear regression we can assume that this corruption is given by Gaussian noise, where the noise for the $j$th element of $\\mathbf{y}$ is given by,\n", "$$\n", "\\epsilon_j \\sim \\mathcal{N}(0, \\sigma^2_j).\n", "$$\n", "Of course, just as in a regression problem we also need to make an assumption across the individual data points to form our full likelihood. Our data set now consists of many observations of $\\mathbf{y}$ for different individuals. We store these observations in a *design matrix*, $\\mathbf{Y}$, where each *row* of $\\mathbf{Y}$ contains the observation for one individual.\n" ] },
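{ "cell_type": "markdown", "metadata": {}, "source": [ "Before assembling the full design matrix, the following cell simulates the model for a single individual. It is only an illustrative sketch: the dimensions, the loadings $\\mathbf{W}$ and the noise levels are made up, and the cell is not part of the derivation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(0)                  # reproducibility for this illustration\n", "q, p = 2, 5                        # two latent traits, five observed features (arbitrary)\n", "W = np.random.randn(p, q)          # factor loadings\n", "x = np.random.randn(q)             # latent traits for one individual\n", "sigma = 0.1*np.ones(p)             # per-feature noise standard deviations\n", "epsilon = sigma*np.random.randn(p) # corrupting noise\n", "y = np.dot(W, x) + epsilon         # observed feature vector\n", "\n", "# each element of f(x) = Wx is a separate linear regression on the latent traits\n", "print(np.allclose(np.dot(W, x), [np.dot(W[j, :], x) for j in range(p)]))\n" ] },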
{ "cell_type": "markdown", "metadata": {}, "source": [ "To emphasize that $\\mathbf{y}$ is a vector derived from a row of $\\mathbf{Y}$ we represent the observation of the features associated with the $i$th individual by $\\mathbf{y}_{i, :}$, and place each individual in our data matrix,\n", "$$\n", "\\mathbf{Y} = \\begin{bmatrix} \\mathbf{y}_{1, :}^\\top \\\\ \\mathbf{y}_{2, :}^\\top \\\\ \\vdots \\\\ \\mathbf{y}_{n, :}^\\top\\end{bmatrix},\n", "$$\n", "where we have $n$ data points. Our data matrix therefore has $n$ rows and $p$ columns. The point to notice here is that each data observation appears as a row vector in the design matrix (thus the transpose operation inside the brackets). Our prediction functions are now actually a *matrix valued* function, \n", "$$\n", "\\mathbf{F} = \\mathbf{X}\\mathbf{W}^\\top,\n", "$$\n", "where for each matrix the data points are in the rows and the data features are in the columns. This implies that if we have $q$ inputs to the function we have $\\mathbf{F}\\in \\Re^{n\\times p}$, $\\mathbf{W} \\in \\Re^{p \\times q}$ and $\\mathbf{X} \\in \\Re^{n\\times q}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assignment Question 1\n", "\n", "Show that, given all the definitions above, if,\n", "$$\n", "\\mathbf{F} = \\mathbf{X}\\mathbf{W}^\\top\n", "$$\n", "and the elements of the matrix valued function $\\mathbf{F}$ are given by \n", "$$\n", "f_{i, j} = f_j(\\mathbf{x}_{i, :}),\n", "$$\n", "where $\\mathbf{x}_{i, :}$ is the $i$th row of the latent variables, $\\mathbf{X}$, then\n", "$$\n", "f_j(\\mathbf{x}_{i, :}) = \\mathbf{w}_{j, :}^\\top \\mathbf{x}_{i, :}.\n", "$$\n", "\n", "*10 marks*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question 1 Answer\n", "\n", "Write your answer to the question in this box." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Latent Variables\n", "\n", "The difference between this model and a multiple output regression is that in the regression case we are provided with the covariates $\\mathbf{X}$; here they are *latent variables*. These variables are unknown. \n", "\n", "Just as we have done in the past for unknowns, we now treat them with a probability distribution.\n" ] },
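{ "cell_type": "markdown", "metadata": {}, "source": [ "Before doing so, it is worth sanity checking the shape conventions above numerically. The sketch below uses arbitrary values of $n$, $p$ and $q$ and randomly drawn matrices; it is an illustration only, not part of the derivation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(1)\n", "n, p, q = 10, 5, 2          # data points, data features, latent dimensions (arbitrary)\n", "X = np.random.randn(n, q)   # latent variables, one row per data point\n", "W = np.random.randn(p, q)   # factor loadings, one row per data feature\n", "F = np.dot(X, W.T)          # matrix valued function\n", "\n", "print(F.shape)              # (n, p): data points in rows, features in columns\n", "# element (i, j) is the jth regression function applied to the ith latent point\n", "i, j = 3, 4\n", "print(np.isclose(F[i, j], np.dot(W[j, :], X[i, :])))\n" ] },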
{ "cell_type": "markdown", "metadata": {}, "source": [ "In *factor analysis* we assume that the latent variables have a Gaussian density which is independent both across the latent variables associated with the different data points and across those associated with the different data features, so we have,\n", "$$\n", "x_{i,j} \\sim \\mathcal{N}(0, 1),\n", "$$\n", "and we can write the density governing the latent variable associated with a single point as,\n", "$$\n", "\\mathbf{x}_{i, :} \\sim \\mathcal{N}(\\mathbf{0}, \\mathbf{I}).\n", "$$\n", "If we consider the values of the function for the $i$th data point as\n", "$$\n", "\\mathbf{f}_{i, :} = \\mathbf{f}(\\mathbf{x}_{i, :}) = \\mathbf{W}\\mathbf{x}_{i, :} \n", "$$\n", "then we can use the rules for multivariate Gaussian relationships to write that\n", "$$\n", "\\mathbf{f}_{i, :} \\sim \\mathcal{N}(\\mathbf{0}, \\mathbf{W}\\mathbf{W}^\\top)\n", "$$\n", "which implies that the distribution for $\\mathbf{y}_{i, :}$ is given by\n", "$$\n", "\\mathbf{y}_{i, :} \\sim \\mathcal{N}(\\mathbf{0}, \\mathbf{W}\\mathbf{W}^\\top + \\boldsymbol{\\Sigma})\n", "$$\n", "where $\\boldsymbol{\\Sigma}$ is the covariance of the noise variable, $\\boldsymbol{\\epsilon}_{i, :}$, which for factor analysis is a diagonal matrix (because we have assumed that the noise was *independent* across the features),\n", "$$\n", "\\boldsymbol{\\Sigma} = \\begin{bmatrix}\\sigma^2_{1} & 0 & 0 & 0\\\\\n", " 0 & \\sigma^2_{2} & 0 & 0\\\\\n", " 0 & 0 & \\ddots & 0\\\\\n", " 0 & 0 & 0 & \\sigma^2_p\\end{bmatrix}.\n", "$$\n", "For completeness, we could also add in a *mean* for the data vector $\\boldsymbol{\\mu}$, \n", "$$\n", "\\mathbf{y}_{i, :} = \\mathbf{W} \\mathbf{x}_{i, :} + \\boldsymbol{\\mu} + \\boldsymbol{\\epsilon}_{i, :}\n", "$$\n", "which would give our marginal distribution for $\\mathbf{y}_{i, :}$ a mean $\\boldsymbol{\\mu}$. However, the maximum likelihood solution for $\\boldsymbol{\\mu}$ turns out to equal the empirical mean of the data,\n", "$$\n", "\\boldsymbol{\\mu} = \\frac{1}{n} \\sum_{i=1}^n \\mathbf{y}_{i, :},\n", "$$\n", "*regardless* of the form of the covariance, $\\mathbf{C} = \\mathbf{W}\\mathbf{W}^\\top + \\boldsymbol{\\Sigma}$. As a result it is very common to simply preprocess the data and ensure it is zero mean. We will follow that convention for this session.\n", "\n", "Because the prior density over the latent variables is independent across data points and the likelihood is independent across data points, the marginal likelihood here is also independent over the data points. \n", " \n", "Factor analysis was developed mainly in psychology and the social sciences for understanding personality and intelligence. [Charles Spearman](http://en.wikipedia.org/wiki/Charles_Spearman) was concerned with the measurements of \"the abilities of man\" and is credited with the earliest version of factor analysis. \n", " \n", "## Principal Component Analysis\n", "\n", "In 1933 [Harold Hotelling](http://en.wikipedia.org/wiki/Harold_Hotelling) published on *principal component analysis*, the first mention of this approach. Hotelling's inspiration was to provide a mathematical foundation for factor analysis methods that were by then widely used within psychology and the social sciences. His model was a factor analysis model, but he considered the noiseless 'limit' of the model. In other words he took $\\sigma^2_i \\rightarrow 0$ so that he had\n", "$$\n", "\\mathbf{y}_{i, :} \\sim \\lim_{\\sigma^2 \\rightarrow 0} \\mathcal{N}(\\mathbf{0}, \\mathbf{W}\\mathbf{W}^\\top + \\sigma^2 \\mathbf{I}).\n", "$$\n" ] },
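{ "cell_type": "markdown", "metadata": {}, "source": [ "The following cell constructs the covariance of this limiting model for arbitrary (made up) values of $p$ and $q$, so that the problem with the limit, discussed next, can be seen numerically. It is a sketch for illustration only.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(2)\n", "p, q = 5, 2                      # arbitrary dimensions for illustration\n", "W = np.random.randn(p, q)\n", "C = np.dot(W, W.T)               # covariance in the sigma^2 -> 0 limit\n", "\n", "print(np.linalg.matrix_rank(C))  # at most q, so C is singular when q < p\n", "print(np.linalg.det(C))          # (numerically) zero: the Gaussian is degenerate\n" ] },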
{ "cell_type": "markdown", "metadata": {}, "source": [ "The paper had some unfortunate effects. In particular, the resulting model is no longer valid probabilistically, because the covariance of this Gaussian is 'degenerate': $\\mathbf{W}\\mathbf{W}^\\top$ has rank of at most $q$, and since $q<p$ the covariance matrix is singular and the density cannot be normalized. The maximum likelihood solution is still recovered from an eigenvalue problem on the data covariance, but note that the data we consider below is high dimensional, with $p>n$, so you need to consider how to do the larger eigenvalue problem efficiently without large demands on computer memory.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is actually quite high dimensional, and solving the eigenvalue problem in the high dimensional space can take some time. At this point we turn to a neat trick: you don't have to solve the full eigenvalue problem for the $p\\times p$ covariance, you can choose instead to solve the related eigenvalue problem in the $n \\times n$ space, and in this case $n=200$, which is much smaller than $p$.\n", "\n", "The original eigenvalue problem has the form\n", "$$\n", "\\mathbf{Y}^\\top\\mathbf{Y} \\mathbf{U} = \\mathbf{U}\\boldsymbol{\\Lambda}.\n", "$$\n", "But if we premultiply by $\\mathbf{Y}$ then we can solve,\n", "$$\n", "\\mathbf{Y}\\mathbf{Y}^\\top\\mathbf{Y} \\mathbf{U} = \\mathbf{Y}\\mathbf{U}\\boldsymbol{\\Lambda}\n", "$$\n", "and it turns out that we can write\n", "$$\n", "\\mathbf{U}^\\prime = \\mathbf{Y} \\mathbf{U} \\boldsymbol{\\Lambda}^{-\\frac{1}{2}}\n", "$$\n", "where $\\mathbf{U}^\\prime$ is an orthonormal matrix because\n", "$$\n", "\\left.\\mathbf{U}^\\prime\\right.^\\top\\mathbf{U}^\\prime = \\boldsymbol{\\Lambda}^{-\\frac{1}{2}}\\mathbf{U}^\\top\\mathbf{Y}^\\top\\mathbf{Y} \\mathbf{U} \\boldsymbol{\\Lambda}^{-\\frac{1}{2}}\n", "$$\n", "and since $\\mathbf{U}$ diagonalises $\\mathbf{Y}^\\top\\mathbf{Y}$, \n", "$$\n", "\\mathbf{U}^\\top\\mathbf{Y}^\\top\\mathbf{Y} \\mathbf{U} = \\boldsymbol{\\Lambda},\n", "$$\n", "then \n", "$$\n", "\\left.\\mathbf{U}^\\prime\\right.^\\top\\mathbf{U}^\\prime = \\mathbf{I}.\n", "$$\n", "In other words, the columns of $\\mathbf{U}^\\prime$ are orthonormal eigenvectors of the $n\\times n$ matrix $\\mathbf{Y}\\mathbf{Y}^\\top$ with the same eigenvalues, so we can solve the smaller eigenvalue problem for $\\mathbf{U}^\\prime$ and recover $\\mathbf{U} = \\mathbf{Y}^\\top \\mathbf{U}^\\prime \\boldsymbol{\\Lambda}^{-\\frac{1}{2}}$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Olivetti Faces\n", "\n", "You too can create your own eigenfaces. In this example we load in the 'Olivetti Face' data set, a small data set of 200 faces from the [Olivetti Research Laboratory](http://en.wikipedia.org/wiki/Olivetti_Research_Laboratory). Below we load in the data and display one of the faces (the face indexed by `display_index`, here the first face in the data set). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib.cm as cm # import color map\n", "import pods\n", "\n", "data = pods.datasets.olivetti_glasses()\n", "Y = data['X'] # the data set is set up for classification; here we model the images themselves\n", "lbls = data['Y']\n", "display_index = 0\n", "plt.imshow(np.reshape(Y[display_index, :].flatten(), (64, 64)).T, cmap=cm.Greys_r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that to display the face we had to reshape the appropriate row of the data matrix. This is because the images are turned into vectors by stacking the columns of the image on top of each other to form a vector. The operation\n", "```python\n", "im = np.reshape(Y[1, :].flatten(), (64, 64)).T\n", "```\n", "recovers the original image as a matrix `im` by using the `np.reshape` function to return the vector to a matrix.\n", "\n", "### Visualizing the Eigenvectors\n", "\n", "The $j$th retained eigenvector is stored in the $j$th column of $\\mathbf{U}$. Each of these eigenvectors is associated with particular directions of variation in the original data.\n" ] },
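{ "cell_type": "markdown", "metadata": {}, "source": [ "The code below relies on two helper functions, `ppca` and `posterior`. The next cell gives a minimal probabilistic PCA sketch that is consistent with how they are called here, but it is only one possible implementation: it assumes that `ppca` should return the top $q$ eigenvectors $\\mathbf{U}$ of the sample covariance, the factor lengths $\\ell_j = \\sqrt{\\lambda_j - \\sigma^2}$ and the residual variance $\\sigma^2$, and that `posterior` should return the mean and covariance of the latent variables. It uses the $n\\times n$ trick derived above to avoid the full $p\\times p$ eigenvalue problem.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def ppca(Y, q):\n", "    'Sketch of probabilistic PCA using the n x n inner product matrix (assumes p > n).'\n", "    Y = np.asarray(Y, dtype=float)\n", "    n, p = Y.shape\n", "    Ycent = Y - Y.mean(axis=0)                        # centre the data\n", "    # eigendecomposition of the n x n matrix Ycent Ycent^T / n\n", "    lambd, V = np.linalg.eigh(np.dot(Ycent, Ycent.T)/n)\n", "    order = np.argsort(lambd)[::-1]                   # largest eigenvalues first\n", "    lambd, V = lambd[order], V[:, order]\n", "    sigma2 = lambd[q:].sum()/(p - q)                  # average discarded variance\n", "    # recover the p-dimensional eigenvectors: U = Ycent^T V Lambda^{-1/2}\n", "    U = np.dot(Ycent.T, V[:, :q])/np.sqrt(n*lambd[:q])\n", "    ell = np.sqrt(np.maximum(lambd[:q] - sigma2, 0))  # factor lengths\n", "    return U, ell, sigma2\n", "\n", "def posterior(Y, U, ell, sigma2):\n", "    'Sketch of the posterior mean and covariance of the latent variables.'\n", "    Y = np.asarray(Y, dtype=float)\n", "    Ycent = Y - Y.mean(axis=0)\n", "    mu_x = np.dot(Ycent, U)*ell/(ell**2 + sigma2)     # posterior means, one row per data point\n", "    C_x = np.diag(sigma2/(ell**2 + sigma2))           # shared posterior covariance\n", "    return mu_x, C_x" ] },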
{ "cell_type": "markdown", "metadata": {}, "source": [ "Principal component analysis implies that we can reconstruct any face by using a weighted sum of these eigenvectors, where the weights for each face are given by the relevant vector of latent variables, $\\mathbf{x}_{i, :}$, and the diagonal elements of the matrix $\\mathbf{L}$. We can visualize the eigenvectors $\\mathbf{U}$ as images by performing the same reshape operation we used to recover the image associated with a data point above. Below we do this for the first nine eigenvectors of the Olivetti Faces data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "width = 3\n", "height = 3\n", "q = width*height\n", "fig, ax = plt.subplots(width, height, figsize=(12,12))\n", "\n", "U, ell, sigma2 = ppca(Y, q)\n", "lat = 0\n", "for i in range(width):\n", "    for j in range(height):\n", "        ax[i, j].imshow(np.reshape(U[:, lat].flatten(), (64, 64)).T, cmap=cm.Greys_r)\n", "        ax[i, j].set_title('Principal Component ' + str(lat+1))\n", "        lat += 1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reconstruction\n", "\n", "We can now attempt to reconstruct a given training point from these eigenvectors. As we mentioned above, the reconstruction is dependent on the value of the latent variables and the weights from the matrix $\\mathbf{L}$. First let's compute the value of the latent variables for the point we want to reconstruct. Then we'll use them to compute the weightings of each of the eigenvectors." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mu_x, C_x = posterior(Y, U, ell, sigma2)\n", "reconstruction_weights = mu_x[display_index, :]*ell\n", "print(reconstruction_weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This vector of reconstruction weights is applied to the 'template images' given by the eigenvectors to give us our reconstruction. Below we weight these templates and combine them to form the reconstructed image, and show it alongside the original image for comparison." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 2, figsize=(12,6))\n", "ax[0].imshow(np.reshape(Y[display_index, :].flatten(), (64, 64)).T, cmap=cm.Greys_r)\n", "ax[0].set_title('Original Example no ' + str(display_index))\n", "ax[1].imshow(np.reshape(np.dot(U, reconstruction_weights) + Y.mean(axis=0)[None, :], (64, 64)).T, cmap=cm.Greys_r)\n", "ax[1].set_title('Reconstruction of Example from ' + str(len(reconstruction_weights)) + ' Latent Variables')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reconstruction is a bit blurry; it can be improved by increasing the number of template images used (i.e. by increasing the *latent dimensionality*). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gene Expression\n", "\n", "Each of the cells in your body stores your entire genetic code in your DNA, but at any one moment it is only 'expressing' a small portion of that code. Your cells are mainly constructed of protein, and these proteins are manufactured by first transcribing the DNA to RNA and then translating the RNA to protein. If the DNA is the cell's hard drive, then one role of the RNA is to act like a cache of data that is being read from the hard drive at any one time. Gene expression arrays allow us to measure the quantity of different types of RNA in the cell, effectively analyzing what's in the cache (although we have to destroy the cell or the tissue to access it).
A gene expression experiment often consists of a time course or a series of experiments that characterise the gene expression of cells at any given time.\n", "\n", "We will now load in one of the earliest gene expression data sets, from a [1998 paper by Spellman et al.](http://www.ncbi.nlm.nih.gov/pubmed/9843569). It consists of gene expression measurements of over six thousand genes in a range of conditions for brewer's yeast. The experiment was designed for understanding the cell cycle. The expectation is that there should be oscillating signals inside the cell.\n", "\n", "First we extract the principal components of the gene expression." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load in data and replace missing values with zero\n", "data = pods.datasets.spellman_yeast_cdc15()\n", "Y = data['Y'].fillna(0)\n", "q = 5\n", "U, ell, sigma2 = ppca(Y, q)\n", "mu_x, C_x = posterior(Y, U, ell, sigma2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, looking through, we find that there is indeed a latent variable that appears to oscillate at approximately the right frequency. The 4th latent dimension (`index=3`) can be plotted across the time course as follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(mu_x[:, 3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This reveals an oscillating shape. We can see which genes correspond to this shape by examining the associated column of $\\mathbf{U}$. Let's augment our data matrix with this score." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gene_list = Y.T\n", "gene_list['oscillation'] = np.abs(U[:, 3]) # magnitude of each gene's loading on the 4th latent dimension\n", "gene_list.sort_values(by='oscillation', ascending=False).index[:4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can look up the first few genes in this list, which now ranks the genes according to how strongly they respond to the fourth latent dimension. [The NCBI gene database](http://www.ncbi.nlm.nih.gov/gene/) allows us to search for the function of these genes. Looking at the function of the four genes that respond most strongly to the fourth latent dimension, they are all genes that encode [histone](http://en.wikipedia.org/wiki/Histone) proteins. The histone is the support scaffold for the DNA that ensures it folds correctly within the cell, creating the nucleosomes. It seems to make sense that production of histone proteins should be strongly correlated with the cell cycle, as when the cell divides it needs to create a large number of histone proteins for the duplicated nucleosomes. The relevant links giving the descriptions of each gene are given here: [YDR224C](http://www.ncbi.nlm.nih.gov/gene/851810), [YDR225W](http://www.ncbi.nlm.nih.gov/gene/851811), [YBL003C](http://www.ncbi.nlm.nih.gov/gene/852283) and [YNL030W](http://www.ncbi.nlm.nih.gov/gene/855701)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }