{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Objective Functions: A Simple Example with Matrix Factorisation\n", "### [Neil D. Lawrence](http://inverseprobability.com), University of Sheffield\n", "### 2015-10-06\n", "\n", "**Abstract**: In this session we introduce the notion of objective functions and show\n", "how they can be used in a simple recommender system based on *matrix\n", "factorisation*.\n", "\n", "$$\n", "\\newcommand{\\tk}[1]{}\n", "%\\newcommand{\\tk}[1]{\\textbf{TK}: #1}\n", "\\newcommand{\\Amatrix}{\\mathbf{A}}\n", "\\newcommand{\\KL}[2]{\\text{KL}\\left( #1\\,\\|\\,#2 \\right)}\n", "\\newcommand{\\Kaast}{\\kernelMatrix_{\\mathbf{ \\ast}\\mathbf{ \\ast}}}\n", "\\newcommand{\\Kastu}{\\kernelMatrix_{\\mathbf{ \\ast} \\inducingVector}}\n", "\\newcommand{\\Kff}{\\kernelMatrix_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kfu}{\\kernelMatrix_{\\mappingFunctionVector \\inducingVector}}\n", "\\newcommand{\\Kuast}{\\kernelMatrix_{\\inducingVector \\bf\\ast}}\n", "\\newcommand{\\Kuf}{\\kernelMatrix_{\\inducingVector \\mappingFunctionVector}}\n", "\\newcommand{\\Kuu}{\\kernelMatrix_{\\inducingVector \\inducingVector}}\n", "\\newcommand{\\Kuui}{\\Kuu^{-1}}\n", "\\newcommand{\\Qaast}{\\mathbf{Q}_{\\bf \\ast \\ast}}\n", "\\newcommand{\\Qastf}{\\mathbf{Q}_{\\ast \\mappingFunction}}\n", "\\newcommand{\\Qfast}{\\mathbf{Q}_{\\mappingFunctionVector \\bf \\ast}}\n", "\\newcommand{\\Qff}{\\mathbf{Q}_{\\mappingFunctionVector \\mappingFunctionVector}}\n", "\\newcommand{\\aMatrix}{\\mathbf{A}}\n", "\\newcommand{\\aScalar}{a}\n", "\\newcommand{\\aVector}{\\mathbf{a}}\n", "\\newcommand{\\acceleration}{a}\n", "\\newcommand{\\bMatrix}{\\mathbf{B}}\n", "\\newcommand{\\bScalar}{b}\n", "\\newcommand{\\bVector}{\\mathbf{b}}\n", "\\newcommand{\\basisFunc}{\\phi}\n", "\\newcommand{\\basisFuncVector}{\\boldsymbol{ \\basisFunc}}\n", "\\newcommand{\\basisFunction}{\\phi}\n", "\\newcommand{\\basisLocation}{\\mu}\n", 
"\\newcommand{\\basisMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\basisScalar}{\\basisFunction}\n", "\\newcommand{\\basisVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\activationFunction}{\\phi}\n", "\\newcommand{\\activationMatrix}{\\boldsymbol{ \\Phi}}\n", "\\newcommand{\\activationScalar}{\\basisFunction}\n", "\\newcommand{\\activationVector}{\\boldsymbol{ \\basisFunction}}\n", "\\newcommand{\\bigO}{\\mathcal{O}}\n", "\\newcommand{\\binomProb}{\\pi}\n", "\\newcommand{\\cMatrix}{\\mathbf{C}}\n", "\\newcommand{\\cbasisMatrix}{\\hat{\\boldsymbol{ \\Phi}}}\n", "\\newcommand{\\cdataMatrix}{\\hat{\\dataMatrix}}\n", "\\newcommand{\\cdataScalar}{\\hat{\\dataScalar}}\n", "\\newcommand{\\cdataVector}{\\hat{\\dataVector}}\n", "\\newcommand{\\centeredKernelMatrix}{\\mathbf{ \\MakeUppercase{\\centeredKernelScalar}}}\n", "\\newcommand{\\centeredKernelScalar}{b}\n", "\\newcommand{\\centeredKernelVector}{\\centeredKernelScalar}\n", "\\newcommand{\\centeringMatrix}{\\mathbf{H}}\n", "\\newcommand{\\chiSquaredDist}[2]{\\chi_{#1}^{2}\\left(#2\\right)}\n", "\\newcommand{\\chiSquaredSamp}[1]{\\chi_{#1}^{2}}\n", "\\newcommand{\\conditionalCovariance}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\coregionalizationMatrix}{\\mathbf{B}}\n", "\\newcommand{\\coregionalizationScalar}{b}\n", "\\newcommand{\\coregionalizationVector}{\\mathbf{ \\coregionalizationScalar}}\n", "\\newcommand{\\covDist}[2]{\\text{cov}_{#2}\\left(#1\\right)}\n", "\\newcommand{\\covSamp}[1]{\\text{cov}\\left(#1\\right)}\n", "\\newcommand{\\covarianceScalar}{c}\n", "\\newcommand{\\covarianceVector}{\\mathbf{ \\covarianceScalar}}\n", "\\newcommand{\\covarianceMatrix}{\\mathbf{C}}\n", "\\newcommand{\\covarianceMatrixTwo}{\\boldsymbol{ \\Sigma}}\n", "\\newcommand{\\croupierScalar}{s}\n", "\\newcommand{\\croupierVector}{\\mathbf{ \\croupierScalar}}\n", "\\newcommand{\\croupierMatrix}{\\mathbf{ \\MakeUppercase{\\croupierScalar}}}\n", "\\newcommand{\\dataDim}{p}\n", "\\newcommand{\\dataIndex}{i}\n", 
"\\newcommand{\\dataIndexTwo}{j}\n", "\\newcommand{\\dataMatrix}{\\mathbf{Y}}\n", "\\newcommand{\\dataScalar}{y}\n", "\\newcommand{\\dataSet}{\\mathcal{D}}\n", "\\newcommand{\\dataStd}{\\sigma}\n", "\\newcommand{\\dataVector}{\\mathbf{ \\dataScalar}}\n", "\\newcommand{\\decayRate}{d}\n", "\\newcommand{\\degreeMatrix}{\\mathbf{ \\MakeUppercase{\\degreeScalar}}}\n", "\\newcommand{\\degreeScalar}{d}\n", "\\newcommand{\\degreeVector}{\\mathbf{ \\degreeScalar}}\n", "% Already defined by latex\n", "%\\newcommand{\\det}[1]{\\left|#1\\right|}\n", "\\newcommand{\\diag}[1]{\\text{diag}\\left(#1\\right)}\n", "\\newcommand{\\diagonalMatrix}{\\mathbf{D}}\n", "\\newcommand{\\diff}[2]{\\frac{\\text{d}#1}{\\text{d}#2}}\n", "\\newcommand{\\diffTwo}[2]{\\frac{\\text{d}^2#1}{\\text{d}#2^2}}\n", "\\newcommand{\\displacement}{x}\n", "\\newcommand{\\displacementVector}{\\textbf{\\displacement}}\n", "\\newcommand{\\distanceMatrix}{\\mathbf{ \\MakeUppercase{\\distanceScalar}}}\n", "\\newcommand{\\distanceScalar}{d}\n", "\\newcommand{\\distanceVector}{\\mathbf{ \\distanceScalar}}\n", "\\newcommand{\\eigenvaltwo}{\\ell}\n", "\\newcommand{\\eigenvaltwoMatrix}{\\mathbf{L}}\n", "\\newcommand{\\eigenvaltwoVector}{\\mathbf{l}}\n", "\\newcommand{\\eigenvalue}{\\lambda}\n", "\\newcommand{\\eigenvalueMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\eigenvalueVector}{\\boldsymbol{ \\lambda}}\n", "\\newcommand{\\eigenvector}{\\mathbf{ \\eigenvectorScalar}}\n", "\\newcommand{\\eigenvectorMatrix}{\\mathbf{U}}\n", "\\newcommand{\\eigenvectorScalar}{u}\n", "\\newcommand{\\eigenvectwo}{\\mathbf{v}}\n", "\\newcommand{\\eigenvectwoMatrix}{\\mathbf{V}}\n", "\\newcommand{\\eigenvectwoScalar}{v}\n", "\\newcommand{\\entropy}[1]{\\mathcal{H}\\left(#1\\right)}\n", "\\newcommand{\\errorFunction}{E}\n", "\\newcommand{\\expDist}[2]{\\left<#1\\right>_{#2}}\n", "\\newcommand{\\expSamp}[1]{\\left<#1\\right>}\n", "\\newcommand{\\expectation}[1]{\\left\\langle #1 \\right\\rangle }\n", 
"\\newcommand{\\expectationDist}[2]{\\left\\langle #1 \\right\\rangle _{#2}}\n", "\\newcommand{\\expectedDistanceMatrix}{\\mathcal{D}}\n", "\\newcommand{\\eye}{\\mathbf{I}}\n", "\\newcommand{\\fantasyDim}{r}\n", "\\newcommand{\\fantasyMatrix}{\\mathbf{ \\MakeUppercase{\\fantasyScalar}}}\n", "\\newcommand{\\fantasyScalar}{z}\n", "\\newcommand{\\fantasyVector}{\\mathbf{ \\fantasyScalar}}\n", "\\newcommand{\\featureStd}{\\varsigma}\n", "\\newcommand{\\gammaCdf}[3]{\\mathcal{GAMMA CDF}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaDist}[3]{\\mathcal{G}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gammaSamp}[2]{\\mathcal{G}\\left(#1,#2\\right)}\n", "\\newcommand{\\gaussianDist}[3]{\\mathcal{N}\\left(#1|#2,#3\\right)}\n", "\\newcommand{\\gaussianSamp}[2]{\\mathcal{N}\\left(#1,#2\\right)}\n", "\\newcommand{\\given}{|}\n", "\\newcommand{\\half}{\\frac{1}{2}}\n", "\\newcommand{\\heaviside}{H}\n", "\\newcommand{\\hiddenMatrix}{\\mathbf{ \\MakeUppercase{\\hiddenScalar}}}\n", "\\newcommand{\\hiddenScalar}{h}\n", "\\newcommand{\\hiddenVector}{\\mathbf{ \\hiddenScalar}}\n", "\\newcommand{\\identityMatrix}{\\eye}\n", "\\newcommand{\\inducingInputScalar}{z}\n", "\\newcommand{\\inducingInputVector}{\\mathbf{ \\inducingInputScalar}}\n", "\\newcommand{\\inducingInputMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\inducingScalar}{u}\n", "\\newcommand{\\inducingVector}{\\mathbf{ \\inducingScalar}}\n", "\\newcommand{\\inducingMatrix}{\\mathbf{U}}\n", "\\newcommand{\\inlineDiff}[2]{\\text{d}#1/\\text{d}#2}\n", "\\newcommand{\\inputDim}{q}\n", "\\newcommand{\\inputMatrix}{\\mathbf{X}}\n", "\\newcommand{\\inputScalar}{x}\n", "\\newcommand{\\inputSpace}{\\mathcal{X}}\n", "\\newcommand{\\inputVals}{\\inputVector}\n", "\\newcommand{\\inputVector}{\\mathbf{ \\inputScalar}}\n", "\\newcommand{\\iterNum}{k}\n", "\\newcommand{\\kernel}{\\kernelScalar}\n", "\\newcommand{\\kernelMatrix}{\\mathbf{K}}\n", "\\newcommand{\\kernelScalar}{k}\n", "\\newcommand{\\kernelVector}{\\mathbf{ \\kernelScalar}}\n", 
"\\newcommand{\\kff}{\\kernelScalar_{\\mappingFunction \\mappingFunction}}\n", "\\newcommand{\\kfu}{\\kernelVector_{\\mappingFunction \\inducingScalar}}\n", "\\newcommand{\\kuf}{\\kernelVector_{\\inducingScalar \\mappingFunction}}\n", "\\newcommand{\\kuu}{\\kernelVector_{\\inducingScalar \\inducingScalar}}\n", "\\newcommand{\\lagrangeMultiplier}{\\lambda}\n", "\\newcommand{\\lagrangeMultiplierMatrix}{\\boldsymbol{ \\Lambda}}\n", "\\newcommand{\\lagrangian}{L}\n", "\\newcommand{\\laplacianFactor}{\\mathbf{ \\MakeUppercase{\\laplacianFactorScalar}}}\n", "\\newcommand{\\laplacianFactorScalar}{m}\n", "\\newcommand{\\laplacianFactorVector}{\\mathbf{ \\laplacianFactorScalar}}\n", "\\newcommand{\\laplacianMatrix}{\\mathbf{L}}\n", "\\newcommand{\\laplacianScalar}{\\ell}\n", "\\newcommand{\\laplacianVector}{\\mathbf{ \\ell}}\n", "\\newcommand{\\latentDim}{q}\n", "\\newcommand{\\latentDistanceMatrix}{\\boldsymbol{ \\Delta}}\n", "\\newcommand{\\latentDistanceScalar}{\\delta}\n", "\\newcommand{\\latentDistanceVector}{\\boldsymbol{ \\delta}}\n", "\\newcommand{\\latentForce}{f}\n", "\\newcommand{\\latentFunction}{u}\n", "\\newcommand{\\latentFunctionVector}{\\mathbf{ \\latentFunction}}\n", "\\newcommand{\\latentFunctionMatrix}{\\mathbf{ \\MakeUppercase{\\latentFunction}}}\n", "\\newcommand{\\latentIndex}{j}\n", "\\newcommand{\\latentScalar}{z}\n", "\\newcommand{\\latentVector}{\\mathbf{ \\latentScalar}}\n", "\\newcommand{\\latentMatrix}{\\mathbf{Z}}\n", "\\newcommand{\\learnRate}{\\eta}\n", "\\newcommand{\\lengthScale}{\\ell}\n", "\\newcommand{\\rbfWidth}{\\ell}\n", "\\newcommand{\\likelihoodBound}{\\mathcal{L}}\n", "\\newcommand{\\likelihoodFunction}{L}\n", "\\newcommand{\\locationScalar}{\\mu}\n", "\\newcommand{\\locationVector}{\\boldsymbol{ \\locationScalar}}\n", "\\newcommand{\\locationMatrix}{\\mathbf{M}}\n", "\\newcommand{\\variance}[1]{\\text{var}\\left( #1 \\right)}\n", "\\newcommand{\\mappingFunction}{f}\n", "\\newcommand{\\mappingFunctionMatrix}{\\mathbf{F}}\n", 
"\\newcommand{\\mappingFunctionTwo}{g}\n", "\\newcommand{\\mappingFunctionTwoMatrix}{\\mathbf{G}}\n", "\\newcommand{\\mappingFunctionTwoVector}{\\mathbf{ \\mappingFunctionTwo}}\n", "\\newcommand{\\mappingFunctionVector}{\\mathbf{ \\mappingFunction}}\n", "\\newcommand{\\scaleScalar}{s}\n", "\\newcommand{\\mappingScalar}{w}\n", "\\newcommand{\\mappingVector}{\\mathbf{ \\mappingScalar}}\n", "\\newcommand{\\mappingMatrix}{\\mathbf{W}}\n", "\\newcommand{\\mappingScalarTwo}{v}\n", "\\newcommand{\\mappingVectorTwo}{\\mathbf{ \\mappingScalarTwo}}\n", "\\newcommand{\\mappingMatrixTwo}{\\mathbf{V}}\n", "\\newcommand{\\maxIters}{K}\n", "\\newcommand{\\meanMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanScalar}{\\mu}\n", "\\newcommand{\\meanTwoMatrix}{\\mathbf{M}}\n", "\\newcommand{\\meanTwoScalar}{m}\n", "\\newcommand{\\meanTwoVector}{\\mathbf{ \\meanTwoScalar}}\n", "\\newcommand{\\meanVector}{\\boldsymbol{ \\meanScalar}}\n", "\\newcommand{\\mrnaConcentration}{m}\n", "\\newcommand{\\naturalFrequency}{\\omega}\n", "\\newcommand{\\neighborhood}[1]{\\mathcal{N}\\left( #1 \\right)}\n", "\\newcommand{\\neilurl}{http://inverseprobability.com/}\n", "\\newcommand{\\noiseMatrix}{\\boldsymbol{ E}}\n", "\\newcommand{\\noiseScalar}{\\epsilon}\n", "\\newcommand{\\noiseVector}{\\boldsymbol{ \\epsilon}}\n", "\\newcommand{\\norm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\normalizedLaplacianMatrix}{\\hat{\\mathbf{L}}}\n", "\\newcommand{\\normalizedLaplacianScalar}{\\hat{\\ell}}\n", "\\newcommand{\\normalizedLaplacianVector}{\\hat{\\mathbf{ \\ell}}}\n", "\\newcommand{\\numActive}{m}\n", "\\newcommand{\\numBasisFunc}{m}\n", "\\newcommand{\\numComponents}{m}\n", "\\newcommand{\\numComps}{K}\n", "\\newcommand{\\numData}{n}\n", "\\newcommand{\\numFeatures}{K}\n", "\\newcommand{\\numHidden}{h}\n", "\\newcommand{\\numInducing}{m}\n", "\\newcommand{\\numLayers}{\\ell}\n", "\\newcommand{\\numNeighbors}{K}\n", "\\newcommand{\\numSequences}{s}\n", "\\newcommand{\\numSuccess}{s}\n", 
"\\newcommand{\\numTasks}{m}\n", "\\newcommand{\\numTime}{T}\n", "\\newcommand{\\numTrials}{S}\n", "\\newcommand{\\outputIndex}{j}\n", "\\newcommand{\\paramVector}{\\boldsymbol{ \\theta}}\n", "\\newcommand{\\parameterMatrix}{\\boldsymbol{ \\Theta}}\n", "\\newcommand{\\parameterScalar}{\\theta}\n", "\\newcommand{\\parameterVector}{\\boldsymbol{ \\parameterScalar}}\n", "\\newcommand{\\partDiff}[2]{\\frac{\\partial#1}{\\partial#2}}\n", "\\newcommand{\\precisionScalar}{j}\n", "\\newcommand{\\precisionVector}{\\mathbf{ \\precisionScalar}}\n", "\\newcommand{\\precisionMatrix}{\\mathbf{J}}\n", "\\newcommand{\\pseudotargetScalar}{\\widetilde{y}}\n", "\\newcommand{\\pseudotargetVector}{\\mathbf{ \\pseudotargetScalar}}\n", "\\newcommand{\\pseudotargetMatrix}{\\mathbf{ \\widetilde{Y}}}\n", "\\newcommand{\\rank}[1]{\\text{rank}\\left(#1\\right)}\n", "\\newcommand{\\rayleighDist}[2]{\\mathcal{R}\\left(#1|#2\\right)}\n", "\\newcommand{\\rayleighSamp}[1]{\\mathcal{R}\\left(#1\\right)}\n", "\\newcommand{\\responsibility}{r}\n", "\\newcommand{\\rotationScalar}{r}\n", "\\newcommand{\\rotationVector}{\\mathbf{ \\rotationScalar}}\n", "\\newcommand{\\rotationMatrix}{\\mathbf{R}}\n", "\\newcommand{\\sampleCovScalar}{s}\n", "\\newcommand{\\sampleCovVector}{\\mathbf{ \\sampleCovScalar}}\n", "\\newcommand{\\sampleCovMatrix}{\\mathbf{s}}\n", "\\newcommand{\\scalarProduct}[2]{\\left\\langle{#1},{#2}\\right\\rangle}\n", "\\newcommand{\\sign}[1]{\\text{sign}\\left(#1\\right)}\n", "\\newcommand{\\sigmoid}[1]{\\sigma\\left(#1\\right)}\n", "\\newcommand{\\singularvalue}{\\ell}\n", "\\newcommand{\\singularvalueMatrix}{\\mathbf{L}}\n", "\\newcommand{\\singularvalueVector}{\\mathbf{l}}\n", "\\newcommand{\\sorth}{\\mathbf{u}}\n", "\\newcommand{\\spar}{\\lambda}\n", "\\newcommand{\\trace}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\BasalRate}{B}\n", "\\newcommand{\\DampingCoefficient}{C}\n", "\\newcommand{\\DecayRate}{D}\n", "\\newcommand{\\Displacement}{X}\n", 
"\\newcommand{\\LatentForce}{F}\n", "\\newcommand{\\Mass}{M}\n", "\\newcommand{\\Sensitivity}{S}\n", "\\newcommand{\\basalRate}{b}\n", "\\newcommand{\\dampingCoefficient}{c}\n", "\\newcommand{\\mass}{m}\n", "\\newcommand{\\sensitivity}{s}\n", "\\newcommand{\\springScalar}{\\kappa}\n", "\\newcommand{\\springVector}{\\boldsymbol{ \\kappa}}\n", "\\newcommand{\\springMatrix}{\\boldsymbol{ \\mathcal{K}}}\n", "\\newcommand{\\tfConcentration}{p}\n", "\\newcommand{\\tfDecayRate}{\\delta}\n", "\\newcommand{\\tfMrnaConcentration}{f}\n", "\\newcommand{\\tfVector}{\\mathbf{ \\tfConcentration}}\n", "\\newcommand{\\velocity}{v}\n", "\\newcommand{\\sufficientStatsScalar}{g}\n", "\\newcommand{\\sufficientStatsVector}{\\mathbf{ \\sufficientStatsScalar}}\n", "\\newcommand{\\sufficientStatsMatrix}{\\mathbf{G}}\n", "\\newcommand{\\switchScalar}{s}\n", "\\newcommand{\\switchVector}{\\mathbf{ \\switchScalar}}\n", "\\newcommand{\\switchMatrix}{\\mathbf{S}}\n", "\\newcommand{\\tr}[1]{\\text{tr}\\left(#1\\right)}\n", "\\newcommand{\\loneNorm}[1]{\\left\\Vert #1 \\right\\Vert_1}\n", "\\newcommand{\\ltwoNorm}[1]{\\left\\Vert #1 \\right\\Vert_2}\n", "\\newcommand{\\onenorm}[1]{\\left\\vert#1\\right\\vert_1}\n", "\\newcommand{\\twonorm}[1]{\\left\\Vert #1 \\right\\Vert}\n", "\\newcommand{\\vScalar}{v}\n", "\\newcommand{\\vVector}{\\mathbf{v}}\n", "\\newcommand{\\vMatrix}{\\mathbf{V}}\n", "\\newcommand{\\varianceDist}[2]{\\text{var}_{#2}\\left( #1 \\right)}\n", "% Already defined by latex\n", "%\\newcommand{\\vec}{#1:}\n", "\\newcommand{\\vecb}[1]{\\left(#1\\right):}\n", "\\newcommand{\\weightScalar}{w}\n", "\\newcommand{\\weightVector}{\\mathbf{ \\weightScalar}}\n", "\\newcommand{\\weightMatrix}{\\mathbf{W}}\n", "\\newcommand{\\weightedAdjacencyMatrix}{\\mathbf{A}}\n", "\\newcommand{\\weightedAdjacencyScalar}{a}\n", "\\newcommand{\\weightedAdjacencyVector}{\\mathbf{ \\weightedAdjacencyScalar}}\n", "\\newcommand{\\onesVector}{\\mathbf{1}}\n", "\\newcommand{\\zerosVector}{\\mathbf{0}}\n", 
"$$\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\\addreading{@Rogers:book11}{Section 1.1.3} for loss functions.\n", "\n", "\\reading\n", "In \\refnotes{the introduction}{intro-probability} we saw how we could\n", "load in a data set to pandas and use it for some simple data processing.\n", "We computed variaous probabilities on the data and I encouraged you to\n", "think about what sort of probabilities you need for prediction. This\n", "week we are going to take a slightly different tack.\n", "\n", "Broadly speaking there are two dominating approaches to machine learning\n", "problems. We started to consider the first approach last week:\n", "constructing models based on defining the relationship between variables\n", "using probabilities. This week we will consider the second approach:\n", "which involves defining an *objective function* and optimizing it. What\n", "do we mean by an objective function? An objective function could be an\n", "*error function* a *cost function* or a *benefit* function. In\n", "evolutionary computing they are called *fitness* functions. But the idea\n", "is always the same. We write down a mathematical equation which is then\n", "optimized to do the learning. The equation should be a function of the\n", "*data* and our model *parameters*. We have a choice when optimizing,\n", "either minimize or maximize. To avoid confusion, in the optimization\n", "field, we always choose to minimize the function. If we have function\n", "that we would like to maximize, we simply choose to minimize the\n", "negative of that function.\n", "\n", "So for this lab session, we are going to ignore probabilities, but don't\n", "worry, they will return!\n", "\n", "This week we are going to try and build a simple movie recommender\n", "system using an objective function. 
To do this, the first thing I'd like\n", "you to do is to install some software we've written for sharing\n", "information across Google documents.\n", "\n", "## `pods` \\[edit\\]\n", "\n", "In Sheffield we created a suite of software tools for 'Open Data\n", "Science'. Open data science is an approach to sharing code, models and\n", "data that should make it easier for companies, health professionals and\n", "scientists to gain access to data science techniques.\n", "\n", "You can also check my blog post on [Open Data\n", "Science](http://inverseprobability.com/2014/07/01/open-data-science).\n", "\n", "The software can be installed using" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install pods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "from the command prompt where you can access your Python installation.\n", "\n", "The code is also available on GitHub: \n", "\n", "Once `pods` is installed, it can be imported in the usual manner." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Movie Body Count Example \\[edit\\]\n", "\n", "There is a crisis in the movie industry: deaths are occurring on a\n", "massive scale. In every feature film the body count is mounting up. But\n", "what is the cause of all these deaths? Let's try and investigate.\n", "\n", "For our first example of data science, we take inspiration from work by\n", "[researchers at NJIT](http://www.theswarmlab.com/r-vs-python-round-2/).\n", "The researchers were comparing the qualities of Python and R (my brief\n", "thoughts on the subject are available in a Google+ post here:\n", "https://plus.google.com/116220678599902155344/posts/5iKyqcrNN68). 
They\n", "put together a database of results from the \"Internet Movie\n", "Database\" and the [Movie Body Count](http://www.moviebodycounts.com/)\n", "website, which will allow us to do some preliminary investigation.\n", "\n", "We will make use of data that has already been 'scraped' from the [Movie\n", "Body Count](http://www.moviebodycounts.com/) website. Code and the data\n", "are available at [a GitHub\n", "repository](https://github.com/sjmgarnier/R-vs-%20Python/tree/master/Deadliest%20movies%20scrape/code).\n", "Git is a version control system and GitHub is a website that hosts code\n", "that can be accessed through git. By sharing the code publicly through\n", "GitHub, the authors are licensing the code publicly and allowing you to\n", "access and edit it. As well as accessing the code via GitHub you can\n", "also [download the zip\n", "file](https://github.com/sjmgarnier/R-vs-Python/archive/master.zip).\n", "\n", "For ease of use we've packaged this data set in the `pods` library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pods.datasets.movie_body_count()['Y']\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once it is loaded in, the data can be summarized using the `describe`\n", "method in pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Jupyter and the Jupyter notebook it is possible to see a list of all\n", "possible functions and attributes by typing the name of the object\n", "followed by `.`. For example, in the above case if we type\n", "`data.` it shows the columns available (these are attributes in\n", "pandas dataframes) such as `Body_Count`, and also functions, such as\n", "`describe()`.\n", "\n", "For functions we can also see the documentation about the function by\n", "following the name with a question mark. 
This will open a box with\n", "documentation at the bottom which can be closed with the x button." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.describe?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The film deaths data is stored in an object known as a 'data frame'.\n", "Data frames come from the statistical family of programming languages\n", "based on `S`, the most widely used of which is\n", "[`R`](http://en.wikipedia.org/wiki/R_(programming_language)). The data\n", "frame gives us a convenient object for manipulating data. The `describe`\n", "method summarizes which columns there are in the data frame and gives us\n", "counts, means, standard deviations and percentiles for the values in\n", "those columns. To access a column directly we can write" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(data['Year'])\n", "#print(data['Body_Count'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows the number of deaths per film across the years. We can plot\n", "the data as follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this ensures the plot appears in the web browser\n", "%matplotlib inline \n", "import matplotlib.pyplot as plt # this imports the plotting library in Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(data['Year'], data['Body_Count'], 'rx')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may be curious what the arguments we give to `plt.plot` are for; now\n", "is the perfect time to look at the documentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot?" 
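] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In short, the third argument to `plt.plot` is a format string combining a\n", "colour and a marker: `'rx'` above means red crosses. As a quick\n", "illustration, we could replot the same data with blue circle markers\n", "instead." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# replot the same data: 'b' selects blue, 'o' selects circle markers\n", "plt.plot(data['Year'], data['Body_Count'], 'bo')"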
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We immediately note that some films have a lot of deaths, which prevents\n", "us from seeing the detail of the main body of films. First let's identify\n", "the films with the most deaths." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[data['Body_Count']>200]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are using the command `data['Body_Count']>200` to index the\n", "films in the pandas data frame which have over 200 deaths. To sort them\n", "in order we can also use the `sort_values` command. The result of this\n", "expression on its own is a data series of `True` and `False` values. However, when\n", "it is passed to the `data` data frame it returns a new data frame which\n", "contains only those values for which the data series is `True`. We can\n", "also sort the result: to sort by the values in the\n", "`Body_Count` column in *descending* order we use the following command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[data['Body_Count']>200].sort_values(by='Body_Count', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now see that the 'Lord of the Rings' is a large outlier with a very\n", "large number of kills. We can try and determine how much of an outlier it\n", "is by histogramming the data.\n", "\n", "## Plotting the Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data['Body_Count'].hist(bins=20) # histogram the data with 20 bins.\n", "plt.title('Histogram of Film Kill Count')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 2\n", "\n", "Read on the internet about the following Python libraries: `numpy`,\n", "`matplotlib`, `scipy` and `pandas`. What functionality does each provide\n", "Python? 
What is the `pylab` library and how does it relate to the other\n", "libraries?\n", "\n", "*10 marks*\n", "\n", "### Write your answer to Question 2 here\n", "\n", "We could try and remove these outliers, but another approach would be to\n", "plot the logarithm of the counts against the year." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(data['Year'], data['Body_Count'], 'rx')\n", "ax = plt.gca() # obtain a handle to the current axis\n", "ax.set_yscale('log') # use a logarithmic death scale\n", "# give the plot some titles and labels\n", "plt.title('Film Deaths against Year')\n", "plt.ylabel('deaths')\n", "plt.xlabel('year')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note a few things. We are interacting with our data. In particular, we\n", "are replotting the data according to what we have learned so far. We are\n", "using the programming language as a *scripting* language to give the\n", "computer one command or another, and then the next command we enter is\n", "dependent on the result of the previous. This is a very different\n", "paradigm to classical software engineering. In classical software\n", "engineering we normally write many lines of code (entire object classes\n", "or functions) before compiling the code and running it. Our approach is\n", "more similar to the approach we take whilst debugging. Historically,\n", "researchers interacted with data using a *console*, a command-line\n", "window which allowed command entry. The notebook format we are using is\n", "slightly different. Each of the code entry boxes acts like a separate\n", "console window. We can move up and down the notebook and run each part\n", "in a different order. The *state* of the program is always as we left it\n", "after running the previous part.\n", "\n", "### Question 1\n", "\n", "Data ethics. If you find data available on the internet, can you simply\n", "use it without consequence? 
If you are given data by a fellow researcher\n", "can you publish that data online?\n", "\n", "*10 marks*\n", "\n", "### Write your answer to Question 1 here\n", "\n", "# Recommender Systems \\[edit\\]\n", "\n", "A recommender system aims to make suggestions for items (films, books,\n", "other commercial products) given what it knows about users' tastes. The\n", "recommendation engine needs to represent the *taste* of all the users\n", "and the *characteristics* of each object.\n", "\n", "A common way of organizing objects is to place related objects\n", "spatially close together. For example, in a library we try and put books\n", "that are on related topics near to each other on the shelves. One system\n", "for doing this is known as [Dewey Decimal\n", "Classification](http://en.wikipedia.org/wiki/Dewey_Decimal_Classification).\n", "In the Dewey Decimal Classification system (which dates from 1876) each\n", "subject is given a number (in fact it's a decimal number). For example,\n", "the field of Natural Sciences and Mathematics is given numbers which\n", "start with 500. Subjects based on Computer Science are given numbers\n", "which start with 004 and works on the 'mathematical principles' of Computer\n", "Science are given the series 004.0151 (which we might store as 4.0151 on\n", "a computer). Whilst it's a classification system, the books in the\n", "library are typically laid out in the same order as the numbers, so we\n", "might expect that neighbouring numbers represent books that are related\n", "in subject. That seems to be exactly what we want for representing\n", "films too. Could we somehow represent each film's subject according to a\n", "number? In a similar way we could then imagine representing users with a\n", "list of numbers that represent things that each user is interested in.\n", "\n", "Actually a one dimensional representation of a subject can be very\n", "awkward. 
To see this, let's have a look at the Dewey Decimal\n", "Classification numbers for the 900s, which are listed as 'History and\n", "Geography'. We will focus on subjects in the 940s which can be found in\n", "this list from [Nova Southeastern\n", "University](http://www.nova.edu/library/help/misc/lc_dewey/dewey900.html#40).\n", "Whilst the ordering for places is somewhat sensible, it is also rather\n", "arbitrary. In the 940s we have Europe listed from 940-949, Asia listed\n", "from 950-959 and Africa listed from 960-969. Whilst it's true that Asia\n", "borders Europe, Africa is also very close, and the history of the Roman\n", "Empire spreads into [Carthage](http://en.wikipedia.org/wiki/Carthage)\n", "and later on Egypt. This image from Wikipedia shows a map of the\n", "Carthaginian Empire which fell after fighting with Rome.\n", "\n", "\n", "\n", "Figure: The Carthaginian Empire at its peak.\n", "\n", "We now need to make a decision about whether Roman Histories are\n", "European or African. Ideally we'd like them to be somewhere between the\n", "two, but we can't place them there in the Dewey Decimal system because\n", "between Europe and Africa is Asia, which has less to do with the Roman\n", "Empire than either Europe or Africa. Of course the fact that we've used\n", "a map provides a clue as to what to do next. Libraries are actually laid\n", "out on floors, so what if we were to use the spatial layout to organise\n", "the subjects of the books in two dimensions? Books on Geography could be\n", "laid out according to the parts of the world they refer to.\n", "\n", "Such complexities are very hard to encapsulate in one number, but\n", "inspired by the map example we can start considering how we might lay\n", "out films in two dimensions. Similarly, we can consider laying out a map\n", "of people's interests. If the two maps correspond to one another, the\n", "map of people could reflect where they might want to live in 'subject\n", "space'. 
We can think of representing people's tastes as where they might\n", "best like to sit in the library to easily access the books they are most\n", "interested in.\n", "\n", "## Inner Products for Representing Similarity\n", "\n", "Ideas like the above are good for gaining intuitions about what we might\n", "want, but one of the skills of data science is representing those\n", "ideas mathematically. Mathematical abstraction of a problem is one of\n", "the key ways in which we've been able to progress as a society.\n", "Understanding planetary motions, as well as those of the smallest\n", "molecule (to quote Laplace's [Philosophical Essay on\n", "Probabilities](http://books.google.co.uk/books?id=1YQPAAAAQAAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false))\n", "needed to be done mathematically. The right mathematical model in\n", "machine learning can be slightly more elusive, because constructing it\n", "is a two-stage process.\n", "\n", "1. We have to determine the right intuition for the system we want to\n", "    represent. Notions such as 'subject' and 'interest' are not\n", "    mathematically well defined, and even when we create a new\n", "    interpretation of what they might mean, each interpretation may have\n", "    its own weaknesses.\n", "\n", "2. Once we have our interpretation we can attempt to mathematically\n", "    formalize it. In our library interpretation, that's what we need to\n", "    do next.\n", "\n", "## The Library on an Infinite Plane\n", "\n", "Let's imagine a library which stores all the items we are interested in,\n", "not just books, but films and shopping items too. Such a library is\n", "likely to be very large, so we'll create it on an infinite two\n", "dimensional plane. This means we can use all the real numbers to\n", "represent the location of each item on the plane. 
For a two dimensional\n", "plane, we need to store the locations in a vector of numbers: we can\n", "decide that the $j$th item's location in the library is given by $$\n", "\\mathbf{v}_j = \\begin{bmatrix} v_{j,1} \\\\ v_{j,2}\\end{bmatrix},\n", "$$ where $v_{j,1}$ represents the $j$th item's location in the East-West\n", "direction (or the $x$-axis) and $v_{j,2}$ represents the $j$th item's\n", "location in the North-South direction (or the $y$-axis). Now we need to\n", "specify the location where each user sits so that all the items that\n", "interest them are nearby: we can also represent the $i$th user's\n", "location with a vector $$\n", "\\mathbf{u}_i = \\begin{bmatrix} u_{i,1} \\\\ u_{i,2}\\end{bmatrix}.\n", "$$ Finally, we need some way of recording a given user's affinity for a\n", "given item. This affinity might be the rating that the user gives the\n", "film. We can use $y_{i,j}$ to represent user $i$'s affinity for item\n", "$j$.\n", "\n", "For our film example we might imagine wanting to order films in a few\n", "ways. We could imagine organising films in the North-South direction\n", "according to how romantic they are. We could place the more romantic\n", "films further North and the less romantic films further South. For the\n", "East-West direction we could imagine ordering them according to how\n", "historic they are: we can imagine placing science fiction films to the\n", "East and historical drama to the West. In this case, fans of historical\n", "romances would be based in the North-West location, whilst fans of\n", "Science Fiction Action films might be located in the South-East (if we\n", "assume that 'Action' is the opposite of 'Romance', which is not\n", "necessarily the case). How do we lay out all these films? Have we got\n", "the right axes? In machine learning the answer is to 'let the data\n", "speak'. Use the data to try to obtain such a layout. 
To do this we first need to obtain the\n", "data.\n", "\n", "## Obtaining the Data \\[edit\\]\n", "\n", "We use functionality from the Open Data Science software library\n", "to obtain the data. This functionality involves some prewritten code\n", "which distributes to each of you a Google spreadsheet where you can rate\n", "movies that you've seen. For completeness the code follows. Try to read\n", "and understand the code, but don't run it! It has already been run\n", "centrally by me." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods\n", "import pandas as pd\n", "import numpy as np\n", "# This code assumes a movies DataFrame (with Film and Year columns) has already been loaded.\n", "user_data = pd.DataFrame(index=movies.index, columns=['title', 'year', 'rating', 'prediction'])\n", "user_data['title']=movies.Film\n", "user_data['year']=movies.Year\n", "\n", "accumulator=pods.lab.distributor(spreadsheet_title='COM4509/6509 Movie Ratings:', user_sep='\\t')\n", "# function to apply to data before giving it to user \n", "# Select 50 movies at random and order them according to year.\n", "max_movies = 50\n", "function = lambda x: x.loc[np.random.permutation(x.index)[:max_movies]].sort_values(by='year')\n", "accumulator.write(data_frame=user_data, comment='Film Ratings', \n", " function=function)\n", "accumulator.write_comment('Rate Movie Here (score 1-5)', row=1, column=4)\n", "accumulator.share(share_type='writer', send_notifications=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In your Google Docs account you should be able to find a spreadsheet\n", "called 'COM4509/6509 Movie Ratings: Your Name'. In the spreadsheet you\n", "should find a number of movies listed according to year. 
In the column\n", "titled 'rating', please place your rating using a score of 1-5 for *any*\n", "of the movies you've seen from the list.\n", "\n", "Once you have placed your ratings we can download the data from your\n", "spreadsheets to a central file where the ratings of the whole class can\n", "be stored. We will build an algorithm on these ratings and use them to\n", "make predictions for the rest of the class. Firstly, here's the code for\n", "reading the ratings from each of the spreadsheets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import os\n", "import pods\n", "\n", "accumulator = pods.lab.distributor(user_sep='\\t')\n", "data = accumulator.read(usecols=['rating'], dtype={'index':int, 'rating':np.float64}, header=2)\n", "\n", "for user in data:\n", " if data[user].rating.count()>0: # only add user if they rated something\n", " # add a new field to movies with that user's ratings.\n", " movies[user] = data[user]['rating']\n", "\n", "# Store the csv on disk where it will be shared through Dropbox.\n", "movies.to_csv(os.path.join(pods.lab.class_dir,'movies.csv'), index_label='index')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will convert our data structure into a form that is appropriate\n", "for processing. We will convert the `movies` object into a database\n", "which contains the movie, the user and the score using the following\n", "command."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run the line below if you are doing this task by self-study.\n", "pods.util.download_url('https://dl.dropboxusercontent.com/u/4347554/mlai_movies.csv', store_directory = 'class_movie', save_name='movies.csv')\n", "#pods.util.download_url('https://www.dropbox.com/s/s6gqvp9b383b59y/movies.csv?dl=0&raw=1', store_directory = 'class_movie', save_name='movies.csv')\n", "movies = pd.read_csv(os.path.join('class_movie', 'movies.csv'), encoding='latin-1').set_index('index')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 2\n", "\n", "The movies data is now in a data frame which contains one column for\n", "each user rating the movie. There are some entries that contain 'NaN'.\n", "What does the 'NaN' mean in this context?\n", "\n", "*5 marks*\n", "\n", "### Write your answer to Question 2 here\n", "\n", "## Processing the Data\n", "\n", "We will now prepare the data set for processing. To do this we are going\n", "to convert the data into a new format using the `melt` command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "user_names = list(set(movies.columns)-set(movies.columns[:9]))\n", "Y = pd.melt(movies.reset_index(), id_vars=['Film', 'index'], \n", " var_name='user', value_name='rating', \n", " value_vars=user_names)\n", "Y = Y.dropna(axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 3\n", "\n", "What is a pivot table? What does the `pandas` command `pd.melt` do?\n", "\n", "## Measuring Similarity \\[edit\\]\n", "\n", "We now need a measure for determining the similarity between the item\n", "and the user: how close the user is sitting to the item in the room if\n", "you like. 
We are going to use the inner product between the vector\n", "representing the item and the vector representing the user.\n", "\n", "An inner product (or [dot\n", "product](http://en.wikipedia.org/wiki/Dot_product)) between two vectors\n", "$\\mathbf{a}$ and $\\mathbf{b}$ is written as $\\mathbf{a}\\cdot\\mathbf{b}$.\n", "Or in vector notation we sometimes write it as\n", "$\\mathbf{a}^\\top\\mathbf{b}$. An inner product is simply the sum of the\n", "products of the elements of the two vectors, $$\n", "\\mathbf{a}^\\top\\mathbf{b} = \\sum_{i} a_i b_i.\n", "$$ The inner product can be seen as a measure of similarity. The inner\n", "product gives us the cosine of the angle between the two vectors\n", "multiplied by their lengths. For vectors of fixed lengths, the smaller\n", "the angle between the two vectors, the larger the inner product: $$\n", "\\mathbf{a}^\\top\\mathbf{b} = |\\mathbf{a}||\\mathbf{b}| \\cos(\\theta)\n", "$$ where $\\theta$ is the angle between the two vectors and $|\\mathbf{a}|$\n", "and $|\\mathbf{b}|$ are the respective lengths of the two vectors.\n", "\n", "Since we want each user to be sitting near the items they like, we want\n", "the inner product between a user and an item to be large whenever the\n", "user rates that item highly. We can do this by trying to force the inner\n", "product $\\mathbf{u}_i^\\top\\mathbf{v}_j$ to be similar to the rating given\n", "by the user, $y_{i,j}$. 
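These identities are easy to check numerically. Below is a minimal NumPy sketch (the two vectors are made up for illustration, not taken from the movie data) confirming that the sum-of-products form of the inner product agrees with the length-times-cosine form:

```python
import numpy as np

# Two illustrative vectors (hypothetical, not from the class data).
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Inner product as the sum of the products of the elements.
inner = np.dot(a, b)

# The same quantity from the lengths and the angle between the vectors.
length_a = np.linalg.norm(a)
length_b = np.linalg.norm(b)
cos_theta = inner / (length_a * length_b)
theta = np.arccos(cos_theta)
reconstructed = length_a * length_b * np.cos(theta)

print(inner, reconstructed)  # both 24.0, up to floating point
```

As the angle between the two vectors shrinks, `cos_theta` approaches 1 and the inner product approaches the product of the two lengths, which is why it works as a similarity measure.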
To ensure this we will use a least squares objective\n", "function over all the user ratings.\n", "\n", "## Objective Function\n", "\n", "The error function (or objective function, or cost function) we will\n", "choose is known as 'sum of squares': we will aim to minimize the sum of\n", "squared errors between the inner product of $\\mathbf{u}_i$ and\n", "$\\mathbf{v}_j$ and the observed score for the user/item pairing, given\n", "by $y_{i, j}$.\n", "\n", "The total objective function can be written as $$\n", "E(\\mathbf{U}, \\mathbf{V}) = \\sum_{i,j} s_{i,j} (y_{i,j} - \\mathbf{u}_i^\\top \\mathbf{v}_j)^2\n", "$$ where $s_{i,j}$ is an *indicator* variable that is 1 if user $i$ has\n", "rated item $j$ and is zero otherwise. Here $\\mathbf{U}$ is the matrix\n", "made up of all the vectors $\\mathbf{u}$, $$\n", "\\mathbf{U} = \\begin{bmatrix} \\mathbf{u}_1 \\dots \\mathbf{u}_n\\end{bmatrix}^\\top\n", "$$ where we note that the $i$th *row* of $\\mathbf{U}$ contains the vector\n", "associated with the $i$th user and $n$ is the total number of users.\n", "This form of matrix is known as a *design matrix*. Similarly, we define\n", "the matrix $$\n", "\\mathbf{V} = \\begin{bmatrix} \\mathbf{v}_1 \\dots \\mathbf{v}_m\\end{bmatrix}^\\top\n", "$$ where again the $j$th row of $\\mathbf{V}$ contains the vector\n", "associated with the $j$th item and $m$ is the total number of items in\n", "the data set.\n", "\n", "## Objective Optimization\n", "\n", "The idea is to minimize this objective. A standard, simple technique\n", "for minimizing an objective is *gradient descent* or *steepest descent*.\n", "In gradient descent we simply choose to update each parameter in the\n", "model by subtracting a multiple of the objective function's gradient\n", "with respect to the parameters. 
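The sum-of-squares objective above is straightforward to evaluate directly. Here is a minimal NumPy sketch with small random matrices; the names `U`, `V`, `Y` and the indicator matrix `S` are illustrative stand-ins, not part of the class code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 4, 5, 2              # users, items, latent dimension
U = rng.normal(size=(n, q))    # one row per user
V = rng.normal(size=(m, q))    # one row per item
Y = rng.normal(size=(n, m))    # toy rating matrix
S = (rng.random(size=(n, m)) < 0.5).astype(float)  # 1 where a rating exists

def objective(U, V, Y, S):
    # E(U, V) = sum_{i,j} s_ij (y_ij - u_i^T v_j)^2
    residual = Y - U @ V.T
    return np.sum(S * residual**2)

E = objective(U, V, Y, S)
print(E)
```

Note that if the ratings were generated exactly by the model, so that `Y = U @ V.T`, the objective would be zero: the objective measures how far the inner products are from the observed ratings, counting only the observed entries.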
So for a parameter $u_{k,\\ell}$ from the\n", "matrix $\\mathbf{U}$ we would have an update as follows: $$ \n", "u_{k,\\ell} \\leftarrow u_{k,\\ell} - \\eta \\frac{\\text{d}\n", "E(\\mathbf{U}, \\mathbf{V})}{\\text{d}u_{k,\\ell}} \n", "$$ where $\\eta$ (which is pronounced *eta* in English) is a Greek letter\n", "representing the *learning rate*.\n", "\n", "We can compute the gradient of the objective function with respect to\n", "$u_{k,\\ell}$ as $$\n", "\\frac{\\text{d}E(\\mathbf{U}, \\mathbf{V})}{\\text{d}u_{k,\\ell}} = -2\n", "\\sum_j s_{k,j}v_{j,\\ell}(y_{k, j} - \\mathbf{u}_k^\\top\\mathbf{v}_{j}).\n", "$$ Similarly, each parameter $v_{k,\\ell}$ of $\\mathbf{V}$ needs to be\n", "updated according to its gradient.\n", "\n", "### Question 4\n", "\n", "What is the gradient of the objective function with respect to\n", "$v_{k, \\ell}$? Write your answer in the box below, and explain which\n", "differentiation techniques you used to get there. You will be expected\n", "to justify your answer in class by oral questioning. Create a function\n", "for computing this gradient that is used in the algorithm below.\n", "\n", "*20 marks*\n", "\n", "### Write your answer to Question 4 here\n", "\n", "## Steepest Descent Algorithm\n", "\n", "In the steepest descent algorithm we aim to minimize the objective\n", "function by subtracting the gradient of the objective function from the\n", "parameters.\n", "\n", "## Initialisation\n", "\n", "To start with, though, we need initial values for the matrix $\\mathbf{U}$\n", "and the matrix $\\mathbf{V}$. Let's create them as `pandas` data frames\n", "and initialise them randomly with small values."
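Before setting up the optimisation, it is a useful habit to check an analytic gradient against a finite-difference approximation. The sketch below does this for the gradient formula above on a tiny random problem; all the names are illustrative, and for simplicity every rating is assumed observed, so $s_{i,j}=1$ throughout:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 3, 4, 2
U = rng.normal(size=(n, q))
V = rng.normal(size=(m, q))
Y = rng.normal(size=(n, m))
S = np.ones((n, m))            # assume every rating is observed

def objective(U, V):
    return np.sum(S * (Y - U @ V.T)**2)

def grad_U(U, V):
    # dE/du_{k,l} = -2 sum_j s_{k,j} v_{j,l} (y_{k,j} - u_k^T v_j)
    return -2 * (S * (Y - U @ V.T)) @ V

# Central finite difference for a single entry u_{0,0}.
eps = 1e-6
Up, Um = U.copy(), U.copy()
Up[0, 0] += eps
Um[0, 0] -= eps
fd = (objective(Up, V) - objective(Um, V)) / (2 * eps)

print(grad_U(U, V)[0, 0], fd)  # the two values should agree closely
```

The same check applies to the gradient with respect to the entries of $\mathbf{V}$, which is worth doing once you have answered Question 4.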
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "q = 2 # the dimension of our map of the 'library'\n", "learn_rate = 0.01\n", "U = pd.DataFrame(np.random.normal(size=(len(user_names), q))*0.001, index=user_names)\n", "V = pd.DataFrame(np.random.normal(size=(len(movies.index), q))*0.001, index=movies.index)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also subtract the mean from the ratings before we try to\n", "predict them. Have a think about why this might be a good idea\n", "(hint: what will the gradients be if we don't subtract the mean?)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Y['rating'] -= Y['rating'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the initial values set, we can start the optimization.\n", "First we define a function for the gradient of the objective and the\n", "objective function itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def objective_gradient(Y, U, V):\n", " gU = pd.DataFrame(np.zeros((U.shape)), index=U.index)\n", " gV = pd.DataFrame(np.zeros((V.shape)), index=V.index)\n", " obj = 0.\n", " for ind, series in Y.iterrows():\n", " film = series['index']\n", " user = series['user']\n", " rating = series['rating']\n", " prediction = np.dot(U.loc[user], V.loc[film]) # u_i^T v_j\n", " diff = prediction - rating # u_i^T v_j - y_ij\n", " obj += diff*diff\n", " gU.loc[user] += 2*diff*V.loc[film]\n", " gV.loc[film] += 2*diff*U.loc[user]\n", " return obj, gU, gV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can write our simple optimisation routine. This allows us to\n", "observe the objective function as the optimization proceeds."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iterations = 100\n", "for i in range(iterations):\n", " obj, gU, gV = objective_gradient(Y, U, V)\n", " print(\"Iteration\", i, \"Objective function: \", obj)\n", " U -= learn_rate*gU\n", " V -= learn_rate*gV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 5\n", "\n", "What happens as you increase the number of iterations? What happens if\n", "you increase the learning rate?\n", "\n", "*10 marks*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write code for your answer to Question 5 in this box\n", "# provide the answers so that the code runs correctly otherwise you will lose marks!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stochastic Gradient Descent or Robbins-Monro Algorithm\n", "\n", "Stochastic gradient descent involves making each gradient update using a\n", "single observation at a time, rather than summing over them all. It is\n", "an approximate optimization method, but it has proven convergence under\n", "certain conditions and can be much faster in practice. It is used widely\n", "by internet companies for doing machine learning in practice. For\n", "example, Facebook's ad ranking algorithm uses stochastic gradient\n", "descent.\n", "\n", "### Question 6\n", "\n", "Create a stochastic gradient descent version of the algorithm. Monitor\n", "the objective function after every 1000 updates to ensure that it is\n", "decreasing. When you have finished, plot the movie map and the user map\n", "in two dimensions. 
Label the plots with the name of the movie or user.\n", "\n", "*30 marks*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write code for your answer to Question 6 in this box\n", "# provide the answers so that the code runs correctly otherwise you will lose marks!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making Predictions\n", "\n", "Predictions of the appropriate rating for a given user, $i$, on a given\n", "film, $j$, can be made from the model by simply taking the inner\n", "product between their vectors $\\mathbf{u}_i$ and $\\mathbf{v}_j$.\n", "\n", "## Is Our Map Enough? Are Our Data Enough?\n", "\n", "Are two dimensions really enough to capture the complexity of humans and\n", "their artforms? Perhaps we need even more dimensions to capture that\n", "complexity. Extending our books analogy further, consider how we should\n", "place books that have a historical timeframe as well as some\n", "geographical location. Do we really want books from the Second World War\n", "to sit alongside books from the Roman Empire? Books on the American\n", "invasion of Sicily in 1943 are perhaps less related to books about\n", "Carthage than those that study the Jewish Revolt from 66-70 (in the\n", "Roman Province of Judaea). So books that relate to subjects which are\n", "closer in time should be stored together. However, a student of\n", "rebellion against empire may also be interested in the relationship\n", "between the Jewish Revolt of 66-70 and the Indian Rebellion of 1857,\n", "nearly 1800 years later. Whilst the technologies are different, the\n", "psychology of the people is shared: a rebellious nation against its\n", "imperial masters, triggered by misrule with a religious and cultural\n", "background. To capture such complexities we would need further\n", "dimensions in our latent representation. But are further dimensions\n", "justified by the amount of data we have? 
Can we really understand the\n", "facets of a film that has at most three or four ratings?\n", "\n", "## Going Further\n", "\n", "If you want to take this model further then you'll need more data. One\n", "possible source of data is the [`movielens` data\n", "set](http://grouplens.org/datasets/movielens/). They have data sets\n", "containing up to ten million movie ratings. The few ratings we were able\n", "to collect in the class are not enough to capture the rich structure\n", "underlying these films. Suppose, for example, that the ratings are\n", "uniformly distributed between 1 and 5. If you know something about\n", "information theory then you could use that to work out the maximum\n", "number of *bits* of information we could gain per rating.\n", "\n", "Now we'll download the movielens 100k data and see if we can extract\n", "information about these movies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = pods.datasets.movielens100k()\n", "Y = d['Y']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 7\n", "\n", "Use stochastic gradient descent to make a movie map for the movielens\n", "data. Plot the map of the movies when you are finished.\n", "\n", "*15 marks*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write code for your answer to Question 7 in this box\n", "# provide the answers so that the code runs correctly otherwise you will lose marks!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }