{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Programming Exercise 8: Anomaly Detection and Recommender Systems\n", "#### Author - Rishabh Jain" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import warnings,os\n", "warnings.simplefilter('ignore')\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import pandas as pd\n", "import numpy as np\n", "%matplotlib inline\n", "\n", "from scipy.io import loadmat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1    Anomaly detection\n", "\n", "#### Problem Statement\n", "In this part of the exercise, we will implement an anomaly detection algorithm to detect anomalous behavior in server computers. The features measure the throughput (mb/s) and latency (ms) of response of each server. While our servers were operating, we collected m=307 examples of how they were behaving, and thus have an unlabeled dataset. We suspect that the vast majority of these examples are \"Normal\" (Non-anomalous) examples of servers operating normally, but there might also be some examples of servers acting anomalously within the dataset.\n", "\n", "**We will use a Gaussian model to detect the anomalies in our dataset. We will fit a Gaussian distribution to this dataset and then find the values that have very low probability and hence can be considered anomalies.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "__header__\n", "__version__\n", "__globals__\n", "X\n", "Xval\n", "yval\n" ] } ], "source": [ "# Loading dataset\n", "mat=loadmat('ex8data1.mat')\n", "X=mat['X']\n", "print(*mat.keys(),sep='\\n')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig=sns.scatterplot(x=X[:,0],y=X[:,1],marker='x')\n", "fig.set(xlabel='Latency (ms)',ylabel='Throughput (mb/s)',title='Original Dataset');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1    Gaussian Distribution\n", "To perform anomaly detection, we will first need to fit a model to the data's distribution. \n", "Given a training set, we want to estimate the Gaussian distribution for each of the features. For each feature $j$, we need to find the parameters $\\mu_{j}$ and $\\sigma_{j}^{2}$ that fit the data. The Gaussian distribution is given by:\n", "\n", "$$\\boxed{p(x_{j};\\mu_{j},\\sigma_{j}^{2})=\\frac{1}{\\sqrt{2\\pi\\sigma_{j}^{2}}}e^{-\\frac{(x_{j}-\\mu_{j})^{2}}{2\\sigma_{j}^{2}}}}$$\n", "\n", "$$p(x)=\\prod_{j=1}^{n}p(x_{j};\\mu_{j},\\sigma_{j}^{2})$$\n", "\n", "$$\\text{Anomaly, if }p(x)< \\varepsilon$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def mutlivariateGaussian(X,mu,sigma):\n", "    # Density of each feature under its own univariate Gaussian (sigma is the standard deviation)\n", "    p=(1/(np.sqrt(2*np.pi*np.power(sigma,2))))*np.exp(-np.power(X-mu,2)/(2*np.power(sigma,2)))\n", "    # Multiplying the per-feature densities to get p(x) for every example\n", "    p=np.prod(p,axis=1).reshape((-1,1))\n", "    return p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.2    Estimating parameters for a Gaussian\n", "We can estimate the parameters ($\\mu_{i},\\sigma_{i}^{2}$) of the i-th feature by using the following equations. 
To estimate the mean, we will use:\n", "$$\\mu_{i}=\\frac{1}{m}\\sum_{j=1}^{m}x_{i}^{(j)}$$\n", "\n", "And for the variance, we will use:\n", "$$\\sigma_{i}^{2}=\\frac{1}{m}\\sum_{j=1}^{m}(x_{i}^{(j)}-\\mu_{i})^2$$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def estimateGaussian(X):\n", "    mu=np.mean(X,axis=0)\n", "    # np.std divides by m by default, so sigma**2 matches the variance formula above\n", "    sigma=np.std(X,axis=0)\n", "    return (mu,sigma)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the contours of the Gaussian distribution fitted to the dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def plotGaussianContour(X,mu,sigma,title=None):\n", "    # Constructing grid around the min and max range of original data\n", "    temp=np.linspace(np.min(X)-5,np.max(X)+5)\n", "    [x1,x2]=np.meshgrid(temp,temp)\n", "    # Computing the Gaussian Density Probability for the grid\n", "    temp=np.array([x1.reshape(-1),x2.reshape(-1)]).T\n", "    z=mutlivariateGaussian(temp,mu,sigma).reshape(x1.shape)\n", "    # Plotting contours at the levels 1e-20, 1e-17, ..., 1e-2\n", "    levels=np.power(10.0,np.arange(-20,0,3))\n", "    plt.contour(x1,x2,z,levels=levels);\n", "    fig=sns.scatterplot(x=X[:,0],y=X[:,1],marker='x')\n", "    title='Gaussian Distribution Contours' if title is None else title\n", "    fig.set(xlabel='Latency (ms)',ylabel='Throughput (mb/s)',title=title);" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Estimating Gaussian parameters\n", "mu,sigma=estimateGaussian(X)\n", "# Computing Density Probability\n", "p=mutlivariateGaussian(X,mu,sigma)\n", "# Plotting\n", "plotGaussianContour(X,mu,sigma);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3    Selecting the threshold, $\\varepsilon$\n", "Now that we have estimated the Gaussian parameters, we can investigate which examples have a very high probability given this distribution and which examples have a very low probability. The low probability examples are more likely to be the anomalies in our dataset. One way to determine which examples are anomalies is to select a threshold based on a cross validation set. In this part of the exercise, we will implement an algorithm to select the threshold using the $F_1$ score on a cross validation set. \n", "\n", "**$y=1$ corresponds to an anomalous example and $y=0$ corresponds to a normal example.**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [], "source": [ "Xval=mat['Xval']\n", "yval=mat['yval']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For many different values of $\\varepsilon$, we will compute the resulting $F_1$ score by computing how many examples the current threshold classifies correctly and incorrectly. 
The $F_1$ score is computed using **Precision** (prec) and **Recall** (rec):\n", "\n", "$$\\boxed{F_1=\\frac{2\\cdot prec\\cdot rec}{prec+rec}}$$\n", "\n", "We can compute Precision (prec) and Recall (rec) using:\n", "\n", "$$\\boxed{prec=\\frac{tp}{tp+fp}}$$\n", "\n", "$$\\boxed{rec=\\frac{tp}{tp+fn}}$$\n", "\n", "where,\n", "- $tp$ is the number of true positives: the ground truth label says it's an anomaly and our algorithm classified it as an anomaly.\n", "- $fp$ is the number of false positives: the ground truth label says it's not an anomaly and our algorithm classified it as an anomaly.\n", "- $fn$ is the number of false negatives: the ground truth label says it's an anomaly and our algorithm classified it as not being anomalous." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def selectThreshold(yval,pval):\n", "    '''\n", "    Returns the threshold value of epsilon and the corresponding F1 score.\n", "    '''\n", "    bestEpsilon=bestF1=F1=0\n", "    epsilons=np.linspace(np.min(pval),np.max(pval),1000)\n", "    for epsilon in epsilons:\n", "        # Resetting the metrics so that an undefined precision or recall counts as 0\n", "        prec=rec=F1=0\n", "        # Flagging every example whose density is below the threshold as an anomaly\n", "        cvPrediction=pval<epsilon\n", "        tp=np.sum((cvPrediction==1)&(yval==1))\n", "        fp=np.sum((cvPrediction==1)&(yval==0))\n", "        fn=np.sum((cvPrediction==0)&(yval==1))\n", "        # Computing Precision\n", "        if tp+fp>0:\n", "            prec=tp/(tp+fp)\n", "        # Computing Recall\n", "        if tp+fn>0:\n", "            rec=tp/(tp+fn)\n", "        # Computing F1 Score\n", "        if prec+rec>0:\n", "            F1=(2*prec*rec)/(prec+rec)\n", "        if F1>bestF1:\n", "            bestF1=F1\n", "            bestEpsilon=epsilon\n", "    return (bestEpsilon,bestF1)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Using sample mean and std. to compute density probability for cross validation set\n", "pval=mutlivariateGaussian(Xval,mu,sigma)\n", "# Calculating the threshold based on the best F1 Score\n", "epsilon,f1=selectThreshold(yval,pval)\n", "# Getting indices for anomalous examples\n", "indices=np.where((p<epsilon))[0]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table border=\"1\" class=\"dataframe\">\n", "<thead><tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th></tr></thead>\n", "<tbody>\n", "<tr><th>863</th><td>-3.357801</td><td>-6.063495</td><td>19.827681</td><td>-2.843608</td><td>-6.472805</td><td>13.165775</td><td>-1.301804</td><td>11.396985</td><td>-11.586898</td><td>3.164643</td><td>17.165776</td></tr>\n", "<tr><th>377</th><td>2.541202</td><td>1.295366</td><td>2.816464</td><td>-18.192452</td><td>-5.445314</td><td>17.679813</td><td>-15.325850</td><td>9.203726</td><td>-3.411643</td><td>-4.398754</td><td>14.293232</td></tr>\n", "<tr><th>28</th><td>20.506733</td><td>-1.746643</td><td>21.146559</td><td>-27.537185</td><td>-9.624690</td><td>24.728547</td><td>-18.732147</td><td>8.330862</td><td>-0.397158</td><td>9.197245</td><td>9.742638</td></tr>\n", "<tr><th>369</th><td>7.149015</td><td>-7.735763</td><td>6.911932</td><td>-13.206095</td><td>-30.845135</td><td>-3.004759</td><td>2.893837</td><td>-1.338351</td><td>-10.686856</td><td>-12.007350</td><td>7.673190</td></tr>\n", "<tr><th>52</th><td>10.440863</td><td>-20.122496</td><td>8.223160</td><td>-2.154362</td><td>12.276639</td><td>-4.383862</td><td>-11.543852</td><td>3.127246</td><td>-0.237430</td><td>-1.694638</td><td>4.206813</td></tr>\n", "</tbody>\n", "</table>
\n", "" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "863 -3.357801 -6.063495 19.827681 -2.843608 -6.472805 13.165775 \n", "377 2.541202 1.295366 2.816464 -18.192452 -5.445314 17.679813 \n", "28 20.506733 -1.746643 21.146559 -27.537185 -9.624690 24.728547 \n", "369 7.149015 -7.735763 6.911932 -13.206095 -30.845135 -3.004759 \n", "52 10.440863 -20.122496 8.223160 -2.154362 12.276639 -4.383862 \n", "\n", " 6 7 8 9 10 \n", "863 -1.301804 11.396985 -11.586898 3.164643 17.165776 \n", "377 -15.325850 9.203726 -3.411643 -4.398754 14.293232 \n", "28 -18.732147 8.330862 -0.397158 9.197245 9.742638 \n", "369 2.893837 -1.338351 -10.686856 -12.007350 7.673190 \n", "52 -11.543852 3.127246 -0.237430 -1.694638 4.206813 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(X).sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Principal Component Analysis (Optional)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3XeYVOX5xvHvQ1lg6b0vS0eks4igsaHGgl2sWFE0iWKNLYklJsauYImisf1UDMUegiiIPSi9916Xtixtly3P748Z4koow7IzZ3bm/lyXFzNnljn3COy9p7zva+6OiIgkrzJBBxARkWCpCEREkpyKQEQkyakIRESSnIpARCTJqQhERJKcikBEJMmpCEREkpyKQEQkyZULOkAk6tSp4+np6UHHEBEpVSZPnrzR3ese7OtKRRGkp6czadKkoGOIiJQqZrY8kq/TqSERkSSnIhARSXIqAhGRJKciEBFJcioCEZEkpyIQEUlyKgIRkSSnIhARiUOLMrfx0CezyS8ojPq+SsWAMhGRZLFg/TaGjFvIv2aupVL5spzftQkdm1SP6j5VBCIicWDeumyGjFvI6JnrqJxSlt8c35LrftWCWpVTor5vFYGISIBmr9nKc+MWMWb2OqpUKMdNJ7ZiwLHNqRmDAthDRSAiEoCZq7YyZPxCPp+znqoVyzGoT2sGHNOc6qnlY55FRSAiEkPTV2YxZNxCxs3LpFrFctx2chuuPiad6pViXwB7qAhERGJgyootDBm3kAnzN1AjtTx3ntqGq3qnU7VicAWwh4pARCSKJi/fzLNfLOSbhRupmVqeu05ry5W90qlSIX6+/cZPEhGRBPLj0s0MHreA7xZtonblFO49vR39j25G5TgqgD3iL5GISCn2w+JNDB63gP8s2UydKhX445lHcFnPNFJT4vfbbfwmExEpJdyd7xdvYvC4hfy4dDN1q1bgT33bc9lRaVRKKRt0vINSEYiIFJO7883CjQwZt5BJy7dQv1oFHjyrPZcclUbF8vFfAHuoCEREDpG7M2HBBoaMW8jUFVk0rF6Rh885kn4ZTUtVAeyhIhARiZC78+X8TAaPW8T0lVk0rlGJv57XgQu7N6FCudJXAHuoCEREDsLd+WJuJkPGLWTm6q00qVmJR8/vyPndmpBSrvRP4qwiEBHZD3dn7Jz1DP5iIXPWZpNWK5XHL+jEed0aU75s6S+APVQEIiJ72XMN4OmxC5i5eivptVN5sl9nzu3SiHIJVAB7qAhERIr4ftFGnhw7nykrsmhSsxJPXNiJ87o2TsgC2CNqRWBmrwF9gUx37xDeVgv4J5AOLAMucvct0cogIhKpScs289TYBfywZBMNqlXkr+d1oF/3pglxDeBgonlE8AbwPPBWkW33AOPc/VEzuyf8/O4oZhAROaAZq7J4auwCvlqwgTpVKvDAWe25tJSNAzhcUSsCd//azNL32nwOcEL48ZvABFQEIhKAuWuzefrzBXw+Zz01U8tz7+ntuKJXs7ieCiJaYv2J67v7WgB3X2tm9WK8fxFJcosyt/PsFwv4dMZaqlYsx+2ntOGaY+JjOuigxG31mdlAYCBAWlpawGlEpLRbvmkHg8ct5MOpq6lUviw3ndiK63/VIpAVweJNrItgvZk1DB8NNAQy9/eF7j4UGAqQkZHhsQooIollddYunh+/kBGTVlG2jHHdr1pww3EtqF2lQtDR4kasi+Bj4Crg0fCvH8V4/yKSJDKzc3jhy0UM+3ElAJf3TON3J7aiXrWKASeLP9G8fXQYoQvDdcxsFfAAoQIYbmYDgBVAv2jtX0SS06btubz89RLe/H4Z+YXORRlNuOmk1jSuUSnoaHErmncNXbqfl/pEa58ikry27szjlW+W8Np3S8nJK+Dcro25pU9rmtWuHHS0uBe3F4tFRCKxLSeP179bxivfLGFbTj59OzXk1pPb0KpelaCjlRoqAhEplXbuzuetH5bz0leLydqZxyn
t63P7KW04omG1oKOVOioCESlVcvIKeHfiCl6csJiN23M5vk1dbj+lDZ2b1gg6WqmlIhCRUmF3fiEjJq/kuXGLWJedQ68WtXmpfzcy0msFHa3UUxGISFwrLHQ+mbGGJ8fOZ+XmXXRvVpOnL+pM71Z1go6WMFQEIhK3vl+0kUf+PZdZq7Np37Aar1/TgRPa1MXMgo6WUFQEIhJ35q/bxt/+PZcJ8zfQuEYlnrm4M+d0bkyZMiqAaFARiEjcWLc1h6c/n8/IyauoUqEc953Rjit7pSfVlNBBUBGISOC25eTx8ldLePXbJRQWwrXHNOd3J7aiZuWUoKMlBRWBiARmd34hw35cweBxC9m8YzfndGnEnae2pWmt1KCjJRUVgYjEnLszZtY6Hhszj2WbdnJ0i1rcd8YRdGqisQBBUBGISEz9tGwzj4yey9QVWbSpX4XXr+7BCW11J1CQVAQiEhOLN2znsX/PY+yc9dSvVoHHLujIhd2bUlZ3AgVORSAiUbVhWy6Dxy1g2I8rqVS+LHee2oZrj22elGsDxyv9SYhIVOzIzefVb5Yy9OvF5OYXcnnPNAb1aU0drQwWd1QEIlKi8gsKGT5pFc98sYAN23I5vUMDfv/rtrSoq2mh45WKQERKhLszbm4mj46Zx6LM7XRvVpOX+nene7OaQUeTg1ARiMhhm7Yyi7+NnsvEpZtpUacyL/Xvzq+PrK87gUoJFYGIFNvyTTt44rP5fDpjLXWqpPDwuR24pEdTypctE3Q0OQQqAhE5ZFt27GbI+IW8/Z/llCtThkEntWLg8S2pUkHfUkoj/amJSMRy8gp4/btlvDhhETty87m4R1NuPbkN9atVDDqaHAYVgYgclLvz2ez1/OVfc1i1ZRd92tXj7tPb0aZ+1aCjSQlQEYjIAS3esJ0HP57NNws30rZ+Vd69vie9W2p1sESiIhCRfdqem89z4xfy2rdLqViuLA+c1Z4rjm5GOV0ITjgqAhH5BXfn4+lreGT0XNZn59KvexPuOq0ddatqRHCiUhGIyH/NXZvNAx/P5selm+nYuDp/79+dbmkaEJboVAQiwtZdeTzz+QLe+mEZ1SuV55HzOnJxD80MmixUBCJJrLDQGTl5FY+NmceWnbu5vGcz7ji1DTVStURkMlERiCSp6SuzuP/j2UxfmUX3ZjV58+yj6NC4etCxJAAqApEks3nHbp74bB7v/bSS2pUr8PRFnTmva2PNC5TEVAQiSSK/oJB3f1zBU2MXsCM3nwHHNOeWk1tTtWL5oKNJwFQEIkngp2Wbuf+j2cxdm03vlrV56Owjaa1RwRIWSBGY2W3AdYADM4Fr3D0niCwiiSwzO4e//XseH0xdTaPqFXnx8m6c3qGBTgPJL8S8CMysMTAIaO/uu8xsOHAJ8Eass4gkqryCQt74bhnPfrGAvALnphNb8dsTW2qdYNmng/6tMLMawJVAetGvd/dBh7nfSmaWB6QCaw7jvUSkiG8XbuTBT2azKHM7J7Wrx/1925Nep3LQsSSORfLjwWjgP4RO4RQe7g7dfbWZPQmsAHYBY9197OG+r0iyW521i798Ood/z1pHWq1U/nFVBn2OqB90LCkFIimCiu5+e0nt0MxqAucAzYEsYISZ9Xf3t/f6uoHAQIC0tLSS2r1IwsnJK+CVr5fwwoRFANxxShuuP64FFcuXDTiZlBaRFMH/mdn1wKdA7p6N7r65mPs8GVjq7hsAzOx9oDfwiyJw96HAUICMjAwv5r5EEtq4uet56JM5rNi8kzM6NuAPZ7ancY1KQceSUiaSItgNPAH8gdBdPoR/bVHMfa4AjjazVEKnhvoAk4r5XiJJadnGHfz50zmMn5dJy7qVeXtAT45trTUCpHgiKYLbgVbuvrEkdujuE81sJDAFyAemEv7JX0QObHd+IUO/XsyQ8YsoX8b4wxlHcFXvdFLKaY0AKb5IimA2sLMkd+ruDwAPlOR7iiS6aSuzuGfUDOat28aZnRpyf9/2WitYSkQ
kRVAATDOzL/nlNYLDuX1URCK0Izefp8Yu4PXvl1K/akVeuTKDU9rrbiApOZEUwYfh/0QkxibMz+QPH8xiddYu+h+dxl2ntaOa5gaSEnbQInD3N2MRRER+tnnHbh7+dA4fTF1Ny7qVGXFjL3qk1wo6liSoSEYWL+Xnu4X+y92Le9eQiOzHnvWCH/pkDtm78hh0Uit+e2IrjQmQqIrk1FBGkccVgX6AfjQRKWGrtuzkjx/OYsL8DXRpWoNHL+hIuwbVgo4lSSCSU0Ob9tr0rJl9C9wfnUgiyaWg0Hnrh2U88dl8AB44qz1X9krXesESM5GcGupW5GkZQkcImshcpATMX7eNu0fNYNrKLE5oW5e/nNuBJjVTg44lSSaSU0NPFXmcDywDLopKGpEkkZtfwAvjF/HihMVUq1SewZd04ezOjbROgAQiklNDJ8YiiEiy+GnZZu4ZNYPFG3ZwXtfG/Klve2pVTgk6liSxSE4N3QK8DmwDXgG6Afdo6miRQ7MtJ4/Hxszj7f+soHGNSrx57VEc36Zu0LFEIjo1dK27DzazXwP1gGsIFYOKQCRCn89Zz58+nMX6bTlce0xz7ji1DZUraLUwiQ+R/E3cc9LyDOB1d59uOpEpEpHMbTk89PEc/jVzLe0aVOXv/bvRNa1m0LFEfiGSIphsZmMJLSRzr5lVpQRWKhNJZO7OiEmr+Mu/5pCTV8idp7Zh4HEtNUuoxKVIimAA0AVY4u47zaw2odNDIrIPyzft4N73Z/L94k0clV6LR87vSKt6VYKOJbJfkdw1VEho7YA9zzcBew8yE0l6+QWF/OPbpTzzxQLKlynDX8/rwKU90iijgWES53S1SqQEzFq9lbtHzWD2mmxOaV+fh8/pQIPqWitASgcVgchh2LW7gGfHLeDVb5ZSq3IKf7+8G6d1aKCBYVKqRFQEZnYs0NrdXzezukAVd18a3Wgi8e0/SzZx96gZLN+0k4szmnLfGUdQPVVrBUjpE8mAsgcIzS/UltD4gfLA28Ax0Y0mEp927s7n8THzeeP7ZTSrncq71/WkdystHC+lVyRHBOcBXQlfMHb3NeFbSEWSzqRlm7lzxHSWbdrJ1b3Tueu0tqSm6AyrlG6R/A3e7e5uZg5gZpWjnEkk7uTkFfDU2Pm8+u1SGteoxLDrj6ZXy9pBxxIpEZEUwXAzexmoYWbXA9cSmnNIJClMW5nFHcOnsXjDDi7rmcZ9ZxxBFU0PIQkkknEET5rZKUA2oesE97v751FPJhKw3PwChoxbyN8nLKZ+tYq8de1RHKdJ4iQBRXKx+DZghL75SzKZtXord46Yzrx127goowl/7NueahV1R5AkpkiOb6sBn5nZZuA9YKS7r49uLJFg5BUU8sKXi3h+/CJqVU7htaszOKld/aBjiURVJKeGHgIeMrNOwMXAV2a2yt1Pjno6kRiaty6bO4ZPZ/aabM7r2pgHzmpPjVQtGCOJ71CueGUC6wjNM1QvOnFEYi+/oJCXv17Cs18soHql8rzUvzundWgQdCyRmInkGsFvCB0J1AVGAte7+5xoBxOJhUWZ27hjxAymr8zizI4N+fM5R1K7SoWgY4nEVCRHBM2AW919WrTDiMRKQaHz2rdLeWLsfCqnlOX5y7rSt1OjoGOJBGK/RWBm1dw9G3g8/LxW0dfdfXOUs4lExdKNO/j9iOlMWr6FU9rX55HzOlK3qo4CJHkd6IjgXaAvMBlwfl6ykvDzFlHMJVLiCgudN39YxmNj5pFStgzPXNyZc7s01kyhkvT2WwTu3jf8a/PYxRGJjpWbd/L7kdP5z5LNnNC2Lo+e30nrBYiERXKxeJy79znYtkNhZjWAV4EOhI4urnX3H4r7fiL74+68M3EFj4yeSxkzHr+gE/0ymugoQKSIA10jqAikAnXMrCY/nxqqBhzuVbXBwBh3v9DMUsL7ESlRa7J2cfeoGXyzcCPHtqrDYxd2onGNSkHHEok7BzoiuAG4ldA3/cn8XATZwAvF3aG
ZVQOOA64GcPfdwO7ivp/I3tydEZNW8fCncyhw5y/nduDynmk6ChDZjwNdIxgMDDazm939uRLcZwtgA/C6mXUmVDK3uPuOEtyHJKn12Tnc+/5Mxs/LpGfzWjxxYWfSauuAU+RAIpli4jkz6wC0ByoW2f7WYeyzG3Czu080s8HAPcCfin6RmQ0EBgKkpaUVc1eSLNydj6at4YGPZ5ObX8ADZ7Xnql7plCmjowCRg4l0qcoTCBXBaOB04FuguEWwCljl7hPDz0cSKoJfcPehwFCAjIwML+a+JAls2JbLHz+cyWez19O9WU2euLATLepWCTqWSKkRycjiC4HOwFR3v8bM6hO646dY3H2dma00s7buPh/oA2jKCimWMbPWct8Hs9iem899Z7RjwLEtKKujAJFDEkkR7HL3QjPLD1/ozeTwB5PdDLwTvmNoCXDNYb6fJJnsnDwe+ngOo6asomPj6jx9UWda19dS2iLFEUkRTArf9/8KoQu724EfD2en4XmLMg7nPSR5/bB4E3eOmM667BwGndSKm/u0pnzZMkHHEim1IrlY/Nvww5fMbAxQzd1nRDeWyP/KySvgyc/m84/vlpJeuzIjb+xF17SaQccSKfUONKCs24Fec/cp0Ykk8r9mrd7K7cOnsWD9dvofHVpAPjVFC8iLlIQD/Ut66gCvOXBSCWcR+R8Fhc5LXy3m2S8WUDM1hTeu6cEJbbUukkhJOtCAshNjGURkb8s37eD24dOZvHwLZ3ZsyF/O7UDNylo6UqSkRTKO4Mp9bT+MAWUiB+TuvPfTSh7+dA5lyxjPXtyFc7o00hQRIlESyUnWHkUeVyR03/8Uij+gTGS/MrflcO+omYybl0nvlrV5sl9nGmmiOJGoiuSuoZuLPjez6sD/RS2RJK0xs9Zy7/sz2bm7gPv7tufq3poiQiQWinPbxU6gdUkHkeS19+CwZy7uTKt6GhwmEiuRXCP4hNBdQgBlCM05NDyaoSR57BkctnbrLg0OEwlIJEcETxZ5nA8sd/dVUcojSaLo4LBmtVIZ+ZvedNPgMJFARHKN4Cv474Iy5cKPa7n75ihnkwSlwWEi8SWSU0MDgYeBXUAhoZXKnMOfeE6STNHBYTVSU3j9mh6cqMFhIoGL5Mew3wNHuvvGaIeRxKXBYSLxK5IiWEzoTiGRQ6bBYSLxL5IiuBf43swmArl7Nrr7oKilkoSgwWEipUMkRfAyMB6YSegagchBaXCYSOkRSRHku/vtUU8iCSE7J48HP57N+1NW06FxNZ65qItWDhOJc5EUwZfhO4c+4ZenhnT7qPxC0cFhN5/UiptPak1KOQ0OE4l3kRTBZeFf7y2yTbePyn/lFRTy1NgFvPz1YprVSmXEjb3p3kyDw0RKi0gGlDWPRRApnVZt2cmgYVOZsiKLS3o05f6z2mtwmEgpo/UIpNjGzFrHXSOnU+gw5NKunN25UdCRRKQYtB6BHLKcvAIeGT2Xt35YTsfG1Xnu0q6k16kcdCwRKSatRyCHZMmG7dz07lTmrM1mwLHNufu0drogLFLKaT0CidgHU1fxhw9mkVKuDK9emcHJ7esHHUlESoDWI5CD2pGbz/0fzWbUlFX0SK/J4Eu6aoSwSALRegRyQHPXZnPTu1NYsnEHN5/Uilv6tKacFo4RSSj7LQIzawXU37MeQZHtvzKzCu6+OOrpJDDuzjsTV/DnT+dQvVJ53h7Qk2Na1Qk6lohEwYF+tHsW2LaP7bvCr0mC2rorj9+9O4U/fjiLns1rMXrQr1QCIgnsQKeG0t19xt4b3X2SmaVHLZEEatrKLG56dwprt+Zw92ntuOG4FposTiTBHagIKh7gNV0pTDCFhc6r3y7h8THzqV+tIsNv6KVpIkSSxIGK4Cczu97dXym60cwGAJOjG0tiadP2XO4YMZ0J8zfw6yPr8/gFnameWj7oWCISIwcqgluBD8zscn7+xp8BpADnRTuYxMYPizdx6z+nsmVHHn8+50iuOLqZVg8TSTL
7LQJ3Xw/0NrMTgQ7hzf9y9/ElsWMzKwtMAla7e9+SeE+JXEGhM2TcQp4bv5D02pV57eoeHNmoetCxRCQAkUwx8SXwZRT2fQswF6gWhfeWA1i3NYdb3pvKxKWbOb9rYx4+twOVK2jGUJFkFci/fjNrApwJ/BXQ6mcx9OX8TO4YPp1duwt4sl9nLuzeJOhIIhKwoH4MfBa4C9jvGobhVdEGAqSlpcUoVuLanV/IE5/N45VvltKuQVWev6wbrepVCTqWiMSBmBeBmfUFMt19spmdsL+vc/ehwFCAjIwM39/XycGt3LyTm4ZNZfrKLPofncYfz2xPxfJlg44lInEiiCOCY4CzzewMQmMVqpnZ2+7eP4AsCW/0zLXcPSo0LvDFy7txRseGAScSkXgT8yJw93sJr38cPiK4UyVQ8nLyCnj40zm8M3EFXZrW4LlLu9K0VmrQsUQkDulWkQS0KHM7N707hXnrtnHDcS2489dtKa8ZQ0VkPwItAnefAEwIMkOiGTl5FX/6cBaVUsry+jU9OLFtvaAjiUic0xFBgsjJK+D+j2YxfNIqejavxZBLu1K/2oGmixIRCVERJIClG3fwm7cnM2/dNm46sRW3ndKGspoxVEQipCIo5f49cy2/HzmDcmVNp4JEpFhUBKXU7vxC/vbvubz+3TK6NK3BC5d3o7HWERaRYlARlEKrs3Zx07tTmLoii6t7p3PfGUeQUk53BYlI8agISpkJ8zO57Z/TyCtwXrisG2d20gAxETk8KoJSoqDQefaLBTz/5SLa1q/Ki5d3o0VdzRUkIodPRVAKbNiWyy3vTeX7xZvo170Jfz6nA5VSNFeQiJQMFUGcm7hkEzcPm8rWXXk8fmEnLspoGnQkEUkwKoI4VVjoDP1mCU98Np+0Wqm8ee1RHNFQa/iISMlTEcShrTvzuGPENL6Ym8kZHRvw2AWdqFpRi8mLSHSoCOLMjFVZ/PadKazPzuGBs9pzde90LSYvIlGlIogT7s7bE1fw8CdzqFMlheE39KJrWs2gY4lIElARxIEdufnc+/5MPp6+hhPa1uWZi7pQs3JK0LFEJEmoCAK2YP02fvP2ZJZu3MGdp7bhtye0oowmjBORGFIRBOiDqau47/1ZVK5Qjrev60nvlnWCjiQiSUhFEICcvAIe+mQOw35cwVHNa/H8pV2pp7UDRCQgKoIYW75pB799Zwqz12Rz4/EtufPUNpTTMpIiEiAVQQx9Nnsdd46YThkz/nFVBn2OqB90JBERFUEs5BUU8viYebzyzVI6NanOC5d1o2mt1KBjiYgAKoKoW7t1Fze/O5VJy7dwZa9m/OHMI6hQThPGiUj8UBFE0TcLN3DLe9PIyStgyKVdObtzo6AjiYj8DxVBFBQWOkPGL2TwuIW0rleFFy/vTqt6WjtAROKTiqCE7cjN55b3pvHF3PWc37UxfzmvA6kp+t8sIvFL36FK0OqsXQx44ycWrN/Gg2e15ypNGCcipYCKoIRMXr6FG/5vErl5hbx+zVEc36Zu0JFERCKiIigBH05dzV2jZtCgWkXeG5hBq3pVg44kIhIxFcFhKCx0nv48tKB8z+a1eKl/d80aKiKljoqgmHbuzuf2f05nzOx1XNKjKX8+pwMp5TRVhIiUPiqCYli7dRfXvTmJuWuz+eOZRzDg2Oa6KCwipZaK4BBNW5nF9W9NYtfuAl69KoOT2mm+IBEp3VQEh+CT6Wu4c8R06latwNsDetK2gS4Ki0jpF/OT2mbW1My+NLO5ZjbbzG6JdYZD5e488/kCbh42lU5NqvPR745RCYhIwgjiiCAfuMPdp5hZVWCymX3u7nMCyHJQOXkF3DFiOv+asZYLujXhkfM7aNI4EUkoMS8Cd18LrA0/3mZmc4HGQNwVwfrsHAa+NYkZq7dy7+ntGHhcC10UFpGEE+g1AjNLB7oCE/fx2kBgIEBaWlpMcwHMWr2V696cRHZOHkOvyOCU9rooLCKJKbAb382sCjAKuNXds/d+3d2
HunuGu2fUrRvb6RpGz1zLhS99T9kyxqjf9FYJiEhCC+SIwMzKEyqBd9z9/SAy7Iu78/z4RTz1+QK6pdXg5SsyqFu1QtCxRESiKuZFYKGT7P8A5rr707He//7k5BVw96gZfDRtDed2acSjF3SiYnldFBaRxBfEEcExwBXATDObFt52n7uPDiALAJnbchj41mSmrczi979uy29PaKmLwiKSNIK4a+hbIG6+y85Zk811b/7Elp15vNS/G6d1aBh0JBGRmErqkcVjZ6/j1n9Oo3ql8oy4sRcdGlcPOpKISMwlZRG4Oy99tYTHP5tHpyY1eOWK7tSrVjHoWCIigUi6IsjNL+De92fy/pTV9O3UkCf7ddZFYRFJaklVBBu353Lj/01m0vIt3HZyGwb1aaWLwiKS9JKmCOaty2bAG5PYuD2X5y/rSt9OjYKOJCISF5KiCMbNXc+gYVOpXKEcw2/oReemNYKOJCISNxK6CNydf3y7lL+OnsuRjarx6pU9aFBdF4VFRIpK2CJwd+77YBbDflzBGR0b8FS/LlRK0UVhEZG9JWwRmBkt61bm5pNacdvJbShTRheFRUT2JWGLAOC6X7UIOoKISNwLbBpqERGJDyoCEZEkpyIQEUlyKgIRkSSnIhARSXIqAhGRJKciEBFJcioCEZEkZ+4edIaDMrMNwPJi/vY6wMYSjFMa6DMnB33mxHe4n7eZu9c92BeViiI4HGY2yd0zgs4RS/rMyUGfOfHF6vPq1JCISJJTEYiIJLlkKIKhQQcIgD5zctBnTnwx+bwJf41AREQOLBmOCERE5AASugjM7DQzm29mi8zsnqDzRJOZNTWzL81srpnNNrNbgs4UK2ZW1symmtmnQWeJBTOrYWYjzWxe+M+7V9CZos3Mbgv/vZ5lZsPMLOHWnDWz18ws08xmFdlWy8w+N7OF4V9rRmPfCVsEZlYWeAE4HWgPXGpm7YNNFVX5wB3ufgRwNPC7BP+8Rd0CzA06RAwNBsa4ezugMwn+2c2sMTAIyHD3DkBZ4JJgU0XFG8Bpe227Bxjn7q2BceHnJS5hiwA4Cljk7kvcfTfwHnBOwJmixt3XuvuU8ONthL45NA42VfSZWRPgTODVoLPEgpmL1a4cAAAE8UlEQVRVA44D/gHg7rvdPSvYVDFRDqhkZuWAVGBNwHlKnLt/DWzea/M5wJvhx28C50Zj34lcBI2BlUWeryIJvjECmFk60BWYGGySmHgWuAsoDDpIjLQANgCvh0+HvWpmlYMOFU3uvhp4ElgBrAW2uvvYYFPFTH13XwuhH/aAetHYSSIXwb5Wq0/4W6TMrAowCrjV3bODzhNNZtYXyHT3yUFniaFyQDfg7+7eFdhBlE4XxIvwefFzgOZAI6CymfUPNlViSeQiWAU0LfK8CQl4OFmUmZUnVALvuPv7QeeJgWOAs81sGaFTfyeZ2dvBRoq6VcAqd99ztDeSUDEkspOBpe6+wd3zgPeB3gFnipX1ZtYQIPxrZjR2kshF8BPQ2syam1kKoYtLHwecKWrMzAidN57r7k8HnScW3P1ed2/i7umE/nzHu3tC/6To7uuAlWbWNrypDzAnwEixsAI42sxSw3/P+5DgF8iL+Bi4Kvz4KuCjaOykXDTeNB64e76Z3QR8Rugug9fcfXbAsaLpGOAKYKaZTQtvu8/dRweYSaLjZuCd8A84S4BrAs4TVe4+0cxGAlMI3R03lQQcYWxmw4ATgDpmtgp4AHgUGG5mAwgVYr+o7Fsji0VEklsinxoSEZEIqAhERJKcikBEJMmpCEREkpyKQEQkyakIpNQzswIzm1bkv3vC21+N5sR7ZvaGmd2w17ZzzeyAt+ya2TIzqxOtXCKHKmHHEUhS2eXuXfbe6O7XRXm/wwhN7/BykW2XhLeLlBo6IpCEZWYTzCwj/HiAmS0Ib3vFzJ4Pb69rZqPM7Kfwf8eEtz8Ynh9+gpktMbNB+9jFF0C7IlMApBKaDuHD8PMPzWxyeB79gfvIl77
X3PN3mtmD4cctzWxM+Pd/Y2btwtv7hefkn25mX5fg/y5JYjoikERQqchoaoC/ufs/9zwxs0bAnwjNybMNGA9MD788GHjG3b81szRCI9GPCL/WDjgRqArMN7O/h+e6AcDdC8zsfeCi8PucDXwZngYc4Fp332xmlYCfzGyUu2+K8DMNBW5094Vm1hN4ETgJuB/4tbuvNrMaEb6XyAGpCCQR7PPUUBFHAV+5+2YAMxsBtAm/djLQPjSFDQDVzKxq+PG/3D0XyDWzTKA+oUnfihoGPEGoCC4B3iry2iAzOy/8uCnQGjhoEYRnkO0NjCiSq0L41++AN8xsOKHJ10QOm4pAksG+piTfowzQy913/eI3hL4B5xbZVMC+/718BzQ0s86EvnlfEv79JxAqmV7uvtPMJgB7L6+Yzy9Pz+55vQyQtZ/rHjeGjxDOBKaZWZdDOMoQ2SddI5Bk8CNwvJnVDK9wdUGR18YCN+15YmYHOrL4Hx6arGs4odWjRrt7Tvil6sCWcAm0I7R86N7WA/XMrLaZVQD6ht8zG1hqZv3CmSxcNJhZS3ef6O73Axv55VTrIsWiIpBEUGmv20cfLfpieIWrRwit2PYFoWmbt4ZfHgRkmNkMM5sD3FiM/Q8jtHbwe0W2jQHKmdkM4GHgP3v/pvD1hj+Hc30KzCvy8uXAADObDszm52VWnzCzmeGLzF/z87UOkWLT7KOSFMysirtvDx8RfEBoWvIPgs4lEg90RCDJ4sHwnUWzgKWEb/EUER0RiIgkPR0RiIgkORWBiEiSUxGIiCQ5FYGISJJTEYiIJDkVgYhIkvt/6szQ6knt4jEAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Centering and scaling the data matrix\n", "xPCA=(X-X.mean(axis=0))/X.std(axis=0)\n", "# Computing Covariance\n", "C=np.cov(xPCA.T)\n", "# Computing eigenvalues and eigenvectors\n", "eigenValues,eigenVectors=np.linalg.eig(C)\n", "# Sorting eigenvalues in descending order\n", "idx=eigenValues.argsort()[::-1]\n", "eigenValues=eigenValues[idx]\n", "# Eigenvectors are stored as columns, so the columns are reordered to match\n", "eigenVectors=eigenVectors[:,idx]\n", "\n", "# Plotting elbow curve\n", "chart=sns.lineplot(x=list(range(len(eigenValues))),y=np.cumsum(eigenValues));\n", "chart.set(xlabel='Eigen Values',ylabel='Cumulative sum');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the graph above, no small subset of eigenvalues accounts for most of the cumulative sum, so we can conclude that most of the principal components matter for this dataset.\n", "\n", "#### Finding Anomalies" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Computing Gaussian Parameters\n", "mu,sigma=estimateGaussian(X)\n", "p=mutlivariateGaussian(X,mu,sigma)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epsilon : 1.3786074982000233e-18\n", "F1 Score : 0.6153846153846154\n" ] } ], "source": [ "# Using Cross Validation set to select a threshold\n", "Xval=mat['Xval']\n", "yval=mat['yval']\n", "# Computing density probability\n", "pval=mutlivariateGaussian(Xval,mu,sigma)\n", "epsilon,f1=selectThreshold(yval,pval)\n", "print(f'Epsilon : {epsilon}')\n", "print(f'F1 Score : {f1}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Found 117 Anomalies**" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
               0           1           2           3           4           5           6           7           8           9          10
0      15.107877  -16.430766   19.707360  -19.811888   -1.644537   -6.444184   -6.121214    7.042283    7.232476   17.223479   -2.956852
1      12.411706    3.150959   14.827734  -10.482672   -8.250082   -7.982698  -20.766918   30.689476    4.622547   12.234751   23.167294
2      20.946754    1.048170    8.296324   -2.595042  -14.061970    8.776611  -21.886068    9.769264  -20.071130   13.871906    4.197881
3       5.127033  -18.517137   11.422480  -28.993408   -5.797960  -15.989215  -12.039753   31.296681    8.203208    8.035668   19.519680
4      11.622006   -0.229723   10.005823   -9.700729  -14.765493   26.933578   -9.299882   13.109171    8.976218    8.157170   18.663064
5      -5.689178   -9.707423   22.258180    7.890772  -11.397641   34.445407   -9.836411   -0.179841  -13.261052   15.200689    7.595782
6      16.452993  -15.320026   17.283979   -5.896654   -2.940964   12.148073   -3.422623   15.684343  -20.903409   -0.306900   -8.481797
7      -1.907518  -10.191011    5.041437  -31.539885    0.761368    2.671201    2.149600   11.920996  -10.353145   -5.472842   -7.579770
8      -2.665572   -1.793277   -3.312666  -11.771151  -22.948824    0.657702    4.040898    2.684116   -4.217029  -12.993472    6.657632
9      -2.120839  -22.573352   10.638703   -2.064867    0.942871  -11.136318    1.747898   22.794201   -3.726747    4.053987   -1.185641
10    -12.692722  -11.498905    0.026479   -8.809048  -16.233664   -6.701840  -10.635467   -8.274253  -12.428766   -2.396440    9.134138
11     14.552377  -38.593613   19.161298  -11.379035  -24.679817   29.394102  -10.255991    4.409564  -20.899036   -6.997668    2.424330
12      3.968173   -0.948445   16.390532    4.925583  -30.252147   14.208280  -11.775899   14.966458  -10.949119    0.967746   22.231068
13     21.302835  -22.478786   22.219274  -10.333549  -15.153760   22.316417   12.582181   -0.541537   -9.436337   10.168542    7.909259
14     12.981934    8.319031   19.570986  -17.000717  -23.493794   16.981319    9.996739   20.386708    3.952619   -4.310215    6.554124
15    -14.696577  -17.209237   14.525584  -20.475409    6.159271   12.879206  -13.942417    4.008424   -5.974088   17.808364   13.138762
16      6.104585    3.181527   -3.318357  -23.428609  -14.559891   20.376025   -8.434354    6.850325  -17.518661   -4.584083    5.415755
17     18.829449    1.171611   -0.177123   -3.604261   -3.413298   19.523349    0.903542   18.084633  -11.911412   -7.921118   18.581649
18      8.511318  -16.107225   29.176976  -21.298302  -29.855874    0.995455   -7.266977    8.159963    0.985609    7.729459  -12.966527
19     -5.513919   -6.838657    4.026831   -7.059106    0.967883   15.205973    4.947855    5.362446  -20.477818   11.958153   16.785018
20     19.861932   -7.857506    5.643955   -4.439531   -0.846037   26.410503  -28.741808   13.098772  -14.189061   -2.010233   10.575248
21    -10.373139  -33.577658   14.393455   -3.373885  -12.078383    1.415410   -6.675364   -7.337178   -4.686000    3.700622    0.949668
22      6.766758   -4.407579   10.305895   -4.400896    6.754287   23.584760   14.644711   16.288997  -17.195773   18.158690    2.366239
23      1.081479  -12.690507   31.537406    9.477382  -15.676929    1.106263  -10.324035    4.418272  -19.267674    6.174647    5.671273
24    -15.600509  -20.351230   18.334728   10.487489  -16.037635    1.283141   -8.879621   12.140937    4.844813    2.226241   -4.970682
25      6.075045  -23.857709   11.693187    4.866555  -11.308274  -14.698708   -6.384952  -10.841639  -14.239166   -9.703491    4.654031
26     17.438743  -12.615899    0.789213    1.746310  -13.324761   -4.484684   -8.728469   26.461376  -13.228487   10.545975    8.663423
27     -0.345422   -9.868085   26.251767  -25.748620  -22.168480    0.075066   -4.902324    2.772623    5.538813  -10.652863   12.019222
28      8.782745  -11.039724   19.792848   -7.454397  -11.136051   -5.493029  -26.636752   14.116266   -9.001010   12.182347   21.790293
29      1.815232    6.536839   15.958974    6.231871    5.501488   -4.844246    1.442609   26.645073  -15.078784  -14.621224   11.307183
..           ...         ...         ...         ...         ...         ...         ...         ...         ...         ...         ...
87     -7.107823   -2.659451   -9.913776    7.865090  -15.062972   -6.669139  -10.886804   -5.428388   -6.145721   -1.694927   36.544551
88     14.410571  -22.477726    3.857208   -2.952726  -24.880941   26.884771  -10.301260   14.843514   -5.629550   15.834725   14.923468
89      2.024200  -17.995892   18.120255    9.302085  -20.657652   -1.342614  -18.167920   -6.761627    1.510516   -0.822743   12.133360
90     24.161473    2.150718   22.656128  -13.834454    4.338249   -5.126474   -6.119224    3.414971    1.066510    9.997343   27.965546
91    -11.113160   -5.647850   23.260981    4.848569   -6.399478   18.210016   13.164300   19.956570   -0.345878    9.654331   10.193919
92    -11.316527   -8.600399   25.029113   -2.779305   -2.414419   18.946212    2.166333   -5.778113    0.044183    4.849727   22.058928
93     10.463708  -21.580878   23.946622    7.302745  -22.052180    0.784292   -7.572949    6.403820   -1.567384    7.096635   21.298699
94     -2.731858   -1.160324   -1.203266   -9.844513   -8.309690   27.409155    8.013834   10.136497  -16.946208    5.646174   10.622073
95     -0.343405   -2.980590    3.574230   10.676147   -3.662173    2.813027  -21.867923   -6.557764   -2.575985    3.411594   16.251755
96     -1.741997   -1.487631   27.069244   -7.585768    2.936218   24.616908    2.101370   41.978198   -0.526588    8.414796   -1.203203
97     -9.546285   -5.598474   16.340206   -0.153478  -14.982139    2.475574  -14.791769   25.655002    2.786267   -5.184959   18.274686
98     12.453219   -4.065051   17.536215    7.543518    8.571374   20.703939  -18.256704    7.374196  -10.945239    0.359840  -12.650669
99     13.419761    7.410299   23.401764  -17.150200   -8.987999   16.752096   -7.503705   20.856191    1.770926   17.369485   14.560195
100    -8.993098  -24.137958    8.507993  -28.898992   -4.012547   -2.715727   -4.027505    6.212865  -15.777059    8.403863   -1.090873
101     7.663754  -21.315959   10.085647  -13.476459    1.869769   31.556190   -4.207739    6.783527   -9.069698   25.711498    4.205376
102    -9.653924  -10.097966    6.088028  -17.224249  -35.351822    2.677276  -13.815382   10.687653  -10.804356    5.571192   -9.528244
103     5.591235    6.707069   25.261202    7.251960  -15.335947    6.264382   -1.493672   24.244968   -4.643058   13.628094    9.853019
104     5.349885  -12.919083   12.132417  -30.097538  -23.687714   15.293885   -7.954287   16.382830   -2.277219  -12.926472   25.837356
105    11.459300   -6.873557   24.064825    0.688340  -14.256252  -14.539024  -26.296003   18.508660   -6.114729   15.668279    0.323248
106    -3.840912  -12.800335    1.341582   -5.303823  -11.195030   -0.078478    0.028865   -8.083519  -21.344629   10.631312   -8.177534
107    12.687826   -7.989794   15.536974    6.641290    1.062005   33.352865    6.122964   17.815628  -17.525844   -2.721404    3.446680
108     0.117894  -24.108314   25.600026   -2.184434  -26.406851    9.524311  -10.713197   -1.100855   -7.452232  -16.072449   -1.506024
109   -14.471792  -12.099130   17.744803   -5.173237  -13.187795   -3.559686   -6.379751   -5.804498  -10.552952   21.174772    4.976100
110    -8.591807   -7.044647    8.214113    4.041871  -13.361453    8.930821    6.097335  -11.421545   -5.723956   12.705342   23.809446
111   -11.788266  -11.483627   12.438678   -3.162811    0.121983   15.905273  -14.988100   15.049161  -19.249455   -8.949271   16.943905
112    -7.514689  -11.633082   12.714199   -9.659930  -20.170026   46.574569  -10.498804   20.522685   -1.223742    4.478910   -0.186369
113     8.610956   -2.087595   13.609762   -5.509457    4.565548   -2.150133    6.652794    6.604923   -0.554557  -21.551409    5.522046
114     4.712890   -5.468014    7.463785   15.156009   -4.815854  -10.092629    3.830391   -0.360002  -11.221351   -3.262486    1.591322
115   -12.120077   -5.845913   28.951978  -10.215184  -18.070297   15.439734   -4.961338   31.429439   -7.406423    5.847000   20.797016
116     3.059507   -5.926298   31.837423    0.245340   10.050776   -1.997428    2.059449   12.571624    4.110159   14.815644    9.035330
\n", "

117 rows × 11 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 \\\n", "0 15.107877 -16.430766 19.707360 -19.811888 -1.644537 -6.444184 \n", "1 12.411706 3.150959 14.827734 -10.482672 -8.250082 -7.982698 \n", "2 20.946754 1.048170 8.296324 -2.595042 -14.061970 8.776611 \n", "3 5.127033 -18.517137 11.422480 -28.993408 -5.797960 -15.989215 \n", "4 11.622006 -0.229723 10.005823 -9.700729 -14.765493 26.933578 \n", "5 -5.689178 -9.707423 22.258180 7.890772 -11.397641 34.445407 \n", "6 16.452993 -15.320026 17.283979 -5.896654 -2.940964 12.148073 \n", "7 -1.907518 -10.191011 5.041437 -31.539885 0.761368 2.671201 \n", "8 -2.665572 -1.793277 -3.312666 -11.771151 -22.948824 0.657702 \n", "9 -2.120839 -22.573352 10.638703 -2.064867 0.942871 -11.136318 \n", "10 -12.692722 -11.498905 0.026479 -8.809048 -16.233664 -6.701840 \n", "11 14.552377 -38.593613 19.161298 -11.379035 -24.679817 29.394102 \n", "12 3.968173 -0.948445 16.390532 4.925583 -30.252147 14.208280 \n", "13 21.302835 -22.478786 22.219274 -10.333549 -15.153760 22.316417 \n", "14 12.981934 8.319031 19.570986 -17.000717 -23.493794 16.981319 \n", "15 -14.696577 -17.209237 14.525584 -20.475409 6.159271 12.879206 \n", "16 6.104585 3.181527 -3.318357 -23.428609 -14.559891 20.376025 \n", "17 18.829449 1.171611 -0.177123 -3.604261 -3.413298 19.523349 \n", "18 8.511318 -16.107225 29.176976 -21.298302 -29.855874 0.995455 \n", "19 -5.513919 -6.838657 4.026831 -7.059106 0.967883 15.205973 \n", "20 19.861932 -7.857506 5.643955 -4.439531 -0.846037 26.410503 \n", "21 -10.373139 -33.577658 14.393455 -3.373885 -12.078383 1.415410 \n", "22 6.766758 -4.407579 10.305895 -4.400896 6.754287 23.584760 \n", "23 1.081479 -12.690507 31.537406 9.477382 -15.676929 1.106263 \n", "24 -15.600509 -20.351230 18.334728 10.487489 -16.037635 1.283141 \n", "25 6.075045 -23.857709 11.693187 4.866555 -11.308274 -14.698708 \n", "26 17.438743 -12.615899 0.789213 1.746310 -13.324761 -4.484684 \n", "27 -0.345422 -9.868085 26.251767 -25.748620 -22.168480 0.075066 \n", "28 8.782745 
-11.039724 19.792848 -7.454397 -11.136051 -5.493029 \n", "29 1.815232 6.536839 15.958974 6.231871 5.501488 -4.844246 \n", ".. ... ... ... ... ... ... \n", "87 -7.107823 -2.659451 -9.913776 7.865090 -15.062972 -6.669139 \n", "88 14.410571 -22.477726 3.857208 -2.952726 -24.880941 26.884771 \n", "89 2.024200 -17.995892 18.120255 9.302085 -20.657652 -1.342614 \n", "90 24.161473 2.150718 22.656128 -13.834454 4.338249 -5.126474 \n", "91 -11.113160 -5.647850 23.260981 4.848569 -6.399478 18.210016 \n", "92 -11.316527 -8.600399 25.029113 -2.779305 -2.414419 18.946212 \n", "93 10.463708 -21.580878 23.946622 7.302745 -22.052180 0.784292 \n", "94 -2.731858 -1.160324 -1.203266 -9.844513 -8.309690 27.409155 \n", "95 -0.343405 -2.980590 3.574230 10.676147 -3.662173 2.813027 \n", "96 -1.741997 -1.487631 27.069244 -7.585768 2.936218 24.616908 \n", "97 -9.546285 -5.598474 16.340206 -0.153478 -14.982139 2.475574 \n", "98 12.453219 -4.065051 17.536215 7.543518 8.571374 20.703939 \n", "99 13.419761 7.410299 23.401764 -17.150200 -8.987999 16.752096 \n", "100 -8.993098 -24.137958 8.507993 -28.898992 -4.012547 -2.715727 \n", "101 7.663754 -21.315959 10.085647 -13.476459 1.869769 31.556190 \n", "102 -9.653924 -10.097966 6.088028 -17.224249 -35.351822 2.677276 \n", "103 5.591235 6.707069 25.261202 7.251960 -15.335947 6.264382 \n", "104 5.349885 -12.919083 12.132417 -30.097538 -23.687714 15.293885 \n", "105 11.459300 -6.873557 24.064825 0.688340 -14.256252 -14.539024 \n", "106 -3.840912 -12.800335 1.341582 -5.303823 -11.195030 -0.078478 \n", "107 12.687826 -7.989794 15.536974 6.641290 1.062005 33.352865 \n", "108 0.117894 -24.108314 25.600026 -2.184434 -26.406851 9.524311 \n", "109 -14.471792 -12.099130 17.744803 -5.173237 -13.187795 -3.559686 \n", "110 -8.591807 -7.044647 8.214113 4.041871 -13.361453 8.930821 \n", "111 -11.788266 -11.483627 12.438678 -3.162811 0.121983 15.905273 \n", "112 -7.514689 -11.633082 12.714199 -9.659930 -20.170026 46.574569 \n", "113 8.610956 -2.087595 13.609762 
-5.509457 4.565548 -2.150133 \n", "114 4.712890 -5.468014 7.463785 15.156009 -4.815854 -10.092629 \n", "115 -12.120077 -5.845913 28.951978 -10.215184 -18.070297 15.439734 \n", "116 3.059507 -5.926298 31.837423 0.245340 10.050776 -1.997428 \n", "\n", " 6 7 8 9 10 \n", "0 -6.121214 7.042283 7.232476 17.223479 -2.956852 \n", "1 -20.766918 30.689476 4.622547 12.234751 23.167294 \n", "2 -21.886068 9.769264 -20.071130 13.871906 4.197881 \n", "3 -12.039753 31.296681 8.203208 8.035668 19.519680 \n", "4 -9.299882 13.109171 8.976218 8.157170 18.663064 \n", "5 -9.836411 -0.179841 -13.261052 15.200689 7.595782 \n", "6 -3.422623 15.684343 -20.903409 -0.306900 -8.481797 \n", "7 2.149600 11.920996 -10.353145 -5.472842 -7.579770 \n", "8 4.040898 2.684116 -4.217029 -12.993472 6.657632 \n", "9 1.747898 22.794201 -3.726747 4.053987 -1.185641 \n", "10 -10.635467 -8.274253 -12.428766 -2.396440 9.134138 \n", "11 -10.255991 4.409564 -20.899036 -6.997668 2.424330 \n", "12 -11.775899 14.966458 -10.949119 0.967746 22.231068 \n", "13 12.582181 -0.541537 -9.436337 10.168542 7.909259 \n", "14 9.996739 20.386708 3.952619 -4.310215 6.554124 \n", "15 -13.942417 4.008424 -5.974088 17.808364 13.138762 \n", "16 -8.434354 6.850325 -17.518661 -4.584083 5.415755 \n", "17 0.903542 18.084633 -11.911412 -7.921118 18.581649 \n", "18 -7.266977 8.159963 0.985609 7.729459 -12.966527 \n", "19 4.947855 5.362446 -20.477818 11.958153 16.785018 \n", "20 -28.741808 13.098772 -14.189061 -2.010233 10.575248 \n", "21 -6.675364 -7.337178 -4.686000 3.700622 0.949668 \n", "22 14.644711 16.288997 -17.195773 18.158690 2.366239 \n", "23 -10.324035 4.418272 -19.267674 6.174647 5.671273 \n", "24 -8.879621 12.140937 4.844813 2.226241 -4.970682 \n", "25 -6.384952 -10.841639 -14.239166 -9.703491 4.654031 \n", "26 -8.728469 26.461376 -13.228487 10.545975 8.663423 \n", "27 -4.902324 2.772623 5.538813 -10.652863 12.019222 \n", "28 -26.636752 14.116266 -9.001010 12.182347 21.790293 \n", "29 1.442609 26.645073 -15.078784 -14.621224 
11.307183 \n", ".. ... ... ... ... ... \n", "87 -10.886804 -5.428388 -6.145721 -1.694927 36.544551 \n", "88 -10.301260 14.843514 -5.629550 15.834725 14.923468 \n", "89 -18.167920 -6.761627 1.510516 -0.822743 12.133360 \n", "90 -6.119224 3.414971 1.066510 9.997343 27.965546 \n", "91 13.164300 19.956570 -0.345878 9.654331 10.193919 \n", "92 2.166333 -5.778113 0.044183 4.849727 22.058928 \n", "93 -7.572949 6.403820 -1.567384 7.096635 21.298699 \n", "94 8.013834 10.136497 -16.946208 5.646174 10.622073 \n", "95 -21.867923 -6.557764 -2.575985 3.411594 16.251755 \n", "96 2.101370 41.978198 -0.526588 8.414796 -1.203203 \n", "97 -14.791769 25.655002 2.786267 -5.184959 18.274686 \n", "98 -18.256704 7.374196 -10.945239 0.359840 -12.650669 \n", "99 -7.503705 20.856191 1.770926 17.369485 14.560195 \n", "100 -4.027505 6.212865 -15.777059 8.403863 -1.090873 \n", "101 -4.207739 6.783527 -9.069698 25.711498 4.205376 \n", "102 -13.815382 10.687653 -10.804356 5.571192 -9.528244 \n", "103 -1.493672 24.244968 -4.643058 13.628094 9.853019 \n", "104 -7.954287 16.382830 -2.277219 -12.926472 25.837356 \n", "105 -26.296003 18.508660 -6.114729 15.668279 0.323248 \n", "106 0.028865 -8.083519 -21.344629 10.631312 -8.177534 \n", "107 6.122964 17.815628 -17.525844 -2.721404 3.446680 \n", "108 -10.713197 -1.100855 -7.452232 -16.072449 -1.506024 \n", "109 -6.379751 -5.804498 -10.552952 21.174772 4.976100 \n", "110 6.097335 -11.421545 -5.723956 12.705342 23.809446 \n", "111 -14.988100 15.049161 -19.249455 -8.949271 16.943905 \n", "112 -10.498804 20.522685 -1.223742 4.478910 -0.186369 \n", "113 6.652794 6.604923 -0.554557 -21.551409 5.522046 \n", "114 3.830391 -0.360002 -11.221351 -3.262486 1.591322 \n", "115 -4.961338 31.429439 -7.406423 5.847000 20.797016 \n", "116 2.059449 12.571624 4.110159 14.815644 9.035330 \n", "\n", "[117 rows x 11 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "indices=np.where((p\n", "\n", "\n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    grad (derivative)  grad (numerical)
0            2.570436          2.570436
1           -0.173870         -0.173870
2            0.443513          0.443513
3           -1.151438         -1.151438
4            3.078315          3.078315
5            1.406108          1.406108
6            0.251307          0.251307
7            0.860391          0.860391
8           -2.623469         -2.623469
9           -0.681626         -0.681626
10          -1.868502         -1.868502
11           2.908619          2.908619
12          -1.269194         -1.269194
13          -0.828064         -0.828064
14          -0.727420         -0.727420
15           0.420340          0.420340
16           0.405181          0.405181
17          -1.245199         -1.245199
18           0.002056          0.002056
19           0.248965          0.248965
20          -3.372456         -3.372456
21           1.402044          1.402044
22          -1.401868         -1.401868
23           4.446794          4.446794
24           1.564790          1.564790
25           0.518527          0.518527
26          -0.877819         -0.877819
\n", "" ], "text/plain": [ " grad (derivative) grad (numerical)\n", "0 2.570436 2.570436\n", "1 -0.173870 -0.173870\n", "2 0.443513 0.443513\n", "3 -1.151438 -1.151438\n", "4 3.078315 3.078315\n", "5 1.406108 1.406108\n", "6 0.251307 0.251307\n", "7 0.860391 0.860391\n", "8 -2.623469 -2.623469\n", "9 -0.681626 -0.681626\n", "10 -1.868502 -1.868502\n", "11 2.908619 2.908619\n", "12 -1.269194 -1.269194\n", "13 -0.828064 -0.828064\n", "14 -0.727420 -0.727420\n", "15 0.420340 0.420340\n", "16 0.405181 0.405181\n", "17 -1.245199 -1.245199\n", "18 0.002056 0.002056\n", "19 0.248965 0.248965\n", "20 -3.372456 -3.372456\n", "21 1.402044 1.402044\n", "22 -1.401868 -1.401868\n", "23 4.446794 4.446794\n", "24 1.564790 1.564790\n", "25 0.518527 0.518527\n", "26 -0.877819 -0.877819" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def computeNumericalGradient(J,theta):\n", " '''Computes the numerical gradient using two point slope equation'''\n", " numGrad=np.zeros(theta.shape)\n", " perturb=np.zeros(theta.shape)\n", " epsilon=1e-4\n", " for i in range(theta.size):\n", " perturb[i,:]=epsilon\n", " numGrad[i,:]=(J(theta+perturb)-J(theta-perturb))/(2*epsilon)\n", " perturb[i,:]=0\n", " return numGrad\n", "\n", "def checkNNGradients(lmbda):\n", " # Create small problem\n", " X=np.random.randn(4,3)\n", " theta=np.random.randn(5,3) \n", " Y=np.dot(X,theta.T)\n", " Y[np.random.rand(*Y.shape)>0.5]=0\n", " R=np.zeros(Y.shape)\n", " R[Y!=0]=1\n", " numUsers=Y.shape[1]\n", " numMovies=Y.shape[0]\n", " numFeatures=theta.shape[1]\n", " # Gradient from cost function derivative\n", " params=np.row_stack((X.reshape((-1,1)),theta.reshape((-1,1))))\n", " J,grad=costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda)\n", " # Computing numerical gradient\n", " def cost(params):\n", " J,grad=costFunction(params,Y,R,numUsers,numMovies,numFeatures,lmbda)\n", " return J\n", " numGrad=computeNumericalGradient(cost,params)\n", " # Evaluating the norm 
of the difference between the two solutions\n", " diff=np.linalg.norm(numGrad-grad)/np.linalg.norm(numGrad+grad)\n", " return grad,numGrad,diff \n", "\n", "lmbda=1.5\n", "grad,numGrad,diff=checkNNGradients(lmbda)\n", "print(f'DIFFERENCE : {diff}')\n", "pd.DataFrame(data={'grad (derivative)':grad.reshape(-1),'grad (numerical)':numGrad.reshape(-1)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.3    Learning Movie Recommendations\n", "With the collaborative filtering cost function and gradient in place, we can start training the learning algorithm to make movie recommendations. Below, we will also enter our own movie preferences so that, when the algorithm runs, we get recommendations for ourselves." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Y : 1682 X 943\n", "R : 1682 X 943\n" ] } ], "source": [ "# Loading data\n", "mat=loadmat('ex8_movies.mat')\n", "Y=mat['Y']\n", "R=mat['R']\n", "\n", "numUsers=Y.shape[1]\n", "numMovies=Y.shape[0]\n", "numFeatures=10\n", "\n", "print(\"Y : {0} X {1}\".format(*Y.shape))\n", "print(\"R : {0} X {1}\".format(*R.shape))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ID    Movie                         Year
1528  Nowhere                       1997
141   20,000 Leagues Under the Sea  1954
27    Bad Boys                      1995
1060  Adventures of Pinocchio, The  1996
972   Passion Fish                  1992
588   Beauty and the Beast          1991
159   Basic Instinct                1992
1454  Angel and the Badman          1947
1664  8 Heads in a Duffel Bag       1997
239   Sneakers                      1992
\n", "
" ], "text/plain": [ " Movie Year\n", "ID \n", "1528 Nowhere 1997\n", "141 20,000 Leagues Under the Sea 1954\n", "27 Bad Boys 1995\n", "1060 Adventures of Pinocchio, The 1996\n", "972 Passion Fish 1992\n", "588 Beauty and the Beast 1991\n", "159 Basic Instinct 1992\n", "1454 Angel and the Badman 1947\n", "1664 8 Heads in a Duffel Bag 1997\n", "239 Sneakers 1992" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Loading Movies\n", "movies={'ID':[],'Movie':[],'Year':[]}\n", "with open('movie_ids.txt') as f:\n", " for line in f.readlines():\n", " movies['ID'].append(int(line.split()[0]))\n", " movies['Movie'].append(' '.join(line.split()[1:-1]))\n", " movies['Year'].append(line.split()[-1][1:-1])\n", "movies=pd.DataFrame(movies)\n", "movies=movies.set_index('ID')\n", "movies.sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by 943 users" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
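Movie         0    1    2    3    4    5    6    7    8    9  ...  933  934  935  936  937  938  939  940  941  942
Toy Story     5    4    0    0    4    4    0    0    0    4  ...    2    3    4    0    4    0    0    5    0    0
GoldenEye     3    0    0    0    3    0    0    0    0    0  ...    4    0    0    0    0    0    0    0    0    5
Four Rooms    4    0    0    0    0    0    0    0    0    0  ...    0    0    4    0    0    0    0    0    0    0
Get Shorty    3    0    0    0    0    0    5    0    0    4  ...    5    0    0    0    0    0    2    0    0    0
Copycat       3    0    0    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0    0    0    0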
\n", "

5 rows × 943 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 933 934 \\\n", "Movie ... \n", "Toy Story 5 4 0 0 4 4 0 0 0 4 ... 2 3 \n", "GoldenEye 3 0 0 0 3 0 0 0 0 0 ... 4 0 \n", "Four Rooms 4 0 0 0 0 0 0 0 0 0 ... 0 0 \n", "Get Shorty 3 0 0 0 0 0 5 0 0 4 ... 5 0 \n", "Copycat 3 0 0 0 0 0 0 0 0 0 ... 0 0 \n", "\n", " 935 936 937 938 939 940 941 942 \n", "Movie \n", "Toy Story 4 0 4 0 0 5 0 0 \n", "GoldenEye 0 0 0 0 0 0 0 5 \n", "Four Rooms 4 0 0 0 0 0 0 0 \n", "Get Shorty 0 0 0 0 2 0 0 0 \n", "Copycat 0 0 0 0 0 0 0 0 \n", "\n", "[5 rows x 943 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(Y,movies['Movie']).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a rating to movie i" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
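Movie         0    1    2    3    4    5    6    7    8    9  ...  933  934  935  936  937  938  939  940  941  942
Toy Story     1    1    0    0    1    1    0    0    0    1  ...    1    1    1    0    1    0    0    1    0    0
GoldenEye     1    0    0    0    1    0    0    0    0    0  ...    1    0    0    0    0    0    0    0    0    1
Four Rooms    1    0    0    0    0    0    0    0    0    0  ...    0    0    1    0    0    0    0    0    0    0
Get Shorty    1    0    0    0    0    0    1    0    0    1  ...    1    0    0    0    0    0    1    0    0    0
Copycat       1    0    0    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0    0    0    0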
\n", "

5 rows × 943 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 933 934 \\\n", "Movie ... \n", "Toy Story 1 1 0 0 1 1 0 0 0 1 ... 1 1 \n", "GoldenEye 1 0 0 0 1 0 0 0 0 0 ... 1 0 \n", "Four Rooms 1 0 0 0 0 0 0 0 0 0 ... 0 0 \n", "Get Shorty 1 0 0 0 0 0 1 0 0 1 ... 1 0 \n", "Copycat 1 0 0 0 0 0 0 0 0 0 ... 0 0 \n", "\n", " 935 936 937 938 939 940 941 942 \n", "Movie \n", "Toy Story 1 0 1 0 0 1 0 0 \n", "GoldenEye 0 0 0 0 0 0 0 1 \n", "Four Rooms 1 0 0 0 0 0 0 0 \n", "Get Shorty 0 0 0 0 1 0 0 0 \n", "Copycat 0 0 0 0 0 0 0 0 \n", "\n", "[5 rows x 943 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(R,movies['Movie']).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Visualizing the movie ratings" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.imshow(Y,cmap='gray')\n", "plt.xlabel('Users')\n", "plt.ylabel('Movies');" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Please enter the movie ratings(1-5 or 0 if not seen): \n", "\n", "Toy Story (1995) 4.1\n", "Silence of the Lambs, The (1991) 4.5\n", "Twelve Monkeys (1995) 1.5\n", "Usual Suspects, The (1995) 4.7\n", "Professional, The (1994) 3\n", "Shawshank Redemption, The (1994) 5\n", "While You Were Sleeping (1995) 3.5\n", "Forrest Gump (1994) 5\n", "Alien (1979) 3\n", "101 Dalmatians (1996) 3.8\n", "Sphere (1998) 0\n" ] } ], "source": [ "print('Please enter the movie ratings(1-5 or 0 if not seen): \\n')\n", "userRatings=np.zeros((numMovies,1))\n", "\n", "# Ask user ratings for set movies\n", "indices=[1,98,7,12,55,64,66,69,183,225,355]\n", "for index in indices:\n", " movie=movies.loc[index]\n", " try:\n", " question=f\"{movie['Movie']} ({movie['Year']})\"\n", " question=question.ljust(50)\n", " rating=float(input(question))\n", " if 1<=rating<=5:\n", " userRatings[int(index)-1]=rating\n", " except Exception as e:\n", " break" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Adding new user ratings \n", "Y=np.column_stack((userRatings,Y))\n", "# Updating R with new user ratings\n", "R=np.column_stack((np.zeros(numMovies),R))\n", "indices=np.where(Y[:,0]!=0)[0]\n", "R[indices,0]=1" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Normalize Ratings\n", "Ymean=np.zeros((numMovies,1))\n", "Ynorm=np.zeros(Y.shape)\n", "\n", "for i in range(numMovies):\n", " idx=np.where(R[i,:]==1)\n", " Ymean[i]=Y[i,idx].mean()\n", " Ynorm[i,idx]=Y[i,idx]-Ymean[i]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Training Collaborative Filtering...\n", "ITERATIONS : 500\t\tCOST : 38914.118\r" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZsAAAEKCAYAAADEovgeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3X2UXVWZ5/HvryqvhIS8UGA6AQKSVtCRtwKiqK2gITBqaEcUu0ey7KxOj42M2t0qtGuabm2XODiirKXMoDAkDq2iSBNpIKYR1FZeUryFl6ApXlMGkkAlISaQpJJn/jj7Jpfi3ntu5d5zK1X1+6x11z3nOfucvU+MPNnn7Lu3IgIzM7MitQ12A8zMbPhzsjEzs8I52ZiZWeGcbMzMrHBONmZmVjgnGzMzK5yTjZmZFc7JxszMCldYspH0BkkPln1ekvRpSVMlLZe0On1PSeUl6QpJ3ZJWSjqx7FoLUvnVkhaUxU+S9HA65wpJSvGKdZiZ2eBQK2YQkNQO/B44FbgA6I2ISyVdBEyJiM9LOhu4EDg7lftmRJwqaSrQBXQCAdwHnBQRGyXdC3wKuBu4BbgiIm6V9D8r1VGrjQcffHDMmjWrgLs3Mxu+7rvvvhcioiOv3KhWNAY4A3giIp6RNB94V4ovBu4EPg/MB5ZElv3uljRZ0vRUdnlE9AJIWg7Mk3QnMCki7krxJcA5wK3pWpXqqGrWrFl0dXU14VbNzEYOSc/UU65V72zOA76ftg+NiOcA0vchKT4DWFN2Tk+K1Yr3VIjXqsPMzAZB4clG0hjgA8CP8opWiMU+xAfStkWSuiR1bdiwYSCnmpnZALSiZ3MWcH9ErEv769LjMdL3+hTvAQ4rO28msDYnPrNCvFYdrxIRV0VEZ0R0dnTkPnI0M7N91Ipk81H2PkIDWAqURpQtAG4qi5+fRqXNATanR2DLgLmSpqRRZXOBZenYFklz0ii08/tdq1IdZmY2CAodICDpAOC9wF+VhS8Frpe0EHgWODfFbyEbidYNbAM+DhARvZK+BKxI5b5YGiwAfAK4FhhPNjDg1pw6zMxsELRk6PNQ0NnZGR6NZmY2MJLui4jOvHKeQcDMzArnZNOgn9zfw3X31DXM3MxsxHKyadDSh9bywxVr8guamY1gTjYNEuDXXmZmtTnZNEgSMbDfkpqZjThONg1yz8bMLJ+TTYMkJxszszxONg2TH6KZmeVwsmlQ1rNxujEzq8XJpkGVpp42M7NXc7JpkN/ZmJnlc7JpkPDQZzOzPE42DXLPxswsn5NNg6QBLg9qZjYCOdk0SMij0czMcjjZNMo9GzOzXE42DRI425iZ5XCyaVA2EaeZmdXiZNOgbCJOpxszs1qcbBrk0WhmZvmcbBrkJQbMzPI52TTIi6eZmeUrNNlImizpx5Iel7RK0lslTZW0XNLq9D0llZWkKyR1S1op6cSy6yxI5VdLWlAWP0nSw+mcKyQpxSvWUcg94p6NmVmeons23wRui4g3AscBq4CLgNsjYjZwe9oHOAuYnT6LgCshSxzAJcCpwCnAJWXJ48pUtnTevBSvVkfzeboaM7NchSUbSZOAdwJXA0TEjojYBMwHFqdii4Fz0vZ8YElk7gYmS5oOnAksj4jeiNgILAfmpWOTIuKuyIaDLel3rUp1NP8+vciAmVmuIns2RwEbgP8r6QFJ35U0ATg0Ip4DSN+HpPIzgDVl5/ekWK14T4U4NepoOi+eZmaWr8hkMwo4EbgyIk4AtlL7cValLkLsQ7xukhZJ6pLUtWHDhoGcuvcaA63UzGwEKjLZ9AA9EXFP2v8xWfJZlx6Bkb7Xl5
U/rOz8mcDanPjMCnFq1PEqEXFVRHRGRGdHR8c+3WSbxG73bMzMaios2UTE88AaSW9IoTOAx4ClQGlE2QLgprS9FDg/jUqbA2xOj8CWAXMlTUkDA+YCy9KxLZLmpFFo5/e7VqU6ms7r2ZiZ5RtV8PUvBK6TNAZ4Evg4WYK7XtJC4Fng3FT2FuBsoBvYlsoSEb2SvgSsSOW+GBG9afsTwLXAeODW9AG4tEodTecZBMzM8hWabCLiQaCzwqEzKpQN4IIq17kGuKZCvAt4c4X4i5XqKIbcszEzy+EZBBokrzFgZpbLyaZBnkHAzCyfk02D/M7GzCyfk02DhPyjTjOzHE42DXLPxswsn5NNg/zOxswsn5NNgyQ/RjMzy+Nk0wRONWZmtTnZNEieidPMLJeTTYOEnGvMzHI42TTI69mYmeVzsmmQn6KZmeVzsmmQlxgwM8vnZNMgSYT7NmZmNTnZNMg/6jQzy+dk0yhPV2NmlsvJpkFytjEzy+Vk06BsIk5nGzOzWpxsGuR3NmZm+ZxsGuQlBszM8jnZNMiLp5mZ5XOyaZB7NmZm+QpNNpKelvSwpAcldaXYVEnLJa1O31NSXJKukNQtaaWkE8uusyCVXy1pQVn8pHT97nSuatVRyD3idzZmZnla0bN5d0QcHxGdaf8i4PaImA3cnvYBzgJmp88i4ErIEgdwCXAqcApwSVnyuDKVLZ03L6eO5svym5mZ1TAYj9HmA4vT9mLgnLL4ksjcDUyWNB04E1geEb0RsRFYDsxLxyZFxF2RvTRZ0u9alepoulKq8XsbM7Pqik42AfxM0n2SFqXYoRHxHED6PiTFZwBrys7tSbFa8Z4K8Vp1NF2pY+NcY2ZW3aiCr39aRKyVdAiwXNLjNcpWeh4V+xCvW0qAiwAOP/zwgZy69xqpGc41ZmbVFdqziYi16Xs9cCPZO5d16REY6Xt9Kt4DHFZ2+kxgbU58ZoU4Nero376rIqIzIjo7Ojr26R739mycbszMqiks2UiaIGliaRuYCzwCLAVKI8oWADel7aXA+WlU2hxgc3oEtgyYK2lKGhgwF1iWjm2RNCeNQju/37Uq1dH8+0zfTjVmZtUV+RjtUODGNBp5FPAvEXGbpBXA9ZIWAs8C56bytwBnA93ANuDjABHRK+lLwIpU7osR0Zu2PwFcC4wHbk0fgEur1NF0fmdjZpavsGQTEU8Cx1WIvwicUSEewAVVrnUNcE2FeBfw5nrrKEJKpp6M08ysBs8g0CTu2ZiZVedk0yD/ptPMLJ+TTYP2DH12z8bMrConmwbtGSDgdzZmZlU52TRo73Q1g9oMM7P9mpNNg/b2bMzMrBonmwbtfWfjdGNmVk1dv7OR9DZgVnn5iFhSUJuGFPdszMzy5SYbSd8DXg88COxK4dKU/pa4Y2NmVl09PZtO4Njwc6KK5K6NmVmuet7ZPAK8ruiGDFV7J+J0tjEzq6aens3BwGOS7gW2l4IR8YHCWjWEeCJOM7N89SSbfyy6EUOZlxgwM8uXm2wi4heSDgVOTqF702JoRtmsz+7amJlVlfvORtKHgXvJ1oT5MHCPpA8V3bChwuMDzMzy1fMY7QvAyaXejKQO4N+BHxfZsKHC09WYmeWrZzRaW7/HZi/Wed7I4MXTzMxy1dOzuU3SMuD7af8jZEs4G9Dm0WhmZrnqGSDwWUn/BTiN7KnRVRFxY+EtGyK8no2ZWb665kaLiBuAGwpuy5Dk9WzMzPJVTTaS/iMi3i5pC68ebCUgImJS4a0bAjxAwMwsX9VkExFvT98TW9ecocdDn83M8tXzO5vv1ROrcX67pAck3Zz2j5R0j6TVkn4oaUyKj0373en4rLJrXJziv5V0Zll8Xop1S7qoLF6xjiJ4PRszs3z1DGF+U/mOpFHASQOo41PAqrL9rwKXR8RsYCOwMMUXAhsj4mjg8lQOSccC56V2zAO+nRJYO/At4CzgWOCjqWytOprPo9HMzHJVTTapN7EFeIukl9JnC7AOuK
mei0uaCfxn4LtpX8Dp7P1B6GLgnLQ9P+2Tjp+Rys8HfhAR2yPiKaAbOCV9uiPiyYjYAfwAmJ9TR9Mpv4iZ2YhXNdlExFfS+5rLImJS+kyMiGkRcXGd1/8G8Dlgd9qfBmyKiL603wPMSNszgDWp7j5gcyq/J97vnGrxWnU03d650Yqqwcxs6KvnMdq9kg4q7UiaLCm3pyDpfcD6iLivPFyhaOQca1a8UhsXSeqS1LVhw4ZKRXJ5PRszs3z1JJtLImJzaSciNgGX1HHeacAHJD1N9ojrdLKezuT03gdgJrA2bfcAh8Ge90IHAb3l8X7nVIu/UKOOV4mIqyKiMyI6Ozo66ril1/J6NmZm+eqaG61CrJ6ZBy6OiJkRMYvsBf/PI+LPgTuA0qzRC9j7/mdp2icd/3lainopcF4arXYkMJtsFuoVwOw08mxMqmNpOqdaHU3noc9mZvnqSTZdkr4u6fWSjpJ0OXBf7lnVfR74G0ndZO9Xrk7xq4FpKf43wEUAEfEocD3wGHAbcEFE7ErvZD4JLCMb7XZ9Klurjqbz0Gczs3z1TFdzIfA/gB+SvaL4GXDBQCqJiDuBO9P2k2QjyfqXeYVszZxK538Z+HKF+C1UmBS0Wh1FcM/GzCxfPY/DtpJ6GVadOzZmZtXlJhtJfwz8HTCrvHxEnF5cs4aO0tBn923MzKqr5zHaj4D/TfbDzF3FNmfo8UScZmb56kk2fRFxZeEtGaL8zsbMLF89o9F+KumvJU2XNLX0KbxlQ4QXTzMzy1dPz6b025fPlsUCOKr5zRl6vHiamVm+ekajHdmKhgxVfmdjZpavntFo51eKR8SS5jdn6PF0NWZm+ep5jHZy2fY44AzgfsDJBij1bfwYzcysunoeo11Yvp9mgK57pc7hzj0bM7N89YxG628b2WSYhhdPMzOrRz3vbH7K3p+RtJEtwXx9kY0aSrx4mplZvnre2XytbLsPeCYiegpqz5DjxdPMzPJVTTaS5kTE3RHxi1Y2aKjxOxszs3y13tl8u7Qh6a4WtGVI8nQ1Zmb5aiWb8nff44puyFDlxdPMzPLVemfTJmkKWUIqbe9JQBHRW3TjhgT3bMzMctVKNgeRLf9cSjD3lx3z3GiJp6sxM8tXNdlExKwWtmPI8uJpZmb59uVHnVbGPRszs3xONg3yaDQzs3xONg3y4mlmZvlyk42k10y6WSlWocw4SfdKekjSo5L+KcWPlHSPpNWSfihpTIqPTfvd6fissmtdnOK/lXRmWXxeinVLuqgsXrGOIuz9UaezjZlZNfX0bN5UviOpHTipjvO2A6dHxHHA8cA8SXOArwKXR8RsYCOwMJVfCGyMiKOBy1M5JB0LnJfaMQ/4tqT21I5vAWeRzdf20VSWGnU0nYcHmJnlq5psUm9iC/AWSS+lzxZgPXBT3oUj84e0Ozp9Ajgd+HGKLwbOSdvz0z7p+BnKhnrNB34QEdsj4imgGzglfboj4smI2AH8AJifzqlWR/N5uhozs1xVk01EfCUiJgKXRcSk9JkYEdMi4uJ6Lp56IA+SJajlwBPApojoS0V6gBlpewawJtXdB2wGppXH+51TLT6tRh1NJy+eZmaWq57HaDdLmgAg6b9K+rqkI+q5eETsiojjgZlkPZFjKhVL35WWhokmxl9D0iJJXZK6NmzYUKlILv/MxswsXz3J5kpgm6TjgM8BzzDAJaEjYhNwJzAHmCyp9GPSmcDatN0DHAaQjh8E9JbH+51TLf5CjTr6t+uqiOiMiM6Ojo6B3NIezjVmZvnqSTZ9kQ21mg98MyK+CUzMO0lSh6TJaXs88B5gFXAH8KFUbAF73/8sTfuk4z9P9S4Fzkuj1Y4kWyX0XmAFMDuNPBtDNohgaTqnWh1N58XTzMzy1bN42hZJFwMfA96RRoGNruO86cDiVL4NuD4ibpb0GPADSf8MPABcncpfDXxPUjdZj+Y8gIh4VNL1wGNki7ddEBG7ACR9ElgGtAPXRMSj6Vqfr1JH0+
39UaezjZlZNfUkm48Afwb8RUQ8L+lw4LK8kyJiJXBChfiTZO9v+sdfAc6tcq0vA1+uEL8FuKXeOorg6WrMzPLlPkaLiOeB64CDJL0PeCUiBvTOZjjzdDVmZvnqmUHgw2TvSM4FPgzcI+lDtc8aSbx4mplZnnoeo30BODki1kP24h/4d/b+aHJEc8/GzCxfPaPR2kqJJnmxzvNGhL1Llw5mK8zM9m/19Gxuk7QM+H7a/whwa3FNGlr2DH12tjEzqyo32UTEZyV9EHg72T/kr4qIGwtv2RDh0WhmZvmqJhtJRwOHRsSvI+InwE9S/J2SXh8RT7SqkfszeSJOM7Nctd69fAPYUiG+LR0z9k7EudvZxsysqlrJZlb6YearREQXMKuwFg0xHo1mZpavVrIZV+PY+GY3ZKjyYzQzs3y1ks0KSX/ZPyhpIXBfcU0aWuR5n83MctUajfZp4EZJf87e5NIJjAH+tOiGDRXu2ZiZ5auabCJiHfA2Se8G3pzC/xYRP29Jy4YIv7MxM8tXz+9s7iBbH8Yq2LMstLONmVlVnnamQV7Pxswsn5NNgzyDgJlZPiebBvmdjZlZPiebhnk9GzOzPE42DZLyy5iZjXRONg3yOxszs3xONg3yejZmZvmcbBrkno2ZWb7Cko2kwyTdIWmVpEclfSrFp0paLml1+p6S4pJ0haRuSSslnVh2rQWp/GpJC8riJ0l6OJ1zhVI3o1odxdxn9u1kY2ZWXZE9mz7gbyPiGGAOcIGkY4GLgNsjYjZwe9oHOAuYnT6LgCshSxzAJcCpwCnAJWXJ48pUtnTevBSvVkfT7ZlBoKgKzMyGgcKSTUQ8FxH3p+0twCpgBjAfWJyKLQbOSdvzgSWRuRuYLGk6cCawPCJ6I2IjsByYl45Nioi7Iht3vKTftSrV0XR7ezZON2Zm1bTknY2kWcAJwD1kS00/B1lCAg5JxWYAa8pO60mxWvGeCnFq1FEYpxozs+oKTzaSDgRuAD4dES/VKlohFvsQH0jbFknqktS1YcOGgZxado19qdnMbGQpNNlIGk2WaK6LiJ+k8Lr0CIz0vT7Fe4DDyk6fCazNic+sEK9Vx6tExFUR0RkRnR0dHft6j9m1nG3MzKoqcjSagKuBVRHx9bJDS4HSiLIFwE1l8fPTqLQ5wOb0CGwZMFfSlDQwYC6wLB3bImlOquv8fteqVEfTeeizmVm+3PVsGnAa8DHgYUkPptjfA5cC16flpZ8Fzk3HbgHOBrqBbcDHASKiV9KXgBWp3BcjojdtfwK4FhgP3Jo+1Kij6TwRp5lZvsKSTUT8B5XfqwCcUaF8ABdUudY1wDUV4l3sXUW0PP5ipTqK4MXTzMzyeQaBBrWldLrL2cbMrConmwaNH9MOwMs7+ga5JWZm+y8nmwZNGJM9ifzD9l2D3BIzs/2Xk02D2trEAWPa2brdPRszs2qcbJrgwLGjnGzMzGpwsmmCA8eO4g9ONmZmVTnZNMEE92zMzGpysmmCCWPb2eoBAmZmVTnZNIEfo5mZ1eZk0wQTxo5iq39nY2ZWlZNNE/idjZlZbU42TeDHaGZmtTnZNMHEsaN4Zedutvd5kICZWSVONk3wR5PHA/D7jS8PckvMzPZPTjZNcMS0AwB4pnfbILfEzGz/5GTTBIdPzZLNGicbM7OKnGyaoGPiWMaNbuOZF51szMwqcbJpAkm8vuNAfrduy2A3xcxsv+Rk0yRvmXkQK3s2E16x08zsNZxsmuQ/zZjM5pd38qzf25iZvYaTTZOccPhkAO59qneQW2Jmtv9xsmmSNxw6kYMPHMOvu18Y7KaYme13Cks2kq6RtF7SI2WxqZKWS1qdvqekuCRdIalb0kpJJ5adsyCVXy1pQVn8JEkPp3OukKRadRStrU28Y3YHd/5uAzv6dreiSjOzIaPIns21wLx+sYuA2yNiNnB72gc4C5idPouAKyFLHMAlwKnAKcAlZcnjylS2dN68nDoK9/7jprNp205+8bsNrarSzG
xIKCzZRMQvgf4vMOYDi9P2YuCcsviSyNwNTJY0HTgTWB4RvRGxEVgOzEvHJkXEXZEN/1rS71qV6ijcO2Z3MHXCGP71gd+3qkozsyGh1e9sDo2I5wDS9yEpPgNYU1auJ8VqxXsqxGvVUbjR7W28/y3TWb5qHb1bd7SqWjOz/d7+MkBAFWKxD/GBVSotktQlqWvDhuY8+vrYW49g567dXP0fTzblemZmw0Grk8269AiM9L0+xXuAw8rKzQTW5sRnVojXquM1IuKqiOiMiM6Ojo59vqlyRx8ykbPfPJ3Fv3mGzdt2NuWaZmZDXauTzVKgNKJsAXBTWfz8NCptDrA5PQJbBsyVNCUNDJgLLEvHtkiak0ahnd/vWpXqaJkLzziabTv6+Pry37a6ajOz/VKRQ5+/D9wFvEFSj6SFwKXAeyWtBt6b9gFuAZ4EuoHvAH8NEBG9wJeAFenzxRQD+ATw3XTOE8CtKV6tjpZ54+sm8bE5R7Dk7md4aM2mVldvZrbfkefyynR2dkZXV1fTrvfSKzt579d/wYQxo/jphW9nwthRTbu2mdn+QtJ9EdGZV25/GSAw7EwaN5pvnncCT7+4lc/dsJLdu53UzWzkcrIp0JyjpvG5eW/k31Y+x1duXTXYzTEzGzR+tlOwv3rnUTy36WW+86unOGj8aC5499GkmXXMzEYMJ5uCSeIf3v8mXnqlj6/97Hdsfnknf3/2MU44ZjaiONm0QHub+F/nHsekcaP4zq+eYk3vy1x27luYOG70YDfNzKwl/M6mRdraxD9+4E184exjWL5qHed869c88vvNg90sM7OWcLJpIUn85TuP4v8tPJUtr/Qx/1u/5rJlj7N1e99gN83MrFBONoPgra+fxvLP/AnnHD+Db93xBH9y2R0s/s3TbO/bNdhNMzMrhH/UmTT7R531euDZjXz1tse5+8lepk0Yw7mdh/HRUw7jiGkTWt4WM7OBqvdHnU42yWAlG4CI4K4nXuTa3zzNv69ax+6AY6dP4sw3vY73HHsIx7xuEm1tHr1mZvsfJ5sBGsxkU27tppe5eeValj26jvuf3UgEHDR+NJ1HTOGkWVM4dvokjpk+iUMmjvXwaTMbdE42A7S/JJty67e8wq9+9wIrnu7l3qd6efKFrXuOTTlgNEd1HMjMKePT5wD+aPJ4pk0Yw9T0GTe6fRBbb2YjgZPNAO2Pyaa/zdt28vjzL/H481t4/PktPP3CVno2bWPtplfYVWHutQlj2pl64Bgmjh3NhLHtTBg7igljRjFhbDsHjBnFgWNHMW50G2NGtTG6fe/32NJ+exujR6XvdtHWJtol2tuElP1+qF1ZvE2l7SzephRLZdRG9i0Q2XdJeUxko/aU4qR9M9s/1Zts/KPOIeSgA0Zz6lHTOPWoaa+K9+3azfMvvcLzm1/hxa076E2fF/+wg96t2/nD9l1s3d7Hxq07WNO7jW07sv2tO3ZVTFL7q6rJiOyAeG3ioqw85ef3O/6aumq0oUJ0AGUrl65etkLbBnDdrHz9ybrqtSvEK7Wtetlq161yjbqDxRqsf+a0+h9Y1yw4mcOnHVBoHU42w8Co9jZmTjmAmVMG9pclIujbHezo283OXbvZ0bebHel7565I+7vY0Rfs3LWbXRHs3h3s2h3sDtgdpe1I22TH036p7K7I6tq1OwggAiKt4l3qWEdEir/2eKSN8mN7zisvv+f8ysdfVdeeP4MKfy5VVhivXLban22VAxXOqFa20bZVrq1W2fovUv2+K9xf1bJ1V1fxukUbtH+GDULFY0YV/ysYJ5sRTBKj28Xodv/cysyK5f/KmJlZ4ZxszMyscE42ZmZWOCcbMzMrnJONmZkVzsnGzMwK52RjZmaFc7IxM7PCeW60RNIG4Jl9PP1g4IUmNmco8D2PDL7nkaGRez4iIjryCjnZNIGkrnomohtOfM8jg+95ZGjFPfsxmpmZFc7JxszMCudk0xxXDXYDBoHveWTwPY8Mhd+z39mYmVnh3L
MxM7PCOdk0SNI8Sb+V1C3posFuT7NIukbSekmPlMWmSlouaXX6npLiknRF+jNYKenEwWv5vpF0mKQ7JK2S9KikT6X4cL7ncZLulfRQuud/SvEjJd2T7vmHksak+Ni0352OzxrM9jdCUrukByTdnPaH9T1LelrSw5IelNSVYi39u+1k0wBJ7cC3gLOAY4GPSjp2cFvVNNcC8/rFLgJuj4jZwO1pH7L7n50+i4ArW9TGZuoD/jYijgHmABek/y2H8z1vB06PiOOA44F5kuYAXwUuT/e8EViYyi8ENkbE0cDlqdxQ9SlgVdn+SLjnd0fE8WVDnFv7dztbQtefffkAbwWWle1fDFw82O1q4v3NAh4p2/8tMD1tTwd+m7b/D/DRSuWG6ge4CXjvSLln4ADgfuBUsh/3jUrxPX/HgWXAW9P2qFROg932fbjXmWT/cT0duBnQCLjnp4GD+8Va+nfbPZvGzADWlO33pNhwdWhEPAeQvg9J8WH155AelZwA3MMwv+f0OOlBYD2wHHgC2BQRfalI+X3tued0fDMwrbUtbopvAJ8Ddqf9aQz/ew7gZ5Luk7QoxVr6d3tUoxcY4VQhNhKH9w2bPwdJBwI3AJ+OiJekSreWFa0QG3L3HBG7gOMlTQZuBI6pVCx9D/l7lvQ+YH1E3CfpXaVwhaLD5p6T0yJiraRDgOWSHq9RtpB7ds+mMT3AYWX7M4G1g9SWVlgnaTpA+l6f4sPiz0HSaLJEc11E/CSFh/U9l0TEJuBOsvdVkyWV/iFafl977jkdPwjobW1LG3Ya8AFJTwM/IHuU9g2G9z0TEWvT93qyf1ScQov/bjvZNGYFMDuNZBkDnAcsHeQ2FWkpsCBtLyB7r1GKn59GscwBNpe650OFsi7M1cCqiPh62aHhfM8dqUeDpPHAe8hemt8BfCgV63/PpT+LDwE/j/RQf6iIiIsjYmZEzCL7/+vPI+LPGcb3LGmCpImlbWAu8Ait/rs92C+uhvoHOBv4Hdmz7i8MdnuaeF/fB54DdpL9S2ch2bPq24HV6XtqKiuyUXlPAA8DnYPd/n2437eTPSpYCTyYPmcP83t+C/BAuudHgH9I8aOAe4Fu4EfA2BQfl/a70/GjBvseGrz/dwE3D/d7Tvf2UPo8WvrvVKv/bnsGATMzK5wfo5mZWeGcbMzMrHBONmZmVjgnGzMzK5yTjZmZFc7JxqwJJP0hfc+S9GdNvvbf99v/TTOvb9YKTjZmzTULGFCySbOH1/KqZBMRbxtgm8wGnZONWXNdCrwSCv0SAAACCklEQVQjrRvymTTR5WWSVqS1Qf4KQNK7lK2f8y9kP5xD0r+miRIfLU2WKOlSYHy63nUpVupFKV37kbRWyUfKrn2npB9LelzSdWmGBCRdKumx1JavtfxPx0YsT8Rp1lwXAX8XEe8DSEljc0ScLGks8GtJP0tlTwHeHBFPpf2/iIjeNHXMCkk3RMRFkj4ZEcdXqOuDZOvQHAccnM75ZTp2AvAmsjmtfg2cJukx4E+BN0ZElKaqMWsF92zMijWXbJ6pB8mWLJhGtigVwL1liQbgv0t6CLibbCLE2dT2duD7EbErItYBvwBOLrt2T0TsJpt6ZxbwEvAK8F1JHwS2NXx3ZnVysjErloALI1sh8fiIODIiSj2brXsKZdPdv4dsoa7jyOYsG1fHtavZXra9i2xhsD6y3tQNwDnAbQO6E7MGONmYNdcWYGLZ/jLgE2n5AiT9cZp5t7+DyJYf3ibpjWRT/ZfsLJ3fzy+Bj6T3Qh3AO8kmi6wordVzUETcAnya7BGcWUv4nY1Zc60E+tLjsGuBb5I9wro/vaTfQNar6O824L9JWkm2DO/dZceuAlZKuj+y6fBLbiRbwvghshmrPxcRz6dkVclE4CZJ48h6RZ/Zt1s0GzjP+mxmZoXzYzQzMyuck42ZmRXOycbMzArnZGNmZoVzsjEzs8I52ZiZWeGcbMzMrHBONmZmVrj/D7hmWtRQI5E3AAAAAE
lFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Setting initial parameters (random initialization)\n", "X=np.random.randn(numMovies,numFeatures)\n", "theta=np.random.randn(numUsers,numFeatures)\n", "initialParams=np.vstack((X.reshape((-1,1)),theta.reshape((-1,1))))\n", "\n", "# Setting gradient descent options\n", "lmbda=10\n", "alpha=0.003\n", "iterations=500\n", "\n", "# Training collaborative filtering\n", "jHistory,params=regularizedGradientDescent(initialParams,Ynorm,R,numUsers,numMovies,numFeatures,lmbda,alpha,iterations)\n", "# Unrolling the learned parameters into X and theta\n", "X=params[:numMovies*numFeatures].reshape((numMovies,numFeatures))\n", "theta=params[numMovies*numFeatures:].reshape((numUsers,numFeatures))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**After training the model, we can make predictions by computing the prediction matrix $P=X\\Theta^{T}$ and adding back the mean rating (`Ymean`) that was removed during normalization.**" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Prediction matrix: rows are movies, columns are users\n", "P=np.dot(X,theta.T)\n", "# Predictions for the user in column 0, with the mean rating added back\n", "userPrediction=P[:,0]+Ymean.reshape(-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Original ratings provided by the user:**" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
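<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Movie</th>\n", "      <th>Year</th>\n", "      <th>User Ratings</th>\n", "    </tr>\n",
"    <tr>\n", "      <th>ID</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>Toy Story</td>\n", "      <td>1995</td>\n", "      <td>4.1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>Twelve Monkeys</td>\n", "      <td>1995</td>\n", "      <td>1.5</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>12</th>\n", "      <td>Usual Suspects, The</td>\n", "      <td>1995</td>\n", "      <td>4.7</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>55</th>\n", "      <td>Professional, The</td>\n", "      <td>1994</td>\n", "      <td>3.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>64</th>\n", "      <td>Shawshank Redemption, The</td>\n", "      <td>1994</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>66</th>\n", "      <td>While You Were Sleeping</td>\n", "      <td>1995</td>\n", "      <td>3.5</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>69</th>\n", "      <td>Forrest Gump</td>\n", "      <td>1994</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>98</th>\n", "      <td>Silence of the Lambs, The</td>\n", "      <td>1991</td>\n", "      <td>4.5</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>183</th>\n", "      <td>Alien</td>\n", "      <td>1979</td>\n", "      <td>3.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>225</th>\n", "      <td>101 Dalmatians</td>\n", "      <td>1996</td>\n", "      <td>3.8</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>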
" ], "text/plain": [ " Movie Year User Ratings\n", "ID \n", "1 Toy Story 1995 4.1\n", "7 Twelve Monkeys 1995 1.5\n", "12 Usual Suspects, The 1995 4.7\n", "55 Professional, The 1994 3.0\n", "64 Shawshank Redemption, The 1994 5.0\n", "66 While You Were Sleeping 1995 3.5\n", "69 Forrest Gump 1994 5.0\n", "98 Silence of the Lambs, The 1991 4.5\n", "183 Alien 1979 3.0\n", "225 101 Dalmatians 1996 3.8" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df=movies.copy()\n", "df['User Ratings']=userRatings\n", "df[df['User Ratings']!=0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Top 10 recommended movies:**" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
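<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Movie</th>\n", "      <th>Year</th>\n", "      <th>Predicted Ratings</th>\n", "    </tr>\n",
"    <tr>\n", "      <th>ID</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>1201</th>\n", "      <td>Marlene Dietrich: Shadow and Light</td>\n", "      <td>1996</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1122</th>\n", "      <td>They Made Me a Criminal</td>\n", "      <td>1939</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1293</th>\n", "      <td>Star Kid</td>\n", "      <td>1997</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1467</th>\n", "      <td>Saint of Fort Washington, The</td>\n", "      <td>1993</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1599</th>\n", "      <td>Someone Else's America</td>\n", "      <td>1995</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>814</th>\n", "      <td>Great Day in Harlem, A</td>\n", "      <td>1994</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1653</th>\n", "      <td>Entertaining Angels: The Dorothy Day Story</td>\n", "      <td>1996</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1189</th>\n", "      <td>Prefontaine</td>\n", "      <td>1997</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1536</th>\n", "      <td>Aiqing wansui</td>\n", "      <td>1994</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1500</th>\n", "      <td>Santa with Muscles</td>\n", "      <td>1996</td>\n", "      <td>5.0</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>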
" ], "text/plain": [ " Movie Year Predicted Ratings\n", "ID \n", "1201 Marlene Dietrich: Shadow and Light 1996 5.0\n", "1122 They Made Me a Criminal 1939 5.0\n", "1293 Star Kid 1997 5.0\n", "1467 Saint of Fort Washington, The 1993 5.0\n", "1599 Someone Else's America 1995 5.0\n", "814 Great Day in Harlem, A 1994 5.0\n", "1653 Entertaining Angels: The Dorothy Day Story 1996 5.0\n", "1189 Prefontaine 1997 5.0\n", "1536 Aiqing wansui 1994 5.0\n", "1500 Santa with Muscles 1996 5.0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "limit=10\n", "df=movies.copy()\n", "df['Predicted Ratings']=userPrediction\n", "df.sort_values(by=['Predicted Ratings'],ascending=False).head(limit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }