{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Notebook 3 - Item Based Filtering\n", "## fun with Slope 1\n", "\n", "Consider the following table of ratings (this will get filled in during class):\n", "\n", "| | Taylor Swift | Fall Out Boy | The Weeknd | Nicki Minaj | Carrie Underwood | Drake |\n", "| :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n", "| - | | | | | | |\n", "| - | | | | | | |\n", "| - | | | | | | |\n", "| - | | | | | | |\n", "| - | | | | | | |\n", "| - | | | | | | | |\n", "\n", "\n", "(The requirement is that each person is missing 2 ratings)\n", "\n", "\n", "How would you represent this table in Python? \n", "\n", "#### checkpoint 1 - 10xp" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### the old code\n", "To help us get started here is the recommender class from the previous lab.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import codecs\n", "from math import sqrt\n", "\n", "class recommender:\n", "\n", " def __init__(self, data, k=1, metric='pearson', n=5):\n", " \"\"\" initialize recommender\n", " currently, if data is dictionary the recommender is initialized\n", " to it.\n", " For all other data types of data, no initialization occurs\n", " k is the k value for k nearest neighbor\n", " metric is which distance formula to use\n", " n is the maximum number of recommendations to make\"\"\"\n", " self.k = k\n", " self.n = n\n", " self.username2id = {}\n", " self.userid2name = {}\n", " self.productid2name = {}\n", " # for some reason I want to save the name of the metric\n", " self.metric = metric\n", " if self.metric == 'pearson':\n", " self.fn = self.pearson\n", " #\n", " # if data is dictionary set recommender data to it\n", " #\n", " if type(data).__name__ == 'dict':\n", " self.ratings = data\n", "\n", " def convertProductID2name(self, id):\n", " \"\"\"Given product id number return product name\"\"\"\n", " if id in self.productid2name:\n", " return self.productid2name[id]\n", " else:\n", " return id\n", "\n", "\n", " def userRatings(self, id, n):\n", " \"\"\"Return n top ratings for user with id\"\"\"\n", " print (\"Ratings for \" + self.userid2name[id])\n", " ratings = self.ratings[id]\n", " print(len(ratings))\n", " ratings = list(ratings.items())\n", " ratings = [(self.convertProductID2name(k), v)\n", " for (k, v) in ratings]\n", " # finally sort and return\n", " ratings.sort(key=lambda artistTuple: artistTuple[1],\n", " reverse = True)\n", " ratings = ratings[:n]\n", " for rating in ratings:\n", " print(\"%s\\t%i\" % (rating[0], rating[1]))\n", " \n", "\n", " \n", " \n", " \n", " def pearson(self, rating1, rating2):\n", " sum_xy = 0\n", " sum_x = 0\n", " sum_y = 0\n", " sum_x2 = 0\n", " sum_y2 = 0\n", " n = 0\n", " for key in rating1:\n", " if key in rating2:\n", " n += 1\n", " x = rating1[key]\n", " y = rating2[key]\n", " sum_xy += x * y\n", " sum_x += x\n", " sum_y += y\n", " sum_x2 += pow(x, 2)\n", " sum_y2 += pow(y, 2)\n", " if n == 0:\n", " return 0\n", " # now compute denominator\n", " denominator = (sqrt(sum_x2 - pow(sum_x, 2) / n)\n", " * sqrt(sum_y2 - pow(sum_y, 2) / n))\n", " if denominator == 0:\n", " return 0\n", " else:\n", " return (sum_xy - (sum_x * sum_y) / n) / denominator\n", "\n", "\n", " def computeNearestNeighbor(self, username):\n", " \"\"\"creates a sorted list of users based on their distance to\n", " username\"\"\"\n", " distances = []\n", " for instance in self.ratings:\n", " if instance != username:\n", " distance = self.fn(self.ratings[username],\n", " self.ratings[instance])\n", " distances.append((instance, distance))\n", " # sort based on distance -- closest first\n", " distances.sort(key=lambda artistTuple: artistTuple[1],\n", " reverse=True)\n", " return distances\n", "\n", " def recommend(self, user):\n", " \"\"\"Give list of recommendations\"\"\"\n", " recommendations = {}\n", " for i in range(self.k):\n", " totalDistance += nearest[i][1]\n", " # now iterate through the k nearest neighbors\n", " # accumulating their ratings\n", " for i in range(self.k):\n", " # compute slice of pie \n", " weight = nearest[i][1] / totalDistance\n", " # get the name of the person\n", " name = nearest[i][0]\n", " # get the ratings for this person\n", " neighborRatings = self.ratings[name]\n", " # get the name of the person\n", " # now find bands neighbor rated that user didn't\n", " for artist in neighborRatings:\n", " if not artist in userRatings:\n", " if artist not in recommendations:\n", " recommendations[artist] = (neighborRatings[artist]\n", " * weight)\n", " else:\n", " recommendations[artist] = (recommendations[artist]\n", " + neighborRatings[artist]\n", " * weight)\n", " # now make list from dictionary\n", " recommendations = list(recommendations.items())\n", " recommendations = [(self.convertProductID2name(k), v)\n", " for (k, v) in recommendations]\n", " # finally sort and return\n", " recommendations.sort(key=lambda artistTuple: artistTuple[1],\n", " reverse = True)\n", " # Return the first n items\n", " return recommendations[:self.n]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### compute Slope One Deviations\n", "I would like you to add a method to the above called `computeSlopeOneDeviations` that computes the Slope One Deviations \n", "and stores those deviations in an instance variable called slopeOneDeviations. The method should implement:\n", " \n", " \n", "$$dev_{i,j}=\\sum_{u\\in{S_{i,j}}(X)}\\frac{u_i - u_j}{card(S_{i,j}(X))}$$\n", "\n", "for each pair of artists *i* and *j*\n", "\n", "#### checkpoint 2 (15xp)\n", "\n", "When you execute the following code you should see the dictionary of deviations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "reco = recommender(userRatings) // change userRatings to match your data set\n", "reco.computeSlopeOneDeviations()\n", "reco.slopeOneDeviations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Recommendation Component\n", "Now it is time to write code for the recommendation component. The formula you are implementing is\n", "\n", "$$p^{wS1}(u)_j = \\frac{\\sum_{i\\in{S(u)-(j)}}(dev_{j,i}+u_i)c_{j,i}}{\\sum_{i\\in{S(u)-(j)}}c_{j,i}}$$\n", "\n", "where\n", "\n", "$$c_{j,i}$$\n", "\n", "is the cardinality of *j* and *i* -- that is, how many people rated both artists *j* and *i*\n", "\n", "Please write a method called `slopeOneRecommendations` that takes a dictionary of ratings as an argument and returns a list of artist & rating tuples. For example, something like:\n", "\n", " > r.slopeOneRecommendations({'Taylor Swift': 5, 'The Weeknd': 3, 'Drake': 2})\n", " [('Carrie Underwood': 4.375), ('Nicki Minaj': 2.127)]\n", " \n", "#### checkpoint 3 (25 xp)\n", " \n", "Test your code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The music recommender\n", "In the previous lab there was a method called `loadBookDB` which loaded data from a csv file. \n", "\n", "1. Download the class ratings data set to your laptop.\n", "2. Write a method that will read this dataset into the recommender\n", "3. Demo a recommender that uses Slope One to make recommendations for people in our class.\n", "\n", "#### checkpoint 4 (30 xp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extra Credit\n", "\n", "##Movie Lens Data (50xp)\n", "This task uses the Small MovieLens Latest dataset from [grouplens.org](http://grouplens.org/datasets/movielens/).\n", "\n", "See how well the Slope One recommender recommends movies for you. Rate 10 movies or so (ones that are in the MovieLens dataset). Does the recommender suggest movies you might like? \n", "This can be done with a partner. \n", "\n", "To get full credit, you need to report to the class on your results (in a coherent way). You will have 1-3 minutes. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adjusted Cosine Similarity (75xp)\n", "Implement Adjusted Cosine Similarity. Compare its performance to Slope One. \n", "To get full credit, you need to report to the class on your results (in a coherent way). You will have 2-5 minutes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }