{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple recommender\n", "\n", "\n", "### 1. Python dictionaries\n", "\n", "Python dictionaries will play a crucial role in a lot of the code we write. If you are not familiar with Python dictionaries, read this section. \n", "\n", "Python dictionaries are similar to a phone book. In a phone book there is the name of the person you want to look up (*Ann* say), and the number you want to retrieve. So you might give a phone book application the name of someone and the application will return the phone number associated with that person. We are going to make a simple Python dictionary called *phone* that contains some phone numbers:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "phone = {\"Ann\": \"575-680-5555\", \"Bernie\": \"540-224-1130\", \"Clara\": \"540-220-7865\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above code we associate a name with a phone number. The name and the number are separated by a colon, and each individual entry is separated by a comma. We can look up a phone number using the following syntax:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'575-680-5555'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "phone['Ann']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 1. how would you look up Bernie's number?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add an entry with this syntax:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "phone['Dan'] = \"575-540-1234\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check to see if that entry was added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We commonly call the name (the thing to the left of the colon) the **key** and the thing we want to look up (the thing to the right of the colon) the **value**. So, for example, *Ann* is a key whose value is \"575-680-5555\". That phone number is really a string (a sequence of characters) and not a number--meaning we can't do operations we might do with numbers like add one phone number to another. The values of a dictionary don't need to be strings. They can be numbers, for example:\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ages = {\"Ann\": 21, \"Bernie\": 34, \"Clara\": 18}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ages['Ann']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "#### 2. you try\n", "Suppose I have a little table of grades:\n", "\n", "| | Jeff | Sara | Miguel | Dan |\n", "| :---: | :---: | :---: | :---: | :---: |\n", "| grades | 83 | 97 | 93 | 67 |\n", "\n", "How might we represent this using a Python dictionary?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### a dictionary whose values are dictionaries\n", "\n", "Okay, here is where things get a bit cool!\n", "\n", "Suppose I want to represent a table like the following, where different customers rate different musical artists:\n", "\n", "|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Nicki Minaj | Ariana Grande |\n", "|:-----------|:------:|:------:|:---------:|:------:|:--------:|\n", "|Jake|5|-|5|2|2|\n", "|Clara|-|-|2|4|5|\n", "|Kelsey|5|5|5|2|-|\n", "|Angelica|4|3|-|5|5|\n", "\n", "Here I am going to have a dictionary called *ratings* whose keys are the customers' names. The value for a particular customer will be that customer's ratings. What better way to keep track of a customer's set of ratings than with a dictionary? So Jake's ratings could be represented as:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "jake = {\"Taylor Swift\": 5, \"Carrie Underwood\": 5, \"Nicki Minaj\": 2, \"Ariana Grande\": 2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, that works. So now for the big dictionary. 
The key will be the user's name (*Jake* in this first case) and the value for that key will be the dictionary of that user's ratings:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ratings = {\"Jake\": {\"Taylor Swift\": 5, \"Carrie Underwood\": 5, \"Nicki Minaj\": 2, \"Ariana Grande\": 2},\n", "           \"Clara\": {\"Carrie Underwood\": 2, \"Nicki Minaj\": 4, \"Ariana Grande\": 5},\n", "           \"Kelsey\": {\"Taylor Swift\": 5, \"Miranda Lambert\": 5, \"Carrie Underwood\": 5, \"Nicki Minaj\": 2},\n", "           \"Angelica\": {\"Taylor Swift\": 4, \"Miranda Lambert\": 3, \"Nicki Minaj\": 5, \"Ariana Grande\": 5}}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a minute or two to look at the above and see how the rows and columns of the table are transformed to a Python dictionary. Now to get, for example, Kelsey's rating of Taylor Swift:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings[\"Kelsey\"][\"Taylor Swift\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### You try\n", "How would you get Jake's rating of Nicki Minaj? What about Clara's rating of Carrie Underwood?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The classifier code\n", "The following is an abbreviated version of the classifier code from the book:\n", "#### First the code to compute the Manhattan distance " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def manhattan(rating1, rating2):\n", "    \"\"\"Computes the Manhattan distance. 
Both rating1 and rating2 are dictionaries\n", "    of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}\"\"\"\n", "    distance = 0\n", "    commonRatings = False\n", "    for key in rating1:\n", "        if key in rating2:\n", "            distance += abs(rating1[key] - rating2[key])\n", "            commonRatings = True\n", "    if commonRatings:\n", "        return distance\n", "    else:\n", "        return -1  # indicates no ratings in common" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see if this works. Consult with your team. How can we compute the distance between Jake and Clara?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Here is the code to find the nearest neighbor\n", "\n", "Again, take your time and look over the code. It actually returns a sorted list of (distance, user) tuples, closest first." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def computeNearestNeighbor(username, users):\n", "    \"\"\"creates a sorted list of users based on their distance to username\"\"\"\n", "    distances = []\n", "    for user in users:\n", "        if user != username:\n", "            distance = manhattan(users[user], users[username])\n", "            # skip users with no ratings in common (manhattan returns -1),\n", "            # otherwise they would sort ahead of every real neighbor\n", "            if distance >= 0:\n", "                distances.append((distance, user))\n", "    # sort based on distance -- closest first\n", "    distances.sort()\n", "    return distances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write the code to find the nearest neighbor of Jake:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(0, 'Kelsey'), (7, 'Angelica'), (8, 'Clara')]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, so it looks like, for the artists that both Jake and 
Kelsey rated, Jake and Kelsey gave the exact same ratings.\n", "\n", "#### Finally, code to make a recommendation" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def recommend(username, users):\n", "    \"\"\"Give list of recommendations\"\"\"\n", "    # first find nearest neighbor\n", "    nearest = computeNearestNeighbor(username, users)[0][1]\n", "\n", "    recommendations = []\n", "    # now find bands neighbor rated that user didn't\n", "    neighborRatings = users[nearest]\n", "    userRatings = users[username]\n", "    for artist in neighborRatings:\n", "        if artist not in userRatings:\n", "            recommendations.append((artist, neighborRatings[artist]))\n", "    # using the fn sorted for variety - sort is more efficient\n", "    return sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's recommend something for Jake:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('Miranda Lambert', 5)]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's another dataset:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "users = {\"Angelica\": {\"Blues Traveler\": 3.5, \"Broken Bells\": 2.0, \"Norah Jones\": 4.5, \"Phoenix\": 5.0, \"Slightly Stoopid\": 1.5, \"The Strokes\": 2.5, \"Vampire Weekend\": 2.0},\n", "         \"Bill\": {\"Blues Traveler\": 2.0, \"Broken Bells\": 3.5, \"Deadmau5\": 4.0, \"Phoenix\": 2.0, \"Slightly Stoopid\": 3.5, \"Vampire Weekend\": 3.0},\n", "         \"Chan\": {\"Blues Traveler\": 5.0, \"Broken Bells\": 1.0, \"Deadmau5\": 1.0, \"Norah Jones\": 3.0, \"Phoenix\": 5, \"Slightly Stoopid\": 1.0},\n", "         \"Dan\": {\"Blues Traveler\": 3.0, \"Broken Bells\": 4.0, \"Deadmau5\": 4.5, \"Phoenix\": 
3.0, \"Slightly Stoopid\": 4.5, \"The Strokes\": 4.0, \"Vampire Weekend\": 2.0},\n", "         \"Hailey\": {\"Broken Bells\": 4.0, \"Deadmau5\": 1.0, \"Norah Jones\": 4.0, \"The Strokes\": 4.0, \"Vampire Weekend\": 1.0},\n", "         \"Jordyn\": {\"Broken Bells\": 4.5, \"Deadmau5\": 4.0, \"Norah Jones\": 5.0, \"Phoenix\": 5.0, \"Slightly Stoopid\": 4.5, \"The Strokes\": 4.0, \"Vampire Weekend\": 4.0},\n", "         \"Sam\": {\"Blues Traveler\": 5.0, \"Broken Bells\": 2.0, \"Norah Jones\": 3.0, \"Phoenix\": 5.0, \"Slightly Stoopid\": 4.0, \"The Strokes\": 5.0},\n", "         \"Veronica\": {\"Blues Traveler\": 3.0, \"Norah Jones\": 5.0, \"Phoenix\": 4.0, \"Slightly Stoopid\": 2.5, \"The Strokes\": 3.0}\n", "        }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What should we recommend to Hailey?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TO DO - work with a partner\n", "\n", "First, write a function called `euclidean` that computes the Euclidean distance between two users (see the `manhattan` function above as a guide)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now write a new version of `computeNearestNeighbor` that uses the Euclidean distance." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Does this new recommendation system make the same recommendation for Jake?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make your own dataset\n", "\n", "Make your own dataset and test your recommendation system. 
It can be in whatever domain you want (musical artists, fine wine, movies, restaurants in Fredericksburg)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now test your recommendation system with that data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### individual xp: 50\n", "Show your Euclidean distance and revised nearest neighbor functions, and demo them on your data." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }