## Simple recommender


### 1. Python dictionaries

Python dictionaries will play a crucial role in a lot of the code we write. If you are not familiar with Python dictionaries, read this section. 

Python dictionaries are similar to a phone book. In a phone book there is the name of the person you want to look up (*Ann*, say) and the number you want to retrieve. So you might give a phone book application someone's name, and the application will return the phone number associated with that person. We are going to make a simple Python dictionary called *phone* that contains some phone numbers:

In [1]:
phone = {"Ann": "575-680-5555", "Bernie": "540-224-1130", "Clara": "540-220-7865"}

In the above code we associate a name with a phone number. The name and the number are separated by a colon, and each individual entry is separated by a comma. We can look up a phone number using the following syntax:

In [2]:
phone['Ann']

'575-680-5555'

##### 1. How would you look up Bernie's number?

We can add an entry with this syntax:

In [None]:
phone['Dan'] = "575-540-1234"

Check to see if that entry was added:

We commonly call the name (the thing to the left of the colon) the **key** and the thing we want to look up (the thing to the right of the colon) the **value**. So, for example, *Ann* is a key whose value is "575-680-5555". That phone number is really a string (a sequence of characters) and not a number, meaning we can't do arithmetic with it the way we could with numbers, such as adding one phone number to another. The values of a dictionary don't need to be strings. They can be numbers, for example:




In [None]:
ages = {"Ann": 21, "Bernie": 34, "Clara": 18}

In [None]:
ages['Ann']
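Here is a small sketch of the difference between number values and string values. It reuses the *phone* and *ages* dictionaries from above; the variable names `age_gap` and `combined` are just ours for illustration.

```python
# The same small dictionaries used above.
phone = {"Ann": "575-680-5555", "Bernie": "540-224-1130", "Clara": "540-220-7865"}
ages = {"Ann": 21, "Bernie": 34, "Clara": 18}

# Numeric values support ordinary arithmetic.
age_gap = ages["Ann"] - ages["Clara"]
print(age_gap)  # 3

# With strings, + glues the two values together instead of adding them.
combined = phone["Ann"] + phone["Clara"]
print(combined)  # 575-680-5555540-220-7865

# The "in" operator tests the keys, not the values.
print("Clara" in ages)  # True
print(18 in ages)       # False
```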



#### 2. You try
Suppose I have a little table of grades:

| | Jeff | Sara | Miguel | Dan |
| :---: | :---: | :---: | :---: | :---: |
| grades | 83 | 97 | 93 | 67 |

How might we represent this using a Python dictionary?

### A dictionary whose values are dictionaries

Okay, here is where things get a bit cool!

Suppose I want to represent a table like the following where different customers rate different musical artists:

|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Nicki Minaj | Ariana Grande |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|5|-|5|2|2|
|Clara|-|-|2|4|5|
|Kelsey|5|5|5|2|-|
|Angelica|4|3|-|5|5|

Here I am going to have a dictionary called *ratings* with the keys being the customers' names. And the value for a particular customer will be that customer's ratings. What better way to keep track of a customer's set of ratings than with a dictionary? So Jake's ratings could be represented as:

In [6]:
jake = {"Taylor Swift": 5, "Carrie Underwood": 5, "Nicki Minaj": 2, "Ariana Grande": 2}

Okay, that works. So now for the big dictionary. The key will be the user's name (*Jake* in this first case) and the value for that key will be the dictionary of that user's ratings:

In [1]:
ratings = {"Jake": {"Taylor Swift": 5, "Carrie Underwood": 5, "Nicki Minaj": 2, "Ariana Grande": 2},
           "Clara": {"Carrie Underwood": 2, "Nicki Minaj": 4, "Ariana Grande": 5},
           "Kelsey": {"Taylor Swift": 5, "Miranda Lambert": 5, "Carrie Underwood": 5, "Nicki Minaj": 2},
           "Angelica": {"Taylor Swift": 4, "Miranda Lambert": 3, "Nicki Minaj": 5, "Ariana Grande": 5}}

Take a minute or two to look at the above and see how the rows and columns of the table are transformed to a Python dictionary. Now to get, for example, Kelsey's rating of Taylor Swift:

In [2]:
ratings["Kelsey"]["Taylor Swift"]

5
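One thing to watch for: a dash in the table means the customer simply has no entry for that artist in the inner dictionary. The sketch below shows a few ways to deal with that, using the same *ratings* data; which approach you pick is a matter of taste.

```python
ratings = {"Jake": {"Taylor Swift": 5, "Carrie Underwood": 5, "Nicki Minaj": 2, "Ariana Grande": 2},
           "Clara": {"Carrie Underwood": 2, "Nicki Minaj": 4, "Ariana Grande": 5},
           "Kelsey": {"Taylor Swift": 5, "Miranda Lambert": 5, "Carrie Underwood": 5, "Nicki Minaj": 2},
           "Angelica": {"Taylor Swift": 4, "Miranda Lambert": 3, "Nicki Minaj": 5, "Ariana Grande": 5}}

# Clara never rated Taylor Swift, so the key is absent from her dictionary.
print("Taylor Swift" in ratings["Clara"])  # False

# Indexing a missing key raises KeyError ...
try:
    ratings["Clara"]["Taylor Swift"]
except KeyError as missing:
    print("no rating for", missing)

# ... while .get returns None (or a default you supply) instead.
print(ratings["Clara"].get("Taylor Swift"))     # None
print(ratings["Angelica"].get("Taylor Swift"))  # 4

# The keys of an inner dictionary are the artists that customer rated.
angelica_artists = sorted(ratings["Angelica"])
print(angelica_artists)
# ['Ariana Grande', 'Miranda Lambert', 'Nicki Minaj', 'Taylor Swift']
```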

#### You try
How would you get Jake's rating of Nicki Minaj? What about Clara's rating of Carrie Underwood?


### The classifier code
The following is an abbreviated version of the classifier code from the book:

#### First, the code to compute the Manhattan distance

In [3]:
def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are dictionaries
       of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1  # Indicates no ratings in common
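To see what the function is adding up, here is a worked sketch for Kelsey and Angelica (a different pair from the one in the exercise below). It repeats the same logic as the function above so it can run on its own; the per-artist loop is only there for illustration.

```python
def manhattan(rating1, rating2):
    """Sum of absolute rating differences over the artists both users
    rated, or -1 if they have no artists in common."""
    distance = 0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    return distance if commonRatings else -1

kelsey = {"Taylor Swift": 5, "Miranda Lambert": 5, "Carrie Underwood": 5, "Nicki Minaj": 2}
angelica = {"Taylor Swift": 4, "Miranda Lambert": 3, "Nicki Minaj": 5, "Ariana Grande": 5}

# Per-artist contributions, only for artists that both of them rated.
for artist in kelsey:
    if artist in angelica:
        print(artist, abs(kelsey[artist] - angelica[artist]))
# Taylor Swift 1
# Miranda Lambert 2
# Nicki Minaj 3

print(manhattan(kelsey, angelica))  # 6

# Two users with no artists in common get the sentinel value -1.
print(manhattan({"Phoenix": 5.0}, {"Deadmau5": 4.0}))  # -1
```

Carrie Underwood and Ariana Grande contribute nothing, because only one of the two rated each of them.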

Let's see if this works. Consult with your team. How can we compute the distance between Jake and Clara?

8

#### Here is the code to find the nearest neighbor

Again, take your time and look over the code. It actually returns a sorted list of (distance, user) tuples, closest first.

In [5]:
def computeNearestNeighbor(username, users):
    """creates a sorted list of users based on their distance to username"""
    distances = []
    for user in users:
        if user != username:
            distance = manhattan(users[user], users[username])
            distances.append((distance, user))
    # sort based on distance -- closest first
    distances.sort()
    return distances
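The sorting step works because Python compares tuples element by element: first by distance, then by name when distances tie. The sketch below uses made-up (distance, name) pairs, not values computed from the ratings above, and also shows one thing to keep in mind about the -1 that manhattan returns.

```python
# Hypothetical (distance, name) pairs, purely for illustration.
distances = [(4, "Dan"), (1, "Clara"), (4, "Ann"), (9, "Bernie")]

# Tuples sort by their first element; ties fall back to the second.
ordered = sorted(distances)
print(ordered)
# [(1, 'Clara'), (4, 'Ann'), (4, 'Dan'), (9, 'Bernie')]

# A user with no ratings in common gets distance -1, and -1 sorts
# ahead of every real distance, so that user would land in first place.
with_sentinel = sorted(distances + [(-1, "Stranger")])
print(with_sentinel[0])  # (-1, 'Stranger')
```

In the datasets used here every pair of users shares at least one artist, so the -1 case does not come up, but it is worth remembering when you build your own data.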

Write the code to find the nearest neighbor of Jake:

[(0, 'Kelsey'), (7, 'Angelica'), (8, 'Clara')]

Okay, so it looks like Jake and Kelsey gave exactly the same ratings to every artist that both of them rated.

#### Finally, code to make a recommendation

In [7]:
def recommend(username, users):
    """Give list of recommendations"""
    # first find nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]

    recommendations = []
    # now find bands neighbor rated that user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if artist not in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    # using the fn sorted for variety - sort is more efficient
    return sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse=True)
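The last line of the function sorts the (artist, rating) tuples by rating, highest first. Here is a small sketch of that step on its own, using hypothetical recommendations rather than output from the function, to show how sorted and sort differ.

```python
# Hypothetical (artist, rating) pairs, just to show the sorting step.
recommendations = [("Phoenix", 3.0), ("Deadmau5", 4.5), ("Norah Jones", 4.0)]

# sorted() returns a new list and leaves the original untouched.
# The key function picks out the rating (element 1 of each tuple).
by_rating = sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse=True)
print(by_rating)
# [('Deadmau5', 4.5), ('Norah Jones', 4.0), ('Phoenix', 3.0)]
print(recommendations[0])  # ('Phoenix', 3.0)

# list.sort() takes the same arguments but reorders the list in place.
recommendations.sort(key=lambda artistTuple: artistTuple[1], reverse=True)
print(recommendations == by_rating)  # True
```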

Now let's recommend something for Jake:

[('Miranda Lambert', 5)]

Here's another dataset:

In [9]:
users = {"Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
 "Bill":{"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
 "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "Slightly Stoopid": 1.0},
 "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
 "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
 "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
 "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
 "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "Slightly Stoopid": 2.5, "The Strokes": 3.0}
 }

What should we recommend to Hailey?

### TO DO - work with a partner

First, write a function called euclidean that computes the Euclidean distance between two users (see the manhattan function above as a guide).
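As a reminder of the math, here are the two distances side by side. The notation is ours, not the book's: $x_k$ and $y_k$ are the two users' ratings of artist $k$, and $K$ is the set of artists both users rated. Restricting the sum to $K$ is an assumption that mirrors what the manhattan function does.

$$
d_{\text{Manhattan}}(x, y) = \sum_{k \in K} \lvert x_k - y_k \rvert
\qquad
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{k \in K} (x_k - y_k)^2}
$$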

Now write a new version of computeNearestNeighbor that uses the Euclidean distance.

Does this new recommendation system make the same recommendation for Jake?

### Make your own dataset

Make your own dataset and test your recommendation system. It can be in whatever domain you want (musical artists, fine wine, movies, restaurants in Fredericksburg).

Now test your recommendation system with that data.

### individual xp: 50
Show your Euclidean distance and revised nearest neighbor functions, and demo them on your data.