{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simple recommender\n",
    "\n",
    "\n",
    "### 1. Python dictionaries\n",
    "\n",
    "Python dictionaries will play a crucial role in a lot of the code we write. If you are not familiar with Python dictionaries, read this section. \n",
    "\n",
    "Python dictionaries are similar to a phone book. In a phone book there is the name of the person you want to look up (*Ann* say), and the number you want to retrieve. So you might give a phone book application the name of someone and the application will return the phone number associated with that person. We are going to make a simple Python dictionary called *phone* that contains some phone numbers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "phone = {\"Ann\": \"575-680-5555\", \"Bernie\": \"540-224-1130\", \"Clara\": \"540-220-7865\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above code we associate a name with a phone number. The name and the number are separated by a colon, and each individual entry is separated by a comma. We can look up a phone number using the following syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'575-680-5555'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "phone['Ann']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 1. how would you look up Bernie's number?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can add an entry with this syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "phone['Dan'] = \"575-540-1234\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check to see if that entry was added:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We commonly call the name (the thing to the left of the colon) the **key** and the thing we want to look up (the thing to the right of the colon) the **value**. So, for example, *Ann* is a key whose value is \"575-680-5555\". That phone number is really a string (a sequence of characters) and not a number--meaning we can't do operations we might do with numbers like add one phone number to another. The values of a dictionary don't need to be strings. They can be numbers, for example:\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ages = {\"Ann\": 21, \"Bernie\": 34, \"Clara\": 18}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ages['Ann']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "#### 2. you try\n",
    "Suppose I have a little table of grades:\n",
    "\n",
    "| | Jeff | Sara | Miguel | Dan |\n",
    "|  :---: | :---: | :---: | :---: | :---: |\n",
    "| grades | 83 | 97 | 93 | 67 |\n",
    "\n",
    "How might we represent this using a Python dictionary?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### a dictionary whose values are dictionaries\n",
    "\n",
    "Okay, here is where things get a bit cool!\n",
    "\n",
    "Suppose I want to represent a table like the following where different customers rate different musical artists:\n",
    "\n",
    "|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Nicki Minaj | Ariana Grande |\n",
    "|:-----------|:------:|:------:|:---------:|:------:|:--------:|\n",
    "|Jake|5|-|5|2|2|\n",
    "|Clara|-|-|2|4|5|\n",
    "|Kelsey|5|5|5|2|-|\n",
    "|Angelica|4|3|-|5|5|\n",
    "\n",
    "Here I am going to have a dictionary called *ratings* with the keys being the customers' names. And the value for a particular customer will be that customer's ratings. What better way to keep track of a customer's set of ratings than with a dictionary. So Jake's ratings could be represented as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "jake = {\"Taylor Swift\": 5, \"Carrie Underwood\": 5, \"Nicki Minaj\": 2, \"Ariana Grande\": 2}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, that works. So now for the big dictionary. The key will be the user's name (*Jake* in this first case) and the value for that key will be the dictionary of that user's ratings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ratings = {\"Jake\": {\"Taylor Swift\": 5, \"Carrie Underwood\": 5, \"Nicki Minaj\": 2, \"Ariana Grande\": 2},\n",
    "           \"Clara\": {\"Carrie Underwood\": 2, \"Nicki Minaj\": 4, \"Ariana Grande\": 5},\n",
    "           \"Kelsey\": {\"Taylor Swift\": 5, \"Miranda Lambert\": 5,\"Carrie Underwood\": 5, \"Nicki Minaj\": 2},\n",
    "           \"Angelica\" : {\"Taylor Swift\": 4, \"Miranda Lambert\": 3, \"Nicki Minaj\": 5, \"Ariana Grande\": 5}}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a minute or two to look at the above and see how the rows and columns of the table are transformed to a Python dictionary. Now to get, for example, Kelsey's rating of Taylor Swift:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ratings[\"Kelsey\"][\"Taylor Swift\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### You try\n",
    "How would you get Jake's rating of Nicki Minaj? What about Clara's rating of Carrie Underwood?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The classifier code\n",
    "The following is an abbreviated version of the classifier code from the book:\n",
    "#### First the code to compute the Manhattan distance "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def manhattan(rating1, rating2):\n",
    "    \"\"\"Computes the Manhattan distance. Both rating1 and rating2 are dictionaries\n",
    "       of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}\"\"\"\n",
    "    distance = 0\n",
    "    commonRatings = False \n",
    "    for key in rating1:\n",
    "        if key in rating2:\n",
    "            distance += abs(rating1[key] - rating2[key])\n",
    "            commonRatings = True\n",
    "    if commonRatings:\n",
    "        return distance\n",
    "    else:\n",
    "        return -1 #Indicates no ratings in common"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see if this works. Consult with your team. How can we compute the distance between Jake and Clara?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Here is the code to find the nearest neighbor\n",
    "\n",
    "Again, take your time and look over the code. It actually returns a sorted list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def computeNearestNeighbor(username, users):\n",
    "    \"\"\"creates a sorted list of users based on their distance to username\"\"\"\n",
    "    distances = []\n",
    "    for user in users:\n",
    "        if user != username:\n",
    "            distance = manhattan(users[user], users[username])\n",
    "            distances.append((distance, user))\n",
    "    # sort based on distance -- closest first\n",
    "    distances.sort()\n",
    "    return distances"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the code to find the nearest neighbor of Jake"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(0, 'Kelsey'), (7, 'Angelica'), (8, 'Clara')]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, so it looks like of the artists that both Jake and Kelsey rated, Jake and Kelsey gave the exact same ratings.\n",
    "\n",
    "#### Finally, code to make a recommendation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def recommend(username, users):\n",
    "    \"\"\"Give list of recommendations\"\"\"\n",
    "    # first find nearest neighbor\n",
    "    nearest = computeNearestNeighbor(username, users)[0][1]\n",
    "\n",
    "    recommendations = []\n",
    "    # now find bands neighbor rated that user didn't\n",
    "    neighborRatings = users[nearest]\n",
    "    userRatings = users[username]\n",
    "    for artist in neighborRatings:\n",
    "        if not artist in userRatings:\n",
    "            recommendations.append((artist, neighborRatings[artist]))\n",
    "    # using the fn sorted for variety - sort is more efficient\n",
    "    return sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's recommend something for Jake:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('Miranda Lambert', 5)]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's another dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "users = {\"Angelica\": {\"Blues Traveler\": 3.5, \"Broken Bells\": 2.0, \"Norah Jones\": 4.5, \"Phoenix\": 5.0, \"Slightly Stoopid\": 1.5, \"The Strokes\": 2.5, \"Vampire Weekend\": 2.0},\n",
    "         \"Bill\":{\"Blues Traveler\": 2.0, \"Broken Bells\": 3.5, \"Deadmau5\": 4.0, \"Phoenix\": 2.0, \"Slightly Stoopid\": 3.5, \"Vampire Weekend\": 3.0},\n",
    "         \"Chan\": {\"Blues Traveler\": 5.0, \"Broken Bells\": 1.0, \"Deadmau5\": 1.0, \"Norah Jones\": 3.0, \"Phoenix\": 5, \"Slightly Stoopid\": 1.0},\n",
    "         \"Dan\": {\"Blues Traveler\": 3.0, \"Broken Bells\": 4.0, \"Deadmau5\": 4.5, \"Phoenix\": 3.0, \"Slightly Stoopid\": 4.5, \"The Strokes\": 4.0, \"Vampire Weekend\": 2.0},\n",
    "         \"Hailey\": {\"Broken Bells\": 4.0, \"Deadmau5\": 1.0, \"Norah Jones\": 4.0, \"The Strokes\": 4.0, \"Vampire Weekend\": 1.0},\n",
    "         \"Jordyn\":  {\"Broken Bells\": 4.5, \"Deadmau5\": 4.0, \"Norah Jones\": 5.0, \"Phoenix\": 5.0, \"Slightly Stoopid\": 4.5, \"The Strokes\": 4.0, \"Vampire Weekend\": 4.0},\n",
    "         \"Sam\": {\"Blues Traveler\": 5.0, \"Broken Bells\": 2.0, \"Norah Jones\": 3.0, \"Phoenix\": 5.0, \"Slightly Stoopid\": 4.0, \"The Strokes\": 5.0},\n",
    "         \"Veronica\": {\"Blues Traveler\": 3.0, \"Norah Jones\": 5.0, \"Phoenix\": 4.0, \"Slightly Stoopid\": 2.5, \"The Strokes\": 3.0}\n",
    "        }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What should we recommend to Hailey?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TO DO - work with a partner\n",
    "\n",
    "First, write a function called euclidean that computes the Euclidean distance between two users (see the Manhattan function above as a guide)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now write a new version of computeNearestNeighbors that uses the Euclidean distance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Does this new recommendation system make the same recommendation for Jake?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make your own dataset\n",
    "\n",
    "Make your own dataset and test your recommendation system. It can be in whatever domain you want (musical artists, fine wine, movies, restaurants in Fredericksburg)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now test your recommendation system with that data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### individual xp: 50\n",
    "show your Euclidean distance and  revised nearest neighbor functions and demo on your data"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}