{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Activity 9 - Recommender Systems\n", "\n", "This notebook illustrates the recommender system example used in the lecture notes for Text Analytics.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def rand_array():\n", " return list(np.round(np.random.random([10,])))\n", "rand_array()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Items</th><th>Alice</th><th>Bob</th><th>Charlie</th><th>Daisy</th><th>Edward</th><th>Faye</th><th>George</th><th>Harriet</th><th>Imogen</th><th>John</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Apple</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td></tr>\n",
"    <tr><th>1</th><td>Banana</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>2</th><td>Pear</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td></tr>\n",
"    <tr><th>3</th><td>Chicken</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>4</th><td>Beef</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td></tr>\n",
"    <tr><th>5</th><td>Lamb</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>6</th><td>Pizza</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>7</th><td>Pasta</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
"    <tr><th>8</th><td>Rice</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td></tr>\n",
"    <tr><th>9</th><td>Cake</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Items Alice Bob Charlie Daisy Edward Faye George Harriet Imogen \\\n", "0 Apple 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 \n", "1 Banana 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2 Pear 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 \n", "3 Chicken 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "4 Beef 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 \n", "5 Lamb 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 \n", "6 Pizza 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 \n", "7 Pasta 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 \n", "8 Rice 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 \n", "9 Cake 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 \n", "\n", " John \n", "0 0.0 \n", "1 1.0 \n", "2 0.0 \n", "3 1.0 \n", "4 0.0 \n", "5 1.0 \n", "6 1.0 \n", "7 1.0 \n", "8 0.0 \n", "9 0.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {}\n", "data['Items'] = ['Apple','Banana','Pear','Chicken','Beef','Lamb','Pizza','Pasta','Rice','Cake']\n", "data['Alice']= rand_array()\n", "data['Bob']= rand_array()\n", "data['Charlie']= rand_array()\n", "data['Daisy']= rand_array()\n", "data['Edward']= rand_array()\n", "data['Faye']= rand_array()\n", "data['George']= rand_array()\n", "data['Harriet']= rand_array()\n", "data['Imogen']= rand_array()\n", "data['John']= rand_array()\n", "df = pd.DataFrame(data)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above we have a table, where we can see whether each person likes a particular item.\n", "\n", "* If the item is scored 1.0 then the person likes the item.\n", "* If the item is scored 0.0 then the person does not like the item.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Items</th><th>Alice</th><th>Bob</th><th>Charlie</th><th>Daisy</th><th>Edward</th><th>Faye</th><th>George</th><th>Harriet</th><th>Imogen</th><th>John</th><th>Kyle</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Apple</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.7</td></tr>\n",
"    <tr><th>1</th><td>Banana</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.4</td></tr>\n",
"    <tr><th>2</th><td>Pear</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.2</td></tr>\n",
"    <tr><th>3</th><td>Chicken</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.4</td></tr>\n",
"    <tr><th>4</th><td>Beef</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.4</td></tr>\n",
"    <tr><th>5</th><td>Lamb</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.4</td></tr>\n",
"    <tr><th>6</th><td>Pizza</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.5</td></tr>\n",
"    <tr><th>7</th><td>Pasta</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.6</td></tr>\n",
"    <tr><th>8</th><td>Rice</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.5</td></tr>\n",
"    <tr><th>9</th><td>Cake</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.8</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " Items Alice Bob Charlie Daisy Edward Faye George Harriet Imogen \\\n", "0 Apple 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 \n", "1 Banana 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2 Pear 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 \n", "3 Chicken 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "4 Beef 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 \n", "5 Lamb 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 \n", "6 Pizza 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 \n", "7 Pasta 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 \n", "8 Rice 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 \n", "9 Cake 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 \n", "\n", " John Kyle \n", "0 0.0 0.7 \n", "1 1.0 0.4 \n", "2 0.0 0.2 \n", "3 1.0 0.4 \n", "4 0.0 0.4 \n", "5 1.0 0.4 \n", "6 1.0 0.5 \n", "7 1.0 0.6 \n", "8 0.0 0.5 \n", "9 0.0 0.8 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Kyle's starting score for each item: the mean rating across the 10 existing users\n", "df['Kyle'] = df.values[:,1:11].sum(axis=1) / 10\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above we observe a new person - Kyle. At this stage we do not know anything about Kyle.\n", "\n", "* We can generate an initial profile for this new user by taking the average popularity of each item across all existing users in our current knowledge base." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Items</th><th>Alice</th><th>Bob</th><th>Charlie</th><th>Daisy</th><th>Edward</th><th>Faye</th><th>George</th><th>Harriet</th><th>Imogen</th><th>John</th><th>Kyle</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Apple</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>Banana</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>Pear</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>Chicken</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>Beef</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>Lamb</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>6</th><td>Pizza</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>7</th><td>Pasta</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>8</th><td>Rice</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>9</th><td>Cake</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Items Alice Bob Charlie Daisy Edward Faye George Harriet Imogen \\\n", "0 Apple 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 \n", "1 Banana 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2 Pear 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 \n", "3 Chicken 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "4 Beef 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 \n", "5 Lamb 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 \n", "6 Pizza 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 \n", "7 Pasta 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 \n", "8 Rice 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 \n", "9 Cake 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 \n", "\n", " John Kyle \n", "0 0.0 0 \n", "1 1.0 0 \n", "2 0.0 0 \n", "3 1.0 0 \n", "4 0.0 1 \n", "5 1.0 0 \n", "6 1.0 0 \n", "7 1.0 0 \n", "8 0.0 0 \n", "9 0.0 0 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df.style.set_properties(**{'background-color': '#7f3fee'}, subset=['Kyle'])\n", "df = df.drop(['Kyle'], axis=1)\n", "df['Kyle'] = [0,0,0,0,1,0,0,0,0,0]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we now know that Kyle likes Beef. We could initialise his profile with this information." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Alice</th><th>Daisy</th><th>Harriet</th><th>Imogen</th><th>Kyle</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>6</th><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>7</th><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>8</th><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0</td></tr>\n",
"    <tr><th>9</th><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " Alice Daisy Harriet Imogen Kyle\n", "0 1.0 1.0 0.0 1.0 0\n", "1 1.0 0.0 0.0 0.0 0\n", "2 0.0 1.0 0.0 1.0 0\n", "3 0.0 0.0 0.0 0.0 0\n", "4 1.0 1.0 1.0 1.0 1\n", "5 0.0 0.0 0.0 0.0 0\n", "6 1.0 0.0 1.0 0.0 0\n", "7 1.0 0.0 0.0 0.0 0\n", "8 1.0 0.0 1.0 0.0 0\n", "9 1.0 0.0 1.0 1.0 0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Row of the one item Kyle is known to like (Beef)\n", "p = df.iloc[df['Kyle'].argmax(),:]\n", "# Keep only the people who scored that item 1 (this includes Kyle himself)\n", "p = p[p==1]\n", "v = len(p)\n", "p = df[p.index]\n", "p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table above keeps only the people who like Beef, which shows us which users are most similar to Kyle given what we know about him so far." ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Alice</th><th>Edward</th><th>Harriet</th><th>John</th><th>Kyle</th><th>Items</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.00</td><td>Apple</td></tr>\n",
"    <tr><th>1</th><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.25</td><td>Banana</td></tr>\n",
"    <tr><th>2</th><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.75</td><td>Pear</td></tr>\n",
"    <tr><th>3</th><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.00</td><td>Chicken</td></tr>\n",
"    <tr><th>4</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.00</td><td>Beef</td></tr>\n",
"    <tr><th>5</th><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.50</td><td>Lamb</td></tr>\n",
"    <tr><th>6</th><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.50</td><td>Pizza</td></tr>\n",
"    <tr><th>7</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.00</td><td>Pasta</td></tr>\n",
"    <tr><th>8</th><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.50</td><td>Rice</td></tr>\n",
"    <tr><th>9</th><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.50</td><td>Cake</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " Alice Edward Harriet John Kyle Items\n", "0 0.0 0.0 0.0 0.0 0.00 Apple\n", "1 1.0 0.0 0.0 0.0 0.25 Banana\n", "2 0.0 1.0 1.0 1.0 0.75 Pear\n", "3 0.0 0.0 0.0 0.0 0.00 Chicken\n", "4 1.0 1.0 1.0 1.0 1.00 Beef\n", "5 1.0 0.0 0.0 1.0 0.50 Lamb\n", "6 1.0 0.0 0.0 1.0 0.50 Pizza\n", "7 1.0 1.0 1.0 1.0 1.00 Pasta\n", "8 1.0 0.0 0.0 1.0 0.50 Rice\n", "9 1.0 0.0 0.0 1.0 0.50 Cake" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = p.drop(['Kyle'], axis=1)\n", "# Average over the v-1 people who like Beef (v also counted Kyle himself)\n", "p['Kyle'] = p.values[:,:].sum(axis=1) / (v-1)\n", "p[\"Items\"] = df['Items']\n", "p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we know that Kyle likes Beef, we can use this subset of users to get a more precise initialisation of his preferences. For example, with the \"cold\" initialisation of his profile, our estimate that Kyle likes Pasta was only 0.6. Knowing that he likes Beef, we are now much more confident that he likes Pasta: in the table above, everyone who likes Beef also likes Pasta, so his estimated score for Pasta rises to 1.0. (The ratings are generated randomly, so the exact values change each time the notebook is re-run.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }