{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classical Music Recommendation Playground\n", "\n", "This notebook shows how to implement a simple recommender system following two different approaches: **Collaborative Filtering** (user based) and **Content Based** recommendation.\n", "\n", "> DISCLAIMER:\n", "> The dataset used here is NOT a real dataset: it has been artificially generated for the purposes of this tutorial.\n", "> It absolutely should NOT be used as training data for any application." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy.spatial.distance as distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data import\n", "\n", "### data.csv\n", "\n", "User listening experience dataset. 100 **users** -- numeric identifiers from 0 to 99 -- interact (or not) with 100 **items** (classical composers).\n", "\n", "1 = interaction, 0 = no interaction (implicit feedback)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Wolfgang Amadeus Mozart Franz Liszt Joseph Haydn Johannes Brahms \\\n", "0 1 1 0 0 \n", "1 1 1 1 1 \n", "2 0 1 0 0 \n", "3 0 0 0 0 \n", "4 0 1 0 1 \n", "\n", " Robert Schumann Antonio Vivaldi Roland de Lassus Frédéric Chopin \\\n", "0 1 1 0 0 \n", "1 1 1 1 1 \n", "2 0 0 0 1 \n", "3 0 0 0 1 \n", "4 0 1 0 0 \n", "\n", " Franz Schubert Domenico Scarlatti ... Arnold Schoenberg \\\n", "0 0 1 ... 0 \n", "1 1 1 ... 1 \n", "2 1 0 ... 0 \n", "3 0 0 ... 0 \n", "4 0 1 ... 0 \n", "\n", " Bruno Mantovani Antonín Dvořák Piotr Ilitch Tchaïkovski \\\n", "0 0 0 0 \n", "1 0 1 1 \n", "2 0 0 0 \n", "3 0 1 0 \n", "4 0 1 0 \n", "\n", " Johann Christian Bach Aaron Copland Ferruccio Busoni \\\n", "0 1 0 0 \n", "1 1 1 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", "\n", " Ralph Vaughan Williams Zoltán Kodály Leonard Bernstein \n", "0 0 0 0 \n", "1 0 1 1 \n", "2 0 0 1 \n", "3 0 0 0 \n", "4 0 0 1 \n", "\n", "[5 rows x 100 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('data.csv', index_col=0)\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a (quite predictable) list of the 10 most popular composers." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Wolfgang Amadeus Mozart 89\n", "Ludwig van Beethoven 83\n", "Johann Strauss 80\n", "Antonio Vivaldi 79\n", "Johann Sebastian Bach 76\n", "Joseph Haydn 73\n", "Georg Friedrich Haendel 71\n", "Franz Liszt 71\n", "Maurice Ravel 71\n", "Giuseppe Verdi 70\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.sum().sort_values(ascending=False)[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### artists.csv\n", "\n", "The 100 artists involved, each with a label, a URI and a 17-dimensional embedding taken from the [music embeddings repo](https://github.com/DOREMUS-ANR/music-embeddings)."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " label \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... Wolfgang Amadeus Mozart \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... Franz Liszt \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... Joseph Haydn \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... Johannes Brahms \n", "http://data.doremus.org/artist/f753314d-87a7-32... Robert Schumann \n", "\n", " 0 1 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... -0.049424 0.012972 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.000628 -0.008797 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... -2.000000 -2.000000 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... -0.010495 -0.003960 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.000628 -0.008797 \n", "\n", " 2 3 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.030435 0.672381 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... -0.007513 0.724762 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... -2.000000 0.649524 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 0.000920 0.745714 \n", "http://data.doremus.org/artist/f753314d-87a7-32... -0.007513 0.723810 \n", "\n", " 4 5 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.705714 -0.003292 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.796191 0.010984 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 0.722857 -0.001422 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 0.806667 0.003321 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.767619 0.003386 \n", "\n", " 6 7 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.040397 0.033374 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.017313 0.072120 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 0.035583 0.029520 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 
0.019724 0.059037 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.021836 0.059332 \n", "\n", " 8 9 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.023954 -0.085231 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.025766 -0.085152 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 0.023965 -0.085292 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 0.024170 -0.085409 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.023935 -0.084914 \n", "\n", " 10 11 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.158234 0.044664 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.156424 0.043058 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 0.158400 0.042192 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 0.158257 0.044797 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.158104 0.045242 \n", "\n", " 12 13 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... 0.018166 0.010425 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... 0.020398 0.032555 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 0.008899 0.007677 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... 0.028943 0.022953 \n", "http://data.doremus.org/artist/f753314d-87a7-32... 0.030560 0.023502 \n", "\n", " 14 15 \\\n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... -0.111956 0.142959 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... -0.048986 0.033035 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... -0.091435 0.147538 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... -0.063101 0.111707 \n", "http://data.doremus.org/artist/f753314d-87a7-32... -0.129961 0.110820 \n", "\n", " 16 \n", "uri \n", "http://data.doremus.org/artist/4802a043-23bb-3b... -0.030154 \n", "http://data.doremus.org/artist/aabcd2ee-ac9b-30... -0.001050 \n", "http://data.doremus.org/artist/12fa21ff-cfa4-31... 
0.000403 \n", "http://data.doremus.org/artist/f9a2ac39-a62d-3b... -0.025850 \n", "http://data.doremus.org/artist/f753314d-87a7-32... -0.069069 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artists = pd.read_csv('artists.csv', index_col=0)\n", "artists.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Utils" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Wolfgang Amadeus Mozart',\n", " 'Joseph Haydn',\n", " 'Robert Schumann',\n", " 'Antonio Vivaldi',\n", " 'Franz Schubert',\n", " 'Georg Philipp Telemann',\n", " 'Ludwig van Beethoven',\n", " 'Alessandro Scarlatti',\n", " 'Benjamin Britten',\n", " 'Johann Sebastian Bach',\n", " 'Richard Wagner',\n", " 'Luigi Cherubini',\n", " 'Giuseppe Verdi',\n", " 'Johann Strauss',\n", " 'Niccolò Paganini',\n", " 'Gaetano Donizetti',\n", " 'Jean-Baptiste Lully',\n", " 'George Gershwin',\n", " 'Piotr Ilitch Tchaïkovski']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# return the list of items a given user has interacted with\n", "def get_items(user_id):\n", "    user = data.loc[user_id]\n", "    # keep only the columns (artists) flagged with 1\n", "    return user[user == 1].index.tolist()\n", "\n", "# example: user 7\n", "get_items(7)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "masked_array(data=[-0.049423877149820335, 0.012972225435078144,\n", " 0.030434519052505493, 0.6723809242248535,\n", " 0.7057142853736877, -0.003291688393801451,\n", " 0.04039749875664711, 0.03337432071566582,\n", " 0.023954134434461597, -0.08523057401180267,\n", " 0.1582336723804474, 0.04466380551457405,\n", " 0.018166353926062584, 0.01042507402598858,\n", " -0.11195577681064606, 0.14295919239521027,\n", " -0.030153987929224968],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False],\n", " 
fill_value=1e+20)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# retrieve the embedding of an artist given its label;\n", "# entries equal to -2 are masked out (they appear to mark missing values)\n", "def get_emb(label):\n", "    a = artists.loc[artists['label'] == label]\n", "    embs = a.drop('label', axis=1).values[0]\n", "\n", "    return np.ma.array(embs, mask=embs == -2.)\n", "\n", "\n", "# example\n", "get_emb('Wolfgang Amadeus Mozart')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collaborative filtering" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Find the users most similar to the given one\n", "def most_similar_users(user, k):\n", "    user_vec = user.values  # the user's listening vector\n", "\n", "    # search among all the other users\n", "    pool = data.drop(user.name)\n", "\n", "    # compute the cosine distance to each user of the pool, and sort accordingly\n", "    pool['distance'] = pool.apply(lambda u: distance.cosine(user_vec, u.values), axis=1)\n", "    pool = pool.sort_values('distance').drop(columns='distance')\n", "\n", "    # return the first k users\n", "    return pool[:k]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Rank the artists by popularity among a subset of users\n", "def most_popular_among(user_subset, k=10):\n", "    return user_subset.sum().sort_values(ascending=False).index.tolist()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Recommend artists by looking at similar users\n", "def collaborative_filtering(user, k=10):\n", "    _user = data.loc[user]\n", "    # find the k most similar users\n", "    similar_users = most_similar_users(_user, k)\n", "    # rank the artists by popularity among those users\n", "    most_popular = most_popular_among(similar_users)\n", "    # remove the ones the user has already listened to\n", "    prediction = [x for x in most_popular if x not in get_items(user)]\n", "    return prediction[:k]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which artists would 
be recommended to our user? They have already listened to these:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Wolfgang Amadeus Mozart',\n", " 'Frédéric Chopin',\n", " 'Franz Schubert',\n", " 'Ludwig van Beethoven',\n", " 'Carl Philipp Emanuel Bach',\n", " 'Richard Strauss',\n", " 'Francis Poulenc',\n", " 'Maurice Ravel',\n", " 'Felix Mendelssohn Bartholdy',\n", " 'Giuseppe Verdi',\n", " 'Igor Stravinsky',\n", " 'Georg Friedrich Haendel']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_example = 8\n", "get_items(user_example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The recommendation proposes, among others, several more German and Austrian composers:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Johann Strauss',\n", " 'Franz Liszt',\n", " 'Johannes Brahms',\n", " 'Richard Wagner',\n", " 'Claude Debussy',\n", " 'Johann Sebastian Bach',\n", " 'Gustav Mahler',\n", " 'Luigi Cherubini',\n", " 'Jean-Baptiste Lully',\n", " 'Antonio Vivaldi']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collaborative_filtering(user_example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Content based recommendation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the similarity metric: a weighted Euclidean distance over the dimensions available in both embeddings, rescaled by `max_distance` and penalised when the target lacks dimensions that the seed has." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def compute_similarity(seed, target, w=1):\n", "    # positions that are masked (missing) in either embedding\n", "    b1 = np.where(seed.mask)[0]\n", "    b2 = np.where(target.mask)[0]\n", "    bad_pos = np.unique(np.concatenate([b1, b2]))\n", "\n", "    _seed = np.delete(seed, bad_pos, axis=0)\n", "    _target = np.delete(target, bad_pos, axis=0)\n", "    # expand a scalar weight to one weight per dimension first:\n", "    # calling np.delete on a bare scalar is deprecated\n", "    _w = np.delete(np.broadcast_to(w, seed.shape), bad_pos, axis=0)\n", "\n", "    if len(_seed) == 0:\n", "        return 0\n", "\n", "    # distance\n", "    d = weighted_l2(_seed, _target, _w)\n", "\n", "    # how much info I am not finding\n", "    penalty = len([x for x in b2 if x not in b1]) / len(seed)\n", "\n", "    # score\n", "    s = (max_distance - d) / max_distance\n", "    return s * (1 - penalty)\n", "\n", "\n", "def weighted_l2(a, b, w=1):\n", "    q = a - b\n", "    return np.sqrt((w * q * q).sum())\n", "\n", "\n", "_ones = np.ones(17)\n", "max_distance = weighted_l2(_ones, -_ones, _ones)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the similarity score for every pair of artists and store the results in a DataFrame." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:8: DeprecationWarning: in the future the special handling of scalars will be removed from delete and raise an error\n", " \n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "label Wolfgang Amadeus Mozart Franz Liszt Joseph Haydn \\\n", "label \n", "Wolfgang Amadeus Mozart 1 0.977317 0.818705 \n", "Franz Liszt 0.977317 1 0.806518 \n", "Joseph Haydn 0.994142 0.979343 1 \n", "Johannes Brahms 0.981598 0.989032 0.809113 \n", "Robert Schumann 0.985071 0.983511 0.810504 \n", "\n", "label Johannes Brahms Robert Schumann Antonio Vivaldi \\\n", "label \n", "Wolfgang Amadeus Mozart 0.981598 0.985071 0.814204 \n", "Franz Liszt 0.989032 0.983511 0.798843 \n", "Joseph Haydn 0.982494 0.984184 0.988096 \n", "Johannes Brahms 1 0.988774 0.800753 \n", "Robert Schumann 0.988774 1 0.804982 \n", "\n", "label Roland de Lassus Frédéric Chopin Franz Schubert \\\n", "label \n", "Wolfgang Amadeus Mozart 0.624089 0.976958 0.987308 \n", "Franz Liszt 0.616185 0.98243 0.979911 \n", "Joseph Haydn 0.794793 0.977726 0.988863 \n", "Johannes Brahms 0.614646 0.979436 0.982709 \n", "Robert Schumann 0.617933 0.976734 0.981611 \n", "\n", "label Domenico Scarlatti ... \\\n", "label ... \n", "Wolfgang Amadeus Mozart 0.813783 ... \n", "Franz Liszt 0.800846 ... \n", "Joseph Haydn 0.990187 ... \n", "Johannes Brahms 0.802281 ... \n", "Robert Schumann 0.804426 ... 
\n", "\n", "label Arnold Schoenberg Bruno Mantovani Antonín Dvořák \\\n", "label \n", "Wolfgang Amadeus Mozart 0.803884 0.573075 0.808275 \n", "Franz Liszt 0.807869 0.576931 0.813884 \n", "Joseph Haydn 0.975743 0.742932 0.981111 \n", "Johannes Brahms 0.815153 0.578398 0.818782 \n", "Robert Schumann 0.810159 0.576856 0.813493 \n", "\n", "label Piotr Ilitch Tchaïkovski Johann Christian Bach \\\n", "label \n", "Wolfgang Amadeus Mozart 0.808963 0.818815 \n", "Franz Liszt 0.813808 0.805256 \n", "Joseph Haydn 0.982625 0.993293 \n", "Johannes Brahms 0.817388 0.807047 \n", "Robert Schumann 0.812816 0.810574 \n", "\n", "label Aaron Copland Ferruccio Busoni Ralph Vaughan Williams \\\n", "label \n", "Wolfgang Amadeus Mozart 0.628535 0.805652 0.80302 \n", "Franz Liszt 0.6364 0.807672 0.80757 \n", "Joseph Haydn 0.799882 0.976397 0.974162 \n", "Johannes Brahms 0.638088 0.81323 0.811394 \n", "Robert Schumann 0.634682 0.814013 0.809981 \n", "\n", "label Zoltán Kodály Leonard Bernstein \n", "label \n", "Wolfgang Amadeus Mozart 0.797534 0.627795 \n", "Franz Liszt 0.798923 0.635377 \n", "Joseph Haydn 0.970276 0.798628 \n", "Johannes Brahms 0.803853 0.637382 \n", "Robert Schumann 0.798908 0.634023 \n", "\n", "[5 rows x 100 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "similarity_matrix = pd.DataFrame(index=artists['label'], columns=artists['label'])\n", "for i in np.arange(len(similarity_matrix)):\n", "    seed = artists.iloc[i]['label']\n", "    for j in np.arange(len(similarity_matrix)):\n", "\n", "        if i == j:\n", "            # an artist is fully similar to itself\n", "            similarity_matrix.iloc[i, j] = 1\n", "            continue\n", "\n", "        target = artists.iloc[j]['label']\n", "        # single-step .iloc[i, j] avoids chained assignment on a temporary copy\n", "        similarity_matrix.iloc[i, j] = compute_similarity(get_emb(seed), get_emb(target))\n", "\n", "similarity_matrix.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def content_based(user, k=10):\n", "    _items = get_items(user)\n", "\n", "    # remove the items already in the list\n", "    candidates = similarity_matrix.drop(labels=_items, axis=1)\n", "\n", "    # rank the remaining artists by their total similarity to the user's items\n", "    candidates = candidates.loc[_items]\n", "    return candidates.sum().sort_values(ascending=False).index.tolist()[0:k]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Wolfgang Amadeus Mozart',\n", " 'Frédéric Chopin',\n", " 'Franz Schubert',\n", " 'Ludwig van Beethoven',\n", " 'Carl Philipp Emanuel Bach',\n", " 'Richard Strauss',\n", " 'Francis Poulenc',\n", " 'Maurice Ravel',\n", " 'Felix Mendelssohn Bartholdy',\n", " 'Giuseppe Verdi',\n", " 'Igor Stravinsky',\n", " 'Georg Friedrich Haendel']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_items(user_example)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Johannes Brahms',\n", " 'Claude Debussy',\n", " 'Robert Schumann',\n", " 'Bedřich Smetana',\n", " 'César Franck',\n", " 'Jean Sibelius',\n", " 'Carl Maria von Weber',\n", " 'Gabriel Fauré',\n", " 'Edward Elgar',\n", " 'Franz Liszt']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "content_based(user_example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bonus\n", "\n", "What happens with a user who likes just one particular composer?" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Wolfgang Amadeus Mozart Franz Liszt Joseph Haydn Johannes Brahms \\\n", "100 0.0 0.0 0.0 0.0 \n", "\n", " Robert Schumann Antonio Vivaldi Roland de Lassus Frédéric Chopin \\\n", "100 0.0 1.0 0.0 0.0 \n", "\n", " Franz Schubert Domenico Scarlatti ... Arnold Schoenberg \\\n", "100 0.0 0.0 ... 0.0 \n", "\n", " Bruno Mantovani Antonín Dvořák Piotr Ilitch Tchaïkovski \\\n", "100 0.0 0.0 0.0 \n", "\n", " Johann Christian Bach Aaron Copland Ferruccio Busoni \\\n", "100 0.0 0.0 0.0 \n", "\n", " Ralph Vaughan Williams Zoltán Kodály Leonard Bernstein \n", "100 0.0 0.0 0.0 \n", "\n", "[1 rows x 100 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add a new user who has interacted with Vivaldi only\n", "new_user = np.zeros(len(data.columns))\n", "new_user_id = len(data)\n", "data.loc[new_user_id] = new_user\n", "# single-step .loc[row, column] avoids chained assignment on a temporary copy\n", "data.loc[new_user_id, 'Antonio Vivaldi'] = 1\n", "data.loc[[new_user_id]]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Wolfgang Amadeus Mozart',\n", " 'Johann Strauss',\n", " 'Claudio Monteverdi',\n", " 'Joseph Haydn',\n", " 'Piotr Ilitch Tchaïkovski',\n", " 'Georg Philipp Telemann',\n", " 'Johannes Brahms',\n", " 'Johann Sebastian Bach',\n", " 'Niccolò Paganini',\n", " 'Franz Liszt']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collaborative_filtering(new_user_id)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Alessandro Scarlatti',\n", " 'Johann Sebastian Bach',\n", " 'Georg Friedrich Haendel',\n", " 'François Couperin',\n", " 'Henry Purcell',\n", " 'Carl Philipp Emanuel Bach',\n", " 'Domenico Scarlatti',\n", " 'Georg Philipp Telemann',\n", " 'Baldassare Galuppi',\n", " 'André Campra']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "content_based(new_user_id)" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }