{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Combining recommended lists\n", "** *\n", "This IPython notebook combines the top-N recommended items from different recommender methodologies (here, one list each from collaborative filtering, content-based, and most-popular) for a given user using interleaved ranking, in order to obtain a final recommended list.\n", "\n", "A simple approach to combining recommendations from different sources is to add or multiply the scores that each item gets for a given user under each algorithm, but this might not change the recommendations much if the scores are dissimilar or if they come in the form of a ranking. Interleaved ranking (originally an algorithm for mixing search engine results) offers a way to force the final recommended list to be more “mixed” by making it contain elements from each list.\n", "\n", "There are different algorithms for making an interleaved ranked list. Here I’ll use the simplest one, also known as the soccer team selection, which intuitively works as follows: the recommended lists take turns contributing items to the final list, each one adding its top-ranked item that has not already been put in the final list by another recommended list.\n", "\n", "Here I’ll produce three different recommended lists of 20 items each using the [MovieLens 1M dataset](https://grouplens.org/datasets/movielens/1m/) for the user numbered $100$ (userId = 100) as follows:\n", "* Most-popular: each item’s score is the sum of the ratings it gets from all users, thus favoring movies that are both highly rated and frequently rated. This is a non-personalized list (i.e. 
it’s the same for all users).\n", "* Collaborative filtering: a low-rank matrix factorization of the ratings matrix using alternating least squares.\n", "* Content-based: regression of the (centered) ratings against the outer product of user and movie features; this is a more involved process, and the details can be found [in this other IPython notebook](http://nbviewer.ipython.org/github/david-cortes/datascienceprojects/blob/master/machine_learning/recommender_system_w_coldstart.ipynb).\n", "\n", "** *\n", "## Sections\n", "\n", "[1. Loading the data](#p1)\n", "\n", "[2. Producing a Most-Popular recommended list](#p2)\n", "\n", "[3. Producing a Collaborative Filtering recommended list](#p3)\n", "\n", "[4. Producing a Content-Based recommended list](#p4)\n", "\n", "[5. Examining the recommendations](#p5)\n", "\n", "[6. Combining recommended lists](#p6)\n", "** *\n", "\n", "\n", "## 1. Loading the data\n", "\n", "Initializing Spark locally (it will be used for most computations) and loading the necessary libraries:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np, pandas as pd, re, findspark\n", "from collections import defaultdict\n", "from sklearn.decomposition import PCA\n", "from scipy.sparse import csc_matrix\n", "\n", "findspark.init(\"/home/david/Downloads/spark-2.1.1-bin-hadoop2.7/\")\n", "\n", "import pyspark\n", "sc = pyspark.SparkContext()\n", "from pyspark.sql import SQLContext\n", "sqlContext = SQLContext(sc)\n", "\n", "from pyspark.mllib.regression import (LabeledPoint, RidgeRegressionWithSGD)\n", "from pyspark.ml.regression import LinearRegression\n", "from pyspark.ml.recommendation import ALS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading the MovieLens-1M ratings:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>userId</th><th>movieId</th><th>Rating</th><th>Timestamp</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>1</td><td>1193</td><td>5</td><td>978300760</td></tr>\n", "<tr><th>1</th><td>1</td><td>661</td><td>3</td><td>978302109</td></tr>\n", "<tr><th>2</th><td>1</td><td>914</td><td>3</td><td>978301968</td></tr>\n", "<tr><th>3</th><td>1</td><td>3408</td><td>4</td><td>978300275</td></tr>\n", "<tr><th>4</th><td>1</td><td>2355</td><td>5</td><td>978824291</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " userId movieId Rating Timestamp\n", "0 1 1193 5 978300760\n", "1 1 661 3 978302109\n", "2 1 914 3 978301968\n", "3 1 3408 4 978300275\n", "4 1 2355 5 978824291" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings=pd.read_table(\"/home/david/movielens/ml-1m/ml-1m/ratings.dat\", sep=\"::\", names=[\"userId\",\"movieId\",\"Rating\",\"Timestamp\"], engine='python')\n", "ratings.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading the movie titles encoding - will be used later to examine recommended lists:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "movie_titles=pd.read_csv('/home/david/movielens/ml-1m/ml-1m/movies.dat', sep=\"::\", names=['movieId','MovieTitle','genres'],engine='python')\n", "movie_titles={i.movieId:i.MovieTitle for i in movie_titles.itertuples()}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2. Producing a Most-Popular recommended list\n", "\n", "Items are ranked by sum of their ratings:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>NumRatings</th><th>AvgRating</th><th>score</th><th>Title</th></tr>\n", "<tr><th>movieId</th><th></th><th></th><th></th><th></th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>2858</th><td>3428</td><td>4.317386</td><td>14800.0</td><td>American Beauty (1999)</td></tr>\n", "<tr><th>260</th><td>2991</td><td>4.453694</td><td>13321.0</td><td>Star Wars: Episode IV - A New Hope (1977)</td></tr>\n", "<tr><th>1196</th><td>2990</td><td>4.292977</td><td>12836.0</td><td>Star Wars: Episode V - The Empire Strikes Back...</td></tr>\n", "<tr><th>1210</th><td>2883</td><td>4.022893</td><td>11598.0</td><td>Star Wars: Episode VI - Return of the Jedi (1983)</td></tr>\n", "<tr><th>2028</th><td>2653</td><td>4.337354</td><td>11507.0</td><td>Saving Private Ryan (1998)</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " NumRatings AvgRating score \\\n", "movieId \n", "2858 3428 4.317386 14800.0 \n", "260 2991 4.453694 13321.0 \n", "1196 2990 4.292977 12836.0 \n", "1210 2883 4.022893 11598.0 \n", "2028 2653 4.337354 11507.0 \n", "\n", " Title \n", "movieId \n", "2858 American Beauty (1999) \n", "260 Star Wars: Episode IV - A New Hope (1977) \n", "1196 Star Wars: Episode V - The Empire Strikes Back... \n", "1210 Star Wars: Episode VI - Return of the Jedi (1983) \n", "2028 Saving Private Ryan (1998) " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user=100\n", "movies_watched_by_user=set(list(ratings.movieId.loc[ratings.userId==user]))\n", "\n", "avg_ratings=ratings.groupby('movieId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})\n", "num_ratings=ratings.groupby('movieId')['Rating'].agg(lambda x: len(tuple(x))).to_frame().rename(columns={'Rating':'NumRatings'})\n", "pop_rec=num_ratings.join(avg_ratings)\n", "# keep only movies the user has not rated yet (the result of the filter must be assigned back)\n", "pop_rec=pop_rec.loc[~pop_rec.index.isin(movies_watched_by_user)]\n", "pop_rec['score']=pop_rec.NumRatings*pop_rec.AvgRating\n", "pop_rec=pop_rec.sort_values('score',ascending=False)\n", "pop20=list(pop_rec.index[:20])\n", "pop_rec['Title']=pop_rec.index.map(lambda x: movie_titles[x])\n", "pop_rec.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3. Producing a Collaborative Filtering recommended list\n", "\n", "Here I'm using ALS from PySpark to factorize the ratings matrix:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>userId</th><th>movieId</th><th>score_cf</th><th>Title</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>1405</th><td>100</td><td>3382</td><td>4.950840</td><td>Song of Freedom (1936)</td></tr>\n", "<tr><th>3333</th><td>100</td><td>557</td><td>3.812159</td><td>Mamma Roma (1962)</td></tr>\n", "<tr><th>1244</th><td>100</td><td>989</td><td>3.618343</td><td>Schlafes Bruder (Brother of Sleep) (1995)</td></tr>\n", "<tr><th>512</th><td>100</td><td>578</td><td>3.510315</td><td>Hour of the Pig, The (1993)</td></tr>\n", "<tr><th>2633</th><td>100</td><td>3233</td><td>3.498407</td><td>Smashing Time (1967)</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " userId movieId score_cf Title\n", "1405 100 3382 4.950840 Song of Freedom (1936)\n", "3333 100 557 3.812159 Mamma Roma (1962)\n", "1244 100 989 3.618343 Schlafes Bruder (Brother of Sleep) (1995)\n", "512 100 578 3.510315 Hour of the Pig, The (1993)\n", "2633 100 3233 3.498407 Smashing Time (1967)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratings_df=sqlContext.createDataFrame(ratings)\n", "\n", "cfmodel=ALS(rank=50, regParam=0.5, userCol=\"userId\", itemCol=\"movieId\", ratingCol=\"Rating\").fit(ratings_df)\n", "movies_available=set(list(ratings.movieId))\n", "movies_available=movies_available.difference(movies_watched_by_user)\n", "preds=pd.DataFrame([(user,m) for m in movies_available],columns=['userId','movieId'])\n", "preds_df=sqlContext.createDataFrame(preds)\n", "preds_scores=cfmodel.transform(preds_df).collect()\n", "preds_scores=pd.DataFrame(preds_scores, columns=['userId','movieId','score_cf'])\n", "preds_scores=preds_scores.sort_values('score_cf',ascending=False)\n", "cf20=list(preds_scores.movieId.iloc[:20])\n", "preds_scores['Title']=preds_scores.movieId.map(lambda x: movie_titles[x])\n", "preds_scores.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4. 
Producing a Content-Based recommended list\n", "\n", "The overall idea is to combine user demographic information, including each user's geographical region (which I get from their zip codes by using some free zip code databases), with movie information: the movie tags from the latest MovieLens release, matched by title to the MovieLens-1M ratings, plus the movie genres and the release year as a discretized category.\n", "\n", "Then, the centered ratings are regressed against the outer product of the user and movie features. A more detailed explanation can be found [here](http://nbviewer.ipython.org/github/david-cortes/datascienceprojects/blob/master/machine_learning/recommender_system_w_coldstart.ipynb)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/david/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.py:179: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " self._setitem_with_indexer(indexer, value)\n", "/home/david/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (11) have mixed types. Specify dtype option on import or set low_memory=False.\n", " interactivity=interactivity, compiler=compiler, result=result)\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>userId</th><th>movieId</th><th>score_cf</th><th>Title</th><th>score_cb</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>3191</th><td>100</td><td>1262</td><td>3.136641</td><td>Great Escape, The (1963)</td><td>1.030581</td></tr>\n", "<tr><th>835</th><td>100</td><td>3030</td><td>3.163622</td><td>Yojimbo (1961)</td><td>1.015274</td></tr>\n", "<tr><th>398</th><td>100</td><td>908</td><td>3.137593</td><td>North by Northwest (1959)</td><td>1.012968</td></tr>\n", "<tr><th>354</th><td>100</td><td>3435</td><td>3.167782</td><td>Double Indemnity (1944)</td><td>1.003191</td></tr>\n", "<tr><th>473</th><td>100</td><td>3196</td><td>3.033045</td><td>Stalag 17 (1953)</td><td>0.998952</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " userId movieId score_cf Title score_cb\n", "3191 100 1262 3.136641 Great Escape, The (1963) 1.030581\n", "835 100 3030 3.163622 Yojimbo (1961) 1.015274\n", "398 100 908 3.137593 North by Northwest (1959) 1.012968\n", "354 100 3435 3.167782 Double Indemnity (1944) 1.003191\n", "473 100 3196 3.033045 Stalag 17 (1953) 0.998952" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/movies.csv')\n", "movies_humanreadable=movies.copy()\n", "movies['hasYear']=movies.title.map(lambda x: bool(re.search(\"\\s\\((\\d{4})\\)$\",x.strip())))\n", "movies['Year']='unknown'\n", "movies['Year'].loc[movies.hasYear]=movies.title.loc[movies.hasYear].map(lambda x: re.search(\"\\s\\((\\d{4})\\)$\",x.strip()).group(1))\n", "del movies['hasYear']\n", "\n", "movies['genres']=movies.genres.map(lambda x: set(x.split('|')))\n", "present_genres=set()\n", "for movie in movies.itertuples():\n", " present_genres=present_genres.union(movie.genres)\n", "for genre in present_genres:\n", " movies['genre'+genre]=movies.genres.map(lambda x: 1.0*(genre in x))\n", "\n", "tags=pd.read_csv('/home/david/movielens/ml-latest/ml-latest/genome-scores.csv')\n", "tags_wide=tags.pivot(index='movieId', columns='tagId', values='relevance')\n", "tags_wide=tags_wide.fillna(0)\n", "pca=PCA(svd_solver='full')\n", "pca.fit(tags_wide)\n", "tags_pca=pd.DataFrame(pca.transform(tags_wide)[:,:50])\n", "tags_pca.columns=[\"pc\"+str(x) for x in tags_pca.columns.values]\n", "tags_pca['movieId']=tags_wide.index\n", "movies=pd.merge(movies,tags_pca,how='inner',on='movieId')\n", "\n", "def discretize_year(x):\n", " if x=='unknown':\n", " return x\n", " else:\n", " x=int(x)\n", " if x>=2000:\n", " return '>=2000'\n", " if x>=1995 and x<=1999:\n", " return str(x)\n", " if x>=1990 and x<=1994:\n", " return 'low90s'\n", " if x>=1980 and x<=1989:\n", " return '80s'\n", " if x>=1970 and x<=1979:\n", " return 
'70s'\n", " if x>=1960 and x<=1969:\n", " return '60s'\n", " if x>=1950 and x<=1959:\n", " return '50s'\n", " if x>=1940 and x<=1949:\n", " return '40s'\n", " if x<1940:\n", " return '<1940'\n", " else:\n", " return 'unknown'\n", "\n", "movies_features=movies.copy()\n", "del movies_features['title']\n", "del movies_features['genres']\n", "del movies_features['genre(no genres listed)']\n", "movies_features['Year']=movies_features.Year.map(lambda x: discretize_year(x))\n", "movies_features=pd.get_dummies(movies_features, columns=['Year'])\n", "movies_features.set_index('movieId',inplace=True)\n", "\n", "zipcode_abbs=pd.read_csv(\"/home/david/movielens/zips/states.csv\")\n", "zipcode_abbs_dct={z.State:z.Abbreviation for z in zipcode_abbs.itertuples()}\n", "us_regs_table=[\n", " ('New England', 'Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont'),\n", " ('Middle Atlantic', 'Delaware, Maryland, New Jersey, New York, Pennsylvania'),\n", " ('South', 'Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, Missouri, North Carolina, South Carolina, Tennessee, Virginia, West Virginia'),\n", " ('Midwest', 'Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Nebraska, North Dakota, Ohio, South Dakota, Wisconsin'),\n", " ('Southwest', 'Arizona, New Mexico, Oklahoma, Texas'),\n", " ('West', 'Alaska, California, Colorado, Hawaii, Idaho, Montana, Nevada, Oregon, Utah, Washington, Wyoming')\n", " ]\n", "us_regs_table=[(x[0],[i.strip() for i in x[1].split(\",\")]) for x in us_regs_table]\n", "us_regs_dct=dict()\n", "for r in us_regs_table:\n", " for s in r[1]:\n", " us_regs_dct[zipcode_abbs_dct[s]]=r[0]\n", "\n", "zipcode_info=pd.read_csv(\"/home/david/movielens/free-zipcode-database.csv\")\n", "zipcode_info=zipcode_info.groupby('Zipcode').first().reset_index()\n", "zipcode_info.loc[zipcode_info.Country!=\"US\",'State']='UnknownOrNonUS'\n", "zipcode_info['Region']=zipcode_info['State'].copy()\n", 
"zipcode_info.loc[zipcode_info.Country==\"US\",'Region']=zipcode_info.Region.loc[zipcode_info.Country==\"US\"].map(lambda x: us_regs_dct[x] if x in us_regs_dct else 'UsOther')\n", "zipcode_info=zipcode_info[['Zipcode', 'Region']]\n", "\n", "users=pd.read_table(\"/home/david/movielens/ml-1m/ml-1m/users.dat\",sep='::',names=[\"userId\",\"Gender\",\"Age\",\"Occupation\",\"Zipcode\"], engine='python')\n", "users[\"Zipcode\"]=users.Zipcode.map(lambda x: int(re.sub(\"-.*\",\"\",x)))\n", "users=pd.merge(users,zipcode_info,on='Zipcode',how='left')\n", "users['Region']=users.Region.fillna('UnknownOrNonUS')\n", "\n", "users_features=users.copy()\n", "users_features['Gender']=users_features.Gender.map(lambda x: 1.0*(x=='M'))\n", "del users_features['Zipcode']\n", "users_features['Age']=users_features.Age.map(lambda x: str(x))\n", "users_features['Occupation']=users_features.Occupation.map(lambda x: str(x))\n", "users_features=pd.get_dummies(users_features, columns=['Age', 'Occupation', 'Region'])\n", "users_features.set_index('userId',inplace=True)\n", "\n", "movies_w_sideinfo=set(list(movies.movieId))\n", "ratings=ratings.loc[ratings.movieId.map(lambda x: x in movies_w_sideinfo)]\n", "avg_rating_by_user=ratings.groupby('userId')['Rating'].mean().to_frame().rename(columns={'Rating':'AvgRating'})\n", "ratings_train=pd.merge(ratings, avg_rating_by_user, left_on='userId',right_index=True)\n", "ratings_train['RatingCentered']=ratings_train.Rating-ratings_train.AvgRating\n", "\n", "def generate_features(user,movie,users_features_bc,movies_features_bc):\n", " user_feats=users_features_bc.value.loc[user].values\n", " movie_feats=movies_features_bc.value.loc[movie].values\n", " return csc_matrix(np.kron(user_feats,movie_feats).reshape(-1,1))\n", "\n", "users_features_bc=sc.broadcast(users_features)\n", "movies_features_bc=sc.broadcast(movies_features)\n", "\n", "trainset=sc.parallelize([(i.userId,i.movieId,i.RatingCentered) for i in ratings_train.itertuples()])\\\n", 
".map(lambda x: LabeledPoint(x[2],generate_features(x[0],x[1],users_features_bc,movies_features_bc)))\\\n", ".map(lambda x: (float(x.label),x.features.asML())).toDF(['label','features'])\n", "# repartition returns a new DataFrame, so the result has to be assigned back\n", "trainset=trainset.repartition(50)\n", "\n", "recommender=LinearRegression(regParam=1e-4).fit(trainset)\n", "formula_coeffs=recommender.coefficients.toArray()\n", "\n", "def generate_features_series(user,movie):\n", " user_feats=users_features.loc[user].values\n", " movie_feats=movies_features.loc[movie].values\n", " return pd.Series(np.kron(user_feats,movie_feats).astype('float64'))\n", "\n", "preds_scores=preds_scores.loc[preds_scores.movieId.map(lambda x: x in movies_w_sideinfo)]\n", "X_predict=preds_scores.movieId.apply(lambda x: generate_features_series(user,x))\n", "preds_scores['score_cb']=X_predict.dot(formula_coeffs)\n", "preds_scores=preds_scores.sort_values('score_cb',ascending=False)\n", "cb20=list(preds_scores.movieId.iloc[:20])\n", "preds_scores.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 5. Examining the recommendations\n", "\n", "Now taking a look at what each of these lists actually recommends: the recommendations are very different, with little intersection, and, as expected, collaborative filtering tends to favor less popular items for this user. 
First Most-Popular recommended list:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428\n", "2) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991\n", "3) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990\n", "4) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883\n", "5) - Saving Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653\n", "6) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514\n", "7) - Silence of the Lambs, The (1991) - Average Rating: 4.35 - Number of ratings: 2578\n", "8) - Matrix, The (1999) - Average Rating: 4.32 - Number of ratings: 2590\n", "9) - Sixth Sense, The (1999) - Average Rating: 4.41 - Number of ratings: 2459\n", "10) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649\n", "11) - Fargo (1996) - Average Rating: 4.25 - Number of ratings: 2513\n", "12) - Schindler's List (1993) - Average Rating: 4.51 - Number of ratings: 2304\n", "13) - Braveheart (1995) - Average Rating: 4.23 - Number of ratings: 2443\n", "14) - Back to the Future (1985) - Average Rating: 3.99 - Number of ratings: 2583\n", "15) - Shawshank Redemption, The (1994) - Average Rating: 4.55 - Number of ratings: 2227\n", "16) - Godfather, The (1972) - Average Rating: 4.52 - Number of ratings: 2223\n", "17) - Jurassic Park (1993) - Average Rating: 3.76 - Number of ratings: 2672\n", "18) - Princess Bride, The (1987) - Average Rating: 4.3 - Number of ratings: 2318\n", "19) - Shakespeare in Love (1998) - Average Rating: 4.13 - Number of ratings: 2369\n", "20) - L.A. 
Confidential (1997) - Average Rating: 4.22 - Number of ratings: 2288\n" ] } ], "source": [ "def print_reclist(reclist):\n", " list_w_info=[str(m+1)+\") - \"+movie_titles[reclist[m]]+\\\n", " \" - Average Rating: \"+str(np.round(avg_ratings.loc[reclist[m]].iloc[0],2))+\\\n", " \" - Number of ratings: \"+str(num_ratings.loc[reclist[m]].iloc[0]) for m in range(len(reclist))]\n", " print \"\\n\".join(list_w_info)\n", " \n", "print_reclist(pop20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Collaborative filtering recommended list:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1\n", "2) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2\n", "3) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1\n", "4) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2\n", "5) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2\n", "6) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3\n", "7) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9\n", "8) - Ulysses (Ulisse) (1954) - Average Rating: 5.0 - Number of ratings: 1\n", "9) - Follow the Bitch (1998) - Average Rating: 5.0 - Number of ratings: 1\n", "10) - I Am Cuba (Soy Cuba/Ya Kuba) (1964) - Average Rating: 4.8 - Number of ratings: 5\n", "11) - One Little Indian (1973) - Average Rating: 5.0 - Number of ratings: 1\n", "12) - Lamerica (1994) - Average Rating: 4.75 - Number of ratings: 8\n", "13) - Foreign Student (1994) - Average Rating: 3.0 - Number of ratings: 2\n", "14) - Sanjuro (1962) - Average Rating: 4.61 - Number of ratings: 69\n", "15) - Lured (1947) - Average Rating: 5.0 - Number of ratings: 1\n", "16) - Bells, The (1926) - Average Rating: 4.5 - Number of ratings: 2\n", "17) - Bittersweet 
Motel (2000) - Average Rating: 5.0 - Number of ratings: 1\n", "18) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628\n", "19) - Jar, The (Khomreh) (1992) - Average Rating: 4.0 - Number of ratings: 1\n", "20) - For All Mankind (1989) - Average Rating: 4.44 - Number of ratings: 27\n" ] } ], "source": [ "print_reclist(cf20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Content-based recommended list:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696\n", "2) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215\n", "3) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315\n", "4) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551\n", "5) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394\n", "6) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374\n", "7) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628\n", "8) - Gladiator (2000) - Average Rating: 4.11 - Number of ratings: 1924\n", "9) - Casablanca (1942) - Average Rating: 4.41 - Number of ratings: 1669\n", "10) - Third Man, The (1949) - Average Rating: 4.45 - Number of ratings: 480\n", "11) - Maltese Falcon, The (1941) - Average Rating: 4.4 - Number of ratings: 1043\n", "12) - To Kill a Mockingbird (1962) - Average Rating: 4.43 - Number of ratings: 928\n", "13) - Treasure of the Sierra Madre, The (1948) - Average Rating: 4.29 - Number of ratings: 453\n", "14) - Everest (1998) - Average Rating: 4.01 - Number of ratings: 167\n", "15) - Wrong Trousers, The (1993) - Average Rating: 4.51 - Number of ratings: 882\n", "16) - In the Heat of the Night (1967) - Average Rating: 4.13 - Number of 
ratings: 348\n", "17) - Terminator 2: Judgment Day (1991) - Average Rating: 4.06 - Number of ratings: 2649\n", "18) - Modern Times (1936) - Average Rating: 4.24 - Number of ratings: 305\n", "19) - City Lights (1931) - Average Rating: 4.39 - Number of ratings: 271\n", "20) - Terminator, The (1984) - Average Rating: 4.15 - Number of ratings: 2098\n" ] } ], "source": [ "print_reclist(cb20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 6. Combining recommended lists\n", "\n", "Finally, combining these three lists through interleaved ranking, prioritizing them in this order: CF-CB-MP:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1) - Song of Freedom (1936) - Average Rating: 5.0 - Number of ratings: 1\n", "2) - Great Escape, The (1963) - Average Rating: 4.38 - Number of ratings: 696\n", "3) - American Beauty (1999) - Average Rating: 4.32 - Number of ratings: 3428\n", "4) - Mamma Roma (1962) - Average Rating: 4.5 - Number of ratings: 2\n", "5) - Yojimbo (1961) - Average Rating: 4.4 - Number of ratings: 215\n", "6) - Star Wars: Episode IV - A New Hope (1977) - Average Rating: 4.45 - Number of ratings: 2991\n", "7) - Schlafes Bruder (Brother of Sleep) (1995) - Average Rating: 5.0 - Number of ratings: 1\n", "8) - North by Northwest (1959) - Average Rating: 4.38 - Number of ratings: 1315\n", "9) - Star Wars: Episode V - The Empire Strikes Back (1980) - Average Rating: 4.29 - Number of ratings: 2990\n", "10) - Hour of the Pig, The (1993) - Average Rating: 4.5 - Number of ratings: 2\n", "11) - Double Indemnity (1944) - Average Rating: 4.42 - Number of ratings: 551\n", "12) - Star Wars: Episode VI - Return of the Jedi (1983) - Average Rating: 4.02 - Number of ratings: 2883\n", "13) - Smashing Time (1967) - Average Rating: 5.0 - Number of ratings: 2\n", "14) - Stalag 17 (1953) - Average Rating: 4.23 - Number of ratings: 394\n", "15) - Saving 
Private Ryan (1998) - Average Rating: 4.34 - Number of ratings: 2653\n", "16) - Gate of Heavenly Peace, The (1995) - Average Rating: 5.0 - Number of ratings: 3\n", "17) - It Happened One Night (1934) - Average Rating: 4.28 - Number of ratings: 374\n", "18) - Raiders of the Lost Ark (1981) - Average Rating: 4.48 - Number of ratings: 2514\n", "19) - Apple, The (Sib) (1998) - Average Rating: 4.67 - Number of ratings: 9\n", "20) - Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954) - Average Rating: 4.56 - Number of ratings: 628\n" ] } ], "source": [ "def interleaved_ranking(lst_of_lists,n):\n", " final_list=list()\n", " while len(final_list)