{ "metadata": { "name": "", "signature": "sha256:385227659ff4511c03f3655dac3d2046d42294a9b8dd56a4494d79efe6ade2af" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "The Five-Line Recommender, Explained" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Building a recommender system is easy with GraphLab Create. Simply import graphlab, load data, create a recommender model, and start making recommendations. Let's walk through this line by line.\n", "\n", "
PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv to /var/tmp/graphlab-zach/39733/000000.csv" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Downloading http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv to /var/tmp/graphlab-zach/39733/000000.csv" ] }, { "html": [ "
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv" ] }, { "html": [ "
PROGRESS: Parsing completed. Parsed 100 lines in 0.109583 secs." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Parsing completed. Parsed 100 lines in 0.109583 secs." ] }, { "html": [ "
PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Finished parsing file http://s3.amazonaws.com/dato-datasets/movie_ratings/training_data.csv" ] }, { "html": [ "
PROGRESS: Parsing completed. Parsed 82068 lines in 0.101536 secs." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Parsing completed. Parsed 82068 lines in 0.101536 secs." ] }, { "html": [ "
user | \n", "movie | \n", "rating | \n", "
---|---|---|
Jacob Smith | \n", "Flirting with Disaster | \n", "4 | \n", "
Jacob Smith | \n", "Indecent Proposal | \n", "3 | \n", "
Jacob Smith | \n", "Runaway Bride | \n", "2 | \n", "
Jacob Smith | \n", "Swiss Family Robinson | \n", "1 | \n", "
Jacob Smith | \n", "The Mexican | \n", "2 | \n", "
Jacob Smith | \n", "Maid in Manhattan | \n", "4 | \n", "
Jacob Smith | \n", "A Charlie Brown Thanksgiving / The ... | \n",
" 3 | \n", "
Jacob Smith | \n", "Brazil | \n", "1 | \n", "
Jacob Smith | \n", "Forrest Gump | \n", "3 | \n", "
Jacob Smith | \n", "It Happened One Night | \n", "4 | \n", "
PROGRESS: Recsys training: model = ranking_factorization_recommender" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Recsys training: model = ranking_factorization_recommender" ] }, { "html": [ "
PROGRESS: Preparing data set." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Preparing data set." ] }, { "html": [ "
PROGRESS: Data has 82068 observations with 334 users and 7714 items." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Data has 82068 observations with 334 users and 7714 items." ] }, { "html": [ "
PROGRESS: Data prepared in: 0.204213s" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Data prepared in: 0.204213s" ] }, { "html": [ "
PROGRESS: Training ranking_factorization_recommender for recommendations." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Training ranking_factorization_recommender for recommendations." ] }, { "html": [ "
PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ] }, { "html": [ "
PROGRESS: | Parameter | Description | Value |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | Parameter | Description | Value |" ] }, { "html": [ "
PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ] }, { "html": [ "
PROGRESS: | num_factors | Factor Dimension | 32 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | num_factors | Factor Dimension | 32 |" ] }, { "html": [ "
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |" ] }, { "html": [ "
PROGRESS: | solver | Solver used for training | sgd |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | solver | Solver used for training | sgd |" ] }, { "html": [ "
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |" ] }, { "html": [ "
PROGRESS: | ranking_regularization | Rank-based Regularization Weight | 0.25 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | ranking_regularization | Rank-based Regularization Weight | 0.25 |" ] }, { "html": [ "
PROGRESS: | max_iterations | Maximum Number of Iterations | 25 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | max_iterations | Maximum Number of Iterations | 25 |" ] }, { "html": [ "
PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +--------------------------------+--------------------------------------------------+----------+" ] }, { "html": [ "
PROGRESS: Optimizing model using SGD; tuning step size." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Optimizing model using SGD; tuning step size." ] }, { "html": [ "
PROGRESS: Using 10258 / 82068 points for tuning the step size." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Using 10258 / 82068 points for tuning the step size." ] }, { "html": [ "
PROGRESS: +---------+-------------------+------------------------------------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+-------------------+------------------------------------------+" ] }, { "html": [ "
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |" ] }, { "html": [ "
PROGRESS: +---------+-------------------+------------------------------------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+-------------------+------------------------------------------+" ] }, { "html": [ "
PROGRESS: | 0 | 25 | Not Viable |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 0 | 25 | Not Viable |" ] }, { "html": [ "
PROGRESS: | 1 | 6.25 | Not Viable |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 1 | 6.25 | Not Viable |" ] }, { "html": [ "
PROGRESS: | 2 | 1.5625 | Not Viable |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 2 | 1.5625 | Not Viable |" ] }, { "html": [ "
PROGRESS: | 3 | 0.390625 | Not Viable |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 3 | 0.390625 | Not Viable |" ] }, { "html": [ "
PROGRESS: | 4 | 0.0976562 | 1.61865 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 4 | 0.0976562 | 1.61865 |" ] }, { "html": [ "
PROGRESS: | 5 | 0.0488281 | 1.66185 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 5 | 0.0488281 | 1.66185 |" ] }, { "html": [ "
PROGRESS: | 6 | 0.0244141 | 1.72837 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 6 | 0.0244141 | 1.72837 |" ] }, { "html": [ "
PROGRESS: | 7 | 0.012207 | 1.80785 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 7 | 0.012207 | 1.80785 |" ] }, { "html": [ "
PROGRESS: +---------+-------------------+------------------------------------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+-------------------+------------------------------------------+" ] }, { "html": [ "
PROGRESS: | Final | 0.0976562 | 1.61865 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | Final | 0.0976562 | 1.61865 |" ] }, { "html": [ "
PROGRESS: +---------+-------------------+------------------------------------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+-------------------+------------------------------------------+" ] }, { "html": [ "
PROGRESS: Starting Optimization." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Starting Optimization." ] }, { "html": [ "
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ] }, { "html": [ "
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |" ] }, { "html": [ "
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ] }, { "html": [ "
PROGRESS: | Initial | 339us | 2.40069 | 1.10654 | |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | Initial | 339us | 2.40069 | 1.10654 | |" ] }, { "html": [ "
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ] }, { "html": [ "
PROGRESS: | 1 | 121.209ms | 2.01682 | 1.13516 | 0.0976562 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 1 | 121.209ms | 2.01682 | 1.13516 | 0.0976562 |" ] }, { "html": [ "
PROGRESS: | 2 | 236.228ms | 1.76884 | 1.06319 | 0.0580668 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 2 | 236.228ms | 1.76884 | 1.06319 | 0.0580668 |" ] }, { "html": [ "
PROGRESS: | 3 | 344.786ms | 1.55833 | 0.983183 | 0.042841 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 3 | 344.786ms | 1.55833 | 0.983183 | 0.042841 |" ] }, { "html": [ "
PROGRESS: | 4 | 455.018ms | 1.36545 | 0.906899 | 0.0345267 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 4 | 455.018ms | 1.36545 | 0.906899 | 0.0345267 |" ] }, { "html": [ "
PROGRESS: | 5 | 558.047ms | 1.19205 | 0.832933 | 0.029206 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 5 | 558.047ms | 1.19205 | 0.832933 | 0.029206 |" ] }, { "html": [ "
PROGRESS: | 6 | 688.083ms | 1.04541 | 0.765651 | 0.0254734 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 6 | 688.083ms | 1.04541 | 0.765651 | 0.0254734 |" ] }, { "html": [ "
PROGRESS: | 10 | 1.14s | 0.718906 | 0.605205 | 0.017366 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 10 | 1.14s | 0.718906 | 0.605205 | 0.017366 |" ] }, { "html": [ "
PROGRESS: | 11 | 1.23s | 0.674778 | 0.583149 | 0.016168 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 11 | 1.23s | 0.674778 | 0.583149 | 0.016168 |" ] }, { "html": [ "
PROGRESS: | 20 | 2.22s | 0.518888 | 0.497694 | 0.0103259 |" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: | 20 | 2.22s | 0.518888 | 0.497694 | 0.0103259 |" ] }, { "html": [ "
PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: +---------+--------------+-------------------+-----------------------+-------------+" ] }, { "html": [ "
PROGRESS: Optimization Complete: Maximum number of passes through the data reached." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Optimization Complete: Maximum number of passes through the data reached." ] }, { "html": [ "
PROGRESS: Computing final objective value and training RMSE." ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Computing final objective value and training RMSE." ] }, { "html": [ "
PROGRESS: Final objective value: 0.448549" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Final objective value: 0.448549" ] }, { "html": [ "
PROGRESS: Final training RMSE: 0.417582" ], "metadata": {}, "output_type": "display_data", "text": [ "PROGRESS: Final training RMSE: 0.417582" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Details (and the small devil therein):\n", "\n", "Under the hood, the type of recommender is chosen based on the provided data and on whether the desired task is ranking (the default) or rating prediction. For this type of data and the default ranking task, the default recommender is a matrix factorization model, implemented on top of the disk-backed SFrame data structure. The default solver is stochastic gradient descent, and the recommender model used is the RankingFactorizationModel, which balances rating prediction with a ranking objective. The default `create()` function does not allow changes to the default parameters of a specific model, but it is just as easy to build a specific recommender with your own parameters using the appropriate model-specific `create()` function.\n", "\n", "
user | \n", "movie | \n", "score | \n", "rank | \n", "
---|---|---|---|
Jacob Smith | \n", "Sex and the City: Season 2 ... | \n",
" 5.11639140642 | \n", "1 | \n", "
Jacob Smith | \n", "Sex and the City: Season 1 ... | \n",
" 5.02684734857 | \n", "2 | \n", "
Jacob Smith | \n", "Sex and the City: Season 6: Part 2 ... | \n",
" 4.85449193514 | \n", "3 | \n", "
Jacob Smith | \n", "Sex and the City: Season 3 ... | \n",
" 4.67698882616 | \n", "4 | \n", "
Jacob Smith | \n", "Doctor Zhivago | \n", "4.6545002321 | \n", "5 | \n", "
Mason Smith | \n", "Mulholland Drive | \n", "6.07918380297 | \n", "1 | \n", "
Mason Smith | \n", "Rushmore | \n", "5.84550069368 | \n", "2 | \n", "
Mason Smith | \n", "The Sound of Music | \n", "5.73360227144 | \n", "3 | \n", "
Mason Smith | \n", "Napoleon Dynamite | \n", "5.48369811571 | \n", "4 | \n", "
Mason Smith | \n", "Six Feet Under: Season 1 | \n", "5.38490270174 | \n", "5 | \n", "