# Collaborative filtering

In [None]:
from fastai.gen_doc.nbdoc import *

This package contains all the necessary functions to quickly train a model for a collaborative filtering task. Let's start by importing all we'll need.

In [None]:
from fastai import *
from fastai.collab import * 

## Overview

Collaborative filtering is the task of predicting how much a user is going to like a certain item. The fastai library contains a [`CollabDataBunch`](/collab.html#CollabDataBunch) class that will help you create datasets suitable for training, and a function `collab_learner` to build a simple model from that data. Let's first see how we can get started before delving into the documentation.

For our example, we'll use a small subset of the [MovieLens](https://grouplens.org/datasets/movielens/) dataset. The goal is to predict the rating a user gave a given movie (from 0 to 5). The data comes as a CSV file where each line is the rating of a movie by a given person.

In [None]:
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')
ratings.head()

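|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 73 | 1097 | 4.0 | 1255504951 |
| 1 | 561 | 924 | 3.5 | 1172695223 |
| 2 | 157 | 260 | 3.5 | 1291598691 |
| 3 | 358 | 1210 | 5.0 | 957481884 |
| 4 | 130 | 316 | 2.0 | 1138999234 |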


We'll first turn the `userId` and `movieId` columns into category codes, so that we can replace them with their codes when it's time to feed them to an `Embedding` layer. This step would be even more important if our CSV contained names of users or names of items. To do it, we simply have to call a [`CollabDataBunch`](/collab.html#CollabDataBunch) factory method.

In [None]:
data = CollabDataBunch.from_df(ratings)
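
The conversion to category codes described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: `to_category_codes` is a hypothetical helper, and the exact numbering fastai assigns may differ.

```python
def to_category_codes(values):
    """Map each distinct value to an integer code (codes follow sorted order)."""
    classes = sorted(set(values))                     # distinct ids, in a fixed order
    code_of = {v: i for i, v in enumerate(classes)}   # id -> row index in an embedding table
    return [code_of[v] for v in values], classes

# The five userId values shown in the ratings table (assumed sample input).
user_ids = [73, 561, 157, 358, 130]
codes, classes = to_category_codes(user_ids)
print(codes)    # [0, 4, 2, 3, 1]
print(classes)  # [73, 130, 157, 358, 561]
```

The codes are contiguous integers starting at 0, which is what lets them index rows of an embedding matrix; the `classes` list is kept so that a code can be mapped back to the original id.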

Now that this step is done, we can directly create a [`Learner`](/basic_train.html#Learner) object:

In [None]:
learn = collab_learner(data, n_factors=50, y_range=(0.,5.))

And then immediately begin training:

In [None]:
learn.fit_one_cycle(5, 5e-3, wd=0.1)

Total time: 00:03
epoch  train_loss  valid_loss
1      2.354234    1.927426    (00:00)
2      1.089076    0.677427    (00:00)
3      0.729364    0.650618    (00:00)
4      0.626125    0.638089    (00:00)
5      0.561493    0.637897    (00:00)



In [None]:
show_doc(CollabDataBunch, doc_string=False)

<h2 id="CollabDataBunch"><code>class</code> <code>CollabDataBunch</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L40" class="source_link">[source]</a></h2>

> <code>CollabDataBunch</code>(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`) :: [`DataBunch`](/basic_data.html#DataBunch)

This is the basic class to build a [`DataBunch`](/basic_data.html#DataBunch) suitable for collaborative filtering.

In [None]:
show_doc(CollabDataBunch.from_df, doc_string=False)

<h4 id="CollabDataBunch.from_df"><code>from_df</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L41" class="source_link">[source]</a></h4>

> <code>from_df</code>(`ratings`:`DataFrame`, `pct_val`:`float`=`0.2`, `user_name`:`Optional`\[`str`\]=`None`, `item_name`:`Optional`\[`str`\]=`None`, `rating_name`:`Optional`\[`str`\]=`None`, `test`:`DataFrame`=`None`, `seed`=`None`, `kwargs`)

Takes a `ratings` dataframe and splits it randomly into training and validation sets following `pct_val` (unless it's None). `user_name`, `item_name` and `rating_name` give the names of the corresponding columns (defaulting to the first, second and third columns). Optionally, a `test` dataframe can be passed, as well as a `seed` for the split between training and validation sets. The `kwargs` will be passed to [`DataBunch.create`](/basic_data.html#DataBunch.create).
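
To make the role of `pct_val` and `seed` concrete, here is a minimal sketch of a seeded random split written with the standard library only. `split_ratings` is a hypothetical helper for illustration and should not be read as what `from_df` does internally.

```python
import random

def split_ratings(rows, pct_val=0.2, seed=None):
    """Randomly hold out a fraction `pct_val` of the rows for validation."""
    rng = random.Random(seed)          # a fixed seed makes the split reproducible
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_val = int(len(rows) * pct_val)   # number of rows held out
    valid = [rows[i] for i in idx[:n_val]]
    train = [rows[i] for i in idx[n_val:]]
    return train, valid

# 100 made-up (user, item, rating) rows, purely for demonstration.
rows = [(u, m, 3.0) for u in range(10) for m in range(10)]
train, valid = split_ratings(rows, pct_val=0.2, seed=42)
print(len(train), len(valid))  # 80 20
```

Passing the same `seed` again yields the same split, which is useful when comparing models against an identical validation set.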

## Model and [`Learner`](/basic_train.html#Learner)

In [None]:
show_doc(EmbeddingDotBias, doc_string=False, title_level=3)

<h3 id="EmbeddingDotBias"><code>class</code> <code>EmbeddingDotBias</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L17" class="source_link">[source]</a></h3>

> <code>EmbeddingDotBias</code>(`n_factors`:`int`, `n_users`:`int`, `n_items`:`int`, `y_range`:`Point`=`None`) :: [`Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)

Creates a simple model with `Embedding` weights and biases for `n_users` and `n_items`, with `n_factors` latent factors. Takes the dot product of the embeddings and adds the bias, then, if `y_range` is specified, feeds the result to a sigmoid rescaled to go from `y_range[0]` to `y_range[1]`.
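
The computation described above can be written out for a single (user, item) pair. The sketch below uses plain Python lists in place of embedding lookups; `dot_bias_predict` and its arguments are hypothetical names chosen for illustration, not part of the library.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dot_bias_predict(user_vec, item_vec, user_bias, item_bias, y_range=None):
    """Dot product of the two factor vectors plus both biases, optionally squashed into y_range."""
    score = sum(u * i for u, i in zip(user_vec, item_vec)) + user_bias + item_bias
    if y_range is None:
        return score
    lo, hi = y_range
    return lo + (hi - lo) * sigmoid(score)   # sigmoid rescaled from (0, 1) to (lo, hi)

# Made-up factors and biases with n_factors=2; the raw score here is 0.
pred = dot_bias_predict([0.5, -1.0], [1.0, 0.5], 0.1, -0.1, y_range=(0.0, 5.0))
print(pred)  # 2.5
```

A raw score of 0 lands in the middle of the range, and very large positive or negative scores approach the upper and lower ends of `y_range`.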

In [None]:
show_doc(collab_learner, doc_string=False)

<h4 id="collab_learner"><code>collab_learner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L53" class="source_link">[source]</a></h4>

> <code>collab_learner</code>(`data`, `n_factors`:`int`=`None`, `use_nn`:`bool`=`False`, `metrics`=`None`, `y_range`:`Point`=`None`, `emb_szs`:`Dict`\[`str`, `int`\]=`None`, `kwargs`) → [`Learner`](/basic_train.html#Learner)

Creates a [`Learner`](/basic_train.html#Learner) object built from `data`, for example a [`CollabDataBunch`](/collab.html#CollabDataBunch) created with `from_df`. The model is given by [`EmbeddingDotBias`](/collab.html#EmbeddingDotBias) with `n_factors` and `y_range` (the numbers of users and items will be inferred from the data).

## Links with the Data Block API

In [None]:
show_doc(CollabLine, doc_string=False, title_level=3)

<h3 id="CollabLine"><code>class</code> <code>CollabLine</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L10" class="source_link">[source]</a></h3>

> <code>CollabLine</code>(`cats`, `conts`, `classes`, `names`) :: [`TabularLine`](/tabular.data.html#TabularLine)

Subclass of [`TabularLine`](/tabular.data.html#TabularLine) for collaborative filtering.

In [None]:
show_doc(CollabList, title_level=3, doc_string=False)

<h3 id="CollabList"><code>class</code> <code>CollabList</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L15" class="source_link">[source]</a></h3>

> <code>CollabList</code>(`items`:`Iterator`, `cat_names`:`OptStrList`=`None`, `cont_names`:`OptStrList`=`None`, `procs`=`None`, `kwargs`) → `TabularList` :: [`TabularList`](/tabular.data.html#TabularList)

Subclass of [`TabularList`](/tabular.data.html#TabularList) for collaborative filtering.

## Undocumented Methods - Methods moved below this line will intentionally be hidden

In [None]:
show_doc(EmbeddingDotBias.forward)

<h4 id="EmbeddingDotBias.forward"><code>forward</code><a href="https://github.com/fastai/fastai/blob/master/fastai/collab.py#L26" class="source_link">[source]</a></h4>

> <code>forward</code>(`users`:`LongTensor`, `items`:`LongTensor`) → `Tensor`

Defines the computation performed at every call. Should be overridden by all subclasses.

Note: although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
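
The difference between calling the instance and calling `forward` directly can be illustrated with a toy class. `TinyModule` below is a hypothetical stand-in written for this example; it only mimics the general idea and is not how PyTorch implements `Module`.

```python
class TinyModule:
    """Toy module: calling the instance runs hooks, calling forward() does not."""
    def __init__(self):
        self.forward_hooks = []

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        for hook in self.forward_hooks:   # hooks only run on this path
            hook(self, x, out)
        return out

seen = []
m = TinyModule()
m.forward_hooks.append(lambda mod, inp, out: seen.append(out))
m(3)           # goes through __call__, so the hook records the output
m.forward(4)   # bypasses __call__, so the hook is skipped
print(seen)    # [6]
```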
