{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Splitting biased test data for a recommender system\n", "** *\n", "*Note: if you are viewing this notebook directly on GitHub, some mathematical symbols might display incorrectly or not at all. The same notebook can be rendered in nbviewer by following [this link.](http://nbviewer.jupyter.org/github/david-cortes/datascienceprojects/blob/master/optimization/dataset_splitting.ipynb)*\n", "\n", "This project splits a dataset of user feedback on products so that recommendation algorithms can later be developed and evaluated on it. The final task (not developed in this example) is to recommend an option within a product to each user. The dataset contains feedback from different users about different products and looks like this (the option with the highest score within a product is the one recommended to the user):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|   | UserId | ProductId | ProductCategory | ProductOption | UserFeedback | Score |
|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 4 | 1 | 0 | 0.54 |
| 1 | 1 | 5 | 4 | 2 | 1 | 0.78 |
| 2 | 1 | 5 | 4 | 3 | 0 | 0.21 |