{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"2022-01-13-explainable-bpr-lastfm.ipynb","provenance":[{"file_id":"https://github.com/recohut/nbs/blob/main/raw/P394476%20%7C%20Training%20Explainable%20BPR%20model%20on%20LastFM%20dataset.ipynb","timestamp":1644611811107}],"collapsed_sections":[],"authorship_tag":"ABX9TyPFXy/0hGgK9V5Zk/ft9ojK"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","source":["# Training Explainable BPR model on LastFM dataset"],"metadata":{"id":"_tLSbR0PWx9j"}},{"cell_type":"markdown","source":["## Executive summary\n","\n","| | |\n","| --- | --- |\n","| Problem | BPR is black-box model and vulnerable to Missing-not-at-random (MNaR) based exposure bias. |\n","| Solution | An explainable loss function and a corresponding Matrix Factorization-based model called Explainable Bayesian Personalized Ranking (EBPR) that generates recommendations along with item-based explanations. |\n","| Dataset | ML-100k, ML-1m, LastFM-200k, Yahoo R3. |\n","| Preprocessing | Interactions were converted into binary interactions, regardless of their values. Then we filtered out users with less than 10 interactions to ensure enough training and evaluation samples for every user and reduce the data sparsity. We follow the standard Leave-One-Out (LOO) procedure that consists of considering the latest interaction of each user as a test item and comparing it to 100 randomly sampled negative items. In the training, we sample, at every epoch, one negative item for every positive user-item interaction. |\n","| Metrics | NDCG@K, HR@K, Mean Explainability Precision (MEP@K), WMEP@K, Avg_Pop@K, EFD@K, and Div@K. |\n","| Hyperparams | NUM_CONFIGURATIONS, NUM_REPS, NUM_EPOCH, WEIGHT_DECAY, NEIGHBORHOOD, TOP_K, LR, OPTIMIZER, SGD_MOMENTUM, RMSPROP_ALPHA, RMSPROP_MOMENTUM, LOO_EVAL, TEST_RATE, USE_CUDA, DEVICE_ID, SAVE_MODELS, SAVE_RESULTS, INT_PER_ITEM. |\n","| Models | BPR, UBPR, EBPR, pUEBPR, UEBPR. |\n","| Cluster | Python 3.7+, PyTorch |\n","| Tags | `Fairness`, `Explainability`, `ExposureBias`, `ExplainableBPR` |\n","| Credits | Khalil Damak |"],"metadata":{"id":"LLaRmDxUWx50"}},{"cell_type":"markdown","source":["## Process flow\n","\n","![](https://github.com/RecoHut-Stanzas/S241566/raw/main/images/process_flow.svg)"],"metadata":{"id":"CWxQF9gwMGmS"}},{"cell_type":"markdown","source":["## Setup"],"metadata":{"id":"B_IXvLmOMGiw"}},{"cell_type":"code","source":["!pip install -q ml_metrics\n","!pip install -q pyprind"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"cPAreB--NPMR","executionInfo":{"status":"ok","timestamp":1639221891836,"user_tz":-330,"elapsed":8796,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"ffca50db-d08d-4fee-9164-de04b5a62959"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" Building wheel for ml-metrics (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n"]}]},{"cell_type":"code","source":["import torch\n","import random\n","import numpy as np\n","import pandas as pd\n","from copy import deepcopy\n","from torch.utils.data import DataLoader, Dataset\n","from sklearn.model_selection import train_test_split\n","from sklearn.metrics.pairwise import pairwise_distances\n","from sklearn.metrics.pairwise import cosine_similarity\n","\n","import math\n","from ml_metrics import mapk\n","from itertools import combinations\n","import sys\n","import pyprind\n","import argparse"],"metadata":{"id":"530fE0WPMCMX"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["random.seed(1)"],"metadata":{"id":"X45wVmpOMJKG"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["!mkdir -p Output/checkpoints"],"metadata":{"id":"YhP7jqejMJHv"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Dataset"],"metadata":{"id":"dh1WUwE8MJEb"}},{"cell_type":"markdown","source":["Download LastFM raw dataset"],"metadata":{"id":"uLoOJoePMMYS"}},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"oNxbyTO4_iC5","executionInfo":{"status":"ok","timestamp":1639221898664,"user_tz":-330,"elapsed":732,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"8df28659-fb21-474e-f3ad-6c6872c55e26"},"outputs":[{"output_type":"stream","name":"stdout","text":["Cloning into 'data'...\n","remote: Enumerating objects: 16, done.\u001b[K\n","remote: Counting objects: 100% (16/16), done.\u001b[K\n","remote: Compressing objects: 100% (13/13), done.\u001b[K\n","remote: Total 16 (delta 2), reused 12 (delta 2), pack-reused 0\u001b[K\n","Unpacking objects: 100% (16/16), done.\n"]}],"source":["!mkdir -p Data/lastfm-2k\n","!git clone --branch v1 https://github.com/RecoHut-Datasets/lastfm.git Data/lastfm-2k"]},{"cell_type":"code","source":["def read_data(dataset_name, int_per_item):\n"," \"\"\"Read dataset\"\"\"\n"," dataset = pd.DataFrame()\n"," if dataset_name == 'ml-100k':\n"," # Load Movielens 100K Data\n"," data_dir = 'Data/ml-100k/u.data'\n"," dataset = pd.read_csv(data_dir, sep='\\t', header=None, names=['uid', 'mid', 'rating', 'timestamp'],\n"," engine='python')\n"," elif dataset_name == 'ml-1m':\n"," # Load Movielens 1M Data\n"," data_dir = 'Data/ml-1m/ratings.dat'\n"," dataset = pd.read_csv(data_dir, sep='::', header=None, names=['uid', 'mid', 'rating', 'timestamp'], engine='python')\n","\n"," elif dataset_name == 'lastfm-2k':\n"," # Load Last.FM 2K Data\n"," data_dir = 'Data/lastfm-2k/user_artists.dat'\n"," dataset = pd.read_csv(data_dir, sep='\\t', header=0, names=['uid', 'mid', 'rating'], engine='python')\n"," dataset['timestamp'] = [1 for i in range(len(dataset))]\n"," # Keep items with at least int_per_item interactions\n"," item_count = dataset[['uid', 'mid']].groupby('mid').count()['uid'].rename('count').reset_index()\n"," dataset = dataset.merge(item_count, how='left', on='mid')\n"," dataset = dataset.loc[dataset['count'] >= int_per_item][['uid', 'mid', 'rating', 'timestamp']]\n"," # Keep users with at least 10 interactions\n"," user_count = dataset[['uid', 'mid']].groupby('uid').count()['mid'].rename('count').reset_index()\n"," dataset = dataset.merge(user_count, how='left', on='uid')\n"," dataset = dataset.loc[dataset['count'] >= 10][['uid', 'mid', 'rating', 'timestamp']]\n","\n"," elif dataset_name == 'yahoo-r3':\n"," # Load Yahoo! 
R3 Data\n"," data_dir = 'Data/yahoo-r3/ydata-ymusic-rating-study-v1_0-train.txt'\n"," dataset = pd.read_csv(data_dir, sep='\\t', header=None, names=['uid', 'mid', 'rating'], engine='python')\n"," dataset['timestamp'] = [1 for i in range(len(dataset))]\n","\n"," elif dataset_name == 'yahoo-r3-unbiased':\n"," # Load Yahoo! R3 Data\n"," data_dir = 'Data/yahoo-r3/ydata-ymusic-rating-study-v1_0-train.txt'\n"," test_data_dir = 'Data/yahoo-r3/ydata-ymusic-rating-study-v1_0-test.txt'\n"," dataset = pd.read_csv(data_dir, sep='\\t', header=None, names=['uid', 'mid', 'rating'], engine='python')\n"," dataset['test'] = [0 for i in range(len(dataset))]\n"," test_dataset = pd.read_csv(test_data_dir, sep='\\t', header=None, names=['uid', 'mid', 'rating'], engine='python')\n"," test_dataset['test'] = [1 for i in range(len(test_dataset))]\n"," dataset = pd.concat([dataset, test_dataset])\n"," dataset['timestamp'] = [1 for i in range(len(dataset))]\n"," # Reindex data\n"," user_id = dataset[['uid']].drop_duplicates().reindex()\n"," user_id['userId'] = np.arange(len(user_id))\n"," dataset = pd.merge(dataset, user_id, on=['uid'], how='left')\n"," item_id = dataset[['mid']].drop_duplicates()\n"," item_id['itemId'] = np.arange(len(item_id))\n"," dataset = pd.merge(dataset, item_id, on=['mid'], how='left')\n"," if 'test' in dataset:\n"," dataset = dataset[['userId', 'itemId', 'rating', 'timestamp', 'test']]\n"," else:\n"," dataset = dataset[['userId', 'itemId', 'rating', 'timestamp']]\n","\n"," return dataset"],"metadata":{"id":"QLkKHjwOMKKk"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class data_loader(Dataset):\n"," \"\"\"Convert user, item, negative and target Tensors into Pytorch Dataset\"\"\"\n","\n"," def __init__(self, user_tensor, positive_item_tensor, negative_item_tensor, target_tensor):\n"," self.user_tensor = user_tensor\n"," self.positive_item_tensor = positive_item_tensor\n"," self.negative_item_tensor = negative_item_tensor\n"," self.target_tensor = target_tensor\n","\n"," def __getitem__(self, index):\n"," return self.user_tensor[index], self.positive_item_tensor[index], self.negative_item_tensor[index], self.target_tensor[index]\n","\n"," def __len__(self):\n"," return self.user_tensor.size(0)"],"metadata":{"id":"zI-Bd881MKHr"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class data_loader_implicit(Dataset):\n"," \"\"\"Convert user and item Tensors into Pytorch Dataset\"\"\"\n","\n"," def __init__(self, user_tensor, item_tensor):\n"," self.user_tensor = user_tensor\n"," self.item_tensor = item_tensor\n","\n"," def __getitem__(self, index):\n"," return self.user_tensor[index], self.item_tensor[index]\n","\n"," def __len__(self):\n"," return self.user_tensor.size(0)"],"metadata":{"id":"m_uKfxmHMKEV"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class data_loader_test_explicit(Dataset):\n"," \"\"\"Convert user, item and target Tensors into Pytorch Dataset\"\"\"\n","\n"," def __init__(self, user_tensor, item_tensor, target_tensor):\n"," self.user_tensor = user_tensor\n"," self.item_tensor = item_tensor\n"," self.target_tensor = target_tensor\n","\n"," def __getitem__(self, index):\n"," return self.user_tensor[index], self.item_tensor[index], self.target_tensor[index]\n","\n"," def __len__(self):\n"," return self.user_tensor.size(0)"],"metadata":{"id":"PxzQ_HE7Mrkk"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class data_loader_negatives(Dataset):\n"," \"\"\"Convert user and item negative Tensors into 
Pytorch Dataset\"\"\"\n","\n"," def __init__(self, user_neg_tensor, item_neg_tensor):\n"," self.user_neg_tensor = user_neg_tensor\n"," self.item_neg_tensor = item_neg_tensor\n","\n"," def __getitem__(self, index):\n"," return self.user_neg_tensor[index], self.item_neg_tensor[index]\n","\n"," def __len__(self):\n"," return self.user_neg_tensor.size(0)"],"metadata":{"id":"A6GzzXxvMtbC"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class SampleGenerator(object):\n"," \"\"\"Construct dataset\"\"\"\n","\n"," def __init__(self, ratings, config, split_val):\n"," \"\"\"\n"," args:\n"," ratings: pd.DataFrame containing 4 columns = ['userId', 'itemId', 'rating', 'timestamp']\n"," config: dictionary containing the configuration hyperparameters\n"," split_val: boolean that takes True if we are using a validation set\n"," \"\"\"\n"," assert 'userId' in ratings.columns\n"," assert 'itemId' in ratings.columns\n"," assert 'rating' in ratings.columns\n","\n"," self.config = config\n"," self.ratings = ratings\n"," self.split_val = split_val\n"," self.preprocess_ratings = self._binarize(ratings)\n"," self.user_pool = set(self.ratings['userId'].unique())\n"," self.item_pool = set(self.ratings['itemId'].unique())\n"," # create negative item samples\n"," self.negatives = self._sample_negative(ratings, self.split_val)\n"," if self.config['loo_eval']:\n"," if self.split_val:\n"," self.train_ratings, self.val_ratings = self._split_loo(self.preprocess_ratings, split_val=True)\n"," else:\n"," self.train_ratings, self.test_ratings = self._split_loo(self.preprocess_ratings, split_val=False)\n"," else:\n"," self.test_rate = self.config['test_rate']\n"," if self.split_val:\n"," self.train_ratings, self.val_ratings = self.train_test_split_random(self.ratings, split_val=True)\n"," else:\n"," self.train_ratings, self.test_ratings = self.train_test_split_random(self.ratings, split_val=False)\n","\n"," def _binarize(self, ratings):\n"," \"\"\"binarize into 0 or 1 for implicit feedback\"\"\"\n"," ratings = deepcopy(ratings)\n"," ratings['rating'] = 1.0\n"," return ratings\n","\n"," def train_test_split_random(self, ratings, split_val):\n"," \"\"\"Random train/test split\"\"\"\n"," if 'test' in list(ratings):\n"," test = ratings[ratings['test'] == 1]\n"," train = ratings[ratings['test'] == 0]\n"," else:\n"," train, test = train_test_split(ratings, test_size=self.test_rate)\n"," if split_val:\n"," train, val = train_test_split(train, test_size=self.test_rate / (1 - self.test_rate))\n"," return train[['userId', 'itemId', 'rating']], val[['userId', 'itemId', 'rating']]\n"," else:\n"," return train[['userId', 'itemId', 'rating']], test[['userId', 'itemId', 'rating']]\n","\n"," def _split_loo(self, ratings, split_val):\n"," \"\"\"leave-one-out train/test split\"\"\"\n"," if 'test' in list(ratings):\n"," test = ratings[ratings['test'] == 1]\n"," ratings = ratings[ratings['test'] == 0]\n"," if split_val:\n"," ratings['rank_latest'] = ratings.groupby(['userId'])['timestamp'].rank(method='first', ascending=False)\n"," val = ratings[ratings['rank_latest'] == 1]\n"," train = ratings[ratings['rank_latest'] > 1]\n"," return train[['userId', 'itemId', 'rating']], val[['userId', 'itemId', 'rating']]\n"," return ratings[['userId', 'itemId', 'rating']], test[['userId', 'itemId', 'rating']]\n"," ratings['rank_latest'] = ratings.groupby(['userId'])['timestamp'].rank(method='first', ascending=False)\n"," test = ratings[ratings['rank_latest'] == 1]\n"," if split_val:\n"," val = ratings[ratings['rank_latest'] == 2]\n"," train 
= ratings[ratings['rank_latest'] > 2]\n"," assert train['userId'].nunique() == test['userId'].nunique() == val['userId'].nunique()\n"," return train[['userId', 'itemId', 'rating']], val[['userId', 'itemId', 'rating']]\n"," train = ratings[ratings['rank_latest'] > 1]\n"," assert train['userId'].nunique() == test['userId'].nunique()\n"," return train[['userId', 'itemId', 'rating']], test[['userId', 'itemId', 'rating']]\n","\n"," def _sample_negative(self, ratings, split_val):\n"," \"\"\"return all negative items & 100 sampled negative test items & 100 sampled negative val items\"\"\"\n"," interact_status = ratings.groupby('userId')['itemId'].apply(set).reset_index().rename(\n"," columns={'itemId': 'interacted_items'})\n"," interact_status['negative_items'] = interact_status['interacted_items'].apply(lambda x: self.item_pool - x)\n"," interact_status['test_negative_samples'] = interact_status['negative_items'].apply(lambda x: random.sample(x, 100))\n"," interact_status['negative_items'] = interact_status.apply(lambda x: (x.negative_items - set(x.test_negative_samples)), axis=1)\n"," if split_val:\n"," interact_status['val_negative_samples'] = interact_status['negative_items'].apply(lambda x: random.sample(x, 100))\n"," interact_status['negative_items'] = interact_status.apply(lambda x: (x.negative_items - set(x.val_negative_samples)), axis=1)\n"," return interact_status[['userId', 'negative_items', 'test_negative_samples', 'val_negative_samples']]\n"," else:\n"," return interact_status[['userId', 'negative_items', 'test_negative_samples']]\n","\n"," def train_data_loader(self, batch_size):\n"," \"\"\"instance train loader for one training epoch\"\"\"\n"," train_ratings = pd.merge(self.train_ratings, self.negatives[['userId', 'negative_items']], on='userId')\n"," users = [int(x) for x in train_ratings['userId']]\n"," items = [int(x) for x in train_ratings['itemId']]\n"," ratings = [float(x) for x in train_ratings['rating']]\n"," neg_items = [random.choice(list(neg_list)) for neg_list in train_ratings['negative_items']]\n"," dataset = data_loader(user_tensor=torch.LongTensor(users),\n"," positive_item_tensor=torch.LongTensor(items),\n"," negative_item_tensor=torch.LongTensor(neg_items),\n"," target_tensor=torch.FloatTensor(ratings))\n"," return DataLoader(dataset, batch_size=batch_size, shuffle=True)\n","\n"," def test_data_loader(self, batch_size):\n"," \"\"\"create evaluation data\"\"\"\n"," if self.config['loo_eval']:\n"," test_ratings = pd.merge(self.test_ratings, self.negatives[['userId', 'test_negative_samples']], on='userId')\n"," test_users, test_items, negative_users, negative_items = [], [], [], []\n"," for row in test_ratings.itertuples():\n"," test_users.append(int(row.userId))\n"," test_items.append(int(row.itemId))\n"," for i in range(len(row.test_negative_samples)):\n"," negative_users.append(int(row.userId))\n"," negative_items.append(int(row.test_negative_samples[i]))\n"," dataset = data_loader_implicit(user_tensor=torch.LongTensor(test_users),\n"," item_tensor=torch.LongTensor(test_items))\n"," dataset_negatives = data_loader_negatives(user_neg_tensor=torch.LongTensor(negative_users),\n"," item_neg_tensor=torch.LongTensor(negative_items))\n"," return [DataLoader(dataset, batch_size=batch_size, shuffle=False), DataLoader(dataset_negatives, batch_size=batch_size, shuffle=False)]\n"," else:\n"," test_ratings = self.test_ratings\n"," test_users = [int(x) for x in test_ratings['userId']]\n"," test_items = [int(x) for x in test_ratings['itemId']]\n"," test_ratings = [float(x) for x 
in test_ratings['rating']]\n"," dataset = data_loader_test_explicit(user_tensor=torch.LongTensor(test_users),\n"," item_tensor=torch.LongTensor(test_items),\n"," target_tensor=torch.FloatTensor(test_ratings))\n"," return DataLoader(dataset, batch_size=batch_size, shuffle=False)\n","\n"," def val_data_loader(self, batch_size):\n"," \"\"\"create validation data\"\"\"\n"," if self.config['loo_eval']:\n"," val_ratings = pd.merge(self.val_ratings, self.negatives[['userId', 'val_negative_samples']], on='userId')\n"," val_users, val_items, negative_users, negative_items = [], [], [], []\n"," for row in val_ratings.itertuples():\n"," val_users.append(int(row.userId))\n"," val_items.append(int(row.itemId))\n"," for i in range(len(row.val_negative_samples)):\n"," negative_users.append(int(row.userId))\n"," negative_items.append(int(row.val_negative_samples[i]))\n"," dataset = data_loader_implicit(user_tensor=torch.LongTensor(val_users),\n"," item_tensor=torch.LongTensor(val_items))\n"," dataset_negatives = data_loader_negatives(user_neg_tensor=torch.LongTensor(negative_users),\n"," item_neg_tensor=torch.LongTensor(negative_items))\n"," return [DataLoader(dataset, batch_size=batch_size, shuffle=False), DataLoader(dataset_negatives, batch_size=batch_size, shuffle=False)]\n"," else:\n"," val_ratings = self.val_ratings\n"," val_users = [int(x) for x in val_ratings['userId']]\n"," val_items = [int(x) for x in val_ratings['itemId']]\n"," val_ratings = [float(x) for x in val_ratings['rating']]\n"," dataset = data_loader_test_explicit(user_tensor=torch.LongTensor(val_users),\n"," item_tensor=torch.LongTensor(val_items),\n"," target_tensor=torch.FloatTensor(val_ratings))\n"," return DataLoader(dataset, batch_size=batch_size, shuffle=False)\n","\n"," def create_explainability_matrix(self, include_test=False):\n"," \"\"\"create explainability matrix\"\"\"\n"," if not include_test:\n"," print('Creating explainability matrix...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId, self.train_ratings.itemId)\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," elif not self.split_val:\n"," print('Creating test explainability matrix...')\n"," interaction_matrix = np.array(pd.crosstab(self.preprocess_ratings.userId, self.preprocess_ratings.itemId)[\n"," list(range(self.config['num_items']))].sort_index())\n"," else:\n"," print('Creating val explainability matrix...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId.append(self.val_ratings.userId), self.train_ratings.itemId.append(self.val_ratings.itemId))\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = 
np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," #item_similarity_matrix = 1 - pairwise_distances(interaction_matrix.T, metric = \"hamming\")\n"," item_similarity_matrix = cosine_similarity(interaction_matrix.T)\n"," np.fill_diagonal(item_similarity_matrix, 0)\n"," neighborhood = [np.argpartition(row, - self.config['neighborhood'])[- self.config['neighborhood']:]\n"," for row in item_similarity_matrix]\n"," explainability_matrix = np.array([[sum([interaction_matrix[user, neighbor] for neighbor in neighborhood[item]])\n"," for item in range(self.config['num_items'])] for user in\n"," range(self.config['num_users'])]) / self.config['neighborhood']\n"," #explainability_matrix[explainability_matrix < 0.1] = 0\n"," #explainability_matrix = explainability_matrix + self.config['epsilon']\n"," return explainability_matrix\n","\n"," def create_popularity_vector(self, include_test=False):\n"," \"\"\"create popularity vector\"\"\"\n"," if not include_test:\n"," print('Creating popularity vector...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId, self.train_ratings.itemId)\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," elif not self.split_val:\n"," print('Creating test popularity vector...')\n"," interaction_matrix = np.array(pd.crosstab(self.preprocess_ratings.userId, self.preprocess_ratings.itemId)[\n"," list(range(self.config['num_items']))].sort_index())\n"," else:\n"," print('Creating val popularity vector...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId.append(self.val_ratings.userId),\n"," self.train_ratings.itemId.append(self.val_ratings.itemId))\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," popularity_vector = np.sum(interaction_matrix, axis=0)\n"," popularity_vector = (popularity_vector / max(popularity_vector)) ** 0.5\n"," return popularity_vector\n","\n"," def create_neighborhood(self, include_test=False):\n"," \"\"\"Determine item neighbors\"\"\"\n"," if not include_test:\n"," print('Determining item neighborhoods...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId, self.train_ratings.itemId)\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = 
np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," elif not self.split_val:\n"," print('Determining test item neighborhoods...')\n"," interaction_matrix = np.array(pd.crosstab(self.preprocess_ratings.userId, self.preprocess_ratings.itemId)[\n"," list(range(self.config['num_items']))].sort_index())\n"," else:\n"," print('Determining val item neighborhoods...')\n"," interaction_matrix = pd.crosstab(self.train_ratings.userId.append(self.val_ratings.userId),\n"," self.train_ratings.itemId.append(self.val_ratings.itemId))\n"," missing_columns = list(set(range(self.config['num_items'])) - set(list(interaction_matrix)))\n"," missing_rows = list(set(range(self.config['num_users'])) - set(interaction_matrix.index))\n"," for missing_column in missing_columns:\n"," interaction_matrix[missing_column] = [0] * len(interaction_matrix)\n"," for missing_row in missing_rows:\n"," interaction_matrix.loc[missing_row] = [0] * self.config['num_items']\n"," interaction_matrix = np.array(interaction_matrix[list(range(self.config['num_items']))].sort_index())\n"," item_similarity_matrix = cosine_similarity(interaction_matrix.T)\n"," np.fill_diagonal(item_similarity_matrix, 0)\n"," neighborhood = np.array([np.argpartition(row, - self.config['neighborhood'])[- self.config['neighborhood']:]\n"," for row in item_similarity_matrix])\n"," return neighborhood, item_similarity_matrix"],"metadata":{"id":"-9838B6oMvOu"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Utils"],"metadata":{"id":"SOVjqzQzM0dJ"}},{"cell_type":"code","source":["# Checkpoints\n","def save_checkpoint(model, model_dir):\n"," torch.save(model.state_dict(), model_dir)\n","\n","\n","def resume_checkpoint(model, model_dir, device_id):\n"," state_dict = torch.load(model_dir,\n"," map_location=lambda storage, loc: storage.cuda(device=device_id)) # ensure all storage are on gpu\n"," model.load_state_dict(state_dict)\n","\n","\n","# Hyper params\n","def use_cuda(enabled, device_id=0):\n"," if enabled:\n"," assert torch.cuda.is_available(), 'CUDA is not available'\n"," torch.cuda.set_device(device_id)\n","\n","\n","def use_optimizer(network, params):\n"," if params['optimizer'] == 'sgd':\n"," optimizer = torch.optim.SGD(network.parameters(),\n"," lr=params['lr'],\n"," momentum=params['sgd_momentum'],\n"," weight_decay=params['weight_decay'])\n"," elif params['optimizer'] == 'adam':\n"," optimizer = torch.optim.Adam(network.parameters(),\n"," lr=params['lr'],\n"," weight_decay=params['weight_decay'])\n"," elif params['optimizer'] == 'rmsprop':\n"," optimizer = torch.optim.RMSprop(network.parameters(),\n"," lr=params['lr'],\n"," alpha=params['rmsprop_alpha'],\n"," momentum=params['rmsprop_momentum'])\n"," return optimizer"],"metadata":{"id":"ZcBBxGO_NBJL"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Metrics"],"metadata":{"id":"zkaQyeVjM-Pn"}},{"cell_type":"code","source":["class MetronAtK(object):\n"," def __init__(self, top_k, loo_eval):\n"," self._top_k = top_k\n"," self.loo_eval = loo_eval\n"," self._subjects = None # Subjects which we ran evaluation on\n","\n"," @property\n"," def top_k(self):\n"," return self._top_k\n","\n"," @top_k.setter\n"," def top_k(self, top_k):\n"," self._top_k = top_k\n","\n"," @property\n"," def subjects(self):\n"," return self._subjects\n","\n"," @subjects.setter\n"," def subjects(self, subjects):\n"," assert isinstance(subjects, list)\n"," if self.loo_eval == True:\n"," test_users, test_items, test_scores = subjects[0], subjects[1], 
subjects[2]\n"," neg_users, neg_items, neg_scores = subjects[3], subjects[4], subjects[5]\n"," # the golden set\n"," test = pd.DataFrame({'user': test_users,\n"," 'test_item': test_items,\n"," 'test_score': test_scores})\n"," # the full set\n"," full = pd.DataFrame({'user': neg_users + test_users,\n"," 'item': neg_items + test_items,\n"," 'score': neg_scores + test_scores})\n"," full = pd.merge(full, test, on=['user'], how='left')\n"," # rank the items according to the scores for each user\n"," full['rank'] = full.groupby('user')['score'].rank(method='first', ascending=False)\n"," full.sort_values(['user', 'rank'], inplace=True)\n"," self._subjects = full\n"," else:\n"," test_users, test_items, test_true, test_output = subjects[0], subjects[1], subjects[2], subjects[3]\n"," # the golden set\n"," full = pd.DataFrame({'user': test_users,\n"," 'test_item': test_items,\n"," 'test_true': test_true,\n"," 'test_output': test_output})\n","\n"," # rank the items according to the scores for each user\n"," full['rank'] = full.groupby('user')['test_output'].rank(method='first', ascending=False)\n"," full['rank_true'] = full.groupby('user')['test_true'].rank(method='first', ascending=False)\n"," full.sort_values(['user', 'rank'], inplace=True)\n"," self._subjects = full\n","\n"," def cal_ndcg(self):\n"," \"\"\"NDCG@K for explicit evaluation\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," topp_k = full[full['rank_true'] <= top_k].copy()\n"," topp_k['idcg_unit'] = topp_k['rank_true'].apply(\n"," lambda x: math.log(2) / math.log(1 + x)) # the rank starts from 1\n"," topp_k['idcg'] = topp_k.groupby(['user'])['idcg_unit'].transform('sum')\n","\n"," test_in_top_k = topp_k[topp_k['rank'] <= top_k].copy()\n"," test_in_top_k['dcg_unit'] = test_in_top_k['rank'].apply(\n"," lambda x: math.log(2) / math.log(1 + x)) # the rank starts from 1\n"," test_in_top_k['dcg'] = test_in_top_k.groupby(['user'])['dcg_unit'].transform('sum')\n"," test_in_top_k['ndcg'] = test_in_top_k['dcg'] / topp_k['idcg']\n"," ndcg = np.sum(test_in_top_k.groupby(['user'])['ndcg'].max()) / len(full['user'].unique())\n"," del (topp_k, test_in_top_k)\n"," return ndcg\n","\n"," def cal_map_at_k(self):\n"," \"\"\"MAP@K for explicit evaluation\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," users = list(dict.fromkeys(list(full['user'])))\n"," actual = [list(full[(full['user'] == user) & (full['rank_true'] <= top_k)]['test_item']) for user in users]\n"," predicted = [list(full[(full['user'] == user) & (full['rank'] <= top_k)]['test_item']) for user in users]\n"," return mapk(actual, predicted, k=top_k)\n","\n"," def cal_hit_ratio_loo(self):\n"," \"\"\"HR@K for Leave-One-Out evaluation\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," top_k = full[full['rank'] <= top_k]\n"," test_in_top_k = top_k[top_k['test_item'] == top_k['item']] # golden items hit in the top_K items\n"," return len(test_in_top_k) * 1.0 / full['user'].nunique()\n","\n"," def cal_ndcg_loo(self):\n"," \"\"\"NDCG@K for Leave-One-Out evaluation\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," top_k = full[full['rank'] <= top_k]\n"," test_in_top_k = top_k[top_k['test_item'] == top_k['item']]\n"," test_in_top_k['ndcg'] = test_in_top_k['rank'].apply(\n"," lambda x: math.log(2) / math.log(1 + x)) # the rank starts from 1\n"," return test_in_top_k['ndcg'].sum() * 1.0 / full['user'].nunique()\n","\n"," def cal_mep(self, explainability_matrix, theta):\n"," \"\"\"Mean Explainability Precision at cutoff top_k and threshold theta\"\"\"\n"," full, top_k = 
self._subjects, self._top_k\n"," if self.loo_eval == True:\n"," full['exp_score'] = full[['user', 'item']].apply(lambda x: explainability_matrix[x[0], x[1]].item(), axis=1)\n"," else:\n"," full['exp_score'] = full[['user', 'test_item']].apply(lambda x: explainability_matrix[x[0], x[1]].item(), axis=1)\n"," full['exp_and_rec'] = ((full['exp_score'] > theta) & (full['rank'] <= top_k)) * 1\n"," full['topN'] = (full['rank'] <= top_k) * 1\n"," return np.mean(full.groupby('user')['exp_and_rec'].sum() / full.groupby('user')['topN'].sum())\n","\n"," def cal_weighted_mep(self, explainability_matrix, theta):\n"," \"\"\"Weighted Mean Explainability Precision at cutoff top_k and threshold theta\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," if self.loo_eval == True:\n"," full['exp_score'] = full[['user', 'item']].apply(lambda x: explainability_matrix[x[0], x[1]].item(), axis=1)\n"," else:\n"," full['exp_score'] = full[['user', 'test_item']].apply(lambda x: explainability_matrix[x[0], x[1]].item(), axis=1)\n"," full['exp_and_rec'] = ((full['exp_score'] > theta) & (full['rank'] <= top_k)) * 1 * (full['exp_score'])\n"," full['topN'] = (full['rank'] <= top_k) * 1\n"," return np.mean(full.groupby('user')['exp_and_rec'].sum() / full.groupby('user')['topN'].sum())\n","\n"," def avg_popularity(self, popularity_vector):\n"," \"\"\"Average popularity of top_k recommended items\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," if self.loo_eval == True:\n"," recommended_items = list(full.loc[full['rank'] <= top_k]['item'])\n"," else:\n"," recommended_items = list(full.loc[full['rank'] <= top_k]['test_item'])\n"," return np.mean([popularity_vector[i] for i in recommended_items])\n","\n"," def efd(self, popularity_vector):\n"," \"\"\"Expected Free Discovery (EFD) in top_k recommended items\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," if self.loo_eval == True:\n"," recommended_items = list(full.loc[full['rank'] <= top_k]['item'])\n"," else:\n"," recommended_items = list(full.loc[full['rank'] <= top_k]['test_item'])\n"," return np.mean([- np.log2(popularity_vector[i] + sys.float_info.epsilon) for i in recommended_items])\n","\n"," def avg_pairwise_similarity(self, item_similarity_matrix):\n"," \"\"\"Average Pairwise Similarity of top_k recommended items\"\"\"\n"," full, top_k = self._subjects, self._top_k\n"," full = full.loc[full['rank'] <= top_k]\n"," users = list(dict.fromkeys(list(full['user'])))\n"," if self.loo_eval == True:\n"," rec_items_for_users = [list(full.loc[full['user'] == u]['item']) for u in users]\n"," else:\n"," rec_items_for_users = [list(full.loc[full['user'] == u]['test_item']) for u in users]\n"," rec_items_for_users = [x for x in rec_items_for_users if len(x) > 1]\n"," item_combinations = [set(combinations(rec_items_for_user, 2)) for rec_items_for_user in rec_items_for_users]\n"," return np.mean([np.mean([item_similarity_matrix[i, j] for (i, j) in item_combinations[k]]) for k in range(len(item_combinations))])"],"metadata":{"id":"C6Igzex8NKvs"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Model"],"metadata":{"id":"LJat1W4uM9mN"}},
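{"cell_type":"markdown","source":["For reference, these are the objectives implemented in `train_single_batch_EBPR` below, written out under this notebook's own definitions (a sketch: $\\hat{x}_{ui}$ is the predicted score of item $i$ for user $u$, $Y$ the binarized interaction matrix, $N_{\\eta}(i)$ the $\\eta$ most cosine-similar items to $i$, $E$ the output of `create_explainability_matrix`, and $p$ the output of `create_popularity_vector`):\n","\n","- Explainability score: $E_{ui} = \\frac{1}{\\eta} \\sum_{j \\in N_{\\eta}(i)} Y_{uj}$, the share of item $i$'s neighbors that user $u$ has interacted with.\n","- Popularity / propensity estimate: $p_i = (c_i / \\max_k c_k)^{1/2}$, where $c_i$ counts the interactions with item $i$.\n","- BPR: $L = -\\sum_{(u,i,j)} \\log \\sigma(\\hat{x}_{ui} - \\hat{x}_{uj})$ over sampled triplets of a user $u$, a positive item $i$ and a negative item $j$.\n","- UBPR: the same term reweighted by the inverse propensity $1/p_i$ of the positive item.\n","- EBPR: the same term weighted by $E_{ui} (1 - E_{uj})$, favoring an explainable positive item ranked above a non-explainable negative item.\n","- pUEBPR and UEBPR: combinations of the two weighting schemes, with UEBPR additionally normalizing each explainability score by the total popularity of the item's neighborhood."],"metadata":{}},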
{"cell_type":"code","source":["class Engine(object):\n"," \"\"\"Meta Engine for training & evaluating BPR\"\"\"\n","\n"," def __init__(self, config):\n"," self.config = config\n"," self._metron = MetronAtK(top_k=config['top_k'], loo_eval=self.config['loo_eval'])\n"," self.opt = use_optimizer(self.model, config)\n","\n"," def train_single_batch_EBPR(self, users, pos_items, neg_items, ratings, explainability_matrix, popularity_vector, neighborhood):\n"," assert hasattr(self, 'model'), 'Please specify the exact model!'\n"," assert self.config['model'] in ['BPR', 'UBPR', 'EBPR', 'pUEBPR', 'UEBPR'], 'Please specify the right model!'\n"," if self.config['use_cuda'] is True:\n"," users, pos_items, neg_items, ratings = users.cuda(), pos_items.cuda(), neg_items.cuda(), ratings.cuda()\n"," self.opt.zero_grad()\n"," pos_prediction, neg_prediction = self.model(users, pos_items, neg_items)\n"," if self.config['model'] == 'BPR':\n"," loss = - (pos_prediction - neg_prediction).sigmoid().log().sum()\n"," elif self.config['model'] == 'UBPR':\n"," loss = - ((pos_prediction - neg_prediction).sigmoid().log() / popularity_vector[pos_items]).sum()\n"," elif self.config['model'] == 'EBPR':\n"," loss = - ((pos_prediction - neg_prediction).sigmoid().log() * explainability_matrix[users, pos_items] * (\n"," 1 - explainability_matrix[users, neg_items])).sum()\n"," elif self.config['model'] == 'pUEBPR':\n"," loss = - ((pos_prediction - neg_prediction).sigmoid().log() / popularity_vector[pos_items] *\n"," explainability_matrix[users, pos_items] * (1 - explainability_matrix[users, neg_items])).sum()\n"," elif self.config['model'] == 'UEBPR':\n"," loss = - ((pos_prediction - neg_prediction).sigmoid().log() / popularity_vector[pos_items] *\n"," explainability_matrix[users, pos_items] / popularity_vector[\n"," neighborhood[pos_items].flatten()].view(len(pos_items), self.config['neighborhood']).sum(\n"," 1) * (1 - explainability_matrix[users, neg_items] / popularity_vector[\n"," neighborhood[neg_items].flatten()].view(len(neg_items), self.config['neighborhood']).sum(\n"," 1))).sum()\n"," if self.config['l2_regularization'] > 0:\n"," l2_reg = 0\n"," for param in self.model.parameters():\n"," l2_reg += torch.norm(param)\n"," loss += self.config['l2_regularization'] * l2_reg\n"," loss.backward()\n"," self.opt.step()\n"," loss = loss.item()\n"," return loss\n","\n"," def train_an_epoch(self, train_loader, explainability_matrix, popularity_vector, neighborhood, epoch_id):\n"," assert hasattr(self, 'model'), 'Please specify the exact model!'\n"," self.model.train()\n"," if self.config['use_cuda'] is True:\n"," explainability_matrix = torch.from_numpy(explainability_matrix).float().cuda()\n"," popularity_vector = torch.from_numpy(popularity_vector).float().cuda()\n"," neighborhood = torch.from_numpy(neighborhood).cuda()\n"," total_loss = 0\n"," bar = pyprind.ProgBar(len(train_loader))\n"," for batch_id, batch in enumerate(train_loader):\n"," bar.update()\n"," assert isinstance(batch[0], torch.LongTensor)\n"," user, pos_item, neg_item, rating = batch[0], batch[1], batch[2], batch[3]\n"," loss = self.train_single_batch_EBPR(user, pos_item, neg_item, rating, explainability_matrix, popularity_vector, neighborhood)\n"," total_loss += loss\n","\n"," def evaluate(self, evaluate_data, explainability_matrix, popularity_vector, item_similarity_matrix, epoch_id):\n"," assert hasattr(self, 'model'), 'Please specify the exact model !'\n"," if self.config['loo_eval']:\n"," test_users_eval, test_items_eval, test_scores_eval, negative_users_eval, negative_items_eval, negative_scores_eval = [], [], [], [], [], []\n"," else:\n"," test_users_eval, test_items_eval, test_scores_eval, test_output_eval = [], [], [], []\n"," self.model.eval()\n"," with torch.no_grad():\n"," if self.config['loo_eval']:\n"," for batch_id, batch in enumerate(evaluate_data[0]):\n"," test_users, test_items = batch[0], batch[1]\n"," if self.config['use_cuda'] is 
True:\n"," test_users = test_users.cuda()\n"," test_items = test_items.cuda()\n"," test_scores, _ = self.model(test_users, test_items, test_items)\n"," if self.config['use_cuda'] is True:\n"," test_users_eval += test_users.cpu().data.view(-1).tolist()\n"," test_items_eval += test_items.cpu().data.view(-1).tolist()\n"," test_scores_eval += test_scores.cpu().data.view(-1).tolist()\n"," for batch_id, batch in enumerate(evaluate_data[1]):\n"," negative_users, negative_items = batch[0], batch[1]\n"," if self.config['use_cuda'] is True:\n"," negative_users = negative_users.cuda()\n"," negative_items = negative_items.cuda()\n"," negative_scores, _ = self.model(negative_users, negative_items, negative_items)\n"," if self.config['use_cuda'] is True:\n"," negative_users_eval += negative_users.cpu().data.view(-1).tolist()\n"," negative_items_eval += negative_items.cpu().data.view(-1).tolist()\n"," negative_scores_eval += negative_scores.cpu().data.view(-1).tolist()\n"," self._metron.subjects = [test_users_eval, test_items_eval, test_scores_eval, negative_users_eval,\n"," negative_items_eval, negative_scores_eval]\n"," hr, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim = self._metron.cal_hit_ratio_loo(), self._metron.cal_ndcg_loo(), self._metron.cal_mep(explainability_matrix, theta=0), self._metron.cal_weighted_mep(explainability_matrix, theta=0), self._metron.avg_popularity(popularity_vector), self._metron.efd(popularity_vector), self._metron.avg_pairwise_similarity(item_similarity_matrix)\n"," print('Evaluating Epoch {}: NDCG@{} = {:.4f}, HR@{} = {:.4f}, MEP@{} = {:.4f}, WMEP@{} = {:.4f}, Avg_Pop@{} = {:.4f}, EFD@{} = {:.4f}, Avg_Pair_Sim@{} = {:.4f}'.format(epoch_id, self.config['top_k'],\n"," ndcg, self.config['top_k'], hr, self.config['top_k'], mep, self.config['top_k'], wmep, self.config['top_k'], avg_pop, self.config['top_k'], efd, self.config['top_k'], avg_pair_sim))\n"," return ndcg, hr, mep, wmep, avg_pop, efd, avg_pair_sim\n"," else:\n"," for batch_id, batch in enumerate(evaluate_data):\n"," test_users, test_items, test_output = batch[0], batch[1], batch[2]\n"," if self.config['use_cuda'] is True:\n"," test_users = test_users.cuda()\n"," test_items = test_items.cuda()\n"," test_output = test_output.cuda()\n"," test_scores, _ = self.model(test_users, test_items, test_items)\n"," if self.config['use_cuda'] is True:\n"," test_users_eval += test_users.cpu().data.view(-1).tolist()\n"," test_items_eval += test_items.cpu().data.view(-1).tolist()\n"," test_scores_eval += test_scores.cpu().data.view(-1).tolist()\n"," test_output_eval += test_output.cpu().data.view(-1).tolist()\n"," self._metron.subjects = [test_users_eval, test_items_eval, test_output_eval, test_scores_eval]\n"," map, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim = self._metron.cal_map_at_k(), self._metron.cal_ndcg(), self._metron.cal_mep(explainability_matrix, theta=0), self._metron.cal_weighted_mep(explainability_matrix, theta=0), self._metron.avg_popularity(popularity_vector), self._metron.efd(popularity_vector), self._metron.avg_pairwise_similarity(item_similarity_matrix)\n"," print('Evaluating Epoch {}: MAP@{} = {:.4f}, NDCG@{} = {:.4f}, MEP@{} = {:.4f}, WMEP@{} = {:.4f}, Avg_Pop@{} = {:.4f}, EFD@{} = {:.4f}, Avg_Pair_Sim@{} = {:.4f}'.format(epoch_id, self.config['top_k'], map, self.config['top_k'], ndcg, self.config['top_k'], mep, self.config['top_k'], wmep, self.config['top_k'], avg_pop, self.config['top_k'], efd, self.config['top_k'], avg_pair_sim))\n"," return map, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim\n","\n"," def 
save_explicit(self, epoch_id, map, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim, num_epoch, best_model, best_performance, save_models):\n"," assert hasattr(self, 'model'), 'Please specify the exact model !'\n"," if ndcg > best_performance[1]:\n"," best_performance[0] = map\n"," best_performance[1] = ndcg\n"," best_performance[2] = mep\n"," best_performance[3] = wmep\n"," best_performance[4] = avg_pop\n"," best_performance[5] = efd\n"," best_performance[6] = avg_pair_sim\n"," best_performance[7] = epoch_id\n"," best_model = self.model\n"," if epoch_id == num_epoch - 1:\n"," alias = self.config['model'] + '_' + self.config['dataset'] + '_batchsize_' + str(self.config['batch_size']) + '_opt_' + str(self.config['optimizer']) + '_lr_' + str(self.config['lr']) + '_latent_' + str(self.config['num_latent']) + '_l2reg_' + str(self.config['l2_regularization'])\n"," model_dir = self.config['model_dir_explicit'].format(alias, best_performance[7], self.config['top_k'], best_performance[0], self.config['top_k'], best_performance[1], self.config['top_k'], best_performance[2], self.config['top_k'], best_performance[3], self.config['top_k'], best_performance[4], self.config['top_k'], best_performance[5], self.config['top_k'], best_performance[6])\n"," print('Best model: ' + model_dir)\n"," if save_models:\n"," save_checkpoint(best_model, model_dir)\n"," return best_model, best_performance\n","\n"," def save_implicit(self, epoch_id, ndcg, hr, mep, wmep, avg_pop, efd, avg_pair_sim, num_epoch, best_model, best_performance, save_models):\n"," assert hasattr(self, 'model'), 'Please specify the exact model !'\n"," if ndcg > best_performance[0]:\n"," best_performance[0] = ndcg\n"," best_performance[1] = hr\n"," best_performance[2] = mep\n"," best_performance[3] = wmep\n"," best_performance[4] = avg_pop\n"," best_performance[5] = efd\n"," best_performance[6] = avg_pair_sim\n"," best_performance[7] = epoch_id\n"," best_model = self.model\n"," if epoch_id == num_epoch - 1:\n"," alias = self.config['model'] + '_' + self.config['dataset'] + '_batchsize_' + str(self.config['batch_size']) + '_opt_' + str(self.config['optimizer']) + '_lr_' + str(self.config['lr']) + '_latent_' + str(self.config['num_latent']) + '_l2reg_' + str(self.config['l2_regularization'])\n"," model_dir = self.config['model_dir_implicit'].format(alias, best_performance[7], self.config['top_k'], best_performance[0], self.config['top_k'], best_performance[1], self.config['top_k'], best_performance[2], self.config['top_k'], best_performance[3], self.config['top_k'], best_performance[4], self.config['top_k'], best_performance[5], self.config['top_k'], best_performance[6])\n"," print('Best model: ' + model_dir)\n"," if save_models:\n"," save_checkpoint(best_model, model_dir)\n"," return best_model, best_performance\n","\n"," def load_model(self, test_model_path):\n"," resume_checkpoint(self.model, test_model_path, self.config['device_id'])\n"," return self.model"],"metadata":{"id":"eKGBACSgNZmS"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["EBPR model"],"metadata":{"id":"xyWX7moONrfE"}},{"cell_type":"code","source":["class BPR(torch.nn.Module):\n"," \"\"\"\"BPR model definition\"\"\"\n","\n"," def __init__(self, config):\n"," super(BPR, self).__init__()\n"," self.num_users = config['num_users']\n"," self.num_items = config['num_items']\n"," self.num_latent = config['num_latent']\n"," self.loo_eval = config['loo_eval']\n","\n"," self.embed_user = torch.nn.Embedding(self.num_users, self.num_latent)\n"," self.embed_item = 
torch.nn.Embedding(self.num_items, self.num_latent)\n","\n"," # torch.nn.init.xavier_uniform_(self.embed_user.weight)\n"," # torch.nn.init.xavier_uniform_(self.embed_item.weight)\n"," torch.nn.init.normal_(self.embed_user.weight, std=0.01)\n"," torch.nn.init.normal_(self.embed_item.weight, std=0.01)\n","\n"," def forward(self, user_indices, pos_item_indices, neg_item_indices):\n","\n"," user_latent = self.embed_user(user_indices)\n"," pos_item_latent = self.embed_item(pos_item_indices)\n"," neg_item_latent = self.embed_item(neg_item_indices)\n","\n"," pos_prediction = (user_latent * pos_item_latent).sum(dim=-1)\n"," neg_prediction = (user_latent * neg_item_latent).sum(dim=-1)\n"," return pos_prediction, neg_prediction\n","\n"," def init_weight(self):\n"," pass"],"metadata":{"id":"x6EO1nzANygD"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["class BPREngine(Engine):\n"," \"\"\"Engine for training & evaluating BPR\"\"\"\n"," def __init__(self, config):\n"," self.model = BPR(config)\n"," if config['use_cuda'] is True:\n"," use_cuda(True, config['device_id'])\n"," self.model.cuda()\n"," super(BPREngine, self).__init__(config)"],"metadata":{"id":"fkKgipJ1Nzjv"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Training and Evaluation"],"metadata":{"id":"gSsRqIvUM-pb"}},{"cell_type":"code","source":["def main(args):\n"," # Read dataset\n"," dataset_name = args.dataset # 'ml-100k' for Movielens 100K. 'ml-1m' for the Movielens 1M dataset. 'lastfm-2k' for the\n"," # Last.FM 2K dataset. 'yahoo-r3' for the Yahoo! R3 dataset.\n"," dataset = read_data(dataset_name, args.int_per_item)\n","\n"," # Define hyperparameters\n"," config = {'model': args.model, # Model to train: 'BPR', 'UBPR', 'EBPR', 'pUEBPR', 'UEBPR'.\n"," 'dataset': dataset_name,\n"," 'num_epoch': args.num_epoch, # Number of training epochs.\n"," 'batch_size': args.batch_size, # Batch size.\n"," 'lr': args.lr, # Learning rate.\n"," #'optimizer': 'sgd',\n"," 'sgd_momentum': args.sgd_momentum,\n"," #'optimizer': 'rmsprop',\n"," 'rmsprop_alpha': args.rmsprop_alpha,\n"," 'rmsprop_momentum': args.rmsprop_momentum,\n"," 'optimizer': args.optimizer,\n"," 'num_users': len(dataset['userId'].unique()),\n"," 'num_items': len(dataset['itemId'].unique()),\n"," 'test_rate': args.test_rate, # Test rate for random train/val/test split. test_rate is the rate of test + validation. Used when 'loo_eval' is set to False.\n"," 'num_latent': args.num_latent, # Number of latent factors.\n"," 'weight_decay': args.weight_decay,\n"," 'l2_regularization': args.l2_regularization,\n"," 'use_cuda': args.use_cuda,\n"," 'device_id': args.device_id,\n"," 'top_k': args.top_k, # k in MAP@k, HR@k and NDCG@k.\n"," 'loo_eval': args.loo_eval, # True: LOO evaluation with HR@k and NDCG@k. 
False: Random train/test split\n"," # evaluation with MAP@k and NDCG@k.\n"," 'neighborhood': args.neighborhood, # Neighborhood size for explainability.\n"," 'model_dir_explicit':'Output/checkpoints/{}_Epoch{}_MAP@{}_{:.4f}_NDCG@{}_{:.4f}_MEP@{}_{:.4f}_WMEP@{}_{:.4f}_Avg_Pop@{}_{:.4f}_EFD@{}_{:.4f}_Avg_Pair_Sim@{}_{:.4f}.model',\n"," 'model_dir_implicit':'Output/checkpoints/{}_Epoch{}_NDCG@{}_{:.4f}_HR@{}_{:.4f}_MEP@{}_{:.4f}_WMEP@{}_{:.4f}_Avg_Pop@{}_{:.4f}_EFD@{}_{:.4f}_Avg_Pair_Sim@{}_{:.4f}.model'}\n","\n"," # DataLoader\n"," sample_generator = SampleGenerator(dataset, config, split_val=False)\n"," test_data = sample_generator.test_data_loader(config['batch_size'])\n","\n"," # Create explainability matrix\n"," explainability_matrix = sample_generator.create_explainability_matrix()\n"," test_explainability_matrix = sample_generator.create_explainability_matrix(include_test=True)\n","\n"," # Create popularity vector\n"," popularity_vector = sample_generator.create_popularity_vector()\n"," test_popularity_vector = sample_generator.create_popularity_vector(include_test=True)\n","\n"," #Create item neighborhood\n"," neighborhood, item_similarity_matrix = sample_generator.create_neighborhood()\n"," _, test_item_similarity_matrix = sample_generator.create_neighborhood(include_test=True)\n","\n"," # Specify the exact model\n"," engine = BPREngine(config)\n","\n"," # Initialize list of optimal results\n"," best_performance = [0] * 8\n"," best_ndcg = 0\n","\n"," best_model = ''\n"," for epoch in range(config['num_epoch']):\n"," print('Training epoch {}'.format(epoch))\n"," train_loader = sample_generator.train_data_loader(config['batch_size'])\n"," engine.train_an_epoch(train_loader, explainability_matrix, popularity_vector, neighborhood, epoch_id=epoch)\n"," if config['loo_eval']:\n"," ndcg, hr, mep, wmep, avg_pop, efd, avg_pair_sim = engine.evaluate(test_data, test_explainability_matrix, test_popularity_vector, test_item_similarity_matrix, epoch_id=str(epoch) + ' on test data')\n"," print('-' * 80)\n"," best_model, best_performance = engine.save_implicit(epoch, ndcg, hr, mep, wmep, avg_pop, efd, avg_pair_sim, config['num_epoch'], best_model, best_performance, save_models = args.save_models)\n"," else:\n"," map, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim = engine.evaluate(test_data, test_explainability_matrix, test_popularity_vector, test_item_similarity_matrix, epoch_id=str(epoch) + ' on test data')\n"," print('-' * 80)\n"," best_model, best_performance = engine.save_explicit(epoch, map, ndcg, mep, wmep, avg_pop, efd, avg_pair_sim, config['num_epoch'], best_model, best_performance, save_models = args.save_models)"],"metadata":{"id":"K7Fi2n2-N323"},"execution_count":null,"outputs":[]},
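{"cell_type":"markdown","source":["`main` can also be driven programmatically instead of through the command-line parser in the next cell. A minimal sketch, assuming the same argument names as that parser (the values here are illustrative defaults, not the settings of the recorded run):"],"metadata":{}},{"cell_type":"code","source":["from argparse import Namespace\n","\n","# Hypothetical example: a short plain-BPR baseline on the same dataset.\n","# Field names mirror the command-line arguments defined in the next cell.\n","demo_args = Namespace(model='BPR', dataset='lastfm-2k', num_epoch=5, batch_size=100,\n","                      num_latent=50, l2_regularization=0.0, weight_decay=0.0,\n","                      neighborhood=20, top_k=10, lr=0.001, optimizer='adam',\n","                      sgd_momentum=0.9, rmsprop_alpha=0.9, rmsprop_momentum=0.0,\n","                      loo_eval=True, test_rate=0.2, use_cuda=True, device_id=0,\n","                      save_models=False, int_per_item=0)\n","# main(demo_args)  # uncomment to launch the baseline run"],"metadata":{},"execution_count":null,"outputs":[]},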
{"cell_type":"code","source":["#collapse-hide\n","if __name__ == \"__main__\":\n"," parser = argparse.ArgumentParser(description=\"Training script.\")\n","\n"," parser.add_argument(\"--model\", type=str, default='EBPR', help=\"Model to train: 'BPR', 'UBPR', 'EBPR', 'pUEBPR', \"\n"," \"'UEBPR'.\")\n"," parser.add_argument(\"--dataset\", type=str, default='lastfm-2k', help=\"'ml-100k' for Movielens 100K. 'ml-1m' for \"\n"," \"the Movielens 1M dataset. 'lastfm-2k' for \"\n"," \"the Last.FM 2K dataset. 'yahoo-r3' for the \"\n"," \"Yahoo! R3 dataset.\")\n"," parser.add_argument(\"--num_epoch\", type=int, default=50, help=\"Number of training epochs.\")\n"," parser.add_argument(\"--batch_size\", type=int, default=100, help=\"Batch size.\")\n"," parser.add_argument(\"--num_latent\", type=int, default=50, help=\"Number of latent features.\")\n"," parser.add_argument(\"--l2_regularization\", type=float, default=0.0, help=\"L2 regularization coefficient.\")\n"," parser.add_argument(\"--weight_decay\", type=float, default=0.0, help=\"Weight decay coefficient.\")\n"," parser.add_argument(\"--neighborhood\", type=int, default=20, help=\"Neighborhood size for explainability.\")\n"," parser.add_argument(\"--top_k\", type=int, default=10, help=\"Cutoff k in MAP@k, HR@k and NDCG@k, etc.\")\n"," parser.add_argument(\"--lr\", type=float, default=0.001, help=\"Learning rate.\")\n"," parser.add_argument(\"--optimizer\", type=str, default='adam', help=\"Optimizer: 'adam', 'sgd', 'rmsprop'.\")\n"," parser.add_argument(\"--sgd_momentum\", type=float, default=0.9, help=\"Momentum for SGD optimizer.\")\n"," parser.add_argument(\"--rmsprop_alpha\", type=float, default=0.9, help=\"alpha hyperparameter for RMSProp optimizer.\")\n"," parser.add_argument(\"--rmsprop_momentum\", type=float, default=0.0, help=\"Momentum for RMSProp optimizer.\")\n"," parser.add_argument(\"--loo_eval\", type=lambda x: (str(x).lower() == 'true'), default=True, help=\"True: LOO evaluation. False: Random \"\n"," \"train/test split\")\n"," parser.add_argument(\"--test_rate\", type=float, default=0.2, help=\"Test rate for random train/val/test \"\n"," \"split. test_rate is the rate of test + \"\n"," \"validation. Used when 'loo_eval' is set \"\n"," \"to False.\")\n"," parser.add_argument(\"--use_cuda\", type=lambda x: (str(x).lower() == 'true'), default=True, help=\"True if you want to use a CUDA device.\")\n"," parser.add_argument(\"--device_id\", type=int, default=0, help=\"ID of CUDA device if 'use_cuda' is True.\")\n"," parser.add_argument(\"--save_models\", type=lambda x: (str(x).lower() == 'true'), default=True,\n"," help=\"True if you want to save the best model(s).\")\n"," parser.add_argument(\"--int_per_item\", type=int, default=0, help=\"Minimum number of interactions per item, for studying the effect of sparsity on the lastfm-2k dataset.\")\n","\n"," args = parser.parse_args([])\n"," main(args)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"IpgFPihmOA7O","executionInfo":{"status":"ok","timestamp":1639224302192,"user_tz":-330,"elapsed":2400988,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"3b2ea639-178f-4325-b6d7-08b6cb16eac8"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Creating explainability matrix...\n","Creating test explainability matrix...\n","Creating popularity vector...\n","Creating test popularity vector...\n","Determining item neighborhoods...\n","Determining test item neighborhoods...\n","Training epoch 0\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 0 on test data: NDCG@10 = 0.5700, HR@10 = 0.7060, MEP@10 = 0.1927, WMEP@10 = 0.0412, Avg_Pop@10 = 0.1881, EFD@10 = 3.0274, Avg_Pair_Sim@10 = 0.0141\n","--------------------------------------------------------------------------------\n","Training epoch 1\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 1 on test data: NDCG@10 = 0.5989, HR@10 = 0.7487, MEP@10 = 0.2025, WMEP@10 = 0.0426, Avg_Pop@10 = 0.1948, EFD@10 = 2.9236, Avg_Pair_Sim@10 = 0.0157\n","--------------------------------------------------------------------------------\n","Training epoch 2\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 2 on test data: NDCG@10 = 0.6113, HR@10 = 0.7839, MEP@10 = 0.2126, WMEP@10 = 0.0440, Avg_Pop@10 = 0.1925, EFD@10 = 2.9495, Avg_Pair_Sim@10 = 0.0162\n","--------------------------------------------------------------------------------\n","Training epoch 3\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 3 on test data: NDCG@10 = 0.6104, HR@10 = 0.7716, MEP@10 = 0.2122, WMEP@10 = 0.0439, Avg_Pop@10 = 0.1879, EFD@10 = 2.9977, Avg_Pair_Sim@10 = 0.0160\n","--------------------------------------------------------------------------------\n","Training epoch 4\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 4 on test 
data: NDCG@10 = 0.6143, HR@10 = 0.7684, MEP@10 = 0.2178, WMEP@10 = 0.0445, Avg_Pop@10 = 0.1829, EFD@10 = 3.0525, Avg_Pair_Sim@10 = 0.0161\n","--------------------------------------------------------------------------------\n","Training epoch 5\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 5 on test data: NDCG@10 = 0.6084, HR@10 = 0.7604, MEP@10 = 0.2199, WMEP@10 = 0.0444, Avg_Pop@10 = 0.1784, EFD@10 = 3.1006, Avg_Pair_Sim@10 = 0.0158\n","--------------------------------------------------------------------------------\n","Training epoch 6\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:02\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 6 on test data: NDCG@10 = 0.6086, HR@10 = 0.7556, MEP@10 = 0.2224, WMEP@10 = 0.0445, Avg_Pop@10 = 0.1747, EFD@10 = 3.1421, Avg_Pair_Sim@10 = 0.0158\n","--------------------------------------------------------------------------------\n","Training epoch 7\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 7 on test data: NDCG@10 = 0.6119, HR@10 = 0.7561, MEP@10 = 0.2269, WMEP@10 = 0.0449, Avg_Pop@10 = 0.1713, EFD@10 = 3.1823, Avg_Pair_Sim@10 = 0.0156\n","--------------------------------------------------------------------------------\n","Training epoch 8\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 8 on test data: NDCG@10 = 0.6085, HR@10 = 0.7551, MEP@10 = 0.2308, WMEP@10 = 0.0453, Avg_Pop@10 = 0.1692, EFD@10 = 3.2058, Avg_Pair_Sim@10 = 
0.0155\n","--------------------------------------------------------------------------------\n","Training epoch 9\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 9 on test data: NDCG@10 = 0.6164, HR@10 = 0.7620, MEP@10 = 0.2335, WMEP@10 = 0.0457, Avg_Pop@10 = 0.1674, EFD@10 = 3.2273, Avg_Pair_Sim@10 = 0.0154\n","--------------------------------------------------------------------------------\n","Training epoch 10\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 10 on test data: NDCG@10 = 0.6156, HR@10 = 0.7567, MEP@10 = 0.2349, WMEP@10 = 0.0458, Avg_Pop@10 = 0.1654, EFD@10 = 3.2469, Avg_Pair_Sim@10 = 0.0153\n","--------------------------------------------------------------------------------\n","Training epoch 11\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 11 on test data: NDCG@10 = 0.6087, HR@10 = 0.7487, MEP@10 = 0.2374, WMEP@10 = 0.0458, Avg_Pop@10 = 0.1633, EFD@10 = 3.2697, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 12\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 12 on test data: NDCG@10 = 0.6114, HR@10 = 0.7487, MEP@10 = 0.2385, WMEP@10 = 0.0459, Avg_Pop@10 = 0.1622, EFD@10 = 3.2824, Avg_Pair_Sim@10 = 0.0151\n","--------------------------------------------------------------------------------\n","Training epoch 13\n"]},{"output_type":"stream","name":"stderr","text":["0% 
[##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 13 on test data: NDCG@10 = 0.6064, HR@10 = 0.7492, MEP@10 = 0.2405, WMEP@10 = 0.0461, Avg_Pop@10 = 0.1610, EFD@10 = 3.2928, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 14\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 14 on test data: NDCG@10 = 0.6073, HR@10 = 0.7556, MEP@10 = 0.2400, WMEP@10 = 0.0460, Avg_Pop@10 = 0.1605, EFD@10 = 3.2986, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 15\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 15 on test data: NDCG@10 = 0.6096, HR@10 = 0.7535, MEP@10 = 0.2423, WMEP@10 = 0.0462, Avg_Pop@10 = 0.1598, EFD@10 = 3.3039, Avg_Pair_Sim@10 = 0.0147\n","--------------------------------------------------------------------------------\n","Training epoch 16\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 16 on test data: NDCG@10 = 0.6182, HR@10 = 0.7668, MEP@10 = 0.2444, WMEP@10 = 0.0465, Avg_Pop@10 = 0.1596, EFD@10 = 3.3070, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 17\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: 
SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 17 on test data: NDCG@10 = 0.6176, HR@10 = 0.7689, MEP@10 = 0.2445, WMEP@10 = 0.0465, Avg_Pop@10 = 0.1593, EFD@10 = 3.3106, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 18\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 18 on test data: NDCG@10 = 0.6203, HR@10 = 0.7700, MEP@10 = 0.2450, WMEP@10 = 0.0466, Avg_Pop@10 = 0.1588, EFD@10 = 3.3160, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 19\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 19 on test data: NDCG@10 = 0.6310, HR@10 = 0.7737, MEP@10 = 0.2472, WMEP@10 = 0.0468, Avg_Pop@10 = 0.1586, EFD@10 = 3.3176, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 20\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 20 on test data: NDCG@10 = 0.6313, HR@10 = 0.7748, MEP@10 = 0.2476, WMEP@10 = 0.0469, Avg_Pop@10 = 0.1587, EFD@10 = 3.3182, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 21\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the 
caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 21 on test data: NDCG@10 = 0.6265, HR@10 = 0.7812, MEP@10 = 0.2493, WMEP@10 = 0.0471, Avg_Pop@10 = 0.1590, EFD@10 = 3.3133, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 22\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 22 on test data: NDCG@10 = 0.6291, HR@10 = 0.7828, MEP@10 = 0.2497, WMEP@10 = 0.0472, Avg_Pop@10 = 0.1585, EFD@10 = 3.3179, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 23\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 23 on test data: NDCG@10 = 0.6213, HR@10 = 0.7898, MEP@10 = 0.2507, WMEP@10 = 0.0474, Avg_Pop@10 = 0.1590, EFD@10 = 3.3139, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 24\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 24 on test data: NDCG@10 = 0.6248, HR@10 = 0.7908, MEP@10 = 0.2495, WMEP@10 = 0.0472, Avg_Pop@10 = 0.1590, EFD@10 = 3.3156, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 25\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 25 on test data: NDCG@10 = 0.6160, HR@10 = 0.7898, MEP@10 = 0.2509, WMEP@10 = 0.0473, Avg_Pop@10 = 0.1586, EFD@10 = 3.3186, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 26\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:02\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 26 on test data: NDCG@10 = 0.6178, HR@10 = 0.7924, MEP@10 = 0.2510, WMEP@10 = 0.0473, Avg_Pop@10 = 0.1586, EFD@10 = 3.3178, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 27\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 27 on test data: NDCG@10 = 0.6195, HR@10 = 0.7892, MEP@10 = 0.2504, WMEP@10 = 0.0472, Avg_Pop@10 = 0.1587, EFD@10 = 3.3151, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 28\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 28 on test data: NDCG@10 = 0.6155, HR@10 = 0.7860, MEP@10 = 0.2513, WMEP@10 = 0.0472, Avg_Pop@10 = 0.1586, EFD@10 = 3.3153, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 29\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 29 on 
test data: NDCG@10 = 0.6234, HR@10 = 0.7898, MEP@10 = 0.2521, WMEP@10 = 0.0473, Avg_Pop@10 = 0.1586, EFD@10 = 3.3147, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 30\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 30 on test data: NDCG@10 = 0.6276, HR@10 = 0.7930, MEP@10 = 0.2527, WMEP@10 = 0.0474, Avg_Pop@10 = 0.1586, EFD@10 = 3.3117, Avg_Pair_Sim@10 = 0.0147\n","--------------------------------------------------------------------------------\n","Training epoch 31\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 31 on test data: NDCG@10 = 0.6294, HR@10 = 0.7978, MEP@10 = 0.2532, WMEP@10 = 0.0474, Avg_Pop@10 = 0.1594, EFD@10 = 3.3045, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 32\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 32 on test data: NDCG@10 = 0.6329, HR@10 = 0.7988, MEP@10 = 0.2520, WMEP@10 = 0.0474, Avg_Pop@10 = 0.1592, EFD@10 = 3.3070, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 33\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 33 on test data: NDCG@10 = 0.6287, HR@10 = 0.7978, MEP@10 = 0.2519, WMEP@10 = 0.0474, Avg_Pop@10 = 0.1597, EFD@10 = 3.3019, Avg_Pair_Sim@10 = 
0.0147\n","--------------------------------------------------------------------------------\n","Training epoch 34\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 34 on test data: NDCG@10 = 0.6365, HR@10 = 0.8063, MEP@10 = 0.2542, WMEP@10 = 0.0477, Avg_Pop@10 = 0.1599, EFD@10 = 3.3000, Avg_Pair_Sim@10 = 0.0148\n","--------------------------------------------------------------------------------\n","Training epoch 35\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 35 on test data: NDCG@10 = 0.6322, HR@10 = 0.8058, MEP@10 = 0.2536, WMEP@10 = 0.0475, Avg_Pop@10 = 0.1602, EFD@10 = 3.2970, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 36\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 36 on test data: NDCG@10 = 0.6308, HR@10 = 0.8047, MEP@10 = 0.2538, WMEP@10 = 0.0476, Avg_Pop@10 = 0.1604, EFD@10 = 3.2917, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 37\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 37 on test data: NDCG@10 = 0.6359, HR@10 = 0.8036, MEP@10 = 0.2538, WMEP@10 = 0.0475, Avg_Pop@10 = 0.1608, EFD@10 = 3.2883, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 38\n"]},{"output_type":"stream","name":"stderr","text":["0% 
[##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 38 on test data: NDCG@10 = 0.6394, HR@10 = 0.8084, MEP@10 = 0.2555, WMEP@10 = 0.0478, Avg_Pop@10 = 0.1610, EFD@10 = 3.2856, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 39\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 39 on test data: NDCG@10 = 0.6408, HR@10 = 0.8047, MEP@10 = 0.2570, WMEP@10 = 0.0479, Avg_Pop@10 = 0.1605, EFD@10 = 3.2903, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 40\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 40 on test data: NDCG@10 = 0.6461, HR@10 = 0.8111, MEP@10 = 0.2570, WMEP@10 = 0.0479, Avg_Pop@10 = 0.1611, EFD@10 = 3.2859, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 41\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 41 on test data: NDCG@10 = 0.6421, HR@10 = 0.8095, MEP@10 = 0.2578, WMEP@10 = 0.0479, Avg_Pop@10 = 0.1610, EFD@10 = 3.2850, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 42\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: 
SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 42 on test data: NDCG@10 = 0.6463, HR@10 = 0.8132, MEP@10 = 0.2582, WMEP@10 = 0.0479, Avg_Pop@10 = 0.1613, EFD@10 = 3.2833, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 43\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 43 on test data: NDCG@10 = 0.6459, HR@10 = 0.8132, MEP@10 = 0.2582, WMEP@10 = 0.0480, Avg_Pop@10 = 0.1616, EFD@10 = 3.2800, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 44\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 44 on test data: NDCG@10 = 0.6427, HR@10 = 0.8127, MEP@10 = 0.2584, WMEP@10 = 0.0480, Avg_Pop@10 = 0.1616, EFD@10 = 3.2795, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 45\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 45 on test data: NDCG@10 = 0.6438, HR@10 = 0.8106, MEP@10 = 0.2578, WMEP@10 = 0.0480, Avg_Pop@10 = 0.1616, EFD@10 = 3.2779, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 46\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the 
caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 46 on test data: NDCG@10 = 0.6407, HR@10 = 0.8138, MEP@10 = 0.2596, WMEP@10 = 0.0481, Avg_Pop@10 = 0.1620, EFD@10 = 3.2751, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 47\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 47 on test data: NDCG@10 = 0.6397, HR@10 = 0.8106, MEP@10 = 0.2598, WMEP@10 = 0.0481, Avg_Pop@10 = 0.1622, EFD@10 = 3.2711, Avg_Pair_Sim@10 = 0.0150\n","--------------------------------------------------------------------------------\n","Training epoch 48\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:02\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 48 on test data: NDCG@10 = 0.6416, HR@10 = 0.8175, MEP@10 = 0.2599, WMEP@10 = 0.0482, Avg_Pop@10 = 0.1627, EFD@10 = 3.2672, Avg_Pair_Sim@10 = 0.0149\n","--------------------------------------------------------------------------------\n","Training epoch 49\n"]},{"output_type":"stream","name":"stderr","text":["0% [##############################] 100% | ETA: 00:00:00\n","Total time elapsed: 00:00:03\n","/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:90: SettingWithCopyWarning: \n","A value is trying to be set on a copy of a slice from a DataFrame.\n","Try using .loc[row_indexer,col_indexer] = value instead\n","\n","See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n"]},{"output_type":"stream","name":"stdout","text":["Evaluating Epoch 49 on test data: NDCG@10 = 0.6474, HR@10 = 0.8186, MEP@10 = 0.2617, WMEP@10 = 0.0484, Avg_Pop@10 = 0.1628, EFD@10 = 3.2679, Avg_Pair_Sim@10 = 0.0151\n","--------------------------------------------------------------------------------\n","Best model: Output/checkpoints/EBPR_lastfm-2k_batchsize_100_opt_adam_lr_0.001_latent_50_l2reg_0.0_Epoch49_NDCG@10_0.6474_HR@10_0.8186_MEP@10_0.2617_WMEP@10_0.0484_Avg_Pop@10_0.1628_EFD@10_3.2679_Avg_Pair_Sim@10_0.0151.model\n"]}]},{"cell_type":"markdown","source":["---"],"metadata":{"id":"asGQoRLcZWwk"}},{"cell_type":"code","source":["!apt-get -qq install tree\n","!rm -r sample_data"],"metadata":{"id":"B7757SpaZWwl"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["!tree -h --du 
."],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"2FhxHJlsZWwm","executionInfo":{"status":"ok","timestamp":1639224572299,"user_tz":-330,"elapsed":17,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"80b445d4-8d81-47f8-a169-e28fef30cda5"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[".\n","├── [ 12M] data\n","│   ├── [1.8M] artists.dat\n","│   ├── [4.7K] readme.md\n","│   ├── [217K] tags.dat\n","│   ├── [1.1M] user_artists.dat\n","│   ├── [221K] user_friends.dat\n","│   ├── [4.0M] user_taggedartists.dat\n","│   └── [4.8M] user_taggedartists-timestamps.dat\n","└── [3.7M] Output\n"," └── [3.7M] checkpoints\n"," └── [3.7M] EBPR_lastfm-2k_batchsize_100_opt_adam_lr_0.001_latent_50_l2reg_0.0_Epoch49_NDCG@10_0.6474_HR@10_0.8186_MEP@10_0.2617_WMEP@10_0.0484_Avg_Pop@10_0.1628_EFD@10_3.2679_Avg_Pair_Sim@10_0.0151.model\n","\n"," 16M used in 3 directories, 8 files\n"]}]},{"cell_type":"code","source":["!pip install -q watermark\n","%reload_ext watermark\n","%watermark -a \"Sparsh A.\" -m -iv -u -t -d"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OwoWkAq-ZWwn","executionInfo":{"status":"ok","timestamp":1639224585066,"user_tz":-330,"elapsed":3666,"user":{"displayName":"Sparsh Agarwal","photoUrl":"https://lh3.googleusercontent.com/a/default-user=s64","userId":"13037694610922482904"}},"outputId":"1074c880-414b-4c28-df89-9954b9e35e54"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Author: Sparsh A.\n","\n","Last updated: 2021-12-11 12:09:53\n","\n","Compiler : GCC 7.5.0\n","OS : Linux\n","Release : 5.4.104+\n","Machine : x86_64\n","Processor : x86_64\n","CPU cores : 2\n","Architecture: 64bit\n","\n","IPython : 5.5.0\n","pandas : 1.1.5\n","sys : 3.7.12 (default, Sep 10 2021, 00:21:48) \n","[GCC 7.5.0]\n","torch : 1.10.0+cu111\n","argparse: 1.1\n","pyprind : 2.11.3\n","numpy : 1.19.5\n","\n"]}]},{"cell_type":"markdown","source":["---"],"metadata":{"id":"mAkZcALhZWwo"}},{"cell_type":"markdown","source":["**END**"],"metadata":{"id":"RWCkGSF0ZWwo"}}]}