{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "2021-06-25-anime-recommender.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPMbj/+4iLT9js27uIJJ/Jj"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "jM5F1Clckebn"
},
"source": [
"# RekoNet Anime Recommender\n",
"> The data crawled from the popular anime website [MyAnimeList.net](http://myanimelist.net/), and cleaned of duplicates as well as missing values and false data. Following that, autoencoders used to learn embeddings of all the anime titles present in the dataset, which were then used to cluster the same.\n",
"\n",
"- toc: true\n",
"- badges: true\n",
"- comments: true\n",
"- categories: [anime, autoencoder, pytorch]\n",
"- image:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aoZY5KoFj60J"
},
"source": [
"### Overview\n",
"1. deep autoencoders for predicting ratings and generating embeddings.\n",
"2. form clusters using embeddings of anime titles\n",
"3. find similar animes using similarity metric based on like and dislike of user\n",
"4. combine with rating prediction to create a hybrid recommender"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i8KumC_xkBbX"
},
"source": [
"### Background"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IZMeWF-Tjxi9"
},
"source": [
"Anime (a term derived from the English word animation) is a form of hand-drawn computer animation which originated in Japan and has now developed a cult following around the world. In recent years, the Anime industry has been growing at an enormous pace making billions of dollars in profit every year. Its market has gained attention from major streaming platforms like Netflix and Amazon Prime. In the pre-internet era, Anime enthusiasts discovered new titles through word of mouth. Hence personalized recommendations were not required. Moreover, the number of titles released were quite less to facilitate a data-based approach for personalized recommendations. However, in recent years, with the boom of streaming services and the amount of newly released anime titles, people can watch Anime as much as they like. This calls for a personalized recommendation system for this new generation of Anime watchers."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9IRd-q6NkRci"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JjX8CzAOlGwY"
},
"source": [
"The data used for training Rikonet was crawled from the popular anime website [MyAnimeList.net](http://myanimelist.net/) using the Jikan API. The collected data was cleaned of duplicates as well as missing values and false data and reduced to 6668 anime titles while retaining all the key information.\n",
"\n",
"Following that, autoencoders used to learn embeddings of all the anime titles present in the dataset, which were then used to cluster the same.\n",
"\n",
"The logically opposite clusters of the anime titles are estimated as well.\n",
"\n",
"At run-time, when a user requests a new recommendation list, the user’s context, i.e., the anime titles rated so far is fed into the primary autoencoder, which computes the predicted ratings for the unrated titles.\n",
"\n",
"These ratings are further fed to a hybrid filter, which generates 2 lists, namely - Similar Anime and Anime You May Like, the former showing anime titles similar to the ones the user rated highly and the later showing titles which the user may like based on his overall ratings."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A52oi8A7dk9w"
},
"source": [
"### Setup"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NW5qXlV2dQet",
"outputId": "7033b9d1-7e49-4df6-9133-7b2b975e7658"
},
"source": [
"!pip install google_trans_new"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting google_trans_new\n",
" Downloading https://files.pythonhosted.org/packages/f9/7b/9f136106dc5824dc98185c97991d3cd9b53e70a197154dd49f7b899128f6/google_trans_new-1.1.9-py3-none-any.whl\n",
"Installing collected packages: google-trans-new\n",
"Successfully installed google-trans-new-1.1.9\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "QfdiNlKmKZI-"
},
"source": [
"from collections import OrderedDict\n",
"from tabulate import tabulate\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torch.nn.init as init\n",
"from torch.utils.data import Dataset, DataLoader\n",
"from torchvision import transforms\n",
"\n",
"from google_trans_new import google_translator "
],
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "f9CoQ9_6dgNs"
},
"source": [
"### Download data and pre-trained model"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "czltDsN7YnOF",
"outputId": "0f95d4cc-9f97-426a-84b0-c1eff0e10cca"
},
"source": [
"!wget https://github.com/sparsh-ai/reco-data/raw/master/anime/anime_cleaned.csv\n",
"!wget https://github.com/sparsh-ai/reco-data/raw/master/anime/anime_genres.csv\n",
"!wget https://github.com/sparsh-ai/reco-data/raw/master/anime/clusters.csv\n",
"!wget https://github.com/sparsh-ai/reco-data/raw/master/anime/inputFormater.csv\n",
"!gdown --id 1LV7VHOTqU5WgBYxfRcUeY31dbhcBqyzb\n",
"!gdown --id 14x3TgzhFl-XCHjJHtX-mZrtTSkJgIIey"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": [
"--2021-06-25 19:04:45-- https://github.com/sparsh-ai/reco-data/raw/master/anime/anime_cleaned.csv\n",
"Resolving github.com (github.com)... 192.30.255.112\n",
"Connecting to github.com (github.com)|192.30.255.112|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/anime_cleaned.csv [following]\n",
"--2021-06-25 19:04:45-- https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/anime_cleaned.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 6326231 (6.0M) [text/plain]\n",
"Saving to: ‘anime_cleaned.csv’\n",
"\n",
"anime_cleaned.csv 100%[===================>] 6.03M 39.1MB/s in 0.2s \n",
"\n",
"2021-06-25 19:04:46 (39.1 MB/s) - ‘anime_cleaned.csv’ saved [6326231/6326231]\n",
"\n",
"--2021-06-25 19:04:46-- https://github.com/sparsh-ai/reco-data/raw/master/anime/anime_genres.csv\n",
"Resolving github.com (github.com)... 192.30.255.113\n",
"Connecting to github.com (github.com)|192.30.255.113|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/anime_genres.csv [following]\n",
"--2021-06-25 19:04:46-- https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/anime_genres.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 1346938 (1.3M) [text/plain]\n",
"Saving to: ‘anime_genres.csv’\n",
"\n",
"anime_genres.csv 100%[===================>] 1.28M --.-KB/s in 0.07s \n",
"\n",
"2021-06-25 19:04:46 (17.6 MB/s) - ‘anime_genres.csv’ saved [1346938/1346938]\n",
"\n",
"--2021-06-25 19:04:46-- https://github.com/sparsh-ai/reco-data/raw/master/anime/clusters.csv\n",
"Resolving github.com (github.com)... 192.30.255.112\n",
"Connecting to github.com (github.com)|192.30.255.112|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/clusters.csv [following]\n",
"--2021-06-25 19:04:47-- https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/clusters.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 289248 (282K) [text/plain]\n",
"Saving to: ‘clusters.csv’\n",
"\n",
"clusters.csv 100%[===================>] 282.47K --.-KB/s in 0.02s \n",
"\n",
"2021-06-25 19:04:47 (11.9 MB/s) - ‘clusters.csv’ saved [289248/289248]\n",
"\n",
"--2021-06-25 19:04:47-- https://github.com/sparsh-ai/reco-data/raw/master/anime/inputFormater.csv\n",
"Resolving github.com (github.com)... 192.30.255.112\n",
"Connecting to github.com (github.com)|192.30.255.112|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/inputFormater.csv [following]\n",
"--2021-06-25 19:04:48-- https://raw.githubusercontent.com/sparsh-ai/reco-data/master/anime/inputFormater.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 35960 (35K) [text/plain]\n",
"Saving to: ‘inputFormater.csv’\n",
"\n",
"inputFormater.csv 100%[===================>] 35.12K --.-KB/s in 0.001s \n",
"\n",
"2021-06-25 19:04:48 (26.5 MB/s) - ‘inputFormater.csv’ saved [35960/35960]\n",
"\n",
"Downloading...\n",
"From: https://drive.google.com/uc?id=1LV7VHOTqU5WgBYxfRcUeY31dbhcBqyzb\n",
"To: /content/autoEncoder.pth\n",
"291MB [00:01, 149MB/s]\n",
"Downloading...\n",
"From: https://drive.google.com/uc?id=14x3TgzhFl-XCHjJHtX-mZrtTSkJgIIey\n",
"To: /content/similar_anime_genre.csv\n",
"175MB [00:00, 212MB/s]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xSX9cwTnXfnn"
},
"source": [
"def top_animes(genre, ani_genre, all_anime):\n",
" top = []\n",
" print(\"\\nTop\", genre)\n",
" temp = list(ani_genre[ani_genre[genre]==1]['anime_id'])\n",
" temp = list(filter(lambda x: x in all_anime.index, temp))\n",
" temp.sort(key=lambda x: all_anime['score'][x], reverse=True)\n",
"\n",
" for i in range(5):\n",
" r = [i+1, temp[i], all_anime['title'][temp[i]], all_anime['title_english'][temp[i]],\n",
" all_anime['score'][temp[i]], all_anime['genre'][temp[i]]]\n",
" top.append(r)\n",
"\n",
" table = tabulate(top, headers=['S.No.', 'Anime ID', 'Title', 'English Title',\n",
" 'Anime Score', 'Anime Genre'], tablefmt='orgtbl')\n",
" print(table)"
],
"execution_count": 29,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "M5AMVrYVZt9Q",
"outputId": "1cbaa86c-1792-4ca4-ef41-acae64d4e3dd"
},
"source": [
"results = pd.read_csv('clusters.csv')\n",
"results.head()"
],
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"