{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "2021-07-22-ravelry-pattern-recommender.ipynb",
"provenance": [],
"collapsed_sections": [],
"mount_file_id": "1wWG86wgHlCiQwniLfd6EhuCdMwN0_LMF",
"authorship_tag": "ABX9TyPH10dgglucbENo8EjTWHHo"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FIOyUcOgRmGf"
},
"source": [
"# Ravelry Pattern Recommender\n",
"> Recommend knitting patterns to Ravelry users\n",
"\n",
"- toc: true\n",
"- badges: true\n",
"- comments: true\n",
"- categories: [SVD, Surprise, API, Art&Culture]\n",
"- image:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7F7Ixp_qLYEG"
},
"source": [
"[Ravelry](https://www.ravelry.com/about) describes itself as a place for knitters, crocheters, designers, spinners, weavers and dyers to keep track of their yarn, tools, project and pattern information, and look to others for ideas and inspiration."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vmY1dm8CORIP"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"metadata": {
"id": "1r1i9yf6Oa7K"
},
"source": [
"!pip install -q surprise"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Sz5ifkA8OYvb"
},
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import datetime as dt\n",
"\n",
"import surprise\n",
"from surprise.prediction_algorithms import *\n",
"from surprise import Reader, Dataset\n",
"from surprise import SVD, accuracy\n",
"from surprise.model_selection import train_test_split\n",
"from surprise.model_selection import GridSearchCV\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
],
"execution_count": 16,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gXowNBAnmDh9",
"outputId": "b18f0b69-7e47-4302-cb10-c3c86ad26a99"
},
"source": [
"!pip install -q watermark\n",
"%reload_ext watermark\n",
"%watermark -m -iv"
],
"execution_count": 53,
"outputs": [
{
"output_type": "stream",
"text": [
"Compiler : GCC 7.5.0\n",
"OS : Linux\n",
"Release : 5.4.104+\n",
"Machine : x86_64\n",
"Processor : x86_64\n",
"CPU cores : 2\n",
"Architecture: 64bit\n",
"\n",
"numpy : 1.19.5\n",
"IPython : 5.5.0\n",
"pandas : 1.1.5\n",
"surprise: 0.1\n",
"sys : 3.7.11 (default, Jul 3 2021, 18:01:19) \n",
"[GCC 7.5.0]\n",
"\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "d9kuIz7HmDh_"
},
"source": [
"# Poetry is a tool for dependency management and packaging in Python. \n",
"# It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.\n",
"# https://python-poetry.org/docs/basic-usage/\n",
"# !curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -\n",
"# !/root/.local/bin/poetry --version\n",
"# !/root/.local/bin/poetry new poetry-demo\n",
"# %cd poetry-demo\n",
"# !/root/.local/bin/poetry install\n",
"# !/root/.local/bin/poetry add numpy"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "XvWdTkkYQ0NY"
},
"source": [
"## What are patterns?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7Ev4xaxlRbnV"
},
"source": [
"> youtube: https://youtu.be/ybEClAPFF8M"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tF7mJenvOU04"
},
"source": [
"## Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UriwzI9lNJVT"
},
"source": [
"### Data Fetching from API (optional)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "_mk9kmfFKbKO"
},
"source": [
"# import pandas as pd\n",
"# import requests\n",
"# import json\n",
"# import random\n",
"\n",
"# # creds.json is expected to hold the Ravelry API credentials under 'id' and 'key'\n",
"# with open('creds.json') as f:\n",
"#     creds = json.load(f)\n",
"# auth = (creds['id'], creds['key'])\n",
"\n",
"# # 1. Sample random user ids until more than 10,000 distinct usernames are found\n",
"# users = set()\n",
"# for i in random.sample(range(1, 12000000), 50000):\n",
"#     try:\n",
"#         url = 'https://api.ravelry.com/people/' + str(i) + '.json'\n",
"#         response = requests.get(url, auth=auth)\n",
"#         users.add(response.json()['user']['username'])\n",
"#     except (ValueError, KeyError):  # non-JSON response or missing field\n",
"#         pass\n",
"#     if len(users) > 10000:\n",
"#         break\n",
"\n",
"# # 2. Collect each user's knitting projects that link to a pattern\n",
"# parsed_data = []\n",
"# for i, user in enumerate(sorted(users)):\n",
"#     url = 'https://api.ravelry.com/projects/' + user + '/list.json?sort=completed_'\n",
"#     response = requests.get(url, auth=auth)\n",
"#     try:\n",
"#         for project in response.json()['projects']:\n",
"#             if project['craft_name'] == 'Knitting' and project['pattern_id'] is not None:\n",
"#                 pid = str(int(project['pattern_id']))\n",
"#                 pattern_url = 'https://api.ravelry.com/patterns.json?ids=' + pid\n",
"#                 # fetch the pattern once and reuse the parsed JSON\n",
"#                 pattern = requests.get(pattern_url, auth=auth).json()['patterns'][pid]\n",
"#                 parsed_data.append((user, project['completed'], project['rating'],\n",
"#                                     project['status_name'], project['pattern_id'],\n",
"#                                     pattern['rating_average'], pattern['rating_count']))\n",
"#     except (ValueError, KeyError):\n",
"#         pass\n",
"#     print(i, len(parsed_data))\n",
"\n",
"# df = pd.DataFrame(parsed_data, columns=['user', 'completed', 'rating', 'status',\n",
"#                                         'pattern_id', 'average_rating', 'rating_count'])\n",
"\n",
"# # 3. Keep only finished projects and save them\n",
"# finished_projects = df[df['status'] == 'Finished']\n",
"# finished_projects.to_csv('ravelry_interactions.csv', index=False)"
],
"execution_count": null,
"outputs": []
},
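{
"cell_type": "markdown",
"metadata": {},
"source": [
"The commented-out loop above keeps a project only if it is a knitting project linked to a pattern. It then attaches that pattern's `rating_average` and `rating_count`. The cell below is a minimal offline sketch of that filtering step that runs without API credentials. The helper `parse_projects` and the `sample_projects` / `sample_patterns` payloads are hypothetical. They reuse only the field names the loop reads and are not real API responses."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Hypothetical payloads mimicking only the fields the fetching loop reads\n",
"sample_projects = {'projects': [\n",
"    {'craft_name': 'Knitting', 'pattern_id': 17468, 'completed': '2018/11/25',\n",
"     'rating': 4, 'status_name': 'Finished'},\n",
"    {'craft_name': 'Crochet', 'pattern_id': 555, 'completed': '2018/10/01',\n",
"     'rating': 3, 'status_name': 'Finished'},\n",
"    {'craft_name': 'Knitting', 'pattern_id': None, 'completed': None,\n",
"     'rating': None, 'status_name': 'In progress'},\n",
"]}\n",
"sample_patterns = {'17468': {'rating_average': 4.52, 'rating_count': 1276}}\n",
"\n",
"\n",
"def parse_projects(user, projects_json, patterns_by_id):\n",
"    # Keep knitting projects linked to a pattern and attach pattern-level stats\n",
"    rows = []\n",
"    for project in projects_json['projects']:\n",
"        if project['craft_name'] != 'Knitting' or project['pattern_id'] is None:\n",
"            continue\n",
"        pattern = patterns_by_id[str(int(project['pattern_id']))]\n",
"        rows.append((user, project['completed'], project['rating'],\n",
"                     project['status_name'], project['pattern_id'],\n",
"                     pattern['rating_average'], pattern['rating_count']))\n",
"    return rows\n",
"\n",
"\n",
"rows = parse_projects('hannahcf', sample_projects, sample_patterns)\n",
"print(rows)"
],
"execution_count": null,
"outputs": []
},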
{
"cell_type": "markdown",
"metadata": {
"id": "eW8b8InXSPAq"
},
"source": [
"### Direct loading from GitHub"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"id": "sTgvoAn-OPY2",
"outputId": "44f81d91-4abb-4849-bc96-48b883d8e468"
},
"source": [
"df = pd.read_csv('https://raw.githubusercontent.com/recohut/reco-data/ravelry/ravelry/v1/ravelry_interactions.csv')\n",
"df"
],
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" user completed ... average_rating rating_count\n",
"0 hannahcf 2018/11/25 ... 4.520376 1276.0\n",
"1 hannahcf 2018/05/05 ... 4.285714 7.0\n",
"2 hannahcf 2018/03/13 ... 4.554957 928.0\n",
"3 hannahcf 2018/01/31 ... 4.500000 4.0\n",
"4 hannahcf 2018/01/05 ... 4.662651 332.0\n",
"... ... ... ... ... ...\n",
"11117 creativratte 2015/08/14 ... 4.611111 18.0\n",
"11118 creativratte 2015/06/21 ... 4.667697 647.0\n",
"11119 creativratte 2015/05/10 ... 4.668034 9263.0\n",
"11120 creativratte 2015/02/07 ... 4.629565 575.0\n",
"11121 creativratte 2014/09/28 ... 4.615530 528.0\n",
"\n",
"[11122 rows x 7 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MjAup1CpSTxb"
},
"source": [
"## Preprocessing"
]
},
{
"cell_type": "code",
"metadata": {
"id": "N9l47cqiO3oN",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"outputId": "77416481-dc14-4fc3-b26f-0439aa2a980d"
},
"source": [
"df_drop_nans = df[['user', 'pattern_id', 'rating']].dropna(subset = ['rating'])\n",
"df_drop_nans"
],
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" user pattern_id rating\n",
"0 hannahcf 17468 4.0\n",
"1 hannahcf 629964 4.0\n",
"2 hannahcf 287992 4.0\n",
"3 hannahcf 475167 4.0\n",
"4 hannahcf 544863 4.0\n",
"... ... ... ...\n",
"11117 creativratte 223414 4.0\n",
"11118 creativratte 467793 4.0\n",
"11119 creativratte 211562 4.0\n",
"11120 creativratte 250525 4.0\n",
"11121 creativratte 363243 3.0\n",
"\n",
"[8794 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "vLxYSS1GO3oO",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 142
},
"outputId": "049b21de-3d78-40f5-8219-889dd19f10e3"
},
"source": [
"df_drop_nans.describe(include='all').T"
],
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"            count  unique   top  freq     mean       std  min     25%     50%     75%         max\n",
"user         8794     383  ciri   348      NaN       NaN  NaN     NaN     NaN     NaN         NaN\n",
"pattern_id   8794     NaN   NaN   NaN   352854    295926   16  106818  267731  564484  1.1567e+06\n",
"rating       8794     NaN   NaN   NaN  3.65783  0.569459    0       3       4       4           4"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "hNNOqLgGO3oP",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"outputId": "a1ce6f18-68e4-496f-b291-f28a20346a65"
},
"source": [
"df_replace_nans = df[['user', 'pattern_id', 'rating', 'average_rating']].copy()  # copy avoids SettingWithCopyWarning\n",
"rating_replace_nans = df_replace_nans['rating'].fillna(df_replace_nans['average_rating'])\n",
"df_replace_nans['rating'] = rating_replace_nans\n",
"df_replace_nans.drop(columns = 'average_rating', inplace = True)\n",
"df_replace_nans"
],
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" user pattern_id rating\n",
"0 hannahcf 17468 4.0\n",
"1 hannahcf 629964 4.0\n",
"2 hannahcf 287992 4.0\n",
"3 hannahcf 475167 4.0\n",
"4 hannahcf 544863 4.0\n",
"... ... ... ...\n",
"11117 creativratte 223414 4.0\n",
"11118 creativratte 467793 4.0\n",
"11119 creativratte 211562 4.0\n",
"11120 creativratte 250525 4.0\n",
"11121 creativratte 363243 3.0\n",
"\n",
"[11122 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
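{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two preprocessing cells handle a missing personal `rating` differently. `df_drop_nans` discards the row, which shrinks the table from 11,122 to 8,794 rows. `df_replace_nans` substitutes the pattern's community `average_rating` instead. The toy example below uses made-up values to show both strategies side by side.\n",
"\n",
"It also flags something worth checking before relying on the filled data. The personal ratings here never exceed 4 (see the `describe()` output), but pattern averages such as 4.52 or 4.76 do. This suggests the two columns may not share a scale. Whether that mismatch matters is an assumption that this sketch does not settle."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up interactions: two of the four projects have no personal rating\n",
"toy = pd.DataFrame({\n",
"    'user': ['a', 'a', 'b', 'b'],\n",
"    'pattern_id': [1, 2, 1, 3],\n",
"    'rating': [4.0, np.nan, 3.0, np.nan],\n",
"    'average_rating': [4.5, 4.7, 4.5, 3.9],\n",
"})\n",
"\n",
"# Strategy 1 (like df_drop_nans): drop rows without a personal rating\n",
"dropped = toy[['user', 'pattern_id', 'rating']].dropna(subset=['rating'])\n",
"\n",
"# Strategy 2 (like df_replace_nans): fill a missing rating with the pattern's average\n",
"filled = toy[['user', 'pattern_id', 'rating']].copy()\n",
"filled['rating'] = toy['rating'].fillna(toy['average_rating'])\n",
"\n",
"# Filled values above the largest personal rating may signal a scale mismatch\n",
"max_observed = toy['rating'].max()\n",
"out_of_range = filled[filled['rating'] > max_observed]\n",
"print(len(dropped), len(filled))\n",
"print(out_of_range)"
],
"execution_count": null,
"outputs": []
},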
{
"cell_type": "markdown",
"metadata": {
"id": "g7SkotwASXDO"
},
"source": [
"## Surprise Dataset"
]
},
{
"cell_type": "code",
"metadata": {
"id": "CQDgqT1QO3oQ"
},
"source": [
"reader = Reader()\n",
"data_replace = Dataset.load_from_df(df_replace_nans, reader)\n",
"data_drop = Dataset.load_from_df(df_drop_nans, reader)"
],
"execution_count": 17,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "h5PaCaUQO3oQ"
},
"source": [
"drop_trainset, drop_testset = train_test_split(data_drop, test_size=0.25)\n",
"replace_trainset, replace_testset = train_test_split(data_replace, test_size=0.25)"
],
"execution_count": 18,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "viTrkkNxO3oQ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6a22f2ba-8a25-4234-c1c6-3884b8698528"
},
"source": [
"drop_trainset.global_mean"
],
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"3.6579226686884003"
]
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Fp7kMisbSZqy"
},
"source": [
"## SVD model"
]
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"id": "r8sN2QQmO3oR",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "7bc9f777-8c4d-4450-bc94-3a6387a9b3f0"
},
"source": [
"algo = SVD(n_factors = 50, n_epochs = 45, lr_all = 0.004, reg_all = 0.2)\n",
"algo.fit(drop_trainset)"
],
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
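{
"cell_type": "markdown",
"metadata": {},
"source": [
"Surprise's `SVD` is a biased matrix-factorization model fitted with stochastic gradient descent. It predicts $\\hat r_{ui} = \\mu + b_u + b_i + q_i^\\top p_u$. For every observed rating, it moves the biases and factor vectors along the regularized prediction error.\n",
"\n",
"The NumPy sketch below reproduces that update loop on made-up data so that the roles of `n_factors`, `n_epochs`, `lr_all` and `reg_all` are visible. It is an illustration, not Surprise's code. It uses one learning rate and one regularization term for all parameters, which is what `lr_all` and `reg_all` set. It follows the same default initialisation scheme as Surprise (zero biases, factors drawn from a normal distribution with standard deviation 0.1), but with a different random generator."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"\n",
"def fit_biased_mf(ratings, n_users, n_items, n_factors=50, n_epochs=45,\n",
"                  lr_all=0.004, reg_all=0.2, seed=0):\n",
"    # ratings: list of (user_index, item_index, rating) triples\n",
"    rng = np.random.default_rng(seed)\n",
"    mu = np.mean([r for _, _, r in ratings])      # global mean rating\n",
"    bu = np.zeros(n_users)                         # user biases\n",
"    bi = np.zeros(n_items)                         # item biases\n",
"    pu = rng.normal(0, 0.1, (n_users, n_factors))  # user latent factors\n",
"    qi = rng.normal(0, 0.1, (n_items, n_factors))  # item latent factors\n",
"    for _ in range(n_epochs):\n",
"        for u, i, r in ratings:\n",
"            err = r - (mu + bu[u] + bi[i] + qi[i] @ pu[u])\n",
"            bu[u] += lr_all * (err - reg_all * bu[u])\n",
"            bi[i] += lr_all * (err - reg_all * bi[i])\n",
"            pu_old = pu[u].copy()  # both factor updates use pre-step values\n",
"            pu[u] += lr_all * (err * qi[i] - reg_all * pu[u])\n",
"            qi[i] += lr_all * (err * pu_old - reg_all * qi[i])\n",
"    return mu, bu, bi, pu, qi\n",
"\n",
"\n",
"def predict(model, u, i):\n",
"    mu, bu, bi, pu, qi = model\n",
"    return mu + bu[u] + bi[i] + qi[i] @ pu[u]\n",
"\n",
"\n",
"# Made-up (user, item, rating) triples; indices stand in for Surprise's inner ids\n",
"toy = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 2.0), (2, 1, 3.0)]\n",
"model = fit_biased_mf(toy, n_users=3, n_items=3, n_factors=2,\n",
"                      n_epochs=200, lr_all=0.01, reg_all=0.02)\n",
"train_rmse = np.sqrt(np.mean([(r - predict(model, u, i)) ** 2 for u, i, r in toy]))\n",
"print(round(float(train_rmse), 3))"
],
"execution_count": null,
"outputs": []
},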
{
"cell_type": "code",
"metadata": {
"id": "4aYUffujO3oS"
},
"source": [
"predictions = algo.test(drop_testset)"
],
"execution_count": 21,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "HkkxVTKZO3oS",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a65302c8-6a56-491a-9340-6e6967fc243f"
},
"source": [
"accuracy.rmse(predictions)"
],
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"text": [
"RMSE: 0.4870\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4869683491545024"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
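{
"cell_type": "markdown",
"metadata": {},
"source": [
"For context on the RMSE of 0.487, `accuracy.rmse` reports $\\sqrt{\\frac{1}{|T|}\\sum_{(u,i) \\in T} (r_{ui} - \\hat r_{ui})^2}$ over the test pairs $T$. A model that always predicts the training mean scores roughly the standard deviation of the ratings, which the `describe()` output above puts at about 0.569. Relative to that trivial baseline, the SVD model cuts the error by roughly 14%. The sketch below computes both quantities on made-up ratings, so the numbers it prints are illustrative only."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"\n",
"def rmse(actual, estimated):\n",
"    # Root mean squared error: square root of the mean squared residual\n",
"    actual = np.asarray(actual, dtype=float)\n",
"    estimated = np.asarray(estimated, dtype=float)\n",
"    return float(np.sqrt(np.mean((actual - estimated) ** 2)))\n",
"\n",
"\n",
"# Made-up ratings on the 0-4 scale observed in this data\n",
"train_ratings = [4, 4, 3, 4, 3, 4, 2, 4]\n",
"test_actual = [4, 3, 4, 2]\n",
"test_model = [3.8, 3.3, 3.9, 2.6]  # pretend estimates from a fitted model\n",
"\n",
"global_mean = np.mean(train_ratings)  # 3.5\n",
"baseline_rmse = rmse(test_actual, [global_mean] * len(test_actual))\n",
"model_rmse = rmse(test_actual, test_model)\n",
"print(f'mean baseline RMSE: {baseline_rmse:.3f}, model RMSE: {model_rmse:.3f}')"
],
"execution_count": null,
"outputs": []
},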
{
"cell_type": "markdown",
"metadata": {
"id": "I1ual6JfSdVa"
},
"source": [
"## Hyperparameter search"
]
},
{
"cell_type": "code",
"metadata": {
"id": "eRfVa1_8O3oT",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "aa554557-f3fe-4ec9-da00-b6d7a126d546"
},
"source": [
"param_grid = {'n_factors':[5, 10, 15, 20, 25, 30, 35, 40, 45, 50],\n",
" 'n_epochs': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], \n",
" 'lr_all': [0.002, 0.003, 0.004, 0.005],\n",
" 'reg_all': [0.2, 0.3, 0.4, 0.5, 0.6]}\n",
"\n",
"gs_model = GridSearchCV(SVD,\n",
" param_grid=param_grid,\n",
" n_jobs = -1,\n",
" joblib_verbose=5)\n",
"\n",
"gs_model.fit(data_drop)\n",
"\n",
"gs_model.best_params"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.\n",
"[Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 4.6s\n",
"[Parallel(n_jobs=-1)]: Done 68 tasks | elapsed: 18.1s\n",
"[Parallel(n_jobs=-1)]: Done 158 tasks | elapsed: 42.0s\n",
"[Parallel(n_jobs=-1)]: Done 284 tasks | elapsed: 1.3min\n",
"[Parallel(n_jobs=-1)]: Done 446 tasks | elapsed: 2.2min\n",
"[Parallel(n_jobs=-1)]: Done 644 tasks | elapsed: 3.4min\n",
"[Parallel(n_jobs=-1)]: Done 878 tasks | elapsed: 5.0min\n",
"[Parallel(n_jobs=-1)]: Done 1148 tasks | elapsed: 6.6min\n",
"[Parallel(n_jobs=-1)]: Done 1454 tasks | elapsed: 8.3min\n",
"[Parallel(n_jobs=-1)]: Done 1796 tasks | elapsed: 10.6min\n",
"[Parallel(n_jobs=-1)]: Done 2174 tasks | elapsed: 13.1min\n",
"[Parallel(n_jobs=-1)]: Done 2588 tasks | elapsed: 15.6min\n",
"[Parallel(n_jobs=-1)]: Done 3038 tasks | elapsed: 19.2min\n",
"[Parallel(n_jobs=-1)]: Done 3524 tasks | elapsed: 22.0min\n",
"[Parallel(n_jobs=-1)]: Done 4046 tasks | elapsed: 26.4min\n",
"[Parallel(n_jobs=-1)]: Done 4604 tasks | elapsed: 29.8min\n",
"[Parallel(n_jobs=-1)]: Done 5198 tasks | elapsed: 34.4min\n",
"[Parallel(n_jobs=-1)]: Done 5828 tasks | elapsed: 39.3min\n",
"[Parallel(n_jobs=-1)]: Done 6494 tasks | elapsed: 44.1min\n",
"[Parallel(n_jobs=-1)]: Done 7196 tasks | elapsed: 50.1min\n",
"[Parallel(n_jobs=-1)]: Done 7934 tasks | elapsed: 56.6min\n",
"[Parallel(n_jobs=-1)]: Done 8708 tasks | elapsed: 62.5min\n",
"[Parallel(n_jobs=-1)]: Done 9518 tasks | elapsed: 69.4min\n"
],
"name": "stderr"
}
]
},
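{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grid above contains 10 × 10 × 4 × 5 = 2,000 parameter combinations. Because `cv` is not passed, Surprise's `GridSearchCV` uses 5-fold cross-validation, so the search fits 10,000 models. This matches the joblib log, which reaches 9,518 tasks after about 69 minutes on two cores and implies roughly 73 minutes in total. A quick way to size a grid before launching it:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from itertools import product\n",
"\n",
"param_grid = {'n_factors': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],\n",
"              'n_epochs': [5, 10, 15, 20, 25, 30, 35, 40, 45, 50],\n",
"              'lr_all': [0.002, 0.003, 0.004, 0.005],\n",
"              'reg_all': [0.2, 0.3, 0.4, 0.5, 0.6]}\n",
"n_folds = 5  # GridSearchCV falls back to 5-fold KFold when cv is not given\n",
"\n",
"# Every combination of the listed values is one candidate model\n",
"n_combinations = len(list(product(*param_grid.values())))\n",
"n_fits = n_combinations * n_folds\n",
"print(n_combinations, n_fits)  # 2000 10000"
],
"execution_count": null,
"outputs": []
},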
{
"cell_type": "markdown",
"metadata": {
"id": "6K4HAML7Sg3G"
},
"source": [
"## Inference (rating prediction)"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "l3Vzqa8EjvRC",
"outputId": "7d088e26-8c2e-4b78-8575-f9e151e0d48f"
},
"source": [
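"# Caveat: gs_model was tuned by cross-validation on all of data_drop, which includes\n",
"# drop_testset, so this test RMSE is likely somewhat optimistic.\n",
"algo = SVD(**gs_model.best_params['rmse'])\n",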
"algo.fit(drop_trainset)\n",
"predictions = algo.test(drop_testset)\n",
"accuracy.rmse(predictions)"
],
"execution_count": 48,
"outputs": [
{
"output_type": "stream",
"text": [
"RMSE: 0.4866\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4866481194826417"
]
},
"metadata": {
"tags": []
},
"execution_count": 48
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Zhj3FhxgO3oV"
},
"source": [
"predictions_df = pd.DataFrame({\"user\": [prediction.uid for prediction in predictions],\n",
" \"item\": [prediction.iid for prediction in predictions],\n",
" \"actual\": [prediction.r_ui for prediction in predictions],\n",
" \"estimated\" :[prediction.est for prediction in predictions]})"
],
"execution_count": 49,
"outputs": []
},
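{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook stops at rating prediction. To recommend patterns, you rank each user's candidate patterns by estimated rating. The sketch below follows the `get_top_n` recipe from Surprise's FAQ. So that it runs on its own, it defines a stand-in `Prediction` namedtuple with the same fields (`uid`, `iid`, `r_ui`, `est`, `details`) and uses made-up pattern ids and estimates. For real recommendations, you would score patterns a user has not yet knitted rather than the held-out `drop_testset`, for example the pairs returned by `trainset.build_anti_testset()`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from collections import defaultdict, namedtuple\n",
"\n",
"# Stand-in for surprise.Prediction, which exposes the same field names\n",
"Prediction = namedtuple('Prediction', ['uid', 'iid', 'r_ui', 'est', 'details'])\n",
"\n",
"\n",
"def get_top_n(predictions, n=10):\n",
"    # Group (item, estimate) pairs by user, then keep each user's n best\n",
"    top_n = defaultdict(list)\n",
"    for pred in predictions:\n",
"        top_n[pred.uid].append((pred.iid, pred.est))\n",
"    for uid, scored in top_n.items():\n",
"        scored.sort(key=lambda pair: pair[1], reverse=True)\n",
"        top_n[uid] = scored[:n]\n",
"    return dict(top_n)\n",
"\n",
"\n",
"# Made-up pattern ids and estimated ratings\n",
"toy_predictions = [\n",
"    Prediction('Ona', 1001, None, 3.78, {}),\n",
"    Prediction('Ona', 1002, None, 3.82, {}),\n",
"    Prediction('Ona', 1003, None, 3.95, {}),\n",
"    Prediction('hannahcf', 2001, None, 3.60, {}),\n",
"]\n",
"print(get_top_n(toy_predictions, n=2))"
],
"execution_count": null,
"outputs": []
},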
{
"cell_type": "code",
"metadata": {
"id": "sdPRZ6qGO3oV",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"outputId": "dd30950a-58dd-45fd-a56d-cebdc9a911d1"
},
"source": [
"predictions_df[predictions_df['user'] == 'Ona'].describe()"
],
"execution_count": 50,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" item actual estimated\n",
"count 2.000000 2.000000 2.000000\n",
"mean 269733.500000 3.500000 3.798873\n",
"std 124075.319788 0.707107 0.028703\n",
"min 181999.000000 3.000000 3.778577\n",
"25% 225866.250000 3.250000 3.788725\n",
"50% 269733.500000 3.500000 3.798873\n",
"75% 313600.750000 3.750000 3.809021\n",
"max 357468.000000 4.000000 3.819169"
]
},
"metadata": {
"tags": []
},
"execution_count": 50
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "i8IWyRjAO3oW",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 855
},
"outputId": "9f9257ea-4676-4f3b-8102-36f070ddad3a"
},
"source": [
"df[df['user'] == \"Ona\"]"
],
"execution_count": 51,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" user completed rating ... pattern_id average_rating rating_count\n",
"182 Ona 2015/04/22 3.0 ... 546649 4.615385 13.0\n",
"183 Ona 2015/04/12 3.0 ... 393371 4.531915 47.0\n",
"184 Ona 2014/11/26 NaN ... 213483 4.757460 1307.0\n",
"185 Ona 2014/08/29 3.0 ... 327637 4.630499 341.0\n",
"186 Ona 2013/03/20 4.0 ... 359828 4.586093 302.0\n",
"187 Ona 2013/01/13 4.0 ... 89104 3.933333 15.0\n",
"188 Ona 2013/01/01 4.0 ... 134174 4.523490 298.0\n",
"189 Ona 2012/12/27 4.0 ... 357468 4.333333 60.0\n",
"190 Ona 2012/12/07 4.0 ... 10760 4.209974 381.0\n",
"191 Ona 2012/12/04 4.0 ... 293353 4.626655 2341.0\n",
"192 Ona 2012/12/02 4.0 ... 357468 4.333333 60.0\n",
"193 Ona 2012/11/25 4.0 ... 323490 4.421384 477.0\n",
"194 Ona 2012/11/12 4.0 ... 2525 4.331984 494.0\n",
"195 Ona 2012/11/10 4.0 ... 88575 4.441176 34.0\n",
"196 Ona 2012/10/18 4.0 ... 108356 4.410345 290.0\n",
"197 Ona 2010/09/03 3.0 ... 170995 4.410042 239.0\n",
"198 Ona 2010/08/30 3.0 ... 181999 4.400741 2700.0\n",
"199 Ona 2010/08/27 3.0 ... 185648 4.481057 1135.0\n",
"200 Ona 2010/08/22 4.0 ... 181999 4.400741 2700.0\n",
"201 Ona 2010/08/12 4.0 ... 156437 4.608737 4235.0\n",
"202 Ona 2010/07/26 4.0 ... 108356 4.410345 290.0\n",
"203 Ona 2010/07/25 4.0 ... 92403 4.309859 71.0\n",
"204 Ona 2010/07/13 4.0 ... 144814 4.500000 104.0\n",
"205 Ona 2010/07/06 4.0 ... 144814 4.500000 104.0\n",
"206 Ona NaN 4.0 ... 532791 4.684211 19.0\n",
"207 Ona NaN NaN ... 367817 4.297872 47.0\n",
"\n",
"[26 rows x 7 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 51
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "W50r3F1cO3oW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b056c023-e59e-44b5-caf9-15050d4e2f29"
},
"source": [
"algo.predict('Ona', 1)"
],
"execution_count": 52,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Prediction(uid='Ona', iid=1, r_ui=None, est=3.7394823583945174, details={'was_impossible': False})"
]
},
"metadata": {
"tags": []
},
"execution_count": 52
}
]
},
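{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `describe()` output shows that the smallest `pattern_id` in the data is 16, so pattern `1` is unknown to the model. `Ona`, by contrast, has most of her ratings in the training set. Surprise's `SVD` documentation states that an unknown item's bias $b_i$ and factors $q_i$ are taken to be zero, so the estimate falls back to $\\mu + b_u$. With $\\mu \\approx 3.658$ (the `global_mean` shown earlier), the returned 3.739 implies a user bias of about +0.08 for `Ona`. By default, Surprise also clips estimates to the `Reader`'s `rating_scale`. A NumPy sketch of the fallback with made-up parameters:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"\n",
"def svd_estimate(mu, bu, bi, pu, qi, u=None, i=None):\n",
"    # Biased MF estimate; an unknown user or item (None) adds no bias or factors\n",
"    est = mu\n",
"    if u is not None:\n",
"        est += bu[u]\n",
"    if i is not None:\n",
"        est += bi[i]\n",
"    if u is not None and i is not None:\n",
"        est += qi[i] @ pu[u]\n",
"    return est\n",
"\n",
"\n",
"# Made-up parameters: 2 users, 2 items, 3 latent factors\n",
"mu = 3.66\n",
"bu = np.array([0.08, -0.10])\n",
"bi = np.array([0.20, -0.30])\n",
"pu = np.array([[0.1, 0.0, 0.2], [0.0, 0.1, 0.1]])\n",
"qi = np.array([[0.5, 0.1, 0.0], [0.2, 0.2, 0.2]])\n",
"\n",
"known = svd_estimate(mu, bu, bi, pu, qi, u=0, i=0)            # mu + b_u + b_i + q_i . p_u\n",
"unknown_item = svd_estimate(mu, bu, bi, pu, qi, u=0, i=None)  # mu + b_u only\n",
"print(known, unknown_item)"
],
"execution_count": null,
"outputs": []
},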
{
"cell_type": "markdown",
"metadata": {
"id": "3UiBUobDSzt9"
},
"source": [
"## References\n",
"1. https://github.com/clareadunne/PatternRecommender `code`\n",
"2. https://www.ravelry.com/account/login `site`\n",
"3. [Google Images search](https://www.google.com/search?q=ravelry&rlz=1C1GCEA_enIN909IN909&sxsrf=ALeKk01ydUvKpgGnU5d_WLs-CPtDmfh2wQ:1626942092877&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjg7I_Mn_bxAhWpzDgGHe6_BjgQ_AUoAXoECAEQAw&biw=1366&bih=657) `site`\n",
"4. https://youtu.be/ybEClAPFF8M `video`"
]
}
]
}