{ "cells": [ { "cell_type": "markdown", "id": "55a5f31d", "metadata": {}, "source": [ "# GOODREADS CONTENT-BASED BOOK RECOMMENDATION SYSTEM" ] }, { "cell_type": "code", "execution_count": 1, "id": "8119de64", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import pickle\n", "import fasttext\n", "from rake_nltk import Rake\n", "from sklearn.base import BaseEstimator, TransformerMixin\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.metrics.pairwise import linear_kernel\n", "from sklearn.pipeline import Pipeline\n", "from tqdm import tqdm\n", "\n", "tqdm.pandas()" ] }, { "cell_type": "markdown", "id": "3a999759", "metadata": {}, "source": [ "## GOAL\n", "\n", "Creating a good recommendation typically needs a combination of content and user data. Here, instead of using user data, I will use only content data to build a recommendation system that returns items highly similar to the item entered by the user.\n", "\n", "## DATASET\n", "\n", "The dataset I used for this experiment is the [Goodreads' Best Book Dataset](https://www.kaggle.com/datasets/meetnaren/goodreads-best-books), which is available on Kaggle. The dataset contains 54301 rows and 12 columns." ] }, { "cell_type": "code", "execution_count": 2, "id": "0e3a3fb3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | book_authors | book_desc | book_edition | book_format | book_isbn | book_pages | book_rating | book_rating_count | book_review_count | book_title | genres | image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Suzanne Collins | Winning will make you famous. Losing means cer... | NaN | Hardcover | 9.78044E+12 | 374 pages | 4.33 | 5519135 | 160706 | The Hunger Games | Young Adult\|Fiction\|Science Fiction\|Dystopia\|F... | https://images.gr-assets.com/books/1447303603l... |
| 1 | J.K. Rowling\|Mary GrandPré | There is a door at the end of a silent corrido... | US Edition | Paperback | 9.78044E+12 | 870 pages | 4.48 | 2041594 | 33264 | Harry Potter and the Order of the Phoenix | Fantasy\|Young Adult\|Fiction | https://images.gr-assets.com/books/1255614970l... |
| 2 | Harper Lee | The unforgettable novel of a childhood in a sl... | 50th Anniversary | Paperback | 9.78006E+12 | 324 pages | 4.27 | 3745197 | 79450 | To Kill a Mockingbird | Classics\|Fiction\|Historical\|Historical Fiction... | https://images.gr-assets.com/books/1361975680l... |
| 3 | Jane Austen\|Anna Quindlen\|Mrs. Oliphant\|George... | «È cosa ormai risaputa che a uno scapolo in po... | Modern Library Classics, USA / CAN | Paperback | 9.78068E+12 | 279 pages | 4.25 | 2453620 | 54322 | Pride and Prejudice | Classics\|Fiction\|Romance | https://images.gr-assets.com/books/1320399351l... |
| 4 | Stephenie Meyer | About three things I was absolutely positive.F... | NaN | Paperback | 9.78032E+12 | 498 pages | 3.58 | 4281268 | 97991 | Twilight | Young Adult\|Fantasy\|Romance\|Paranormal\|Vampire... | https://images.gr-assets.com/books/1361039443l... |