{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Retail Product Recommender Engine\n", "**By [Czarina Luna](https://czarinaluna.com)**\n", "\n", "### Contents\n", "* [I. Overview](#I.-Overview)\n", "* [II. Business Problem](#II.-Business-Problem)\n", "* [III. Data Understanding](#III.-Data-Understanding)\n", "* [IV. Recommendation Systems](#IV.-Recommendation-Systems)\n", "    * [Popularity Recommendations](#Popularity-Recommendations)\n", "    * [Content-Based Recommenders](#Content-Based-Recommenders)\n", "    * [Collaborative Filtering Systems](#Collaborative-Filtering-Systems)\n", "* [V. Results and Recommendations](#V.-Results-and-Recommendations)\n", "* [VI. Further Research](#VI.-Further-Research)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## I. Overview\n", "\n", "This project develops a recommender engine that aims to increase the revenue of clothing rental companies by predicting user preferences and recommending products for users to rent. I apply several algorithms to create personalized recommendations using content-based and collaborative filtering systems. Singular Value Decomposition (SVD) attained the lowest mean absolute error, 0.5." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## II. Business Problem\n", "\n", "The clothing rental industry is growing as more companies follow the lead of Rent the Runway, the retailer that pioneered online services and subscriptions for designer rentals. To help grow the revenue of clothing rental companies, I develop recommendation systems that predict a user's preferences and recommend the top-ranked items to that user. Doing so conveniently exposes users to relevant products tailored to their preferences. Using data from Rent the Runway, I analyze the product reviews, model the data to predict user ratings, and provide recommendations accordingly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## III. 
Data Understanding\n", "\n", "The Rent the Runway reviews ([data source](https://cseweb.ucsd.edu/~jmcauley/datasets.html#clothing_fit)) contain about 192,500 reviews of nearly 6,000 unique items rented between 2010 and 2018 by over 100,000 unique users.\n", "\n", "A quick look at the data structure:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>fit</th><th>user_id</th><th>bust size</th><th>item_id</th><th>weight</th><th>rating</th><th>rented for</th><th>review_text</th><th>body type</th><th>review_summary</th><th>category</th><th>height</th><th>size</th><th>age</th><th>review_date</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>fit</td><td>420272</td><td>34d</td><td>2260466</td><td>137lbs</td><td>10.0</td><td>vacation</td><td>An adorable romper! Belt and zipper were a lit...</td><td>hourglass</td><td>So many compliments!</td><td>romper</td><td>5' 8\"</td><td>14</td><td>28.0</td><td>April 20, 2016</td></tr>\n",
"    <tr><th>1</th><td>fit</td><td>273551</td><td>34b</td><td>153475</td><td>132lbs</td><td>10.0</td><td>other</td><td>I rented this dress for a photo shoot. The the...</td><td>straight &amp; narrow</td><td>I felt so glamourous!!!</td><td>gown</td><td>5' 6\"</td><td>12</td><td>36.0</td><td>June 18, 2013</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " fit user_id bust size item_id weight rating rented for \\\n", "0 fit 420272 34d 2260466 137lbs 10.0 vacation \n", "1 fit 273551 34b 153475 132lbs 10.0 other \n", "\n", " review_text body type \\\n", "0 An adorable romper! Belt and zipper were a lit... hourglass \n", "1 I rented this dress for a photo shoot. The the... straight & narrow \n", "\n", " review_summary category height size age review_date \n", "0 So many compliments! romper 5' 8\" 14 28.0 April 20, 2016 \n", "1 I felt so glamourous!!! gown 5' 6\" 12 36.0 June 18, 2013 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "raw_data = pd.read_csv('data/data.csv')\n", "raw_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Missing values per column:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "fit 0\n", "user_id 0\n", "bust size 18411\n", "item_id 0\n", "weight 29982\n", "rating 82\n", "rented for 10\n", "review_text 62\n", "body type 14637\n", "review_summary 345\n", "category 0\n", "height 677\n", "size 0\n", "age 960\n", "review_date 0\n", "dtype: int64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data.isna().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Distribution of the target variable, `rating`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.0 124537\n", "8.0 53391\n", "6.0 10697\n", "4.0 2791\n", "2.0 1046\n", "Name: rating, dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_data['rating'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Data Cleaning** \n", "\n", "To perform the pre-processing steps, I define the function `preprocess_data`:\n", "- Drop rows with a missing `rating` and rescale the ratings from 2-10 to 1-5.\n", "- 
Remove the units of measurement from `weight` and `height`, keeping only the numerical values.\n", "- Impute missing `height` and `age` with the median, missing `weight` with the median weight at the same `height`, and missing `rented_for` with the mode.\n", "- Replace `age` values below 13 or above 60 with the median age.\n", "- Impute missing `bust_size` and `body_type` with the last observed value among reviews of the same `size`.\n", "- Combine `review_summary` and `review_text` into the new feature `review`, and count the words in `review_text` to create the new feature `review_length`.\n", "- Create new features `review_month`, `review_season`, and `review_year` based on `review_date`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def convert_height(x):\n", "    '''\n", "    Converts height from string format as feet and inches to integer in inches.\n", "    '''\n", "    height = [int(i) for i in x.replace('\\'', '').replace('\"', '').split()]\n", "    return height[0]*12 + height[1]\n", "\n", "def preprocess_data(df):\n", "    '''\n", "    Cleans the dataframe using imputation and feature engineering.\n", "    '''\n", "    df.columns = df.columns.str.replace(' ', '_')\n", "    # Copy after filtering so the assignments below modify an independent dataframe\n", "    df = df.dropna(subset=['rating']).copy()\n", "\n", "    df['weight'] = df['weight'].str.replace('lbs', '')\n", "    df['rating'] = df['rating']/2\n", "    df['height'] = df['height'].apply(lambda x: convert_height(x) if pd.notnull(x) else x)\n", "\n", "    to_num = ['rating', 'weight', 'age']\n", "    df[to_num] = df[to_num].apply(pd.to_numeric, errors='coerce')\n", "\n", "    for col in ['height', 'age']:\n", "        df[col] = df[col].fillna(df[col].median())\n", "\n", "    # Fill missing weight with the median weight of reviews at the same height\n", "    weight_map = dict(df.groupby('height')['weight'].median())\n", "    df['weight'] = df['weight'].fillna(df['height'].map(weight_map))\n", "\n", "    for col in ['review_text', 'review_summary']:\n", "        df[col] = df[col].replace('-', np.nan)\n", "    df['review'] = df['review_summary'] + ' ' + df['review_text']\n", "    df['review'] = df['review'].fillna('')\n", "    df['review_length'] = df['review_text'].fillna('').apply(lambda x: len(x.split()))\n", "\n", "    # Replace implausible ages with the median age\n", "    age_limit = (df['age'] > 60) | (df['age'] < 13)\n", "    df['age'] = np.where(age_limit, df['age'].median(), df['age'])\n", "\n", "    # Fill missing bust size and body type with the last value seen for the same size\n", "    for col in ['bust_size', 'body_type']:\n", "        to_map = dict(df.groupby('size')[col].last())\n", "        df[col] = df[col].fillna(df['size'].map(to_map))\n", "\n", "    df['rented_for'] = df['rented_for'].fillna(df['rented_for'].value_counts().index[0])\n", "\n", "    df['review_date'] = pd.to_datetime(df['review_date'])\n", "    df['review_month'] = pd.DatetimeIndex(df['review_date']).month\n", "    # Bins map months 12 and 1-3 to Winter, 4-6 to Spring, 7-9 to Summer, and 10-11 to Fall\n", "    df['review_season'] = pd.cut(df['review_month'].replace(12, 0), [0, 3, 6, 9, 11], include_lowest=True, labels=['Winter', 'Spring', 'Summer', 'Fall'])\n", "    df['review_year'] = pd.DatetimeIndex(df['review_date']).year\n", "    return df\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Create new dataframe for processed data\n", "data = preprocess_data(raw_data)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>weight</th><td>192462.00</td><td>136.92</td><td>20.43</td><td>50.00</td><td>125.00</td><td>135.00</td><td>145.00</td><td>300.00</td></tr>\n",
"    <tr><th>rating</th><td>192462.00</td><td>4.55</td><td>0.72</td><td>1.00</td><td>4.00</td><td>5.00</td><td>5.00</td><td>5.00</td></tr>\n",
"    <tr><th>height</th><td>192462.00</td><td>65.31</td><td>2.66</td><td>54.00</td><td>63.00</td><td>65.00</td><td>67.00</td><td>78.00</td></tr>\n",
"    <tr><th>size</th><td>192462.00</td><td>12.25</td><td>8.50</td><td>0.00</td><td>8.00</td><td>12.00</td><td>16.00</td><td>58.00</td></tr>\n",
"    <tr><th>age</th><td>192462.00</td><td>33.60</td><td>7.39</td><td>14.00</td><td>29.00</td><td>32.00</td><td>37.00</td><td>60.00</td></tr>\n",
"    <tr><th>review_length</th><td>192462.00</td><td>58.40</td><td>43.04</td><td>0.00</td><td>27.00</td><td>50.00</td><td>79.00</td><td>398.00</td></tr>\n",
"    <tr><th>review_month</th><td>192462.00</td><td>6.85</td><td>3.38</td><td>1.00</td><td>4.00</td><td>7.00</td><td>10.00</td><td>12.00</td></tr>\n",
"    <tr><th>review_year</th><td>192462.00</td><td>2015.69</td><td>1.33</td><td>2010.00</td><td>2015.00</td><td>2016.00</td><td>2017.00</td><td>2018.00</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "weight 192462.00 136.92 20.43 50.00 125.00 135.00 145.00 300.00\n", "rating 192462.00 4.55 0.72 1.00 4.00 5.00 5.00 5.00\n", "height 192462.00 65.31 2.66 54.00 63.00 65.00 67.00 78.00\n", "size 192462.00 12.25 8.50 0.00 8.00 12.00 16.00 58.00\n", "age 192462.00 33.60 7.39 14.00 29.00 32.00 37.00 60.00\n", "review_length 192462.00 58.40 43.04 0.00 27.00 50.00 79.00 398.00\n", "review_month 192462.00 6.85 3.38 1.00 4.00 7.00 10.00 12.00\n", "review_year 192462.00 2015.69 1.33 2010.00 2015.00 2016.00 2017.00 2018.00" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.float_format = '{:.2f}'.format\n", "\n", "# Summary statistics for numerical features\n", "data.drop(columns=['user_id', 'item_id']).describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Visualization\n", "\n", "Let's explore and visualize the processed data!\n", "***\n", "\n", "**User Data** I create a separate table for user information by grouping the data by `user_id` and adding new features:\n", "- `rating_count` is the total number of items the user rated and reviewed.\n", "- `rating_average` is the average rating of the items reviewed by the user.\n", "- `rented_for_top` is the user's most common reason for renting an item.\n", "- `category_top` is the most common clothing category among the items reviewed by the user.\n", "- `review_length_average` is the average length, in words, of the text reviews posted by the user.\n", "- `review_month_top` and `review_season_top` are the most common month and season in which the user posted reviews.\n", "- `rented_for_all` is a list of all the user's reasons for renting the items.\n", "- `category_all` is a list of all the clothing categories of the items reviewed by the user.\n", "- `bust_size`, `weight`, `body_type`, `height`, `size`, and `age` are taken from the user's most recent review." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def create_user_data(df):\n", "    '''\n", "    Groups the data by user and returns dataframe containing user information.\n", "    '''\n", "    user_df = pd.DataFrame(df.groupby('user_id').count().reset_index()['user_id'])\n", "\n", "    # Features are merged in the order of df.columns, which the renaming below relies on\n", "    for col in df.columns:\n", "        if col in ['bust_size', 'weight']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('user_id')[col].first()\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col == 'item_id':\n", "            feature = df.groupby('user_id')[col].count()\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col == 'rating':\n", "            # Select the column before aggregating so non-numeric columns are never averaged\n", "            feature = df.groupby('user_id')[col].mean()\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col in ['body_type', 'height', 'size', 'age']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('user_id')[col].first()\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col == 'review_length':\n", "            feature = df.groupby('user_id')[col].mean()\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col in ['review_month', 'review_season']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('user_id')[col].agg(lambda x: x.value_counts().index[0])\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        if col in ['rented_for', 'category']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('user_id')[col].agg(pd.Series.mode).apply(lambda x: x[0] if isinstance(x, np.ndarray) else x)\n", "            user_df = user_df.merge(feature, on='user_id')\n", "        else:\n", "            continue\n", "\n", "    for col in ['rented_for', 'category']:\n", "        feature = df.groupby('user_id')[col].apply(set).apply(list)\n", "        user_df = user_df.merge(feature, on='user_id')\n", "\n", "    user_df.columns = ['user_id', 'bust_size', 'rating_count', 'weight', 'rating_average', 'rented_for_top',\n", "                       'body_type', 'category_top', 'height', 'size', 'age', 'review_length_average',\n", "                       'review_month_top', 'review_season_top', 'rented_for_all', 'category_all']\n", "\n", "    return user_df" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>user_id</th><th>bust_size</th><th>rating_count</th><th>weight</th><th>rating_average</th><th>rented_for_top</th><th>body_type</th><th>category_top</th><th>height</th><th>size</th><th>age</th><th>review_length_average</th><th>review_month_top</th><th>review_season_top</th><th>rented_for_all</th><th>category_all</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>9</td><td>32c</td><td>2</td><td>145.00</td><td>5.00</td><td>formal affair</td><td>pear</td><td>dress</td><td>66.00</td><td>8</td><td>32.00</td><td>121.50</td><td>3</td><td>Winter</td><td>[formal affair, wedding]</td><td>[dress, gown]</td></tr>\n",
"    <tr><th>1</th><td>25</td><td>34ddd/e</td><td>1</td><td>130.00</td><td>5.00</td><td>party</td><td>full bust</td><td>legging</td><td>67.00</td><td>8</td><td>40.00</td><td>14.00</td><td>12</td><td>Winter</td><td>[party]</td><td>[legging]</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " user_id bust_size rating_count weight rating_average rented_for_top \\\n", "0 9 32c 2 145.00 5.00 formal affair \n", "1 25 34ddd/e 1 130.00 5.00 party \n", "\n", " body_type category_top height size age review_length_average \\\n", "0 pear dress 66.00 8 32.00 121.50 \n", "1 full bust legging 67.00 8 40.00 14.00 \n", "\n", " review_month_top review_season_top rented_for_all category_all \n", "0 3 Winter [formal affair, wedding] [dress, gown] \n", "1 12 Winter [party] [legging] " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create new dataframe for user data\n", "user_data = create_user_data(data)\n", "user_data.head(2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "# Sort data by bust size\n", "bust_size_sorted_data = user_data.loc[(user_data['bust_size']>='32a') & (user_data['bust_size']<='38ddd/e')].sort_values('bust_size')\n", "\n", "sns.set_style('whitegrid')\n", "fig, axes = plt.subplots(nrows=2, figsize=(20, 18))\n", "\n", "# Plot the distribution of users by bust size\n", "sns.countplot(x='bust_size', data=bust_size_sorted_data, palette='twilight', ax=axes[0])\n", "axes[0].set_title('User Count by Bust Size', fontsize=16)\n", "axes[0].set_xlabel('Bust Size')\n", "axes[0].set_ylabel('User Count')\n", "axes[0].set_xticklabels(bust_size_sorted_data['bust_size'].unique(), rotation=90, fontsize=12)\n", "\n", "# Plot the distribution of users by size\n", "sns.countplot(x='size', data=user_data, palette='PuBu_r', ax=axes[1])\n", "axes[1].set_title('User Count by Size', fontsize=16)\n", "axes[1].set_xlabel('Size')\n", "axes[1].set_ylabel('User Count')\n", "plt.savefig('data/images/fig0.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "weight_data = user_data.loc[(user_data['weight']>=90) & (user_data['weight']<=210)]\n", "\n", "sns.set_style('whitegrid')\n", "fig, axes = plt.subplots(ncols=3, figsize=(15, 5))\n", "\n", "# Plot the distribution of users by weight\n", "sns.histplot(x='weight', data=weight_data, bins=24, color='darksalmon', kde=True, ax=axes[0])\n", "axes[0].set_title('User Count by Weight', fontsize=16)\n", "axes[0].set_xlabel('Weight')\n", "axes[0].set_ylabel('User Count')\n", "axes[0].grid(axis='x')\n", "\n", "# Plot the distribution of users by height\n", "sns.histplot(x='height', data=user_data, bins=24, color='midnightblue', ax=axes[1])\n", "axes[1].set_title('User Count by Height', fontsize=16)\n", "axes[1].set_xlabel('Height')\n", "axes[1].set(ylabel=None)\n", "axes[1].grid(axis='x')\n", "\n", "# Plot the distribution of users by age\n", "sns.histplot(x='age', data=user_data, bins=24, color='rebeccapurple', kde=True, ax=axes[2])\n", "axes[2].set_title('User Count by Age', fontsize=16)\n", "axes[2].set_xlabel('Age')\n", "axes[2].set(ylabel=None)\n", "axes[2].grid(axis='x')\n", "\n", "plt.savefig('data/images/fig1.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Weight, height, and age are approximately normally distributed and span diverse ranges." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "body_type_values = user_data['body_type'].value_counts().values\n", "body_type_names = user_data['body_type'].value_counts().index\n", "\n", "body_type_circle = plt.Circle((0,0), 0.7, color='white')\n", "\n", "plt.style.use('seaborn')\n", "plt.figure(figsize=(8,8))\n", "colors = ['#DC7F8E', '#E5A1AA', '#F4BFBE', '#FFE0DA', '#F4C4B2', '#E8B08D', '#C68C73']\n", "\n", "# Plot a donut chart of user body type\n", "plt.pie(body_type_values, labels=body_type_names, colors=colors, autopct='%1.0f%%', startangle=40, pctdistance=0.85)\n", "p = plt.gcf()\n", "p.gca().add_artist(body_type_circle)\n", "\n", "plt.title('User Percentage by Body Type', fontsize=16)\n", "plt.savefig('data/images/fig2.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set_style('whitegrid')\n", "fig = plt.figure(figsize=(16, 6))\n", "\n", "# Lay out a narrow left panel and a wide right panel on a 1x5 grid\n", "axes = [plt.subplot2grid((1, 5), (0, 0)), plt.subplot2grid((1, 5), (0, 1), colspan=4)]\n", "\n", "# Categorize rating count into binary classes\n", "item_count_data = user_data.copy()\n", "item_count_data['rating_count'] = item_count_data['rating_count'].apply(lambda x: 'Only one' if x==1 else 'More than one')\n", "\n", "# Plot the distribution of binary classes\n", "sns.countplot(x='rating_count', data=item_count_data, palette=['#5d2349', '#861d23'], order=['Only one', 'More than one'], ax=axes[0])\n", "axes[0].set(xlabel=None)\n", "axes[0].set_ylabel('User Count')\n", "\n", "rating_count_data = user_data.loc[(user_data['rating_count']>=2) & (user_data['rating_count']<=10)]\n", "\n", "# Show the distribution of the second class\n", "sns.countplot(x='rating_count', data=rating_count_data, palette='Reds_r', ax=axes[1])\n", "axes[1].set_title('User Count by Number of Items Rated', fontsize=16)\n", "axes[1].set_xlabel('Item rating count', fontsize=12)\n", "axes[1].set(ylabel=None)\n", "\n", "plt.savefig('data/images/fig3.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The left chart shows that two thirds of users rented only one item, while the remaining third rented more than one. The right chart shows that the majority of those who rented more than once rented exactly two items." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set_style('whitegrid')\n", "fig, axes = plt.subplots(ncols=2, figsize=(15, 5))\n", "\n", "# Plot the distribution of users by average rating\n", "sns.histplot(x='rating_average', data=user_data, bins=10, color='thistle', ax=axes[0])\n", "axes[0].set_title('User Count by Average Rating', fontsize=16)\n", "axes[0].set_xlabel('Average Rating')\n", "axes[0].set_ylabel('User Count')\n", "axes[0].grid(axis='x')\n", "\n", "# Plot the distribution of users by average review length\n", "sns.histplot(x='review_length_average', data=user_data, bins=30, color='lightsteelblue', kde=True, ax=axes[1])\n", "axes[1].set_title('User Count by Average Review Length', fontsize=16)\n", "axes[1].set_xlabel('Average Review Length')\n", "axes[1].set(ylabel=None)\n", "axes[1].grid(axis='x')\n", "\n", "plt.savefig('data/images/fig4.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the left chart, the average rating per user is left-skewed, with most users giving the highest rating. On the right chart, the average length of text review per user is right-skewed." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set_style('whitegrid')\n", "fig, axes = plt.subplots(ncols=2, figsize=(15, 5))\n", "\n", "# Plot the distribution of users by month of review posted\n", "sns.countplot(x='review_month_top', data=user_data, palette='PuBuGn', ax=axes[0])\n", "axes[0].set_title('User Count by Top Month Rated', fontsize=16)\n", "axes[0].set_xlabel('Top Month Rated')\n", "axes[0].set_ylabel('User Count')\n", "\n", "# Plot the distribution of users by season of review posted\n", "sns.countplot(x='review_season_top', data=user_data, order=['Spring', 'Summer', 'Fall', 'Winter'], palette='PuBuGn', ax=axes[1])\n", "axes[1].set_title('User Count by Top Season Rated', fontsize=16)\n", "axes[1].set_xlabel('Top Season Rated')\n", "axes[1].set(ylabel=None)\n", "\n", "plt.savefig('data/images/fig5.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set_style('whitegrid')\n", "fig, axes = plt.subplots(ncols=2, figsize=(16, 8))\n", "\n", "# Show top five clothing categories and top five reasons for rent\n", "category_top_data = user_data.loc[user_data['category_top'].isin(user_data['category_top'].value_counts()[:5].index.tolist())]\n", "rented_for_top_data = user_data.loc[user_data['rented_for_top'].isin(user_data['rented_for_top'].value_counts()[:5].index.tolist())]\n", "\n", "category_top_values = (category_top_data['category_top'].value_counts(normalize=True).values*100).tolist()\n", "category_top_labels = category_top_data['category_top'].value_counts().index.tolist()\n", "category_colors = ['#E5E4F4', '#E8F1DE', '#FDF9F0', '#F2E6F0', '#D9E4FB']\n", "\n", "# Plot the distribution of users by clothing category\n", "axes[0].pie(category_top_values, labels=category_top_labels, colors=category_colors, autopct='%.0f%%')\n", "axes[0].set_title('User Percentage by Top 5 Clothing Category', fontsize=16)\n", "\n", "rented_for_top_values = (rented_for_top_data['rented_for_top'].value_counts(normalize=True).values*100).tolist()\n", "rented_for_top_labels = rented_for_top_data['rented_for_top'].value_counts().index.tolist()\n", "rented_for_colors = ['#FBFBFB', '#FAEDDA', '#D2C1CE', '#E1CEC9', '#F3C0A1']\n", "\n", "# Plot the distribution of users by reason for rent\n", "axes[1].pie(rented_for_top_values, labels=rented_for_top_labels, colors=rented_for_colors, autopct='%.0f%%')\n", "axes[1].set_title('User Percentage by Top 5 Reason for Rent', fontsize=16)\n", "\n", "plt.savefig('data/images/fig6.png', dpi=200, transparent=True) \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most common clothing categories are dresses and gowns, which align with the most common reasons for renting: weddings, formal affairs, and parties. 
\n", "***\n", "**Item Data** I create a separate table for item information by grouping the data by `item_id` and adding new features:\n", "- `fit_small`, `fit_large`, and `fit` are the counts of users who rated the item as too small, too large, or the right fit, respectively.\n", "- `user_count` is the total number of users who rated and reviewed the item.\n", "- `bust_size_top` and `body_type_top` are the most common bust size and body type of the users who rented the item.\n", "- The `mean` and `median` of the `weight`, `height`, `size`, and `age` of all users who rented the item are stored as `weight_mean`, `weight_median`, and so on.\n", "- `rating_average` is the average of all user ratings of the item." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def create_item_data(df):\n", "    '''\n", "    Groups the data by item and returns dataframe containing item details.\n", "    '''\n", "    item_df = pd.DataFrame(df.groupby('item_id').count().reset_index()['item_id'])\n", "\n", "    # Features are merged in the order of df.columns, which the renaming below relies on\n", "    for col in df.columns:\n", "        if col == 'fit':\n", "            feature_small = df.loc[df[col]=='small'].groupby('item_id')[col].count()\n", "            feature_fit = df.loc[df[col]=='fit'].groupby('item_id')[col].count()\n", "            feature_large = df.loc[df[col]=='large'].groupby('item_id')[col].count()\n", "            for idx, feature in enumerate([feature_small, feature_fit, feature_large]):\n", "                item_df = item_df.join(feature, on='item_id', rsuffix=str(idx)).fillna(0)\n", "        if col == 'user_id':\n", "            feature = df.groupby('item_id')[col].count()\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        if col in ['bust_size', 'body_type']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('item_id')[col].agg(pd.Series.mode).apply(lambda x: x[0] if isinstance(x, np.ndarray) else x)\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        if col in ['weight', 'height', 'size', 'age']:\n", "            # Select the column before aggregating so non-numeric columns are never averaged\n", "            feature_mean = df.groupby('item_id')[col].mean()\n", "            feature_median = df.groupby('item_id')[col].median()\n", "            for feature in [feature_mean, feature_median]:\n", "                item_df = item_df.merge(feature, on='item_id')\n", "        if col in ['rating', 'review_length']:\n", "            feature = df.groupby('item_id')[col].mean()\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        if col in ['rented_for', 'category']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('item_id')[col].agg(pd.Series.mode).apply(lambda x: x[0] if isinstance(x, np.ndarray) else x)\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        if col == 'rented_for':\n", "            feature = df.groupby('item_id')[col].apply(set).apply(list)\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        if col in ['review_month', 'review_season']:\n", "            feature = df.sort_values('review_date', ascending=False).groupby('item_id')[col].agg(lambda x: x.value_counts().index[0])\n", "            item_df = item_df.merge(feature, on='item_id')\n", "        else:\n", "            continue\n", "\n", "    item_df.columns = ['item_id', 'fit_small', 'fit', 'fit_large', 'user_count', 'bust_size_top', 'weight_mean',\n", "                       'weight_median', 'rating_average', 'rented_for_top', 'rented_for_all', 'body_type_top',\n", "                       'category_top', 'height_mean', 'height_median', 'size_mean', 'size_median', 'age_mean',\n", "                       'age_median', 'review_length_average', 'review_month_top', 'review_season_top']\n", "\n", "    return item_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This table is used later for item-to-item recommendations:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>item_id</th><th>fit_small</th><th>fit</th><th>fit_large</th><th>user_count</th><th>bust_size_top</th><th>weight_mean</th><th>weight_median</th><th>rating_average</th><th>rented_for_top</th><th>...</th><th>category_top</th><th>height_mean</th><th>height_median</th><th>size_mean</th><th>size_median</th><th>age_mean</th><th>age_median</th><th>review_length_average</th><th>review_month_top</th><th>review_season_top</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>123373</td><td>73.00</td><td>566.00</td><td>47.00</td><td>686</td><td>36d</td><td>140.67</td><td>135.00</td><td>4.40</td><td>formal affair</td><td>...</td><td>gown</td><td>65.39</td><td>65.00</td><td>15.12</td><td>13.00</td><td>34.36</td><td>33.00</td><td>66.08</td><td>12</td><td>Winter</td></tr>\n",
"    <tr><th>1</th><td>123793</td><td>65.00</td><td>1497.00</td><td>152.00</td><td>1714</td><td>34b</td><td>132.98</td><td>130.00</td><td>4.77</td><td>formal affair</td><td>...</td><td>gown</td><td>65.06</td><td>65.00</td><td>9.72</td><td>8.00</td><td>31.31</td><td>31.00</td><td>74.41</td><td>5</td><td>Winter</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2 rows × 22 columns</p>
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "0 123373 73.00 566.00 47.00 686 36d \n", "1 123793 65.00 1497.00 152.00 1714 34b \n", "\n", " weight_mean weight_median rating_average rented_for_top ... \\\n", "0 140.67 135.00 4.40 formal affair ... \n", "1 132.98 130.00 4.77 formal affair ... \n", "\n", " category_top height_mean height_median size_mean size_median age_mean \\\n", "0 gown 65.39 65.00 15.12 13.00 34.36 \n", "1 gown 65.06 65.00 9.72 8.00 31.31 \n", "\n", " age_median review_length_average review_month_top review_season_top \n", "0 33.00 66.08 12 Winter \n", "1 31.00 74.41 5 Winter \n", "\n", "[2 rows x 22 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create new dataframe for item data\n", "item_data = create_item_data(data)\n", "item_data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time Series Analysis" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.style.use('seaborn')\n", "\n", "time_series_data = data.sort_values('review_date').set_index('review_date', drop=True).drop('2010-11-03')\n", "\n", "# Resample data to yearly count of reviews\n", "yearly_data = time_series_data.resample('Y').count()\n", "yearly_data = yearly_data.drop(yearly_data.index[-1])\n", "\n", "# Plot the aggregated yearly count of reviews\n", "yearly_data['rating'].plot(figsize=(8,5), colormap='PRGn', xlabel='')\n", "\n", "plt.title('Total Count of Reviews By Year', fontsize=16)\n", "plt.savefig('data/images/fig7.png', dpi=200, transparent=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The count of reviews increased over the years from 10,000 in 2013 to almost 70,000 by 2018. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeoAAAFACAYAAABz6j+yAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAABX7UlEQVR4nO3dd1yVZf/A8c9hbxFlOABFBRVFUcQVuFMz09JMTduu0tLKtGH2qA17cqX5NDQt3JlWVq7MraDiSlwIKrgQEJEp49y/P/ido8Q6h3PggH7fr1ev5D73ua7rvhjfc22VoigKQgghhKiSzExdACGEEEKUTAK1EEIIUYVJoBZCCCGqMAnUQgghRBUmgVoIIYSowiRQC5N72BYePGzPKyqG/Bw9PCRQi3KZOnUqfn5+pf63cOHCMtP566+/mD59ul55L1y4kMDAwDLvS09PZ/HixfTv35/AwEAeeeQRxo4dy7Fjx/TKz5jWrVvH/PnzDU5n//799OrVi5YtWzJz5sxi7ynue9KiRQu6du3K9OnTSU9PN7gc9+vevTszZswwapqG2LBhQ5HnDwwM5Omnn2bHjh0GpR0REaFN8/z588Xes2jRIvz8/BgzZoxBef1bTk4Os2bNKvQMVa3uhXFZmLoAonp69dVXGTp0qPbrKVOm4O3tzauvvqq95uHhUWY6P/zwA3Z2dkYv3/Xr13nxxRdJT0/n+eefx9/fn4yMDNauXcuzzz7L3Llz6dOnj9HzLcvXX39N165dDU5nzpw52NjY8N1331GnTp0S7xs5ciSPP/649uuMjAwOHjzIkiVLSElJ4csvvzS4LBqLFi3CycnJaOkZy5IlS3B0dEStVpOWlsbmzZuZMGECYWFhtG3b1qC0VSoV27Ztw9fXt8hrW7duNSjtkty8eZOwsDCCgoIqJH1R9UigFuXi5eWFl5eX9msbGxtcXFxo3bq16Qp1nylTppCens66deuoW7eu9nqPHj0YO3Ys06ZN45FHHsHBwcGEpSy/27dv06VLFzp06FDqfXXq1CnyPencuTPXr1/njz/
+ICMjA3t7e6OUqXnz5kZJx9j8/f1xcXHRft2lSxcOHz7M+vXrDQ7UgYGBbNu2jfHjxxe6HhMTQ0xMDI0bNzYofSFAur5FBVIUhXXr1tG/f38CAgJ49NFHWb58ufb1kSNHcujQIXbt2oWfnx9XrlwBYO/evYwYMYLAwEBatmzJgAED2LZtm875njp1ioiICF555ZVCQRrAzMyMSZMm8fTTT5OWlqa9vn37dgYNGkTr1q3p0qUL8+fPJzc3V/t6cV2LH3/8Md27d9d+7efnx4YNG5g0aRKBgYG0b9+ejz/+mLy8PG0aV69eZeXKlfj5+ZVY/oyMDGbPnk337t0JCAhg8ODB7Nu3D4ArV67g5+fH1atXWbVqVaF600dxH1D279/P008/TUBAAKGhoSxYsID8/HygYLihTZs25OTkFHrP66+/zrPPPltsHSUnJ/POO+8QHBxMYGAgY8eOJT4+HigY8vh32T/++GP8/Py09wDMmDGDwYMHA3DixAmeffZZAgMDCQ4O5vXXX+fq1at6PzuAo6Oj9t9hYWE0b96cpKSkQvdMmzaNp556qtR0evfuzblz57h8+XKh61u2bKF9+/aFPiBA6d9buNelfuTIEYYOHUrLli3p0aMHP/30E1Dw/e/RowcAb7zxBiNHjtS+Nzs7m48++ojg4GDatm2r/bAqqj8J1KLCzJ07l48++oju3buzePFi+vTpw+eff868efMAmD59Os2bN6dNmzasXbsWNzc3Tp48yejRo2nSpAmLFy9m3rx52Nra8tZbb3Hr1i2d8t2/fz8AoaGhxb7erFkz3nnnHW2X8dq1axk/fjwtW7Zk0aJFjBgxgu+//553331X72f+5JNPcHFxYfHixTz77LP8+OOPrFu3DijoGnZ1daV3796sXbu22Per1WpeeeUVNmzYwOjRo1m4cCF169Zl9OjR7N27Fzc3N9auXVsoHTc3txLLo1arycvL0/6XkpLCr7/+yi+//ELPnj21remDBw8yatQo6tevz6JFi3j55ZdZtmwZs2bNAuDxxx8nIyODvXv3atPOzMxkz5499OvXr0i+2dnZPPfcc0RGRvLBBx/w+eefk5SUxIgRI0hNTaVDhw5YWloSHh6ufc+hQ4cAiIyM1F7bv38/oaGhZGVlMXr0aNzd3Vm8eDEzZ87k9OnTvPnmm2V+T+6vg9u3b7Ny5Uqio6N5+umnAejXrx9mZmZs3rxZ+56cnBy2bt3KgAEDSk27TZs2uLq6FvkguWXLliJDK2V9b+/35ptv0rt3b7799luaN2/OBx98wIULF3Bzc2PRokXae+6f37Fx40ZSU1OZP38+EyZMYNOmTdp7RfUmXd+iQqSkpLBs2TJefvllJk2aBMAjjzyCoigsXbqU559/nsaNG+Pg4ICdnZ22ezY6OppevXoV+gNUt25dnnzySU6cOEG3bt3KzPvGjRsA1KtXr8x71Wo18+fPp1+/fnz00Ufacjo6OjJ9+nReeeUVmjZtqvNzBwYGMm3aNAA6duzIzp072bNnD8OHD6d58+ZYWVlRu3btEocIdu3axdGjR1myZAkhISFAQVftM888w7x589iwYQOtW7cuMx2NL774gi+++KLQNWdnZwYPHqz9vgDMnz+fVq1aaT9EhYaGUqNGDd59911efvllGjZsiL+/P1u2bNG26Hbu3Elubm6xY/2//PILFy9eZNOmTTRq1EhbH926dSMsLIzx48cTGBhIREQEgwcPJjU1lfPnz9O8eXOOHDnCwIEDuXr1KpcuXaJLly5ER0dz+/ZtRo4cqZ1IWLNmTcLDw1Gr1ZiZldzm6Ny5c5Fr96fj4uJCaGgov//+u7aFumfPHjIyMor9EHI/lUpFz5492b59O6NGjQIgNjaWmJgYevXqxe+//669t6zvreaapnwvvvgiUNB1v337dvbs2cNLL71Es2bNAPD29i7Utd6wYUPmzp2LSqWiU6dOhIeHExERUWr5RfUgLWpRIU6cOFHsH/F+/fq
Rm5vLiRMnin3foEGD+PLLL8nMzOSff/5h06ZNrFy5EqBIt2tJzM3NgYIgXJaYmBhu3bpVpJyaCVhHjhzRKU+NVq1aFfra3d2dzMxMnd9/+PBh7O3tC/3RBnjssceIiorSuyvzueeeY/369fz000+MHz8eS0tLRo8ezbRp07Td31lZWZw8eZJu3boVan2HhoaiVqu1f+z79+/P33//rf0+bN68mU6dOhXp3oWCLlxvb2+8vb216dnY2NC2bVttKzokJESb9uHDh3Fzc6Nfv37aFvW+ffuoWbMmLVu2xMfHB2dnZ8aOHcuMGTPYvXs3rVu35vXXXy81SAMsX76c9evXs379epYvX87o0aNZuXIln332mfaegQMHcvz4cW23+2+//Ubnzp2pXbt2mXXcu3dvTp48qf2AuGXLFoKDg4vUiz7f2/s/gDk5OWFnZ1fmz1GrVq1QqVTar+vXr8+dO3fKLL+o+iRQiwqRmpoKUOQPXa1atQBKDDiZmZm8/fbbtGvXjqFDh/LNN99w9+5dQPd1o5px6WvXrpV4j+aPqqacmnJpODg4YG1trXdgtLW1LfS1mZmZXutd79y5U2xw0FzLyMjQqzweHh60bNmSgIAAJkyYwJgxY/j8889Zv359oTzVajVz5szB399f+1/Hjh0BSExMBAoCSmZmJvv27dN2g5fU4rx9+zaxsbGF0vP392fnzp3a9EJDQ0lISODixYtEREQQFBRE27ZtiY2N5datW+zfv5+QkBDMzMxwcHBgxYoVdOzYkY0bNzJ69Gg6d+7MqlWryqwDPz8/WrZsScuWLenYsSNvvfUWTz/9NGFhYdqydO3aFWdnZ/7880/S09PZtWtXmd3eGsHBwTg7O7N9+3agYLZ3cb0M+nxvbWxsCt2jy8/Rv3/2VCqVrLV+QEjXt6gQzs7OACQlJeHu7q69rpmwo3n932bOnMn+/fv59ttvadeuHVZWVly4cIFNmzbpnHenTp2Agklpmm7X+508eZKnn36a2bNn06JFC6Bg4tP97ty5w927dwuV898tdH1ayrqqUaNGkUlNcC9YllRvuho7dixbtmzhk08+ISQkBHd3d+049bhx47Td2vfTjIG7u7sTFBTEtm3btM/es2fPYvNxdHSkadOm2jHu+1lZWQHQtGlT3NzciIiI4MiRIwwZMoQWLVpga2vLoUOHCA8P1w4jADRp0oT58+eTk5NDZGQkP/zwA//5z3/w9/cv0pNRFj8/P/Lz87l69Squrq5YWVnx2GOPsXXrVurWrYuFhUWxdVEcc3NzevTowbZt2wgNDSU6OppHH320yH0V/b0VDy5pUYsK0bJlSywtLdmyZUuh63/++ScWFhYEBAQAFOm2PH78OCEhIXTu3Fn7B10z0UbX1kHTpk1p3749S5YsISEhodBrarWaBQsWYGdnR48ePWjYsCE1a9YstpxQMFkIClrYN2/eLJROeTZOKaubtm3btkUmbUFBN7O/vz/W1tZ653k/S0tL3nvvPTIyMpgzZw5Q8GxNmzYlPj5e2/LUfP/mzp2r7X2Agu7vPXv2sHXrVrp27Vri8rY2bdpw5coV6tWrp02vRYsWLF++nF27dmnvCwkJYefOnZw7d4527dphaWlJ69atWb58OWlpaTzyyCNAwZhxx44duXXrFlZWVnTs2FEbxEvrOSnJqVOnMDMzo379+tprAwcOJCoqijVr1tC7d+8irdrSPProo0RGRrJu3bpiu73BeN9bzdCOeHhIi1pUCBcXF0aOHMnSpUsxNzenXbt2HD58mKVLl/Liiy9So0YNoGD87cyZM0RERNCqVStatmzJ33//zcaNG6lTpw7h4eEsXboUKJhJrKsZM2YwcuRIBg8ezIsvvkizZs1ITk5m5cqVnDhxgoULF2qX6IwfP56ZM2dSo0YNevTowblz51i4cCF9+vTRbmQRGhrKsmXLCAsLo3HjxqxZs4bk5GS91yA7OTkRFRXF4cOHCQoKKjSmCAVdsK1atWL
y5MlMmjSJOnXqsGHDBk6cOMHXX3+tV14l6dy5M6Ghofz222+MGDGCgIAAXn/9dV577TUcHBzo1asXKSkpzJ8/HzMzs0KbefTu3ZsZM2awY8cOFixYUGIegwcPJiwsjJdeeonRo0fj7OzM2rVr2bZtG0888YT2vpCQECZOnEjNmjW1vR9BQUHa3edq1qwJQEBAAIqiMH78eEaNGoWlpSU//PADTk5OtG/fvtTnjYqK0n6v8/Ly2Lt3Lxs3bmTAgAGFuqJbtWqFj48PR44cYcKECXrVaadOnbC3t+eHH37ggw8+KPYeY31vNc9y4MABGjRooNdkR1E9SaAWFWby5MnUrFmTtWvXsmTJEurVq8c777zD888/r73nhRdeYNKkSbzyyiv88MMPTJ06lezsbD755BMAGjVqxKJFi/jkk084duwYTz75pE55N2jQgJ9++omlS5eyZs0aEhIScHR0pGXLlqxZs0bbogcYMWIENjY2fP/99/z000+4ubnx4osvFtplbezYsSQmJjJv3jwsLCx44oknGDNmDCtWrNCrTsaMGaOdTb5169Yiu7eZm5uzZMkSvvjiC+bNm0dWVhbNmjXj22+/LXG5WXm888477N+/n08++YQ1a9bQo0cPFi9ezFdffcWGDRtwcHCgU6dOvP3224XGPmvUqEFISAiHDx+mS5cuJabv4ODAypUr+fzzz/noo4/IycnRLrm7/32dO3fG3Ny80IeW4OBgoPDyOmdnZ5YsWcKcOXN45513yM3NJSAggGXLlhXber3fK6+8ov23paUl9erV48033yz0c6gREhJCZmamtgy6srS0pFu3bvz+++/FdnuD8b63Dg4OjBo1ihUrVnDs2DG9hoVE9aRSZLaBEEIABasSevbsWWjpmhCmJi1qIcRDTVEUvvrqK6KiooiPjy+0h70QVYEEaiHEQ02lUrFlyxaSkpKYNWtWqYecCGEK0vUthBBCVGGyPEsIIYSowiRQCyGEEFVYlRyjzsvLJyXF+Ls+PYhq1rSTutKD1JfupK50J3WlH6mvolxdHUt8rUq2qC0sZOcdXUld6UfqS3dSV7qTutKP1Jd+qmSgFkIIIUQBCdRCCCFEFSaBWgghhKjCJFALIYQQVZgEaiGEEKIKk0AthBBCVGESqIUQQogqTAK1EEIIUYVJoBZCCCGqMAnUQgghhJHk5OSxbNlfZGXlGC1NCdRCCCGEkfz++yGmTFnGmjV7jJamBGohhBDCSK5cSQLg4sUbRktTArUQQghhJAkJtwGIi0s0WpoSqIUQQggj0QTq+Pgko6UpgVoIIYQwkhs3UgCIi7tptDQlUAshhBBGomlRp6ZmkpqaYZQ0JVALIYQQRqAoCjdv3tZ+baxxagnUQgghhBGkpWUVWj9trHFqCdRCCCGEEWjGp11cHABpUQshhBBVimZ8OiioCQDx8RKohRBCiCpDE6jbtSsI1MZqUVuUdUNubi5Tp07l6tWrmJmZMXPmTCwsLJg6dSoqlYomTZowffp0zMzMWLduHWvWrMHCwoJx48bRrVs3srOzmTx5MsnJydjb2zN79mxcXFyMUnghhBCiqtAEaj+/+jg42FRe1/fu3bvJy8tjzZo1vPbaa8yfP59PP/2UiRMnsmrVKhRFYceOHSQmJhIWFsaaNWtYunQpc+fOJScnh9WrV+Pr68uqVasYOHAgixcvNkrBhRBCiKpEM0bt4VETT09X4uISURTF4HTLDNQNGzYkPz8ftVpNeno6FhYWREVFERwcDEBoaCgHDhzg5MmTBAYGYmVlhaOjI15eXpw9e5bIyEhCQkK09x48eNDgQgshhBBVjWZplru7M97ermRkZJOSkm5wumV2fdvZ2XH16lX69u1LSkoKX3/9NYcPH0alUgFgb29PWloa6enpODo6at9nb29Penp6oeuae3Xh6upY9k0CkLrSl9SX7qSudCd1pZ8Hsb5u3UpDpVLRvHl9fH3rsWXLUdLSMvDzq2tQumUG6uXLl/P
II4/w1ltvcf36dZ5//nlyc3O1r2dkZODk5ISDgwMZGRmFrjs6Oha6rrlXF4mJugX0h52rq6PUlR6kvnQndaU7qSv9PKj1deVKMrVqOZKSkomraw0ATp68jLe3R5nvLe2DS5ld305OTtoWcY0aNcjLy6N58+ZEREQAsGfPHoKCgggICCAyMpK7d++SlpZGTEwMvr6+tGnTht27d2vvbdu2bdlPK4QQQlQzN26k4OFREwBPT1cALl82fEJZmS3qF154gffee4/hw4eTm5vLpEmTaNGiBdOmTWPu3Ln4+PjQu3dvzM3NGTlyJMOHD0dRFCZNmoS1tTXDhg1jypQpDBs2DEtLS+bMmWNwoYUQQoiqJD09m4yMbNzdnQHw8ioI1MZYS11moLa3t2fBggVFrq9YsaLItSFDhjBkyJBC12xtbfnyyy8NKKIQQghRtd0/kQzuBWpjLNGSDU+EEEIIAyUkFCzN0gRqJyc7nJ3tjdKilkAthBBCGOjGjdsAuLk5a695eroSH59k8FpqCdRCCCGEgTS7kmkmk0FB93dWVg6JiXcMSlsCtRBCCGGgf3d9A3h61gYgLu6mQWlLoBZCCCEMpGlR3x+ovb3dAMPPpZZALYQQQhhIE6gLj1FLi1oIIYSoEhISUnBxccDa2lJ7zcuroEUdFyctaiGEEMKkEhJuF+r2BqhfX1rUQgghhMllZeWQmppZqNsbwMHBhtq1nWSMWgghhDClexPJahZ5zdOzNleuJKFWq8udvgRqIYQQwgCapVkeHs5FXvPyciUnJ08bzMtDArUQQghhgH/v830/Y5yiJYFaCCGEMEBpXd/GOEVLArUQQghhgBs3Crq+/z2ZDIxzipYEaiGEEMIAxe3zrSEtaiGEEMLEits+VOPeWmoJ1EIIIYRJJCTcxsnJDltbqyKv2dhY4e7uLIFaCCGEMJWEhJRiW9Manp6uXL2aTF5efrnSl0AthBBClFNOTh63bqUXOz6t4eXlSn6+muvXb5UrDwnUQgghRDlp1lAXN+Nbw9CZ3xKohRBCiHIqbSKZhqEzvyVQCyGEEOWkWUOtS6Au7+5kEqiFEEKIciptDbWGZhvR8p6iJYFaCCGEKKfS9vnWqFevFmZmqnKfSy2BWgghhCgnXcaorawsqFPHRVrUQgghRGXTJVBDwTj1tWu3yMnJ0zsPCdRCCCFEOd24kYK9vQ0ODral3ufp6YqiKFy9mqx3HhKohRBCiHJKSLhdZmsaDFtLLYFaCCGEKIe8vHySku7oGKgLDucoz1pqCdRCCCFEOSQl3UFRFB0DtRsgLWohhBCi0uiy2YmGp2dBi/ryZf2XaEmgFkIIIcrh3ozvkjc70ahXrxYODjacOnVZ73wkUAshhBDloOvSLAAzMzMCAxsRHX2N1NQMvfKRQC2EEEKUQ0KC7l3fAG3aNALg2LFYvfKRQC2EEOKBcutWWrlPqtLHjRu3Af0D9dGjF/TKRwK1EEKIB8rLLy+gU6fJbNx4sELz0ezzXdqBHPdr06YxAEePxuiVjwRqIYQQD4z8fDVHj8Zw924uY8YsYt68X1AUpULySki4jY2NJU5Odjrd7+7uTP36tTh6NEavMkmgFkII8cC4ePEGWVk5dOjgR/36tfj005+YOPG7cu2xXZaEhBTc3JxRqVQ6vycwsBFJSXf0Wk8tgVoIIcQD4/TpeAD69GnL5s3/oXVrH1av3s2wYZ/rPdu6NPn5am7eTNV5fFpD0/197Jju3d8SqIUQQjwwoqIK1in7+3vj7l6TjRvfp2/fIPbujaJfv4/KteFIcZKT08jPV+s8Pq3Rtm3BhLLISAnUQgghHkJRUXEANG/uCYC9vQ3ff/8GY8f25fz5a/TtO53ExFSD89F3aZZGQEBDzM3N9Jr5LYFaCCHEA+P06Tjc3Jxxda2hvWZubsaMGSMYN+4xkpLucPDgWYPz0cz41jdQ29lZ06yZJ//8c4ncXN3GzSVQCyGEeCDcvp3BlSvJ+Pt7Fft6SIg
/ADEx1w3OS9811Pdr06YR2dm52vH0skigFkII8UA4c6Zwt/e/+fh4ABATc8PgvO51fes3Rg3Qtm3BhLLISN26vyVQCyGEeCBoxqf9/b2Lfd3TszYWFubExhojUN8Gyt+iBt03PpFALYQQ4oHw74lk/2ZpaYG3txuxsYZ3fcfHJwEFp2Lpq0mTujg62uo8oUwCtRBCiAfC6dNxWFlZ0KRJ3RLvadTIg1u30rl1K82gvKKjr+HqWoMaNez1fq+ZmRmtW/tw4cJ1bt8ue223BGohhBDVXn6+mrNnr+DrWw9LS4sS72vYsGCc2pDu7+zsHOLjE2ncuE6509B0fx8/XvZJWhKohRBCVHuxsQVbhzZvXvyMb41GjQwP1BcvJqBWK0YJ1Lp0f0ugFkIIUe2dPq2ZSFZWoC4IroYE6gsXCsa4GzcuuYu9LPqcpCWBWgghRLV3b8a3bi1qQ9ZSX7hwDcCgFrU+J2mV3JF/n2+++Ya///6b3Nxchg0bRnBwMFOnTkWlUtGkSROmT5+OmZkZ69atY82aNVhYWDBu3Di6detGdnY2kydPJjk5GXt7e2bPno2Li0u5H04IIYT4N80e32V1fXt41MTW1sqgtdTGaFFDQav6t98iiItLxM3NqcT7ymxRR0REcOzYMVavXk1YWBg3btzg008/ZeLEiaxatQpFUdixYweJiYmEhYWxZs0ali5dyty5c8nJyWH16tX4+vqyatUqBg4cyOLFiw16MCGEEOLfTp+Ox93dmdq1Sw54UDDjumFDD2Jjb5T7nOqYmOtYWVng5eVarvdr6LqeusxAvW/fPnx9fXnttdcYO3YsXbt2JSoqiuDgYABCQ0M5cOAAJ0+eJDAwECsrKxwdHfHy8uLs2bNERkYSEhKivffgwYMGPZgQQghxv5SUdK5eLXnr0H9r1MiDzMy72k1L9KEoCtHR1/Dx8cDc3LDR43vj1KVPKCuz6zslJYVr167x9ddfc+XKFcaNG4eiKNqDsu3t7UlLSyM9PR1HR0ft++zt7UlPTy90XXOvLlxdHcu+SQBSV/qS+tKd1JXupK70Y8z6ioq6BEBQUGOd0m3Z0ptNmw6RnJxKy5a6BXeNGzdSSEvLonlzT4OfoUePlpibm3Hy5KVS7yszUDs7O+Pj44OVlRU+Pj5YW1tz48a9vv2MjAycnJxwcHAgIyOj0HVHR8dC1zX36iIx0bDF6A8LV1dHqSs9SH3pTupKd1JX+jF2fR04UHAaVsOGHjql6+FRME8qMjIWf/8GeuUVHn4eAE9PV6M8Q/PmXoZ3fbdt25a9e/eiKAoJCQlkZWXRsWNHIiIiANizZw9BQUEEBAQQGRnJ3bt3SUtLIyYmBl9fX9q0acPu3bu197Zt29bgBxNCCCE0ytrj+9/uHc6h/8xvY00k02jTphF37+aWek+ZLepu3bpx+PBhBg8ejKIofPjhh9SvX59p06Yxd+5cfHx86N27N+bm5owcOZLhw4ejKAqTJk3C2tqaYcOGMWXKFIYNG4alpSVz5swxysMJIYQQUDCRzMrKQrv0qiyatdQXL+o/89sYS7Pu17ZtI374YUep9+i0POudd94pcm3FihVFrg0ZMoQhQ4YUumZra8uXX36pSzZCCCGEXvLy8jl7Nh4/v/qlbh16PxcXB5yd7cu1ROtei9o4gTowsFGZ98iGJ0IIIaqt2NgbZGfn6jzjG0ClUtGokQeXLiWQl5evV34XLlwv92EcxdGcpFUaCdRCCCGqLc3WoWVtdPJvDRt6kJubrz2uUhfZ2TnExSWWejqXvszMzOjTp/S5WxKohRBCVFu6bh36b+UZp754MQFFUbTvNZZFi8aW+roEaiGEENWWJlDr26Iuz57fxp5IpqHZl6QkEqiFEEJUW6dPx+HhUZNatfTbfETTKtZnQplmIpkxu751IYFaCCFEtZSSks61a7f07vYGaNjQHdCvRR0dXdCiNnbXd1kkUAshhKiWdD0xqzgODra4uztz8WKCzu8x1mE
c+pJALYQQolo6fToe0H8imUajRnWIj08iOzunzHuNeRiHviRQCyGEqJbKO+Nbo1EjDxRF4dKlm2Xee/PmbdLTs40+kUwXEqiFEEJUS1FRl7G2tiz3mHHDhrrP/Db2Ht/6kEAthBCi2snLy+fcuav4+dXDwsK8XGncW6JV9sxvU00kAwnUQgghqqEzZ+K5ezeXli0blDsNfTY90bS6K3tpFkigFkIIUQ0dOlRwLnRwsG+50/D2dsPMTKVT17emRS1j1EIIIYQOjBGora0t8fR01anr+8KF67i5OePkZFfu/MpLArUQQohq5/Dh89Su7YSPj25nUJfEx8eDxMRU0tIyS7wnKyuH+Pgkk7SmQQK1EEKIaubatWSuXEkmKKhJmftkl0UzoSw2tuSNTy5evIGiKBKohRBCCF1our3btWticFr39vwueZxa85oplmaBBGohhBDVjDHGpzU0XeelBWpTTiQDCdRCCFFt3bmTSb9+HzF16nJTF6VSHTp0HisrC1q1amhwWvcCdckTyky52QlIoBZCiGopLy+fUaMWcvhwNJs3HzF1cSpNeno2UVFxtGrVEBsbK4PTq1+/NlZWFqWupb5w4fr/zxCvbXB+5SGBWgghqqFp01awc+dJAG7cuM3du7kmLlHlOHYshvx8tVG6vQHMzc1o2NCdmJiCCWP/pigKFy5cw8fHvdIP49CQQC2EENXM999vZ+nSbTRrVp/HH2+HoihcuZJk6mJVCmOOT2v4+NThzp1MkpLuFHktIaHgMA5TbB2qIYFaCCGqkZ07T/L++z9Su7YTYWFv4+/vDcDly2WfAPUg0ATqoCDDZ3xr+Pi4A8WPU1+4oJlIZprxaZBALYQQ1cb581d55ZUvMTc3Y/nySXh5ueLl5QrA5cuJJi5dxcvPV3PkSDQ+Ph64utYwWrqa1nJsbNGZ39HRmolk0qIWQghRiuTkNJ599gvS0rKYP3+0tutXE6jj4h78FvW5c1dIS8syarc33Gstv/fej7z66mJ27fqH/Hw1YPo11AAWJstZCCGEThITU3n55QVcvnyTN98cyODBnbWvNWjgBkBc3IPfoq6I8Wko2Djl/feHsHLlbtav38/69fvx8KjJoEGdOHy4IE9TtqglUAshRBV1+3YGixf/wbffbiEz8y5PPNGed94ZVOgeNzdnbGwsH4ox6ooK1ObmZrzxxgBef/0JDh+OZt26vfz6azhfffUHgMkO49CQQC2EEFVMenoW3367hcWL/+TOnUzc3JyZNm0ozz3XHTOzwiOWKpUKT09XnVvUaWmZLFr0O6+99rhJg095HDp0Hmdn+wpr3apUKoKDfQkO9mXWrJFs336cX38NN/oHA31JoBZCiCoiJyePJUu2snDhJpKT03BxcWD69OG8+GJP7OysS3yfl5cr0dHXuHMns8zgu27dPubN+xVHRzvGj3/c2I9QYRISUoiLS6RXr9ZFPqxUBBsbK/r3D6Z//+AKz6ssEqiFEKKKWLDgV/773w04OtoyZcpgxozpg4ODbZnv8/YuGKe+fPkmLVs2KPXec+euAve6kauLiur2rg4kUAshRBXxyy/h2NpacejQPGrVctT5ffcv0SorUEdHFwTqw4fPoyiKwcdEVpZDh6KBhzNQy/IsIYSoAmJirhMdfY0uXVrqFaQBvLw0M7/LnlB2/nzBBh7JyWmlnhhV1Rw+fB4LC3Nat25k6qJUOgnUQghRBWzZchSAvn3b6v1eb2/NWurSJ5SlpKSTmJiqbUVXl+7vzMy7nDx5iYCABtjaGn4QR3UjgVoIIaqALVsiUalU9OzZWu/33j9GXZrz5wu6vbt0aQFUn0B9/HgseXn5tGv38HV7gwRqIYQwuaSkOxw+fJ527ZqUa2tMJyc7nJ3ty2xRR0cXdHs/8UR7HB1tiYjQLVCfORPPb79FkJqaoXfZjOFhnkgGMplMCCFMbvv2Y6jVCn366N/treHl5cr581dLnSCmmfHdtGl92rZtzK5d/5CUdIf
atZ1KTFdRFJ5/fh6XLiVgaWlOSIg/jz3Wjj592uLmZrz9tkuj2R3sYQ3U0qIWQggTM2R8WsPb243s7Fxu3rxd4j2aGd++vvVo394PuBcES3L6dDyXLiXQrJknzZp58vffJ3n77aW0bPkaTzwxgxUrdhZ7jrOxqNVqDh+OxtvbDXd35wrLpyqTFrUQQphQVlYOu3f/Q+PGdQw681izROvSpZu4u9cs9p7o6Gu4uxdsh6lpnUZEnKdv36AS0/3zz8MATJw4gCef7EhcXCJ//nmEP/44TETEecLDz1GnTk169Ghd7rKXJjr6GrdvZ9CrV2CFpF8dSItaCCFMaM+eU2Rm3jWo2xvuX6JV/Dh1RkY28fFJ+PrWA6BNm0aYm5uVOaFs8+ZIrKws6Nmz1f/n48rYsX3ZtOlDfvppKnCvR6AiREZeAIx7/nR1I4FaCCFMaMuWSACDA3VZS7QuXChYM60J1Pb2NrRs2YATJ2LJysop9j2XL9/k1KnLhIT44+hYdGvSTp2aUbOmA9u3H6uw7u/Tp+MBCAhoUCHpVwcSqIUQwkTUajVbtx6ldm0n2rZtbFBamiVaJQVqzdKsJk3unascHOxLbm4+J07EFvuezZsLPkSU1DVuYWFO9+6tuHbtFlFRceUue2lOn45DpVLh51e/QtKvDiRQCyGEiURGxpCUdIfevdtgbm7Yn+P69WujUqlKXEutWZrl61s4UEPJ66k3bz6CSqWid+82Jeb76KMFY8fbthm/+1tRFE6fjqNBAzccHGyMnn51IYFaCCFMRNPtXVog1JW1tSUeHs4lbiN6r0VdT3uttECdlHSHiIhzBAU1LnW2dbduAZibm7F9+zEDSl+8hITb3LqVTvPmXkZPuzqRQC2EECaydWsktrZWhIa2MEp63t5uXLt2i5ycvCKvnT9/FWdn+0Jrnz08auLl5crhw9Go1ep/le0oarXCY4+1KzVPZ2d72rf34+jRWG7eTDXKc2icPl3Qnd68uadR061uJFALIYQJxMbe4Pz5gkM4SjtrWh9eXm6o1QpXriQVup6Tk8fFiwk0aVK3yGYowcG+pKSka7vGNTZvPgLotra7V69AFEXh77+PG/YA/6IZ95YWtRBCiEp3b6KWYbO976dZS/3vCWUXL94gP1+tnfF9P83GJ/d3f6enZ7F79ymaNauPj49HmfneG6c2bve3Zsa3BGohhBCVzpBDOEpSUqDWjE8XF6iLG6feufMkd+/m0rdv6d3eGo0b16FBA3d27vyn2G73+8XFJXL9+i2d0j19Og47O2vt0rOHlQRqIYSoZIYewlGSBg2KP5dacwb1/TO+Nfz86lGjhl2hAzr++KOg27tfv5J3LLtfwczwQDIysjl48GyJ9928mUr37u8xYMDHZaaZk5NHdPQ1mjWrj5nZwx2qHu6nF0IIE/jrL8MP4SiOpkX97yVamj2+75/xrWFmZkZQUBMuXUrg5s1UcnLy+Ouv43h61qZFC2+d89Zs8Vna7O/Zs3/izp1MDh+OJiUlvdT0oqOvkZeX/9B3e4MEaiGEqHSbNxt+CEdxPDxqYmVlUUzX9zXs7KypX79Wse+7f5x6//7T3LmTSd++QSWewlWcDh2a4uBgw9atR4vdpeyffy6xYsUu7dcREedKTe/ejG8J1BKohRCiEt28mcpffx2jWTNPgw7hKI6ZmRn169cuFKjz89VcuHCNxo3rlNiFfO+AjnP8+afus73vZ2VlQbduAVy+fLPIDHJFUfjwwxUoisIbbzwBwIEDZ0pNT5Zm3aNToE5OTqZLly7ExMRw+fJlhg0bxvDhw5k+fbp27d26det46qmnGDJkCDt37gQgOzubCRMmMHz4cEaNGsWtW7pNIBBCiAfV6tW7yM3N57nnuldI+t7ebiQnp5GengVAfHwS2dm5xXZ7a7Ru7YOFhTkREefYsiUSFxcHbStbH5ru73/P/t68OZL9+8/Qq1dr3nzzSSwtLQgPL6tFXTDju1kzaVGXGahzc3P
58MMPsbEp2L7t008/ZeLEiaxatQpFUdixYweJiYmEhYWxZs0ali5dyty5c8nJyWH16tX4+vqyatUqBg4cyOLFiyv8gYQQoqrKz1cTFrYTOztrhgx5pELyuDdOXdCqvncGddGJZBp2dta0atWA48djSUi4Te/ebbGwMNc77x49WqFSqQqNU9+9m8tHH63CwsKc//znWWxtrQgObsLJkxe1HyaKc/p0HPXq1cLZ2V7vcjxoygzUs2fPZujQobi5FcwmjIqKIjg4GIDQ0FAOHDjAyZMnCQwMxMrKCkdHR7y8vDh79iyRkZGEhIRo7z148GAFPooQQlRtO3eeJC4ukUGDOhV7GpUx3FuiVTChTDPju7QWNUC7dr7af5d37NzVtQZt2jTi0KHz2sli3323lUuXEnj55V40blzwYSE01B+1WuHQoehi00lKukNCwm3p9v5/FqW9uGHDBlxcXAgJCeHbb78FCsYaNBMM7O3tSUtLIz09HUdHR+377O3tSU9PL3Rdc6+uXF0dy75JAFJX+pL60p3Ule50qavVq3cDMHHiExVWtwEBBTO1b926g6urI3FxCQB07Ohbap69erXi6683Y29vw+DBHbG1Ld9uaU891YHIyAtERp6ne/cA5s37BRcXRz799Dlq1nQACgL1p5+u5+TJWJ55pnORNP75p+A0r6CgxvIzSBmB+ueff0alUnHw4EHOnDnDlClTCo0zZ2Rk4OTkhIODAxkZGYWuOzo6FrquuVdXiYm6B/WHmauro9SVHqS+dCd1pTtd6urKlST++OMwbdo0wtPTrcLq1tm5ILBFRcWTmJjGyZOXsbAwx8nJvtQ8mzXzxtbWij592pKenkN6evFnVJelUyd/ANavP8jmzcdIS8vi00+fJy9P0ebfqVMzzMxU7NhxkokTi5bpwIGC8esGDTwemp/B0j6QlBqoV65cqf33yJEj+eijj/jvf/9LREQE7du3Z8+ePXTo0IGAgADmz5/P3bt3ycnJISYmBl9fX9q0acPu3bsJCAhgz549tG1r3KUIQghRXaxYsRO1WuH553tUaD73706mKArR0dfw8XHH0rLUP/fUru3E3r2f4+JiWAu2eXNP6tWrxZYtkWRn5+DnV6/IMzs52REQ0JBjx2LIysrB1taq0Ov3tg6Vrm8ox/KsKVOmsHDhQp555hlyc3Pp3bs3rq6ujBw5kuHDh/P8888zadIkrK2tGTZsGNHR0QwbNoy1a9cyfvz4ingGIYSo0nJz81ixYhc1atgxYECHCs3L2dkeR0db4uISSUi4zZ07mWWOT2t4ebkafO6zSqWiV69AMjPvolYrzJgxotiJaR06+JGTk8fRoxeKvHb6dBxWVhZGX75WXZX+Ees+YWFh2n+vWLGiyOtDhgxhyJAhha7Z2try5ZdfGlA8IYSo/rZsieTmzduMHt3HaCdllUSlUuHt7UZs7A3OnSuY8e3np1ugNpY+fdqwfPlf9OrVmm7dAoq9p2PHpnz99WYOHjxL587Ntdfz8vI5d+4Kfn71yzXz/EGkc6AWQghRPsuX7wCosLXT/+bl5cqpU5e1+243aVLy0qyK0K1bAF9//Rpdu7Ys8R7NOu1/7w1+8WIC2dm50u19HwnUQghRgWJirrN3bxSdOzcr9vSqiqAZp/7rr4L1zJWVr4ZKpeKppzqVeo+LiyPNmtXnyJFocnLysLIqCEeydWhRsoWoEEJUoB9+KGhNV/Qksvt5exfse3Hy5CVUKlWVHevt0KEpWVk5nDx5UXtNAnVREqiFEKKCZGXlsHbtXmrXduKxx3Q729kYNIEawNOzdoWPi5dXx45NgcLd3zLjuygJ1EIIUUF++y2ClJR0nn22q7ZrtzJour6h8ru99dGhQ0GgDg+/P1DH4epaw6jndFd3EqiFEKKC/PDDDlQqFSNGdKvUfD097wXqyp5Ipg8Pj5o0bOhOePg58vPV3LmTSXx8krSm/0UCtRBCVIDw8LMcORJN9+4BhbqiK4OtrRX
u7s5A1W5RQ8EuZWlpWZw+HXdft7eMT99PArUQQhiZoijMmLEGgLfeetIkZdB0f1f1QK3p/j548KxMJCuBLM8SQggj27w5kiNHounXrx1BQU1MUoYOHZpy8WICzZrVN0n+urp/QlmtWgXbl0rXd2HSohZCCCPKy8vn44/XYm5uxvvvDyn7DRXkvfeGcPToAhwcbE1WBl14etamXr1ahIefJSoqDnNzsyrfC1DZJFALIYQRrVmzh+joawwf3lV7/rIpmJubYWNjVfaNJqZSqejQoSnJyWkcPRpDkyZ1sba2NHWxqhQJ1EIIYSSZmXf5/POfsbW1YvLkp0xdnGpD0/2tKIp0exdDArUQQhjJkiVbuXEjhTFj+uLhUdPUxak2NIEaZCJZcSRQCyGEEdy6lcaXX26iZk0Hxo9/3NTFqVYaN65D7dpOgATq4kigFkIII1iw4Dfu3Mlk0qSBODnZmbo41YpKpaJLl5ZYWprTsqW3qYtT5agURVFMXYjiJCammboI1YKrq6PUlR6kvnQndaW7zMwsfH3H4u7uzIEDX8hkqDIU97N1+3YGV68m4+//cLaoXV0dS3xN1lELIYSBpk9fRU5OHlOmDJYgXU7OzvY4O9ubuhhVkgRqIUSlUavVnD9/jYiIc0REFOzv/Npr/QgIaGjqopVbVFQcP/64k+bNvRg0qLOpiyMeQBKohRAVKjr6Glu2RBIRcY5Dh85z+3ZGodc3bjzIU091ZOrUITRoULl7YhtKURSmTQtDURQ+/HAo5uYy7UcYnwRqIUSFyc9X89hj00lNzQQKzkl+9NE2tG/vS/v2fty4kcLMmWvYsOEgmzYd4oUXejJp0kDtDOCqbtOmQ+zbd5rHH29H9+6tTF0c8YCSyWTVnEz40U9F1peiKGzbdowWLbypV69WheRRmYxRVzEx1+nY8W169GjF3LmvUKeOS5F71Go1v/0Wwccfr+Py5Zs4ONgwfvzjTJw4ADOzqttCzcjI5pFH3iExMZWoqEXUqFHyZCBRmPzdKqq0yWRV97dAiGrm998PM3LkHJ54YgaJiammLk6VcObMFQA6d25ebJAGMDMzY+DAjuzf/18+/fR5bGys+Oyz9fzyS3hlFlVvCxf+ztWryYwb95hJtwoVDz4J1EIYQVZWDh99tBKA+PgkXn55ATk5eZVahiNHovn+++1UpU6ys2cLzhfW5QQnKysLXn75UX788U0ADh+OrtCyGeLSpZt89dXv1KlTkzfeGGDq4ogHnARqIYxg8eI/iI9PYty4xxgwoD3h4eeYMmVZpQbNd95ZxtSpyzl79kql5VmWM2c0gVr3/Zv9/b2xsDDn+PHYiioWmZl3mTVrDQcOnCnX+6dPX8ndu7lMnz4cBwcbI5dOiMIkUAthoKtXk/nyy99wda3B228/yYIFYwgIaMDKlbv47rutlVKG+PhETp26DMCffx6plDx1cfbsFZyc7Ers9i6Ora0Vfn71OH06jry8/Aop16pVu/jyy008+eTHfPLJOnJzde/92LnzJJs3H6FDBz+efLJjhZRPiPtJoBbCQDNnriYrK4cPPngGR0c77Oys+eGHN3F1rcGHH65g586TFV6GrVuPav9dVQJ1dnYOsbE3aNbME5VKpdd7W7f2ISsrh/Pnrxq9XIqiEBa2E0tLczw9azN//q/07z+DixcTynxvTk4e77//I2ZmKj755Hm9n0uI8pBALYQBwsPPsWHDQQIDfXjmmRDt9Xr1avHDD5OwsDBn9OiFxMRcr9BybN4cCUCLFt78888l4uISKzQ/XURHXyM/X03TpmWPT/+bZgOUEycuGrtYHDsWy5kz8fTp05adOz/h6acf4ejRGLp3f4916/aWOlyxZMlWLly4zvPP96BFC9mTWlQOCdRClFN+vpr33/8RgI8/fq7IUqKgoCbMmfMKqamZjBgxh9TUjOKSMdjt2xkcOHCGwEAfnn++BwCbN5u+VV2e8WmNVq0qLlCvWPE3ACNGdMPR0Y6vvhr
H4sWvolLB+PFfM27cV0RHXyMuLpFr15JJSEghKekOsbE3+OKLDdSs6cCUKYONXi4hSiIbnghRTqtW7eKffy7x9NOPEBTUpNh7nnkmhDNn4lm8+A8mTPiaH398y+jl+Ouv4+Tnq+nTpy19+rTlnXeW8eefRxgzpq/R89KHZlKbLjO+/615c08sLMw5ccK4E8rS07PYsOEgnp616dKlhfb64MGdCQpqwrhxX7Fhw0E2bDhYYhqff/4iLi6yZlpUHgnUQpRDamoGn3yyDjs7a6ZNG1rqvdOmDeXQofNs2XKUS5duGn2bzC1bCrq9+/YNwt3dmaCgxkREnCMp6Y5Jd/jStKibNtW/RW1jY0XTpvWJioojNzcPS0vj/KnauPEgmZl3mTChf5EekAYN3Pjtt2ksXbqds2fjyctTk5+vJj8/n/x8NXl5ajw9azNyZHejlEUIXUmgFqIcvvhiA8nJaXzwwTN4eNQs9V5zczNGjuzGkSPRbNx4gEmTBhqtHHfv5rJjxwkaNHDHz68eAI891o7Dh6PZuvUozz7b1Wh56evs2St4eNSkZk2Hcr2/deuGnDp1mXPnrhptPHjFip2YmakYNiy02NctLS0YO9a0PRFC/JuMUQuhp+joayxdup0GDdx17l7u168d1taW/PzzfqOurd63L4qMjGz69GmjnYH82GNBAPz552Gj5aOv1NSCs4XLMz6toZlQdvKkccapT526zLFjsfTs2Zq6dav/Fq/i4SGBWgg9ffPNZvLy8vnww2E6nz3s5GRHz56tOX/+GlFRcUYri2a2d9++bbXXGjZ0p1kzT3bvPkV6epbR8tKHZny6PDO+NVq39gGMN6Fs5cpdACbtZRCiPCRQC6GHjIxsNmw4QL16tQoFR11oziresOGAUcqiVqvZuvUotWo50q6db6HXHnssiJycPHbsOGGUvPRlyIxvjWbNPLG0NDdKoM7KymH9+n24uzvTq1egwekJUZkkUAuhh02bIkhPz2bYsC56nz3cs2crHB1t2bjxIGq12uCyHD9+kYSE2/TqFYiFhXmh1+51f5tmmda9Gd/lD9TW1pY0beqpnVBmiE2bIkhNzWTYsC5F6kqIqk4CtRB6CAvbiUpV8mSk0tjYWPH448FcvZpMRMQ5g8uiWSvdp0/Rln2LFt54ebmyffsx7t7NNTgvfZ05E49KpcLXt55B6bRu3ZC7d3MN3r98xYqdAAwf3tWgdIQwBQnUQujo3LkrHD4cTdeuLfH0dC1XGk891QmAn382vPt7y5ZIbGwsC60H1lCpVPTtG0R6ejb79kUZnJc+FEXh7NkrNGzojq2tlUFp3ZtQdqncaVy4cI3w8HOEhPgbfWmcEJVBArUQOtJMRhoxomu503jkkea4uTmzadMhg47BjI29wblzV+nSpSX29sWf3tSvn2m6vxMSbpOSkm5Qt7eGMSaUrVixC4CRI7sZXB4hTEECtRA6uHs3l59+2kft2k707q3fJLL7mZub8eSTHUhJSWfXrvIf1nFvk5OSy9KunS+1azuxeXMk+fmGj4nr6t5GJ+Wf8a3RtGn9/59QVr4dynJy8li3bi8uLg707RtkcHmEMAUJ1ELoYMuWSJKT0xgyJAQrK8P2CTJG9/eWLZGoVKpSZzCbm5vRp08bkpLucPhwdLnz0pcmUDdvbniL2trakmbNPDl9Or5cPRBbtkSSlHSHp58O0XkpnRBVjQRqIXSgmYxkjDW4rVv74OPjwZYtkaSnZ+v9/qSkOxw6dJ527Zrg6lqj1Hsfe6wdULmbn9xbQ214oAZo1cqHu3dzOXdO/wlla9bsAQwbrhDC1CRQC1GGy5dvsnv3Kdq396NJk7oGp6dSqXjqqU5kZeVou7D1sX37MdRqpdjZ3v8WEuKPg4MNf/55xKg7opXmzJl4rK0tadjQ3SjplfckrYSE2+zceZLAQB/8/AzvhhfCVCRQC1GG1asLWmXG3NHqqac6AuXb/ESzG5lmrXRprK0t6dmzNXFxiTz
22HQefXQaXbpMpUOHt2jT5nVatRrP7NnrjRbE8/PVnD9/lSZN6hptvXLr1gWB+vhx/capN248SH6+miFDQsq+WYgqTAK1EKXIy8tn9epdODra0r9/sNHSbdy4Lq1aNWTnzpMkJd3R+X03bqSwa9dJfH3r4uPjodN7hg/vipWVBUePxnLu3BWuXUsmLa1ga9GsrBzmzNnI5MnfG2XC2eXLN8nKyjHKjG+Npk09sbKy0HuJ1rp1e7GwMGfgwI5GK4sQpiCnZwlRip07T3L9egovvNCzxGVQ5TVoUGdOnLjIpk2HePHFnjq95/PP15OdncvYsY/pnE/Xri2Ji1uGSqXSHtyhkZiYyjPPzObHH/8mPT2LhQvHGnSkpDFnfGtYWVnQvLkXp0/HkZOTp9NkvqioOE6dukyfPm2pVUvOjhbVm7SohSiFZg1uRUxGGjiwAyqVip9/3q/T/WfPXmHVqt00bVqfoUP12xnNzMysSJAGcHWtwcaN7xMc7MuGDQd54YV5ZGXl6JX2/Yyxx3dxAgIakJOTx9mz8Trdv27dXgDp9hYPBAnUQpQgIeE227YdpWXLBtodsozJw6MmjzzSnEOHzvPPP5fKvH/mzNWo1QrTpg016n7VNWrYs3btFLp2bcn27ccZPvzzcp+6ZYw9voujz4SyvLx8fv75AM7O9vTq1dqo5RDCFCRQC1GCtWv3kp+vrtBjEV99taAL+5VXvuTOncwS79u3L4rt24/zyCPN6dmztdHLYW9vQ1jYWzz+eDv27z/DoEGfkJys+9i5xpkz8Tg52VG3rotRy6fZoez48bID9Z49p7h58zYDB3aUtdPigSCBWogS/PTTXqytLRk0qFOF5dGjR2smTOjPxYsJvPHGt8XOvlar1fznP6sBmD59eLFd2MZgbW3Jt99OYOjQUI4di+XRR6frdWpVdnYOsbE3aNq0vtHL6OdX//8nlJUdqDXd3s88I93e4sEggVqIYpw5E8+5c1fp0aMVNWrYV2he7777NJ07N+OPPw7zv//9WeT1jRsPcuLERZ56qpO2C7iiWFiYM3/+KAYN6szRozEsWbJN5/dGR18jP19t1IlkGlZWFvj7F0woK+00sLS0TP788wiNGtWhTZtGRi+HEKYggVqIYvz6azhQMOGrollYmPP11+Nxc3Nm5sw1hIef1b52924un3yyDisrC9599+kKLwsUTDybNWskLi6OfP75z1y/fkun91XU+LRGQEBDcnPzSz3yctOmQ2Rn5zJkyCMV1vMgRGUrNVDn5uYyefJkhg8fzuDBg9mxYweXL19m2LBhDB8+nOnTp6NWF6y9XLduHU899RRDhgxh586C7Razs7OZMGECw4cPZ9SoUdy6pdsvvBCmpCgKv/wSjp2ddal7aRuTu7sz3303HoBRoxZy82YqAEuXbiM+PomXXuqFt3flHdFYq5Yjn332HBkZ2UyfvlKn91TUjG8NXSaUrVu3D4DBgztXSBmEMIVSA/Vvv/2Gs7Mzq1at4rvvvmPmzJl8+umnTJw4kVWrVqEoCjt27CAxMZGwsDDWrFnD0qVLmTt3Ljk5OaxevRpfX19WrVrFwIEDWbx4cWU9lxDldurUZWJjb9CrV2ujr50uTceOzXj//WdISLjN2LGLSEq6w7x5v1Cjhh2TJg2stHJovPxyL9q2bcQvv4SzZ8+pMu+/t8d3xWzXqQnUYWF/c/VqcpHXL1++yYEDZ+jcuVm5zwsXoioqNVD36dOHN954Q/u1ubk5UVFRBAcX7NAUGhrKgQMHOHnyJIGBgVhZWeHo6IiXlxdnz54lMjKSkJAQ7b0HDx6swEcRwjh++aWg23vAgMrf0eq11/rRt28Q+/adpk+fD0lNzWTixIHUrOlQ6WUxMzNj9uwXMTNTMXXq8jJPrzpzJh53d2dcXCpmgxF/fy8GDuzAiRMX6dbtXTZtOlTo9fXrC9ajy9pp8aApdYsfe/uCSTTp6em8/vrrTJw4kdmzZ2vHfuzt7UlLSyM
9PR1HR8dC70tPTy90XXOvrlxdZTchXUld6ae0+lIUhU2bInBwsGXo0M7Y2lpXYskKrFr1JkFBbxITcwNvbzemTn0KGxurSi8HQI8eAYwb15evvvqTsLAdTJ06uNj7UlMzuHo1mUcfDazQn8cNG97lu++2MnHiEl5+eQEvvdSTBQtGYW9vw88/78fW1ooXX+yOo6NdhZWhJPJ7qB+pL92VuRff9evXee211xg+fDj9+/fnv//9r/a1jIwMnJyccHBwICMjo9B1R0fHQtc19+oqMVH3oP4wc3V1lLrSQ1n1dfRoDJcu3eSppzqRnp5Denr5d+kyxHffvc7kyUuZPHkQaWl3SUu7W+ll0NTVxIkDWbt2HzNnrqV377bUr1+70H3XriUzY0bB8jEfH48K/3l88snOtGjRkLFjv+L77/9i165TjBrVmwsXrjNoUGeys/PJzq7c3wn5PdSP1FdRpX1wKbXrOykpiZdeeonJkyczeHDBJ+nmzZsTEREBwJ49ewgKCiIgIIDIyEju3r1LWloaMTEx+Pr60qZNG3bv3q29t23bso/lE8KUfvmlYHimMmZ7l6ZFC282b55B9+6tTFoOKNi5bPr04WRm3uWDD8K019PTs/jss5/o2PFtNmw4iL+/F6NG9a6UMjVpUpc///yI117rR2zsDd599wcAhgx5pFLyF6IyqZRSzrebNWsWmzdvxsfHR3vt/fffZ9asWeTm5uLj48OsWbMwNzdn3bp1rF27FkVRGDNmDL179yYrK4spU6aQmJiIpaUlc+bMwdVVt0ke8mlLN/LJVD+l1ZdaraZNmzdIT88mKmrxQ7+r1f11pSgKTzwxk4iIc6xY8RYJCbf57LP1JCam4u7uzHvvDWHIkBDMzSt/xeeePacYP/5rHB1t2bNntknKIL+H+pH6Kqq0FnWpgdqU5JuoG/mB109p9RURcY7+/WfwzDMhLFw4tpJLVvX8u65On46jR4/3UasVFEXBzs6a117rx6uv9qvU2fHFyc3NIzc3Hzu7yp9TAPJ7qC+pr6LK3fUtxMPkt98KhnRM3e1dVTVv7sX48Y8DMGxYF8LD5zB58iCTB2kAS0sLkwVpISqanEctBJCfr+a33yKoWdOB0NAWpi5OlfXee0OYMKE/Tk6VP6taiIeVtKiFoKDbOyHhNv36BWFpKZ9fS6JSqSRIC1HJJFALgWk3ORFCiNJIoBYPvby8fH7//RC1azvRuXMzUxdHCCEKkUAtHnr7958hKekOjz8ejIWFuamLI4QQhUigFg+9X3+tGpucCCFEcSRQi4fali2RbNwYjru7M+3b+5m6OEIIUYRMbxUPpYyMbD78cCVhYX9jbW3Jf/7zrEl2tBJCiLJIoBYPnaNHY3j11cXExt6geXMvvv76tQo7Q1kIIQwlgVo8NPLy8pk1ay0ffbSa/Hw1r77aj3ffffqh39NbCFG1SaAWD7yUlHR27DjB0qXbiIy8QJ06NVm0aBwhIf6mLpoQQpRJArV4IF28mMDWrUfZujWS8PBz5OergYJjEGfNeg5nZ3sTl1AIIXQjgVo8MBRF4aef9rFw4SbOnbuqvd62bWP69GlD795tCQlpJqf2CCGqFQnU4oGQkJDC229/z9atR7G2tqR37zb07t2GXr0CcXd3NnXxhBCi3CRQi2pNURQ2bDjAu+/+wO3bGYSE+DNv3ii8vFxNXTQhhDAKCdSi2rp5M5V33vmeP/88gp2dNZ999gIvvNADMzNZDy2EeHBIoBbVjqYV/f77P3LrVjodOzZlwYIxNGjgZuqiCSGE0UmgFtXKP/9c4v33fyQ8/By2tlZ8/PFIXn75UWlFCyEeWBKoRbWQnJzGp5+uIyxsJ4qi0KdPW2bMGCGtaCHEA08CtajScnPzWL78Lz7//GdSUzPx9a3LrFnP0bVrS1MXTQghKoUEalFlpaVlMmDALE6duoyTkx2zZo3kxRd7YmkpP7ZCiIeH/MUTVdY332zh1KnLDBjQnk8/fYHatZ1
MXSQhhKh0EqhFlZSSks7//vcntWo5Mm/eaBwcbExdJCGEMAmZKiuqpP/970/S0rKYMKG/BGkhxENNArWocpKS7vDtt1twd3fmhRd6mro4QghhUhKoRZWzcOEmMjPvMnHiAOzsrE1dHCGEMCkJ1KJU2dk5XLuWXGn53biRwrJl26lXrxYjRnSrtHyFEKKqkkAtisjPV7N3bxRvvPEt/v6v0qbNG/zyy8FKyXv+/F/Jzs7lrbeexNraslLyFEKIqkxmfQugYP/sU6cu8/PPB9iw4QA3bqQAULeuC4qi8Npr/8PJyY7u3VtVWBni4xMJC/sbb283nnkmpMLyEUKI6kQCdTWWlZXDzZu3UanMDUonOTmNV15ZwP79ZwCoUcOOkSO7MWhQZzp08CM8/BxDh87mxRfns27dVNq39zNG8YuYN+8XcnPzmTz5KdnURAgh/p90fVdDiqLw66/htG//JvXrv8SsWWvIyMguV1oxMdfp23c6+/efoUuXFixbNpFTpxYzZ84rdOrUDDMzMzp1asaSJa+Tk5PHs89+walTl438RBAbe4PVq/fQpEldBg3qbPT0hRCiupJAXc3Ext7gmWdmM2rUQlJS0nF1deLLLzfRufNkfv01HEVRdE4rPPwsjz32EZcuJTBp0gDWrp1Cv37tih0bfvTRNixcOJa0tCyeeWY2sbE3jPlYzJmzkfx8Ne+8Mwhzc/mxFEIIDfmLWE1kZ+fw+ec/06XLVHbt+odu3QLYvfszoqO/YdKkASQl3WHUqIUMHvwp589fLTO9n3/ez+DBn5KWlsX8+aN4990hZR4VOXhwZz755DkSE1MZMuQzrl+/ZZRni4y8wPr1+2ne3Iv+/YONkqYQQjwoJFBXcVlZOWzadIjQ0Kl88cUGXFwcWLr0ddaseQcfHw/s7Kx5990h7Nkzmx49WrF3bxRdu77LtGlhbN9+jJiY6+Tm5mnTUxSFefN+Ydy4xVhbW7J69TsMH95V5/K8/PKjTJ06mLi4RIYM+YykpDvlfrb8fDVffrmJJ56YgaIovP9+2R8WhBDiYaNS9OkrrUSJiWmmLoJJKIrChQvX+fvvE/z990kOHjxDdnYu5uZmjB7dh8mTn8LBwVZ7v6uro7auFEVh69ajfPBBGHFxidp7zM3N8PR0pVEjDxRF4e+/T1K/fi1WrpxMs2ae5Srjhx+u5JtvNtOggTsrV75NkyZ19Urj0qWbjB//Pw4dOo+bmzMLFoyiR4/WepdFX/fXlyid1JXupK70I/VVlKurY4mvSaA2MUVRuHgxgSNHoomIOM+uXSeJj0/Svt6smSfduwcwZEhIsUG1uB/4rKwc/vrrODEx14mNvUFMzHUuXkzQtn5btWrIihVv4e5es9zlVqvVzJ69nnnzfsXJyY4lS17X6YxoRVFYsWIn06atIDPzLk880Z7PP38RF5eSf0iNSf5A6E7qSndSV/qR+ipKAnUVkpl5l8jICxw5Ek1k5AUiIy+QnHzvWWvUsKNLl5Z07x5At24B1KnjUmp6+vzAp6ZmcP16Co0aeRht+dNPP+1j0qTvyM9X8/HHz/HSS71KvDc+PpGpU5ezfftxnJzs+OyzFxg0qBMqlcooZdGF/IHQndSV7qSu9CP1VVRpgVoWq1YwRVE4cyaenTv/YefOk0REnOPu3Vzt656etQkNbUHbto0JCmpCQEADLCwMWxddkho17KlRw96oaT799CN4e7vxwgvzmDp1OdHR15g5c4T2GW7cSGHTpgh++SWcw4ejAQgJ8efLL8dQr14to5ZFCCEeRBKoK8iBA2dYs2YPu3b9o93lC8Df34vQ0BYEB/sSFNTYoO7nqiI42JctW2YwcuQXLF26jdjYG/Tu3YZffw0nPPwciqKgUqno3LkZzzwTwpAhITJpTAghdCSBugIsX/4XU6cuR61WqFXLkaee6kS3bgF07drigQjMxfHycuX336czZsxX/PXXcXbuPAlA+/Z+DBzYgccfD8bd3dm0hRRCiGp
IArURKYrC7NnrmTv3F2rXduKbb8bTuXOzh6b16OhoR1jYW3z77RbMzc3o3z+4zDF2IYQQpZNAbSR5efm8/fZSVq3ajbe3G2vXTsHHx8PUxap05uZmjBv3mKmLIYQQDwwJ1EaQmXmX0aMXsm3bMVq1asjKlZNxc6th6mIJIYR4AEigNlBychojRnxBZOQFunZtyfffv1FoQxIhhBDCEBKoDRATc50RI+YQE3OdwYM7M3/+aKyspEqFEEIYj0SVctq27Sjjxi0mLS2L8eMf54MPnnloJo0JIYSoPBKo9aRWq5k371c+//xnrK0t+OqrcTz99COmLpYQQogHVJUO1Gq1mjt3skhOvkNychouLg40alSnUrecvF9aWibjx3/D5s1H8PSszbJlEwkIaGiSsgghhHg4VMlA3aLFeG7eTOXWrTTy89WFXvPwqElIiD+hoS0IDfXXaZ2uoijcvp1BcvIdkpLukJSUhpOTLT4+HtSt66JTl/WFC9d4/vl5REdfIyTEn2+/nUCtWpVzkIQQQoiHV5UM1Neu3cLFxZEGDdyoVcuRWrWcqFnTgStXkti37zQ//bSPn37aB0DjxnVo3doHtVpNVlYud+/mkJ1d8P/MzBxu3UojOTmNvLz8YvOysbGkQQN3fHw88PHxwMXFkfT0LNLSCv67cyeTtLQsjh2LIT09m7Fj+/Lhh8MqbD9uIYQQ4n4VHqjVajUfffQR586dw8rKilmzZuHt7V3qe27dWlXiySpqtZozZ66wd28Ue/ac4sCBM1y4cL3IfdbWltjYWFKzpiOtW7tSu7YTtWs7Urt2DVxcHLl9O4OLF29oj4I8e/ZKqWWqVcuR//73JQYN6qz7wwshhBAGqvBA/ddff5GTk8PatWs5fvw4n332Gf/73//KnZ6ZmRn+/l74+3sxdmxfcnPzuHr1FtbWFtjYWGFjY4W1tYVeM7AVRSEx8Q6xsddJTc3E0dH2//+zw8nJFgcHW1l2JYQQwiQqPPpERkYSEhICQOvWrTl16pRR07e0tKBBAzeD0lCpVLi51ZDdxIQQQlQ5FR6o09PTcXBw0H5tbm5OXl4eFhalZ13aIdqiMKkr/Uh96U7qSndSV/qR+tJdhQdqBwcHMjIytF+r1eoygzRQ4hi1KMzV1VHqSg9SX7qTutKd1JV+pL6KKu2DS4VvpdWmTRv27NkDwPHjx/H19a3oLIUQQogHRoW3qHv16sX+/fsZOnQoiqLwySefVHSWQgghxAOjwgO1mZkZM2bMqOhshBBCiAeSnCIhhBBCVGESqIUQQogqTAK1EEIIUYVJoBZCCCGqMAnUQgghRBWmUhRFMXUhhBBCCFE8aVELIYQQVZgEaiGEEKIKk0AthBBCVGESqIUQQogqTAK1EEIIUYVJoBZCCCGqsAo/lEMjNzeX9957j6tXr5KTk8O4ceNo3LgxU6dORaVS0aRJE6ZPn46ZWcFnh1u3bjF06FA2bdqEtbU1aWlpTJ48mfT0dHJzc5k6dSqBgYGVVfxKZWhdZWZm8tZbb5GamoqtrS3//e9/cXFxMfFTVRxD60sjJiaGIUOGcODAgULXHySG1pWiKISGhtKgQQMAWrduzVtvvWXCJ6pYhtZXfn4+n376KadOnSInJ4cJEybQrVs3Ez9VxTC0rr799lv27t0LwJ07d0hKSmL//v2mfKSqQ6kk69evV2bNmqUoiqLcunVL6dKlizJmzBglPDxcURRFmTZtmrJt2zZFURRlz549yoABA5TAwEAlOztbURRFWbBggbJs2TJFURQlJiZGGThwYGUVvdIZWlfLli1TFi5cqCiKovz888/KzJkzTfAUlcfQ+lIURUlLS1NGjRqldOjQodD1B42hdXXp0iVlzJgxpim8CRhaXz///LMyffp0RVEU5caNG9q/YQ8iY/weaowePVrZs2dP5RW+iqu0ru8+ffrwxhtvaL82NzcnKiqK4OBgAEJDQzlw4ABQcDTmsmXLcHZ21t7/wgsvMHToUAD
y8/Mf2BYPGKeuxo0bB8C1a9eoXbt25RXeBAytL0VRmDZtGm+++Sa2traVWvbKZmhdRUVFkZCQwMiRIxk1ahSxsbGVWv7KZmh97du3Dw8PD0aPHs0HH3xA9+7dK7X8lcnQutLYtm0bTk5OhISEVEq5q4NKC9T29vY4ODiQnp7O66+/zsSJE1EUBZVKpX09LS0NgM6dO1OzZs1C73dycsLGxobExEQmT57Mm2++WVlFr3SG1hUU/JI899xzrFixgi5dulRq+SubofW1aNEiunTpQtOmTSu97JXN0LpydXVl9OjRhIWFMWbMGCZPnlzpz1CZDK2vlJQULl++zDfffMOoUaN49913K/0ZKosx/m4BfPPNN4wfP77Syl0dVOpksuvXr/Pcc88xYMAA+vfvrx2rAMjIyMDJyanU9587d44XXniBSZMmaT+lPagMrSuAH3/8kZUrVzJhwoSKLGqVYEh9/fbbb/z888+MHDmSxMREXnrppcoosskYUlctWrSgR48eAAQFBZGQkIDygO9CbEh9OTs707VrV1QqFcHBwVy6dKkSSmw6hv7dunDhAk5OTnh7e1d0UauVSgvUSUlJvPTSS0yePJnBgwcD0Lx5cyIiIgDYs2cPQUFBJb7/woULvPHGG8yZM+eBbyEaWlfffPMNv/zyCwB2dnaYm5tXeJlNydD62r59O2FhYYSFheHq6sr3339fKeU2BUPratGiRfzwww8AnD17lrp162pbTA8iQ+urbdu27N69Gyiorzp16lR8oU3E0LoCOHDgAKGhoRVe1uqm0g7lmDVrFps3b8bHx0d77f3332fWrFnk5ubi4+PDrFmzCgWV7t27s3nzZqytrRk3bhznzp2jXr16ADg4OPC///2vMope6Qytq6SkJKZMmUJOTg75+fm89dZbtG3b1hSPUikMra/7lXT9QWFoXaWmpjJ58mQyMzMxNzfnww8/pFGjRqZ4lEphaH3l5OQwffp0YmJiUBSFjz76CH9/f1M8SoUzxu/hf/7zHzp37kzPnj0rvfxVmZyeJYQQQlRhsuGJEEIIUYVJoBZCCCGqMAnUQgghRBUmgVoIIYSowiRQCyGEEFWYBGohhBCiCpNALYQQQlRhEqiFEEKIKuz/ANWserykR+k1AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Resample data to monthly count of reviews\n", "monthly_data = time_series_data[~(time_series_data['review_year']==2011)].resample('MS').count()\n", "monthly_data = monthly_data.drop(monthly_data.index[-1])\n", "\n", "# Plot the aggregated monthly count of reviews\n", "monthly_data['rating'].plot(figsize=(8,5), colormap='seismic', xlabel='')\n", "\n", "plt.title('Total Count of Reviews By Month', fontsize=16)\n", "plt.savefig('data/images/fig8.png', dpi=200, transparent=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The count of reviews peak during months of spring and fall with the highest spike of over 8,000 reviews in October of 2017." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Resample data to yearly average rating of reviews\n", "yearly_data = time_series_data.resample('Y').mean()\n", "yearly_data = yearly_data.drop(yearly_data.index[-1])\n", "\n", "# Plot the aggregated to yearly average rating of reviews\n", "yearly_data['rating'].plot(figsize=(8,5), colormap='PRGn', xlabel='')\n", "\n", "plt.title('Average Rating of Reviews By Year', fontsize=16)\n", "plt.savefig('data/images/fig9.png', dpi=200, transparent=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average ratings steadily increased from over 4.45 in 2013 to 4.575 in 2016 but went down by less then 0.025 in 2017." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Resample data to monthly average rating of reviews\n", "monthly_data = time_series_data[~(time_series_data['review_year']==2011)].resample('MS').mean()\n", "monthly_data = monthly_data.drop(monthly_data.index[-1])\n", "\n", "# Plot the aggregated to monthly average rating of reviews\n", "monthly_data['rating'].plot(figsize=(8,5), colormap='seismic', xlabel='')\n", "\n", "plt.title('Average Rating of Reviews By Month', fontsize=16)\n", "plt.savefig('data/images/fig10.png', dpi=200, transparent=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average ratings peak during the latter months of the year and aligned with the higher counts of rentals in the fall." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## IV. Recommendation Systems\n", "\n", "> \"Recommendation Systems are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the\n", "decisions consumers make while searching for and selecting products online.\" [(Bo Xiao and Izak Benbasat)](https://misq.org/e-commerce-product-recommendation-agents-use-characteristics-and-impact.html)\n", "\n", "To start, I create a set of generalized recommendations that are based on all the data. For all the items, I calculate a weighted rating and return the top 10 highest-rated items across the board. To **personalize the recommendations**, I apply the different algorithms for Content-Based Recommenders and Collaborative Filtering Systems, which I implement using the `surprise` library later.\n", "***\n", "\n", "### Popularity Recommendations\n", "\n", "##### Bayesian Average\n", "\n", "$$W = \\left(\\frac{v}{v + m} \\right)R + \\left(\\frac{m}{v + m} \\right)C$$\n", "where:\n", "\n", "$W$ = Weighted rating
\n", "$v$ = Number of ratings for the item
\n", "$m$ = Minimum number of ratings required to be listed on top chart
\n", "$R$ = Average rating of the item
\n", "$C$ = Mean rating across the entire data" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# m: minimum number of ratings to qualify (90th percentile of ratings per item)\n", "m = item_data['user_count'].quantile(0.9)\n", "# C: mean of the per-item average ratings\n", "C = item_data['rating_average'].mean()\n", "\n", "def weighted_rating(x, m=m, C=C):\n", "    '''\n", "    Calculates weighted rating based on Bayesian Average.\n", "    '''\n", "    v = x['user_count']\n", "    R = x['rating_average']\n", "    return (v/(v+m) * R) + (m/(v+m) * C)\n", "\n", "def popular_recommendation(df=data, n=10):\n", "    '''\n", "    Returns the most popular items according to the highest weighted ratings.\n", "    '''\n", "    item_df = create_item_data(df)\n", "    \n", "    # Keep only items with at least m ratings; copy to avoid assigning to a slice\n", "    top_item_ratings = item_df.loc[(item_df['user_count']>=m)].copy()\n", "    top_item_ratings['score'] = top_item_ratings.apply(weighted_rating, axis=1)\n", "    top_item_ratings = top_item_ratings.sort_values('score', ascending=False)\n", "    \n", "    return top_item_ratings.head(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top 10 Popularity-Based Recommendations:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", "
item_idfit_smallfitfit_largeuser_countbust_size_topweight_meanweight_medianrating_averagerented_for_toprented_for_allbody_type_topcategory_topheight_meanheight_mediansize_meansize_medianage_meanage_medianreview_length_averagereview_month_topreview_season_topscore
1948106439712.00257.0015.0028434b129.45130.004.89wedding[formal affair, other, party, wedding, date]athleticgown64.8165.008.148.0028.6528.0060.7410Winter4.82
122370983230.00151.009.0019034b134.62135.004.86formal affair[formal affair, other, party, wedding, date]hourglassgown65.1165.0012.1412.0033.9532.0072.493Winter4.77
2260121342727.00270.0011.0030834b132.22130.004.82wedding[formal affair, everyday, other, party, weddin...athleticgown66.2366.009.278.0029.3229.0061.104Spring4.77
160890364713.00126.004.0014334b140.04138.004.88formal affair[formal affair, other, party, wedding, date]athleticgown65.9466.0012.6712.0034.4332.0060.8710Winter4.76
2599137863120.00302.008.0033034b139.89135.004.81wedding[formal affair, other, party, wedding, date, v...hourglassmaxi65.9366.0012.4211.0032.9832.0065.146Spring4.76
112379365.001497.00152.00171434b132.98130.004.77formal affair[formal affair, work, other, party, wedding, d...hourglassgown65.0665.009.728.0031.3131.0074.415Winter4.76
181210030765.00122.005.0013234b137.98135.004.89wedding[formal affair, other, party, wedding, date, v...athleticdress65.9666.0011.5512.0032.8032.0055.2110Winter4.76
220111869239.0091.003.0010334c137.48135.004.92party[formal affair, other, party, wedding, date, v...hourglassdress65.3365.0013.5314.0030.3330.0065.7011Winter4.76
2351126066625.00126.003.0015434b130.13130.004.84party[formal affair, other, party, wedding, date, v...hourglassdress64.8265.0010.668.0032.7732.5054.057Spring4.74
123671437423.00112.002.0013734b133.23132.004.85wedding[formal affair, work, other, party, wedding, d...athleticdress65.8366.0010.188.0031.4930.0055.9710Fall4.74
\n", "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "1948 1064397 12.00 257.00 15.00 284 34b \n", "1223 709832 30.00 151.00 9.00 190 34b \n", "2260 1213427 27.00 270.00 11.00 308 34b \n", "1608 903647 13.00 126.00 4.00 143 34b \n", "2599 1378631 20.00 302.00 8.00 330 34b \n", "1 123793 65.00 1497.00 152.00 1714 34b \n", "1812 1003076 5.00 122.00 5.00 132 34b \n", "2201 1186923 9.00 91.00 3.00 103 34c \n", "2351 1260666 25.00 126.00 3.00 154 34b \n", "1236 714374 23.00 112.00 2.00 137 34b \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "1948 129.45 130.00 4.89 wedding \n", "1223 134.62 135.00 4.86 formal affair \n", "2260 132.22 130.00 4.82 wedding \n", "1608 140.04 138.00 4.88 formal affair \n", "2599 139.89 135.00 4.81 wedding \n", "1 132.98 130.00 4.77 formal affair \n", "1812 137.98 135.00 4.89 wedding \n", "2201 137.48 135.00 4.92 party \n", "2351 130.13 130.00 4.84 party \n", "1236 133.23 132.00 4.85 wedding \n", "\n", " rented_for_all body_type_top \\\n", "1948 [formal affair, other, party, wedding, date] athletic \n", "1223 [formal affair, other, party, wedding, date] hourglass \n", "2260 [formal affair, everyday, other, party, weddin... athletic \n", "1608 [formal affair, other, party, wedding, date] athletic \n", "2599 [formal affair, other, party, wedding, date, v... hourglass \n", "1 [formal affair, work, other, party, wedding, d... hourglass \n", "1812 [formal affair, other, party, wedding, date, v... athletic \n", "2201 [formal affair, other, party, wedding, date, v... hourglass \n", "2351 [formal affair, other, party, wedding, date, v... hourglass \n", "1236 [formal affair, work, other, party, wedding, d... 
athletic \n", "\n", " category_top height_mean height_median size_mean size_median \\\n", "1948 gown 64.81 65.00 8.14 8.00 \n", "1223 gown 65.11 65.00 12.14 12.00 \n", "2260 gown 66.23 66.00 9.27 8.00 \n", "1608 gown 65.94 66.00 12.67 12.00 \n", "2599 maxi 65.93 66.00 12.42 11.00 \n", "1 gown 65.06 65.00 9.72 8.00 \n", "1812 dress 65.96 66.00 11.55 12.00 \n", "2201 dress 65.33 65.00 13.53 14.00 \n", "2351 dress 64.82 65.00 10.66 8.00 \n", "1236 dress 65.83 66.00 10.18 8.00 \n", "\n", " age_mean age_median review_length_average review_month_top \\\n", "1948 28.65 28.00 60.74 10 \n", "1223 33.95 32.00 72.49 3 \n", "2260 29.32 29.00 61.10 4 \n", "1608 34.43 32.00 60.87 10 \n", "2599 32.98 32.00 65.14 6 \n", "1 31.31 31.00 74.41 5 \n", "1812 32.80 32.00 55.21 10 \n", "2201 30.33 30.00 65.70 11 \n", "2351 32.77 32.50 54.05 7 \n", "1236 31.49 30.00 55.97 10 \n", "\n", " review_season_top score \n", "1948 Winter 4.82 \n", "1223 Winter 4.77 \n", "2260 Spring 4.77 \n", "1608 Winter 4.76 \n", "2599 Spring 4.76 \n", "1 Winter 4.76 \n", "1812 Winter 4.76 \n", "2201 Winter 4.76 \n", "2351 Spring 4.74 \n", "1236 Fall 4.74 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_columns', 30)\n", "\n", "top10_overall = popular_recommendation()\n", "top10_overall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To simulate the online shopping experience, I can also filter the popularity-based recommendations on the data features such as `dress` for clothing category and `wedding` for reason to rent using the function I define as `filter_popular_recommendation`." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Module-level lists that accumulate the active filters\n", "column_list = []\n", "operator_list = []\n", "condition_list = []\n", "\n", "def append_condition(column, operation, condition):\n", "    '''\n", "    Appends a filter to column, operator, and condition lists.\n", "    '''\n", "    column_list.append(column)\n", "    operator_list.append(operation)\n", "    condition_list.append(condition)\n", "\n", "def filter_popular_recommendation(df=data, n=10, bust_size=None, weight=None, rating=None, rented_for=None, \n", "                                  body_type=None, category=None, height=None, size=None, age=None, \n", "                                  review_month=None, review_season=None, review_year=None):\n", "    '''\n", "    Returns the most popular recommendations filtered by the features passed as arguments.\n", "    '''\n", "    if bust_size:\n", "        append_condition('bust_size', '==', bust_size)\n", "    if weight:\n", "        # Match weights within 10 of the given weight\n", "        append_condition('weight', '>=', weight-10)\n", "        append_condition('weight', '<=', weight+10)\n", "    if rented_for:\n", "        append_condition('rented_for', '==', rented_for)\n", "    if body_type:\n", "        append_condition('body_type', '==', body_type)\n", "    if category:\n", "        append_condition('category', '==', category)\n", "    if height:\n", "        # Match heights within 2 of the given height\n", "        append_condition('height', '>=', height-2)\n", "        append_condition('height', '<=', height+2)\n", "    if size:\n", "        append_condition('size', '==', size)\n", "    if age:\n", "        # Match ages within 4 years of the given age\n", "        append_condition('age', '>=', age-4)\n", "        append_condition('age', '<=', age+4)\n", "    if review_month:\n", "        append_condition('review_month', '==', review_month)\n", "    if review_season:\n", "        append_condition('review_season', '==', review_season)\n", "    if review_year:\n", "        append_condition('review_year', '==', review_year)\n", "    \n", "    # Join all filters into a single query string, e.g. \"category == 'dress'\"\n", "    condition = ' & '.join(f'{col} {op} {repr(cond)}' for col, op, cond in zip(column_list, operator_list, condition_list))\n", "    # An empty query string raises an error, so skip filtering when no filter is set\n", "    filtered_df = df.query(condition) if condition else df\n", "    \n", "    return popular_recommendation(filtered_df, n)\n", "\n", "def reset_condition():\n", "    '''\n", "    Returns three new empty lists; assign them back to column_list, operator_list,\n", "    and condition_list to clear the filters before the next query.\n", "    '''\n", "    column_list = []\n", "    operator_list = []\n", "    condition_list = []\n", "    return column_list, operator_list, condition_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top 10 Popular Recommendations for Dress:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", "
item_idfit_smallfitfit_largeuser_countbust_size_topweight_meanweight_medianrating_averagerented_for_toprented_for_allbody_type_topcategory_topheight_meanheight_mediansize_meansize_medianage_meanage_medianreview_length_averagereview_month_topreview_season_topscore
122310030765.00122.005.0013234b137.98135.004.89wedding[formal affair, other, party, wedding, date, v...athleticdress65.9666.0011.5512.0032.8032.0055.2110Winter4.76
148311869239.0091.003.0010334c137.48135.004.92party[formal affair, other, party, wedding, date, v...hourglassdress65.3365.0013.5314.0030.3330.0065.7011Winter4.76
1584126066625.00126.003.0015434b130.13130.004.84party[formal affair, other, party, wedding, date, v...hourglassdress64.8265.0010.668.0032.7732.5054.057Spring4.74
83271437423.00112.002.0013734b133.23132.004.85wedding[formal affair, work, other, party, wedding, d...athleticdress65.8366.0010.188.0031.4930.0055.9710Fall4.74
8477243195.0099.0014.0011832d134.67135.004.86wedding[formal affair, work, other, party, wedding, d...hourglassdress65.2765.009.838.0033.5133.0051.0810Summer4.73
137811061012.00111.0027.0014036b142.86140.004.84wedding[formal affair, everyday, other, party, weddin...hourglassdress65.8166.0011.5714.0031.0130.0048.9011Summer4.73
15424146110.00227.004.0024134d141.32135.004.78wedding[formal affair, everyday, other, party, weddin...hourglassdress66.1066.0014.1512.0031.7030.0067.826Spring4.72
2540194098514.0099.001.0011434d139.75138.004.83wedding[formal affair, work, other, party, wedding, d...hourglassdress65.6066.0012.9314.0034.9834.0049.3711Winter4.71
1262103144071.00160.001.0023236c138.88135.004.77party[formal affair, other, party, wedding, date, v...hourglassdress65.3165.0015.0914.0031.2331.0060.1112Winter4.71
1812636991.00105.0038.0014434b136.67135.004.81formal affair[formal affair, other, party, wedding, date]hourglassdress65.4565.009.378.0034.0132.0066.346Winter4.71
\n", "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "1223 1003076 5.00 122.00 5.00 132 34b \n", "1483 1186923 9.00 91.00 3.00 103 34c \n", "1584 1260666 25.00 126.00 3.00 154 34b \n", "832 714374 23.00 112.00 2.00 137 34b \n", "847 724319 5.00 99.00 14.00 118 32d \n", "1378 1106101 2.00 111.00 27.00 140 36b \n", "154 241461 10.00 227.00 4.00 241 34d \n", "2540 1940985 14.00 99.00 1.00 114 34d \n", "1262 1031440 71.00 160.00 1.00 232 36c \n", "181 263699 1.00 105.00 38.00 144 34b \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "1223 137.98 135.00 4.89 wedding \n", "1483 137.48 135.00 4.92 party \n", "1584 130.13 130.00 4.84 party \n", "832 133.23 132.00 4.85 wedding \n", "847 134.67 135.00 4.86 wedding \n", "1378 142.86 140.00 4.84 wedding \n", "154 141.32 135.00 4.78 wedding \n", "2540 139.75 138.00 4.83 wedding \n", "1262 138.88 135.00 4.77 party \n", "181 136.67 135.00 4.81 formal affair \n", "\n", " rented_for_all body_type_top \\\n", "1223 [formal affair, other, party, wedding, date, v... athletic \n", "1483 [formal affair, other, party, wedding, date, v... hourglass \n", "1584 [formal affair, other, party, wedding, date, v... hourglass \n", "832 [formal affair, work, other, party, wedding, d... athletic \n", "847 [formal affair, work, other, party, wedding, d... hourglass \n", "1378 [formal affair, everyday, other, party, weddin... hourglass \n", "154 [formal affair, everyday, other, party, weddin... hourglass \n", "2540 [formal affair, work, other, party, wedding, d... hourglass \n", "1262 [formal affair, other, party, wedding, date, v... 
hourglass \n", "181 [formal affair, other, party, wedding, date] hourglass \n", "\n", " category_top height_mean height_median size_mean size_median \\\n", "1223 dress 65.96 66.00 11.55 12.00 \n", "1483 dress 65.33 65.00 13.53 14.00 \n", "1584 dress 64.82 65.00 10.66 8.00 \n", "832 dress 65.83 66.00 10.18 8.00 \n", "847 dress 65.27 65.00 9.83 8.00 \n", "1378 dress 65.81 66.00 11.57 14.00 \n", "154 dress 66.10 66.00 14.15 12.00 \n", "2540 dress 65.60 66.00 12.93 14.00 \n", "1262 dress 65.31 65.00 15.09 14.00 \n", "181 dress 65.45 65.00 9.37 8.00 \n", "\n", " age_mean age_median review_length_average review_month_top \\\n", "1223 32.80 32.00 55.21 10 \n", "1483 30.33 30.00 65.70 11 \n", "1584 32.77 32.50 54.05 7 \n", "832 31.49 30.00 55.97 10 \n", "847 33.51 33.00 51.08 10 \n", "1378 31.01 30.00 48.90 11 \n", "154 31.70 30.00 67.82 6 \n", "2540 34.98 34.00 49.37 11 \n", "1262 31.23 31.00 60.11 12 \n", "181 34.01 32.00 66.34 6 \n", "\n", " review_season_top score \n", "1223 Winter 4.76 \n", "1483 Winter 4.76 \n", "1584 Spring 4.74 \n", "832 Fall 4.74 \n", "847 Summer 4.73 \n", "1378 Summer 4.73 \n", "154 Spring 4.72 \n", "2540 Winter 4.71 \n", "1262 Winter 4.71 \n", "181 Winter 4.71 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top10_dress = filter_popular_recommendation(category='dress')\n", "column_list, operator_list, condition_list = reset_condition()\n", "top10_dress" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Top 10 Popular Recommendations for Wedding:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", "
item_idfit_smallfitfit_largeuser_countbust_size_topweight_meanweight_medianrating_averagerented_for_toprented_for_allbody_type_topcategory_topheight_meanheight_mediansize_meansize_medianage_meanage_medianreview_length_averagereview_month_topreview_season_topscore
139710643977.00153.008.0016834b129.88129.004.87wedding[wedding]athleticgown64.6864.008.088.0029.5729.0063.7710Fall4.77
112379320.00435.0046.0050134b133.64130.004.78wedding[wedding]athleticgown64.8765.0010.088.0031.6831.0076.0010Summer4.75
1867137863115.00226.007.0024834c139.42135.004.81wedding[wedding]athleticmaxi65.9066.0011.699.0032.4731.0067.146Summer4.75
1623121342715.00150.007.0017234b132.06130.004.82wedding[wedding]athleticgown65.9066.009.268.0029.8029.0059.129Spring4.73
1012786528.00400.0010.0043834b134.43135.004.76wedding[wedding]hourglassgown64.7265.0011.6812.0035.6434.0068.7510Fall4.72
11108701841.0088.0010.009934c135.56135.004.86wedding[wedding]hourglassdress65.3565.009.968.0032.5231.0060.197Summer4.72
129510030763.0075.003.008134b137.58135.004.89wedding[wedding]athleticdress65.8366.0011.3112.0032.2831.0055.7910Fall4.72
2202414616.00173.002.0018134d141.25135.004.79wedding[wedding]hourglassdress66.1766.0013.5612.0032.0130.0071.356Summer4.71
176713095377.0069.007.008334c138.06135.004.88wedding[wedding]hourglassgown64.8865.0013.8713.0031.4830.0058.5810Summer4.71
231516870829.00200.009.0021834c131.89132.504.78wedding[wedding]athleticgown65.5565.009.148.0031.0830.0066.335Spring4.71
\n", "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "1397 1064397 7.00 153.00 8.00 168 34b \n", "1 123793 20.00 435.00 46.00 501 34b \n", "1867 1378631 15.00 226.00 7.00 248 34c \n", "1623 1213427 15.00 150.00 7.00 172 34b \n", "10 127865 28.00 400.00 10.00 438 34b \n", "1110 870184 1.00 88.00 10.00 99 34c \n", "1295 1003076 3.00 75.00 3.00 81 34b \n", "220 241461 6.00 173.00 2.00 181 34d \n", "1767 1309537 7.00 69.00 7.00 83 34c \n", "2315 1687082 9.00 200.00 9.00 218 34c \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "1397 129.88 129.00 4.87 wedding \n", "1 133.64 130.00 4.78 wedding \n", "1867 139.42 135.00 4.81 wedding \n", "1623 132.06 130.00 4.82 wedding \n", "10 134.43 135.00 4.76 wedding \n", "1110 135.56 135.00 4.86 wedding \n", "1295 137.58 135.00 4.89 wedding \n", "220 141.25 135.00 4.79 wedding \n", "1767 138.06 135.00 4.88 wedding \n", "2315 131.89 132.50 4.78 wedding \n", "\n", " rented_for_all body_type_top category_top height_mean height_median \\\n", "1397 [wedding] athletic gown 64.68 64.00 \n", "1 [wedding] athletic gown 64.87 65.00 \n", "1867 [wedding] athletic maxi 65.90 66.00 \n", "1623 [wedding] athletic gown 65.90 66.00 \n", "10 [wedding] hourglass gown 64.72 65.00 \n", "1110 [wedding] hourglass dress 65.35 65.00 \n", "1295 [wedding] athletic dress 65.83 66.00 \n", "220 [wedding] hourglass dress 66.17 66.00 \n", "1767 [wedding] hourglass gown 64.88 65.00 \n", "2315 [wedding] athletic gown 65.55 65.00 \n", "\n", " size_mean size_median age_mean age_median review_length_average \\\n", "1397 8.08 8.00 29.57 29.00 63.77 \n", "1 10.08 8.00 31.68 31.00 76.00 \n", "1867 11.69 9.00 32.47 31.00 67.14 \n", "1623 9.26 8.00 29.80 29.00 59.12 \n", "10 11.68 12.00 35.64 34.00 68.75 \n", "1110 9.96 8.00 32.52 31.00 60.19 \n", "1295 11.31 12.00 32.28 31.00 55.79 \n", "220 13.56 12.00 32.01 30.00 71.35 \n", "1767 13.87 13.00 31.48 30.00 58.58 \n", "2315 9.14 8.00 31.08 30.00 66.33 \n", "\n", " 
 review_month_top review_season_top score \n", "1397 10 Fall 4.77 \n", "1 10 Summer 4.75 \n", "1867 6 Summer 4.75 \n", "1623 9 Spring 4.73 \n", "10 10 Fall 4.72 \n", "1110 7 Summer 4.72 \n", "1295 10 Fall 4.72 \n", "220 6 Summer 4.71 \n", "1767 10 Summer 4.71 \n", "2315 5 Spring 4.71 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top10_wedding = filter_popular_recommendation(rented_for='wedding')\n", "column_list, operator_list, condition_list = reset_condition()\n", "top10_wedding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Content-Based Recommenders\n", "\n", "Content-based recommendation systems rest on the idea that if a user likes an item, the user will also like items similar to it. To measure the similarity between items, I calculate the Pearson correlation using numerical features and one-hot encoded categorical features from the table `item_data` created earlier. Then, I build a `similarity_matrix` of all the items to use in the function `content_based_similarity` I define, which generates content-based recommendations for any `item_id`. Later, I use the text features to create a **text review-based recommender** using Natural Language Processing." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "def item_similarity(item_df):\n", "    '''\n", "    Measures the Pearson correlation between items from the table of item data.\n", "    Returns the encoded feature table and the item-by-item similarity matrix.\n", "    '''\n", "    # Drop the aggregates that are not used as similarity features\n", "    item_df = item_df.drop(['fit_small', 'fit_large', 'weight_mean', 'rented_for_all', 'height_mean', 'size_mean', \n", "                            'age_mean', 'review_month_top'], axis=1)\n", "    \n", "    similarity_features = item_df[['item_id', 'fit', 'user_count', 'weight_median', 'rating_average', 'rented_for_top', \n", "                                   'body_type_top', 'category_top', 'height_median', 'size_median', 'age_median', \n", "                                   'review_length_average', 'review_season_top']]\n", "    similarity_features = similarity_features.set_index('item_id')\n", "    # One-hot encode the categorical features\n", "    similarity_features = pd.get_dummies(similarity_features, columns=['rented_for_top', 'body_type_top', 'category_top', 'review_season_top'])\n", "    \n", "    # Transpose so that each column is an item, then correlate the columns pairwise\n", "    similarity_matrix = similarity_features.T\n", "    similarity_matrix = similarity_matrix.corr(method='pearson')\n", "    \n", "    return similarity_features, similarity_matrix\n", "\n", "pd.set_option('display.max_columns', 30)\n", "\n", "def content_based_similarity(similarity_matrix, item_id, n=20):\n", "    '''\n", "    Returns the most similar item recommendations to the given item based on the similarity matrix.\n", "    '''\n", "    # Rank all items by similarity to the given item, excluding the item itself\n", "    recommendations = similarity_matrix[item_id].sort_values(ascending=False)\n", "    recommendations = recommendations.drop([item_id], axis=0).index\n", "    \n", "    # Keep the n most similar item ids\n", "    recommendations_list = list(recommendations[:n])\n", "    \n", "    display(item_data.loc[item_data['item_id']==item_id])\n", "    print(f'----------------------------------------\\nTop {n} Recommendations for Item #{item_id}:')\n", "    \n", "    # Note: isin() returns the rows in item_data order, not ranked by similarity\n", "    recommendations_df = item_data.loc[item_data['item_id'].isin(recommendations_list)]\n", "    return recommendations_df" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ 
"similarity_features, similarity_matrix = item_similarity(item_data)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
" ], "text/plain": [ "item_id 123373 123793 124204 124553 125424 125465 125564 \\\n", "item_id \n", "123373 1.00 0.99 1.00 0.98 0.99 1.00 1.00 \n", "123793 0.99 1.00 0.99 0.96 0.97 1.00 0.99 \n", "124204 1.00 0.99 1.00 0.98 0.99 1.00 1.00 \n", "124553 0.98 0.96 0.98 1.00 1.00 0.98 0.99 \n", "125424 0.99 0.97 0.99 1.00 1.00 0.98 1.00 \n", "... ... ... ... ... ... ... ... \n", "2963850 0.25 0.14 0.23 0.34 0.35 0.21 0.28 \n", "2964470 0.18 0.07 0.16 0.27 0.28 0.14 0.21 \n", "2965009 0.19 0.08 0.17 0.28 0.29 0.15 0.22 \n", "2965924 0.16 0.05 0.14 0.24 0.25 0.12 0.19 \n", "2966087 0.19 0.08 0.17 0.29 0.29 0.15 0.23 \n", "\n", "item_id 126335 127081 127495 127865 128730 128959 129831 \\\n", "item_id \n", "123373 0.99 0.96 1.00 0.99 0.98 1.00 0.93 \n", "123793 1.00 0.92 0.98 1.00 0.96 0.98 0.88 \n", "124204 0.99 0.95 0.99 0.99 0.98 1.00 0.92 \n", "124553 0.97 0.97 0.99 0.96 0.98 0.98 0.96 \n", "125424 0.97 0.98 1.00 0.96 0.99 0.99 0.96 \n", "... ... ... ... ... ... ... ... \n", "2963850 0.13 0.50 0.33 0.16 0.42 0.31 0.59 \n", "2964470 0.06 0.43 0.26 0.08 0.35 0.24 0.52 \n", "2965009 0.07 0.44 0.27 0.10 0.36 0.25 0.53 \n", "2965924 0.04 0.42 0.23 0.07 0.32 0.22 0.49 \n", "2966087 0.07 0.45 0.27 0.09 0.36 0.25 0.54 \n", "\n", "item_id 130259 ... 2959486 2959777 2960025 2960913 2960940 2960969 \\\n", "item_id ... \n", "123373 1.00 ... 0.26 0.16 0.25 0.17 0.19 0.28 \n", "123793 0.99 ... 0.14 0.05 0.14 0.06 0.08 0.17 \n", "124204 1.00 ... 0.23 0.13 0.22 0.15 0.16 0.25 \n", "124553 0.99 ... 0.35 0.25 0.33 0.26 0.28 0.37 \n", "125424 0.99 ... 0.36 0.26 0.34 0.27 0.29 0.38 \n", "... ... ... ... ... ... ... ... ... \n", "2963850 0.22 ... 1.00 0.97 0.98 0.98 0.99 1.00 \n", "2964470 0.15 ... 0.99 0.99 0.97 0.98 0.99 0.99 \n", "2965009 0.16 ... 0.99 0.99 0.96 0.97 1.00 0.99 \n", "2965924 0.13 ... 0.91 0.81 0.96 0.96 0.86 0.87 \n", "2966087 0.16 ... 
0.99 0.98 0.98 0.99 0.99 0.99 \n", "\n", "item_id 2961855 2962646 2963344 2963601 2963850 2964470 2965009 \\\n", "item_id \n", "123373 0.17 0.24 0.24 0.16 0.25 0.18 0.19 \n", "123793 0.06 0.13 0.13 0.05 0.14 0.07 0.08 \n", "124204 0.15 0.22 0.21 0.14 0.23 0.16 0.17 \n", "124553 0.26 0.34 0.33 0.25 0.34 0.27 0.28 \n", "125424 0.27 0.35 0.34 0.26 0.35 0.28 0.29 \n", "... ... ... ... ... ... ... ... \n", "2963850 0.97 1.00 1.00 0.99 1.00 0.99 0.99 \n", "2964470 0.98 0.99 1.00 0.99 0.99 1.00 0.99 \n", "2965009 0.99 1.00 1.00 1.00 0.99 0.99 1.00 \n", "2965924 0.80 0.88 0.88 0.87 0.90 0.90 0.86 \n", "2966087 0.97 0.99 0.99 0.99 0.99 1.00 0.99 \n", "\n", "item_id 2965924 2966087 \n", "item_id \n", "123373 0.16 0.19 \n", "123793 0.05 0.08 \n", "124204 0.14 0.17 \n", "124553 0.24 0.29 \n", "125424 0.25 0.29 \n", "... ... ... \n", "2963850 0.90 0.99 \n", "2964470 0.90 1.00 \n", "2965009 0.86 0.99 \n", "2965924 1.00 0.91 \n", "2966087 0.91 1.00 \n", "\n", "[5850 rows x 5850 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "similarity_matrix" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "0 123373 73.00 566.00 47.00 686 36d \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "0 140.67 135.00 4.40 formal affair \n", "\n", " rented_for_all body_type_top \\\n", "0 [formal affair, work, other, party, wedding, d... hourglass \n", "\n", " category_top height_mean height_median size_mean size_median age_mean \\\n", "0 gown 65.39 65.00 15.12 13.00 34.36 \n", "\n", " age_median review_length_average review_month_top review_season_top \n", "0 33.00 66.08 12 Winter " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------\n", "Top 20 Recommendations for Item #123373:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "2 124204 42.00 630.00 123.00 795 34b \n", "5 125465 150.00 720.00 13.00 883 34b \n", "6 125564 98.00 443.00 66.00 607 34b \n", "12 128959 44.00 447.00 25.00 516 34c \n", "14 130259 54.00 643.00 216.00 913 36c \n", "16 131117 71.00 798.00 112.00 981 34b \n", "17 131533 70.00 973.00 48.00 1091 34b \n", "28 136860 66.00 618.00 24.00 708 34c \n", "29 137585 74.00 932.00 94.00 1100 34b \n", "30 138431 87.00 395.00 6.00 488 34c \n", "41 144051 59.00 416.00 25.00 500 36c \n", "48 146684 31.00 425.00 38.00 494 34b \n", "62 152662 39.00 404.00 42.00 485 34b \n", "63 152836 174.00 585.00 35.00 794 34b \n", "65 153475 41.00 408.00 70.00 519 34c \n", "66 154002 37.00 537.00 49.00 623 34c \n", "88 166633 128.00 615.00 61.00 804 34b \n", "91 168592 29.00 402.00 98.00 529 34c \n", "92 168610 44.00 429.00 59.00 532 34b \n", "1972 1076484 76.00 437.00 42.00 555 36c \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "2 136.88 135.00 4.65 party \n", "5 143.82 138.00 4.69 formal affair \n", "6 142.99 138.00 4.43 formal affair \n", "12 140.82 138.00 4.65 formal affair \n", "14 148.22 140.00 4.38 wedding \n", "16 139.21 135.00 4.52 formal affair \n", "17 140.68 138.00 4.69 formal affair \n", "28 138.95 135.00 4.62 wedding \n", "29 129.47 129.00 4.63 wedding \n", "30 139.79 138.00 4.63 formal affair \n", "41 146.32 140.00 4.52 wedding \n", "48 133.21 131.50 4.50 formal affair \n", "62 136.36 135.00 4.66 formal affair \n", "63 125.80 125.00 4.34 party \n", "65 139.58 135.00 4.55 formal affair \n", "66 135.25 135.00 4.56 formal affair \n", "88 126.02 125.00 4.33 wedding \n", "91 135.33 135.00 4.65 formal affair \n", "92 135.79 135.00 4.55 wedding \n", "1972 142.37 138.00 4.51 wedding \n", "\n", " rented_for_all body_type_top \\\n", "2 [formal affair, work, other, party, wedding, d... hourglass \n", "5 [formal affair, work, other, party, wedding, d... 
hourglass \n", "6 [formal affair, everyday, other, party, weddin... hourglass \n", "12 [formal affair, work, everyday, other, party, ... hourglass \n", "14 [formal affair, work, everyday, other, party, ... hourglass \n", "16 [formal affair, work, other, party, wedding, d... athletic \n", "17 [formal affair, work, other, party, wedding, d... hourglass \n", "28 [formal affair, work, other, party, wedding, d... hourglass \n", "29 [formal affair, other, party, wedding, date, v... athletic \n", "30 [formal affair, work, other, party, wedding, d... hourglass \n", "41 [formal affair, work, other, party, wedding, d... hourglass \n", "48 [formal affair, other, party, wedding, date, v... hourglass \n", "62 [formal affair, work, other, party, wedding, d... hourglass \n", "63 [formal affair, other, party, wedding, date, v... petite \n", "65 [formal affair, other, party, wedding, date] hourglass \n", "66 [formal affair, other, party, wedding, date] hourglass \n", "88 [formal affair, work, other, party, wedding, d... athletic \n", "91 [formal affair, party, other, wedding] hourglass \n", "92 [formal affair, other, party, wedding, date] hourglass \n", "1972 [formal affair, work, other, party, wedding, d... 
hourglass \n", "\n", " category_top height_mean height_median size_mean size_median \\\n", "2 dress 65.15 65.00 10.97 8.00 \n", "5 gown 65.83 66.00 16.92 13.00 \n", "6 gown 65.48 65.00 17.05 16.00 \n", "12 gown 65.29 65.00 14.20 13.00 \n", "14 dress 65.50 65.00 18.00 16.00 \n", "16 gown 65.91 66.00 12.54 12.00 \n", "17 gown 65.81 66.00 13.81 12.00 \n", "28 sheath 65.79 66.00 12.19 12.00 \n", "29 sheath 64.81 65.00 7.93 8.00 \n", "30 gown 65.67 66.00 13.86 12.00 \n", "41 sheath 65.54 65.00 19.34 16.00 \n", "48 gown 64.88 65.00 10.89 11.00 \n", "62 gown 65.71 66.00 12.29 12.00 \n", "63 mini 63.97 64.00 7.89 8.00 \n", "65 gown 65.66 66.00 13.68 12.00 \n", "66 gown 65.43 65.00 11.22 9.00 \n", "88 mini 64.09 64.00 7.45 8.00 \n", "91 gown 65.50 66.00 9.92 8.00 \n", "92 gown 65.58 66.00 11.10 9.00 \n", "1972 dress 65.23 65.00 15.47 12.00 \n", "\n", " age_mean age_median review_length_average review_month_top \\\n", "2 33.33 33.00 62.13 12 \n", "5 32.70 31.00 64.22 4 \n", "6 38.16 36.00 61.73 11 \n", "12 35.57 34.00 70.76 11 \n", "14 36.15 35.00 65.43 1 \n", "16 31.31 31.00 70.12 4 \n", "17 33.32 31.00 64.30 5 \n", "28 33.36 32.00 62.59 6 \n", "29 30.77 31.00 57.05 4 \n", "30 34.00 32.00 66.25 4 \n", "41 34.70 33.00 55.15 10 \n", "48 34.02 33.00 69.59 1 \n", "62 33.24 32.00 70.60 5 \n", "63 31.92 32.00 53.74 5 \n", "65 34.06 33.00 66.26 4 \n", "66 33.78 33.00 78.80 5 \n", "88 30.84 31.00 59.31 6 \n", "91 32.64 32.00 61.09 4 \n", "92 31.60 31.00 65.23 5 \n", "1972 34.40 33.00 59.17 10 \n", "\n", " review_season_top \n", "2 Winter \n", "5 Spring \n", "6 Winter \n", "12 Winter \n", "14 Winter \n", "16 Spring \n", "17 Spring \n", "28 Spring \n", "29 Winter \n", "30 Winter \n", "41 Winter \n", "48 Winter \n", "62 Spring \n", "63 Winter \n", "65 Winter \n", "66 Winter \n", "88 Spring \n", "91 Spring \n", "92 Spring \n", "1972 Winter " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example item\n",
"content_based_similarity(similarity_matrix, 123373)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Text Review-Based Recommender\n", "\n", "To recommend items based on text reviews, Natural Language Processing (NLP) is used to:\n", "- Clean the text by removing stopwords and performing lemmatization.\n", "- Create the Term Frequency-Inverse Document Frequency (TF-IDF) vectors for the *documents*, which are the reviews.\n", "- Compute the pairwise cosine similarity from the constructed matrix of TF-IDF scores." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package stopwords to\n", "[nltk_data] /Users/czarinaluna/nltk_data...\n", "[nltk_data] Package stopwords is already up-to-date!\n", "[nltk_data] Downloading package wordnet to\n", "[nltk_data] /Users/czarinaluna/nltk_data...\n", "[nltk_data] Package wordnet is already up-to-date!\n" ] } ], "source": [ "# Import the Natural Language Toolkit (nltk)\n", "import re\n", "import nltk\n", "\n", "nltk.download('stopwords')\n", "nltk.download('wordnet')\n", "\n", "# A set makes the stopword membership test fast\n", "stopwords = set(nltk.corpus.stopwords.words('english'))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "from nltk.stem.wordnet import WordNetLemmatizer\n", "\n", "lemmatizer = WordNetLemmatizer()\n", "\n", "def preprocess(text):\n", "    '''\n", "    Text preprocessing to standardize, remove special characters and stopwords, and lemmatize.\n", "    '''\n", "    text = text.apply(lambda x: x.lower())\n", "    # Keep only letters, digits, and whitespace; Python's re module has no POSIX [:punct:] class\n", "    text = text.apply(lambda x: re.sub(r'[^a-z0-9\\s]', '', x))\n", "    # Keep only words longer than three characters\n", "    text = text.apply(lambda x: [i for i in x.split() if len(i) > 3])\n", "    text = text.apply(lambda x: [lemmatizer.lemmatize(word) for word in x])\n", "    text = text.apply(lambda x: [word for word in x if word not in stopwords])\n", "    text = text.apply(lambda x: ' '.join(x))\n", "    \n", "    return text\n", "\n",
"def create_text_df(df=data, item_df=item_data, text_review=True, category=False):\n", "    '''\n", "    Creates new feature combining review summary and review text, to add to item data.\n", "    '''\n", "    if item_df is None:\n", "        item_df = create_item_df(df)\n", "    \n", "    text_df = df.copy()\n", "    text_df['review'] = text_df['review_summary'] + ' ' + text_df['review_text']\n", "    text_df['review'] = text_df['review'].fillna('')\n", "    text_df['review'] = preprocess(text_df['review'])\n", "    \n", "    if text_review:\n", "        text_df = text_df[['item_id', 'review']].groupby('item_id').agg(' '.join).reset_index()\n", "        text_item_df = item_df.merge(text_df, on='item_id')\n", "    \n", "    if not text_review and category:\n", "        text_df = text_df[['item_id', 'rented_for']].groupby('item_id').agg(' '.join).reset_index()\n", "        text_item_df = item_df.merge(text_df, on='item_id')\n", "    \n", "    return text_item_df" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Create new dataframe for text item data\n", "text_item_data = create_text_df()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n", "\n", "count = CountVectorizer()\n", "tfidf = TfidfVectorizer(ngram_range=(1,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> \"The TF-IDF score is the frequency of a word occurring in a document, down-weighted by the number of documents in which it occurs.\" [(Aditya Sharma)](https://www.datacamp.com/community/tutorials/recommender-systems-python)\n", "\n", "Finally, the function `text_based_recommendation` below computes the cosine similarity score between every pair of text reviews. Because `TfidfVectorizer` L2-normalizes each vector by default, the dot product of two TF-IDF vectors (`linear_kernel`) equals their cosine similarity:" ] }, { "cell_type": "code", "execution_count": 35,
"metadata": {}, "outputs": [], "source": [ "from sklearn.metrics.pairwise import cosine_similarity, linear_kernel\n", "\n", "def text_based_recommendation(text_item_df, item_id, n=10, text_review=True, category=False):\n", "    '''\n", "    Returns the most similar item recommendations to the given item based on text reviews.\n", "    '''\n", "    if text_review:\n", "        tfidf_matrix = tfidf.fit_transform(text_item_df['review'])\n", "        cosine_similarity_ = linear_kernel(tfidf_matrix, tfidf_matrix)\n", "\n", "    if not text_review and category:\n", "        count_matrix = count.fit_transform(text_item_df['rented_for'])\n", "        cosine_similarity_ = cosine_similarity(count_matrix, count_matrix)\n", "    \n", "    indices = pd.Series(text_item_df.index, index=text_item_df['item_id']).drop_duplicates()\n", "    idx = indices[item_id]\n", "    \n", "    similarity_scores = list(enumerate(cosine_similarity_[idx]))\n", "    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)\n", "\n", "    # Skip the first entry, which is normally the item itself\n", "    top_similarity_scores = similarity_scores[1:n+1]\n", "    item_indices = [i[0] for i in top_similarity_scores]\n", "    top_text_based_recommendations = text_item_df['item_id'].iloc[item_indices]\n", "    \n", "    display(item_data.loc[item_data['item_id']==item_id])\n", "    print(f'----------------------------------------\\nTop {n} Recommendations for Item #{item_id}:')\n", "    \n", "    # isin() returns rows in item_data order, not ranked by similarity\n", "    recommendations_df = item_data.loc[item_data['item_id'].isin(top_text_based_recommendations)]\n", "    return recommendations_df" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "0 123373 73.00 566.00 47.00 686 36d \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "0 140.67 135.00 4.40 formal affair \n", "\n", " rented_for_all body_type_top \\\n", "0 [formal affair, work, other, party, wedding, d... hourglass \n", "\n", " category_top height_mean height_median size_mean size_median age_mean \\\n", "0 gown 65.39 65.00 15.12 13.00 34.36 \n", "\n", " age_median review_length_average review_month_top review_season_top \n", "0 33.00 66.08 12 Winter " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------\n", "Top 10 Recommendations for Item #123373:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " item_id fit_small fit fit_large user_count bust_size_top \\\n", "1 123793 65.00 1497.00 152.00 1714 34b \n", "5 125465 150.00 720.00 13.00 883 34b \n", "6 125564 98.00 443.00 66.00 607 34b \n", "10 127865 78.00 1278.00 37.00 1393 34b \n", "17 131533 70.00 973.00 48.00 1091 34b \n", "20 132738 60.00 937.00 580.00 1577 34b \n", "31 139086 7.00 331.00 178.00 516 34b \n", "33 140321 6.00 288.00 156.00 450 34c \n", "47 145906 76.00 1154.00 242.00 1472 34b \n", "65 153475 41.00 408.00 70.00 519 34c \n", "\n", " weight_mean weight_median rating_average rented_for_top \\\n", "1 132.98 130.00 4.77 formal affair \n", "5 143.82 138.00 4.69 formal affair \n", "6 142.99 138.00 4.43 formal affair \n", "10 136.03 135.00 4.72 formal affair \n", "17 140.68 138.00 4.69 formal affair \n", "20 144.50 138.00 4.62 formal affair \n", "31 137.44 135.00 4.61 formal affair \n", "33 140.17 135.00 4.54 formal affair \n", "47 135.60 135.00 4.51 formal affair \n", "65 139.58 135.00 4.55 formal affair \n", "\n", " rented_for_all body_type_top \\\n", "1 [formal affair, work, other, party, wedding, d... hourglass \n", "5 [formal affair, work, other, party, wedding, d... hourglass \n", "6 [formal affair, everyday, other, party, weddin... hourglass \n", "10 [formal affair, work, other, party, wedding, d... hourglass \n", "17 [formal affair, work, other, party, wedding, d... hourglass \n", "20 [formal affair, work, other, party, wedding, d... hourglass \n", "31 [formal affair, other, party, wedding, date] hourglass \n", "33 [formal affair, work, other, party, wedding, d... hourglass \n", "47 [formal affair, other, party, wedding, date, v... 
hourglass \n", "65 [formal affair, other, party, wedding, date] hourglass \n", "\n", " category_top height_mean height_median size_mean size_median age_mean \\\n", "1 gown 65.06 65.00 9.72 8.00 31.31 \n", "5 gown 65.83 66.00 16.92 13.00 32.70 \n", "6 gown 65.48 65.00 17.05 16.00 38.16 \n", "10 gown 65.17 65.00 12.04 12.00 36.35 \n", "17 gown 65.81 66.00 13.81 12.00 33.32 \n", "20 gown 65.56 66.00 15.45 13.00 32.14 \n", "31 gown 65.45 65.00 11.99 11.00 35.49 \n", "33 gown 65.70 66.00 13.66 12.00 35.42 \n", "47 gown 65.78 66.00 10.59 9.00 30.36 \n", "65 gown 65.66 66.00 13.68 12.00 34.06 \n", "\n", " age_median review_length_average review_month_top review_season_top \n", "1 31.00 74.41 5 Winter \n", "5 31.00 64.22 4 Spring \n", "6 36.00 61.73 11 Winter \n", "10 35.00 68.93 11 Winter \n", "17 31.00 64.30 5 Spring \n", "20 32.00 78.31 5 Winter \n", "31 34.00 61.45 12 Winter \n", "33 33.50 55.51 11 Winter \n", "47 30.00 64.90 4 Winter \n", "65 33.00 66.26 4 Winter " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Same example item\n", "text_based_recommendation(text_item_data, 123373, n=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Key Differences** between the text-based recommendations and the content-based recommendations for the same item:\n", "\n", "|Feature|Content-based|Text-based|Item|\n", "|:---|:---|:---|:---|\n", "|rating_average|4.38 - 4.69|4.43 - 4.77|4.40|\n", "|rented_for_top|party, formal affair, wedding|formal affair (across the board)|formal affair|\n", "|body_type_top|hourglass, athletic|hourglass (across the board)|hourglass|\n", "|category_top|dress, gown, sheath|gown (across the board)|gown|\n", "\n", "***\n", "\n", "### Collaborative Filtering Systems\n", "\n", "Collaborative filtering systems recommend items to a user based on the user's past ratings *and* on the past ratings and preferences of other similar users.
I apply several collaborative filtering algorithms implemented in the Python library [`surprise`](https://surprise.readthedocs.io/en/stable/index.html):\n", "\n", "\n", "|Prediction Algorithm|Description|\n", "|:---|:---|\n", "|[Normal Predictor](https://surprise.readthedocs.io/en/stable/basic_algorithms.html#surprise.prediction_algorithms.random_pred.NormalPredictor)|Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.|\n", "|[Baseline Only](https://surprise.readthedocs.io/en/stable/basic_algorithms.html#surprise.prediction_algorithms.baseline_only.BaselineOnly)|Algorithm predicting the baseline estimate for given user and item.|\n", "|[KNN Basic](https://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNBasic)|A basic collaborative filtering algorithm.|\n", "|[KNN Baseline](https://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNBaseline)|A basic collaborative filtering algorithm, taking into account a *baseline* rating.|\n", "|[KNN with Means](https://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNWithMeans)|A basic collaborative filtering algorithm, taking into account the mean ratings of each user.|\n", "|[KNN with Z-Score](https://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNWithZScore)|A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.|\n", "|[Singular Value Decomposition](https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD)|The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.
When baselines are not used, this is equivalent to Probabilistic Matrix Factorization.|\n", "|[Singular Value Decomposition ++](https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVDpp)|The SVD++ algorithm, an extension of SVD taking into account implicit ratings.|\n", "|[Non-Negative Matrix Factorization](https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.NMF)|A collaborative filtering algorithm based on Non-negative Matrix Factorization.|\n", "|[SlopeOne](https://surprise.readthedocs.io/en/stable/slope_one.html#surprise.prediction_algorithms.slope_one.SlopeOne)|A simple yet accurate collaborative filtering algorithm.|\n", "|[CoClustering](https://surprise.readthedocs.io/en/stable/co_clustering.html#surprise.prediction_algorithms.co_clustering.CoClustering)|A collaborative filtering algorithm based on co-clustering.|" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "data = data.rename(columns={'user_id': 'userID', 'item_id': 'itemID'})\n", "\n", "df_columns = ['userID', 'itemID', 'rating']\n", "# Copy the slice so the helper column below is added to an independent frame\n", "df = data[df_columns].copy()\n", "\n", "# Only use items with more than 25 ratings\n", "df['reviews'] = df.groupby(['itemID'])['rating'].transform('count')\n", "df = df.loc[df['reviews']>25, df_columns]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "from surprise import Reader, Dataset\n", "\n", "reader = Reader(rating_scale=(1,5))\n", "read_data = Dataset.load_from_df(df, reader)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data Modeling:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Estimating biases using als...\n", "Estimating biases using als...\n", "Estimating biases using als...\n", "Computing the cosine similarity matrix...\n", "Done 
computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the cosine similarity matrix...\n", "Done computing similarity matrix.\n" ] } ], "source": [ "from surprise import NormalPredictor, BaselineOnly, SVD, SVDpp, NMF, SlopeOne, CoClustering\n", "from surprise.model_selection import cross_validate\n", "\n", "from surprise.prediction_algorithms import knns\n", "# user_based=False computes similarities between items rather than between users\n", "sim_cos = {'name':'cosine', 'user_based':False}\n", "\n", "evaluation = []\n", "recommendation_systems = [NormalPredictor(), BaselineOnly(), knns.KNNBasic(sim_options=sim_cos), knns.KNNBaseline(sim_options=sim_cos), knns.KNNWithMeans(sim_options=sim_cos), knns.KNNWithZScore(sim_options=sim_cos), SVD(), SVDpp(), NMF(), SlopeOne(), CoClustering()]\n", "\n", "# Evaluate recommendation systems using Mean Absolute Error\n", "for system in recommendation_systems:\n", "    score = cross_validate(system, read_data, measures=['MAE'], cv=3, verbose=False)\n", "    evaluation.append((str(system).split(' ')[0].split('.')[-1], score['test_mae'].mean()))\n", "\n", 
"pd.options.display.float_format = '{:.4f}'.format\n", "\n", "evaluation = pd.DataFrame(evaluation, columns=['system', 'mae'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To evaluate, I use the mean absolute error (MAE): the average absolute difference between the rating predicted by the model and the actual rating given by the user, here averaged over three cross-validation folds:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " system mae\n", "0 NormalPredictor 0.6566\n", "1 BaselineOnly 0.5398\n", "2 KNNBasic 0.5727\n", "3 KNNBaseline 0.5457\n", "4 KNNWithMeans 0.5624\n", "5 KNNWithZScore 0.5635\n", "6 SVD 0.5359\n", "7 SVDpp 0.5396\n", "8 NMF 0.6980\n", "9 SlopeOne 0.5802\n", "10 CoClustering 0.5673" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluation" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Estimating biases using als...\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n", "Computing the pearson similarity matrix...\n", "Done computing similarity matrix.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " system mae\n", "0 KNNBasic 0.5729\n", "1 KNNBaseline 0.5391\n", "2 KNNWithMeans 0.5586\n", "3 KNNWithZScore 0.5583" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Switch similarity measure from cosine to pearson\n", "sim_pearson = {'name':'pearson', 'user_based':False}\n", "\n", "pearson_evaluation = []\n", "pearson_knns = [knns.KNNBasic(sim_options=sim_pearson),\n", "                knns.KNNBaseline(sim_options=sim_pearson),\n", "                knns.KNNWithMeans(sim_options=sim_pearson),\n", "                knns.KNNWithZScore(sim_options=sim_pearson)]\n", "\n", "for system in pearson_knns:\n", "    pearson_score = cross_validate(system, read_data, measures=['MAE'], cv=3, verbose=False)\n", "    pearson_evaluation.append((str(system).split(' ')[0].split('.')[-1], pearson_score['test_mae'].mean()))\n", "\n", "pearson_evaluation = pd.DataFrame(pearson_evaluation, columns=['system', 'mae'])\n", "pearson_evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With Pearson similarity, the mean absolute errors of `KNNBaseline`, `KNNWithMeans`, and `KNNWithZScore` decrease by roughly 0.004 to 0.007, while `KNNBasic` is essentially unchanged. `KNNBaseline` (0.5391) becomes second only to `SVD` (0.5359).\n", "\n", "***\n", "\n", "#### Hyperparameter Tuning\n", "\n", "`GridSearchCV` is performed to optimize the Singular Value Decomposition models:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "from surprise.model_selection import GridSearchCV\n", "\n", "def grid_search(system, params):\n", "    '''\n", "    Runs a grid search and prints the best cross-validation scores and parameters.\n", "    '''\n", "    model = GridSearchCV(system, param_grid=params, n_jobs=-1)\n", "    model.fit(read_data)\n", "\n", "    print(model.best_score)\n", "    print(model.best_params)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'rmse': 0.6785759649206107, 'mae': 0.5288288608687417}\n", "{'rmse': {'n_factors': 10, 'n_epochs': 100, 
'lr_all': 0.001, 'reg_all': 0.1}, 'mae': {'n_factors': 10, 'n_epochs': 20, 'lr_all': 0.01, 'reg_all': 0.02}}\n" ] } ], "source": [ "params_svd1 = {'n_factors': [10, 50, 100], 'n_epochs': [10, 20, 100], 'lr_all': [0.001, 0.005, 0.01], 'reg_all': [0.02, 0.05, 0.1]}\n", "grid_search(SVD, params_svd1)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'rmse': 0.6785226031170102, 'mae': 0.5300882259833981}\n", "{'rmse': {'n_factors': 10, 'n_epochs': 100, 'lr_all': 0.001, 'reg_all': 0.1}, 'mae': {'n_factors': 10, 'n_epochs': 100, 'lr_all': 0.005, 'reg_all': 0.1}}\n" ] } ], "source": [ "params_svdpp1 = {'n_factors': [10, 50, 100], 'n_epochs': [10, 20, 100], 'lr_all': [0.001, 0.005, 0.01], 'reg_all': [0.02, 0.05, 0.1]}\n", "grid_search(SVDpp, params_svdpp1)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " system mae\n", "0 SVD 0.5302\n", "1 SVDpp 0.5329" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "svd_evaluation = []\n", "\n", "# Evaluate tuned Singular Value Decomposition models,\n", "# using the best-MAE parameters reported by the grid searches above\n", "for system in [SVD(n_factors=10, n_epochs=20, lr_all=0.01, reg_all=0.02),\n", "               SVDpp(n_factors=10, n_epochs=100, lr_all=0.005, reg_all=0.1)]:\n", "    svd_score = cross_validate(system, read_data, measures=['MAE'], cv=3, verbose=False)\n", "    svd_evaluation.append((str(system).split(' ')[0].split('.')[-1], svd_score['test_mae'].mean()))\n", "\n", "svd_evaluation = pd.DataFrame(svd_evaluation, columns=['system', 'mae'])\n", "svd_evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## V. Results and Recommendations\n", "\n", "#### Systems Performance:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " system mae\n", "0 NormalPredictor 0.6566\n", "1 BaselineOnly 0.5398\n", "2 KNNBasic 0.5727\n", "3 KNNBaseline 0.5457\n", "4 KNNWithMeans 0.5624\n", "5 KNNWithZScore 0.5635\n", "6 SVD 0.5359\n", "7 SVDpp 0.5396\n", "8 NMF 0.6980\n", "9 SlopeOne 0.5802\n", "10 CoClustering 0.5673\n", "11 KNNBasic 0.5729\n", "12 KNNBaseline 0.5391\n", "13 KNNWithMeans 0.5586\n", "14 KNNWithZScore 0.5583\n", "15 SVD 0.5302\n", "16 SVDpp 0.5329" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_systems = pd.concat([evaluation, pearson_evaluation, svd_evaluation], ignore_index=True)\n", "all_systems" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results show that the tuned Singular Value Decomposition algorithm attains the lowest mean absolute error, 0.5302 on a rating scale of 1 to 5." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set_style('whitegrid')\n", "fig, ax = plt.subplots(figsize=(20,8))\n", "plt.subplots_adjust(bottom=0.2)\n", "\n", "# Plot the Mean Absolute Error of the models\n", "# (x and y are passed as keywords; recent seaborn releases no longer accept them positionally)\n", "sns.barplot(x=all_systems.index, y=all_systems['mae'], palette='tab20b')\n", "ax.set(xlim=[-0.5,16.5], xlabel='Recommendation System', ylabel='Mean Absolute Error')\n", "ax.set_title('Collaborative Filtering and Recommender Systems Evaluation', fontsize=20)\n", "\n", "labels = ['Normal Predictor', 'Baseline Only', 'KNN Basic Cosine', 'KNN Baseline Cosine', 'KNN Means Cosine',\n", "          'KNN Z-Score Cosine', 'Default SVD', 'Default SVD++', 'NMF', 'Slope One', 'Co-Clustering', 'KNN Basic Pearson',\n", "          'KNN Baseline Pearson', 'KNN Means Pearson', 'KNN Z-Score Pearson', 'Tuned SVD', 'Tuned SVD++']\n", "plt.xticks(all_systems.index, labels, rotation=45)\n", "\n", "plt.savefig('data/images/fig11.png', dpi=200, transparent=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Recommender Engine:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def svd_recommendation(user_id, n=10):\n", "    '''\n", "    Returns top item recommendations generated by the Singular Value Decomposition model.\n", "    '''\n", "    # Candidate items are those the user has not rated yet\n", "    unique_ids = df['itemID'].unique()\n", "    item_user_id = df.loc[df['userID']==user_id, 'itemID']\n", "    items_to_predict = np.setdiff1d(unique_ids, item_user_id)\n", "\n", "    # Fit the tuned SVD on the full dataset\n", "    engine = SVD(n_factors=10, n_epochs=20, lr_all=0.01, reg_all=0.02)\n", "    engine.fit(read_data.build_full_trainset())\n", "\n", "    svd_recommendations = []\n", "    for i in items_to_predict:\n", "        svd_recommendations.append((i, engine.predict(uid=user_id, iid=i).est))\n", "\n", "    display(user_data.loc[user_data['user_id']==user_id])\n", "    print(f'----------------------------------------\\nTop {n} Recommendations for User #{user_id}:')\n", "\n", "    svd_recommendations = pd.DataFrame(svd_recommendations, 
columns=['item_id', 'predicted_rating'])\n", "    svd_recommendations = svd_recommendations.sort_values('predicted_rating', ascending=False).head(n)\n", "    svd_recommendations = svd_recommendations.merge(item_data, on='item_id')\n", "\n", "    return svd_recommendations" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user_id bust_size rating_count weight rating_average \\\n", "52769 501485 32a 42 105.0000 4.9762 \n", "76995 731517 32a 17 113.0000 5.0000 \n", "89653 849603 32a 17 100.0000 5.0000 \n", "15262 145029 32a 16 115.0000 4.5000 \n", "50530 480611 32a 12 125.0000 4.5833 \n", "\n", " rented_for_top body_type category_top height size age \\\n", "52769 party petite dress 62.0000 4 27.0000 \n", "76995 party petite dress 62.0000 1 25.0000 \n", "89653 everyday petite dress 62.0000 1 27.0000 \n", "15262 everyday petite dress 62.0000 4 43.0000 \n", "50530 party petite dress 62.0000 1 35.0000 \n", "\n", " review_length_average review_month_top review_season_top \\\n", "52769 33.6190 4 Spring \n", "76995 59.1176 9 Summer \n", "89653 91.8824 6 Summer \n", "15262 14.1875 8 Summer \n", "50530 80.0000 4 Fall \n", "\n", " rented_for_all \\\n", "52769 [formal affair, work, everyday, other, party, ... \n", "76995 [formal affair, work, everyday, party, date] \n", "89653 [formal affair, work, everyday, other, party, ... \n", "15262 [work, everyday, vacation, other] \n", "50530 [party, wedding] \n", "\n", " category_all \n", "52769 [sheath, dress, gown, romper, skirt, shirtdres... \n", "76995 [down, dress, sheath, gown, suit, top, jacket,... \n", "89653 [shift, dress, gown, pants, sweater, top] \n", "15262 [dress, romper, maxi, skirt, culottes, jumpsui... \n", "50530 [sheath, dress, skirt] " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sample user\n", "sample = user_data.sort_values('rating_count', ascending=False)\n", "sample.loc[((user_data['bust_size']=='32a') & (user_data['height']==62))].head(5)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user_id bust_size rating_count weight rating_average \\\n", "50530 480611 32a 12 125.0000 4.5833 \n", "\n", " rented_for_top body_type category_top height size age \\\n", "50530 party petite dress 62.0000 1 35.0000 \n", "\n", " review_length_average review_month_top review_season_top \\\n", "50530 80.0000 4 Fall \n", "\n", " rented_for_all category_all \n", "50530 [party, wedding] [sheath, dress, skirt] " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "----------------------------------------\n", "Top 10 Recommendations for User #480611:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " item_id predicted_rating fit_small fit fit_large user_count \\\n", "0 1215281 4.8961 4.0000 62.0000 5.0000 71 \n", "1 1451390 4.8904 0.0000 27.0000 2.0000 29 \n", "2 2546911 4.8897 1.0000 28.0000 0.0000 29 \n", "3 1186923 4.8640 9.0000 91.0000 3.0000 103 \n", "4 1547051 4.8606 1.0000 44.0000 3.0000 48 \n", "5 371022 4.8581 1.0000 53.0000 8.0000 62 \n", "6 1200223 4.8574 3.0000 37.0000 2.0000 42 \n", "7 386314 4.8559 0.0000 28.0000 9.0000 37 \n", "8 740349 4.8526 1.0000 21.0000 7.0000 29 \n", "9 1328898 4.8524 1.0000 42.0000 1.0000 44 \n", "\n", " bust_size_top weight_mean weight_median rating_average rented_for_top \\\n", "0 34d 131.7606 130.0000 4.9577 wedding \n", "1 34b 136.5862 135.0000 4.9655 wedding \n", "2 34c 130.7931 130.0000 4.9310 everyday \n", "3 34c 137.4757 135.0000 4.9223 party \n", "4 34c 137.2917 135.0000 4.9167 wedding \n", "5 34b 138.8065 137.5000 4.9194 formal affair \n", "6 34c 142.9286 140.0000 4.9048 wedding \n", "7 38d 171.7568 150.0000 4.9189 wedding \n", "8 34b 136.2069 135.0000 4.8621 wedding \n", "9 34b 135.7500 135.0000 4.9091 formal affair \n", "\n", " rented_for_all body_type_top \\\n", "0 [formal affair, party, other, wedding] athletic \n", "1 [formal affair, work, everyday, party, wedding] hourglass \n", "2 [work, everyday, party, wedding, date, vacation] hourglass \n", "3 [formal affair, other, party, wedding, date, v... hourglass \n", "4 [formal affair, party, other, wedding] hourglass \n", "5 [formal affair, other, party, wedding, date] athletic \n", "6 [formal affair, work, everyday, other, party, ... hourglass \n", "7 [formal affair, work, other, party, wedding, d... hourglass \n", "8 [formal affair, work, everyday, other, party, ... 
hourglass \n", "9 [formal affair, other, wedding] hourglass \n", "\n", " category_top height_mean height_median size_mean size_median age_mean \\\n", "0 gown 64.9437 65.0000 10.0423 8.0000 32.8592 \n", "1 maxi 66.2069 66.0000 13.7931 12.0000 34.8276 \n", "2 pant 64.8276 65.0000 9.4483 8.0000 35.0690 \n", "3 dress 65.3301 65.0000 13.5340 14.0000 30.3301 \n", "4 gown 65.2708 65.0000 11.6875 12.0000 34.0833 \n", "5 dress 65.0645 65.0000 12.6613 12.0000 36.2419 \n", "6 sheath 65.7857 66.0000 15.9048 16.0000 43.1667 \n", "7 sheath 65.0000 65.0000 33.0000 32.0000 39.0000 \n", "8 shift 65.1034 65.0000 12.7586 12.0000 35.6897 \n", "9 gown 66.0455 66.0000 9.6136 8.0000 33.8182 \n", "\n", " age_median review_length_average review_month_top review_season_top \n", "0 32.0000 55.4507 10 Fall \n", "1 35.0000 62.7241 7 Summer \n", "2 35.0000 35.7931 10 Summer \n", "3 30.0000 65.6990 11 Winter \n", "4 32.0000 55.8958 11 Spring \n", "5 33.5000 55.7097 12 Winter \n", "6 41.5000 51.5476 11 Winter \n", "7 39.0000 55.0270 11 Winter \n", "8 35.0000 41.0345 5 Spring \n", "9 32.0000 63.2727 10 Winter " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "svd_recommendation(480611)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## VI. Further Research\n", "\n", "Further research should update the data with more recent rentals and add features such as product prices and product descriptions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Contact\n", "\n", "Feel free to contact me with any questions and connect with me on [LinkedIn](https://www.linkedin.com/in/czarinagluna/)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python (learn-env)", "language": "python", "name": "learn-env" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }