{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4553d762-c2a5-416b-a6af-e31cf10d8060",
   "metadata": {},
   "source": [
    "# Movie Madness  \n",
    "\n",
    "**Description:** \n",
    "You are a data analyst for a movie streaming service. You have been tasked with analyzing a dataset of movie ratings to determine which genres are the most popular among users.  \n",
    "\n",
    "The dataset contains the following columns:\n",
    "- **user_id:** Unique identifier for each user\n",
    "- **movie_id:** Unique identifier for each movie\n",
    "- **rating:** Rating given by the user to the movie (on a scale of 1-5)\n",
    "- **genre:** Genre of the movie (e.g., Action, Comedy, Drama)\n",
    "\n",
    "**Your task is to:**  \n",
    "- Load the dataset into a Pandas DataFrame\n",
    "- Group the data by genre and calculate the average rating for each genre\n",
    "- Sort the results in descending order by average rating\n",
    "\n",
    "**Data:**  \n",
    "You can use the following sample data to get started:  \n",
    "```\n",
    "user_id,movie_id,rating,genre\n",
    "1,101,4,Action\n",
    "1,102,3,Comedy\n",
    "2,101,5,Action\n",
    "2,103,4,Drama\n",
    "3,102,2,Comedy\n",
    "3,104,5,Action\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b06da085-cf53-43fb-ac61-a6e8a247a0cf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]\n",
      "Pandas version 2.2.1\n"
     ]
    }
   ],
   "source": [
    "# import libraries\n",
    "import pandas as pd\n",
    "import sys\n",
    "\n",
    "print('Python version ' + sys.version)\n",
    "print('Pandas version ' + pd.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8e8e4449-e430-4d50-b58a-558545995a8e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>movie_id</th>\n",
       "      <th>rating</th>\n",
       "      <th>genre</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>101</td>\n",
       "      <td>4</td>\n",
       "      <td>Action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>102</td>\n",
       "      <td>3</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>101</td>\n",
       "      <td>5</td>\n",
       "      <td>Action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>103</td>\n",
       "      <td>4</td>\n",
       "      <td>Drama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>102</td>\n",
       "      <td>2</td>\n",
       "      <td>Comedy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3</td>\n",
       "      <td>104</td>\n",
       "      <td>5</td>\n",
       "      <td>Action</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  movie_id  rating   genre\n",
       "0        1       101       4  Action\n",
       "1        1       102       3  Comedy\n",
       "2        2       101       5  Action\n",
       "3        2       103       4   Drama\n",
       "4        3       102       2  Comedy\n",
       "5        3       104       5  Action"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# copy the sample data above to the clipboard, then read it as comma-separated text\n",
    "df = pd.read_clipboard(sep=\",\")\n",
    "df"
   ]
  },
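  {
   "cell_type": "markdown",
   "id": "sketch-stringio-load",
   "metadata": {},
   "source": [
    "`pd.read_clipboard()` only works if the sample data is on the clipboard when the cell runs. As a reproducible alternative, here is a minimal sketch that parses the same sample rows from a string with `io.StringIO` and `pd.read_csv()`. The names `sample_csv` and `df_from_text` are made up for this illustration and are not part of the original notebook.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# the sample rows from the task description, as one CSV string\n",
    "sample_csv = '''user_id,movie_id,rating,genre\n",
    "1,101,4,Action\n",
    "1,102,3,Comedy\n",
    "2,101,5,Action\n",
    "2,103,4,Drama\n",
    "3,102,2,Comedy\n",
    "3,104,5,Action\n",
    "'''\n",
    "\n",
    "# read_csv accepts any file-like object, so wrap the string in StringIO\n",
    "df_from_text = pd.read_csv(io.StringIO(sample_csv))\n",
    "df_from_text\n",
    "```\n",
    "\n",
    "This should produce the same six rows and four columns as the clipboard approach, without depending on what was last copied."
   ]
  },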
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "50f812d2-ce5c-4b99-be25-dbb054c2b910",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 6 entries, 0 to 5\n",
      "Data columns (total 4 columns):\n",
      " #   Column    Non-Null Count  Dtype \n",
      "---  ------    --------------  ----- \n",
      " 0   user_id   6 non-null      int64 \n",
      " 1   movie_id  6 non-null      int64 \n",
      " 2   rating    6 non-null      int64 \n",
      " 3   genre     6 non-null      object\n",
      "dtypes: int64(3), object(1)\n",
      "memory usage: 324.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "# check the data types\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c5cafbca-5f61-46ee-8d23-4122ab9e9b1e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "genre\n",
       "Action    4.666667\n",
       "Comedy    2.500000\n",
       "Drama     4.000000\n",
       "Name: rating, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create groupby object\n",
    "group = df.groupby('genre')\n",
    "\n",
    "# calculate average rating\n",
    "avg = group['rating'].mean()\n",
    "avg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cba4c79-c497-4325-a2e8-c2bac62dae2c",
   "metadata": {},
   "source": [
    "I decided to do everything in a single chained expression. Yes, it is a bit ugly.\n",
    "\n",
    "Here is what I did:  \n",
    "- I renamed the Series of average ratings to `average_rating` so the new column name is clear\n",
    "- I merged that Series into the original DataFrame on `genre` (after moving `genre` into the index)\n",
    "- Finally, I sorted the rows by `average_rating` in descending order"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "9d19ba2f-f31d-43ef-a57e-03675a67eba4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>movie_id</th>\n",
       "      <th>rating</th>\n",
       "      <th>average_rating</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>genre</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>1</td>\n",
       "      <td>101</td>\n",
       "      <td>4</td>\n",
       "      <td>4.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>2</td>\n",
       "      <td>101</td>\n",
       "      <td>5</td>\n",
       "      <td>4.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Action</th>\n",
       "      <td>3</td>\n",
       "      <td>104</td>\n",
       "      <td>5</td>\n",
       "      <td>4.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Drama</th>\n",
       "      <td>2</td>\n",
       "      <td>103</td>\n",
       "      <td>4</td>\n",
       "      <td>4.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>1</td>\n",
       "      <td>102</td>\n",
       "      <td>3</td>\n",
       "      <td>2.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Comedy</th>\n",
       "      <td>3</td>\n",
       "      <td>102</td>\n",
       "      <td>2</td>\n",
       "      <td>2.500000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        user_id  movie_id  rating  average_rating\n",
       "genre                                            \n",
       "Action        1       101       4        4.666667\n",
       "Action        2       101       5        4.666667\n",
       "Action        3       104       5        4.666667\n",
       "Drama         2       103       4        4.000000\n",
       "Comedy        1       102       3        2.500000\n",
       "Comedy        3       102       2        2.500000"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.set_index('genre').merge(avg.rename('average_rating'), on='genre').sort_values('average_rating', ascending=False)"
   ]
  },
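  {
   "cell_type": "markdown",
   "id": "sketch-transform-mean",
   "metadata": {},
   "source": [
    "The merge above attaches each genre's average to every rating row. One possible alternative, sketched here on the assumption that keeping `genre` as a regular column is acceptable, is `groupby().transform('mean')`, which returns one value per original row. The names `ratings` and `with_avg` are hypothetical and only used for this illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# same sample data as above, built inline so the sketch runs on its own\n",
    "ratings = pd.DataFrame({\n",
    "    'user_id': [1, 1, 2, 2, 3, 3],\n",
    "    'movie_id': [101, 102, 101, 103, 102, 104],\n",
    "    'rating': [4, 3, 5, 4, 2, 5],\n",
    "    'genre': ['Action', 'Comedy', 'Action', 'Drama', 'Comedy', 'Action'],\n",
    "})\n",
    "\n",
    "# transform('mean') is aligned to the original rows, so no merge is needed\n",
    "with_avg = ratings.assign(\n",
    "    average_rating=ratings.groupby('genre')['rating'].transform('mean')\n",
    ").sort_values('average_rating', ascending=False)\n",
    "with_avg\n",
    "```\n",
    "\n",
    "Whether this reads better than the merge is a matter of taste; the merge version makes the join on `genre` explicit, while this one avoids the intermediate Series."
   ]
  },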
  {
   "cell_type": "markdown",
   "id": "fc5aa17a-7477-4f2a-b73f-93cf63a6b777",
   "metadata": {},
   "source": [
    "If all you need is the average rating per genre, sorting the Series is enough:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "b8f83ffe-faeb-42ba-94ff-c77ba3bcb73a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "genre\n",
       "Action    4.666667\n",
       "Drama     4.000000\n",
       "Comedy    2.500000\n",
       "Name: rating, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "avg.sort_values(ascending=False)"
   ]
  },
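  {
   "cell_type": "markdown",
   "id": "sketch-named-agg",
   "metadata": {},
   "source": [
    "The three steps of the task (group, average, sort) can also be written as one chain. The sketch below uses named aggregation and, as an assumption beyond the original task, also counts the ratings per genre so it is visible how many ratings each average rests on. The names `ratings`, `summary`, and `n_ratings` are hypothetical.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# same sample data as above, built inline so the sketch runs on its own\n",
    "ratings = pd.DataFrame({\n",
    "    'user_id': [1, 1, 2, 2, 3, 3],\n",
    "    'movie_id': [101, 102, 101, 103, 102, 104],\n",
    "    'rating': [4, 3, 5, 4, 2, 5],\n",
    "    'genre': ['Action', 'Comedy', 'Action', 'Drama', 'Comedy', 'Action'],\n",
    "})\n",
    "\n",
    "# one row per genre: mean rating and number of ratings, highest average first\n",
    "summary = (\n",
    "    ratings.groupby('genre')\n",
    "    .agg(average_rating=('rating', 'mean'), n_ratings=('rating', 'count'))\n",
    "    .sort_values('average_rating', ascending=False)\n",
    ")\n",
    "summary\n",
    "```\n",
    "\n",
    "With only six sample rows the counts are tiny (Drama has a single rating), so the ranking here is illustrative rather than conclusive."
   ]
  },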
  {
   "cell_type": "markdown",
   "id": "ad584bc5-648e-4486-90ff-cec184716d9f",
   "metadata": {},
   "source": [
    "# Summary:\n",
    "This tutorial guided you through the analysis of a movie ratings dataset using Pandas. It covered loading data, grouping by genre, calculating average ratings, merging data, and sorting results.\n",
    "\n",
    "### Key Takeaways:\n",
    "- Loading data from the clipboard into a Pandas DataFrame using `pd.read_clipboard()`\n",
    "- Checking column data types using `df.info()`\n",
    "- Grouping data by a column (genre) using `df.groupby()`\n",
    "- Calculating the average rating for each group using `group['rating'].mean()`\n",
    "- Merging a Series into the original DataFrame using `df.set_index().merge()`\n",
    "- Naming a Series using `Series.rename()` so the merged column has a clear name\n",
    "- Sorting data in descending order using `sort_values(ascending=False)`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1d5ecb0-c6c6-493d-a582-a612a25a419a",
   "metadata": {},
   "source": [
    "<p class=\"text-muted\">This tutorial was created by <a href=\"https://www.hedaro.com\" target=\"_blank\"><strong>HEDARO</strong></a></p>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}