{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "764436ef",
   "metadata": {},
   "source": [
    "# Lesson 10 activity solution"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61077fda",
   "metadata": {},
   "source": [
    "## Learning objectives\n",
    "\n",
    "By the end of this activity, you will be able to:\n",
    "- Create Pandas Series and DataFrames\n",
    "- Load data from CSV files\n",
    "- Perform basic data exploration and analysis\n",
    "- Calculate descriptive statistics\n",
    "- Filter and manipulate DataFrame data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af2f986e",
   "metadata": {},
   "source": [
    "## Tips\n",
    "\n",
    "- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names\n",
    "- **Loading CSV files:** Use `pd.read_csv('filename.csv')`\n",
    "- **Basic exploration:** Use `.head()`, `.tail()`, `.info()`, `.describe()`, and `.shape`\n",
    "- **Filtering data:** Use conditions like `df[df['column'] > value]`\n",
    "- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`\n",
    "- **Adding columns:** Use `df['new_column'] = calculation`\n",
    "- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods\n",
    "\n",
    "**Remember:** Take your time with each step and test your code frequently!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5136c48b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4973eb2",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem 1: creating your first DataFrame\n",
    "\n",
    "**Scenario:** You're working at a bookstore and need to create a simple inventory system.\n",
    "\n",
    "**Your Task:**\n",
    "1. Create a DataFrame called `books_df` with the following data:\n",
    "   - Book titles: [\"Python Basics\", \"Data Science Handbook\", \"Web Development Guide\"]\n",
    "   - Authors: [\"John Smith\", \"Jane Doe\", \"Mike Johnson\"]\n",
    "   - Prices: [29.99, 45.50, 35.00]\n",
    "   - Stock: [15, 8, 12]\n",
    "\n",
    "2. Display the DataFrame\n",
    "3. Print the shape of the DataFrame\n",
    "4. Display basic information about the DataFrame using `.info()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "78d2b28c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 1: Create the DataFrame\n",
    "books_df = pd.DataFrame({\n",
    "    'title': ['Python Basics', 'Data Science Handbook', 'Web Development Guide'],\n",
    "    'author': ['John Smith', 'Jane Doe', 'Mike Johnson'],\n",
    "    'price': [29.99, 45.50, 35.00],\n",
    "    'stock': [15, 8, 12]\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4a94147a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Books DataFrame:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>author</th>\n",
       "      <th>price</th>\n",
       "      <th>stock</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Python Basics</td>\n",
       "      <td>John Smith</td>\n",
       "      <td>29.99</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Data Science Handbook</td>\n",
       "      <td>Jane Doe</td>\n",
       "      <td>45.50</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Web Development Guide</td>\n",
       "      <td>Mike Johnson</td>\n",
       "      <td>35.00</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   title        author  price  stock\n",
       "0          Python Basics    John Smith  29.99     15\n",
       "1  Data Science Handbook      Jane Doe  45.50      8\n",
       "2  Web Development Guide  Mike Johnson  35.00     12"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 2: Display the DataFrame\n",
    "print('Books DataFrame:')\n",
    "books_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9d28efd7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Shape of DataFrame: (3, 4)\n",
      "This means: 3 rows and 4 columns\n"
     ]
    }
   ],
   "source": [
    "# Step 3: Print the shape\n",
    "print(f'Shape of DataFrame: {books_df.shape}')\n",
    "print(f'This means: {books_df.shape[0]} rows and {books_df.shape[1]} columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6ce8beda",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DataFrame Information:\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3 entries, 0 to 2\n",
      "Data columns (total 4 columns):\n",
      " #   Column  Non-Null Count  Dtype  \n",
      "---  ------  --------------  -----  \n",
      " 0   title   3 non-null      object \n",
      " 1   author  3 non-null      object \n",
      " 2   price   3 non-null      float64\n",
      " 3   stock   3 non-null      int64  \n",
      "dtypes: float64(1), int64(1), object(2)\n",
      "memory usage: 228.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "# Step 4: Display info\n",
    "print('DataFrame Information:')\n",
    "books_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e0ec8c0",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem 2: loading and exploring data\n",
    "\n",
    "**Scenario:** You're a teacher analyzing student performance data.\n",
    "\n",
    "**Your Task:**\n",
    "1. Load the `students.csv` file into a DataFrame called `students_df`\n",
    "2. Display the first 3 rows using `.head()`\n",
    "3. Display the last 2 rows using `.tail()`\n",
    "4. Show descriptive statistics for numerical columns using `.describe()`\n",
    "5. Find the average grade of all students\n",
    "\n",
    "Students data file avalible for download here: [students.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c564957",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Students data loaded successfully!\n"
     ]
    }
   ],
   "source": [
    "# Step 1: Load the CSV file\n",
    "url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv'\n",
    "students_df = pd.read_csv(url)\n",
    "print('Students data loaded successfully!')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c13fce03",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First 3 rows:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>grade</th>\n",
       "      <th>subject</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>20</td>\n",
       "      <td>85</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>19</td>\n",
       "      <td>92</td>\n",
       "      <td>Science</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>21</td>\n",
       "      <td>78</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  age  grade  subject\n",
       "0    Alice   20     85     Math\n",
       "1      Bob   19     92  Science\n",
       "2  Charlie   21     78     Math"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 2: Display first 3 rows\n",
    "print('First 3 rows:')\n",
    "students_df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d0981660",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Last 2 rows:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>grade</th>\n",
       "      <th>subject</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Grace</td>\n",
       "      <td>20</td>\n",
       "      <td>90</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Henry</td>\n",
       "      <td>21</td>\n",
       "      <td>87</td>\n",
       "      <td>Science</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age  grade  subject\n",
       "6  Grace   20     90     Math\n",
       "7  Henry   21     87  Science"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 3: Display last 2 rows\n",
    "print('Last 2 rows:')\n",
    "students_df.tail(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7c2874f2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Descriptive Statistics:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>8.000000</td>\n",
       "      <td>8.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>20.250000</td>\n",
       "      <td>87.125000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>1.035098</td>\n",
       "      <td>5.462535</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>19.000000</td>\n",
       "      <td>78.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>19.750000</td>\n",
       "      <td>84.250000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>20.000000</td>\n",
       "      <td>87.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>21.000000</td>\n",
       "      <td>90.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>22.000000</td>\n",
       "      <td>95.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             age      grade\n",
       "count   8.000000   8.000000\n",
       "mean   20.250000  87.125000\n",
       "std     1.035098   5.462535\n",
       "min    19.000000  78.000000\n",
       "25%    19.750000  84.250000\n",
       "50%    20.000000  87.500000\n",
       "75%    21.000000  90.500000\n",
       "max    22.000000  95.000000"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 4: Show descriptive statistics\n",
    "print('Descriptive Statistics:')\n",
    "students_df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "7b26f3e4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average grade of all students: 87.12\n"
     ]
    }
   ],
   "source": [
    "# Step 5: Calculate average grade\n",
    "average_grade = students_df['grade'].mean()\n",
    "print(f'Average grade of all students: {average_grade:.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a79979e1",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem 3: data filtering and selection\n",
    "\n",
    "**Scenario:** Continue working with the student data to find specific information.\n",
    "\n",
    "**Your Task:**\n",
    "1. Display only the 'name' and 'grade' columns from `students_df`\n",
    "2. Find all students who scored above 85\n",
    "3. Find all students studying 'Math'\n",
    "4. Find the highest grade in the dataset\n",
    "5. Count how many students are in each subject"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f5ca71af",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name and Grade columns:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>grade</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Diana</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eva</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Frank</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Grace</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Henry</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  grade\n",
       "0    Alice     85\n",
       "1      Bob     92\n",
       "2  Charlie     78\n",
       "3    Diana     88\n",
       "4      Eva     95\n",
       "5    Frank     82\n",
       "6    Grace     90\n",
       "7    Henry     87"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 1: Display only name and grade columns\n",
    "print('Name and Grade columns:')\n",
    "grade_df = students_df[['name', 'grade']]\n",
    "grade_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "46954789",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Students with grades above 85:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>grade</th>\n",
       "      <th>subject</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>19</td>\n",
       "      <td>92</td>\n",
       "      <td>Science</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Diana</td>\n",
       "      <td>20</td>\n",
       "      <td>88</td>\n",
       "      <td>Science</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eva</td>\n",
       "      <td>19</td>\n",
       "      <td>95</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Grace</td>\n",
       "      <td>20</td>\n",
       "      <td>90</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Henry</td>\n",
       "      <td>21</td>\n",
       "      <td>87</td>\n",
       "      <td>Science</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age  grade  subject\n",
       "1    Bob   19     92  Science\n",
       "3  Diana   20     88  Science\n",
       "4    Eva   19     95     Math\n",
       "6  Grace   20     90     Math\n",
       "7  Henry   21     87  Science"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 2: Students with grades above 85\n",
    "print('Students with grades above 85:')\n",
    "high_performers_df = students_df[students_df['grade'] > 85]\n",
    "high_performers_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f0481e8c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Students studying Math:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>grade</th>\n",
       "      <th>subject</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>20</td>\n",
       "      <td>85</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>21</td>\n",
       "      <td>78</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eva</td>\n",
       "      <td>19</td>\n",
       "      <td>95</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Grace</td>\n",
       "      <td>20</td>\n",
       "      <td>90</td>\n",
       "      <td>Math</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  age  grade subject\n",
       "0    Alice   20     85    Math\n",
       "2  Charlie   21     78    Math\n",
       "4      Eva   19     95    Math\n",
       "6    Grace   20     90    Math"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 3: Students studying Math\n",
    "print('Students studying Math:')\n",
    "math_students_df = students_df[students_df['subject'] == 'Math']\n",
    "math_students_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "94892c4d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Highest grade: 95\n"
     ]
    }
   ],
   "source": [
    "# Step 4: Highest grade\n",
    "highest_grade = students_df['grade'].max()\n",
    "print(f'Highest grade: {highest_grade}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e6a1491a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of students in each subject:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "subject\n",
       "Math       4\n",
       "Science    4\n",
       "Name: count, dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 5: Count students by subject\n",
    "print('Number of students in each subject:')\n",
    "subject_counts = students_df['subject'].value_counts()\n",
    "subject_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cc9aa70",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem 4: sales data analysis\n",
    "\n",
    "**Scenario:** You're analyzing sales data for an electronics store.\n",
    "\n",
    "**Your Task:**\n",
    "1. Load the `sales.csv` file into a DataFrame called `sales_df`\n",
    "2. Calculate the total value for each product (price × quantity)\n",
    "3. Add this as a new column called 'total_value' to the DataFrame\n",
    "4. Find the product with the highest total value\n",
    "5. Calculate the grand total of all sales\n",
    "\n",
    "Sales data file avalible for download here: [sales.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d90ac54",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sales data loaded:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product</th>\n",
       "      <th>price</th>\n",
       "      <th>quantity</th>\n",
       "      <th>date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Laptop</td>\n",
       "      <td>999.99</td>\n",
       "      <td>5</td>\n",
       "      <td>2024-01-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mouse</td>\n",
       "      <td>25.50</td>\n",
       "      <td>20</td>\n",
       "      <td>2024-01-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Keyboard</td>\n",
       "      <td>75.00</td>\n",
       "      <td>15</td>\n",
       "      <td>2024-01-16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Monitor</td>\n",
       "      <td>299.99</td>\n",
       "      <td>8</td>\n",
       "      <td>2024-01-16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Headphones</td>\n",
       "      <td>59.99</td>\n",
       "      <td>12</td>\n",
       "      <td>2024-01-17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      product   price  quantity        date\n",
       "0      Laptop  999.99         5  2024-01-15\n",
       "1       Mouse   25.50        20  2024-01-15\n",
       "2    Keyboard   75.00        15  2024-01-16\n",
       "3     Monitor  299.99         8  2024-01-16\n",
       "4  Headphones   59.99        12  2024-01-17"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 1: Load the sales data\n",
    "url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv'\n",
    "sales_df = pd.read_csv(url)\n",
    "print('Sales data loaded:')\n",
    "sales_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "637bbfe2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sales data with total_value column:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product</th>\n",
       "      <th>price</th>\n",
       "      <th>quantity</th>\n",
       "      <th>date</th>\n",
       "      <th>total_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Laptop</td>\n",
       "      <td>999.99</td>\n",
       "      <td>5</td>\n",
       "      <td>2024-01-15</td>\n",
       "      <td>4999.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mouse</td>\n",
       "      <td>25.50</td>\n",
       "      <td>20</td>\n",
       "      <td>2024-01-15</td>\n",
       "      <td>510.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Keyboard</td>\n",
       "      <td>75.00</td>\n",
       "      <td>15</td>\n",
       "      <td>2024-01-16</td>\n",
       "      <td>1125.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Monitor</td>\n",
       "      <td>299.99</td>\n",
       "      <td>8</td>\n",
       "      <td>2024-01-16</td>\n",
       "      <td>2399.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Headphones</td>\n",
       "      <td>59.99</td>\n",
       "      <td>12</td>\n",
       "      <td>2024-01-17</td>\n",
       "      <td>719.88</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      product   price  quantity        date  total_value\n",
       "0      Laptop  999.99         5  2024-01-15      4999.95\n",
       "1       Mouse   25.50        20  2024-01-15       510.00\n",
       "2    Keyboard   75.00        15  2024-01-16      1125.00\n",
       "3     Monitor  299.99         8  2024-01-16      2399.92\n",
       "4  Headphones   59.99        12  2024-01-17       719.88"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 2 & 3: Calculate total value and add as new column\n",
    "sales_df['total_value'] = sales_df['price'] * sales_df['quantity']\n",
    "print('Sales data with total_value column:')\n",
    "sales_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d0cdc7eb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Product with highest total value:\n",
      "Product: Laptop\n",
      "Total Value: $4999.95\n"
     ]
    }
   ],
   "source": [
    "# Step 4: Find product with highest total value\n",
    "max_value_idx = sales_df['total_value'].idxmax()\n",
    "top_product = sales_df.loc[max_value_idx]\n",
    "print(f'Product with highest total value:')\n",
    "print(f'Product: {top_product['product']}')\n",
    "print(f'Total Value: ${top_product['total_value']:.2f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "27c4e204",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product</th>\n",
       "      <th>price</th>\n",
       "      <th>quantity</th>\n",
       "      <th>date</th>\n",
       "      <th>total_value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Laptop</td>\n",
       "      <td>999.99</td>\n",
       "      <td>5</td>\n",
       "      <td>2024-01-15</td>\n",
       "      <td>4999.95</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  product   price  quantity        date  total_value\n",
       "0  Laptop  999.99         5  2024-01-15      4999.95"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Alternative method for Step 4:\n",
    "top_product_row = sales_df[sales_df['total_value'] == sales_df['total_value'].max()]\n",
    "top_product_row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "552c89b8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Grand total of all sales: $9754.75\n"
     ]
    }
   ],
   "source": [
    "# Step 5: Calculate grand total of all sales\n",
    "grand_total = sales_df['total_value'].sum()\n",
    "print(f'Grand total of all sales: ${grand_total:.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ed628ab",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem 5: series creation and manipulation\n",
    "\n",
    "**Scenario:** Create and work with Pandas Series for daily temperature data.\n",
    "\n",
    "**Your Task:**\n",
    "1. Create a Pandas Series called `temperatures` with the following data:\n",
    "   - Values: [22, 25, 23, 26, 24, 27, 25]\n",
    "   - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n",
    "2. Find the temperature for Wednesday\n",
    "3. Find days with temperature above 24 degrees\n",
    "4. Calculate the average temperature for the week\n",
    "5. Find the day with the highest temperature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "4babcd5e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Temperature Series:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Mon    22\n",
       "Tue    25\n",
       "Wed    23\n",
       "Thu    26\n",
       "Fri    24\n",
       "Sat    27\n",
       "Sun    25\n",
       "Name: Temperature (°C), dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 1: Create the temperature series\n",
    "temperatures = pd.Series(\n",
    "    [22, 25, 23, 26, 24, 27, 25],\n",
    "    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],\n",
    "    name='Temperature (°C)'\n",
    ")\n",
    "print('Temperature Series:')\n",
    "temperatures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9de80523",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Temperature on Wednesday: 23°C\n"
     ]
    }
   ],
   "source": [
    "# Step 2: Temperature for Wednesday\n",
    "wed_temp = temperatures['Wed']\n",
    "print(f'Temperature on Wednesday: {wed_temp}°C')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "f8e8298d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Days with temperature above 24°C:\n",
      "Tue    25\n",
      "Thu    26\n",
      "Sat    27\n",
      "Sun    25\n",
      "Name: Temperature (°C), dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Step 3: Days with temperature above 24\n",
    "hot_days = temperatures[temperatures > 24]\n",
    "print('Days with temperature above 24°C:')\n",
    "print(hot_days)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "50564792",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average temperature for the week: 24.6°C\n"
     ]
    }
   ],
   "source": [
    "# Step 4: Average temperature\n",
    "avg_temp = temperatures.mean()\n",
    "print(f'Average temperature for the week: {avg_temp:.1f}°C')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "04188089",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hottest day: Sat with 27°C\n"
     ]
    }
   ],
   "source": [
    "# Step 5: Day with highest temperature\n",
    "hottest_day = temperatures.idxmax()\n",
    "hottest_temp = temperatures.max()\n",
    "print(f'Hottest day: {hottest_day} with {hottest_temp}°C')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89fdce0f",
   "metadata": {},
   "source": [
    "---\n",
    "## Reflection questions\n",
    "\n",
    "**1. What is the difference between a Pandas Series and a DataFrame?**\n",
    "\n",
    "*Sample answer:* A Pandas Series is a one-dimensional labeled array that can hold any data type (like a single column). A DataFrame is a two-dimensional labeled data structure with rows and columns (like a table or spreadsheet). You can think of a DataFrame as a collection of Series objects that share the same index. Each column in a DataFrame is essentially a Series.\n",
    "\n",
    "**2. What are the advantages of using Pandas over working with plain Python lists and dictionaries?**\n",
    "\n",
    "*Sample answer:* Pandas offers several advantages:\n",
    "- **Labeled data:** Rows and columns have meaningful names, making data easier to understand\n",
    "- **Built-in operations:** Methods like `.mean()`, `.describe()`, `.filter()` are optimized and easy to use\n",
    "- **Handling missing data:** Pandas has tools to detect and handle missing values\n",
    "- **Easy data import/export:** Simple functions to read/write CSV, Excel, SQL databases, etc.\n",
    "- **Performance:** Built on NumPy, so it's much faster for large datasets\n",
    "- **Data alignment:** Automatically aligns data based on labels when performing operations\n",
    "\n",
    "**3. Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**\n",
    "\n",
    "*Sample answer:* In an e-commerce business, I might use filtering to:\n",
    "- Find all orders above $100 for premium customer analysis\n",
    "- Filter products by category to analyze sales by department\n",
    "- Identify customers who haven't made a purchase in 90 days for targeted marketing\n",
    "- Find all transactions from a specific date range for monthly reports\n",
    "- Filter inventory items with stock below a threshold to trigger reordering\n",
    "\n",
    "**4. What did you find most challenging about working with Pandas in this activity?**\n",
    "\n",
    "*Sample answer:* Understanding the difference between single and double brackets for column selection was initially confusing. `df['column']` returns a Series, while `df[['column']]` returns a DataFrame. Also, remembering when to use `.loc[]` vs `.iloc[]` vs direct indexing took some practice. The boolean filtering syntax with nested brackets like `df[df['column'] > value]` also required careful attention to get right."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}