{ "cells": [ { "cell_type": "markdown", "id": "764436ef", "metadata": {}, "source": [ "# Lesson 10 activity solution" ] }, { "cell_type": "markdown", "id": "61077fda", "metadata": {}, "source": [ "## Learning objectives\n", "\n", "By the end of this activity, you will be able to:\n", "- Create Pandas Series and DataFrames\n", "- Load data from CSV files\n", "- Perform basic data exploration and analysis\n", "- Calculate descriptive statistics\n", "- Filter and manipulate DataFrame data" ] }, { "cell_type": "markdown", "id": "af2f986e", "metadata": {}, "source": [ "## Tips\n", "\n", "- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names\n", "- **Loading CSV files:** Use `pd.read_csv('filename.csv')`\n", "- **Basic exploration:** Use `.head()`, `.tail()`, `.info()`, `.describe()`, and `.shape`\n", "- **Filtering data:** Use conditions like `df[df['column'] > value]`\n", "- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`\n", "- **Adding columns:** Use `df['new_column'] = calculation`\n", "- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods\n", "\n", "**Remember:** Take your time with each step and test your code frequently!" ] }, { "cell_type": "code", "execution_count": 1, "id": "5136c48b", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "f4973eb2", "metadata": {}, "source": [ "---\n", "## Problem 1: creating your first DataFrame\n", "\n", "**Scenario:** You're working at a bookstore and need to create a simple inventory system.\n", "\n", "**Your Task:**\n", "1. Create a DataFrame called `books_df` with the following data:\n", " - Book titles: [\"Python Basics\", \"Data Science Handbook\", \"Web Development Guide\"]\n", " - Authors: [\"John Smith\", \"Jane Doe\", \"Mike Johnson\"]\n", " - Prices: [29.99, 45.50, 35.00]\n", " - Stock: [15, 8, 12]\n", "\n", "2. Display the DataFrame\n", "3. Print the shape of the DataFrame\n", "4. Display basic information about the DataFrame using `.info()`" ] }, { "cell_type": "code", "execution_count": 2, "id": "78d2b28c", "metadata": {}, "outputs": [], "source": [ "# Step 1: Create the DataFrame\n", "books_df = pd.DataFrame({\n", " 'title': ['Python Basics', 'Data Science Handbook', 'Web Development Guide'],\n", " 'author': ['John Smith', 'Jane Doe', 'Mike Johnson'],\n", " 'price': [29.99, 45.50, 35.00],\n", " 'stock': [15, 8, 12]\n", "})" ] }, { "cell_type": "code", "execution_count": 3, "id": "4a94147a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Books DataFrame:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleauthorpricestock
0Python BasicsJohn Smith29.9915
1Data Science HandbookJane Doe45.508
2Web Development GuideMike Johnson35.0012
\n", "
" ], "text/plain": [ " title author price stock\n", "0 Python Basics John Smith 29.99 15\n", "1 Data Science Handbook Jane Doe 45.50 8\n", "2 Web Development Guide Mike Johnson 35.00 12" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2: Display the DataFrame\n", "print('Books DataFrame:')\n", "books_df" ] }, { "cell_type": "code", "execution_count": 4, "id": "9d28efd7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of DataFrame: (3, 4)\n", "This means: 3 rows and 4 columns\n" ] } ], "source": [ "# Step 3: Print the shape\n", "print(f'Shape of DataFrame: {books_df.shape}')\n", "print(f'This means: {books_df.shape[0]} rows and {books_df.shape[1]} columns')" ] }, { "cell_type": "code", "execution_count": 5, "id": "6ce8beda", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataFrame Information:\n", "\n", "RangeIndex: 3 entries, 0 to 2\n", "Data columns (total 4 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 title 3 non-null object \n", " 1 author 3 non-null object \n", " 2 price 3 non-null float64\n", " 3 stock 3 non-null int64 \n", "dtypes: float64(1), int64(1), object(2)\n", "memory usage: 228.0+ bytes\n" ] } ], "source": [ "# Step 4: Display info\n", "print('DataFrame Information:')\n", "books_df.info()" ] }, { "cell_type": "markdown", "id": "7e0ec8c0", "metadata": {}, "source": [ "---\n", "## Problem 2: loading and exploring data\n", "\n", "**Scenario:** You're a teacher analyzing student performance data.\n", "\n", "**Your Task:**\n", "1. Load the `students.csv` file into a DataFrame called `students_df`\n", "2. Display the first 3 rows using `.head()`\n", "3. Display the last 2 rows using `.tail()`\n", "4. Show descriptive statistics for numerical columns using `.describe()`\n", "5. Find the average grade of all students\n", "\n", "Students data file avalible for download here: [students.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "5c564957", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Students data loaded successfully!\n" ] } ], "source": [ "# Step 1: Load the CSV file\n", "url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv'\n", "students_df = pd.read_csv(url)\n", "print('Students data loaded successfully!')" ] }, { "cell_type": "code", "execution_count": 7, "id": "c13fce03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 3 rows:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegradesubject
0Alice2085Math
1Bob1992Science
2Charlie2178Math
\n", "
" ], "text/plain": [ " name age grade subject\n", "0 Alice 20 85 Math\n", "1 Bob 19 92 Science\n", "2 Charlie 21 78 Math" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2: Display first 3 rows\n", "print('First 3 rows:')\n", "students_df.head(3)" ] }, { "cell_type": "code", "execution_count": 8, "id": "d0981660", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Last 2 rows:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegradesubject
6Grace2090Math
7Henry2187Science
\n", "
" ], "text/plain": [ " name age grade subject\n", "6 Grace 20 90 Math\n", "7 Henry 21 87 Science" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 3: Display last 2 rows\n", "print('Last 2 rows:')\n", "students_df.tail(2)" ] }, { "cell_type": "code", "execution_count": 9, "id": "7c2874f2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Descriptive Statistics:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agegrade
count8.0000008.000000
mean20.25000087.125000
std1.0350985.462535
min19.00000078.000000
25%19.75000084.250000
50%20.00000087.500000
75%21.00000090.500000
max22.00000095.000000
\n", "
" ], "text/plain": [ " age grade\n", "count 8.000000 8.000000\n", "mean 20.250000 87.125000\n", "std 1.035098 5.462535\n", "min 19.000000 78.000000\n", "25% 19.750000 84.250000\n", "50% 20.000000 87.500000\n", "75% 21.000000 90.500000\n", "max 22.000000 95.000000" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 4: Show descriptive statistics\n", "print('Descriptive Statistics:')\n", "students_df.describe()" ] }, { "cell_type": "code", "execution_count": 10, "id": "7b26f3e4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average grade of all students: 87.12\n" ] } ], "source": [ "# Step 5: Calculate average grade\n", "average_grade = students_df['grade'].mean()\n", "print(f'Average grade of all students: {average_grade:.2f}')" ] }, { "cell_type": "markdown", "id": "a79979e1", "metadata": {}, "source": [ "---\n", "## Problem 3: data filtering and selection\n", "\n", "**Scenario:** Continue working with the student data to find specific information.\n", "\n", "**Your Task:**\n", "1. Display only the 'name' and 'grade' columns from `students_df`\n", "2. Find all students who scored above 85\n", "3. Find all students studying 'Math'\n", "4. Find the highest grade in the dataset\n", "5. Count how many students are in each subject" ] }, { "cell_type": "code", "execution_count": 11, "id": "f5ca71af", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name and Grade columns:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namegrade
0Alice85
1Bob92
2Charlie78
3Diana88
4Eva95
5Frank82
6Grace90
7Henry87
\n", "
" ], "text/plain": [ " name grade\n", "0 Alice 85\n", "1 Bob 92\n", "2 Charlie 78\n", "3 Diana 88\n", "4 Eva 95\n", "5 Frank 82\n", "6 Grace 90\n", "7 Henry 87" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1: Display only name and grade columns\n", "print('Name and Grade columns:')\n", "grade_df = students_df[['name', 'grade']]\n", "grade_df" ] }, { "cell_type": "code", "execution_count": 12, "id": "46954789", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Students with grades above 85:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegradesubject
1Bob1992Science
3Diana2088Science
4Eva1995Math
6Grace2090Math
7Henry2187Science
\n", "
" ], "text/plain": [ " name age grade subject\n", "1 Bob 19 92 Science\n", "3 Diana 20 88 Science\n", "4 Eva 19 95 Math\n", "6 Grace 20 90 Math\n", "7 Henry 21 87 Science" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2: Students with grades above 85\n", "print('Students with grades above 85:')\n", "high_performers_df = students_df[students_df['grade'] > 85]\n", "high_performers_df" ] }, { "cell_type": "code", "execution_count": 13, "id": "f0481e8c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Students studying Math:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagegradesubject
0Alice2085Math
2Charlie2178Math
4Eva1995Math
6Grace2090Math
\n", "
" ], "text/plain": [ " name age grade subject\n", "0 Alice 20 85 Math\n", "2 Charlie 21 78 Math\n", "4 Eva 19 95 Math\n", "6 Grace 20 90 Math" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 3: Students studying Math\n", "print('Students studying Math:')\n", "math_students_df = students_df[students_df['subject'] == 'Math']\n", "math_students_df" ] }, { "cell_type": "code", "execution_count": 14, "id": "94892c4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Highest grade: 95\n" ] } ], "source": [ "# Step 4: Highest grade\n", "highest_grade = students_df['grade'].max()\n", "print(f'Highest grade: {highest_grade}')" ] }, { "cell_type": "code", "execution_count": 15, "id": "e6a1491a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of students in each subject:\n" ] }, { "data": { "text/plain": [ "subject\n", "Math 4\n", "Science 4\n", "Name: count, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 5: Count students by subject\n", "print('Number of students in each subject:')\n", "subject_counts = students_df['subject'].value_counts()\n", "subject_counts" ] }, { "cell_type": "markdown", "id": "5cc9aa70", "metadata": {}, "source": [ "---\n", "## Problem 4: sales data analysis\n", "\n", "**Scenario:** You're analyzing sales data for an electronics store.\n", "\n", "**Your Task:**\n", "1. Load the `sales.csv` file into a DataFrame called `sales_df`\n", "2. Calculate the total value for each product (price × quantity)\n", "3. Add this as a new column called 'total_value' to the DataFrame\n", "4. Find the product with the highest total value\n", "5. Calculate the grand total of all sales\n", "\n", "Sales data file avalible for download here: [sales.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "9d90ac54", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sales data loaded:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
productpricequantitydate
0Laptop999.9952024-01-15
1Mouse25.50202024-01-15
2Keyboard75.00152024-01-16
3Monitor299.9982024-01-16
4Headphones59.99122024-01-17
\n", "
" ], "text/plain": [ " product price quantity date\n", "0 Laptop 999.99 5 2024-01-15\n", "1 Mouse 25.50 20 2024-01-15\n", "2 Keyboard 75.00 15 2024-01-16\n", "3 Monitor 299.99 8 2024-01-16\n", "4 Headphones 59.99 12 2024-01-17" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1: Load the sales data\n", "url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv'\n", "sales_df = pd.read_csv(url)\n", "print('Sales data loaded:')\n", "sales_df" ] }, { "cell_type": "code", "execution_count": 17, "id": "637bbfe2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sales data with total_value column:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
productpricequantitydatetotal_value
0Laptop999.9952024-01-154999.95
1Mouse25.50202024-01-15510.00
2Keyboard75.00152024-01-161125.00
3Monitor299.9982024-01-162399.92
4Headphones59.99122024-01-17719.88
\n", "
" ], "text/plain": [ " product price quantity date total_value\n", "0 Laptop 999.99 5 2024-01-15 4999.95\n", "1 Mouse 25.50 20 2024-01-15 510.00\n", "2 Keyboard 75.00 15 2024-01-16 1125.00\n", "3 Monitor 299.99 8 2024-01-16 2399.92\n", "4 Headphones 59.99 12 2024-01-17 719.88" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2 & 3: Calculate total value and add as new column\n", "sales_df['total_value'] = sales_df['price'] * sales_df['quantity']\n", "print('Sales data with total_value column:')\n", "sales_df" ] }, { "cell_type": "code", "execution_count": 18, "id": "d0cdc7eb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Product with highest total value:\n", "Product: Laptop\n", "Total Value: $4999.95\n" ] } ], "source": [ "# Step 4: Find product with highest total value\n", "max_value_idx = sales_df['total_value'].idxmax()\n", "top_product = sales_df.loc[max_value_idx]\n", "print(f'Product with highest total value:')\n", "print(f'Product: {top_product['product']}')\n", "print(f'Total Value: ${top_product['total_value']:.2f}')" ] }, { "cell_type": "code", "execution_count": 19, "id": "27c4e204", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
productpricequantitydatetotal_value
0Laptop999.9952024-01-154999.95
\n", "
" ], "text/plain": [ " product price quantity date total_value\n", "0 Laptop 999.99 5 2024-01-15 4999.95" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Alternative method for Step 4:\n", "top_product_row = sales_df[sales_df['total_value'] == sales_df['total_value'].max()]\n", "top_product_row" ] }, { "cell_type": "code", "execution_count": 20, "id": "552c89b8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Grand total of all sales: $9754.75\n" ] } ], "source": [ "# Step 5: Calculate grand total of all sales\n", "grand_total = sales_df['total_value'].sum()\n", "print(f'Grand total of all sales: ${grand_total:.2f}')" ] }, { "cell_type": "markdown", "id": "2ed628ab", "metadata": {}, "source": [ "---\n", "## Problem 5: series creation and manipulation\n", "\n", "**Scenario:** Create and work with Pandas Series for daily temperature data.\n", "\n", "**Your Task:**\n", "1. Create a Pandas Series called `temperatures` with the following data:\n", " - Values: [22, 25, 23, 26, 24, 27, 25]\n", " - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n", "2. Find the temperature for Wednesday\n", "3. Find days with temperature above 24 degrees\n", "4. Calculate the average temperature for the week\n", "5. Find the day with the highest temperature" ] }, { "cell_type": "code", "execution_count": 21, "id": "4babcd5e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Temperature Series:\n" ] }, { "data": { "text/plain": [ "Mon 22\n", "Tue 25\n", "Wed 23\n", "Thu 26\n", "Fri 24\n", "Sat 27\n", "Sun 25\n", "Name: Temperature (°C), dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1: Create the temperature series\n", "temperatures = pd.Series(\n", " [22, 25, 23, 26, 24, 27, 25],\n", " index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],\n", " name='Temperature (°C)'\n", ")\n", "print('Temperature Series:')\n", "temperatures" ] }, { "cell_type": "code", "execution_count": 22, "id": "9de80523", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Temperature on Wednesday: 23°C\n" ] } ], "source": [ "# Step 2: Temperature for Wednesday\n", "wed_temp = temperatures['Wed']\n", "print(f'Temperature on Wednesday: {wed_temp}°C')" ] }, { "cell_type": "code", "execution_count": 23, "id": "f8e8298d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Days with temperature above 24°C:\n", "Tue 25\n", "Thu 26\n", "Sat 27\n", "Sun 25\n", "Name: Temperature (°C), dtype: int64\n" ] } ], "source": [ "# Step 3: Days with temperature above 24\n", "hot_days = temperatures[temperatures > 24]\n", "print('Days with temperature above 24°C:')\n", "print(hot_days)" ] }, { "cell_type": "code", "execution_count": 24, "id": "50564792", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average temperature for the week: 24.6°C\n" ] } ], "source": [ "# Step 4: Average temperature\n", "avg_temp = temperatures.mean()\n", "print(f'Average temperature for the week: {avg_temp:.1f}°C')" ] }, { "cell_type": "code", "execution_count": 25, "id": "04188089", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hottest day: Sat with 27°C\n" ] } ], "source": [ "# Step 5: Day with highest temperature\n", "hottest_day = temperatures.idxmax()\n", "hottest_temp = temperatures.max()\n", "print(f'Hottest day: {hottest_day} with {hottest_temp}°C')" ] }, { "cell_type": "markdown", "id": "89fdce0f", "metadata": {}, "source": [ "---\n", "## Reflection questions\n", "\n", "**1. What is the difference between a Pandas Series and a DataFrame?**\n", "\n", "*Sample answer:* A Pandas Series is a one-dimensional labeled array that can hold any data type (like a single column). A DataFrame is a two-dimensional labeled data structure with rows and columns (like a table or spreadsheet). You can think of a DataFrame as a collection of Series objects that share the same index. Each column in a DataFrame is essentially a Series.\n", "\n", "**2. What are the advantages of using Pandas over working with plain Python lists and dictionaries?**\n", "\n", "*Sample answer:* Pandas offers several advantages:\n", "- **Labeled data:** Rows and columns have meaningful names, making data easier to understand\n", "- **Built-in operations:** Methods like `.mean()`, `.describe()`, `.filter()` are optimized and easy to use\n", "- **Handling missing data:** Pandas has tools to detect and handle missing values\n", "- **Easy data import/export:** Simple functions to read/write CSV, Excel, SQL databases, etc.\n", "- **Performance:** Built on NumPy, so it's much faster for large datasets\n", "- **Data alignment:** Automatically aligns data based on labels when performing operations\n", "\n", "**3. Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**\n", "\n", "*Sample answer:* In an e-commerce business, I might use filtering to:\n", "- Find all orders above $100 for premium customer analysis\n", "- Filter products by category to analyze sales by department\n", "- Identify customers who haven't made a purchase in 90 days for targeted marketing\n", "- Find all transactions from a specific date range for monthly reports\n", "- Filter inventory items with stock below a threshold to trigger reordering\n", "\n", "**4. What did you find most challenging about working with Pandas in this activity?**\n", "\n", "*Sample answer:* Understanding the difference between single and double brackets for column selection was initially confusing. `df['column']` returns a Series, while `df[['column']]` returns a DataFrame. Also, remembering when to use `.loc[]` vs `.iloc[]` vs direct indexing took some practice. The boolean filtering syntax with nested brackets like `df[df['column'] > value]` also required careful attention to get right." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 5 }