{
"cells": [
{
"cell_type": "markdown",
"id": "764436ef",
"metadata": {},
"source": [
"# Lesson 10 activity solution"
]
},
{
"cell_type": "markdown",
"id": "61077fda",
"metadata": {},
"source": [
"## Learning objectives\n",
"\n",
"By the end of this activity, you will be able to:\n",
"- Create Pandas Series and DataFrames\n",
"- Load data from CSV files\n",
"- Perform basic data exploration and analysis\n",
"- Calculate descriptive statistics\n",
"- Filter and manipulate DataFrame data"
]
},
{
"cell_type": "markdown",
"id": "af2f986e",
"metadata": {},
"source": [
"## Tips\n",
"\n",
"- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names\n",
"- **Loading CSV files:** Use `pd.read_csv('filename.csv')`\n",
"- **Basic exploration:** Use `.head()`, `.tail()`, `.info()`, `.describe()`, and `.shape`\n",
"- **Filtering data:** Use conditions like `df[df['column'] > value]`\n",
"- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`\n",
"- **Adding columns:** Use `df['new_column'] = calculation`\n",
"- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods\n",
"\n",
"**Remember:** Take your time with each step and test your code frequently!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5136c48b",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "f4973eb2",
"metadata": {},
"source": [
"---\n",
"## Problem 1: creating your first DataFrame\n",
"\n",
"**Scenario:** You're working at a bookstore and need to create a simple inventory system.\n",
"\n",
"**Your Task:**\n",
"1. Create a DataFrame called `books_df` with the following data:\n",
" - Book titles: [\"Python Basics\", \"Data Science Handbook\", \"Web Development Guide\"]\n",
" - Authors: [\"John Smith\", \"Jane Doe\", \"Mike Johnson\"]\n",
" - Prices: [29.99, 45.50, 35.00]\n",
" - Stock: [15, 8, 12]\n",
"\n",
"2. Display the DataFrame\n",
"3. Print the shape of the DataFrame\n",
"4. Display basic information about the DataFrame using `.info()`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "78d2b28c",
"metadata": {},
"outputs": [],
"source": [
"# Step 1: Create the DataFrame\n",
"books_df = pd.DataFrame({\n",
" 'title': ['Python Basics', 'Data Science Handbook', 'Web Development Guide'],\n",
" 'author': ['John Smith', 'Jane Doe', 'Mike Johnson'],\n",
" 'price': [29.99, 45.50, 35.00],\n",
" 'stock': [15, 8, 12]\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4a94147a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Books DataFrame:\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" title | \n",
" author | \n",
" price | \n",
" stock | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Python Basics | \n",
" John Smith | \n",
" 29.99 | \n",
" 15 | \n",
"
\n",
" \n",
" | 1 | \n",
" Data Science Handbook | \n",
" Jane Doe | \n",
" 45.50 | \n",
" 8 | \n",
"
\n",
" \n",
" | 2 | \n",
" Web Development Guide | \n",
" Mike Johnson | \n",
" 35.00 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" title author price stock\n",
"0 Python Basics John Smith 29.99 15\n",
"1 Data Science Handbook Jane Doe 45.50 8\n",
"2 Web Development Guide Mike Johnson 35.00 12"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 2: Display the DataFrame\n",
"print('Books DataFrame:')\n",
"books_df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9d28efd7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of DataFrame: (3, 4)\n",
"This means: 3 rows and 4 columns\n"
]
}
],
"source": [
"# Step 3: Print the shape\n",
"print(f'Shape of DataFrame: {books_df.shape}')\n",
"print(f'This means: {books_df.shape[0]} rows and {books_df.shape[1]} columns')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6ce8beda",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataFrame Information:\n",
"\n",
"RangeIndex: 3 entries, 0 to 2\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 title 3 non-null object \n",
" 1 author 3 non-null object \n",
" 2 price 3 non-null float64\n",
" 3 stock 3 non-null int64 \n",
"dtypes: float64(1), int64(1), object(2)\n",
"memory usage: 228.0+ bytes\n"
]
}
],
"source": [
"# Step 4: Display info\n",
"print('DataFrame Information:')\n",
"books_df.info()"
]
},
{
"cell_type": "markdown",
"id": "7e0ec8c0",
"metadata": {},
"source": [
"---\n",
"## Problem 2: loading and exploring data\n",
"\n",
"**Scenario:** You're a teacher analyzing student performance data.\n",
"\n",
"**Your Task:**\n",
"1. Load the `students.csv` file into a DataFrame called `students_df`\n",
"2. Display the first 3 rows using `.head()`\n",
"3. Display the last 2 rows using `.tail()`\n",
"4. Show descriptive statistics for numerical columns using `.describe()`\n",
"5. Find the average grade of all students\n",
"\n",
"Students data file avalible for download here: [students.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c564957",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Students data loaded successfully!\n"
]
}
],
"source": [
"# Step 1: Load the CSV file\n",
"url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv'\n",
"students_df = pd.read_csv(url)\n",
"print('Students data loaded successfully!')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c13fce03",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"First 3 rows:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" age | \n",
" grade | \n",
" subject | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Alice | \n",
" 20 | \n",
" 85 | \n",
" Math | \n",
"
\n",
" \n",
" | 1 | \n",
" Bob | \n",
" 19 | \n",
" 92 | \n",
" Science | \n",
"
\n",
" \n",
" | 2 | \n",
" Charlie | \n",
" 21 | \n",
" 78 | \n",
" Math | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name age grade subject\n",
"0 Alice 20 85 Math\n",
"1 Bob 19 92 Science\n",
"2 Charlie 21 78 Math"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 2: Display first 3 rows\n",
"print('First 3 rows:')\n",
"students_df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d0981660",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last 2 rows:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" age | \n",
" grade | \n",
" subject | \n",
"
\n",
" \n",
" \n",
" \n",
" | 6 | \n",
" Grace | \n",
" 20 | \n",
" 90 | \n",
" Math | \n",
"
\n",
" \n",
" | 7 | \n",
" Henry | \n",
" 21 | \n",
" 87 | \n",
" Science | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name age grade subject\n",
"6 Grace 20 90 Math\n",
"7 Henry 21 87 Science"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 3: Display last 2 rows\n",
"print('Last 2 rows:')\n",
"students_df.tail(2)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7c2874f2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Descriptive Statistics:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" age | \n",
" grade | \n",
"
\n",
" \n",
" \n",
" \n",
" | count | \n",
" 8.000000 | \n",
" 8.000000 | \n",
"
\n",
" \n",
" | mean | \n",
" 20.250000 | \n",
" 87.125000 | \n",
"
\n",
" \n",
" | std | \n",
" 1.035098 | \n",
" 5.462535 | \n",
"
\n",
" \n",
" | min | \n",
" 19.000000 | \n",
" 78.000000 | \n",
"
\n",
" \n",
" | 25% | \n",
" 19.750000 | \n",
" 84.250000 | \n",
"
\n",
" \n",
" | 50% | \n",
" 20.000000 | \n",
" 87.500000 | \n",
"
\n",
" \n",
" | 75% | \n",
" 21.000000 | \n",
" 90.500000 | \n",
"
\n",
" \n",
" | max | \n",
" 22.000000 | \n",
" 95.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" age grade\n",
"count 8.000000 8.000000\n",
"mean 20.250000 87.125000\n",
"std 1.035098 5.462535\n",
"min 19.000000 78.000000\n",
"25% 19.750000 84.250000\n",
"50% 20.000000 87.500000\n",
"75% 21.000000 90.500000\n",
"max 22.000000 95.000000"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 4: Show descriptive statistics\n",
"print('Descriptive Statistics:')\n",
"students_df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7b26f3e4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average grade of all students: 87.12\n"
]
}
],
"source": [
"# Step 5: Calculate average grade\n",
"average_grade = students_df['grade'].mean()\n",
"print(f'Average grade of all students: {average_grade:.2f}')"
]
},
{
"cell_type": "markdown",
"id": "a79979e1",
"metadata": {},
"source": [
"---\n",
"## Problem 3: data filtering and selection\n",
"\n",
"**Scenario:** Continue working with the student data to find specific information.\n",
"\n",
"**Your Task:**\n",
"1. Display only the 'name' and 'grade' columns from `students_df`\n",
"2. Find all students who scored above 85\n",
"3. Find all students studying 'Math'\n",
"4. Find the highest grade in the dataset\n",
"5. Count how many students are in each subject"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f5ca71af",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name and Grade columns:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" grade | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Alice | \n",
" 85 | \n",
"
\n",
" \n",
" | 1 | \n",
" Bob | \n",
" 92 | \n",
"
\n",
" \n",
" | 2 | \n",
" Charlie | \n",
" 78 | \n",
"
\n",
" \n",
" | 3 | \n",
" Diana | \n",
" 88 | \n",
"
\n",
" \n",
" | 4 | \n",
" Eva | \n",
" 95 | \n",
"
\n",
" \n",
" | 5 | \n",
" Frank | \n",
" 82 | \n",
"
\n",
" \n",
" | 6 | \n",
" Grace | \n",
" 90 | \n",
"
\n",
" \n",
" | 7 | \n",
" Henry | \n",
" 87 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name grade\n",
"0 Alice 85\n",
"1 Bob 92\n",
"2 Charlie 78\n",
"3 Diana 88\n",
"4 Eva 95\n",
"5 Frank 82\n",
"6 Grace 90\n",
"7 Henry 87"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 1: Display only name and grade columns\n",
"print('Name and Grade columns:')\n",
"grade_df = students_df[['name', 'grade']]\n",
"grade_df"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "46954789",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Students with grades above 85:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" age | \n",
" grade | \n",
" subject | \n",
"
\n",
" \n",
" \n",
" \n",
" | 1 | \n",
" Bob | \n",
" 19 | \n",
" 92 | \n",
" Science | \n",
"
\n",
" \n",
" | 3 | \n",
" Diana | \n",
" 20 | \n",
" 88 | \n",
" Science | \n",
"
\n",
" \n",
" | 4 | \n",
" Eva | \n",
" 19 | \n",
" 95 | \n",
" Math | \n",
"
\n",
" \n",
" | 6 | \n",
" Grace | \n",
" 20 | \n",
" 90 | \n",
" Math | \n",
"
\n",
" \n",
" | 7 | \n",
" Henry | \n",
" 21 | \n",
" 87 | \n",
" Science | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name age grade subject\n",
"1 Bob 19 92 Science\n",
"3 Diana 20 88 Science\n",
"4 Eva 19 95 Math\n",
"6 Grace 20 90 Math\n",
"7 Henry 21 87 Science"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 2: Students with grades above 85\n",
"print('Students with grades above 85:')\n",
"high_performers_df = students_df[students_df['grade'] > 85]\n",
"high_performers_df"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f0481e8c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Students studying Math:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" age | \n",
" grade | \n",
" subject | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Alice | \n",
" 20 | \n",
" 85 | \n",
" Math | \n",
"
\n",
" \n",
" | 2 | \n",
" Charlie | \n",
" 21 | \n",
" 78 | \n",
" Math | \n",
"
\n",
" \n",
" | 4 | \n",
" Eva | \n",
" 19 | \n",
" 95 | \n",
" Math | \n",
"
\n",
" \n",
" | 6 | \n",
" Grace | \n",
" 20 | \n",
" 90 | \n",
" Math | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name age grade subject\n",
"0 Alice 20 85 Math\n",
"2 Charlie 21 78 Math\n",
"4 Eva 19 95 Math\n",
"6 Grace 20 90 Math"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 3: Students studying Math\n",
"print('Students studying Math:')\n",
"math_students_df = students_df[students_df['subject'] == 'Math']\n",
"math_students_df"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "94892c4d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Highest grade: 95\n"
]
}
],
"source": [
"# Step 4: Highest grade\n",
"highest_grade = students_df['grade'].max()\n",
"print(f'Highest grade: {highest_grade}')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e6a1491a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of students in each subject:\n"
]
},
{
"data": {
"text/plain": [
"subject\n",
"Math 4\n",
"Science 4\n",
"Name: count, dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 5: Count students by subject\n",
"print('Number of students in each subject:')\n",
"subject_counts = students_df['subject'].value_counts()\n",
"subject_counts"
]
},
{
"cell_type": "markdown",
"id": "5cc9aa70",
"metadata": {},
"source": [
"---\n",
"## Problem 4: sales data analysis\n",
"\n",
"**Scenario:** You're analyzing sales data for an electronics store.\n",
"\n",
"**Your Task:**\n",
"1. Load the `sales.csv` file into a DataFrame called `sales_df`\n",
"2. Calculate the total value for each product (price × quantity)\n",
"3. Add this as a new column called 'total_value' to the DataFrame\n",
"4. Find the product with the highest total value\n",
"5. Calculate the grand total of all sales\n",
"\n",
"Sales data file avalible for download here: [sales.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d90ac54",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sales data loaded:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" price | \n",
" quantity | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Laptop | \n",
" 999.99 | \n",
" 5 | \n",
" 2024-01-15 | \n",
"
\n",
" \n",
" | 1 | \n",
" Mouse | \n",
" 25.50 | \n",
" 20 | \n",
" 2024-01-15 | \n",
"
\n",
" \n",
" | 2 | \n",
" Keyboard | \n",
" 75.00 | \n",
" 15 | \n",
" 2024-01-16 | \n",
"
\n",
" \n",
" | 3 | \n",
" Monitor | \n",
" 299.99 | \n",
" 8 | \n",
" 2024-01-16 | \n",
"
\n",
" \n",
" | 4 | \n",
" Headphones | \n",
" 59.99 | \n",
" 12 | \n",
" 2024-01-17 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product price quantity date\n",
"0 Laptop 999.99 5 2024-01-15\n",
"1 Mouse 25.50 20 2024-01-15\n",
"2 Keyboard 75.00 15 2024-01-16\n",
"3 Monitor 299.99 8 2024-01-16\n",
"4 Headphones 59.99 12 2024-01-17"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 1: Load the sales data\n",
"url = 'https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv'\n",
"sales_df = pd.read_csv(url)\n",
"print('Sales data loaded:')\n",
"sales_df"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "637bbfe2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sales data with total_value column:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" price | \n",
" quantity | \n",
" date | \n",
" total_value | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Laptop | \n",
" 999.99 | \n",
" 5 | \n",
" 2024-01-15 | \n",
" 4999.95 | \n",
"
\n",
" \n",
" | 1 | \n",
" Mouse | \n",
" 25.50 | \n",
" 20 | \n",
" 2024-01-15 | \n",
" 510.00 | \n",
"
\n",
" \n",
" | 2 | \n",
" Keyboard | \n",
" 75.00 | \n",
" 15 | \n",
" 2024-01-16 | \n",
" 1125.00 | \n",
"
\n",
" \n",
" | 3 | \n",
" Monitor | \n",
" 299.99 | \n",
" 8 | \n",
" 2024-01-16 | \n",
" 2399.92 | \n",
"
\n",
" \n",
" | 4 | \n",
" Headphones | \n",
" 59.99 | \n",
" 12 | \n",
" 2024-01-17 | \n",
" 719.88 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product price quantity date total_value\n",
"0 Laptop 999.99 5 2024-01-15 4999.95\n",
"1 Mouse 25.50 20 2024-01-15 510.00\n",
"2 Keyboard 75.00 15 2024-01-16 1125.00\n",
"3 Monitor 299.99 8 2024-01-16 2399.92\n",
"4 Headphones 59.99 12 2024-01-17 719.88"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 2 & 3: Calculate total value and add as new column\n",
"sales_df['total_value'] = sales_df['price'] * sales_df['quantity']\n",
"print('Sales data with total_value column:')\n",
"sales_df"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "d0cdc7eb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Product with highest total value:\n",
"Product: Laptop\n",
"Total Value: $4999.95\n"
]
}
],
"source": [
"# Step 4: Find product with highest total value\n",
"max_value_idx = sales_df['total_value'].idxmax()\n",
"top_product = sales_df.loc[max_value_idx]\n",
"print(f'Product with highest total value:')\n",
"print(f'Product: {top_product['product']}')\n",
"print(f'Total Value: ${top_product['total_value']:.2f}')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "27c4e204",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" price | \n",
" quantity | \n",
" date | \n",
" total_value | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Laptop | \n",
" 999.99 | \n",
" 5 | \n",
" 2024-01-15 | \n",
" 4999.95 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product price quantity date total_value\n",
"0 Laptop 999.99 5 2024-01-15 4999.95"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Alternative method for Step 4:\n",
"top_product_row = sales_df[sales_df['total_value'] == sales_df['total_value'].max()]\n",
"top_product_row"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "552c89b8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Grand total of all sales: $9754.75\n"
]
}
],
"source": [
"# Step 5: Calculate grand total of all sales\n",
"grand_total = sales_df['total_value'].sum()\n",
"print(f'Grand total of all sales: ${grand_total:.2f}')"
]
},
{
"cell_type": "markdown",
"id": "2ed628ab",
"metadata": {},
"source": [
"---\n",
"## Problem 5: series creation and manipulation\n",
"\n",
"**Scenario:** Create and work with Pandas Series for daily temperature data.\n",
"\n",
"**Your Task:**\n",
"1. Create a Pandas Series called `temperatures` with the following data:\n",
" - Values: [22, 25, 23, 26, 24, 27, 25]\n",
" - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n",
"2. Find the temperature for Wednesday\n",
"3. Find days with temperature above 24 degrees\n",
"4. Calculate the average temperature for the week\n",
"5. Find the day with the highest temperature"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "4babcd5e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Temperature Series:\n"
]
},
{
"data": {
"text/plain": [
"Mon 22\n",
"Tue 25\n",
"Wed 23\n",
"Thu 26\n",
"Fri 24\n",
"Sat 27\n",
"Sun 25\n",
"Name: Temperature (°C), dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Step 1: Create the temperature series\n",
"temperatures = pd.Series(\n",
" [22, 25, 23, 26, 24, 27, 25],\n",
" index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],\n",
" name='Temperature (°C)'\n",
")\n",
"print('Temperature Series:')\n",
"temperatures"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "9de80523",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Temperature on Wednesday: 23°C\n"
]
}
],
"source": [
"# Step 2: Temperature for Wednesday\n",
"wed_temp = temperatures['Wed']\n",
"print(f'Temperature on Wednesday: {wed_temp}°C')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f8e8298d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Days with temperature above 24°C:\n",
"Tue 25\n",
"Thu 26\n",
"Sat 27\n",
"Sun 25\n",
"Name: Temperature (°C), dtype: int64\n"
]
}
],
"source": [
"# Step 3: Days with temperature above 24\n",
"hot_days = temperatures[temperatures > 24]\n",
"print('Days with temperature above 24°C:')\n",
"print(hot_days)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "50564792",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average temperature for the week: 24.6°C\n"
]
}
],
"source": [
"# Step 4: Average temperature\n",
"avg_temp = temperatures.mean()\n",
"print(f'Average temperature for the week: {avg_temp:.1f}°C')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "04188089",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hottest day: Sat with 27°C\n"
]
}
],
"source": [
"# Step 5: Day with highest temperature\n",
"hottest_day = temperatures.idxmax()\n",
"hottest_temp = temperatures.max()\n",
"print(f'Hottest day: {hottest_day} with {hottest_temp}°C')"
]
},
{
"cell_type": "markdown",
"id": "89fdce0f",
"metadata": {},
"source": [
"---\n",
"## Reflection questions\n",
"\n",
"**1. What is the difference between a Pandas Series and a DataFrame?**\n",
"\n",
"*Sample answer:* A Pandas Series is a one-dimensional labeled array that can hold any data type (like a single column). A DataFrame is a two-dimensional labeled data structure with rows and columns (like a table or spreadsheet). You can think of a DataFrame as a collection of Series objects that share the same index. Each column in a DataFrame is essentially a Series.\n",
"\n",
"**2. What are the advantages of using Pandas over working with plain Python lists and dictionaries?**\n",
"\n",
"*Sample answer:* Pandas offers several advantages:\n",
"- **Labeled data:** Rows and columns have meaningful names, making data easier to understand\n",
"- **Built-in operations:** Methods like `.mean()`, `.describe()`, `.filter()` are optimized and easy to use\n",
"- **Handling missing data:** Pandas has tools to detect and handle missing values\n",
"- **Easy data import/export:** Simple functions to read/write CSV, Excel, SQL databases, etc.\n",
"- **Performance:** Built on NumPy, so it's much faster for large datasets\n",
"- **Data alignment:** Automatically aligns data based on labels when performing operations\n",
"\n",
"**3. Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**\n",
"\n",
"*Sample answer:* In an e-commerce business, I might use filtering to:\n",
"- Find all orders above $100 for premium customer analysis\n",
"- Filter products by category to analyze sales by department\n",
"- Identify customers who haven't made a purchase in 90 days for targeted marketing\n",
"- Find all transactions from a specific date range for monthly reports\n",
"- Filter inventory items with stock below a threshold to trigger reordering\n",
"\n",
"**4. What did you find most challenging about working with Pandas in this activity?**\n",
"\n",
"*Sample answer:* Understanding the difference between single and double brackets for column selection was initially confusing. `df['column']` returns a Series, while `df[['column']]` returns a DataFrame. Also, remembering when to use `.loc[]` vs `.iloc[]` vs direct indexing took some practice. The boolean filtering syntax with nested brackets like `df[df['column'] > value]` also required careful attention to get right."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}