{ "cells": [ { "cell_type": "markdown", "id": "be6d9f42", "metadata": {}, "source": [ "# Lesson 10 activity: Pandas" ] }, { "cell_type": "markdown", "id": "8d35e167", "metadata": {}, "source": [ "## Learning objectives\n", "\n", "By the end of this activity, you will be able to:\n", "- Create Pandas Series and DataFrames\n", "- Load data from CSV files\n", "- Perform basic data exploration and analysis\n", "- Calculate descriptive statistics\n", "- Filter and manipulate DataFrame data" ] }, { "cell_type": "markdown", "id": "f4f4250f", "metadata": {}, "source": [ "## Tips\n", "\n", "- **Creating DataFrames:** Use `pd.DataFrame(dictionary)` where dictionary keys become column names\n", "- **Loading CSV files:** Use `pd.read_csv('filename.csv')`\n", "- **Basic exploration:** Use `.head()`, `.tail()`, `.info()`, `.describe()`, and `.shape`\n", "- **Filtering data:** Use conditions like `df[df['column'] > value]`\n", "- **Column selection:** Use `df['column_name']` or `df[['col1', 'col2']]`\n", "- **Adding columns:** Use `df['new_column'] = calculation`\n", "- **Statistics:** Use `.mean()`, `.max()`, `.min()`, `.sum()` methods\n", "\n", "**Remember:** Take your time with each step and test your code frequently!" ] }, { "cell_type": "code", "execution_count": 1, "id": "8a821472", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "b7a3b120", "metadata": {}, "source": [ "---\n", "## Problem 1: creating your first DataFrame\n", "\n", "**Scenario:** You're working at a bookstore and need to create a simple inventory system.\n", "\n", "**Your task:**\n", "1. Create a DataFrame called `books_df` with the following data:\n", " - Book titles: [\"Python Basics\", \"Data Science Handbook\", \"Web Development Guide\"]\n", " - Authors: [\"John Smith\", \"Jane Doe\", \"Mike Johnson\"]\n", " - Prices: [29.99, 45.50, 35.00]\n", " - Stock: [15, 8, 12]\n", "\n", "2. Display the DataFrame\n", "3. Print the shape of the DataFrame\n", "4. Display basic information about the DataFrame using `.info()`" ] }, { "cell_type": "code", "execution_count": null, "id": "37891ada", "metadata": {}, "outputs": [], "source": [ "# Step 1: Create the DataFrame\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1d26a67e", "metadata": {}, "outputs": [], "source": [ "# Step 2: Display the DataFrame\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6de21323", "metadata": {}, "outputs": [], "source": [ "# Step 3: Print the shape\n" ] }, { "cell_type": "code", "execution_count": null, "id": "142c46ea", "metadata": {}, "outputs": [], "source": [ "# Step 4: Display info\n" ] }, { "cell_type": "markdown", "id": "b44fc045", "metadata": {}, "source": [ "---\n", "## Problem 2: loading and exploring student data\n", "\n", "**Scenario:** You're a teacher analyzing student performance data.\n", "\n", "**Your task:**\n", "1. Load the `students.csv` file into a DataFrame called `students_df`\n", "2. Display the first 3 rows using `.head()`\n", "3. Display the last 2 rows using `.tail()`\n", "4. Show descriptive statistics for numerical columns using `.describe()`\n", "5. Find the average grade of all students\n", "\n", "Find `students.csv` on the **Datasets** page of the course site, or download it directly: [students.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/students.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "d953fa64", "metadata": {}, "outputs": [], "source": [ "# Step 1: Load the CSV file\n" ] }, { "cell_type": "code", "execution_count": null, "id": "042d5db0", "metadata": {}, "outputs": [], "source": [ "# Step 2: Display first 3 rows\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ac6010a1", "metadata": {}, "outputs": [], "source": [ "# Step 3: Display last 2 rows\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6c0fe4f4", "metadata": {}, "outputs": [], "source": [ "# Step 4: Show descriptive statistics\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9d012dc5", "metadata": {}, "outputs": [], "source": [ "# Step 5: Calculate average grade\n" ] }, { "cell_type": "markdown", "id": "8670b6ff", "metadata": {}, "source": [ "---\n", "## Problem 3: Data filtering and selection\n", "\n", "**Scenario:** Continue working with the student data to find specific information.\n", "\n", "**Your task:**\n", "1. Display only the 'name' and 'grade' columns from `students_df`\n", "2. Find all students who scored above 85\n", "3. Find all students studying 'Math'\n", "4. Find the highest grade in the dataset\n", "5. Count how many students are in each subject" ] }, { "cell_type": "code", "execution_count": null, "id": "9d91f2d8", "metadata": {}, "outputs": [], "source": [ "# Step 1: Display only name and grade columns\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f157592f", "metadata": {}, "outputs": [], "source": [ "# Step 2: Students with grades above 85\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fd058aeb", "metadata": {}, "outputs": [], "source": [ "# Step 3: Students studying Math\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c139a22b", "metadata": {}, "outputs": [], "source": [ "# Step 4: Highest grade\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2a96dcba", "metadata": {}, "outputs": [], "source": [ "# Step 5: Count students by subject\n" ] }, { "cell_type": "markdown", "id": "3d5b4d72", "metadata": {}, "source": [ "---\n", "## Problem 4: sales data analysis\n", "\n", "**Scenario:** You're analyzing sales data for an electronics store.\n", "\n", "**Your task:**\n", "1. Load the `sales.csv` file into a DataFrame called `sales_df`\n", "2. Calculate the total value for each product (price × quantity)\n", "3. Add this as a new column called 'total_value' to the DataFrame\n", "4. Find the product with the highest total value\n", "5. Calculate the grand total of all sales\n", "\n", "Find `sales.csv` on the **Datasets** page of the course site, or download it directly: [sales.csv](https://media.githubusercontent.com/media/gperdrizet/fullstack-2605/refs/heads/main/data/sales.csv)" ] }, { "cell_type": "code", "execution_count": null, "id": "0ea00e85", "metadata": {}, "outputs": [], "source": [ "# Step 1: Load the sales data\n" ] }, { "cell_type": "code", "execution_count": null, "id": "759aaa1e", "metadata": {}, "outputs": [], "source": [ "# Step 2 & 3: Calculate total value and add as new column\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c3457bdd", "metadata": {}, "outputs": [], "source": [ "# Step 4: Find product with highest total value\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b70405b9", "metadata": {}, "outputs": [], "source": [ "# Step 5: Calculate grand total of all sales\n" ] }, { "cell_type": "markdown", "id": "4b46ec82", "metadata": {}, "source": [ "---\n", "## Problem 5: series creation and manipulation\n", "\n", "**Scenario:** Create and work with Pandas Series for daily temperature data.\n", "\n", "**Your task:**\n", "1. Create a Pandas Series called `temperatures` with the following data:\n", " - Values: [22, 25, 23, 26, 24, 27, 25]\n", " - Index: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n", "2. Find the temperature for Wednesday\n", "3. Find days with temperature above 24 degrees\n", "4. Calculate the average temperature for the week\n", "5. Find the day with the highest temperature" ] }, { "cell_type": "code", "execution_count": null, "id": "8a5e217e", "metadata": {}, "outputs": [], "source": [ "# Step 1: Create the temperature series\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4bfa4772", "metadata": {}, "outputs": [], "source": [ "# Step 2: Temperature for Wednesday\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3f801c36", "metadata": {}, "outputs": [], "source": [ "# Step 3: Days with temperature above 24\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3df5c281", "metadata": {}, "outputs": [], "source": [ "# Step 4: Average temperature\n" ] }, { "cell_type": "code", "execution_count": null, "id": "efe22f36", "metadata": {}, "outputs": [], "source": [ "# Step 5: Day with highest temperature\n" ] }, { "cell_type": "markdown", "id": "8dfe6711", "metadata": {}, "source": [ "---\n", "## Reflection questions\n", "\n", "Please answer these questions after completing the activity:\n", "\n", "1. **What is the difference between a Pandas Series and a DataFrame?**\n", " \n", " *Your answer:*\n", "\n", "2. **What are the advantages of using Pandas over working with plain Python lists and dictionaries?**\n", " \n", " *Your answer:*\n", "\n", "3. **Describe a real-world scenario where you might use the filtering techniques you learned in Problem 3.**\n", " \n", " *Your answer:*\n", "\n", "4. **What did you find most challenging about working with Pandas in this activity?**\n", " \n", " *Your answer:*" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }