{ "cells": [ { "cell_type": "markdown", "id": "38a98679", "metadata": {}, "source": [ "# Data Analysis with Pandas\n", "\n", "Learn essential data analysis techniques using Python and Pandas. This tutorial covers loading data, exploration, filtering, grouping, and pivot tables using a real sales dataset.\n", "\n", "**Prerequisites:**\n", "- Basic Python knowledge\n", "- Pandas library installed\n", "\n", "**What you'll learn:**\n", "- Loading and exploring CSV data\n", "- Filtering and querying DataFrames\n", "- Group by operations and aggregations\n", "- Creating pivot tables for analysis" ] }, { "cell_type": "markdown", "id": "2eff9dd4", "metadata": {}, "source": [ "## 1. Setup: Import Libraries\n", "\n", "First, let's import the required libraries for data analysis." ] }, { "cell_type": "code", "execution_count": 17, "id": "7ac57f00", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Libraries imported successfully!\n", "Pandas version: 3.0.0\n", "NumPy version: 2.4.2\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "\n", "# Display settings for better output formatting\n", "pd.set_option('display.max_columns', None)\n", "pd.set_option('display.width', None)\n", "pd.set_option('display.float_format', '{:.2f}'.format)\n", "\n", "print(\"Libraries imported successfully!\")\n", "print(f\"Pandas version: {pd.__version__}\")\n", "print(f\"NumPy version: {np.__version__}\")" ] }, { "cell_type": "markdown", "id": "57e5a1e0", "metadata": {}, "source": [ "## 2. Load and Explore Data\n", "\n", "Let's load our sales dataset and take a first look at the data structure." ] }, { "cell_type": "code", "execution_count": 2, "id": "60f2cf96", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“Š Sales Data - First 10 rows:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateproductcategoryquantityunit_pricerevenueregionsalesperson
02025-01-05Laptop ProElectronics31299.993899.97NorthAlice Johnson
12025-01-07Wireless MouseElectronics1529.99449.85SouthBob Smith
22025-01-08Office ChairFurniture5249.991249.95EastCarol Davis
32025-01-10Standing DeskFurniture2599.991199.98WestDavid Wilson
42025-01-12Monitor 27inchElectronics8399.993199.92NorthAlice Johnson
52025-01-15Keyboard MechanicalElectronics12149.991799.88SouthBob Smith
62025-01-17Desk LampFurniture2049.99999.80EastCarol Davis
72025-01-20Webcam HDElectronics1079.99799.90WestDavid Wilson
82025-01-22Laptop ProElectronics51299.996499.95NorthEve Martinez
92025-01-25Wireless MouseElectronics2529.99749.75SouthFrank Brown
\n", "
" ], "text/plain": [ " date product category quantity unit_price revenue \\\n", "0 2025-01-05 Laptop Pro Electronics 3 1299.99 3899.97 \n", "1 2025-01-07 Wireless Mouse Electronics 15 29.99 449.85 \n", "2 2025-01-08 Office Chair Furniture 5 249.99 1249.95 \n", "3 2025-01-10 Standing Desk Furniture 2 599.99 1199.98 \n", "4 2025-01-12 Monitor 27inch Electronics 8 399.99 3199.92 \n", "5 2025-01-15 Keyboard Mechanical Electronics 12 149.99 1799.88 \n", "6 2025-01-17 Desk Lamp Furniture 20 49.99 999.80 \n", "7 2025-01-20 Webcam HD Electronics 10 79.99 799.90 \n", "8 2025-01-22 Laptop Pro Electronics 5 1299.99 6499.95 \n", "9 2025-01-25 Wireless Mouse Electronics 25 29.99 749.75 \n", "\n", " region salesperson \n", "0 North Alice Johnson \n", "1 South Bob Smith \n", "2 East Carol Davis \n", "3 West David Wilson \n", "4 North Alice Johnson \n", "5 South Bob Smith \n", "6 East Carol Davis \n", "7 West David Wilson \n", "8 North Eve Martinez \n", "9 South Frank Brown " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the sales data\n", "df = pd.read_csv('/Users/bamr87/github/zer0-mistakes/assets/data/notebooks/sales_data.csv')\n", "\n", "# Convert date column to datetime\n", "df['date'] = pd.to_datetime(df['date'])\n", "\n", "# Display first few rows\n", "print(\"๐Ÿ“Š Sales Data - First 10 rows:\")\n", "df.head(10)" ] }, { "cell_type": "code", "execution_count": 3, "id": "cfe86dbe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“‹ Dataset Information:\n", "Shape: 98 rows ร— 8 columns\n", "\n", "Column names: ['date', 'product', 'category', 'quantity', 'unit_price', 'revenue', 'region', 'salesperson']\n", "\n", "============================================================\n", "\n", "RangeIndex: 98 entries, 0 to 97\n", "Data columns (total 8 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 date 98 non-null datetime64[us]\n", " 1 product 98 non-null str \n", " 2 category 98 non-null str \n", " 3 quantity 98 non-null int64 \n", " 4 unit_price 98 non-null float64 \n", " 5 revenue 98 non-null float64 \n", " 6 region 98 non-null str \n", " 7 salesperson 98 non-null str \n", "dtypes: datetime64[us](1), float64(2), int64(1), str(4)\n", "memory usage: 6.3 KB\n" ] } ], "source": [ "# Get basic information about the dataset\n", "print(\"๐Ÿ“‹ Dataset Information:\")\n", "print(f\"Shape: {df.shape[0]} rows ร— {df.shape[1]} columns\")\n", "print(f\"\\nColumn names: {list(df.columns)}\")\n", "print(\"\\n\" + \"=\"*60)\n", "df.info()" ] }, { "cell_type": "code", "execution_count": 4, "id": "c3e0029f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“ˆ Statistical Summary:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datequantityunit_pricerevenue
count9898.0098.0098.00
mean2025-05-08 14:56:19.59183616.55324.992840.04
min2025-01-05 00:00:002.0029.99449.85
25%2025-03-08 12:00:008.0049.991199.74
50%2025-05-09 00:00:0015.00174.992124.89
75%2025-07-09 12:00:0022.00399.993862.44
max2025-09-10 00:00:0050.001299.9911699.91
stdNaN10.74389.562236.43
\n", "
" ], "text/plain": [ " date quantity unit_price revenue\n", "count 98 98.00 98.00 98.00\n", "mean 2025-05-08 14:56:19.591836 16.55 324.99 2840.04\n", "min 2025-01-05 00:00:00 2.00 29.99 449.85\n", "25% 2025-03-08 12:00:00 8.00 49.99 1199.74\n", "50% 2025-05-09 00:00:00 15.00 174.99 2124.89\n", "75% 2025-07-09 12:00:00 22.00 399.99 3862.44\n", "max 2025-09-10 00:00:00 50.00 1299.99 11699.91\n", "std NaN 10.74 389.56 2236.43" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Statistical summary of numeric columns\n", "print(\"๐Ÿ“ˆ Statistical Summary:\")\n", "df.describe()" ] }, { "cell_type": "markdown", "id": "a2400b0c", "metadata": {}, "source": [ "## 3. Filtering and Querying Data\n", "\n", "Learn different ways to filter and query your DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "id": "ed963c7d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ”Œ Electronics Sales: 68 transactions\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateproductcategoryquantityunit_pricerevenueregionsalesperson
02025-01-05Laptop ProElectronics31299.993899.97NorthAlice Johnson
12025-01-07Wireless MouseElectronics1529.99449.85SouthBob Smith
42025-01-12Monitor 27inchElectronics8399.993199.92NorthAlice Johnson
52025-01-15Keyboard MechanicalElectronics12149.991799.88SouthBob Smith
72025-01-20Webcam HDElectronics1079.99799.90WestDavid Wilson
\n", "
" ], "text/plain": [ " date product category quantity unit_price revenue \\\n", "0 2025-01-05 Laptop Pro Electronics 3 1299.99 3899.97 \n", "1 2025-01-07 Wireless Mouse Electronics 15 29.99 449.85 \n", "4 2025-01-12 Monitor 27inch Electronics 8 399.99 3199.92 \n", "5 2025-01-15 Keyboard Mechanical Electronics 12 149.99 1799.88 \n", "7 2025-01-20 Webcam HD Electronics 10 79.99 799.90 \n", "\n", " region salesperson \n", "0 North Alice Johnson \n", "1 South Bob Smith \n", "4 North Alice Johnson \n", "5 South Bob Smith \n", "7 West David Wilson " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter: Get all Electronics sales\n", "electronics_sales = df[df['category'] == 'Electronics']\n", "print(f\"๐Ÿ”Œ Electronics Sales: {len(electronics_sales)} transactions\")\n", "electronics_sales.head()" ] }, { "cell_type": "code", "execution_count": 6, "id": "6260239d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ’ฐ High-value North sales: 14 transactions\n", "Total revenue: $75,348.48\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateproductcategoryquantityunit_pricerevenueregionsalesperson
02025-01-05Laptop ProElectronics31299.993899.97NorthAlice Johnson
42025-01-12Monitor 27inchElectronics8399.993199.92NorthAlice Johnson
82025-01-22Laptop ProElectronics51299.996499.95NorthEve Martinez
322025-03-28Monitor 27inchElectronics12399.994799.88NorthEve Martinez
362025-04-08Laptop ProElectronics61299.997799.94NorthCarol Davis
522025-05-18Monitor 27inchElectronics15399.995999.85NorthAlice Johnson
562025-05-28Laptop ProElectronics31299.993899.97NorthEve Martinez
682025-06-28Headphones WirelessElectronics20199.993999.80NorthEve Martinez
722025-07-08Monitor 27inchElectronics14399.995599.86NorthCarol Davis
762025-07-18Laptop ProElectronics51299.996499.95NorthAlice Johnson
802025-07-28Office ChairFurniture15249.993749.85NorthEve Martinez
882025-08-18Headphones WirelessElectronics22199.994399.78NorthAlice Johnson
922025-08-28Monitor 27inchElectronics18399.997199.82NorthEve Martinez
962025-09-08Laptop ProElectronics61299.997799.94NorthCarol Davis
\n", "
" ], "text/plain": [ " date product category quantity unit_price \\\n", "0 2025-01-05 Laptop Pro Electronics 3 1299.99 \n", "4 2025-01-12 Monitor 27inch Electronics 8 399.99 \n", "8 2025-01-22 Laptop Pro Electronics 5 1299.99 \n", "32 2025-03-28 Monitor 27inch Electronics 12 399.99 \n", "36 2025-04-08 Laptop Pro Electronics 6 1299.99 \n", "52 2025-05-18 Monitor 27inch Electronics 15 399.99 \n", "56 2025-05-28 Laptop Pro Electronics 3 1299.99 \n", "68 2025-06-28 Headphones Wireless Electronics 20 199.99 \n", "72 2025-07-08 Monitor 27inch Electronics 14 399.99 \n", "76 2025-07-18 Laptop Pro Electronics 5 1299.99 \n", "80 2025-07-28 Office Chair Furniture 15 249.99 \n", "88 2025-08-18 Headphones Wireless Electronics 22 199.99 \n", "92 2025-08-28 Monitor 27inch Electronics 18 399.99 \n", "96 2025-09-08 Laptop Pro Electronics 6 1299.99 \n", "\n", " revenue region salesperson \n", "0 3899.97 North Alice Johnson \n", "4 3199.92 North Alice Johnson \n", "8 6499.95 North Eve Martinez \n", "32 4799.88 North Eve Martinez \n", "36 7799.94 North Carol Davis \n", "52 5999.85 North Alice Johnson \n", "56 3899.97 North Eve Martinez \n", "68 3999.80 North Eve Martinez \n", "72 5599.86 North Carol Davis \n", "76 6499.95 North Alice Johnson \n", "80 3749.85 North Eve Martinez \n", "88 4399.78 North Alice Johnson \n", "92 7199.82 North Eve Martinez \n", "96 7799.94 North Carol Davis " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Multiple conditions: High-value sales (revenue > $3000) in the North region\n", "high_value_north = df[(df['revenue'] > 3000) & (df['region'] == 'North')]\n", "print(f\"๐Ÿ’ฐ High-value North sales: {len(high_value_north)} transactions\")\n", "print(f\"Total revenue: ${high_value_north['revenue'].sum():,.2f}\")\n", "high_value_north" ] }, { "cell_type": "code", "execution_count": 7, "id": "f872b2fd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“… Q3 2025 Sales: 28 transactions\n", "Q3 Total Revenue: $107,193.76\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateproductcategoryquantityunit_pricerevenueregionsalesperson
702025-07-02Office ChairFurniture11249.992749.89EastAlice Johnson
712025-07-05Standing DeskFurniture8599.994799.92WestBob Smith
722025-07-08Monitor 27inchElectronics14399.995599.86NorthCarol Davis
732025-07-10Keyboard MechanicalElectronics28149.994199.72SouthDavid Wilson
742025-07-12Desk LampFurniture3549.991749.65EastEve Martinez
\n", "
" ], "text/plain": [ " date product category quantity unit_price \\\n", "70 2025-07-02 Office Chair Furniture 11 249.99 \n", "71 2025-07-05 Standing Desk Furniture 8 599.99 \n", "72 2025-07-08 Monitor 27inch Electronics 14 399.99 \n", "73 2025-07-10 Keyboard Mechanical Electronics 28 149.99 \n", "74 2025-07-12 Desk Lamp Furniture 35 49.99 \n", "\n", " revenue region salesperson \n", "70 2749.89 East Alice Johnson \n", "71 4799.92 West Bob Smith \n", "72 5599.86 North Carol Davis \n", "73 4199.72 South David Wilson \n", "74 1749.65 East Eve Martinez " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using query() method - more readable for complex filters\n", "q3_sales = df.query(\"date >= '2025-07-01' and date <= '2025-09-30'\")\n", "print(f\"๐Ÿ“… Q3 2025 Sales: {len(q3_sales)} transactions\")\n", "print(f\"Q3 Total Revenue: ${q3_sales['revenue'].sum():,.2f}\")\n", "q3_sales.head()" ] }, { "cell_type": "markdown", "id": "67a80476", "metadata": {}, "source": [ "## 4. Group By Operations\n", "\n", "Group data by one or more columns and perform aggregations." ] }, { "cell_type": "code", "execution_count": 8, "id": "b8160a8f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ’ต Revenue by Category:\n" ] }, { "data": { "text/plain": [ "category\n", "Electronics 208677.93\n", "Furniture 69645.85\n", "Name: revenue, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Group by category and calculate total revenue\n", "category_revenue = df.groupby('category')['revenue'].sum().sort_values(ascending=False)\n", "print(\"๐Ÿ’ต Revenue by Category:\")\n", "category_revenue" ] }, { "cell_type": "code", "execution_count": 9, "id": "2c53d835", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐ŸŒ Sales Statistics by Region:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_revenueavg_revenuenum_transactionstotal_units
region
North96996.703879.8725330
East85646.803568.6224320
South48294.721931.7925528
West47385.561974.4024444
\n", "
" ], "text/plain": [ " total_revenue avg_revenue num_transactions total_units\n", "region \n", "North 96996.70 3879.87 25 330\n", "East 85646.80 3568.62 24 320\n", "South 48294.72 1931.79 25 528\n", "West 47385.56 1974.40 24 444" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Group by region and calculate multiple metrics\n", "region_stats = df.groupby('region').agg({\n", " 'revenue': ['sum', 'mean', 'count'],\n", " 'quantity': 'sum'\n", "}).round(2)\n", "\n", "# Flatten column names\n", "region_stats.columns = ['total_revenue', 'avg_revenue', 'num_transactions', 'total_units']\n", "region_stats = region_stats.sort_values('total_revenue', ascending=False)\n", "\n", "print(\"๐ŸŒ Sales Statistics by Region:\")\n", "region_stats" ] }, { "cell_type": "code", "execution_count": 10, "id": "a8b3f35a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ† Top Salespeople by Total Revenue:\n" ] }, { "data": { "text/plain": [ "salesperson\n", "Eve Martinez 62347.93\n", "Alice Johnson 61397.89\n", "Carol Davis 58897.68\n", "David Wilson 34786.65\n", "Bob Smith 33356.65\n", "Frank Brown 27536.98\n", "Name: revenue, dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Top salespeople by revenue\n", "salesperson_revenue = df.groupby('salesperson')['revenue'].sum().sort_values(ascending=False)\n", "print(\"๐Ÿ† Top Salespeople by Total Revenue:\")\n", "salesperson_revenue" ] }, { "cell_type": "markdown", "id": "1867dc5e", "metadata": {}, "source": [ "## 5. Pivot Tables\n", "\n", "Create pivot tables for multi-dimensional analysis." ] }, { "cell_type": "code", "execution_count": 11, "id": "d126bb0e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“Š Revenue by Region and Category:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
categoryElectronicsFurniture
region
East63798.8121847.99
North81598.3015398.40
South35094.9413199.78
West28185.8819199.68
\n", "
" ], "text/plain": [ "category Electronics Furniture\n", "region \n", "East 63798.81 21847.99\n", "North 81598.30 15398.40\n", "South 35094.94 13199.78\n", "West 28185.88 19199.68" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pivot table: Revenue by Region and Category\n", "pivot_region_category = pd.pivot_table(\n", " df,\n", " values='revenue',\n", " index='region',\n", " columns='category',\n", " aggfunc='sum',\n", " fill_value=0\n", ")\n", "\n", "print(\"๐Ÿ“Š Revenue by Region and Category:\")\n", "pivot_region_category" ] }, { "cell_type": "code", "execution_count": 12, "id": "416698ac", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "๐Ÿ“… Monthly Revenue by Product:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
month2025-012025-022025-032025-042025-052025-062025-072025-082025-09
product
Desk Lamp999.80749.851249.752399.521099.781399.721749.651999.601599.68
Headphones Wireless0.001399.931999.902399.882999.855599.723599.824399.780.00
Keyboard Mechanical1799.882699.825249.653299.783749.752399.844199.729749.350.00
Laptop Pro10399.922599.985199.967799.9414299.899099.936499.9511699.917799.94
Monitor 27inch3199.922399.948799.783599.915999.854399.895599.8613599.660.00
Office Chair1249.953499.862499.901749.932999.882249.916499.743249.870.00
Standing Desk1199.984199.932999.953599.942399.964199.938399.865399.910.00
USB Hub0.00879.78719.82999.751199.701399.651399.651119.720.00
Webcam HD799.90959.88639.921199.852319.711119.861599.801279.841759.78
Wireless Mouse1199.60899.70599.801049.652039.32959.681349.551499.501139.62
\n", "
" ], "text/plain": [ "month 2025-01 2025-02 2025-03 2025-04 2025-05 2025-06 \\\n", "product \n", "Desk Lamp 999.80 749.85 1249.75 2399.52 1099.78 1399.72 \n", "Headphones Wireless 0.00 1399.93 1999.90 2399.88 2999.85 5599.72 \n", "Keyboard Mechanical 1799.88 2699.82 5249.65 3299.78 3749.75 2399.84 \n", "Laptop Pro 10399.92 2599.98 5199.96 7799.94 14299.89 9099.93 \n", "Monitor 27inch 3199.92 2399.94 8799.78 3599.91 5999.85 4399.89 \n", "Office Chair 1249.95 3499.86 2499.90 1749.93 2999.88 2249.91 \n", "Standing Desk 1199.98 4199.93 2999.95 3599.94 2399.96 4199.93 \n", "USB Hub 0.00 879.78 719.82 999.75 1199.70 1399.65 \n", "Webcam HD 799.90 959.88 639.92 1199.85 2319.71 1119.86 \n", "Wireless Mouse 1199.60 899.70 599.80 1049.65 2039.32 959.68 \n", "\n", "month 2025-07 2025-08 2025-09 \n", "product \n", "Desk Lamp 1749.65 1999.60 1599.68 \n", "Headphones Wireless 3599.82 4399.78 0.00 \n", "Keyboard Mechanical 4199.72 9749.35 0.00 \n", "Laptop Pro 6499.95 11699.91 7799.94 \n", "Monitor 27inch 5599.86 13599.66 0.00 \n", "Office Chair 6499.74 3249.87 0.00 \n", "Standing Desk 8399.86 5399.91 0.00 \n", "USB Hub 1399.65 1119.72 0.00 \n", "Webcam HD 1599.80 1279.84 1759.78 \n", "Wireless Mouse 1349.55 1499.50 1139.62 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add month column for time-based analysis\n", "df['month'] = df['date'].dt.to_period('M')\n", "\n", "# Pivot table: Monthly revenue by product\n", "pivot_monthly = pd.pivot_table(\n", " df,\n", " values='revenue',\n", " index='product',\n", " columns='month',\n", " aggfunc='sum',\n", " fill_value=0\n", ")\n", "\n", "print(\"๐Ÿ“… Monthly Revenue by Product:\")\n", "pivot_monthly" ] }, { "cell_type": "markdown", "id": "f8265e28", "metadata": {}, "source": [ "## 6. Summary Statistics and Key Insights\n", "\n", "Generate a comprehensive summary of the sales data." ] }, { "cell_type": "code", "execution_count": 13, "id": "a75a22eb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "============================================================\n", "๐Ÿ“Š SALES ANALYSIS SUMMARY\n", "============================================================\n", "\n", "๐Ÿ’ฐ Total Revenue: $278,323.78\n", "๐Ÿ“ฆ Total Units Sold: 1,622\n", "๐Ÿงพ Number of Transactions: 98\n", "๐Ÿ’ต Average Transaction Value: $2,840.04\n", "\n", "๐Ÿ† Best Selling Product: Laptop Pro\n", "๐ŸŒ Top Region: North\n", "โญ Top Salesperson: Eve Martinez\n", "\n", "๐Ÿ“… Date Range: 2025-01-05 to 2025-09-10\n", "============================================================\n" ] } ], "source": [ "# Generate key business insights\n", "print(\"=\" * 60)\n", "print(\"๐Ÿ“Š SALES ANALYSIS SUMMARY\")\n", "print(\"=\" * 60)\n", "\n", "# Overall metrics\n", "total_revenue = df['revenue'].sum()\n", "avg_transaction = df['revenue'].mean()\n", "total_units = df['quantity'].sum()\n", "num_transactions = len(df)\n", "\n", "print(f\"\\n๐Ÿ’ฐ Total Revenue: ${total_revenue:,.2f}\")\n", "print(f\"๐Ÿ“ฆ Total Units Sold: {total_units:,}\")\n", "print(f\"๐Ÿงพ Number of Transactions: {num_transactions:,}\")\n", "print(f\"๐Ÿ’ต Average Transaction Value: ${avg_transaction:,.2f}\")\n", "\n", "# Best performing\n", "best_product = df.groupby('product')['revenue'].sum().idxmax()\n", "best_region = df.groupby('region')['revenue'].sum().idxmax()\n", "best_salesperson = df.groupby('salesperson')['revenue'].sum().idxmax()\n", "\n", "print(f\"\\n๐Ÿ† Best Selling Product: {best_product}\")\n", "print(f\"๐ŸŒ Top Region: {best_region}\")\n", "print(f\"โญ Top Salesperson: {best_salesperson}\")\n", "\n", "# Time range\n", "print(f\"\\n๐Ÿ“… Date Range: {df['date'].min().date()} to {df['date'].max().date()}\")\n", "print(\"=\" * 60)" ] }, { "cell_type": "markdown", "id": "cbd9f5bb", "metadata": {}, "source": [ "## Next Steps\n", "\n", "Now that you've learned the basics of Pandas data analysis, you can:\n", "\n", "1. **Visualize your data** - Check out the [Matplotlib Visualization](/notebooks/matplotlib-visualization/) tutorial\n", "2. **Perform statistical analysis** - See the [Python Statistics](/notebooks/python-statistics/) tutorial\n", "3. **Work with APIs** - Learn to fetch data in the [API Requests](/notebooks/api-requests/) tutorial\n", "\n", "**Key Takeaways:**\n", "- Use `head()`, `info()`, and `describe()` for initial data exploration\n", "- Filter with boolean indexing or the `query()` method\n", "- Use `groupby()` for aggregations by category\n", "- Create `pivot_table()` for multi-dimensional analysis" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.0" } }, "nbformat": 4, "nbformat_minor": 5 }