{
"cells": [
{
"cell_type": "markdown",
"id": "38a98679",
"metadata": {},
"source": [
"# Data Analysis with Pandas\n",
"\n",
"Learn essential data analysis techniques using Python and Pandas. This tutorial covers loading data, exploration, filtering, grouping, and pivot tables using a real sales dataset.\n",
"\n",
"**Prerequisites:**\n",
"- Basic Python knowledge\n",
"- Pandas library installed\n",
"\n",
"**What you'll learn:**\n",
"- Loading and exploring CSV data\n",
"- Filtering and querying DataFrames\n",
"- Group by operations and aggregations\n",
"- Creating pivot tables for analysis"
]
},
{
"cell_type": "markdown",
"id": "2eff9dd4",
"metadata": {},
"source": [
"## 1. Setup: Import Libraries\n",
"\n",
"First, let's import the required libraries for data analysis."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7ac57f00",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Libraries imported successfully!\n",
"Pandas version: 3.0.0\n",
"NumPy version: 2.4.2\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from datetime import datetime\n",
"\n",
"# Display settings for better output formatting\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.width', None)\n",
"pd.set_option('display.float_format', '{:.2f}'.format)\n",
"\n",
"print(\"Libraries imported successfully!\")\n",
"print(f\"Pandas version: {pd.__version__}\")\n",
"print(f\"NumPy version: {np.__version__}\")"
]
},
{
"cell_type": "markdown",
"id": "57e5a1e0",
"metadata": {},
"source": [
"## 2. Load and Explore Data\n",
"\n",
"Let's load our sales dataset and take a first look at the data structure."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "60f2cf96",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Sales Data - First 10 rows:\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" product | \n",
" category | \n",
" quantity | \n",
" unit_price | \n",
" revenue | \n",
" region | \n",
" salesperson | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 2025-01-05 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 3 | \n",
" 1299.99 | \n",
" 3899.97 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 1 | \n",
" 2025-01-07 | \n",
" Wireless Mouse | \n",
" Electronics | \n",
" 15 | \n",
" 29.99 | \n",
" 449.85 | \n",
" South | \n",
" Bob Smith | \n",
"
\n",
" \n",
" | 2 | \n",
" 2025-01-08 | \n",
" Office Chair | \n",
" Furniture | \n",
" 5 | \n",
" 249.99 | \n",
" 1249.95 | \n",
" East | \n",
" Carol Davis | \n",
"
\n",
" \n",
" | 3 | \n",
" 2025-01-10 | \n",
" Standing Desk | \n",
" Furniture | \n",
" 2 | \n",
" 599.99 | \n",
" 1199.98 | \n",
" West | \n",
" David Wilson | \n",
"
\n",
" \n",
" | 4 | \n",
" 2025-01-12 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 8 | \n",
" 399.99 | \n",
" 3199.92 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 5 | \n",
" 2025-01-15 | \n",
" Keyboard Mechanical | \n",
" Electronics | \n",
" 12 | \n",
" 149.99 | \n",
" 1799.88 | \n",
" South | \n",
" Bob Smith | \n",
"
\n",
" \n",
" | 6 | \n",
" 2025-01-17 | \n",
" Desk Lamp | \n",
" Furniture | \n",
" 20 | \n",
" 49.99 | \n",
" 999.80 | \n",
" East | \n",
" Carol Davis | \n",
"
\n",
" \n",
" | 7 | \n",
" 2025-01-20 | \n",
" Webcam HD | \n",
" Electronics | \n",
" 10 | \n",
" 79.99 | \n",
" 799.90 | \n",
" West | \n",
" David Wilson | \n",
"
\n",
" \n",
" | 8 | \n",
" 2025-01-22 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 5 | \n",
" 1299.99 | \n",
" 6499.95 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 9 | \n",
" 2025-01-25 | \n",
" Wireless Mouse | \n",
" Electronics | \n",
" 25 | \n",
" 29.99 | \n",
" 749.75 | \n",
" South | \n",
" Frank Brown | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date product category quantity unit_price revenue \\\n",
"0 2025-01-05 Laptop Pro Electronics 3 1299.99 3899.97 \n",
"1 2025-01-07 Wireless Mouse Electronics 15 29.99 449.85 \n",
"2 2025-01-08 Office Chair Furniture 5 249.99 1249.95 \n",
"3 2025-01-10 Standing Desk Furniture 2 599.99 1199.98 \n",
"4 2025-01-12 Monitor 27inch Electronics 8 399.99 3199.92 \n",
"5 2025-01-15 Keyboard Mechanical Electronics 12 149.99 1799.88 \n",
"6 2025-01-17 Desk Lamp Furniture 20 49.99 999.80 \n",
"7 2025-01-20 Webcam HD Electronics 10 79.99 799.90 \n",
"8 2025-01-22 Laptop Pro Electronics 5 1299.99 6499.95 \n",
"9 2025-01-25 Wireless Mouse Electronics 25 29.99 749.75 \n",
"\n",
" region salesperson \n",
"0 North Alice Johnson \n",
"1 South Bob Smith \n",
"2 East Carol Davis \n",
"3 West David Wilson \n",
"4 North Alice Johnson \n",
"5 South Bob Smith \n",
"6 East Carol Davis \n",
"7 West David Wilson \n",
"8 North Eve Martinez \n",
"9 South Frank Brown "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the sales data\n",
"df = pd.read_csv('/Users/bamr87/github/zer0-mistakes/assets/data/notebooks/sales_data.csv')\n",
"\n",
"# Convert date column to datetime\n",
"df['date'] = pd.to_datetime(df['date'])\n",
"\n",
"# Display first few rows\n",
"print(\"๐ Sales Data - First 10 rows:\")\n",
"df.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cfe86dbe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Dataset Information:\n",
"Shape: 98 rows ร 8 columns\n",
"\n",
"Column names: ['date', 'product', 'category', 'quantity', 'unit_price', 'revenue', 'region', 'salesperson']\n",
"\n",
"============================================================\n",
"\n",
"RangeIndex: 98 entries, 0 to 97\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 date 98 non-null datetime64[us]\n",
" 1 product 98 non-null str \n",
" 2 category 98 non-null str \n",
" 3 quantity 98 non-null int64 \n",
" 4 unit_price 98 non-null float64 \n",
" 5 revenue 98 non-null float64 \n",
" 6 region 98 non-null str \n",
" 7 salesperson 98 non-null str \n",
"dtypes: datetime64[us](1), float64(2), int64(1), str(4)\n",
"memory usage: 6.3 KB\n"
]
}
],
"source": [
"# Get basic information about the dataset\n",
"print(\"๐ Dataset Information:\")\n",
"print(f\"Shape: {df.shape[0]} rows ร {df.shape[1]} columns\")\n",
"print(f\"\\nColumn names: {list(df.columns)}\")\n",
"print(\"\\n\" + \"=\"*60)\n",
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c3e0029f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Statistical Summary:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" quantity | \n",
" unit_price | \n",
" revenue | \n",
"
\n",
" \n",
" \n",
" \n",
" | count | \n",
" 98 | \n",
" 98.00 | \n",
" 98.00 | \n",
" 98.00 | \n",
"
\n",
" \n",
" | mean | \n",
" 2025-05-08 14:56:19.591836 | \n",
" 16.55 | \n",
" 324.99 | \n",
" 2840.04 | \n",
"
\n",
" \n",
" | min | \n",
" 2025-01-05 00:00:00 | \n",
" 2.00 | \n",
" 29.99 | \n",
" 449.85 | \n",
"
\n",
" \n",
" | 25% | \n",
" 2025-03-08 12:00:00 | \n",
" 8.00 | \n",
" 49.99 | \n",
" 1199.74 | \n",
"
\n",
" \n",
" | 50% | \n",
" 2025-05-09 00:00:00 | \n",
" 15.00 | \n",
" 174.99 | \n",
" 2124.89 | \n",
"
\n",
" \n",
" | 75% | \n",
" 2025-07-09 12:00:00 | \n",
" 22.00 | \n",
" 399.99 | \n",
" 3862.44 | \n",
"
\n",
" \n",
" | max | \n",
" 2025-09-10 00:00:00 | \n",
" 50.00 | \n",
" 1299.99 | \n",
" 11699.91 | \n",
"
\n",
" \n",
" | std | \n",
" NaN | \n",
" 10.74 | \n",
" 389.56 | \n",
" 2236.43 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date quantity unit_price revenue\n",
"count 98 98.00 98.00 98.00\n",
"mean 2025-05-08 14:56:19.591836 16.55 324.99 2840.04\n",
"min 2025-01-05 00:00:00 2.00 29.99 449.85\n",
"25% 2025-03-08 12:00:00 8.00 49.99 1199.74\n",
"50% 2025-05-09 00:00:00 15.00 174.99 2124.89\n",
"75% 2025-07-09 12:00:00 22.00 399.99 3862.44\n",
"max 2025-09-10 00:00:00 50.00 1299.99 11699.91\n",
"std NaN 10.74 389.56 2236.43"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Statistical summary of numeric columns\n",
"print(\"๐ Statistical Summary:\")\n",
"df.describe()"
]
},
{
"cell_type": "markdown",
"id": "a2400b0c",
"metadata": {},
"source": [
"## 3. Filtering and Querying Data\n",
"\n",
"Learn different ways to filter and query your DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ed963c7d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Electronics Sales: 68 transactions\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" product | \n",
" category | \n",
" quantity | \n",
" unit_price | \n",
" revenue | \n",
" region | \n",
" salesperson | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 2025-01-05 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 3 | \n",
" 1299.99 | \n",
" 3899.97 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 1 | \n",
" 2025-01-07 | \n",
" Wireless Mouse | \n",
" Electronics | \n",
" 15 | \n",
" 29.99 | \n",
" 449.85 | \n",
" South | \n",
" Bob Smith | \n",
"
\n",
" \n",
" | 4 | \n",
" 2025-01-12 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 8 | \n",
" 399.99 | \n",
" 3199.92 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 5 | \n",
" 2025-01-15 | \n",
" Keyboard Mechanical | \n",
" Electronics | \n",
" 12 | \n",
" 149.99 | \n",
" 1799.88 | \n",
" South | \n",
" Bob Smith | \n",
"
\n",
" \n",
" | 7 | \n",
" 2025-01-20 | \n",
" Webcam HD | \n",
" Electronics | \n",
" 10 | \n",
" 79.99 | \n",
" 799.90 | \n",
" West | \n",
" David Wilson | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date product category quantity unit_price revenue \\\n",
"0 2025-01-05 Laptop Pro Electronics 3 1299.99 3899.97 \n",
"1 2025-01-07 Wireless Mouse Electronics 15 29.99 449.85 \n",
"4 2025-01-12 Monitor 27inch Electronics 8 399.99 3199.92 \n",
"5 2025-01-15 Keyboard Mechanical Electronics 12 149.99 1799.88 \n",
"7 2025-01-20 Webcam HD Electronics 10 79.99 799.90 \n",
"\n",
" region salesperson \n",
"0 North Alice Johnson \n",
"1 South Bob Smith \n",
"4 North Alice Johnson \n",
"5 South Bob Smith \n",
"7 West David Wilson "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter: Get all Electronics sales\n",
"electronics_sales = df[df['category'] == 'Electronics']\n",
"print(f\"๐ Electronics Sales: {len(electronics_sales)} transactions\")\n",
"electronics_sales.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6260239d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ฐ High-value North sales: 14 transactions\n",
"Total revenue: $75,348.48\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" product | \n",
" category | \n",
" quantity | \n",
" unit_price | \n",
" revenue | \n",
" region | \n",
" salesperson | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 2025-01-05 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 3 | \n",
" 1299.99 | \n",
" 3899.97 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 4 | \n",
" 2025-01-12 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 8 | \n",
" 399.99 | \n",
" 3199.92 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 8 | \n",
" 2025-01-22 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 5 | \n",
" 1299.99 | \n",
" 6499.95 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 32 | \n",
" 2025-03-28 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 12 | \n",
" 399.99 | \n",
" 4799.88 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 36 | \n",
" 2025-04-08 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 6 | \n",
" 1299.99 | \n",
" 7799.94 | \n",
" North | \n",
" Carol Davis | \n",
"
\n",
" \n",
" | 52 | \n",
" 2025-05-18 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 15 | \n",
" 399.99 | \n",
" 5999.85 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 56 | \n",
" 2025-05-28 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 3 | \n",
" 1299.99 | \n",
" 3899.97 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 68 | \n",
" 2025-06-28 | \n",
" Headphones Wireless | \n",
" Electronics | \n",
" 20 | \n",
" 199.99 | \n",
" 3999.80 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 72 | \n",
" 2025-07-08 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 14 | \n",
" 399.99 | \n",
" 5599.86 | \n",
" North | \n",
" Carol Davis | \n",
"
\n",
" \n",
" | 76 | \n",
" 2025-07-18 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 5 | \n",
" 1299.99 | \n",
" 6499.95 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 80 | \n",
" 2025-07-28 | \n",
" Office Chair | \n",
" Furniture | \n",
" 15 | \n",
" 249.99 | \n",
" 3749.85 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 88 | \n",
" 2025-08-18 | \n",
" Headphones Wireless | \n",
" Electronics | \n",
" 22 | \n",
" 199.99 | \n",
" 4399.78 | \n",
" North | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 92 | \n",
" 2025-08-28 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 18 | \n",
" 399.99 | \n",
" 7199.82 | \n",
" North | \n",
" Eve Martinez | \n",
"
\n",
" \n",
" | 96 | \n",
" 2025-09-08 | \n",
" Laptop Pro | \n",
" Electronics | \n",
" 6 | \n",
" 1299.99 | \n",
" 7799.94 | \n",
" North | \n",
" Carol Davis | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date product category quantity unit_price \\\n",
"0 2025-01-05 Laptop Pro Electronics 3 1299.99 \n",
"4 2025-01-12 Monitor 27inch Electronics 8 399.99 \n",
"8 2025-01-22 Laptop Pro Electronics 5 1299.99 \n",
"32 2025-03-28 Monitor 27inch Electronics 12 399.99 \n",
"36 2025-04-08 Laptop Pro Electronics 6 1299.99 \n",
"52 2025-05-18 Monitor 27inch Electronics 15 399.99 \n",
"56 2025-05-28 Laptop Pro Electronics 3 1299.99 \n",
"68 2025-06-28 Headphones Wireless Electronics 20 199.99 \n",
"72 2025-07-08 Monitor 27inch Electronics 14 399.99 \n",
"76 2025-07-18 Laptop Pro Electronics 5 1299.99 \n",
"80 2025-07-28 Office Chair Furniture 15 249.99 \n",
"88 2025-08-18 Headphones Wireless Electronics 22 199.99 \n",
"92 2025-08-28 Monitor 27inch Electronics 18 399.99 \n",
"96 2025-09-08 Laptop Pro Electronics 6 1299.99 \n",
"\n",
" revenue region salesperson \n",
"0 3899.97 North Alice Johnson \n",
"4 3199.92 North Alice Johnson \n",
"8 6499.95 North Eve Martinez \n",
"32 4799.88 North Eve Martinez \n",
"36 7799.94 North Carol Davis \n",
"52 5999.85 North Alice Johnson \n",
"56 3899.97 North Eve Martinez \n",
"68 3999.80 North Eve Martinez \n",
"72 5599.86 North Carol Davis \n",
"76 6499.95 North Alice Johnson \n",
"80 3749.85 North Eve Martinez \n",
"88 4399.78 North Alice Johnson \n",
"92 7199.82 North Eve Martinez \n",
"96 7799.94 North Carol Davis "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Multiple conditions: High-value sales (revenue > $3000) in the North region\n",
"high_value_north = df[(df['revenue'] > 3000) & (df['region'] == 'North')]\n",
"print(f\"๐ฐ High-value North sales: {len(high_value_north)} transactions\")\n",
"print(f\"Total revenue: ${high_value_north['revenue'].sum():,.2f}\")\n",
"high_value_north"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f872b2fd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐
Q3 2025 Sales: 28 transactions\n",
"Q3 Total Revenue: $107,193.76\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" product | \n",
" category | \n",
" quantity | \n",
" unit_price | \n",
" revenue | \n",
" region | \n",
" salesperson | \n",
"
\n",
" \n",
" \n",
" \n",
" | 70 | \n",
" 2025-07-02 | \n",
" Office Chair | \n",
" Furniture | \n",
" 11 | \n",
" 249.99 | \n",
" 2749.89 | \n",
" East | \n",
" Alice Johnson | \n",
"
\n",
" \n",
" | 71 | \n",
" 2025-07-05 | \n",
" Standing Desk | \n",
" Furniture | \n",
" 8 | \n",
" 599.99 | \n",
" 4799.92 | \n",
" West | \n",
" Bob Smith | \n",
"
\n",
" \n",
" | 72 | \n",
" 2025-07-08 | \n",
" Monitor 27inch | \n",
" Electronics | \n",
" 14 | \n",
" 399.99 | \n",
" 5599.86 | \n",
" North | \n",
" Carol Davis | \n",
"
\n",
" \n",
" | 73 | \n",
" 2025-07-10 | \n",
" Keyboard Mechanical | \n",
" Electronics | \n",
" 28 | \n",
" 149.99 | \n",
" 4199.72 | \n",
" South | \n",
" David Wilson | \n",
"
\n",
" \n",
" | 74 | \n",
" 2025-07-12 | \n",
" Desk Lamp | \n",
" Furniture | \n",
" 35 | \n",
" 49.99 | \n",
" 1749.65 | \n",
" East | \n",
" Eve Martinez | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date product category quantity unit_price \\\n",
"70 2025-07-02 Office Chair Furniture 11 249.99 \n",
"71 2025-07-05 Standing Desk Furniture 8 599.99 \n",
"72 2025-07-08 Monitor 27inch Electronics 14 399.99 \n",
"73 2025-07-10 Keyboard Mechanical Electronics 28 149.99 \n",
"74 2025-07-12 Desk Lamp Furniture 35 49.99 \n",
"\n",
" revenue region salesperson \n",
"70 2749.89 East Alice Johnson \n",
"71 4799.92 West Bob Smith \n",
"72 5599.86 North Carol Davis \n",
"73 4199.72 South David Wilson \n",
"74 1749.65 East Eve Martinez "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using query() method - more readable for complex filters\n",
"q3_sales = df.query(\"date >= '2025-07-01' and date <= '2025-09-30'\")\n",
"print(f\"๐
Q3 2025 Sales: {len(q3_sales)} transactions\")\n",
"print(f\"Q3 Total Revenue: ${q3_sales['revenue'].sum():,.2f}\")\n",
"q3_sales.head()"
]
},
{
"cell_type": "markdown",
"id": "67a80476",
"metadata": {},
"source": [
"## 4. Group By Operations\n",
"\n",
"Group data by one or more columns and perform aggregations."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b8160a8f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ต Revenue by Category:\n"
]
},
{
"data": {
"text/plain": [
"category\n",
"Electronics 208677.93\n",
"Furniture 69645.85\n",
"Name: revenue, dtype: float64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Group by category and calculate total revenue\n",
"category_revenue = df.groupby('category')['revenue'].sum().sort_values(ascending=False)\n",
"print(\"๐ต Revenue by Category:\")\n",
"category_revenue"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2c53d835",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Sales Statistics by Region:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" total_revenue | \n",
" avg_revenue | \n",
" num_transactions | \n",
" total_units | \n",
"
\n",
" \n",
" | region | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | North | \n",
" 96996.70 | \n",
" 3879.87 | \n",
" 25 | \n",
" 330 | \n",
"
\n",
" \n",
" | East | \n",
" 85646.80 | \n",
" 3568.62 | \n",
" 24 | \n",
" 320 | \n",
"
\n",
" \n",
" | South | \n",
" 48294.72 | \n",
" 1931.79 | \n",
" 25 | \n",
" 528 | \n",
"
\n",
" \n",
" | West | \n",
" 47385.56 | \n",
" 1974.40 | \n",
" 24 | \n",
" 444 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" total_revenue avg_revenue num_transactions total_units\n",
"region \n",
"North 96996.70 3879.87 25 330\n",
"East 85646.80 3568.62 24 320\n",
"South 48294.72 1931.79 25 528\n",
"West 47385.56 1974.40 24 444"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Group by region and calculate multiple metrics\n",
"region_stats = df.groupby('region').agg({\n",
" 'revenue': ['sum', 'mean', 'count'],\n",
" 'quantity': 'sum'\n",
"}).round(2)\n",
"\n",
"# Flatten column names\n",
"region_stats.columns = ['total_revenue', 'avg_revenue', 'num_transactions', 'total_units']\n",
"region_stats = region_stats.sort_values('total_revenue', ascending=False)\n",
"\n",
"print(\"๐ Sales Statistics by Region:\")\n",
"region_stats"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a8b3f35a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Top Salespeople by Total Revenue:\n"
]
},
{
"data": {
"text/plain": [
"salesperson\n",
"Eve Martinez 62347.93\n",
"Alice Johnson 61397.89\n",
"Carol Davis 58897.68\n",
"David Wilson 34786.65\n",
"Bob Smith 33356.65\n",
"Frank Brown 27536.98\n",
"Name: revenue, dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Top salespeople by revenue\n",
"salesperson_revenue = df.groupby('salesperson')['revenue'].sum().sort_values(ascending=False)\n",
"print(\"๐ Top Salespeople by Total Revenue:\")\n",
"salesperson_revenue"
]
},
{
"cell_type": "markdown",
"id": "1867dc5e",
"metadata": {},
"source": [
"## 5. Pivot Tables\n",
"\n",
"Create pivot tables for multi-dimensional analysis."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d126bb0e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐ Revenue by Region and Category:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | category | \n",
" Electronics | \n",
" Furniture | \n",
"
\n",
" \n",
" | region | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | East | \n",
" 63798.81 | \n",
" 21847.99 | \n",
"
\n",
" \n",
" | North | \n",
" 81598.30 | \n",
" 15398.40 | \n",
"
\n",
" \n",
" | South | \n",
" 35094.94 | \n",
" 13199.78 | \n",
"
\n",
" \n",
" | West | \n",
" 28185.88 | \n",
" 19199.68 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"category Electronics Furniture\n",
"region \n",
"East 63798.81 21847.99\n",
"North 81598.30 15398.40\n",
"South 35094.94 13199.78\n",
"West 28185.88 19199.68"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pivot table: Revenue by Region and Category\n",
"pivot_region_category = pd.pivot_table(\n",
" df,\n",
" values='revenue',\n",
" index='region',\n",
" columns='category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"๐ Revenue by Region and Category:\")\n",
"pivot_region_category"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "416698ac",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"๐
Monthly Revenue by Product:\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | month | \n",
" 2025-01 | \n",
" 2025-02 | \n",
" 2025-03 | \n",
" 2025-04 | \n",
" 2025-05 | \n",
" 2025-06 | \n",
" 2025-07 | \n",
" 2025-08 | \n",
" 2025-09 | \n",
"
\n",
" \n",
" | product | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | Desk Lamp | \n",
" 999.80 | \n",
" 749.85 | \n",
" 1249.75 | \n",
" 2399.52 | \n",
" 1099.78 | \n",
" 1399.72 | \n",
" 1749.65 | \n",
" 1999.60 | \n",
" 1599.68 | \n",
"
\n",
" \n",
" | Headphones Wireless | \n",
" 0.00 | \n",
" 1399.93 | \n",
" 1999.90 | \n",
" 2399.88 | \n",
" 2999.85 | \n",
" 5599.72 | \n",
" 3599.82 | \n",
" 4399.78 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | Keyboard Mechanical | \n",
" 1799.88 | \n",
" 2699.82 | \n",
" 5249.65 | \n",
" 3299.78 | \n",
" 3749.75 | \n",
" 2399.84 | \n",
" 4199.72 | \n",
" 9749.35 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | Laptop Pro | \n",
" 10399.92 | \n",
" 2599.98 | \n",
" 5199.96 | \n",
" 7799.94 | \n",
" 14299.89 | \n",
" 9099.93 | \n",
" 6499.95 | \n",
" 11699.91 | \n",
" 7799.94 | \n",
"
\n",
" \n",
" | Monitor 27inch | \n",
" 3199.92 | \n",
" 2399.94 | \n",
" 8799.78 | \n",
" 3599.91 | \n",
" 5999.85 | \n",
" 4399.89 | \n",
" 5599.86 | \n",
" 13599.66 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | Office Chair | \n",
" 1249.95 | \n",
" 3499.86 | \n",
" 2499.90 | \n",
" 1749.93 | \n",
" 2999.88 | \n",
" 2249.91 | \n",
" 6499.74 | \n",
" 3249.87 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | Standing Desk | \n",
" 1199.98 | \n",
" 4199.93 | \n",
" 2999.95 | \n",
" 3599.94 | \n",
" 2399.96 | \n",
" 4199.93 | \n",
" 8399.86 | \n",
" 5399.91 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | USB Hub | \n",
" 0.00 | \n",
" 879.78 | \n",
" 719.82 | \n",
" 999.75 | \n",
" 1199.70 | \n",
" 1399.65 | \n",
" 1399.65 | \n",
" 1119.72 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | Webcam HD | \n",
" 799.90 | \n",
" 959.88 | \n",
" 639.92 | \n",
" 1199.85 | \n",
" 2319.71 | \n",
" 1119.86 | \n",
" 1599.80 | \n",
" 1279.84 | \n",
" 1759.78 | \n",
"
\n",
" \n",
" | Wireless Mouse | \n",
" 1199.60 | \n",
" 899.70 | \n",
" 599.80 | \n",
" 1049.65 | \n",
" 2039.32 | \n",
" 959.68 | \n",
" 1349.55 | \n",
" 1499.50 | \n",
" 1139.62 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2025-01 2025-02 2025-03 2025-04 2025-05 2025-06 \\\n",
"product \n",
"Desk Lamp 999.80 749.85 1249.75 2399.52 1099.78 1399.72 \n",
"Headphones Wireless 0.00 1399.93 1999.90 2399.88 2999.85 5599.72 \n",
"Keyboard Mechanical 1799.88 2699.82 5249.65 3299.78 3749.75 2399.84 \n",
"Laptop Pro 10399.92 2599.98 5199.96 7799.94 14299.89 9099.93 \n",
"Monitor 27inch 3199.92 2399.94 8799.78 3599.91 5999.85 4399.89 \n",
"Office Chair 1249.95 3499.86 2499.90 1749.93 2999.88 2249.91 \n",
"Standing Desk 1199.98 4199.93 2999.95 3599.94 2399.96 4199.93 \n",
"USB Hub 0.00 879.78 719.82 999.75 1199.70 1399.65 \n",
"Webcam HD 799.90 959.88 639.92 1199.85 2319.71 1119.86 \n",
"Wireless Mouse 1199.60 899.70 599.80 1049.65 2039.32 959.68 \n",
"\n",
"month 2025-07 2025-08 2025-09 \n",
"product \n",
"Desk Lamp 1749.65 1999.60 1599.68 \n",
"Headphones Wireless 3599.82 4399.78 0.00 \n",
"Keyboard Mechanical 4199.72 9749.35 0.00 \n",
"Laptop Pro 6499.95 11699.91 7799.94 \n",
"Monitor 27inch 5599.86 13599.66 0.00 \n",
"Office Chair 6499.74 3249.87 0.00 \n",
"Standing Desk 8399.86 5399.91 0.00 \n",
"USB Hub 1399.65 1119.72 0.00 \n",
"Webcam HD 1599.80 1279.84 1759.78 \n",
"Wireless Mouse 1349.55 1499.50 1139.62 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Add month column for time-based analysis\n",
"df['month'] = df['date'].dt.to_period('M')\n",
"\n",
"# Pivot table: Monthly revenue by product\n",
"pivot_monthly = pd.pivot_table(\n",
" df,\n",
" values='revenue',\n",
" index='product',\n",
" columns='month',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"๐
Monthly Revenue by Product:\")\n",
"pivot_monthly"
]
},
{
"cell_type": "markdown",
"id": "f8265e28",
"metadata": {},
"source": [
"## 6. Summary Statistics and Key Insights\n",
"\n",
"Generate a comprehensive summary of the sales data."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a75a22eb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"============================================================\n",
"๐ SALES ANALYSIS SUMMARY\n",
"============================================================\n",
"\n",
"๐ฐ Total Revenue: $278,323.78\n",
"๐ฆ Total Units Sold: 1,622\n",
"๐งพ Number of Transactions: 98\n",
"๐ต Average Transaction Value: $2,840.04\n",
"\n",
"๐ Best Selling Product: Laptop Pro\n",
"๐ Top Region: North\n",
"โญ Top Salesperson: Eve Martinez\n",
"\n",
"๐
Date Range: 2025-01-05 to 2025-09-10\n",
"============================================================\n"
]
}
],
"source": [
"# Generate key business insights\n",
"print(\"=\" * 60)\n",
"print(\"๐ SALES ANALYSIS SUMMARY\")\n",
"print(\"=\" * 60)\n",
"\n",
"# Overall metrics\n",
"total_revenue = df['revenue'].sum()\n",
"avg_transaction = df['revenue'].mean()\n",
"total_units = df['quantity'].sum()\n",
"num_transactions = len(df)\n",
"\n",
"print(f\"\\n๐ฐ Total Revenue: ${total_revenue:,.2f}\")\n",
"print(f\"๐ฆ Total Units Sold: {total_units:,}\")\n",
"print(f\"๐งพ Number of Transactions: {num_transactions:,}\")\n",
"print(f\"๐ต Average Transaction Value: ${avg_transaction:,.2f}\")\n",
"\n",
"# Best performing\n",
"best_product = df.groupby('product')['revenue'].sum().idxmax()\n",
"best_region = df.groupby('region')['revenue'].sum().idxmax()\n",
"best_salesperson = df.groupby('salesperson')['revenue'].sum().idxmax()\n",
"\n",
"print(f\"\\n๐ Best Selling Product: {best_product}\")\n",
"print(f\"๐ Top Region: {best_region}\")\n",
"print(f\"โญ Top Salesperson: {best_salesperson}\")\n",
"\n",
"# Time range\n",
"print(f\"\\n๐
Date Range: {df['date'].min().date()} to {df['date'].max().date()}\")\n",
"print(\"=\" * 60)"
]
},
{
"cell_type": "markdown",
"id": "cbd9f5bb",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"Now that you've learned the basics of Pandas data analysis, you can:\n",
"\n",
"1. **Visualize your data** - Check out the [Matplotlib Visualization](/notebooks/matplotlib-visualization/) tutorial\n",
"2. **Perform statistical analysis** - See the [Python Statistics](/notebooks/python-statistics/) tutorial\n",
"3. **Work with APIs** - Learn to fetch data in the [API Requests](/notebooks/api-requests/) tutorial\n",
"\n",
"**Key Takeaways:**\n",
"- Use `head()`, `info()`, and `describe()` for initial data exploration\n",
"- Filter with boolean indexing or the `query()` method\n",
"- Use `groupby()` for aggregations by category\n",
"- Create `pivot_table()` for multi-dimensional analysis"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}