{
"cells": [
{
"cell_type": "markdown",
"id": "5a048366-36d9-43fc-b2e7-6d4cf94469bc",
"metadata": {},
"source": [
"# Challenge: \"Sales Analysis\"\n",
"\n",
"### Description:\n",
"You are a sales analyst for an e-commerce company. You have been tasked with analyzing the sales data to determine which regions and product categories are generating the most revenue. You have a dataset containing information about sales, including the region, product category, and sales amount.\n",
"\n",
"### Tasks:\n",
"\n",
"Using Pandas, write a program to:\n",
"\n",
"- Load the dataset into a pandas DataFrame.\n",
"- Group the sales data by region and product category, and calculate the total sales amount for each region and product category.\n",
"- Calculate the total sales amount for each region and product category as a percentage of the overall sales amount.\n",
"- Print the top 3 regions and product categories with their total sales amount and percentage of overall sales.\n",
"\n",
"**Hint:** You can use the groupby function to group the data by region and product category, and calculate the total sales amount for each group. Then, use the pivot_table function to reshape the data and calculate the percentages."
]
},
{
"cell_type": "markdown",
"id": "d82f6ce6-be1a-4790-b77a-c482eadcf0bf",
"metadata": {},
"source": [
"### Here is the dataset\n",
"```\n",
"Region,Product Category,Sales Amount\n",
"North,Electronics,1000\n",
"North,Clothing,800\n",
"South,Electronics,1200\n",
"South,Electronics,900\n",
"East,Clothing,1100\n",
"East,Electronics,700\n",
"West,Electronics,1000\n",
"West,Electronics,900\n",
"North,Clothing,600\n",
"South,Clothing,1000\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "61aabeb9-a126-4894-a46a-657d4ac28858",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import sys"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "56b66926-d791-48db-b08c-2a24071fab6a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]\n",
"Pandas version 2.2.1\n"
]
}
],
"source": [
"print('Python version ' + sys.version)\n",
"print('Pandas version ' + pd.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9dcbb8a6-c632-4692-9ca9-8c60c5ac6588",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Region | \n",
" Product Category | \n",
" Sales Amount | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" North | \n",
" Electronics | \n",
" 1000 | \n",
"
\n",
" \n",
" | 1 | \n",
" North | \n",
" Clothing | \n",
" 800 | \n",
"
\n",
" \n",
" | 2 | \n",
" South | \n",
" Electronics | \n",
" 1200 | \n",
"
\n",
" \n",
" | 3 | \n",
" South | \n",
" Electronics | \n",
" 900 | \n",
"
\n",
" \n",
" | 4 | \n",
" East | \n",
" Clothing | \n",
" 1100 | \n",
"
\n",
" \n",
" | 5 | \n",
" East | \n",
" Electronics | \n",
" 700 | \n",
"
\n",
" \n",
" | 6 | \n",
" West | \n",
" Electronics | \n",
" 1000 | \n",
"
\n",
" \n",
" | 7 | \n",
" West | \n",
" Electronics | \n",
" 900 | \n",
"
\n",
" \n",
" | 8 | \n",
" North | \n",
" Clothing | \n",
" 600 | \n",
"
\n",
" \n",
" | 9 | \n",
" South | \n",
" Clothing | \n",
" 1000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Region Product Category Sales Amount\n",
"0 North Electronics 1000\n",
"1 North Clothing 800\n",
"2 South Electronics 1200\n",
"3 South Electronics 900\n",
"4 East Clothing 1100\n",
"5 East Electronics 700\n",
"6 West Electronics 1000\n",
"7 West Electronics 900\n",
"8 North Clothing 600\n",
"9 South Clothing 1000"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_clipboard(sep=\",\")\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9cf9dc6f-06b2-477e-a3e4-ffa45ccf0a52",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" Sales Amount | \n",
"
\n",
" \n",
" | Region | \n",
" Product Category | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | East | \n",
" Clothing | \n",
" 1100 | \n",
"
\n",
" \n",
" | Electronics | \n",
" 700 | \n",
"
\n",
" \n",
" | North | \n",
" Clothing | \n",
" 1400 | \n",
"
\n",
" \n",
" | Electronics | \n",
" 1000 | \n",
"
\n",
" \n",
" | South | \n",
" Clothing | \n",
" 1000 | \n",
"
\n",
" \n",
" | Electronics | \n",
" 2100 | \n",
"
\n",
" \n",
" | West | \n",
" Electronics | \n",
" 1900 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Sales Amount\n",
"Region Product Category \n",
"East Clothing 1100\n",
" Electronics 700\n",
"North Clothing 1400\n",
" Electronics 1000\n",
"South Clothing 1000\n",
" Electronics 2100\n",
"West Electronics 1900"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# group the sales data by region and product category\n",
"group = df.groupby(['Region','Product Category'])\n",
"\n",
"# total sales of group\n",
"group.sum()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "68884ccd-4afd-4adc-8681-1ff7fec2bb5b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Region | \n",
" Product Category | \n",
" Sales Amount | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" East | \n",
" Clothing | \n",
" 1100 | \n",
"
\n",
" \n",
" | 1 | \n",
" East | \n",
" Electronics | \n",
" 700 | \n",
"
\n",
" \n",
" | 2 | \n",
" North | \n",
" Clothing | \n",
" 1400 | \n",
"
\n",
" \n",
" | 3 | \n",
" North | \n",
" Electronics | \n",
" 1000 | \n",
"
\n",
" \n",
" | 4 | \n",
" South | \n",
" Clothing | \n",
" 1000 | \n",
"
\n",
" \n",
" | 5 | \n",
" South | \n",
" Electronics | \n",
" 2100 | \n",
"
\n",
" \n",
" | 6 | \n",
" West | \n",
" Electronics | \n",
" 1900 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Region Product Category Sales Amount\n",
"0 East Clothing 1100\n",
"1 East Electronics 700\n",
"2 North Clothing 1400\n",
"3 North Electronics 1000\n",
"4 South Clothing 1000\n",
"5 South Electronics 2100\n",
"6 West Electronics 1900"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# flatten the group\n",
"agg = group['Sales Amount'].sum().reset_index()\n",
"agg"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3040186d-2863-4252-b379-8a892e0d3efa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Region | \n",
" Product Category | \n",
" Sales Amount | \n",
" Percentage of Total Sales | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" East | \n",
" Clothing | \n",
" 1100 | \n",
" 11.956522 | \n",
"
\n",
" \n",
" | 1 | \n",
" East | \n",
" Electronics | \n",
" 700 | \n",
" 7.608696 | \n",
"
\n",
" \n",
" | 2 | \n",
" North | \n",
" Clothing | \n",
" 1400 | \n",
" 15.217391 | \n",
"
\n",
" \n",
" | 3 | \n",
" North | \n",
" Electronics | \n",
" 1000 | \n",
" 10.869565 | \n",
"
\n",
" \n",
" | 4 | \n",
" South | \n",
" Clothing | \n",
" 1000 | \n",
" 10.869565 | \n",
"
\n",
" \n",
" | 5 | \n",
" South | \n",
" Electronics | \n",
" 2100 | \n",
" 22.826087 | \n",
"
\n",
" \n",
" | 6 | \n",
" West | \n",
" Electronics | \n",
" 1900 | \n",
" 20.652174 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Region Product Category Sales Amount Percentage of Total Sales\n",
"0 East Clothing 1100 11.956522\n",
"1 East Electronics 700 7.608696\n",
"2 North Clothing 1400 15.217391\n",
"3 North Electronics 1000 10.869565\n",
"4 South Clothing 1000 10.869565\n",
"5 South Electronics 2100 22.826087\n",
"6 West Electronics 1900 20.652174"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# calculate total sales amount\n",
"total_sales = agg['Sales Amount'].sum()\n",
"\n",
"# calculate percentage of total sales for each region and product category\n",
"agg['Percentage of Total Sales'] = (agg['Sales Amount'] / total_sales) * 100\n",
"agg"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e7f8e7c6-21ea-4adf-9911-37a29a10aaf6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Region | \n",
" Product Category | \n",
" Sales Amount | \n",
" Percentage of Total Sales | \n",
"
\n",
" \n",
" \n",
" \n",
" | 5 | \n",
" South | \n",
" Electronics | \n",
" 2100 | \n",
" 22.826087 | \n",
"
\n",
" \n",
" | 6 | \n",
" West | \n",
" Electronics | \n",
" 1900 | \n",
" 20.652174 | \n",
"
\n",
" \n",
" | 2 | \n",
" North | \n",
" Clothing | \n",
" 1400 | \n",
" 15.217391 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Region Product Category Sales Amount Percentage of Total Sales\n",
"5 South Electronics 2100 22.826087\n",
"6 West Electronics 1900 20.652174\n",
"2 North Clothing 1400 15.217391"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get the top 3 regions\n",
"agg.sort_values(by='Percentage of Total Sales', ascending=False).head(3)"
]
},
{
"cell_type": "markdown",
"id": "5fd51f33-04ab-4a47-b86c-ea39bcf21e8c",
"metadata": {},
"source": [
"Here is an alternative solution using pivot tables."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "9b7fee3f-e59d-4b67-9cf3-a6c7785cab99",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | Product Category | \n",
" Clothing | \n",
" Electronics | \n",
"
\n",
" \n",
" | Region | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | East | \n",
" 1100.0 | \n",
" 700.0 | \n",
"
\n",
" \n",
" | North | \n",
" 1400.0 | \n",
" 1000.0 | \n",
"
\n",
" \n",
" | South | \n",
" 1000.0 | \n",
" 2100.0 | \n",
"
\n",
" \n",
" | West | \n",
" NaN | \n",
" 1900.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Product Category Clothing Electronics\n",
"Region \n",
"East 1100.0 700.0\n",
"North 1400.0 1000.0\n",
"South 1000.0 2100.0\n",
"West NaN 1900.0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pivot the data to get sales amount by region and product category\n",
"pivot = pd.pivot_table(df, values='Sales Amount', index='Region', columns='Product Category', aggfunc='sum')\n",
"pivot"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9206eda3-b6ce-4806-948f-6452eb09e426",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"9200"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# calculate total sales amount\n",
"total_sales = df['Sales Amount'].sum()\n",
"total_sales"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "659bf284-fd42-4f3d-9c8b-ffcff752ffb5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | Product Category | \n",
" Clothing | \n",
" Electronics | \n",
"
\n",
" \n",
" | Region | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | East | \n",
" 11.956522 | \n",
" 7.608696 | \n",
"
\n",
" \n",
" | North | \n",
" 15.217391 | \n",
" 10.869565 | \n",
"
\n",
" \n",
" | South | \n",
" 10.869565 | \n",
" 22.826087 | \n",
"
\n",
" \n",
" | West | \n",
" NaN | \n",
" 20.652174 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Product Category Clothing Electronics\n",
"Region \n",
"East 11.956522 7.608696\n",
"North 15.217391 10.869565\n",
"South 10.869565 22.826087\n",
"West NaN 20.652174"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# calculate percentage of total sales for each region and product category\n",
"agg = (pivot / total_sales) * 100\n",
"agg"
]
},
{
"cell_type": "markdown",
"id": "6b385366-b19f-478f-905a-52e3bcf94bba",
"metadata": {},
"source": [
"With the shape of this DataFrame ordering by percentage of total sales is not really possible."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9ce79699-f2eb-4842-9967-98649c3b275b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Region Product Category\n",
"South Electronics 22.826087\n",
"West Electronics 20.652174\n",
"North Clothing 15.217391\n",
"dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the stack function lets us reshape the data so we can sort by the percentages\n",
"agg.stack().sort_values(ascending=False).head(3)"
]
},
{
"cell_type": "markdown",
"id": "c0f62f91-a9a9-47c8-8450-7d715c32004c",
"metadata": {},
"source": [
"# Summary:\n",
"Now that you've gone through the tutorial, you've seen how the sales data was analyzed to find the best-selling regions and product categories. The analyst first loaded the data into a Pandas DataFrame and then grouped it by region and product category to calculate total sales for each combination. They then calculated the percentage of total sales for each group, which made it easy to compare performance across different regions and categories. Finally, they sorted the results to identify the top 3 regions and product categories, giving a clear picture of where the company's sales were strongest.\n",
"\n",
"### Key Takeaways:\n",
"- **Data Loading:** Use `pd.read_clipboard()` to load data from clipboard into a DataFrame.\n",
"- **Data Grouping:** Utilize `df.groupby()` to group data by multiple columns (region and product category).\n",
"- **Aggregation:** Apply `sum()` to calculate total sales for each group.\n",
"- **Data Reshaping:** Employ `reset_index()` to flatten grouped data and pd.pivot_table() for alternative data representation.\n",
"- **Percentage Calculation:** Calculate percentages of total sales using (grouped_sum / total_sales) * 100.\n",
"- **Sorting and Ranking:** Use `sort_values()` and `head()` to identify top-performing regions and product categories.\n",
"- **Data Transformation:** Apply `stack()` to reshape pivoted data for easier sorting."
]
},
{
"cell_type": "markdown",
"id": "f1497685-a0a1-4803-944f-78dd66748696",
"metadata": {},
"source": [
"This tutorial was created by HEDARO
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}