{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ex - GroupBy\n",
"\n",
"Check out [Alcohol Consumption Exercises Video Tutorial](https://youtu.be/az67CMdmS6s) to watch a data scientist go through the exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Introduction:\n",
"\n",
"GroupBy can be summarized as Split-Apply-Combine.\n",
"\n",
"Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
"\n",
"Check out this [Diagram](http://i.imgur.com/yjNkiwL.png) \n",
"### Step 1. Import the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Assign it to a variable called drinks.(Watch the values of Column continent NA (North America), and how Pandas interprets it!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" country | \n",
" beer_servings | \n",
" spirit_servings | \n",
" wine_servings | \n",
" total_litres_of_pure_alcohol | \n",
" continent | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Afghanistan | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.0 | \n",
" AS | \n",
"
\n",
" \n",
" | 1 | \n",
" Albania | \n",
" 89 | \n",
" 132 | \n",
" 54 | \n",
" 4.9 | \n",
" EU | \n",
"
\n",
" \n",
" | 2 | \n",
" Algeria | \n",
" 25 | \n",
" 0 | \n",
" 14 | \n",
" 0.7 | \n",
" AF | \n",
"
\n",
" \n",
" | 3 | \n",
" Andorra | \n",
" 245 | \n",
" 138 | \n",
" 312 | \n",
" 12.4 | \n",
" EU | \n",
"
\n",
" \n",
" | 4 | \n",
" Angola | \n",
" 217 | \n",
" 57 | \n",
" 45 | \n",
" 5.9 | \n",
" AF | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country beer_servings spirit_servings wine_servings \\\n",
"0 Afghanistan 0 0 0 \n",
"1 Albania 89 132 54 \n",
"2 Algeria 25 0 14 \n",
"3 Andorra 245 138 312 \n",
"4 Angola 217 57 45 \n",
"\n",
" total_litres_of_pure_alcohol continent \n",
"0 0.0 AS \n",
"1 4.9 EU \n",
"2 0.7 AF \n",
"3 12.4 EU \n",
"4 5.9 AF "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv',keep_default_na=False)\n",
"drinks.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. Which continent drinks more beer on average?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"continent\n",
"AF 61.471698\n",
"AS 37.045455\n",
"EU 193.777778\n",
"NA 145.434783\n",
"OC 89.687500\n",
"SA 175.083333\n",
"Name: beer_servings, dtype: float64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks.groupby('continent').beer_servings.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5. For each continent print the statistics for wine consumption."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" count | \n",
" mean | \n",
" std | \n",
" min | \n",
" 25% | \n",
" 50% | \n",
" 75% | \n",
" max | \n",
"
\n",
" \n",
" | continent | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | AF | \n",
" 53.0 | \n",
" 16.264151 | \n",
" 38.846419 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" 13.00 | \n",
" 233.0 | \n",
"
\n",
" \n",
" | AS | \n",
" 44.0 | \n",
" 9.068182 | \n",
" 21.667034 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 8.00 | \n",
" 123.0 | \n",
"
\n",
" \n",
" | EU | \n",
" 45.0 | \n",
" 142.222222 | \n",
" 97.421738 | \n",
" 0.0 | \n",
" 59.0 | \n",
" 128.0 | \n",
" 195.00 | \n",
" 370.0 | \n",
"
\n",
" \n",
" | NA | \n",
" 23.0 | \n",
" 24.521739 | \n",
" 28.266378 | \n",
" 1.0 | \n",
" 5.0 | \n",
" 11.0 | \n",
" 34.00 | \n",
" 100.0 | \n",
"
\n",
" \n",
" | OC | \n",
" 16.0 | \n",
" 35.625000 | \n",
" 64.555790 | \n",
" 0.0 | \n",
" 1.0 | \n",
" 8.5 | \n",
" 23.25 | \n",
" 212.0 | \n",
"
\n",
" \n",
" | SA | \n",
" 12.0 | \n",
" 62.416667 | \n",
" 88.620189 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 12.0 | \n",
" 98.50 | \n",
" 221.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" count mean std min 25% 50% 75% max\n",
"continent \n",
"AF 53.0 16.264151 38.846419 0.0 1.0 2.0 13.00 233.0\n",
"AS 44.0 9.068182 21.667034 0.0 0.0 1.0 8.00 123.0\n",
"EU 45.0 142.222222 97.421738 0.0 59.0 128.0 195.00 370.0\n",
"NA 23.0 24.521739 28.266378 1.0 5.0 11.0 34.00 100.0\n",
"OC 16.0 35.625000 64.555790 0.0 1.0 8.5 23.25 212.0\n",
"SA 12.0 62.416667 88.620189 1.0 3.0 12.0 98.50 221.0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks.groupby('continent').wine_servings.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6. Print the mean alcohol consumption per continent for every column"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" beer_servings | \n",
" spirit_servings | \n",
" wine_servings | \n",
" total_litres_of_pure_alcohol | \n",
"
\n",
" \n",
" | continent | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | AF | \n",
" 61.471698 | \n",
" 16.339623 | \n",
" 16.264151 | \n",
" 3.007547 | \n",
"
\n",
" \n",
" | AS | \n",
" 37.045455 | \n",
" 60.840909 | \n",
" 9.068182 | \n",
" 2.170455 | \n",
"
\n",
" \n",
" | EU | \n",
" 193.777778 | \n",
" 132.555556 | \n",
" 142.222222 | \n",
" 8.617778 | \n",
"
\n",
" \n",
" | NA | \n",
" 145.434783 | \n",
" 165.739130 | \n",
" 24.521739 | \n",
" 5.995652 | \n",
"
\n",
" \n",
" | OC | \n",
" 89.687500 | \n",
" 58.437500 | \n",
" 35.625000 | \n",
" 3.381250 | \n",
"
\n",
" \n",
" | SA | \n",
" 175.083333 | \n",
" 114.750000 | \n",
" 62.416667 | \n",
" 6.308333 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" beer_servings spirit_servings wine_servings \\\n",
"continent \n",
"AF 61.471698 16.339623 16.264151 \n",
"AS 37.045455 60.840909 9.068182 \n",
"EU 193.777778 132.555556 142.222222 \n",
"NA 145.434783 165.739130 24.521739 \n",
"OC 89.687500 58.437500 35.625000 \n",
"SA 175.083333 114.750000 62.416667 \n",
"\n",
" total_litres_of_pure_alcohol \n",
"continent \n",
"AF 3.007547 \n",
"AS 2.170455 \n",
"EU 8.617778 \n",
"NA 5.995652 \n",
"OC 3.381250 \n",
"SA 6.308333 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks.groupby('continent').mean(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7. Print the median alcohol consumption per continent for every column"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" beer_servings | \n",
" spirit_servings | \n",
" wine_servings | \n",
" total_litres_of_pure_alcohol | \n",
"
\n",
" \n",
" | continent | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | AF | \n",
" 32.0 | \n",
" 3.0 | \n",
" 2.0 | \n",
" 2.30 | \n",
"
\n",
" \n",
" | AS | \n",
" 17.5 | \n",
" 16.0 | \n",
" 1.0 | \n",
" 1.20 | \n",
"
\n",
" \n",
" | EU | \n",
" 219.0 | \n",
" 122.0 | \n",
" 128.0 | \n",
" 10.00 | \n",
"
\n",
" \n",
" | NA | \n",
" 143.0 | \n",
" 137.0 | \n",
" 11.0 | \n",
" 6.30 | \n",
"
\n",
" \n",
" | OC | \n",
" 52.5 | \n",
" 37.0 | \n",
" 8.5 | \n",
" 1.75 | \n",
"
\n",
" \n",
" | SA | \n",
" 162.5 | \n",
" 108.5 | \n",
" 12.0 | \n",
" 6.85 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" beer_servings spirit_servings wine_servings \\\n",
"continent \n",
"AF 32.0 3.0 2.0 \n",
"AS 17.5 16.0 1.0 \n",
"EU 219.0 122.0 128.0 \n",
"NA 143.0 137.0 11.0 \n",
"OC 52.5 37.0 8.5 \n",
"SA 162.5 108.5 12.0 \n",
"\n",
" total_litres_of_pure_alcohol \n",
"continent \n",
"AF 2.30 \n",
"AS 1.20 \n",
"EU 10.00 \n",
"NA 6.30 \n",
"OC 1.75 \n",
"SA 6.85 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks.groupby('continent').median(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 8. Print the mean, min and max values for spirit consumption for each Continent.\n",
"#### This time output a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mean | \n",
" min | \n",
" max | \n",
"
\n",
" \n",
" | continent | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | AF | \n",
" 16.339623 | \n",
" 0 | \n",
" 152 | \n",
"
\n",
" \n",
" | AS | \n",
" 60.840909 | \n",
" 0 | \n",
" 326 | \n",
"
\n",
" \n",
" | EU | \n",
" 132.555556 | \n",
" 0 | \n",
" 373 | \n",
"
\n",
" \n",
" | NA | \n",
" 165.739130 | \n",
" 68 | \n",
" 438 | \n",
"
\n",
" \n",
" | OC | \n",
" 58.437500 | \n",
" 0 | \n",
" 254 | \n",
"
\n",
" \n",
" | SA | \n",
" 114.750000 | \n",
" 25 | \n",
" 302 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mean min max\n",
"continent \n",
"AF 16.339623 0 152\n",
"AS 60.840909 0 326\n",
"EU 132.555556 0 373\n",
"NA 165.739130 68 438\n",
"OC 58.437500 0 254\n",
"SA 114.750000 25 302"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}