{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Ex - GroupBy\n", "\n", "Check out [Alcohol Consumption Exercises Video Tutorial](https://youtu.be/az67CMdmS6s) to watch a data scientist go through the exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction:\n", "\n", "GroupBy can be summarized as Split-Apply-Combine.\n", "\n", "Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n", "\n", "Check out this [Diagram](http://i.imgur.com/yjNkiwL.png) \n", "### Step 1. Import the necessary libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3. Assign it to a variable called drinks.(Watch the values of Column continent NA (North America), and how Pandas interprets it!" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0AS
1Albania89132544.9EU
2Algeria250140.7AF
3Andorra24513831212.4EU
4Angola21757455.9AF
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 AS \n", "1 4.9 EU \n", "2 0.7 AF \n", "3 12.4 EU \n", "4 5.9 AF " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv',keep_default_na=False)\n", "drinks.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4. Which continent drinks more beer on average?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "continent\n", "AF 61.471698\n", "AS 37.045455\n", "EU 193.777778\n", "NA 145.434783\n", "OC 89.687500\n", "SA 175.083333\n", "Name: beer_servings, dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.groupby('continent').beer_servings.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5. For each continent print the statistics for wine consumption." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
continent
AF53.016.26415138.8464190.01.02.013.00233.0
AS44.09.06818221.6670340.00.01.08.00123.0
EU45.0142.22222297.4217380.059.0128.0195.00370.0
NA23.024.52173928.2663781.05.011.034.00100.0
OC16.035.62500064.5557900.01.08.523.25212.0
SA12.062.41666788.6201891.03.012.098.50221.0
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "continent \n", "AF 53.0 16.264151 38.846419 0.0 1.0 2.0 13.00 233.0\n", "AS 44.0 9.068182 21.667034 0.0 0.0 1.0 8.00 123.0\n", "EU 45.0 142.222222 97.421738 0.0 59.0 128.0 195.00 370.0\n", "NA 23.0 24.521739 28.266378 1.0 5.0 11.0 34.00 100.0\n", "OC 16.0 35.625000 64.555790 0.0 1.0 8.5 23.25 212.0\n", "SA 12.0 62.416667 88.620189 1.0 3.0 12.0 98.50 221.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.groupby('continent').wine_servings.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6. Print the mean alcohol consumption per continent for every column" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcohol
continent
AF61.47169816.33962316.2641513.007547
AS37.04545560.8409099.0681822.170455
EU193.777778132.555556142.2222228.617778
NA145.434783165.73913024.5217395.995652
OC89.68750058.43750035.6250003.381250
SA175.083333114.75000062.4166676.308333
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "continent \n", "AF 61.471698 16.339623 16.264151 \n", "AS 37.045455 60.840909 9.068182 \n", "EU 193.777778 132.555556 142.222222 \n", "NA 145.434783 165.739130 24.521739 \n", "OC 89.687500 58.437500 35.625000 \n", "SA 175.083333 114.750000 62.416667 \n", "\n", " total_litres_of_pure_alcohol \n", "continent \n", "AF 3.007547 \n", "AS 2.170455 \n", "EU 8.617778 \n", "NA 5.995652 \n", "OC 3.381250 \n", "SA 6.308333 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.groupby('continent').mean(numeric_only=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 7. Print the median alcohol consumption per continent for every column" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
beer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcohol
continent
AF32.03.02.02.30
AS17.516.01.01.20
EU219.0122.0128.010.00
NA143.0137.011.06.30
OC52.537.08.51.75
SA162.5108.512.06.85
\n", "
" ], "text/plain": [ " beer_servings spirit_servings wine_servings \\\n", "continent \n", "AF 32.0 3.0 2.0 \n", "AS 17.5 16.0 1.0 \n", "EU 219.0 122.0 128.0 \n", "NA 143.0 137.0 11.0 \n", "OC 52.5 37.0 8.5 \n", "SA 162.5 108.5 12.0 \n", "\n", " total_litres_of_pure_alcohol \n", "continent \n", "AF 2.30 \n", "AS 1.20 \n", "EU 10.00 \n", "NA 6.30 \n", "OC 1.75 \n", "SA 6.85 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.groupby('continent').median(numeric_only=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 8. Print the mean, min and max values for spirit consumption for each Continent.\n", "#### This time output a DataFrame" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanminmax
continent
AF16.3396230152
AS60.8409090326
EU132.5555560373
NA165.73913068438
OC58.4375000254
SA114.75000025302
\n", "
" ], "text/plain": [ " mean min max\n", "continent \n", "AF 16.339623 0 152\n", "AS 60.840909 0 326\n", "EU 132.555556 0 373\n", "NA 165.739130 68 438\n", "OC 58.437500 0 254\n", "SA 114.750000 25 302" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }