{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ex - GroupBy\n",
    "\n",
    "Check out [Alcohol Consumption Exercises Video Tutorial](https://youtu.be/az67CMdmS6s) to watch a data scientist go through the exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introduction:\n",
    "\n",
    "GroupBy can be summarized as Split-Apply-Combine.\n",
    "\n",
    "Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
    "\n",
    "Check out this [Diagram](http://i.imgur.com/yjNkiwL.png)  \n",
    "### Step 1. Import the necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3. Assign it to a variable called drinks.(Watch the values of Column continent NA (North America), and how Pandas interprets it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "      <th>continent</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>AS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Albania</td>\n",
       "      <td>89</td>\n",
       "      <td>132</td>\n",
       "      <td>54</td>\n",
       "      <td>4.9</td>\n",
       "      <td>EU</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>25</td>\n",
       "      <td>0</td>\n",
       "      <td>14</td>\n",
       "      <td>0.7</td>\n",
       "      <td>AF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Andorra</td>\n",
       "      <td>245</td>\n",
       "      <td>138</td>\n",
       "      <td>312</td>\n",
       "      <td>12.4</td>\n",
       "      <td>EU</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Angola</td>\n",
       "      <td>217</td>\n",
       "      <td>57</td>\n",
       "      <td>45</td>\n",
       "      <td>5.9</td>\n",
       "      <td>AF</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       country  beer_servings  spirit_servings  wine_servings  \\\n",
       "0  Afghanistan              0                0              0   \n",
       "1      Albania             89              132             54   \n",
       "2      Algeria             25                0             14   \n",
       "3      Andorra            245              138            312   \n",
       "4       Angola            217               57             45   \n",
       "\n",
       "   total_litres_of_pure_alcohol continent  \n",
       "0                           0.0        AS  \n",
       "1                           4.9        EU  \n",
       "2                           0.7        AF  \n",
       "3                          12.4        EU  \n",
       "4                           5.9        AF  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv',keep_default_na=False)\n",
    "drinks.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4. Which continent drinks more beer on average?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "continent\n",
       "AF     61.471698\n",
       "AS     37.045455\n",
       "EU    193.777778\n",
       "NA    145.434783\n",
       "OC     89.687500\n",
       "SA    175.083333\n",
       "Name: beer_servings, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').beer_servings.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5. For each continent print the statistics for wine consumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>53.0</td>\n",
       "      <td>16.264151</td>\n",
       "      <td>38.846419</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>13.00</td>\n",
       "      <td>233.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>44.0</td>\n",
       "      <td>9.068182</td>\n",
       "      <td>21.667034</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>8.00</td>\n",
       "      <td>123.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>45.0</td>\n",
       "      <td>142.222222</td>\n",
       "      <td>97.421738</td>\n",
       "      <td>0.0</td>\n",
       "      <td>59.0</td>\n",
       "      <td>128.0</td>\n",
       "      <td>195.00</td>\n",
       "      <td>370.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>23.0</td>\n",
       "      <td>24.521739</td>\n",
       "      <td>28.266378</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>34.00</td>\n",
       "      <td>100.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>16.0</td>\n",
       "      <td>35.625000</td>\n",
       "      <td>64.555790</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>8.5</td>\n",
       "      <td>23.25</td>\n",
       "      <td>212.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>12.0</td>\n",
       "      <td>62.416667</td>\n",
       "      <td>88.620189</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>98.50</td>\n",
       "      <td>221.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           count        mean        std  min   25%    50%     75%    max\n",
       "continent                                                               \n",
       "AF          53.0   16.264151  38.846419  0.0   1.0    2.0   13.00  233.0\n",
       "AS          44.0    9.068182  21.667034  0.0   0.0    1.0    8.00  123.0\n",
       "EU          45.0  142.222222  97.421738  0.0  59.0  128.0  195.00  370.0\n",
       "NA          23.0   24.521739  28.266378  1.0   5.0   11.0   34.00  100.0\n",
       "OC          16.0   35.625000  64.555790  0.0   1.0    8.5   23.25  212.0\n",
       "SA          12.0   62.416667  88.620189  1.0   3.0   12.0   98.50  221.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').wine_servings.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6. Print the mean alcohol consumption per continent for every column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>61.471698</td>\n",
       "      <td>16.339623</td>\n",
       "      <td>16.264151</td>\n",
       "      <td>3.007547</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>37.045455</td>\n",
       "      <td>60.840909</td>\n",
       "      <td>9.068182</td>\n",
       "      <td>2.170455</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>193.777778</td>\n",
       "      <td>132.555556</td>\n",
       "      <td>142.222222</td>\n",
       "      <td>8.617778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>145.434783</td>\n",
       "      <td>165.739130</td>\n",
       "      <td>24.521739</td>\n",
       "      <td>5.995652</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>89.687500</td>\n",
       "      <td>58.437500</td>\n",
       "      <td>35.625000</td>\n",
       "      <td>3.381250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>175.083333</td>\n",
       "      <td>114.750000</td>\n",
       "      <td>62.416667</td>\n",
       "      <td>6.308333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           beer_servings  spirit_servings  wine_servings  \\\n",
       "continent                                                  \n",
       "AF             61.471698        16.339623      16.264151   \n",
       "AS             37.045455        60.840909       9.068182   \n",
       "EU            193.777778       132.555556     142.222222   \n",
       "NA            145.434783       165.739130      24.521739   \n",
       "OC             89.687500        58.437500      35.625000   \n",
       "SA            175.083333       114.750000      62.416667   \n",
       "\n",
       "           total_litres_of_pure_alcohol  \n",
       "continent                                \n",
       "AF                             3.007547  \n",
       "AS                             2.170455  \n",
       "EU                             8.617778  \n",
       "NA                             5.995652  \n",
       "OC                             3.381250  \n",
       "SA                             6.308333  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').mean(numeric_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7. Print the median alcohol consumption per continent for every column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>32.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>17.5</td>\n",
       "      <td>16.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>219.0</td>\n",
       "      <td>122.0</td>\n",
       "      <td>128.0</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>143.0</td>\n",
       "      <td>137.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>6.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>52.5</td>\n",
       "      <td>37.0</td>\n",
       "      <td>8.5</td>\n",
       "      <td>1.75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>162.5</td>\n",
       "      <td>108.5</td>\n",
       "      <td>12.0</td>\n",
       "      <td>6.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           beer_servings  spirit_servings  wine_servings  \\\n",
       "continent                                                  \n",
       "AF                  32.0              3.0            2.0   \n",
       "AS                  17.5             16.0            1.0   \n",
       "EU                 219.0            122.0          128.0   \n",
       "NA                 143.0            137.0           11.0   \n",
       "OC                  52.5             37.0            8.5   \n",
       "SA                 162.5            108.5           12.0   \n",
       "\n",
       "           total_litres_of_pure_alcohol  \n",
       "continent                                \n",
       "AF                                 2.30  \n",
       "AS                                 1.20  \n",
       "EU                                10.00  \n",
       "NA                                 6.30  \n",
       "OC                                 1.75  \n",
       "SA                                 6.85  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').median(numeric_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8. Print the mean, min and max values for spirit consumption for each Continent.\n",
    "#### This time output a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>continent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AF</th>\n",
       "      <td>16.339623</td>\n",
       "      <td>0</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AS</th>\n",
       "      <td>60.840909</td>\n",
       "      <td>0</td>\n",
       "      <td>326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>EU</th>\n",
       "      <td>132.555556</td>\n",
       "      <td>0</td>\n",
       "      <td>373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NA</th>\n",
       "      <td>165.739130</td>\n",
       "      <td>68</td>\n",
       "      <td>438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OC</th>\n",
       "      <td>58.437500</td>\n",
       "      <td>0</td>\n",
       "      <td>254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SA</th>\n",
       "      <td>114.750000</td>\n",
       "      <td>25</td>\n",
       "      <td>302</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 mean  min  max\n",
       "continent                      \n",
       "AF          16.339623    0  152\n",
       "AS          60.840909    0  326\n",
       "EU         132.555556    0  373\n",
       "NA         165.739130   68  438\n",
       "OC          58.437500    0  254\n",
       "SA         114.750000   25  302"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}