{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4a87b5ef",
   "metadata": {},
   "source": [
    "---   \n",
    " <img align=\"left\" width=\"75\" height=\"75\"  src=\"https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png\"> \n",
    "\n",
    "<h1 align=\"center\">Department of Data Science</h1>\n",
    "<h1 align=\"center\">Course: Tools and Techniques for Data Science</h1>\n",
    "\n",
    "---\n",
    "<h3><div align=\"right\">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab0dc25c",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Lecture 3.17 (Pandas-09)</h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19f82705",
   "metadata": {},
   "source": [
    "## _Aggregating and Grouping Dataframes.ipynb_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07d1d815",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0acb63e8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87d7b36c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f284fcde",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9f676117",
   "metadata": {},
   "source": [
    "<h3><div align=\"right\">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>  \n",
    "<img align=\"left\" width=\"1000\" height=\"1000\"  src=\"images/groupbyfinal.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2a42950",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a434dac",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c736f024",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4a53918",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36ad40d6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "12db95e1",
   "metadata": {},
   "source": [
    "## Learning agenda of this notebook\n",
    "1. Overview of Aggregation Functions and the `agg()` method\n",
    "    - Applying a Built-in Aggregation Function on Entire Dataframe Object\n",
    "    - Applying a Built-in Aggregation Function on a Series Object\n",
    "    - Applying a User-Defined/Lambda Function on a Series Object<br><br>\n",
    "2. Computing the Minimum Temperature of each City using **hard way**<br><br>\n",
    "3. Computing the Minimum Temperature of each City using **`groupby`**<br><br>\n",
    "4. Practice GroupBy on Stack Overflow Survey Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db17aec4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e54b5d6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ab0f3d7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63d2431e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0a2d89b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1664cd8f",
   "metadata": {},
   "source": [
    "## 1. Overview of Aggregation Functions and the `agg()` Method\n",
    "- An aggregation function is one which takes multiple individual values and returns a result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "298d74ce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "      <th>humidity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "      <td>54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date     city  temperature  humidity\n",
       "0   01/01/2022   lahore            8        60\n",
       "1   02/01/2022   lahore           10        58\n",
       "2   03/01/2022   lahore            5        51\n",
       "3   04/01/2022   lahore            6        49\n",
       "4   05/01/2022   lahore           12        54\n",
       "5   01/01/2022  karachi           18        74\n",
       "6   02/01/2022  karachi           10        71\n",
       "7   03/01/2022  karachi           12        78\n",
       "8   04/01/2022  karachi           15        76\n",
       "9   05/01/2022  karachi           16        70\n",
       "10  01/01/2022   murree           -5        61\n",
       "11  02/01/2022   murree           -3        68\n",
       "12  03/01/2022   murree           -4        69\n",
       "13  04/01/2022   murree           -1        63\n",
       "14  05/01/2022   murree           -7        72"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/groupbydata2.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71f119a8",
   "metadata": {},
   "source": [
    "### a. Applying a Built-in Aggregation Function on Entire Dataframe Object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3e9650a9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "date           01/01/2022\n",
       "city              karachi\n",
       "temperature            -7\n",
       "humidity               49\n",
       "dtype: object"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7e5e6958",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "date           15\n",
       "city           15\n",
       "temperature    15\n",
       "humidity       15\n",
       "dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "31839411",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/1t/g3ylw8h50cjdqmk5d6jh1qmm0000gn/T/ipykernel_3651/2026153972.py:2: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.\n",
      "  df.median()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "temperature     8.0\n",
       "humidity       68.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Should be applied to numeric columns only, may raise a warning\n",
    "df.median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "fa0eb1f9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "temperature     8.0\n",
       "humidity       68.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.median(numeric_only=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c41a2003",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd9f3067",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b29bf1c2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3655614e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8409ebd7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "0a80e737",
   "metadata": {},
   "source": [
    "> We can call the `agg()` method on the dataframe to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "875e29de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "      <th>humidity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>-7</td>\n",
       "      <td>49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>18</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>15</td>\n",
       "      <td>15</td>\n",
       "      <td>15</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             date     city  temperature  humidity\n",
       "min    01/01/2022  karachi           -7        49\n",
       "max    05/01/2022   murree           18        78\n",
       "count          15       15           15        15"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.agg(['min', 'max',  'count'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab3e6286",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a58a2e02",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f7624ba3",
   "metadata": {},
   "source": [
    "> We can call the `describe()` method on the dataframe to get descriptive statistical measures on all its numeric columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a26b0208",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temperature</th>\n",
       "      <th>humidity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>15.000000</td>\n",
       "      <td>15.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>6.133333</td>\n",
       "      <td>64.933333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>8.253715</td>\n",
       "      <td>9.153194</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-7.000000</td>\n",
       "      <td>49.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>-2.000000</td>\n",
       "      <td>59.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>8.000000</td>\n",
       "      <td>68.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>12.000000</td>\n",
       "      <td>71.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>18.000000</td>\n",
       "      <td>78.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       temperature   humidity\n",
       "count    15.000000  15.000000\n",
       "mean      6.133333  64.933333\n",
       "std       8.253715   9.153194\n",
       "min      -7.000000  49.000000\n",
       "25%      -2.000000  59.000000\n",
       "50%       8.000000  68.000000\n",
       "75%      12.000000  71.500000\n",
       "max      18.000000  78.000000"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98f30e56",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6ceb80e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "264cd891",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b6edfe8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "29818e96",
   "metadata": {},
   "source": [
    "### b. Applying a Built-in Aggregation Function on a Series Object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "111bef4b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-7"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['temperature'].min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "22606fba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['temperature'].max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "709429fa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.133333333333334"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['temperature'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b131b9b5",
   "metadata": {},
   "source": [
    "> We can call the `agg()` method on a series to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bea6bc9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['temperature'].agg(['min', 'max', 'mean', 'count'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07ad4b62",
   "metadata": {},
   "source": [
    "> We can call the `describe()` method on the dataframe to get descriptive statistical measures on all its numeric columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80d38d82",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['temperature'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e72c885",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86a71e02",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39a48d79",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d6f7d77",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "4d354a5b",
   "metadata": {},
   "source": [
    "### c. Applying a User-Defined/Lambda Function on a Series Object using the `apply()` Method\n",
    "- We have used this `apply()` method before as well that is used to invoke function on values of Series and return a resulting series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8647021a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      8\n",
       "1     10\n",
       "2      5\n",
       "3      6\n",
       "4     12\n",
       "5     18\n",
       "6     10\n",
       "7     12\n",
       "8     15\n",
       "9     16\n",
       "10    -5\n",
       "11    -3\n",
       "12    -4\n",
       "13    -1\n",
       "14    -7\n",
       "Name: temperature, dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.temperature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ffd978e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     46.4\n",
       "1     50.0\n",
       "2     41.0\n",
       "3     42.8\n",
       "4     53.6\n",
       "5     64.4\n",
       "6     50.0\n",
       "7     53.6\n",
       "8     59.0\n",
       "9     60.8\n",
       "10    23.0\n",
       "11    26.6\n",
       "12    24.8\n",
       "13    30.2\n",
       "14    19.4\n",
       "Name: temperature, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def ctof(x):\n",
    "    return x*9/5+32\n",
    "\n",
    "df.temperature.apply(ctof)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec10b7f9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e97d6d8f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "253bc061",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a73e2af",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4c6948e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b427ebcd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     46.4\n",
       "1     50.0\n",
       "2     41.0\n",
       "3     42.8\n",
       "4     53.6\n",
       "5     64.4\n",
       "6     50.0\n",
       "7     53.6\n",
       "8     59.0\n",
       "9     60.8\n",
       "10    23.0\n",
       "11    26.6\n",
       "12    24.8\n",
       "13    30.2\n",
       "14    19.4\n",
       "Name: temperature, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.temperature.apply(lambda x: x*9/5+32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7d0bbd8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a89c671b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb0141d9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afd6b8ba",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9851993",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1253cf51",
   "metadata": {},
   "source": [
    "# How to Compute the Minimum Temperature of Each City?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c21a3258",
   "metadata": {},
   "source": [
    "## 2. Doing it the Hard Way\n",
    "<img align=\"center\" width=\"700\" height=\"500\"  src=\"images/groupbyfinal.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "443db1c5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date     city  temperature\n",
       "0   01/01/2022   lahore            8\n",
       "1   02/01/2022   lahore           10\n",
       "2   03/01/2022   lahore            5\n",
       "3   04/01/2022   lahore            6\n",
       "4   05/01/2022   lahore           12\n",
       "5   01/01/2022  karachi           18\n",
       "6   02/01/2022  karachi           10\n",
       "7   03/01/2022  karachi           12\n",
       "8   04/01/2022  karachi           15\n",
       "9   05/01/2022  karachi           16\n",
       "10  01/01/2022   murree           -5\n",
       "11  02/01/2022   murree           -3\n",
       "12  03/01/2022   murree           -4\n",
       "13  04/01/2022   murree           -1\n",
       "14  05/01/2022   murree           -7"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/groupbydata1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb304630",
   "metadata": {},
   "source": [
    "### a. Splitting the Dataframe\n",
    "- We need to use conditional selection technique, in which we pass a Boolean mask for the appropriate city column to be selected. Can do it using two ways:\n",
    "    - Using `df[]` subscript operator\n",
    "    - Using `df.loc` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "8bc2737e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date     city  temperature\n",
       "5  01/01/2022  karachi           18\n",
       "6  02/01/2022  karachi           10\n",
       "7  03/01/2022  karachi           12\n",
       "8  04/01/2022  karachi           15\n",
       "9  05/01/2022  karachi           16"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['city']=='karachi']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "9d4410d9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date    city  temperature\n",
       "0  01/01/2022  lahore            8\n",
       "1  02/01/2022  lahore           10\n",
       "2  03/01/2022  lahore            5\n",
       "3  04/01/2022  lahore            6\n",
       "4  05/01/2022  lahore           12"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['city']=='lahore']\n",
    "df.loc[df.city=='lahore', :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "43a90d9a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date     city  temperature\n",
       "5  01/01/2022  karachi           18\n",
       "6  02/01/2022  karachi           10\n",
       "7  03/01/2022  karachi           12\n",
       "8  04/01/2022  karachi           15\n",
       "9  05/01/2022  karachi           16"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['city']=='karachi']\n",
    "df.loc[df.city=='karachi', :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "2fe134f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date    city  temperature\n",
       "10  01/01/2022  murree           -5\n",
       "11  02/01/2022  murree           -3\n",
       "12  03/01/2022  murree           -4\n",
       "13  04/01/2022  murree           -1\n",
       "14  05/01/2022  murree           -7"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['city']=='murree']\n",
    "df.loc[df.city=='murree', :]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "531be356",
   "metadata": {},
   "source": [
    ">**Limitation:**\n",
    ">- We have to repeat this process for every city separately.\n",
    ">- What if there are over 100 cities in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c11acbb",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2510868f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a552b89a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0bf46fe",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b691295",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aafece6c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e6a35e53",
   "metadata": {},
   "source": [
    "### b. Applying the `min()` Function\n",
    "- We need to apply the `min()` function on the temperature column of all of the above dataframes separately"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98d13252",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[df.city=='lahore', :].temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a9b7261",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[df.city=='lahore', :].temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f36327e",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[df.city=='karachi', :].temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e76a7dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[df.city=='murree', :].temperature.min()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6573946e",
   "metadata": {},
   "source": [
    ">**Limitation:**\n",
    ">- We have to repeat this process for every city separately.\n",
    ">- What if there are over 100 cities in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "078a3e5c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "adc56aef",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49071bbc",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5647b406",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84b892aa",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bc149d5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8d980d9c",
   "metadata": {},
   "source": [
    "### c. Combining the Result\n",
    "- Since, we have got the minimum temperature of all the cities, we need to combine them to an appropriate series object to be used for later processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67ff4a19",
   "metadata": {},
   "outputs": [],
   "source": [
    "lhr = df.loc[df.city=='lahore', :].temperature.min()\n",
    "kci = df.loc[df.city=='karachi', :].temperature.min()\n",
    "murree = df.loc[df.city=='murree', :].temperature.min()\n",
    "\n",
    "s = pd.Series(data=[lhr, kci, murree], index=['L_min', 'K_min', 'M_min'] )\n",
    "s.name= 'Min Temperatures'\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59a794d2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f857f7dd",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30bc4bdc",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8eca9f7d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "440a8b2f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c14e695f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2ba09af",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3f7f9891",
   "metadata": {},
   "source": [
    "# How to Compute the Minimum Temperature of Each City?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62fb733d",
   "metadata": {},
   "source": [
    "## 3. An Elegant Way\n",
    "<img align=\"center\" width=\"700\" height=\"500\"  src=\"images/groupbyfinal.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "08b3cae2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date     city  temperature\n",
       "0   01/01/2022   lahore            8\n",
       "1   02/01/2022   lahore           10\n",
       "2   03/01/2022   lahore            5\n",
       "3   04/01/2022   lahore            6\n",
       "4   05/01/2022   lahore           12\n",
       "5   01/01/2022  karachi           18\n",
       "6   02/01/2022  karachi           10\n",
       "7   03/01/2022  karachi           12\n",
       "8   04/01/2022  karachi           15\n",
       "9   05/01/2022  karachi           16\n",
       "10  01/01/2022   murree           -5\n",
       "11  02/01/2022   murree           -3\n",
       "12  03/01/2022   murree           -4\n",
       "13  04/01/2022   murree           -1\n",
       "14  05/01/2022   murree           -7"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/groupbydata1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93f02e73",
   "metadata": {},
   "source": [
    "### a. Step 1: Split Step\n",
    "- In the split step we divide the data inside the dataframe into multiple groups\n",
    "- Since we need to calculate the minimum temperature of each city, therefore, we will use `groupby()` method on the `city` column of the dataframe.\n",
    "- This will result a DataFrameGroupBy object, which is an iterable containing multiple small dataframes based on the `by` argument passed to the `groupby()` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "b0ea5d79",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fc92ad60d90>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfgb = df.groupby('city')\n",
    "dfgb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52a0e61a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "46dd2666",
   "metadata": {},
   "source": [
    ">- Since this is an iterable, so let us iterate :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "cc1b1019",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('karachi',          date     city  temperature\n",
      "5  01/01/2022  karachi           18\n",
      "6  02/01/2022  karachi           10\n",
      "7  03/01/2022  karachi           12\n",
      "8  04/01/2022  karachi           15\n",
      "9  05/01/2022  karachi           16)\n",
      "('lahore',          date    city  temperature\n",
      "0  01/01/2022  lahore            8\n",
      "1  02/01/2022  lahore           10\n",
      "2  03/01/2022  lahore            5\n",
      "3  04/01/2022  lahore            6\n",
      "4  05/01/2022  lahore           12)\n",
      "('murree',           date    city  temperature\n",
      "10  01/01/2022  murree           -5\n",
      "11  02/01/2022  murree           -3\n",
      "12  03/01/2022  murree           -4\n",
      "13  04/01/2022  murree           -1\n",
      "14  05/01/2022  murree           -7)\n"
     ]
    }
   ],
   "source": [
    "for mydf in dfgb:\n",
    "    print(mydf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdce1713",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f00f923a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c824665",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2429cb1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da049f81",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "6b81eb53",
   "metadata": {},
   "source": [
    ">- To display indices of every group in the dataframe, use `groups` attribute of  `DataFrameGroupBy` object.\n",
    ">- Returns a Dictionary object (PrettyDict) with keys as the group value and value as list of corresponding row indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "1de745f0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'karachi': [5, 6, 7, 8, 9], 'lahore': [0, 1, 2, 3, 4], 'murree': [10, 11, 12, 13, 14]}"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfgb.groups   # df.groupby('city').groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00e3974a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64e34bf9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48f40627",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "444b9ae2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fd7fc5e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "d4c2271f",
   "metadata": {},
   "source": [
    ">- To display records of a specific group, use `get_group()` method on `DataFrameGroupBy` object.\n",
    ">- Construct and return DataFrame from `DataFrameGroupBy` object  with provided name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "cbfa0a16",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date    city  temperature\n",
       "10  01/01/2022  murree           -5\n",
       "11  02/01/2022  murree           -3\n",
       "12  03/01/2022  murree           -4\n",
       "13  04/01/2022  murree           -1\n",
       "14  05/01/2022  murree           -7"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display DataFrame of a specific group from groupby object by providing the specific group value\n",
    "dfgb.get_group('murree') # df.groupby('city').get_group('karachi') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a93eea5d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22bdd270",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06f58584",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b1f8d1c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1c325989",
   "metadata": {},
   "source": [
    ">- To find the size of each group, use `size()` method of DataFrameGroupBy object.\n",
    ">- It return a series containing number of rows in each each group of the DataFrameGroupBy object as a Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "fb875cd9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "city\n",
       "karachi    5\n",
       "lahore     5\n",
       "murree     5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfgb.size()  #df.groupby('city').size()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03042da0",
   "metadata": {},
   "source": [
    "> After understanding the `groupby()` method let us move to step 2, and that is `Applying a Function`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f6f80ed",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcfc7a1d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f2df584",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3dd730b5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac3e2b74",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2030779b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38c9df60",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95bc3a96",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9519e655",
   "metadata": {},
   "source": [
    "### b. Step 2: Apply Step\n",
    "- Now second step is that we apply appropriate aggregate function on all the groups inside the DataFrameGroupBy object"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "729e7c98",
   "metadata": {},
   "source": [
    "**Let us first apply aggregate function on a specific column of `DataFrameGroupBy` object, which is a `SeriesGroupBy` object**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "4b906481",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>karachi</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>murree</td>\n",
       "      <td>-7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          date     city  temperature\n",
       "0   01/01/2022   lahore            8\n",
       "1   02/01/2022   lahore           10\n",
       "2   03/01/2022   lahore            5\n",
       "3   04/01/2022   lahore            6\n",
       "4   05/01/2022   lahore           12\n",
       "5   01/01/2022  karachi           18\n",
       "6   02/01/2022  karachi           10\n",
       "7   03/01/2022  karachi           12\n",
       "8   04/01/2022  karachi           15\n",
       "9   05/01/2022  karachi           16\n",
       "10  01/01/2022   murree           -5\n",
       "11  02/01/2022   murree           -3\n",
       "12  03/01/2022   murree           -4\n",
       "13  04/01/2022   murree           -1\n",
       "14  05/01/2022   murree           -7"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85bfea29",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "6f3ffb6d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fc92ade70d0>"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "4175e21e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>01/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>02/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>03/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>04/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>05/01/2022</td>\n",
       "      <td>lahore</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date    city  temperature\n",
       "0  01/01/2022  lahore            8\n",
       "1  02/01/2022  lahore           10\n",
       "2  03/01/2022  lahore            5\n",
       "3  04/01/2022  lahore            6\n",
       "4  05/01/2022  lahore           12"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city').get_group('lahore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "c7820a7d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city').get_group('lahore').temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "ff32ca65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city').get_group('lahore').temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "e1744064",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city').get_group('karachi').temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "c4aab17c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-7"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('city').get_group('murree').temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe8e892f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f2dec82",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f275823",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a49a9ba0",
   "metadata": {},
   "source": [
    "### b. Step 3: Combine Step\n",
    "- Now we have got minimum temperature of all the three cities, let us combine the result into a series object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "e37d344a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "K_min    10\n",
       "L_min     5\n",
       "M_min    -7\n",
       "Name: Min Temperatures, dtype: int64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kci = df.groupby('city').get_group('karachi').temperature.min()\n",
    "lhr = df.groupby('city').get_group('lahore').temperature.min()\n",
    "murree = df.groupby('city').get_group('murree').temperature.min()\n",
    "\n",
    "s1 = pd.Series(data=[kci, lhr, murree], index=['K_min', 'L_min', 'M_min'] )\n",
    "s1.name= 'Min Temperatures'\n",
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db772042",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f07d918e",
   "metadata": {},
   "source": [
    ">- **Let us perform the `apply + combine` steps in one go, by applying the `min()` function on the temperature series of all the dataframes inside the DataFrameGroupBy object.**\n",
    ">- **This saves us from the hassle of applying `min()` method explicitly as done above**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77db0e35",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('city')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d014f0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('city').temperature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58604a90",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('city').temperature.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "214d5d3a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f362632d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3325138",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "104e0f3c",
   "metadata": {},
   "source": [
    ">- **We can also apply `agg()` method on the temperature series of all the dataframes inside the DataFrameGroupBy object**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd047429",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('city').temperature.agg(['min', 'max', 'sum', 'mean'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "908e2c72",
   "metadata": {},
   "source": [
    ">-Note that we have got a dataframe this time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab453a05",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5affd9eb",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a20735b4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da368894",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "842b740b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58697cde",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94101a3b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b5f491f4",
   "metadata": {},
   "source": [
    "## 4. Practice GroupBy on Stack Overflow Survey Dataset\n",
    "Visit to Download Data: https://insights.stackoverflow.com/survey/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12fdafcf",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "5f02fc6d",
   "metadata": {},
   "source": [
    "### a. Understand the Data Set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "2313a7da",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(88883, 9)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/so_survey_subset.csv', index_col='Respondent')\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "adb21c6b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>MainBranch</th>\n",
       "      <th>Hobbyist</th>\n",
       "      <th>Country</th>\n",
       "      <th>YearsCode</th>\n",
       "      <th>ConvertedComp</th>\n",
       "      <th>LanguageWorkedWith</th>\n",
       "      <th>SocialMedia</th>\n",
       "      <th>Age</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Respondent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>Yes</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;Java;JavaScript;Python</td>\n",
       "      <td>Twitter</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>No</td>\n",
       "      <td>Bosnia and Herzegovina</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Python</td>\n",
       "      <td>Instagram</td>\n",
       "      <td>19.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>I am not primarily a developer, but I write co...</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Thailand</td>\n",
       "      <td>3</td>\n",
       "      <td>8820.0</td>\n",
       "      <td>HTML/CSS</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>United States</td>\n",
       "      <td>3</td>\n",
       "      <td>61000.0</td>\n",
       "      <td>C;C++;C#;Python;SQL</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Ukraine</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA</td>\n",
       "      <td>Facebook</td>\n",
       "      <td>30.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   MainBranch Hobbyist  \\\n",
       "Respondent                                                               \n",
       "1                      I am a student who is learning to code      Yes   \n",
       "2                      I am a student who is learning to code       No   \n",
       "3           I am not primarily a developer, but I write co...      Yes   \n",
       "4                              I am a developer by profession       No   \n",
       "5                              I am a developer by profession      Yes   \n",
       "\n",
       "                           Country YearsCode  ConvertedComp  \\\n",
       "Respondent                                                    \n",
       "1                   United Kingdom         4            NaN   \n",
       "2           Bosnia and Herzegovina       NaN            NaN   \n",
       "3                         Thailand         3         8820.0   \n",
       "4                    United States         3        61000.0   \n",
       "5                          Ukraine        16            NaN   \n",
       "\n",
       "                                     LanguageWorkedWith SocialMedia   Age  \\\n",
       "Respondent                                                                  \n",
       "1                       HTML/CSS;Java;JavaScript;Python     Twitter  14.0   \n",
       "2                                   C++;HTML/CSS;Python   Instagram  19.0   \n",
       "3                                              HTML/CSS      Reddit  28.0   \n",
       "4                                   C;C++;C#;Python;SQL      Reddit  22.0   \n",
       "5           C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA    Facebook  30.0   \n",
       "\n",
       "           Gender  \n",
       "Respondent         \n",
       "1             Man  \n",
       "2             Man  \n",
       "3             Man  \n",
       "4             Man  \n",
       "5             Man  "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aed11320",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.loc[df['Country']=='Pakistan', :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46bd92ff",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8999fd3d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "600d2e20",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>QuestionText</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Column</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Respondent</th>\n",
       "      <td>Randomized respondent ID number (not in order ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MainBranch</th>\n",
       "      <td>Which of the following options best describes ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Hobbyist</th>\n",
       "      <td>Do you code as a hobby?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <td>In which country do you currently reside?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>YearsCode</th>\n",
       "      <td>Including any education, how many years have y...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ConvertedComp</th>\n",
       "      <td>Salary converted to annual USD salaries using ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LanguageWorkedWith</th>\n",
       "      <td>Which of the following programming, scripting,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SocialMedia</th>\n",
       "      <td>What social media site do you use the most?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Age</th>\n",
       "      <td>What is your age (in years)? If you prefer not...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Gender</th>\n",
       "      <td>Which of the following do you currently identi...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                         QuestionText\n",
       "Column                                                               \n",
       "Respondent          Randomized respondent ID number (not in order ...\n",
       "MainBranch          Which of the following options best describes ...\n",
       "Hobbyist                                      Do you code as a hobby?\n",
       "Country                     In which country do you currently reside?\n",
       "YearsCode           Including any education, how many years have y...\n",
       "ConvertedComp       Salary converted to annual USD salaries using ...\n",
       "LanguageWorkedWith  Which of the following programming, scripting,...\n",
       "SocialMedia               What social media site do you use the most?\n",
       "Age                 What is your age (in years)? If you prefer not...\n",
       "Gender              Which of the following do you currently identi..."
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "schema = pd.read_csv('datasets/so_survey_subset_schema.csv', index_col='Column')\n",
    "schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5429f2d0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "146eb97a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "QuestionText    Do you code as a hobby?\n",
       "Name: Hobbyist, dtype: object"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "schema.loc['Hobbyist']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf6d798f",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Hobbyist']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28946f82",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54ddb6cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema.loc['Country']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55a8ee40",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Country']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c98c33e1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ad3de2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema.loc['ConvertedComp']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40df4193",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['ConvertedComp']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2c3819d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f1899a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema.loc['LanguageWorkedWith']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ef88604",
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat datasets/so_survey_subset_schema.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fa61427",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['LanguageWorkedWith']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbd881f9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bd2f021",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema.loc['SocialMedia']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb7d90c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['SocialMedia']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38ee282a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "574e3847",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da4a81c4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "bc9d4cfb",
   "metadata": {},
   "source": [
    "##### Let us perform some basic statistical analysis on the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2096bf5e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35f48c5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns the count of non-NA values for a series object.\n",
    "df['Hobbyist'].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe766e4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns a Series containing counts of unique rows.\n",
    "df['Hobbyist'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e791cf10",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39ba7c9b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abac191b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "650bd519",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns the count of non-NA values for a series object.\n",
    "df['Country'].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4aaea5b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns a Series containing counts of unique rows.\n",
    "df['Country'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b91f3cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# To get the count of countries whose developers participated in the survey\n",
    "df['Country'].value_counts().count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c3ae775",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4de9edd6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa4e378e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc48a1f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns the count of non-NA values for a series object.\n",
    "df['ConvertedComp'].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb34078c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Returns a Series containing counts of unique rows.\n",
    "df['ConvertedComp'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34dde858",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['ConvertedComp'].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f91267c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['ConvertedComp'].median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f48e6608",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "755b50b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8b48433",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6516ee56",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46da67ef",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fe690a2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "94e974b1",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Let us try answering certain Questions</h1>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f561cf90",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "1e75db36",
   "metadata": {},
   "source": [
    "##  Question 1: \n",
    ">**List the most popular SocialMedia web site for every Country**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "557059b6",
   "metadata": {},
   "source": [
    "**Let us first  do the easy task, and that is to list the most popular SocialMedia website of a single country (lets say Pakistan)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "def1798a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>MainBranch</th>\n",
       "      <th>Hobbyist</th>\n",
       "      <th>Country</th>\n",
       "      <th>YearsCode</th>\n",
       "      <th>ConvertedComp</th>\n",
       "      <th>LanguageWorkedWith</th>\n",
       "      <th>SocialMedia</th>\n",
       "      <th>Age</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Respondent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>Yes</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;Java;JavaScript;Python</td>\n",
       "      <td>Twitter</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>No</td>\n",
       "      <td>Bosnia and Herzegovina</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Python</td>\n",
       "      <td>Instagram</td>\n",
       "      <td>19.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>I am not primarily a developer, but I write co...</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Thailand</td>\n",
       "      <td>3</td>\n",
       "      <td>8820.0</td>\n",
       "      <td>HTML/CSS</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>United States</td>\n",
       "      <td>3</td>\n",
       "      <td>61000.0</td>\n",
       "      <td>C;C++;C#;Python;SQL</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Ukraine</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA</td>\n",
       "      <td>Facebook</td>\n",
       "      <td>30.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88377</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Canada</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;JavaScript;Other(s):</td>\n",
       "      <td>YouTube</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88601</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88802</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88816</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88863</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Spain</td>\n",
       "      <td>8</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...</td>\n",
       "      <td>WhatsApp</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>88883 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   MainBranch Hobbyist  \\\n",
       "Respondent                                                               \n",
       "1                      I am a student who is learning to code      Yes   \n",
       "2                      I am a student who is learning to code       No   \n",
       "3           I am not primarily a developer, but I write co...      Yes   \n",
       "4                              I am a developer by profession       No   \n",
       "5                              I am a developer by profession      Yes   \n",
       "...                                                       ...      ...   \n",
       "88377                                                     NaN      Yes   \n",
       "88601                                                     NaN       No   \n",
       "88802                                                     NaN       No   \n",
       "88816                                                     NaN       No   \n",
       "88863                                                     NaN      Yes   \n",
       "\n",
       "                           Country YearsCode  ConvertedComp  \\\n",
       "Respondent                                                    \n",
       "1                   United Kingdom         4            NaN   \n",
       "2           Bosnia and Herzegovina       NaN            NaN   \n",
       "3                         Thailand         3         8820.0   \n",
       "4                    United States         3        61000.0   \n",
       "5                          Ukraine        16            NaN   \n",
       "...                            ...       ...            ...   \n",
       "88377                       Canada       NaN            NaN   \n",
       "88601                          NaN       NaN            NaN   \n",
       "88802                          NaN       NaN            NaN   \n",
       "88816                          NaN       NaN            NaN   \n",
       "88863                        Spain         8            NaN   \n",
       "\n",
       "                                           LanguageWorkedWith SocialMedia  \\\n",
       "Respondent                                                                  \n",
       "1                             HTML/CSS;Java;JavaScript;Python     Twitter   \n",
       "2                                         C++;HTML/CSS;Python   Instagram   \n",
       "3                                                    HTML/CSS      Reddit   \n",
       "4                                         C;C++;C#;Python;SQL      Reddit   \n",
       "5                 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA    Facebook   \n",
       "...                                                       ...         ...   \n",
       "88377                           HTML/CSS;JavaScript;Other(s):     YouTube   \n",
       "88601                                                     NaN         NaN   \n",
       "88802                                                     NaN         NaN   \n",
       "88816                                                     NaN         NaN   \n",
       "88863       Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...    WhatsApp   \n",
       "\n",
       "             Age Gender  \n",
       "Respondent               \n",
       "1           14.0    Man  \n",
       "2           19.0    Man  \n",
       "3           28.0    Man  \n",
       "4           22.0    Man  \n",
       "5           30.0    Man  \n",
       "...          ...    ...  \n",
       "88377        NaN    Man  \n",
       "88601        NaN    NaN  \n",
       "88802        NaN    NaN  \n",
       "88816        NaN    NaN  \n",
       "88863       18.0    Man  \n",
       "\n",
       "[88883 rows x 9 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "df9b825e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WhatsApp                    266\n",
       "Facebook                    232\n",
       "YouTube                     182\n",
       "LinkedIn                     71\n",
       "Twitter                      58\n",
       "Instagram                    41\n",
       "Reddit                       28\n",
       "I don't use social media     23\n",
       "Snapchat                      5\n",
       "Hello                         1\n",
       "VK ВКонта́кте                 1\n",
       "Name: SocialMedia, dtype: int64"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d54d37cb",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "846ab59e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "d65ad730",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WeChat 微信                   403\n",
       "YouTube                      53\n",
       "Weibo 新浪微博                   42\n",
       "I don't use social media     27\n",
       "Twitter                      27\n",
       "Reddit                       12\n",
       "LinkedIn                     11\n",
       "Facebook                      8\n",
       "Instagram                     7\n",
       "Youku Tudou 优酷                7\n",
       "WhatsApp                      3\n",
       "VK ВКонта́кте                 1\n",
       "Name: SocialMedia, dtype: int64"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df.Country =='Pakistan', :]\n",
    "df.loc[df.Country =='Pakistan', 'SocialMedia'].head(10)\n",
    "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts()\n",
    "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts(normalize=True)\n",
    "df.loc[df.Country =='China', 'SocialMedia'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "440d617b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4cf572c2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ba1fa80",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95b681b2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "54413fbf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fc92e94b250>"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "a7cf5d90",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>MainBranch</th>\n",
       "      <th>Hobbyist</th>\n",
       "      <th>Country</th>\n",
       "      <th>YearsCode</th>\n",
       "      <th>ConvertedComp</th>\n",
       "      <th>LanguageWorkedWith</th>\n",
       "      <th>SocialMedia</th>\n",
       "      <th>Age</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Respondent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>3</td>\n",
       "      <td>3468.0</td>\n",
       "      <td>C;C++;C#;Java;Kotlin;PHP;SQL</td>\n",
       "      <td>WhatsApp</td>\n",
       "      <td>26.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>10</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C;C++;C#;HTML/CSS;Java;JavaScript;SQL</td>\n",
       "      <td>Facebook</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>298</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;JavaScript;PHP;SQL;Other(s):</td>\n",
       "      <td>LinkedIn</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>299</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>19</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Assembly;C;C++;Java;Python;SQL</td>\n",
       "      <td>Facebook</td>\n",
       "      <td>25.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>311</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>5</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>Assembly;C;C++;C#;HTML/CSS;Java;Python;Scala;SQL</td>\n",
       "      <td>LinkedIn</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88862</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Java</td>\n",
       "      <td>WhatsApp</td>\n",
       "      <td>21.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5439</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Instagram</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Woman</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39117</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C;C++;C#;HTML/CSS;Java;JavaScript;SQL</td>\n",
       "      <td>WhatsApp</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60066</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Assembly;C++;C#;HTML/CSS;Java;PHP;Python;SQL</td>\n",
       "      <td>YouTube</td>\n",
       "      <td>20.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88182</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;Java;JavaScript</td>\n",
       "      <td>Twitter</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>923 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        MainBranch Hobbyist   Country  \\\n",
       "Respondent                                                              \n",
       "84                  I am a developer by profession       No  Pakistan   \n",
       "119                 I am a developer by profession       No  Pakistan   \n",
       "298                 I am a developer by profession      Yes  Pakistan   \n",
       "299                 I am a developer by profession      Yes  Pakistan   \n",
       "311                 I am a developer by profession       No  Pakistan   \n",
       "...                                            ...      ...       ...   \n",
       "88862       I am a student who is learning to code      Yes  Pakistan   \n",
       "5439                                           NaN      Yes  Pakistan   \n",
       "39117                                          NaN      Yes  Pakistan   \n",
       "60066                                          NaN      Yes  Pakistan   \n",
       "88182                                          NaN      Yes  Pakistan   \n",
       "\n",
       "           YearsCode  ConvertedComp  \\\n",
       "Respondent                            \n",
       "84                 3         3468.0   \n",
       "119               10            NaN   \n",
       "298                4            NaN   \n",
       "299               19            NaN   \n",
       "311                5         2600.0   \n",
       "...              ...            ...   \n",
       "88862              3            NaN   \n",
       "5439               2            NaN   \n",
       "39117              4            NaN   \n",
       "60066              4            NaN   \n",
       "88182              1            NaN   \n",
       "\n",
       "                                          LanguageWorkedWith SocialMedia  \\\n",
       "Respondent                                                                 \n",
       "84                              C;C++;C#;Java;Kotlin;PHP;SQL    WhatsApp   \n",
       "119                    C;C++;C#;HTML/CSS;Java;JavaScript;SQL    Facebook   \n",
       "298                    HTML/CSS;JavaScript;PHP;SQL;Other(s):    LinkedIn   \n",
       "299                           Assembly;C;C++;Java;Python;SQL    Facebook   \n",
       "311         Assembly;C;C++;C#;HTML/CSS;Java;Python;Scala;SQL    LinkedIn   \n",
       "...                                                      ...         ...   \n",
       "88862                                                   Java    WhatsApp   \n",
       "5439                                                     NaN   Instagram   \n",
       "39117                  C;C++;C#;HTML/CSS;Java;JavaScript;SQL    WhatsApp   \n",
       "60066           Assembly;C++;C#;HTML/CSS;Java;PHP;Python;SQL     YouTube   \n",
       "88182                               HTML/CSS;Java;JavaScript     Twitter   \n",
       "\n",
       "             Age Gender  \n",
       "Respondent               \n",
       "84          26.0    Man  \n",
       "119         28.0    Man  \n",
       "298         23.0    Man  \n",
       "299         25.0    Man  \n",
       "311         24.0    Man  \n",
       "...          ...    ...  \n",
       "88862       21.0    Man  \n",
       "5439        24.0  Woman  \n",
       "39117       22.0    Man  \n",
       "60066       20.0    Man  \n",
       "88182        NaN    Man  \n",
       "\n",
       "[923 rows x 9 columns]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country').get_group(\"Pakistan\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "8bc1753e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Respondent\n",
       "84        WhatsApp\n",
       "119       Facebook\n",
       "298       LinkedIn\n",
       "299       Facebook\n",
       "311       LinkedIn\n",
       "           ...    \n",
       "88862     WhatsApp\n",
       "5439     Instagram\n",
       "39117     WhatsApp\n",
       "60066      YouTube\n",
       "88182      Twitter\n",
       "Name: SocialMedia, Length: 923, dtype: object"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country').get_group(\"Pakistan\").loc[:, 'SocialMedia']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "caa9a03c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "WhatsApp                    266\n",
       "Facebook                    232\n",
       "YouTube                     182\n",
       "LinkedIn                     71\n",
       "Twitter                      58\n",
       "Instagram                    41\n",
       "Reddit                       28\n",
       "I don't use social media     23\n",
       "Snapchat                      5\n",
       "Hello                         1\n",
       "VK ВКонта́кте                 1\n",
       "Name: SocialMedia, dtype: int64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country').get_group(\"Pakistan\").loc[:, 'SocialMedia'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "083d3d5f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country              SocialMedia             \n",
       "Afghanistan          Facebook                     15\n",
       "                     YouTube                       9\n",
       "                     I don't use social media      6\n",
       "                     WhatsApp                      4\n",
       "                     Instagram                     1\n",
       "                     LinkedIn                      1\n",
       "                     Twitter                       1\n",
       "Albania              WhatsApp                     18\n",
       "                     Facebook                     16\n",
       "                     Instagram                    13\n",
       "                     YouTube                      10\n",
       "                     Twitter                       8\n",
       "                     LinkedIn                      7\n",
       "                     Reddit                        6\n",
       "                     I don't use social media      4\n",
       "                     Snapchat                      1\n",
       "                     WeChat 微信                     1\n",
       "Algeria              YouTube                      42\n",
       "                     Facebook                     41\n",
       "                     Twitter                      14\n",
       "                     LinkedIn                      9\n",
       "                     I don't use social media      7\n",
       "                     Instagram                     7\n",
       "                     Reddit                        2\n",
       "                     WhatsApp                      2\n",
       "Andorra              Facebook                      2\n",
       "                     I don't use social media      2\n",
       "                     Reddit                        1\n",
       "                     Snapchat                      1\n",
       "                     YouTube                       1\n",
       "Angola               Twitter                       2\n",
       "                     Facebook                      1\n",
       "                     YouTube                       1\n",
       "Antigua and Barbuda  YouTube                       2\n",
       "                     Facebook                      1\n",
       "                     Instagram                     1\n",
       "                     Twitter                       1\n",
       "                     VK ВКонта́кте                 1\n",
       "Argentina            WhatsApp                    202\n",
       "                     Twitter                      80\n",
       "                     YouTube                      62\n",
       "                     Instagram                    53\n",
       "                     Reddit                       50\n",
       "                     Facebook                     45\n",
       "                     I don't use social media     22\n",
       "                     LinkedIn                     16\n",
       "Armenia              Facebook                     38\n",
       "                     YouTube                      23\n",
       "                     LinkedIn                      5\n",
       "                     Instagram                     3\n",
       "                     Reddit                        3\n",
       "                     Twitter                       3\n",
       "                     WhatsApp                      3\n",
       "                     VK ВКонта́кте                 2\n",
       "                     I don't use social media      1\n",
       "Australia            Reddit                      407\n",
       "                     Facebook                    320\n",
       "                     Twitter                     283\n",
       "                     YouTube                     242\n",
       "                     I don't use social media    156\n",
       "Name: SocialMedia, dtype: int64"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')['SocialMedia'].value_counts().head(60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8e521de",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('Country')['SocialMedia'].value_counts().head(50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4640ef13",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.groupby('Country')['SocialMedia'].value_counts().head(50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fefb673d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8f5efc5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a98c714e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df115e7a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45db2fba",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f18c4077",
   "metadata": {},
   "source": [
    "##  Question 2: \n",
    ">**What percentage of people in each country knows Python programming?**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "128cbeb8",
   "metadata": {},
   "source": [
    "**tc** = Total count of people from each country who participated in the survey?\n",
    "\n",
    "**pc** = Python People: Count of people from each country who knows Python"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "045b2e32",
   "metadata": {},
   "source": [
    "**tc (option 1):**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "fef318d9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>MainBranch</th>\n",
       "      <th>Hobbyist</th>\n",
       "      <th>Country</th>\n",
       "      <th>YearsCode</th>\n",
       "      <th>ConvertedComp</th>\n",
       "      <th>LanguageWorkedWith</th>\n",
       "      <th>SocialMedia</th>\n",
       "      <th>Age</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Respondent</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>Yes</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;Java;JavaScript;Python</td>\n",
       "      <td>Twitter</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>I am a student who is learning to code</td>\n",
       "      <td>No</td>\n",
       "      <td>Bosnia and Herzegovina</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Python</td>\n",
       "      <td>Instagram</td>\n",
       "      <td>19.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>I am not primarily a developer, but I write co...</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Thailand</td>\n",
       "      <td>3</td>\n",
       "      <td>8820.0</td>\n",
       "      <td>HTML/CSS</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>No</td>\n",
       "      <td>United States</td>\n",
       "      <td>3</td>\n",
       "      <td>61000.0</td>\n",
       "      <td>C;C++;C#;Python;SQL</td>\n",
       "      <td>Reddit</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>I am a developer by profession</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Ukraine</td>\n",
       "      <td>16</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA</td>\n",
       "      <td>Facebook</td>\n",
       "      <td>30.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88377</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Canada</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>HTML/CSS;JavaScript;Other(s):</td>\n",
       "      <td>YouTube</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88601</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88802</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88816</th>\n",
       "      <td>NaN</td>\n",
       "      <td>No</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88863</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Spain</td>\n",
       "      <td>8</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...</td>\n",
       "      <td>WhatsApp</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Man</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>88883 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   MainBranch Hobbyist  \\\n",
       "Respondent                                                               \n",
       "1                      I am a student who is learning to code      Yes   \n",
       "2                      I am a student who is learning to code       No   \n",
       "3           I am not primarily a developer, but I write co...      Yes   \n",
       "4                              I am a developer by profession       No   \n",
       "5                              I am a developer by profession      Yes   \n",
       "...                                                       ...      ...   \n",
       "88377                                                     NaN      Yes   \n",
       "88601                                                     NaN       No   \n",
       "88802                                                     NaN       No   \n",
       "88816                                                     NaN       No   \n",
       "88863                                                     NaN      Yes   \n",
       "\n",
       "                           Country YearsCode  ConvertedComp  \\\n",
       "Respondent                                                    \n",
       "1                   United Kingdom         4            NaN   \n",
       "2           Bosnia and Herzegovina       NaN            NaN   \n",
       "3                         Thailand         3         8820.0   \n",
       "4                    United States         3        61000.0   \n",
       "5                          Ukraine        16            NaN   \n",
       "...                            ...       ...            ...   \n",
       "88377                       Canada       NaN            NaN   \n",
       "88601                          NaN       NaN            NaN   \n",
       "88802                          NaN       NaN            NaN   \n",
       "88816                          NaN       NaN            NaN   \n",
       "88863                        Spain         8            NaN   \n",
       "\n",
       "                                           LanguageWorkedWith SocialMedia  \\\n",
       "Respondent                                                                  \n",
       "1                             HTML/CSS;Java;JavaScript;Python     Twitter   \n",
       "2                                         C++;HTML/CSS;Python   Instagram   \n",
       "3                                                    HTML/CSS      Reddit   \n",
       "4                                         C;C++;C#;Python;SQL      Reddit   \n",
       "5                 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA    Facebook   \n",
       "...                                                       ...         ...   \n",
       "88377                           HTML/CSS;JavaScript;Other(s):     YouTube   \n",
       "88601                                                     NaN         NaN   \n",
       "88802                                                     NaN         NaN   \n",
       "88816                                                     NaN         NaN   \n",
       "88863       Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...    WhatsApp   \n",
       "\n",
       "             Age Gender  \n",
       "Respondent               \n",
       "1           14.0    Man  \n",
       "2           19.0    Man  \n",
       "3           28.0    Man  \n",
       "4           22.0    Man  \n",
       "5           30.0    Man  \n",
       "...          ...    ...  \n",
       "88377        NaN    Man  \n",
       "88601        NaN    NaN  \n",
       "88802        NaN    NaN  \n",
       "88816        NaN    NaN  \n",
       "88863       18.0    Man  \n",
       "\n",
       "[88883 rows x 9 columns]"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "b61430a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Respondent\n",
       "1                United Kingdom\n",
       "2        Bosnia and Herzegovina\n",
       "3                      Thailand\n",
       "4                 United States\n",
       "5                       Ukraine\n",
       "                  ...          \n",
       "88377                    Canada\n",
       "88601                       NaN\n",
       "88802                       NaN\n",
       "88816                       NaN\n",
       "88863                     Spain\n",
       "Name: Country, Length: 88883, dtype: object"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:, 'Country']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "a61e0e18",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "United States        20949\n",
       "India                 9061\n",
       "Germany               5866\n",
       "United Kingdom        5737\n",
       "Canada                3395\n",
       "                     ...  \n",
       "Tonga                    1\n",
       "Timor-Leste              1\n",
       "North Korea              1\n",
       "Brunei Darussalam        1\n",
       "Chad                     1\n",
       "Name: Total, Length: 179, dtype: int64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tc = df['Country'].value_counts()\n",
    "tc.name = 'Total'\n",
    "tc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00634605",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3112db44",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "36b6d5e0",
   "metadata": {},
   "source": [
    "**tc (option 2):**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "64cf9c86",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fc92ea07b20>"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfgb = df.groupby('Country')\n",
    "dfgb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "b2fca694",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x7fc92ea07220>"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')['Country']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "9f6901aa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country                                                                   \n",
       "Afghanistan                           Afghanistan                              44\n",
       "Albania                               Albania                                  86\n",
       "Algeria                               Algeria                                 134\n",
       "Andorra                               Andorra                                   7\n",
       "Angola                                Angola                                    5\n",
       "                                                                             ... \n",
       "Venezuela, Bolivarian Republic of...  Venezuela, Bolivarian Republic of...     88\n",
       "Viet Nam                              Viet Nam                                231\n",
       "Yemen                                 Yemen                                    19\n",
       "Zambia                                Zambia                                   12\n",
       "Zimbabwe                              Zimbabwe                                 39\n",
       "Name: Country, Length: 179, dtype: int64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')['Country'].apply(lambda x: x.value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acbcaa82",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "684d3e07",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "aab00ae8",
   "metadata": {},
   "source": [
    "**pc:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "d0066ac5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Respondent\n",
       "1                          HTML/CSS;Java;JavaScript;Python\n",
       "2                                      C++;HTML/CSS;Python\n",
       "3                                                 HTML/CSS\n",
       "4                                      C;C++;C#;Python;SQL\n",
       "5              C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA\n",
       "                               ...                        \n",
       "88377                        HTML/CSS;JavaScript;Other(s):\n",
       "88601                                                  NaN\n",
       "88802                                                  NaN\n",
       "88816                                                  NaN\n",
       "88863    Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...\n",
       "Name: LanguageWorkedWith, Length: 88883, dtype: object"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:, 'LanguageWorkedWith']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "db814708",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.SeriesGroupBy object at 0x7fc92ea4a460>"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')['LanguageWorkedWith']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "61ef5d9b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Respondent\n",
       "1         True\n",
       "2         True\n",
       "3        False\n",
       "4         True\n",
       "5         True\n",
       "         ...  \n",
       "88182    False\n",
       "88212     True\n",
       "88282    False\n",
       "88377    False\n",
       "88863    False\n",
       "Name: LanguageWorkedWith, Length: 88751, dtype: object"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "9b3e1b22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan                              8\n",
       "Albania                                 23\n",
       "Algeria                                 40\n",
       "Andorra                                  0\n",
       "Angola                                   2\n",
       "                                        ..\n",
       "Venezuela, Bolivarian Republic of...    28\n",
       "Viet Nam                                78\n",
       "Yemen                                    3\n",
       "Zambia                                   4\n",
       "Zimbabwe                                14\n",
       "Name: LanguageWorkedWith, Length: 179, dtype: int64"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pp = df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())\n",
    "pp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "26205a53",
   "metadata": {},
   "outputs": [],
   "source": [
    "pp.name = 'Knows Python'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09886e0f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6954a136",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ced1a4b0",
   "metadata": {},
   "source": [
    "**Create a Dataframe of two series tc and pp**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "989aec12",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Total</th>\n",
       "      <th>Knows Python</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>20949</td>\n",
       "      <td>10083</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>9061</td>\n",
       "      <td>3105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>5866</td>\n",
       "      <td>2451</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <td>5737</td>\n",
       "      <td>2384</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>3395</td>\n",
       "      <td>1558</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tonga</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Timor-Leste</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>North Korea</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brunei Darussalam</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chad</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>179 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Total  Knows Python\n",
       "United States      20949         10083\n",
       "India               9061          3105\n",
       "Germany             5866          2451\n",
       "United Kingdom      5737          2384\n",
       "Canada              3395          1558\n",
       "...                  ...           ...\n",
       "Tonga                  1             0\n",
       "Timor-Leste            1             1\n",
       "North Korea            1             0\n",
       "Brunei Darussalam      1             0\n",
       "Chad                   1             0\n",
       "\n",
       "[179 rows x 2 columns]"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resultdf = pd.concat([tc, pp], axis=1)\n",
    "resultdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "df17703c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Total           923\n",
       "Knows Python    251\n",
       "Name: Pakistan, dtype: int64"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resultdf.loc['Pakistan']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c670b7e9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2891419b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e96094bb",
   "metadata": {},
   "source": [
    "**Percentage of people in each country knows Python?**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "d09f7a77",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Total</th>\n",
       "      <th>Knows Python</th>\n",
       "      <th>Percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>United States</th>\n",
       "      <td>20949</td>\n",
       "      <td>10083</td>\n",
       "      <td>48.131176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>9061</td>\n",
       "      <td>3105</td>\n",
       "      <td>34.267741</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>5866</td>\n",
       "      <td>2451</td>\n",
       "      <td>41.783157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <td>5737</td>\n",
       "      <td>2384</td>\n",
       "      <td>41.554820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>3395</td>\n",
       "      <td>1558</td>\n",
       "      <td>45.891016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Tonga</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Timor-Leste</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>100.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>North Korea</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brunei Darussalam</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chad</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>179 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Total  Knows Python  Percentage\n",
       "United States      20949         10083   48.131176\n",
       "India               9061          3105   34.267741\n",
       "Germany             5866          2451   41.783157\n",
       "United Kingdom      5737          2384   41.554820\n",
       "Canada              3395          1558   45.891016\n",
       "...                  ...           ...         ...\n",
       "Tonga                  1             0    0.000000\n",
       "Timor-Leste            1             1  100.000000\n",
       "North Korea            1             0    0.000000\n",
       "Brunei Darussalam      1             0    0.000000\n",
       "Chad                   1             0    0.000000\n",
       "\n",
       "[179 rows x 3 columns]"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resultdf['Percentage'] = (resultdf['Knows Python'] / resultdf['Total']) * 100\n",
    "resultdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7530f928",
   "metadata": {},
   "outputs": [],
   "source": [
    "resultdf.loc['Pakistan']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b72302eb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}