{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4a87b5ef",
   "metadata": {},
   "source": [
    "---   \n",
    " <img align=\"left\" width=\"75\" height=\"75\"  src=\"https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png\"> \n",
    "\n",
    "<h1 align=\"center\">Department of Data Science</h1>\n",
    "<h1 align=\"center\">Course: Tools and Techniques for Data Science</h1>\n",
    "\n",
    "---\n",
    "<h3><div align=\"right\">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab0dc25c",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Lecture 3.19 (Pandas-11)</h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4098342",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/arifpucit/data-science/blob/master/Section-3-Python-for-Data-Scientists/Lec-3.19(Pandas-11-Reshape-using-pivot-melt-crosstab).ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19f82705",
   "metadata": {},
   "source": [
    "## _Reshape using pivot, melt, and crosstab.ipynb_\n",
    "\n",
    "<img align=\"right\" width=\"400\" height=\"400\"  src=\"images/pandas-apps.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e5a6724",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ee97eba",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d04e9fa",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f6be983",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "12db95e1",
   "metadata": {},
   "source": [
    "## Learning agenda of this notebook\n",
    "\n",
    "1. Reshape Data Using `pivot()` and `pivot_table()` methods\n",
    "2. Reshape Data Using `melt()` method\n",
    "3. Reshape Data Using `crosstab()` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e3a8656",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "386eb62a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0bd3b24b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e17eebda",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c42bc83",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98eb8c30",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f90af46",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "3d7cef08",
   "metadata": {},
   "source": [
    "## 1. Reshaping Data Using `df.pivot()`  and `df.pivot_table()` Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65f584ea",
   "metadata": {},
   "source": [
    "**```df.pivot(index=None, columns=None, values=None)```**<br>\n",
    "**```pandas.pivot(data, index=None, columns=None, values=None)```**\n",
    "\n",
    "Where,\n",
    "- `index`: Column to use as new dataframe's index. If None, uses existing index.\n",
    "- `columns`: Column to use to make new dataframe columns.\n",
    "- `values`:  Column(s) to use for populating new frame's values. \n",
    "\n",
    "Read more about `pd.pivot()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html\n",
    "\n",
    "Read more about `df.pivot()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8f3bc2c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a4cfabf7",
   "metadata": {},
   "source": [
    "**`df.pivot_table(index=None, columns=None, values=None, aggfunc= 'mean', fill_valus=None)`**<br>\n",
    "**`pandas.pivot_table(data, index=None, columns=None, values=None, aggfunc= 'mean', fill_valus=None)`**\n",
    "\n",
    "Where,\n",
    "- `index`: Column to use as new dataframe's index. If None, uses existing index.\n",
    "- `columns`: Column to use to make new dataframe columns.\n",
    "- `values`:  Column(s) to use for populating new frame's values. \n",
    "- `aggfunc`:  default is numpy.mean\n",
    "- `fill_value`: Value to replace missing values with\n",
    "\n",
    "\n",
    "\n",
    "Read more about `pd.pivot_table()`: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html\n",
    "\n",
    "Read more about `df.pivot_table()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html?highlight=dataframe%20pivot_table#pandas.DataFrame.pivot_table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ddc0e09",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30159c80",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a7e18b6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "4b727393",
   "metadata": {},
   "source": [
    "### DataSet 1:\n",
    "\n",
    "You can use both `pivot()` as well as `pivot_table()` methods over here\n",
    "\n",
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/ds1.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d6ab6a1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d178b48b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f722f49",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3be05ec7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "      <th>humidity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>38</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>41</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>39</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>17</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>15</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>22/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>18</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Karachi</td>\n",
       "      <td>33</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Karachi</td>\n",
       "      <td>30</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>22/06/2021</td>\n",
       "      <td>Karachi</td>\n",
       "      <td>35</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date     city  temperature  humidity\n",
       "0  20/06/2021   Lahore           38        76\n",
       "1  21/06/2021   Lahore           41        80\n",
       "2  22/06/2021   Lahore           39        85\n",
       "3  20/06/2021    Muree           17        71\n",
       "4  21/06/2021    Muree           15        70\n",
       "5  22/06/2021    Muree           18        74\n",
       "6  20/06/2021  Karachi           33        93\n",
       "7  21/06/2021  Karachi           30        91\n",
       "8  22/06/2021  Karachi           35        90"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/pivot_weather1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "09152536",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>date</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>93</td>\n",
       "      <td>91</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>76</td>\n",
       "      <td>80</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Muree</th>\n",
       "      <td>71</td>\n",
       "      <td>70</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "date     20/06/2021  21/06/2021  22/06/2021\n",
       "city                                       \n",
       "Karachi          93          91          90\n",
       "Lahore           76          80          85\n",
       "Muree            71          70          74"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='city', columns='date', values='humidity')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c929fa6d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "106e2348",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da60f9d0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9e4eac82",
   "metadata": {},
   "source": [
    ">**Suppose we want to have one record for each city, containing temperature and humidity for each date**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ecdaafe6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">temperature</th>\n",
       "      <th colspan=\"3\" halign=\"left\">humidity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>33</td>\n",
       "      <td>30</td>\n",
       "      <td>35</td>\n",
       "      <td>93</td>\n",
       "      <td>91</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>38</td>\n",
       "      <td>41</td>\n",
       "      <td>39</td>\n",
       "      <td>76</td>\n",
       "      <td>80</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Muree</th>\n",
       "      <td>17</td>\n",
       "      <td>15</td>\n",
       "      <td>18</td>\n",
       "      <td>71</td>\n",
       "      <td>70</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        temperature                         humidity                      \n",
       "date     20/06/2021 21/06/2021 22/06/2021 20/06/2021 21/06/2021 22/06/2021\n",
       "city                                                                      \n",
       "Karachi          33         30         35         93         91         90\n",
       "Lahore           38         41         39         76         80         85\n",
       "Muree            17         15         18         71         70         74"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using pivot()\n",
    "df1 = df.pivot(index='city', columns='date')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c87ddcf4",
   "metadata": {},
   "source": [
    ">**Let us repeat the same using `pivot_table()`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "733834ef",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">humidity</th>\n",
       "      <th colspan=\"3\" halign=\"left\">temperature</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>93</td>\n",
       "      <td>91</td>\n",
       "      <td>90</td>\n",
       "      <td>33</td>\n",
       "      <td>30</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>76</td>\n",
       "      <td>80</td>\n",
       "      <td>85</td>\n",
       "      <td>38</td>\n",
       "      <td>41</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Muree</th>\n",
       "      <td>71</td>\n",
       "      <td>70</td>\n",
       "      <td>74</td>\n",
       "      <td>17</td>\n",
       "      <td>15</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          humidity                       temperature                      \n",
       "date    20/06/2021 21/06/2021 22/06/2021  20/06/2021 21/06/2021 22/06/2021\n",
       "city                                                                      \n",
       "Karachi         93         91         90          33         30         35\n",
       "Lahore          76         80         85          38         41         39\n",
       "Muree           71         70         74          17         15         18"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using pivot_table()\n",
    "df1 = df.pivot_table(index='city', columns='date')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80dc9a4c",
   "metadata": {},
   "source": [
    "- By setting the `index='city'`, the city column is the left most column now having unique values. \n",
    "- By setting the `columns='date'`, the values from the date column have become the  column headers now.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d7b59da",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8780aeb5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef361057",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a7ab5e5",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb06eb2b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3b2b092",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70d95b45",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "0b80f036",
   "metadata": {},
   "source": [
    "**Suppose we want to see only the temperature or humidity column in the output dataframe. This can be achieved by setting the `values` argument to the name of the column**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "be6e29d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>date</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>33</td>\n",
       "      <td>30</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>38</td>\n",
       "      <td>41</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Muree</th>\n",
       "      <td>17</td>\n",
       "      <td>15</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "date     20/06/2021  21/06/2021  22/06/2021\n",
       "city                                       \n",
       "Karachi          33          30          35\n",
       "Lahore           38          41          39\n",
       "Muree            17          15          18"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot(index='city', columns='date', values='temperature')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f2c67520",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>date</th>\n",
       "      <th>20/06/2021</th>\n",
       "      <th>21/06/2021</th>\n",
       "      <th>22/06/2021</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>93</td>\n",
       "      <td>91</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>76</td>\n",
       "      <td>80</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Muree</th>\n",
       "      <td>71</td>\n",
       "      <td>70</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "date     20/06/2021  21/06/2021  22/06/2021\n",
       "city                                       \n",
       "Karachi          93          91          90\n",
       "Lahore           76          80          85\n",
       "Muree            71          70          74"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot(index='city', columns='date', values='humidity')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d1b2dff",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c45fb62",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51238d81",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9cd700b7",
   "metadata": {},
   "source": [
    "**Let us keep the date along index and city at the column, so that the output dataframe should have one record for each date, containing temperature and humidity for each city**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "dd1dbaef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">temperature</th>\n",
       "      <th colspan=\"3\" halign=\"left\">humidity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th>Karachi</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "      <th>Karachi</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20/06/2021</th>\n",
       "      <td>33</td>\n",
       "      <td>38</td>\n",
       "      <td>17</td>\n",
       "      <td>93</td>\n",
       "      <td>76</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21/06/2021</th>\n",
       "      <td>30</td>\n",
       "      <td>41</td>\n",
       "      <td>15</td>\n",
       "      <td>91</td>\n",
       "      <td>80</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22/06/2021</th>\n",
       "      <td>35</td>\n",
       "      <td>39</td>\n",
       "      <td>18</td>\n",
       "      <td>90</td>\n",
       "      <td>85</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           temperature              humidity             \n",
       "city           Karachi Lahore Muree  Karachi Lahore Muree\n",
       "date                                                     \n",
       "20/06/2021          33     38    17       93     76    71\n",
       "21/06/2021          30     41    15       91     80    70\n",
       "22/06/2021          35     39    18       90     85    74"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using pivot()\n",
    "df1 = df.pivot(index='date', columns='city')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "bef120f4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">humidity</th>\n",
       "      <th colspan=\"3\" halign=\"left\">temperature</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th>Karachi</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "      <th>Karachi</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20/06/2021</th>\n",
       "      <td>93</td>\n",
       "      <td>76</td>\n",
       "      <td>71</td>\n",
       "      <td>33</td>\n",
       "      <td>38</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21/06/2021</th>\n",
       "      <td>91</td>\n",
       "      <td>80</td>\n",
       "      <td>70</td>\n",
       "      <td>30</td>\n",
       "      <td>41</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22/06/2021</th>\n",
       "      <td>90</td>\n",
       "      <td>85</td>\n",
       "      <td>74</td>\n",
       "      <td>35</td>\n",
       "      <td>39</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           humidity              temperature             \n",
       "city        Karachi Lahore Muree     Karachi Lahore Muree\n",
       "date                                                     \n",
       "20/06/2021       93     76    71          33     38    17\n",
       "21/06/2021       91     80    70          30     41    15\n",
       "22/06/2021       90     85    74          35     39    18"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# using pivot_table()\n",
    "df1 = df.pivot_table(index='date', columns='city')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67a364c0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "656bdfd0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17e8d7ce",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "7ad0fa1f",
   "metadata": {},
   "source": [
    "### DataSet 2:\n",
    "You cannot use `pivot()` due to multiple values, however, `pivot_table()` will work, as it will take the `mean()` of those values\n",
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/ds2.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bdf8a311",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8b09363",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f6af2c7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "7df8d4c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "      <th>humidity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>38</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>40</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>39</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Lahore</td>\n",
       "      <td>37</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>15</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>20/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>17</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>10</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>21/06/2021</td>\n",
       "      <td>Muree</td>\n",
       "      <td>8</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date    city  temperature  humidity\n",
       "0  20/06/2021  Lahore           38        76\n",
       "1  20/06/2021  Lahore           40        75\n",
       "2  21/06/2021  Lahore           39        79\n",
       "3  21/06/2021  Lahore           37        74\n",
       "4  20/06/2021   Muree           15        88\n",
       "5  20/06/2021   Muree           17        90\n",
       "6  21/06/2021   Muree           10        93\n",
       "7  21/06/2021   Muree            8        91"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/pivot_weather2.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17b47a4a",
   "metadata": {},
   "source": [
    ">Note in this dataset we donot have unique values for date and city combined "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "da8298eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "#df1 = df.pivot(index='date', columns='city')\n",
    "#df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5aca275",
   "metadata": {},
   "source": [
    "- When we set the index to `date` and columns to `city`, the `pivot()` tries to set the left key to `20/06/2021` and then match the column name of the differing city (Lahore) values. \n",
    "- In this case there are two rows which have `20/06/2021` and columns of `Lahore`. The function doesn't know what value to put into cell values. \n",
    "- So raise a ValueError: Index contains duplicate entries, cannot reshape\n",
    "- Pivot and pivot_table may only exhibit the same functionality if the data allows. If there are duplicate entries possible from the index(es) of interest you will need to aggregate the data in pivot_table, not pivot (due to duplicate error).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d017c0c",
   "metadata": {},
   "source": [
    "- Let us try to do the same using `pivot_table()` method\n",
    "- In the `pivot_table` function, there is another argument `aggfunc=’mean’` that decides this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "9e553c54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">humidity</th>\n",
       "      <th colspan=\"2\" halign=\"left\">temperature</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20/06/2021</th>\n",
       "      <td>75.5</td>\n",
       "      <td>89.0</td>\n",
       "      <td>39</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21/06/2021</th>\n",
       "      <td>76.5</td>\n",
       "      <td>92.0</td>\n",
       "      <td>38</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           humidity       temperature      \n",
       "city         Lahore Muree      Lahore Muree\n",
       "date                                       \n",
       "20/06/2021     75.5  89.0          39    16\n",
       "21/06/2021     76.5  92.0          38     9"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot_table(index='date', columns='city')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5cce50d",
   "metadata": {},
   "source": [
    ">The default value to the `aggfunc` argument is `mean`, and you can explicitly pass any other aggregate function name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "9608dfb8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">humidity</th>\n",
       "      <th colspan=\"2\" halign=\"left\">temperature</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "      <th>Lahore</th>\n",
       "      <th>Muree</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20/06/2021</th>\n",
       "      <td>151</td>\n",
       "      <td>178</td>\n",
       "      <td>78</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21/06/2021</th>\n",
       "      <td>153</td>\n",
       "      <td>184</td>\n",
       "      <td>76</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           humidity       temperature      \n",
       "city         Lahore Muree      Lahore Muree\n",
       "date                                       \n",
       "20/06/2021      151   178          78    32\n",
       "21/06/2021      153   184          76    18"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot_table(index='date', columns='city', aggfunc='sum')\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b0cca30",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eed40835",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06702a97",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70942f23",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e387825e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee284d05",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "9f084bcd",
   "metadata": {},
   "source": [
    "### DataSet 3:\n",
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/ds3.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2a7f4dc",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbd6f893",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abd24ed7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "fcb47b09",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gender</th>\n",
       "      <th>sport</th>\n",
       "      <th>age</th>\n",
       "      <th>height</th>\n",
       "      <th>weight</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>male</td>\n",
       "      <td>cricket</td>\n",
       "      <td>22</td>\n",
       "      <td>72</td>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>female</td>\n",
       "      <td>cricket</td>\n",
       "      <td>21</td>\n",
       "      <td>72</td>\n",
       "      <td>130</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>female</td>\n",
       "      <td>basketball</td>\n",
       "      <td>23</td>\n",
       "      <td>73</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>male</td>\n",
       "      <td>basketball</td>\n",
       "      <td>21</td>\n",
       "      <td>75</td>\n",
       "      <td>175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>female</td>\n",
       "      <td>cricket</td>\n",
       "      <td>20</td>\n",
       "      <td>68</td>\n",
       "      <td>170</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   gender       sport  age  height  weight\n",
       "0    male     cricket   22      72     200\n",
       "1  female     cricket   21      72     130\n",
       "2  female  basketball   23      73     150\n",
       "3    male  basketball   21      75     175\n",
       "4  female     cricket   20      68     170"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/pivot_std1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b45397f6",
   "metadata": {},
   "source": [
    "**Suppose we want to have one record for each gender, containing age, height and weight for each sport**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "158af4d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">age</th>\n",
       "      <th colspan=\"2\" halign=\"left\">height</th>\n",
       "      <th colspan=\"2\" halign=\"left\">weight</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sport</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>23.0</td>\n",
       "      <td>20.5</td>\n",
       "      <td>73</td>\n",
       "      <td>70</td>\n",
       "      <td>150</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>21.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>75</td>\n",
       "      <td>72</td>\n",
       "      <td>175</td>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              age             height             weight        \n",
       "sport  basketball cricket basketball cricket basketball cricket\n",
       "gender                                                         \n",
       "female       23.0    20.5         73      70        150     150\n",
       "male         21.0    22.0         75      72        175     200"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot_table(index='gender', columns='sport')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c90ba62a",
   "metadata": {},
   "source": [
    "When we try to repeat the same using `pivot()`, we get a ValueError: Index contains duplicate entries, cannot reshape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d78d36f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "#df1 = df.pivot(index='gender', columns='sport')\n",
    "#df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b84c0910",
   "metadata": {},
   "source": [
    "- When we set the index to `gender` and columns to `sport`, the `pivot()`\n",
    "- In this case there are two rows which have `female` and play `basketball`. \n",
    "- The `pivot()` function doesn't know what value to put into cell values. \n",
    "- So raise a ValueError: Index contains duplicate entries, cannot reshape\n",
    "- The `pivot_table()` method use the default `aggfunc=’mean’` argument to decide this."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3c1711b",
   "metadata": {},
   "source": [
    "**Use of margins argument to `pivot_table()` method**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "bc964838",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">age</th>\n",
       "      <th colspan=\"3\" halign=\"left\">height</th>\n",
       "      <th colspan=\"3\" halign=\"left\">weight</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sport</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "      <th>All</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "      <th>All</th>\n",
       "      <th>basketball</th>\n",
       "      <th>cricket</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>23.0</td>\n",
       "      <td>20.5</td>\n",
       "      <td>21.333333</td>\n",
       "      <td>73</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>71.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>150.000000</td>\n",
       "      <td>150.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>21.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>21.500000</td>\n",
       "      <td>75</td>\n",
       "      <td>72.000000</td>\n",
       "      <td>73.5</td>\n",
       "      <td>175.0</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>187.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>22.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>21.400000</td>\n",
       "      <td>74</td>\n",
       "      <td>70.666667</td>\n",
       "      <td>72.0</td>\n",
       "      <td>162.5</td>\n",
       "      <td>166.666667</td>\n",
       "      <td>165.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              age                        height                      weight  \\\n",
       "sport  basketball cricket        All basketball    cricket   All basketball   \n",
       "gender                                                                        \n",
       "female       23.0    20.5  21.333333         73  70.000000  71.0      150.0   \n",
       "male         21.0    22.0  21.500000         75  72.000000  73.5      175.0   \n",
       "All          22.0    21.0  21.400000         74  70.666667  72.0      162.5   \n",
       "\n",
       "                           \n",
       "sport      cricket    All  \n",
       "gender                     \n",
       "female  150.000000  150.0  \n",
       "male    200.000000  187.5  \n",
       "All     166.666667  165.0  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index='gender', columns='sport', margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe5a1d3f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3406e6fa",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11548152",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46676c68",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1412b518",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "127e7b32",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e5486840",
   "metadata": {},
   "source": [
    "### DataSet 4:\n",
    "In this dataset, since we have ...., so the `pivot()` method will flag an error as it donot know what out of the three values to place in the dataframe. However, the `pivot_table()` method will use some aggregation function to compute the value to be placed and will work fine....\n",
    "\n",
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/ds4.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad72ad40",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bfce001",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc2b97b9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "74c41892",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>uniq_id</th>\n",
       "      <th>animal</th>\n",
       "      <th>water_need</th>\n",
       "      <th>speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1001</td>\n",
       "      <td>elephant</td>\n",
       "      <td>500</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1002</td>\n",
       "      <td>elephant</td>\n",
       "      <td>600</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1003</td>\n",
       "      <td>elephant</td>\n",
       "      <td>350</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1004</td>\n",
       "      <td>tiger</td>\n",
       "      <td>300</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1005</td>\n",
       "      <td>tiger</td>\n",
       "      <td>320</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1006</td>\n",
       "      <td>tiger</td>\n",
       "      <td>330</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1007</td>\n",
       "      <td>tiger</td>\n",
       "      <td>290</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1008</td>\n",
       "      <td>tiger</td>\n",
       "      <td>310</td>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1009</td>\n",
       "      <td>zebra</td>\n",
       "      <td>200</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1010</td>\n",
       "      <td>zebra</td>\n",
       "      <td>220</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1011</td>\n",
       "      <td>zebra</td>\n",
       "      <td>240</td>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1012</td>\n",
       "      <td>zebra</td>\n",
       "      <td>230</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1013</td>\n",
       "      <td>zebra</td>\n",
       "      <td>220</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1014</td>\n",
       "      <td>zebra</td>\n",
       "      <td>100</td>\n",
       "      <td>79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1015</td>\n",
       "      <td>zebra</td>\n",
       "      <td>80</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1016</td>\n",
       "      <td>lion</td>\n",
       "      <td>420</td>\n",
       "      <td>66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1017</td>\n",
       "      <td>lion</td>\n",
       "      <td>600</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>1018</td>\n",
       "      <td>lion</td>\n",
       "      <td>500</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>1019</td>\n",
       "      <td>lion</td>\n",
       "      <td>390</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>1020</td>\n",
       "      <td>kangaroo</td>\n",
       "      <td>410</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1021</td>\n",
       "      <td>kangaroo</td>\n",
       "      <td>430</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1022</td>\n",
       "      <td>kangaroo</td>\n",
       "      <td>410</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    uniq_id    animal  water_need  speed\n",
       "0      1001  elephant         500     20\n",
       "1      1002  elephant         600     25\n",
       "2      1003  elephant         350     29\n",
       "3      1004     tiger         300     60\n",
       "4      1005     tiger         320     65\n",
       "5      1006     tiger         330     70\n",
       "6      1007     tiger         290     69\n",
       "7      1008     tiger         310     72\n",
       "8      1009     zebra         200     75\n",
       "9      1010     zebra         220     77\n",
       "10     1011     zebra         240     72\n",
       "11     1012     zebra         230     80\n",
       "12     1013     zebra         220     82\n",
       "13     1014     zebra         100     79\n",
       "14     1015     zebra          80     76\n",
       "15     1016      lion         420     66\n",
       "16     1017      lion         600     67\n",
       "17     1018      lion         500     68\n",
       "18     1019      lion         390     64\n",
       "19     1020  kangaroo         410     19\n",
       "20     1021  kangaroo         430     22\n",
       "21     1022  kangaroo         410     17"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/waterneed.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1f7d727",
   "metadata": {},
   "source": [
    "- **The `pivot()` method requires atleast two arguments index and columns**\n",
    "\n",
    "- **The `pivot_table()` on the contrary can work on index argument only, the values place are using the mean aggregate function.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "03f5be0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>speed</th>\n",
       "      <th>uniq_id</th>\n",
       "      <th>water_need</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>animal</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>elephant</th>\n",
       "      <td>24.666667</td>\n",
       "      <td>1002.0</td>\n",
       "      <td>483.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>kangaroo</th>\n",
       "      <td>19.333333</td>\n",
       "      <td>1021.0</td>\n",
       "      <td>416.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lion</th>\n",
       "      <td>66.250000</td>\n",
       "      <td>1017.5</td>\n",
       "      <td>477.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tiger</th>\n",
       "      <td>67.200000</td>\n",
       "      <td>1006.0</td>\n",
       "      <td>310.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zebra</th>\n",
       "      <td>77.285714</td>\n",
       "      <td>1012.0</td>\n",
       "      <td>184.285714</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              speed  uniq_id  water_need\n",
       "animal                                  \n",
       "elephant  24.666667   1002.0  483.333333\n",
       "kangaroo  19.333333   1021.0  416.666667\n",
       "lion      66.250000   1017.5  477.500000\n",
       "tiger     67.200000   1006.0  310.000000\n",
       "zebra     77.285714   1012.0  184.285714"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = df.pivot_table(index='animal')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c216c23",
   "metadata": {},
   "source": [
    "You can apply aggregation function on the new dataframe as well, such as compute the average speed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0d4e2a62",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50.94714285714285"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1['speed'].agg('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "116b0c39",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "speed          50.947143\n",
       "water_need    374.357143\n",
       "dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can also perfrom aggragtion to summarize data\n",
    "df1[['speed','water_need']].agg('mean')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e64df88d",
   "metadata": {},
   "source": [
    "**Multilevel indexing** You can perfrom multi-level indexing by passing the columns as a list to index argument to `pivottable()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d7b447e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>speed</th>\n",
       "      <th>water_need</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>animal</th>\n",
       "      <th>uniq_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">elephant</th>\n",
       "      <th>1001</th>\n",
       "      <td>20</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1002</th>\n",
       "      <td>25</td>\n",
       "      <td>600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1003</th>\n",
       "      <td>29</td>\n",
       "      <td>350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">kangaroo</th>\n",
       "      <th>1020</th>\n",
       "      <td>19</td>\n",
       "      <td>410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1021</th>\n",
       "      <td>22</td>\n",
       "      <td>430</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1022</th>\n",
       "      <td>17</td>\n",
       "      <td>410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">lion</th>\n",
       "      <th>1016</th>\n",
       "      <td>66</td>\n",
       "      <td>420</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1017</th>\n",
       "      <td>67</td>\n",
       "      <td>600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1018</th>\n",
       "      <td>68</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1019</th>\n",
       "      <td>64</td>\n",
       "      <td>390</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">tiger</th>\n",
       "      <th>1004</th>\n",
       "      <td>60</td>\n",
       "      <td>300</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1005</th>\n",
       "      <td>65</td>\n",
       "      <td>320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1006</th>\n",
       "      <td>70</td>\n",
       "      <td>330</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1007</th>\n",
       "      <td>69</td>\n",
       "      <td>290</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1008</th>\n",
       "      <td>72</td>\n",
       "      <td>310</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"7\" valign=\"top\">zebra</th>\n",
       "      <th>1009</th>\n",
       "      <td>75</td>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1010</th>\n",
       "      <td>77</td>\n",
       "      <td>220</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1011</th>\n",
       "      <td>72</td>\n",
       "      <td>240</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1012</th>\n",
       "      <td>80</td>\n",
       "      <td>230</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1013</th>\n",
       "      <td>82</td>\n",
       "      <td>220</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1014</th>\n",
       "      <td>79</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1015</th>\n",
       "      <td>76</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  speed  water_need\n",
       "animal   uniq_id                   \n",
       "elephant 1001        20         500\n",
       "         1002        25         600\n",
       "         1003        29         350\n",
       "kangaroo 1020        19         410\n",
       "         1021        22         430\n",
       "         1022        17         410\n",
       "lion     1016        66         420\n",
       "         1017        67         600\n",
       "         1018        68         500\n",
       "         1019        64         390\n",
       "tiger    1004        60         300\n",
       "         1005        65         320\n",
       "         1006        70         330\n",
       "         1007        69         290\n",
       "         1008        72         310\n",
       "zebra    1009        75         200\n",
       "         1010        77         220\n",
       "         1011        72         240\n",
       "         1012        80         230\n",
       "         1013        82         220\n",
       "         1014        79         100\n",
       "         1015        76          80"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(index=['animal','uniq_id'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b646be76",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b213225",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef9e9624",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92c2eb09",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07d8214e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c3e1995",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "cd95ac43",
   "metadata": {},
   "source": [
    "## 2. Reshaping Data Using `df.melt()` Method\n",
    "\n",
    "- Similar to `pivot()` and `pivot_table()`, Pandas `melt()` method is also used to transform or reshape data. \n",
    "- The `pd.melt()` method is used to change the DataFrame format from wide to long\n",
    "- The Pandas `pd.melt()` method is useful to reshape a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars). Its signature is:\n",
    "```\n",
    "pandas.melt(Dataframe, id_vars=None, value_vars=None, var_name=None, value_name='value',ignore_index=True)\n",
    "```\n",
    "Where,\n",
    "- `id_vars`: tuple, list, or ndarray, optional  (Column(s) to use as identifier variables)\n",
    "- `value_var`: tuple, list, or ndarray, optional (If not specified, uses all columns that are not set as id_vars)\n",
    "- `var_name`: Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.\n",
    "- `value_name`: Name to use for the ‘value’ column.\n",
    "- `ignore_index`: bool, default True (If True, original index is ignored. If False, the original index is retained.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40a69bd2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29fd35a1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe29a636",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "761ce23e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fa29f2e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "c3ba290f",
   "metadata": {},
   "source": [
    "### DataSet 1:\n",
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/melt2.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e492553e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e0be1c1",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ab8ace83",
   "metadata": {},
   "source": [
    "<img align=\"center\" width=\"900\" height=\"600\"  src=\"images/melt1.png\"  >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fecb36e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28af4b5e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee31757e",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1659b507",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6c9aa82",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b12a1d4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "af2eef12",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>lahore</th>\n",
       "      <th>karachi</th>\n",
       "      <th>murree</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>32</td>\n",
       "      <td>75</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>30</td>\n",
       "      <td>77</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>28</td>\n",
       "      <td>75</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>22</td>\n",
       "      <td>82</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>30</td>\n",
       "      <td>83</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>20</td>\n",
       "      <td>81</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>25</td>\n",
       "      <td>77</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         day  lahore  karachi  murree\n",
       "0     Monday      32       75      41\n",
       "1    Tuesday      30       77      43\n",
       "2  Wednesday      28       75      45\n",
       "3   Thursday      22       82      38\n",
       "4     Friday      30       83      30\n",
       "5   Saturday      20       81      45\n",
       "6     Sunday      25       77      47"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('datasets/weather.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "adcc07c8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>variable</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Monday</td>\n",
       "      <td>murree</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>murree</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Friday</td>\n",
       "      <td>murree</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>murree</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          day variable  value\n",
       "0      Monday   lahore     32\n",
       "1     Tuesday   lahore     30\n",
       "2   Wednesday   lahore     28\n",
       "3    Thursday   lahore     22\n",
       "4      Friday   lahore     30\n",
       "5    Saturday   lahore     20\n",
       "6      Sunday   lahore     25\n",
       "7      Monday  karachi     75\n",
       "8     Tuesday  karachi     77\n",
       "9   Wednesday  karachi     75\n",
       "10   Thursday  karachi     82\n",
       "11     Friday  karachi     83\n",
       "12   Saturday  karachi     81\n",
       "13     Sunday  karachi     77\n",
       "14     Monday   murree     41\n",
       "15    Tuesday   murree     43\n",
       "16  Wednesday   murree     45\n",
       "17   Thursday   murree     38\n",
       "18     Friday   murree     30\n",
       "19   Saturday   murree     45\n",
       "20     Sunday   murree     47"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.melt(df, id_vars =['day'])\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eb8a582",
   "metadata": {},
   "source": [
    ">You can change the name of columns for example, replace the column `variable` and `value` with some meaningful names. like `city` and `temperature` using the `var_name` and `value_name` arguments of `melt()` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "071c2703",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Monday</td>\n",
       "      <td>murree</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>murree</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Friday</td>\n",
       "      <td>murree</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>murree</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          day     city  temperature\n",
       "0      Monday   lahore           32\n",
       "1     Tuesday   lahore           30\n",
       "2   Wednesday   lahore           28\n",
       "3    Thursday   lahore           22\n",
       "4      Friday   lahore           30\n",
       "5    Saturday   lahore           20\n",
       "6      Sunday   lahore           25\n",
       "7      Monday  karachi           75\n",
       "8     Tuesday  karachi           77\n",
       "9   Wednesday  karachi           75\n",
       "10   Thursday  karachi           82\n",
       "11     Friday  karachi           83\n",
       "12   Saturday  karachi           81\n",
       "13     Sunday  karachi           77\n",
       "14     Monday   murree           41\n",
       "15    Tuesday   murree           43\n",
       "16  Wednesday   murree           45\n",
       "17   Thursday   murree           38\n",
       "18     Friday   murree           30\n",
       "19   Saturday   murree           45\n",
       "20     Sunday   murree           47"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.melt(df, id_vars =['day'], var_name='city', value_name='temperature')\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30eb27bb",
   "metadata": {},
   "source": [
    ">You can filter the rows of your choice using the `value_vars` argument of `melt()` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "55f66c15",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         day    city  temperature\n",
       "0     Monday  lahore           32\n",
       "1    Tuesday  lahore           30\n",
       "2  Wednesday  lahore           28\n",
       "3   Thursday  lahore           22\n",
       "4     Friday  lahore           30\n",
       "5   Saturday  lahore           20\n",
       "6     Sunday  lahore           25"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.melt(df, id_vars = ['day'], value_vars =['lahore'], var_name='city', value_name='temperature')\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "e075e2ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         day     city  temperature\n",
       "0     Monday  karachi           75\n",
       "1    Tuesday  karachi           77\n",
       "2  Wednesday  karachi           75\n",
       "3   Thursday  karachi           82\n",
       "4     Friday  karachi           83\n",
       "5   Saturday  karachi           81\n",
       "6     Sunday  karachi           77"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = pd.melt(df, id_vars =['day'], value_vars=['karachi'],var_name='city', value_name='temperature')\n",
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "6c4568e4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          day     city  temperature\n",
       "7      Monday  karachi           75\n",
       "8     Tuesday  karachi           77\n",
       "9   Wednesday  karachi           75\n",
       "10   Thursday  karachi           82\n",
       "11     Friday  karachi           83\n",
       "12   Saturday  karachi           81\n",
       "13     Sunday  karachi           77"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# You can achieve the similar result by using Boolean indexing\n",
    "df1[df1['city'] == 'karachi']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d6f94c9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5246468",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "5f79cd6c",
   "metadata": {},
   "source": [
    ">You can apply aggregation function on the new dataframe as well, such as compute the average temperatures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "b92af885",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Monday</td>\n",
       "      <td>murree</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>murree</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Friday</td>\n",
       "      <td>murree</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>murree</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>murree</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          day     city  temperature\n",
       "0      Monday   lahore           32\n",
       "1     Tuesday   lahore           30\n",
       "2   Wednesday   lahore           28\n",
       "3    Thursday   lahore           22\n",
       "4      Friday   lahore           30\n",
       "5    Saturday   lahore           20\n",
       "6      Sunday   lahore           25\n",
       "7      Monday  karachi           75\n",
       "8     Tuesday  karachi           77\n",
       "9   Wednesday  karachi           75\n",
       "10   Thursday  karachi           82\n",
       "11     Friday  karachi           83\n",
       "12   Saturday  karachi           81\n",
       "13     Sunday  karachi           77\n",
       "14     Monday   murree           41\n",
       "15    Tuesday   murree           43\n",
       "16  Wednesday   murree           45\n",
       "17   Thursday   murree           38\n",
       "18     Friday   murree           30\n",
       "19   Saturday   murree           45\n",
       "20     Sunday   murree           47"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "5c00f973",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "48.857142857142854"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compute the average temperature of entire dataframe\n",
    "df1['temperature'].agg('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "10bbbede",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Monday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Friday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>lahore</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         day    city  temperature\n",
       "0     Monday  lahore           32\n",
       "1    Tuesday  lahore           30\n",
       "2  Wednesday  lahore           28\n",
       "3   Thursday  lahore           22\n",
       "4     Friday  lahore           30\n",
       "5   Saturday  lahore           20\n",
       "6     Sunday  lahore           25"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[df1['city'] == 'lahore' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "0a8aac60",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "26.714285714285715"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compute the average temperature of lahore city only\n",
    "df1[df1['city'] == 'lahore' ].temperature.agg('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "1daeeda3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>day</th>\n",
       "      <th>city</th>\n",
       "      <th>temperature</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Monday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tuesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Wednesday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Thursday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Friday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Saturday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Sunday</td>\n",
       "      <td>karachi</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          day     city  temperature\n",
       "7      Monday  karachi           75\n",
       "8     Tuesday  karachi           77\n",
       "9   Wednesday  karachi           75\n",
       "10   Thursday  karachi           82\n",
       "11     Friday  karachi           83\n",
       "12   Saturday  karachi           81\n",
       "13     Sunday  karachi           77"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1[df1['city'] == 'karachi' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "d3a26e60",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "78.57142857142857"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compute the average temperature of karachi city only\n",
    "df1[df1['city'] == 'karachi' ].temperature.agg('mean')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22264aa4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b282f89",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf552c05",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b76e497",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "53e880bc",
   "metadata": {},
   "source": [
    "## 3. Reshaping Data Using `df.crosstab()` Method\n",
    "\n",
    "- The `pd.crosstab()` method is also used for data restructuring and reshaping.\n",
    "- It is normally used for quickly comparing categorical variables.\n",
    "- The cross table is also known as contingency table, which is a matrix type table that displays the (multivariate) frequency distribution of variables.\n",
    "```\n",
    "pandas.crosstab(index, \n",
    "                columns, \n",
    "                aggfunc=None,\n",
    "                values=None,\n",
    "                margins=False, \n",
    "                normalize=False)\n",
    "```\n",
    "Where,\n",
    "- `index`: array-like, Series, or list of arrays/Series (Values to group by in the rows)\n",
    "- `columns`: array-like, Series, or list of arrays/Series (Values to group by in the columns)\n",
    "- `values`: array-like, optional (Array of values to aggregate according to the factors. Requires aggfunc be specified)\n",
    "- `aggfunc`: function, optional If specified, requires values be specified as well.\n",
    "- `margins`: bool, default False, Add row/column margins (subtotals).\n",
    "- `normalize`: bool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False (Normalize by dividing all values by the sum of values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "2e20d349",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>gender</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Lahore</td>\n",
       "      <td>male</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Lahore</td>\n",
       "      <td>male</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Lahore</td>\n",
       "      <td>female</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lahore</td>\n",
       "      <td>female</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Lahore</td>\n",
       "      <td>female</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Karachi</td>\n",
       "      <td>male</td>\n",
       "      <td>66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Karachi</td>\n",
       "      <td>male</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Karachi</td>\n",
       "      <td>female</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Islamabad</td>\n",
       "      <td>female</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Islamabad</td>\n",
       "      <td>female</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Islamabad</td>\n",
       "      <td>male</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Islamabad</td>\n",
       "      <td>male</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Multan</td>\n",
       "      <td>male</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Multan</td>\n",
       "      <td>female</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         city  gender  age\n",
       "0      Lahore    male   35\n",
       "1      Lahore    male   40\n",
       "2      Lahore  female   70\n",
       "3      Lahore  female   25\n",
       "4      Lahore  female   33\n",
       "5     Karachi    male   66\n",
       "6     Karachi    male   29\n",
       "7     Karachi  female   24\n",
       "8   Islamabad  female   20\n",
       "9   Islamabad  female   10\n",
       "10  Islamabad    male   55\n",
       "11  Islamabad    male   15\n",
       "12     Multan    male   24\n",
       "13     Multan  female   65"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reading data from 'datasets/sample.csv' file\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "df = pd.read_csv('datasets/sample1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53566f10",
   "metadata": {},
   "source": [
    ">Suppose we want to get the frequency distribution of males and females. You pass `city` column as `index` argument and `gender` column as `columns` argument to the `pd.crosstab()` method. It returns a frequency table containing the male and female count in each city"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "a65f7ee3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Islamabad</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Multan</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender     female  male\n",
       "city                   \n",
       "Islamabad       2     2\n",
       "Karachi         1     2\n",
       "Lahore          3     2\n",
       "Multan          1     1"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(index=df.city, columns=df.gender)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c1c84be",
   "metadata": {},
   "source": [
    "> You can also get the count of total male and female in each city by setting `margins` attribute to `True`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "806150e6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Islamabad</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Multan</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender     female  male  All\n",
       "city                        \n",
       "Islamabad       2     2    4\n",
       "Karachi         1     2    3\n",
       "Lahore          3     2    5\n",
       "Multan          1     1    2\n",
       "All             7     7   14"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(index=df.city, columns=df.gender, margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fea0848",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48520a2c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "4465dc48",
   "metadata": {},
   "source": [
    ">Instead of getting frequencies in whole number you can also calculate the percentage of male and female in each city. For that you set the `normalize` argument to a value of `True`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "1b4bf00d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Islamabad</th>\n",
       "      <td>0.142857</td>\n",
       "      <td>0.142857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>0.071429</td>\n",
       "      <td>0.142857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>0.214286</td>\n",
       "      <td>0.142857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Multan</th>\n",
       "      <td>0.071429</td>\n",
       "      <td>0.071429</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender       female      male\n",
       "city                         \n",
       "Islamabad  0.142857  0.142857\n",
       "Karachi    0.071429  0.142857\n",
       "Lahore     0.214286  0.142857\n",
       "Multan     0.071429  0.071429"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(index=df.city, columns=df.gender, normalize=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16a5ea1a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5de11c86",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b88e5189",
   "metadata": {},
   "source": [
    ">Suppose you want to get the average age of male and female in different cities. To achieve this, set the `values` argument to `age` column, and pass the appropariate aggregate function to the `aggfunc` argument"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "d10ad158",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>gender</th>\n",
       "      <th>female</th>\n",
       "      <th>male</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>city</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Islamabad</th>\n",
       "      <td>15.000000</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Karachi</th>\n",
       "      <td>24.000000</td>\n",
       "      <td>47.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lahore</th>\n",
       "      <td>42.666667</td>\n",
       "      <td>37.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Multan</th>\n",
       "      <td>65.000000</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "gender        female  male\n",
       "city                      \n",
       "Islamabad  15.000000  35.0\n",
       "Karachi    24.000000  47.5\n",
       "Lahore     42.666667  37.5\n",
       "Multan     65.000000  24.0"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(index=df.city, columns=df.gender, values=df.age, aggfunc=np.mean)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "adba2813",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e2d5a45",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0012883b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f2486de",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}