{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 3.19 (Pandas-11)

" ] }, { "cell_type": "markdown", "id": "f4098342", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "## _Reshape using pivot, melt, and crosstab.ipynb_\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "4e5a6724", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5ee97eba", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1d04e9fa", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5f6be983", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "12db95e1", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "\n", "1. Reshape Data Using `pivot()` and `pivot_table()` methods\n", "2. Reshape Data Using `melt()` method\n", "3. Reshape Data Using `crosstab()` method" ] }, { "cell_type": "code", "execution_count": null, "id": "0e3a8656", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "386eb62a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0bd3b24b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e17eebda", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8c42bc83", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "98eb8c30", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9f90af46", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3d7cef08", "metadata": {}, "source": [ "## 1. Reshaping Data Using `df.pivot()` and `df.pivot_table()` Methods" ] }, { "cell_type": "markdown", "id": "65f584ea", "metadata": {}, "source": [ "**```df.pivot(index=None, columns=None, values=None)```**
\n", "**```pandas.pivot(data, index=None, columns=None, values=None)```**\n", "\n", "Where,\n", "- `index`: Column to use as new dataframe's index. If None, uses existing index.\n", "- `columns`: Column to use to make new dataframe columns.\n", "- `values`: Column(s) to use for populating new frame's values. \n", "\n", "Read more about `pd.pivot()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html\n", "\n", "Read more about `df.pivot()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html" ] }, { "cell_type": "code", "execution_count": null, "id": "d8f3bc2c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a4cfabf7", "metadata": {}, "source": [ "**`df.pivot_table(index=None, columns=None, values=None, aggfunc= 'mean', fill_valus=None)`**
\n", "**`pandas.pivot_table(data, index=None, columns=None, values=None, aggfunc= 'mean', fill_valus=None)`**\n", "\n", "Where,\n", "- `index`: Column to use as new dataframe's index. If None, uses existing index.\n", "- `columns`: Column to use to make new dataframe columns.\n", "- `values`: Column(s) to use for populating new frame's values. \n", "- `aggfunc`: default is numpy.mean\n", "- `fill_value`: Value to replace missing values with\n", "\n", "\n", "\n", "Read more about `pd.pivot_table()`: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html\n", "\n", "Read more about `df.pivot_table()`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html?highlight=dataframe%20pivot_table#pandas.DataFrame.pivot_table" ] }, { "cell_type": "code", "execution_count": null, "id": "6ddc0e09", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "30159c80", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3a7e18b6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4b727393", "metadata": {}, "source": [ "### DataSet 1:\n", "\n", "You can use both `pivot()` as well as `pivot_table()` methods over here\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "1d6ab6a1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d178b48b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7f722f49", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "id": "3be05ec7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperaturehumidity
020/06/2021Lahore3876
121/06/2021Lahore4180
222/06/2021Lahore3985
320/06/2021Muree1771
421/06/2021Muree1570
522/06/2021Muree1874
620/06/2021Karachi3393
721/06/2021Karachi3091
822/06/2021Karachi3590
\n", "
" ], "text/plain": [ " date city temperature humidity\n", "0 20/06/2021 Lahore 38 76\n", "1 21/06/2021 Lahore 41 80\n", "2 22/06/2021 Lahore 39 85\n", "3 20/06/2021 Muree 17 71\n", "4 21/06/2021 Muree 15 70\n", "5 22/06/2021 Muree 18 74\n", "6 20/06/2021 Karachi 33 93\n", "7 21/06/2021 Karachi 30 91\n", "8 22/06/2021 Karachi 35 90" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/pivot_weather1.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 11, "id": "09152536", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date20/06/202121/06/202122/06/2021
city
Karachi939190
Lahore768085
Muree717074
\n", "
" ], "text/plain": [ "date 20/06/2021 21/06/2021 22/06/2021\n", "city \n", "Karachi 93 91 90\n", "Lahore 76 80 85\n", "Muree 71 70 74" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index='city', columns='date', values='humidity')" ] }, { "cell_type": "code", "execution_count": null, "id": "c929fa6d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "106e2348", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "da60f9d0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9e4eac82", "metadata": {}, "source": [ ">**Suppose we want to have one record for each city, containing temperature and humidity for each date**" ] }, { "cell_type": "code", "execution_count": 3, "id": "ecdaafe6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temperaturehumidity
date20/06/202121/06/202122/06/202120/06/202121/06/202122/06/2021
city
Karachi333035939190
Lahore384139768085
Muree171518717074
\n", "
" ], "text/plain": [ " temperature humidity \n", "date 20/06/2021 21/06/2021 22/06/2021 20/06/2021 21/06/2021 22/06/2021\n", "city \n", "Karachi 33 30 35 93 91 90\n", "Lahore 38 41 39 76 80 85\n", "Muree 17 15 18 71 70 74" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using pivot()\n", "df1 = df.pivot(index='city', columns='date')\n", "df1" ] }, { "cell_type": "markdown", "id": "c87ddcf4", "metadata": {}, "source": [ ">**Let us repeat the same using `pivot_table()`**" ] }, { "cell_type": "code", "execution_count": 3, "id": "733834ef", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
humiditytemperature
date20/06/202121/06/202122/06/202120/06/202121/06/202122/06/2021
city
Karachi939190333035
Lahore768085384139
Muree717074171518
\n", "
" ], "text/plain": [ " humidity temperature \n", "date 20/06/2021 21/06/2021 22/06/2021 20/06/2021 21/06/2021 22/06/2021\n", "city \n", "Karachi 93 91 90 33 30 35\n", "Lahore 76 80 85 38 41 39\n", "Muree 71 70 74 17 15 18" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using pivot_table()\n", "df1 = df.pivot_table(index='city', columns='date')\n", "df1" ] }, { "cell_type": "markdown", "id": "80dc9a4c", "metadata": {}, "source": [ "- By setting the `index='city'`, the city column is the left most column now having unique values. \n", "- By setting the `columns='date'`, the values from the date column have become the column headers now.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4d7b59da", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8780aeb5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ef361057", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1a7ab5e5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "cb06eb2b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b3b2b092", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "70d95b45", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0b80f036", "metadata": {}, "source": [ "**Suppose we want to see only the temperature or humidity column in the output dataframe. This can be achieved by setting the `values` argument to the name of the column**" ] }, { "cell_type": "code", "execution_count": 4, "id": "be6e29d0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date20/06/202121/06/202122/06/2021
city
Karachi333035
Lahore384139
Muree171518
\n", "
" ], "text/plain": [ "date 20/06/2021 21/06/2021 22/06/2021\n", "city \n", "Karachi 33 30 35\n", "Lahore 38 41 39\n", "Muree 17 15 18" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot(index='city', columns='date', values='temperature')\n", "df1" ] }, { "cell_type": "code", "execution_count": 5, "id": "f2c67520", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date20/06/202121/06/202122/06/2021
city
Karachi939190
Lahore768085
Muree717074
\n", "
" ], "text/plain": [ "date 20/06/2021 21/06/2021 22/06/2021\n", "city \n", "Karachi 93 91 90\n", "Lahore 76 80 85\n", "Muree 71 70 74" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot(index='city', columns='date', values='humidity')\n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "3d1b2dff", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8c45fb62", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "51238d81", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9cd700b7", "metadata": {}, "source": [ "**Let us keep the date along index and city at the column, so that the output dataframe should have one record for each date, containing temperature and humidity for each city**" ] }, { "cell_type": "code", "execution_count": 6, "id": "dd1dbaef", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temperaturehumidity
cityKarachiLahoreMureeKarachiLahoreMuree
date
20/06/2021333817937671
21/06/2021304115918070
22/06/2021353918908574
\n", "
" ], "text/plain": [ " temperature humidity \n", "city Karachi Lahore Muree Karachi Lahore Muree\n", "date \n", "20/06/2021 33 38 17 93 76 71\n", "21/06/2021 30 41 15 91 80 70\n", "22/06/2021 35 39 18 90 85 74" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using pivot()\n", "df1 = df.pivot(index='date', columns='city')\n", "df1" ] }, { "cell_type": "code", "execution_count": 7, "id": "bef120f4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
humiditytemperature
cityKarachiLahoreMureeKarachiLahoreMuree
date
20/06/2021937671333817
21/06/2021918070304115
22/06/2021908574353918
\n", "
" ], "text/plain": [ " humidity temperature \n", "city Karachi Lahore Muree Karachi Lahore Muree\n", "date \n", "20/06/2021 93 76 71 33 38 17\n", "21/06/2021 91 80 70 30 41 15\n", "22/06/2021 90 85 74 35 39 18" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# using pivot_table()\n", "df1 = df.pivot_table(index='date', columns='city')\n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "67a364c0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "656bdfd0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "17e8d7ce", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7ad0fa1f", "metadata": {}, "source": [ "### DataSet 2:\n", "You cannot use `pivot()` due to multiple values, however, `pivot_table()` will work, as it will take the `mean()` of those values\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "bdf8a311", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a8b09363", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4f6af2c7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "id": "7df8d4c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperaturehumidity
020/06/2021Lahore3876
120/06/2021Lahore4075
221/06/2021Lahore3979
321/06/2021Lahore3774
420/06/2021Muree1588
520/06/2021Muree1790
621/06/2021Muree1093
721/06/2021Muree891
\n", "
" ], "text/plain": [ " date city temperature humidity\n", "0 20/06/2021 Lahore 38 76\n", "1 20/06/2021 Lahore 40 75\n", "2 21/06/2021 Lahore 39 79\n", "3 21/06/2021 Lahore 37 74\n", "4 20/06/2021 Muree 15 88\n", "5 20/06/2021 Muree 17 90\n", "6 21/06/2021 Muree 10 93\n", "7 21/06/2021 Muree 8 91" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/pivot_weather2.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "17b47a4a", "metadata": {}, "source": [ ">Note in this dataset we donot have unique values for date and city combined " ] }, { "cell_type": "code", "execution_count": 17, "id": "da8298eb", "metadata": {}, "outputs": [], "source": [ "#df1 = df.pivot(index='date', columns='city')\n", "#df1" ] }, { "cell_type": "markdown", "id": "a5aca275", "metadata": {}, "source": [ "- When we set the index to `date` and columns to `city`, the `pivot()` tries to set the left key to `20/06/2021` and then match the column name of the differing city (Lahore) values. \n", "- In this case there are two rows which have `20/06/2021` and columns of `Lahore`. The function doesn't know what value to put into cell values. \n", "- So raise a ValueError: Index contains duplicate entries, cannot reshape\n", "- Pivot and pivot_table may only exhibit the same functionality if the data allows. If there are duplicate entries possible from the index(es) of interest you will need to aggregate the data in pivot_table, not pivot (due to duplicate error).\n" ] }, { "cell_type": "markdown", "id": "4d017c0c", "metadata": {}, "source": [ "- Let us try to do the same using `pivot_table()` method\n", "- In the `pivot_table` function, there is another argument `aggfunc=’mean’` that decides this." ] }, { "cell_type": "code", "execution_count": 11, "id": "9e553c54", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
humiditytemperature
cityLahoreMureeLahoreMuree
date
20/06/202175.589.03916
21/06/202176.592.0389
\n", "
" ], "text/plain": [ " humidity temperature \n", "city Lahore Muree Lahore Muree\n", "date \n", "20/06/2021 75.5 89.0 39 16\n", "21/06/2021 76.5 92.0 38 9" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot_table(index='date', columns='city')\n", "df1" ] }, { "cell_type": "markdown", "id": "b5cce50d", "metadata": {}, "source": [ ">The default value to the `aggfunc` argument is `mean`, and you can explicitly pass any other aggregate function name." ] }, { "cell_type": "code", "execution_count": 12, "id": "9608dfb8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
humiditytemperature
cityLahoreMureeLahoreMuree
date
20/06/20211511787832
21/06/20211531847618
\n", "
" ], "text/plain": [ " humidity temperature \n", "city Lahore Muree Lahore Muree\n", "date \n", "20/06/2021 151 178 78 32\n", "21/06/2021 153 184 76 18" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot_table(index='date', columns='city', aggfunc='sum')\n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "2b0cca30", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "eed40835", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "06702a97", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "70942f23", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e387825e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ee284d05", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9f084bcd", "metadata": {}, "source": [ "### DataSet 3:\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "f2a7f4dc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "fbd6f893", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "abd24ed7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "id": "fcb47b09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gendersportageheightweight
0malecricket2272200
1femalecricket2172130
2femalebasketball2373150
3malebasketball2175175
4femalecricket2068170
\n", "
" ], "text/plain": [ " gender sport age height weight\n", "0 male cricket 22 72 200\n", "1 female cricket 21 72 130\n", "2 female basketball 23 73 150\n", "3 male basketball 21 75 175\n", "4 female cricket 20 68 170" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/pivot_std1.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "b45397f6", "metadata": {}, "source": [ "**Suppose we want to have one record for each gender, containing age, height and weight for each sport**" ] }, { "cell_type": "code", "execution_count": 14, "id": "158af4d5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageheightweight
sportbasketballcricketbasketballcricketbasketballcricket
gender
female23.020.57370150150
male21.022.07572175200
\n", "
" ], "text/plain": [ " age height weight \n", "sport basketball cricket basketball cricket basketball cricket\n", "gender \n", "female 23.0 20.5 73 70 150 150\n", "male 21.0 22.0 75 72 175 200" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot_table(index='gender', columns='sport')\n", "df1" ] }, { "cell_type": "markdown", "id": "c90ba62a", "metadata": {}, "source": [ "When we try to repeat the same using `pivot()`, we get a ValueError: Index contains duplicate entries, cannot reshape" ] }, { "cell_type": "code", "execution_count": null, "id": "d78d36f1", "metadata": {}, "outputs": [], "source": [ "#df1 = df.pivot(index='gender', columns='sport')\n", "#df1" ] }, { "cell_type": "markdown", "id": "b84c0910", "metadata": {}, "source": [ "- When we set the index to `gender` and columns to `sport`, the `pivot()`\n", "- In this case there are two rows which have `female` and play `basketball`. \n", "- The `pivot()` function doesn't know what value to put into cell values. \n", "- So raise a ValueError: Index contains duplicate entries, cannot reshape\n", "- The `pivot_table()` method use the default `aggfunc=’mean’` argument to decide this." ] }, { "cell_type": "markdown", "id": "a3c1711b", "metadata": {}, "source": [ "**Use of margins argument to `pivot_table()` method**" ] }, { "cell_type": "code", "execution_count": 15, "id": "bc964838", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageheightweight
sportbasketballcricketAllbasketballcricketAllbasketballcricketAll
gender
female23.020.521.3333337370.00000071.0150.0150.000000150.0
male21.022.021.5000007572.00000073.5175.0200.000000187.5
All22.021.021.4000007470.66666772.0162.5166.666667165.0
\n", "
" ], "text/plain": [ " age height weight \\\n", "sport basketball cricket All basketball cricket All basketball \n", "gender \n", "female 23.0 20.5 21.333333 73 70.000000 71.0 150.0 \n", "male 21.0 22.0 21.500000 75 72.000000 73.5 175.0 \n", "All 22.0 21.0 21.400000 74 70.666667 72.0 162.5 \n", "\n", " \n", "sport cricket All \n", "gender \n", "female 150.000000 150.0 \n", "male 200.000000 187.5 \n", "All 166.666667 165.0 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index='gender', columns='sport', margins=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "fe5a1d3f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3406e6fa", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "11548152", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "46676c68", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1412b518", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "127e7b32", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e5486840", "metadata": {}, "source": [ "### DataSet 4:\n", "In this dataset, since we have ...., so the `pivot()` method will flag an error as it donot know what out of the three values to place in the dataframe. However, the `pivot_table()` method will use some aggregation function to compute the value to be placed and will work fine....\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "ad72ad40", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8bfce001", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "cc2b97b9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 16, "id": "74c41892", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
uniq_idanimalwater_needspeed
01001elephant50020
11002elephant60025
21003elephant35029
31004tiger30060
41005tiger32065
51006tiger33070
61007tiger29069
71008tiger31072
81009zebra20075
91010zebra22077
101011zebra24072
111012zebra23080
121013zebra22082
131014zebra10079
141015zebra8076
151016lion42066
161017lion60067
171018lion50068
181019lion39064
191020kangaroo41019
201021kangaroo43022
211022kangaroo41017
\n", "
" ], "text/plain": [ " uniq_id animal water_need speed\n", "0 1001 elephant 500 20\n", "1 1002 elephant 600 25\n", "2 1003 elephant 350 29\n", "3 1004 tiger 300 60\n", "4 1005 tiger 320 65\n", "5 1006 tiger 330 70\n", "6 1007 tiger 290 69\n", "7 1008 tiger 310 72\n", "8 1009 zebra 200 75\n", "9 1010 zebra 220 77\n", "10 1011 zebra 240 72\n", "11 1012 zebra 230 80\n", "12 1013 zebra 220 82\n", "13 1014 zebra 100 79\n", "14 1015 zebra 80 76\n", "15 1016 lion 420 66\n", "16 1017 lion 600 67\n", "17 1018 lion 500 68\n", "18 1019 lion 390 64\n", "19 1020 kangaroo 410 19\n", "20 1021 kangaroo 430 22\n", "21 1022 kangaroo 410 17" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/waterneed.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "c1f7d727", "metadata": {}, "source": [ "- **The `pivot()` method requires atleast two arguments index and columns**\n", "\n", "- **The `pivot_table()` on the contrary can work on index argument only, the values place are using the mean aggregate function.**" ] }, { "cell_type": "code", "execution_count": 17, "id": "03f5be0d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speeduniq_idwater_need
animal
elephant24.6666671002.0483.333333
kangaroo19.3333331021.0416.666667
lion66.2500001017.5477.500000
tiger67.2000001006.0310.000000
zebra77.2857141012.0184.285714
\n", "
" ], "text/plain": [ " speed uniq_id water_need\n", "animal \n", "elephant 24.666667 1002.0 483.333333\n", "kangaroo 19.333333 1021.0 416.666667\n", "lion 66.250000 1017.5 477.500000\n", "tiger 67.200000 1006.0 310.000000\n", "zebra 77.285714 1012.0 184.285714" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.pivot_table(index='animal')\n", "df1" ] }, { "cell_type": "markdown", "id": "6c216c23", "metadata": {}, "source": [ "You can apply aggregation function on the new dataframe as well, such as compute the average speed" ] }, { "cell_type": "code", "execution_count": 18, "id": "0d4e2a62", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "50.94714285714285" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1['speed'].agg('mean')" ] }, { "cell_type": "code", "execution_count": 19, "id": "116b0c39", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "speed 50.947143\n", "water_need 374.357143\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can also perfrom aggragtion to summarize data\n", "df1[['speed','water_need']].agg('mean')" ] }, { "cell_type": "markdown", "id": "e64df88d", "metadata": {}, "source": [ "**Multilevel indexing** You can perfrom multi-level indexing by passing the columns as a list to index argument to `pivottable()`" ] }, { "cell_type": "code", "execution_count": 20, "id": "d7b447e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
speedwater_need
animaluniq_id
elephant100120500
100225600
100329350
kangaroo102019410
102122430
102217410
lion101666420
101767600
101868500
101964390
tiger100460300
100565320
100670330
100769290
100872310
zebra100975200
101077220
101172240
101280230
101382220
101479100
10157680
\n", "
" ], "text/plain": [ " speed water_need\n", "animal uniq_id \n", "elephant 1001 20 500\n", " 1002 25 600\n", " 1003 29 350\n", "kangaroo 1020 19 410\n", " 1021 22 430\n", " 1022 17 410\n", "lion 1016 66 420\n", " 1017 67 600\n", " 1018 68 500\n", " 1019 64 390\n", "tiger 1004 60 300\n", " 1005 65 320\n", " 1006 70 330\n", " 1007 69 290\n", " 1008 72 310\n", "zebra 1009 75 200\n", " 1010 77 220\n", " 1011 72 240\n", " 1012 80 230\n", " 1013 82 220\n", " 1014 79 100\n", " 1015 76 80" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index=['animal','uniq_id'])" ] }, { "cell_type": "code", "execution_count": null, "id": "b646be76", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8b213225", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ef9e9624", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "92c2eb09", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "07d8214e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7c3e1995", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cd95ac43", "metadata": {}, "source": [ "## 2. Reshaping Data Using `df.melt()` Method\n", "\n", "- Similar to `pivot()` and `pivot_table()`, Pandas `melt()` method is also used to transform or reshape data. \n", "- The `pd.melt()` method is used to change the DataFrame format from wide to long\n", "- The Pandas `pd.melt()` method is useful to reshape a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars). Its signature is:\n", "```\n", "pandas.melt(Dataframe, id_vars=None, value_vars=None, var_name=None, value_name='value',ignore_index=True)\n", "```\n", "Where,\n", "- `id_vars`: tuple, list, or ndarray, optional (Column(s) to use as identifier variables)\n", "- `value_var`: tuple, list, or ndarray, optional (If not specified, uses all columns that are not set as id_vars)\n", "- `var_name`: Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.\n", "- `value_name`: Name to use for the ‘value’ column.\n", "- `ignore_index`: bool, default True (If True, original index is ignored. If False, the original index is retained.)" ] }, { "cell_type": "code", "execution_count": null, "id": "40a69bd2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "29fd35a1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "fe29a636", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "761ce23e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5fa29f2e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "c3ba290f", "metadata": {}, "source": [ "### DataSet 1:\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "e492553e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3e0be1c1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ab8ace83", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "id": "0fecb36e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "28af4b5e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ee31757e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1659b507", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d6c9aa82", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3b12a1d4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 20, "id": "af2eef12", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daylahorekarachimurree
0Monday327541
1Tuesday307743
2Wednesday287545
3Thursday228238
4Friday308330
5Saturday208145
6Sunday257747
\n", "
" ], "text/plain": [ " day lahore karachi murree\n", "0 Monday 32 75 41\n", "1 Tuesday 30 77 43\n", "2 Wednesday 28 75 45\n", "3 Thursday 22 82 38\n", "4 Friday 30 83 30\n", "5 Saturday 20 81 45\n", "6 Sunday 25 77 47" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/weather.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 21, "id": "adcc07c8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dayvariablevalue
0Mondaylahore32
1Tuesdaylahore30
2Wednesdaylahore28
3Thursdaylahore22
4Fridaylahore30
5Saturdaylahore20
6Sundaylahore25
7Mondaykarachi75
8Tuesdaykarachi77
9Wednesdaykarachi75
10Thursdaykarachi82
11Fridaykarachi83
12Saturdaykarachi81
13Sundaykarachi77
14Mondaymurree41
15Tuesdaymurree43
16Wednesdaymurree45
17Thursdaymurree38
18Fridaymurree30
19Saturdaymurree45
20Sundaymurree47
\n", "
" ], "text/plain": [ " day variable value\n", "0 Monday lahore 32\n", "1 Tuesday lahore 30\n", "2 Wednesday lahore 28\n", "3 Thursday lahore 22\n", "4 Friday lahore 30\n", "5 Saturday lahore 20\n", "6 Sunday lahore 25\n", "7 Monday karachi 75\n", "8 Tuesday karachi 77\n", "9 Wednesday karachi 75\n", "10 Thursday karachi 82\n", "11 Friday karachi 83\n", "12 Saturday karachi 81\n", "13 Sunday karachi 77\n", "14 Monday murree 41\n", "15 Tuesday murree 43\n", "16 Wednesday murree 45\n", "17 Thursday murree 38\n", "18 Friday murree 30\n", "19 Saturday murree 45\n", "20 Sunday murree 47" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.melt(df, id_vars =['day'])\n", "df1" ] }, { "cell_type": "markdown", "id": "9eb8a582", "metadata": {}, "source": [ ">You can change the name of columns for example, replace the column `variable` and `value` with some meaningful names. like `city` and `temperature` using the `var_name` and `value_name` arguments of `melt()` method" ] }, { "cell_type": "code", "execution_count": 22, "id": "071c2703", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
0Mondaylahore32
1Tuesdaylahore30
2Wednesdaylahore28
3Thursdaylahore22
4Fridaylahore30
5Saturdaylahore20
6Sundaylahore25
7Mondaykarachi75
8Tuesdaykarachi77
9Wednesdaykarachi75
10Thursdaykarachi82
11Fridaykarachi83
12Saturdaykarachi81
13Sundaykarachi77
14Mondaymurree41
15Tuesdaymurree43
16Wednesdaymurree45
17Thursdaymurree38
18Fridaymurree30
19Saturdaymurree45
20Sundaymurree47
\n", "
" ], "text/plain": [ " day city temperature\n", "0 Monday lahore 32\n", "1 Tuesday lahore 30\n", "2 Wednesday lahore 28\n", "3 Thursday lahore 22\n", "4 Friday lahore 30\n", "5 Saturday lahore 20\n", "6 Sunday lahore 25\n", "7 Monday karachi 75\n", "8 Tuesday karachi 77\n", "9 Wednesday karachi 75\n", "10 Thursday karachi 82\n", "11 Friday karachi 83\n", "12 Saturday karachi 81\n", "13 Sunday karachi 77\n", "14 Monday murree 41\n", "15 Tuesday murree 43\n", "16 Wednesday murree 45\n", "17 Thursday murree 38\n", "18 Friday murree 30\n", "19 Saturday murree 45\n", "20 Sunday murree 47" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.melt(df, id_vars =['day'], var_name='city', value_name='temperature')\n", "df1" ] }, { "cell_type": "markdown", "id": "30eb27bb", "metadata": {}, "source": [ ">You can filter the rows of your choice using the `value_vars` argument of `melt()` method" ] }, { "cell_type": "code", "execution_count": 23, "id": "55f66c15", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
0Mondaylahore32
1Tuesdaylahore30
2Wednesdaylahore28
3Thursdaylahore22
4Fridaylahore30
5Saturdaylahore20
6Sundaylahore25
\n", "
" ], "text/plain": [ " day city temperature\n", "0 Monday lahore 32\n", "1 Tuesday lahore 30\n", "2 Wednesday lahore 28\n", "3 Thursday lahore 22\n", "4 Friday lahore 30\n", "5 Saturday lahore 20\n", "6 Sunday lahore 25" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.melt(df, id_vars = ['day'], value_vars =['lahore'], var_name='city', value_name='temperature')\n", "df2" ] }, { "cell_type": "code", "execution_count": 26, "id": "e075e2ca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
0Mondaykarachi75
1Tuesdaykarachi77
2Wednesdaykarachi75
3Thursdaykarachi82
4Fridaykarachi83
5Saturdaykarachi81
6Sundaykarachi77
\n", "
" ], "text/plain": [ " day city temperature\n", "0 Monday karachi 75\n", "1 Tuesday karachi 77\n", "2 Wednesday karachi 75\n", "3 Thursday karachi 82\n", "4 Friday karachi 83\n", "5 Saturday karachi 81\n", "6 Sunday karachi 77" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = pd.melt(df, id_vars =['day'], value_vars=['karachi'],var_name='city', value_name='temperature')\n", "df3" ] }, { "cell_type": "code", "execution_count": 26, "id": "6c4568e4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
7Mondaykarachi75
8Tuesdaykarachi77
9Wednesdaykarachi75
10Thursdaykarachi82
11Fridaykarachi83
12Saturdaykarachi81
13Sundaykarachi77
\n", "
" ], "text/plain": [ " day city temperature\n", "7 Monday karachi 75\n", "8 Tuesday karachi 77\n", "9 Wednesday karachi 75\n", "10 Thursday karachi 82\n", "11 Friday karachi 83\n", "12 Saturday karachi 81\n", "13 Sunday karachi 77" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can achieve the similar result by using Boolean indexing\n", "df1[df1['city'] == 'karachi']" ] }, { "cell_type": "code", "execution_count": null, "id": "2d6f94c9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e5246468", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5f79cd6c", "metadata": {}, "source": [ ">You can apply aggregation function on the new dataframe as well, such as compute the average temperatures" ] }, { "cell_type": "code", "execution_count": 27, "id": "b92af885", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
0Mondaylahore32
1Tuesdaylahore30
2Wednesdaylahore28
3Thursdaylahore22
4Fridaylahore30
5Saturdaylahore20
6Sundaylahore25
7Mondaykarachi75
8Tuesdaykarachi77
9Wednesdaykarachi75
10Thursdaykarachi82
11Fridaykarachi83
12Saturdaykarachi81
13Sundaykarachi77
14Mondaymurree41
15Tuesdaymurree43
16Wednesdaymurree45
17Thursdaymurree38
18Fridaymurree30
19Saturdaymurree45
20Sundaymurree47
\n", "
" ], "text/plain": [ " day city temperature\n", "0 Monday lahore 32\n", "1 Tuesday lahore 30\n", "2 Wednesday lahore 28\n", "3 Thursday lahore 22\n", "4 Friday lahore 30\n", "5 Saturday lahore 20\n", "6 Sunday lahore 25\n", "7 Monday karachi 75\n", "8 Tuesday karachi 77\n", "9 Wednesday karachi 75\n", "10 Thursday karachi 82\n", "11 Friday karachi 83\n", "12 Saturday karachi 81\n", "13 Sunday karachi 77\n", "14 Monday murree 41\n", "15 Tuesday murree 43\n", "16 Wednesday murree 45\n", "17 Thursday murree 38\n", "18 Friday murree 30\n", "19 Saturday murree 45\n", "20 Sunday murree 47" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 28, "id": "5c00f973", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "48.857142857142854" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compute the average temperature of entire dataframe\n", "df1['temperature'].agg('mean')" ] }, { "cell_type": "code", "execution_count": 29, "id": "10bbbede", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
0Mondaylahore32
1Tuesdaylahore30
2Wednesdaylahore28
3Thursdaylahore22
4Fridaylahore30
5Saturdaylahore20
6Sundaylahore25
\n", "
" ], "text/plain": [ " day city temperature\n", "0 Monday lahore 32\n", "1 Tuesday lahore 30\n", "2 Wednesday lahore 28\n", "3 Thursday lahore 22\n", "4 Friday lahore 30\n", "5 Saturday lahore 20\n", "6 Sunday lahore 25" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[df1['city'] == 'lahore' ]" ] }, { "cell_type": "code", "execution_count": 30, "id": "0a8aac60", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26.714285714285715" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compute the average temperature of lahore city only\n", "df1[df1['city'] == 'lahore' ].temperature.agg('mean')" ] }, { "cell_type": "code", "execution_count": 31, "id": "1daeeda3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
daycitytemperature
7Mondaykarachi75
8Tuesdaykarachi77
9Wednesdaykarachi75
10Thursdaykarachi82
11Fridaykarachi83
12Saturdaykarachi81
13Sundaykarachi77
\n", "
" ], "text/plain": [ " day city temperature\n", "7 Monday karachi 75\n", "8 Tuesday karachi 77\n", "9 Wednesday karachi 75\n", "10 Thursday karachi 82\n", "11 Friday karachi 83\n", "12 Saturday karachi 81\n", "13 Sunday karachi 77" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[df1['city'] == 'karachi' ]" ] }, { "cell_type": "code", "execution_count": 32, "id": "d3a26e60", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "78.57142857142857" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compute the average temperature of karachi city only\n", "df1[df1['city'] == 'karachi' ].temperature.agg('mean')" ] }, { "cell_type": "code", "execution_count": null, "id": "22264aa4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7b282f89", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "cf552c05", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2b76e497", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "53e880bc", "metadata": {}, "source": [ "## 3. Reshaping Data Using `df.crosstab()` Method\n", "\n", "- The `pd.crosstab()` method is also used for data restructuring and reshaping.\n", "- It is normally used for quickly comparing categorical variables.\n", "- The cross table is also known as contingency table, which is a matrix type table that displays the (multivariate) frequency distribution of variables.\n", "```\n", "pandas.crosstab(index, \n", " columns, \n", " aggfunc=None,\n", " values=None,\n", " margins=False, \n", " normalize=False)\n", "```\n", "Where,\n", "- `index`: array-like, Series, or list of arrays/Series (Values to group by in the rows)\n", "- `columns`: array-like, Series, or list of arrays/Series (Values to group by in the columns)\n", "- `values`: array-like, optional (Array of values to aggregate according to the factors. Requires aggfunc be specified)\n", "- `aggfunc`: function, optional If specified, requires values be specified as well.\n", "- `margins`: bool, default False, Add row/column margins (subtotals).\n", "- `normalize`: bool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False (Normalize by dividing all values by the sum of values)" ] }, { "cell_type": "code", "execution_count": 33, "id": "2e20d349", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
citygenderage
0Lahoremale35
1Lahoremale40
2Lahorefemale70
3Lahorefemale25
4Lahorefemale33
5Karachimale66
6Karachimale29
7Karachifemale24
8Islamabadfemale20
9Islamabadfemale10
10Islamabadmale55
11Islamabadmale15
12Multanmale24
13Multanfemale65
\n", "
" ], "text/plain": [ " city gender age\n", "0 Lahore male 35\n", "1 Lahore male 40\n", "2 Lahore female 70\n", "3 Lahore female 25\n", "4 Lahore female 33\n", "5 Karachi male 66\n", "6 Karachi male 29\n", "7 Karachi female 24\n", "8 Islamabad female 20\n", "9 Islamabad female 10\n", "10 Islamabad male 55\n", "11 Islamabad male 15\n", "12 Multan male 24\n", "13 Multan female 65" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reading data from 'datasets/sample.csv' file\n", "import numpy as np\n", "import pandas as pd\n", "df = pd.read_csv('datasets/sample1.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "53566f10", "metadata": {}, "source": [ ">Suppose we want to get the frequency distribution of males and females. You pass `city` column as `index` argument and `gender` column as `columns` argument to the `pd.crosstab()` method. It returns a frequency table containing the male and female count in each city" ] }, { "cell_type": "code", "execution_count": 34, "id": "a65f7ee3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderfemalemale
city
Islamabad22
Karachi12
Lahore32
Multan11
\n", "
" ], "text/plain": [ "gender female male\n", "city \n", "Islamabad 2 2\n", "Karachi 1 2\n", "Lahore 3 2\n", "Multan 1 1" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(index=df.city, columns=df.gender)" ] }, { "cell_type": "markdown", "id": "3c1c84be", "metadata": {}, "source": [ "> You can also get the count of total male and female in each city by setting `margins` attribute to `True`" ] }, { "cell_type": "code", "execution_count": 35, "id": "806150e6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderfemalemaleAll
city
Islamabad224
Karachi123
Lahore325
Multan112
All7714
\n", "
" ], "text/plain": [ "gender female male All\n", "city \n", "Islamabad 2 2 4\n", "Karachi 1 2 3\n", "Lahore 3 2 5\n", "Multan 1 1 2\n", "All 7 7 14" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(index=df.city, columns=df.gender, margins=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "8fea0848", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "48520a2c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4465dc48", "metadata": {}, "source": [ ">Instead of getting frequencies in whole number you can also calculate the percentage of male and female in each city. For that you set the `normalize` argument to a value of `True`" ] }, { "cell_type": "code", "execution_count": 36, "id": "1b4bf00d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderfemalemale
city
Islamabad0.1428570.142857
Karachi0.0714290.142857
Lahore0.2142860.142857
Multan0.0714290.071429
\n", "
" ], "text/plain": [ "gender female male\n", "city \n", "Islamabad 0.142857 0.142857\n", "Karachi 0.071429 0.142857\n", "Lahore 0.214286 0.142857\n", "Multan 0.071429 0.071429" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(index=df.city, columns=df.gender, normalize=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "16a5ea1a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5de11c86", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "b88e5189", "metadata": {}, "source": [ ">Suppose you want to get the average age of male and female in different cities. To achieve this, set the `values` argument to `age` column, and pass the appropariate aggregate function to the `aggfunc` argument" ] }, { "cell_type": "code", "execution_count": 37, "id": "d10ad158", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderfemalemale
city
Islamabad15.00000035.0
Karachi24.00000047.5
Lahore42.66666737.5
Multan65.00000024.0
\n", "
" ], "text/plain": [ "gender female male\n", "city \n", "Islamabad 15.000000 35.0\n", "Karachi 24.000000 47.5\n", "Lahore 42.666667 37.5\n", "Multan 65.000000 24.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(index=df.city, columns=df.gender, values=df.age, aggfunc=np.mean)" ] }, { "cell_type": "code", "execution_count": null, "id": "adba2813", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3e2d5a45", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0012883b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2f2486de", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }