{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 3.17 (Pandas-09)

" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "## _Aggregating and Grouping Dataframes.ipynb_" ] }, { "cell_type": "code", "execution_count": null, "id": "07d1d815", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0acb63e8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "87d7b36c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f284fcde", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9f676117", "metadata": {}, "source": [ "

Instructor: Muhammad Arif Butt, Ph.D.

\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "f2a42950", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1a434dac", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c736f024", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a4a53918", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "36ad40d6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "12db95e1", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "1. Overview of Aggregation Functions and the `agg()` method\n", " - Applying a Built-in Aggregation Function on Entire Dataframe Object\n", " - Applying a Built-in Aggregation Function on a Series Object\n", " - Applying a User-Defined/Lambda Function on a Series Object

\n", "2. Computing the Minimum Temperature of each City using **hard way**

\n", "3. Computing the Minimum Temperature of each City using **`groupby`**

\n", "4. Practice GroupBy on Stack Overflow Survey Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "db17aec4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5e54b5d6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6ab0f3d7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "63d2431e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d0a2d89b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1664cd8f", "metadata": {}, "source": [ "## 1. Overview of Aggregation Functions and the `agg()` Method\n", "- An aggregation function is one which takes multiple individual values and returns a result." ] }, { "cell_type": "code", "execution_count": 1, "id": "298d74ce", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperaturehumidity
001/01/2022lahore860
102/01/2022lahore1058
203/01/2022lahore551
304/01/2022lahore649
405/01/2022lahore1254
501/01/2022karachi1874
602/01/2022karachi1071
703/01/2022karachi1278
804/01/2022karachi1576
905/01/2022karachi1670
1001/01/2022murree-561
1102/01/2022murree-368
1203/01/2022murree-469
1304/01/2022murree-163
1405/01/2022murree-772
\n", "
" ], "text/plain": [ " date city temperature humidity\n", "0 01/01/2022 lahore 8 60\n", "1 02/01/2022 lahore 10 58\n", "2 03/01/2022 lahore 5 51\n", "3 04/01/2022 lahore 6 49\n", "4 05/01/2022 lahore 12 54\n", "5 01/01/2022 karachi 18 74\n", "6 02/01/2022 karachi 10 71\n", "7 03/01/2022 karachi 12 78\n", "8 04/01/2022 karachi 15 76\n", "9 05/01/2022 karachi 16 70\n", "10 01/01/2022 murree -5 61\n", "11 02/01/2022 murree -3 68\n", "12 03/01/2022 murree -4 69\n", "13 04/01/2022 murree -1 63\n", "14 05/01/2022 murree -7 72" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/groupbydata2.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "71f119a8", "metadata": {}, "source": [ "### a. Applying a Built-in Aggregation Function on Entire Dataframe Object" ] }, { "cell_type": "code", "execution_count": 2, "id": "3e9650a9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "date 01/01/2022\n", "city karachi\n", "temperature -7\n", "humidity 49\n", "dtype: object" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.min()" ] }, { "cell_type": "code", "execution_count": 3, "id": "7e5e6958", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "date 15\n", "city 15\n", "temperature 15\n", "humidity 15\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "code", "execution_count": 4, "id": "31839411", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/1t/g3ylw8h50cjdqmk5d6jh1qmm0000gn/T/ipykernel_3651/2026153972.py:2: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.\n", " df.median()\n" ] }, { "data": { "text/plain": [ "temperature 8.0\n", "humidity 68.0\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Should be applied to numeric columns only, may raise a warning\n", "df.median()" ] }, { "cell_type": "code", "execution_count": 5, "id": "fa0eb1f9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "temperature 8.0\n", "humidity 68.0\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.median(numeric_only=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "c41a2003", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "bd9f3067", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "b29bf1c2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3655614e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8409ebd7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0a80e737", "metadata": {}, "source": [ "> We can call the `agg()` method on the dataframe to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings." ] }, { "cell_type": "code", "execution_count": 7, "id": "875e29de", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperaturehumidity
min01/01/2022karachi-749
max05/01/2022murree1878
count15151515
\n", "
" ], "text/plain": [ " date city temperature humidity\n", "min 01/01/2022 karachi -7 49\n", "max 05/01/2022 murree 18 78\n", "count 15 15 15 15" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.agg(['min', 'max', 'count'])" ] }, { "cell_type": "code", "execution_count": null, "id": "ab3e6286", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a58a2e02", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f7624ba3", "metadata": {}, "source": [ "> We can call the `describe()` method on the dataframe to get descriptive statistical measures on all its numeric columns." ] }, { "cell_type": "code", "execution_count": 8, "id": "a26b0208", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temperaturehumidity
count15.00000015.000000
mean6.13333364.933333
std8.2537159.153194
min-7.00000049.000000
25%-2.00000059.000000
50%8.00000068.000000
75%12.00000071.500000
max18.00000078.000000
\n", "
" ], "text/plain": [ " temperature humidity\n", "count 15.000000 15.000000\n", "mean 6.133333 64.933333\n", "std 8.253715 9.153194\n", "min -7.000000 49.000000\n", "25% -2.000000 59.000000\n", "50% 8.000000 68.000000\n", "75% 12.000000 71.500000\n", "max 18.000000 78.000000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": null, "id": "98f30e56", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c6ceb80e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "264cd891", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "1b6edfe8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "29818e96", "metadata": {}, "source": [ "### b. Applying a Built-in Aggregation Function on a Series Object" ] }, { "cell_type": "code", "execution_count": 9, "id": "111bef4b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-7" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['temperature'].min()" ] }, { "cell_type": "code", "execution_count": 10, "id": "22606fba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['temperature'].max()" ] }, { "cell_type": "code", "execution_count": 11, "id": "709429fa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6.133333333333334" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['temperature'].mean()" ] }, { "cell_type": "markdown", "id": "b131b9b5", "metadata": {}, "source": [ "> We can call the `agg()` method on a series to apply multiple aggregation functions at a time, by passing the `agg()` function a list of aggregation functions as strings." ] }, { "cell_type": "code", "execution_count": null, "id": "4bea6bc9", "metadata": {}, "outputs": [], "source": [ "df['temperature'].agg(['min', 'max', 'mean', 'count'])" ] }, { "cell_type": "markdown", "id": "07ad4b62", "metadata": {}, "source": [ "> We can call the `describe()` method on the dataframe to get descriptive statistical measures on all its numeric columns." ] }, { "cell_type": "code", "execution_count": null, "id": "80d38d82", "metadata": {}, "outputs": [], "source": [ "df['temperature'].describe()" ] }, { "cell_type": "code", "execution_count": null, "id": "8e72c885", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "86a71e02", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "39a48d79", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2d6f7d77", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4d354a5b", "metadata": {}, "source": [ "### c. Applying a User-Defined/Lambda Function on a Series Object using the `apply()` Method\n", "- We have used this `apply()` method before as well that is used to invoke function on values of Series and return a resulting series." ] }, { "cell_type": "code", "execution_count": 12, "id": "8647021a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 8\n", "1 10\n", "2 5\n", "3 6\n", "4 12\n", "5 18\n", "6 10\n", "7 12\n", "8 15\n", "9 16\n", "10 -5\n", "11 -3\n", "12 -4\n", "13 -1\n", "14 -7\n", "Name: temperature, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.temperature" ] }, { "cell_type": "code", "execution_count": 13, "id": "ffd978e9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 46.4\n", "1 50.0\n", "2 41.0\n", "3 42.8\n", "4 53.6\n", "5 64.4\n", "6 50.0\n", "7 53.6\n", "8 59.0\n", "9 60.8\n", "10 23.0\n", "11 26.6\n", "12 24.8\n", "13 30.2\n", "14 19.4\n", "Name: temperature, dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def ctof(x):\n", " return x*9/5+32\n", "\n", "df.temperature.apply(ctof)" ] }, { "cell_type": "code", "execution_count": null, "id": "ec10b7f9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e97d6d8f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "253bc061", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9a73e2af", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f4c6948e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 14, "id": "b427ebcd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 46.4\n", "1 50.0\n", "2 41.0\n", "3 42.8\n", "4 53.6\n", "5 64.4\n", "6 50.0\n", "7 53.6\n", "8 59.0\n", "9 60.8\n", "10 23.0\n", "11 26.6\n", "12 24.8\n", "13 30.2\n", "14 19.4\n", "Name: temperature, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.temperature.apply(lambda x: x*9/5+32)" ] }, { "cell_type": "code", "execution_count": null, "id": "d7d0bbd8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a89c671b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "eb0141d9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "afd6b8ba", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d9851993", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1253cf51", "metadata": {}, "source": [ "# How to Compute the Minimum Temperature of Each City?" ] }, { "cell_type": "markdown", "id": "c21a3258", "metadata": {}, "source": [ "## 2. Doing it the Hard Way\n", "" ] }, { "cell_type": "code", "execution_count": 15, "id": "443db1c5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
001/01/2022lahore8
102/01/2022lahore10
203/01/2022lahore5
304/01/2022lahore6
405/01/2022lahore12
501/01/2022karachi18
602/01/2022karachi10
703/01/2022karachi12
804/01/2022karachi15
905/01/2022karachi16
1001/01/2022murree-5
1102/01/2022murree-3
1203/01/2022murree-4
1304/01/2022murree-1
1405/01/2022murree-7
\n", "
" ], "text/plain": [ " date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/groupbydata1.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "eb304630", "metadata": {}, "source": [ "### a. Splitting the Dataframe\n", "- We need to use conditional selection technique, in which we pass a Boolean mask for the appropriate city column to be selected. Can do it using two ways:\n", " - Using `df[]` subscript operator\n", " - Using `df.loc` method" ] }, { "cell_type": "code", "execution_count": 16, "id": "8bc2737e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
501/01/2022karachi18
602/01/2022karachi10
703/01/2022karachi12
804/01/2022karachi15
905/01/2022karachi16
\n", "
" ], "text/plain": [ " date city temperature\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['city']=='karachi']" ] }, { "cell_type": "code", "execution_count": 17, "id": "9d4410d9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
001/01/2022lahore8
102/01/2022lahore10
203/01/2022lahore5
304/01/2022lahore6
405/01/2022lahore12
\n", "
" ], "text/plain": [ " date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['city']=='lahore']\n", "df.loc[df.city=='lahore', :]" ] }, { "cell_type": "code", "execution_count": 18, "id": "43a90d9a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
501/01/2022karachi18
602/01/2022karachi10
703/01/2022karachi12
804/01/2022karachi15
905/01/2022karachi16
\n", "
" ], "text/plain": [ " date city temperature\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['city']=='karachi']\n", "df.loc[df.city=='karachi', :]" ] }, { "cell_type": "code", "execution_count": 19, "id": "2fe134f3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
1001/01/2022murree-5
1102/01/2022murree-3
1203/01/2022murree-4
1304/01/2022murree-1
1405/01/2022murree-7
\n", "
" ], "text/plain": [ " date city temperature\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['city']=='murree']\n", "df.loc[df.city=='murree', :]" ] }, { "cell_type": "markdown", "id": "531be356", "metadata": {}, "source": [ ">**Limitation:**\n", ">- We have to repeat this process for every city separately.\n", ">- What if there are over 100 cities in the dataset?" ] }, { "cell_type": "code", "execution_count": null, "id": "9c11acbb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2510868f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a552b89a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c0bf46fe", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3b691295", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "aafece6c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e6a35e53", "metadata": {}, "source": [ "### b. Applying the `min()` Function\n", "- We need to apply the `min()` function on the temperature column of all of the above dataframes separately" ] }, { "cell_type": "code", "execution_count": null, "id": "98d13252", "metadata": {}, "outputs": [], "source": [ "df.loc[df.city=='lahore', :].temperature.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "3a9b7261", "metadata": {}, "outputs": [], "source": [ "df.loc[df.city=='lahore', :].temperature.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "2f36327e", "metadata": {}, "outputs": [], "source": [ "df.loc[df.city=='karachi', :].temperature.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "9e76a7dd", "metadata": {}, "outputs": [], "source": [ "df.loc[df.city=='murree', :].temperature.min()" ] }, { "cell_type": "markdown", "id": "6573946e", "metadata": {}, "source": [ ">**Limitation:**\n", ">- We have to repeat this process for every city separately.\n", ">- What if there are over 100 cities in the dataset?" ] }, { "cell_type": "code", "execution_count": null, "id": "078a3e5c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "adc56aef", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "49071bbc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5647b406", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "84b892aa", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3bc149d5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8d980d9c", "metadata": {}, "source": [ "### c. Combining the Result\n", "- Since, we have got the minimum temperature of all the cities, we need to combine them to an appropriate series object to be used for later processing." ] }, { "cell_type": "code", "execution_count": null, "id": "67ff4a19", "metadata": {}, "outputs": [], "source": [ "lhr = df.loc[df.city=='lahore', :].temperature.min()\n", "kci = df.loc[df.city=='karachi', :].temperature.min()\n", "murree = df.loc[df.city=='murree', :].temperature.min()\n", "\n", "s = pd.Series(data=[lhr, kci, murree], index=['L_min', 'K_min', 'M_min'] )\n", "s.name= 'Min Temperatures'\n", "s" ] }, { "cell_type": "code", "execution_count": null, "id": "59a794d2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f857f7dd", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "30bc4bdc", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8eca9f7d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "440a8b2f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c14e695f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c2ba09af", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3f7f9891", "metadata": {}, "source": [ "# How to Compute the Minimum Temperature of Each City?" ] }, { "cell_type": "markdown", "id": "62fb733d", "metadata": {}, "source": [ "## 3. An Elegant Way\n", "" ] }, { "cell_type": "code", "execution_count": 20, "id": "08b3cae2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
001/01/2022lahore8
102/01/2022lahore10
203/01/2022lahore5
304/01/2022lahore6
405/01/2022lahore12
501/01/2022karachi18
602/01/2022karachi10
703/01/2022karachi12
804/01/2022karachi15
905/01/2022karachi16
1001/01/2022murree-5
1102/01/2022murree-3
1203/01/2022murree-4
1304/01/2022murree-1
1405/01/2022murree-7
\n", "
" ], "text/plain": [ " date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/groupbydata1.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "93f02e73", "metadata": {}, "source": [ "### a. Step 1: Split Step\n", "- In the split step we divide the data inside the dataframe into multiple groups\n", "- Since we need to calculate the minimum temperature of each city, therefore, we will use `groupby()` method on the `city` column of the dataframe.\n", "- This will result a DataFrameGroupBy object, which is an iterable containing multiple small dataframes based on the `by` argument passed to the `groupby()` method" ] }, { "cell_type": "code", "execution_count": 25, "id": "b0ea5d79", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb = df.groupby('city')\n", "dfgb" ] }, { "cell_type": "code", "execution_count": null, "id": "52a0e61a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "46dd2666", "metadata": {}, "source": [ ">- Since this is an iterable, so let us iterate :)" ] }, { "cell_type": "code", "execution_count": 26, "id": "cc1b1019", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('karachi', date city temperature\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16)\n", "('lahore', date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12)\n", "('murree', date city temperature\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7)\n" ] } ], "source": [ "for mydf in dfgb:\n", " print(mydf)" ] }, { "cell_type": "code", "execution_count": null, "id": "bdce1713", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f00f923a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2c824665", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f2429cb1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "da049f81", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6b81eb53", "metadata": {}, "source": [ ">- To display indices of every group in the dataframe, use `groups` attribute of `DataFrameGroupBy` object.\n", ">- Returns a Dictionary object (PrettyDict) with keys as the group value and value as list of corresponding row indices" ] }, { "cell_type": "code", "execution_count": 27, "id": "1de745f0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'karachi': [5, 6, 7, 8, 9], 'lahore': [0, 1, 2, 3, 4], 'murree': [10, 11, 12, 13, 14]}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb.groups # df.groupby('city').groups" ] }, { "cell_type": "code", "execution_count": null, "id": "00e3974a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "64e34bf9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "48f40627", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "444b9ae2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2fd7fc5e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d4c2271f", "metadata": {}, "source": [ ">- To display records of a specific group, use `get_group()` method on `DataFrameGroupBy` object.\n", ">- Construct and return DataFrame from `DataFrameGroupBy` object with provided name." ] }, { "cell_type": "code", "execution_count": 29, "id": "cbfa0a16", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
1001/01/2022murree-5
1102/01/2022murree-3
1203/01/2022murree-4
1304/01/2022murree-1
1405/01/2022murree-7
\n", "
" ], "text/plain": [ " date city temperature\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display DataFrame of a specific group from groupby object by providing the specific group value\n", "dfgb.get_group('murree') # df.groupby('city').get_group('karachi') " ] }, { "cell_type": "code", "execution_count": null, "id": "a93eea5d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "22bdd270", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "06f58584", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3b1f8d1c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1c325989", "metadata": {}, "source": [ ">- To find the size of each group, use `size()` method of DataFrameGroupBy object.\n", ">- It return a series containing number of rows in each each group of the DataFrameGroupBy object as a Series" ] }, { "cell_type": "code", "execution_count": 30, "id": "fb875cd9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "city\n", "karachi 5\n", "lahore 5\n", "murree 5\n", "dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb.size() #df.groupby('city').size()" ] }, { "cell_type": "markdown", "id": "03042da0", "metadata": {}, "source": [ "> After understanding the `groupby()` method let us move to step 2, and that is `Applying a Function`" ] }, { "cell_type": "code", "execution_count": null, "id": "8f6f80ed", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "fcfc7a1d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6f2df584", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3dd730b5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ac3e2b74", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2030779b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "38c9df60", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "95bc3a96", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "9519e655", "metadata": {}, "source": [ "### b. Step 2: Apply Step\n", "- Now second step is that we apply appropriate aggregate function on all the groups inside the DataFrameGroupBy object" ] }, { "cell_type": "markdown", "id": "729e7c98", "metadata": {}, "source": [ "**Let us first apply aggregate function on a specific column of `DataFrameGroupBy` object, which is a `SeriesGroupBy` object**" ] }, { "cell_type": "code", "execution_count": 31, "id": "4b906481", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
001/01/2022lahore8
102/01/2022lahore10
203/01/2022lahore5
304/01/2022lahore6
405/01/2022lahore12
501/01/2022karachi18
602/01/2022karachi10
703/01/2022karachi12
804/01/2022karachi15
905/01/2022karachi16
1001/01/2022murree-5
1102/01/2022murree-3
1203/01/2022murree-4
1304/01/2022murree-1
1405/01/2022murree-7
\n", "
" ], "text/plain": [ " date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12\n", "5 01/01/2022 karachi 18\n", "6 02/01/2022 karachi 10\n", "7 03/01/2022 karachi 12\n", "8 04/01/2022 karachi 15\n", "9 05/01/2022 karachi 16\n", "10 01/01/2022 murree -5\n", "11 02/01/2022 murree -3\n", "12 03/01/2022 murree -4\n", "13 04/01/2022 murree -1\n", "14 05/01/2022 murree -7" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "85bfea29", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 32, "id": "6f3ffb6d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city')" ] }, { "cell_type": "code", "execution_count": 33, "id": "4175e21e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecitytemperature
001/01/2022lahore8
102/01/2022lahore10
203/01/2022lahore5
304/01/2022lahore6
405/01/2022lahore12
\n", "
" ], "text/plain": [ " date city temperature\n", "0 01/01/2022 lahore 8\n", "1 02/01/2022 lahore 10\n", "2 03/01/2022 lahore 5\n", "3 04/01/2022 lahore 6\n", "4 05/01/2022 lahore 12" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city').get_group('lahore')" ] }, { "cell_type": "code", "execution_count": 34, "id": "c7820a7d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city').get_group('lahore').temperature.min()" ] }, { "cell_type": "code", "execution_count": 35, "id": "ff32ca65", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city').get_group('lahore').temperature.min()" ] }, { "cell_type": "code", "execution_count": 36, "id": "e1744064", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city').get_group('karachi').temperature.min()" ] }, { "cell_type": "code", "execution_count": 37, "id": "c4aab17c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-7" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('city').get_group('murree').temperature.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "fe8e892f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8f2dec82", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4f275823", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a49a9ba0", "metadata": {}, "source": [ "### b. Step 3: Combine Step\n", "- Now we have got minimum temperature of all the three cities, let us combine the result into a series object" ] }, { "cell_type": "code", "execution_count": 38, "id": "e37d344a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "K_min 10\n", "L_min 5\n", "M_min -7\n", "Name: Min Temperatures, dtype: int64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kci = df.groupby('city').get_group('karachi').temperature.min()\n", "lhr = df.groupby('city').get_group('lahore').temperature.min()\n", "murree = df.groupby('city').get_group('murree').temperature.min()\n", "\n", "s1 = pd.Series(data=[kci, lhr, murree], index=['K_min', 'L_min', 'M_min'] )\n", "s1.name= 'Min Temperatures'\n", "s1" ] }, { "cell_type": "code", "execution_count": null, "id": "db772042", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f07d918e", "metadata": {}, "source": [ ">- **Let us perform the `apply + combine` steps in one go, by applying the `min()` function on the temperature series of all the dataframes inside the DataFrameGroupBy object.**\n", ">- **This saves us from the hassle of applying `min()` method explicitly as done above**" ] }, { "cell_type": "code", "execution_count": null, "id": "77db0e35", "metadata": {}, "outputs": [], "source": [ "df.groupby('city')" ] }, { "cell_type": "code", "execution_count": null, "id": "2d014f0d", "metadata": {}, "outputs": [], "source": [ "df.groupby('city').temperature" ] }, { "cell_type": "code", "execution_count": null, "id": "58604a90", "metadata": {}, "outputs": [], "source": [ "df.groupby('city').temperature.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "214d5d3a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "f362632d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "e3325138", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "104e0f3c", "metadata": {}, "source": [ ">- **We can also apply `agg()` method on the temperature series of all the dataframes inside the DataFrameGroupBy object**" ] }, { "cell_type": "code", "execution_count": null, "id": "fd047429", "metadata": {}, "outputs": [], "source": [ "df.groupby('city').temperature.agg(['min', 'max', 'sum', 'mean'])" ] }, { "cell_type": "markdown", "id": "908e2c72", "metadata": {}, "source": [ ">-Note that we have got a dataframe this time" ] }, { "cell_type": "code", "execution_count": null, "id": "ab453a05", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5affd9eb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a20735b4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "da368894", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "842b740b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "58697cde", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "94101a3b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "b5f491f4", "metadata": {}, "source": [ "## 4. Practice GroupBy on Stack Overflow Survey Dataset\n", "Visit to Download Data: https://insights.stackoverflow.com/survey/" ] }, { "cell_type": "code", "execution_count": null, "id": "12fdafcf", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5f02fc6d", "metadata": {}, "source": [ "### a. Understand the Data Set" ] }, { "cell_type": "code", "execution_count": 39, "id": "2313a7da", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(88883, 9)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/so_survey_subset.csv', index_col='Respondent')\n", "df.shape" ] }, { "cell_type": "code", "execution_count": 40, "id": "adb21c6b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MainBranchHobbyistCountryYearsCodeConvertedCompLanguageWorkedWithSocialMediaAgeGender
Respondent
1I am a student who is learning to codeYesUnited Kingdom4NaNHTML/CSS;Java;JavaScript;PythonTwitter14.0Man
2I am a student who is learning to codeNoBosnia and HerzegovinaNaNNaNC++;HTML/CSS;PythonInstagram19.0Man
3I am not primarily a developer, but I write co...YesThailand38820.0HTML/CSSReddit28.0Man
4I am a developer by professionNoUnited States361000.0C;C++;C#;Python;SQLReddit22.0Man
5I am a developer by professionYesUkraine16NaNC++;HTML/CSS;Java;JavaScript;Python;SQL;VBAFacebook30.0Man
\n", "
" ], "text/plain": [ " MainBranch Hobbyist \\\n", "Respondent \n", "1 I am a student who is learning to code Yes \n", "2 I am a student who is learning to code No \n", "3 I am not primarily a developer, but I write co... Yes \n", "4 I am a developer by profession No \n", "5 I am a developer by profession Yes \n", "\n", " Country YearsCode ConvertedComp \\\n", "Respondent \n", "1 United Kingdom 4 NaN \n", "2 Bosnia and Herzegovina NaN NaN \n", "3 Thailand 3 8820.0 \n", "4 United States 3 61000.0 \n", "5 Ukraine 16 NaN \n", "\n", " LanguageWorkedWith SocialMedia Age \\\n", "Respondent \n", "1 HTML/CSS;Java;JavaScript;Python Twitter 14.0 \n", "2 C++;HTML/CSS;Python Instagram 19.0 \n", "3 HTML/CSS Reddit 28.0 \n", "4 C;C++;C#;Python;SQL Reddit 22.0 \n", "5 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA Facebook 30.0 \n", "\n", " Gender \n", "Respondent \n", "1 Man \n", "2 Man \n", "3 Man \n", "4 Man \n", "5 Man " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "aed11320", "metadata": {}, "outputs": [], "source": [ "df.loc[df['Country']=='Pakistan', :]" ] }, { "cell_type": "code", "execution_count": null, "id": "46bd92ff", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8999fd3d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 41, "id": "600d2e20", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
QuestionText
Column
RespondentRandomized respondent ID number (not in order ...
MainBranchWhich of the following options best describes ...
HobbyistDo you code as a hobby?
CountryIn which country do you currently reside?
YearsCodeIncluding any education, how many years have y...
ConvertedCompSalary converted to annual USD salaries using ...
LanguageWorkedWithWhich of the following programming, scripting,...
SocialMediaWhat social media site do you use the most?
AgeWhat is your age (in years)? If you prefer not...
GenderWhich of the following do you currently identi...
\n", "
" ], "text/plain": [ " QuestionText\n", "Column \n", "Respondent Randomized respondent ID number (not in order ...\n", "MainBranch Which of the following options best describes ...\n", "Hobbyist Do you code as a hobby?\n", "Country In which country do you currently reside?\n", "YearsCode Including any education, how many years have y...\n", "ConvertedComp Salary converted to annual USD salaries using ...\n", "LanguageWorkedWith Which of the following programming, scripting,...\n", "SocialMedia What social media site do you use the most?\n", "Age What is your age (in years)? If you prefer not...\n", "Gender Which of the following do you currently identi..." ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "schema = pd.read_csv('datasets/so_survey_subset_schema.csv', index_col='Column')\n", "schema" ] }, { "cell_type": "code", "execution_count": null, "id": "5429f2d0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 42, "id": "146eb97a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "QuestionText Do you code as a hobby?\n", "Name: Hobbyist, dtype: object" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "schema.loc['Hobbyist']" ] }, { "cell_type": "code", "execution_count": null, "id": "bf6d798f", "metadata": {}, "outputs": [], "source": [ "df['Hobbyist']" ] }, { "cell_type": "code", "execution_count": null, "id": "28946f82", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "54ddb6cf", "metadata": {}, "outputs": [], "source": [ "schema.loc['Country']" ] }, { "cell_type": "code", "execution_count": null, "id": "55a8ee40", "metadata": {}, "outputs": [], "source": [ "df['Country']" ] }, { "cell_type": "code", "execution_count": null, "id": "c98c33e1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "8ad3de2b", "metadata": {}, "outputs": [], "source": [ "schema.loc['ConvertedComp']" ] }, { "cell_type": "code", "execution_count": null, "id": "40df4193", "metadata": {}, "outputs": [], "source": [ "df['ConvertedComp']" ] }, { "cell_type": "code", "execution_count": null, "id": "f2c3819d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6f1899a1", "metadata": {}, "outputs": [], "source": [ "schema.loc['LanguageWorkedWith']" ] }, { "cell_type": "code", "execution_count": null, "id": "4ef88604", "metadata": {}, "outputs": [], "source": [ "!cat datasets/so_survey_subset_schema.csv" ] }, { "cell_type": "code", "execution_count": null, "id": "7fa61427", "metadata": {}, "outputs": [], "source": [ "df['LanguageWorkedWith']" ] }, { "cell_type": "code", "execution_count": null, "id": "dbd881f9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7bd2f021", "metadata": {}, "outputs": [], "source": [ "schema.loc['SocialMedia']" ] }, { "cell_type": "code", "execution_count": null, "id": "bb7d90c9", "metadata": {}, "outputs": [], "source": [ "df['SocialMedia']" ] }, { "cell_type": "code", "execution_count": null, "id": "38ee282a", "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "code", "execution_count": null, "id": "574e3847", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "da4a81c4", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bc9d4cfb", "metadata": {}, "source": [ "##### Let us perform some basic statistical analysis on the Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "2096bf5e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "35f48c5d", "metadata": {}, "outputs": [], "source": [ "# Returns the count of non-NA values for a series object.\n", "df['Hobbyist'].count()" ] }, { "cell_type": "code", "execution_count": null, "id": "fe766e4c", "metadata": {}, "outputs": [], "source": [ "# Returns a Series containing counts of unique rows.\n", "df['Hobbyist'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "e791cf10", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "39ba7c9b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "abac191b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "650bd519", "metadata": {}, "outputs": [], "source": [ "# Returns the count of non-NA values for a series object.\n", "df['Country'].count()" ] }, { "cell_type": "code", "execution_count": null, "id": "4aaea5b1", "metadata": {}, "outputs": [], "source": [ "# Returns a Series containing counts of unique rows.\n", "df['Country'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "6b91f3cf", "metadata": {}, "outputs": [], "source": [ "# To get the count of countries whose developers participated in the survey\n", "df['Country'].value_counts().count()" ] }, { "cell_type": "code", "execution_count": null, "id": "4c3ae775", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4de9edd6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "aa4e378e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "fc48a1f7", "metadata": {}, "outputs": [], "source": [ "# Returns the count of non-NA values for a series object.\n", "df['ConvertedComp'].count()" ] }, { "cell_type": "code", "execution_count": null, "id": "eb34078c", "metadata": {}, "outputs": [], "source": [ "# Returns a Series containing counts of unique rows.\n", "df['ConvertedComp'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "34dde858", "metadata": {}, "outputs": [], "source": [ "df['ConvertedComp'].mean()" ] }, { "cell_type": "code", "execution_count": null, "id": "f91267c0", "metadata": {}, "outputs": [], "source": [ "df['ConvertedComp'].median()" ] }, { "cell_type": "code", "execution_count": null, "id": "f48e6608", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "755b50b5", "metadata": {}, "outputs": [], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": null, "id": "b8b48433", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6516ee56", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "46da67ef", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0fe690a2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "94e974b1", "metadata": {}, "source": [ "

Let us try answering certain Questions

" ] }, { "cell_type": "code", "execution_count": null, "id": "f561cf90", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1e75db36", "metadata": {}, "source": [ "## Question 1: \n", ">**List the most popular SocialMedia web site for every Country**" ] }, { "cell_type": "markdown", "id": "557059b6", "metadata": {}, "source": [ "**Let us first do the easy task, and that is to list the most popular SocialMedia website of a single country (lets say Pakistan)**" ] }, { "cell_type": "code", "execution_count": 43, "id": "def1798a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MainBranchHobbyistCountryYearsCodeConvertedCompLanguageWorkedWithSocialMediaAgeGender
Respondent
1I am a student who is learning to codeYesUnited Kingdom4NaNHTML/CSS;Java;JavaScript;PythonTwitter14.0Man
2I am a student who is learning to codeNoBosnia and HerzegovinaNaNNaNC++;HTML/CSS;PythonInstagram19.0Man
3I am not primarily a developer, but I write co...YesThailand38820.0HTML/CSSReddit28.0Man
4I am a developer by professionNoUnited States361000.0C;C++;C#;Python;SQLReddit22.0Man
5I am a developer by professionYesUkraine16NaNC++;HTML/CSS;Java;JavaScript;Python;SQL;VBAFacebook30.0Man
..............................
88377NaNYesCanadaNaNNaNHTML/CSS;JavaScript;Other(s):YouTubeNaNMan
88601NaNNoNaNNaNNaNNaNNaNNaNNaN
88802NaNNoNaNNaNNaNNaNNaNNaNNaN
88816NaNNoNaNNaNNaNNaNNaNNaNNaN
88863NaNYesSpain8NaNBash/Shell/PowerShell;HTML/CSS;Java;JavaScript...WhatsApp18.0Man
\n", "

88883 rows × 9 columns

\n", "
" ], "text/plain": [ " MainBranch Hobbyist \\\n", "Respondent \n", "1 I am a student who is learning to code Yes \n", "2 I am a student who is learning to code No \n", "3 I am not primarily a developer, but I write co... Yes \n", "4 I am a developer by profession No \n", "5 I am a developer by profession Yes \n", "... ... ... \n", "88377 NaN Yes \n", "88601 NaN No \n", "88802 NaN No \n", "88816 NaN No \n", "88863 NaN Yes \n", "\n", " Country YearsCode ConvertedComp \\\n", "Respondent \n", "1 United Kingdom 4 NaN \n", "2 Bosnia and Herzegovina NaN NaN \n", "3 Thailand 3 8820.0 \n", "4 United States 3 61000.0 \n", "5 Ukraine 16 NaN \n", "... ... ... ... \n", "88377 Canada NaN NaN \n", "88601 NaN NaN NaN \n", "88802 NaN NaN NaN \n", "88816 NaN NaN NaN \n", "88863 Spain 8 NaN \n", "\n", " LanguageWorkedWith SocialMedia \\\n", "Respondent \n", "1 HTML/CSS;Java;JavaScript;Python Twitter \n", "2 C++;HTML/CSS;Python Instagram \n", "3 HTML/CSS Reddit \n", "4 C;C++;C#;Python;SQL Reddit \n", "5 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA Facebook \n", "... ... ... \n", "88377 HTML/CSS;JavaScript;Other(s): YouTube \n", "88601 NaN NaN \n", "88802 NaN NaN \n", "88816 NaN NaN \n", "88863 Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript... WhatsApp \n", "\n", " Age Gender \n", "Respondent \n", "1 14.0 Man \n", "2 19.0 Man \n", "3 28.0 Man \n", "4 22.0 Man \n", "5 30.0 Man \n", "... ... ... \n", "88377 NaN Man \n", "88601 NaN NaN \n", "88802 NaN NaN \n", "88816 NaN NaN \n", "88863 18.0 Man \n", "\n", "[88883 rows x 9 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 47, "id": "df9b825e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "WhatsApp 266\n", "Facebook 232\n", "YouTube 182\n", "LinkedIn 71\n", "Twitter 58\n", "Instagram 41\n", "Reddit 28\n", "I don't use social media 23\n", "Snapchat 5\n", "Hello 1\n", "VK ВКонта́кте 1\n", "Name: SocialMedia, dtype: int64" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "d54d37cb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "846ab59e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 44, "id": "d65ad730", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "WeChat 微信 403\n", "YouTube 53\n", "Weibo 新浪微博 42\n", "I don't use social media 27\n", "Twitter 27\n", "Reddit 12\n", "LinkedIn 11\n", "Facebook 8\n", "Instagram 7\n", "Youku Tudou 优酷 7\n", "WhatsApp 3\n", "VK ВКонта́кте 1\n", "Name: SocialMedia, dtype: int64" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df.Country =='Pakistan', :]\n", "df.loc[df.Country =='Pakistan', 'SocialMedia'].head(10)\n", "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts()\n", "df.loc[df.Country =='Pakistan', 'SocialMedia'].value_counts(normalize=True)\n", "df.loc[df.Country =='China', 'SocialMedia'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "440d617b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4cf572c2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2ba1fa80", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "95b681b2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 48, "id": "54413fbf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')" ] }, { "cell_type": "code", "execution_count": 49, "id": "a7cf5d90", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MainBranchHobbyistCountryYearsCodeConvertedCompLanguageWorkedWithSocialMediaAgeGender
Respondent
84I am a developer by professionNoPakistan33468.0C;C++;C#;Java;Kotlin;PHP;SQLWhatsApp26.0Man
119I am a developer by professionNoPakistan10NaNC;C++;C#;HTML/CSS;Java;JavaScript;SQLFacebook28.0Man
298I am a developer by professionYesPakistan4NaNHTML/CSS;JavaScript;PHP;SQL;Other(s):LinkedIn23.0Man
299I am a developer by professionYesPakistan19NaNAssembly;C;C++;Java;Python;SQLFacebook25.0Man
311I am a developer by professionNoPakistan52600.0Assembly;C;C++;C#;HTML/CSS;Java;Python;Scala;SQLLinkedIn24.0Man
..............................
88862I am a student who is learning to codeYesPakistan3NaNJavaWhatsApp21.0Man
5439NaNYesPakistan2NaNNaNInstagram24.0Woman
39117NaNYesPakistan4NaNC;C++;C#;HTML/CSS;Java;JavaScript;SQLWhatsApp22.0Man
60066NaNYesPakistan4NaNAssembly;C++;C#;HTML/CSS;Java;PHP;Python;SQLYouTube20.0Man
88182NaNYesPakistan1NaNHTML/CSS;Java;JavaScriptTwitterNaNMan
\n", "

923 rows × 9 columns

\n", "
" ], "text/plain": [ " MainBranch Hobbyist Country \\\n", "Respondent \n", "84 I am a developer by profession No Pakistan \n", "119 I am a developer by profession No Pakistan \n", "298 I am a developer by profession Yes Pakistan \n", "299 I am a developer by profession Yes Pakistan \n", "311 I am a developer by profession No Pakistan \n", "... ... ... ... \n", "88862 I am a student who is learning to code Yes Pakistan \n", "5439 NaN Yes Pakistan \n", "39117 NaN Yes Pakistan \n", "60066 NaN Yes Pakistan \n", "88182 NaN Yes Pakistan \n", "\n", " YearsCode ConvertedComp \\\n", "Respondent \n", "84 3 3468.0 \n", "119 10 NaN \n", "298 4 NaN \n", "299 19 NaN \n", "311 5 2600.0 \n", "... ... ... \n", "88862 3 NaN \n", "5439 2 NaN \n", "39117 4 NaN \n", "60066 4 NaN \n", "88182 1 NaN \n", "\n", " LanguageWorkedWith SocialMedia \\\n", "Respondent \n", "84 C;C++;C#;Java;Kotlin;PHP;SQL WhatsApp \n", "119 C;C++;C#;HTML/CSS;Java;JavaScript;SQL Facebook \n", "298 HTML/CSS;JavaScript;PHP;SQL;Other(s): LinkedIn \n", "299 Assembly;C;C++;Java;Python;SQL Facebook \n", "311 Assembly;C;C++;C#;HTML/CSS;Java;Python;Scala;SQL LinkedIn \n", "... ... ... \n", "88862 Java WhatsApp \n", "5439 NaN Instagram \n", "39117 C;C++;C#;HTML/CSS;Java;JavaScript;SQL WhatsApp \n", "60066 Assembly;C++;C#;HTML/CSS;Java;PHP;Python;SQL YouTube \n", "88182 HTML/CSS;Java;JavaScript Twitter \n", "\n", " Age Gender \n", "Respondent \n", "84 26.0 Man \n", "119 28.0 Man \n", "298 23.0 Man \n", "299 25.0 Man \n", "311 24.0 Man \n", "... ... ... \n", "88862 21.0 Man \n", "5439 24.0 Woman \n", "39117 22.0 Man \n", "60066 20.0 Man \n", "88182 NaN Man \n", "\n", "[923 rows x 9 columns]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country').get_group(\"Pakistan\")" ] }, { "cell_type": "code", "execution_count": 50, "id": "8bc1753e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Respondent\n", "84 WhatsApp\n", "119 Facebook\n", "298 LinkedIn\n", "299 Facebook\n", "311 LinkedIn\n", " ... \n", "88862 WhatsApp\n", "5439 Instagram\n", "39117 WhatsApp\n", "60066 YouTube\n", "88182 Twitter\n", "Name: SocialMedia, Length: 923, dtype: object" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country').get_group(\"Pakistan\").loc[:, 'SocialMedia']" ] }, { "cell_type": "code", "execution_count": 51, "id": "caa9a03c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "WhatsApp 266\n", "Facebook 232\n", "YouTube 182\n", "LinkedIn 71\n", "Twitter 58\n", "Instagram 41\n", "Reddit 28\n", "I don't use social media 23\n", "Snapchat 5\n", "Hello 1\n", "VK ВКонта́кте 1\n", "Name: SocialMedia, dtype: int64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country').get_group(\"Pakistan\").loc[:, 'SocialMedia'].value_counts()" ] }, { "cell_type": "code", "execution_count": 54, "id": "083d3d5f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Country SocialMedia \n", "Afghanistan Facebook 15\n", " YouTube 9\n", " I don't use social media 6\n", " WhatsApp 4\n", " Instagram 1\n", " LinkedIn 1\n", " Twitter 1\n", "Albania WhatsApp 18\n", " Facebook 16\n", " Instagram 13\n", " YouTube 10\n", " Twitter 8\n", " LinkedIn 7\n", " Reddit 6\n", " I don't use social media 4\n", " Snapchat 1\n", " WeChat 微信 1\n", "Algeria YouTube 42\n", " Facebook 41\n", " Twitter 14\n", " LinkedIn 9\n", " I don't use social media 7\n", " Instagram 7\n", " Reddit 2\n", " WhatsApp 2\n", "Andorra Facebook 2\n", " I don't use social media 2\n", " Reddit 1\n", " Snapchat 1\n", " YouTube 1\n", "Angola Twitter 2\n", " Facebook 1\n", " YouTube 1\n", "Antigua and Barbuda YouTube 2\n", " Facebook 1\n", " Instagram 1\n", " Twitter 1\n", " VK ВКонта́кте 1\n", "Argentina WhatsApp 202\n", " Twitter 80\n", " YouTube 62\n", " Instagram 53\n", " Reddit 50\n", " Facebook 45\n", " I don't use social media 22\n", " LinkedIn 16\n", "Armenia Facebook 38\n", " YouTube 23\n", " LinkedIn 5\n", " Instagram 3\n", " Reddit 3\n", " Twitter 3\n", " WhatsApp 3\n", " VK ВКонта́кте 2\n", " I don't use social media 1\n", "Australia Reddit 407\n", " Facebook 320\n", " Twitter 283\n", " YouTube 242\n", " I don't use social media 156\n", "Name: SocialMedia, dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')['SocialMedia'].value_counts().head(60)" ] }, { "cell_type": "code", "execution_count": null, "id": "f8e521de", "metadata": {}, "outputs": [], "source": [ "df.groupby('Country')['SocialMedia'].value_counts().head(50)" ] }, { "cell_type": "code", "execution_count": null, "id": "4640ef13", "metadata": {}, "outputs": [], "source": [ "df.groupby('Country')['SocialMedia'].value_counts().head(50)" ] }, { "cell_type": "code", "execution_count": null, "id": "fefb673d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d8f5efc5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a98c714e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "df115e7a", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "45db2fba", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f18c4077", "metadata": {}, "source": [ "## Question 2: \n", ">**What percentage of people in each country knows Python programming?**" ] }, { "cell_type": "markdown", "id": "128cbeb8", "metadata": {}, "source": [ "**tc** = Total count of people from each country who participated in the survey?\n", "\n", "**pc** = Python People: Count of people from each country who knows Python" ] }, { "cell_type": "markdown", "id": "045b2e32", "metadata": {}, "source": [ "**tc (option 1):**" ] }, { "cell_type": "code", "execution_count": 55, "id": "fef318d9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MainBranchHobbyistCountryYearsCodeConvertedCompLanguageWorkedWithSocialMediaAgeGender
Respondent
1I am a student who is learning to codeYesUnited Kingdom4NaNHTML/CSS;Java;JavaScript;PythonTwitter14.0Man
2I am a student who is learning to codeNoBosnia and HerzegovinaNaNNaNC++;HTML/CSS;PythonInstagram19.0Man
3I am not primarily a developer, but I write co...YesThailand38820.0HTML/CSSReddit28.0Man
4I am a developer by professionNoUnited States361000.0C;C++;C#;Python;SQLReddit22.0Man
5I am a developer by professionYesUkraine16NaNC++;HTML/CSS;Java;JavaScript;Python;SQL;VBAFacebook30.0Man
..............................
88377NaNYesCanadaNaNNaNHTML/CSS;JavaScript;Other(s):YouTubeNaNMan
88601NaNNoNaNNaNNaNNaNNaNNaNNaN
88802NaNNoNaNNaNNaNNaNNaNNaNNaN
88816NaNNoNaNNaNNaNNaNNaNNaNNaN
88863NaNYesSpain8NaNBash/Shell/PowerShell;HTML/CSS;Java;JavaScript...WhatsApp18.0Man
\n", "

88883 rows × 9 columns

\n", "
" ], "text/plain": [ " MainBranch Hobbyist \\\n", "Respondent \n", "1 I am a student who is learning to code Yes \n", "2 I am a student who is learning to code No \n", "3 I am not primarily a developer, but I write co... Yes \n", "4 I am a developer by profession No \n", "5 I am a developer by profession Yes \n", "... ... ... \n", "88377 NaN Yes \n", "88601 NaN No \n", "88802 NaN No \n", "88816 NaN No \n", "88863 NaN Yes \n", "\n", " Country YearsCode ConvertedComp \\\n", "Respondent \n", "1 United Kingdom 4 NaN \n", "2 Bosnia and Herzegovina NaN NaN \n", "3 Thailand 3 8820.0 \n", "4 United States 3 61000.0 \n", "5 Ukraine 16 NaN \n", "... ... ... ... \n", "88377 Canada NaN NaN \n", "88601 NaN NaN NaN \n", "88802 NaN NaN NaN \n", "88816 NaN NaN NaN \n", "88863 Spain 8 NaN \n", "\n", " LanguageWorkedWith SocialMedia \\\n", "Respondent \n", "1 HTML/CSS;Java;JavaScript;Python Twitter \n", "2 C++;HTML/CSS;Python Instagram \n", "3 HTML/CSS Reddit \n", "4 C;C++;C#;Python;SQL Reddit \n", "5 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA Facebook \n", "... ... ... \n", "88377 HTML/CSS;JavaScript;Other(s): YouTube \n", "88601 NaN NaN \n", "88802 NaN NaN \n", "88816 NaN NaN \n", "88863 Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript... WhatsApp \n", "\n", " Age Gender \n", "Respondent \n", "1 14.0 Man \n", "2 19.0 Man \n", "3 28.0 Man \n", "4 22.0 Man \n", "5 30.0 Man \n", "... ... ... \n", "88377 NaN Man \n", "88601 NaN NaN \n", "88802 NaN NaN \n", "88816 NaN NaN \n", "88863 18.0 Man \n", "\n", "[88883 rows x 9 columns]" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 56, "id": "b61430a3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Respondent\n", "1 United Kingdom\n", "2 Bosnia and Herzegovina\n", "3 Thailand\n", "4 United States\n", "5 Ukraine\n", " ... \n", "88377 Canada\n", "88601 NaN\n", "88802 NaN\n", "88816 NaN\n", "88863 Spain\n", "Name: Country, Length: 88883, dtype: object" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, 'Country']" ] }, { "cell_type": "code", "execution_count": 57, "id": "a61e0e18", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "United States 20949\n", "India 9061\n", "Germany 5866\n", "United Kingdom 5737\n", "Canada 3395\n", " ... \n", "Tonga 1\n", "Timor-Leste 1\n", "North Korea 1\n", "Brunei Darussalam 1\n", "Chad 1\n", "Name: Total, Length: 179, dtype: int64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tc = df['Country'].value_counts()\n", "tc.name = 'Total'\n", "tc" ] }, { "cell_type": "code", "execution_count": null, "id": "00634605", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "3112db44", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "36b6d5e0", "metadata": {}, "source": [ "**tc (option 2):**" ] }, { "cell_type": "code", "execution_count": 58, "id": "64cf9c86", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb = df.groupby('Country')\n", "dfgb" ] }, { "cell_type": "code", "execution_count": 59, "id": "b2fca694", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')['Country']" ] }, { "cell_type": "code", "execution_count": 60, "id": "9f6901aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Country \n", "Afghanistan Afghanistan 44\n", "Albania Albania 86\n", "Algeria Algeria 134\n", "Andorra Andorra 7\n", "Angola Angola 5\n", " ... \n", "Venezuela, Bolivarian Republic of... Venezuela, Bolivarian Republic of... 88\n", "Viet Nam Viet Nam 231\n", "Yemen Yemen 19\n", "Zambia Zambia 12\n", "Zimbabwe Zimbabwe 39\n", "Name: Country, Length: 179, dtype: int64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')['Country'].apply(lambda x: x.value_counts())" ] }, { "cell_type": "code", "execution_count": null, "id": "acbcaa82", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "684d3e07", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "aab00ae8", "metadata": {}, "source": [ "**pc:**" ] }, { "cell_type": "code", "execution_count": 61, "id": "d0066ac5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Respondent\n", "1 HTML/CSS;Java;JavaScript;Python\n", "2 C++;HTML/CSS;Python\n", "3 HTML/CSS\n", "4 C;C++;C#;Python;SQL\n", "5 C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA\n", " ... \n", "88377 HTML/CSS;JavaScript;Other(s):\n", "88601 NaN\n", "88802 NaN\n", "88816 NaN\n", "88863 Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...\n", "Name: LanguageWorkedWith, Length: 88883, dtype: object" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, 'LanguageWorkedWith']" ] }, { "cell_type": "code", "execution_count": 62, "id": "db814708", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')['LanguageWorkedWith']" ] }, { "cell_type": "code", "execution_count": 63, "id": "61ef5d9b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Respondent\n", "1 True\n", "2 True\n", "3 False\n", "4 True\n", "5 True\n", " ... \n", "88182 False\n", "88212 True\n", "88282 False\n", "88377 False\n", "88863 False\n", "Name: LanguageWorkedWith, Length: 88751, dtype: object" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python'))" ] }, { "cell_type": "code", "execution_count": 64, "id": "9b3e1b22", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Country\n", "Afghanistan 8\n", "Albania 23\n", "Algeria 40\n", "Andorra 0\n", "Angola 2\n", " ..\n", "Venezuela, Bolivarian Republic of... 28\n", "Viet Nam 78\n", "Yemen 3\n", "Zambia 4\n", "Zimbabwe 14\n", "Name: LanguageWorkedWith, Length: 179, dtype: int64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pp = df.groupby('Country')['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())\n", "pp" ] }, { "cell_type": "code", "execution_count": 68, "id": "26205a53", "metadata": {}, "outputs": [], "source": [ "pp.name = 'Knows Python'" ] }, { "cell_type": "code", "execution_count": null, "id": "09886e0f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6954a136", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "ced1a4b0", "metadata": {}, "source": [ "**Create a Dataframe of two series tc and pp**" ] }, { "cell_type": "code", "execution_count": 69, "id": "989aec12", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TotalKnows Python
United States2094910083
India90613105
Germany58662451
United Kingdom57372384
Canada33951558
.........
Tonga10
Timor-Leste11
North Korea10
Brunei Darussalam10
Chad10
\n", "

179 rows × 2 columns

\n", "
" ], "text/plain": [ " Total Knows Python\n", "United States 20949 10083\n", "India 9061 3105\n", "Germany 5866 2451\n", "United Kingdom 5737 2384\n", "Canada 3395 1558\n", "... ... ...\n", "Tonga 1 0\n", "Timor-Leste 1 1\n", "North Korea 1 0\n", "Brunei Darussalam 1 0\n", "Chad 1 0\n", "\n", "[179 rows x 2 columns]" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resultdf = pd.concat([tc, pp], axis=1)\n", "resultdf" ] }, { "cell_type": "code", "execution_count": 70, "id": "df17703c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Total 923\n", "Knows Python 251\n", "Name: Pakistan, dtype: int64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resultdf.loc['Pakistan']" ] }, { "cell_type": "code", "execution_count": null, "id": "c670b7e9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "2891419b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e96094bb", "metadata": {}, "source": [ "**Percentage of people in each country knows Python?**" ] }, { "cell_type": "code", "execution_count": 71, "id": "d09f7a77", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TotalKnows PythonPercentage
United States209491008348.131176
India9061310534.267741
Germany5866245141.783157
United Kingdom5737238441.554820
Canada3395155845.891016
............
Tonga100.000000
Timor-Leste11100.000000
North Korea100.000000
Brunei Darussalam100.000000
Chad100.000000
\n", "

179 rows × 3 columns

\n", "
" ], "text/plain": [ " Total Knows Python Percentage\n", "United States 20949 10083 48.131176\n", "India 9061 3105 34.267741\n", "Germany 5866 2451 41.783157\n", "United Kingdom 5737 2384 41.554820\n", "Canada 3395 1558 45.891016\n", "... ... ... ...\n", "Tonga 1 0 0.000000\n", "Timor-Leste 1 1 100.000000\n", "North Korea 1 0 0.000000\n", "Brunei Darussalam 1 0 0.000000\n", "Chad 1 0 0.000000\n", "\n", "[179 rows x 3 columns]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resultdf['Percentage'] = (resultdf['Knows Python'] / resultdf['Total']) * 100\n", "resultdf" ] }, { "cell_type": "code", "execution_count": null, "id": "7530f928", "metadata": {}, "outputs": [], "source": [ "resultdf.loc['Pakistan']" ] }, { "cell_type": "code", "execution_count": null, "id": "b72302eb", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }