{ "cells": [ { "cell_type": "markdown", "id": "4a87b5ef", "metadata": {}, "source": [ "--- \n", " \n", "\n", "

Department of Data Science

\n", "

Course: Tools and Techniques for Data Science

\n", "\n", "---\n", "

Instructor: Muhammad Arif Butt, Ph.D.

" ] }, { "cell_type": "markdown", "id": "ab0dc25c", "metadata": {}, "source": [ "

Lecture 3.13 (Pandas-05)

" ] }, { "cell_type": "markdown", "id": "49aec3ca", "metadata": {}, "source": [ "\"Open" ] }, { "cell_type": "markdown", "id": "19f82705", "metadata": {}, "source": [ "## _Indexing, Subsetting and Slicing Dataframes.ipynb_" ] }, { "cell_type": "markdown", "id": "328bd879", "metadata": {}, "source": [ "## Motivation:\n", "- The ability to select specific rows and columns to access and filter data based on specific conditions are two of the key features of Pandas.\n", " - **Selection** allows you to access specific rows or columns (a subset) of the data by their index and/or location in the DataFrame\n", " - In large datasets, you may be required to select the first/last N records\n", " - In large datasets, you may be required to select a range (n to m) of records\n", " - In large datasets, you may be required to select specific columns of your interest\n", " - In large datasets, you may be required to select specific range and specific columns of your interest\n", " - **Filtering** allows you to access specific rows or columns (a subset) of the data based on one or more conditions\n", " - In a medical dataset, you may be required to filter record of all those patients who suffer with a specific disease, or who have a specific blood group\n", " - In a medical dataset, you may be required to filter pregnant women who have anemia, and compare this subset to women who don’t have anemia.\n", " - In a travel dataset, you may be required to filter hotels inside Lahore city, sorted by their minimum per day cost\n", " - In a client dataset, you may be required filter the clients who use a Gmail account(may require a string filter)\n", " - In a client dataset, you may be required to filter the clients who belong to a specific countries (may require use of .isin() function)" ] }, { "cell_type": "markdown", "id": "12db95e1", "metadata": {}, "source": [ "## Learning agenda of this notebook\n", "1. Understanding Indices of a Dataframe\n", " - Understand the Dataset\n", " - Changing the Column Indices of a Dataframe\n", " - Changing the Row Indices of a Dataframe\n", "2. Selecting Row(s) and Column(s) of a Dataframe using `df[]` \n", "3. Selecting Rows and Columns using `iloc` Method\n", "4. Selecting Rows and Columns using `loc` Method\n", "5. Conditional Selection \n", "6. Selecting columns of a specific data type\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9ea649a9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8f71cf0e", "metadata": {}, "source": [ "## 1. Understanding Indices of a Dataframe\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "cae470d2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "1d2bba7a", "metadata": {}, "source": [ "### a. Understand the Dataset\n", "- Let us first understand the dataframe on which we are going to work in today's notebook" ] }, { "cell_type": "code", "execution_count": 52, "id": "3a6a3d25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
6MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
9MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 \n", "6 MS07 Zobia 40 Sialkot AFT group B Female 90.2 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "9 MS10 Shahid 38 Lahore AFTERNOON group D Male 90.5 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "2 75.1 8500.0 \n", "3 84.3 4000.0 \n", "4 72.8 3500.0 \n", "5 78.6 NaN \n", "6 NaN 4000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "9 81.3 3800.0 \n", "10 81.3 6000.0 \n", "11 81.3 NaN \n", "12 76.5 7000.0 \n", "13 76.0 8000.0 \n", "14 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df" ] }, { "cell_type": "code", "execution_count": 2, "id": "e67be45b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(16, 10)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `shape` attribute of a dataframe object return a two value tuple containing rows and columns\n", "# Note the rows count does not include the column labels and column count does not include the row index\n", "df.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "575dee99", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 53, "id": "8c00cc7d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=16, step=1)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `index` attribute of a dataframe object return the list of row indices and its datatype\n", "df.index" ] }, { "cell_type": "code", "execution_count": null, "id": "9219f240", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 54, "id": "b8a7fc1b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',\n", " 'subj1', 'subj2', 'scholarship'],\n", " dtype='object')" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `columns` attribute of a dataframe object return the list of column labels and its datatype\n", "df.columns" ] }, { "cell_type": "code", "execution_count": null, "id": "782d6635", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 55, "id": "899d087f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "roll no object\n", "name object\n", "age int64\n", "address object\n", "session object\n", "group object\n", "gender object\n", "subj1 float64\n", "subj2 float64\n", "scholarship float64\n", "dtype: object" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# `dtypes` attribute of a dataframe object return the data type of each column in the dataframe\n", "df.dtypes" ] }, { "cell_type": "code", "execution_count": null, "id": "370b4796", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 56, "id": "edc86d22", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 16 entries, 0 to 15\n", "Data columns (total 10 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 roll no 16 non-null object \n", " 1 name 16 non-null object \n", " 2 age 16 non-null int64 \n", " 3 address 16 non-null object \n", " 4 session 16 non-null object \n", " 5 group 16 non-null object \n", " 6 gender 16 non-null object \n", " 7 subj1 15 non-null float64\n", " 8 subj2 15 non-null float64\n", " 9 scholarship 14 non-null float64\n", "dtypes: float64(3), int64(1), object(6)\n", "memory usage: 1.4+ KB\n" ] } ], "source": [ "#This method prints information about a DataFrame including the row indices, column labels, \n", "# non-null values count in each column, datatype and memory usage\n", "df.info()" ] }, { "cell_type": "code", "execution_count": null, "id": "7262d057", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 58, "id": "1a1b03fb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
count161616.0000001616161615.00000015.00000014.000000
unique1616NaN6442NaNNaNNaN
topMS01RaufNaNLahoreAFTgroup CMaleNaNNaNNaN
freq11NaN55510NaNNaNNaN
meanNaNNaN36.062500NaNNaNNaNNaN82.15333378.1333335328.571429
stdNaNNaN12.228212NaNNaNNaNNaN9.8852615.9643781888.600854
minNaNNaN16.000000NaNNaNNaNNaN64.90000060.5000003500.000000
25%NaNNaN27.250000NaNNaNNaNNaN74.40000076.0000003800.000000
50%NaNNaN36.500000NaNNaNNaNNaN84.10000081.3000004500.000000
75%NaNNaN44.250000NaNNaNNaNNaN90.50000081.3000006750.000000
maxNaNNaN53.000000NaNNaNNaNNaN90.50000084.4000008500.000000
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "count 16 16 16.000000 16 16 16 16 15.000000 \n", "unique 16 16 NaN 6 4 4 2 NaN \n", "top MS01 Rauf NaN Lahore AFT group C Male NaN \n", "freq 1 1 NaN 5 5 5 10 NaN \n", "mean NaN NaN 36.062500 NaN NaN NaN NaN 82.153333 \n", "std NaN NaN 12.228212 NaN NaN NaN NaN 9.885261 \n", "min NaN NaN 16.000000 NaN NaN NaN NaN 64.900000 \n", "25% NaN NaN 27.250000 NaN NaN NaN NaN 74.400000 \n", "50% NaN NaN 36.500000 NaN NaN NaN NaN 84.100000 \n", "75% NaN NaN 44.250000 NaN NaN NaN NaN 90.500000 \n", "max NaN NaN 53.000000 NaN NaN NaN NaN 90.500000 \n", "\n", " subj2 scholarship \n", "count 15.000000 14.000000 \n", "unique NaN NaN \n", "top NaN NaN \n", "freq NaN NaN \n", "mean 78.133333 5328.571429 \n", "std 5.964378 1888.600854 \n", "min 60.500000 3500.000000 \n", "25% 76.000000 3800.000000 \n", "50% 81.300000 4500.000000 \n", "75% 81.300000 6750.000000 \n", "max 84.400000 8500.000000 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If no argument is passed, this method displays descriptive statistics about the numeric columns of the dataframe\n", "df.describe(include='all')" ] }, { "cell_type": "code", "execution_count": null, "id": "f685f94c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0de5cdd8", "metadata": {}, "source": [ "### b. Changing the Column Indices/Labels of a Datarame\n", "- Every dataframe has column labels associated with its columns\n", "- These by default are integer values from 0,1,2,3...\n", "- However, while creating a dataframe from scratch, or while reading them from a file you can set them to more meaningful string values.\n", "- While reading from csv file the first row in the file is taken as the column labels\n", "- We can change the column labels, if we want\n", "- Let us practically see this for better understanding" ] }, { "cell_type": "code", "execution_count": 59, "id": "3e29d2a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 \n", "3 4000.0 \n", "4 3500.0 " ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 60, "id": "b1224d79", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MS01Rauf52LahoreMORNINGgroup CMale78.384.45000
0MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
1MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
2MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
3MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
4MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
\n", "
" ], "text/plain": [ " MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \\\n", "0 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "1 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "2 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "3 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "4 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "\n", " 5000 \n", "0 6000.0 \n", "1 8500.0 \n", "2 4000.0 \n", "3 3500.0 \n", "4 NaN " ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('datasets/groupdatawithoutcollables.csv')\n", "df.head()\n" ] }, { "cell_type": "code", "execution_count": 61, "id": "0daf15a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " 9 \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 \n", "3 4000.0 \n", "4 3500.0 " ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# To read such files, you have to pass the parameter `header=None` to the `read_csv()` method\n", "df = pd.read_csv('datasets/groupdatawithoutcollables.csv', header=None)\n", "df.head()\n" ] }, { "cell_type": "markdown", "id": "3488b6fe", "metadata": {}, "source": [ ">Let us suppose we have above dataframe, in which the column indices are just integer values associated with the position of every column. We want to assign some meaningful names to the columns for better understanding. There are many options or ways to do that." ] }, { "cell_type": "code", "execution_count": null, "id": "7d06d188", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cd16b0a0", "metadata": {}, "source": [ "**Changing Column IndicesLabels:** Assign a list of column labels to the `columns` attribute of dataframe" ] }, { "cell_type": "code", "execution_count": 62, "id": "ee7c1175", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 \n", "3 4000.0 \n", "4 3500.0 " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col_names = ['roll no', 'name', 'age', 'address', 'session', 'group', 'gender', 'subj1', 'subj2', 'scholarship']\n", "df.columns = col_names\n", "df.head()" ] }, { "cell_type": "markdown", "id": "4f5e22ca", "metadata": {}, "source": [ ">Note that in the above dataframe, first column name has a space, which is a bit difficult to use sometimes, so if you want to change value of a specific column label, you can use the `df.rename()` method" ] }, { "cell_type": "code", "execution_count": 63, "id": "44c77040", "metadata": {}, "outputs": [], "source": [ "# You pass a dictionary object to the columns argument to rename() method\n", "# The key is the old column name, while the value is the new column name\n", "df1 = df.rename(columns={'roll no': 'rollno'}, inplace=False)" ] }, { "cell_type": "code", "execution_count": 64, "id": "d4a2bad3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rollnonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " rollno name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.head(3)" ] }, { "cell_type": "markdown", "id": "d365bf55", "metadata": {}, "source": [ ">Last but not the least, another way is to assign appropriate column labels to your dataframes by passing a list of column names to the `names` argument of the `df.read_csv()` method. Do it at your own :)" ] }, { "cell_type": "code", "execution_count": null, "id": "84643aca", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "2c3c30ba", "metadata": {}, "source": [ "### c. Changing the Row Indices/Labels of a Dataframe\n", "- Every dataframe has row index associated with its rows\n", "- These by default are integer values from 0,1,2,3...\n", "- However, while creating a dataframe from scratch you may set them to some meaningful string values (seldom required).\n", "- We have already seen this in our previous session\n", "- Today, we will see two methods that work on row indices of a Pandas Dataframe named `df.set_index()` and `df.reset_index()`" ] }, { "cell_type": "code", "execution_count": 65, "id": "320a1817", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 " ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us load the dataset again into dataframe\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df.head(3)" ] }, { "cell_type": "code", "execution_count": 66, "id": "7ccd48bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=16, step=1)" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "markdown", "id": "34a875f9", "metadata": {}, "source": [ ">Let us suppose we have above dataframe, in which the row indices are just integer values associated with the position of every row. We want to assign some meaningful indices to the rows for better understanding. Suppose, we want to set values of the column rollno as index of this dataframe. So we donot want positional indices rather want some meaningful string indices, which are roll numbers of students in this case. " ] }, { "cell_type": "markdown", "id": "b8df0c60", "metadata": {}, "source": [ "**Changing Row Indices:** The `df.set_index()` method can be used to change row index of a dataframe using an existing column(s)\n", "`df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)`\n", "\n", "- Where\n", " - `keys` is the column label \n", " - `drop=True`, to drop the column from data part of dataframe that is being used as column index now\n", " - `append=False`, Set it to True if you want to append columns to existing index\n", " - `inplace=False`, Set it to True to make changes in the original dataframe, i.e., do not create a new object\n", " - `verify_integrity=False`, Set it to True to check the new index for duplicates. Default value of False will improve the performance of this method.\n", "\n", "Returns Dataframe if `inplace=False` or None if `inplace=True`" ] }, { "cell_type": "code", "execution_count": 67, "id": "0c6413f7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
roll no
MS01MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
MS02MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
MS03MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "roll no \n", "MS01 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "MS02 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "MS03 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 \n", "\n", " subj2 scholarship \n", "roll no \n", "MS01 84.4 5000.0 \n", "MS02 60.5 6000.0 \n", "MS03 75.1 8500.0 " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.set_index(keys='roll no', drop=False)\n", "df1.head(3)" ] }, { "cell_type": "markdown", "id": "435b624c", "metadata": {}, "source": [ "Note: The rollno column still exist as part of the dataframe. To drop it set `drop=True`" ] }, { "cell_type": "code", "execution_count": null, "id": "e3c2ba53", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 68, "id": "ab6d2fe3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameageaddresssessiongroupgendersubj1subj2scholarship
roll no
MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " name age address session group gender subj1 subj2 \\\n", "roll no \n", "MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "\n", " scholarship \n", "roll no \n", "MS01 5000.0 \n", "MS02 6000.0 \n", "MS03 8500.0 " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = df.set_index(keys='roll no', drop=True)\n", "df1.head(3)" ] }, { "cell_type": "markdown", "id": "042e1a5f", "metadata": {}, "source": [ "Another point to note is that no change has been made to the original dataframe since `inplace` arguement is by default False. Let us verify this" ] }, { "cell_type": "code", "execution_count": 69, "id": "7a7e055a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 \n", "3 4000.0 \n", "4 3500.0 " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "7089ff61", "metadata": {}, "source": [ "Instead of returning a dataframe, the `df.set_index()` method can change the column index inplace. Let us do that now" ] }, { "cell_type": "code", "execution_count": 70, "id": "6c3de2ed", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameageaddresssessiongroupgendersubj1subj2scholarship
roll no
MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " name age address session group gender subj1 subj2 \\\n", "roll no \n", "MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " scholarship \n", "roll no \n", "MS01 5000.0 \n", "MS02 6000.0 \n", "MS03 8500.0 \n", "MS04 4000.0 \n", "MS05 3500.0 " ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index(keys='roll no', drop=True, inplace=True)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "79628642", "metadata": {}, "source": [ ">Note that the roll no column has become the index now, it is no more data of the dataframe" ] }, { "cell_type": "code", "execution_count": 71, "id": "951244e1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['MS01', 'MS02', 'MS03', 'MS04', 'MS05', 'MS06', 'MS07', 'MS08', 'MS09',\n", " 'MS10', 'MS11', 'MS12', 'MS13', 'MS14', 'MS15', 'MS16'],\n", " dtype='object', name='roll no')" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": null, "id": "e5463f8f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6a6e75bc", "metadata": {}, "source": [ "**Changing Row Indices Back to Positional:** The `df.reset_index()` method is used to reset the row index of the dataframe back to positional integer indices\n", "\n", "`df.reset_index(drop=False, inplace=False)`\n", "Where\n", "- `drop=False`, Do not try to insert index into dataframe columns. This resets the index to the default integer index.\n", "- `inplace=False`, Modify the DataFrame in place (do not create a new object).\n", "\n", "Returns Dataframe if `inplace=False` or None if `inplace=True`" ] }, { "cell_type": "code", "execution_count": 73, "id": "e210fbb9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameageaddresssessiongroupgendersubj1subj2scholarship
roll no
MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
\n", "
" ], "text/plain": [ " name age address session group gender subj1 subj2 \\\n", "roll no \n", "MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "\n", " scholarship \n", "roll no \n", "MS01 5000.0 \n", "MS02 6000.0 \n", "MS03 8500.0 \n", "MS04 4000.0 \n", "MS05 3500.0 " ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 74, "id": "f7b5e6d4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(16, 9)" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 75, "id": "8aee9dd3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=16, step=1)" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reset the index\n", "df2 = df.reset_index()\n", "df2.index" ] }, { "cell_type": "code", "execution_count": 76, "id": "4704e4f6", "metadata": { "colab": {}, "colab_type": "code", "id": "054SCKpufWir", "outputId": "050ced56-1fa7-4c30-cbe7-f2dadbe2d710" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 84.4 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "\n", " scholarship \n", "0 5000.0 \n", "1 6000.0 \n", "2 8500.0 " ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.head(3)" ] }, { "cell_type": "code", "execution_count": 10, "id": "0db2f2f3", "metadata": {}, "outputs": [], "source": [ "df3 = df2.reset_index()" ] }, { "cell_type": "code", "execution_count": 11, "id": "b7d73dee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexroll nonameageaddresssessiongroupgendersubj1subj2scholarship
00MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
11MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
22MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
33MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
44MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
55MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
66MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
77MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
88MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
99MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
1010MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
1111MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
1212MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
1313MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
1414MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
1515MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " index roll no name age address session group gender \\\n", "0 0 MS01 Rauf 52 Lahore MORNING group C Male \n", "1 1 MS02 Arif 51 Islamabad AFT group A Male \n", "2 2 MS03 Shaista 35 Karachi AFTERNOON group B Female \n", "3 3 MS04 Hadeed 20 Lahore MOR group A Male \n", "4 4 MS05 Zara 40 Peshawer AFT group D Female \n", "5 5 MS06 Mohid 16 Lahore MORNING group C Female \n", "6 6 MS07 Zobia 40 Sialkot AFT group B Female \n", "7 7 MS08 Idrees 51 Multan MORNING group D Male \n", "8 8 MS09 Jamil 53 Karachi AFT group C Male \n", "9 9 MS10 Shahid 38 Lahore AFTERNOON group D Male \n", "10 10 MS11 Khurram 35 Islamabad MOR group B Male \n", "11 11 MS12 Maaz 25 Karachi AFTERNOON group C Male \n", "12 12 MS13 Mujahid 18 Lahore MORNING group D Male \n", "13 13 MS14 Sara 28 Multan AFTERNOON group A Female \n", "14 14 MS15 Fatima 33 Sialkot AFT group C Female \n", "15 15 MS16 Kakamanna 42 Multan AFTERNOON group A Male \n", "\n", " subj1 subj2 scholarship \n", "0 78.3 84.4 5000.0 \n", "1 70.5 60.5 6000.0 \n", "2 64.9 75.1 8500.0 \n", "3 82.0 84.3 4000.0 \n", "4 65.9 72.8 3500.0 \n", "5 69.3 78.6 NaN \n", "6 90.2 NaN 4000.0 \n", "7 84.1 76.0 8000.0 \n", "8 90.5 81.3 3500.0 \n", "9 90.5 81.3 3800.0 \n", "10 90.5 81.3 6000.0 \n", "11 90.5 81.3 NaN \n", "12 NaN 76.5 7000.0 \n", "13 84.1 76.0 8000.0 \n", "14 90.5 81.3 3500.0 \n", "15 90.5 81.3 3800.0 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3" ] }, { "cell_type": "code", "execution_count": null, "id": "36a4b115", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bdac5b44", "metadata": {}, "source": [ "## 2. Selecting Column(s) and Row(s) of a Dataframe using `df[]` \n", "\n", "\n", "\n", "- Consider this dataframe which is sorted by ages. Note the row indices has been randomized and they do not correspond to positional indices (which we normally visualize as 0,1,2,3...)\n", "- To access column(s) of a dataframe:\n", " - To access single column, mention the column index/label inside `[]`, which in this case are strings, however, can be integer values as well. It will return a new Series object.\n", " - To access multiple columns, pass column indices/labels as a list inside `[]`. It will return a new Dataframe object.\n", "- To access rows(s) of a dataframe: \n", " - Mention the **positional** row indices as a slice object `[start:stop:step]`, (In this case the positional indices do not match with the actual row indices). It will return a new Dataframe object.\n", " - `start`: specifies from where the slicing should start, inclusive (default is 0) \n", " - `stop`: specifies where it has to stop, exclusive (default is end of the array) \n", " - `step`: is by-default 1\n", " \n", " \n", "**Note:** \n", "- You cannot use two subscript operators `df[2][5]` to access a specific element of a dataframe as in case of numPy arrays.\n", "- You cannot get the subset of a dataframe w.r.t rows and columns at the same time using `df[]`, it can either return a subset of columns only or rows only.\n", "- We will soon see the `df.loc[]` and `df.iloc[]` methods that provides simpler, elegant and powerful way to subset a dataframe as compared to `df[]` syntax." ] }, { "cell_type": "code", "execution_count": 4, "id": "563052be", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
9MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
6MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "9 MS10 Shahid 38 Lahore AFTERNOON group D Male 90.5 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "6 MS07 Zobia 40 Sialkot AFT group B Female 90.2 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "\n", " subj2 scholarship \n", "5 78.6 NaN \n", "12 76.5 7000.0 \n", "3 84.3 4000.0 \n", "11 81.3 NaN \n", "13 76.0 8000.0 \n", "14 81.3 3500.0 \n", "2 75.1 8500.0 \n", "10 81.3 6000.0 \n", "9 81.3 3800.0 \n", "4 72.8 3500.0 \n", "6 NaN 4000.0 \n", "15 81.3 3800.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "0 84.4 5000.0 \n", "8 81.3 3500.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df_sorted" ] }, { "cell_type": "code", "execution_count": 5, "id": "3b45abcc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "\n", " scholarship \n", "5 NaN \n", "12 7000.0 \n", "3 4000.0 \n", "11 NaN \n", "13 8000.0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df_sorted = df.sort_values('age')\n", "df_sorted.head()" ] }, { "cell_type": "code", "execution_count": 6, "id": "4db05f3f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "\n", " scholarship \n", "12 7000.0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted[1:2]" ] }, { "cell_type": "markdown", "id": "b0cf706f", "metadata": {}, "source": [ "**Example 1:** Select the data under the column `name`. Since the column labels are strings, so we mention it in single quotes." ] }, { "cell_type": "code", "execution_count": 13, "id": "786fe173", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 Mohid\n", "12 Mujahid\n", "3 Hadeed\n", "11 Maaz\n", "13 Sara\n", "14 Fatima\n", "2 Shaista\n", "10 Khurram\n", "9 Shahid\n", "4 Zara\n", "6 Zobia\n", "15 Kakamanna\n", "1 Arif\n", "7 Idrees\n", "0 Rauf\n", "8 Jamil\n", "Name: name, dtype: object\n" ] }, { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 = df_sorted['name']\n", "print(s1)\n", "type(s1)" ] }, { "cell_type": "markdown", "id": "bcd12999", "metadata": {}, "source": [ "> The result is a new Series object. Since this is a series, so you can use many of the methods that you can use on Pandas Series and Dataframes, in a chained format, as shown below" ] }, { "cell_type": "code", "execution_count": 14, "id": "9dd6895e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Rauf\n", "1 Arif\n", "2 Shaista\n", "3 Hadeed\n", "4 Zara\n", "Name: name, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.head()\n", "# or\n", "df['name'].head()" ] }, { "cell_type": "code", "execution_count": null, "id": "c5bdf554", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "99e4dfe1", "metadata": {}, "source": [ "**Example 2:** To select multiple columns of a dataframe, pass a list of column names. The result is a new DataFrame object with the selected columns. " ] }, { "cell_type": "code", "execution_count": 15, "id": "f417bcd1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nogenderage
5MS06Female16
12MS13Male18
3MS04Male20
11MS12Male25
13MS14Female28
\n", "
" ], "text/plain": [ " roll no gender age\n", "5 MS06 Female 16\n", "12 MS13 Male 18\n", "3 MS04 Male 20\n", "11 MS12 Male 25\n", "13 MS14 Female 28" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Just get the first five rows for the columns `roll no`, `gender` and `age`\n", "d1 = df_sorted[['roll no', 'gender', 'age']].head()\n", "d1" ] }, { "cell_type": "code", "execution_count": null, "id": "a91ae4e2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5936f688", "metadata": {}, "source": [ "**Example 3:** Select the data of a single row at position 1." ] }, { "cell_type": "code", "execution_count": 16, "id": "072691e5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "\n", " scholarship \n", "12 7000.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted[1:2:1]" ] }, { "cell_type": "markdown", "id": "08dc3331", "metadata": {}, "source": [ ">Point to note that in `df_sorted` dataframe at position 1, we have record of Muhahid having dataframe index 12" ] }, { "cell_type": "code", "execution_count": null, "id": "129853ae", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "18289301", "metadata": {}, "source": [ "**Example 4:** Select the rows from positional index 2 to 3." ] }, { "cell_type": "code", "execution_count": 26, "id": "a8a281b9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "\n", " scholarship \n", "3 4000.0 \n", "11 NaN " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted[2:4]" ] }, { "cell_type": "code", "execution_count": null, "id": "0b20029b", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3399a250", "metadata": {}, "source": [ "**Example 5:** Select the rows from positional index 0, 5, 10, and 15" ] }, { "cell_type": "code", "execution_count": 7, "id": "a1d73eb4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 Mohid\n", "14 Fatima\n", "6 Zobia\n", "8 Jamil\n", "Name: name, dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted[::5]" ] }, { "cell_type": "markdown", "id": "926ae0a4", "metadata": {}, "source": [ ">Note that the output dataframe contains a subset of original dataframe. However, the index or row labels stays with the rows. It is not renumbered. So this means that every row is identified by a row label, which remain associated wih the row or record until you decide to reset the index" ] }, { "cell_type": "markdown", "id": "36db08b2", "metadata": {}, "source": [ "#### Resetting the Index of Subset of a Dataframe\n", "- When we slice data from a datafeame, the row index of resulting dataframe may not be contiguous values.\n", "- You can reset it using the `df.reset_index()` method discussed above as well." ] }, { "cell_type": "code", "execution_count": 28, "id": "c5c9fba2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "14 81.3 3500.0 \n", "10 81.3 6000.0 \n", "4 72.8 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df_sorted[5:12:2]\n", "df2" ] }, { "cell_type": "code", "execution_count": 29, "id": "9d7555f8", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexroll nonameageaddresssessiongroupgendersubj1subj2scholarship
014MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
110MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
24MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
315MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " index roll no name age address session group gender \\\n", "0 14 MS15 Fatima 33 Sialkot AFT group C Female \n", "1 10 MS11 Khurram 35 Islamabad MOR group B Male \n", "2 4 MS05 Zara 40 Peshawer AFT group D Female \n", "3 15 MS16 Kakamanna 42 Multan AFTERNOON group A Male \n", "\n", " subj1 subj2 scholarship \n", "0 90.5 81.3 3500.0 \n", "1 90.5 81.3 6000.0 \n", "2 65.9 72.8 3500.0 \n", "3 90.5 81.3 3800.0 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = df2.reset_index()\n", "df3" ] }, { "cell_type": "markdown", "id": "8a48dbb0", "metadata": {}, "source": [ ">Note that the index has been reset, however, the old index is now added as a column in the dataframe. Mostly this is not required, so pass the `drop=True` argument to `reset_index()` method to avoid this." ] }, { "cell_type": "code", "execution_count": 30, "id": "9f2ae7eb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
1MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
2MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
3MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "1 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "2 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "3 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 81.3 3500.0 \n", "1 81.3 6000.0 \n", "2 72.8 3500.0 \n", "3 81.3 3800.0 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4 = df2.reset_index(drop=True)\n", "df4" ] }, { "cell_type": "code", "execution_count": null, "id": "d687a580", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f45dc3cb", "metadata": {}, "source": [ "## 3. Selecting Rows and Columns using `iloc` Method\n", "\n", "\n", "- The **`df.iloc[]`** is more powerful than the **`df[]`**, as it allows to filter rows as well as columns of user choice at the same time.\n", "- It is used for filtering rows and selecting columns by **integer position** (0 to n-1) (neither by row index value/label nor by column index value/label). So you cannot mention the column names like `age` rather you need to give its positional index and that is 2.\n", "```\n", "df.iloc[rowstoselect, colstoselect]\n", "```\n", "- You can place a collon in any of the two arguments to select all rows or all columns.\n", "- Another point to keep in mind is that the indices are by position (0 to n-1) and not by actual values of row and column indices. \n", "- Allowed inputs within `[ , ]` are:\n", " - A single integer, e.g. ``5`` (note that ``5`` is interpreted as an integer position along the index).\n", " - A list or array of integers, e.g. `[9, 2, 7]`.\n", " - A slice object with integers, e.g. ``2:9``.\n", " - Note that as with usual Python slices, **stop** index is not included\n", " - **Note:** ``.iloc`` will raise ``IndexError`` if a requested indexer is out-of-bounds, except *slice* indexers which allow out-of-bounds indexing (this conforms with python/numpy *slice* semantics)." ] }, { "cell_type": "code", "execution_count": null, "id": "e54421d1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "13ed9469", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "6afb21df", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "82a89ef2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "4e0a0aa8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "373e2265", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "id": "8df50cd2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "\n", " scholarship \n", "5 NaN \n", "12 7000.0 \n", "3 4000.0 \n", "11 NaN \n", "13 8000.0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df_sorted = df.sort_values('age')\n", "df_sorted.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "f1d3dc0f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7474d97c", "metadata": {}, "source": [ "### a. Selection of Rows Only" ] }, { "cell_type": "markdown", "id": "db4e6a54", "metadata": {}, "source": [ "**Example 1:** Select a single row with positional index 2 and all the columns" ] }, { "cell_type": "code", "execution_count": 23, "id": "f39ad79f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "roll no MS04\n", "name Hadeed\n", "age 20\n", "address Lahore\n", "session MOR\n", "group group A\n", "gender Male\n", "subj1 82.0\n", "subj2 84.3\n", "scholarship 4000.0\n", "Name: 3, dtype: object" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[2,:]" ] }, { "cell_type": "markdown", "id": "607f9b3c", "metadata": {}, "source": [ "Note that the integer values are interpreted as row# (positional index) of the dataframe. Moreover a Series object is returned" ] }, { "cell_type": "markdown", "id": "671ba668", "metadata": {}, "source": [ "**Example 2:** Select rows with positional indices 2, 4 and 1 and all the columns" ] }, { "cell_type": "code", "execution_count": 24, "id": "dc9b15be", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "\n", " scholarship \n", "3 4000.0 \n", "13 8000.0 \n", "12 7000.0 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[[2,4,1], :]" ] }, { "cell_type": "code", "execution_count": null, "id": "f31eae33", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f0ced7aa", "metadata": {}, "source": [ "**Example 3:** Select rows with positional indices from 3 to 5 (stop value is not inclusive) and all the columns" ] }, { "cell_type": "code", "execution_count": 25, "id": "0f356dae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "\n", " scholarship \n", "11 NaN \n", "13 8000.0 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[3:5, :]" ] }, { "cell_type": "markdown", "id": "ade46ddf", "metadata": {}, "source": [ ">**In all above examples, if you omit comma and collon for the columns part, Pandas assumes it. I strongly recommend using the above style for clarity of code.**" ] }, { "cell_type": "code", "execution_count": null, "id": "e068a471", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7b55926e", "metadata": {}, "source": [ "### b. Selection of Columns Only" ] }, { "cell_type": "code", "execution_count": 37, "id": "987fdfcb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',\n", " 'subj1', 'subj2', 'scholarship'],\n", " dtype='object')" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.columns" ] }, { "cell_type": "markdown", "id": "406ecb3c", "metadata": {}, "source": [ "**Example 1:** Select all the row values under the column at positional index 3" ] }, { "cell_type": "code", "execution_count": 38, "id": "c4ea93a0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 Lahore\n", "12 Lahore\n", "3 Lahore\n", "11 Karachi\n", "13 Multan\n", "14 Sialkot\n", "2 Karachi\n", "10 Islamabad\n", "9 Lahore\n", "4 Peshawer\n", "6 Sialkot\n", "15 Multan\n", "1 Islamabad\n", "7 Multan\n", "0 Lahore\n", "8 Karachi\n", "Name: address, dtype: object" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[:, 3]" ] }, { "cell_type": "code", "execution_count": null, "id": "25d445b9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d18d10de", "metadata": {}, "source": [ "**Example 2:** Select all the row values under the column at positional index 1, 4, and 5" ] }, { "cell_type": "code", "execution_count": 39, "id": "bcef93b7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namesessiongroup
5MohidMORNINGgroup C
12MujahidMORNINGgroup D
3HadeedMORgroup A
11MaazAFTERNOONgroup C
13SaraAFTERNOONgroup A
14FatimaAFTgroup C
2ShaistaAFTERNOONgroup B
10KhurramMORgroup B
9ShahidAFTERNOONgroup D
4ZaraAFTgroup D
6ZobiaAFTgroup B
15KakamannaAFTERNOONgroup A
1ArifAFTgroup A
7IdreesMORNINGgroup D
0RaufMORNINGgroup C
8JamilAFTgroup C
\n", "
" ], "text/plain": [ " name session group\n", "5 Mohid MORNING group C\n", "12 Mujahid MORNING group D\n", "3 Hadeed MOR group A\n", "11 Maaz AFTERNOON group C\n", "13 Sara AFTERNOON group A\n", "14 Fatima AFT group C\n", "2 Shaista AFTERNOON group B\n", "10 Khurram MOR group B\n", "9 Shahid AFTERNOON group D\n", "4 Zara AFT group D\n", "6 Zobia AFT group B\n", "15 Kakamanna AFTERNOON group A\n", "1 Arif AFT group A\n", "7 Idrees MORNING group D\n", "0 Rauf MORNING group C\n", "8 Jamil AFT group C" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[:, [1,4,5]]" ] }, { "cell_type": "code", "execution_count": null, "id": "522d549c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "2751b609", "metadata": {}, "source": [ "**Example 3:** Select all the row values under the columns from position 2 to 4 (Note that the stop index is not inclusive)" ] }, { "cell_type": "code", "execution_count": 40, "id": "9f62f365", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageaddresssession
516LahoreMORNING
1218LahoreMORNING
320LahoreMOR
1125KarachiAFTERNOON
1328MultanAFTERNOON
1433SialkotAFT
235KarachiAFTERNOON
1035IslamabadMOR
938LahoreAFTERNOON
440PeshawerAFT
640SialkotAFT
1542MultanAFTERNOON
151IslamabadAFT
751MultanMORNING
052LahoreMORNING
853KarachiAFT
\n", "
" ], "text/plain": [ " age address session\n", "5 16 Lahore MORNING\n", "12 18 Lahore MORNING\n", "3 20 Lahore MOR\n", "11 25 Karachi AFTERNOON\n", "13 28 Multan AFTERNOON\n", "14 33 Sialkot AFT\n", "2 35 Karachi AFTERNOON\n", "10 35 Islamabad MOR\n", "9 38 Lahore AFTERNOON\n", "4 40 Peshawer AFT\n", "6 40 Sialkot AFT\n", "15 42 Multan AFTERNOON\n", "1 51 Islamabad AFT\n", "7 51 Multan MORNING\n", "0 52 Lahore MORNING\n", "8 53 Karachi AFT" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[:, 2:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "6cfd4f30", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "04645731", "metadata": {}, "source": [ "### c. Selection of Rows + Columns\n", "```\n", "df.iloc[whatrowsIwant, whatcolumnsIwant]\n", "```\n", "- You can use a single value, a list of multiple values, or a slice object for selecting rows\n", "- You can use a single value, a list of multiple values, or a slice object for selecting columns" ] }, { "cell_type": "markdown", "id": "d23145c7", "metadata": {}, "source": [ "**Example 1:** Select only the rows at positional index 3 and 0, and from those two rows select only columns at positional index 1 and 5" ] }, { "cell_type": "code", "execution_count": 41, "id": "240720c4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namegroup
11Maazgroup C
5Mohidgroup C
\n", "
" ], "text/plain": [ " name group\n", "11 Maaz group C\n", "5 Mohid group C" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[[3, 0], [1, 5]]" ] }, { "cell_type": "code", "execution_count": null, "id": "608e11f0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "540dae0e", "metadata": {}, "source": [ "**Example 2:** Select only the rows at positional index 0 to 4 (stop index is not inclusive), and from those two rows select only columns at positional index 2 and 3" ] }, { "cell_type": "code", "execution_count": 42, "id": "a707b416", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageaddress
516Lahore
1218Lahore
320Lahore
1125Karachi
1328Multan
\n", "
" ], "text/plain": [ " age address\n", "5 16 Lahore\n", "12 18 Lahore\n", "3 20 Lahore\n", "11 25 Karachi\n", "13 28 Multan" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.iloc[0:5, 2:4]" ] }, { "cell_type": "code", "execution_count": null, "id": "a7a0d4f6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "d92a8982", "metadata": {}, "source": [ "## 4. Selecting Rows and Columns using `df.loc[]` Method\n", "\n", "\n", "- The **`df.loc[]`** is also used for filtering rows and selecting columns but by row index value/label or by column index value/label (NOT by position). \n", "```\n", "df.loc[rowstoselect, colstoselect]\n", "```\n", "- You can place a collon in any of the two arguments to select all rows or all columns.\n", "- Another point to keep in mind is that the indices are NOT by position, rather by actual values of row and column indices. \n", "- Allowed inputs within `[ , ]` are:\n", " - A single label, e.g. `5` or `'a'`, (note that `5` is interpreted as actual index/label **NOT** as an integer position along the index).\n", " - A list or array of labels, e.g. `[9, 2, 7]` or `['ms07', 'ms02', 'ms08']`.\n", " - A slice object with labels, e.g. `[3:6:2]` or `['ms05':'ms09']`.\n", " - **Warning:** Note that contrary to usual Python slices, **both** the start and the stop are included" ] }, { "cell_type": "code", "execution_count": 9, "id": "9211ffa3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "\n", " scholarship \n", "5 NaN \n", "12 7000.0 \n", "3 4000.0 \n", "11 NaN \n", "13 8000.0 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df_sorted = df.sort_values('age')\n", "df_sorted.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "31cc0e4c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4a1a7d13", "metadata": {}, "source": [ "### a. Selection of Rows Only" ] }, { "cell_type": "markdown", "id": "885622be", "metadata": {}, "source": [ "**Example 1:** Select a single row with row index 2 and all the columns" ] }, { "cell_type": "code", "execution_count": 31, "id": "5d806857", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "roll no MS03\n", "name Shaista\n", "age 35\n", "address Karachi\n", "session AFTERNOON\n", "group group B\n", "gender Female\n", "subj1 64.9\n", "subj2 75.1\n", "scholarship 8500.0\n", "Name: 2, dtype: object" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[2,:]" ] }, { "cell_type": "markdown", "id": "4cc79e65", "metadata": {}, "source": [ "**Example 2:** Select rows with row indices 2, 4 and 1 and all the columns" ] }, { "cell_type": "code", "execution_count": 11, "id": "531547c5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "\n", " scholarship \n", "2 8500.0 \n", "4 3500.0 \n", "1 6000.0 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[[2,4,1], :]" ] }, { "cell_type": "code", "execution_count": null, "id": "5994ef6f", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4f4e34fb", "metadata": {}, "source": [ "**Example 3:** Select rows with row indices from 5 to 3 (stop value is inclusive) and all the columns" ] }, { "cell_type": "code", "execution_count": 33, "id": "c342f771", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 81.3 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "\n", " scholarship \n", "5 NaN \n", "12 7000.0 \n", "3 4000.0 \n", "11 NaN \n", "13 8000.0 \n", "14 3500.0 \n", "2 8500.0 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[5:2, :]" ] }, { "cell_type": "code", "execution_count": 34, "id": "e7e7cf2d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "\n", " scholarship \n", "5 NaN \n", "12 7000.0 \n", "3 4000.0 " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[5:3, :]" ] }, { "cell_type": "markdown", "id": "d81adedd", "metadata": {}, "source": [ "**Question:** If you give the slice as `3:5`, it will select no row. Can anyone guess why it is so?" ] }, { "cell_type": "code", "execution_count": 12, "id": "43189731", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 group A\n", "12 group D\n", "5 group C\n", "Name: group, dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[3:5:-1, :]" ] }, { "cell_type": "code", "execution_count": null, "id": "80ac8b1e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "f3319062", "metadata": {}, "source": [ ">**In all above examples, if you omit comma and collon for the columns part, Pandas assumes it. I strongly recommend using the above style for clarity of code.**" ] }, { "cell_type": "code", "execution_count": null, "id": "91e4d7b0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6390dab3", "metadata": {}, "source": [ "### b. Selection of Columns Only" ] }, { "cell_type": "code", "execution_count": 50, "id": "522af9aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['roll no', 'name', 'age', 'address', 'session', 'group', 'gender',\n", " 'subj1', 'subj2', 'scholarship'],\n", " dtype='object')" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.columns" ] }, { "cell_type": "markdown", "id": "931e08a7", "metadata": {}, "source": [ "**Example 1:** Select all the row values under the column with label `name`" ] }, { "cell_type": "code", "execution_count": 51, "id": "2cc48ff8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 Mohid\n", "12 Mujahid\n", "3 Hadeed\n", "11 Maaz\n", "13 Sara\n", "14 Fatima\n", "2 Shaista\n", "10 Khurram\n", "9 Shahid\n", "4 Zara\n", "6 Zobia\n", "15 Kakamanna\n", "1 Arif\n", "7 Idrees\n", "0 Rauf\n", "8 Jamil\n", "Name: name, dtype: object" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[:, 'name']" ] }, { "cell_type": "code", "execution_count": null, "id": "a6018a75", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "57f887bf", "metadata": {}, "source": [ "**Example 2:** Select all the row values under the columns with labels `name`, `address`, and `scholarship`" ] }, { "cell_type": "code", "execution_count": 52, "id": "53d61516", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameaddressscholarship
5MohidLahoreNaN
12MujahidLahore7000.0
3HadeedLahore4000.0
11MaazKarachiNaN
13SaraMultan8000.0
14FatimaSialkot3500.0
2ShaistaKarachi8500.0
10KhurramIslamabad6000.0
9ShahidLahore3800.0
4ZaraPeshawer3500.0
6ZobiaSialkot4000.0
15KakamannaMultan3800.0
1ArifIslamabad6000.0
7IdreesMultan8000.0
0RaufLahore5000.0
8JamilKarachi3500.0
\n", "
" ], "text/plain": [ " name address scholarship\n", "5 Mohid Lahore NaN\n", "12 Mujahid Lahore 7000.0\n", "3 Hadeed Lahore 4000.0\n", "11 Maaz Karachi NaN\n", "13 Sara Multan 8000.0\n", "14 Fatima Sialkot 3500.0\n", "2 Shaista Karachi 8500.0\n", "10 Khurram Islamabad 6000.0\n", "9 Shahid Lahore 3800.0\n", "4 Zara Peshawer 3500.0\n", "6 Zobia Sialkot 4000.0\n", "15 Kakamanna Multan 3800.0\n", "1 Arif Islamabad 6000.0\n", "7 Idrees Multan 8000.0\n", "0 Rauf Lahore 5000.0\n", "8 Jamil Karachi 3500.0" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[:, ['name', 'address', 'scholarship']]" ] }, { "cell_type": "code", "execution_count": null, "id": "a09667a2", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "89016cac", "metadata": {}, "source": [ "**Example 3:** Select all the row values under the columns from a column range given as column labels (Note that the stop index is inclusive)" ] }, { "cell_type": "code", "execution_count": 54, "id": "c119c42e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
5
12
3
11
13
14
2
10
9
4
6
15
1
7
0
8
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: [5, 12, 3, 11, 13, 14, 2, 10, 9, 4, 6, 15, 1, 7, 0, 8]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[:, 'address':'name']" ] }, { "cell_type": "code", "execution_count": 39, "id": "f54ab064", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameageaddress
5Mohid16Lahore
12Mujahid18Lahore
3Hadeed20Lahore
11Maaz25Karachi
13Sara28Multan
14Fatima33Sialkot
2Shaista35Karachi
10Khurram35Islamabad
9Shahid38Lahore
4Zara40Peshawer
6Zobia40Sialkot
15Kakamanna42Multan
1Arif51Islamabad
7Idrees51Multan
0Rauf52Lahore
8Jamil53Karachi
\n", "
" ], "text/plain": [ " name age address\n", "5 Mohid 16 Lahore\n", "12 Mujahid 18 Lahore\n", "3 Hadeed 20 Lahore\n", "11 Maaz 25 Karachi\n", "13 Sara 28 Multan\n", "14 Fatima 33 Sialkot\n", "2 Shaista 35 Karachi\n", "10 Khurram 35 Islamabad\n", "9 Shahid 38 Lahore\n", "4 Zara 40 Peshawer\n", "6 Zobia 40 Sialkot\n", "15 Kakamanna 42 Multan\n", "1 Arif 51 Islamabad\n", "7 Idrees 51 Multan\n", "0 Rauf 52 Lahore\n", "8 Jamil 53 Karachi" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[:, 'name':'address']" ] }, { "cell_type": "code", "execution_count": 40, "id": "eca0e6fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameaddress
5MohidLahore
12MujahidLahore
3HadeedLahore
11MaazKarachi
13SaraMultan
14FatimaSialkot
2ShaistaKarachi
10KhurramIslamabad
9ShahidLahore
4ZaraPeshawer
6ZobiaSialkot
15KakamannaMultan
1ArifIslamabad
7IdreesMultan
0RaufLahore
8JamilKarachi
\n", "
" ], "text/plain": [ " name address\n", "5 Mohid Lahore\n", "12 Mujahid Lahore\n", "3 Hadeed Lahore\n", "11 Maaz Karachi\n", "13 Sara Multan\n", "14 Fatima Sialkot\n", "2 Shaista Karachi\n", "10 Khurram Islamabad\n", "9 Shahid Lahore\n", "4 Zara Peshawer\n", "6 Zobia Sialkot\n", "15 Kakamanna Multan\n", "1 Arif Islamabad\n", "7 Idrees Multan\n", "0 Rauf Lahore\n", "8 Jamil Karachi" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[:, 'name':'address':2]" ] }, { "cell_type": "code", "execution_count": null, "id": "4dae4af5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "205220fc", "metadata": {}, "source": [ "### c. Selection of Rows + Columns\n", "```\n", "df.iloc[whatrowsIwant, whatcolumnsIwant]\n", "```\n", "- You can use a single value, a list of multiple values, or a slice object for selecting rows\n", "- You can use a single value, a list of multiple values, or a slice object for selecting columns" ] }, { "cell_type": "markdown", "id": "d75e875a", "metadata": {}, "source": [ "**Example 1:** Select rows with row indices 3 and 0, and from those two rows select only columns at column labels `name` and `address`" ] }, { "cell_type": "code", "execution_count": 55, "id": "7b356699", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameaddress
3HadeedLahore
0RaufLahore
\n", "
" ], "text/plain": [ " name address\n", "3 Hadeed Lahore\n", "0 Rauf Lahore" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[[3, 0], ['name', 'address']]" ] }, { "cell_type": "code", "execution_count": null, "id": "13174faa", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "44d94e18", "metadata": {}, "source": [ "**Example 2:** Select rows with row indices 0 to 5 (Stop index is inclusive), and from those six rows select columns `name`, `age` and `session`" ] }, { "cell_type": "code", "execution_count": 56, "id": "80df8045", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagesession
3Hadeed20MOR
11Maaz25AFTERNOON
13Sara28AFTERNOON
\n", "
" ], "text/plain": [ " name age session\n", "3 Hadeed 20 MOR\n", "11 Maaz 25 AFTERNOON\n", "13 Sara 28 AFTERNOON" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[3:13, ['name', 'age', 'session']]" ] }, { "cell_type": "code", "execution_count": null, "id": "4487c360", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "02d3ea27", "metadata": {}, "source": [ "**Question:** You might be expecting `3:13` will return 11 rows, but it has returned only three. Can anyone guess why it is so?" ] }, { "cell_type": "code", "execution_count": 57, "id": "4958d945", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagesession
5Mohid16MORNING
12Mujahid18MORNING
3Hadeed20MOR
11Maaz25AFTERNOON
13Sara28AFTERNOON
14Fatima33AFT
2Shaista35AFTERNOON
10Khurram35MOR
9Shahid38AFTERNOON
4Zara40AFT
6Zobia40AFT
15Kakamanna42AFTERNOON
1Arif51AFT
7Idrees51MORNING
0Rauf52MORNING
8Jamil53AFT
\n", "
" ], "text/plain": [ " name age session\n", "5 Mohid 16 MORNING\n", "12 Mujahid 18 MORNING\n", "3 Hadeed 20 MOR\n", "11 Maaz 25 AFTERNOON\n", "13 Sara 28 AFTERNOON\n", "14 Fatima 33 AFT\n", "2 Shaista 35 AFTERNOON\n", "10 Khurram 35 MOR\n", "9 Shahid 38 AFTERNOON\n", "4 Zara 40 AFT\n", "6 Zobia 40 AFT\n", "15 Kakamanna 42 AFTERNOON\n", "1 Arif 51 AFT\n", "7 Idrees 51 MORNING\n", "0 Rauf 52 MORNING\n", "8 Jamil 53 AFT" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sorted.loc[5:8, ['name', 'age', 'session']]" ] }, { "cell_type": "markdown", "id": "0e73e5a6", "metadata": {}, "source": [ "The range `5:8` has returned all the 16 rows in the dataframe :)" ] }, { "cell_type": "code", "execution_count": null, "id": "85d0c4a9", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5cf91bb8", "metadata": {}, "source": [ "# After having understood all of this my recommendation is:" ] }, { "cell_type": "markdown", "id": "83e8e058", "metadata": {}, "source": [ ">**Always keep the row indices of your dataframe as 0, 1, 2, 3, 4, ... and the column indices as meaningful labels. If after a slicing or sorting operation, the row indices are a bit disturbed, use `df.reset_index()` method to adjust your row indices match the positional indices.**" ] }, { "cell_type": "code", "execution_count": null, "id": "a6fa1808", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "0e123a4b", "metadata": {}, "source": [ "## 5. Conditional Selection\n", "- Suppose we want to select only those rows where the age value is greater than 40. Note this time the dataframe has row indices that match with the positional indices.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 81, "id": "2f853b18", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
6MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
9MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 \n", "6 MS07 Zobia 40 Sialkot AFT group B Female 90.2 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "9 MS10 Shahid 38 Lahore AFTERNOON group D Male 90.5 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "2 75.1 8500.0 \n", "3 84.3 4000.0 \n", "4 72.8 3500.0 \n", "5 78.6 NaN \n", "6 NaN 4000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "9 81.3 3800.0 \n", "10 81.3 6000.0 \n", "11 81.3 NaN \n", "12 76.5 7000.0 \n", "13 76.0 8000.0 \n", "14 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df" ] }, { "cell_type": "markdown", "id": "df7b948d", "metadata": {}, "source": [ "### a. Option 1:\n", "- Create a Python list having Boolean values of exact same length as the rows of the dataframe \n", "- The value in the list need to be True for the row which we want to select\n", "- Convert the Python list to a Pandas series\n", "- Finally pass that series to the dataframe" ] }, { "cell_type": "code", "execution_count": 83, "id": "f2786c0a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 52\n", "1 51\n", "2 35\n", "3 20\n", "4 40\n", "5 16\n", "6 40\n", "7 51\n", "8 53\n", "9 38\n", "10 35\n", "11 25\n", "12 18\n", "13 28\n", "14 33\n", "15 42\n", "Name: age, dtype: int64" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.age" ] }, { "cell_type": "code", "execution_count": 82, "id": "2ac3b39e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[True, True, False, False, False, False, False, True, True, False, False, False, False, False, False, True]\n" ] } ], "source": [ "list1 = []\n", "for length in df.age:\n", " if length > 40:\n", " list1.append(True)\n", " else:\n", " list1.append(False)\n", "print(list1)" ] }, { "cell_type": "code", "execution_count": 84, "id": "b705b8ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[list1]" ] }, { "cell_type": "code", "execution_count": null, "id": "a251de2d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "924aa546", "metadata": {}, "source": [ "### b. Option 2:\n", "- Instead of creating a Boolean list using the loop, use the condition inside the `df[cond]` operator, that will automatically generate the Boolean list." ] }, { "cell_type": "code", "execution_count": 85, "id": "1e7b82e6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['age']>40]" ] }, { "cell_type": "code", "execution_count": 45, "id": "70fcfd32", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.age >40]" ] }, { "cell_type": "code", "execution_count": null, "id": "f60a6f70", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "5c503e91", "metadata": {}, "source": [ "### c. Option 3:\n", "- Best way is to use the **`df.loc[cond]`** method." ] }, { "cell_type": "code", "execution_count": 63, "id": "2f5e1006", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "0 84.4 5000.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "8 81.3 3500.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df.age > 40]" ] }, { "cell_type": "code", "execution_count": 86, "id": "3936aad0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameage
0Rauf52
1Arif51
7Idrees51
8Jamil53
15Kakamanna42
\n", "
" ], "text/plain": [ " name age\n", "0 Rauf 52\n", "1 Arif 51\n", "7 Idrees 51\n", "8 Jamil 53\n", "15 Kakamanna 42" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using 'loc' gives you the facility to slice the required columns as well\n", "df.loc[df.age > 40, ['name', 'age']]" ] }, { "cell_type": "code", "execution_count": null, "id": "7bc2cd79", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "481d208f", "metadata": {}, "source": [ "### d. Conditional Selection based on Multiple Conditions\n", "- Suppose we want to get all the records of the dataframe where the age value is greater than 40 and belong to Multan\n", "- For this use multiple conditions inside parenthesis and use logical operators (`&`, `|`) in between\n", "```\n", "df[(condition1) op (condition2) op (condition3)]\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "55c0a5b3", "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 89, "id": "b8a3f9d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
9MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 84.3 \n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 78.6 \n", "9 MS10 Shahid 38 Lahore AFTERNOON group D Male 90.5 81.3 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 81.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN 76.5 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 81.3 \n", "\n", " scholarship \n", "2 8500.0 \n", "3 4000.0 \n", "5 NaN \n", "9 3800.0 \n", "10 6000.0 \n", "11 NaN \n", "12 7000.0 \n", "14 3500.0 " ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df.age < 40) & (df.address != 'Multan')]" ] }, { "cell_type": "code", "execution_count": null, "id": "799206fa", "metadata": {}, "outputs": [], "source": [ "df[(df.age > 40) & (df.address == 'Multan')]" ] }, { "cell_type": "code", "execution_count": null, "id": "9ee9ada0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 68, "id": "4821330d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "\n", " subj2 scholarship \n", "1 60.5 6000.0 \n", "3 84.3 4000.0 \n", "15 81.3 3800.0 " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select records of group A male students only\n", "df1 = df[(df.group == 'group A') & (df.gender == 'Male')] \n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "69f8f258", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 69, "id": "bbb05754", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
6MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "6 MS07 Zobia 40 Sialkot AFT group B Female 90.2 NaN \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 81.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 81.3 \n", "\n", " scholarship \n", "2 8500.0 \n", "6 4000.0 \n", "8 3500.0 \n", "11 NaN \n", "14 3500.0 " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the records of students who belong to Sialkot or Karachi\n", "df1 = df[(df.address == 'Sialkot') | (df.address == 'Karachi')]\n", "df1" ] }, { "cell_type": "code", "execution_count": null, "id": "24114c02", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 70, "id": "75d9fc93", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 76.0 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 76.0 \n", "\n", " scholarship \n", "4 3500.0 \n", "7 8000.0 \n", "13 8000.0 " ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select records who lives outside Karachi and earn a scholarship of greater than 7000, or lives in Peshawer\n", "out = df[(df.address != 'Karachi') & (df.scholarship > 7000) | (df.address == 'Peshawer')]\n", "out\n" ] }, { "cell_type": "markdown", "id": "5e557480", "metadata": {}, "source": [ "**If there are many conditions connected with or operator, you can simplify it using the `series.isin()` method as shown below:**" ] }, { "cell_type": "code", "execution_count": 71, "id": "4f94f4c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 subj2 \\\n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 60.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 75.1 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 72.8 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 81.3 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 81.3 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 81.3 \n", "\n", " scholarship \n", "1 6000.0 \n", "2 8500.0 \n", "4 3500.0 \n", "8 3500.0 \n", "10 6000.0 \n", "11 NaN " ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.address.isin(['Karachi', 'Peshawer', 'Islamabad'])]" ] }, { "cell_type": "code", "execution_count": null, "id": "a968651d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "787bd353", "metadata": {}, "source": [ "## 6. Selecting columns of a specific data type\n", "- The `df.select_dtypes()` method is used to get a subset of the dataframe to select columns of a specific datatype(s) \n", "```\n", "df.select_dtypes(include, exclude)\n", "```\n", "\n", "- `include` and `exclude` arguments can be scalar or list-like\n", "- Atleast one of these parameters must be supplied" ] }, { "cell_type": "code", "execution_count": 72, "id": "ed488fd1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "roll no object\n", "name object\n", "age int64\n", "address object\n", "session object\n", "group object\n", "gender object\n", "subj1 float64\n", "subj2 float64\n", "scholarship float64\n", "dtype: object" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us first check the data types of each column\n", "df.dtypes" ] }, { "cell_type": "code", "execution_count": 73, "id": "077b0967", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subj1subj2scholarship
078.384.45000.0
170.560.56000.0
264.975.18500.0
382.084.34000.0
465.972.83500.0
569.378.6NaN
690.2NaN4000.0
784.176.08000.0
890.581.33500.0
990.581.33800.0
1090.581.36000.0
1190.581.3NaN
12NaN76.57000.0
1384.176.08000.0
1490.581.33500.0
1590.581.33800.0
\n", "
" ], "text/plain": [ " subj1 subj2 scholarship\n", "0 78.3 84.4 5000.0\n", "1 70.5 60.5 6000.0\n", "2 64.9 75.1 8500.0\n", "3 82.0 84.3 4000.0\n", "4 65.9 72.8 3500.0\n", "5 69.3 78.6 NaN\n", "6 90.2 NaN 4000.0\n", "7 84.1 76.0 8000.0\n", "8 90.5 81.3 3500.0\n", "9 90.5 81.3 3800.0\n", "10 90.5 81.3 6000.0\n", "11 90.5 81.3 NaN\n", "12 NaN 76.5 7000.0\n", "13 84.1 76.0 8000.0\n", "14 90.5 81.3 3500.0\n", "15 90.5 81.3 3800.0" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the columns with object data type (categorical variables) only`\n", "df.select_dtypes(include='float64')" ] }, { "cell_type": "code", "execution_count": 74, "id": "bd895191", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agesubj1subj2scholarship
05278.384.45000.0
15170.560.56000.0
23564.975.18500.0
32082.084.34000.0
44065.972.83500.0
\n", "
" ], "text/plain": [ " age subj1 subj2 scholarship\n", "0 52 78.3 84.4 5000.0\n", "1 51 70.5 60.5 6000.0\n", "2 35 64.9 75.1 8500.0\n", "3 20 82.0 84.3 4000.0\n", "4 40 65.9 72.8 3500.0" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the columns with int64 datatype\n", "df.select_dtypes(include=['int64', 'float64']).head()" ] }, { "cell_type": "code", "execution_count": null, "id": "953eb361", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "36c7b823", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "fe1c9368", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d0a04543", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "9718eb42", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a0fbd2c6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 30, "id": "e183e117", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
roll nonameageaddresssessiongroupgendersubj1subj2scholarship
5MS06Mohid16LahoreMORNINGgroup CFemale69.378.6NaN
12MS13Mujahid18LahoreMORNINGgroup DMaleNaN76.57000.0
3MS04Hadeed20LahoreMORgroup AMale82.084.34000.0
11MS12Maaz25KarachiAFTERNOONgroup CMale90.581.3NaN
13MS14Sara28MultanAFTERNOONgroup AFemale84.176.08000.0
14MS15Fatima33SialkotAFTgroup CFemale90.581.33500.0
2MS03Shaista35KarachiAFTERNOONgroup BFemale64.975.18500.0
10MS11Khurram35IslamabadMORgroup BMale90.581.36000.0
9MS10Shahid38LahoreAFTERNOONgroup DMale90.581.33800.0
4MS05Zara40PeshawerAFTgroup DFemale65.972.83500.0
6MS07Zobia40SialkotAFTgroup BFemale90.2NaN4000.0
15MS16Kakamanna42MultanAFTERNOONgroup AMale90.581.33800.0
1MS02Arif51IslamabadAFTgroup AMale70.560.56000.0
7MS08Idrees51MultanMORNINGgroup DMale84.176.08000.0
0MS01Rauf52LahoreMORNINGgroup CMale78.384.45000.0
8MS09Jamil53KarachiAFTgroup CMale90.581.33500.0
\n", "
" ], "text/plain": [ " roll no name age address session group gender subj1 \\\n", "5 MS06 Mohid 16 Lahore MORNING group C Female 69.3 \n", "12 MS13 Mujahid 18 Lahore MORNING group D Male NaN \n", "3 MS04 Hadeed 20 Lahore MOR group A Male 82.0 \n", "11 MS12 Maaz 25 Karachi AFTERNOON group C Male 90.5 \n", "13 MS14 Sara 28 Multan AFTERNOON group A Female 84.1 \n", "14 MS15 Fatima 33 Sialkot AFT group C Female 90.5 \n", "2 MS03 Shaista 35 Karachi AFTERNOON group B Female 64.9 \n", "10 MS11 Khurram 35 Islamabad MOR group B Male 90.5 \n", "9 MS10 Shahid 38 Lahore AFTERNOON group D Male 90.5 \n", "4 MS05 Zara 40 Peshawer AFT group D Female 65.9 \n", "6 MS07 Zobia 40 Sialkot AFT group B Female 90.2 \n", "15 MS16 Kakamanna 42 Multan AFTERNOON group A Male 90.5 \n", "1 MS02 Arif 51 Islamabad AFT group A Male 70.5 \n", "7 MS08 Idrees 51 Multan MORNING group D Male 84.1 \n", "0 MS01 Rauf 52 Lahore MORNING group C Male 78.3 \n", "8 MS09 Jamil 53 Karachi AFT group C Male 90.5 \n", "\n", " subj2 scholarship \n", "5 78.6 NaN \n", "12 76.5 7000.0 \n", "3 84.3 4000.0 \n", "11 81.3 NaN \n", "13 76.0 8000.0 \n", "14 81.3 3500.0 \n", "2 75.1 8500.0 \n", "10 81.3 6000.0 \n", "9 81.3 3800.0 \n", "4 72.8 3500.0 \n", "6 NaN 4000.0 \n", "15 81.3 3800.0 \n", "1 60.5 6000.0 \n", "7 76.0 8000.0 \n", "0 84.4 5000.0 \n", "8 81.3 3500.0 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let us read a fresh dataframe and sort it by age column to have a clear understanding about indexing\n", "import pandas as pd\n", "df = pd.read_csv('datasets/groupdata.csv')\n", "df = df.sort_values('age')\n", "df" ] }, { "cell_type": "code", "execution_count": 32, "id": "5b82d058", "metadata": {}, "outputs": [], "source": [ "dfgb = df.groupby('address')" ] }, { "cell_type": "code", "execution_count": 33, "id": "0bf78074", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3800.0" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('address').get_group('Multan').scholarship.min()" ] }, { "cell_type": "code", "execution_count": null, "id": "c718dae8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 29, "id": "022bc741", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(16, 7)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# It will delete the Scholarship column\n", "df_sorted.drop(columns=df.columns[df.apply(lambda col: col.isnull().sum() >= 1)], axis=1, inplace=True)\n", "df_sorted.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "d879e8b4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }