{ "cells": [ { "cell_type": "markdown", "id": "committed-satisfaction", "metadata": {}, "source": [ "# Organizing the DataFrame" ] }, { "cell_type": "markdown", "id": "taken-nutrition", "metadata": {}, "source": [ "## How to Sort Data By Single Column" ] }, { "cell_type": "code", "execution_count": 1, "id": "systematic-messenger", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"../data/titanic.csv\")" ] }, { "cell_type": "markdown", "id": "healthy-geography", "metadata": {}, "source": [ "Now that we've import pandas and created our DataFrame, let's see what it looks like again." ] }, { "cell_type": "code", "execution_count": 2, "id": "significant-silence", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "enclosed-acting", "metadata": {}, "source": [ "In this scenario, I am interested in sorting the data (rather like Excel). Rather than using the sort feature in Excel, we can use the `df.sort_values()` method in Python. This will take one argument, specifically the column that you want to organize by. By default, this will be ascending. Let's do this by class. In other words, sort the DataFrame so that those who were in first class appear first and those in third class appear last. We will do this by passing the argument \"Pclass\", the column name corresponding to Passenger Class." ] }, { "cell_type": "code", "execution_count": 3, "id": "collected-integer", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
44544611Dodge, Master. Washingtonmale4.0023363881.8583A34S
31031111Hays, Miss. Margaret Bechsteinfemale24.0001176783.1583C54C
30931011Francatelli, Miss. Laura Mabelfemale30.000PC 1748556.9292E36C
30730811Penasco y Castellana, Mrs. Victor de Satode (M...female17.010PC 17758108.9000C65C
30630711Fleming, Miss. MargaretfemaleNaN0017421110.8833NaNC
.......................................
37938003Gustafsson, Mr. Karl Gideonmale19.0003470697.7750NaNS
38138213Nakid, Miss. Maria (\"Mary\")female1.002265315.7417NaNC
38238303Tikkanen, Mr. Juhomale32.000STON/O 2. 31012937.9250NaNS
37137203Wiklund, Mr. Jakob Alfredmale18.01031012676.4958NaNS
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "445 446 1 1 \n", "310 311 1 1 \n", "309 310 1 1 \n", "307 308 1 1 \n", "306 307 1 1 \n", ".. ... ... ... \n", "379 380 0 3 \n", "381 382 1 3 \n", "382 383 0 3 \n", "371 372 0 3 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "445 Dodge, Master. Washington male 4.0 0 \n", "310 Hays, Miss. Margaret Bechstein female 24.0 0 \n", "309 Francatelli, Miss. Laura Mabel female 30.0 0 \n", "307 Penasco y Castellana, Mrs. Victor de Satode (M... female 17.0 1 \n", "306 Fleming, Miss. Margaret female NaN 0 \n", ".. ... ... ... ... \n", "379 Gustafsson, Mr. Karl Gideon male 19.0 0 \n", "381 Nakid, Miss. Maria (\"Mary\") female 1.0 0 \n", "382 Tikkanen, Mr. Juho male 32.0 0 \n", "371 Wiklund, Mr. Jakob Alfred male 18.0 1 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "445 2 33638 81.8583 A34 S \n", "310 0 11767 83.1583 C54 C \n", "309 0 PC 17485 56.9292 E36 C \n", "307 0 PC 17758 108.9000 C65 C \n", "306 0 17421 110.8833 NaN C \n", ".. ... ... ... ... ... \n", "379 0 347069 7.7750 NaN S \n", "381 2 2653 15.7417 NaN C \n", "382 0 STON/O 2. 3101293 7.9250 NaN S \n", "371 0 3101267 6.4958 NaN S \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(\"Pclass\")" ] }, { "cell_type": "markdown", "id": "tested-deposit", "metadata": {}, "source": [ "## How to Reverse Sort Data by Single Column" ] }, { "cell_type": "markdown", "id": "corresponding-register", "metadata": {}, "source": [ "As we can see, our data is now appearing as expected. We can pass additional keyword arguments to sort the data in the opposite direction, or descending by setting `ascending` parameter to `False`. See the example below." ] }, { "cell_type": "code", "execution_count": 4, "id": "numerical-retention", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
51151203Webber, Mr. JamesmaleNaN00SOTON/OQ 31013168.0500NaNS
50050103Calic, Mr. Petarmale17.0003150868.6625NaNS
50150203Canavan, Miss. Maryfemale21.0003648467.7500NaNQ
50250303O'Sullivan, Miss. Bridget MaryfemaleNaN003309097.6292NaNQ
.......................................
10210301White, Mr. Richard Frasarmale21.0013528177.2875D26S
71071111Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\")female24.000PC 1748249.5042C90C
71171201Klaber, Mr. HermanmaleNaN0011302826.5500C124S
71271311Taylor, Mr. Elmer Zebleymale48.0101999652.0000C126S
44544611Dodge, Master. Washingtonmale4.0023363881.8583A34S
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "511 512 0 3 \n", "500 501 0 3 \n", "501 502 0 3 \n", "502 503 0 3 \n", ".. ... ... ... \n", "102 103 0 1 \n", "710 711 1 1 \n", "711 712 0 1 \n", "712 713 1 1 \n", "445 446 1 1 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "511 Webber, Mr. James male NaN 0 \n", "500 Calic, Mr. Petar male 17.0 0 \n", "501 Canavan, Miss. Mary female 21.0 0 \n", "502 O'Sullivan, Miss. Bridget Mary female NaN 0 \n", ".. ... ... ... ... \n", "102 White, Mr. Richard Frasar male 21.0 0 \n", "710 Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") female 24.0 0 \n", "711 Klaber, Mr. Herman male NaN 0 \n", "712 Taylor, Mr. Elmer Zebley male 48.0 1 \n", "445 Dodge, Master. Washington male 4.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "511 0 SOTON/OQ 3101316 8.0500 NaN S \n", "500 0 315086 8.6625 NaN S \n", "501 0 364846 7.7500 NaN Q \n", "502 0 330909 7.6292 NaN Q \n", ".. ... ... ... ... ... \n", "102 1 35281 77.2875 D26 S \n", "710 0 PC 17482 49.5042 C90 C \n", "711 0 113028 26.5500 C124 S \n", "712 0 19996 52.0000 C126 S \n", "445 2 33638 81.8583 A34 S \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(\"Pclass\", ascending=False)" ] }, { "cell_type": "markdown", "id": "neutral-shield", "metadata": {}, "source": [ "## How to Sort Data by Multiple Columns" ] }, { "cell_type": "markdown", "id": "weekly-spread", "metadata": {}, "source": [ "Again, we can see the power of Pandas over Excel by the simplicity of altering our command to include multiple columns. Let's say that we want to sort all the data by Pclass, then we want that data organized again by sex, so that all male and female passengers appear in order. We can do this by passing the argument of what we want organized as a list. Note the order of the list as well. The columns that appear earlier in the list correspond to those that receive primacy in the ascending. In other words, we organize by passenger class firsit, then sex. In this case, the method head, is simply showing the top 100 rows." ] }, { "cell_type": "code", "execution_count": 5, "id": "residential-festival", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
111211Bonnell, Miss. Elizabethfemale58.00011378326.5500C103S
313211Spencer, Mrs. William Augustus (Marie Eugenie)femaleNaN10PC 17569146.5208B78C
525311Harper, Mrs. Henry Sleeper (Myna Haxtun)female49.010PC 1757276.7292D33C
.......................................
232411Sloper, Mr. William Thompsonmale28.00011378835.5000A6S
272801Fortune, Mr. Charles Alexandermale19.03219950263.0000C23 C25 C27S
303101Uruchurtu, Don. Manuel Emale40.000PC 1760127.7208NaNC
343501Meyer, Mr. Edgar Josephmale28.010PC 1760482.1708NaNC
353601Holverson, Mr. Alexander Oskarmale42.01011378952.0000NaNS
\n", "

100 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "1 2 1 1 \n", "3 4 1 1 \n", "11 12 1 1 \n", "31 32 1 1 \n", "52 53 1 1 \n", ".. ... ... ... \n", "23 24 1 1 \n", "27 28 0 1 \n", "30 31 0 1 \n", "34 35 0 1 \n", "35 36 0 1 \n", "\n", " Name Sex Age SibSp \\\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "11 Bonnell, Miss. Elizabeth female 58.0 0 \n", "31 Spencer, Mrs. William Augustus (Marie Eugenie) female NaN 1 \n", "52 Harper, Mrs. Henry Sleeper (Myna Haxtun) female 49.0 1 \n", ".. ... ... ... ... \n", "23 Sloper, Mr. William Thompson male 28.0 0 \n", "27 Fortune, Mr. Charles Alexander male 19.0 3 \n", "30 Uruchurtu, Don. Manuel E male 40.0 0 \n", "34 Meyer, Mr. Edgar Joseph male 28.0 1 \n", "35 Holverson, Mr. Alexander Oskar male 42.0 1 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "1 0 PC 17599 71.2833 C85 C \n", "3 0 113803 53.1000 C123 S \n", "11 0 113783 26.5500 C103 S \n", "31 0 PC 17569 146.5208 B78 C \n", "52 0 PC 17572 76.7292 D33 C \n", ".. ... ... ... ... ... \n", "23 0 113788 35.5000 A6 S \n", "27 2 19950 263.0000 C23 C25 C27 S \n", "30 0 PC 17601 27.7208 NaN C \n", "34 0 PC 17604 82.1708 NaN C \n", "35 0 113789 52.0000 NaN S \n", "\n", "[100 rows x 12 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values([\"Pclass\", \"Sex\"]).head(100)" ] }, { "cell_type": "markdown", "id": "revised-excuse", "metadata": {}, "source": [ "As with before, we can control how the data is sorted, either ascending or descending. If we set ascending to False, we organize all items in the list by this method. We can do this with the sample code below." ] }, { "cell_type": "code", "execution_count": 6, "id": "demographic-ultimate", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
121303Saundercock, Mr. William Henrymale20.000A/5. 21518.0500NaNS
.......................................
85685711Wick, Mrs. George Dennick (Mary Hitchcock)female45.01136928164.8667NaNS
86286311Swift, Mrs. Frederick Joel (Margaret Welles Ba...female48.0001746625.9292D17S
87187211Beckwith, Mrs. Richard Leonard (Sallie Monypeny)female47.0111175152.5542D35S
87988011Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)female56.0011176783.1583C50C
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "4 5 0 3 \n", "5 6 0 3 \n", "7 8 0 3 \n", "12 13 0 3 \n", ".. ... ... ... \n", "856 857 1 1 \n", "862 863 1 1 \n", "871 872 1 1 \n", "879 880 1 1 \n", "887 888 1 1 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "5 Moran, Mr. James male NaN 0 \n", "7 Palsson, Master. Gosta Leonard male 2.0 3 \n", "12 Saundercock, Mr. William Henry male 20.0 0 \n", ".. ... ... ... ... \n", "856 Wick, Mrs. George Dennick (Mary Hitchcock) female 45.0 1 \n", "862 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 \n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "4 0 373450 8.0500 NaN S \n", "5 0 330877 8.4583 NaN Q \n", "7 1 349909 21.0750 NaN S \n", "12 0 A/5. 2151 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "856 1 36928 164.8667 NaN S \n", "862 0 17466 25.9292 D17 S \n", "871 1 11751 52.5542 D35 S \n", "879 1 11767 83.1583 C50 C \n", "887 0 112053 30.0000 B42 S \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values([\"Pclass\", \"Sex\"], ascending=False)" ] }, { "cell_type": "markdown", "id": "scientific-trigger", "metadata": {}, "source": [ "## How to Sort Data by Multiple Columns with Different Values Organized Differently" ] }, { "cell_type": "markdown", "id": "handmade-sweet", "metadata": {}, "source": [ "What if we want to organize the data differently. By this I mean, we want for all the data to be organized by passenger class first and for that data to be ascending (1, 2, 3), but we want the sex of the passengers to be organized descending (male, female, rather than female, male). To achieve this, we can pass a list to ascending with 0s and 1s. 0 is False and 1 is True." ] }, { "cell_type": "code", "execution_count": 7, "id": "ceramic-cartoon", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
232411Sloper, Mr. William Thompsonmale28.00011378835.5000A6S
272801Fortune, Mr. Charles Alexandermale19.03219950263.0000C23 C25 C27S
303101Uruchurtu, Don. Manuel Emale40.000PC 1760127.7208NaNC
343501Meyer, Mr. Edgar Josephmale28.010PC 1760482.1708NaNC
.......................................
86386403Sage, Miss. Dorothy Edith \"Dolly\"femaleNaN82CA. 234369.5500NaNS
87587613Najib, Miss. Adele Kiamie \"Jane\"female15.00026677.2250NaNC
88288303Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNS
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "6 7 0 1 McCarthy, Mr. Timothy J \n", "23 24 1 1 Sloper, Mr. William Thompson \n", "27 28 0 1 Fortune, Mr. Charles Alexander \n", "30 31 0 1 Uruchurtu, Don. Manuel E \n", "34 35 0 1 Meyer, Mr. Edgar Joseph \n", ".. ... ... ... ... \n", "863 864 0 3 Sage, Miss. Dorothy Edith \"Dolly\" \n", "875 876 1 3 Najib, Miss. Adele Kiamie \"Jane\" \n", "882 883 0 3 Dahlberg, Miss. Gerda Ulrika \n", "885 886 0 3 Rice, Mrs. William (Margaret Norton) \n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "6 male 54.0 0 0 17463 51.8625 E46 S \n", "23 male 28.0 0 0 113788 35.5000 A6 S \n", "27 male 19.0 3 2 19950 263.0000 C23 C25 C27 S \n", "30 male 40.0 0 0 PC 17601 27.7208 NaN C \n", "34 male 28.0 1 0 PC 17604 82.1708 NaN C \n", ".. ... ... ... ... ... ... ... ... \n", "863 female NaN 8 2 CA. 2343 69.5500 NaN S \n", "875 female 15.0 0 0 2667 7.2250 NaN C \n", "882 female 22.0 0 0 7552 10.5167 NaN S \n", "885 female 39.0 0 5 382652 29.1250 NaN Q \n", "888 female NaN 1 2 W./C. 6607 23.4500 NaN S \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values([\"Pclass\", \"Sex\"], ascending=[1,0])" ] }, { "cell_type": "markdown", "id": "focal-roommate", "metadata": {}, "source": [ "What is particularly nice about Pandas over Excel is that this operation scales nicely. If we want to add more methods of sorting, we can do that too by simple increasing the indices of our lists. Always make sure that the length of your lists match, however. In other words, do not have 3 attributes to sort by and 2 items in your ascending list. In this case, we want to organize by passenger class, sex, and age with passenger class ascending, sex descending, and age ascending. Let's see what that would look like." ] }, { "cell_type": "code", "execution_count": 8, "id": "abroad-saudi", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
30530611Allison, Master. Hudson Trevormale0.9212113781151.5500C22 C26S
44544611Dodge, Master. Washingtonmale4.00023363881.8583A34S
80280311Carter, Master. William Thornton IImale11.0012113760120.0000B96 B98S
55055111Thayer, Mr. John Borland Jrmale17.000217421110.8833C70C
50550601Penasco y Castellana, Mr. Victor de Satodemale18.0010PC 17758108.9000C65C
.......................................
69769813Mullens, Miss. Katherine \"Katie\"femaleNaN00358527.7333NaNQ
72772813Mannion, Miss. MargarethfemaleNaN00368667.7375NaNQ
79279303Sage, Miss. Stella AnnafemaleNaN82CA. 234369.5500NaNS
86386403Sage, Miss. Dorothy Edith \"Dolly\"femaleNaN82CA. 234369.5500NaNS
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "305 306 1 1 \n", "445 446 1 1 \n", "802 803 1 1 \n", "550 551 1 1 \n", "505 506 0 1 \n", ".. ... ... ... \n", "697 698 1 3 \n", "727 728 1 3 \n", "792 793 0 3 \n", "863 864 0 3 \n", "888 889 0 3 \n", "\n", " Name Sex Age SibSp Parch \\\n", "305 Allison, Master. Hudson Trevor male 0.92 1 2 \n", "445 Dodge, Master. Washington male 4.00 0 2 \n", "802 Carter, Master. William Thornton II male 11.00 1 2 \n", "550 Thayer, Mr. John Borland Jr male 17.00 0 2 \n", "505 Penasco y Castellana, Mr. Victor de Satode male 18.00 1 0 \n", ".. ... ... ... ... ... \n", "697 Mullens, Miss. Katherine \"Katie\" female NaN 0 0 \n", "727 Mannion, Miss. Margareth female NaN 0 0 \n", "792 Sage, Miss. Stella Anna female NaN 8 2 \n", "863 Sage, Miss. Dorothy Edith \"Dolly\" female NaN 8 2 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 2 \n", "\n", " Ticket Fare Cabin Embarked \n", "305 113781 151.5500 C22 C26 S \n", "445 33638 81.8583 A34 S \n", "802 113760 120.0000 B96 B98 S \n", "550 17421 110.8833 C70 C \n", "505 PC 17758 108.9000 C65 C \n", ".. ... ... ... ... \n", "697 35852 7.7333 NaN Q \n", "727 36866 7.7375 NaN Q \n", "792 CA. 2343 69.5500 NaN S \n", "863 CA. 2343 69.5500 NaN S \n", "888 W./C. 6607 23.4500 NaN S \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values([\"Pclass\", \"Sex\", \"Age\"], ascending=[1,0,1])" ] }, { "cell_type": "markdown", "id": "introductory-insight", "metadata": {}, "source": [ "As we move forward throughout this textbook, we will explore more robust ways to sort and organize our data. For now, you should feel comfortable with how to use `sort_values()` to do fairly robust tasks quickly." ] }, { "cell_type": "code", "execution_count": null, "id": "altered-institution", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }