{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**dplyr** is one of the most popular R packages and part of the **tidyverse**, developed by Hadley Wickham. Its popularity reflects how frequently it is used. Being a data scientist is not always about building sophisticated models: data analysis (manipulation) and data visualization play a very important role in the business-as-usual (BAU) work of many of us. They are also a crucial step before any modeling exercise, since feature engineering and EDA are often what differentiates your model from someone else's.\n", "\n", "Hence, this notebook aims to bring out some well-known and not-so-well-known applications of dplyr so that any data analyst can leverage its potential." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "_cell_guid": "af305410-23e4-40eb-8b38-0fa5075ee365", "_uuid": "766132ca711314a13762f0332b55bb417244153b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Attaching package: ‘dplyr’\n", "\n", "The following objects are masked from ‘package:stats’:\n", "\n", " filter, lag\n", "\n", "The following objects are masked from ‘package:base’:\n", "\n", " intersect, setdiff, setequal, union\n", "\n" ] } ], "source": [ "# This R environment comes with all of CRAN preinstalled, as well as many other helpful packages\n", "# The environment is defined by the kaggle/rstats docker image: https://github.com/kaggle/docker-rstats\n", "# For example, here are several helpful packages to load in \n", "\n", "library(dplyr) # Load the dplyr package\n", "\n", "# Input data files are available in the \"../input/\" directory.\n", "# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory\n", "\n", "system(\"ls ../input\")\n", "\n", "# Any results you write to the current directory are saved as output." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us start by reading the input training file using the base r function `read.csv`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 Q
7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 C
11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 S
14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 S
16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 S
17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 Q
18 1 2 Williams, Mr. Charles Eugene male NA 0 0 244373 13.0000 S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) female 31 1 0 345763 18.0000 S
20 1 3 Masselmani, Mrs. Fatima female NA 0 0 2649 7.2250 C
21 0 2 Fynney, Mr. Joseph J male 35 0 0 239865 26.0000 S
22 1 2 Beesley, Mr. Lawrence male 34 0 0 248698 13.0000 D56 S
23 1 3 McGowan, Miss. Anna \"Annie\" female 15 0 0 330923 8.0292 Q
24 1 1 Sloper, Mr. William Thompson male 28 0 0 113788 35.5000 A6 S
25 0 3 Palsson, Miss. Torborg Danira female 8 3 1 349909 21.0750 S
26 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) female 38 1 5 347077 31.3875 S
27 0 3 Emir, Mr. Farred Chehab male NA 0 0 2631 7.2250 C
28 0 1 Fortune, Mr. Charles Alexander male 19 3 2 19950 263.0000 C23 C25 C27 S
29 1 3 O'Dwyer, Miss. Ellen \"Nellie\" female NA 0 0 330959 7.8792 Q
30 0 3 Todoroff, Mr. Lalio male NA 0 0 349216 7.8958 S
862 0 2 Giles, Mr. Frederick Edward male 21 1 0 28134 11.5000 S
863 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Barron) female 48 0 0 17466 25.9292 D17 S
864 0 3 Sage, Miss. Dorothy Edith \"Dolly\" female NA 8 2 CA. 2343 69.5500 S
865 0 2 Gill, Mr. John William male 24 0 0 233866 13.0000 S
866 1 2 Bystrom, Mrs. (Karolina) female 42 0 0 236852 13.0000 S
867 1 2 Duran y More, Miss. Asuncion female 27 1 0 SC/PARIS 2149 13.8583 C
868 0 1 Roebling, Mr. Washington Augustus II male 31 0 0 PC 17590 50.4958 A24 S
869 0 3 van Melkebeke, Mr. Philemon male NA 0 0 345777 9.5000 S
870 1 3 Johnson, Master. Harold Theodor male 4 1 1 347742 11.1333 S
871 0 3 Balkic, Mr. Cerin male 26 0 0 349248 7.8958 S
872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 1 11751 52.5542 D35 S
873 0 1 Carlsson, Mr. Frans Olof male 33 0 0 695 5.0000 B51 B53 B55 S
874 0 3 Vander Cruyssen, Mr. Victor male 47 0 0 345765 9.0000 S
875 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 0 P/PP 3381 24.0000 C
876 1 3 Najib, Miss. Adele Kiamie \"Jane\" female 15 0 0 2667 7.2250 C
877 0 3 Gustafsson, Mr. Alfred Ossian male 20 0 0 7534 9.8458 S
878 0 3 Petroff, Mr. Nedelio male 19 0 0 349212 7.8958 S
879 0 3 Laleff, Mr. Kristo male NA 0 0 349217 7.8958 S
880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 1 11767 83.1583 C50 C
881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 1 230433 26.0000 S
882 0 3 Markun, Mr. Johann male 33 0 0 349257 7.8958 S
883 0 3 Dahlberg, Miss. Gerda Ulrika female 22 0 0 7552 10.5167 S
884 0 2 Banfield, Mr. Frederick James male 28 0 0 C.A./SOTON 34068 10.5000 S
885 0 3 Sutehall, Mr. Henry Jr male 25 0 0 SOTON/OQ 392076 7.0500 S
886 0 3 Rice, Mrs. William (Margaret Norton) female 39 0 5 382652 29.1250 Q
887 0 2 Montvila, Rev. Juozas male 27 0 0 211536 13.0000 S
888 1 1 Graham, Miss. Margaret Edith female 19 0 0 112053 30.0000 B42 S
889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" female NA 1 2 W./C. 6607 23.4500 S
890 1 1 Behr, Mr. Karl Howell male 26 0 0 111369 30.0000 C148 C
891 0 3 Dooley, Mr. Patrick male 32 0 0 370376 7.7500 Q
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n", "\\hline\n", "\t 1 & 0 & 3 & Braund, Mr. Owen Harris & male & 22 & 1 & 0 & A/5 21171 & 7.2500 & & S \\\\\n", "\t 2 & 1 & 1 & Cumings, Mrs. John Bradley (Florence Briggs Thayer) & female & 38 & 1 & 0 & PC 17599 & 71.2833 & C85 & C \\\\\n", "\t 3 & 1 & 3 & Heikkinen, Miss. Laina & female & 26 & 0 & 0 & STON/O2. 3101282 & 7.9250 & & S \\\\\n", "\t 4 & 1 & 1 & Futrelle, Mrs. Jacques Heath (Lily May Peel) & female & 35 & 1 & 0 & 113803 & 53.1000 & C123 & S \\\\\n", "\t 5 & 0 & 3 & Allen, Mr. William Henry & male & 35 & 0 & 0 & 373450 & 8.0500 & & S \\\\\n", "\t 6 & 0 & 3 & Moran, Mr. James & male & NA & 0 & 0 & 330877 & 8.4583 & & Q \\\\\n", "\t 7 & 0 & 1 & McCarthy, Mr. Timothy J & male & 54 & 0 & 0 & 17463 & 51.8625 & E46 & S \\\\\n", "\t 8 & 0 & 3 & Palsson, Master. Gosta Leonard & male & 2 & 3 & 1 & 349909 & 21.0750 & & S \\\\\n", "\t 9 & 1 & 3 & Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) & female & 27 & 0 & 2 & 347742 & 11.1333 & & S \\\\\n", "\t 10 & 1 & 2 & Nasser, Mrs. Nicholas (Adele Achem) & female & 14 & 1 & 0 & 237736 & 30.0708 & & C \\\\\n", "\t 11 & 1 & 3 & Sandstrom, Miss. Marguerite Rut & female & 4 & 1 & 1 & PP 9549 & 16.7000 & G6 & S \\\\\n", "\t 12 & 1 & 1 & Bonnell, Miss. Elizabeth & female & 58 & 0 & 0 & 113783 & 26.5500 & C103 & S \\\\\n", "\t 13 & 0 & 3 & Saundercock, Mr. William Henry & male & 20 & 0 & 0 & A/5. 2151 & 8.0500 & & S \\\\\n", "\t 14 & 0 & 3 & Andersson, Mr. Anders Johan & male & 39 & 1 & 5 & 347082 & 31.2750 & & S \\\\\n", "\t 15 & 0 & 3 & Vestrom, Miss. Hulda Amanda Adolfina & female & 14 & 0 & 0 & 350406 & 7.8542 & & S \\\\\n", "\t 16 & 1 & 2 & Hewlett, Mrs. (Mary D Kingcome) & female & 55 & 0 & 0 & 248706 & 16.0000 & & S \\\\\n", "\t 17 & 0 & 3 & Rice, Master. Eugene & male & 2 & 4 & 1 & 382652 & 29.1250 & & Q \\\\\n", "\t 18 & 1 & 2 & Williams, Mr. 
Charles Eugene & male & NA & 0 & 0 & 244373 & 13.0000 & & S \\\\\n", "\t 19 & 0 & 3 & Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) & female & 31 & 1 & 0 & 345763 & 18.0000 & & S \\\\\n", "\t 20 & 1 & 3 & Masselmani, Mrs. Fatima & female & NA & 0 & 0 & 2649 & 7.2250 & & C \\\\\n", "\t 21 & 0 & 2 & Fynney, Mr. Joseph J & male & 35 & 0 & 0 & 239865 & 26.0000 & & S \\\\\n", "\t 22 & 1 & 2 & Beesley, Mr. Lawrence & male & 34 & 0 & 0 & 248698 & 13.0000 & D56 & S \\\\\n", "\t 23 & 1 & 3 & McGowan, Miss. Anna \"Annie\" & female & 15 & 0 & 0 & 330923 & 8.0292 & & Q \\\\\n", "\t 24 & 1 & 1 & Sloper, Mr. William Thompson & male & 28 & 0 & 0 & 113788 & 35.5000 & A6 & S \\\\\n", "\t 25 & 0 & 3 & Palsson, Miss. Torborg Danira & female & 8 & 3 & 1 & 349909 & 21.0750 & & S \\\\\n", "\t 26 & 1 & 3 & Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) & female & 38 & 1 & 5 & 347077 & 31.3875 & & S \\\\\n", "\t 27 & 0 & 3 & Emir, Mr. Farred Chehab & male & NA & 0 & 0 & 2631 & 7.2250 & & C \\\\\n", "\t 28 & 0 & 1 & Fortune, Mr. Charles Alexander & male & 19 & 3 & 2 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t 29 & 1 & 3 & O'Dwyer, Miss. Ellen \"Nellie\" & female & NA & 0 & 0 & 330959 & 7.8792 & & Q \\\\\n", "\t 30 & 0 & 3 & Todoroff, Mr. Lalio & male & NA & 0 & 0 & 349216 & 7.8958 & & S \\\\\n", "\t ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮\\\\\n", "\t 862 & 0 & 2 & Giles, Mr. Frederick Edward & male & 21 & 1 & 0 & 28134 & 11.5000 & & S \\\\\n", "\t 863 & 1 & 1 & Swift, Mrs. Frederick Joel (Margaret Welles Barron) & female & 48 & 0 & 0 & 17466 & 25.9292 & D17 & S \\\\\n", "\t 864 & 0 & 3 & Sage, Miss. Dorothy Edith \"Dolly\" & female & NA & 8 & 2 & CA. 2343 & 69.5500 & & S \\\\\n", "\t 865 & 0 & 2 & Gill, Mr. John William & male & 24 & 0 & 0 & 233866 & 13.0000 & & S \\\\\n", "\t 866 & 1 & 2 & Bystrom, Mrs. (Karolina) & female & 42 & 0 & 0 & 236852 & 13.0000 & & S \\\\\n", "\t 867 & 1 & 2 & Duran y More, Miss. 
Asuncion & female & 27 & 1 & 0 & SC/PARIS 2149 & 13.8583 & & C \\\\\n", "\t 868 & 0 & 1 & Roebling, Mr. Washington Augustus II & male & 31 & 0 & 0 & PC 17590 & 50.4958 & A24 & S \\\\\n", "\t 869 & 0 & 3 & van Melkebeke, Mr. Philemon & male & NA & 0 & 0 & 345777 & 9.5000 & & S \\\\\n", "\t 870 & 1 & 3 & Johnson, Master. Harold Theodor & male & 4 & 1 & 1 & 347742 & 11.1333 & & S \\\\\n", "\t 871 & 0 & 3 & Balkic, Mr. Cerin & male & 26 & 0 & 0 & 349248 & 7.8958 & & S \\\\\n", "\t 872 & 1 & 1 & Beckwith, Mrs. Richard Leonard (Sallie Monypeny) & female & 47 & 1 & 1 & 11751 & 52.5542 & D35 & S \\\\\n", "\t 873 & 0 & 1 & Carlsson, Mr. Frans Olof & male & 33 & 0 & 0 & 695 & 5.0000 & B51 B53 B55 & S \\\\\n", "\t 874 & 0 & 3 & Vander Cruyssen, Mr. Victor & male & 47 & 0 & 0 & 345765 & 9.0000 & & S \\\\\n", "\t 875 & 1 & 2 & Abelson, Mrs. Samuel (Hannah Wizosky) & female & 28 & 1 & 0 & P/PP 3381 & 24.0000 & & C \\\\\n", "\t 876 & 1 & 3 & Najib, Miss. Adele Kiamie \"Jane\" & female & 15 & 0 & 0 & 2667 & 7.2250 & & C \\\\\n", "\t 877 & 0 & 3 & Gustafsson, Mr. Alfred Ossian & male & 20 & 0 & 0 & 7534 & 9.8458 & & S \\\\\n", "\t 878 & 0 & 3 & Petroff, Mr. Nedelio & male & 19 & 0 & 0 & 349212 & 7.8958 & & S \\\\\n", "\t 879 & 0 & 3 & Laleff, Mr. Kristo & male & NA & 0 & 0 & 349217 & 7.8958 & & S \\\\\n", "\t 880 & 1 & 1 & Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) & female & 56 & 0 & 1 & 11767 & 83.1583 & C50 & C \\\\\n", "\t 881 & 1 & 2 & Shelley, Mrs. William (Imanita Parrish Hall) & female & 25 & 0 & 1 & 230433 & 26.0000 & & S \\\\\n", "\t 882 & 0 & 3 & Markun, Mr. Johann & male & 33 & 0 & 0 & 349257 & 7.8958 & & S \\\\\n", "\t 883 & 0 & 3 & Dahlberg, Miss. Gerda Ulrika & female & 22 & 0 & 0 & 7552 & 10.5167 & & S \\\\\n", "\t 884 & 0 & 2 & Banfield, Mr. Frederick James & male & 28 & 0 & 0 & C.A./SOTON 34068 & 10.5000 & & S \\\\\n", "\t 885 & 0 & 3 & Sutehall, Mr. Henry Jr & male & 25 & 0 & 0 & SOTON/OQ 392076 & 7.0500 & & S \\\\\n", "\t 886 & 0 & 3 & Rice, Mrs. 
William (Margaret Norton) & female & 39 & 0 & 5 & 382652 & 29.1250 & & Q \\\\\n", "\t 887 & 0 & 2 & Montvila, Rev. Juozas & male & 27 & 0 & 0 & 211536 & 13.0000 & & S \\\\\n", "\t 888 & 1 & 1 & Graham, Miss. Margaret Edith & female & 19 & 0 & 0 & 112053 & 30.0000 & B42 & S \\\\\n", "\t 889 & 0 & 3 & Johnston, Miss. Catherine Helen \"Carrie\" & female & NA & 1 & 2 & W./C. 6607 & 23.4500 & & S \\\\\n", "\t 890 & 1 & 1 & Behr, Mr. Karl Howell & male & 26 & 0 & 0 & 111369 & 30.0000 & C148 & C \\\\\n", "\t 891 & 0 & 3 & Dooley, Mr. Patrick & male & 32 & 0 & 0 & 370376 & 7.7500 & & Q \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | | S | \n", "| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | \n", "| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S | \n", "| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S | \n", "| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | | S | \n", "| 6 | 0 | 3 | Moran, Mr. James | male | NA | 0 | 0 | 330877 | 8.4583 | | Q | \n", "| 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S | \n", "| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.0750 | | S | \n", "| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S | \n", "| 10 | 1 | 2 | Nasser, Mrs. 
Nicholas (Adele Achem) | female | 14 | 1 | 0 | 237736 | 30.0708 | | C | \n", "| 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4 | 1 | 1 | PP 9549 | 16.7000 | G6 | S | \n", "| 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58 | 0 | 0 | 113783 | 26.5500 | C103 | S | \n", "| 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20 | 0 | 0 | A/5. 2151 | 8.0500 | | S | \n", "| 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39 | 1 | 5 | 347082 | 31.2750 | | S | \n", "| 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14 | 0 | 0 | 350406 | 7.8542 | | S | \n", "| 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55 | 0 | 0 | 248706 | 16.0000 | | S | \n", "| 17 | 0 | 3 | Rice, Master. Eugene | male | 2 | 4 | 1 | 382652 | 29.1250 | | Q | \n", "| 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NA | 0 | 0 | 244373 | 13.0000 | | S | \n", "| 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | female | 31 | 1 | 0 | 345763 | 18.0000 | | S | \n", "| 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NA | 0 | 0 | 2649 | 7.2250 | | C | \n", "| 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35 | 0 | 0 | 239865 | 26.0000 | | S | \n", "| 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34 | 0 | 0 | 248698 | 13.0000 | D56 | S | \n", "| 23 | 1 | 3 | McGowan, Miss. Anna \"Annie\" | female | 15 | 0 | 0 | 330923 | 8.0292 | | Q | \n", "| 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28 | 0 | 0 | 113788 | 35.5000 | A6 | S | \n", "| 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8 | 3 | 1 | 349909 | 21.0750 | | S | \n", "| 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | female | 38 | 1 | 5 | 347077 | 31.3875 | | S | \n", "| 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NA | 0 | 0 | 2631 | 7.2250 | | C | \n", "| 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 29 | 1 | 3 | O'Dwyer, Miss. 
Ellen \"Nellie\" | female | NA | 0 | 0 | 330959 | 7.8792 | | Q | \n", "| 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NA | 0 | 0 | 349216 | 7.8958 | | S | \n", "| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | \n", "| 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21 | 1 | 0 | 28134 | 11.5000 | | S | \n", "| 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Barron) | female | 48 | 0 | 0 | 17466 | 25.9292 | D17 | S | \n", "| 864 | 0 | 3 | Sage, Miss. Dorothy Edith \"Dolly\" | female | NA | 8 | 2 | CA. 2343 | 69.5500 | | S | \n", "| 865 | 0 | 2 | Gill, Mr. John William | male | 24 | 0 | 0 | 233866 | 13.0000 | | S | \n", "| 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42 | 0 | 0 | 236852 | 13.0000 | | S | \n", "| 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27 | 1 | 0 | SC/PARIS 2149 | 13.8583 | | C | \n", "| 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31 | 0 | 0 | PC 17590 | 50.4958 | A24 | S | \n", "| 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NA | 0 | 0 | 345777 | 9.5000 | | S | \n", "| 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4 | 1 | 1 | 347742 | 11.1333 | | S | \n", "| 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26 | 0 | 0 | 349248 | 7.8958 | | S | \n", "| 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47 | 1 | 1 | 11751 | 52.5542 | D35 | S | \n", "| 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S | \n", "| 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47 | 0 | 0 | 345765 | 9.0000 | | S | \n", "| 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28 | 1 | 0 | P/PP 3381 | 24.0000 | | C | \n", "| 876 | 1 | 3 | Najib, Miss. Adele Kiamie \"Jane\" | female | 15 | 0 | 0 | 2667 | 7.2250 | | C | \n", "| 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20 | 0 | 0 | 7534 | 9.8458 | | S | \n", "| 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19 | 0 | 0 | 349212 | 7.8958 | | S | \n", "| 879 | 0 | 3 | Laleff, Mr. 
Kristo | male | NA | 0 | 0 | 349217 | 7.8958 | | S | \n", "| 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56 | 0 | 1 | 11767 | 83.1583 | C50 | C | \n", "| 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0000 | | S | \n", "| 882 | 0 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 | | S | \n", "| 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 | | S | \n", "| 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | | S | \n", "| 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | | S | \n", "| 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.1250 | | Q | \n", "| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0000 | | S | \n", "| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0000 | B42 | S | \n", "| 889 | 0 | 3 | Johnston, Miss. Catherine Helen \"Carrie\" | female | NA | 1 | 2 | W./C. 6607 | 23.4500 | | S | \n", "| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0000 | C148 | C | \n", "| 891 | 0 | 3 | Dooley, Mr. 
Patrick | male | 32 | 0 | 0 | 370376 | 7.7500 | | Q | \n", "\n", "\n" ], "text/plain": [ " PassengerId Survived Pclass\n", "1 1 0 3 \n", "2 2 1 1 \n", "3 3 1 3 \n", "4 4 1 1 \n", "5 5 0 3 \n", "6 6 0 3 \n", "7 7 0 1 \n", "8 8 0 3 \n", "9 9 1 3 \n", "10 10 1 2 \n", "11 11 1 3 \n", "12 12 1 1 \n", "13 13 0 3 \n", "14 14 0 3 \n", "15 15 0 3 \n", "16 16 1 2 \n", "17 17 0 3 \n", "18 18 1 2 \n", "19 19 0 3 \n", "20 20 1 3 \n", "21 21 0 2 \n", "22 22 1 2 \n", "23 23 1 3 \n", "24 24 1 1 \n", "25 25 0 3 \n", "26 26 1 3 \n", "27 27 0 3 \n", "28 28 0 1 \n", "29 29 1 3 \n", "30 30 0 3 \n", "⋮ ⋮ ⋮ ⋮ \n", "862 862 0 2 \n", "863 863 1 1 \n", "864 864 0 3 \n", "865 865 0 2 \n", "866 866 1 2 \n", "867 867 1 2 \n", "868 868 0 1 \n", "869 869 0 3 \n", "870 870 1 3 \n", "871 871 0 3 \n", "872 872 1 1 \n", "873 873 0 1 \n", "874 874 0 3 \n", "875 875 1 2 \n", "876 876 1 3 \n", "877 877 0 3 \n", "878 878 0 3 \n", "879 879 0 3 \n", "880 880 1 1 \n", "881 881 1 2 \n", "882 882 0 3 \n", "883 883 0 3 \n", "884 884 0 2 \n", "885 885 0 3 \n", "886 886 0 3 \n", "887 887 0 2 \n", "888 888 1 1 \n", "889 889 0 3 \n", "890 890 1 1 \n", "891 891 0 3 \n", " Name Sex Age SibSp\n", "1 Braund, Mr. Owen Harris male 22 1 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 \n", "3 Heikkinen, Miss. Laina female 26 0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 \n", "5 Allen, Mr. William Henry male 35 0 \n", "6 Moran, Mr. James male NA 0 \n", "7 McCarthy, Mr. Timothy J male 54 0 \n", "8 Palsson, Master. Gosta Leonard male 2 3 \n", "9 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 \n", "10 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 \n", "11 Sandstrom, Miss. Marguerite Rut female 4 1 \n", "12 Bonnell, Miss. Elizabeth female 58 0 \n", "13 Saundercock, Mr. William Henry male 20 0 \n", "14 Andersson, Mr. Anders Johan male 39 1 \n", "15 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 \n", "16 Hewlett, Mrs. 
(Mary D Kingcome) female 55 0 \n", "17 Rice, Master. Eugene male 2 4 \n", "18 Williams, Mr. Charles Eugene male NA 0 \n", "19 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) female 31 1 \n", "20 Masselmani, Mrs. Fatima female NA 0 \n", "21 Fynney, Mr. Joseph J male 35 0 \n", "22 Beesley, Mr. Lawrence male 34 0 \n", "23 McGowan, Miss. Anna \"Annie\" female 15 0 \n", "24 Sloper, Mr. William Thompson male 28 0 \n", "25 Palsson, Miss. Torborg Danira female 8 3 \n", "26 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) female 38 1 \n", "27 Emir, Mr. Farred Chehab male NA 0 \n", "28 Fortune, Mr. Charles Alexander male 19 3 \n", "29 O'Dwyer, Miss. Ellen \"Nellie\" female NA 0 \n", "30 Todoroff, Mr. Lalio male NA 0 \n", "⋮ ⋮ ⋮ ⋮ ⋮ \n", "862 Giles, Mr. Frederick Edward male 21 1 \n", "863 Swift, Mrs. Frederick Joel (Margaret Welles Barron) female 48 0 \n", "864 Sage, Miss. Dorothy Edith \"Dolly\" female NA 8 \n", "865 Gill, Mr. John William male 24 0 \n", "866 Bystrom, Mrs. (Karolina) female 42 0 \n", "867 Duran y More, Miss. Asuncion female 27 1 \n", "868 Roebling, Mr. Washington Augustus II male 31 0 \n", "869 van Melkebeke, Mr. Philemon male NA 0 \n", "870 Johnson, Master. Harold Theodor male 4 1 \n", "871 Balkic, Mr. Cerin male 26 0 \n", "872 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47 1 \n", "873 Carlsson, Mr. Frans Olof male 33 0 \n", "874 Vander Cruyssen, Mr. Victor male 47 0 \n", "875 Abelson, Mrs. Samuel (Hannah Wizosky) female 28 1 \n", "876 Najib, Miss. Adele Kiamie \"Jane\" female 15 0 \n", "877 Gustafsson, Mr. Alfred Ossian male 20 0 \n", "878 Petroff, Mr. Nedelio male 19 0 \n", "879 Laleff, Mr. Kristo male NA 0 \n", "880 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56 0 \n", "881 Shelley, Mrs. William (Imanita Parrish Hall) female 25 0 \n", "882 Markun, Mr. Johann male 33 0 \n", "883 Dahlberg, Miss. Gerda Ulrika female 22 0 \n", "884 Banfield, Mr. Frederick James male 28 0 \n", "885 Sutehall, Mr. 
Henry Jr male 25 0 \n", "886 Rice, Mrs. William (Margaret Norton) female 39 0 \n", "887 Montvila, Rev. Juozas male 27 0 \n", "888 Graham, Miss. Margaret Edith female 19 0 \n", "889 Johnston, Miss. Catherine Helen \"Carrie\" female NA 1 \n", "890 Behr, Mr. Karl Howell male 26 0 \n", "891 Dooley, Mr. Patrick male 32 0 \n", " Parch Ticket Fare Cabin Embarked\n", "1 0 A/5 21171 7.2500 S \n", "2 0 PC 17599 71.2833 C85 C \n", "3 0 STON/O2. 3101282 7.9250 S \n", "4 0 113803 53.1000 C123 S \n", "5 0 373450 8.0500 S \n", "6 0 330877 8.4583 Q \n", "7 0 17463 51.8625 E46 S \n", "8 1 349909 21.0750 S \n", "9 2 347742 11.1333 S \n", "10 0 237736 30.0708 C \n", "11 1 PP 9549 16.7000 G6 S \n", "12 0 113783 26.5500 C103 S \n", "13 0 A/5. 2151 8.0500 S \n", "14 5 347082 31.2750 S \n", "15 0 350406 7.8542 S \n", "16 0 248706 16.0000 S \n", "17 1 382652 29.1250 Q \n", "18 0 244373 13.0000 S \n", "19 0 345763 18.0000 S \n", "20 0 2649 7.2250 C \n", "21 0 239865 26.0000 S \n", "22 0 248698 13.0000 D56 S \n", "23 0 330923 8.0292 Q \n", "24 0 113788 35.5000 A6 S \n", "25 1 349909 21.0750 S \n", "26 5 347077 31.3875 S \n", "27 0 2631 7.2250 C \n", "28 2 19950 263.0000 C23 C25 C27 S \n", "29 0 330959 7.8792 Q \n", "30 0 349216 7.8958 S \n", "⋮ ⋮ ⋮ ⋮ ⋮ ⋮ \n", "862 0 28134 11.5000 S \n", "863 0 17466 25.9292 D17 S \n", "864 2 CA. 
2343 69.5500 S \n", "865 0 233866 13.0000 S \n", "866 0 236852 13.0000 S \n", "867 0 SC/PARIS 2149 13.8583 C \n", "868 0 PC 17590 50.4958 A24 S \n", "869 0 345777 9.5000 S \n", "870 1 347742 11.1333 S \n", "871 0 349248 7.8958 S \n", "872 1 11751 52.5542 D35 S \n", "873 0 695 5.0000 B51 B53 B55 S \n", "874 0 345765 9.0000 S \n", "875 0 P/PP 3381 24.0000 C \n", "876 0 2667 7.2250 C \n", "877 0 7534 9.8458 S \n", "878 0 349212 7.8958 S \n", "879 0 349217 7.8958 S \n", "880 1 11767 83.1583 C50 C \n", "881 1 230433 26.0000 S \n", "882 0 349257 7.8958 S \n", "883 0 7552 10.5167 S \n", "884 0 C.A./SOTON 34068 10.5000 S \n", "885 0 SOTON/OQ 392076 7.0500 S \n", "886 5 382652 29.1250 Q \n", "887 0 211536 13.0000 S \n", "888 0 112053 30.0000 B42 S \n", "889 2 W./C. 6607 23.4500 S \n", "890 0 111369 30.0000 C148 C \n", "891 0 370376 7.7500 Q " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train <- read.csv('../input/train.csv', stringsAsFactors = FALSE, header = TRUE)\n", "train # print the dataframe to confirm it has loaded" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us get the total number of rows in the dataframe. This is very straightforward with `nrow()` in base R, but since this is a dplyr starter kit, we'll start with the dplyr way." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " n\\\\\n", "\\hline\n", "\t 891\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "n | \n", "|---|\n", "| 891 | \n", "\n", "\n" ], "text/plain": [ " n \n", "1 891" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code above returns the row count of the dataframe passed in through the pipe operator `%>%`. The pipe works much like the `|` (pipe) operator in a Unix shell: the output of the current operation is fed as the input to the next one. dplyr verbs, like functions in other pipe-friendly packages, take the dataframe as their first argument, so each function can be called in two ways, as shown below:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " n\\\\\n", "\\hline\n", "\t 891\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "n | \n", "|---|\n", "| 891 | \n", "\n", "\n" ], "text/plain": [ " n \n", "1 891" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " n\\\\\n", "\\hline\n", "\t 891\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "n | \n", "|---|\n", "| 891 | \n", "\n", "\n" ], "text/plain": [ " n \n", "1 891" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "count(train) #Without pipe, passing the df as the first argument\n", "train %>% count() #with pipe, more convient and more readability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But dplyr's real flavor starts with the following 5 functions (or as most people call, verbs of dplyr):\n", "\n", "* select()\n", "* filter()\t\n", "* arrange() \n", "* mutate()\t\n", "* summarise()\t\n", "* group_by()\t\n", "\n", "And let us see what every one of these does!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### select:\n", "\n", "select() as the name suggests selects the columns that are required from a given dataframe and if multiple columns are required or not required, then `one_of()` could be used within select. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " Age\\\\\n", "\\hline\n", "\t 22\\\\\n", "\t 38\\\\\n", "\t 26\\\\\n", "\t 35\\\\\n", "\t 35\\\\\n", "\t NA\\\\\n", "\t 54\\\\\n", "\t 2\\\\\n", "\t 27\\\\\n", "\t 14\\\\\n", "\t 4\\\\\n", "\t 58\\\\\n", "\t 20\\\\\n", "\t 39\\\\\n", "\t 14\\\\\n", "\t 55\\\\\n", "\t 2\\\\\n", "\t NA\\\\\n", "\t 31\\\\\n", "\t NA\\\\\n", "\t 35\\\\\n", "\t 34\\\\\n", "\t 15\\\\\n", "\t 28\\\\\n", "\t 8\\\\\n", "\t 38\\\\\n", "\t NA\\\\\n", "\t 19\\\\\n", "\t NA\\\\\n", "\t NA\\\\\n", "\t ⋮\\\\\n", "\t 21\\\\\n", "\t 48\\\\\n", "\t NA\\\\\n", "\t 24\\\\\n", "\t 42\\\\\n", "\t 27\\\\\n", "\t 31\\\\\n", "\t NA\\\\\n", "\t 4\\\\\n", "\t 26\\\\\n", "\t 47\\\\\n", "\t 33\\\\\n", "\t 47\\\\\n", "\t 28\\\\\n", "\t 15\\\\\n", "\t 20\\\\\n", "\t 19\\\\\n", "\t NA\\\\\n", "\t 56\\\\\n", "\t 25\\\\\n", "\t 33\\\\\n", "\t 22\\\\\n", "\t 28\\\\\n", "\t 25\\\\\n", "\t 39\\\\\n", "\t 27\\\\\n", "\t 19\\\\\n", "\t NA\\\\\n", "\t 26\\\\\n", "\t 32\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Age | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 22 | \n", "| 38 | \n", "| 26 | \n", "| 35 | \n", "| 35 | \n", "| NA | \n", "| 54 | \n", "| 2 | \n", "| 27 | \n", "| 14 | \n", "| 4 | \n", "| 58 | \n", "| 20 | \n", "| 39 | \n", "| 14 | \n", "| 55 | \n", "| 2 | \n", "| NA | \n", "| 31 | \n", "| NA | \n", "| 35 | \n", "| 34 | \n", "| 15 | \n", "| 28 | \n", "| 8 | \n", "| 38 | \n", "| NA | \n", "| 19 | \n", "| NA | \n", "| NA | \n", "| ⋮ | \n", "| 21 | \n", "| 48 | \n", "| NA | \n", "| 24 | \n", "| 42 | \n", "| 27 | \n", "| 31 | \n", "| NA | \n", "| 4 | \n", "| 26 | \n", "| 47 | \n", "| 33 | \n", "| 47 | \n", "| 28 | \n", "| 15 | \n", "| 20 | \n", "| 19 | \n", "| NA | \n", "| 56 | \n", "| 25 | \n", "| 33 | \n", "| 22 | \n", "| 28 | \n", "| 25 | 
\n", "| 39 | \n", "| 27 | \n", "| 19 | \n", "| NA | \n", "| 26 | \n", "| 32 | \n", "\n", "\n" ], "text/plain": [ " Age\n", "1 22 \n", "2 38 \n", "3 26 \n", "4 35 \n", "5 35 \n", "6 NA \n", "7 54 \n", "8 2 \n", "9 27 \n", "10 14 \n", "11 4 \n", "12 58 \n", "13 20 \n", "14 39 \n", "15 14 \n", "16 55 \n", "17 2 \n", "18 NA \n", "19 31 \n", "20 NA \n", "21 35 \n", "22 34 \n", "23 15 \n", "24 28 \n", "25 8 \n", "26 38 \n", "27 NA \n", "28 19 \n", "29 NA \n", "30 NA \n", "⋮ ⋮ \n", "862 21 \n", "863 48 \n", "864 NA \n", "865 24 \n", "866 42 \n", "867 27 \n", "868 31 \n", "869 NA \n", "870 4 \n", "871 26 \n", "872 47 \n", "873 33 \n", "874 47 \n", "875 28 \n", "876 15 \n", "877 20 \n", "878 19 \n", "879 NA \n", "880 56 \n", "881 25 \n", "882 33 \n", "883 22 \n", "884 28 \n", "885 25 \n", "886 39 \n", "887 27 \n", "888 19 \n", "889 NA \n", "890 26 \n", "891 32 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "select(train,Age) #without pipe " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
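male 22
female 38
female 26
female 35
male 35
male NA
male 54
male 2
female 27
female 14
female 4
female 58
male 20
male 39
female 14
female 55
male 2
male NA
female 31
female NA
male 35
male 34
female 15
male 28
female 8
female 38
male NA
male 19
female NA
male NA
male 21
female 48
female NA
male 24
female 42
female 27
male 31
male NA
male 4
male 26
female 47
male 33
male 47
female 28
female 15
male 20
male 19
male NA
female 56
female 25
male 33
female 22
male 28
male 25
female 39
male 27
female 19
female NA
male 26
male 32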
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Sex & Age\\\\\n", "\\hline\n", "\t male & 22 \\\\\n", "\t female & 38 \\\\\n", "\t female & 26 \\\\\n", "\t female & 35 \\\\\n", "\t male & 35 \\\\\n", "\t male & NA \\\\\n", "\t male & 54 \\\\\n", "\t male & 2 \\\\\n", "\t female & 27 \\\\\n", "\t female & 14 \\\\\n", "\t female & 4 \\\\\n", "\t female & 58 \\\\\n", "\t male & 20 \\\\\n", "\t male & 39 \\\\\n", "\t female & 14 \\\\\n", "\t female & 55 \\\\\n", "\t male & 2 \\\\\n", "\t male & NA \\\\\n", "\t female & 31 \\\\\n", "\t female & NA \\\\\n", "\t male & 35 \\\\\n", "\t male & 34 \\\\\n", "\t female & 15 \\\\\n", "\t male & 28 \\\\\n", "\t female & 8 \\\\\n", "\t female & 38 \\\\\n", "\t male & NA \\\\\n", "\t male & 19 \\\\\n", "\t female & NA \\\\\n", "\t male & NA \\\\\n", "\t ⋮ & ⋮\\\\\n", "\t male & 21 \\\\\n", "\t female & 48 \\\\\n", "\t female & NA \\\\\n", "\t male & 24 \\\\\n", "\t female & 42 \\\\\n", "\t female & 27 \\\\\n", "\t male & 31 \\\\\n", "\t male & NA \\\\\n", "\t male & 4 \\\\\n", "\t male & 26 \\\\\n", "\t female & 47 \\\\\n", "\t male & 33 \\\\\n", "\t male & 47 \\\\\n", "\t female & 28 \\\\\n", "\t female & 15 \\\\\n", "\t male & 20 \\\\\n", "\t male & 19 \\\\\n", "\t male & NA \\\\\n", "\t female & 56 \\\\\n", "\t female & 25 \\\\\n", "\t male & 33 \\\\\n", "\t female & 22 \\\\\n", "\t male & 28 \\\\\n", "\t male & 25 \\\\\n", "\t female & 39 \\\\\n", "\t male & 27 \\\\\n", "\t female & 19 \\\\\n", "\t female & NA \\\\\n", "\t male & 26 \\\\\n", "\t male & 32 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sex | Age | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| male | 22 | \n", "| female | 38 | \n", "| female | 26 | \n", "| female | 35 | \n", "| male | 35 | \n", "| male | NA | \n", "| male | 54 | \n", "| 
male | 2 | \n", "| female | 27 | \n", "| female | 14 | \n", "| female | 4 | \n", "| female | 58 | \n", "| male | 20 | \n", "| male | 39 | \n", "| female | 14 | \n", "| female | 55 | \n", "| male | 2 | \n", "| male | NA | \n", "| female | 31 | \n", "| female | NA | \n", "| male | 35 | \n", "| male | 34 | \n", "| female | 15 | \n", "| male | 28 | \n", "| female | 8 | \n", "| female | 38 | \n", "| male | NA | \n", "| male | 19 | \n", "| female | NA | \n", "| male | NA | \n", "| ⋮ | ⋮ | \n", "| male | 21 | \n", "| female | 48 | \n", "| female | NA | \n", "| male | 24 | \n", "| female | 42 | \n", "| female | 27 | \n", "| male | 31 | \n", "| male | NA | \n", "| male | 4 | \n", "| male | 26 | \n", "| female | 47 | \n", "| male | 33 | \n", "| male | 47 | \n", "| female | 28 | \n", "| female | 15 | \n", "| male | 20 | \n", "| male | 19 | \n", "| male | NA | \n", "| female | 56 | \n", "| female | 25 | \n", "| male | 33 | \n", "| female | 22 | \n", "| male | 28 | \n", "| male | 25 | \n", "| female | 39 | \n", "| male | 27 | \n", "| female | 19 | \n", "| female | NA | \n", "| male | 26 | \n", "| male | 32 | \n", "\n", "\n" ], "text/plain": [ " Sex Age\n", "1 male 22 \n", "2 female 38 \n", "3 female 26 \n", "4 female 35 \n", "5 male 35 \n", "6 male NA \n", "7 male 54 \n", "8 male 2 \n", "9 female 27 \n", "10 female 14 \n", "11 female 4 \n", "12 female 58 \n", "13 male 20 \n", "14 male 39 \n", "15 female 14 \n", "16 female 55 \n", "17 male 2 \n", "18 male NA \n", "19 female 31 \n", "20 female NA \n", "21 male 35 \n", "22 male 34 \n", "23 female 15 \n", "24 male 28 \n", "25 female 8 \n", "26 female 38 \n", "27 male NA \n", "28 male 19 \n", "29 female NA \n", "30 male NA \n", "⋮ ⋮ ⋮ \n", "862 male 21 \n", "863 female 48 \n", "864 female NA \n", "865 male 24 \n", "866 female 42 \n", "867 female 27 \n", "868 male 31 \n", "869 male NA \n", "870 male 4 \n", "871 male 26 \n", "872 female 47 \n", "873 male 33 \n", "874 male 47 \n", "875 female 28 \n", "876 female 15 \n", "877 male 20 
\n", "878 male 19 \n", "879 male NA \n", "880 female 56 \n", "881 female 25 \n", "882 male 33 \n", "883 female 22 \n", "884 male 28 \n", "885 male 25 \n", "886 female 39 \n", "887 male 27 \n", "888 female 19 \n", "889 female NA \n", "890 male 26 \n", "891 male 32 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#multicolumn selection\n", "train %>% select(one_of('Sex','Age')) " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
1 0 3 Braund, Mr. Owen Harris 1 0 A/5 21171 7.2500 S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina 0 0 STON/O2. 3101282 7.9250 S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry 0 0 373450 8.0500 S
6 0 3 Moran, Mr. James 0 0 330877 8.4583 Q
7 0 1 McCarthy, Mr. Timothy J 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonard 3 1 349909 21.0750 S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) 0 2 347742 11.1333 S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) 1 0 237736 30.0708 C
11 1 3 Sandstrom, Miss. Marguerite Rut 1 1 PP 9549 16.7000 G6 S
12 1 1 Bonnell, Miss. Elizabeth 0 0 113783 26.5500 C103 S
13 0 3 Saundercock, Mr. William Henry 0 0 A/5. 2151 8.0500 S
14 0 3 Andersson, Mr. Anders Johan 1 5 347082 31.2750 S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina 0 0 350406 7.8542 S
16 1 2 Hewlett, Mrs. (Mary D Kingcome) 0 0 248706 16.0000 S
17 0 3 Rice, Master. Eugene 4 1 382652 29.1250 Q
18 1 2 Williams, Mr. Charles Eugene 0 0 244373 13.0000 S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) 1 0 345763 18.0000 S
20 1 3 Masselmani, Mrs. Fatima 0 0 2649 7.2250 C
21 0 2 Fynney, Mr. Joseph J 0 0 239865 26.0000 S
22 1 2 Beesley, Mr. Lawrence 0 0 248698 13.0000 D56 S
23 1 3 McGowan, Miss. Anna \"Annie\" 0 0 330923 8.0292 Q
24 1 1 Sloper, Mr. William Thompson 0 0 113788 35.5000 A6 S
25 0 3 Palsson, Miss. Torborg Danira 3 1 349909 21.0750 S
26 1 3 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) 1 5 347077 31.3875 S
27 0 3 Emir, Mr. Farred Chehab 0 0 2631 7.2250 C
28 0 1 Fortune, Mr. Charles Alexander 3 2 19950 263.0000 C23 C25 C27 S
29 1 3 O'Dwyer, Miss. Ellen \"Nellie\" 0 0 330959 7.8792 Q
30 0 3 Todoroff, Mr. Lalio 0 0 349216 7.8958 S
862 0 2 Giles, Mr. Frederick Edward 1 0 28134 11.5000 S
863 1 1 Swift, Mrs. Frederick Joel (Margaret Welles Barron) 0 0 17466 25.9292 D17 S
864 0 3 Sage, Miss. Dorothy Edith \"Dolly\" 8 2 CA. 2343 69.5500 S
865 0 2 Gill, Mr. John William 0 0 233866 13.0000 S
866 1 2 Bystrom, Mrs. (Karolina) 0 0 236852 13.0000 S
867 1 2 Duran y More, Miss. Asuncion 1 0 SC/PARIS 2149 13.8583 C
868 0 1 Roebling, Mr. Washington Augustus II 0 0 PC 17590 50.4958 A24 S
869 0 3 van Melkebeke, Mr. Philemon 0 0 345777 9.5000 S
870 1 3 Johnson, Master. Harold Theodor 1 1 347742 11.1333 S
871 0 3 Balkic, Mr. Cerin 0 0 349248 7.8958 S
872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1 1 11751 52.5542 D35 S
873 0 1 Carlsson, Mr. Frans Olof 0 0 695 5.0000 B51 B53 B55 S
874 0 3 Vander Cruyssen, Mr. Victor 0 0 345765 9.0000 S
875 1 2 Abelson, Mrs. Samuel (Hannah Wizosky) 1 0 P/PP 3381 24.0000 C
876 1 3 Najib, Miss. Adele Kiamie \"Jane\" 0 0 2667 7.2250 C
877 0 3 Gustafsson, Mr. Alfred Ossian 0 0 7534 9.8458 S
878 0 3 Petroff, Mr. Nedelio 0 0 349212 7.8958 S
879 0 3 Laleff, Mr. Kristo 0 0 349217 7.8958 S
880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 0 1 11767 83.1583 C50 C
881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) 0 1 230433 26.0000 S
882 0 3 Markun, Mr. Johann 0 0 349257 7.8958 S
883 0 3 Dahlberg, Miss. Gerda Ulrika 0 0 7552 10.5167 S
884 0 2 Banfield, Mr. Frederick James 0 0 C.A./SOTON 34068 10.5000 S
885 0 3 Sutehall, Mr. Henry Jr 0 0 SOTON/OQ 392076 7.0500 S
886 0 3 Rice, Mrs. William (Margaret Norton) 0 5 382652 29.1250 Q
887 0 2 Montvila, Rev. Juozas 0 0 211536 13.0000 S
888 1 1 Graham, Miss. Margaret Edith 0 0 112053 30.0000 B42 S
889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" 1 2 W./C. 6607 23.4500 S
890 1 1 Behr, Mr. Karl Howell 0 0 111369 30.0000 C148 C
891 0 3 Dooley, Mr. Patrick 0 0 370376 7.7500 Q
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllll}\n", " PassengerId & Survived & Pclass & Name & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n", "\\hline\n", "\t 1 & 0 & 3 & Braund, Mr. Owen Harris & 1 & 0 & A/5 21171 & 7.2500 & & S \\\\\n", "\t 2 & 1 & 1 & Cumings, Mrs. John Bradley (Florence Briggs Thayer) & 1 & 0 & PC 17599 & 71.2833 & C85 & C \\\\\n", "\t 3 & 1 & 3 & Heikkinen, Miss. Laina & 0 & 0 & STON/O2. 3101282 & 7.9250 & & S \\\\\n", "\t 4 & 1 & 1 & Futrelle, Mrs. Jacques Heath (Lily May Peel) & 1 & 0 & 113803 & 53.1000 & C123 & S \\\\\n", "\t 5 & 0 & 3 & Allen, Mr. William Henry & 0 & 0 & 373450 & 8.0500 & & S \\\\\n", "\t 6 & 0 & 3 & Moran, Mr. James & 0 & 0 & 330877 & 8.4583 & & Q \\\\\n", "\t 7 & 0 & 1 & McCarthy, Mr. Timothy J & 0 & 0 & 17463 & 51.8625 & E46 & S \\\\\n", "\t 8 & 0 & 3 & Palsson, Master. Gosta Leonard & 3 & 1 & 349909 & 21.0750 & & S \\\\\n", "\t 9 & 1 & 3 & Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) & 0 & 2 & 347742 & 11.1333 & & S \\\\\n", "\t 10 & 1 & 2 & Nasser, Mrs. Nicholas (Adele Achem) & 1 & 0 & 237736 & 30.0708 & & C \\\\\n", "\t 11 & 1 & 3 & Sandstrom, Miss. Marguerite Rut & 1 & 1 & PP 9549 & 16.7000 & G6 & S \\\\\n", "\t 12 & 1 & 1 & Bonnell, Miss. Elizabeth & 0 & 0 & 113783 & 26.5500 & C103 & S \\\\\n", "\t 13 & 0 & 3 & Saundercock, Mr. William Henry & 0 & 0 & A/5. 2151 & 8.0500 & & S \\\\\n", "\t 14 & 0 & 3 & Andersson, Mr. Anders Johan & 1 & 5 & 347082 & 31.2750 & & S \\\\\n", "\t 15 & 0 & 3 & Vestrom, Miss. Hulda Amanda Adolfina & 0 & 0 & 350406 & 7.8542 & & S \\\\\n", "\t 16 & 1 & 2 & Hewlett, Mrs. (Mary D Kingcome) & 0 & 0 & 248706 & 16.0000 & & S \\\\\n", "\t 17 & 0 & 3 & Rice, Master. Eugene & 4 & 1 & 382652 & 29.1250 & & Q \\\\\n", "\t 18 & 1 & 2 & Williams, Mr. Charles Eugene & 0 & 0 & 244373 & 13.0000 & & S \\\\\n", "\t 19 & 0 & 3 & Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) & 1 & 0 & 345763 & 18.0000 & & S \\\\\n", "\t 20 & 1 & 3 & Masselmani, Mrs. 
Fatima & 0 & 0 & 2649 & 7.2250 & & C \\\\\n", "\t 21 & 0 & 2 & Fynney, Mr. Joseph J & 0 & 0 & 239865 & 26.0000 & & S \\\\\n", "\t 22 & 1 & 2 & Beesley, Mr. Lawrence & 0 & 0 & 248698 & 13.0000 & D56 & S \\\\\n", "\t 23 & 1 & 3 & McGowan, Miss. Anna \"Annie\" & 0 & 0 & 330923 & 8.0292 & & Q \\\\\n", "\t 24 & 1 & 1 & Sloper, Mr. William Thompson & 0 & 0 & 113788 & 35.5000 & A6 & S \\\\\n", "\t 25 & 0 & 3 & Palsson, Miss. Torborg Danira & 3 & 1 & 349909 & 21.0750 & & S \\\\\n", "\t 26 & 1 & 3 & Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) & 1 & 5 & 347077 & 31.3875 & & S \\\\\n", "\t 27 & 0 & 3 & Emir, Mr. Farred Chehab & 0 & 0 & 2631 & 7.2250 & & C \\\\\n", "\t 28 & 0 & 1 & Fortune, Mr. Charles Alexander & 3 & 2 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t 29 & 1 & 3 & O'Dwyer, Miss. Ellen \"Nellie\" & 0 & 0 & 330959 & 7.8792 & & Q \\\\\n", "\t 30 & 0 & 3 & Todoroff, Mr. Lalio & 0 & 0 & 349216 & 7.8958 & & S \\\\\n", "\t ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮\\\\\n", "\t 862 & 0 & 2 & Giles, Mr. Frederick Edward & 1 & 0 & 28134 & 11.5000 & & S \\\\\n", "\t 863 & 1 & 1 & Swift, Mrs. Frederick Joel (Margaret Welles Barron) & 0 & 0 & 17466 & 25.9292 & D17 & S \\\\\n", "\t 864 & 0 & 3 & Sage, Miss. Dorothy Edith \"Dolly\" & 8 & 2 & CA. 2343 & 69.5500 & & S \\\\\n", "\t 865 & 0 & 2 & Gill, Mr. John William & 0 & 0 & 233866 & 13.0000 & & S \\\\\n", "\t 866 & 1 & 2 & Bystrom, Mrs. (Karolina) & 0 & 0 & 236852 & 13.0000 & & S \\\\\n", "\t 867 & 1 & 2 & Duran y More, Miss. Asuncion & 1 & 0 & SC/PARIS 2149 & 13.8583 & & C \\\\\n", "\t 868 & 0 & 1 & Roebling, Mr. Washington Augustus II & 0 & 0 & PC 17590 & 50.4958 & A24 & S \\\\\n", "\t 869 & 0 & 3 & van Melkebeke, Mr. Philemon & 0 & 0 & 345777 & 9.5000 & & S \\\\\n", "\t 870 & 1 & 3 & Johnson, Master. Harold Theodor & 1 & 1 & 347742 & 11.1333 & & S \\\\\n", "\t 871 & 0 & 3 & Balkic, Mr. Cerin & 0 & 0 & 349248 & 7.8958 & & S \\\\\n", "\t 872 & 1 & 1 & Beckwith, Mrs. 
Richard Leonard (Sallie Monypeny) & 1 & 1 & 11751 & 52.5542 & D35 & S \\\\\n", "\t 873 & 0 & 1 & Carlsson, Mr. Frans Olof & 0 & 0 & 695 & 5.0000 & B51 B53 B55 & S \\\\\n", "\t 874 & 0 & 3 & Vander Cruyssen, Mr. Victor & 0 & 0 & 345765 & 9.0000 & & S \\\\\n", "\t 875 & 1 & 2 & Abelson, Mrs. Samuel (Hannah Wizosky) & 1 & 0 & P/PP 3381 & 24.0000 & & C \\\\\n", "\t 876 & 1 & 3 & Najib, Miss. Adele Kiamie \"Jane\" & 0 & 0 & 2667 & 7.2250 & & C \\\\\n", "\t 877 & 0 & 3 & Gustafsson, Mr. Alfred Ossian & 0 & 0 & 7534 & 9.8458 & & S \\\\\n", "\t 878 & 0 & 3 & Petroff, Mr. Nedelio & 0 & 0 & 349212 & 7.8958 & & S \\\\\n", "\t 879 & 0 & 3 & Laleff, Mr. Kristo & 0 & 0 & 349217 & 7.8958 & & S \\\\\n", "\t 880 & 1 & 1 & Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) & 0 & 1 & 11767 & 83.1583 & C50 & C \\\\\n", "\t 881 & 1 & 2 & Shelley, Mrs. William (Imanita Parrish Hall) & 0 & 1 & 230433 & 26.0000 & & S \\\\\n", "\t 882 & 0 & 3 & Markun, Mr. Johann & 0 & 0 & 349257 & 7.8958 & & S \\\\\n", "\t 883 & 0 & 3 & Dahlberg, Miss. Gerda Ulrika & 0 & 0 & 7552 & 10.5167 & & S \\\\\n", "\t 884 & 0 & 2 & Banfield, Mr. Frederick James & 0 & 0 & C.A./SOTON 34068 & 10.5000 & & S \\\\\n", "\t 885 & 0 & 3 & Sutehall, Mr. Henry Jr & 0 & 0 & SOTON/OQ 392076 & 7.0500 & & S \\\\\n", "\t 886 & 0 & 3 & Rice, Mrs. William (Margaret Norton) & 0 & 5 & 382652 & 29.1250 & & Q \\\\\n", "\t 887 & 0 & 2 & Montvila, Rev. Juozas & 0 & 0 & 211536 & 13.0000 & & S \\\\\n", "\t 888 & 1 & 1 & Graham, Miss. Margaret Edith & 0 & 0 & 112053 & 30.0000 & B42 & S \\\\\n", "\t 889 & 0 & 3 & Johnston, Miss. Catherine Helen \"Carrie\" & 1 & 2 & W./C. 6607 & 23.4500 & & S \\\\\n", "\t 890 & 1 & 1 & Behr, Mr. Karl Howell & 0 & 0 & 111369 & 30.0000 & C148 & C \\\\\n", "\t 891 & 0 & 3 & Dooley, Mr. 
Patrick & 0 & 0 & 370376 & 7.7500 & & Q \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "PassengerId | Survived | Pclass | Name | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 1 | 0 | 3 | Braund, Mr. Owen Harris | 1 | 0 | A/5 21171 | 7.2500 | | S | \n", "| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 1 | 0 | PC 17599 | 71.2833 | C85 | C | \n", "| 3 | 1 | 3 | Heikkinen, Miss. Laina | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S | \n", "| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 1 | 0 | 113803 | 53.1000 | C123 | S | \n", "| 5 | 0 | 3 | Allen, Mr. William Henry | 0 | 0 | 373450 | 8.0500 | | S | \n", "| 6 | 0 | 3 | Moran, Mr. James | 0 | 0 | 330877 | 8.4583 | | Q | \n", "| 7 | 0 | 1 | McCarthy, Mr. Timothy J | 0 | 0 | 17463 | 51.8625 | E46 | S | \n", "| 8 | 0 | 3 | Palsson, Master. Gosta Leonard | 3 | 1 | 349909 | 21.0750 | | S | \n", "| 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | 0 | 2 | 347742 | 11.1333 | | S | \n", "| 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | 1 | 0 | 237736 | 30.0708 | | C | \n", "| 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | 1 | 1 | PP 9549 | 16.7000 | G6 | S | \n", "| 12 | 1 | 1 | Bonnell, Miss. Elizabeth | 0 | 0 | 113783 | 26.5500 | C103 | S | \n", "| 13 | 0 | 3 | Saundercock, Mr. William Henry | 0 | 0 | A/5. 2151 | 8.0500 | | S | \n", "| 14 | 0 | 3 | Andersson, Mr. Anders Johan | 1 | 5 | 347082 | 31.2750 | | S | \n", "| 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | 0 | 0 | 350406 | 7.8542 | | S | \n", "| 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | 0 | 0 | 248706 | 16.0000 | | S | \n", "| 17 | 0 | 3 | Rice, Master. Eugene | 4 | 1 | 382652 | 29.1250 | | Q | \n", "| 18 | 1 | 2 | Williams, Mr. 
Charles Eugene | 0 | 0 | 244373 | 13.0000 | | S | \n", "| 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | 1 | 0 | 345763 | 18.0000 | | S | \n", "| 20 | 1 | 3 | Masselmani, Mrs. Fatima | 0 | 0 | 2649 | 7.2250 | | C | \n", "| 21 | 0 | 2 | Fynney, Mr. Joseph J | 0 | 0 | 239865 | 26.0000 | | S | \n", "| 22 | 1 | 2 | Beesley, Mr. Lawrence | 0 | 0 | 248698 | 13.0000 | D56 | S | \n", "| 23 | 1 | 3 | McGowan, Miss. Anna \"Annie\" | 0 | 0 | 330923 | 8.0292 | | Q | \n", "| 24 | 1 | 1 | Sloper, Mr. William Thompson | 0 | 0 | 113788 | 35.5000 | A6 | S | \n", "| 25 | 0 | 3 | Palsson, Miss. Torborg Danira | 3 | 1 | 349909 | 21.0750 | | S | \n", "| 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | 1 | 5 | 347077 | 31.3875 | | S | \n", "| 27 | 0 | 3 | Emir, Mr. Farred Chehab | 0 | 0 | 2631 | 7.2250 | | C | \n", "| 28 | 0 | 1 | Fortune, Mr. Charles Alexander | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 29 | 1 | 3 | O'Dwyer, Miss. Ellen \"Nellie\" | 0 | 0 | 330959 | 7.8792 | | Q | \n", "| 30 | 0 | 3 | Todoroff, Mr. Lalio | 0 | 0 | 349216 | 7.8958 | | S | \n", "| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | \n", "| 862 | 0 | 2 | Giles, Mr. Frederick Edward | 1 | 0 | 28134 | 11.5000 | | S | \n", "| 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Barron) | 0 | 0 | 17466 | 25.9292 | D17 | S | \n", "| 864 | 0 | 3 | Sage, Miss. Dorothy Edith \"Dolly\" | 8 | 2 | CA. 2343 | 69.5500 | | S | \n", "| 865 | 0 | 2 | Gill, Mr. John William | 0 | 0 | 233866 | 13.0000 | | S | \n", "| 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | 0 | 0 | 236852 | 13.0000 | | S | \n", "| 867 | 1 | 2 | Duran y More, Miss. Asuncion | 1 | 0 | SC/PARIS 2149 | 13.8583 | | C | \n", "| 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | 0 | 0 | PC 17590 | 50.4958 | A24 | S | \n", "| 869 | 0 | 3 | van Melkebeke, Mr. Philemon | 0 | 0 | 345777 | 9.5000 | | S | \n", "| 870 | 1 | 3 | Johnson, Master. 
Harold Theodor | 1 | 1 | 347742 | 11.1333 | | S | \n", "| 871 | 0 | 3 | Balkic, Mr. Cerin | 0 | 0 | 349248 | 7.8958 | | S | \n", "| 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | 1 | 1 | 11751 | 52.5542 | D35 | S | \n", "| 873 | 0 | 1 | Carlsson, Mr. Frans Olof | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S | \n", "| 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | 0 | 0 | 345765 | 9.0000 | | S | \n", "| 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | 1 | 0 | P/PP 3381 | 24.0000 | | C | \n", "| 876 | 1 | 3 | Najib, Miss. Adele Kiamie \"Jane\" | 0 | 0 | 2667 | 7.2250 | | C | \n", "| 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | 0 | 0 | 7534 | 9.8458 | | S | \n", "| 878 | 0 | 3 | Petroff, Mr. Nedelio | 0 | 0 | 349212 | 7.8958 | | S | \n", "| 879 | 0 | 3 | Laleff, Mr. Kristo | 0 | 0 | 349217 | 7.8958 | | S | \n", "| 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | 0 | 1 | 11767 | 83.1583 | C50 | C | \n", "| 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | 0 | 1 | 230433 | 26.0000 | | S | \n", "| 882 | 0 | 3 | Markun, Mr. Johann | 0 | 0 | 349257 | 7.8958 | | S | \n", "| 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | 0 | 0 | 7552 | 10.5167 | | S | \n", "| 884 | 0 | 2 | Banfield, Mr. Frederick James | 0 | 0 | C.A./SOTON 34068 | 10.5000 | | S | \n", "| 885 | 0 | 3 | Sutehall, Mr. Henry Jr | 0 | 0 | SOTON/OQ 392076 | 7.0500 | | S | \n", "| 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | 0 | 5 | 382652 | 29.1250 | | Q | \n", "| 887 | 0 | 2 | Montvila, Rev. Juozas | 0 | 0 | 211536 | 13.0000 | | S | \n", "| 888 | 1 | 1 | Graham, Miss. Margaret Edith | 0 | 0 | 112053 | 30.0000 | B42 | S | \n", "| 889 | 0 | 3 | Johnston, Miss. Catherine Helen \"Carrie\" | 1 | 2 | W./C. 6607 | 23.4500 | | S | \n", "| 890 | 1 | 1 | Behr, Mr. Karl Howell | 0 | 0 | 111369 | 30.0000 | C148 | C | \n", "| 891 | 0 | 3 | Dooley, Mr. 
Patrick | 0 | 0 | 370376 | 7.7500 | | Q | \n", "\n", "\n" ], "text/plain": [ " PassengerId Survived Pclass\n", "1 1 0 3 \n", "2 2 1 1 \n", "3 3 1 3 \n", "4 4 1 1 \n", "5 5 0 3 \n", "6 6 0 3 \n", "7 7 0 1 \n", "8 8 0 3 \n", "9 9 1 3 \n", "10 10 1 2 \n", "11 11 1 3 \n", "12 12 1 1 \n", "13 13 0 3 \n", "14 14 0 3 \n", "15 15 0 3 \n", "16 16 1 2 \n", "17 17 0 3 \n", "18 18 1 2 \n", "19 19 0 3 \n", "20 20 1 3 \n", "21 21 0 2 \n", "22 22 1 2 \n", "23 23 1 3 \n", "24 24 1 1 \n", "25 25 0 3 \n", "26 26 1 3 \n", "27 27 0 3 \n", "28 28 0 1 \n", "29 29 1 3 \n", "30 30 0 3 \n", "⋮ ⋮ ⋮ ⋮ \n", "862 862 0 2 \n", "863 863 1 1 \n", "864 864 0 3 \n", "865 865 0 2 \n", "866 866 1 2 \n", "867 867 1 2 \n", "868 868 0 1 \n", "869 869 0 3 \n", "870 870 1 3 \n", "871 871 0 3 \n", "872 872 1 1 \n", "873 873 0 1 \n", "874 874 0 3 \n", "875 875 1 2 \n", "876 876 1 3 \n", "877 877 0 3 \n", "878 878 0 3 \n", "879 879 0 3 \n", "880 880 1 1 \n", "881 881 1 2 \n", "882 882 0 3 \n", "883 883 0 3 \n", "884 884 0 2 \n", "885 885 0 3 \n", "886 886 0 3 \n", "887 887 0 2 \n", "888 888 1 1 \n", "889 889 0 3 \n", "890 890 1 1 \n", "891 891 0 3 \n", " Name SibSp Parch\n", "1 Braund, Mr. Owen Harris 1 0 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) 1 0 \n", "3 Heikkinen, Miss. Laina 0 0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 0 \n", "5 Allen, Mr. William Henry 0 0 \n", "6 Moran, Mr. James 0 0 \n", "7 McCarthy, Mr. Timothy J 0 0 \n", "8 Palsson, Master. Gosta Leonard 3 1 \n", "9 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) 0 2 \n", "10 Nasser, Mrs. Nicholas (Adele Achem) 1 0 \n", "11 Sandstrom, Miss. Marguerite Rut 1 1 \n", "12 Bonnell, Miss. Elizabeth 0 0 \n", "13 Saundercock, Mr. William Henry 0 0 \n", "14 Andersson, Mr. Anders Johan 1 5 \n", "15 Vestrom, Miss. Hulda Amanda Adolfina 0 0 \n", "16 Hewlett, Mrs. (Mary D Kingcome) 0 0 \n", "17 Rice, Master. Eugene 4 1 \n", "18 Williams, Mr. Charles Eugene 0 0 \n", "19 Vander Planke, Mrs. 
Julius (Emelia Maria Vandemoortele) 1 0 \n", "20 Masselmani, Mrs. Fatima 0 0 \n", "21 Fynney, Mr. Joseph J 0 0 \n", "22 Beesley, Mr. Lawrence 0 0 \n", "23 McGowan, Miss. Anna \"Annie\" 0 0 \n", "24 Sloper, Mr. William Thompson 0 0 \n", "25 Palsson, Miss. Torborg Danira 3 1 \n", "26 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) 1 5 \n", "27 Emir, Mr. Farred Chehab 0 0 \n", "28 Fortune, Mr. Charles Alexander 3 2 \n", "29 O'Dwyer, Miss. Ellen \"Nellie\" 0 0 \n", "30 Todoroff, Mr. Lalio 0 0 \n", "⋮ ⋮ ⋮ ⋮ \n", "862 Giles, Mr. Frederick Edward 1 0 \n", "863 Swift, Mrs. Frederick Joel (Margaret Welles Barron) 0 0 \n", "864 Sage, Miss. Dorothy Edith \"Dolly\" 8 2 \n", "865 Gill, Mr. John William 0 0 \n", "866 Bystrom, Mrs. (Karolina) 0 0 \n", "867 Duran y More, Miss. Asuncion 1 0 \n", "868 Roebling, Mr. Washington Augustus II 0 0 \n", "869 van Melkebeke, Mr. Philemon 0 0 \n", "870 Johnson, Master. Harold Theodor 1 1 \n", "871 Balkic, Mr. Cerin 0 0 \n", "872 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1 1 \n", "873 Carlsson, Mr. Frans Olof 0 0 \n", "874 Vander Cruyssen, Mr. Victor 0 0 \n", "875 Abelson, Mrs. Samuel (Hannah Wizosky) 1 0 \n", "876 Najib, Miss. Adele Kiamie \"Jane\" 0 0 \n", "877 Gustafsson, Mr. Alfred Ossian 0 0 \n", "878 Petroff, Mr. Nedelio 0 0 \n", "879 Laleff, Mr. Kristo 0 0 \n", "880 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 0 1 \n", "881 Shelley, Mrs. William (Imanita Parrish Hall) 0 1 \n", "882 Markun, Mr. Johann 0 0 \n", "883 Dahlberg, Miss. Gerda Ulrika 0 0 \n", "884 Banfield, Mr. Frederick James 0 0 \n", "885 Sutehall, Mr. Henry Jr 0 0 \n", "886 Rice, Mrs. William (Margaret Norton) 0 5 \n", "887 Montvila, Rev. Juozas 0 0 \n", "888 Graham, Miss. Margaret Edith 0 0 \n", "889 Johnston, Miss. Catherine Helen \"Carrie\" 1 2 \n", "890 Behr, Mr. Karl Howell 0 0 \n", "891 Dooley, Mr. Patrick 0 0 \n", " Ticket Fare Cabin Embarked\n", "1 A/5 21171 7.2500 S \n", "2 PC 17599 71.2833 C85 C \n", "3 STON/O2. 
3101282 7.9250 S \n", "4 113803 53.1000 C123 S \n", "5 373450 8.0500 S \n", "6 330877 8.4583 Q \n", "7 17463 51.8625 E46 S \n", "8 349909 21.0750 S \n", "9 347742 11.1333 S \n", "10 237736 30.0708 C \n", "11 PP 9549 16.7000 G6 S \n", "12 113783 26.5500 C103 S \n", "13 A/5. 2151 8.0500 S \n", "14 347082 31.2750 S \n", "15 350406 7.8542 S \n", "16 248706 16.0000 S \n", "17 382652 29.1250 Q \n", "18 244373 13.0000 S \n", "19 345763 18.0000 S \n", "20 2649 7.2250 C \n", "21 239865 26.0000 S \n", "22 248698 13.0000 D56 S \n", "23 330923 8.0292 Q \n", "24 113788 35.5000 A6 S \n", "25 349909 21.0750 S \n", "26 347077 31.3875 S \n", "27 2631 7.2250 C \n", "28 19950 263.0000 C23 C25 C27 S \n", "29 330959 7.8792 Q \n", "30 349216 7.8958 S \n", "⋮ ⋮ ⋮ ⋮ ⋮ \n", "862 28134 11.5000 S \n", "863 17466 25.9292 D17 S \n", "864 CA. 2343 69.5500 S \n", "865 233866 13.0000 S \n", "866 236852 13.0000 S \n", "867 SC/PARIS 2149 13.8583 C \n", "868 PC 17590 50.4958 A24 S \n", "869 345777 9.5000 S \n", "870 347742 11.1333 S \n", "871 349248 7.8958 S \n", "872 11751 52.5542 D35 S \n", "873 695 5.0000 B51 B53 B55 S \n", "874 345765 9.0000 S \n", "875 P/PP 3381 24.0000 C \n", "876 2667 7.2250 C \n", "877 7534 9.8458 S \n", "878 349212 7.8958 S \n", "879 349217 7.8958 S \n", "880 11767 83.1583 C50 C \n", "881 230433 26.0000 S \n", "882 349257 7.8958 S \n", "883 7552 10.5167 S \n", "884 C.A./SOTON 34068 10.5000 S \n", "885 SOTON/OQ 392076 7.0500 S \n", "886 382652 29.1250 Q \n", "887 211536 13.0000 S \n", "888 112053 30.0000 B42 S \n", "889 W./C. 6607 23.4500 S \n", "890 111369 30.0000 C148 C \n", "891 370376 7.7500 Q " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#multicolumn rejection\n", "\n", "train %>% select(-one_of('Age','Sex'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like selecting a column with entire column name (or multiple column names with one_of()), select could also be used with a few more string ops. 
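
The string helpers mentioned above can be sketched on a small, hypothetical data frame. The name `toy` and its three rows are assumptions for illustration (the values mirror the first passengers shown earlier); `starts_with()`, `ends_with()` and `contains()` are dplyr selection helpers and ignore case by default.

```r
library(dplyr)

# Hypothetical three-row stand-in for the training data (not the real file)
toy <- data.frame(
  PassengerId = 1:3,
  Pclass = c(3, 1, 3),
  Name = c('Braund, Mr. Owen Harris',
           'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
           'Heikkinen, Miss. Laina'),
  Age = c(22, 38, 26),
  Parch = c(0, 0, 0),
  Fare = c(7.25, 71.2833, 7.925)
)

# Columns whose names begin with 'P'
p_cols <- toy %>% select(starts_with('P'))

# Columns whose names end with 'e'
e_cols <- toy %>% select(ends_with('e'))

# Columns whose names contain the substring 'ar'
ar_cols <- toy %>% select(contains('ar'))

names(p_cols)
names(e_cols)
names(ar_cols)
```

On this toy frame the first call should keep `PassengerId`, `Pclass` and `Parch`, the same three columns that the `starts_with('P')` cell below returns on the full training data.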
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
1 3 0
2 1 0
3 3 0
4 1 0
5 3 0
6 3 0
7 1 0
8 3 1
9 3 2
10 2 0
11 3 1
12 1 0
13 3 0
14 3 5
15 3 0
16 2 0
17 3 1
18 2 0
19 3 0
20 3 0
21 2 0
22 2 0
23 3 0
24 1 0
25 3 1
26 3 5
27 3 0
28 1 2
29 3 0
30 3 0
862 2 0
863 1 0
864 3 2
865 2 0
866 2 0
867 2 0
868 1 0
869 3 0
870 3 1
871 3 0
872 1 1
873 1 0
874 3 0
875 2 0
876 3 0
877 3 0
878 3 0
879 3 0
880 1 1
881 2 1
882 3 0
883 3 0
884 2 0
885 3 0
886 3 5
887 2 0
888 1 0
889 3 2
890 1 0
891 3 0
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " PassengerId & Pclass & Parch\\\\\n", "\\hline\n", "\t 1 & 3 & 0 \\\\\n", "\t 2 & 1 & 0 \\\\\n", "\t 3 & 3 & 0 \\\\\n", "\t 4 & 1 & 0 \\\\\n", "\t 5 & 3 & 0 \\\\\n", "\t 6 & 3 & 0 \\\\\n", "\t 7 & 1 & 0 \\\\\n", "\t 8 & 3 & 1 \\\\\n", "\t 9 & 3 & 2 \\\\\n", "\t 10 & 2 & 0 \\\\\n", "\t 11 & 3 & 1 \\\\\n", "\t 12 & 1 & 0 \\\\\n", "\t 13 & 3 & 0 \\\\\n", "\t 14 & 3 & 5 \\\\\n", "\t 15 & 3 & 0 \\\\\n", "\t 16 & 2 & 0 \\\\\n", "\t 17 & 3 & 1 \\\\\n", "\t 18 & 2 & 0 \\\\\n", "\t 19 & 3 & 0 \\\\\n", "\t 20 & 3 & 0 \\\\\n", "\t 21 & 2 & 0 \\\\\n", "\t 22 & 2 & 0 \\\\\n", "\t 23 & 3 & 0 \\\\\n", "\t 24 & 1 & 0 \\\\\n", "\t 25 & 3 & 1 \\\\\n", "\t 26 & 3 & 5 \\\\\n", "\t 27 & 3 & 0 \\\\\n", "\t 28 & 1 & 2 \\\\\n", "\t 29 & 3 & 0 \\\\\n", "\t 30 & 3 & 0 \\\\\n", "\t ⋮ & ⋮ & ⋮\\\\\n", "\t 862 & 2 & 0 \\\\\n", "\t 863 & 1 & 0 \\\\\n", "\t 864 & 3 & 2 \\\\\n", "\t 865 & 2 & 0 \\\\\n", "\t 866 & 2 & 0 \\\\\n", "\t 867 & 2 & 0 \\\\\n", "\t 868 & 1 & 0 \\\\\n", "\t 869 & 3 & 0 \\\\\n", "\t 870 & 3 & 1 \\\\\n", "\t 871 & 3 & 0 \\\\\n", "\t 872 & 1 & 1 \\\\\n", "\t 873 & 1 & 0 \\\\\n", "\t 874 & 3 & 0 \\\\\n", "\t 875 & 2 & 0 \\\\\n", "\t 876 & 3 & 0 \\\\\n", "\t 877 & 3 & 0 \\\\\n", "\t 878 & 3 & 0 \\\\\n", "\t 879 & 3 & 0 \\\\\n", "\t 880 & 1 & 1 \\\\\n", "\t 881 & 2 & 1 \\\\\n", "\t 882 & 3 & 0 \\\\\n", "\t 883 & 3 & 0 \\\\\n", "\t 884 & 2 & 0 \\\\\n", "\t 885 & 3 & 0 \\\\\n", "\t 886 & 3 & 5 \\\\\n", "\t 887 & 2 & 0 \\\\\n", "\t 888 & 1 & 0 \\\\\n", "\t 889 & 3 & 2 \\\\\n", "\t 890 & 1 & 0 \\\\\n", "\t 891 & 3 & 0 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "PassengerId | Pclass | Parch | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 1 | 3 | 0 | \n", "| 2 | 1 | 0 | \n", "| 3 | 3 | 0 | \n", "| 4 | 1 | 0 | 
\n", "| 5 | 3 | 0 | \n", "| 6 | 3 | 0 | \n", "| 7 | 1 | 0 | \n", "| 8 | 3 | 1 | \n", "| 9 | 3 | 2 | \n", "| 10 | 2 | 0 | \n", "| 11 | 3 | 1 | \n", "| 12 | 1 | 0 | \n", "| 13 | 3 | 0 | \n", "| 14 | 3 | 5 | \n", "| 15 | 3 | 0 | \n", "| 16 | 2 | 0 | \n", "| 17 | 3 | 1 | \n", "| 18 | 2 | 0 | \n", "| 19 | 3 | 0 | \n", "| 20 | 3 | 0 | \n", "| 21 | 2 | 0 | \n", "| 22 | 2 | 0 | \n", "| 23 | 3 | 0 | \n", "| 24 | 1 | 0 | \n", "| 25 | 3 | 1 | \n", "| 26 | 3 | 5 | \n", "| 27 | 3 | 0 | \n", "| 28 | 1 | 2 | \n", "| 29 | 3 | 0 | \n", "| 30 | 3 | 0 | \n", "| ⋮ | ⋮ | ⋮ | \n", "| 862 | 2 | 0 | \n", "| 863 | 1 | 0 | \n", "| 864 | 3 | 2 | \n", "| 865 | 2 | 0 | \n", "| 866 | 2 | 0 | \n", "| 867 | 2 | 0 | \n", "| 868 | 1 | 0 | \n", "| 869 | 3 | 0 | \n", "| 870 | 3 | 1 | \n", "| 871 | 3 | 0 | \n", "| 872 | 1 | 1 | \n", "| 873 | 1 | 0 | \n", "| 874 | 3 | 0 | \n", "| 875 | 2 | 0 | \n", "| 876 | 3 | 0 | \n", "| 877 | 3 | 0 | \n", "| 878 | 3 | 0 | \n", "| 879 | 3 | 0 | \n", "| 880 | 1 | 1 | \n", "| 881 | 2 | 1 | \n", "| 882 | 3 | 0 | \n", "| 883 | 3 | 0 | \n", "| 884 | 2 | 0 | \n", "| 885 | 3 | 0 | \n", "| 886 | 3 | 5 | \n", "| 887 | 2 | 0 | \n", "| 888 | 1 | 0 | \n", "| 889 | 3 | 2 | \n", "| 890 | 1 | 0 | \n", "| 891 | 3 | 0 | \n", "\n", "\n" ], "text/plain": [ " PassengerId Pclass Parch\n", "1 1 3 0 \n", "2 2 1 0 \n", "3 3 3 0 \n", "4 4 1 0 \n", "5 5 3 0 \n", "6 6 3 0 \n", "7 7 1 0 \n", "8 8 3 1 \n", "9 9 3 2 \n", "10 10 2 0 \n", "11 11 3 1 \n", "12 12 1 0 \n", "13 13 3 0 \n", "14 14 3 5 \n", "15 15 3 0 \n", "16 16 2 0 \n", "17 17 3 1 \n", "18 18 2 0 \n", "19 19 3 0 \n", "20 20 3 0 \n", "21 21 2 0 \n", "22 22 2 0 \n", "23 23 3 0 \n", "24 24 1 0 \n", "25 25 3 1 \n", "26 26 3 5 \n", "27 27 3 0 \n", "28 28 1 2 \n", "29 29 3 0 \n", "30 30 3 0 \n", "⋮ ⋮ ⋮ ⋮ \n", "862 862 2 0 \n", "863 863 1 0 \n", "864 864 3 2 \n", "865 865 2 0 \n", "866 866 2 0 \n", "867 867 2 0 \n", "868 868 1 0 \n", "869 869 3 0 \n", "870 870 3 1 \n", "871 871 3 0 \n", "872 872 1 1 \n", "873 873 1 0 \n", "874 874 3 0 \n", 
"875 875 2 0 \n", "876 876 3 0 \n", "877 877 3 0 \n", "878 878 3 0 \n", "879 879 3 0 \n", "880 880 1 1 \n", "881 881 2 1 \n", "882 882 3 0 \n", "883 883 3 0 \n", "884 884 2 0 \n", "885 885 3 0 \n", "886 886 3 5 \n", "887 887 2 0 \n", "888 888 1 0 \n", "889 889 3 2 \n", "890 890 1 0 \n", "891 891 3 0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% select(starts_with('P'))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Name & Age & Fare\\\\\n", "\\hline\n", "\t Braund, Mr. Owen Harris & 22 & 7.2500 \\\\\n", "\t Cumings, Mrs. John Bradley (Florence Briggs Thayer) & 38 & 71.2833 \\\\\n", "\t Heikkinen, Miss. Laina & 26 & 7.9250 \\\\\n", "\t Futrelle, Mrs. Jacques Heath (Lily May Peel) & 35 & 53.1000 \\\\\n", "\t Allen, Mr. William Henry & 35 & 8.0500 \\\\\n", "\t Moran, Mr. James & NA & 8.4583 \\\\\n", "\t McCarthy, Mr. Timothy J & 54 & 51.8625 \\\\\n", "\t Palsson, Master. Gosta Leonard & 2 & 21.0750 \\\\\n", "\t Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) & 27 & 11.1333 \\\\\n", "\t Nasser, Mrs. Nicholas (Adele Achem) & 14 & 30.0708 \\\\\n", "\t Sandstrom, Miss. Marguerite Rut & 4 & 16.7000 \\\\\n", "\t Bonnell, Miss. Elizabeth & 58 & 26.5500 \\\\\n", "\t Saundercock, Mr. William Henry & 20 & 8.0500 \\\\\n", "\t Andersson, Mr. Anders Johan & 39 & 31.2750 \\\\\n", "\t Vestrom, Miss. Hulda Amanda Adolfina & 14 & 7.8542 \\\\\n", "\t Hewlett, Mrs. (Mary D Kingcome) & 55 & 16.0000 \\\\\n", "\t Rice, Master. Eugene & 2 & 29.1250 \\\\\n", "\t Williams, Mr. Charles Eugene & NA & 13.0000 \\\\\n", "\t Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) & 31 & 18.0000 \\\\\n", "\t Masselmani, Mrs. Fatima & NA & 7.2250 \\\\\n", "\t Fynney, Mr. Joseph J & 35 & 26.0000 \\\\\n", "\t Beesley, Mr. Lawrence & 34 & 13.0000 \\\\\n", "\t McGowan, Miss. Anna \"Annie\" & 15 & 8.0292 \\\\\n", "\t Sloper, Mr. William Thompson & 28 & 35.5000 \\\\\n", "\t Palsson, Miss. Torborg Danira & 8 & 21.0750 \\\\\n", "\t Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) & 38 & 31.3875 \\\\\n", "\t Emir, Mr. Farred Chehab & NA & 7.2250 \\\\\n", "\t Fortune, Mr. Charles Alexander & 19 & 263.0000 \\\\\n", "\t O'Dwyer, Miss. Ellen \"Nellie\" & NA & 7.8792 \\\\\n", "\t Todoroff, Mr. Lalio & NA & 7.8958 \\\\\n", "\t ⋮ & ⋮ & ⋮\\\\\n", "\t Giles, Mr. Frederick Edward & 21 & 11.5000 \\\\\n", "\t Swift, Mrs. 
Frederick Joel (Margaret Welles Barron) & 48 & 25.9292 \\\\\n", "\t Sage, Miss. Dorothy Edith \"Dolly\" & NA & 69.5500 \\\\\n", "\t Gill, Mr. John William & 24 & 13.0000 \\\\\n", "\t Bystrom, Mrs. (Karolina) & 42 & 13.0000 \\\\\n", "\t Duran y More, Miss. Asuncion & 27 & 13.8583 \\\\\n", "\t Roebling, Mr. Washington Augustus II & 31 & 50.4958 \\\\\n", "\t van Melkebeke, Mr. Philemon & NA & 9.5000 \\\\\n", "\t Johnson, Master. Harold Theodor & 4 & 11.1333 \\\\\n", "\t Balkic, Mr. Cerin & 26 & 7.8958 \\\\\n", "\t Beckwith, Mrs. Richard Leonard (Sallie Monypeny) & 47 & 52.5542 \\\\\n", "\t Carlsson, Mr. Frans Olof & 33 & 5.0000 \\\\\n", "\t Vander Cruyssen, Mr. Victor & 47 & 9.0000 \\\\\n", "\t Abelson, Mrs. Samuel (Hannah Wizosky) & 28 & 24.0000 \\\\\n", "\t Najib, Miss. Adele Kiamie \"Jane\" & 15 & 7.2250 \\\\\n", "\t Gustafsson, Mr. Alfred Ossian & 20 & 9.8458 \\\\\n", "\t Petroff, Mr. Nedelio & 19 & 7.8958 \\\\\n", "\t Laleff, Mr. Kristo & NA & 7.8958 \\\\\n", "\t Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) & 56 & 83.1583 \\\\\n", "\t Shelley, Mrs. William (Imanita Parrish Hall) & 25 & 26.0000 \\\\\n", "\t Markun, Mr. Johann & 33 & 7.8958 \\\\\n", "\t Dahlberg, Miss. Gerda Ulrika & 22 & 10.5167 \\\\\n", "\t Banfield, Mr. Frederick James & 28 & 10.5000 \\\\\n", "\t Sutehall, Mr. Henry Jr & 25 & 7.0500 \\\\\n", "\t Rice, Mrs. William (Margaret Norton) & 39 & 29.1250 \\\\\n", "\t Montvila, Rev. Juozas & 27 & 13.0000 \\\\\n", "\t Graham, Miss. Margaret Edith & 19 & 30.0000 \\\\\n", "\t Johnston, Miss. Catherine Helen \"Carrie\" & NA & 23.4500 \\\\\n", "\t Behr, Mr. Karl Howell & 26 & 30.0000 \\\\\n", "\t Dooley, Mr. 
Patrick & 32 & 7.7500 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Name | Age | Fare | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| Braund, Mr. Owen Harris | 22 | 7.2500 | \n", "| Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 38 | 71.2833 | \n", "| Heikkinen, Miss. Laina | 26 | 7.9250 | \n", "| Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35 | 53.1000 | \n", "| Allen, Mr. William Henry | 35 | 8.0500 | \n", "| Moran, Mr. James | NA | 8.4583 | \n", "| McCarthy, Mr. Timothy J | 54 | 51.8625 | \n", "| Palsson, Master. Gosta Leonard | 2 | 21.0750 | \n", "| Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | 27 | 11.1333 | \n", "| Nasser, Mrs. Nicholas (Adele Achem) | 14 | 30.0708 | \n", "| Sandstrom, Miss. Marguerite Rut | 4 | 16.7000 | \n", "| Bonnell, Miss. Elizabeth | 58 | 26.5500 | \n", "| Saundercock, Mr. William Henry | 20 | 8.0500 | \n", "| Andersson, Mr. Anders Johan | 39 | 31.2750 | \n", "| Vestrom, Miss. Hulda Amanda Adolfina | 14 | 7.8542 | \n", "| Hewlett, Mrs. (Mary D Kingcome) | 55 | 16.0000 | \n", "| Rice, Master. Eugene | 2 | 29.1250 | \n", "| Williams, Mr. Charles Eugene | NA | 13.0000 | \n", "| Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | 31 | 18.0000 | \n", "| Masselmani, Mrs. Fatima | NA | 7.2250 | \n", "| Fynney, Mr. Joseph J | 35 | 26.0000 | \n", "| Beesley, Mr. Lawrence | 34 | 13.0000 | \n", "| McGowan, Miss. Anna \"Annie\" | 15 | 8.0292 | \n", "| Sloper, Mr. William Thompson | 28 | 35.5000 | \n", "| Palsson, Miss. Torborg Danira | 8 | 21.0750 | \n", "| Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) | 38 | 31.3875 | \n", "| Emir, Mr. Farred Chehab | NA | 7.2250 | \n", "| Fortune, Mr. Charles Alexander | 19 | 263.0000 | \n", "| O'Dwyer, Miss. 
Ellen \"Nellie\" | NA | 7.8792 | \n", "| Todoroff, Mr. Lalio | NA | 7.8958 | \n", "| ⋮ | ⋮ | ⋮ | \n", "| Giles, Mr. Frederick Edward | 21 | 11.5000 | \n", "| Swift, Mrs. Frederick Joel (Margaret Welles Barron) | 48 | 25.9292 | \n", "| Sage, Miss. Dorothy Edith \"Dolly\" | NA | 69.5500 | \n", "| Gill, Mr. John William | 24 | 13.0000 | \n", "| Bystrom, Mrs. (Karolina) | 42 | 13.0000 | \n", "| Duran y More, Miss. Asuncion | 27 | 13.8583 | \n", "| Roebling, Mr. Washington Augustus II | 31 | 50.4958 | \n", "| van Melkebeke, Mr. Philemon | NA | 9.5000 | \n", "| Johnson, Master. Harold Theodor | 4 | 11.1333 | \n", "| Balkic, Mr. Cerin | 26 | 7.8958 | \n", "| Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | 47 | 52.5542 | \n", "| Carlsson, Mr. Frans Olof | 33 | 5.0000 | \n", "| Vander Cruyssen, Mr. Victor | 47 | 9.0000 | \n", "| Abelson, Mrs. Samuel (Hannah Wizosky) | 28 | 24.0000 | \n", "| Najib, Miss. Adele Kiamie \"Jane\" | 15 | 7.2250 | \n", "| Gustafsson, Mr. Alfred Ossian | 20 | 9.8458 | \n", "| Petroff, Mr. Nedelio | 19 | 7.8958 | \n", "| Laleff, Mr. Kristo | NA | 7.8958 | \n", "| Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | 56 | 83.1583 | \n", "| Shelley, Mrs. William (Imanita Parrish Hall) | 25 | 26.0000 | \n", "| Markun, Mr. Johann | 33 | 7.8958 | \n", "| Dahlberg, Miss. Gerda Ulrika | 22 | 10.5167 | \n", "| Banfield, Mr. Frederick James | 28 | 10.5000 | \n", "| Sutehall, Mr. Henry Jr | 25 | 7.0500 | \n", "| Rice, Mrs. William (Margaret Norton) | 39 | 29.1250 | \n", "| Montvila, Rev. Juozas | 27 | 13.0000 | \n", "| Graham, Miss. Margaret Edith | 19 | 30.0000 | \n", "| Johnston, Miss. Catherine Helen \"Carrie\" | NA | 23.4500 | \n", "| Behr, Mr. Karl Howell | 26 | 30.0000 | \n", "| Dooley, Mr. Patrick | 32 | 7.7500 | \n", "\n", "\n" ], "text/plain": [ " Name Age Fare \n", "1 Braund, Mr. Owen Harris 22 7.2500\n", "2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) 38 71.2833\n", "3 Heikkinen, Miss. Laina 26 7.9250\n", "4 Futrelle, Mrs. 
Jacques Heath (Lily May Peel) 35 53.1000\n", "5 Allen, Mr. William Henry 35 8.0500\n", "6 Moran, Mr. James NA 8.4583\n", "7 McCarthy, Mr. Timothy J 54 51.8625\n", "8 Palsson, Master. Gosta Leonard 2 21.0750\n", "9 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) 27 11.1333\n", "10 Nasser, Mrs. Nicholas (Adele Achem) 14 30.0708\n", "11 Sandstrom, Miss. Marguerite Rut 4 16.7000\n", "12 Bonnell, Miss. Elizabeth 58 26.5500\n", "13 Saundercock, Mr. William Henry 20 8.0500\n", "14 Andersson, Mr. Anders Johan 39 31.2750\n", "15 Vestrom, Miss. Hulda Amanda Adolfina 14 7.8542\n", "16 Hewlett, Mrs. (Mary D Kingcome) 55 16.0000\n", "17 Rice, Master. Eugene 2 29.1250\n", "18 Williams, Mr. Charles Eugene NA 13.0000\n", "19 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) 31 18.0000\n", "20 Masselmani, Mrs. Fatima NA 7.2250\n", "21 Fynney, Mr. Joseph J 35 26.0000\n", "22 Beesley, Mr. Lawrence 34 13.0000\n", "23 McGowan, Miss. Anna \"Annie\" 15 8.0292\n", "24 Sloper, Mr. William Thompson 28 35.5000\n", "25 Palsson, Miss. Torborg Danira 8 21.0750\n", "26 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson) 38 31.3875\n", "27 Emir, Mr. Farred Chehab NA 7.2250\n", "28 Fortune, Mr. Charles Alexander 19 263.0000\n", "29 O'Dwyer, Miss. Ellen \"Nellie\" NA 7.8792\n", "30 Todoroff, Mr. Lalio NA 7.8958\n", "⋮ ⋮ ⋮ ⋮ \n", "862 Giles, Mr. Frederick Edward 21 11.5000 \n", "863 Swift, Mrs. Frederick Joel (Margaret Welles Barron) 48 25.9292 \n", "864 Sage, Miss. Dorothy Edith \"Dolly\" NA 69.5500 \n", "865 Gill, Mr. John William 24 13.0000 \n", "866 Bystrom, Mrs. (Karolina) 42 13.0000 \n", "867 Duran y More, Miss. Asuncion 27 13.8583 \n", "868 Roebling, Mr. Washington Augustus II 31 50.4958 \n", "869 van Melkebeke, Mr. Philemon NA 9.5000 \n", "870 Johnson, Master. Harold Theodor 4 11.1333 \n", "871 Balkic, Mr. Cerin 26 7.8958 \n", "872 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 47 52.5542 \n", "873 Carlsson, Mr. Frans Olof 33 5.0000 \n", "874 Vander Cruyssen, Mr. 
Victor 47 9.0000 \n", "875 Abelson, Mrs. Samuel (Hannah Wizosky) 28 24.0000 \n", "876 Najib, Miss. Adele Kiamie \"Jane\" 15 7.2250 \n", "877 Gustafsson, Mr. Alfred Ossian 20 9.8458 \n", "878 Petroff, Mr. Nedelio 19 7.8958 \n", "879 Laleff, Mr. Kristo NA 7.8958 \n", "880 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 56 83.1583 \n", "881 Shelley, Mrs. William (Imanita Parrish Hall) 25 26.0000 \n", "882 Markun, Mr. Johann 33 7.8958 \n", "883 Dahlberg, Miss. Gerda Ulrika 22 10.5167 \n", "884 Banfield, Mr. Frederick James 28 10.5000 \n", "885 Sutehall, Mr. Henry Jr 25 7.0500 \n", "886 Rice, Mrs. William (Margaret Norton) 39 29.1250 \n", "887 Montvila, Rev. Juozas 27 13.0000 \n", "888 Graham, Miss. Margaret Edith 19 30.0000 \n", "889 Johnston, Miss. Catherine Helen \"Carrie\" NA 23.4500 \n", "890 Behr, Mr. Karl Howell 26 30.0000 \n", "891 Dooley, Mr. Patrick 32 7.7500 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% select(ends_with('e'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Group_by:\n", "\n", "group_by is very similar to SQL's GROUP BY but more versatile. It is related to the concept of “split-apply-combine”. Let us understand group_by with a starter example: finding the number of male and female passengers, which is simply the count of rows for each value of Sex once the data is grouped by Sex.\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Sex & n\\\\\n", "\\hline\n", "\t female & 314 \\\\\n", "\t male & 577 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sex | n | \n", "|---|---|\n", "| female | 314 | \n", "| male | 577 | \n", "\n", "\n" ], "text/plain": [ " Sex n \n", "1 female 314\n", "2 male 577" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% group_by(Sex) %>% count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aha! That seems simple. Now let us do a two-level grouping to see how many passengers of each gender survived." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Survived & Sex & n\\\\\n", "\\hline\n", "\t 0 & female & 81 \\\\\n", "\t 0 & male & 468 \\\\\n", "\t 1 & female & 233 \\\\\n", "\t 1 & male & 109 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Survived | Sex | n | \n", "|---|---|---|---|\n", "| 0 | female | 81 | \n", "| 0 | male | 468 | \n", "| 1 | female | 233 | \n", "| 1 | male | 109 | \n", "\n", "\n" ], "text/plain": [ " Survived Sex n \n", "1 0 female 81\n", "2 0 male 468\n", "3 1 female 233\n", "4 1 male 109" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Sex & Survived & n\\\\\n", "\\hline\n", "\t female & 0 & 81 \\\\\n", "\t female & 1 & 233 \\\\\n", "\t male & 0 & 468 \\\\\n", "\t male & 1 & 109 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Sex | Survived | n | \n", "|---|---|---|---|\n", "| female | 0 | 81 | \n", "| female | 1 | 233 | \n", "| male | 0 | 468 | \n", "| male | 1 | 109 | \n", "\n", "\n" ], "text/plain": [ " Sex Survived n \n", "1 female 0 81\n", "2 female 1 233\n", "3 male 0 468\n", "4 male 1 109" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% group_by(Survived, Sex) %>% count()\n", "\n", "train %>% group_by(Sex, Survived) %>% count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mutate and Summarise:\n", "\n", "That is group_by at its simplest, but its true power is unveiled only when it is coupled with the mutate and summarise functions.\n", "\n", "The mutate function adds a new column computed from the given expression and keeps every row, while the summarise function collapses each group to a single row computed by the given function. Let us see the difference in action with the following examples.\n", "\n", "Let us get the average age of survivors and non-survivors: the data must be grouped by Survived and the Age column summarised with mean(), which gives one mean value for each of the two groups." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Survived & mean(Age)\\\\\n", "\\hline\n", "\t 0 & NA\\\\\n", "\t 1 & NA\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Survived | mean(Age) | \n", "|---|---|\n", "| 0 | NA | \n", "| 1 | NA | \n", "\n", "\n" ], "text/plain": [ " Survived mean(Age)\n", "1 0 NA \n", "2 1 NA " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Survived & average\\_age\\\\\n", "\\hline\n", "\t 0 & 30.62618\\\\\n", "\t 1 & 28.34369\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Survived | average_age | \n", "|---|---|\n", "| 0 | 30.62618 | \n", "| 1 | 28.34369 | \n", "\n", "\n" ], "text/plain": [ " Survived average_age\n", "1 0 30.62618 \n", "2 1 28.34369 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% group_by(Survived) %>% summarise(mean(Age))\n", "\n", " #Remember we have got NAs, so mean() wouldn't work and to bypass NAs, na.rm = T must be passed. \n", "\n", "train %>% group_by(Survived) %>% summarise(average_age = mean(Age,na.rm=T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's summarise() giving us the summary of the dataframe. If we need to create a new column, values filled for all 891 datapoints, that's where mutate plays its role. Let us create a new column, `Age_Bracket` containing value `Minor` if Age is less than 18 else `Major`" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Age & Age\\_Bracket\\\\\n", "\\hline\n", "\t 22 & Major\\\\\n", "\t 38 & Major\\\\\n", "\t 26 & Major\\\\\n", "\t 35 & Major\\\\\n", "\t 35 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 54 & Major\\\\\n", "\t 2 & Minor\\\\\n", "\t 27 & Major\\\\\n", "\t 14 & Minor\\\\\n", "\t 4 & Minor\\\\\n", "\t 58 & Major\\\\\n", "\t 20 & Major\\\\\n", "\t 39 & Major\\\\\n", "\t 14 & Minor\\\\\n", "\t 55 & Major\\\\\n", "\t 2 & Minor\\\\\n", "\t NA & NA \\\\\n", "\t 31 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 35 & Major\\\\\n", "\t 34 & Major\\\\\n", "\t 15 & Minor\\\\\n", "\t 28 & Major\\\\\n", "\t 8 & Minor\\\\\n", "\t 38 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 19 & Major\\\\\n", "\t NA & NA \\\\\n", "\t NA & NA \\\\\n", "\t ⋮ & ⋮\\\\\n", "\t 21 & Major\\\\\n", "\t 48 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 24 & Major\\\\\n", "\t 42 & Major\\\\\n", "\t 27 & Major\\\\\n", "\t 31 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 4 & Minor\\\\\n", "\t 26 & Major\\\\\n", "\t 47 & Major\\\\\n", "\t 33 & Major\\\\\n", "\t 47 & Major\\\\\n", "\t 28 & Major\\\\\n", "\t 15 & Minor\\\\\n", "\t 20 & Major\\\\\n", "\t 19 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 56 & Major\\\\\n", "\t 25 & Major\\\\\n", "\t 33 & Major\\\\\n", "\t 22 & Major\\\\\n", "\t 28 & Major\\\\\n", "\t 25 & Major\\\\\n", "\t 39 & Major\\\\\n", "\t 27 & Major\\\\\n", "\t 19 & Major\\\\\n", "\t NA & NA \\\\\n", "\t 26 & Major\\\\\n", "\t 32 & Major\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Age | Age_Bracket | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 22 | Major | \n", "| 38 | Major | \n", "| 26 | Major | \n", "| 35 | Major | \n", "| 35 | Major | \n", "| NA | NA | \n", "| 54 | Major | \n", "| 2 | Minor | \n", "| 27 | Major | \n", "| 14 | Minor | \n", "| 
4 | Minor | \n", "| 58 | Major | \n", "| 20 | Major | \n", "| 39 | Major | \n", "| 14 | Minor | \n", "| 55 | Major | \n", "| 2 | Minor | \n", "| NA | NA | \n", "| 31 | Major | \n", "| NA | NA | \n", "| 35 | Major | \n", "| 34 | Major | \n", "| 15 | Minor | \n", "| 28 | Major | \n", "| 8 | Minor | \n", "| 38 | Major | \n", "| NA | NA | \n", "| 19 | Major | \n", "| NA | NA | \n", "| NA | NA | \n", "| ⋮ | ⋮ | \n", "| 21 | Major | \n", "| 48 | Major | \n", "| NA | NA | \n", "| 24 | Major | \n", "| 42 | Major | \n", "| 27 | Major | \n", "| 31 | Major | \n", "| NA | NA | \n", "| 4 | Minor | \n", "| 26 | Major | \n", "| 47 | Major | \n", "| 33 | Major | \n", "| 47 | Major | \n", "| 28 | Major | \n", "| 15 | Minor | \n", "| 20 | Major | \n", "| 19 | Major | \n", "| NA | NA | \n", "| 56 | Major | \n", "| 25 | Major | \n", "| 33 | Major | \n", "| 22 | Major | \n", "| 28 | Major | \n", "| 25 | Major | \n", "| 39 | Major | \n", "| 27 | Major | \n", "| 19 | Major | \n", "| NA | NA | \n", "| 26 | Major | \n", "| 32 | Major | \n", "\n", "\n" ], "text/plain": [ " Age Age_Bracket\n", "1 22 Major \n", "2 38 Major \n", "3 26 Major \n", "4 35 Major \n", "5 35 Major \n", "6 NA NA \n", "7 54 Major \n", "8 2 Minor \n", "9 27 Major \n", "10 14 Minor \n", "11 4 Minor \n", "12 58 Major \n", "13 20 Major \n", "14 39 Major \n", "15 14 Minor \n", "16 55 Major \n", "17 2 Minor \n", "18 NA NA \n", "19 31 Major \n", "20 NA NA \n", "21 35 Major \n", "22 34 Major \n", "23 15 Minor \n", "24 28 Major \n", "25 8 Minor \n", "26 38 Major \n", "27 NA NA \n", "28 19 Major \n", "29 NA NA \n", "30 NA NA \n", "⋮ ⋮ ⋮ \n", "862 21 Major \n", "863 48 Major \n", "864 NA NA \n", "865 24 Major \n", "866 42 Major \n", "867 27 Major \n", "868 31 Major \n", "869 NA NA \n", "870 4 Minor \n", "871 26 Major \n", "872 47 Major \n", "873 33 Major \n", "874 47 Major \n", "875 28 Major \n", "876 15 Minor \n", "877 20 Major \n", "878 19 Major \n", "879 NA NA \n", "880 56 Major \n", "881 25 Major \n", "882 33 Major \n", "883 
22 Major \n", "884 28 Major \n", "885 25 Major \n", "886 39 Major \n", "887 27 Major \n", "888 19 Major \n", "889 NA NA \n", "890 26 Major \n", "891 32 Major " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " Survived & Age\\_Bracket & pnt\\\\\n", "\\hline\n", "\t 0 & Major & 41.750842\\\\\n", "\t 0 & Minor & 5.836139\\\\\n", "\t 0 & NA & 14.029181\\\\\n", "\t 1 & Major & 25.701459\\\\\n", "\t 1 & Minor & 6.846240\\\\\n", "\t 1 & NA & 5.836139\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Survived | Age_Bracket | pnt | \n", "|---|---|---|---|---|---|\n", "| 0 | Major | 41.750842 | \n", "| 0 | Minor | 5.836139 | \n", "| 0 | NA | 14.029181 | \n", "| 1 | Major | 25.701459 | \n", "| 1 | Minor | 6.846240 | \n", "| 1 | NA | 5.836139 | \n", "\n", "\n" ], "text/plain": [ " Survived Age_Bracket pnt \n", "1 0 Major 41.750842\n", "2 0 Minor 5.836139\n", "3 0 NA 14.029181\n", "4 1 Major 25.701459\n", "5 1 Minor 6.846240\n", "6 1 NA 5.836139" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% mutate(Age_Bracket = ifelse(Age < 18, 'Minor','Major')) %>% select(starts_with('Age'))\n", "\n", "# In fact, this can be coupled with Survived to see the impact of Age_Bracket\n", "\n", "train %>% \n", "mutate(Age_Bracket = ifelse(Age < 18, 'Minor','Major')) %>% \n", "group_by(Survived,Age_Bracket) %>% \n", "summarise(pnt = (n()/nrow(train))*100)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's how dplyr gets more powerful when group_by is coupled with mutate or summarise, both for feature engineering and for better data visualization. But it doesn't stop here: one of the most important functions a data analyst needs is sorting, and that's what `arrange()` does.\n", "\n", "## arrange - in ascending order:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " & PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n", "\\hline\n", "\t870 & 319 & 1 & 1 & Wick, Miss. Mary Natalie & female & 31 & 0 & 2 & 36928 & 164.8667 & C7 & S \\\\\n", "\t871 & 857 & 1 & 1 & Wick, Mrs. George Dennick (Mary Hitchcock) & female & 45 & 1 & 1 & 36928 & 164.8667 & & S \\\\\n", "\t872 & 690 & 1 & 1 & Madill, Miss. Georgette Alexandra & female & 15 & 0 & 1 & 24160 & 211.3375 & B5 & S \\\\\n", "\t873 & 731 & 1 & 1 & Allen, Miss. Elisabeth Walton & female & 29 & 0 & 0 & 24160 & 211.3375 & B5 & S \\\\\n", "\t874 & 780 & 1 & 1 & Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) & female & 43 & 0 & 1 & 24160 & 211.3375 & B3 & S \\\\\n", "\t875 & 378 & 0 & 1 & Widener, Mr. Harry Elkins & male & 27 & 0 & 2 & 113503 & 211.5000 & C82 & C \\\\\n", "\t876 & 528 & 0 & 1 & Farthing, Mr. John & male & NA & 0 & 0 & PC 17483 & 221.7792 & C95 & S \\\\\n", "\t877 & 381 & 1 & 1 & Bidois, Miss. Rosalie & female & 42 & 0 & 0 & PC 17757 & 227.5250 & & C \\\\\n", "\t878 & 558 & 0 & 1 & Robbins, Mr. Victor & male & NA & 0 & 0 & PC 17757 & 227.5250 & & C \\\\\n", "\t879 & 701 & 1 & 1 & Astor, Mrs. John Jacob (Madeleine Talmadge Force) & female & 18 & 1 & 0 & PC 17757 & 227.5250 & C62 C64 & C \\\\\n", "\t880 & 717 & 1 & 1 & Endres, Miss. Caroline Louise & female & 38 & 0 & 0 & PC 17757 & 227.5250 & C45 & C \\\\\n", "\t881 & 119 & 0 & 1 & Baxter, Mr. Quigg Edmond & male & 24 & 0 & 1 & PC 17558 & 247.5208 & B58 B60 & C \\\\\n", "\t882 & 300 & 1 & 1 & Baxter, Mrs. James (Helene DeLaudeniere Chaput) & female & 50 & 0 & 1 & PC 17558 & 247.5208 & B58 B60 & C \\\\\n", "\t883 & 312 & 1 & 1 & Ryerson, Miss. Emily Borie & female & 18 & 2 & 2 & PC 17608 & 262.3750 & B57 B59 B63 B66 & C \\\\\n", "\t884 & 743 & 1 & 1 & Ryerson, Miss. 
Susan Parker \"Suzette\" & female & 21 & 2 & 2 & PC 17608 & 262.3750 & B57 B59 B63 B66 & C \\\\\n", "\t885 & 28 & 0 & 1 & Fortune, Mr. Charles Alexander & male & 19 & 3 & 2 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t886 & 89 & 1 & 1 & Fortune, Miss. Mabel Helen & female & 23 & 3 & 2 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t887 & 342 & 1 & 1 & Fortune, Miss. Alice Elizabeth & female & 24 & 3 & 2 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t888 & 439 & 0 & 1 & Fortune, Mr. Mark & male & 64 & 1 & 4 & 19950 & 263.0000 & C23 C25 C27 & S \\\\\n", "\t889 & 259 & 1 & 1 & Ward, Miss. Anna & female & 35 & 0 & 0 & PC 17755 & 512.3292 & & C \\\\\n", "\t890 & 680 & 1 & 1 & Cardeza, Mr. Thomas Drake Martinez & male & 36 & 0 & 1 & PC 17755 & 512.3292 & B51 B53 B55 & C \\\\\n", "\t891 & 738 & 1 & 1 & Lesurer, Mr. Gustave J & male & 35 & 0 & 0 & PC 17755 & 512.3292 & B101 & C \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 870 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31 | 0 | 2 | 36928 | 164.8667 | C7 | S | \n", "| 871 | 857 | 1 | 1 | Wick, Mrs. George Dennick (Mary Hitchcock) | female | 45 | 1 | 1 | 36928 | 164.8667 | | S | \n", "| 872 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15 | 0 | 1 | 24160 | 211.3375 | B5 | S | \n", "| 873 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29 | 0 | 0 | 24160 | 211.3375 | B5 | S | \n", "| 874 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | female | 43 | 0 | 1 | 24160 | 211.3375 | B3 | S | \n", "| 875 | 378 | 0 | 1 | Widener, Mr. Harry Elkins | male | 27 | 0 | 2 | 113503 | 211.5000 | C82 | C | \n", "| 876 | 528 | 0 | 1 | Farthing, Mr. John | male | NA | 0 | 0 | PC 17483 | 221.7792 | C95 | S | \n", "| 877 | 381 | 1 | 1 | Bidois, Miss. 
Rosalie | female | 42 | 0 | 0 | PC 17757 | 227.5250 | | C | \n", "| 878 | 558 | 0 | 1 | Robbins, Mr. Victor | male | NA | 0 | 0 | PC 17757 | 227.5250 | | C | \n", "| 879 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C | \n", "| 880 | 717 | 1 | 1 | Endres, Miss. Caroline Louise | female | 38 | 0 | 0 | PC 17757 | 227.5250 | C45 | C | \n", "| 881 | 119 | 0 | 1 | Baxter, Mr. Quigg Edmond | male | 24 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | \n", "| 882 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | \n", "| 883 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C | \n", "| 884 | 743 | 1 | 1 | Ryerson, Miss. Susan Parker \"Suzette\" | female | 21 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C | \n", "| 885 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 886 | 89 | 1 | 1 | Fortune, Miss. Mabel Helen | female | 23 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 887 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 888 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S | \n", "| 889 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35 | 0 | 0 | PC 17755 | 512.3292 | | C | \n", "| 890 | 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C | \n", "| 891 | 738 | 1 | 1 | Lesurer, Mr. 
Gustave J | male | 35 | 0 | 0 | PC 17755 | 512.3292 | B101 | C | \n", "\n", "\n" ], "text/plain": [ " PassengerId Survived Pclass\n", "870 319 1 1 \n", "871 857 1 1 \n", "872 690 1 1 \n", "873 731 1 1 \n", "874 780 1 1 \n", "875 378 0 1 \n", "876 528 0 1 \n", "877 381 1 1 \n", "878 558 0 1 \n", "879 701 1 1 \n", "880 717 1 1 \n", "881 119 0 1 \n", "882 300 1 1 \n", "883 312 1 1 \n", "884 743 1 1 \n", "885 28 0 1 \n", "886 89 1 1 \n", "887 342 1 1 \n", "888 439 0 1 \n", "889 259 1 1 \n", "890 680 1 1 \n", "891 738 1 1 \n", " Name Sex Age SibSp\n", "870 Wick, Miss. Mary Natalie female 31 0 \n", "871 Wick, Mrs. George Dennick (Mary Hitchcock) female 45 1 \n", "872 Madill, Miss. Georgette Alexandra female 15 0 \n", "873 Allen, Miss. Elisabeth Walton female 29 0 \n", "874 Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) female 43 0 \n", "875 Widener, Mr. Harry Elkins male 27 0 \n", "876 Farthing, Mr. John male NA 0 \n", "877 Bidois, Miss. Rosalie female 42 0 \n", "878 Robbins, Mr. Victor male NA 0 \n", "879 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 1 \n", "880 Endres, Miss. Caroline Louise female 38 0 \n", "881 Baxter, Mr. Quigg Edmond male 24 0 \n", "882 Baxter, Mrs. James (Helene DeLaudeniere Chaput) female 50 0 \n", "883 Ryerson, Miss. Emily Borie female 18 2 \n", "884 Ryerson, Miss. Susan Parker \"Suzette\" female 21 2 \n", "885 Fortune, Mr. Charles Alexander male 19 3 \n", "886 Fortune, Miss. Mabel Helen female 23 3 \n", "887 Fortune, Miss. Alice Elizabeth female 24 3 \n", "888 Fortune, Mr. Mark male 64 1 \n", "889 Ward, Miss. Anna female 35 0 \n", "890 Cardeza, Mr. Thomas Drake Martinez male 36 0 \n", "891 Lesurer, Mr. 
Gustave J male 35 0 \n", " Parch Ticket Fare Cabin Embarked\n", "870 2 36928 164.8667 C7 S \n", "871 1 36928 164.8667 S \n", "872 1 24160 211.3375 B5 S \n", "873 0 24160 211.3375 B5 S \n", "874 1 24160 211.3375 B3 S \n", "875 2 113503 211.5000 C82 C \n", "876 0 PC 17483 221.7792 C95 S \n", "877 0 PC 17757 227.5250 C \n", "878 0 PC 17757 227.5250 C \n", "879 0 PC 17757 227.5250 C62 C64 C \n", "880 0 PC 17757 227.5250 C45 C \n", "881 1 PC 17558 247.5208 B58 B60 C \n", "882 1 PC 17558 247.5208 B58 B60 C \n", "883 2 PC 17608 262.3750 B57 B59 B63 B66 C \n", "884 2 PC 17608 262.3750 B57 B59 B63 B66 C \n", "885 2 19950 263.0000 C23 C25 C27 S \n", "886 2 19950 263.0000 C23 C25 C27 S \n", "887 2 19950 263.0000 C23 C25 C27 S \n", "888 4 19950 263.0000 C23 C25 C27 S \n", "889 0 PC 17755 512.3292 C \n", "890 1 PC 17755 512.3292 B51 B53 B55 C \n", "891 0 PC 17755 512.3292 B101 C " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% arrange(Fare) %>% tail(22) # Extract the last 22 rows after sorting by Fare in ascending order" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
631 1 1 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0 27042 30.0000 A23 S
852 0 3 Svensson, Mr. Johan male 74.0 0 0 347060 7.7750 S
97 0 1 Goldschmidt, Mr. George B male 71.0 0 0 PC 17754 34.6542 A5 C
494 0 1 Artagaveytia, Mr. Ramon male 71.0 0 0 PC 17609 49.5042 C
117 0 3 Connors, Mr. Patrick male 70.5 0 0 370369 7.7500 Q
673 0 2 Mitchell, Mr. Henry Michael male 70.0 0 0 C.A. 24580 10.5000 S
746 0 1 Crosby, Capt. Edward Gifford male 70.0 1 1 WE/P 5735 71.0000 B22 S
34 0 2 Wheadon, Mr. Edward H male 66.0 0 0 C.A. 24579 10.5000 S
55 0 1 Ostby, Mr. Engelhart Cornelius male 65.0 0 1 113509 61.9792 B30 C
281 0 3 Duane, Mr. Frank male 65.0 0 0 336439 7.7500 Q
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n", "\\hline\n", "\t 631 & 1 & 1 & Barkworth, Mr. Algernon Henry Wilson & male & 80.0 & 0 & 0 & 27042 & 30.0000 & A23 & S \\\\\n", "\t 852 & 0 & 3 & Svensson, Mr. Johan & male & 74.0 & 0 & 0 & 347060 & 7.7750 & & S \\\\\n", "\t 97 & 0 & 1 & Goldschmidt, Mr. George B & male & 71.0 & 0 & 0 & PC 17754 & 34.6542 & A5 & C \\\\\n", "\t 494 & 0 & 1 & Artagaveytia, Mr. Ramon & male & 71.0 & 0 & 0 & PC 17609 & 49.5042 & & C \\\\\n", "\t 117 & 0 & 3 & Connors, Mr. Patrick & male & 70.5 & 0 & 0 & 370369 & 7.7500 & & Q \\\\\n", "\t 673 & 0 & 2 & Mitchell, Mr. Henry Michael & male & 70.0 & 0 & 0 & C.A. 24580 & 10.5000 & & S \\\\\n", "\t 746 & 0 & 1 & Crosby, Capt. Edward Gifford & male & 70.0 & 1 & 1 & WE/P 5735 & 71.0000 & B22 & S \\\\\n", "\t 34 & 0 & 2 & Wheadon, Mr. Edward H & male & 66.0 & 0 & 0 & C.A. 24579 & 10.5000 & & S \\\\\n", "\t 55 & 0 & 1 & Ostby, Mr. Engelhart Cornelius & male & 65.0 & 0 & 1 & 113509 & 61.9792 & B30 & C \\\\\n", "\t 281 & 0 & 3 & Duane, Mr. Frank & male & 65.0 & 0 & 0 & 336439 & 7.7500 & & Q \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n", "|---|---|---|---|---|---|---|---|---|---|\n", "| 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 | 0 | 0 | 27042 | 30.0000 | A23 | S | \n", "| 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.7750 | | S | \n", "| 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 | 0 | 0 | PC 17754 | 34.6542 | A5 | C | \n", "| 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | | C | \n", "| 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | | Q | \n", "| 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 
24580 | 10.5000 | | S | \n", "| 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0000 | B22 | S | \n", "| 34 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5000 | | S | \n", "| 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 | 0 | 1 | 113509 | 61.9792 | B30 | C | \n", "| 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | | Q | \n", "\n", "\n" ], "text/plain": [ " PassengerId Survived Pclass Name Sex Age \n", "1 631 1 1 Barkworth, Mr. Algernon Henry Wilson male 80.0\n", "2 852 0 3 Svensson, Mr. Johan male 74.0\n", "3 97 0 1 Goldschmidt, Mr. George B male 71.0\n", "4 494 0 1 Artagaveytia, Mr. Ramon male 71.0\n", "5 117 0 3 Connors, Mr. Patrick male 70.5\n", "6 673 0 2 Mitchell, Mr. Henry Michael male 70.0\n", "7 746 0 1 Crosby, Capt. Edward Gifford male 70.0\n", "8 34 0 2 Wheadon, Mr. Edward H male 66.0\n", "9 55 0 1 Ostby, Mr. Engelhart Cornelius male 65.0\n", "10 281 0 3 Duane, Mr. Frank male 65.0\n", " SibSp Parch Ticket Fare Cabin Embarked\n", "1 0 0 27042 30.0000 A23 S \n", "2 0 0 347060 7.7750 S \n", "3 0 0 PC 17754 34.6542 A5 C \n", "4 0 0 PC 17609 49.5042 C \n", "5 0 0 370369 7.7500 Q \n", "6 0 0 C.A. 24580 10.5000 S \n", "7 1 1 WE/P 5735 71.0000 B22 S \n", "8 0 0 C.A. 24579 10.5000 S \n", "9 0 1 113509 61.9792 B30 C \n", "10 0 0 336439 7.7500 Q " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Arrange in descending order\n", "\n", "train %>% arrange(desc(Age)) %>% head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### filter:\n", "\n", "`filter()` subsets rows, much as `select()` subsets columns. It takes one or more logical expressions, evaluates them for every row, and returns only the rows for which they are TRUE. So to be clear, all that matters to `filter()` is whether the expression evaluates to TRUE.\n", "\n", "Let us start by keeping only the male passengers and counting them by port of embarkation (`Embarked`). 
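
Before running that on the Titanic data, here is a minimal sketch of the TRUE-only rule on a small made-up data frame. The `toy` table and its values are hypothetical and exist only for illustration; they are not part of the Titanic file.

```r
library(dplyr)

# Hypothetical toy data (an assumption for this sketch, not the Titanic data)
toy <- data.frame(
  Name = c('A', 'B', 'C', 'D'),
  Sex  = c('male', 'female', 'male', 'female'),
  Age  = c(30, 12, NA, 45),
  stringsAsFactors = FALSE
)

# Comma-separated conditions are combined with AND
adult_males <- toy %>% filter(Sex == 'male', Age >= 18)

# A condition that evaluates to NA is treated like FALSE, so the row is dropped
minors <- toy %>% filter(Age < 18)

# Keep rows with a missing Age explicitly when that is what you want
minors_or_unknown <- toy %>% filter(Age < 18 | is.na(Age))
```

The second call is worth noticing: a row with a missing `Age` is silently dropped, which is likely why counts based on `Age` can be smaller than the number of rows in the data.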
" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
C 95
Q 41
S 441
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " Embarked & n\\\\\n", "\\hline\n", "\t C & 95\\\\\n", "\t Q & 41\\\\\n", "\t S & 441\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Embarked | n | \n", "|---|---|---|\n", "| C | 95 | \n", "| Q | 41 | \n", "| S | 441 | \n", "\n", "\n" ], "text/plain": [ " Embarked n \n", "1 C 95\n", "2 Q 41\n", "3 S 441" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% \n", "filter(Sex == 'male') %>%\n", "group_by(Embarked) %>%\n", "count()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "\\begin{tabular}{r|l}\n", " n\\\\\n", "\\hline\n", "\t 113\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "n | \n", "|---|\n", "| 113 | \n", "\n", "\n" ], "text/plain": [ " n \n", "1 113" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Count the passengers younger than 18\n", "\n", "train %>% filter(Age < 18) %>% count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`filter()` can be combined with a regular expression (here via `grepl()`) to match and detect strings." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "
344 0 2 Sedgwick, Mr. Charles Frederick Waddington male 25 0 0 244361 13 S
\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n", "\\hline\n", "\t 344 & 0 & 2 & Sedgwick, Mr. Charles Frederick Waddington & male & 25 & 0 & 0 & 244361 & 13 & & S \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n", "|---|\n", "| 344 | 0 | 2 | Sedgwick, Mr. Charles Frederick Waddington | male | 25 | 0 | 0 | 244361 | 13 | | S | \n", "\n", "\n" ], "text/plain": [ " PassengerId Survived Pclass Name Sex \n", "1 344 0 2 Sedgwick, Mr. Charles Frederick Waddington male\n", " Age SibSp Parch Ticket Fare Cabin Embarked\n", "1 25 0 0 244361 13 S " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train %>% filter(grepl('wick', Name)) # Name is looked up inside train, so train$Name is not needed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*And that is dplyr in a nutshell. I hope this notebook gives you a decent start if you are a beginner. Please share your thoughts and suggestions in the comments!*\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 1 }