{ "cells": [ { "metadata": { "slideshow": { "slide_type": "slide" }, "toc": "true" }, "cell_type": "markdown", "source": "

Table of Contents

\n
" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**Statistics** is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of large amounts of numerical data." }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "* A **population** is the set of all subjects in which we are interested. \n* A **sample** is a subset of a population. This is the group from which we have data. \n* A **parameter** is a numerical description of a population characteristic. \n* A **statistic** is a numerical description of a sample characteristic. \n* **Descriptive statistics** refers to methods for summarizing the data collected. \n* **Inferential statistics** involves using data from a sample to draw conclusions about a population. " }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "Statistics has two main uses, to describe and to infer (or predict): \n\n* Descriptive Statistics (to describe and summarize the data at hand)\n* Inferential Statistics (to draw conclusions about a population from sample data, based on evidence and reasoning rather than explicit statements)\n" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "# Descriptive Statistics" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/54wy5jK.png)\nImage Credit: https://www.cognity.pl/wykresy-w-excelu,blog,138.html " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Types of Variables (Scales of Measurement) " }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "* Nominal\n* Ordinal\n* Interval\n* Ratio" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": 
"![Imgur](https://i.imgur.com/xK8OLpY.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Categorical" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Nominal scale:" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "> In this scale, categories are simply names or labels (hence “nominal”). There is no inherent order between categories. Put simply, one cannot say that a particular category is superior to or better than another." }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Examples:** \n\n* **Gender (Male/ Female):** One cannot say that males are better than females, or vice versa.\n* **Blood Groups (A/B/O/AB):** One cannot say that group A is superior to group O, for instance.\n* **Religion (Hindu/ Muslim/ Christian/ Buddhist, etc.):** Here, too, the categories cannot be arranged in a logical order. Each category can only be considered as equal to the other.\n* **Colors (Red/Green/Blue/Yellow):** No color ranks above another. " }, { "metadata": {}, "cell_type": "markdown", "source": "**Note:** a sub-type of nominal scale with only two categories (e.g. male/female) is called **“dichotomous.”** " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Ordinal scale:" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "> The various categories can be logically arranged in a meaningful order. However, the difference between the categories is not “meaningful”." }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Examples:**\n\n* **Ranks (1st/ 2nd/ 3rd, etc.):** The ranks can be arranged in either ascending or descending order without difficulty. 
However, the difference between ranks is not uniform: the difference between the 1st and 2nd ranks may be 20 units, while that between the 2nd and 3rd ranks may be 3 units. In addition, it is not possible to say that the 1st rank is x times better than the 2nd or 3rd rank purely on the basis of the ranks.\n* **Ratings (Good/ Better/ Best) and pain levels (No pain/ Mild pain/ Moderate pain/ Severe pain):** Here, too, a meaningful arrangement (ordering) is possible, but the difference between the categories is subjective and not uniform. “Best” is not necessarily thrice as good as “Good”, or twice as good as “Better”.\n* **Degree (Graduate/ Master/ PhD)**\n* **Ranks (High/ Medium/ Low)**\n* **Customer Satisfaction Survey (Satisfied/ Neutral/ Dissatisfied)**" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "* **Likert scale (Strongly Disagree/ Disagree/ Neutral/ Agree/ Strongly Agree):** The direction of the ordering is flexible: it can be reversed without affecting the interpretation (Strongly Agree/ Agree/ Neutral/ Disagree/ Strongly Disagree). Again, the difference between categories is not uniform." }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Note:** The best way to describe central tendency for ordinal data is the **mode** or the **median**; the mean is not meaningful because the distances between categories are not defined." }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Numerical" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Interval scale:" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "> The values (not categories) can be ordered and have a meaningful difference, but doubling is not meaningful. 
This is because of the absence of an “absolute zero” (the zero entry represents a position on the scale, but it is not an inherent zero)." }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Example:** \n* Perhaps the best known example is temperature, in degrees *Celsius or Fahrenheit.* \n* **The Celsius scale:** The difference between 40 °C and 50 °C is the same as that between 20 °C and 30 °C (meaningful difference = equidistant). Besides, 50 °C is hotter than 40 °C (order). However, 20 °C is not half as hot as 40 °C, nor is 40 °C twice as hot as 20 °C (doubling is not meaningful): in kelvins these are 293.15 K and 313.15 K, a ratio of about 1.07 rather than 2.\n* **Meaningful difference:** In the Celsius scale, one unit is the same size anywhere on the scale: the difference between 49 °C and 50 °C is the same as the difference between any two consecutive values (1 unit). Thus, (2-1) = (23-22) = (40-39) = (99-98) = 1.\n* **Addition and subtraction make sense,** but **multiplication and division do not.** That is, 70 degrees is not “twice as hot” as 35 degrees. If this is confusing, think what a negative temperature would mean, or a 0 temperature! 30 degrees is -1 times as hot as -30 degrees? It doesn’t make sense! " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Ratio scale:" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "> The values can be ordered, have a meaningful difference, and doubling is also meaningful. There is an **“absolute zero”** (the zero entry is an inherent zero)." 
}, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Examples:**\n\n* **The Kelvin scale:** 100 K is twice as hot as 50 K; the difference between values is meaningful and the values can be ordered.\n* **Height and weight:** 100 kg is twice as heavy as 50 kg; the difference between 45 kg and 55 kg is the same as that between 95 kg and 105 kg; values can be arranged in an order (ascending/ descending).\n* **Multiplication makes sense as well.** A person who weighs 200 pounds weighs double what a person who weighs 100 pounds weighs. Likewise for height: the difference between 150 cm and 100 cm is the same as that between 200 cm and 150 cm, and 100 cm is twice as tall as 50 cm. The values can also be arranged in order (ascending/ descending). Ratio scales have a meaningful zero. " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "| Provides | Nominal | Ordinal | Interval | Ratio |\n|----------------------------------|--------------|--------------|--------------|--------------|\n| Order of values is known | | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ |\n| Counts or frequency distribution | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ |\n| Mode | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ |\n| Median | | $\\checkmark$ | $\\checkmark$ | $\\checkmark$ |\n| Mean | | | $\\checkmark$ | $\\checkmark$ |\n| Difference between each value | | | $\\checkmark$ | $\\checkmark$ |\n| Add or subtract values | | | $\\checkmark$ | $\\checkmark$ |\n| Multiply and divide values | | | | $\\checkmark$ |\n| Has true zero | | | | $\\checkmark$ |\n\nsource: https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Measures of Central Tendency" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, 
"cell_type": "markdown", "source": "**Mean** (sum of all observations / number of observations) \n**Median** (the midpoint, which divides the sorted data into two equal halves) \n**Mode** (the value that occurs most often) " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/wDLwYI2.png)\nImage Credit: https://www.wikitechy.com/tutorials/r-programming/mean-median-mode" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "### An R function for finding the mode" }, { "metadata": { "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "x <- c(13, 18, 13, 14, 13, 16, 14, 21, 13)", "execution_count": 2, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "Mode <- function(x) {\n  # Tabulate the values and sort the counts in ascending order\n  counts <- sort(table(x))\n  n <- length(counts)\n  # as.numeric (not as.integer) so a non-integer mode such as 275.8 is not truncated\n  mode <- as.numeric(names(counts))[n]\n  print(mode)\n}\n", "execution_count": 3, "outputs": [] }, { "metadata": { "trusted": false }, "cell_type": "code", "source": "table(x)\nsort(table(x))\nnames(sort(table(x)))\nas.integer(names(sort(table(x))))\nn <- length(table(x))\nn\nas.integer(names(sort(table(x))))[n]", "execution_count": 4, "outputs": [ { "data": { "text/plain": "x\n13 14 16 18 21 \n 4 2 1 1 1 " }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": "x\n16 18 21 14 13 \n 1 1 1 2 4 " }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "
    \n\t
  1. '16'
  2. '18'
  3. '21'
  4. '14'
  5. '13'
\n", "text/latex": "\\begin{enumerate*}\n\\item '16'\n\\item '18'\n\\item '21'\n\\item '14'\n\\item '13'\n\\end{enumerate*}\n", "text/markdown": "1. '16'\n2. '18'\n3. '21'\n4. '14'\n5. '13'\n\n\n", "text/plain": "[1] \"16\" \"18\" \"21\" \"14\" \"13\"" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "
    \n\t
  1. 16
  2. 18
  3. 21
  4. 14
  5. 13
\n", "text/latex": "\\begin{enumerate*}\n\\item 16\n\\item 18\n\\item 21\n\\item 14\n\\item 13\n\\end{enumerate*}\n", "text/markdown": "1. 16\n2. 18\n3. 21\n4. 14\n5. 13\n\n\n", "text/plain": "[1] 16 18 21 14 13" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "5", "text/latex": "5", "text/markdown": "5", "text/plain": "[1] 5" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "13", "text/latex": "13", "text/markdown": "13", "text/plain": "[1] 13" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "scrolled": false, "trusted": false }, "cell_type": "code", "source": "Mode(x)\n# Note: disp holds non-integer values; its most frequent value in the table below is 275.8 (3 cars).\n# A Mode() that converts names with as.integer() prints that value truncated, as 275.\nMode(mtcars$disp)\ntable(mtcars$disp)", "execution_count": 5, "outputs": [ { "name": "stdout", "output_type": "stream", "text": "[1] 13\n[1] 275\n" }, { "data": { "text/plain": "\n 71.1 75.7 78.7 79 95.1 108 120.1 120.3 121 140.8 145 146.7 160 \n 1 1 1 1 1 1 1 1 1 1 1 1 2 \n167.6 225 258 275.8 301 304 318 350 351 360 400 440 460 \n 2 1 1 3 1 1 1 1 1 2 1 1 1 \n 472 \n 1 " }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "source: https://rstudio-pubs-static.s3.amazonaws.com/242140_d3bc74d91e8e47febc2e019762a4d877.html" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Measures of Variability " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "Variance \nStandard deviation " }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![samplevarstd](https://i.imgur.com/932ScZV.gif)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**How to Calculate Variance:** https://www.wikihow.com/Calculate-Variance " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "Range \nQuartiles \nIQR " }, { "metadata": { 
"slideshow": { "slide_type": "slide" } }, "cell_type": "raw", "source": "Quartiles: They divide a sorted data set into 4 equal parts using three cut points, Q1 (First Quartile), Q2 (Second Quartile) and Q3 (Third Quartile). For example, assume we have a numerical variable with values from 1 to 100. Here, Q1 would be about 25 (the value at the 25th percentile), Q2 would be about 50 (the value at the 50th percentile, i.e. the median) and Q3 would be about 75 (the value at the 75th percentile). The IQR (interquartile range) is Q3 - Q1.\n\nQuartiles are easiest to understand alongside box plots. A box plot draws the five-number summary: Minimum, Q1, Q2 (median), Q3 and Maximum." }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/DdSAvKa.png?1)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/pytv7fm.png) \nsource: https://www.leansigmacorporation.com/box-plot-with-minitab/ " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Measures of Shape" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Skewness** \nSkewness is a measure of symmetry, or more precisely, the lack of symmetry. A \ndistribution, or data set, is symmetric if it looks the same to the left and right of \nthe center point. \n**Kurtosis** \nKurtosis describes the shape of a random variable’s probability distribution, \nspecifically how heavy its tails are compared with a normal distribution."
}, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![skewness-and-kurtosis](https://i.imgur.com/sEAMT4o.gif) \n\nsource: https://www.analyticsvidhya.com/blog/2014/07/statistics/skewness-and-kurtosis/ \n" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**Skewness** \n\n* Measures the asymmetry of a distribution\n* 0 - The distribution is (approximately) symmetrical\n* Less than 0 - Left-skewed: the mean is typically less than the mode\n* Greater than 0 - Right-skewed: the mean is typically greater than the mode\n* Skewness can also be judged visually with a boxplot" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "source: http://personal.cityu.edu.hk/~meachan/Online%20Anthropometry/Chapter5/Ch5-3.htm" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/gWfEjIO.png)" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Formulas" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/uqMeALb.png)
\nsource: http://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**Packages that provide skewness and kurtosis functions** \n\n* moments (`skewness()`, `kurtosis()`)\n* DescTools (named `Skew()` and `Kurt()` there)\n* e1071 (`skewness()`, `kurtosis()`)\n* fBasics (`skewness()`, `kurtosis()`)" }, { "metadata": { "slideshow": { "slide_type": "skip" }, "trusted": false }, "cell_type": "code", "source": "# install.packages(\"moments\")\n# install.packages(\"DescTools\")\n# install.packages(\"e1071\")\n# install.packages(\"fBasics\")", "execution_count": 14, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Installing package into ‘/home/nbuser/R’\n(as ‘lib’ is unspecified)\nWarning message in install.packages(\"DescTools\"):\n“installation of package ‘DescTools’ had non-zero exit status”Installing package into ‘/home/nbuser/R’\n(as ‘lib’ is unspecified)\nInstalling package into ‘/home/nbuser/R’\n(as ‘lib’ is unspecified)\n" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "slide" }, "trusted": false }, "cell_type": "code", "source": "# Sample of 15 timing measurements\ntime <- c(19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05, 21.65, 20.92, 22.61, \n 15.71, 22.04, 22.6, 24.25)\n\nlibrary(moments)\n# library(e1071)  # alternative; e1071::kurtosis() reports excess kurtosis (normal = 0)\nskewness(time)\n\n# [1] -0.01565162  (close to 0: nearly symmetric)\n\n# moments::kurtosis() returns Pearson's kurtosis, for which a normal distribution gives 3\nkurtosis(time)\n\n# [1] 2.301051", "execution_count": 15, "outputs": [ { "data": { "text/html": "-0.0156516191272306", "text/latex": "-0.0156516191272306", "text/markdown": "-0.0156516191272306", "text/plain": "[1] -0.01565162" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "2.30105132553941", "text/latex": "2.30105132553941", "text/markdown": "2.30105132553941", "text/plain": "[1] 2.301051" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Working with the Titanic dataset" }, { "metadata": { "run_control": { "frozen": false, 
"read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# Load the Titanic training data, keeping text columns as character vectors\ntrain <- read.csv(\"data/train1.csv\", stringsAsFactors = FALSE, header = TRUE)\n# Show the first six rows\nhead(train)", "execution_count": 1, "outputs": [ { "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 Q
\n", "text/latex": "\\begin{tabular}{r|llllllllllll}\n PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n\\hline\n\t 1 & 0 & 3 & Braund, Mr. Owen Harris & male & 22 & 1 & 0 & A/5 21171 & 7.2500 & & S \\\\\n\t 2 & 1 & 1 & Cumings, Mrs. John Bradley (Florence Briggs Thayer) & female & 38 & 1 & 0 & PC 17599 & 71.2833 & C85 & C \\\\\n\t 3 & 1 & 3 & Heikkinen, Miss. Laina & female & 26 & 0 & 0 & STON/O2. 3101282 & 7.9250 & & S \\\\\n\t 4 & 1 & 1 & Futrelle, Mrs. Jacques Heath (Lily May Peel) & female & 35 & 1 & 0 & 113803 & 53.1000 & C123 & S \\\\\n\t 5 & 0 & 3 & Allen, Mr. William Henry & male & 35 & 0 & 0 & 373450 & 8.0500 & & S \\\\\n\t 6 & 0 & 3 & Moran, Mr. James & male & NA & 0 & 0 & 330877 & 8.4583 & & Q \\\\\n\\end{tabular}\n", "text/markdown": "\nPassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n|---|---|---|---|---|---|\n| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.2500 | | S | \n| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | \n| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S | \n| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1000 | C123 | S | \n| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.0500 | | S | \n| 6 | 0 | 3 | Moran, Mr. James | male | NA | 0 | 0 | 330877 | 8.4583 | | Q | \n\n\n", "text/plain": " PassengerId Survived Pclass\n1 1 0 3 \n2 2 1 1 \n3 3 1 3 \n4 4 1 1 \n5 5 0 3 \n6 6 0 3 \n Name Sex Age SibSp Parch\n1 Braund, Mr. Owen Harris male 22 1 0 \n2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 \n3 Heikkinen, Miss. Laina female 26 0 0 \n4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 \n5 Allen, Mr. William Henry male 35 0 0 \n6 Moran, Mr. 
James male NA 0 0 \n Ticket Fare Cabin Embarked\n1 A/5 21171 7.2500 S \n2 PC 17599 71.2833 C85 C \n3 STON/O2. 3101282 7.9250 S \n4 113803 53.1000 C123 S \n5 373450 8.0500 S \n6 330877 8.4583 Q " }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# Find the average (mean) Fare\nmean(train$Fare)\n# On average, passengers paid about $32 to board the Titanic.\n# Mode() below comes from the DescTools package, which must be installed\n# (the recorded run stopped at library(DescTools) because it was missing).\nlibrary(DescTools)\nMode(train$Age)\ntable(train$Age)\n# The most common age among passengers on the Titanic was 24 years. As the table\n# shows, there were 30 passengers on board aged 24 (the highest count of any age).\n# Find the median\nmedian(train$Fare)\n# The median Fare is $14.45: half of the passengers paid this much or less,\n# and half paid this much or more.", "execution_count": 17, "outputs": [ { "data": { "text/html": "32.2042079685746", "text/latex": "32.2042079685746", "text/markdown": "32.2042079685746", "text/plain": "[1] 32.20421" }, "metadata": {}, "output_type": "display_data" }, { "ename": "ERROR", "evalue": "Error in library(DescTools): there is no package called ‘DescTools’\n", "output_type": "error", "traceback": [ "Error in library(DescTools): there is no package called ‘DescTools’\nTraceback:\n", "1. library(DescTools)", "2. stop(txt, domain = NA)" ] } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# Variance of Fare\nvar(train$Fare)\n# var() is the sample variance: the sum of squared differences from the mean, divided by n - 1.\n# 
Standard deviation of Fare (the square root of the variance)\nsqrt(var(train$Fare))\nsd(train$Fare)\n# Calculate the range\nrange(train$Fare)\n# range() returns the lowest and the highest value in a set of observations.\n", "execution_count": 18, "outputs": [ { "data": { "text/html": "2469.43684574312", "text/latex": "2469.43684574312", "text/markdown": "2469.43684574312", "text/plain": "[1] 2469.437" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "49.6934285971809", "text/latex": "49.6934285971809", "text/markdown": "49.6934285971809", "text/plain": "[1] 49.69343" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "49.6934285971809", "text/latex": "49.6934285971809", "text/markdown": "49.6934285971809", "text/plain": "[1] 49.69343" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": "
    \n\t
  1. 0
  2. 512.3292
\n", "text/latex": "\\begin{enumerate*}\n\\item 0\n\\item 512.3292\n\\end{enumerate*}\n", "text/markdown": "1. 0\n2. 512.3292\n\n\n", "text/plain": "[1] 0.0000 512.3292" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# find boxplot\nboxplot(train$Age ~ train$Pclass, xlab = \"Class\", ylab = \"Age\", col = c(\"red\"))", "execution_count": 2, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAAM1BMVEUAAABNTU1oaGh8fHyM\njIyampqnp6eysrK9vb3Hx8fQ0NDZ2dnh4eHp6enw8PD/AAD///89ODILAAAACXBIWXMAABJ0\nAAASdAHeZh94AAAdg0lEQVR4nO3d7XraSBKG4RZgsDGwnP/RrhGO4yTTFkLVXW+VnvvHLDNX\ntF0t6YnNR5xyBbBY8R4AyICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAA\nIQEGCAkwQEiAAUICDBASYKBDSAUI5om73D4chyUAS4QEGCAkwAAhAQYICTBASIABQgIMEBJg\ngJAAA4QEGCAkwAAhAQa6hvR+2I2f79vt31stAbjoGNJl8+2zstsmSwBOOoa0L8PbaXx0Pg5l\n32IJwEnHkIZy+np8KkOLJQAnHUP6488+/fwHoQgJwfAVCTDQ9znS8Tw+4jkSsun58vf226t2\nm0uTJQAffd9H2o/vIw27A+8jIRc+2QAY0Alp4c82emIRzx/FhGR6hnR+KcPhen3dlOHHlxo0\nviIpzIAwen5EaLj95v16CPIRIYUZEEbXl78/vg7th/JyuV72+i9/K8yAMLq+ITseXcYXvvXf\nkFWYAWF0/4jQ5zNzPiKEVBy+It3+edH/igTM4PAcaX/5fGy/BOCEV+1qFGZAGLyPVKMwA8LQ\n+WRD5yUmKcyAMAipRmEGhEFINQozIAxCAgwQEmCAkAADhFSjMAPCIKQahRkQBiHVKMyAMAip\nRmEGhEFINQozIAxCAgwQEmCAkAADhFSjMAPCIKQahRkQBiHVKMyAMAipRmEGhEFINQozIAxC\nAgwQEmCAkAADhFSjMAPCIKQahRkQBiHVKMyAMAipRmEGhEFINQozIAxCAgwQEmCAkBBd+fmv\nf+w0RJdDBJeYpDADpo0V+adESDUKM2Ba+fZPR4RUozADJpW//tcLIdUozIBJhOS8xCSFGTCJ\nkJyXQBI8R/JdAknwqp3vEkiD95E8l5ikMAPC
IKQahRkQBiHVKMyAMAipRmEGhEFINQozIAxC\nAgwQEmCAkAADhFSjMAPCIKQahRkQBiHVKMyAMAipRmEGhEFINQozIAxCAgwQEmCAkAADhFSj\nMAMewR/s81xiksIMmMYfNfddYpLCDJjGDz/xXWKSwgyYxI/jcl5iksIM1iSeTdgiJOclVkjk\n2YQtQnJeYoVEnk0YE9kVIa2Gyu/dxkS+zhJSjcIMppKGJPLMj5BqFGYwlTYkCYRUozCDLZFn\nEzkRUo3CDLZEnk3kREg1CjNYk3g2kRMhAQYICTBASIABQqpRmAFhEFKNwgwIg5BqFGZAGIRU\nozADwiCkGoUZEAYhAQYICTBASIABQqpRmAFhEFKNwgwIg5BqFGZAGIRUozADwiCkGoUZEAYh\nAQYICTBASIABQqpRmAFhEFKNwgzW+OEnzRBSjcIMtvhxXA0RUo3CDLb4AZENEVKNwgym+JHF\nLRHSahBSS4S0GoTUEiGtB8+RGiKkGoUZbGV91U7iRX1CqlGYwZrELWdM5LcHQqpRmAHTRL5h\nJaQahRkwSeUlFEKqUZgBkwjJeQnkQEjOSyAJniP5LoEkeNXOd4lJCjPgERIv6hNSjcIMCIOQ\nahRmQBiEVKMwA8IgpBqFGRAGIQEGCAnR8aqd5xJIgveRfJeYpDADpvHJBt8lJinMgEl81s55\niUkKM2ASITkvMUlhBkwiJOclJinMgGk8R/JdAknwqp3vEqsk8Y6LOYldEdJ6iPzenRMh1SjM\nYEvk2UROhFSjMIMplde3ciKkGoUZTBFSS4RUozCDKUJqiZBqFGawxXOkhghpPbK+are+l7/f\nD7tys9u/t1oCP5G45YyJ/PbQMaTLpvy2bbIE1kfkG9aOIe3L8HYaH52PQ9m3WMKSwgyYpPIS\nSseQhnL6enwqQ4slLCnMgEkrDOmP72N//qbW+6zcKMyASSsMia9IaGCVz5GO5/ERz5Gc8Kpd\nuzG6HHK3/faq3ebSZAn8QOSWMyfx20Pf95H24/tIw+7A+0gORL4JyolPNqyGytPynHRCKt+1\nWWIWhRlMEVJLPUO67G8v1R02pWzfGi1hSGEGU4TUUseQzsPHV5rLEOUjQgoz2OI5UkMdQ3op\nu8vHP17OH0298PJ3fzrfNtuS2FPXTzZcPv/x8V0eb8j2lzMkkRf1e39EaCjf/sV8Cfwk57d2\nIrvq+q3d6Xo93D8ndPn5SZL7acko54sNKrvqGNKpDPvTdTd8lHTclGOLJfADlVvOlsquer78\nfRx+v1F0aLOEIYUZTKnccrZUdtX3Ddm3l/FPye4O52ZLmFGYwZbIswljIrvS+WRD5yUmKcxg\nS+T1LWMiuyKkGoUZrOV78ftGYleEVKMwA8IgJMAAIQEGCAkwQEg1CjMgDEKqUZgBYRBSjcIM\n1iReKM6JkGoUZrAl8tZlToRUozCDLZEP0+RESKuh8vHOnAhpNQipJUJaDUJqiZBqFGawxXOk\nhgipRmEGW7xq1xAh1SjMYI33kZohpBqFGRAGIdUozIBHSHydJSTEJvLMj5AQm8hrkYSE0FTe\nHSOkGoUZMImQnJeYpDADJhGS8xKTFGbANJ4j+S4xSWEGTONVO98lJinM8KDShPeuHiYxKyGt\nCie2FUJaFU5sK4S0KpzYVgipRmEGcyk3JYGQahRmQBiEVKMwA8IgpBqFGRAGIdUozGAu5aYk\nENKqcGJbIaRV4cS2QkirwolthZBqFGYwl3JTEgipRmEGhEFINQozIAxCqlGYAWEQUo3CDOZS\nbkoCIa0KJ7YVQloVTmwrhLQqnNhWCKlGYQZzKTclgZBqFGZAGIRUozADwiCkGoUZEAYh1SjM\nYC7lpiQQ0qpwYlshpFXhxLZCSKvCiW2FkGoUZjCXclMSCKlGYYY1C/Y3AxBSjcIMeIjCpSKk\nGoUZ8BCFS0VINQozmEu5KYldEdKq5DyxCrsipF
XhxLZCSKvCiW2FkGoUZjCXclMSCKlGYQY8\nROFSEVKNwgx4iMKlIqQahRnwEIVLRUg1CjOYS7kpiV0R0qrkPLEKuyKkVeHEtpIipDYfFG6i\nzfl6/Ew5r59XjpD+F4X3jey9fl6E1BU3cgsKZ5WQulK45PkonFVC6krhkuejcFYJqSvvS+69\nfhsKuyKkrrwvuff6bSjsipC68r7k3uvnRUhded/I3uvnRUhded/I3uvnRUhdcSO3oHBWCakr\nhUuej8JZJaSuFC55PgpnlZC68r7k3uu3obArQurK+5J7r9+Gwq4IqSvvS+69fl6E1JX3jey9\nfl6E1JX3jey9fl6E1BU3cgsKZ5WQulK45PkonFVC6krhkuejcFYJqSvvS+69fhsKuyKkrrwv\nuff6bSjsipC68r7k3uvnRUhded/I3uvnRUhded/I3uvnRUhdcSO3oHBWCakrhUuej8JZJaSu\nFC55PgpnlZC68r7k3uu3obArQurK+5J7r9+Gwq4IqSvvS+69fl6E1JX3jey9fl6E1JX3jey9\nfl6E1BU3cgsKZ5WQulK45PkonFVC6krhkuejcFYJqSvvS+69fhsKuyKkrrwvuff6bSjsipC6\n8r7k3uvnRUhded/I3uvnRUhded/I3uvnRUhdcSO3oHBWCakrhUuej8JZJaSuFC55PgpnlZC6\n8r7k3uu3obArQurK+5J7r9+Gwq4Wh3TclY//sDsbzfNfS0z/cu8+HuZ9yb3Xz2tpSNtSbiGV\nwbQkQmrEe/28Fob0WraXW0iv5cVspCshNeO9fl4LQxrK5XoL6f4PM4SEGRTO6sKQxm/rCOlh\nCpc8H4WzujCkzedXpFPZmI10JSTMonBWbZ4jHYfyajbSlZCa8V6/DYVdLX3VblfutlYD/bvE\nA7/cu4+HeV9y7/XbUNiVyftIZff20JHvh3t3u/276VSE9PCZcl4/r46fbLhsym8/fwUjpEa8\n18+rY0j7Mrydxkfnj+dUe8MlCOnhM+W8fl6LX/7+/TXmxzSut/ecTl+PT2UwnIqQ1k3hrNqF\nVH5u46+3mn5+34mQMIPCWV36rd3LcPz458e3au/X3c/frvEV6X+E1IbCWV0Y0v4zjlPZXi8T\nb8p+PEc63j/aynMkL97rt6GwK4OPCH09mPqY0Pbbt4Gbi+FUhPTwmXJevw2FXS3+0Oqvr0jD\nA5+3e9+P7yMNuwPvI/nwXj+vxd/a/XqOtL++2X28gZAa8V4/L5M/2Hd/g7Us+7zdHy8AzjzU\nu4+Hed/I3uvnZfQRoduXpXJ4+P9halVCwgwKZ9Xlh58QEiwpnFWjkE77iXdjr39978YbsjCj\ncFYtQjofNpMfa/jwPhCS9yX3Xr8NhV0tDunydvtQ9/b4wIGXXdmO78jyrZ0X7/XbUNjVwpDe\n7q/aPfqzuN5Kuf3JJULy4r1+XktCOr7c3l3dn2a8Wn3elt2FkNx4r5/XgpCGW0W3jyjMetvn\nUIYjIXnxXj+vBSGVXx88nff+6Wkz/YYrIWEGhbPa/SvShxdCgiWFs2rwHOnd9odD/rHEg7/c\nu4+HKVzyfBTOat9X7Z5Z4pFf7t3Hw7wvuff6bSjsyuh9pN0j7yM9u8T0L/fu42Hel9x7/TYU\ndtXxkw1Llpj45d59PMz7knuvn1fHz9otXOKnX+7dx8O8b2Tv9fNy+fS39RKE9PCZcl4/L0Lq\nihu5BYWzSkhdKVzyfBTOKiF1pXDJ81E4q4TUlfcl916/DYVd5Qgpjjbn6/Ez5bx+Gwq7IqS+\n2pyvx8+U8/p5EVJfbc7X42fKef28coTk/dTnYd43svf6eRFSV9zILSicVULqSuGS56NwVgmp\nK4VLno/CWSWkrrwvuff6bSjsipC6mrEx79cX55h3ucx5r39DSF3NCcl71scp3MjeCKkrQsqK\nkLoipKwIqStCasF7/RtC6oqQWvBe/4aQuiKkFrzX
vyGkrgipBe/1bwipK0JqwXv9G0LqipCy\nIqSuCCkrQuqKkLIipK4IqQXv9W8IqStCasF7/RtC6oqQWvBe/4aQuiKkFrzXvyGkrgipBe/1\nbwipK0LKipC6IqSsCKkrQsqKkLoipBa8178hpK4IqQXv9W8IqStCasF7/RtC6oqQWvBe/4aQ\nuiKkFrzXvyGkrggpK0LqipCyIqSuCCkrQuqKkFrwXv+GkLpae0jeP+1/jpn34Lxf/uQhjZeI\nc8+tPiTvUR9HSMoIKQxCUkZIYRCSMkIKg5CUEVIYhKSMkMIgJGWEFAYhKSOkMAhJGSGFQUjK\nCCkMQlJGSGEQkjJCCoOQlBFSGISkjJDCICRlhBQGISkjpDAISRkhhUFIyggpDEJSRkhhEJKy\nOSEFkvBKEZI0QgqDkJQRUhiEpIznSGEQkjJCCoOQlBFSGISkjJDCICRlhBQGISkjpDAISRkh\nhUFIyggpDEJSRkhhEJIyQgqDkJQRUhiEpIyQwiAkZYQUxipDimPGprzvpMcRUpKQmvCeIeU9\nl3JT953NvLzPHSK4xCTvGVLecyk3dd/ZzMv73CGCS0zyniHlPZdyU/edzby8zx0iuMQk7xlS\n3nMpN3Xf2czL+9whgkvIS3nPpdzUfWczL+9zhwguIS/lPZdyU/edzby8zx0iuIS8lPdcyk3d\ndzbz8j53iOASk7xnSHnPpdzUfWczL+9zhwguMcl7hpT3XMpN3Xc28/I+d4jgEpO8Z0h5z6Xc\n1H1nMy/vc4cILjHJe4aU91zKTd13NvPyPneI4BKTvGdIec+l3NR9ZzMv73OHCC4hL+U9l3JT\n953NvLzPHSK4hLyU91zKTd13NvPyPneI4BLyUt5zKTd139nMy/vcIYJLTPKeIeU9l3JT953N\nvLzPHSK4xCTvGVLecyk3dd/ZzMv73CGCS0zyniHlPZdyU/edzby8zx0iuMQk7xlS3nMpN3Xf\n2czL+9whgktM8p4h5T3n/SMz5ph5uWZe3ucOEVxCHiE5m3m5Zl7e5w4RXEIeITmbeblmXt7n\nDhFcQl7OkLxHfRwhGfGeIeU9l3JT953NvLzPHSK4xCTvGVLecyk3dd/ZzMv73CGCS0zyniHl\nPZdyU/edzby8zx0iuMQk7xlS3nMpN3Xf2czL+9whgktM8p7B+0WrOR7flHcejyMk1HmfWEJa\ndojgEvBASMsOEVwCHghp2SGCS0xSmCEdQlp2iOASkxRmMOe9KUJadojgEpMUZjDnvSlCWnaI\n4BKTFGYw570pQlp2iOASkxRmMOe9KUJadojgEuvkfWIJadkhgkvAAyEtO0RwCXggpGWHfHo/\n7MaPZu32762WsKMwQzqEtOyQ0WXz7WOO2yZLWFKYwZz3pghp2SGjfRneTuOj83Eo+xZLWFKY\nwZz3pghp2SGjoZy+Hp/K0GIJSwozmPPeFCEtO+R+XKn9i9kSlhRmMOe9KUJadsgo2FeklLxP\nLCEtO2T08RzpeB4fhXiOhAYIadkhd9tvr9ptLk2WgDZCWnbIp/f9+D7SsDvwPtI6EdKyQwSX\nmKQwgznvTRHSskMe+r996sfStKMwgznvTRHSskPuLi+lbI+f/ye8/O3Be1OEtOyQ0WW4f9Du\n/n9CSB68N0VIyw4Z7cvrR02vw/gxO/2QUvI+sYS07JDRcD/wPGzOhLRShLTskPtxnwdetltC\nWilCWnbIaFN+vQm72QYISWGGdAhp2SGj1/Ly+ehctoTkwntThLTskLv9Vz3HibeKvK/3jcIM\n5rw3RUjLDvl02v16dH4hJA/emyKkZYcILjFJYQZz3psipGWHCC6xTt4nlpCWHSK4BDwQ0rJD\nBJeAB0JadojgEpMUZkiHkJYdIrjEJIUZzHlvquXfCG1t5s6eOBnzDxFcYpLCDOZSbkoCIdUo\nzGAu5aYkdkVINQozmEu5KYldEdKq5DyxCrsiJISncL
sQEsJTuF0IqUZhBoRBSDUKM5hLuSkJ\nhFSjMIO5lJuSQEg1CjOYS7kpiV0RUo3CDOZSbkpiV4S0KjlPrMKuCAnhKdwuhITwFG4XQqpR\nmAFhEFKNwgzmUm5KAiHVKMxgLuWmJBBSjcIM5lJuSmJXhFSjMIO5lJuS2BUhrUrOE6uwK0JC\neAq3CyEhPIXbhZBqFGZAGIRUozCDuZSbkkBINQozmEu5KQmEVKMwg7mUm5LYFSHVKMxgLuWm\nJHZFSKuS88Qq7IqQEJ7C7UJICE/hdiGkGoUZEAYh1SjMYC7lpiQQUo3CDOZSbkoCIdUozGAu\n5aYkdkVINQozmEu5KYldEdKq5DyxCrsiJISncLsQEsJTuF0IqUZhBoRBSDUKM5hLuSkJhFSj\nMIO5lJuSQEg1CjOYS7kpiV0RUo3CDOZSbkpiV4S0KjlPrMKuCAnhKdwuhITwFG4XQqpRmAFh\nEFKNwgzmUm5KAiHVKMxgLuWmJBBSjcIM5lJuSmJXhFSjMIO5lJuS2BUhrUrOE6uwK0JCeAq3\nCyEhPIXbhZBqFGZAGIRUozCDuZSbkkBINQozmEu5KQmEVKMwg7mUm5LYFSHVKMxgLuWmJHZF\nSKuS88Qq7IqQEJ7C7UJICE/hdiGkGoUZEAYh1SjMYC7lpiQQUo3CDOZSbkoCIdUozGAu5aYk\ndkVINQozmEu5KYldEdKq5DyxCrsiJISncLsQEsJTuF0IqUZhBoRBSDUKM5hLuSkJhFSjMIO5\nlJuSQEg1CjOYS7kpiV0RUo3CDOZSbkpiV4S0KjlPrMKuCAnhKdwuhITwFG4XQqpRmAFhEFKN\nwgzmUm5KAiHVKMxgLuWmJBBSjcIM5lJuSmJXhFSjMIO5lJuS2BUhrUrOE6uwK0JCeAq3CyEh\nPIXbhZBqFGZAGIRUozCDuZSbkkBINQozmEu5KQmEVKMwg7mUm5LYFSHVKMxgLuWmJHZFSKuS\n88Qq7IqQEJ7C7UJICE/hdiGkGoUZEAYh1SjMYC7lpiQQUo3CDOZSbkoCIdUozGAu5aYkdkVI\nNQozmEu5KYldEVICpQnvXT1OYVRCQngKtwshITyF24WQahRmQBiEVKMwg7VQz3xiIaQahRls\njRWRUhuEVKMwg63y7Z+pKGyJkGoUZjBV/vrfPBR2REirQUgtEdJqEFJLhLQePEdqiJBqFGaw\nFe6jP5EQUo3CDLYIqSFCqlGYwVbab+0UEFKNwgymeLGhJUKqUZjBFCG1REirQUgtEdJ6pH2O\npLAlQlqPtK/aKWyJkGoUZrCVNiQFhFSjMIMtQmqIkGoUZjBVPv88UrqNSSCkGoUZTH0WlDAk\nhR2tLKQ2P7hK4UJO4+XvllYW0prxFamlriG9H3bjb+C7/XurJVCV9zmSwo46hnTZfPtmaNtk\nCfwk0vehrb4Lbzdul0NG+zK8ncZH5+NQ9i2WwE9ihRRMx5CGcvp6fCpDiyXwMzJqpmNIf1zE\nf69ovJfBgC98RQIM9H2OdDyPj3iOhGx6vvy9/fa92+bSZAnAR9/3kfbj+0jD7sD7SMiFTzYA\nBggJMEBIgAFCAgwQEmCAkAADhAQYICTAACEBBggJMEBIgAFCAgwQEmCAkAADhAQYEA0JCOaJ\nu9w+HEUpt5lyU1F3FXTsuVJuM+Wmou4q6Nhzpdxmyk1F3VXQsedKuc2Um4q6q6Bjz5Vymyk3\nFXVXQceeK+U2U24q6q6Cjj1Xym2m3FTUXQUde66U20y5qai7Cjr2XCm3mXJTUXcVdOy5Um4z\n5aai7iro2HOl3GbKTUXdVdCx50q5zZSbirqroGMDWggJMEBIgAFCAgwQEmCAkAADhAQYICTA\nACEBBggJMEBIgAFCAgwQEmCAkAADhAQYICTAwDpCes23zddNGfYX7ymMXV5KeTl5T/GUfHfY\nfzg989cLaNuPf2nCkKykYdxVyJLS3WH/4TSkC+lUXi63
L7Qv3oOY2t/2sy877zmeke0O+w+v\nZZsupN19Q8n2NZTbV9iYmwo59DxlH/TaTEu5rzJ4T/CMjFfiL6ekN9zHc/Oy9R7B3r68eo/w\njJx32N+ShvRajt4jWHsrH99ARJTzDvtbzpDOQ8in5T963Q3l4D3EM1LeYf9IGdJlSPiN3YeX\nkN/bZbzD/pUypO3Ge4I2LiFfbch4h/0rYUjnzfbsPUMjIa9WxJnnC3lpfnTM+ILd/X2kc4n4\npTbdHfaf0oV0ztjR/ZMNlx3PkWSlC+mlfPIexNb9s3Yhf49IdiUqst1w15IzpOt+KJuIX4/W\nEhLQGCEBBggJMEBIgAFCAgwQEmCAkAADhAQYICTAACEBBggJMEBIgAFCAgwQEmCAkAADhAQY\nICTAACEBBggJMEBIgAFCAgwQEmCAkAADhAQYICTAACEBBggJMEBIgAFCAgwQEmCAkAADhAQY\nICTAACGFcHoZystxfJjub+nLgasSwf7+91xuzldCEsVVCeBQho+vRpeP/zkTkiiuir7zGND1\n9neZvxCSKK6Kvn053B9cdq+fIR13pQz78T8et6Vsj38+QneEpG9bTt/+7RbS4f6c6VbS6/3h\n6/dH6I+Q9P35zdzt30p5u17fxv8+3Cp7K5vvj9AfIen7N6Rvj0o5fv0r39b5ISR9/xXS+XjY\njo/2pexO47d+vx+hP0LSt/t6jnS8fIa0vT8fuv23w/DxYHxZ7/cjdEdI+g6/XrV7vz0BuuXz\nUjavx/Ovr1TH/ebzmdHvR+iMkPR9vY+0La+/XmwY//Pvi/fXEyd0x2kP4GX8ZMN5V4brr5De\nr6f7c6TN/QW8zfdH6I+QItj+9Vm7z8/efeR0exH8n0foj5BCeNuVsn0bH45fh14+/vX9WHbX\nz88zjPX8foTuCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOE\nBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOEBBggJMAAIQEGCAkwQEiAAUICDBASYICQAAOE\nBBggJMDA/wGUzE0GL/axxAAAAABJRU5ErkJggg==", "text/plain": "plot without title" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "#base package\nfivenum(train$Age)\nsummary(train)", "execution_count": 20, "outputs": [ { "data": { "text/html": "
    \n\t
<ol>
<li>0.42</li>
<li>20</li>
<li>28</li>
<li>38</li>
<li>80</li>
</ol>
\n", "text/latex": "\\begin{enumerate*}\n\\item 0.42\n\\item 20\n\\item 28\n\\item 38\n\\item 80\n\\end{enumerate*}\n", "text/markdown": "1. 0.42\n2. 20\n3. 28\n4. 38\n5. 80\n\n\n", "text/plain": "[1] 0.42 20.00 28.00 38.00 80.00" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": " PassengerId Survived Pclass Name \n Min. : 1.0 Min. :0.0000 Min. :1.000 Length:891 \n 1st Qu.:223.5 1st Qu.:0.0000 1st Qu.:2.000 Class :character \n Median :446.0 Median :0.0000 Median :3.000 Mode :character \n Mean :446.0 Mean :0.3838 Mean :2.309 \n 3rd Qu.:668.5 3rd Qu.:1.0000 3rd Qu.:3.000 \n Max. :891.0 Max. :1.0000 Max. :3.000 \n \n Sex Age SibSp Parch \n Length:891 Min. : 0.42 Min. :0.000 Min. :0.0000 \n Class :character 1st Qu.:20.12 1st Qu.:0.000 1st Qu.:0.0000 \n Mode :character Median :28.00 Median :0.000 Median :0.0000 \n Mean :29.70 Mean :0.523 Mean :0.3816 \n 3rd Qu.:38.00 3rd Qu.:1.000 3rd Qu.:0.0000 \n Max. :80.00 Max. :8.000 Max. :6.0000 \n NA's :177 \n Ticket Fare Cabin Embarked \n Length:891 Min. : 0.00 Length:891 Length:891 \n Class :character 1st Qu.: 7.91 Class :character Class :character \n Mode :character Median : 14.45 Mode :character Mode :character \n Mean : 32.20 \n 3rd Qu.: 31.00 \n Max. 
:512.33 \n " }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## List of Packages to Describe Data" }, { "metadata": { "slideshow": { "slide_type": "slide" }, "trusted": false }, "cell_type": "code", "source": "# install.packages(\"psych\")\n# install.packages(\"Hmisc\")\n# install.packages(\"pastecs\")\ninstall.packages(\"summarytools\")\n# install.packages(\"skimr\")\n# install.packages(\"stargazer\")", "execution_count": null, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "psych \nHmisc \npastecs \nsummarytools \nskimr \nstargazer " }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(psych)\ndescribe(train)\n# describeBy() expects a grouping variable, e.g. describeBy(train, group = train$Sex);\n# without one it warns and returns the same table as describe()\ndescribeBy(train)", "execution_count": 22, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Warning message in describe(train):\n“NAs introduced by coercion”Warning message in describe(train):\n“NAs introduced by coercion”Warning message in describe(train):\n“NAs introduced by coercion”Warning message in describe(train):\n“NAs introduced by coercion”Warning message in describe(train):\n“NAs introduced by coercion”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in 
FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”" }, { "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
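<table>
<tr><th></th><th>vars</th><th>n</th><th>mean</th><th>sd</th><th>median</th><th>trimmed</th><th>mad</th><th>min</th><th>max</th><th>range</th><th>skew</th><th>kurtosis</th><th>se</th></tr>
<tr><th>PassengerId</th><td>1</td><td>891</td><td>4.460000e+02</td><td>2.573538e+02</td><td>446.0000</td><td>4.460000e+02</td><td>330.61980</td><td>1.00</td><td>891.0000</td><td>890.0000</td><td>0.0000000</td><td>-1.2040412</td><td>8.621678e+00</td></tr>
<tr><th>Survived</th><td>2</td><td>891</td><td>3.838384e-01</td><td>4.865925e-01</td><td>0.0000</td><td>3.548387e-01</td><td>0.00000</td><td>0.00</td><td>1.0000</td><td>1.0000</td><td>0.4769135</td><td>-1.7745414</td><td>1.630146e-02</td></tr>
<tr><th>Pclass</th><td>3</td><td>891</td><td>2.308642e+00</td><td>8.360712e-01</td><td>3.0000</td><td>2.385694e+00</td><td>0.00000</td><td>1.00</td><td>3.0000</td><td>2.0000</td><td>-0.6284264</td><td>-1.2834293</td><td>2.800944e-02</td></tr>
<tr><th>Name*</th><td>4</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Sex*</th><td>5</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Age</th><td>6</td><td>714</td><td>2.969912e+01</td><td>1.452650e+01</td><td>28.0000</td><td>2.926923e+01</td><td>13.34340</td><td>0.42</td><td>80.0000</td><td>79.5800</td><td>0.3874744</td><td>0.1597671</td><td>5.436405e-01</td></tr>
<tr><th>SibSp</th><td>7</td><td>891</td><td>5.230079e-01</td><td>1.102743e+00</td><td>0.0000</td><td>2.720898e-01</td><td>0.00000</td><td>0.00</td><td>8.0000</td><td>8.0000</td><td>3.6829188</td><td>17.7269083</td><td>3.694329e-02</td></tr>
<tr><th>Parch</th><td>8</td><td>891</td><td>3.815937e-01</td><td>8.060572e-01</td><td>0.0000</td><td>1.823282e-01</td><td>0.00000</td><td>0.00</td><td>6.0000</td><td>6.0000</td><td>2.7398677</td><td>9.6880847</td><td>2.700393e-02</td></tr>
<tr><th>Ticket*</th><td>9</td><td>891</td><td>2.603185e+05</td><td>4.716093e+05</td><td>236171.0000</td><td>1.956829e+05</td><td>185104.09260</td><td>693.00</td><td>3101298.0000</td><td>3100605.0000</td><td>5.2418837</td><td>28.9026995</td><td>1.579950e+04</td></tr>
<tr><th>Fare</th><td>10</td><td>891</td><td>3.220421e+01</td><td>4.969343e+01</td><td>14.4542</td><td>2.137872e+01</td><td>10.23617</td><td>0.00</td><td>512.3292</td><td>512.3292</td><td>4.7712097</td><td>33.1230682</td><td>1.664792e+00</td></tr>
<tr><th>Cabin*</th><td>11</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Embarked*</th><td>12</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
</table>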
\n", "text/latex": "\\begin{tabular}{r|lllllllllllll}\n & vars & n & mean & sd & median & trimmed & mad & min & max & range & skew & kurtosis & se\\\\\n\\hline\n\tPassengerId & 1 & 891 & 4.460000e+02 & 2.573538e+02 & 446.0000 & 4.460000e+02 & 330.61980 & 1.00 & 891.0000 & 890.0000 & 0.0000000 & -1.2040412 & 8.621678e+00\\\\\n\tSurvived & 2 & 891 & 3.838384e-01 & 4.865925e-01 & 0.0000 & 3.548387e-01 & 0.00000 & 0.00 & 1.0000 & 1.0000 & 0.4769135 & -1.7745414 & 1.630146e-02\\\\\n\tPclass & 3 & 891 & 2.308642e+00 & 8.360712e-01 & 3.0000 & 2.385694e+00 & 0.00000 & 1.00 & 3.0000 & 2.0000 & -0.6284264 & -1.2834293 & 2.800944e-02\\\\\n\tName* & 4 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tSex* & 5 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tAge & 6 & 714 & 2.969912e+01 & 1.452650e+01 & 28.0000 & 2.926923e+01 & 13.34340 & 0.42 & 80.0000 & 79.5800 & 0.3874744 & 0.1597671 & 5.436405e-01\\\\\n\tSibSp & 7 & 891 & 5.230079e-01 & 1.102743e+00 & 0.0000 & 2.720898e-01 & 0.00000 & 0.00 & 8.0000 & 8.0000 & 3.6829188 & 17.7269083 & 3.694329e-02\\\\\n\tParch & 8 & 891 & 3.815937e-01 & 8.060572e-01 & 0.0000 & 1.823282e-01 & 0.00000 & 0.00 & 6.0000 & 6.0000 & 2.7398677 & 9.6880847 & 2.700393e-02\\\\\n\tTicket* & 9 & 891 & 2.603185e+05 & 4.716093e+05 & 236171.0000 & 1.956829e+05 & 185104.09260 & 693.00 & 3101298.0000 & 3100605.0000 & 5.2418837 & 28.9026995 & 1.579950e+04\\\\\n\tFare & 10 & 891 & 3.220421e+01 & 4.969343e+01 & 14.4542 & 2.137872e+01 & 10.23617 & 0.00 & 512.3292 & 512.3292 & 4.7712097 & 33.1230682 & 1.664792e+00\\\\\n\tCabin* & 11 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tEmbarked* & 12 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\\end{tabular}\n", "text/markdown": "\n| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | \n|---|---|---|---|---|---|---|---|---|---|---|---|\n| PassengerId | 1 | 891 | 4.460000e+02 
| 2.573538e+02 | 446.0000 | 4.460000e+02 | 330.61980 | 1.00 | 891.0000 | 890.0000 | 0.0000000 | -1.2040412 | 8.621678e+00 | \n| Survived | 2 | 891 | 3.838384e-01 | 4.865925e-01 | 0.0000 | 3.548387e-01 | 0.00000 | 0.00 | 1.0000 | 1.0000 | 0.4769135 | -1.7745414 | 1.630146e-02 | \n| Pclass | 3 | 891 | 2.308642e+00 | 8.360712e-01 | 3.0000 | 2.385694e+00 | 0.00000 | 1.00 | 3.0000 | 2.0000 | -0.6284264 | -1.2834293 | 2.800944e-02 | \n| Name* | 4 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Sex* | 5 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Age | 6 | 714 | 2.969912e+01 | 1.452650e+01 | 28.0000 | 2.926923e+01 | 13.34340 | 0.42 | 80.0000 | 79.5800 | 0.3874744 | 0.1597671 | 5.436405e-01 | \n| SibSp | 7 | 891 | 5.230079e-01 | 1.102743e+00 | 0.0000 | 2.720898e-01 | 0.00000 | 0.00 | 8.0000 | 8.0000 | 3.6829188 | 17.7269083 | 3.694329e-02 | \n| Parch | 8 | 891 | 3.815937e-01 | 8.060572e-01 | 0.0000 | 1.823282e-01 | 0.00000 | 0.00 | 6.0000 | 6.0000 | 2.7398677 | 9.6880847 | 2.700393e-02 | \n| Ticket* | 9 | 891 | 2.603185e+05 | 4.716093e+05 | 236171.0000 | 1.956829e+05 | 185104.09260 | 693.00 | 3101298.0000 | 3100605.0000 | 5.2418837 | 28.9026995 | 1.579950e+04 | \n| Fare | 10 | 891 | 3.220421e+01 | 4.969343e+01 | 14.4542 | 2.137872e+01 | 10.23617 | 0.00 | 512.3292 | 512.3292 | 4.7712097 | 33.1230682 | 1.664792e+00 | \n| Cabin* | 11 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Embarked* | 12 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n\n\n", "text/plain": " vars n mean sd median trimmed \nPassengerId 1 891 4.460000e+02 2.573538e+02 446.0000 4.460000e+02\nSurvived 2 891 3.838384e-01 4.865925e-01 0.0000 3.548387e-01\nPclass 3 891 2.308642e+00 8.360712e-01 3.0000 2.385694e+00\nName* 4 891 NaN NA NA NaN\nSex* 5 891 NaN NA NA NaN\nAge 6 714 2.969912e+01 1.452650e+01 28.0000 2.926923e+01\nSibSp 7 891 5.230079e-01 1.102743e+00 0.0000 2.720898e-01\nParch 8 891 
3.815937e-01 8.060572e-01 0.0000 1.823282e-01\nTicket* 9 891 2.603185e+05 4.716093e+05 236171.0000 1.956829e+05\nFare 10 891 3.220421e+01 4.969343e+01 14.4542 2.137872e+01\nCabin* 11 891 NaN NA NA NaN\nEmbarked* 12 891 NaN NA NA NaN\n mad min max range skew kurtosis \nPassengerId 330.61980 1.00 891.0000 890.0000 0.0000000 -1.2040412\nSurvived 0.00000 0.00 1.0000 1.0000 0.4769135 -1.7745414\nPclass 0.00000 1.00 3.0000 2.0000 -0.6284264 -1.2834293\nName* NA Inf -Inf -Inf NA NA\nSex* NA Inf -Inf -Inf NA NA\nAge 13.34340 0.42 80.0000 79.5800 0.3874744 0.1597671\nSibSp 0.00000 0.00 8.0000 8.0000 3.6829188 17.7269083\nParch 0.00000 0.00 6.0000 6.0000 2.7398677 9.6880847\nTicket* 185104.09260 693.00 3101298.0000 3100605.0000 5.2418837 28.9026995\nFare 10.23617 0.00 512.3292 512.3292 4.7712097 33.1230682\nCabin* NA Inf -Inf -Inf NA NA\nEmbarked* NA Inf -Inf -Inf NA NA\n se \nPassengerId 8.621678e+00\nSurvived 1.630146e-02\nPclass 2.800944e-02\nName* NA\nSex* NA\nAge 5.436405e-01\nSibSp 3.694329e-02\nParch 2.700393e-02\nTicket* 1.579950e+04\nFare 1.664792e+00\nCabin* NA\nEmbarked* NA" }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": "Warning message in describe(x, type = type):\n“NAs introduced by coercion”Warning message in describe(x, type = type):\n“NAs introduced by coercion”Warning message in describe(x, type = type):\n“NAs introduced by coercion”Warning message in describe(x, type = type):\n“NAs introduced by coercion”Warning message in describe(x, type = type):\n“NAs introduced by coercion”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to min; returning Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; 
returning -Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in FUN(newX[, i], ...):\n“no non-missing arguments to max; returning -Inf”Warning message in describeBy(train):\n“no grouping variable requested”" }, { "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
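<table>
<tr><th></th><th>vars</th><th>n</th><th>mean</th><th>sd</th><th>median</th><th>trimmed</th><th>mad</th><th>min</th><th>max</th><th>range</th><th>skew</th><th>kurtosis</th><th>se</th></tr>
<tr><th>PassengerId</th><td>1</td><td>891</td><td>4.460000e+02</td><td>2.573538e+02</td><td>446.0000</td><td>4.460000e+02</td><td>330.61980</td><td>1.00</td><td>891.0000</td><td>890.0000</td><td>0.0000000</td><td>-1.2040412</td><td>8.621678e+00</td></tr>
<tr><th>Survived</th><td>2</td><td>891</td><td>3.838384e-01</td><td>4.865925e-01</td><td>0.0000</td><td>3.548387e-01</td><td>0.00000</td><td>0.00</td><td>1.0000</td><td>1.0000</td><td>0.4769135</td><td>-1.7745414</td><td>1.630146e-02</td></tr>
<tr><th>Pclass</th><td>3</td><td>891</td><td>2.308642e+00</td><td>8.360712e-01</td><td>3.0000</td><td>2.385694e+00</td><td>0.00000</td><td>1.00</td><td>3.0000</td><td>2.0000</td><td>-0.6284264</td><td>-1.2834293</td><td>2.800944e-02</td></tr>
<tr><th>Name*</th><td>4</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Sex*</th><td>5</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Age</th><td>6</td><td>714</td><td>2.969912e+01</td><td>1.452650e+01</td><td>28.0000</td><td>2.926923e+01</td><td>13.34340</td><td>0.42</td><td>80.0000</td><td>79.5800</td><td>0.3874744</td><td>0.1597671</td><td>5.436405e-01</td></tr>
<tr><th>SibSp</th><td>7</td><td>891</td><td>5.230079e-01</td><td>1.102743e+00</td><td>0.0000</td><td>2.720898e-01</td><td>0.00000</td><td>0.00</td><td>8.0000</td><td>8.0000</td><td>3.6829188</td><td>17.7269083</td><td>3.694329e-02</td></tr>
<tr><th>Parch</th><td>8</td><td>891</td><td>3.815937e-01</td><td>8.060572e-01</td><td>0.0000</td><td>1.823282e-01</td><td>0.00000</td><td>0.00</td><td>6.0000</td><td>6.0000</td><td>2.7398677</td><td>9.6880847</td><td>2.700393e-02</td></tr>
<tr><th>Ticket*</th><td>9</td><td>891</td><td>2.603185e+05</td><td>4.716093e+05</td><td>236171.0000</td><td>1.956829e+05</td><td>185104.09260</td><td>693.00</td><td>3101298.0000</td><td>3100605.0000</td><td>5.2418837</td><td>28.9026995</td><td>1.579950e+04</td></tr>
<tr><th>Fare</th><td>10</td><td>891</td><td>3.220421e+01</td><td>4.969343e+01</td><td>14.4542</td><td>2.137872e+01</td><td>10.23617</td><td>0.00</td><td>512.3292</td><td>512.3292</td><td>4.7712097</td><td>33.1230682</td><td>1.664792e+00</td></tr>
<tr><th>Cabin*</th><td>11</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
<tr><th>Embarked*</th><td>12</td><td>891</td><td>NaN</td><td>NA</td><td>NA</td><td>NaN</td><td>NA</td><td>Inf</td><td>-Inf</td><td>-Inf</td><td>NA</td><td>NA</td><td>NA</td></tr>
</table>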
\n", "text/latex": "\\begin{tabular}{r|lllllllllllll}\n & vars & n & mean & sd & median & trimmed & mad & min & max & range & skew & kurtosis & se\\\\\n\\hline\n\tPassengerId & 1 & 891 & 4.460000e+02 & 2.573538e+02 & 446.0000 & 4.460000e+02 & 330.61980 & 1.00 & 891.0000 & 890.0000 & 0.0000000 & -1.2040412 & 8.621678e+00\\\\\n\tSurvived & 2 & 891 & 3.838384e-01 & 4.865925e-01 & 0.0000 & 3.548387e-01 & 0.00000 & 0.00 & 1.0000 & 1.0000 & 0.4769135 & -1.7745414 & 1.630146e-02\\\\\n\tPclass & 3 & 891 & 2.308642e+00 & 8.360712e-01 & 3.0000 & 2.385694e+00 & 0.00000 & 1.00 & 3.0000 & 2.0000 & -0.6284264 & -1.2834293 & 2.800944e-02\\\\\n\tName* & 4 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tSex* & 5 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tAge & 6 & 714 & 2.969912e+01 & 1.452650e+01 & 28.0000 & 2.926923e+01 & 13.34340 & 0.42 & 80.0000 & 79.5800 & 0.3874744 & 0.1597671 & 5.436405e-01\\\\\n\tSibSp & 7 & 891 & 5.230079e-01 & 1.102743e+00 & 0.0000 & 2.720898e-01 & 0.00000 & 0.00 & 8.0000 & 8.0000 & 3.6829188 & 17.7269083 & 3.694329e-02\\\\\n\tParch & 8 & 891 & 3.815937e-01 & 8.060572e-01 & 0.0000 & 1.823282e-01 & 0.00000 & 0.00 & 6.0000 & 6.0000 & 2.7398677 & 9.6880847 & 2.700393e-02\\\\\n\tTicket* & 9 & 891 & 2.603185e+05 & 4.716093e+05 & 236171.0000 & 1.956829e+05 & 185104.09260 & 693.00 & 3101298.0000 & 3100605.0000 & 5.2418837 & 28.9026995 & 1.579950e+04\\\\\n\tFare & 10 & 891 & 3.220421e+01 & 4.969343e+01 & 14.4542 & 2.137872e+01 & 10.23617 & 0.00 & 512.3292 & 512.3292 & 4.7712097 & 33.1230682 & 1.664792e+00\\\\\n\tCabin* & 11 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\tEmbarked* & 12 & 891 & NaN & NA & NA & NaN & NA & Inf & -Inf & -Inf & NA & NA & NA\\\\\n\\end{tabular}\n", "text/markdown": "\n| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | \n|---|---|---|---|---|---|---|---|---|---|---|---|\n| PassengerId | 1 | 891 | 4.460000e+02 
| 2.573538e+02 | 446.0000 | 4.460000e+02 | 330.61980 | 1.00 | 891.0000 | 890.0000 | 0.0000000 | -1.2040412 | 8.621678e+00 | \n| Survived | 2 | 891 | 3.838384e-01 | 4.865925e-01 | 0.0000 | 3.548387e-01 | 0.00000 | 0.00 | 1.0000 | 1.0000 | 0.4769135 | -1.7745414 | 1.630146e-02 | \n| Pclass | 3 | 891 | 2.308642e+00 | 8.360712e-01 | 3.0000 | 2.385694e+00 | 0.00000 | 1.00 | 3.0000 | 2.0000 | -0.6284264 | -1.2834293 | 2.800944e-02 | \n| Name* | 4 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Sex* | 5 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Age | 6 | 714 | 2.969912e+01 | 1.452650e+01 | 28.0000 | 2.926923e+01 | 13.34340 | 0.42 | 80.0000 | 79.5800 | 0.3874744 | 0.1597671 | 5.436405e-01 | \n| SibSp | 7 | 891 | 5.230079e-01 | 1.102743e+00 | 0.0000 | 2.720898e-01 | 0.00000 | 0.00 | 8.0000 | 8.0000 | 3.6829188 | 17.7269083 | 3.694329e-02 | \n| Parch | 8 | 891 | 3.815937e-01 | 8.060572e-01 | 0.0000 | 1.823282e-01 | 0.00000 | 0.00 | 6.0000 | 6.0000 | 2.7398677 | 9.6880847 | 2.700393e-02 | \n| Ticket* | 9 | 891 | 2.603185e+05 | 4.716093e+05 | 236171.0000 | 1.956829e+05 | 185104.09260 | 693.00 | 3101298.0000 | 3100605.0000 | 5.2418837 | 28.9026995 | 1.579950e+04 | \n| Fare | 10 | 891 | 3.220421e+01 | 4.969343e+01 | 14.4542 | 2.137872e+01 | 10.23617 | 0.00 | 512.3292 | 512.3292 | 4.7712097 | 33.1230682 | 1.664792e+00 | \n| Cabin* | 11 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n| Embarked* | 12 | 891 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf | NA | NA | NA | \n\n\n", "text/plain": " vars n mean sd median trimmed \nPassengerId 1 891 4.460000e+02 2.573538e+02 446.0000 4.460000e+02\nSurvived 2 891 3.838384e-01 4.865925e-01 0.0000 3.548387e-01\nPclass 3 891 2.308642e+00 8.360712e-01 3.0000 2.385694e+00\nName* 4 891 NaN NA NA NaN\nSex* 5 891 NaN NA NA NaN\nAge 6 714 2.969912e+01 1.452650e+01 28.0000 2.926923e+01\nSibSp 7 891 5.230079e-01 1.102743e+00 0.0000 2.720898e-01\nParch 8 891 
3.815937e-01 8.060572e-01 0.0000 1.823282e-01\nTicket* 9 891 2.603185e+05 4.716093e+05 236171.0000 1.956829e+05\nFare 10 891 3.220421e+01 4.969343e+01 14.4542 2.137872e+01\nCabin* 11 891 NaN NA NA NaN\nEmbarked* 12 891 NaN NA NA NaN\n mad min max range skew kurtosis \nPassengerId 330.61980 1.00 891.0000 890.0000 0.0000000 -1.2040412\nSurvived 0.00000 0.00 1.0000 1.0000 0.4769135 -1.7745414\nPclass 0.00000 1.00 3.0000 2.0000 -0.6284264 -1.2834293\nName* NA Inf -Inf -Inf NA NA\nSex* NA Inf -Inf -Inf NA NA\nAge 13.34340 0.42 80.0000 79.5800 0.3874744 0.1597671\nSibSp 0.00000 0.00 8.0000 8.0000 3.6829188 17.7269083\nParch 0.00000 0.00 6.0000 6.0000 2.7398677 9.6880847\nTicket* 185104.09260 693.00 3101298.0000 3100605.0000 5.2418837 28.9026995\nFare 10.23617 0.00 512.3292 512.3292 4.7712097 33.1230682\nCabin* NA Inf -Inf -Inf NA NA\nEmbarked* NA Inf -Inf -Inf NA NA\n se \nPassengerId 8.621678e+00\nSurvived 1.630146e-02\nPclass 2.800944e-02\nName* NA\nSex* NA\nAge 5.436405e-01\nSibSp 3.694329e-02\nParch 2.700393e-02\nTicket* 1.579950e+04\nFare 1.664792e+00\nCabin* NA\nEmbarked* NA" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(Hmisc)\nHmisc::describe(train)", "execution_count": 23, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Loading required package: lattice\nLoading required package: survival\nLoading required package: Formula\nLoading required package: ggplot2\n\nAttaching package: ‘ggplot2’\n\nThe following objects are masked from ‘package:psych’:\n\n %+%, alpha\n\n\nAttaching package: ‘Hmisc’\n\nThe following object is masked from ‘package:psych’:\n\n describe\n\nThe following objects are masked from ‘package:base’:\n\n format.pval, units\n\n" }, { "data": { "text/plain": "train \n\n 12 Variables 891 
Observations\n--------------------------------------------------------------------------------\nPassengerId \n n missing distinct Info Mean Gmd .05 .10 \n 891 0 891 1 446 297.3 45.5 90.0 \n .25 .50 .75 .90 .95 \n 223.5 446.0 668.5 802.0 846.5 \n\nlowest : 1 2 3 4 5, highest: 887 888 889 890 891\n--------------------------------------------------------------------------------\nSurvived \n n missing distinct Info Sum Mean Gmd \n 891 0 2 0.71 342 0.3838 0.4735 \n\n--------------------------------------------------------------------------------\nPclass \n n missing distinct Info Mean Gmd \n 891 0 3 0.81 2.309 0.8631 \n \nValue 1 2 3\nFrequency 216 184 491\nProportion 0.242 0.207 0.551\n--------------------------------------------------------------------------------\nName \n n missing distinct \n 891 0 891 \n\nlowest : Abbing, Mr. Anthony Abbott, Mr. Rossmore Edward Abbott, Mrs. Stanton (Rosa Hunt) Abelson, Mr. Samuel Abelson, Mrs. Samuel (Hannah Wizosky) \nhighest: Yousseff, Mr. Gerious Yrois, Miss. Henriette (\"Mrs Harbeck\") Zabour, Miss. Hileni Zabour, Miss. Thamine Zimmerman, Mr. 
Leo \n--------------------------------------------------------------------------------\nSex \n n missing distinct \n 891 0 2 \n \nValue female male\nFrequency 314 577\nProportion 0.352 0.648\n--------------------------------------------------------------------------------\nAge \n n missing distinct Info Mean Gmd .05 .10 \n 714 177 88 0.999 29.7 16.21 4.00 14.00 \n .25 .50 .75 .90 .95 \n 20.12 28.00 38.00 50.00 56.00 \n\nlowest : 0.42 0.67 0.75 0.83 0.92, highest: 70.00 70.50 71.00 74.00 80.00\n--------------------------------------------------------------------------------\nSibSp \n n missing distinct Info Mean Gmd \n 891 0 7 0.669 0.523 0.823 \n \nValue 0 1 2 3 4 5 8\nFrequency 608 209 28 16 18 5 7\nProportion 0.682 0.235 0.031 0.018 0.020 0.006 0.008\n--------------------------------------------------------------------------------\nParch \n n missing distinct Info Mean Gmd \n 891 0 7 0.556 0.3816 0.6259 \n \nValue 0 1 2 3 4 5 6\nFrequency 678 118 80 5 4 5 1\nProportion 0.761 0.132 0.090 0.006 0.004 0.006 0.001\n--------------------------------------------------------------------------------\nTicket \n n missing distinct \n 891 0 681 \n\nlowest : 110152 110413 110465 110564 110813 \nhighest: W./C. 6608 W./C. 6609 W.E.P. 
5734 W/C 14208 WE/P 5735 \n--------------------------------------------------------------------------------\nFare \n n missing distinct Info Mean Gmd .05 .10 \n 891 0 248 1 32.2 36.78 7.225 7.550 \n .25 .50 .75 .90 .95 \n 7.910 14.454 31.000 77.958 112.079 \n\nlowest : 0.0000 4.0125 5.0000 6.2375 6.4375\nhighest: 227.5250 247.5208 262.3750 263.0000 512.3292\n--------------------------------------------------------------------------------\nCabin \n n missing distinct \n 204 687 147 \n\nlowest : A10 A14 A16 A19 A20, highest: F33 F38 F4 G6 T \n--------------------------------------------------------------------------------\nEmbarked \n n missing distinct \n 889 2 3 \n \nValue C Q S\nFrequency 168 77 644\nProportion 0.189 0.087 0.724\n--------------------------------------------------------------------------------" }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(pastecs)\nstat.desc(train)", "execution_count": 24, "outputs": [ { "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
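<table>
<tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>
<tr><th>nbr.val</th><td>8.910000e+02</td><td>891.00000000</td><td>8.910000e+02</td><td>NA</td><td>NA</td><td>7.140000e+02</td><td>891.00000000</td><td>891.00000000</td><td>NA</td><td>891.000000</td><td>NA</td><td>NA</td></tr>
<tr><th>nbr.null</th><td>0.000000e+00</td><td>549.00000000</td><td>0.000000e+00</td><td>NA</td><td>NA</td><td>0.000000e+00</td><td>608.00000000</td><td>678.00000000</td><td>NA</td><td>15.000000</td><td>NA</td><td>NA</td></tr>
<tr><th>nbr.na</th><td>0.000000e+00</td><td>0.00000000</td><td>0.000000e+00</td><td>NA</td><td>NA</td><td>1.770000e+02</td><td>0.00000000</td><td>0.00000000</td><td>NA</td><td>0.000000</td><td>NA</td><td>NA</td></tr>
<tr><th>min</th><td>1.000000e+00</td><td>0.00000000</td><td>1.000000e+00</td><td>NA</td><td>NA</td><td>4.200000e-01</td><td>0.00000000</td><td>0.00000000</td><td>NA</td><td>0.000000</td><td>NA</td><td>NA</td></tr>
<tr><th>max</th><td>8.910000e+02</td><td>1.00000000</td><td>3.000000e+00</td><td>NA</td><td>NA</td><td>8.000000e+01</td><td>8.00000000</td><td>6.00000000</td><td>NA</td><td>512.329200</td><td>NA</td><td>NA</td></tr>
<tr><th>range</th><td>8.900000e+02</td><td>1.00000000</td><td>2.000000e+00</td><td>NA</td><td>NA</td><td>7.958000e+01</td><td>8.00000000</td><td>6.00000000</td><td>NA</td><td>512.329200</td><td>NA</td><td>NA</td></tr>
<tr><th>sum</th><td>3.973860e+05</td><td>342.00000000</td><td>2.057000e+03</td><td>NA</td><td>NA</td><td>2.120517e+04</td><td>466.00000000</td><td>340.00000000</td><td>NA</td><td>28693.949300</td><td>NA</td><td>NA</td></tr>
<tr><th>median</th><td>4.460000e+02</td><td>0.00000000</td><td>3.000000e+00</td><td>NA</td><td>NA</td><td>2.800000e+01</td><td>0.00000000</td><td>0.00000000</td><td>NA</td><td>14.454200</td><td>NA</td><td>NA</td></tr>
<tr><th>mean</th><td>4.460000e+02</td><td>0.38383838</td><td>2.308642e+00</td><td>NA</td><td>NA</td><td>2.969912e+01</td><td>0.52300786</td><td>0.38159371</td><td>NA</td><td>32.204208</td><td>NA</td><td>NA</td></tr>
<tr><th>SE.mean</th><td>8.621678e+00</td><td>0.01630146</td><td>2.800944e-02</td><td>NA</td><td>NA</td><td>5.436405e-01</td><td>0.03694329</td><td>0.02700393</td><td>NA</td><td>1.664792</td><td>NA</td><td>NA</td></tr>
<tr><th>CI.mean.0.95</th><td>1.692119e+01</td><td>0.03199378</td><td>5.497225e-02</td><td>NA</td><td>NA</td><td>1.067328e+00</td><td>0.07250613</td><td>0.05299881</td><td>NA</td><td>3.267377</td><td>NA</td><td>NA</td></tr>
<tr><th>var</th><td>6.623100e+04</td><td>0.23677222</td><td>6.990151e-01</td><td>NA</td><td>NA</td><td>2.110191e+02</td><td>1.21604308</td><td>0.64972824</td><td>NA</td><td>2469.436846</td><td>NA</td><td>NA</td></tr>
<tr><th>std.dev</th><td>2.573538e+02</td><td>0.48659245</td><td>8.360712e-01</td><td>NA</td><td>NA</td><td>1.452650e+01</td><td>1.10274343</td><td>0.80605722</td><td>NA</td><td>49.693429</td><td>NA</td><td>NA</td></tr>
<tr><th>coef.var</th><td>5.770266e-01</td><td>1.26770139</td><td>3.621485e-01</td><td>NA</td><td>NA</td><td>4.891222e-01</td><td>2.10846437</td><td>2.11234407</td><td>NA</td><td>1.543073</td><td>NA</td><td>NA</td></tr>
</table>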
\n", "text/latex": "\\begin{tabular}{r|llllllllllll}\n & PassengerId & Survived & Pclass & Name & Sex & Age & SibSp & Parch & Ticket & Fare & Cabin & Embarked\\\\\n\\hline\n\tnbr.val & 8.910000e+02 & 891.00000000 & 8.910000e+02 & NA & NA & 7.140000e+02 & 891.00000000 & 891.00000000 & NA & 891.000000 & NA & NA \\\\\n\tnbr.null & 0.000000e+00 & 549.00000000 & 0.000000e+00 & NA & NA & 0.000000e+00 & 608.00000000 & 678.00000000 & NA & 15.000000 & NA & NA \\\\\n\tnbr.na & 0.000000e+00 & 0.00000000 & 0.000000e+00 & NA & NA & 1.770000e+02 & 0.00000000 & 0.00000000 & NA & 0.000000 & NA & NA \\\\\n\tmin & 1.000000e+00 & 0.00000000 & 1.000000e+00 & NA & NA & 4.200000e-01 & 0.00000000 & 0.00000000 & NA & 0.000000 & NA & NA \\\\\n\tmax & 8.910000e+02 & 1.00000000 & 3.000000e+00 & NA & NA & 8.000000e+01 & 8.00000000 & 6.00000000 & NA & 512.329200 & NA & NA \\\\\n\trange & 8.900000e+02 & 1.00000000 & 2.000000e+00 & NA & NA & 7.958000e+01 & 8.00000000 & 6.00000000 & NA & 512.329200 & NA & NA \\\\\n\tsum & 3.973860e+05 & 342.00000000 & 2.057000e+03 & NA & NA & 2.120517e+04 & 466.00000000 & 340.00000000 & NA & 28693.949300 & NA & NA \\\\\n\tmedian & 4.460000e+02 & 0.00000000 & 3.000000e+00 & NA & NA & 2.800000e+01 & 0.00000000 & 0.00000000 & NA & 14.454200 & NA & NA \\\\\n\tmean & 4.460000e+02 & 0.38383838 & 2.308642e+00 & NA & NA & 2.969912e+01 & 0.52300786 & 0.38159371 & NA & 32.204208 & NA & NA \\\\\n\tSE.mean & 8.621678e+00 & 0.01630146 & 2.800944e-02 & NA & NA & 5.436405e-01 & 0.03694329 & 0.02700393 & NA & 1.664792 & NA & NA \\\\\n\tCI.mean.0.95 & 1.692119e+01 & 0.03199378 & 5.497225e-02 & NA & NA & 1.067328e+00 & 0.07250613 & 0.05299881 & NA & 3.267377 & NA & NA \\\\\n\tvar & 6.623100e+04 & 0.23677222 & 6.990151e-01 & NA & NA & 2.110191e+02 & 1.21604308 & 0.64972824 & NA & 2469.436846 & NA & NA \\\\\n\tstd.dev & 2.573538e+02 & 0.48659245 & 8.360712e-01 & NA & NA & 1.452650e+01 & 1.10274343 & 0.80605722 & NA & 49.693429 & NA & NA \\\\\n\tcoef.var & 5.770266e-01 & 1.26770139 & 
3.621485e-01 & NA & NA & 4.891222e-01 & 2.10846437 & 2.11234407 & NA & 1.543073 & NA & NA \\\\\n\\end{tabular}\n", "text/markdown": "\n| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | \n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| nbr.val | 8.910000e+02 | 891.00000000 | 8.910000e+02 | NA | NA | 7.140000e+02 | 891.00000000 | 891.00000000 | NA | 891.000000 | NA | NA | \n| nbr.null | 0.000000e+00 | 549.00000000 | 0.000000e+00 | NA | NA | 0.000000e+00 | 608.00000000 | 678.00000000 | NA | 15.000000 | NA | NA | \n| nbr.na | 0.000000e+00 | 0.00000000 | 0.000000e+00 | NA | NA | 1.770000e+02 | 0.00000000 | 0.00000000 | NA | 0.000000 | NA | NA | \n| min | 1.000000e+00 | 0.00000000 | 1.000000e+00 | NA | NA | 4.200000e-01 | 0.00000000 | 0.00000000 | NA | 0.000000 | NA | NA | \n| max | 8.910000e+02 | 1.00000000 | 3.000000e+00 | NA | NA | 8.000000e+01 | 8.00000000 | 6.00000000 | NA | 512.329200 | NA | NA | \n| range | 8.900000e+02 | 1.00000000 | 2.000000e+00 | NA | NA | 7.958000e+01 | 8.00000000 | 6.00000000 | NA | 512.329200 | NA | NA | \n| sum | 3.973860e+05 | 342.00000000 | 2.057000e+03 | NA | NA | 2.120517e+04 | 466.00000000 | 340.00000000 | NA | 28693.949300 | NA | NA | \n| median | 4.460000e+02 | 0.00000000 | 3.000000e+00 | NA | NA | 2.800000e+01 | 0.00000000 | 0.00000000 | NA | 14.454200 | NA | NA | \n| mean | 4.460000e+02 | 0.38383838 | 2.308642e+00 | NA | NA | 2.969912e+01 | 0.52300786 | 0.38159371 | NA | 32.204208 | NA | NA | \n| SE.mean | 8.621678e+00 | 0.01630146 | 2.800944e-02 | NA | NA | 5.436405e-01 | 0.03694329 | 0.02700393 | NA | 1.664792 | NA | NA | \n| CI.mean.0.95 | 1.692119e+01 | 0.03199378 | 5.497225e-02 | NA | NA | 1.067328e+00 | 0.07250613 | 0.05299881 | NA | 3.267377 | NA | NA | \n| var | 6.623100e+04 | 0.23677222 | 6.990151e-01 | NA | NA | 2.110191e+02 | 1.21604308 | 0.64972824 | NA | 2469.436846 | NA | NA | \n| std.dev | 2.573538e+02 | 0.48659245 | 8.360712e-01 | NA | NA | 
1.452650e+01 | 1.10274343 | 0.80605722 | NA | 49.693429 | NA | NA | \n| coef.var | 5.770266e-01 | 1.26770139 | 3.621485e-01 | NA | NA | 4.891222e-01 | 2.10846437 | 2.11234407 | NA | 1.543073 | NA | NA | \n\n\n", "text/plain": " PassengerId Survived Pclass Name Sex Age \nnbr.val 8.910000e+02 891.00000000 8.910000e+02 NA NA 7.140000e+02\nnbr.null 0.000000e+00 549.00000000 0.000000e+00 NA NA 0.000000e+00\nnbr.na 0.000000e+00 0.00000000 0.000000e+00 NA NA 1.770000e+02\nmin 1.000000e+00 0.00000000 1.000000e+00 NA NA 4.200000e-01\nmax 8.910000e+02 1.00000000 3.000000e+00 NA NA 8.000000e+01\nrange 8.900000e+02 1.00000000 2.000000e+00 NA NA 7.958000e+01\nsum 3.973860e+05 342.00000000 2.057000e+03 NA NA 2.120517e+04\nmedian 4.460000e+02 0.00000000 3.000000e+00 NA NA 2.800000e+01\nmean 4.460000e+02 0.38383838 2.308642e+00 NA NA 2.969912e+01\nSE.mean 8.621678e+00 0.01630146 2.800944e-02 NA NA 5.436405e-01\nCI.mean.0.95 1.692119e+01 0.03199378 5.497225e-02 NA NA 1.067328e+00\nvar 6.623100e+04 0.23677222 6.990151e-01 NA NA 2.110191e+02\nstd.dev 2.573538e+02 0.48659245 8.360712e-01 NA NA 1.452650e+01\ncoef.var 5.770266e-01 1.26770139 3.621485e-01 NA NA 4.891222e-01\n SibSp Parch Ticket Fare Cabin Embarked\nnbr.val 891.00000000 891.00000000 NA 891.000000 NA NA \nnbr.null 608.00000000 678.00000000 NA 15.000000 NA NA \nnbr.na 0.00000000 0.00000000 NA 0.000000 NA NA \nmin 0.00000000 0.00000000 NA 0.000000 NA NA \nmax 8.00000000 6.00000000 NA 512.329200 NA NA \nrange 8.00000000 6.00000000 NA 512.329200 NA NA \nsum 466.00000000 340.00000000 NA 28693.949300 NA NA \nmedian 0.00000000 0.00000000 NA 14.454200 NA NA \nmean 0.52300786 0.38159371 NA 32.204208 NA NA \nSE.mean 0.03694329 0.02700393 NA 1.664792 NA NA \nCI.mean.0.95 0.07250613 0.05299881 NA 3.267377 NA NA \nvar 1.21604308 0.64972824 NA 2469.436846 NA NA \nstd.dev 1.10274343 0.80605722 NA 49.693429 NA NA \ncoef.var 2.10846437 2.11234407 NA 1.543073 NA NA " }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { 
"run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# install.packages('skimr', repos = 'http://cran.us.r-project.org')\nlibrary(skimr)\nskim(train)", "execution_count": 25, "outputs": [ { "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
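<table>
<tr><th>variable</th><th>type</th><th>stat</th><th>level</th><th>value</th><th>formatted</th></tr>
<tr><td>PassengerId</td><td>integer</td><td>missing</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>complete</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>n</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>mean</td><td>.all</td><td>446.0000000</td><td>446</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>sd</td><td>.all</td><td>257.3538420</td><td>257.35</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>p0</td><td>.all</td><td>1.0000000</td><td>1</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>p25</td><td>.all</td><td>223.5000000</td><td>223.5</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>p50</td><td>.all</td><td>446.0000000</td><td>446</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>p75</td><td>.all</td><td>668.5000000</td><td>668.5</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>p100</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>PassengerId</td><td>integer</td><td>hist</td><td>.all</td><td>NA</td><td>▇▇▇▇▇▇▇▇</td></tr>
<tr><td>Survived</td><td>integer</td><td>missing</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>Survived</td><td>integer</td><td>complete</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>Survived</td><td>integer</td><td>n</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>Survived</td><td>integer</td><td>mean</td><td>.all</td><td>0.3838384</td><td>0.38</td></tr>
<tr><td>Survived</td><td>integer</td><td>sd</td><td>.all</td><td>0.4865925</td><td>0.49</td></tr>
<tr><td>Survived</td><td>integer</td><td>p0</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>Survived</td><td>integer</td><td>p25</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>Survived</td><td>integer</td><td>p50</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>Survived</td><td>integer</td><td>p75</td><td>.all</td><td>1.0000000</td><td>1</td></tr>
<tr><td>Survived</td><td>integer</td><td>p100</td><td>.all</td><td>1.0000000</td><td>1</td></tr>
<tr><td>Survived</td><td>integer</td><td>hist</td><td>.all</td><td>NA</td><td>▇▁▁▁▁▁▁▅</td></tr>
<tr><td>Pclass</td><td>integer</td><td>missing</td><td>.all</td><td>0.0000000</td><td>0</td></tr>
<tr><td>Pclass</td><td>integer</td><td>complete</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>Pclass</td><td>integer</td><td>n</td><td>.all</td><td>891.0000000</td><td>891</td></tr>
<tr><td>Pclass</td><td>integer</td><td>mean</td><td>.all</td><td>2.3086420</td><td>2.31</td></tr>
<tr><td>Pclass</td><td>integer</td><td>sd</td><td>.all</td><td>0.8360712</td><td>0.84</td></tr>
<tr><td>Pclass</td><td>integer</td><td>p0</td><td>.all</td><td>1.0000000</td><td>1</td></tr>
<tr><td>Pclass</td><td>integer</td><td>p25</td><td>.all</td><td>2.0000000</td><td>2</td></tr>
<tr><td>Pclass</td><td>integer</td><td>p50</td><td>.all</td><td>3.0000000</td><td>3</td></tr>
<tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><td>Ticket</td><td>character</td><td>n</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Ticket</td><td>character</td><td>min</td><td>.all</td><td>3.00000</td><td>3</td></tr>
<tr><td>Ticket</td><td>character</td><td>max</td><td>.all</td><td>18.00000</td><td>18</td></tr>
<tr><td>Ticket</td><td>character</td><td>empty</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Ticket</td><td>character</td><td>n_unique</td><td>.all</td><td>681.00000</td><td>681</td></tr>
<tr><td>Fare</td><td>numeric</td><td>missing</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Fare</td><td>numeric</td><td>complete</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Fare</td><td>numeric</td><td>n</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Fare</td><td>numeric</td><td>mean</td><td>.all</td><td>32.20421</td><td>32.2</td></tr>
<tr><td>Fare</td><td>numeric</td><td>sd</td><td>.all</td><td>49.69343</td><td>49.69</td></tr>
<tr><td>Fare</td><td>numeric</td><td>p0</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Fare</td><td>numeric</td><td>p25</td><td>.all</td><td>7.91040</td><td>7.91</td></tr>
<tr><td>Fare</td><td>numeric</td><td>p50</td><td>.all</td><td>14.45420</td><td>14.45</td></tr>
<tr><td>Fare</td><td>numeric</td><td>p75</td><td>.all</td><td>31.00000</td><td>31</td></tr>
<tr><td>Fare</td><td>numeric</td><td>p100</td><td>.all</td><td>512.32920</td><td>512.33</td></tr>
<tr><td>Fare</td><td>numeric</td><td>hist</td><td>.all</td><td>NA</td><td>▇▁▁▁▁▁▁▁</td></tr>
<tr><td>Cabin</td><td>character</td><td>missing</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Cabin</td><td>character</td><td>complete</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Cabin</td><td>character</td><td>n</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Cabin</td><td>character</td><td>min</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Cabin</td><td>character</td><td>max</td><td>.all</td><td>15.00000</td><td>15</td></tr>
<tr><td>Cabin</td><td>character</td><td>empty</td><td>.all</td><td>687.00000</td><td>687</td></tr>
<tr><td>Cabin</td><td>character</td><td>n_unique</td><td>.all</td><td>148.00000</td><td>148</td></tr>
<tr><td>Embarked</td><td>character</td><td>missing</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Embarked</td><td>character</td><td>complete</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Embarked</td><td>character</td><td>n</td><td>.all</td><td>891.00000</td><td>891</td></tr>
<tr><td>Embarked</td><td>character</td><td>min</td><td>.all</td><td>0.00000</td><td>0</td></tr>
<tr><td>Embarked</td><td>character</td><td>max</td><td>.all</td><td>1.00000</td><td>1</td></tr>
<tr><td>Embarked</td><td>character</td><td>empty</td><td>.all</td><td>2.00000</td><td>2</td></tr>
<tr><td>Embarked</td><td>character</td><td>n_unique</td><td>.all</td><td>4.00000</td><td>4</td></tr>
</table>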
\n", "text/latex": "\\begin{tabular}{r|llllll}\n variable & type & stat & level & value & formatted\\\\\n\\hline\n\t PassengerId & integer & missing & .all & 0.0000000 & 0 \\\\\n\t PassengerId & integer & complete & .all & 891.0000000 & 891 \\\\\n\t PassengerId & integer & n & .all & 891.0000000 & 891 \\\\\n\t PassengerId & integer & mean & .all & 446.0000000 & 446 \\\\\n\t PassengerId & integer & sd & .all & 257.3538420 & 257.35 \\\\\n\t PassengerId & integer & p0 & .all & 1.0000000 & 1 \\\\\n\t PassengerId & integer & p25 & .all & 223.5000000 & 223.5 \\\\\n\t PassengerId & integer & p50 & .all & 446.0000000 & 446 \\\\\n\t PassengerId & integer & p75 & .all & 668.5000000 & 668.5 \\\\\n\t PassengerId & integer & p100 & .all & 891.0000000 & 891 \\\\\n\t PassengerId & integer & hist & .all & NA & ▇▇▇▇▇▇▇▇ \\\\\n\t Survived & integer & missing & .all & 0.0000000 & 0 \\\\\n\t Survived & integer & complete & .all & 891.0000000 & 891 \\\\\n\t Survived & integer & n & .all & 891.0000000 & 891 \\\\\n\t Survived & integer & mean & .all & 0.3838384 & 0.38 \\\\\n\t Survived & integer & sd & .all & 0.4865925 & 0.49 \\\\\n\t Survived & integer & p0 & .all & 0.0000000 & 0 \\\\\n\t Survived & integer & p25 & .all & 0.0000000 & 0 \\\\\n\t Survived & integer & p50 & .all & 0.0000000 & 0 \\\\\n\t Survived & integer & p75 & .all & 1.0000000 & 1 \\\\\n\t Survived & integer & p100 & .all & 1.0000000 & 1 \\\\\n\t Survived & integer & hist & .all & NA & ▇▁▁▁▁▁▁▅ \\\\\n\t Pclass & integer & missing & .all & 0.0000000 & 0 \\\\\n\t Pclass & integer & complete & .all & 891.0000000 & 891 \\\\\n\t Pclass & integer & n & .all & 891.0000000 & 891 \\\\\n\t Pclass & integer & mean & .all & 2.3086420 & 2.31 \\\\\n\t Pclass & integer & sd & .all & 0.8360712 & 0.84 \\\\\n\t Pclass & integer & p0 & .all & 1.0000000 & 1 \\\\\n\t Pclass & integer & p25 & .all & 2.0000000 & 2 \\\\\n\t Pclass & integer & p50 & .all & 3.0000000 & 3 \\\\\n\t ... & ... & ... & ... & ... 
& ...\\\\\n\t Ticket & character & n & .all & 891.00000 & 891 \\\\\n\t Ticket & character & min & .all & 3.00000 & 3 \\\\\n\t Ticket & character & max & .all & 18.00000 & 18 \\\\\n\t Ticket & character & empty & .all & 0.00000 & 0 \\\\\n\t Ticket & character & n\\_unique & .all & 681.00000 & 681 \\\\\n\t Fare & numeric & missing & .all & 0.00000 & 0 \\\\\n\t Fare & numeric & complete & .all & 891.00000 & 891 \\\\\n\t Fare & numeric & n & .all & 891.00000 & 891 \\\\\n\t Fare & numeric & mean & .all & 32.20421 & 32.2 \\\\\n\t Fare & numeric & sd & .all & 49.69343 & 49.69 \\\\\n\t Fare & numeric & p0 & .all & 0.00000 & 0 \\\\\n\t Fare & numeric & p25 & .all & 7.91040 & 7.91 \\\\\n\t Fare & numeric & p50 & .all & 14.45420 & 14.45 \\\\\n\t Fare & numeric & p75 & .all & 31.00000 & 31 \\\\\n\t Fare & numeric & p100 & .all & 512.32920 & 512.33 \\\\\n\t Fare & numeric & hist & .all & NA & ▇▁▁▁▁▁▁▁ \\\\\n\t Cabin & character & missing & .all & 0.00000 & 0 \\\\\n\t Cabin & character & complete & .all & 891.00000 & 891 \\\\\n\t Cabin & character & n & .all & 891.00000 & 891 \\\\\n\t Cabin & character & min & .all & 0.00000 & 0 \\\\\n\t Cabin & character & max & .all & 15.00000 & 15 \\\\\n\t Cabin & character & empty & .all & 687.00000 & 687 \\\\\n\t Cabin & character & n\\_unique & .all & 148.00000 & 148 \\\\\n\t Embarked & character & missing & .all & 0.00000 & 0 \\\\\n\t Embarked & character & complete & .all & 891.00000 & 891 \\\\\n\t Embarked & character & n & .all & 891.00000 & 891 \\\\\n\t Embarked & character & min & .all & 0.00000 & 0 \\\\\n\t Embarked & character & max & .all & 1.00000 & 1 \\\\\n\t Embarked & character & empty & .all & 2.00000 & 2 \\\\\n\t Embarked & character & n\\_unique & .all & 4.00000 & 4 \\\\\n\\end{tabular}\n", "text/markdown": "\nvariable | type | stat | level | value | formatted | 
\n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| PassengerId | integer | missing | .all | 0.0000000 | 0 | \n| PassengerId | integer | complete | .all | 891.0000000 | 891 | \n| PassengerId | integer | n | .all | 891.0000000 | 891 | \n| PassengerId | integer | mean | .all | 446.0000000 | 446 | \n| PassengerId | integer | sd | .all | 257.3538420 | 257.35 | \n| PassengerId | integer | p0 | .all | 1.0000000 | 1 | \n| PassengerId | integer | p25 | .all | 223.5000000 | 223.5 | \n| PassengerId | integer | p50 | .all | 446.0000000 | 446 | \n| PassengerId | integer | p75 | .all | 668.5000000 | 668.5 | \n| PassengerId | integer | p100 | .all | 891.0000000 | 891 | \n| PassengerId | integer | hist | .all | NA | ▇▇▇▇▇▇▇▇ | \n| Survived | integer | missing | .all | 0.0000000 | 0 | \n| Survived | integer | complete | .all | 891.0000000 | 891 | \n| Survived | integer | n | .all | 891.0000000 | 891 | \n| Survived | integer | mean | .all | 0.3838384 | 0.38 | \n| Survived | integer | sd | .all | 0.4865925 | 0.49 | \n| Survived | integer | p0 | .all | 0.0000000 | 0 | \n| Survived | integer | p25 | .all | 0.0000000 | 0 | \n| Survived | integer | p50 | .all | 0.0000000 | 0 | \n| Survived | integer | p75 | .all | 1.0000000 | 1 | \n| Survived | integer | p100 | .all | 1.0000000 | 1 | \n| Survived | integer | hist | .all | NA | ▇▁▁▁▁▁▁▅ | \n| Pclass | integer | missing | .all | 0.0000000 | 0 | \n| Pclass | integer | complete | .all | 891.0000000 | 891 | \n| Pclass | integer | n | .all | 891.0000000 | 891 | \n| Pclass | integer | mean | .all | 2.3086420 | 2.31 | \n| Pclass | integer | sd | .all | 0.8360712 | 0.84 | \n| Pclass | integer | p0 | .all | 1.0000000 | 1 | \n| Pclass | integer | p25 | .all | 2.0000000 | 2 | \n| Pclass | integer | p50 | .all | 3.0000000 | 3 | \n| ... | ... | ... | ... 
| ... | ... | \n| Ticket | character | n | .all | 891.00000 | 891 | \n| Ticket | character | min | .all | 3.00000 | 3 | \n| Ticket | character | max | .all | 18.00000 | 18 | \n| Ticket | character | empty | .all | 0.00000 | 0 | \n| Ticket | character | n_unique | .all | 681.00000 | 681 | \n| Fare | numeric | missing | .all | 0.00000 | 0 | \n| Fare | numeric | complete | .all | 891.00000 | 891 | \n| Fare | numeric | n | .all | 891.00000 | 891 | \n| Fare | numeric | mean | .all | 32.20421 | 32.2 | \n| Fare | numeric | sd | .all | 49.69343 | 49.69 | \n| Fare | numeric | p0 | .all | 0.00000 | 0 | \n| Fare | numeric | p25 | .all | 7.91040 | 7.91 | \n| Fare | numeric | p50 | .all | 14.45420 | 14.45 | \n| Fare | numeric | p75 | .all | 31.00000 | 31 | \n| Fare | numeric | p100 | .all | 512.32920 | 512.33 | \n| Fare | numeric | hist | .all | NA | ▇▁▁▁▁▁▁▁ | \n| Cabin | character | missing | .all | 0.00000 | 0 | \n| Cabin | character | complete | .all | 891.00000 | 891 | \n| Cabin | character | n | .all | 891.00000 | 891 | \n| Cabin | character | min | .all | 0.00000 | 0 | \n| Cabin | character | max | .all | 15.00000 | 15 | \n| Cabin | character | empty | .all | 687.00000 | 687 | \n| Cabin | character | n_unique | .all | 148.00000 | 148 | \n| Embarked | character | missing | .all | 0.00000 | 0 | \n| Embarked | character | complete | .all | 891.00000 | 891 | \n| Embarked | character | n | .all | 891.00000 | 891 | \n| Embarked | character | min | .all | 0.00000 | 0 | \n| Embarked | character | max | .all | 1.00000 | 1 | \n| Embarked | character | empty | .all | 2.00000 | 2 | \n| Embarked | character | n_unique | .all | 4.00000 | 4 | \n\n\n", "text/plain": " variable type stat level value formatted\n1 PassengerId integer missing .all 0.0000000 0 \n2 PassengerId integer complete .all 891.0000000 891 \n3 PassengerId integer n .all 891.0000000 891 \n4 PassengerId integer mean .all 446.0000000 446 \n5 PassengerId integer sd .all 257.3538420 257.35 \n6 PassengerId integer p0 .all 
1.0000000 1 \n7 PassengerId integer p25 .all 223.5000000 223.5 \n8 PassengerId integer p50 .all 446.0000000 446 \n9 PassengerId integer p75 .all 668.5000000 668.5 \n10 PassengerId integer p100 .all 891.0000000 891 \n11 PassengerId integer hist .all NA ▇▇▇▇▇▇▇▇ \n12 Survived integer missing .all 0.0000000 0 \n13 Survived integer complete .all 891.0000000 891 \n14 Survived integer n .all 891.0000000 891 \n15 Survived integer mean .all 0.3838384 0.38 \n16 Survived integer sd .all 0.4865925 0.49 \n17 Survived integer p0 .all 0.0000000 0 \n18 Survived integer p25 .all 0.0000000 0 \n19 Survived integer p50 .all 0.0000000 0 \n20 Survived integer p75 .all 1.0000000 1 \n21 Survived integer p100 .all 1.0000000 1 \n22 Survived integer hist .all NA ▇▁▁▁▁▁▁▅ \n23 Pclass integer missing .all 0.0000000 0 \n24 Pclass integer complete .all 891.0000000 891 \n25 Pclass integer n .all 891.0000000 891 \n26 Pclass integer mean .all 2.3086420 2.31 \n27 Pclass integer sd .all 0.8360712 0.84 \n28 Pclass integer p0 .all 1.0000000 1 \n29 Pclass integer p25 .all 2.0000000 2 \n30 Pclass integer p50 .all 3.0000000 3 \n... ... ... ... ... ... ... 
\n83 Ticket character n .all 891.00000 891 \n84 Ticket character min .all 3.00000 3 \n85 Ticket character max .all 18.00000 18 \n86 Ticket character empty .all 0.00000 0 \n87 Ticket character n_unique .all 681.00000 681 \n88 Fare numeric missing .all 0.00000 0 \n89 Fare numeric complete .all 891.00000 891 \n90 Fare numeric n .all 891.00000 891 \n91 Fare numeric mean .all 32.20421 32.2 \n92 Fare numeric sd .all 49.69343 49.69 \n93 Fare numeric p0 .all 0.00000 0 \n94 Fare numeric p25 .all 7.91040 7.91 \n95 Fare numeric p50 .all 14.45420 14.45 \n96 Fare numeric p75 .all 31.00000 31 \n97 Fare numeric p100 .all 512.32920 512.33 \n98 Fare numeric hist .all NA ▇▁▁▁▁▁▁▁ \n99 Cabin character missing .all 0.00000 0 \n100 Cabin character complete .all 891.00000 891 \n101 Cabin character n .all 891.00000 891 \n102 Cabin character min .all 0.00000 0 \n103 Cabin character max .all 15.00000 15 \n104 Cabin character empty .all 687.00000 687 \n105 Cabin character n_unique .all 148.00000 148 \n106 Embarked character missing .all 0.00000 0 \n107 Embarked character complete .all 891.00000 891 \n108 Embarked character n .all 891.00000 891 \n109 Embarked character min .all 0.00000 0 \n110 Embarked character max .all 1.00000 1 \n111 Embarked character empty .all 2.00000 2 \n112 Embarked character n_unique .all 4.00000 4 " }, "metadata": {}, "output_type": "display_data" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# install.packages('summarytools', repos = 'http://cran.us.r-project.org')\n# library(summarytools)\n# Namespace-qualified calls work without attaching the package, but it must be installed\nsummarytools::descr(train)\nsummarytools::dfSummary(train)", "execution_count": 27, "outputs": [ { "ename": "ERROR", "evalue": "Error in loadNamespace(name): there is no package called ‘summarytools’\n", "output_type": "error", "traceback": [ "Error in loadNamespace(name): there is no package called ‘summarytools’\nTraceback:\n", "1. 
summarytools::descr", "2. getExportedValue(pkg, name)", "3. asNamespace(ns)", "4. getNamespace(ns)", "5. tryCatch(loadNamespace(name), error = function(e) stop(e))", "6. tryCatchList(expr, classes, parentenv, handlers)", "7. tryCatchOne(expr, names, parentenv, handlers[[1L]])", "8. value[[3L]](cond)" ] } ] }, { "metadata": { "run_control": { "frozen": false, "marked": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# stargazer package\n# install.packages('stargazer', repos = 'http://cran.us.r-project.org')\nmydata <- mtcars\nlibrary(stargazer)\n# Print a summary table as text and also save it (the data/ directory must already exist)\nstargazer(mydata, type = \"text\", title = \"Descriptive statistics\", digits = 1, out = \"data/table1.txt\")\n\n# Documentation for the stargazer package:\n# https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf", "execution_count": null, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "Interactive web app: \nhttps://vasileiostsakalos.shinyapps.io/descriptive_analysis_pt1/ " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "Further reading: \nhttps://towardsdatascience.com/understanding-descriptive-statistics-c9c2b0641291 " }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Anscombe’s Quartet" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "**Summary statistics don’t tell the whole story; it’s important to visualize the data to get a clear picture.** \nhttps://eagereyes.org/criticism/anscombes-quartet \nhttps://www.autodeskresearch.com/publications/samestats \nhttps://cran.r-project.org/web/packages/datasauRus/vignettes/Datasaurus.html " }, { "metadata": {}, "cell_type": "markdown", "source": "## Statistics for Google Sheets Add-on" }, { "metadata": {}, "cell_type": "markdown", "source": 
"https://sites.google.com/site/statisticsforspreadsheets/" } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "name": "r", "display_name": "R", "language": "R" }, "language_info": { "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.1", "file_extension": ".r", "codemirror_mode": "r" }, "nav_menu": {}, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "base_numbering": 1, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "256px" }, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }