{ "cells": [ { "metadata": { "slideshow": { "slide_type": "slide" }, "toc": "true" }, "cell_type": "markdown", "source": "

Table of Contents

\n
" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "# Data Importing" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "* **Using command line, text editor and point and click features**" }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "* **Reading Data From TXT|CSV Files: R Base Functions**" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Using command line, text editor and point and click features" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using command line\n* reading keyboard input from the command line \n* reading data from a file\n* using `scan` and `data.entry` commands\n" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# simmply assign a variable using the scan command with no arguments If you enter\n# a blank line (just hit the Enter key)\nx <- scan()\n# 1: 28.8 2: 27.3 3: 45.8 4: 34.8 5: 23.5 6: Read 5 items\n\nx\n# [1] 28.8 27.3 45.8 34.8 23.5", "execution_count": 63, "outputs": [ { "output_type": "display_data", "data": { "text/html": "", "text/latex": "", "text/markdown": "", "text/plain": "numeric(0)" }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# If you provide a filename, then the scan command will read the values from the\n# file Suppose that we have a file called diameters.csv\ny <- scan(\"data/diameters.csv\")\ny", "execution_count": 64, "outputs": [ { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. 28.8
  2. \n\t
  3. 27.3
  4. \n\t
  5. 45.8
  6. \n\t
  7. 34.8
  8. \n\t
  9. 25.3
  10. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item 28.8\n\\item 27.3\n\\item 45.8\n\\item 34.8\n\\item 25.3\n\\end{enumerate*}\n", "text/markdown": "1. 28.8\n2. 27.3\n3. 45.8\n4. 34.8\n5. 25.3\n\n\n", "text/plain": "[1] 28.8 27.3 45.8 34.8 25.3" }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "marked": false, "read_only": false }, "scrolled": false, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# You can read more complex data from a file using the scan command, but you must\n# specify the structure of the file. This means that you have to specify the data\n# types. Here, assume that we have a data file called trees.csv:\n\nx <- scan(\"data/trees.csv\", what = list(\"character\", \"double\"), sep = \",\")\n# The first column is the character data, and the second column is the numeric\n# data.\nprint(x)\n# [[1]] [1] 'pine' 'pine' 'oak' 'pine' 'oak'\n\n# [[2]] [1] '28.8' '27.3' '45.8' '34.8' '25.3'", "execution_count": 65, "outputs": [ { "name": "stdout", "output_type": "stream", "text": "[[1]]\n[1] \"pine\" \"pine\" \"oak\" \"pine\" \"oak\" \n\n[[2]]\n[1] \"28.8\" \"27.3\" \"45.8\" \"34.8\" \"25.3\"\n\n" } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using interacive text editor/graphical interface" }, { "metadata": { "collapsed": true, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# Another method for entering data is to use the data.entry command. 
The command\n# will open up a graphical interface.\nx <- data.entry(1, 8, 4, 8, 9, 15, 16)", "execution_count": 66, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "fragment" } }, "cell_type": "markdown", "source": "**Using the edit() or fix() function to edit a data frame dynamically in an interactive spreadsheet editor**\n\n```\n# does not work in the Jupyter R kernel\nelements <- data.frame()\nelements <- edit(elements) # the Jupyter R kernel does not support this feature\nelements <- fix(elements)\n# edit() displays the result after editing, but fix() only saves the changes without printing them to the console\n```" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using point and click" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Load csv, xls, xlsx, sav, dta, por, sas and stata files with point-and-click features** \n**Other similar features included:** https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio?mobile_site=true" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "## Reading Tabular Data From TXT|CSV Files: R Base Functions\n\n```\n* header\n* sep\n* stringsAsFactors\n* row.names\n* na.strings\n* nrows\n* skip\n```" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**Comparison** \n![Imgur](https://i.imgur.com/0zwa4YI.jpg)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "![Imgur](https://i.imgur.com/d84tL2e.png)" }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "### Using the clipboard" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, 
"scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# read.table(“clipboard”): It allows to copy data from Excel and read it directly\n# in R. useful in some cases. myDf <- read.table('clipboard') myDf", "execution_count": 1, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "### using file.choose() " }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "#Using file.choose() argument (select yearly_sales.txt file)\n# my_data <- read.table(file.choose(), sep=\",\")\n# head(my_data)", "execution_count": 2, "outputs": [] }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "### Using read.table\n* If you have a .txt or a tab-delimited text file, you can easily import it with the basic R function read.table()\n* read data from a file as a table\n* if the file is nicely formatted and arranged in convenient rows and columns" }, { "metadata": { "collapsed": true, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# take R documentation help\n# ?read.table or help(read.table)\n# ??read.table or help.search(\"read.table\")", "execution_count": 67, "outputs": [] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# read.table and read.delim are intended to import other common file types such\n# as .TXT\nsales_table_txt <- read.table(\"data/yearly_sales.txt\", header = TRUE, sep = \",\")\nhead(sales_table_txt)", "execution_count": 70, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
cust_idsales_totalnum_of_ordersgender
100001800.643 F
100002217.533 F
100003 74.582 M
100004498.603 M
100005723.114 F
100006 69.432 F
\n", "text/latex": "\\begin{tabular}{r|llll}\n cust\\_id & sales\\_total & num\\_of\\_orders & gender\\\\\n\\hline\n\t 100001 & 800.64 & 3 & F \\\\\n\t 100002 & 217.53 & 3 & F \\\\\n\t 100003 & 74.58 & 2 & M \\\\\n\t 100004 & 498.60 & 3 & M \\\\\n\t 100005 & 723.11 & 4 & F \\\\\n\t 100006 & 69.43 & 2 & F \\\\\n\\end{tabular}\n", "text/markdown": "\ncust_id | sales_total | num_of_orders | gender | \n|---|---|---|---|---|---|\n| 100001 | 800.64 | 3 | F | \n| 100002 | 217.53 | 3 | F | \n| 100003 | 74.58 | 2 | M | \n| 100004 | 498.60 | 3 | M | \n| 100005 | 723.11 | 4 | F | \n| 100006 | 69.43 | 2 | F | \n\n\n", "text/plain": " cust_id sales_total num_of_orders gender\n1 100001 800.64 3 F \n2 100002 217.53 3 F \n3 100003 74.58 2 M \n4 100004 498.60 3 M \n5 100005 723.11 4 F \n6 100006 69.43 2 F " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": false, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# Reading .dat files in R\ntrial <- read.table(\"data/trialTable.dat\")\ntrial\nnames(trial)\nnames(trial)[names(trial)==names(trial)] <- c(\"one\", \"two\", \"three\")\ntrial", "execution_count": 71, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\n
V1V2V3
123
356
\n", "text/latex": "\\begin{tabular}{r|lll}\n V1 & V2 & V3\\\\\n\\hline\n\t 1 & 2 & 3\\\\\n\t 3 & 5 & 6\\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | \n|---|---|\n| 1 | 2 | 3 | \n| 3 | 5 | 6 | \n\n\n", "text/plain": " V1 V2 V3\n1 1 2 3 \n2 3 5 6 " }, "metadata": {} }, { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. 'V1'
  2. \n\t
  3. 'V2'
  4. \n\t
  5. 'V3'
  6. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item 'V1'\n\\item 'V2'\n\\item 'V3'\n\\end{enumerate*}\n", "text/markdown": "1. 'V1'\n2. 'V2'\n3. 'V3'\n\n\n", "text/plain": "[1] \"V1\" \"V2\" \"V3\"" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\n
onetwothree
123
356
\n", "text/latex": "\\begin{tabular}{r|lll}\n one & two & three\\\\\n\\hline\n\t 1 & 2 & 3\\\\\n\t 3 & 5 & 6\\\\\n\\end{tabular}\n", "text/markdown": "\none | two | three | \n|---|---|\n| 1 | 2 | 3 | \n| 3 | 5 | 6 | \n\n\n", "text/plain": " one two three\n1 1 2 3 \n2 3 5 6 " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "iris <- read.table(url(\"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\"), \n header = FALSE, sep = \",\")\n# external sep argument is needed to import this file using read.table\nhead(iris)\n", "execution_count": 72, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
V1V2V3V4V5
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3.0 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5.0 3.6 1.4 0.2 Iris-setosa
5.4 3.9 1.7 0.4 Iris-setosa
\n", "text/latex": "\\begin{tabular}{r|lllll}\n V1 & V2 & V3 & V4 & V5\\\\\n\\hline\n\t 5.1 & 3.5 & 1.4 & 0.2 & Iris-setosa\\\\\n\t 4.9 & 3.0 & 1.4 & 0.2 & Iris-setosa\\\\\n\t 4.7 & 3.2 & 1.3 & 0.2 & Iris-setosa\\\\\n\t 4.6 & 3.1 & 1.5 & 0.2 & Iris-setosa\\\\\n\t 5.0 & 3.6 & 1.4 & 0.2 & Iris-setosa\\\\\n\t 5.4 & 3.9 & 1.7 & 0.4 & Iris-setosa\\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | V4 | V5 | \n|---|---|---|---|---|---|\n| 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | \n| 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | \n| 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | \n| 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | \n| 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | \n| 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | \n\n\n", "text/plain": " V1 V2 V3 V4 V5 \n1 5.1 3.5 1.4 0.2 Iris-setosa\n2 4.9 3.0 1.4 0.2 Iris-setosa\n3 4.7 3.2 1.3 0.2 Iris-setosa\n4 4.6 3.1 1.5 0.2 Iris-setosa\n5 5.0 3.6 1.4 0.2 Iris-setosa\n6 5.4 3.9 1.7 0.4 Iris-setosa" }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": false, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "data <- read.table(url(\"https://raw.githubusercontent.com/mrtnj/rstuff/master/comb_gnome_data.csv\"), \n header = TRUE, sep = \",\")\ndim(data)\nhead(data)", "execution_count": 73, "outputs": [ { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. 100
  2. \n\t
  3. 6
  4. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item 100\n\\item 6\n\\end{enumerate*}\n", "text/markdown": "1. 100\n2. 6\n\n\n", "text/plain": "[1] 100 6" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
grouptreatmentmass0mass10mass25mass50
green control 150.04841188.1930 264.33913 465.6842
green control 128.94804168.4952 251.67897 491.2235
green pixies 161.00249267.2797 571.697312030.0171
green pixies 193.44995267.8308 436.31328 984.0751
green control 101.88225124.7450 169.00897 280.3632
green control 41.92258 51.2602 69.30728 114.5803
\n", "text/latex": "\\begin{tabular}{r|llllll}\n group & treatment & mass0 & mass10 & mass25 & mass50\\\\\n\\hline\n\t green & control & 150.04841 & 188.1930 & 264.33913 & 465.6842\\\\\n\t green & control & 128.94804 & 168.4952 & 251.67897 & 491.2235\\\\\n\t green & pixies & 161.00249 & 267.2797 & 571.69731 & 2030.0171\\\\\n\t green & pixies & 193.44995 & 267.8308 & 436.31328 & 984.0751\\\\\n\t green & control & 101.88225 & 124.7450 & 169.00897 & 280.3632\\\\\n\t green & control & 41.92258 & 51.2602 & 69.30728 & 114.5803\\\\\n\\end{tabular}\n", "text/markdown": "\ngroup | treatment | mass0 | mass10 | mass25 | mass50 | \n|---|---|---|---|---|---|\n| green | control | 150.04841 | 188.1930 | 264.33913 | 465.6842 | \n| green | control | 128.94804 | 168.4952 | 251.67897 | 491.2235 | \n| green | pixies | 161.00249 | 267.2797 | 571.69731 | 2030.0171 | \n| green | pixies | 193.44995 | 267.8308 | 436.31328 | 984.0751 | \n| green | control | 101.88225 | 124.7450 | 169.00897 | 280.3632 | \n| green | control | 41.92258 | 51.2602 | 69.30728 | 114.5803 | \n\n\n", "text/plain": " group treatment mass0 mass10 mass25 mass50 \n1 green control 150.04841 188.1930 264.33913 465.6842\n2 green control 128.94804 168.4952 251.67897 491.2235\n3 green pixies 161.00249 267.2797 571.69731 2030.0171\n4 green pixies 193.44995 267.8308 436.31328 984.0751\n5 green control 101.88225 124.7450 169.00897 280.3632\n6 green control 41.92258 51.2602 69.30728 114.5803" }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# Read in the data\n df <- read.table(\"https://s3.amazonaws.com/assets.datacamp.com/blog_assets/scores_timed.txt\", \n header = FALSE, \n sep=\"/\", \n strip.white = TRUE, \n na.strings = \"EMPTY\") # na.strings indicates which strings should be interpreted as NA values. 
In this case, the string “EMPTY” is to be interpreted as an NA value.\n\n # Print out `df`\n print(df)", "execution_count": 74, "outputs": [ { "name": "stdout", "output_type": "stream", "text": " V1 V2 V3 V4 V5\n1 1 6 12:01:03 0.50 WORST\n2 2 16 07:42:51 0.32 BEST\n3 3 19 12:01:29 0.50 \n4 4 13 03:22:50 0.14 INTERMEDIATE\n5 5 8 09:30:03 0.40 WORST\n" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# strip.white = TRUE feature\ncarSpeeds <- read.csv(file = \"data/car-speeds.csv\", stringsAsFactors = FALSE)\nunique(carSpeeds$Color)\n", "execution_count": 75, "outputs": [ { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. 'Blue'
  2. \n\t
  3. ' Red'
  4. \n\t
  5. 'White'
  6. \n\t
  7. 'Red'
  8. \n\t
  9. 'Black'
  10. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item 'Blue'\n\\item ' Red'\n\\item 'White'\n\\item 'Red'\n\\item 'Black'\n\\end{enumerate*}\n", "text/markdown": "1. 'Blue'\n2. ' Red'\n3. 'White'\n4. 'Red'\n5. 'Black'\n\n\n", "text/plain": "[1] \"Blue\" \" Red\" \"White\" \"Red\" \"Black\"" }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# we see two values for red cars.# Let’s try again, this time importing the data\n# using the strip.white argument. NOTE - this argument must be accompanied by the\n# sep argument, by which we indicate the type of delimiter in the file (the comma\n# for most .csv files)\ncarSpeeds <- read.csv(file = \"data/car-speeds.csv\", stringsAsFactors = FALSE, strip.white = TRUE, \n sep = \",\")\n\nunique(carSpeeds$Color)", "execution_count": 76, "outputs": [ { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. 'Blue'
  2. \n\t
  3. 'Red'
  4. \n\t
  5. 'White'
  6. \n\t
  7. 'Black'
  8. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item 'Blue'\n\\item 'Red'\n\\item 'White'\n\\item 'Black'\n\\end{enumerate*}\n", "text/markdown": "1. 'Blue'\n2. 'Red'\n3. 'White'\n4. 'Black'\n\n\n", "text/plain": "[1] \"Blue\" \"Red\" \"White\" \"Black\"" }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using read.csv()\n* Column seperated values\n* Each line contains a row of values which can be numbers or letters, and each value is separated by a comma. " }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "trial <- read.csv(\"data/trialTable.csv\", header = FALSE)\ntrial", "execution_count": 77, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\n
V1V2V3
123
356
\n", "text/latex": "\\begin{tabular}{r|lll}\n V1 & V2 & V3\\\\\n\\hline\n\t 1 & 2 & 3\\\\\n\t 3 & 5 & 6\\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | \n|---|---|\n| 1 | 2 | 3 | \n| 3 | 5 | 6 | \n\n\n", "text/plain": " V1 V2 V3\n1 1 2 3 \n2 3 5 6 " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# By default, strings in the data are converted to factors. If you load the data\n# below with read.csv, then all the text columns will be treated as factors, even\n# though it might make more sense to treat some of them as strings. To do this,\n# use stringsAsFactors=FALSE:\ndata <- read.csv(\"data/datafile.csv\")\nstr(data)\ndata <- read.csv(\"data/datafile.csv\", stringsAsFactors = FALSE)\nstr(data)\n# You might have to convert some columns to factors\ndata$Sex <- factor(data$Sex)\nstr(data)", "execution_count": 78, "outputs": [ { "name": "stdout", "output_type": "stream", "text": "'data.frame':\t3 obs. of 4 variables:\n $ First : Factor w/ 3 levels \"\",\"Currer\",\"Dr.\": 2 3 1\n $ Last : Factor w/ 3 levels \"Bell\",\"Seuss\",..: 1 2 3\n $ Sex : Factor w/ 2 levels \"F\",\"M\": 1 2 NA\n $ Number: int 2 49 21\n'data.frame':\t3 obs. of 4 variables:\n $ First : chr \"Currer\" \"Dr.\" \"\"\n $ Last : chr \"Bell\" \"Seuss\" \"Student\"\n $ Sex : chr \"F\" \"M\" NA\n $ Number: int 2 49 21\n'data.frame':\t3 obs. 
of 4 variables:\n $ First : chr \"Currer\" \"Dr.\" \"\"\n $ Last : chr \"Bell\" \"Seuss\" \"Student\"\n $ Sex : Factor w/ 2 levels \"F\",\"M\": 1 2 NA\n $ Number: int 2 49 21\n" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# Another alternative is to load them as factors and convert some columns to characters:\n\ndata <- read.csv(\"data/datafile.csv\")\nstr(data)\ndata$First <- as.character(data$First)\ndata$Last <- as.character(data$Last)\nstr(data)\n# Another method: convert columns named \"First\" and \"Last\"\nstringcols <- c(\"First\",\"Last\")\ndata[stringcols] <- lapply(data[stringcols], as.character)\nstr(data)", "execution_count": 79, "outputs": [ { "name": "stdout", "output_type": "stream", "text": "'data.frame':\t3 obs. of 4 variables:\n $ First : Factor w/ 3 levels \"\",\"Currer\",\"Dr.\": 2 3 1\n $ Last : Factor w/ 3 levels \"Bell\",\"Seuss\",..: 1 2 3\n $ Sex : Factor w/ 2 levels \"F\",\"M\": 1 2 NA\n $ Number: int 2 49 21\n'data.frame':\t3 obs. of 4 variables:\n $ First : chr \"Currer\" \"Dr.\" \"\"\n $ Last : chr \"Bell\" \"Seuss\" \"Student\"\n $ Sex : Factor w/ 2 levels \"F\",\"M\": 1 2 NA\n $ Number: int 2 49 21\n'data.frame':\t3 obs. of 4 variables:\n $ First : chr \"Currer\" \"Dr.\" \"\"\n $ Last : chr \"Bell\" \"Seuss\" \"Student\"\n $ Sex : Factor w/ 2 levels \"F\",\"M\": 1 2 NA\n $ Number: int 2 49 21\n" } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": " # skipping unwanted rows\n inventories <- read.csv(\"data/inventories.csv\", skip = 6, header = TRUE, sep = \",\")\n head(inventories)", "execution_count": 80, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
LineXX1994.1X1994.2X1994.3X1994.4X1995.1X1995.2X1995.3X1995.4X1996.1X1996.2X1996.3X1996.4
1 Manufacturing and trade 918077 933394 943934 959179 974505 985513 992770 996357 997100 1001734 1009316 1016359
2 Manufacturing 396374 399508 402100 405833 411143 414963 418103 419897 424195 423339 426845 429994
3 Durable goods 237975 240652 242222 244930 248020 250316 253164 255633 259538 260090 263286 265147
4 Lumber and wood products 10264 10380 10159 10104 9967 9857 10303 10543 10559 10693 10696 10479
5 Furniture and fixtures 6695 6599 6647 6735 6810 6839 6753 6737 6501 6572 6650 6698
6 Stone, clay, and glass products 8313 8041 8114 8315 8475 8854 9023 8784 8844 8780 8844 9047
\n", "text/latex": "\\begin{tabular}{r|llllllllllllll}\n Line & X & X1994.1 & X1994.2 & X1994.3 & X1994.4 & X1995.1 & X1995.2 & X1995.3 & X1995.4 & X1996.1 & X1996.2 & X1996.3 & X1996.4\\\\\n\\hline\n\t 1 & Manufacturing and trade & 918077 & 933394 & 943934 & 959179 & 974505 & 985513 & 992770 & 996357 & 997100 & 1001734 & 1009316 & 1016359 \\\\\n\t 2 & Manufacturing & 396374 & 399508 & 402100 & 405833 & 411143 & 414963 & 418103 & 419897 & 424195 & 423339 & 426845 & 429994 \\\\\n\t 3 & Durable goods & 237975 & 240652 & 242222 & 244930 & 248020 & 250316 & 253164 & 255633 & 259538 & 260090 & 263286 & 265147 \\\\\n\t 4 & Lumber and wood products & 10264 & 10380 & 10159 & 10104 & 9967 & 9857 & 10303 & 10543 & 10559 & 10693 & 10696 & 10479 \\\\\n\t 5 & Furniture and fixtures & 6695 & 6599 & 6647 & 6735 & 6810 & 6839 & 6753 & 6737 & 6501 & 6572 & 6650 & 6698 \\\\\n\t 6 & Stone, clay, and glass products & 8313 & 8041 & 8114 & 8315 & 8475 & 8854 & 9023 & 8784 & 8844 & 8780 & 8844 & 9047 \\\\\n\\end{tabular}\n", "text/markdown": "\nLine | X | X1994.1 | X1994.2 | X1994.3 | X1994.4 | X1995.1 | X1995.2 | X1995.3 | X1995.4 | X1996.1 | X1996.2 | X1996.3 | X1996.4 | \n|---|---|---|---|---|---|\n| 1 | Manufacturing and trade | 918077 | 933394 | 943934 | 959179 | 974505 | 985513 | 992770 | 996357 | 997100 | 1001734 | 1009316 | 1016359 | \n| 2 | Manufacturing | 396374 | 399508 | 402100 | 405833 | 411143 | 414963 | 418103 | 419897 | 424195 | 423339 | 426845 | 429994 | \n| 3 | Durable goods | 237975 | 240652 | 242222 | 244930 | 248020 | 250316 | 253164 | 255633 | 259538 | 260090 | 263286 | 265147 | \n| 4 | Lumber and wood products | 10264 | 10380 | 10159 | 10104 | 9967 | 9857 | 10303 | 10543 | 10559 | 10693 | 10696 | 10479 | \n| 5 | Furniture and fixtures | 6695 | 6599 | 6647 | 6735 | 6810 | 6839 | 6753 | 6737 | 6501 | 6572 | 6650 | 6698 | \n| 6 | Stone, clay, and glass products | 8313 | 8041 | 8114 | 8315 | 8475 | 8854 | 9023 | 8784 | 8844 | 8780 | 8844 | 9047 | \n\n\n", "text/plain": 
" Line X X1994.1 X1994.2 X1994.3 X1994.4 X1995.1\n1 1 Manufacturing and trade 918077 933394 943934 959179 974505 \n2 2 Manufacturing 396374 399508 402100 405833 411143 \n3 3 Durable goods 237975 240652 242222 244930 248020 \n4 4 Lumber and wood products 10264 10380 10159 10104 9967 \n5 5 Furniture and fixtures 6695 6599 6647 6735 6810 \n6 6 Stone, clay, and glass products 8313 8041 8114 8315 8475 \n X1995.2 X1995.3 X1995.4 X1996.1 X1996.2 X1996.3 X1996.4\n1 985513 992770 996357 997100 1001734 1009316 1016359\n2 414963 418103 419897 424195 423339 426845 429994\n3 250316 253164 255633 259538 260090 263286 265147\n4 9857 10303 10543 10559 10693 10696 10479\n5 6839 6753 6737 6501 6572 6650 6698\n6 8854 9023 8784 8844 8780 8844 9047" }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using read.csv2()\n* `read.csv2()` variant used in countries that use a comma “,” as decimal point and a semicolon “;” as field separators.\n* If you export a comma separated value file on a Swedish computer, you are likely to get a numbers with decimal commas and semicolons as field separators. Then what you want is read.csv2( ), which is a read.csv made for semicolon separated files with decimal commas." }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "states_eu <- read.csv(\"data/states_eu.csv\")\nstates_eu\nstr(states_eu)\nstates_eu <- read.csv2(\"data/states_eu.csv\", stringsAsFactors=FALSE)\nstates_eu", "execution_count": 81, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
statecapitalpop_millarea_sqmX
South DakotaPierre 0,853 77116 NA
New York Albany 19,746 54555 NA
Oregon Salem 3,97 98381 NA
Vermont Montpelier 0,627 9616 NA
Hawaii Honolulu 1,42 10931 NA
\n", "text/latex": "\\begin{tabular}{r|lllll}\n state & capital & pop\\_mill & area\\_sqm & X\\\\\n\\hline\n\t South Dakota & Pierre & 0,853 & 77116 & NA \\\\\n\t New York & Albany & 19,746 & 54555 & NA \\\\\n\t Oregon & Salem & 3,97 & 98381 & NA \\\\\n\t Vermont & Montpelier & 0,627 & 9616 & NA \\\\\n\t Hawaii & Honolulu & 1,42 & 10931 & NA \\\\\n\\end{tabular}\n", "text/markdown": "\nstate | capital | pop_mill | area_sqm | X | \n|---|---|---|---|---|\n| South Dakota | Pierre | 0,853 | 77116 | NA | \n| New York | Albany | 19,746 | 54555 | NA | \n| Oregon | Salem | 3,97 | 98381 | NA | \n| Vermont | Montpelier | 0,627 | 9616 | NA | \n| Hawaii | Honolulu | 1,42 | 10931 | NA | \n\n\n", "text/plain": " state capital pop_mill area_sqm X \n1 South Dakota Pierre 0,853 77116 NA\n2 New York Albany 19,746 54555 NA\n3 Oregon Salem 3,97 98381 NA\n4 Vermont Montpelier 0,627 9616 NA\n5 Hawaii Honolulu 1,42 10931 NA" }, "metadata": {} }, { "name": "stdout", "output_type": "stream", "text": "'data.frame':\t5 obs. of 5 variables:\n $ state : Factor w/ 5 levels \"Hawaii\",\"New York\",..: 4 2 3 5 1\n $ capital : Factor w/ 5 levels \"Albany\",\"Honolulu\",..: 4 1 5 3 2\n $ pop_mill: Factor w/ 5 levels \"0,627\",\"0,853\",..: 2 4 5 1 3\n $ area_sqm: int 77116 54555 98381 9616 10931\n $ X : logi NA NA NA NA NA\n" }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
state.capital.pop_mill.area_sqm.
South Dakota,Pierre,0,853,77116,
New York,Albany,19,746,54555,
Oregon,Salem,3,97,98381,
Vermont,Montpelier,0,627,9616,
Hawaii,Honolulu,1,42,10931,
\n", "text/latex": "\\begin{tabular}{r|l}\n state.capital.pop\\_mill.area\\_sqm.\\\\\n\\hline\n\t South Dakota,Pierre,0,853,77116,\\\\\n\t New York,Albany,19,746,54555, \\\\\n\t Oregon,Salem,3,97,98381, \\\\\n\t Vermont,Montpelier,0,627,9616, \\\\\n\t Hawaii,Honolulu,1,42,10931, \\\\\n\\end{tabular}\n", "text/markdown": "\nstate.capital.pop_mill.area_sqm. | \n|---|---|---|---|---|\n| South Dakota,Pierre,0,853,77116, | \n| New York,Albany,19,746,54555, | \n| Oregon,Salem,3,97,98381, | \n| Vermont,Montpelier,0,627,9616, | \n| Hawaii,Honolulu,1,42,10931, | \n\n\n", "text/plain": " state.capital.pop_mill.area_sqm.\n1 South Dakota,Pierre,0,853,77116,\n2 New York,Albany,19,746,54555, \n3 Oregon,Salem,3,97,98381, \n4 Vermont,Montpelier,0,627,9616, \n5 Hawaii,Honolulu,1,42,10931, " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": false, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "df <- read.csv2(\"data/testSemCol.csv\", header = FALSE)\ndf", "execution_count": 82, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
V1V2V3
Col1Col2Col3
1 2 3
4 5 6
7 8 9
a b c
\n", "text/latex": "\\begin{tabular}{r|lll}\n V1 & V2 & V3\\\\\n\\hline\n\t Col1 & Col2 & Col3\\\\\n\t 1 & 2 & 3 \\\\\n\t 4 & 5 & 6 \\\\\n\t 7 & 8 & 9 \\\\\n\t a & b & c \\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | \n|---|---|---|---|---|\n| Col1 | Col2 | Col3 | \n| 1 | 2 | 3 | \n| 4 | 5 | 6 | \n| 7 | 8 | 9 | \n| a | b | c | \n\n\n", "text/plain": " V1 V2 V3 \n1 Col1 Col2 Col3\n2 1 2 3 \n3 4 5 6 \n4 7 8 9 \n5 a b c " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "df <- read.csv2(\"https://s3.amazonaws.com/assets.datacamp.com/blog_assets/test.csv\", \n header = FALSE, sep=\",\")\ndf", "execution_count": 83, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
V1V2V3
Col1Col2Col3
1 2 3
4 5 6
7 8 9
a b c
\n", "text/latex": "\\begin{tabular}{r|lll}\n V1 & V2 & V3\\\\\n\\hline\n\t Col1 & Col2 & Col3\\\\\n\t 1 & 2 & 3 \\\\\n\t 4 & 5 & 6 \\\\\n\t 7 & 8 & 9 \\\\\n\t a & b & c \\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | \n|---|---|---|---|---|\n| Col1 | Col2 | Col3 | \n| 1 | 2 | 3 | \n| 4 | 5 | 6 | \n| 7 | 8 | 9 | \n| a | b | c | \n\n\n", "text/plain": " V1 V2 V3 \n1 Col1 Col2 Col3\n2 1 2 3 \n3 4 5 6 \n4 7 8 9 \n5 a b c " }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using read.delim()\n* `read.delim()` is almost the same as read.table(), except the field separator is tab by default. It is convenient for open tab delimited file. \n* `read.delim()` for reading “tab-separated value” files (“.txt”). By default, point (“.”) is used as decimal points.\n* `read.delim()` is related to other functions for reading data files into R, such as `read.csv()` or `read.table()`\n* `read.delim()` has arguments (or options) that are similar to the arguments of `read.csv()` or `read.table()`" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "states<- read.delim(\"data/states.txt\")\nstates<- read.delim(\"data/states.txt\",stringsAsFactors=FALSE) # proper \nstates", "execution_count": 84, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
statecapitalpop_millarea_sqm
South DakotaPierre 0.853 77116
New York Albany 19.746 54555
Oregon Salem 3.970 98381
Vermont Montpelier 0.627 9616
Hawaii Honolulu 1.420 10931
\n", "text/latex": "\\begin{tabular}{r|llll}\n state & capital & pop\\_mill & area\\_sqm\\\\\n\\hline\n\t South Dakota & Pierre & 0.853 & 77116 \\\\\n\t New York & Albany & 19.746 & 54555 \\\\\n\t Oregon & Salem & 3.970 & 98381 \\\\\n\t Vermont & Montpelier & 0.627 & 9616 \\\\\n\t Hawaii & Honolulu & 1.420 & 10931 \\\\\n\\end{tabular}\n", "text/markdown": "\nstate | capital | pop_mill | area_sqm | \n|---|---|---|---|---|\n| South Dakota | Pierre | 0.853 | 77116 | \n| New York | Albany | 19.746 | 54555 | \n| Oregon | Salem | 3.970 | 98381 | \n| Vermont | Montpelier | 0.627 | 9616 | \n| Hawaii | Honolulu | 1.420 | 10931 | \n\n\n", "text/plain": " state capital pop_mill area_sqm\n1 South Dakota Pierre 0.853 77116 \n2 New York Albany 19.746 54555 \n3 Oregon Salem 3.970 98381 \n4 Vermont Montpelier 0.627 9616 \n5 Hawaii Honolulu 1.420 10931 " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": false, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "# filename <- 'http://www.nilu.no/projects/ccc/onlinedata/ozone/CZ03_2009.dat'\n\ndata <- read.delim(\"data/CZ03_2009.dat\", header = TRUE, sep = \"\", skip = 2, as.is = TRUE)\n\nhead(data)\n", "execution_count": 85, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
DateHourValue
01.01.200900:00 34.3
01.01.200901:00 31.9
01.01.200902:00 29.9
01.01.200903:00 28.5
01.01.200904:00 32.9
01.01.200905:00 20.5
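A note on the call above: `sep = ""` splits fields on any run of whitespace, which suits loosely aligned `.dat` files, and `skip = 2` drops leading metadata lines before the header. A self-contained sketch (the station/units metadata lines are invented for illustration):

```r
# Two metadata lines, then a whitespace-aligned header and data rows
txt <- c("station: CZ03",
         "units: ug/m3",
         "Date       Hour   Value",
         "01.01.2009 00:00  34.3",
         "01.01.2009 01:00  31.9")
tmp <- tempfile(fileext = ".dat")
writeLines(txt, tmp)

# sep = "" splits on whitespace; as.is = TRUE keeps Date/Hour as character
d <- read.delim(tmp, header = TRUE, sep = "", skip = 2, as.is = TRUE)
d$Value   # parsed as numeric
```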
\n", "text/latex": "\\begin{tabular}{r|lll}\n Date & Hour & Value\\\\\n\\hline\n\t 01.01.2009 & 00:00 & 34.3 \\\\\n\t 01.01.2009 & 01:00 & 31.9 \\\\\n\t 01.01.2009 & 02:00 & 29.9 \\\\\n\t 01.01.2009 & 03:00 & 28.5 \\\\\n\t 01.01.2009 & 04:00 & 32.9 \\\\\n\t 01.01.2009 & 05:00 & 20.5 \\\\\n\\end{tabular}\n", "text/markdown": "\nDate | Hour | Value | \n|---|---|---|---|---|---|\n| 01.01.2009 | 00:00 | 34.3 | \n| 01.01.2009 | 01:00 | 31.9 | \n| 01.01.2009 | 02:00 | 29.9 | \n| 01.01.2009 | 03:00 | 28.5 | \n| 01.01.2009 | 04:00 | 32.9 | \n| 01.01.2009 | 05:00 | 20.5 | \n\n\n", "text/plain": " Date Hour Value\n1 01.01.2009 00:00 34.3 \n2 01.01.2009 01:00 31.9 \n3 01.01.2009 02:00 29.9 \n4 01.01.2009 03:00 28.5 \n5 01.01.2009 04:00 32.9 \n6 01.01.2009 05:00 20.5 " }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using read.delim2()\n* `read.delim2()` for reading “tab-separated value” files (“.txt”). By default, comma (“,”) is used as decimal points." }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "sales_delim2_txt <- read.delim2(\"data/yearly_sales.txt\",sep = \",\")\nhead(sales_delim2_txt)", "execution_count": 86, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n
cust_idsales_totalnum_of_ordersgender
100001800.643 F
100002217.533 F
10000374.58 2 M
100004498.6 3 M
100005723.114 F
10000669.43 2 F
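The difference between `read.delim()` and `read.delim2()` is easiest to see with decimal commas. A minimal self-contained sketch (the temporary file and its prices are invented for illustration):

```r
# European-style file: tab separated, comma as the decimal mark
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tprice", "1\t12,5", "2\t7,25"), tmp)

eu <- read.delim2(tmp)   # dec = "," by default
eu$price                 # parsed as proper numbers: 12.50 7.25
class(eu$price)          # "numeric"
```

With plain `read.delim()` (which assumes `dec = "."`), the same column would come in as character or factor values rather than numbers.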
\n", "text/latex": "\\begin{tabular}{r|llll}\n cust\\_id & sales\\_total & num\\_of\\_orders & gender\\\\\n\\hline\n\t 100001 & 800.64 & 3 & F \\\\\n\t 100002 & 217.53 & 3 & F \\\\\n\t 100003 & 74.58 & 2 & M \\\\\n\t 100004 & 498.6 & 3 & M \\\\\n\t 100005 & 723.11 & 4 & F \\\\\n\t 100006 & 69.43 & 2 & F \\\\\n\\end{tabular}\n", "text/markdown": "\ncust_id | sales_total | num_of_orders | gender | \n|---|---|---|---|---|---|\n| 100001 | 800.64 | 3 | F | \n| 100002 | 217.53 | 3 | F | \n| 100003 | 74.58 | 2 | M | \n| 100004 | 498.6 | 3 | M | \n| 100005 | 723.11 | 4 | F | \n| 100006 | 69.43 | 2 | F | \n\n\n", "text/plain": " cust_id sales_total num_of_orders gender\n1 100001 800.64 3 F \n2 100002 217.53 3 F \n3 100003 74.58 2 M \n4 100004 498.6 3 M \n5 100005 723.11 4 F \n6 100006 69.43 2 F " }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "### Using read.fwf() for Fixed-width files\n* In a fixed width file, every line has the same format and the information within a given line is strictly organized by columns.\n* A negative number in the list indicates that the column should be skipped. " }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# The first three columns are assumed to contain letters, the next five columns\n# contain numbers, the next two columns have letters, and the last four columns\n# are numbers\ntrial <- read.fwf(\"data/trialFWF.dat\", c(3, 5, 2, 4), skip = 1)\ntrial\ntrial$V1", "execution_count": 87, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\n
V1V2V3V4
B 100ZZ 18
C 200YY 20
D 300XX 22
\n", "text/latex": "\\begin{tabular}{r|llll}\n V1 & V2 & V3 & V4\\\\\n\\hline\n\t B & 100 & ZZ & 18 \\\\\n\t C & 200 & YY & 20 \\\\\n\t D & 300 & XX & 22 \\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | V4 | \n|---|---|---|\n| B | 100 | ZZ | 18 | \n| C | 200 | YY | 20 | \n| D | 300 | XX | 22 | \n\n\n", "text/plain": " V1 V2 V3 V4\n1 B 100 ZZ 18\n2 C 200 YY 20\n3 D 300 XX 22" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/html": "
    \n\t
  1. B
  2. \n\t
  3. C
  4. \n\t
  5. D
  6. \n
\n", "text/latex": "\\begin{enumerate*}\n\\item B \n\\item C \n\\item D \n\\end{enumerate*}\n", "text/markdown": "1. B \n2. C \n3. D \n\n\n", "text/plain": "[1] B C D \nLevels: B C D " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "fragment" }, "trusted": false }, "cell_type": "code", "source": "# column can be ignored by giving a negative number\ntrial <- read.fwf(\"data/trialFWF.dat\", c(3, -5, 2, 4), skip = 1)\ntrial", "execution_count": 88, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\n
V1V2V3
B ZZ 18
C YY 20
D XX 22
\n", "text/latex": "\\begin{tabular}{r|lll}\n V1 & V2 & V3\\\\\n\\hline\n\t B & ZZ & 18 \\\\\n\t C & YY & 20 \\\\\n\t D & XX & 22 \\\\\n\\end{tabular}\n", "text/markdown": "\nV1 | V2 | V3 | \n|---|---|---|\n| B | ZZ | 18 | \n| C | YY | 20 | \n| D | XX | 22 | \n\n\n", "text/plain": " V1 V2 V3\n1 B ZZ 18\n2 C YY 20\n3 D XX 22" }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "# readr package\n* faster\n* helpful progress bar\n* read_csv(): to read a comma (“,”) separated values\n* read_csv2(): to read a semicolon (“;”) separated values\n* read_tsv(): to read a tab separated (“\\t”) values" }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "library(readr)\n# Import potatoes.txt using read_delim(): potatoes\npotatoesTXT <- read_delim(\"data/potatoes.txt\", delim = \"\\t\")\n# Print out potatoes\npotatoesTXT\n#read_csv function (readr package)\n# readr is already loaded\n# Column names\nproperties <- c(\"area\", \"temp\", \"size\", \"storage\", \"method\", \n \"texture\", \"flavor\", \"moistness\")\n# Import potatoes.csv with read_csv(): potatoes\npotatoesCSV <- read_csv(\"data/potatoes.csv\", col_names = properties)\npotatoesCSV", "execution_count": 89, "outputs": [ { "name": "stderr", "output_type": "stream", "text": "Parsed with column specification:\ncols(\n area = col_integer(),\n temp = col_integer(),\n size = col_integer(),\n storage = col_integer(),\n method = col_integer(),\n texture = col_double(),\n flavor = col_double(),\n moistness = col_double()\n)\n" }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
areatempsizestoragemethodtextureflavormoistness
1 1 1 1 1 2.93.23.0
1 1 1 1 2 2.32.52.6
1 1 1 1 3 2.52.82.8
1 1 1 1 4 2.12.92.4
1 1 1 1 5 1.92.82.2
1 1 1 2 1 1.83.01.7
1 1 1 2 2 2.63.12.4
1 1 1 2 3 3.03.02.9
1 1 1 2 4 2.23.22.5
1 1 1 2 5 2.02.81.9
1 1 1 3 1 1.82.61.5
1 1 1 3 2 2.02.81.9
1 1 1 3 3 2.62.62.6
1 1 1 3 4 2.13.22.1
1 1 1 3 5 2.53.02.1
1 1 1 4 1 2.63.12.4
1 1 1 4 2 2.72.92.4
1 1 1 4 3 2.23.12.3
1 1 1 4 4 3.13.42.7
1 1 1 4 5 3.02.62.7
1 1 2 1 1 3.13.02.8
1 1 2 1 2 2.72.82.7
1 1 2 1 3 2.43.02.9
1 1 2 1 4 2.22.92.3
1 1 2 1 5 1.92.92.0
1 1 2 2 1 1.82.61.8
1 1 2 2 2 2.22.92.1
1 1 2 2 3 2.83.22.8
1 1 2 2 4 2.33.22.4
1 1 2 2 5 2.03.02.0
........................
2 2 1 3 1 2.82.82.6
2 2 1 3 2 3.52.83.0
2 2 1 3 3 2.53.22.3
2 2 1 3 4 3.33.02.7
2 2 1 3 5 3.52.92.9
2 2 1 4 1 3.23.42.5
2 2 1 4 2 3.32.82.8
2 2 1 4 3 3.03.02.8
2 2 1 4 4 3.53.23.1
2 2 1 4 5 3.43.02.8
2 2 2 1 1 2.72.52.5
2 2 2 1 2 2.52.72.3
2 2 2 1 3 3.22.73.0
2 2 2 1 4 2.42.72.5
2 2 2 1 5 2.72.12.3
2 2 2 2 1 2.22.72.3
2 2 2 2 2 3.12.92.6
2 2 2 2 3 2.22.83.1
2 2 2 2 4 2.93.02.7
2 2 2 2 5 2.82.72.6
2 2 2 3 1 2.53.22.3
2 2 2 3 2 2.93.32.7
2 2 2 3 3 2.53.12.5
2 2 2 3 4 3.02.92.5
2 2 2 3 5 2.93.13.1
2 2 2 4 1 2.73.32.6
2 2 2 4 2 2.62.82.3
2 2 2 4 3 2.53.12.6
2 2 2 4 4 3.43.33.0
2 2 2 4 5 2.52.82.3
\n", "text/latex": "\\begin{tabular}{r|llllllll}\n area & temp & size & storage & method & texture & flavor & moistness\\\\\n\\hline\n\t 1 & 1 & 1 & 1 & 1 & 2.9 & 3.2 & 3.0\\\\\n\t 1 & 1 & 1 & 1 & 2 & 2.3 & 2.5 & 2.6\\\\\n\t 1 & 1 & 1 & 1 & 3 & 2.5 & 2.8 & 2.8\\\\\n\t 1 & 1 & 1 & 1 & 4 & 2.1 & 2.9 & 2.4\\\\\n\t 1 & 1 & 1 & 1 & 5 & 1.9 & 2.8 & 2.2\\\\\n\t 1 & 1 & 1 & 2 & 1 & 1.8 & 3.0 & 1.7\\\\\n\t 1 & 1 & 1 & 2 & 2 & 2.6 & 3.1 & 2.4\\\\\n\t 1 & 1 & 1 & 2 & 3 & 3.0 & 3.0 & 2.9\\\\\n\t 1 & 1 & 1 & 2 & 4 & 2.2 & 3.2 & 2.5\\\\\n\t 1 & 1 & 1 & 2 & 5 & 2.0 & 2.8 & 1.9\\\\\n\t 1 & 1 & 1 & 3 & 1 & 1.8 & 2.6 & 1.5\\\\\n\t 1 & 1 & 1 & 3 & 2 & 2.0 & 2.8 & 1.9\\\\\n\t 1 & 1 & 1 & 3 & 3 & 2.6 & 2.6 & 2.6\\\\\n\t 1 & 1 & 1 & 3 & 4 & 2.1 & 3.2 & 2.1\\\\\n\t 1 & 1 & 1 & 3 & 5 & 2.5 & 3.0 & 2.1\\\\\n\t 1 & 1 & 1 & 4 & 1 & 2.6 & 3.1 & 2.4\\\\\n\t 1 & 1 & 1 & 4 & 2 & 2.7 & 2.9 & 2.4\\\\\n\t 1 & 1 & 1 & 4 & 3 & 2.2 & 3.1 & 2.3\\\\\n\t 1 & 1 & 1 & 4 & 4 & 3.1 & 3.4 & 2.7\\\\\n\t 1 & 1 & 1 & 4 & 5 & 3.0 & 2.6 & 2.7\\\\\n\t 1 & 1 & 2 & 1 & 1 & 3.1 & 3.0 & 2.8\\\\\n\t 1 & 1 & 2 & 1 & 2 & 2.7 & 2.8 & 2.7\\\\\n\t 1 & 1 & 2 & 1 & 3 & 2.4 & 3.0 & 2.9\\\\\n\t 1 & 1 & 2 & 1 & 4 & 2.2 & 2.9 & 2.3\\\\\n\t 1 & 1 & 2 & 1 & 5 & 1.9 & 2.9 & 2.0\\\\\n\t 1 & 1 & 2 & 2 & 1 & 1.8 & 2.6 & 1.8\\\\\n\t 1 & 1 & 2 & 2 & 2 & 2.2 & 2.9 & 2.1\\\\\n\t 1 & 1 & 2 & 2 & 3 & 2.8 & 3.2 & 2.8\\\\\n\t 1 & 1 & 2 & 2 & 4 & 2.3 & 3.2 & 2.4\\\\\n\t 1 & 1 & 2 & 2 & 5 & 2.0 & 3.0 & 2.0\\\\\n\t ... & ... & ... & ... & ... & ... & ... 
& ...\\\\\n\t 2 & 2 & 1 & 3 & 1 & 2.8 & 2.8 & 2.6\\\\\n\t 2 & 2 & 1 & 3 & 2 & 3.5 & 2.8 & 3.0\\\\\n\t 2 & 2 & 1 & 3 & 3 & 2.5 & 3.2 & 2.3\\\\\n\t 2 & 2 & 1 & 3 & 4 & 3.3 & 3.0 & 2.7\\\\\n\t 2 & 2 & 1 & 3 & 5 & 3.5 & 2.9 & 2.9\\\\\n\t 2 & 2 & 1 & 4 & 1 & 3.2 & 3.4 & 2.5\\\\\n\t 2 & 2 & 1 & 4 & 2 & 3.3 & 2.8 & 2.8\\\\\n\t 2 & 2 & 1 & 4 & 3 & 3.0 & 3.0 & 2.8\\\\\n\t 2 & 2 & 1 & 4 & 4 & 3.5 & 3.2 & 3.1\\\\\n\t 2 & 2 & 1 & 4 & 5 & 3.4 & 3.0 & 2.8\\\\\n\t 2 & 2 & 2 & 1 & 1 & 2.7 & 2.5 & 2.5\\\\\n\t 2 & 2 & 2 & 1 & 2 & 2.5 & 2.7 & 2.3\\\\\n\t 2 & 2 & 2 & 1 & 3 & 3.2 & 2.7 & 3.0\\\\\n\t 2 & 2 & 2 & 1 & 4 & 2.4 & 2.7 & 2.5\\\\\n\t 2 & 2 & 2 & 1 & 5 & 2.7 & 2.1 & 2.3\\\\\n\t 2 & 2 & 2 & 2 & 1 & 2.2 & 2.7 & 2.3\\\\\n\t 2 & 2 & 2 & 2 & 2 & 3.1 & 2.9 & 2.6\\\\\n\t 2 & 2 & 2 & 2 & 3 & 2.2 & 2.8 & 3.1\\\\\n\t 2 & 2 & 2 & 2 & 4 & 2.9 & 3.0 & 2.7\\\\\n\t 2 & 2 & 2 & 2 & 5 & 2.8 & 2.7 & 2.6\\\\\n\t 2 & 2 & 2 & 3 & 1 & 2.5 & 3.2 & 2.3\\\\\n\t 2 & 2 & 2 & 3 & 2 & 2.9 & 3.3 & 2.7\\\\\n\t 2 & 2 & 2 & 3 & 3 & 2.5 & 3.1 & 2.5\\\\\n\t 2 & 2 & 2 & 3 & 4 & 3.0 & 2.9 & 2.5\\\\\n\t 2 & 2 & 2 & 3 & 5 & 2.9 & 3.1 & 3.1\\\\\n\t 2 & 2 & 2 & 4 & 1 & 2.7 & 3.3 & 2.6\\\\\n\t 2 & 2 & 2 & 4 & 2 & 2.6 & 2.8 & 2.3\\\\\n\t 2 & 2 & 2 & 4 & 3 & 2.5 & 3.1 & 2.6\\\\\n\t 2 & 2 & 2 & 4 & 4 & 3.4 & 3.3 & 3.0\\\\\n\t 2 & 2 & 2 & 4 & 5 & 2.5 & 2.8 & 2.3\\\\\n\\end{tabular}\n", "text/markdown": "\narea | temp | size | storage | method | texture | flavor | moistness | \n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| 1 | 1 | 1 | 1 | 1 | 2.9 | 3.2 | 3.0 | \n| 1 | 1 | 1 | 1 | 2 | 2.3 | 2.5 | 2.6 | \n| 1 | 1 | 1 | 1 | 3 | 2.5 | 2.8 | 2.8 | \n| 1 | 1 | 1 | 1 | 4 | 2.1 | 2.9 | 2.4 | \n| 1 | 1 | 1 | 1 | 5 | 1.9 | 2.8 | 2.2 | \n| 1 | 1 | 1 | 2 | 1 | 1.8 | 3.0 | 1.7 | \n| 1 | 1 | 1 | 2 | 2 | 2.6 | 3.1 | 2.4 | \n| 1 | 1 | 1 | 2 
| 3 | 3.0 | 3.0 | 2.9 | \n| 1 | 1 | 1 | 2 | 4 | 2.2 | 3.2 | 2.5 | \n| 1 | 1 | 1 | 2 | 5 | 2.0 | 2.8 | 1.9 | \n| 1 | 1 | 1 | 3 | 1 | 1.8 | 2.6 | 1.5 | \n| 1 | 1 | 1 | 3 | 2 | 2.0 | 2.8 | 1.9 | \n| 1 | 1 | 1 | 3 | 3 | 2.6 | 2.6 | 2.6 | \n| 1 | 1 | 1 | 3 | 4 | 2.1 | 3.2 | 2.1 | \n| 1 | 1 | 1 | 3 | 5 | 2.5 | 3.0 | 2.1 | \n| 1 | 1 | 1 | 4 | 1 | 2.6 | 3.1 | 2.4 | \n| 1 | 1 | 1 | 4 | 2 | 2.7 | 2.9 | 2.4 | \n| 1 | 1 | 1 | 4 | 3 | 2.2 | 3.1 | 2.3 | \n| 1 | 1 | 1 | 4 | 4 | 3.1 | 3.4 | 2.7 | \n| 1 | 1 | 1 | 4 | 5 | 3.0 | 2.6 | 2.7 | \n| 1 | 1 | 2 | 1 | 1 | 3.1 | 3.0 | 2.8 | \n| 1 | 1 | 2 | 1 | 2 | 2.7 | 2.8 | 2.7 | \n| 1 | 1 | 2 | 1 | 3 | 2.4 | 3.0 | 2.9 | \n| 1 | 1 | 2 | 1 | 4 | 2.2 | 2.9 | 2.3 | \n| 1 | 1 | 2 | 1 | 5 | 1.9 | 2.9 | 2.0 | \n| 1 | 1 | 2 | 2 | 1 | 1.8 | 2.6 | 1.8 | \n| 1 | 1 | 2 | 2 | 2 | 2.2 | 2.9 | 2.1 | \n| 1 | 1 | 2 | 2 | 3 | 2.8 | 3.2 | 2.8 | \n| 1 | 1 | 2 | 2 | 4 | 2.3 | 3.2 | 2.4 | \n| 1 | 1 | 2 | 2 | 5 | 2.0 | 3.0 | 2.0 | \n| ... | ... | ... | ... | ... | ... | ... | ... | \n| 2 | 2 | 1 | 3 | 1 | 2.8 | 2.8 | 2.6 | \n| 2 | 2 | 1 | 3 | 2 | 3.5 | 2.8 | 3.0 | \n| 2 | 2 | 1 | 3 | 3 | 2.5 | 3.2 | 2.3 | \n| 2 | 2 | 1 | 3 | 4 | 3.3 | 3.0 | 2.7 | \n| 2 | 2 | 1 | 3 | 5 | 3.5 | 2.9 | 2.9 | \n| 2 | 2 | 1 | 4 | 1 | 3.2 | 3.4 | 2.5 | \n| 2 | 2 | 1 | 4 | 2 | 3.3 | 2.8 | 2.8 | \n| 2 | 2 | 1 | 4 | 3 | 3.0 | 3.0 | 2.8 | \n| 2 | 2 | 1 | 4 | 4 | 3.5 | 3.2 | 3.1 | \n| 2 | 2 | 1 | 4 | 5 | 3.4 | 3.0 | 2.8 | \n| 2 | 2 | 2 | 1 | 1 | 2.7 | 2.5 | 2.5 | \n| 2 | 2 | 2 | 1 | 2 | 2.5 | 2.7 | 2.3 | \n| 2 | 2 | 2 | 1 | 3 | 3.2 | 2.7 | 3.0 | \n| 2 | 2 | 2 | 1 | 4 | 2.4 | 2.7 | 2.5 | \n| 2 | 2 | 2 | 1 | 5 | 2.7 | 2.1 | 2.3 | \n| 2 | 2 | 2 | 2 | 1 | 2.2 | 2.7 | 2.3 | \n| 2 | 2 | 2 | 2 | 2 | 3.1 | 2.9 | 2.6 | \n| 2 | 2 | 2 | 2 | 3 | 2.2 | 2.8 | 3.1 | \n| 2 | 2 | 2 | 2 | 4 | 2.9 | 3.0 | 2.7 | \n| 2 | 2 | 2 | 2 | 5 | 2.8 | 2.7 | 2.6 | \n| 2 | 2 | 2 | 3 | 1 | 2.5 | 3.2 | 2.3 | \n| 2 | 2 | 2 | 3 | 2 | 2.9 | 3.3 | 2.7 | \n| 2 | 2 | 2 | 3 | 3 | 2.5 | 3.1 | 2.5 | \n| 2 | 2 | 2 | 3 | 4 | 3.0 | 2.9 
| 2.5 | \n| 2 | 2 | 2 | 3 | 5 | 2.9 | 3.1 | 3.1 | \n| 2 | 2 | 2 | 4 | 1 | 2.7 | 3.3 | 2.6 | \n| 2 | 2 | 2 | 4 | 2 | 2.6 | 2.8 | 2.3 | \n| 2 | 2 | 2 | 4 | 3 | 2.5 | 3.1 | 2.6 | \n| 2 | 2 | 2 | 4 | 4 | 3.4 | 3.3 | 3.0 | \n| 2 | 2 | 2 | 4 | 5 | 2.5 | 2.8 | 2.3 | \n\n\n", "text/plain": " area temp size storage method texture flavor moistness\n1 1 1 1 1 1 2.9 3.2 3.0 \n2 1 1 1 1 2 2.3 2.5 2.6 \n3 1 1 1 1 3 2.5 2.8 2.8 \n4 1 1 1 1 4 2.1 2.9 2.4 \n5 1 1 1 1 5 1.9 2.8 2.2 \n6 1 1 1 2 1 1.8 3.0 1.7 \n7 1 1 1 2 2 2.6 3.1 2.4 \n8 1 1 1 2 3 3.0 3.0 2.9 \n9 1 1 1 2 4 2.2 3.2 2.5 \n10 1 1 1 2 5 2.0 2.8 1.9 \n11 1 1 1 3 1 1.8 2.6 1.5 \n12 1 1 1 3 2 2.0 2.8 1.9 \n13 1 1 1 3 3 2.6 2.6 2.6 \n14 1 1 1 3 4 2.1 3.2 2.1 \n15 1 1 1 3 5 2.5 3.0 2.1 \n16 1 1 1 4 1 2.6 3.1 2.4 \n17 1 1 1 4 2 2.7 2.9 2.4 \n18 1 1 1 4 3 2.2 3.1 2.3 \n19 1 1 1 4 4 3.1 3.4 2.7 \n20 1 1 1 4 5 3.0 2.6 2.7 \n21 1 1 2 1 1 3.1 3.0 2.8 \n22 1 1 2 1 2 2.7 2.8 2.7 \n23 1 1 2 1 3 2.4 3.0 2.9 \n24 1 1 2 1 4 2.2 2.9 2.3 \n25 1 1 2 1 5 1.9 2.9 2.0 \n26 1 1 2 2 1 1.8 2.6 1.8 \n27 1 1 2 2 2 2.2 2.9 2.1 \n28 1 1 2 2 3 2.8 3.2 2.8 \n29 1 1 2 2 4 2.3 3.2 2.4 \n30 1 1 2 2 5 2.0 3.0 2.0 \n... ... ... ... ... ... ... ... ... 
\n131 2 2 1 3 1 2.8 2.8 2.6 \n132 2 2 1 3 2 3.5 2.8 3.0 \n133 2 2 1 3 3 2.5 3.2 2.3 \n134 2 2 1 3 4 3.3 3.0 2.7 \n135 2 2 1 3 5 3.5 2.9 2.9 \n136 2 2 1 4 1 3.2 3.4 2.5 \n137 2 2 1 4 2 3.3 2.8 2.8 \n138 2 2 1 4 3 3.0 3.0 2.8 \n139 2 2 1 4 4 3.5 3.2 3.1 \n140 2 2 1 4 5 3.4 3.0 2.8 \n141 2 2 2 1 1 2.7 2.5 2.5 \n142 2 2 2 1 2 2.5 2.7 2.3 \n143 2 2 2 1 3 3.2 2.7 3.0 \n144 2 2 2 1 4 2.4 2.7 2.5 \n145 2 2 2 1 5 2.7 2.1 2.3 \n146 2 2 2 2 1 2.2 2.7 2.3 \n147 2 2 2 2 2 3.1 2.9 2.6 \n148 2 2 2 2 3 2.2 2.8 3.1 \n149 2 2 2 2 4 2.9 3.0 2.7 \n150 2 2 2 2 5 2.8 2.7 2.6 \n151 2 2 2 3 1 2.5 3.2 2.3 \n152 2 2 2 3 2 2.9 3.3 2.7 \n153 2 2 2 3 3 2.5 3.1 2.5 \n154 2 2 2 3 4 3.0 2.9 2.5 \n155 2 2 2 3 5 2.9 3.1 3.1 \n156 2 2 2 4 1 2.7 3.3 2.6 \n157 2 2 2 4 2 2.6 2.8 2.3 \n158 2 2 2 4 3 2.5 3.1 2.6 \n159 2 2 2 4 4 3.4 3.3 3.0 \n160 2 2 2 4 5 2.5 2.8 2.3 " }, "metadata": {} }, { "name": "stderr", "output_type": "stream", "text": "Parsed with column specification:\ncols(\n area = col_integer(),\n temp = col_integer(),\n size = col_integer(),\n storage = col_integer(),\n method = col_integer(),\n texture = col_double(),\n flavor = col_double(),\n moistness = col_double()\n)\n" }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
areatempsizestoragemethodtextureflavormoistness
1 1 1 1 1 2.93.23.0
1 1 1 1 2 2.32.52.6
1 1 1 1 3 2.52.82.8
1 1 1 1 4 2.12.92.4
1 1 1 1 5 1.92.82.2
1 1 1 2 1 1.83.01.7
1 1 1 2 2 2.63.12.4
1 1 1 2 3 3.03.02.9
1 1 1 2 4 2.23.22.5
1 1 1 2 5 2.02.81.9
1 1 1 3 1 1.82.61.5
1 1 1 3 2 2.02.81.9
1 1 1 3 3 2.62.62.6
1 1 1 3 4 2.13.22.1
1 1 1 3 5 2.53.02.1
1 1 1 4 1 2.63.12.4
1 1 1 4 2 2.72.92.4
1 1 1 4 3 2.23.12.3
1 1 1 4 4 3.13.42.7
1 1 1 4 5 3.02.62.7
1 1 2 1 1 3.13.02.8
1 1 2 1 2 2.72.82.7
1 1 2 1 3 2.43.02.9
1 1 2 1 4 2.22.92.3
1 1 2 1 5 1.92.92.0
1 1 2 2 1 1.82.61.8
1 1 2 2 2 2.22.92.1
1 1 2 2 3 2.83.22.8
1 1 2 2 4 2.33.22.4
1 1 2 2 5 2.03.02.0
........................
2 2 1 3 2 3.52.83.0
2 2 1 3 3 2.53.22.3
2 2 1 3 4 3.33.02.7
2 2 1 3 5 3.52.92.9
2 2 1 4 1 3.23.42.5
2 2 1 4 2 3.32.82.8
2 2 1 4 3 3.03.02.8
2 2 1 4 4 3.53.23.1
2 2 1 4 5 3.43.02.8
2 2 2 1 1 2.72.52.5
2 2 2 1 2 2.52.72.3
2 2 2 1 3 3.22.73.0
2 2 2 1 4 2.42.72.5
2 2 2 1 5 2.72.12.3
2 2 2 2 1 2.22.72.3
2 2 2 2 2 3.12.92.6
2 2 2 2 3 2.22.83.1
2 2 2 2 4 2.93.02.7
2 2 2 2 5 2.82.72.6
2 2 2 3 1 2.53.22.3
2 2 2 3 2 2.93.32.7
2 2 2 3 3 2.53.12.5
2 2 2 3 4 3.02.92.5
2 2 2 3 5 2.93.13.1
2 2 2 4 1 2.73.32.6
2 2 2 4 2 2.62.82.3
2 2 2 4 3 2.53.12.6
2 2 2 4 4 3.43.33.0
2 2 2 4 5 2.52.82.3
15 16 25 87 5 6.58.74.5
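The readr readers also accept a literal string (any input containing a newline is treated as data rather than a filename), which makes the three separator variants easy to compare side by side. A quick sketch, assuming readr is installed:

```r
library(readr)

# Literal strings with embedded newlines are read as data, not file paths
csv1 <- read_csv("a,b\n1,2\n3,4")    # comma separated, "." decimal
csv2 <- read_csv2("a;b\n1,5;2,5")    # semicolon separated, "," decimal
tsv1 <- read_tsv("a\tb\n1\t2")       # tab separated
csv2$a                               # the comma was read as a decimal mark: 1.5
```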
\n", "text/latex": "\\begin{tabular}{r|llllllll}\n area & temp & size & storage & method & texture & flavor & moistness\\\\\n\\hline\n\t 1 & 1 & 1 & 1 & 1 & 2.9 & 3.2 & 3.0\\\\\n\t 1 & 1 & 1 & 1 & 2 & 2.3 & 2.5 & 2.6\\\\\n\t 1 & 1 & 1 & 1 & 3 & 2.5 & 2.8 & 2.8\\\\\n\t 1 & 1 & 1 & 1 & 4 & 2.1 & 2.9 & 2.4\\\\\n\t 1 & 1 & 1 & 1 & 5 & 1.9 & 2.8 & 2.2\\\\\n\t 1 & 1 & 1 & 2 & 1 & 1.8 & 3.0 & 1.7\\\\\n\t 1 & 1 & 1 & 2 & 2 & 2.6 & 3.1 & 2.4\\\\\n\t 1 & 1 & 1 & 2 & 3 & 3.0 & 3.0 & 2.9\\\\\n\t 1 & 1 & 1 & 2 & 4 & 2.2 & 3.2 & 2.5\\\\\n\t 1 & 1 & 1 & 2 & 5 & 2.0 & 2.8 & 1.9\\\\\n\t 1 & 1 & 1 & 3 & 1 & 1.8 & 2.6 & 1.5\\\\\n\t 1 & 1 & 1 & 3 & 2 & 2.0 & 2.8 & 1.9\\\\\n\t 1 & 1 & 1 & 3 & 3 & 2.6 & 2.6 & 2.6\\\\\n\t 1 & 1 & 1 & 3 & 4 & 2.1 & 3.2 & 2.1\\\\\n\t 1 & 1 & 1 & 3 & 5 & 2.5 & 3.0 & 2.1\\\\\n\t 1 & 1 & 1 & 4 & 1 & 2.6 & 3.1 & 2.4\\\\\n\t 1 & 1 & 1 & 4 & 2 & 2.7 & 2.9 & 2.4\\\\\n\t 1 & 1 & 1 & 4 & 3 & 2.2 & 3.1 & 2.3\\\\\n\t 1 & 1 & 1 & 4 & 4 & 3.1 & 3.4 & 2.7\\\\\n\t 1 & 1 & 1 & 4 & 5 & 3.0 & 2.6 & 2.7\\\\\n\t 1 & 1 & 2 & 1 & 1 & 3.1 & 3.0 & 2.8\\\\\n\t 1 & 1 & 2 & 1 & 2 & 2.7 & 2.8 & 2.7\\\\\n\t 1 & 1 & 2 & 1 & 3 & 2.4 & 3.0 & 2.9\\\\\n\t 1 & 1 & 2 & 1 & 4 & 2.2 & 2.9 & 2.3\\\\\n\t 1 & 1 & 2 & 1 & 5 & 1.9 & 2.9 & 2.0\\\\\n\t 1 & 1 & 2 & 2 & 1 & 1.8 & 2.6 & 1.8\\\\\n\t 1 & 1 & 2 & 2 & 2 & 2.2 & 2.9 & 2.1\\\\\n\t 1 & 1 & 2 & 2 & 3 & 2.8 & 3.2 & 2.8\\\\\n\t 1 & 1 & 2 & 2 & 4 & 2.3 & 3.2 & 2.4\\\\\n\t 1 & 1 & 2 & 2 & 5 & 2.0 & 3.0 & 2.0\\\\\n\t ... & ... & ... & ... & ... & ... & ... 
& ...\\\\\n\t 2 & 2 & 1 & 3 & 2 & 3.5 & 2.8 & 3.0\\\\\n\t 2 & 2 & 1 & 3 & 3 & 2.5 & 3.2 & 2.3\\\\\n\t 2 & 2 & 1 & 3 & 4 & 3.3 & 3.0 & 2.7\\\\\n\t 2 & 2 & 1 & 3 & 5 & 3.5 & 2.9 & 2.9\\\\\n\t 2 & 2 & 1 & 4 & 1 & 3.2 & 3.4 & 2.5\\\\\n\t 2 & 2 & 1 & 4 & 2 & 3.3 & 2.8 & 2.8\\\\\n\t 2 & 2 & 1 & 4 & 3 & 3.0 & 3.0 & 2.8\\\\\n\t 2 & 2 & 1 & 4 & 4 & 3.5 & 3.2 & 3.1\\\\\n\t 2 & 2 & 1 & 4 & 5 & 3.4 & 3.0 & 2.8\\\\\n\t 2 & 2 & 2 & 1 & 1 & 2.7 & 2.5 & 2.5\\\\\n\t 2 & 2 & 2 & 1 & 2 & 2.5 & 2.7 & 2.3\\\\\n\t 2 & 2 & 2 & 1 & 3 & 3.2 & 2.7 & 3.0\\\\\n\t 2 & 2 & 2 & 1 & 4 & 2.4 & 2.7 & 2.5\\\\\n\t 2 & 2 & 2 & 1 & 5 & 2.7 & 2.1 & 2.3\\\\\n\t 2 & 2 & 2 & 2 & 1 & 2.2 & 2.7 & 2.3\\\\\n\t 2 & 2 & 2 & 2 & 2 & 3.1 & 2.9 & 2.6\\\\\n\t 2 & 2 & 2 & 2 & 3 & 2.2 & 2.8 & 3.1\\\\\n\t 2 & 2 & 2 & 2 & 4 & 2.9 & 3.0 & 2.7\\\\\n\t 2 & 2 & 2 & 2 & 5 & 2.8 & 2.7 & 2.6\\\\\n\t 2 & 2 & 2 & 3 & 1 & 2.5 & 3.2 & 2.3\\\\\n\t 2 & 2 & 2 & 3 & 2 & 2.9 & 3.3 & 2.7\\\\\n\t 2 & 2 & 2 & 3 & 3 & 2.5 & 3.1 & 2.5\\\\\n\t 2 & 2 & 2 & 3 & 4 & 3.0 & 2.9 & 2.5\\\\\n\t 2 & 2 & 2 & 3 & 5 & 2.9 & 3.1 & 3.1\\\\\n\t 2 & 2 & 2 & 4 & 1 & 2.7 & 3.3 & 2.6\\\\\n\t 2 & 2 & 2 & 4 & 2 & 2.6 & 2.8 & 2.3\\\\\n\t 2 & 2 & 2 & 4 & 3 & 2.5 & 3.1 & 2.6\\\\\n\t 2 & 2 & 2 & 4 & 4 & 3.4 & 3.3 & 3.0\\\\\n\t 2 & 2 & 2 & 4 & 5 & 2.5 & 2.8 & 2.3\\\\\n\t 15 & 16 & 25 & 87 & 5 & 6.5 & 8.7 & 4.5\\\\\n\\end{tabular}\n", "text/markdown": "\narea | temp | size | storage | method | texture | flavor | moistness | \n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n| 1 | 1 | 1 | 1 | 1 | 2.9 | 3.2 | 3.0 | \n| 1 | 1 | 1 | 1 | 2 | 2.3 | 2.5 | 2.6 | \n| 1 | 1 | 1 | 1 | 3 | 2.5 | 2.8 | 2.8 | \n| 1 | 1 | 1 | 1 | 4 | 2.1 | 2.9 | 2.4 | \n| 1 | 1 | 1 | 1 | 5 | 1.9 | 2.8 | 2.2 | \n| 1 | 1 | 1 | 2 | 1 | 1.8 | 3.0 | 1.7 | \n| 1 | 1 | 1 | 2 | 2 | 2.6 | 3.1 | 2.4 | \n| 1 | 1 | 1 
| 2 | 3 | 3.0 | 3.0 | 2.9 | \n| 1 | 1 | 1 | 2 | 4 | 2.2 | 3.2 | 2.5 | \n| 1 | 1 | 1 | 2 | 5 | 2.0 | 2.8 | 1.9 | \n| 1 | 1 | 1 | 3 | 1 | 1.8 | 2.6 | 1.5 | \n| 1 | 1 | 1 | 3 | 2 | 2.0 | 2.8 | 1.9 | \n| 1 | 1 | 1 | 3 | 3 | 2.6 | 2.6 | 2.6 | \n| 1 | 1 | 1 | 3 | 4 | 2.1 | 3.2 | 2.1 | \n| 1 | 1 | 1 | 3 | 5 | 2.5 | 3.0 | 2.1 | \n| 1 | 1 | 1 | 4 | 1 | 2.6 | 3.1 | 2.4 | \n| 1 | 1 | 1 | 4 | 2 | 2.7 | 2.9 | 2.4 | \n| 1 | 1 | 1 | 4 | 3 | 2.2 | 3.1 | 2.3 | \n| 1 | 1 | 1 | 4 | 4 | 3.1 | 3.4 | 2.7 | \n| 1 | 1 | 1 | 4 | 5 | 3.0 | 2.6 | 2.7 | \n| 1 | 1 | 2 | 1 | 1 | 3.1 | 3.0 | 2.8 | \n| 1 | 1 | 2 | 1 | 2 | 2.7 | 2.8 | 2.7 | \n| 1 | 1 | 2 | 1 | 3 | 2.4 | 3.0 | 2.9 | \n| 1 | 1 | 2 | 1 | 4 | 2.2 | 2.9 | 2.3 | \n| 1 | 1 | 2 | 1 | 5 | 1.9 | 2.9 | 2.0 | \n| 1 | 1 | 2 | 2 | 1 | 1.8 | 2.6 | 1.8 | \n| 1 | 1 | 2 | 2 | 2 | 2.2 | 2.9 | 2.1 | \n| 1 | 1 | 2 | 2 | 3 | 2.8 | 3.2 | 2.8 | \n| 1 | 1 | 2 | 2 | 4 | 2.3 | 3.2 | 2.4 | \n| 1 | 1 | 2 | 2 | 5 | 2.0 | 3.0 | 2.0 | \n| ... | ... | ... | ... | ... | ... | ... | ... 
| \n| 2 | 2 | 1 | 3 | 2 | 3.5 | 2.8 | 3.0 | \n| 2 | 2 | 1 | 3 | 3 | 2.5 | 3.2 | 2.3 | \n| 2 | 2 | 1 | 3 | 4 | 3.3 | 3.0 | 2.7 | \n| 2 | 2 | 1 | 3 | 5 | 3.5 | 2.9 | 2.9 | \n| 2 | 2 | 1 | 4 | 1 | 3.2 | 3.4 | 2.5 | \n| 2 | 2 | 1 | 4 | 2 | 3.3 | 2.8 | 2.8 | \n| 2 | 2 | 1 | 4 | 3 | 3.0 | 3.0 | 2.8 | \n| 2 | 2 | 1 | 4 | 4 | 3.5 | 3.2 | 3.1 | \n| 2 | 2 | 1 | 4 | 5 | 3.4 | 3.0 | 2.8 | \n| 2 | 2 | 2 | 1 | 1 | 2.7 | 2.5 | 2.5 | \n| 2 | 2 | 2 | 1 | 2 | 2.5 | 2.7 | 2.3 | \n| 2 | 2 | 2 | 1 | 3 | 3.2 | 2.7 | 3.0 | \n| 2 | 2 | 2 | 1 | 4 | 2.4 | 2.7 | 2.5 | \n| 2 | 2 | 2 | 1 | 5 | 2.7 | 2.1 | 2.3 | \n| 2 | 2 | 2 | 2 | 1 | 2.2 | 2.7 | 2.3 | \n| 2 | 2 | 2 | 2 | 2 | 3.1 | 2.9 | 2.6 | \n| 2 | 2 | 2 | 2 | 3 | 2.2 | 2.8 | 3.1 | \n| 2 | 2 | 2 | 2 | 4 | 2.9 | 3.0 | 2.7 | \n| 2 | 2 | 2 | 2 | 5 | 2.8 | 2.7 | 2.6 | \n| 2 | 2 | 2 | 3 | 1 | 2.5 | 3.2 | 2.3 | \n| 2 | 2 | 2 | 3 | 2 | 2.9 | 3.3 | 2.7 | \n| 2 | 2 | 2 | 3 | 3 | 2.5 | 3.1 | 2.5 | \n| 2 | 2 | 2 | 3 | 4 | 3.0 | 2.9 | 2.5 | \n| 2 | 2 | 2 | 3 | 5 | 2.9 | 3.1 | 3.1 | \n| 2 | 2 | 2 | 4 | 1 | 2.7 | 3.3 | 2.6 | \n| 2 | 2 | 2 | 4 | 2 | 2.6 | 2.8 | 2.3 | \n| 2 | 2 | 2 | 4 | 3 | 2.5 | 3.1 | 2.6 | \n| 2 | 2 | 2 | 4 | 4 | 3.4 | 3.3 | 3.0 | \n| 2 | 2 | 2 | 4 | 5 | 2.5 | 2.8 | 2.3 | \n| 15 | 16 | 25 | 87 | 5 | 6.5 | 8.7 | 4.5 | \n\n\n", "text/plain": " area temp size storage method texture flavor moistness\n1 1 1 1 1 1 2.9 3.2 3.0 \n2 1 1 1 1 2 2.3 2.5 2.6 \n3 1 1 1 1 3 2.5 2.8 2.8 \n4 1 1 1 1 4 2.1 2.9 2.4 \n5 1 1 1 1 5 1.9 2.8 2.2 \n6 1 1 1 2 1 1.8 3.0 1.7 \n7 1 1 1 2 2 2.6 3.1 2.4 \n8 1 1 1 2 3 3.0 3.0 2.9 \n9 1 1 1 2 4 2.2 3.2 2.5 \n10 1 1 1 2 5 2.0 2.8 1.9 \n11 1 1 1 3 1 1.8 2.6 1.5 \n12 1 1 1 3 2 2.0 2.8 1.9 \n13 1 1 1 3 3 2.6 2.6 2.6 \n14 1 1 1 3 4 2.1 3.2 2.1 \n15 1 1 1 3 5 2.5 3.0 2.1 \n16 1 1 1 4 1 2.6 3.1 2.4 \n17 1 1 1 4 2 2.7 2.9 2.4 \n18 1 1 1 4 3 2.2 3.1 2.3 \n19 1 1 1 4 4 3.1 3.4 2.7 \n20 1 1 1 4 5 3.0 2.6 2.7 \n21 1 1 2 1 1 3.1 3.0 2.8 \n22 1 1 2 1 2 2.7 2.8 2.7 \n23 1 1 2 1 3 2.4 3.0 2.9 \n24 1 1 2 1 4 2.2 2.9 2.3 \n25 1 1 2 1 5 
1.9 2.9 2.0 \n26 1 1 2 2 1 1.8 2.6 1.8 \n27 1 1 2 2 2 2.2 2.9 2.1 \n28 1 1 2 2 3 2.8 3.2 2.8 \n29 1 1 2 2 4 2.3 3.2 2.4 \n30 1 1 2 2 5 2.0 3.0 2.0 \n... ... ... ... ... ... ... ... ... \n132 2 2 1 3 2 3.5 2.8 3.0 \n133 2 2 1 3 3 2.5 3.2 2.3 \n134 2 2 1 3 4 3.3 3.0 2.7 \n135 2 2 1 3 5 3.5 2.9 2.9 \n136 2 2 1 4 1 3.2 3.4 2.5 \n137 2 2 1 4 2 3.3 2.8 2.8 \n138 2 2 1 4 3 3.0 3.0 2.8 \n139 2 2 1 4 4 3.5 3.2 3.1 \n140 2 2 1 4 5 3.4 3.0 2.8 \n141 2 2 2 1 1 2.7 2.5 2.5 \n142 2 2 2 1 2 2.5 2.7 2.3 \n143 2 2 2 1 3 3.2 2.7 3.0 \n144 2 2 2 1 4 2.4 2.7 2.5 \n145 2 2 2 1 5 2.7 2.1 2.3 \n146 2 2 2 2 1 2.2 2.7 2.3 \n147 2 2 2 2 2 3.1 2.9 2.6 \n148 2 2 2 2 3 2.2 2.8 3.1 \n149 2 2 2 2 4 2.9 3.0 2.7 \n150 2 2 2 2 5 2.8 2.7 2.6 \n151 2 2 2 3 1 2.5 3.2 2.3 \n152 2 2 2 3 2 2.9 3.3 2.7 \n153 2 2 2 3 3 2.5 3.1 2.5 \n154 2 2 2 3 4 3.0 2.9 2.5 \n155 2 2 2 3 5 2.9 3.1 3.1 \n156 2 2 2 4 1 2.7 3.3 2.6 \n157 2 2 2 4 2 2.6 2.8 2.3 \n158 2 2 2 4 3 2.5 3.1 2.6 \n159 2 2 2 4 4 3.4 3.3 3.0 \n160 2 2 2 4 5 2.5 2.8 2.3 \n161 15 16 25 87 5 6.5 8.7 4.5 " }, "metadata": {} } ] }, { "metadata": { "slideshow": { "slide_type": "subslide" } }, "cell_type": "markdown", "source": "**Further References:** https://blog.rstudio.com/2016/08/05/readr-1-0-0/" }, { "metadata": { "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "# Data Exporting" }, { "metadata": { "run_control": { "marked": false }, "slideshow": { "slide_type": "slide" } }, "cell_type": "markdown", "source": "**Modify the imported data, then write/export the changes to disk**" }, { "metadata": { "collapsed": true, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "sales <- read.csv(\"data/yearly_sales.csv\")\nsales$per_order <- sales$sales_total/sales$num_of_orders \nwrite.table(sales, \"data/sales_modified.txt\", sep = \"\\t\", row.names = FALSE) # row.names = FALSE drops the row-number column", "execution_count": 90, "outputs":
[] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "month <- month.abb\navgHigh <- c(38, 41, 47, 56, 69, 81, 83, 82, 71, 55, 48, 43)\nseasons <- c(\"Winter\", \"Spring\", \"Summer\", \"Fall\")\nseason <- rep(seasons[c(1:4,1)], c(2,3,3,3,1))\nschoolIn <- rep(c(\"yes\", \"no\", \"yes\"), c(5, 3, 4))\n\ngetwd()\nd <- data.frame(month, avgHigh, season, schoolIn)\nwrite.table(d, \"data/annual.txt\", quote=TRUE,\n sep=\"\\t\", row.names=FALSE)\n\nd <- read.delim(\"data/annual.txt\", header=TRUE, sep=\"\\t\")\nd", "execution_count": 91, "outputs": [ { "output_type": "display_data", "data": { "text/html": "'E:/Projects/LIT-01/Data-Importing'", "text/latex": "'E:/Projects/LIT-01/Data-Importing'", "text/markdown": "'E:/Projects/LIT-01/Data-Importing'", "text/plain": "[1] \"E:/Projects/LIT-01/Data-Importing\"" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\t\n\n
monthavgHighseasonschoolIn
Jan 38 Winteryes
Feb 41 Winteryes
Mar 47 Springyes
Apr 56 Springyes
May 69 Springyes
Jun 81 Summerno
Jul 83 Summerno
Aug 82 Summerno
Sep 71 Fall yes
Oct 55 Fall yes
Nov 48 Fall yes
Dec 43 Winteryes
\n", "text/latex": "\\begin{tabular}{r|llll}\n month & avgHigh & season & schoolIn\\\\\n\\hline\n\t Jan & 38 & Winter & yes \\\\\n\t Feb & 41 & Winter & yes \\\\\n\t Mar & 47 & Spring & yes \\\\\n\t Apr & 56 & Spring & yes \\\\\n\t May & 69 & Spring & yes \\\\\n\t Jun & 81 & Summer & no \\\\\n\t Jul & 83 & Summer & no \\\\\n\t Aug & 82 & Summer & no \\\\\n\t Sep & 71 & Fall & yes \\\\\n\t Oct & 55 & Fall & yes \\\\\n\t Nov & 48 & Fall & yes \\\\\n\t Dec & 43 & Winter & yes \\\\\n\\end{tabular}\n", "text/markdown": "\nmonth | avgHigh | season | schoolIn | \n|---|---|---|---|---|---|---|---|---|---|---|---|\n| Jan | 38 | Winter | yes | \n| Feb | 41 | Winter | yes | \n| Mar | 47 | Spring | yes | \n| Apr | 56 | Spring | yes | \n| May | 69 | Spring | yes | \n| Jun | 81 | Summer | no | \n| Jul | 83 | Summer | no | \n| Aug | 82 | Summer | no | \n| Sep | 71 | Fall | yes | \n| Oct | 55 | Fall | yes | \n| Nov | 48 | Fall | yes | \n| Dec | 43 | Winter | yes | \n\n\n", "text/plain": " month avgHigh season schoolIn\n1 Jan 38 Winter yes \n2 Feb 41 Winter yes \n3 Mar 47 Spring yes \n4 Apr 56 Spring yes \n5 May 69 Spring yes \n6 Jun 81 Summer no \n7 Jul 83 Summer no \n8 Aug 82 Summer no \n9 Sep 71 Fall yes \n10 Oct 55 Fall yes \n11 Nov 48 Fall yes \n12 Dec 43 Winter yes " }, "metadata": {} } ] }, { "metadata": { "run_control": { "frozen": false, "read_only": false }, "scrolled": true, "slideshow": { "slide_type": "subslide" }, "trusted": false }, "cell_type": "code", "source": "#quote argument option\nt=data.frame(rep(\"a\",5), rep(\"b\",5)) \nt\nwrite.table(t,\"data/try.txt\",row.names=F,col.names=F,sep=\"\\t\") \nwrite.table(t,\"data/tryN.txt\",row.names=F,col.names=F,sep=\"\\t\", quote=FALSE) ", "execution_count": 92, "outputs": [ { "output_type": "display_data", "data": { "text/html": "\n\n\n\t\n\t\n\t\n\t\n\t\n\n
rep..a...5.rep..b...5.
ab
ab
ab
ab
ab
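The effect of `quote` is easiest to see by reading the written files back as raw lines. A self-contained sketch using temporary files instead of `data/try.txt`:

```r
t <- data.frame(x = rep("a", 2), y = rep("b", 2))
f_quoted   <- tempfile(fileext = ".txt")
f_unquoted <- tempfile(fileext = ".txt")

# quote = TRUE (the default) wraps character values in double quotes;
# quote = FALSE writes the bare values
write.table(t, f_quoted,   row.names = FALSE, col.names = FALSE, sep = "\t")
write.table(t, f_unquoted, row.names = FALSE, col.names = FALSE, sep = "\t",
            quote = FALSE)

readLines(f_quoted)     # each line: "a"<tab>"b"
readLines(f_unquoted)   # each line: a<tab>b
```

Unquoted output is easier for other tools to consume, but quoting is safer when values may themselves contain the separator.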
\n", "text/latex": "\\begin{tabular}{r|ll}\n rep..a...5. & rep..b...5.\\\\\n\\hline\n\t a & b\\\\\n\t a & b\\\\\n\t a & b\\\\\n\t a & b\\\\\n\t a & b\\\\\n\\end{tabular}\n", "text/markdown": "\nrep..a...5. | rep..b...5. | \n|---|---|---|---|---|\n| a | b | \n| a | b | \n| a | b | \n| a | b | \n| a | b | \n\n\n", "text/plain": " rep..a...5. rep..b...5.\n1 a b \n2 a b \n3 a b \n4 a b \n5 a b " }, "metadata": {} } ] }, { "metadata": {}, "cell_type": "markdown", "source": "**Import Data into R: Read CSV, Excel, SPSS, Stata, SAS Files:** https://www.guru99.com/r-import-data.html \n**Exporting Data to Excel, CSV, SAS, STATA, Text File:** https://www.guru99.com/r-exporting-data.html " } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "name": "r", "display_name": "R", "language": "R" }, "language_info": { "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.3", "file_extension": ".r", "codemirror_mode": "r" }, "nav_menu": { "height": "323px", "width": "319px" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "base_numbering": 1, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false }, "toc_position": { "height": "811px", "left": "0px", "right": "1125px", "top": "107px", "width": "155px" } }, "nbformat": 4, "nbformat_minor": 2 }