We’ve touched on this subject already in the previous chapters, but a more detailed look at the types of object R uses, in other words an object’s class, is useful.
When working with classes, it's good to know these functions:
## str() gives you the "Structure" of an object, including its class
str(mtcars)
## class() lets you query an object's class directly
class(mtcars)
There are many different classes, and you can make your own, so this is not a definitive list, but it covers the ones you will meet most often:
numeric - a number like 1 2 3
character - a character string like hello
logical - TRUE or FALSE
Date - an object that represents a date. Note this is where it gets tricky, since when you print a Date object and a character with the same format, they look the same, but are different!
a_date <- Sys.Date()
a_date
## [1] "2016-09-10"
class(a_date)
## [1] "Date"
a_character <- "2016-08-24"
a_character
## [1] "2016-08-24"
class(a_character)
## [1] "character"
Things like finding date ranges or weekdays will work on a Date object, but not on a character.
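To make the difference concrete, here is a small sketch of operations that work on a Date but not on a character. The dates are fixed values chosen for illustration (rather than Sys.Date()) so the results are reproducible, and the names days_between, next_three_days and character_result are made up for this example.

```r
a_date <- as.Date("2016-09-10")
a_character <- "2016-08-24"

## Date arithmetic works: number of days between the two dates
days_between <- as.numeric(a_date - as.Date(a_character))
days_between
## [1] 17

## a short date range starting at a_date
next_three_days <- seq(a_date, by = "day", length.out = 3)
next_three_days
## [1] "2016-09-10" "2016-09-11" "2016-09-12"

## the weekday name (the text returned depends on your locale)
weekdays(a_date)

## the same arithmetic on a character fails with an error,
## which tryCatch() turns into a message of our own choosing
character_result <- tryCatch(a_character + 1,
                             error = function(e) "error: not a Date")
character_result
## [1] "error: not a Date"
```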
factor - A categorical variable. This is another tricky one, as factors look the same as characters when printed but act quite differently.
a_factor <- factor("hello", levels = c("hello","goodbye"))
a_factor
## [1] hello
## Levels: hello goodbye
class(a_factor)
## [1] "factor"
a_string <- "hello"
a_string
## [1] "hello"
class(a_string)
## [1] "character"
As an example, see what happens when we try to add a string and a factor together in the same vector:
## an unexpected result??
c(a_factor, a_string)
## [1] "1" "hello"
What's going on? Since a_factor is a factor, it is actually stored as a number: its position among the levels it could possibly be (c("hello","goodbye")). When it is combined with the character, the factor is first reduced to that number (as as.numeric would do), and the number is then turned into a character (as as.character would do).
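The two steps of that coercion can be inspected separately. This is only a sketch reusing the objects from the example; the names factor_code, factor_label and combined are invented here.

```r
a_factor <- factor("hello", levels = c("hello", "goodbye"))
a_string <- "hello"

## the integer code stored inside the factor (its position in the levels)
factor_code <- as.integer(a_factor)
factor_code
## [1] 1

## the label that the code stands for
factor_label <- as.character(a_factor)
factor_label
## [1] "hello"

## converting to character first gives the vector we probably wanted
combined <- c(as.character(a_factor), a_string)
combined
## [1] "hello" "hello"
```

Converting the factor explicitly before combining avoids relying on R's implicit coercion.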
The upshot of all this is to be very careful to make sure your variables are the class you expect them to be.
A classic mistake is to use data.frame() or read.csv() to make a data.frame from your data without setting the stringsAsFactors = FALSE argument. In versions of R before 4.0.0 both functions then turn character columns into factors by default; from R 4.0.0 onwards the default is FALSE.
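A minimal sketch of setting the argument explicitly when reading CSV data, so the result does not depend on the default. The CSV contents and the names csv_text, as_characters and as_factors are invented for illustration; the data is passed inline through the text argument of read.csv() rather than read from a file.

```r
## a tiny CSV supplied inline; the contents are made up
csv_text <- "name,score
alice,1
bob,2"

## strings kept as characters
as_characters <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_characters$name)
## [1] "character"

## strings converted to factors
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factors$name)
## [1] "factor"
```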
There are also objects in R that work with combinations of the classes above.
A vector is a combination of the atomic elements above. You can only combine elements of the same type in a vector.
You create a vector using c():
a_vector <- c("a","b","c","d")
a_vector
## [1] "a" "b" "c" "d"
class(a_vector)
## [1] "character"
str(a_vector)
## chr [1:4] "a" "b" "c" "d"
The class of the vector is the same as the class of its elements!
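If you pass c() elements of different types, it does not fail; it quietly converts everything to one common type. A short sketch of that behaviour, with made-up variable names:

```r
## numbers, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed
## [1] "1"    "a"    "TRUE"
class(mixed)
## [1] "character"

## numbers and logicals: TRUE becomes 1, FALSE becomes 0
numbers_and_logicals <- c(1.5, TRUE, FALSE)
numbers_and_logicals
## [1] 1.5 1.0 0.0
class(numbers_and_logicals)
## [1] "numeric"
```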
The fact that a vector has the same class as its elements hints at a powerful feature of R, namely vectorisation. The atomic elements above are actually vectors of length one. This means that anything you can do to one element will also work on an entire vector of the same class.
An example of this:
## sum individual elements
sum(1,2,3,3,4,5,6)
## [1] 24
## sum a vector
a_vector <- c(1,2,3,3,4,5,6)
sum(a_vector)
## [1] 24
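Vectorisation is not limited to functions like sum(). As a sketch, arithmetic, comparisons and many other functions are applied to every element in turn (the names doubled, is_big and shouted are made up for this example):

```r
a_vector <- c(1, 2, 3, 3, 4, 5, 6)

## arithmetic is applied to every element
doubled <- a_vector * 2
doubled
## [1]  2  4  6  6  8 10 12

## so are comparisons
is_big <- a_vector > 3
is_big
## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

## and many functions, such as toupper() for characters
shouted <- toupper(c("a", "b", "c"))
shouted
## [1] "A" "B" "C"
```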
Some useful shortcuts with vectors are below:
## make a sequence
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
## the lowercase letters
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
## uppercase
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
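A few more shortcuts in the same spirit, shown as a sketch with invented variable names: seq() for sequences with a step, rep() for repetition, and square brackets for picking elements by position.

```r
## a sequence with a step size
evens <- seq(2, 10, by = 2)
evens
## [1]  2  4  6  8 10

## repeat a value
repeated <- rep("a", 3)
repeated
## [1] "a" "a" "a"

## pick out elements by position with square brackets
first_three <- letters[1:3]
first_three
## [1] "a" "b" "c"
```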
The most common and useful R class is the data.frame:
class(mtcars)
## [1] "data.frame"
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
This is most often used to represent tabular data, and many R functions take a data.frame as input or return one.
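Besides str(), a few other functions give a quick overview of a table. The following is only a sketch using the built-in mtcars data; the name mtcars_dim is made up.

```r
## number of rows and columns
mtcars_dim <- dim(mtcars)
mtcars_dim
## [1] 32 11

## the column names
names(mtcars)

## the first few rows, handy for a quick look at the data
head(mtcars, 3)
```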
You make a data.frame using the data.frame() function:
## names before equals, values after
my_data_frame <- data.frame(numbers = 1:5,
                            letters = c("a","b","c","d","e"),
                            logic = c(TRUE, FALSE, FALSE, TRUE, TRUE))
class(my_data_frame)
## [1] "data.frame"
str(my_data_frame)
## 'data.frame': 5 obs. of 3 variables:
## $ numbers: int 1 2 3 4 5
## $ letters: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
## $ logic : logi TRUE FALSE FALSE TRUE TRUE
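Each column can only be one class, but different columns can be different classes.
Also see that here the characters were turned into factors, which is the default in versions of R before 4.0.0 (from 4.0.0 onwards characters are left as they are). You can control this via the stringsAsFactors = FALSE argument: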
## names before equals, values after
my_data_frame <- data.frame(numbers = 1:5,
                            letters = c("a","b","c","d","e"),
                            logic = c(TRUE, FALSE, FALSE, TRUE, TRUE),
                            stringsAsFactors = FALSE)
class(my_data_frame)
## [1] "data.frame"
str(my_data_frame)
## 'data.frame': 5 obs. of 3 variables:
## $ numbers: int 1 2 3 4 5
## $ letters: chr "a" "b" "c" "d" ...
## $ logic : logi TRUE FALSE FALSE TRUE TRUE
You can access the individual columns of a data.frame via the $ notation:
## the column of numbers
my_data_frame$numbers
## [1] 1 2 3 4 5
class(my_data_frame$numbers)
## [1] "integer"
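The $ notation is not the only way in. As a sketch, square brackets can also pull out columns, single cells or a subset of rows; the data.frame is rebuilt here so the example runs on its own, and the names letters_column, one_cell and true_rows are invented.

```r
my_data_frame <- data.frame(numbers = 1:5,
                            letters = c("a","b","c","d","e"),
                            logic = c(TRUE, FALSE, FALSE, TRUE, TRUE),
                            stringsAsFactors = FALSE)

## a column by name, equivalent to my_data_frame$letters
letters_column <- my_data_frame[["letters"]]

## a single cell: row 2 of the letters column
one_cell <- my_data_frame[2, "letters"]
one_cell
## [1] "b"

## only the rows where the logic column is TRUE
true_rows <- my_data_frame[my_data_frame$logic, ]
true_rows$numbers
## [1] 1 4 5
```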
data.frames are a special case of the next object, the list, where all the elements (the columns) are of equal length.
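You can check this relationship yourself. A brief sketch using the built-in mtcars data (column_classes is a made-up name):

```r
## a data.frame is stored as a list of columns
is.list(mtcars)
## [1] TRUE
typeof(mtcars)
## [1] "list"

## so length() counts the columns, not the rows
length(mtcars)
## [1] 11

## and list tools such as sapply() work column by column
column_classes <- sapply(mtcars, class)
unique(column_classes)
## [1] "numeric"
```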
A list is like a data.frame, but its elements can have different lengths. List elements can be anything, including data.frames or other lists:
my_list <- list(letters = letters,
                numbers = 1:5,
                all_data = my_data_frame,
                nested = list(LETTERS))
class(my_list)
## [1] "list"
str(my_list)
## List of 4
## $ letters : chr [1:26] "a" "b" "c" "d" ...
## $ numbers : int [1:5] 1 2 3 4 5
## $ all_data:'data.frame': 5 obs. of 3 variables:
## ..$ numbers: int [1:5] 1 2 3 4 5
## ..$ letters: chr [1:5] "a" "b" "c" "d" ...
## ..$ logic : logi [1:5] TRUE FALSE FALSE TRUE TRUE
## $ nested :List of 1
## ..$ : chr [1:26] "A" "B" "C" "D" ...
Like data.frames, you can reach individual items via the $ symbol:
extract <- my_list$all_data
class(extract)
## [1] "data.frame"
str(extract)
## 'data.frame': 5 obs. of 3 variables:
## $ numbers: int 1 2 3 4 5
## $ letters: chr "a" "b" "c" "d" ...
## $ logic : logi TRUE FALSE FALSE TRUE TRUE
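Square brackets work on lists too, and extractions can be chained to reach nested items. This sketch rebuilds a smaller version of the list (without the data.frame) so it runs on its own; first_capitals is a made-up name.

```r
my_list <- list(letters = letters,
                numbers = 1:5,
                nested = list(LETTERS))

## double square brackets extract one element by name or position
my_list[["numbers"]]
## [1] 1 2 3 4 5
my_list[[2]]
## [1] 1 2 3 4 5

## single square brackets keep the result as a (shorter) list
class(my_list["numbers"])
## [1] "list"

## chain the extractions to reach into nested lists
first_capitals <- my_list$nested[[1]][1:3]
first_capitals
## [1] "A" "B" "C"
```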
If you find an R object is the wrong class for the function you need, what can you do? This is where coercion comes into play.
Most classes have an as.this function (as.character(), as.numeric(), as.Date() and so on), which will try to change the R object you pass in to the class you need. If this is impossible it will usually raise an error or return NA with a warning (which is much better than failing silently!)
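What an impossible conversion looks like depends on the function. The sketch below shows two cases using base R only; the input strings and the names not_a_number and not_a_date are invented for illustration.

```r
## as.numeric() cannot parse "abc": it returns NA and raises a warning
## (suppressed here so the example runs quietly)
not_a_number <- suppressWarnings(as.numeric("abc"))
not_a_number
## [1] NA

## as.Date() raises an error for text it cannot read as a date;
## tryCatch() lets us handle that instead of stopping the script
not_a_date <- tryCatch(as.Date("not a date"),
                       error = function(e) NA)
not_a_date
## [1] NA

## the is.this functions let you check a class before converting
is.numeric("abc")
## [1] FALSE
is.character("abc")
## [1] TRUE
```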
Some coercion functions are shown below:
## quotes indicate characters
as.character(-1:3)
## [1] "-1" "0" "1" "2" "3"
## 0 is FALSE, everything else is TRUE
as.logical(-1:3)
## [1] TRUE FALSE TRUE TRUE TRUE
## character to date
as.Date("2015-01-02")
## [1] "2015-01-02"
## if your dates are in a different format to YYYY-MM-DD then you need to supply the format argument
as.Date("20150102", format = "%Y%m%d")
## [1] "2015-01-02"
as.Date("12-24-2016", format = "%m-%d-%Y")
## [1] "2016-12-24"
## to change factors to numeric, be careful to go via as.character first
numeric_factor <- factor(1, levels = 5:1)
numeric_factor
## [1] 1
## Levels: 5 4 3 2 1
## it gives the answer as 5, as 1 is the fifth level
wrong_factor <- as.numeric(numeric_factor)
wrong_factor
## [1] 5
## we go via as.character to get what's expected
right_factor <- as.numeric(as.character(numeric_factor))
right_factor
## [1] 1
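The same pitfall applies to factors with several values. As a sketch, the example below uses a three-element factor (values chosen for illustration) and also shows an alternative route that converts the levels once and then indexes them by the factor; the names codes, via_character and via_levels are made up.

```r
numeric_factor <- factor(c(3, 1, 5), levels = 5:1)

## the internal codes are positions in the levels 5 4 3 2 1
codes <- as.integer(numeric_factor)
codes
## [1] 3 5 1

## via as.character, as above
via_character <- as.numeric(as.character(numeric_factor))
via_character
## [1] 3 1 5

## via the levels, indexing them by the factor's codes
via_levels <- as.numeric(levels(numeric_factor))[numeric_factor]
via_levels
## [1] 3 1 5
```

Both routes give the original numbers; the second only converts each distinct level once, which can matter for long factors.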