#' ---
#' title: "R intermediate"
#' author: "Dan McGlinn"
#' date: '`r paste("First created on 2015-01-29. Updated on", Sys.Date())`'
#' output: html_document
#' ---
#'
#' Home Page - http://dmcglinn.github.io/quant_methods/
#' GitHub Repo - https://github.com/dmcglinn/quant_methods
#'
#' ## Source Code Link
#' https://raw.githubusercontent.com/dmcglinn/quant_methods/gh-pages/lessons/R_intermediate.R
#'
#' ## Lesson Outline
#' * Programming for repetitive tasks
#' * For loops
#'     - Capturing output
#'     - Make loops general
#' * If statements
#'     - Else statements
#'     - Nested operations
#'     - Else if statements
#' * Define Functions
#' * Debug Functions
#' * Document Functions
#'
#+ echo=FALSE
# Set the root directory to the parent of the directory where this script is
# stored. This script lives in ./quant_methods/lessons and the data file it
# reads lives in ./quant_methods/data. If your data file is in the same
# directory as your .R file, or in a subdirectory of it, you do not need to
# set this.
knitr::opts_knit$set(root.dir = '../')
#' The goal of this lesson is to increase students' familiarity with the R
#' programming language by discussing how to control program flow and how to
#' use functions.
#'
#' First, read in some data to work with
dat <- read.csv('./data/tgpp.csv')
#' or equally
dat <- read.csv('https://raw.githubusercontent.com/dmcglinn/quant_methods/gh-pages/data/tgpp.csv')
#' ## Programming for repetitive tasks
#' Frequently in programming you have to carry out repetitive tasks. For
#' example, you might want to know the class of each column of a data.frame.
#' You could simply write this as
class(dat[,1])
class(dat[,2])
class(dat[,3])
#' and so on, but this is not only laborious but highly prone to typos and thus
#' errors.
#'
#' Based on the last HW assignment we know that the best approach to carrying out
#' this repetitive task is to use the `sapply()` function
sapply(dat, class)
#' However, we very often need a more general approach to carrying out a
#' repetitive task than simply applying a single function to each element (in
#' the example above, applying the function `class()` to each column of `dat`).
#' ## For Loop
#' For loops are a common feature of almost all programming languages. They are
#' typically not the most efficient way to carry out a repetitive or iterative
#' task; however, they are frequently easy to understand and relatively easy to
#' modify to include additional tasks.
#'
#' To use a for loop we need to create an iterator that will provide an index for
#' the operation we would like to repeat. An iterator can be any variable name
#' you wish. It is typically `i`, `j`, or `k` and so forth, but it could just as
#' easily be `index` or `my_iterator`, although that is not recommended.
#' In the example below we will name the iterator `i`
for (i in 1:11) {
    print(class(dat[ , i]))
}
#'
#' To break this example down we can see that
#'
1:11
#'
#' generates a vector of numbers from 1 to 11.
#' The portion of code `for (i in 1:11)` sets the value of `i` to each value of
#' this vector in turn as the for loop completes its tasks.
#'
#' Note the usage of `i in 1:11`. Some other languages write this as
#' `i = 1:11` (MATLAB, for example), so this is a frequent error for many students.
#' Again, I just want to emphasize that we could have used a different name for
#' our index, something like `j` or `my_index`. It did not have to be `i`; this is
#' simply the most common choice of an index in programming, as in algebra.
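#'
#' As a minimal sketch of the point above, the vector after `in` does not have
#' to be a run of integers, and the iterator does not have to be called `i`.
#' The name `col_name` is made up for this illustration, and the built-in
#' `iris` data set stands in for `dat`.
for (col_name in names(iris)) {
    # col_name takes the value of each column name in turn
    print(paste(col_name, "has class", class(iris[ , col_name])))
}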
#'
#' Also here it is important to note the syntax and code style of the for loop:
#+ eval = FALSE
for (i in 1:11) {
    ... # note this line is 4 spaces from the left margin, 2 spaces is also common, 0 spaces is bad form
}
#'
#' Above, the `...` just represents anything you want the loop to do on each
#' iteration. This loop will iterate 11 times as `i` counts from 1 to 11. Note the
#' spacing of the code and the placement of the curly brackets that start and stop
#' the for loop. It is possible to use different spacing, but it is not recommended:
#'
#' cramped example
for(i in 1:11){print(class(dat[,i]))}
#'
#' **Question**: Why do you think the code style in the above chunk is not generally recommended?
#'
#' ### Capturing output
#' Right now our for loop just prints output to the console, but often we want
#' to capture that output and do something with it. To do this we first have to
#' define an empty object, which we'll call `dat_classes`
#'
dat_classes <- NULL
#'
#' Once the empty object is initialized we can simply index into it, because R is
#' smart enough to grow this object into a vector of arbitrary size on the fly.
#' This is not a wise move if memory or time is a constraint, but it makes for
#' easy programming.
#'
for (i in 1:11) {
    dat_classes[i] <- class(dat[ , i])
}
dat_classes
#' Alternatively, you can concatenate, but the first approach is a bit cleaner
dat_classes <- NULL
for (i in 1:11) {
    dat_classes <- c(dat_classes, class(dat[ , i]))
}
#' The gold star approach is to set aside exactly as much memory as you will
#' need in your holder variable. In our case this is a vector of strings 11
#' elements long, so we can use:
dat_classes <- vector("character", 11)
for (i in 1:11) {
    dat_classes[i] <- class(dat[ , i])
}
#' The three approaches above all give the same results, but the third approach is
#' typically considered best practice and the first approach is probably the
#' easiest to read. We'll use the first approach for the remainder of this lesson.
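#'
#' A quick check that the three approaches agree, sketched on the built-in
#' `iris` data set rather than `dat`. The object names `grown`, `pasted`, and
#' `prealloc` are hypothetical and chosen only for this example.
n_cols <- ncol(iris)
grown <- NULL                             # approach 1: index into an empty object
pasted <- NULL                            # approach 2: concatenate
prealloc <- vector("character", n_cols)   # approach 3: pre-allocate
for (i in 1:n_cols) {
    grown[i] <- class(iris[ , i])
    pasted <- c(pasted, class(iris[ , i]))
    prealloc[i] <- class(iris[ , i])
}
# both comparisons should report that the vectors are identical
identical(grown, pasted)
identical(grown, prealloc)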
#'
#' ### Make your loops general
#' You don't want your loop to break if the number of columns of `dat` changes, so
#' you need to write the loop such that it always counts to the appropriate
#' number of columns in `dat`
#'
dat_classes <- NULL
for (i in 1:ncol(dat)) {
    dat_classes[i] <- class(dat[ , i])
}
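#'
#' One caveat worth sketching: `1:ncol(dat)` counts backwards when there are
#' zero columns, because `1:0` is the vector `c(1, 0)`. The base R function
#' `seq_len()` returns an empty vector in that case, so the loop body is
#' skipped. The objects `empty_df` and `n_iter` are hypothetical and made up
#' for this illustration.
empty_df <- data.frame()           # a data.frame with zero columns
1:ncol(empty_df)                   # two values, so a loop would run twice
seq_len(ncol(empty_df))            # empty, so a loop would not run at all
n_iter <- 0
for (i in seq_len(ncol(empty_df))) {
    n_iter <- n_iter + 1           # never reached for a zero-column input
}
n_iter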
#'
#' ## If statements
#' If statements, like for loops, are a staple of programming. They allow
#' the user to specify that a particular task be executed based on a logical
#' TRUE / FALSE test.
dat_classes <- NULL
for (i in 1:ncol(dat)) {
    dat_classes[i] <- class(dat[ , i])
    if (dat_classes[i] == "integer") {
        print('sweet!')
    }
}
#'
#' Note that because the body of this if statement is only a single line, the
#' curly brackets `{}` are not required. Including them, however, makes it more
#' explicit to a reader what your code is doing.
#'
#' ### Else statement
#' You can use an else clause to specify an alternative task to be carried out
#' if the logical test is FALSE.
#'
dat_classes <- NULL
for (i in 1:ncol(dat)) {
    dat_classes[i] <- class(dat[ , i])
    if (dat_classes[i] == "integer") {
        print('sweet!')
    }
    else {
        print('sour')
    }
}
#'
#' ### Nested statements
#' You can nest if statements (and for loops) within one another
#'
dat_classes <- NULL
for (i in 1:ncol(dat)) {
    dat_classes[i] <- class(dat[ , i])
    if (dat_classes[i] == "integer") {
        print('sweet!')
    }
    else {
        if (dat_classes[i] == 'factor') {
            print('ok')
        }
        else {
            print('sour')
        }
    }
}
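#'
#' The same idea applies to for loops. As a minimal sketch, the inner loop below
#' runs in full for every step of the outer loop and fills a small
#' multiplication table. The object name `mult_table` and the 3 by 3 size are
#' assumptions made only for this illustration.
mult_table <- matrix(0, nrow = 3, ncol = 3)
for (i in 1:3) {
    for (j in 1:3) {
        # row i, column j holds the product of the two iterators
        mult_table[i, j] <- i * j
    }
}
mult_table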
#'
#' ### Else if statement
#' An alternative to the above syntax is to use an else if statement, which is
#' sometimes a bit easier to read
#'
dat_classes <- NULL
for (i in 1:ncol(dat)) {
    dat_classes[i] <- class(dat[ , i])
    if (dat_classes[i] == "integer") {
        print('sweet!')
    }
    else if (dat_classes[i] == 'factor') {
        print('ok')
    }
    else {
        print('sour')
    }
}
#'
#' In one liner situations you can also use the function `ifelse()`
#'
x <- 1:10
ifelse(x > 5 , 'sweet!', 'sour!')
#'
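#' which labels each element in the same way as the loop below, except that
#' `ifelse()` returns the labels as a vector while the loop prints them one
#' at a time:
for (i in x) {
    if (i > 5)
        print('sweet!')
    else
        print('sour!')
}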
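#'
#' A minimal sketch of one practical difference: `ifelse()` returns a vector
#' that can be stored directly, whereas a loop has to fill a holder object one
#' element at a time. The names `x_demo`, `vec_labels`, and `loop_labels` are
#' hypothetical.
x_demo <- 1:10
vec_labels <- ifelse(x_demo > 5, 'sweet!', 'sour!')
loop_labels <- NULL
for (i in seq_along(x_demo)) {
    if (x_demo[i] > 5) {
        loop_labels[i] <- 'sweet!'
    } else {
        loop_labels[i] <- 'sour!'
    }
}
# the two vectors of labels should match element for element
identical(vec_labels, loop_labels)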
#'
#' ## Define functions
#' Functions are one of the most important objects for unlocking R's power. They
#' provide a way to modularize repetitive tasks that we need for our analyses.
#' For example, we can take the for loop that we wrote above, which works on
#' the data.frame called `dat`, and place it in a function so that the same
#' code can work on any data.frame we provide it.
#' Function names should be verbs when possible and should avoid the names of
#' existing R functions.
#'
eval_class <- function(x) {
    dat_classes <- NULL
    for (i in 1:ncol(x)) {
        dat_classes[i] <- class(x[ , i])
        if (dat_classes[i] == "integer") {
            print('sweet!')
        }
        else if (dat_classes[i] == 'factor') {
            print('ok')
        }
        else {
            print('sour')
        }
    }
    return(dat_classes)
}
eval_class(dat)
#'
#' Above, the only change we have made to our for loop is to substitute the object
#' name `x` for `dat`. For our function `eval_class()`, `x` is a variable or argument.
#' Additionally, we added the line `return(dat_classes)`, which ensures that the
#' object is output by the function.
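#'
#' As a smaller sketch of the same pattern, the hypothetical function
#' `count_numeric()` below (not part of the lesson's code) wraps a loop around
#' its argument `x` and hands back a single number with `return()`. It is tried
#' on the built-in data sets `iris` and `cars` to show that the same code works
#' on whatever data.frame is supplied.
count_numeric <- function(x) {
    n_numeric <- 0
    for (i in 1:ncol(x)) {
        # is.numeric() is TRUE for integer and double columns
        if (is.numeric(x[ , i])) {
            n_numeric <- n_numeric + 1
        }
    }
    return(n_numeric)
}
count_numeric(iris)
count_numeric(cars)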
#'
#' What if dat had twice as many columns?
dbl_dat <- cbind(dat, dat)
eval_class(dbl_dat)
#'
#' It is best practice to program defensively by ensuring that the user
#' supplies an object for the variable `x` that is sensible. In our case it
#' has to be a data.frame or a matrix object; other types should return an
#' error with a reasonable explanation.
#'
eval_class <- function(x) {
    # class(x) can return more than one string (a matrix has class
    # c("matrix", "array") in R >= 4.0), so test the type directly
    if (is.data.frame(x) || is.matrix(x)) {
        x_classes <- NULL
        for (i in 1:ncol(x)) {
            x_classes[i] <- class(x[ , i])
            if (x_classes[i] == "integer") {
                print('sweet!')
            }
            else if (x_classes[i] == 'factor') {
                print('ok')
            }
            else {
                print('sour')
            }
        }
    }
    else {
        stop('x must be either a data.frame or matrix')
    }
    return(x_classes)
}
#+ error = TRUE
my_obj <- 1:10
eval_class(my_obj)
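#'
#' A minimal sketch of what a defensive check buys you, using a hypothetical
#' helper `check_tabular()` that is not part of the lesson's code. It tests the
#' input with the base R functions `is.data.frame()` and `is.matrix()`, and
#' `tryCatch()` captures the error message so that the script keeps running.
check_tabular <- function(x) {
    if (!(is.data.frame(x) || is.matrix(x))) {
        stop('x must be either a data.frame or matrix')
    }
    return(TRUE)
}
check_tabular(cars)
check_tabular(matrix(1:4, nrow = 2))
# a plain vector fails the check; keep the message instead of halting
err_msg <- tryCatch(check_tabular(1:10),
                    error = function(e) conditionMessage(e))
err_msg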
#' ## Debug functions
#' To debug your function in R use the functions `debug()` and `undebug()`.
#' RStudio has made the debugging experience for R users much better than it
#' used to be. Try out the following lines of code
#+ eval = FALSE
debug(eval_class)
eval_class(dat)
undebug(eval_class)
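#'
#' A small sketch of how the debug flag behaves, using a hypothetical toy
#' function `add_one()`. The base R function `isdebugged()` reports whether the
#' flag is set. The function is never called while it is flagged here, so no
#' interactive browser session is opened.
add_one <- function(x) {
    return(x + 1)
}
debug(add_one)
flag_on <- isdebugged(add_one)     # TRUE while the flag is set
undebug(add_one)
flag_off <- isdebugged(add_one)    # FALSE once the flag is removed
add_one(1)                         # runs normally again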
#'
#' ## Document functions
#' Documentation is critical, particularly when it comes to using functions, which
#' usually have at least one argument and some type of output.
#'
#' One best practice to follow when documenting functions is to use Roxygen, which
#' is a package that helps to build R help files (i.e., .Rd files). These are
#' accessed when the function `help` or `?` is used preceding a function name.
#' Here is a page that goes into detail about how to do this:
#' https://jozef.io/r102-addin-roxytags/, but for simplicity here is an example
#' with our function:
# #' Evaluate the class of each column in a matrix or data.frame
# #'
# #' @param x a matrix or data.frame
# #' @return a vector of strings that indicates the class of each column of `x`
# #'
# #' @export
# #' @examples
# #' eval_class(cars)
eval_class <- function(x) {
    # test the type directly; class(x) can have length > 1 (e.g., a matrix)
    if (is.data.frame(x) || is.matrix(x)) {
        x_classes <- NULL
        for (i in 1:ncol(x)) {
            x_classes[i] <- class(x[ , i])
        }
    }
    else {
        stop('x must be either a data.frame or matrix')
    }
    return(x_classes)
}
#' Note that you would remove the preceding `# ` from each line of documentation
#' above. I had to include it here because knitr's spin uses `#'` to identify
#' formatted text.
#'
#' This provides a nice format that is easily understandable by a human, and if
#' you ever decide to package your function this can now be used to generate
#' a help file for your function. Learn more at https://roxygen2.r-lib.org/articles/roxygen2.html
#'