---
title: Vectorization and Indexing in R
layout: default_with_disqus
author: Eric C. Anderson
output: bookdown::html_chapter
---

# Vectorization, Recycling, and Indexing in R {#vectorization-and-indexing}

Many functions and almost all the _operators_ (like `+` and `*`, etc.) are vectorized. They operate very quickly on each element of an atomic vector.

Goals: we want to learn about:

* Vectorization
* Recycling
* Indexing

These three ideas are fundamental to R. We will also discuss:

* Comparison operators
* Logical operators
* Mathematical operators

## Binary Comparison Operators {#bin-comp-ops}

These are "binary" because they involve _two_ arguments. They operate _elementwise_ on vectors and return _logical vectors_:

```{r, eval=FALSE}
x < y   # less than
x > y   # greater than
x <= y  # less than or equal to
x >= y  # greater than or equal to
x == y  # equal to
x != y  # not equal to
```

`==` is the "_comparison equals_" which tests for equality. (Be careful not to use `=` which, in today's versions of R, is actually interpreted as leftwards assignment.)

### Binary Comparison Examples

* With numeric vectors

```{r}
x <- c(1,2,5)
y <- c(4,4,3)
x == y
x != y
x < y
```

* With strings

```{r}
a <- c("izzy", "jazz", "tyler")
b <- c("devon", "vanessa", "hilary")
a < b  # alphabetical order
```

* Here is a tricky combination of both. Can you parse it?

```{r}
(a < b) <= (x == y)  # trickier...notice the parentheses to force precedence
```

### Binary Comparison Between a Vector and a Scalar

Check this out:

```{r}
x <- 1:10  # the colon operator returns a sequence
x
x <= 3  # compare this to:
x <= c(3,3,3,3,3,3,3,3,3,3)
```

What is going on here?
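One way to check the comparison suggested in the comment above is to ask R whether the two results match. This is only a small sketch; `identical()` and `rep()` are base R functions that have not been introduced in this chapter.

```{r}
x <- 1:10
# rep(3, 10) builds the length-10 vector of threes without typing it out
identical(x <= 3, x <= rep(3, 10))  # do both comparisons give the same logical vector?
```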
### Comparison With Vectors of Different Lengths

Try this one:

```{r}
x <- 1:10
x
x > c(1,7)  # compare this to:
x > c(1,7, 1,7, 1,7, 1,7, 1,7)
```

To understand what is going on here we need to talk about _recycling_.

## Recycling of Vectors in R {#recycling}

A _very, __super-wickedly__ important_ concept: R likes to operate on vectors of the same length, so if it encounters two vectors of different lengths in a binary operation, it merely replicates (_recycles_) the shorter vector until it is the same length as the longer vector, then it does the operation.

If the recycled shorter vector has to be "chopped off" to make it the length of the longer vector (that is, if the longer length is not a whole multiple of the shorter length), you will get a warning, but it will still return a result:

```{r}
x <- c(1,2,3)
y <- c(1,10)
x * y
```

### We Will See Recycling in Many Contexts

Recycling occurs wherever two or more vectors get operated on elementwise, not just with comparison operators. It also happens (as we saw above) with mathematical operators. And it also happens with indexing operators when indexing by logical vectors (you'll see that later)! You gotta know it!

Here are some more examples:

```{r}
x <- 1:20
x * c(1,0)      # turns the even numbers to 0
x * c(0, 0, 1)  # turns non-multiples of 3 to 0 (with a warning: 20 is not a multiple of 3)
x < ((1:4)^2)   # recycling c(1, 4, 9, 16)
```

## Combinations of Comparisons; Logical Ops {#combo-comps}

__A Weather Example__

* Suppose we have two variables, `temp` (in degrees Celsius) and `precip` (in mm), each a vector of length 365.
* Tell me how you test for:
    * All days with temp less than 10 and precip greater than 5
    * Days with temp greater than 15 or with no precip (or both)
    * Days with temp greater than 15 or with no precip (but not both)

### Logical Operators-I

These operate on `logical`s and return `logical`s. `numeric` and `complex` vectors are coerced to `logical` before applying these.
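The coercion rule can be checked directly. Here is a minimal sketch (the vector `v` is made up purely for illustration): zero becomes `FALSE` and every nonzero number becomes `TRUE`, and a logical operator such as `!` then acts on the coerced values.

```{r}
v <- c(0, 1, -2, 0.5)  # hypothetical numeric vector, for illustration only
as.logical(v)          # zero is FALSE; every nonzero value is TRUE
!v                     # negation is applied to the coerced values
```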
* Unary operators (those that operate elementwise on a single vector)
    * `!` Turns `TRUE` to `FALSE` and `FALSE` to `TRUE`

```{r}
x <- c(T, T, F, F)  # you can use abbreviations for TRUE and FALSE...
x
!x
```

### Logical Operators-II

* Binary operators (operate elementwise on two vectors)
    * `&` --- Logical AND
    * `|` --- Logical OR
    * `xor(x,y)` --- Logical EXCLUSIVE OR

```{r}
x <- c(NA, T, F, T, F)
y <- c(T, T, F, F, NA)
x
y
x & y
x | y
xor(x,y)
```

## Mathematical Operators {#math-ops}

These operate on `numeric` or `complex` mode data and return the same:

```{r, eval=FALSE}
x + y    # addition
x - y    # subtraction
x * y    # multiplication
x / y    # division
x ^ y    # exponentiation
x %% y   # modulo division (remainder): 10 %% 3 = 1
x %/% y  # integer division: 10 %/% 3 = 3
```

### Grouping Parts of Expressions

Parentheses are good for ensuring that parts of complex expressions are evaluated in the right order. But, in case you want to appear like a real code jock and don't want to use parentheses, learn the rules of precedence.

### Precedence of Operators We Have Seen

From highest to lowest:

```{r, eval=FALSE}
^                  # exponentiation (right to left)
- +                # unary minus and plus
:                  # sequence operator
* /                # multiply, divide
+ -                # (binary) add, subtract
< > <= >= == !=    # ordering and comparison
!                  # negation
&                  # and
|                  # or
->                 # rightwards assignment
<-                 # assignment (right to left)
=                  # assignment (right to left)
```

Higher precedence operators "stick" more tightly to their arguments. So, for example:

```{r}
x<-3
y<-2
-x * y  # this is like (-x) * y
-x ^ y  # this is like -(x ^ y)
```

### One very important precedence rule

Notice that the `:` has higher precedence than the binary `+`, `-`, `*`, or `/`.
Thus:

```{r}
1:5*3    # this is (1:5)*3
1:(5*3)  # this is 1:15
```

Or, if you want the sequence of numbers from 0 to n-1, be careful:

```{r}
n <- 5
0:n-1    # wrong
0:(n-1)  # right
```

### Built-In Help on Functions and Operators

Recall that `?function_name` returns help (if available) for the function with name `function_name`:

```{r, eval=FALSE}
# examples:
?c
?sum
?mean
```

Built-in help on topics we have discussed today can be found at `?Syntax`, `?Logic`, `?Comparison`, `?Arithmetic`.

Also, all material here is covered in parts of sections 1 through 3 in [intro.pdf](http://cran.r-project.org/doc/manuals/R-intro.pdf) available on CRAN.

## Indexing {#indexing}

There are times when we want to access one or just a few elements from a vector. We've already seen an example of extracting a single element, for example:

```{r}
x <- c("devon", "alicia", "cassie")
x[2]  # this extracts the second element of x
```

Vectors in R are _base-1_ subscripted, i.e., elements are subscripted "1, 2, 3, ..." instead of "0, 1, 2, ..."

### Overview: 4 Ways To Extract from a Vector

Single square brackets are the _indexing_ operators. There are four (common) ways of using the indexing operators. They differ by putting different things inside of the square brackets:

1. A vector of _positive_ indices: `x[c(1,6,4)]`
1. A vector of _negative_ indices: `x[-c(1,6,4)]`
1. A _logical vector_ of the appropriate length: `x[c(T,F,F,T,T)]`
1. A _character vector_ of _names_: `x[c("Sept10","Sept24")]` _if the vector has a_ `names` _attribute_.

Number four should not make sense to you yet!

### Indexing With Positive Integers

* A vector of positive integers extracts the corresponding elements, _in the same order_ and _as many times_ as the indices are listed in the vector

```{r}
x <- c(5,4,7,8)
x[c(4,4,4,2,2,1,3,2)]  # returns a vector of length 8!
```

* If an index exceeds the _length_ of the vector, it returns an `NA` for that element

```{r}
x <- c(5,4,7,8)
x[c(4,1,3,5)]  # the 4th element of the returned vector is NA
```

and gives _no warning_ of this.

### Indexing With Negative Integers

* A vector of negative integers says, "extract everything _except_ these indices."
* The order of the remaining elements is preserved.
* Multiple instances of the same negative integer have the same effect as a single one
* Negative integers exceeding the length of the vector are just ignored

```{r}
x <- c(5,4,7,8)  # here is our vector...
x[-2]
x[-c(2,4)]
x[-c(2,2,2,2,4,4,4,4)]
x[-c(2,4,5,10,18)]
```

You _cannot_ mix positive and negative indices!

### Indexing with Logical Vectors

* You can supply a logical vector that is "parallel" to the vector you want to extract from. Any element where a `TRUE` occurs in the index vector gets returned. Order of elements is preserved and elements can't get replicated.

```{r}
x <- c(5,4,7,8)
x[c(FALSE, TRUE, TRUE, FALSE)]
```

* If the index vector is shorter than the vector being indexed, the index vector is _recycled_

```{r}
x <- c(5,4,7,8)
x[c(FALSE, TRUE)]
```

### Empty Subscript Indexing

* Here is a quirky feature that you should get to know well, as it will help to understand matrix and data.frame subscripting.
* If you apply an empty indexing operator `[]` to a vector, then it returns everything in the vector. Observe:

```{r}
x <- c(5,4,7,8)
x[]
x
```

* "When you give R nothing it gives you everything in return!"

## The Replacement form of Indexing {#replacement-form-indexing}

* Also called the _assignment_ form. Allows you to change specified elements of a vector while leaving the others untouched (_except for mode changes due to coercion!_)
* This usually takes some getting used to, but you will use it all over in R. So get comfortable with it!
```{r}
x <- c(5,4,7,8)
y <- x
x[c(1,3)] <- 0
x
x <- y
x[c(T,F,T,F)] <- 1
x
x <- y
x[-c(1,3)] <- NA
x
x <- y
x[c(1,3)] <- c("a","c")  # coercion of remaining elements
x
x <- y
x[c(3,1,2)] <- c("boing1", "boing2", "boing3")  # note ordering
x
x <- y
x[c(3,1,3,2,2,2)] <- c("boing1", "boing2", "boing3")  # repeated indices: the last assignment to each position wins
x
```

The vector that is being assigned gets _recycled_ as need be to match the length of the (extracted part of the) vector being indexed and assigned to.

### Assignment Beyond the Length of the Vector

* This is allowable when using the replacement form. Intermediate elements are set to `NA`

```{r}
x <- c(5,4,7,8)
length(x)
x[10] <- 12
x
length(x)
```

* Those `NA`s don't get overwritten by recycling. Recycling only occurs to match the length of the vector returned by the indexing operation:

```{r}
x <- c(5,4,7,8)
x[11:19] <- c(-1,0,1)
```
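To see both effects at once, it helps to look at the pieces of the result separately. The following is just a sketch that repeats the assignment above and then inspects the gap and the newly assigned stretch:

```{r}
x <- c(5,4,7,8)
x[11:19] <- c(-1,0,1)  # nine target positions, so the three values are used three times
x[5:10]                # the gap between the old end and position 11 is filled with NA
x[11:19]               # the recycled values: -1 0 1 -1 0 1 -1 0 1
length(x)              # the vector has grown to length 19
```

Only positions 11 through 19 receive recycled values; positions 5 through 10 were never indexed, so they stay `NA`.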