library(tidyverse)
set.seed(1234)
theme_set(theme_minimal())
for
loopsFunctions are one method of reducing duplication in your code. Another tool for reducing duplication is iteration, which lets you do the same thing to multiple inputs.
df <- tibble(
a = rnorm(10),
b = rnorm(10),
c = rnorm(10),
d = rnorm(10)
)
Let’s say we want to compute the median for each column.
median(df$a)
## [1] -0.5555419
median(df$b)
## [1] -0.4941011
median(df$c)
## [1] -0.4656169
median(df$d)
## [1] -0.605349
Boo! We’ve copied-and-pasted median()
three times. We don’t want to do that. Instead, we can use a for
loop:
output <- vector(mode = "double", length = ncol(df))
for (i in seq_along(df)) {
output[[i]] <- median(df[[i]])
}
output
## [1] -0.5555419 -0.4941011 -0.4656169 -0.6053490
Let’s review the three components of every for
loop.
output <- vector("double", length = ncol(df))
Before you start a loop, you need to create an empty vector to store the output of the loop. Preallocating space for your output is much more efficient than creating space as you move through the loop. The vector()
function allows you to create an empty vector of any type. The first argument mode
defines the type of vector (“logical”, “integer”, “double”, “character”, etc.) and the second argument length
defines the length of the vector.
Numeric vectors are initialized to 0
, logical vectors are initialized to FALSE
, character vectors are initialized to ""
, and list elements to NULL
.
vector(mode = "double", length = ncol(df))
## [1] 0 0 0 0
vector(mode = "logical", length = ncol(df))
## [1] FALSE FALSE FALSE FALSE
vector(mode = "character", length = ncol(df))
## [1] "" "" "" ""
vector(mode = "list", length = ncol(df))
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
i in seq_along(df)
This component determines what to loop over. During each iteration through the for
loop, a new value will be assigned to i
based on the the defined sequence. Here, the sequence is seq_along(df)
which creates a numeric vector for a sequence of numbers beginning at 1 and continuing until it reaches the length of df
(the length here is the number of columns in df
).
seq_along(df)
## [1] 1 2 3 4
output[[i]] <- median(df[[i]])
This is the code that actually performs the desired calculations. It runs multiple times for every value of i
. We use [[
notation to reference each column of df
and store it in the appropriate element in output
.
If you don’t preallocate space for the output, each time the for
loop iterates, it makes a copy of the output and appends the new value at the end. Copying data takes time and memory. If the output is preallocated space, the loop simply fills in the slots with the correct values.
x <- rnorm(1000, mean = 0, sd = 1)
str(x)
## num [1:1000] 1.449 -1.069 -0.855 -0.281 -0.994 ...
# load microbenchmark library to time code
library(microbenchmark)
microbenchmark(
# don't preallocate
`No preallocation` = {
output <- vector("numeric", 0)
for (i in seq_along(x)) {
output <- c(output, x[[i]] + 1)
}
},
# preallocate
`Preallocation` = {
output <- vector("numeric", length(x))
for (i in seq_along(x)) {
output[[i]] <- x[[i]] + 1
}
}) %>%
autoplot +
scale_y_log10(breaks = c(2, 4, 8, 16, 32)) +
labs(y = "Time [milliseconds]")
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
Here, preallocating space for the vector cuts the computation time by a factor of 3.
for
loopmtcars
Write a for
loop that calculates the arithmetic mean for every column in mtcars
.
as_tibble(mtcars)
## # A tibble: 32 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## # ... with 22 more rows
Before you write the for
loop, identify the three components you need:
i in seq_along(mtcars)
mean()
of the \(i\)th column, store the new value as the \(i\)th element of the vector output
output <- vector("numeric", ncol(mtcars))
for(i in seq_along(mtcars)){
output[[i]] <- mean(mtcars[[i]], na.rm = TRUE)
}
output
## [1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
## [7] 17.848750 0.437500 0.406250 3.687500 2.812500
diamonds
Write a for
loop that calculates the maximum value in each column of diamonds
.
diamonds
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
## # ... with 53,930 more rows
Before you write the for
loop, identify the three components you need:
i in seq_along(diamonds)
diamonds
, store the new value as the \(i\)th element of the list output
output <- vector("numeric", ncol(diamonds))
for(i in seq_along(diamonds)){
output[i] <- max(diamonds[[i]])
}
output
## [1] 5.01 5.00 7.00 8.00 79.00 95.00 18823.00
## [8] 10.74 58.90 31.80
To preserve the name attributes from diamonds
, use the names()
function to extract the names of each column in diamonds
and apply them as the names to the output
vector:
# get the names of the columns in diamonds
names(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
# assign the names of the diamonds columns to output
names(output) <- names(diamonds)
output
## carat cut color clarity depth table price x
## 5.01 5.00 7.00 8.00 79.00 95.00 18823.00 10.74
## y z
## 58.90 31.80
Notice that all the columns have a maximum value, even the apparently text-based columns. This is because
cut
,color
, andclarity
are all stored as factor columns. Remember that factor vectors are built on top of integers, so the underlying values are numeric. As a result we can applymax()
to a factor vector and still retrieve a (partially) meaningful result.
You will frequently need to iterate over vectors or data frames, perform an operation on each element, and save the results somewhere. for
loops are not the devil. At first, they may seem more intuitive to use because you are explicitly identifying each component of the process. However the downside is that they focus on a lot of non-essential stuff. You have to track the value on which you are iterating, you need to explicitly create a vector to store the output, you have to assign the output of each iteration to the appropriate element in the output vector, etc.
tidyverse
is all about focusing on verbs, not nouns. That is, focus on the operation being performed (e.g. mean()
, median()
, max()
), not all the extra code needed to make the operation work. The purrr
library provides a family of functions that mirrors for
loops. They:
There is one function for each type of output:
map()
makes a list.map_lgl()
makes a logical vector.map_int()
makes an integer vector.map_dbl()
makes a double vector.map_chr()
makes a character vector.map_dbl(df, mean)
## a b c d
## -0.3831574 -0.1181707 -0.3879468 -0.7661931
map_dbl(df, median)
## a b c d
## -0.5555419 -0.4941011 -0.4656169 -0.6053490
map_dbl(df, sd)
## a b c d
## 0.9957875 1.0673376 0.6660013 0.8942458
Just like all of our functions in the tidyverse
, the first argument is always the data object, and the second argument is the function to be applied. Additional arguments for the function to be applied can be specified like this:
map_dbl(df, mean, na.rm = TRUE)
## a b c d
## -0.3831574 -0.1181707 -0.3879468 -0.7661931
Or data can be piped:
df %>%
map_dbl(mean, na.rm = TRUE)
## a b c d
## -0.3831574 -0.1181707 -0.3879468 -0.7661931
for
loops using a map()
functionmtcars
Write a map()
function that calculates the arithmetic mean for every column in mtcars
.
as_tibble(mtcars)
## # A tibble: 32 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## # ... with 22 more rows
map_dbl(mtcars, mean)
## mpg cyl disp hp drat wt
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
## qsec vs am gear carb
## 17.848750 0.437500 0.406250 3.687500 2.812500
diamonds
Write a map()
function that calculates the maximum value in each column of diamonds
.
diamonds
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
## # ... with 53,930 more rows
map_dbl(diamonds, max)
## carat cut color clarity depth table price x
## 5.01 5.00 7.00 8.00 79.00 95.00 18823.00 10.74
## y z
## 58.90 31.80
Frequently when working with data frames you may wish to apply a specific function to multiple columns. For instance, calculating the average value of each column in mtcars
. If we wanted to calculate the average of a single column, it would be pretty simple using just tidyverse
functions:
mtcars %>%
summarize(mpg = mean(mpg))
## mpg
## 1 20.09062
If we want to calculate the mean for all of them, we’d have to duplicate this code many times over:
mtcars %>%
summarize(mpg = mean(mpg),
cyl = mean(cyl),
disp = mean(disp),
hp = mean(hp),
drat = mean(drat),
wt = mean(wt),
qsec = mean(qsec),
vs = mean(vs),
am = mean(am),
gear = mean(gear),
carb = mean(carb))
## mpg cyl disp hp drat wt qsec vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
## am gear carb
## 1 0.40625 3.6875 2.8125
But this is very repetitive and prone to mistakes - I cannot tell you how many typos I originally had in this code when I first wrote it. We’ve seen how to use loops and map()
functions to solve this task - let’s check out one final method: scoped verbs.
Scoped verbs allow you to use standard verbs (or functions) in dplyr
that affect multiple variables at once, combining both elements of repetition and (in some cases) conditional expressions:
_if
allows you to pick variables based on a predicate function like is.numeric()
or is.character()
_at
allows you to pick variables using the same syntax as select()
_all
operates on all variablessummarize_all()
summarize_all()
takes a tibble and a function and applies that function to each column:
summarize_all(mtcars, .funs = mean)
## mpg cyl disp hp drat wt qsec vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
## am gear carb
## 1 0.40625 3.6875 2.8125
If you want to apply multiple summaries, use the funs()
helper:
summarize_all(mtcars, .funs = funs(min, max))
## mpg_min cyl_min disp_min hp_min drat_min wt_min qsec_min vs_min am_min
## 1 10.4 4 71.1 52 2.76 1.513 14.5 0 0
## gear_min carb_min mpg_max cyl_max disp_max hp_max drat_max wt_max
## 1 3 1 33.9 8 472 335 4.93 5.424
## qsec_max vs_max am_max gear_max carb_max
## 1 22.9 1 1 5 8
And as always, you could combine this with group_by()
to calculate group-level summary statistics:
mtcars %>%
group_by(gear) %>%
summarize_all(.funs = mean)
## # A tibble: 3 x 11
## gear mpg cyl disp hp drat wt qsec vs am carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3. 16.1 7.47 326. 176. 3.13 3.89 17.7 0.200 0. 2.67
## 2 4. 24.5 4.67 123. 89.5 4.04 2.62 19.0 0.833 0.667 2.33
## 3 5. 21.4 6.00 202. 196. 3.92 2.63 15.6 0.200 1.00 4.40
summarize_at()
summarize_at()
allows you to pick columns in the same way as select()
, that is, based on their names. There is one small difference: you need to wrap the complete selection with the vars()
helper (this avoids ambiguity).
summarize_at(mtcars, .vars = vars(-mpg), .funs = mean)
## cyl disp hp drat wt qsec vs am gear
## 1 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875
## carb
## 1 2.8125
By default, the newly created columns have the shortest names needed to uniquely identify the output.
summarize_at(mtcars, .vars = vars(mpg), .funs = funs(min, max))
## min max
## 1 10.4 33.9
summarize_at(mtcars, .vars = vars(mpg, wt), .funs = min)
## mpg wt
## 1 10.4 1.513
summarize_at(mtcars, .vars = vars(-mpg), .funs = funs(min, max))
## cyl_min disp_min hp_min drat_min wt_min qsec_min vs_min am_min gear_min
## 1 4 71.1 52 2.76 1.513 14.5 0 0 3
## carb_min cyl_max disp_max hp_max drat_max wt_max qsec_max vs_max am_max
## 1 1 8 472 335 4.93 5.424 22.9 1 1
## gear_max carb_max
## 1 5 8
summarize_if()
summarize_at()
allows you to pick variables to summarize based on their name, whereas summarize_if()
allows you to pick variables to summarize based on some property of the column. Typically this is their type because you want to (e.g.) apply a numeric summary function only to numeric columns:
starwars
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke Sk… 172 77. blond fair blue 19.0 male
## 2 C-3PO 167 75. <NA> gold yellow 112. <NA>
## 3 R2-D2 96 32. <NA> white, bl… red 33.0 <NA>
## 4 Darth V… 202 136. none white yellow 41.9 male
## 5 Leia Or… 150 49. brown light brown 19.0 female
## 6 Owen La… 178 120. brown, gr… light blue 52.0 male
## 7 Beru Wh… 165 75. brown light blue 47.0 female
## 8 R5-D4 97 32. <NA> white, red red NA <NA>
## 9 Biggs D… 183 84. black light brown 24.0 male
## 10 Obi-Wan… 182 77. auburn, w… fair blue-gray 57.0 male
## # ... with 77 more rows, and 5 more variables: homeworld <chr>,
## # species <chr>, films <list>, vehicles <list>, starships <list>
starwars %>%
group_by(species) %>%
summarize_if(.predicate = is.numeric, .funs = mean, na.rm = TRUE)
## # A tibble: 38 x 4
## species height mass birth_year
## <chr> <dbl> <dbl> <dbl>
## 1 Aleena 79.0 15.0 NaN
## 2 Besalisk 198. 102. NaN
## 3 Cerean 198. 82.0 92.0
## 4 Chagrian 196. NaN NaN
## 5 Clawdite 168. 55.0 NaN
## 6 Droid 140. 69.8 53.3
## 7 Dug 112. 40.0 NaN
## 8 Ewok 88.0 20.0 8.00
## 9 Geonosian 183. 80.0 NaN
## 10 Gungan 209. 74.0 52.0
## # ... with 28 more rows
(Note that na.rm = TRUE
is passed on to mean()
in the same way as in purrr::map()
.)
mutate_all()
, mutate_if()
and mutate_at()
work in a similar way to their summarize equivalents.
mutate_all(mtcars, .funs = log10)
## mpg cyl disp hp drat wt qsec vs
## 1 1.322219 0.7781513 2.204120 2.041393 0.5910646 0.4183013 1.216430 -Inf
## 2 1.322219 0.7781513 2.204120 2.041393 0.5910646 0.4586378 1.230960 -Inf
## 3 1.357935 0.6020600 2.033424 1.968483 0.5854607 0.3654880 1.269746 0
## 4 1.330414 0.7781513 2.411620 2.041393 0.4885507 0.5071810 1.288696 0
## 5 1.271842 0.9030900 2.556303 2.243038 0.4983106 0.5365584 1.230960 -Inf
## 6 1.257679 0.7781513 2.352183 2.021189 0.4409091 0.5390761 1.305781 0
## 7 1.155336 0.9030900 2.556303 2.389166 0.5065050 0.5526682 1.199755 -Inf
## 8 1.387390 0.6020600 2.166430 1.792392 0.5670264 0.5037907 1.301030 0
## 9 1.357935 0.6020600 2.148603 1.977724 0.5932861 0.4983106 1.359835 0
## 10 1.283301 0.7781513 2.224274 2.089905 0.5932861 0.5365584 1.262451 0
## 11 1.250420 0.7781513 2.224274 2.089905 0.5932861 0.5365584 1.276462 0
## 12 1.214844 0.9030900 2.440594 2.255273 0.4871384 0.6095944 1.240549 -Inf
## 13 1.238046 0.9030900 2.440594 2.255273 0.4871384 0.5717088 1.245513 -Inf
## 14 1.181844 0.9030900 2.440594 2.255273 0.4871384 0.5774918 1.255273 -Inf
## 15 1.017033 0.9030900 2.673942 2.311754 0.4668676 0.7201593 1.254790 -Inf
## 16 1.017033 0.9030900 2.662758 2.332438 0.4771213 0.7343197 1.250908 -Inf
## 17 1.167317 0.9030900 2.643453 2.361728 0.5092025 0.7279477 1.241048 -Inf
## 18 1.510545 0.6020600 1.895975 1.819544 0.6106602 0.3424227 1.289366 0
## 19 1.482874 0.6020600 1.879096 1.716003 0.6928469 0.2081725 1.267641 0
## 20 1.530200 0.6020600 1.851870 1.812913 0.6253125 0.2636361 1.298853 0
## 21 1.332438 0.6020600 2.079543 1.986772 0.5682017 0.3918169 1.301247 0
## 22 1.190332 0.9030900 2.502427 2.176091 0.4409091 0.5465427 1.227115 -Inf
## 23 1.181844 0.9030900 2.482874 2.176091 0.4983106 0.5359267 1.238046 -Inf
## 24 1.123852 0.9030900 2.544068 2.389166 0.5717088 0.5843312 1.187803 -Inf
## 25 1.283301 0.9030900 2.602060 2.243038 0.4885507 0.5848963 1.231724 -Inf
## 26 1.436163 0.6020600 1.897627 1.819544 0.6106602 0.2866810 1.276462 0
## 27 1.414973 0.6020600 2.080266 1.959041 0.6464037 0.3304138 1.222716 -Inf
## 28 1.482874 0.6020600 1.978181 2.053078 0.5763414 0.1798389 1.227887 0
## 29 1.198657 0.9030900 2.545307 2.421604 0.6253125 0.5010593 1.161368 -Inf
## 30 1.294466 0.7781513 2.161368 2.243038 0.5587086 0.4424798 1.190332 -Inf
## 31 1.176091 0.9030900 2.478566 2.525045 0.5490033 0.5526682 1.164353 -Inf
## 32 1.330414 0.6020600 2.082785 2.037426 0.6138418 0.4440448 1.269513 0
## am gear carb
## 1 0 0.6020600 0.6020600
## 2 0 0.6020600 0.6020600
## 3 0 0.6020600 0.0000000
## 4 -Inf 0.4771213 0.0000000
## 5 -Inf 0.4771213 0.3010300
## 6 -Inf 0.4771213 0.0000000
## 7 -Inf 0.4771213 0.6020600
## 8 -Inf 0.6020600 0.3010300
## 9 -Inf 0.6020600 0.3010300
## 10 -Inf 0.6020600 0.6020600
## 11 -Inf 0.6020600 0.6020600
## 12 -Inf 0.4771213 0.4771213
## 13 -Inf 0.4771213 0.4771213
## 14 -Inf 0.4771213 0.4771213
## 15 -Inf 0.4771213 0.6020600
## 16 -Inf 0.4771213 0.6020600
## 17 -Inf 0.4771213 0.6020600
## 18 0 0.6020600 0.0000000
## 19 0 0.6020600 0.3010300
## 20 0 0.6020600 0.0000000
## 21 -Inf 0.4771213 0.0000000
## 22 -Inf 0.4771213 0.3010300
## 23 -Inf 0.4771213 0.3010300
## 24 -Inf 0.4771213 0.6020600
## 25 -Inf 0.4771213 0.3010300
## 26 0 0.6020600 0.0000000
## 27 0 0.6989700 0.3010300
## 28 0 0.6989700 0.3010300
## 29 0 0.6989700 0.6020600
## 30 0 0.6989700 0.7781513
## 31 0 0.6989700 0.9030900
## 32 0 0.6020600 0.3010300
filter_all()
is the most useful of the three filter()
variants. You use it conjunction with all_vars()
or any_vars()
depending on whether or not you want rows where all variables meet the criterion, or where just one variable meets it.
It’s particularly useful finding missing values:
library(nycflights13)
# Rows where any value is missing
filter_all(weather, .vars_predicate = any_vars(is.na(.)))
## # A tibble: 3,109 x 15
## origin year month day hour temp dewp humid wind_dir wind_speed
## <chr> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 EWR 2013. 1. 1 17 39.2 28.4 64.9 270. 16.1
## 2 EWR 2013. 1. 1 18 39.2 28.4 64.9 330. 15.0
## 3 EWR 2013. 1. 3 16 30.9 14.0 49.0 NA 4.60
## 4 EWR 2013. 1. 6 10 33.8 30.2 86.5 210. 4.60
## 5 EWR 2013. 1. 6 12 33.8 32.0 93.0 220. 9.21
## 6 EWR 2013. 1. 6 13 35.6 32.0 86.6 240. 8.06
## 7 EWR 2013. 1. 6 14 35.6 32.0 86.6 230. 8.06
## 8 EWR 2013. 1. 6 15 37.4 30.2 75.0 250. 11.5
## 9 EWR 2013. 1. 11 17 45.0 36.0 70.5 NA 4.60
## 10 EWR 2013. 1. 11 21 46.4 39.2 75.8 90. 4.60
## # ... with 3,099 more rows, and 5 more variables: wind_gust <dbl>,
## # precip <dbl>, pressure <dbl>, visib <dbl>, time_hour <dttm>
# Rows where all wind variables are missing
filter_at(weather, .vars = vars(starts_with("wind")),
.vars_predicate = all_vars(is.na(.)))
## # A tibble: 3 x 15
## origin year month day hour temp dewp humid wind_dir wind_speed
## <chr> <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 EWR 2013. 3. 27 21 52.0 19.0 27.0 NA NA
## 2 JFK 2013. 7. 4 10 73.0 71.1 93.5 NA NA
## 3 JFK 2013. 7. 20 10 81.0 71.1 71.9 NA NA
## # ... with 5 more variables: wind_gust <dbl>, precip <dbl>,
## # pressure <dbl>, visib <dbl>, time_hour <dttm>
devtools::session_info()
## Session info -------------------------------------------------------------
## setting value
## version R version 3.4.3 (2017-11-30)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Chicago
## date 2018-04-18
## Packages -----------------------------------------------------------------
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.3 2017-12-07 local
## bindr 0.1.1 2018-03-13 CRAN (R 3.4.3)
## bindrcpp 0.2.2.9000 2018-04-08 Github (krlmlr/bindrcpp@bd5ae73)
## broom 0.4.4 2018-03-29 CRAN (R 3.4.3)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.4.2)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
## compiler 3.4.3 2017-12-07 local
## crayon 1.3.4 2017-10-03 Github (gaborcsardi/crayon@b5221ab)
## datasets * 3.4.3 2017-12-07 local
## devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
## digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
## dplyr * 0.7.4.9003 2018-04-08 Github (tidyverse/dplyr@b7aaa95)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.4.3)
## foreign 0.8-69 2017-06-22 CRAN (R 3.4.3)
## ggplot2 * 2.2.1.9000 2018-04-10 Github (tidyverse/ggplot2@3c9c504)
## glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
## graphics * 3.4.3 2017-12-07 local
## grDevices * 3.4.3 2017-12-07 local
## grid 3.4.3 2017-12-07 local
## gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
## haven 1.1.1 2018-01-18 CRAN (R 3.4.3)
## hms 0.4.2 2018-03-10 CRAN (R 3.4.3)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
## jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
## knitr 1.20 2018-02-20 CRAN (R 3.4.3)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.3)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.4.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
## methods * 3.4.3 2017-12-07 local
## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.0)
## modelr 0.1.1 2017-08-10 local
## munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.4.4)
## parallel 3.4.3 2017-12-07 local
## pillar 1.2.1 2018-02-27 CRAN (R 3.4.3)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.0)
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
## psych 1.8.3.3 2018-03-30 CRAN (R 3.4.4)
## purrr * 0.2.4 2017-10-18 CRAN (R 3.4.2)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
## Rcpp 0.12.16 2018-03-13 CRAN (R 3.4.4)
## readr * 1.1.1 2017-05-16 CRAN (R 3.4.0)
## readxl 1.0.0 2017-04-18 CRAN (R 3.4.0)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.4.3)
## rlang 0.2.0.9001 2018-04-10 Github (r-lib/rlang@70d2d40)
## rmarkdown 1.9 2018-03-01 CRAN (R 3.4.3)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.4.1)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.0)
## scales 0.5.0.9000 2018-04-10 Github (hadley/scales@d767915)
## stats * 3.4.3 2017-12-07 local
## stringi 1.1.7 2018-03-12 CRAN (R 3.4.3)
## stringr * 1.3.0 2018-02-19 CRAN (R 3.4.3)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.4.3)
## tidyr * 0.8.0 2018-01-29 CRAN (R 3.4.3)
## tidyselect 0.2.4 2018-02-26 CRAN (R 3.4.3)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.2)
## tools 3.4.3 2017-12-07 local
## utils * 3.4.3 2017-12-07 local
## withr 2.1.2 2018-04-10 Github (jimhester/withr@79d7b0d)
## xml2 1.2.0 2018-01-24 CRAN (R 3.4.3)
## yaml 2.1.18 2018-03-08 CRAN (R 3.4.4)
This work is licensed under the CC BY-NC 4.0 Creative Commons License.