Helper functions for metan
Tiago Olivoto
2023-03-06
Source: vignettes/vignettes_helper.Rmd
See the section Rendering engine to learn how the HTML tables were generated.
Select helpers
The package metan reexports the tidy select helpers and implements its own select helpers based on operations with prefixes and suffixes (difference_var(), intersect_var(), and union_var()), on the length of variable names (width_of(), width_greater_than(), and width_less_than()), and on case type (lower_case_only(), upper_case_only(), and title_case_only()).
Variables that start with a prefix and end with a suffix.
Here, we select the variables from data_ge2 that start with “C” and end with “D”. To keep the output short, only the first three rows are used.
library(metan)
data_sel <- head(data_ge2, 3)
data_sel %>%
select_cols(intersect_var("C", "D")) %>%
print_table()
Variables that start with a prefix OR end with a suffix.
The following code selects variables that start with “C” or end with “D”.
data_sel %>%
select_cols(union_var("C", "D")) %>%
print_table()
Variables that start with a prefix AND do NOT end with a suffix.
The following code selects variables that start with “C” and do not end with “D”.
data_sel %>%
select_cols(difference_var("C", "D")) %>%
print_table()
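The three helpers above correspond to set operations on the names matched by the prefix and the names matched by the suffix. The base R sketch below illustrates that logic only. The column names are illustrative, the matching is case-sensitive (an assumption), and this is not metan's actual implementation.

```r
# Illustrative column names (an assumption, not read from data_ge2)
vars <- c("CL", "CD", "CW", "ED", "CDED", "PH")

# Names matched by the prefix and by the suffix
starts <- vars[startsWith(vars, "C")]
ends   <- vars[endsWith(vars, "D")]

both        <- intersect(starts, ends)  # prefix AND suffix
either      <- union(starts, ends)      # prefix OR suffix
only_prefix <- setdiff(starts, ends)    # prefix AND NOT suffix

print(both)
print(either)
print(only_prefix)
```

With these names, both keeps CD and CDED, while only_prefix keeps CL and CW.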
Selection based on length of column names.
- Select variables with a specific name length (four letters).
data_sel %>%
select_cols(width_of(4)) %>%
print_table()
- Select variables with width less than n (here, n = 3).
data_sel %>%
select_cols(width_less_than(3)) %>%
print_table()
- Select variables with width greater than n (here, n = 2).
data_sel %>%
select_cols(width_greater_than(2)) %>%
print_table()
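The width-based selections above amount to filtering column names by their number of characters. A minimal base R sketch of the same idea, using nchar() on illustrative names, could look like this. It assumes strict inequalities for "less than" and "greater than" and is not metan's own code.

```r
# Illustrative column names (an assumption)
vars <- c("ENV", "GEN", "PH", "CDED", "NKR", "TKW")

width_is_4    <- vars[nchar(vars) == 4]  # exactly four characters
width_below_3 <- vars[nchar(vars) < 3]   # fewer than three characters
width_above_2 <- vars[nchar(vars) > 2]   # more than two characters

print(width_is_4)
print(width_below_3)
print(width_above_2)
```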
Select variables by case type
Let’s create a data frame with ‘messy’ column names.
df <- head(data_ge, 3)
colnames(df) <- c("Env", "gen", "Rep", "GY", "hm")
select_cols(df, lower_case_only()) %>% print_table()
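To make the three case types concrete, the sketch below classifies the same five column names with base R regular expressions. The patterns are assumptions chosen for illustration (letters only, and title case meaning one upper-case letter followed by lower-case letters). metan's helpers may treat digits or other characters differently.

```r
# Column names assigned to df above
nm <- c("Env", "gen", "Rep", "GY", "hm")

lower_only <- nm[grepl("^[[:lower:]]+$", nm)]             # all lower case
upper_only <- nm[grepl("^[[:upper:]]+$", nm)]             # all upper case
title_only <- nm[grepl("^[[:upper:]][[:lower:]]+$", nm)]  # Title case

print(lower_only)
print(upper_only)
print(title_only)
```

Under these patterns, gen and hm are lower case, GY is upper case, and Env and Rep are title case.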
Remove rows or columns with NA values
The functions remove_rows_na() and remove_cols_na() are used to remove rows and columns with NA values, respectively.
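As a rough base R analogue of dropping rows or columns that contain NA, the sketch below uses complete.cases() and colSums(is.na()). The data frame df_na is made up for illustration, and the code only mirrors the idea, not the internals of the metan functions.

```r
# Hypothetical data frame with a single missing value in column a
df_na <- data.frame(a = c(1, NA, 3),
                    b = c("x", "y", "z"),
                    c = c(4, 5, 6))

# Keep only rows without any NA
rows_ok <- df_na[complete.cases(df_na), , drop = FALSE]

# Keep only columns without any NA
cols_ok <- df_na[, colSums(is.na(df_na)) == 0, drop = FALSE]

print(rows_ok)
print(cols_ok)
```

Here the row-wise version drops the second row, and the column-wise version drops column a.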
Bind cross-validation objects
Split a dataframe into subsets grouped by one or more factors
Group data and exclude all non-numeric variables
g1 <- split_factors(data_ge, ENV)
is.split_factors(g1)
# [1] TRUE
Group data and keep all original variables
g2 <- split_factors(data_ge, ENV, GEN, keep_factors = TRUE)
print_table(g2[[1]])
Group a data frame using all factor variables
g3 <- as.split_factors(CO2)
names(g3)
# [1] "Qn1 | Quebec | nonchilled" "Qn2 | Quebec | nonchilled"
# [3] "Qn3 | Quebec | nonchilled" "Qc1 | Quebec | chilled"
# [5] "Qc3 | Quebec | chilled" "Qc2 | Quebec | chilled"
# [7] "Mn3 | Mississippi | nonchilled" "Mn2 | Mississippi | nonchilled"
# [9] "Mn1 | Mississippi | nonchilled" "Mc2 | Mississippi | chilled"
# [11] "Mc3 | Mississippi | chilled" "Mc1 | Mississippi | chilled"
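The grouping by all factor variables can be approximated in base R with split(). The sketch below is an analogue under stated assumptions, not metan's implementation: it finds the factor columns of CO2, splits the remaining (numeric) columns by their combinations, and uses " | " as the separator so the group names resemble those printed above. The order of the groups may differ from the metan output.

```r
# Factor columns of the built-in CO2 data (Plant, Type, Treatment)
fac <- names(CO2)[vapply(CO2, is.factor, logical(1))]

# Split the non-factor columns by every observed factor combination
groups <- split(CO2[setdiff(names(CO2), fac)],
                CO2[fac],
                sep = " | ",
                drop = TRUE)  # drop combinations that never occur

length(groups)
head(names(groups), 3)
```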