Select a set of predictors with minimal multicollinearity
Source: R/non_collinear_vars.R
Select a set of predictors with minimal multicollinearity, using the variance
inflation factor (VIF) as the criterion to remove collinear variables. The
algorithm will: (i) compute the VIF values from the correlation matrix
of the variables selected in ...; (ii) rank the VIF values and delete the
variable with the highest VIF; and (iii) repeat step (ii) until the highest
VIF value is less than or equal to max_vif.
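The three steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: vif_select() is a hypothetical helper, the VIFs are taken as the diagonal of the inverse correlation matrix, the correlation matrix is assumed to be non-singular, and the built-in mtcars data stand in for a real data set.

```r
# Hypothetical helper (not part of metan): VIF-based backward elimination.
vif_select <- function(data, max_vif) {
  # Keep only the numeric columns.
  vars <- names(data)[vapply(data, is.numeric, logical(1))]
  removed <- character(0)
  repeat {
    r <- stats::cor(data[, vars, drop = FALSE])
    # Step (i): the diagonal of the inverse correlation matrix holds the VIFs.
    # solve() fails if the matrix is singular (perfect collinearity).
    vif <- diag(solve(r))
    names(vif) <- vars
    # Step (iii): stop once the largest VIF is acceptable
    # (or only one variable is left).
    if (max(vif) <= max_vif || length(vars) == 1) break
    # Step (ii): drop the variable with the highest VIF.
    worst <- names(which.max(vif))
    removed <- c(removed, worst)
    vars <- setdiff(vars, worst)
  }
  list(selected = vars, removed = removed, vif = vif)
}

res <- vif_select(mtcars, max_vif = 5)
res$selected
res$removed
round(res$vif, 3)
```

Recomputing the correlation matrix after each removal matters because dropping one variable changes the VIF of every remaining variable.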
Arguments
- .data
The data set containing the variables.
- ...
Variables to be submitted to selection. If ... is null, all the numeric variables from .data are used. Otherwise, it must be a single variable name or a comma-separated list of unquoted variable names.
- max_vif
The maximum value for the Variance Inflation Factor (threshold) that will be accepted in the set of selected predictors.
- missingval
How to deal with missing values. For more information, please see stats::cor().
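To show what the missing-value options do, the sketch below calls stats::cor() directly with three of its documented use values. It assumes that missingval is passed on to the use argument of stats::cor(), which the cross-reference above suggests but does not state. The toy data frame x is made up for illustration.

```r
# Toy data with missing values in columns a and c (illustrative only).
x <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 1, 4, 3, 5),
  c = c(5, 3, NA, 1, 2)
)

# "everything": pairs involving a column with NA yield NA.
m_all <- stats::cor(x, use = "everything")

# "complete.obs": only rows without any NA are used (rows 1, 2 and 4 here).
m_complete <- stats::cor(x, use = "complete.obs")

# "pairwise.complete.obs": each pair uses all rows complete for that pair.
m_pairwise <- stats::cor(x, use = "pairwise.complete.obs")

m_all
m_complete
m_pairwise
```

Pairwise deletion keeps more observations per coefficient, but the resulting matrix is not guaranteed to be positive semi-definite, which can affect quantities derived from it, such as the VIF.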
Value
A data frame showing the number of selected predictors, maximum VIF value, condition number, determinant value, selected predictors and removed predictors from the original set of variables.
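The diagnostics listed above can be approximated from a correlation matrix in base R. This is a sketch, not the package's own code: it assumes the condition number is the ratio of the largest to the smallest eigenvalue of the correlation matrix (other definitions use the square root of that ratio). The mtcars columns are an arbitrary illustrative choice.

```r
# Correlation matrix of a few numeric variables (illustrative choice).
r <- stats::cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Eigenvalues of the symmetric correlation matrix.
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values

diagnostics <- c(
  max_vif          = max(diag(solve(r))),  # largest variance inflation factor
  condition_number = max(ev) / min(ev),    # assumed definition, see lead-in
  determinant      = det(r)                # near 0 signals strong collinearity
)
diagnostics
```

A determinant close to 1 and a small condition number both indicate weakly correlated predictors. A determinant close to 0 indicates that at least one variable is nearly a linear combination of the others.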
Examples
# \donttest{
library(metan)
# All numeric variables
non_collinear_vars(data_ge2)
#>          Parameter                                       values
#> 1       Predictors                                           10
#> 2              VIF                                         7.16
#> 3 Condition Number                                       56.797
#> 4      Determinant                                 0.0008810515
#> 5         Selected PERK, EP, CDED, NKR, PH, NR, TKW, EL, CD, ED
#> 6          Removed                          EH, CL, CW, KW, NKE
# Select variables and set the VIF threshold to 5
non_collinear_vars(data_ge2, EH, CL, CW, KW, NKE, max_vif = 5)
#>          Parameter          values
#> 1       Predictors               4
#> 2              VIF           2.934
#> 3 Condition Number          11.248
#> 4      Determinant    0.2400583901
#> 5         Selected NKE, EH, CL, CW
#> 6          Removed              KW
# }