Performs canonical correlation analysis with collinearity diagnostics, estimation of canonical loadings and canonical scores, and hypothesis testing for the canonical correlation pairs.
Usage
can_corr(
.data,
FG,
SG,
by = NULL,
use = "cor",
test = "Bartlett",
prob = 0.05,
center = TRUE,
stdscores = FALSE,
verbose = TRUE,
collinearity = TRUE
)
Arguments
- .data
The data to be analyzed. It can be a data frame, possibly with grouped data passed from dplyr::group_by().
- FG, SG
A comma-separated list of unquoted variable names that will compose the first (smallest) and second (largest) group of the correlation analysis, respectively. Select helpers are also allowed.
- by
One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable, use that function.
- use
The matrix to be used. Must be one of 'cor' for analysis using the correlation matrix (default) or 'cov' for analysis using the covariance matrix.
- test
The test of significance of the relationship between FG and SG. Must be one of 'Bartlett' (default) or 'Rao'.
- prob
The probability of error assumed. Defaults to 0.05.
- center
Should the data be centered to compute the scores?
- stdscores
Rescale scores to produce scores of unit variance?
- verbose
Logical argument. If TRUE (default), the results are shown in the console.
- collinearity
Logical argument. If TRUE (default), a collinearity diagnostic is performed for each group of variables according to Olivoto et al. (2017).
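The kind of diagnostic reported for each group can be approximated in base R. The following is a minimal sketch, not the package's internal code: it takes the first-group correlation matrix printed in the Examples output and computes the determinant, the variance inflation factors (taken here as the diagonal of the inverse correlation matrix), and a condition number (assumed here to be the ratio of the largest to the smallest eigenvalue). All object names are illustrative.

```r
# First-group correlation matrix, copied from the Examples output
vars <- c("PH", "EH", "EP")
R <- matrix(c(1.0000000, 0.9318282, 0.6384123,
              0.9318282, 1.0000000, 0.8695460,
              0.6384123, 0.8695460, 1.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

det_R <- det(R)                            # values near zero signal collinearity
vif   <- diag(solve(R))                    # VIF per variable (assumed definition)
eig   <- eigen(R, symmetric = TRUE)$values # eigenvalues of the correlation matrix
cn    <- max(eig) / min(eig)               # condition number (assumed definition)

round(c(determinant = det_R, largest_VIF = max(vif), CN = cn), 4)
```

Under these assumptions the three values should come out close to the "Matrix determinant", "Largest VIF", and "CN" lines printed for the first group, up to rounding of the input matrix.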
Value
If .data is grouped data passed from dplyr::group_by(), the results are returned in a list-column of data frames.
- Matrix
The correlation (or covariance) matrix of the variables.
- MFG, MSG
The correlation (or covariance) matrix for the variables of the first group or second group, respectively.
- MFG_SG
The correlation (or covariance) matrix for the variables of the first group with the second group.
- Coef_FG, Coef_SG
Matrix of the canonical coefficients of the first group or second group, respectively.
- Loads_FG, Loads_SG
Matrix of the canonical loadings of the first group or second group, respectively.
- Score_FG, Score_SG
Canonical scores for the variables in FG and SG, respectively.
- Crossload_FG, Crossload_SG
Canonical cross-loadings for FG variables on the SG scores, and cross-loadings for SG variables on the FG scores, respectively.
- SigTest
A data frame with the correlation of the canonical pairs and hypothesis testing results.
- collinearity
A list with the collinearity diagnostic for each group of variables.
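How coefficients, scores, loadings, and cross-loadings relate to one another can be illustrated with stats::cancor() from base R. This is a sketch on the built-in LifeCycleSavings data set, chosen only for illustration. It does not call can_corr(), and the scaling of coefficients and scores may differ from what the package returns.

```r
# Two groups of variables from a built-in data set (illustrative choice)
X <- scale(as.matrix(LifeCycleSavings[, c("pop15", "pop75")]))     # first group
Y <- scale(as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")]))  # second group

cc <- cancor(X, Y)
k  <- length(cc$cor)                      # number of canonical pairs

# Canonical scores: linear combinations of the variables in each group
U <- X %*% cc$xcoef[, 1:k, drop = FALSE]
V <- Y %*% cc$ycoef[, 1:k, drop = FALSE]

# Loadings: correlation of each variable with the scores of its own group
loads_x <- cor(X, U)
loads_y <- cor(Y, V)

# Cross-loadings: correlation of each variable with the other group's scores
crossload_x <- cor(X, V)
crossload_y <- cor(Y, U)

# The correlation within each pair of scores is the canonical correlation
diag(cor(U, V))
cc$cor
```

In this construction each column of the cross-loadings equals the matching column of the loadings multiplied by the canonical correlation of that pair, which is why cross-loadings are never larger in absolute value than the loadings.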
References
Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
cc1 <- can_corr(data_ge2,
                FG = c(PH, EH, EP),
                SG = c(EL, ED, CL, CD, CW, KW, NR))
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between variables of first group (FG)
#> ---------------------------------------------------------------------------
#> PH EH EP
#> PH 1.0000000 0.9318282 0.6384123
#> EH 0.9318282 1.0000000 0.8695460
#> EP 0.6384123 0.8695460 1.0000000
#> ---------------------------------------------------------------------------
#> Collinearity within first group
#> ---------------------------------------------------------------------------
#> The multicollinearity in the matrix should be investigated.
#> CN = 977.586
#> Largest VIF = 229.164618380199
#> Matrix determinant: 0.0025852
#> Largest correlation: PH x EH = 0.932
#> Smallest correlation: PH x EP = 0.638
#> Number of VIFs > 10: 3
#> Number of correlations with r >= |0.8|: 2
#> Variables with largest weight in the last eigenvalues:
#> EH > PH > EP
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between variables of second group (SG)
#> ---------------------------------------------------------------------------
#> EL ED CL CD CW KW NR
#> EL 1.00000000 0.3851451 0.2554068 0.91186526 0.4581728 0.6685601 -0.01387378
#> ED 0.38514512 1.0000000 0.6974629 0.38971282 0.7371305 0.8241426 0.55253448
#> CL 0.25540676 0.6974629 1.0000000 0.30036364 0.7383379 0.4709310 0.26193592
#> CD 0.91186526 0.3897128 0.3003636 1.00000000 0.4840299 0.6259806 -0.03584984
#> CW 0.45817278 0.7371305 0.7383379 0.48402989 1.0000000 0.7348622 0.16565752
#> KW 0.66856012 0.8241426 0.4709310 0.62598062 0.7348622 1.0000000 0.36214470
#> NR -0.01387378 0.5525345 0.2619359 -0.03584984 0.1656575 0.3621447 1.00000000
#> ---------------------------------------------------------------------------
#> Collinearity within second group
#> ---------------------------------------------------------------------------
#> Weak multicollinearity in the matrix
#> CN = 68.376
#> Matrix determinant: 0.0015322
#> Largest correlation: EL x CD = 0.912
#> Smallest correlation: EL x NR = -0.014
#> Number of VIFs > 10: 0
#> Number of correlations with r >= |0.8|: 2
#> Variables with largest weight in the last eigenvalues:
#> KW > ED > EL > CD > CL > CW > NR
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between FG and SG
#> ---------------------------------------------------------------------------
#> EL ED CL CD CW KW NR
#> PH 0.3801960 0.6613148 0.3251648 0.3153910 0.5047388 0.7534439 0.3286065
#> EH 0.3626537 0.6302561 0.3971935 0.2805118 0.5193136 0.7029469 0.2648051
#> EP 0.2634237 0.4580196 0.3908239 0.1750448 0.4248098 0.4974193 0.1404315
#> ---------------------------------------------------------------------------
#> Correlation of the canonical pairs and hypothesis testing
#> ---------------------------------------------------------------------------
#> Var Percent Sum Corr Lambda Chisq DF p_val
#> U1V1 0.6315391 76.189861 76.18986 0.7946943 0.29647 181.76246 21 0.00000
#> U2V2 0.1867300 22.527394 98.71725 0.4321226 0.80462 32.49857 12 0.00116
#> U3V3 0.0106327 1.282745 100.00000 0.1031150 0.98937 1.59810 5 0.90148
#> ---------------------------------------------------------------------------
#> Canonical coefficients of the first group
#> ---------------------------------------------------------------------------
#> U1 U2 U3
#> PH 2.526492 5.866685 7.317151
#> EH -2.436372 -8.263008 -12.447948
#> EP 1.144533 2.747079 6.487414
#> ---------------------------------------------------------------------------
#> Canonical coefficients of the second group
#> ---------------------------------------------------------------------------
#> V1 V2 V3
#> EL -0.00892526 -0.9360837 0.7670684
#> ED 0.19371881 0.2969851 -1.8240876
#> CL -0.08385387 -1.2150642 0.1719827
#> CD -0.30662013 1.1369520 -1.4230311
#> CW -0.15225785 0.1913916 0.4777071
#> KW 1.16752245 -0.1255657 1.1247216
#> NR -0.05865868 0.4861885 0.6223953
#> ---------------------------------------------------------------------------
#> Canonical loads of the first group
#> ---------------------------------------------------------------------------
#> U1 U2 U3
#> PH 0.9868962 -0.07924975 -0.14055369
#> EH 0.9131089 -0.40755395 0.01148369
#> EP 0.6389394 -0.69262240 0.33470980
#> ---------------------------------------------------------------------------
#> Canonical loads of the second group
#> ---------------------------------------------------------------------------
#> V1 V2 V3
#> EL 0.4762839 -0.09829294 -0.22697572
#> ED 0.8298627 -0.16168789 -0.34031848
#> CL 0.3789207 -0.69598199 -0.28635983
#> CD 0.3948013 0.03075542 -0.46981539
#> CW 0.6243739 -0.37712156 -0.14762207
#> KW 0.9566482 -0.05042023 -0.09910729
#> NR 0.4351188 0.29047403 0.18639351
# Canonical correlations for each environment
cc3 <- data_ge2 %>%
  can_corr(FG = c(PH, EH, EP),
           SG = c(EL, ED, CL, CD, CW, KW, NR),
           by = ENV,
           verbose = FALSE)
# }
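The columns of the "Correlation of the canonical pairs and hypothesis testing" table in the output above can be recomputed from the canonical correlations alone. The sketch below uses Bartlett's chi-square approximation for Wilks' lambda. It assumes n = 156 observations in data_ge2 (a value not stated in the output, but consistent with the printed Chisq column) and is not taken from the package source.

```r
r <- c(0.7946943, 0.4321226, 0.1031150)  # 'Corr' column of the printed table
n <- 156   # ASSUMPTION: number of observations in data_ge2
p <- 3     # number of variables in FG
q <- 7     # number of variables in SG

var_k   <- r^2                               # squared canonical correlations
percent <- 100 * var_k / sum(var_k)          # share of each pair
lambda  <- rev(cumprod(rev(1 - var_k)))      # Wilks' lambda for pairs k, k + 1, ...
k       <- seq_along(r) - 1                  # pairs already removed
chisq   <- -(n - 1 - (p + q + 1) / 2) * log(lambda)  # Bartlett's approximation
df      <- (p - k) * (q - k)
p_val   <- pchisq(chisq, df, lower.tail = FALSE)

sig <- data.frame(Var = var_k, Percent = percent, Sum = cumsum(percent),
                  Corr = r, Lambda = lambda, Chisq = chisq, DF = df,
                  p_val = p_val,
                  row.names = c("U1V1", "U2V2", "U3V3"))
round(sig, 5)
```

Under the stated assumption, the Lambda, Chisq, and DF columns should agree with the printed table up to rounding. The first two pairs are significant at prob = 0.05 and the third is not.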