Canonical correlation analysis — can

Performs canonical correlation analysis with collinearity diagnostics, estimation of canonical loadings and canonical scores, and hypothesis testing for the canonical pairs.

Usage

can_corr(
  .data,
  FG,
  SG,
  by = NULL,
  use = "cor",
  test = "Bartlett",
  prob = 0.05,
  center = TRUE,
  stdscores = FALSE,
  verbose = TRUE,
  collinearity = TRUE
)

Arguments

.data: The data to be analyzed. It can be a data frame, possibly with grouped data passed from dplyr::group_by().
FG, SG: A comma-separated list of unquoted variable names that will compose the first (smallest) and second (largest) group of the correlation analysis, respectively. Select helpers are also allowed.
by: One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable use that function.
use: The matrix to be used. Must be one of 'cor' for analysis using the correlation matrix (default) or 'cov' for analysis using the covariance matrix.
test: The test of significance of the relationship between FG and SG. Must be one of 'Bartlett' (default) or 'Rao'.
prob: The probability of error assumed. Defaults to 0.05.
center: Should the data be centered to compute the scores?
stdscores: Rescale scores to produce scores of unit variance?
verbose: Logical argument. If TRUE (default) then the results are shown in the console.
collinearity: Logical argument. If TRUE (default) then a collinearity diagnostic is performed for each group of variables according to Olivoto et al. (2017).
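
The Lambda, Chisq, DF, and p_val columns printed for test = "Bartlett" in the Examples output can be reproduced by hand from the canonical correlations. The base R sketch below uses the usual chi-square approximation based on Wilks' lambda. The correlations are copied from the Corr column of that output, and the sample size n = 156 is an assumption about the number of rows in data_ge2. It is an illustration of the arithmetic, not necessarily the code path used inside can_corr().

```r
# Canonical correlations copied from the Corr column of the Examples output
r <- c(0.7946943, 0.4321226, 0.1031150)
p <- 3    # number of variables in FG
q <- 7    # number of variables in SG
n <- 156  # ASSUMPTION: number of observations (rows) in data_ge2

k <- seq_along(r)

# Wilks' lambda for testing pairs k, k + 1, ..., s jointly:
# the product of (1 - r_i^2) over the remaining pairs
lambda <- rev(cumprod(rev(1 - r^2)))

# Bartlett's chi-square approximation and its degrees of freedom
chisq <- -(n - 1 - (p + q + 1) / 2) * log(lambda)
df    <- (p - k + 1) * (q - k + 1)
p_val <- pchisq(chisq, df, lower.tail = FALSE)

bartlett <- data.frame(Lambda = lambda, Chisq = chisq, DF = df, p_val = p_val,
                       row.names = c("U1V1", "U2V2", "U3V3"))
print(round(bartlett, 5))
```

The resulting table can be compared row by row with the "Correlation of the canonical pairs and hypothesis testing" block in Examples.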

Value

If .data is grouped data passed from dplyr::group_by(), the results are returned in a list-column of data frames.

Matrix The correlation (or covariance) matrix of the variables
MFG, MSG The correlation (or covariance) matrix for the variables of the first group or second group, respectively.
MFG_SG The correlation (or covariance) matrix for the variables of the first group with the second group.
Coef_FG, Coef_SG Matrix of the canonical coefficients of the first group or second group, respectively.
Loads_FG, Loads_SG Matrix of the canonical loadings of the first group or second group, respectively.
Score_FG, Score_SG Canonical scores for the variables in FG and SG, respectively.
Crossload_FG, Crossload_SG Canonical cross-loadings for FG variables on the SG scores, and cross-loadings for SG variables on the FG scores, respectively.
SigTest A data frame with the correlation of the canonical pairs and hypothesis testing results.
collinearity A list with the collinearity diagnostic for each group of variables.
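
To show how the returned matrices relate to one another, the base R sketch below rebuilds the first-group loadings from the first-group correlation matrix and canonical coefficients. All numbers are copied from the Examples output (default use = "cor"). The object names simply mirror the MFG, Coef_FG, and Loads_FG elements listed above. The relationship Loads_FG = MFG %*% Coef_FG is checked here against the printed values only; it is not taken from the package source.

```r
vars <- c("PH", "EH", "EP")

# Correlation matrix of the first group (MFG), copied from the Examples output
MFG <- matrix(c(1.0000000, 0.9318282, 0.6384123,
                0.9318282, 1.0000000, 0.8695460,
                0.6384123, 0.8695460, 1.0000000),
              nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Canonical coefficients of the first group (Coef_FG), copied from the output
Coef_FG <- matrix(c( 2.526492,  5.866685,   7.317151,
                    -2.436372, -8.263008, -12.447948,
                     1.144533,  2.747079,   6.487414),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(vars, c("U1", "U2", "U3")))

# Loadings: correlation of each original variable with each canonical variate
Loads_FG <- MFG %*% Coef_FG
print(round(Loads_FG, 4))
```

The product can be compared with the "Canonical loads of the first group" block in Examples. Note how the large coefficients of opposite sign (for instance PH and EH on U1) still give moderate loadings, which is consistent with the strong collinearity reported for the first group.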

References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples

# \donttest{
library(metan)

cc1 <- can_corr(data_ge2,
               FG = c(PH, EH, EP),
               SG = c(EL, ED, CL, CD, CW, KW, NR))
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between variables of first group (FG)
#> ---------------------------------------------------------------------------
#>           PH        EH        EP
#> PH 1.0000000 0.9318282 0.6384123
#> EH 0.9318282 1.0000000 0.8695460
#> EP 0.6384123 0.8695460 1.0000000
#> ---------------------------------------------------------------------------
#> Collinearity within first group 
#> ---------------------------------------------------------------------------
#> The multicollinearity in the matrix should be investigated.
#> CN = 977.586
#> Largest VIF = 229.164618380199
#> Matrix determinant: 0.0025852 
#> Largest correlation: PH x EH = 0.932 
#> Smallest correlation: PH x EP = 0.638 
#> Number of VIFs > 10: 3 
#> Number of correlations with r >= |0.8|: 2 
#> Variables with largest weight in the last eigenvalues: 
#> EH > PH > EP 
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between variables of second group (SG)
#> ---------------------------------------------------------------------------
#>             EL        ED        CL          CD        CW        KW          NR
#> EL  1.00000000 0.3851451 0.2554068  0.91186526 0.4581728 0.6685601 -0.01387378
#> ED  0.38514512 1.0000000 0.6974629  0.38971282 0.7371305 0.8241426  0.55253448
#> CL  0.25540676 0.6974629 1.0000000  0.30036364 0.7383379 0.4709310  0.26193592
#> CD  0.91186526 0.3897128 0.3003636  1.00000000 0.4840299 0.6259806 -0.03584984
#> CW  0.45817278 0.7371305 0.7383379  0.48402989 1.0000000 0.7348622  0.16565752
#> KW  0.66856012 0.8241426 0.4709310  0.62598062 0.7348622 1.0000000  0.36214470
#> NR -0.01387378 0.5525345 0.2619359 -0.03584984 0.1656575 0.3621447  1.00000000
#> ---------------------------------------------------------------------------
#> Collinearity within second group 
#> ---------------------------------------------------------------------------
#> Weak multicollinearity in the matrix
#> CN = 68.376
#> Matrix determinant: 0.0015322 
#> Largest correlation: EL x CD = 0.912 
#> Smallest correlation: EL x NR = -0.014 
#> Number of VIFs > 10: 0 
#> Number of correlations with r >= |0.8|: 2 
#> Variables with largest weight in the last eigenvalues: 
#> KW > ED > EL > CD > CL > CW > NR 
#> ---------------------------------------------------------------------------
#> Matrix (correlation/covariance) between FG and SG
#> ---------------------------------------------------------------------------
#>           EL        ED        CL        CD        CW        KW        NR
#> PH 0.3801960 0.6613148 0.3251648 0.3153910 0.5047388 0.7534439 0.3286065
#> EH 0.3626537 0.6302561 0.3971935 0.2805118 0.5193136 0.7029469 0.2648051
#> EP 0.2634237 0.4580196 0.3908239 0.1750448 0.4248098 0.4974193 0.1404315
#> ---------------------------------------------------------------------------
#> Correlation of the canonical pairs and hypothesis testing 
#> ---------------------------------------------------------------------------
#>            Var   Percent       Sum      Corr  Lambda     Chisq DF   p_val
#> U1V1 0.6315391 76.189861  76.18986 0.7946943 0.29647 181.76246 21 0.00000
#> U2V2 0.1867300 22.527394  98.71725 0.4321226 0.80462  32.49857 12 0.00116
#> U3V3 0.0106327  1.282745 100.00000 0.1031150 0.98937   1.59810  5 0.90148
#> ---------------------------------------------------------------------------
#> Canonical coefficients of the first group 
#> ---------------------------------------------------------------------------
#>           U1        U2         U3
#> PH  2.526492  5.866685   7.317151
#> EH -2.436372 -8.263008 -12.447948
#> EP  1.144533  2.747079   6.487414
#> ---------------------------------------------------------------------------
#> Canonical coefficients of the second group 
#> ---------------------------------------------------------------------------
#>             V1         V2         V3
#> EL -0.00892526 -0.9360837  0.7670684
#> ED  0.19371881  0.2969851 -1.8240876
#> CL -0.08385387 -1.2150642  0.1719827
#> CD -0.30662013  1.1369520 -1.4230311
#> CW -0.15225785  0.1913916  0.4777071
#> KW  1.16752245 -0.1255657  1.1247216
#> NR -0.05865868  0.4861885  0.6223953
#> ---------------------------------------------------------------------------
#> Canonical loads of the first group 
#> ---------------------------------------------------------------------------
#>           U1          U2          U3
#> PH 0.9868962 -0.07924975 -0.14055369
#> EH 0.9131089 -0.40755395  0.01148369
#> EP 0.6389394 -0.69262240  0.33470980
#> ---------------------------------------------------------------------------
#> Canonical loads of the second group 
#> ---------------------------------------------------------------------------
#>           V1          V2          V3
#> EL 0.4762839 -0.09829294 -0.22697572
#> ED 0.8298627 -0.16168789 -0.34031848
#> CL 0.3789207 -0.69598199 -0.28635983
#> CD 0.3948013  0.03075542 -0.46981539
#> CW 0.6243739 -0.37712156 -0.14762207
#> KW 0.9566482 -0.05042023 -0.09910729
#> NR 0.4351188  0.29047403  0.18639351


# Canonical correlations for each environment
cc3 <- data_ge2 %>%
       can_corr(FG = c(PH, EH, EP),
                SG = c(EL, ED, CL, CD, CW, KW, NR),
                by = ENV,
                verbose = FALSE)

# }