Impute the missing entries of a matrix using different algorithms. See the Details section for more information.
Usage
impute_missing_val(
.data,
naxis = 1,
algorithm = "EM-SVD",
tol = 1e-10,
max_iter = 1000,
simplified = FALSE,
verbose = TRUE
)
Arguments
- .data
A matrix whose missing entries are to be imputed. Frequently a two-way table of genotype means in each environment.
- naxis
The rank of the singular value approximation. Defaults to 1.
- algorithm
The algorithm used to impute missing values. Defaults to "EM-SVD". Other possible values are "EM-AMMI" and "colmeans". See the Details section.
- tol
The convergence tolerance for the algorithm.
- max_iter
The maximum number of iterations. If max_iter is reached without convergence, the algorithm stops with a warning.
- simplified
Valid only when algorithm = "EM-AMMI". If FALSE (default), the effects of rows and columns are updated at each iteration. If TRUE, the grand mean and the effects of rows and columns are computed in the first iteration only, and these values are reused in subsequent iterations.
- verbose
Logical argument. If verbose = FALSE, the code runs silently.
Value
An object of class imv with the following values:
.data The imputed matrix.
pc_ss The sum of squares representing the variation explained by the principal components.
iter The final number of iterations.
Final_RMSE The maximum change of the estimated values for missing cells in the last step of iteration.
final_axis The final number of principal component axes.
convergence Logical value indicating whether the model converged.
Details
EM-AMMI algorithm
The EM-AMMI algorithm completes a data set with missing values according to both main and interaction effects. The algorithm works as follows (Gauch and Zobel, 1990):
1. The initial values are calculated as the grand mean plus the main effects of rows and the main effects of columns. In this way, the missing cells of the matrix of observations are pre-filled.
2. The parameters of the AMMI model are estimated.
3. The adjusted means are calculated based on the AMMI model with naxis principal components.
4. The missing cells are filled with the adjusted means.
5. The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 through 5 are repeated. Convergence is declared if RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
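The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the metan implementation: em_ammi_impute() is a hypothetical helper, the main effects are taken as row and column means of the completed matrix, the interaction term is the rank-naxis SVD of the residuals from the additive fit, and RMSE_p is assumed to be computed over the missing cells only. For simplified = TRUE, the sketch assumes the effects are computed in the first iteration and then held fixed.

```r
# Hypothetical sketch of EM-AMMI imputation (base R only); not metan's code.
em_ammi_impute <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000,
                           simplified = FALSE) {
  miss <- is.na(x)
  if (!any(miss)) return(list(.data = x, iter = 0, convergence = TRUE))
  # Step 1: pre-fill missing cells with grand mean + row effect + column
  # effect, all estimated from the observed cells
  mu <- mean(x, na.rm = TRUE)
  row_eff <- rowMeans(x, na.rm = TRUE) - mu
  col_eff <- colMeans(x, na.rm = TRUE) - mu
  x[miss] <- (mu + outer(row_eff, col_eff, "+"))[miss]
  converged <- FALSE
  rmse <- NA_real_
  iter <- 0
  while (iter < max_iter) {
    iter <- iter + 1
    old <- x[miss]
    # Step 2: estimate the AMMI parameters on the completed matrix.
    # Assumption: with simplified = TRUE the main effects are computed
    # in the first iteration only and reused afterwards.
    if (!simplified || iter == 1) {
      mu <- mean(x)
      row_eff <- rowMeans(x) - mu
      col_eff <- colMeans(x) - mu
    }
    additive <- mu + outer(row_eff, col_eff, "+")
    # Multiplicative part: rank-naxis SVD of the interaction residuals
    s <- svd(x - additive, nu = naxis, nv = naxis)
    inter <- s$u %*% diag(s$d[seq_len(naxis)], naxis, naxis) %*% t(s$v)
    # Steps 3-4: fill the missing cells with the adjusted means
    x[miss] <- (additive + inter)[miss]
    # Step 5 (assumed form): RMSE between the last two sets of imputed values
    rmse <- sqrt(mean((x[miss] - old)^2))
    if (rmse < tol) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("max_iter reached without convergence")
  list(.data = x, iter = iter, rmse = rmse, convergence = converged)
}

# Usage: the rank-1 table from the Examples with three cells removed
mat <- (1:20) %*% t(1:10)
miss_mat <- mat
miss_mat[cbind(c(2, 5, 10), c(3, 7, 1))] <- NA
res <- em_ammi_impute(miss_mat, naxis = 1)
```

Because this table is exactly additive plus one multiplicative term, a one-axis AMMI model can represent it, so the imputed cells should approach the original values.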
EM-SVD algorithm
The EM-SVD algorithm imputes the missing entries using a low-rank singular value decomposition (SVD) approximation estimated by the Expectation-Maximization algorithm. The algorithm works as follows (Troyanskaya et al., 2001):
1. Initialize all NA values to the column means.
2. Compute the first naxis terms of the SVD of the completed matrix.
3. Replace the previously missing values with their approximations from the SVD.
4. The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 and 3 are repeated. Convergence is declared if RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
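A minimal base-R sketch of these EM-SVD steps is shown below. It is illustrative only: em_svd_impute() is a hypothetical helper rather than metan's internal code, and RMSE_p is assumed to be the root mean square change of the imputed cells between two consecutive iterations.

```r
# Hypothetical sketch of EM-SVD imputation (base R only); not metan's code.
em_svd_impute <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000) {
  miss <- is.na(x)
  if (!any(miss)) return(list(.data = x, iter = 0, convergence = TRUE))
  # Step 1: initialize missing cells with their column means
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]
  converged <- FALSE
  rmse <- NA_real_
  iter <- 0
  while (iter < max_iter) {
    iter <- iter + 1
    old <- x[miss]
    # Step 2: rank-naxis SVD approximation of the completed matrix
    s <- svd(x, nu = naxis, nv = naxis)
    approx <- s$u %*% diag(s$d[seq_len(naxis)], naxis, naxis) %*% t(s$v)
    # Step 3: overwrite only the previously missing cells
    x[miss] <- approx[miss]
    # Step 4 (assumed form): RMSE between the last two sets of imputed values
    rmse <- sqrt(mean((x[miss] - old)^2))
    if (rmse < tol) {
      converged <- TRUE
      break
    }
  }
  if (!converged) warning("max_iter reached without convergence")
  list(.data = x, iter = iter, rmse = rmse, convergence = converged)
}

# Usage: the rank-1 table from the Examples with three cells removed
mat <- (1:20) %*% t(1:10)
miss_mat <- mat
miss_mat[cbind(c(2, 5, 10), c(3, 7, 1))] <- NA
res <- em_svd_impute(miss_mat, naxis = 1)
```

Only the missing cells are overwritten at each step; the observed values are kept fixed, which is what makes the procedure an imputation rather than a smoothing of the whole table.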
colmeans algorithm
The colmeans algorithm simply imputes each missing entry with the mean of its column. Thus, there is no iterative process.
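Column-mean imputation reduces to a single vectorized assignment in base R. The sketch below uses a hypothetical helper, colmeans_impute(), and is not the metan code.

```r
# Hypothetical helper: replace each NA with the mean of its column.
# A column with no observed values would yield NaN.
colmeans_impute <- function(x) {
  miss <- is.na(x)
  col_means <- colMeans(x, na.rm = TRUE)
  # col(x)[miss] gives the column index of each missing cell
  x[miss] <- col_means[col(x)[miss]]
  x
}

m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 3)
colmeans_impute(m)
```

Here the NA in column 1 becomes 2 (the mean of 1 and 3) and the NA in column 2 becomes 4.5 (the mean of 4 and 5).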
References
Gauch, H. G., & Zobel, R. W. (1990). Imputing missing yield trial data. Theoretical and Applied Genetics, 79(6), 753-761. doi:10.1007/BF00224240
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., … Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.
Examples
# \donttest{
library(metan)
mat <- (1:20) %*% t(1:10)
mat
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 2 3 4 5 6 7 8 9 10
#> [2,] 2 4 6 8 10 12 14 16 18 20
#> [3,] 3 6 9 12 15 18 21 24 27 30
#> [4,] 4 8 12 16 20 24 28 32 36 40
#> [5,] 5 10 15 20 25 30 35 40 45 50
#> [6,] 6 12 18 24 30 36 42 48 54 60
#> [7,] 7 14 21 28 35 42 49 56 63 70
#> [8,] 8 16 24 32 40 48 56 64 72 80
#> [9,] 9 18 27 36 45 54 63 72 81 90
#> [10,] 10 20 30 40 50 60 70 80 90 100
#> [11,] 11 22 33 44 55 66 77 88 99 110
#> [12,] 12 24 36 48 60 72 84 96 108 120
#> [13,] 13 26 39 52 65 78 91 104 117 130
#> [14,] 14 28 42 56 70 84 98 112 126 140
#> [15,] 15 30 45 60 75 90 105 120 135 150
#> [16,] 16 32 48 64 80 96 112 128 144 160
#> [17,] 17 34 51 68 85 102 119 136 153 170
#> [18,] 18 36 54 72 90 108 126 144 162 180
#> [19,] 19 38 57 76 95 114 133 152 171 190
#> [20,] 20 40 60 80 100 120 140 160 180 200
# 10% of missing values at random
miss_mat <- random_na(mat, prop = 10)
miss_mat
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1 2 3 4 5 6 7 8 9 10
#> [2,] 2 4 6 8 10 NA 14 16 NA 20
#> [3,] 3 6 9 NA 15 18 21 24 27 30
#> [4,] 4 8 12 16 20 24 28 32 36 40
#> [5,] 5 10 15 20 25 30 35 40 45 50
#> [6,] NA 12 18 24 30 36 42 48 54 60
#> [7,] 7 14 21 28 35 42 49 56 63 70
#> [8,] 8 16 24 NA 40 48 56 64 72 80
#> [9,] 9 18 NA NA NA 54 63 72 81 90
#> [10,] NA 20 30 40 50 60 NA NA 90 100
#> [11,] NA 22 33 44 55 66 77 88 99 110
#> [12,] 12 24 36 48 60 72 84 96 108 120
#> [13,] 13 26 39 52 65 78 91 104 117 130
#> [14,] 14 28 42 56 70 84 NA 112 126 140
#> [15,] 15 30 45 60 75 90 105 NA 135 150
#> [16,] 16 32 48 64 NA 96 112 128 144 NA
#> [17,] 17 34 51 68 85 102 119 136 153 170
#> [18,] 18 36 54 72 90 108 126 144 162 180
#> [19,] NA 38 57 76 95 114 133 152 NA NA
#> [20,] 20 NA 60 80 100 120 140 160 180 200
mod <- impute_missing_val(miss_mat)
#> ----------------------------------------------
#> Convergence information
#> ----------------------------------------------
#> Number of iterations: 46
#> Final RMSE: 6.827148e-11
#> Number of axis: 1
#> Convergence: TRUE
#> ----------------------------------------------
mod$.data
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> [1,] 1 2 3 4 5 6 7 8 9 10
#> [2,] 2 4 6 8 10 12 14 16 18 20
#> [3,] 3 6 9 12 15 18 21 24 27 30
#> [4,] 4 8 12 16 20 24 28 32 36 40
#> [5,] 5 10 15 20 25 30 35 40 45 50
#> [6,] 6 12 18 24 30 36 42 48 54 60
#> [7,] 7 14 21 28 35 42 49 56 63 70
#> [8,] 8 16 24 32 40 48 56 64 72 80
#> [9,] 9 18 27 36 45 54 63 72 81 90
#> [10,] 10 20 30 40 50 60 70 80 90 100
#> [11,] 11 22 33 44 55 66 77 88 99 110
#> [12,] 12 24 36 48 60 72 84 96 108 120
#> [13,] 13 26 39 52 65 78 91 104 117 130
#> [14,] 14 28 42 56 70 84 98 112 126 140
#> [15,] 15 30 45 60 75 90 105 120 135 150
#> [16,] 16 32 48 64 80 96 112 128 144 160
#> [17,] 17 34 51 68 85 102 119 136 153 170
#> [18,] 18 36 54 72 90 108 126 144 162 180
#> [19,] 19 38 57 76 95 114 133 152 171 190
#> [20,] 20 40 60 80 100 120 140 160 180 200
# }