desc_stat() computes the most used measures of central tendency, position, and dispersion.
desc_wider() is useful to put the variables in columns and grouping variables in rows. The table is filled with a statistic chosen with the argument which.
Usage
desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)

desc_wider(.data, which)
Arguments
- .data
The data to be analyzed. It can be a data frame (possibly with grouped data passed from dplyr::group_by()) or a numeric vector. For desc_wider(), .data is an object of class desc_stat.
- ...
A single variable name or a comma-separated list of unquoted variable names. If no variable is given, all the numeric variables from .data will be used. Select helpers are allowed.
- by
One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable, use that function.
- stats
The descriptive statistics to show. This is used to filter the output after computation. Defaults to "main" (cv, max, mean, median, min, sd.amo, se, ci.t, and n.valid). Other allowed values are "all" to show all the statistics, "robust" to show robust statistics, "quantile" to show quantile statistics, or choose one (or more) of the following:
  "av.dev": average deviation.
  "ci.t": t-interval (95% confidence interval) of the mean.
  "ci.z": z-interval (95% confidence interval) of the mean.
  "cv": coefficient of variation.
  "iqr": interquartile range.
  "gmean": geometric mean.
  "hmean": harmonic mean.
  "Kurt": kurtosis.
  "mad": median absolute deviation.
  "max": maximum value.
  "mean": arithmetic mean.
  "median": median.
  "min": minimum value.
  "n": the length of the data.
  "n.valid": the number of valid (not NA) elements.
  "n.missing": the number of missing values.
  "n.unique": the number of unique elements.
  "ps": the pseudo-sigma (iqr / 1.35).
  "q2.5", "q25", "q75", "q97.5": the 2.5% percentile, first quartile, third quartile, and 97.5% percentile, respectively.
  "range": the range of the data.
  "sd.amo", "sd.pop": the sample and population standard deviation.
  "se": the standard error of the mean.
  "skew": skewness.
  "sum": the sum of the values.
  "sum.dev": the sum of the absolute deviations.
  "ave.sq.dev": the average of the squared deviations.
  "sum.sq.dev": the sum of the squared deviations.
  "var.amo", "var.pop": the sample and population variance.
Use the names above to select the statistics. For example, stats = c("median, mean, cv, n"). Note that the statistic names are not case-sensitive. Either commas or spaces can be used as separators.
- hist
Logical argument. Defaults to FALSE. If hist = TRUE, a histogram is created for each selected variable.
- level
The confidence level used to compute the confidence interval of the mean. Defaults to 0.95.
- digits
The number of significant digits.
- na.rm
Logical. Should missing values be removed? Defaults to FALSE.
- verbose
Logical argument. If verbose = FALSE, the code is run silently.
- plot_theme
The graphical theme of the plot. Default is plot_theme = theme_metan(). For more details, see ggplot2::theme().
- which
A statistic to fill the table.
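To make a few of the statistic names above concrete, here is a minimal base-R sketch. It is not metan's internal code: the object names (valid, n_valid, sd_amo, ci_t, and so on) are illustrative, and the form used for "ci.t" (half-width of the t-interval at the chosen level) is an assumption. The input vector is the one used in Example 3.

```r
# Base-R sketch of some statistics listed above (not metan's implementation).
x <- c(12, 13, 19, 21, 8, NA, 23, NA)
level <- 0.95

valid   <- x[!is.na(x)]               # the values kept when na.rm = TRUE
n       <- length(x)                  # "n": length of the data
n_valid <- length(valid)              # "n.valid": number of non-NA elements
sd_amo  <- sd(valid)                  # "sd.amo": sample standard deviation
se      <- sd_amo / sqrt(n_valid)     # "se": standard error of the mean
cv      <- sd_amo / mean(valid) * 100 # "cv": coefficient of variation (%)
ps      <- IQR(valid) / 1.35          # "ps": pseudo-sigma (iqr / 1.35)

# Assumed form of "ci.t": half-width of the t-interval at the given level
ci_t <- qt(1 - (1 - level) / 2, df = n_valid - 1) * se

round(c(mean = mean(valid), se = se, cv = cv, n = n, n.valid = n_valid), 2)
```

After rounding, the mean, se, and cv computed here agree with the values reported for the same vector in Example 3 (16, 2.39, and 36.7).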
Value
desc_stat() returns a tibble with the statistics in the columns and variables (with possible grouping factors) in rows.
desc_wider() returns a tibble with variables in columns and grouping factors in rows.
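As a rough illustration of the two shapes described above, the following base-R sketch reshapes a small long table into a wide one with stats::reshape(). This only mimics the layout and is not how desc_wider() is implemented. The data frame names long and wide are illustrative, and the four max values are copied from the Example 4 output.

```r
# Long layout: one row per (grouping level, variable), one column per statistic
long <- data.frame(
  GEN      = c("H1", "H1", "H10", "H10"),
  variable = c("CD", "ED", "CD", "ED"),
  max      = c(17.9, 53.3, 17.5, 54.1)
)

# Wide layout: grouping levels in rows, variables in columns,
# cells filled with the chosen statistic (here, max)
wide <- reshape(long, idvar = "GEN", timevar = "variable", direction = "wide")
names(wide) <- sub("^max\\.", "", names(wide))  # drop the "max." prefix
rownames(wide) <- NULL
wide
```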
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
#===============================================================#
# Example 1: main statistics (coefficient of variation, maximum,#
# mean, median, minimum, sample standard deviation, standard #
# error and confidence interval of the mean) for all numeric #
# variables in data #
#===============================================================#
desc_stat(data_ge2)
#> # A tibble: 15 × 10
#> variable cv max mean median min sd.amo se ci.t n.valid
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 CD 7.34 18.6 16.0 16 12.9 1.17 0.0939 0.186 156
#> 2 CDED 5.71 0.694 0.586 0.588 0.495 0.0334 0.0027 0.0053 156
#> 3 CL 7.95 34.7 29.0 28.7 23.5 2.31 0.185 0.365 156
#> 4 CW 25.2 38.5 24.8 24.5 11.1 6.26 0.501 0.99 156
#> 5 ED 5.58 54.9 49.5 49.9 43.5 2.76 0.221 0.437 156
#> 6 EH 21.2 1.88 1.34 1.41 0.752 0.284 0.0228 0.045 156
#> 7 EL 8.28 17.9 15.2 15.1 11.5 1.26 0.101 0.199 156
#> 8 EP 10.5 0.660 0.537 0.544 0.386 0.0564 0.0045 0.0089 156
#> 9 KW 18.9 251. 173. 175. 106. 32.8 2.62 5.18 156
#> 10 NKE 14.2 697. 512. 509. 332. 72.6 5.82 11.5 156
#> 11 NKR 10.7 42 32.2 32 23.2 3.47 0.277 0.548 156
#> 12 NR 10.2 21.2 16.1 16 12.4 1.64 0.131 0.259 156
#> 13 PERK 2.17 91.8 87.4 87.5 81.2 1.90 0.152 0.300 156
#> 14 PH 13.4 3.04 2.48 2.52 1.71 0.334 0.0267 0.0528 156
#> 15 TKW 13.9 452. 339. 342. 218. 47.1 3.77 7.44 156
#===============================================================#
# Example 2: robust statistics using a numeric vector as input  #
# data                                                          #
#===============================================================#
vect <- data_ge2$TKW
desc_stat(vect, stats = "robust")
#> # A tibble: 1 × 5
#> variable n median iqr ps
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 val 156 342. 57.8 42.8
#===============================================================#
# Example 3: Select specific statistics. In this example, NAs #
# are removed before analysis with a warning message #
#===============================================================#
desc_stat(c(12, 13, 19, 21, 8, NA, 23, NA),
          stats = c('mean, se, cv, n, n.valid'),
          na.rm = TRUE)
#> # A tibble: 1 × 6
#> variable mean se cv n n.valid
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 val 16 2.39 36.7 8 6
#===============================================================#
# Example 4: Select specific variables and compute statistics by#
# levels of a factor variable (GEN) #
#===============================================================#
stats <-
  desc_stat(data_ge2,
            EP, EL, EH, ED, PH, CD,
            by = GEN)
stats
#> # A tibble: 78 × 11
#> GEN variable cv max mean median min sd.amo se ci.t n.valid
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 H1 CD 6.44 17.9 15.7 15.7 14.5 1.01 0.292 0.643 12
#> 2 H1 ED 2.66 53.3 51.2 50.8 49.2 1.36 0.393 0.864 12
#> 3 H1 EH 19.5 1.88 1.50 1.56 1.05 0.294 0.0848 0.187 12
#> 4 H1 EL 6.27 16.9 15.1 15.1 13.7 0.947 0.273 0.602 12
#> 5 H1 EP 9.91 0.658 0.570 0.574 0.492 0.0565 0.0163 0.0359 12
#> 6 H1 PH 11.7 3.00 2.62 2.70 2.11 0.307 0.0885 0.195 12
#> 7 H10 CD 6.32 17.5 15.9 15.7 14.4 1.00 0.290 0.638 12
#> 8 H10 ED 7.70 54.1 48.4 47.7 43.7 3.73 1.08 2.37 12
#> 9 H10 EH 23.2 1.71 1.26 1.25 0.888 0.293 0.0845 0.186 12
#> 10 H10 EL 6.83 16.7 15.1 14.9 13.6 1.03 0.298 0.656 12
#> # … with 68 more rows
# To get a 'wide' format with the maximum values for all variables
desc_wider(stats, max)
#> # A tibble: 13 × 7
#> GEN CD ED EH EL EP PH
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 H1 17.9 53.3 1.88 16.9 0.658 3.00
#> 2 H10 17.5 54.1 1.71 16.7 0.660 2.83
#> 3 H11 18.0 52.3 1.67 17.4 0.600 2.77
#> 4 H12 16.2 52.7 1.58 15.7 0.616 2.79
#> 5 H13 17.8 54.0 1.77 16.3 0.615 2.93
#> 6 H2 17.0 53.6 1.87 16.1 0.615 3.03
#> 7 H3 18.0 52.2 1.80 17.6 0.640 3.04
#> 8 H4 17.7 52.8 1.82 16.8 0.617 3.02
#> 9 H5 17.4 52.7 1.76 16.6 0.632 2.90
#> 10 H6 18.3 54.9 1.69 17.9 0.631 2.94
#> 11 H7 18.6 52.1 1.67 17.5 0.617 2.87
#> 12 H8 18.4 53.3 1.57 17.7 0.585 2.76
#> 13 H9 18.1 53.6 1.71 17.5 0.630 3.00
#===============================================================#
# Example 5: Compute the main statistics for all numeric        #
# variables by two or more factors. Note that group_by() was    #
# used to pass grouped data to the function desc_stat()         #
#===============================================================#
data_ge2 %>%
  group_by(ENV, GEN) %>%
  desc_stat()
#> # A tibble: 780 × 12
#> ENV GEN variable cv max mean median min sd.amo se
#> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A1 H1 CD 6.91 16.4 15.7 16.3 14.5 1.09 0.627
#> 2 A1 H1 CDED 2.04 0.561 0.550 0.551 0.538 0.0112 0.0065
#> 3 A1 H1 CL 1.48 28.4 28.1 28.1 27.6 0.415 0.239
#> 4 A1 H1 CW 7.93 25.1 23.5 24.0 21.4 1.86 1.08
#> 5 A1 H1 ED 1.98 52.2 51.1 50.7 50.3 1.01 0.583
#> 6 A1 H1 EH 5.36 1.76 1.68 1.71 1.58 0.0902 0.0521
#> 7 A1 H1 EL 7.15 16.1 15.4 16.0 14.2 1.10 0.637
#> 8 A1 H1 EP 5.34 0.658 0.626 0.628 0.591 0.0334 0.0193
#> 9 A1 H1 KW 8.31 217. 203. 208. 184. 16.8 9.72
#> 10 A1 H1 NKE 6.80 565. 527. 521. 494. 35.8 20.7
#> # … with 770 more rows, and 2 more variables: ci.t <dbl>, n.valid <dbl>
# }