- desc_stat() computes the most commonly used measures of central tendency, position, and dispersion.
- desc_wider() puts the variables in columns and the grouping variables in rows. The table is filled with a statistic chosen with the argument which.
Usage
desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)
desc_wider(.data, which)
Arguments
- .data
- The data to be analyzed. It can be a data frame (possibly with grouped data passed from dplyr::group_by()) or a numeric vector. For desc_wider(), .data is an object of class desc_stat.
- ...
- A single variable name or a comma-separated list of unquoted variable names. If no variable is given, all the numeric variables from .data will be used. Select helpers are allowed.
- by
- One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable, use that function.
- stats
- The descriptive statistics to show. This is used to filter the output after computation. Defaults to "main" (cv, max, mean, median, min, sd.amo, se, ci). Other allowed values are "all" to show all the statistics, "robust" to show robust statistics, "quantile" to show quantile statistics, or choose one (or more) of the following:
- "av.dev": average deviation.
- "ci.t": t-interval (95% confidence interval) of the mean.
- "ci.z": z-interval (95% confidence interval) of the mean.
- "cv": coefficient of variation.
- "iqr": interquartile range.
- "gmean": geometric mean.
- "hmean": harmonic mean.
- "Kurt": kurtosis.
- "mad": median absolute deviation.
- "max": maximum value.
- "mean": arithmetic mean.
- "median": median.
- "min": minimum value.
- "n": the length of the data.
- "n.valid": the number of valid (non-NA) elements.
- "n.missing": the number of missing values.
- "n.unique": The length of unique elements.
- "ps": the pseudo-sigma (iqr / 1.35).
- "q2.5", "q25", "q75", "q97.5": the 2.5% percentile, first quartile, third quartile, and 97.5% percentile.
- "range": the range of the data.
- "sd.amo", "sd.pop": the sample and population standard deviation.
- "se": the standard error of the mean.
- "skew": skewness.
- "sum": the sum of the values.
- "sum.dev": the sum of the absolute deviations.
- "ave.sq.dev": the average of the squared deviations.
- "sum.sq.dev": the sum of the squared deviations.
- "n.valid": The size of sample with valid number (not NA).
- "var.amo", "var.pop": the sample and population variance.
- Use the names above to select the statistics. For example, stats = c("median, mean, cv, n"). Note that the statistic names are not case-sensitive. Either commas or spaces can be used as separators.
- hist
- Logical argument. Defaults to FALSE. If hist = TRUE, a histogram is created for each selected variable.
- level
- The confidence level used to compute the confidence interval of the mean. Defaults to 0.95.
- digits
- The number of significant digits. 
- na.rm
- Logical. Should missing values be removed? Defaults to FALSE.
- verbose
- Logical argument. If verbose = FALSE, the code is run silently.
- plot_theme
- The graphical theme of the plot. Default is plot_theme = theme_metan(). For more details, see ggplot2::theme().
- which
- A statistic to fill the table. 
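The definitions listed under stats can be checked by hand. The following is a minimal base-R sketch, not metan's implementation, that recomputes a few of them for the vector used in Example 3. Treating "ci.t" as the half-width of the t-interval is an assumption; it is consistent with the ci.t and se columns shown in the example output. All object names (valid, sd_amo, ci_t, and so on) are illustrative only.

```r
# Base-R sketch of some statistics listed under `stats`.
# This is NOT metan's code; it only mirrors the definitions given above.
x <- c(12, 13, 19, 21, 8, NA, 23, NA)  # vector from Example 3
level <- 0.95                          # same default as the `level` argument

valid   <- x[!is.na(x)]                # what na.rm = TRUE keeps
n_valid <- length(valid)               # "n.valid"
m       <- mean(valid)                 # "mean"
sd_amo  <- sd(valid)                   # "sd.amo": sample standard deviation
se      <- sd_amo / sqrt(n_valid)      # "se": standard error of the mean
cv      <- sd_amo / m * 100            # "cv": coefficient of variation (%)
ps      <- IQR(valid) / 1.35           # "ps": pseudo-sigma (iqr / 1.35)

# Assumed meaning of "ci.t": half-width of the t-interval of the mean.
ci_t <- qt(1 - (1 - level) / 2, df = n_valid - 1) * se

round(c(mean = m, se = se, cv = cv, n.valid = n_valid), 2)
```

The rounded mean, se, and cv agree with the values printed in Example 3 (16, 2.39, and 36.7).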
Value
- desc_stat() returns a tibble with the statistics in the columns and variables (with possible grouping factors) in rows.
- desc_wider() returns a tibble with variables in columns and grouping factors in rows.
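To make the two layouts concrete, here is a small base-R sketch of the long-to-wide reshaping that desc_wider() is described as performing. It does not use metan and is not how desc_wider() is implemented; the four rows are copied from the max column of the Example 4 output, and the object names long and wide are illustrative.

```r
# Long layout: one row per (grouping level, variable), as desc_stat() returns.
# Values are the "max" entries for H1 and H10 from the Example 4 output.
long <- data.frame(
  GEN      = c("H1", "H1", "H10", "H10"),
  variable = c("CD", "ED", "CD", "ED"),
  max      = c(17.9, 53.3, 17.5, 54.1)
)

# Wide layout: grouping levels in rows, variables in columns,
# cells filled with the chosen statistic (here, max).
wide <- tapply(
  long$max,
  list(GEN = long$GEN, variable = long$variable),
  identity
)
wide
```

Each cell of wide holds the single max value for that GEN and variable, which is the same arrangement as the desc_wider(stats, max) table in Example 4.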
Author
Tiago Olivoto tiagoolivoto@gmail.com
Examples
# \donttest{
library(metan)
#===============================================================#
# Example 1: main statistics (coefficient of variation, maximum,#
# mean, median, minimum, sample standard deviation, standard    #
# error and confidence interval of the mean) for all numeric    #
# variables in data                                             #
#===============================================================#
desc_stat(data_ge2)
#> # A tibble: 15 × 10
#>    variable    cv     max    mean  median     min  sd.amo     se    ci.t n.valid
#>    <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
#>  1 CD        7.34  18.6    16.0    16      12.9    1.17   0.0939  0.186      156
#>  2 CDED      5.71   0.694   0.586   0.588   0.495  0.0334 0.0027  0.0053     156
#>  3 CL        7.95  34.7    29.0    28.7    23.5    2.31   0.185   0.365      156
#>  4 CW       25.2   38.5    24.8    24.5    11.1    6.26   0.501   0.99       156
#>  5 ED        5.58  54.9    49.5    49.9    43.5    2.76   0.221   0.437      156
#>  6 EH       21.2    1.88    1.34    1.41    0.752  0.284  0.0228  0.045      156
#>  7 EL        8.28  17.9    15.2    15.1    11.5    1.26   0.101   0.199      156
#>  8 EP       10.5    0.660   0.537   0.544   0.386  0.0564 0.0045  0.0089     156
#>  9 KW       18.9  251.    173.    175.    106.    32.8    2.62    5.18       156
#> 10 NKE      14.2  697.    512.    509.    332.    72.6    5.82   11.5        156
#> 11 NKR      10.7   42      32.2    32      23.2    3.47   0.277   0.548      156
#> 12 NR       10.2   21.2    16.1    16      12.4    1.64   0.131   0.259      156
#> 13 PERK      2.17  91.8    87.4    87.5    81.2    1.90   0.152   0.300      156
#> 14 PH       13.4    3.04    2.48    2.52    1.71   0.334  0.0267  0.0528     156
#> 15 TKW      13.9  452.    339.    342.    218.    47.1    3.77    7.44       156
#===============================================================#
# Example 2: robust statistics using a numeric vector as input  #
# data                                                          #
#===============================================================#
vect <- data_ge2$TKW
desc_stat(vect, stats = "robust")
#> # A tibble: 1 × 5
#>   variable     n median   iqr    ps
#>   <chr>    <dbl>  <dbl> <dbl> <dbl>
#> 1 val        156   342.  57.8  42.8
#===============================================================#
# Example 3: Select specific statistics. In this example, NAs   #
# are removed before analysis with a warning message            #
#===============================================================#
desc_stat(c(12, 13, 19, 21, 8, NA, 23, NA),
          stats = c('mean, se, cv, n, n.valid'),
          na.rm = TRUE)
#> # A tibble: 1 × 6
#>   variable  mean    se    cv     n n.valid
#>   <chr>    <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1 val         16  2.39  36.7     8       6
#===============================================================#
# Example 4: Select specific variables and compute statistics by#
# levels of a factor variable (GEN)                             #
#===============================================================#
stats <-
  desc_stat(data_ge2,
            EP, EL, EH, ED, PH, CD,
            by = GEN)
stats
#> # A tibble: 78 × 11
#>    GEN   variable    cv    max   mean median    min sd.amo     se   ci.t n.valid
#>    <fct> <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
#>  1 H1    CD        6.44 17.9   15.7   15.7   14.5   1.01   0.292  0.643       12
#>  2 H1    ED        2.66 53.3   51.2   50.8   49.2   1.36   0.393  0.864       12
#>  3 H1    EH       19.5   1.88   1.50   1.56   1.05  0.294  0.0848 0.187       12
#>  4 H1    EL        6.27 16.9   15.1   15.1   13.7   0.947  0.273  0.602       12
#>  5 H1    EP        9.91  0.658  0.570  0.574  0.492 0.0565 0.0163 0.0359      12
#>  6 H1    PH       11.7   3.00   2.62   2.70   2.11  0.307  0.0885 0.195       12
#>  7 H10   CD        6.32 17.5   15.9   15.7   14.4   1.00   0.290  0.638       12
#>  8 H10   ED        7.70 54.1   48.4   47.7   43.7   3.73   1.08   2.37        12
#>  9 H10   EH       23.2   1.71   1.26   1.25   0.888 0.293  0.0845 0.186       12
#> 10 H10   EL        6.83 16.7   15.1   14.9   13.6   1.03   0.298  0.656       12
#> # … with 68 more rows
# To get a 'wide' format with the maximum values for all variables
desc_wider(stats, max)
#> # A tibble: 13 × 7
#>    GEN      CD    ED    EH    EL    EP    PH
#>    <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 H1     17.9  53.3  1.88  16.9 0.658  3.00
#>  2 H10    17.5  54.1  1.71  16.7 0.660  2.83
#>  3 H11    18.0  52.3  1.67  17.4 0.600  2.77
#>  4 H12    16.2  52.7  1.58  15.7 0.616  2.79
#>  5 H13    17.8  54.0  1.77  16.3 0.615  2.93
#>  6 H2     17.0  53.6  1.87  16.1 0.615  3.03
#>  7 H3     18.0  52.2  1.80  17.6 0.640  3.04
#>  8 H4     17.7  52.8  1.82  16.8 0.617  3.02
#>  9 H5     17.4  52.7  1.76  16.6 0.632  2.90
#> 10 H6     18.3  54.9  1.69  17.9 0.631  2.94
#> 11 H7     18.6  52.1  1.67  17.5 0.617  2.87
#> 12 H8     18.4  53.3  1.57  17.7 0.585  2.76
#> 13 H9     18.1  53.6  1.71  17.5 0.630  3.00
#===============================================================#
# Example 5: Compute main statistics for all numeric variables  #
# by two or more factors. Note that group_by() was used to pass #
# grouped data to the function desc_stat()                      #
#===============================================================#
data_ge2 %>%
  group_by(ENV, GEN) %>%
  desc_stat()
#> # A tibble: 780 × 12
#>    ENV   GEN   variable    cv     max    mean  median     min  sd.amo      se
#>    <fct> <fct> <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 A1    H1    CD        6.91  16.4    15.7    16.3    14.5    1.09    0.627 
#>  2 A1    H1    CDED      2.04   0.561   0.550   0.551   0.538  0.0112  0.0065
#>  3 A1    H1    CL        1.48  28.4    28.1    28.1    27.6    0.415   0.239 
#>  4 A1    H1    CW        7.93  25.1    23.5    24.0    21.4    1.86    1.08  
#>  5 A1    H1    ED        1.98  52.2    51.1    50.7    50.3    1.01    0.583 
#>  6 A1    H1    EH        5.36   1.76    1.68    1.71    1.58   0.0902  0.0521
#>  7 A1    H1    EL        7.15  16.1    15.4    16.0    14.2    1.10    0.637 
#>  8 A1    H1    EP        5.34   0.658   0.626   0.628   0.591  0.0334  0.0193
#>  9 A1    H1    KW        8.31 217.    203.    208.    184.    16.8     9.72  
#> 10 A1    H1    NKE       6.80 565.    527.    521.    494.    35.8    20.7   
#> # … with 770 more rows, and 2 more variables: ci.t <dbl>, n.valid <dbl>
# }
