Group by one or more variables

Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl, on which operations are performed "by group". ungroup() removes the grouping.

group_by(.data, ..., add = FALSE)

ungroup(x, ...)

Arguments

.data	a tbl
...	Variables to group by. All tbls accept variable names, some will also accept functions of variables. Duplicated groups will be silently dropped.
add	When `add = FALSE`, the default, `group_by()` will override existing groups. To add to the existing groups instead, use `add = TRUE`.
x	A `tbl()`

Tbl types

group_by() is an S3 generic with methods for the built-in tbls. See the help for the corresponding classes and their manip methods for more details:

data.frame: grouped_df
data.table: dtplyr::grouped_dt
SQLite: src_sqlite()
PostgreSQL: src_postgres()
MySQL: src_mysql()
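
For the data.frame case, the result of group_by() can be inspected directly. The following is a minimal sketch, assuming dplyr is attached and using the built-in mtcars data; it only checks for the grouped_df class named in the list above and says nothing about the other backends.

```r
library(dplyr)

# Grouping a plain data frame yields an object that carries the
# "grouped_df" class in addition to its data frame classes.
by_cyl <- group_by(mtcars, cyl)
class(by_cyl)
is_grouped <- inherits(by_cyl, "grouped_df")
is_grouped

# ungroup() returns an object without the "grouped_df" class.
still_grouped <- inherits(ungroup(by_cyl), "grouped_df")
still_grouped
```

Because group_by() dispatches on the class of its first argument, the same call on a different kind of tbl would be handled by that tbl's own method.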

Scoped grouping

The three scoped variants (group_by_all(), group_by_if() and group_by_at()) make it easy to group a dataset by a selection of variables.
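
As a rough illustration of the scoped variants, the sketch below groups by an explicit selection, by a predicate, and by every column. It assumes an installed dplyr version that exports group_by_at(), group_by_if(), group_by_all() and vars(); the small data frame `df` is made up for the example.

```r
library(dplyr)

# group_by_at(): group by the columns selected with vars()
at_vars <- mtcars %>%
  group_by_at(vars(vs, am)) %>%
  group_vars()
at_vars

# group_by_if(): group by every column that satisfies a predicate
if_vars <- iris %>%
  group_by_if(is.factor) %>%
  group_vars()
if_vars

# group_by_all(): group by every column (df is a hypothetical toy table)
df <- data.frame(x = c(1, 1, 2), y = c("a", "a", "b"))
all_vars <- df %>%
  group_by_all() %>%
  group_vars()
all_vars
```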

Examples

by_cyl <- mtcars %>% group_by(cyl)

# grouping doesn't change how the data looks (apart from listing
# how it's grouped):
by_cyl
#> # A tibble: 32 × 11
#> # Groups: cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#>  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#>  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#>  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#>  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#>  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#>  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#>  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#>  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
#> 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
#> # ... with 22 more rows

# It changes how it acts with the other dplyr verbs:
by_cyl %>% summarise(
  disp = mean(disp),
  hp = mean(hp)
)
#> # A tibble: 3 × 3
#>     cyl     disp        hp
#>   <dbl>    <dbl>     <dbl>
#> 1     4 105.1364  82.63636
#> 2     6 183.3143 122.28571
#> 3     8 353.1000 209.21429
by_cyl %>% filter(disp == max(disp))
#> # A tibble: 3 × 11
#> # Groups: cyl [3]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#> 2  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#> 3  10.4     8 472.0   205  2.93 5.250 17.98     0     0     3     4

# Each call to summarise() removes a layer of grouping
by_vs_am <- mtcars %>% group_by(vs, am)
by_vs <- by_vs_am %>% summarise(n = n())
by_vs
#> # A tibble: 4 × 3
#> # Groups: vs [?]
#>      vs    am     n
#>   <dbl> <dbl> <int>
#> 1     0     0    12
#> 2     0     1     6
#> 3     1     0     7
#> 4     1     1     7
by_vs %>% summarise(n = sum(n))
#> # A tibble: 2 × 2
#>      vs     n
#>   <dbl> <int>
#> 1     0    18
#> 2     1    14

# To remove grouping, use ungroup()
by_vs %>%
  ungroup() %>%
  summarise(n = sum(n))
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1    32

# You can group by expressions: this is just short-hand for
# a mutate/rename followed by a simple group_by
mtcars %>% group_by(vsam = vs + am)
#> # A tibble: 32 × 12
#> # Groups: vsam [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  vsam
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4     1
#>  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4     1
#>  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1     2
#>  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1     1
#>  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2     0
#>  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1     1
#>  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4     0
#>  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2     1
#>  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2     1
#> 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4     1
#> # ... with 22 more rows

# By default, group_by overrides existing grouping
by_cyl %>%
  group_by(vs, am) %>%
  group_vars()
#> [1] "vs" "am"

# Use add = TRUE to instead append
by_cyl %>%
  group_by(vs, am, add = TRUE) %>%
  group_vars()
#> [1] "cyl" "vs"  "am"
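
The comment above on grouping by expressions describes group_by(vsam = vs + am) as shorthand for a mutate() followed by a plain group_by(). A small sketch of that equivalence, assuming dplyr is attached, could look like this; the names `short` and `long` are only illustrative.

```r
library(dplyr)

# Grouping by an expression in one step
short <- mtcars %>% group_by(vsam = vs + am)

# The same thing spelled out: create the column, then group by it
long <- mtcars %>%
  mutate(vsam = vs + am) %>%
  group_by(vsam)

# Both results are grouped by the new column and hold the same values
group_vars(short)
group_vars(long)
same_values <- identical(short$vsam, long$vsam)
same_values
```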
