Retain only unique/distinct rows from an input tbl. This is similar
to unique.data.frame(), but considerably faster.
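
The comparison with unique.data.frame() can be sketched as follows. This is a minimal illustration that assumes dplyr is attached; the data frame `df_dup` is made up for the example and does not come from the original page. It only shows that both calls keep the same rows, and says nothing about the speed difference.

```r
library(dplyr)

# Hypothetical data frame with one duplicated row (rows 1 and 2 are identical).
df_dup <- data.frame(
  x = c(1, 1, 2),
  y = c("a", "a", "b")
)

# Base R: dispatches to unique.data.frame().
base_result <- unique(df_dup)

# dplyr: with no variables supplied, all columns determine uniqueness.
dplyr_result <- distinct(df_dup)

# Both results contain the same two distinct rows.
nrow(base_result)
nrow(dplyr_result)
```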
distinct(.data, ..., .keep_all = FALSE)
| Argument | Description |
|---|---|
| .data | A tbl. |
| ... | Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables. |
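| .keep_all | If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. |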
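
The argument descriptions above can be illustrated with a small sketch. The tibble `orders` and its columns are hypothetical names chosen for this example. It assumes, as the argument name and the examples on this page suggest, that `.keep_all = TRUE` retains the columns not named in `...`.

```r
library(dplyr)

# Hypothetical data: two rows share id = 1, so only the first of them survives.
orders <- tibble(
  id   = c(1, 1, 2),
  item = c("apple", "pear", "plum")
)

# Default (.keep_all = FALSE): only the variables named in ... are returned.
ids_only <- distinct(orders, id)

# .keep_all = TRUE: the remaining columns are kept, with values taken from
# the first row encountered for each distinct id.
first_rows <- distinct(orders, id, .keep_all = TRUE)

names(ids_only)
first_rows$item
```

Here the row with item "pear" is dropped because it is the second row with id = 1.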
    nrow(df)
    #> [1] 100
    nrow(distinct(df))
    #> [1] 66
    nrow(distinct(df, x, y))
    #> [1] 66

    distinct(df, x)
    #> # A tibble: 10 × 1
    #>        x
    #>    <int>
    #>  1     8
    #>  2     7
    #>  3    10
    #>  4     5
    #>  5     6
    #>  6     2
    #>  7     9
    #>  8     4
    #>  9     1
    #> 10     3
    distinct(df, y)
    #> # A tibble: 10 × 1
    #>        y
    #>    <int>
    #>  1     9
    #>  2     8
    #>  3     4
    #>  4     6
    #>  5     5
    #>  6     1
    #>  7     7
    #>  8     2
    #>  9     3
    #> 10    10

    # Can choose to keep all other variables as well
    distinct(df, x, .keep_all = TRUE)
    #> # A tibble: 10 × 2
    #>        x     y
    #>    <int> <int>
    #>  1     8     9
    #>  2     7     9
    #>  3    10     8
    #>  4     5     4
    #>  5     6     6
    #>  6     2     5
    #>  7     9     6
    #>  8     4     3
    #>  9     1     1
    #> 10     3     9
    distinct(df, y, .keep_all = TRUE)
    #> # A tibble: 10 × 2
    #>        x     y
    #>    <int> <int>
    #>  1     8     9
    #>  2    10     8
    #>  3     5     4
    #>  4     6     6
    #>  5     2     5
    #>  6     6     1
    #>  7     6     7
    #>  8     8     2
    #>  9     4     3
    #> 10     9    10

    # You can also use distinct on computed variables
    distinct(df, diff = abs(x - y))
    #> # A tibble: 10 × 1
    #>     diff
    #>    <int>
    #>  1     1
    #>  2     2
    #>  3     0
    #>  4     3
    #>  5     5
    #>  6     6
    #>  7     4
    #>  8     9
    #>  9     7
    #> 10     8

    # The same behaviour applies for grouped data frames
    # except that the grouping variables are always included
    df <- tibble(
      g = c(1, 1, 2, 2),
      x = c(1, 1, 2, 1)
    ) %>% group_by(g)
    df %>% distinct()
    #> # A tibble: 3 × 2
    #> # Groups:   g [2]
    #>       g     x
    #>   <dbl> <dbl>
    #> 1     1     1
    #> 2     2     2
    #> 3     2     1
    df %>% distinct(x)
    #> # A tibble: 3 × 2
    #> # Groups:   g [2]
    #>       g     x
    #>   <dbl> <dbl>
    #> 1     1     1
    #> 2     2     2
    #> 3     2     1