These are generic functions that dispatch to individual tbl methods - see the
method documentation for details of individual data sources. x and
y should usually be from the same data source, but if copy is
TRUE, y will automatically be copied to the same source as x.
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) semi_join(x, y, by = NULL, copy = FALSE, ...) anti_join(x, y, by = NULL, copy = FALSE, ...)
| x, y | tbls to join |
|---|---|
| by | a character vector of variables to join by. If To join by different variables on x and y use a named vector.
For example, |
| copy | If |
| suffix | If there are non-joined duplicate variables in |
| ... | other parameters passed onto methods |
Currently dplyr supports four join types:
inner_join()return all rows from x where there are matching
values in y, and all columns from x and y. If there are multiple matches
between x and y, all combination of the matches are returned.
left_join()return all rows from x, and all columns from x
and y. Rows in x with no match in y will have NA values in the new
columns. If there are multiple matches between x and y, all combinations
of the matches are returned.
right_join()return all rows from y, and all columns from x
and y. Rows in y with no match in x will have NA values in the new
columns. If there are multiple matches between x and y, all combinations
of the matches are returned.
semi_join()return all rows from x where there are matching
values in y, keeping just columns from x.
A semi join differs from an inner join because an inner join will return
one row of x for each matching row of y, where a semi
join will never duplicate rows of x.
anti_join()return all rows from x where there are not
matching values in y, keeping just columns from x.
full_join()return all rows and all columns from both x and y.
Where there are not matching values, returns NA for the one missing.
Groups are ignored for the purpose of joining, but the result preserves
the grouping of x.
# "Mutating" joins add variables to the LHS band_members %>% inner_join(band_instruments)#>#> # A tibble: 2 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bassband_members %>% left_join(band_instruments)#>#> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bassband_members %>% right_join(band_instruments)#>#> # A tibble: 3 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass #> 3 Keith <NA> guitarband_members %>% full_join(band_instruments)#>#> # A tibble: 4 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass #> 4 Keith <NA> guitar# "Filtering" joins keep cases from the LHS band_members %>% semi_join(band_instruments)#>#> # A tibble: 2 × 2 #> name band #> <chr> <chr> #> 1 John Beatles #> 2 Paul Beatlesband_members %>% anti_join(band_instruments)#>#> # A tibble: 1 × 2 #> name band #> <chr> <chr> #> 1 Mick Stones# To suppress the message, supply by band_members %>% inner_join(band_instruments, by = "name")#> # A tibble: 2 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass# This is good practice in production code # Use a named `by` if the join variables have different names band_members %>% full_join(band_instruments2, by = c("name" = "artist"))#> # A tibble: 4 × 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass #> 4 Keith <NA> guitar# Note that only the key from the LHS is kept