Join data frame tbls

See join for a description of the general purpose of the functions.

# S3 method for tbl_df
inner_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
left_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
right_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
full_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
semi_join(x, y, by = NULL, copy = FALSE, ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
anti_join(x, y, by = NULL, copy = FALSE, ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

Arguments

x	tbls to join
y	tbls to join
by	A character vector of variables to join by. If `NULL`, the default, `*_join()` will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right; to suppress the message, list the join variables explicitly. To join by different variables on `x` and `y`, use a named vector. For example, `by = c("a" = "b")` will match `x.a` to `y.b`.
copy	If `x` and `y` are not from the same data source, and `copy` is `TRUE`, then `y` will be copied into the same src as `x`. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
suffix	If there are non-joined duplicate variables in `x` and `y`, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
...	included for compatibility with the generic; otherwise ignored.
na_matches	Use `"never"` to always treat two `NA` or `NaN` values as different, as joins for database sources do; this is similar to `merge(incomparables = NA)`. The default, `"na"`, always treats two `NA` or `NaN` values as equal, like `merge()`. Users and package authors can change the default behavior by calling `pkgconfig::set_config("dplyr::na_matches" = "never")`.
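
The interaction of `by`, `suffix`, and `na_matches` can be sketched on a small example. This is only an illustration: the tables `x` and `y` and their columns are made up, and it assumes a dplyr version that re-exports `tibble()`.

```r
library(dplyr)

# Hypothetical toy tables: the key is called `a` in x and `b` in y,
# and both tables carry a non-joined column named `val`.
x <- tibble(a = c(1, 2, NA), val = c("x1", "x2", "x3"))
y <- tibble(b = c(2, NA, 4), val = c("y1", "y2", "y3"))

# Named vector: match x$a to y$b. The duplicated `val` column is
# disambiguated with the supplied suffixes.
matched <- inner_join(x, y, by = c("a" = "b"), suffix = c("_x", "_y"))
names(matched)   # "a" "val_x" "val_y"
nrow(matched)    # 2: the NA keys match each other under the default "na"

# With na_matches = "never", NA keys no longer match each other.
strict <- inner_join(x, y, by = c("a" = "b"), suffix = c("_x", "_y"),
                     na_matches = "never")
nrow(strict)     # 1: only the row with key 2 remains
```

Because `by` is given explicitly, no "Joining, by = ..." message is printed.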

Examples

if (require("Lahman")) {
batting_df <- tbl_df(Batting)
person_df <- tbl_df(Master)

uperson_df <- tbl_df(Master[!duplicated(Master$playerID), ])

# Inner join: match batting and person data
inner_join(batting_df, person_df)
inner_join(batting_df, uperson_df)

# Left join: match, but preserve batting data
left_join(batting_df, uperson_df)

# Anti join: find batters without person data
anti_join(batting_df, person_df)
# or people who didn't bat
anti_join(person_df, batting_df)
}
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> # A tibble: 187 × 26
#>     playerID birthYear birthMonth birthDay birthCountry birthState birthCity
#>        <chr>     <int>      <int>    <int>        <chr>      <chr>     <chr>
#>  1 youngni99      1840          9       12          USA         NY Amsterdam
#>  2 yawketo99      1903          2       21          USA         MI   Detroit
#>  3 wrighal99      1842          3       31          USA         NY  New York
#>  4 wilsoju99      1896          2       28          USA         VA Remington
#>  5 willijo99      1885          4        6          USA         TX    Seguin
#>  6 williji99      1847          1        4          USA         OH   Catawba
#>  7 whiteso99      1868          6       12          USA         OH  Bellaire
#>  8 wellswi99      1905          8       10          USA         TX    Austin
#>  9 weissge99      1894          6       23          USA         CT New Haven
#> 10 weaveea99      1930          8       14          USA         MO St. Louis
#> # ... with 177 more rows, and 19 more variables: deathYear <int>,
#> #   deathMonth <int>, deathDay <int>, deathCountry <chr>, deathState <chr>,
#> #   deathCity <chr>, nameFirst <chr>, nameLast <chr>, nameGiven <chr>,
#> #   weight <int>, height <int>, bats <fctr>, throws <fctr>, debut <chr>,
#> #   finalGame <chr>, retroID <chr>, bbrefID <chr>, deathDate <date>,
#> #   birthDate <date>
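
The Lahman examples above do not exercise `semi_join()`, `right_join()`, or `full_join()`. The following is a small sketch of how they differ, using made-up tables (`orders` and `prices` are hypothetical and not part of any package) and assuming a dplyr version that re-exports `tibble()`.

```r
library(dplyr)

# Hypothetical toy tables sharing the key column `id`.
orders <- tibble(id = c(1, 2, 3), item = c("pen", "ink", "pad"))
prices <- tibble(id = c(2, 3, 4), price = c(5, 7, 9))

# Filtering joins: keep or drop rows of x, never add columns from y.
semi <- semi_join(orders, prices, by = "id")  # ids with a price: 2, 3
anti <- anti_join(orders, prices, by = "id")  # ids without a price: 1

# Mutating joins: add columns from y, filling unmatched cells with NA.
right <- right_join(orders, prices, by = "id")  # every row of prices
full  <- full_join(orders, prices, by = "id")   # every row of either table

names(semi)   # "id" "item"
nrow(right)   # 3
nrow(full)    # 4
```

Row order in the results of `right_join()` and `full_join()` may differ between dplyr versions, so the sketch only checks row counts.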
