--- output: github_document --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) options(tibble.print_min = 5, tibble.print_max = 5) ``` # dplyr [![CRAN status](https://www.r-pkg.org/badges/version/dplyr)](https://cran.r-project.org/package=dplyr) [![R-CMD-check](https://github.com/tidyverse/dplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dplyr/actions/workflows/R-CMD-check.yaml) [![Codecov test coverage](https://codecov.io/gh/tidyverse/dplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dplyr?branch=main) ## Overview dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges: * `mutate()` adds new variables that are functions of existing variables * `select()` picks variables based on their names. * `filter()` picks cases based on their values. * `summarise()` reduces multiple values down to a single summary. * `arrange()` changes the ordering of the rows. These all combine naturally with `group_by()` which allows you to perform any operation "by group". You can learn more about them in `vignette("dplyr")`. As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in `vignette("two-table")`. If you are new to dplyr, the best place to start is the [data transformation chapter](https://r4ds.hadley.nz/data-transform) in R for Data Science. ## Backends In addition to data frames/tibbles, dplyr makes working with other computational backends accessible and efficient. Below is a list of alternative backends: - [arrow](https://arrow.apache.org/docs/r/) for larger-than-memory datasets, including on remote cloud storage like AWS S3, using the Apache Arrow C++ engine, [Acero](https://arrow.apache.org/docs/cpp/streaming_execution.html). - [dtplyr](https://dtplyr.tidyverse.org/) for large, in-memory datasets. Translates your dplyr code to high performance [data.table](https://rdatatable.gitlab.io/data.table/) code. - [dbplyr](https://dbplyr.tidyverse.org/) for data stored in a relational database. Translates your dplyr code to SQL. - [duckplyr](https://duckdblabs.github.io/duckplyr/) for using [duckdb](https://duckdb.org) on large, in-memory datasets with zero extra copies. Translates your dplyr code to high performance duckdb queries with an automatic R fallback when translation isn't possible. - [duckdb](https://duckdb.org/docs/api/r) for large datasets that are still small enough to fit on your computer. - [sparklyr](https://spark.rstudio.com) for very large datasets stored in [Apache Spark](https://spark.apache.org). ## Installation ```{r, eval = FALSE} # The easiest way to get dplyr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just dplyr: install.packages("dplyr") ``` ### Development version To get a bug fix or to use a feature from the development version, you can install the development version of dplyr from GitHub. ```{r, eval = FALSE} # install.packages("pak") pak::pak("tidyverse/dplyr") ``` ## Cheat Sheet ## Usage ```{r, message = FALSE} library(dplyr) starwars %>% filter(species == "Droid") starwars %>% select(name, ends_with("color")) starwars %>% mutate(name, bmi = mass / ((height / 100) ^ 2)) %>% select(name:mass, bmi) starwars %>% arrange(desc(mass)) starwars %>% group_by(species) %>% summarise( n = n(), mass = mean(mass, na.rm = TRUE) ) %>% filter( n > 1, mass > 50 ) ``` ## Getting help If you encounter a clear bug, please file an issue with a minimal reproducible example on [GitHub](https://github.com/tidyverse/dplyr/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/) or the [manipulatr mailing list](https://groups.google.com/d/forum/manipulatr). --- Please note that this project is released with a [Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT). By participating in this project you agree to abide by its terms.