A large part of R’s success is the ecosystem of open-sourced packages that add extra functionality to what R comes with (referred to as “base R”).

CRAN

CRAN is the official central repository for R packages, where human reviewed packages go through quality checks for publishing on the CRAN network around the world. It is through CRAN that you install packages with the install.package() function, or via the Packages pane in the bottom left of RStudio.

Using a Package

This will become absolutely second nature, but it’s one of those things that is not 100% intuitive at first. When you’re using a package (which, really, means you’re using one or more of the functions in a package), there are two things that have to happen:

  1. The package has to be installed. This is what is described above. The first time you use a package, you will need to type install.packages(\[the package name\]) in the console and press . That will pull the package (and any packages the package depends on – few packages are built entirely from scratch!) from CRAN. And…it will show up in the Packages tab in the bottom right of your RStudio environment. This is a one-time step, although there is no harm in installing the same package multiple times, and, occasionally, you will find that a package has been updated and needs to be re-installed.

  2. The package has to be loaded. In your script (or in the console), enter library(\[the package name\]). Typically, you will have a list of these at the beginning of your scripts (although there are other techniques for centralizing a list of packages you commonly use…we’re not going to go there). Once a package is loaded, it will show with a checkbox next to it in the Packages list in your RStudio environment.

When you get a “function not found” error when running a script, 9 times out of 10 that’s because the function you’re calling is in a package that isn’t loaded.

Some Useful Packages

This list simply cannot be comprehensive, but, to give you a sense of the breadth and scope of R Packages, below are a few representative examples:

Accessing Web Analytics, Search, and Social Media Data

Many of the platforms that digital analysts work with already have one or more packages developed for accessing their data quickly and easily:

  • googleAnalyticsR – this is officially Tim’s favorite package (which may or may not be him sucking up to Mark). It uses the v4 API from Google Analytics which (right now) makes it unique.
  • RGA – another package for accessing Google Analytics data. This used to be our favorite. (See above.)
  • rga – you guessed it…another package for accessing Google Analytics data. Subtle Point: R is case-sensitive (and it’s unfortunate that we wound up with two different-but-similar-packages!) throughout the code. It’s easy for this to trip things up!
  • RSiteCatalyst – a package for accessing data from Adobe Analytics! This package works really well…but it’s dependent on constraints in the Adobe Analytics API.
  • twitteR – a package for accessing Twitter data.
  • searchConsoleR – yup… you guessed it. Google Search Console data (oh…and Mark wrote this one, too; fun fact that Tim discovered when he was creating this content!)

Some Other Packages You Will Get Familiar With

You will learn to know the name “Hadley Wickham,” as he is quite possibly the most influential R user (and package and content creator) of the modern era. He also now works for RStudio. All of the packages below are part of what has been dubbed “the Hadleyverse,” because Wickham was key in their creation. He’s actually been working to rebrand this “the tidyverse” because, well, that’s how he rolls.

For a more comprehensive list of packages in the Hadley-tidyverse, you can check out The Hitchhiker’s Guide to the Hadleyverse. But, here, we’re just going to list a small handful of those packages that are practically core to using R:

  • ggplot2 – It’s a package with its own web site! ggplot2 is a very powerful data visualization package. It is also extremely confusing to learn, as you have to throw how you think about the construction of charts and graphs out the window. We’ll be using ggplot2 in this course a bit, but know that it takes a while to fully learn. You can check out this site for some examples of visualizations with ggplot2.
  • dplyr – in some respects, this is a non-essential package, in that there are “base R” ways to do everything that its handful of functions do. On the other hand…those handful of functions, as well as the %>% operator (which is available once you’ve loaded dplyr…but actually comes from another package that dplyr uses) can streamline the heck out of your code!
  • lubridate – it’s a play on words: you’re “lubricating the process of working with dates” (I think). Dates come in all sorts of formats and structures, and this package makes short work of working with them.
  • tidyr – this is a package for getting your data into a “tidy” format…which, put in Excel terms, means a bunch of rows and a limited number of columns so that you could pivot the data very easily (because data that can be easily pivoted can be easily subsetted, plotted, summarized, and aggregated).

Github

There are also thousands of more experimental packages that are only available through Github. These packages have come much easier to work with since the introduction of the package devtools, which via its function install_github means access to this universe.

Beware though that the packages on Github are by their nature more experimental, so due care should be excercised to ensure they are trustworthy.