ggplot2 is one of the most downloaded R packages and probably the one that brought Hadley Wickham to fame.

The “grammar of graphics” philosophy it supports not only lets you create professional-looking plots but, once you have mastered its syntax, should also encourage you to think about plots in a more structured manner.

The syntax does take time to master though, so check out the ggplot2 website and the ggplot2 cookbook, which walk you through common tasks. The author still refers to these!

Grammar rules

ggplot2 breaks a plot down into components that you can build up into a whole plot.

A brief explanation of the components is below:

Thinking about what you want to produce in terms of the components above will get you to your desired plot more quickly.
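
As a rough sketch of how the components map onto code, the example below builds one plot piece by piece, with a comment naming the component each line supplies. The data frame `toy` and its columns are made up for illustration and are not the web data used later; only the ggplot2 functions are real.

```r
library(ggplot2)

## hypothetical toy data, invented for this sketch
toy <- data.frame(
  day    = rep(1:5, times = 2),
  visits = c(10, 12, 9, 15, 14, 4, 6, 5, 8, 7),
  device = rep(c("desktop", "mobile"), each = 5)
)

p <- ggplot(data = toy,                      ## data
            aes(x = day, y = visits)) +      ## aesthetics: map columns to x and y
  geom_line(aes(colour = device)) +          ## geom: draw the data as lines
  facet_grid(device ~ .) +                   ## facet: one panel per device
  scale_colour_brewer(palette = "Set1") +    ## scale: how values become colours
  theme_minimal()                            ## theme: non-data styling

## print the plot
p
```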

What plot to do?

Another great resource is the ggplot2 cheatsheet which groups geoms by the type of data you have.

Workflow

  1. First get your data ready and tidy. Whilst it's possible to use “wide” data, I find it easiest to always start with tidy “long” data, so you can quickly repeat what you have learnt before.
## get the web data
web_data <- read.csv("./data/gadata_example_2.csv", stringsAsFactors = FALSE)

## remove the row-name column (the first column)
web_data <- web_data[,-1]
## to date I have been using reshape2's melt() to make the long data:
library(reshape2)
web_data_long <- melt(web_data)
## Using date, channelGrouping, deviceCategory as id variables
head(web_data_long)
##         date channelGrouping deviceCategory variable value
## 1 2016-01-01         (Other)        desktop sessions    19
## 2 2016-01-01         (Other)         mobile sessions   112
## 3 2016-01-01         (Other)         tablet sessions    24
## 4 2016-01-01          Direct        desktop sessions   133
## 5 2016-01-01          Direct         mobile sessions   345
## 6 2016-01-01          Direct         tablet sessions   126
## but you should (as I now do) use the newer tidyr package and its gather() function instead:
library(tidyr)
## call the key column 'variable' and the value column 'value' and
## gather all variables apart from date, channelGrouping and deviceCategory
web_data_tidy <- web_data %>% gather(variable, value, -date, -channelGrouping, -deviceCategory)
head(web_data_tidy)
##         date channelGrouping deviceCategory variable value
## 1 2016-01-01         (Other)        desktop sessions    19
## 2 2016-01-01         (Other)         mobile sessions   112
## 3 2016-01-01         (Other)         tablet sessions    24
## 4 2016-01-01          Direct        desktop sessions   133
## 5 2016-01-01          Direct         mobile sessions   345
## 6 2016-01-01          Direct         tablet sessions   126
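
Since tidyr 1.0.0, gather() has been superseded by pivot_longer(). Assuming a recent tidyr is installed, the same reshape could be sketched as below. The data frame `web_small` is a made-up stand-in for web_data, and its `bounces` column is invented purely so there are two metrics to gather.

```r
library(tidyr)

## hypothetical stand-in for web_data with two metric columns
web_small <- data.frame(
  date            = c("2016-01-01", "2016-01-01"),
  channelGrouping = c("Direct", "Direct"),
  deviceCategory  = c("desktop", "mobile"),
  sessions        = c(133L, 345L),
  bounces         = c(60L, 210L),
  stringsAsFactors = FALSE
)

## pivot every column except the three id columns into variable/value pairs
web_small_long <- pivot_longer(
  web_small,
  cols      = -c(date, channelGrouping, deviceCategory),
  names_to  = "variable",
  values_to = "value"
)

web_small_long
```

Note that pivot_longer() keeps the values from each original row together, whereas gather() stacks the output one column at a time, so the row order differs even though the content is the same.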
  2. Make sure all your columns are the right class. In this case, I make the date column a Date object.

You could also choose to make factors out of your categories, as they let you set the order of colours in legends a bit more easily.

str(web_data_tidy)
## 'data.frame':    22928 obs. of  5 variables:
##  $ date           : chr  "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" ...
##  $ channelGrouping: chr  "(Other)" "(Other)" "(Other)" "Direct" ...
##  $ deviceCategory : chr  "desktop" "mobile" "tablet" "desktop" ...
##  $ variable       : chr  "sessions" "sessions" "sessions" "sessions" ...
##  $ value          : int  19 112 24 133 345 126 307 3266 1025 17 ...
web_data_tidy$date <- as.Date(web_data_tidy$date)

## we will only look at sessions
library(dplyr)
plot_data <- web_data_tidy %>% filter(variable == "sessions")
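
To illustrate the factor suggestion in this step: ggplot2 orders a discrete legend by factor levels, so setting the levels explicitly controls the legend order. The data frame `devices` and the level order chosen here are made up for this sketch.

```r
## hypothetical example data
devices <- data.frame(
  deviceCategory = c("tablet", "desktop", "mobile", "desktop"),
  stringsAsFactors = FALSE
)

## without explicit levels, factor() sorts the levels alphabetically
levels(factor(devices$deviceCategory))

## set the order you want to see in the legend
devices$deviceCategory <- factor(
  devices$deviceCategory,
  levels = c("mobile", "desktop", "tablet")
)
levels(devices$deviceCategory)
```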
  3. Start up a gg object with a ggplot() call including your data, and any known aesthetics you want to apply to all plots. I tend to also set the theme here, favouring a minimal look:
library(ggplot2)
## I don't know why, but I always call them gg
gg <- ggplot(data = plot_data, aes(x = date)) + theme_minimal()
  4. Now the fun begins - experiment with adding various elements to your gg object using +. Once you have found something you want to keep, assign it to gg and then carry on to the next feature.

Any aesthetic you haven’t specified in the initial ggplot() call must be supplied in the geom you add. Note that because we passed the data in the first line, we don’t need to specify it again.

## let's make some line plots
gg + geom_line(aes(y = value))

## hmm, too much data in there, let's colour by channelGrouping
gg + geom_line(aes(y = value, colour = channelGrouping))

## we have desktop, mobile and tablet all in there, let's separate them out with facets
gg + geom_line(aes(y = value, colour = channelGrouping)) + facet_grid(. ~ deviceCategory)

## I prefer it one over the other
gg + geom_line(aes(y = value, colour = channelGrouping)) + facet_grid(deviceCategory ~ .)

## let's try an area plot
gg + geom_area(aes(y = value, colour = channelGrouping, group = channelGrouping, fill = channelGrouping)) + facet_grid(deviceCategory ~ .)

## ok, let's keep that for now
gg <- gg + geom_area(aes(y = value, group = channelGrouping, fill = channelGrouping)) + facet_grid(deviceCategory ~ .)

The point above is to show how modifications can be quickly added as you try out ideas.
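
The experiment-then-assign pattern can be checked on a toy plot: adding a layer with + returns a new object and leaves the original untouched until you assign the result. The names `toy`, `gg_toy` and `trial` are made up for this sketch.

```r
library(ggplot2)

## hypothetical toy data
toy <- data.frame(x = 1:3, y = c(2, 4, 3))

gg_toy <- ggplot(toy, aes(x = x)) + theme_minimal()

## try a layer without overwriting gg_toy
trial <- gg_toy + geom_line(aes(y = y))

## the base object still has no layers; only the trial has one
length(gg_toy$layers)
length(trial$layers)
```

This is why the experiments above are safe: nothing changes in gg until you write `gg <- gg + ...`.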

A little more styling, and we are done with this example:

## make the colours nicer
gg <- gg + scale_fill_brewer(palette = "Paired")
## add a title
gg <- gg + ggtitle("Sessions per device category")
## rename the x and y axes
gg <- gg + xlab("Date") + ylab("Sessions")
## change the legend title
gg <- gg + guides(fill = guide_legend(title = "Channel Grouping"))
## put the legend at the bottom
gg <- gg + theme(legend.position = "bottom")
## print the final plot
gg
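
As an alternative to the separate ggtitle(), xlab(), ylab() and guides() calls, the same labels can be set in a single labs() call. This is a different route to the same result, not what the code above does. The sketch uses a made-up data frame `toy` so that it runs on its own.

```r
library(ggplot2)

## hypothetical toy data
toy <- data.frame(
  date    = as.Date("2016-01-01") + 0:2,
  value   = c(5, 7, 6),
  channel = "Direct"
)

gg_toy <- ggplot(toy, aes(x = date, y = value, fill = channel)) +
  geom_area() +
  theme_minimal() +
  ## title, axis names and legend title in one place
  labs(
    title = "Sessions per device category",
    x     = "Date",
    y     = "Sessions",
    fill  = "Channel Grouping"
  ) +
  theme(legend.position = "bottom")

gg_toy
```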

Disclaimer: I don’t think area plots are very clear, but they look pretty ;)