Interactive graphics are visual displays that dynamically provide information to users based on the user interacting with the graphic.
The best packages for creating interactive graphics in R are:
We use plotly to create a surface plot of the Maunga Whau volcano data.
We use ggiraph to provide an example of an
interactive graphic using the starwars data from the
dplyr package (Wickham et al.
2022).
We use rgl to display a 3-dimensional perspective plot with contour levels of the Maunga Whau volcano.
library(rgl)
z <- 2 * volcano # Exaggerate the relief
x <- 10 * (1:nrow(z)) # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z)) # 10 meter spacing (E to W)
open3d()
## wgl
## 1
id <- persp3d(x, y, z, aspect = "iso",
axes = FALSE, box = FALSE, polygon_offset = 1)
contourLines3d(id) # "z" is the default function
filledContour3d(id, polygon_offset = 1, nlevels = 10, replace = TRUE)
A Shiny app example related to NCAA swim teams can be found at https://shiny.rstudio.com/gallery/ncaa-swim-team-finder.html.
The easiest way to create interactive graphics is to:
1. Create a graphic using ggplot2.
2. Use the ggplotly function from the
plotly package to make the graphic interactive.
The advantage of this approach is that it is dead simple.
The disadvantage of this approach is that it may not give us the desired control over the aspects of the graphic that are interactive. We produce numerous interactive graphics using this approach below.
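Some control can be recovered because ggplotly returns an ordinary plotly object, which can be piped into other plotly functions. The following is a minimal sketch, assuming the built-in mtcars data (not used elsewhere in this chapter); the object names p_static and p_interactive are hypothetical.

```r
library(ggplot2)
library(plotly)

# static scatter plot built with ggplot2 (mtcars is a built-in data set)
p_static <-
  ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# convert to a plotly object, then adjust it with plotly functions
p_interactive <-
  ggplotly(p_static) |>
  config(displayModeBar = FALSE) |>             # hide the toolbar
  layout(hoverlabel = list(bgcolor = "white"))  # white hover labels

p_interactive
```

Anything that ggplotly cannot express directly can often be adjusted this way, although fine-grained control is usually easier with plot_ly.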
As stated on the plotly website (https://plotly.com/r/getting-started/):
plotly is an R package for creating interactive web-based graphs via the open source JavaScript graphing library plotly.js.
It can be used to add interactivity to plots created with ggplot2 or create interactive plots on its own.
First, we load the necessary packages. We load the
ggplot2 and plotly packages to create
the graphics and load the penguins data set from the
palmerpenguins package to provide the data we will
plot.
library(ggplot2, quietly = TRUE)
library(plotly, quietly = TRUE)
data(penguins, package = "palmerpenguins")
In the code below, we use ggplot2 to create a basic
bar plot of penguin species. We assign this graphic the
name ggbar.
We then use the ggplotly function in the
plotly package to make the graphic interactive.
The interactive graphic provides the frequency associated with each species when we hover over a bar.
# bar plot of penguin species
ggbar <-
ggplot(penguins) +
geom_bar(aes(x = species))
# make bar plot interactive
ggplotly(ggbar)
Next, we use the direct capabilities of the plotly
package to create a bar plot of penguin species. In
general, the plot_ly function in the
plotly package is all we need to create basic
interactive graphics. We can also add layers to the graphics
using various add_* functions and customize the layout
using the layout function.
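As a rough illustration of layering, the sketch below starts from plot_ly, adds two layers with add_markers and add_lines, and relabels the axes with layout. It assumes the built-in mtcars data; the names cars_df, fit, and p_layers are hypothetical and introduced only for this example.

```r
library(plotly)

# hypothetical example data: built-in mtcars plus fitted values
# from a simple linear regression of mpg on wt
cars_df <- mtcars
cars_df$fit <- fitted(lm(mpg ~ wt, data = cars_df))

p_layers <-
  plot_ly(cars_df, x = ~wt) |>
  add_markers(y = ~mpg, name = "observed") |>  # layer 1: points
  add_lines(y = ~fit, name = "fitted") |>      # layer 2: fitted line
  layout(xaxis = list(title = "weight (1000 lbs)"),
         yaxis = list(title = "miles per gallon"))

p_layers
```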
The main arguments to the plot_ly function are:
- data: an optional data frame whose variables will be
plotted. To access a variable in data, we must use
~ before the variable’s name.
- type: a character string indicating the type of plot to
create, e.g., "bar", "histogram",
"box", "violin", "scatter".
- ...: arguments passed to the plot type that specify the
attributes of the graphic (which are similar to the aesthetics
in ggplot2), e.g., x, y,
etc.
- split: discrete values used to create multiple traces
(one trace per value). This is similar to the group
aesthetic in ggplot2. A “trace” describes “a single
series of data in a graph” (https://plotly.com/r/reference/index/).
- color: values mapped to a fill color.
- alpha: a number between 0 and 1 controlling the
transparency of the graphic.
To create a bar plot using plotly, we need a data frame containing the count associated with each level of the categorical variable we want to display.
We first create a data frame that summarizes the counts of each
species. We use the group_by,
summarize, and n functions from
dplyr (Wickham et al.
2022) to do this.
# create data frame of frequency for each species
species_counts <-
penguins |>
dplyr::group_by(species) |>
dplyr::summarize(frequency = dplyr::n())
# print data frame
print(species_counts, n = 3)
## # A tibble: 3 × 2
## species frequency
## <fct> <int>
## 1 Adelie 152
## 2 Chinstrap 68
## 3 Gentoo 124
Once we have a data frame that describes the frequency associated
with each level of the categorical variable, we can create a bar plot
using plotly. In the plot_ly function,
we:
- Set the type argument to "bar".
- Associate species with the x attribute.
- Associate frequency with the y attribute.
The interactive graphic provides the frequency associated with each species when we hover over a bar.
# create interactive bar chart
plot_ly(species_counts,
x = ~species,
y = ~frequency,
type = "bar")
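The color argument described earlier can be combined with a bar plot. The sketch below counts penguins by species and sex and maps sex to color, so plotly draws one trace per sex. The name species_sex_counts is hypothetical, and dropping rows with missing sex is a choice made only to keep the example simple.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

# frequency for each combination of species and sex
species_sex_counts <-
  dplyr::count(penguins, species, sex, name = "frequency")
# drop the rows where sex is missing (an assumption made for this sketch)
species_sex_counts <-
  species_sex_counts[!is.na(species_sex_counts$sex), ]

# one bar trace per sex, colored by sex
p_bar_sex <-
  plot_ly(species_sex_counts,
          x = ~species,
          y = ~frequency,
          color = ~sex,
          type = "bar")
p_bar_sex
```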
We now create interactive histograms using two approaches.
In the code below, we:
1. Create a histogram of the bill_length_mm variable for the penguins data.
We assign this graphic the name gghist.
2. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the midpoint of each bin and the number of penguins falling in each bin.
gghist <-
ggplot(penguins) +
geom_histogram(aes(x = bill_length_mm))
ggplotly(gghist)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
To create a similar histogram using the plot_ly
function, we:
- Set the type argument to "histogram".
- Associate bill_length_mm with the x
attribute.
- Use the nbinsx argument to control the number of bins in
the histogram.
The interactive histogram indicates the endpoints of each bin and the number of penguins in each bin.
plot_ly(penguins,
x = ~bill_length_mm,
type = "histogram",
nbinsx = 30)
The histogram produced by the plot_ly function looks a
bit different from the histogram produced by ggplot2
because the locations of the bins are different. To make them the same,
we can use the ggplot2::layer_data function to get the “under the
hood” information ggplot2 uses to produce its plot.
In the code below, we use layer_data to access the
internal data used by ggplot2 to create a histogram.
The xmin variable indicates the lower bound of each
histogram bin. The lower bound of the far left bin starts at 31.76724.
We then use the diff function to determine the bin width
(this computes the difference between successive lower bounds).
# get histogram data from gghist
datahist <- layer_data(gghist)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# determine starting point
head(datahist$xmin, 3)
## [1] 31.76724 32.71552 33.66379
# determine bin width
head(diff(datahist$xmin), 3)
## [1] 0.9482759 0.9482759 0.9482759
Now that we know the start location of the leftmost bin and the size
(width) of the bins, we can pass these values as the start
and size elements of a named list supplied to the
xbins argument of plot_ly. This will create an
interactive histogram that mimics the one produced by
ggplot2.
plot_ly(penguins,
x = ~bill_length_mm,
type = "histogram",
xbins = list(start = 31.76724,
size = 0.9482759))
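Hard-coding the start and size values is fragile if the data or the number of bins changes. As a sketch of one alternative, the values can be computed from layer_data and passed along directly. The names gghist_30, datahist_30, bin_start, and bin_size are hypothetical, and bins = 30 is set explicitly to match the default used above.

```r
library(ggplot2)
library(plotly)
data(penguins, package = "palmerpenguins")

# histogram with the number of bins stated explicitly
gghist_30 <-
  ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm), bins = 30)

# bin information computed by ggplot2
datahist_30 <- layer_data(gghist_30)
bin_start <- min(datahist_30$xmin)     # lower bound of the leftmost bin
bin_size <- diff(datahist_30$xmin)[1]  # common bin width

# reuse the computed values instead of typing them in
p_hist_match <-
  plot_ly(penguins,
          x = ~bill_length_mm,
          type = "histogram",
          xbins = list(start = bin_start,
                       size = bin_size))
p_hist_match
```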
We examine how to construct interactive density plots using two approaches.
In the code below, we:
1. Create density plots of bill_length_mm for each species that use
semi-transparent color to distinguish the different
species. We assign this plot the name ggdens.
2. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the species,
bill_length_mm, and density when we hover over a density
curve.
ggdens <-
ggplot(penguins) +
geom_density(aes(x = bill_length_mm, fill = species), alpha = 0.3)
ggplotly(ggdens)
Surprisingly, there is no easy way to create a standard density plot
natively using plot_ly.
To work around this, we can manually create the density information
using the base::density function and then extract the
associated x and y of the density information.
However, we want to do this individually for each species, so we instead
use the ggplot2::layer_data function to get the same
information from our previous ggplot2 graphic.
We assign the name dens_data to the information from the
layer_data function. dens_data stores the
relevant density curve information in the x and
density variables, while the group variable
distinguishes the different species. We turn the
group variable into a factor with the correct
species names.
# extract density data from ggdens
dens_data <- layer_data(ggdens)
# view data
head(dens_data, n = 3)
## fill y x density scaled ndensity count n
## 1 #F8766D 0.006280819 32.10000 0.006280819 0.04825549 0.04825549 0.9484037 151
## 2 #F8766D 0.006608059 32.15382 0.006608059 0.05076968 0.05076968 0.9978170 151
## 3 #F8766D 0.006947760 32.20763 0.006947760 0.05337960 0.05337960 1.0491118 151
## flipped_aes PANEL group ymin ymax weight colour alpha size linetype
## 1 FALSE 1 1 0 0.006280819 1 black 0.3 0.5 1
## 2 FALSE 1 1 0 0.006608059 1 black 0.3 0.5 1
## 3 FALSE 1 1 0 0.006947760 1 black 0.3 0.5 1
# convert group to factor
dens_data$group <-
factor(dens_data$group,
labels = c("adelie", "chinstrap", "gentoo"))
We want to trace the density curves for each species in a scatter
plot that connects the points for each species. Using
plot_ly and the dens_data data frame, we:
- Associate the x variable with the x
attribute.
- Associate the density variable with the y
attribute.
- Associate the group variable with the
split argument. This is roughly equivalent to the
group aesthetic in ggplot2.
- Specify type = "scatter" to produce a scatter
plot.
- Specify mode = "lines" to connect the points using a
line but not show the points (markers) themselves.
- Use the pipe operator |> with the
layout function to change the x-axis label.
The resulting interactive density plot indicates the value of
bill_length_mm for each species and the associated
density.
plot_ly(dens_data,
        x = ~x,
        y = ~density,
        split = ~group,
        type = "scatter",
        mode = "lines") |>
  layout(xaxis = list(title = 'bill_length_mm'))
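For comparison, here is a sketch of the manual route mentioned earlier, which calls base::density once per species instead of borrowing the values from ggplot2. The names bill_by_species, dens_list, and dens_base are hypothetical. The curves will not match the ggplot2 version exactly, because density by default evaluates the estimate beyond the range of the data.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

# bill lengths split into one numeric vector per species
bill_by_species <- split(penguins$bill_length_mm, penguins$species)

# kernel density estimate for each species, stored as a data frame
dens_list <- lapply(names(bill_by_species), function(sp) {
  d <- density(bill_by_species[[sp]], na.rm = TRUE)
  data.frame(species = sp, x = d$x, density = d$y)
})
dens_base <- do.call(rbind, dens_list)

# one line trace per species
p_dens_base <-
  plot_ly(dens_base,
          x = ~x,
          y = ~density,
          split = ~species,
          type = "scatter",
          mode = "lines") |>
  layout(xaxis = list(title = "bill_length_mm"))
p_dens_base
```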
We create interactive box plots using two approaches.
In the code below, we:
1. Create parallel box plots of bill_length_mm for each species. We associate
species with the x-variable and bill_length_mm
with the y-variable. We assign this plot the name
ggbox.
2. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the 5-number summary (min, Q1,
median, Q3, max) of bill_length_mm for each
species. It also indicates the value of any outlier.
ggbox <-
ggplot(penguins) +
geom_boxplot(aes(x = species, y = bill_length_mm))
ggplotly(ggbox)
We can create a similar set of parallel box plots using
plotly. Using the plot_ly function,
we:
- Associate species with the x
attribute.
- Associate bill_length_mm with the y
attribute.
- Specify type = "box".
The interactive graphic indicates the species and
5-number summary (min, Q1, median, Q3, max) of
bill_length_mm for each box plot.
plot_ly(penguins,
x = ~species,
y = ~bill_length_mm,
type = "box")
We use two approaches to create interactive violin plots.
In the code below, we:
1. Create violin plots of bill_length_mm for each species. We assign
this plot the name ggvio.
2. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the species,
bill_length_mm, and the associated density for that value
of bill_length_mm when we hover over a violin curve.
ggvio <-
ggplot(penguins) +
geom_violin(aes(x = species, y = bill_length_mm))
ggplotly(ggvio)
To create a similar plot using the plot_ly function,
we:
- Associate species with the x
attribute.
- Associate bill_length_mm with the y
attribute.
- Specify type = "violin".
The interactive graphic indicates the species,
bill_length_mm, and the associated density for that value
of bill_length_mm when we hover over a violin curve, as
well as the 5-number summary of the associated box plot.
plot_ly(penguins,
x = ~species,
y = ~bill_length_mm,
type = "violin")
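Violin traces accept optional attributes that control what is drawn inside each violin. The sketch below assumes we want the embedded box plot and a mean line to be visible. box and meanline are attributes of plotly violin traces, while the name p_violin_box is hypothetical.

```r
library(plotly)
data(penguins, package = "palmerpenguins")

# violin plots with the inner box plot and mean line drawn
p_violin_box <-
  plot_ly(penguins,
          x = ~species,
          y = ~bill_length_mm,
          type = "violin",
          box = list(visible = TRUE),
          meanline = list(visible = TRUE))
p_violin_box
```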
We will create an interactive scatter plot of
bill_length_mm versus body_mass_g for the
penguins data that uses different colors and shapes to
distinguish the different species.
We will investigate two simple approaches for doing this.
In the code below, we:
1. Create the scatter plot using ggplot2. We assign this plot the name
ggscatter.
2. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the species (twice),
body_mass_g, and bill_length_mm when we hover
over a point.
ggscatter <-
ggplot(penguins) +
geom_point(aes(x = body_mass_g,
y = bill_length_mm,
color = species,
shape = species))
ggplotly(ggscatter)
Notice that species is indicated twice when we hover
over a point. We can correct this behavior by using the
tooltip argument to specify the attributes (x,
y, color, etc.) we want to display when our
mouse hovers over a point.
# restrict attributes displayed from hover
ggplotly(ggscatter,
tooltip = c("shape", "x", "y"))
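Another common option is to build the hover text ourselves and display only that. The sketch below maps a character string to the text aesthetic and then specifies tooltip = "text" in ggplotly. The names ggscatter_text and p_text are hypothetical. ggplot2 may warn that text is an unknown aesthetic; the mapping is still passed through to ggplotly.

```r
library(ggplot2)
library(plotly)
data(penguins, package = "palmerpenguins")

# scatter plot with a custom hover string stored in the text aesthetic
ggscatter_text <-
  ggplot(penguins) +
  geom_point(aes(x = body_mass_g,
                 y = bill_length_mm,
                 color = species,
                 shape = species,
                 text = paste0("species: ", species,
                               "\nbody mass (g): ", body_mass_g,
                               "\nbill length (mm): ", bill_length_mm)))

# display only the custom text when hovering
p_text <- ggplotly(ggscatter_text, tooltip = "text")
p_text
```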
We can create a similar interactive scatter plot using
plot_ly. We:
- Specify type = "scatter" and
mode = "markers" to indicate that we want to plot
points.
- Associate body_mass_g with the x attribute
and bill_length_mm with the y attribute for the actual
points.
- Associate species with the color and
symbol attributes to change those aspects of the plot.
The resulting scatter plot indicates the species,
body_mass_g, and bill_length_mm of each
point.
plot_ly(penguins,
x = ~body_mass_g,
y = ~bill_length_mm,
color = ~species,
symbol = ~species,
mode = "markers",
type = "scatter")
We now attempt to add some linear regression smooths to an interactive scatter plot.
In the code below, we:
1. Create a scatter plot of
bill_length_mm versus body_mass_g that uses
different colors and shapes to distinguish the different
species.
2. Add an "lm"
smooth for the points of each species. We assign this plot
the name ggsmooth.
3. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the species,
body_mass_g, and bill_length_mm of each point,
as well as the points on the smooth for each species.
ggsmooth <-
ggplot(penguins) +
geom_point(aes(x = body_mass_g,
y = bill_length_mm,
color = species,
shape = species)) +
geom_smooth(aes(x = body_mass_g,
y = bill_length_mm,
color = species),
method = "lm")
ggplotly(ggsmooth)
## `geom_smooth()` using formula 'y ~ x'
Adding a smooth to a plot using plotly natively is a bit more difficult because you have to manually compute the smooth, extract the fitted line for each group, and then add the fitted lines as a layer to an existing scatter plot.
In the code below, we fit a separate lines model to the data, which
essentially fits a separate linear regression model to the points of each
species. We then add the fitted values from
this model as a new variable to the penguins data
frame.
# fit separate lines/interaction model
# (body_mass_g * species gives each species its own intercept and slope)
lmod <- lm(bill_length_mm ~ body_mass_g * species,
           data = penguins, na.action = na.exclude)
# add fitted values for each group
penguins$fitted <- fitted(lmod)
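To see why a separate lines model amounts to one regression per species, the sketch below compares the fitted values of an interaction model with those from three regressions fit species by species. It assumes the interaction is written as body_mass_g * species; the names peng, lmod_int, and fit_by_species are hypothetical, and rows with missing values are dropped first to keep the comparison simple.

```r
data(penguins, package = "palmerpenguins")

# keep only rows with complete values for the variables in the model
peng <- as.data.frame(penguins)
peng <- peng[complete.cases(peng[c("bill_length_mm", "body_mass_g", "species")]), ]

# interaction model: separate intercept and slope for each species
lmod_int <- lm(bill_length_mm ~ body_mass_g * species, data = peng)

# one simple linear regression per species, fitted values put back in row order
fit_by_species <- unsplit(
  lapply(split(peng, peng$species),
         function(d) fitted(lm(bill_length_mm ~ body_mass_g, data = d))),
  peng$species)

# the two sets of fitted values agree up to numerical tolerance
all.equal(unname(fitted(lmod_int)), unname(fit_by_species))
```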
Now that all relevant data is in the penguins data
frame, we:
1. Create an interactive scatter plot of
bill_length_mm versus body_mass_g that uses
different colors and shapes for the points of each species.
2. Add the fitted lines to the plot using add_lines.
   - Pass the penguins data frame to
   add_lines (make sure to specify data = since
   that is not the first argument of the add_lines
   function).
   - Associate body_mass_g with the x
   attribute.
   - Associate fitted with the y attribute.
   - Associate species
   with the color attribute.
   - Specify inherit = FALSE, which means we are not
   inheriting any of the attribute specifications in the
   plot_ly function (which are otherwise passed by
   default).
The interactive scatter plot allows us to see the
species, bill_length_mm, and
body_mass_g for each point and the points on the smoother
line associated with each species.
plot_ly(penguins,
x = ~body_mass_g,
y = ~bill_length_mm,
mode = 'markers',
color = ~species,
symbol = ~species,
type = 'scatter') |>
add_lines(data = penguins,
x = ~body_mass_g,
y = ~fitted,
color = ~species,
inherit = FALSE)
Interactive maps can provide a lot of information. We will create an interactive map using ggplot2, plotly, and the sf package (Pebesma 2022).
First, we use the st_read function from the
sf package to read a shapefile related to North Carolina
counties that is installed by default with the sf package.
The imported shapefile is automatically converted to an sf
data frame. The imported object has many variables, but we point out
three:
- NAME: the name of each North Carolina county.
- BIR74: the number of recorded births in each county in
1974.
- geometry: the MULTIPOLYGON associated with
each North Carolina county.
# import sf object from shapefile in sf package
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"),
quiet = TRUE)
# display first 3 rows of nc for certain variables
head(nc[c("NAME", "BIR74", "geometry")], n = 3)
## Simple feature collection with 3 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -81.74107 ymin: 36.23388 xmax: -80.43531 ymax: 36.58965
## Geodetic CRS: NAD27
## NAME BIR74 geometry
## 1 Ashe 1091 MULTIPOLYGON (((-81.47276 3...
## 2 Alleghany 487 MULTIPOLYGON (((-81.23989 3...
## 3 Surry 3188 MULTIPOLYGON (((-80.45634 3...
In the code below, we:
1. Create a map of BIR74 for each county using geom_sf.
   - Specify fill = BIR74 so that the fill color of each
   county is based on the BIR74 variable.
   - Associate the NAME variable with the
   label aesthetic so that the name of each county is
   displayed when we hover over a county.
2. Use scale_fill_viridis_c to change the color palette
used for the fill color.
3. Assign the plot the name ggsf.
4. Use plotly::ggplotly to make the graphic
interactive.
The interactive graphic indicates the number of births in each county and the county name when we hover over a county.
# plot sf object using ggplot2
ggsf <-
ggplot(nc) +
geom_sf(aes(fill = BIR74, label = NAME)) +
scale_fill_viridis_c()
# make map interactive
ggplotly(ggsf)
Is there a way to provide information from multiple variables simultaneously when we hover over a county? Yes! But we have to be creative. We:
1. Use the paste0 function to create a new variable,
info, that combines multiple variables into a single
character string for each county. The \n indicates to start
a new line. We add a new line before each variable name.
2. Add the info variable as a variable to the
nc data frame.
# combine multiple variables into a character string
# (one per county)
info <- paste0(
"\nname: ", nc$NAME,
"\narea: ", nc$AREA,
"\nbirths in 1974: ", nc$BIR74,
"\nSIDS cases in 1974: ", nc$SID74)
# print first 2 values of info
info[1:2]
## [1] "\nname: Ashe\narea: 0.114\nbirths in 1974: 1091\nSIDS cases in 1974: 1"
## [2] "\nname: Alleghany\narea: 0.061\nbirths in 1974: 487\nSIDS cases in 1974: 0"
# add info to nc
nc$info <- info
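If more control over number formatting is wanted, sprintf offers an alternative to paste0. The sketch below builds a comparable string with the area fixed at three decimal places and a thousands separator in the birth counts. The name info_fmt is hypothetical, and these formatting choices are assumptions rather than requirements.

```r
# import the North Carolina counties shapefile shipped with sf
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"),
                  quiet = TRUE)

# one formatted character string per county
info_fmt <- sprintf(
  "\nname: %s\narea: %.3f\nbirths in 1974: %s\nSIDS cases in 1974: %d",
  nc$NAME,
  nc$AREA,
  format(nc$BIR74, big.mark = ",", trim = TRUE),
  as.integer(nc$SID74))

# print first 2 values of info_fmt
info_fmt[1:2]
```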
Now, we use info as the label aesthetic in
geom_sf and specify tooltip = "label" so that
only the label variable is displayed when we hover over a
county.
# create map that fills based on BIR74 but the tooltip
# based on info
ggsf <-
ggplot(nc) +
geom_sf(aes(fill = BIR74, label = info)) +
scale_fill_viridis_c()
# show only label tooltip
ggplotly(ggsf, tooltip = "label")
We can create a similar plot using plot_ly. We:
- Specify type = "scatter" and
mode = "lines".
- Associate the info variable in nc with the
split attribute to draw the separate traces for each
county. We could have used NAME, but then only the
NAME of each county would be displayed when we hover. This
way, we get additional information.
- Associate the BIR74 variable in nc with
the color attribute to fill each county with a color from a
gradient.
- Specify showlegend = FALSE so that only the color scale
is displayed and no legend related to info. This is
a critical step.
- Specify alpha = 1 so that the colors aren’t muted.
- Specify hoverinfo = "text" so that only the
split information is displayed.
- Pipe the result to the colorbar function and change
the title to “BIR74” (otherwise it gets displayed twice).
plot_ly(nc,
color = ~BIR74,
split = ~info,
showlegend = FALSE,
alpha = 1,
type = "scatter",
mode = "lines",
hoverinfo = "text") |>
colorbar(title = "BIR74")