---
title: "Drawing graphs"
editor:
  markdown:
    wrap: 72
---

## Our data

-   To illustrate making graphs, we need some data.
-   Data on 202 male and female athletes at the Australian Institute of
    Sport.
-   Variables:
    -   categorical: sex of athlete, sport they play
    -   quantitative: height (cm), weight (kg), lean body mass, red and
        white blood cell counts, haematocrit and haemoglobin (blood),
        ferritin concentration, body mass index, percent body fat.
-   Values are separated by tabs (which affects how we read the file
    in).

## Packages for this section

```{r graphs-R-1}
library(tidyverse)
```

## Reading data into R

-   Use `read_tsv` ("tab-separated values"), which works like
    `read_csv`.
-   Data in `ais.txt`:

```{r graphs-R-2}
my_url <- "http://ritsokiguess.site/datafiles/ais.txt"
athletes <- read_tsv(my_url)
```

## The data (some)

```{r graphs-R-3}
athletes
```

## Types of graph {.smaller}

Depends on number and type of variables:

| Categorical | Quantitative | Graph                                                        |
|------------:|-------------:|:-------------------------------------------------------------|
|           1 |            0 | bar chart                                                    |
|           0 |            1 | histogram                                                    |
|           2 |            0 | grouped bar charts                                           |
|           1 |            1 | side-by-side boxplots                                        |
|           0 |            2 | scatterplot                                                  |
|           2 |            1 | grouped boxplots                                             |
|           1 |            2 | scatterplot with points identified by group (e.g. by colour) |

With more (categorical) variables, might want *separate plots by
groups*. This is called *faceting* in R.

## `ggplot`

-   R has a standard graphing procedure, `ggplot`, that we use for all
    our graphs.
-   Use it in different ways to get precisely the graph we want.
-   Let's start with a bar chart of the sports played by the athletes.

## Bar chart

```{r graphs-R-4, fig.height=3.9}
ggplot(athletes, aes(x = Sport)) + geom_bar()
```

## Histogram of body mass index

```{r graphs-R-5, fig.height=3.9}
ggplot(athletes, aes(x = BMI)) + geom_histogram(bins = 16)
```

## Which sports are played by males and females?

Grouped bar chart:

```{r graphs-R-6, fig.height=3.15}
ggplot(athletes, aes(x = Sport, fill = Sex)) +
  geom_bar(position = "dodge")
```

## BMI by gender

```{r graphs-R-7, fig.height=4}
ggplot(athletes, aes(x = Sex, y = BMI)) + geom_boxplot()
```

## Height vs. weight

Scatterplot:

```{r graphs-R-8, fig.height=3.4}
ggplot(athletes, aes(x = Ht, y = Wt)) + geom_point()
```

## With regression line

```{r graphs-R-9, fig.height=3.6}
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + geom_smooth(method = "lm")
```

## BMI by sport and gender

```{r graphs-R-10, fig.height=3.6}
ggplot(athletes, aes(x = Sport, y = BMI, fill = Sex)) +
  geom_boxplot()
```

A variation that uses `colour` instead of `fill`:

```{r}
ggplot(athletes, aes(x = Sport, y = BMI, colour = Sex)) +
  geom_boxplot()
```

## Height and weight by gender

```{r}
#| fig-height: 9
#| fig-width: 9
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point()
```

## Height vs. weight by gender for each sport, with facets

```{r graphs-R-12, fig.height=3.6}
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport)
```

## Filling each facet

By default, every facet uses the same scales. To give each facet its
own scales, use `scales = "free"`:

```{r graphs-R-13, fig.height=5.2}
ggplot(athletes, aes(x = Ht, y = Wt, colour = Sex)) +
  geom_point() + facet_wrap(~Sport, scales = "free")
```

## Another view of height vs weight

```{r}
ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() + facet_wrap(~ Sex)
```
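
## Facets with a regression line in each

A possible extension, offered as a sketch rather than as one of the
original graphs: the regression line from earlier can be combined with
facets. `geom_smooth` is fitted separately within each facet, so each
sex gets its own line. The chunk re-reads the data only so that it runs
on its own, and the name `g` is just an illustrative choice.

```{r}
library(tidyverse)

# Same file as before; re-read here only so this chunk stands alone.
my_url <- "http://ritsokiguess.site/datafiles/ais.txt"
athletes <- read_tsv(my_url)

# One panel per sex; the least-squares line is fitted within each panel.
g <- ggplot(athletes, aes(x = Ht, y = Wt)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ Sex)
g
```

Saving the graph in `g` and then displaying it is optional; typing the
`ggplot` code directly, as on the other slides, draws the same graph.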