---
title: "Simple knitr Rmarkdown example -- automatic write-up of simulated data"
author: "Author name goes here"
output:
  pdf_document:
    keep_tex: true
    extra_dependencies:
      mathptmx: null
      xcolor: ["table"]
      booktabs: null
      wrapfig: null
      float: null
      colortbl: null
      pdflscape: null
      tabu: null
      makecell: null
fontsize: 12pt
date: "Last compiled on `r format(Sys.time(), '%d %B, %Y')`"
---
```{r setup, echo=FALSE, cache=FALSE, message=FALSE, results='hide', warning=FALSE}
rm(list = ls())
knitr::opts_chunk$set(collapse = TRUE,
                      fig.path = "knitr-figs/",
                      cache.path = "knitr-cache/",
                      fig.asp = 0.618,
                      fig.width = 9,
                      out.width = "6in",
                      echo = FALSE,
                      autodep = TRUE,
                      cache = FALSE,
                      fig.align = "center",
                      fig.pos = "tbp",
                      tab.pos = "tbp"
                      )
knitr::knit_hooks$set(document = function(x) {
  sub('\\usepackage[]{color}',
      '\\usepackage{xcolor}', x, fixed = TRUE)
}) # Last command from
# https://tex.stackexchange.com/questions/148188/knitr-xcolor-incompatible-color-definition
library(kableExtra)
library(tibble)
library(xtable)
```
# Introduction
This document produces output based on calculations similar to those
done in `rmark-exercise.Rmd`, but in the style of a report or a manuscript to be
submitted to a journal. The output is therefore a .pdf, and the options are set
to hide the underlying code.
As before, read through the .Rmd carefully to
see the comments and to understand how the output is produced. The result is
a more polished document than `rmark-exercise.Rmd`. The rendered .pdf
does not show the code, yet the code is still given explicitly
in the .Rmd file. The R code can therefore easily be changed to re-run the analyses and
update the document automatically, and all calculations, figures, and tables remain
traceable back to the code.
# Generate data
```{r generate}
set.seed(42)
n = 50 # sample size
x = 1:n
sigma = 10
y = 10*x + rnorm(n, 0, sigma)
```
We have generated an example data set with a sample size of $n=$ `r n`, with values
$x_i = 1, 2, 3, \ldots, n$ and corresponding $y_i$ values given by
\begin{equation}
y_i = 10 x_i + \epsilon_i,
\end{equation}
where
\begin{equation}
\epsilon_i \sim \mbox{Normal}(0, `r sigma`^2).
\label{epsilon}
\end{equation}
The maximum value of $y_i$ is `r max(y)`, or `r round(max(y), 2)` when rounded
to two decimal places. The distribution of $\epsilon_i$ is given in equation (\ref{epsilon}).
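Rounding with `round()` returns a number, so a trailing zero is dropped when the value is printed, and a result reported to two decimal places can appear with only one. A minimal sketch of one way around this uses base R's `sprintf()` to fix the number of decimals. The value `490.10` and the name `max_y` are placeholders chosen for illustration, not values taken from the simulated data:

```r
# Placeholder value for illustration only (not the simulated maximum)
max_y <- 490.10

# round() returns a number, so the trailing zero is lost when it is printed
rounded <- round(max_y, 2)
as.character(rounded)            # "490.1"

# sprintf() returns a string with exactly two decimal places
fixed <- sprintf("%.2f", max_y)
fixed                            # "490.10"
```

In the text, the inline expression would then call `sprintf()` in place of `round()`.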
## Show some data
```{r showdata}
data = tibble::as_tibble(cbind(x, y))
```
```{r tweak}
kable(data[1:10, ],
      caption = "The first rows of the data, nicely formatted.",
      booktabs = TRUE,
      label = "thedata") %>%
  kable_styling(latex_options = c("striped"))
```
The first 10 rows of data are shown in Table \ref{tab:thedata}. The table was made
using the `kable()` function, which is part of the knitr package.
This basic example shows how to add a label (which is then
referred to in the first sentence here) and a bit of formatting: the
`booktabs` option makes the table look nicer, and the `striped` option, applied
through `kable_styling()` from the `kableExtra` package, adds
light-grey shading for readability. See `?knitr::kable`
for details. For more advanced options, see the `kableExtra` documentation.
If you have a specific request you can usually find helpful
examples from an internet search.
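As a hedged illustration of the kind of extra formatting `kableExtra` offers, the sketch below sets column names, the number of digits, a smaller font, and a request to hold the table near its chunk. The data frame `example_data` and its values are made up for this example, and `format = "latex"` is set explicitly only so that the code also runs outside of knitting:

```r
library(knitr)
library(kableExtra)

# Small stand-in data set (made-up values, not the simulated data)
example_data <- data.frame(x = 1:5, y = c(12.3, 18.9, 31.2, 40.5, 49.8))

tab <- kable(example_data,
             format = "latex",     # explicit, so this also runs outside knitting
             booktabs = TRUE,
             digits = 1,
             col.names = c("Predictor", "Response"),
             caption = "An example table.",
             label = "example") %>%
  kable_styling(latex_options = c("striped", "hold_position"),
                font_size = 10)

# The result is LaTeX source; printing it shows the generated table code
tab
```

The `hold_position` option asks LaTeX to keep the table close to where its chunk appears, which can be useful when a table must follow a particular paragraph.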
## Plot the data
```{r plotthedata, fig.cap="A plot of the generated data."}
plot(x, y)
```
It's always good to plot our data, which we do in Figure\ \ref{fig:plotthedata}.
Note that the figure does not appear exactly where its chunk
sits in the .Rmd file: knitr (via LaTeX) automatically chooses the
placement, based on the `fig.pos = "tbp"` option that is set in
`knitr::opts_chunk$set` at the start of the .Rmd file. The "tbp" allows LaTeX to
place the figure at the top of a page ("t"), at the bottom of
a page ("b"), or on a separate page of floats ("p"). Your document will then look
more professional than one with figures breaking up the text. This is standard
publishing practice (look at a textbook or journal article), and it saves you
having to worry about formatting. However, if necessary you can override this
automation by setting `fig.pos="h"`, for example, in the header of the chunk,
where "h" means to put the figure here, where the code appears. If that doesn't
work (because there isn't enough space for it), you can set `fig.pos="H"`, where
the capital "H" (provided by the LaTeX `float` package listed in the header of
this file) forces the position. If you try it here (add `, fig.pos="H"`
after the `fig.cap="...."` in the chunk above) you will see that forcing the
figure onto a new page leaves a lot of ugly white space. The point is
that you don't need to worry about the formatting, because knitr and LaTeX do it for
you, but you can override almost everything when needed.
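The chunk-header override described above changes a single figure. To change the default for every later chunk, the same option can be set through `knitr::opts_chunk`, as is done in the setup chunk. A minimal sketch, assuming you want to force all subsequent figures into place and later restore the earlier default; the names `old_pos` and `new_pos` are introduced here for illustration:

```r
library(knitr)

# Remember the current default placement so it can be restored later
old_pos <- opts_chunk$get("fig.pos")

# Force every subsequent figure to stay where its chunk appears
# ("H" relies on the LaTeX float package)
opts_chunk$set(fig.pos = "H")
new_pos <- opts_chunk$get("fig.pos")

# Restore the earlier default once forced placement is no longer needed
opts_chunk$set(fig.pos = old_pos)
```

Setting the option per chunk is usually the gentler choice, since a document-wide "H" removes LaTeX's freedom to avoid large gaps.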
## Now to fit a linear regression
Figure\ \ref{fig:plotthedata} suggests we could fit a simple linear regression to
the data.
```{r regression}
fit = lm(y~x)
```
We do this in Figure\ \ref{fig:plotfit}, which shows a decent fit (as expected,
since we simulated the data from a linear relationship with a small amount of noise).
```{r plotfit, fig.cap="Linear regression (red line) fit to the data."}
plot(x, y)
abline(fit, col="red")
```
Table\ \ref{tab:fit} shows the results of the regression fit.
```{r fit}
kable(coefficients(summary(fit)),
      caption = "Linear regression fit.",
      booktabs = TRUE,
      label = "fit",
      digits = 2,
      position = "b") # This position argument asks LaTeX to place the table
                      # at the bottom of the page
```
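The fitted values can also be reported in the text, in the same way as the maximum of $y_i$ earlier. A minimal sketch, repeating the simulation so that it runs on its own; the names `slope` and `slope_ci` are introduced here for illustration, while `coef()` and `confint()` are base R functions:

```r
# Recreate the simulated data and the fit so this example is self-contained
set.seed(42)
n <- 50
x <- 1:n
sigma <- 10
y <- 10 * x + rnorm(n, 0, sigma)
fit <- lm(y ~ x)

# Estimated slope and its 95% confidence interval (lower, upper)
slope <- coef(fit)[["x"]]
slope_ci <- confint(fit, "x", level = 0.95)

# A sentence fragment that could be dropped into the text
sprintf("slope %.2f (95%% CI %.2f to %.2f)",
        slope, slope_ci[1], slope_ci[2])
```

Because the data were simulated with a true slope of 10, the estimate should be close to 10, which gives a quick check that the reported numbers are sensible.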
## Summarise the results
The maximum value of $y_i$ is `r round(max(y), 0)`, which is
`r ifelse(max(y) > 400, "greater than", "less than")`
the special value of 400.
You can therefore automate some of the text, but be careful and think about
other possibilities: what if the maximum of $y_i$ were 399.9 in the above example?
It would be printed as 400 after rounding, yet described as less than 400.
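One way to guard against that kind of mismatch is to compare the value as it will be printed, and to allow for equality. The helper below is a hypothetical sketch written for this example (`describe_vs_threshold` is not part of knitr or base R):

```r
# Hypothetical helper: compare the value as it will be printed (after rounding)
# with the threshold, and allow for the two being equal
describe_vs_threshold <- function(value, threshold, digits = 0) {
  shown <- round(value, digits)
  if (shown > threshold) {
    "greater than"
  } else if (shown < threshold) {
    "less than"
  } else {
    "equal to"
  }
}

describe_vs_threshold(412.3, 400)  # "greater than"
describe_vs_threshold(399.9, 400)  # "equal to", since 399.9 is printed as 400
describe_vs_threshold(399.4, 400)  # "less than"
```

Whether the comparison should use the rounded or the unrounded value is a judgement call that depends on the report; the point is that the printed number and the wording should agree.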
## Now, let's go back and change the data
The *big* advantage of dynamically generated reports appears when you go back and
change or update the input data.
You could go back and change the definition of $y$ with something like this line:
```{r changey, eval=FALSE, echo=TRUE}
y = 10*x^1.5 + rnorm(n, 0, 10)
```
Then re-run this document. It should build as before, but with your newly generated data.