---
title: "Simple knitr Rmarkdown example -- automatic write-up of simulated data"
author: "Author name goes here"
output:
  pdf_document:
    keep_tex: true
    extra_dependencies:
      mathptmx: null
      xcolor: ["table"]
      booktabs: null
      wrapfig: null
      float: null
      colortbl: null
      pdflscape: null
      tabu: null
      makecell: null
fontsize: 12pt
date: "Last compiled on `r format(Sys.time(), '%d %B, %Y')`"
---

```{r setup, echo=FALSE, cache=FALSE, message=FALSE, results='hide', warning=FALSE}
rm(list = ls())
knitr::opts_chunk$set(collapse = TRUE,
                      fig.path = "knitr-figs/",
                      cache.path = "knitr-cache/",
                      fig.asp = 0.618,
                      fig.width = 9,
                      out.width = "6in",
                      echo = FALSE,
                      autodep = TRUE,
                      cache = FALSE,
                      fig.align = "center",
                      fig.pos = "tbp",
                      tab.pos = "tbp"
)

knitr::knit_hooks$set(document = function(x) {
  sub('\\usepackage[]{color}', '\\usepackage{xcolor}', x, fixed = TRUE)
})
# Last command from
# https://tex.stackexchange.com/questions/148188/knitr-xcolor-incompatible-color-definition

library(kableExtra)
library(tibble)
library(xtable)
```

# Introduction

The document produces output based on calculations similar to those done in
`rmark-exercise.Rmd`, but more in the style of a report, or a manuscript to be
submitted to a journal. As such, the output is .pdf and the options are set so
as to hide the underlying code. Again, read through the .Rmd carefully to see any
comments as well as to understand how the output is being produced.

Here we are producing a more polished document than in `rmark-exercise.Rmd`. The
rendered .pdf will not show the code, yet the code is still explicitly given in
the .Rmd file. So the R code can easily be changed to re-run the analyses and
update the document automatically, and all calculations, figures, and tables are
traceable back to the code.

# Generate data

```{r generate}
set.seed(42)
n = 50       # sample size
x = 1:n
sigma = 10   # standard deviation of the noise
y = 10*x + rnorm(n, 0, sigma)
```

We have generated an example data set with a sample size of $n=$ `r n`, with
values $x_i = 1, 2, 3, \ldots, n$ and corresponding $y_i$ values given by
\begin{equation}
y_i = 10 x_i + \epsilon_i,
\end{equation}
where
\begin{equation}
\epsilon_i \sim \mbox{Normal}(0, `r sigma`).
\label{epsilon}
\end{equation}
The maximum value of $y_i$ is `r max(y)`, or rounded to two decimal places it is
`r round(max(y),2)`. The distribution of $\epsilon_i$ is given in
(\ref{epsilon}), where the second parameter is the standard deviation.

## Show some data

```{r showdata}
data = tibble::as_tibble(cbind(x, y))
```

```{r tweak}
kable(data[1:10,],
      caption = "The first rows of the data, nicely formatted.",
      booktabs = TRUE,
      label = "thedata") %>%
  kable_styling(latex_options = c("striped"))
```

The first 10 rows of data are shown in Table \ref{tab:thedata}. This was done
using the `kable()` function, which is part of the knitr package. This basic
example shows how to easily add a label (that is then referred to in the first
sentence here), and add a bit of formatting (the `booktabs` option makes the
table look nicer, and the `striped` option adds the light-grey shading for
readability). See `?knitr::kable` for details. For more advanced options, the
`kableExtra` package is recommended. If you have a specific request you can
usually find helpful examples from an internet search.

## Plot the data

```{r plotthedata, fig.cap="A plot of the generated data."}
plot(x, y)
```

It's always good to plot our data, which we do in
Figure\ \ref{fig:plotthedata}. Note that the figure does not necessarily appear
in the .pdf exactly where its chunk sits in the .Rmd file -- knitr (via LaTeX) is
automatically choosing the best place, based on the `fig.pos = "tbp"` option that
is set in `knitr::opts_chunk$set` at the start of the .Rmd file.
The "tbp" tells LaTeX which placements are allowed: at the top of a page ("t"),
at the bottom of a page ("b"), or on a separate page containing only figures and
tables ("p"). So your document will look more professional than having a figure
breaking up the text -- this is standard publishing practice (look at a textbook
or journal article), and saves you having to worry about formatting. However, if
necessary you can override this automation by setting `fig.pos="h"`, for example,
in the header of the chunk, where "h" means to put the figure here, where the
code appears. If that doesn't work (because there isn't really space for it), you
can set `fig.pos="H"`, where the capital "H" forces the position. If you try it
here (add `, fig.pos="H"` after the `fig.cap="...."` in the chunk above) you will
see that forcing the figure onto a new page leaves a lot of ugly white space. The
point of this is that you don't need to worry about the formatting -- knitr and
LaTeX do it for you (but you can almost always override their choices).

## Now to fit a linear regression

Figure\ \ref{fig:plotthedata} suggests we could fit a simple linear regression to
the data.

```{r regression}
fit = lm(y~x)
```

We show the result in Figure\ \ref{fig:plotfit}, which shows a decent fit (as
expected, since we simulated the data from a linear relationship with a small
amount of noise).

```{r plotfit, fig.cap="Linear regression (red line) fit to the data."}
plot(x, y)
abline(fit, col="red")
```

And in Table\ \ref{tab:fit} we show the results of the regression fit.

```{r fit}
kable(coefficients(summary(fit)),
      caption = "Linear regression fit.",
      booktabs = TRUE,
      label = "fit",
      digits = 2,
      position = "b")
# This position argument forces the table to the bottom
# of the page
```

## Summarise the results

So the maximum value of $y_i$ is `r round(max(y),0)`, which is
`r ifelse(max(y)>400, "greater than", "less than")` the special value of 400.
So you can automate some of the text itself (just be careful and think about
other possibilities -- what if the maximum were 399.9 in the above example? It
would be rounded to 400 in the sentence, yet described as less than 400).

## Now, let's go back and change the data

The *big* advantage of dynamically generated reports shows when you go back and
change or update the input data. You could go back and change the definition of
$y$ with something like this line:

```{r changey, eval=FALSE, echo=TRUE}
y = 10*x^1.5 + rnorm(n, 0, 10)
```

And then re-run this document. It should build as before, but with your newly
generated data.
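
The rounding caveat raised in the results summary (a maximum of 399.9 being printed as 400 while described as less than 400) can be guarded against with a small helper. The following is only a sketch: `describe_vs_threshold()` and its arguments are hypothetical names introduced here for illustration, and are not part of this document's chunks or of knitr. It compares the unrounded value with the threshold, handles equality explicitly, and prints extra decimal places when rounding would otherwise make the sentence contradict itself.

```r
# Hypothetical helper (an illustration, not part of the original analysis):
# describe how `value` compares with `threshold` in a sentence fragment.
describe_vs_threshold <- function(value, threshold, digits = 0) {
  # Decide the wording from the unrounded value, including the equal case.
  relation <- if (value > threshold) {
    "greater than"
  } else if (value < threshold) {
    "less than"
  } else {
    "equal to"
  }
  # If the rounded value would print as the threshold even though the two
  # differ, add decimal places (up to 10) until the difference is visible.
  while (relation != "equal to" &&
         round(value, digits) == threshold &&
         digits < 10) {
    digits <- digits + 1
  }
  paste(format(round(value, digits), nsmall = digits), "is", relation, threshold)
}

describe_vs_threshold(399.9, 400)  # "399.9 is less than 400"
describe_vs_threshold(523.4, 400)  # "523 is greater than 400"
describe_vs_threshold(400, 400)    # "400 is equal to 400"
```

If the function were defined in an evaluated chunk, an inline R expression calling it with `max(y)` and 400 could replace the rounded value and the `ifelse()` wording in the summary sentence, so the number and the comparison are always produced together.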