---
title: "Simple knitr Rmarkdown example -- automatic write-up of simulated data"
author: "Author name goes here"
output:
  pdf_document:
    keep_tex: true
    extra_dependencies:
      mathptmx: null
      xcolor: ["table"]
      booktabs: null
      wrapfig: null
      float: null
      colortbl: null
      pdflscape: null
      tabu: null
      makecell: null
fontsize: 12pt
date: "Last compiled on `r format(Sys.time(), '%d %B, %Y')`"
---

<!-- This is a comment.
     The above is called the YAML. It includes settings such as the title,
     author, fontsize and automated date (see how it calls on R to do that). It
     also set to output to a pdf, and then keep_tex means to save the .tex file
     (more on that later), and extra_dependencies specifies latex packages to
     include. Not all are needed for this document, but these are ones that we
     have found useful.

	 Important: if you edit the YAML stuff then you must uses spaces, not TABs
     (the spaces are important). If you ever get an error straight away like
     this:
	 "Error in yaml::yaml.load(..., eval.expr = TRUE) :
        Scanner error: while scanning a plain scalar at line 11, column 14 found
		a tab character that violate indentation at line 12, column 1"
	 then double check that you do not have tabs (your editor may automatically
     add them in).

	 To render this document, in R run
	   rmarkdown::render("rmark-pdf-example.Rmd")
	 or click the knit button in RStudio. It is helpful for colleagues (and your
	   future self) to include such a command in a document, especially if you
	   start using packages like bookdown to produce more complex documents,
	   requiring a different R command.
-->

<!-- In the next chunk we set knitr options for the output for all future
    chunks. These options are adapted from those used in our groundfish report
    mentioned earlier in the module, at:
    https://github.com/pbs-assess/gfsynopsis/blob/master/report/report-rmd/00-load.Rmd
    Some options are commented out but might be useful for you.
	We will explain the options - for more explanation and further options see
    https://yihui.org/knitr/options/  . The chunk also loads the necessary
    libraries, which we assume you have installed from doing rmark-exercise.Rmd -->

```{r setup, echo=FALSE, cache=FALSE, message=FALSE, results='hide', warning=FALSE}
rm(list = ls())
knitr::opts_chunk$set(collapse = TRUE,
                      fig.path = "knitr-figs/",
                      cache.path = "knitr-cache/",
                      fig.asp = 0.618,
                      fig.width = 9,
                      out.width = "6in",
                      echo = FALSE,
                      autodep = TRUE,
                      cache = FALSE,
                      fig.align = "center",
                      fig.pos = "tbp",
                      tab.pos = "tbp"
                      )
knitr::knit_hooks$set(document = function(x) {
                                   sub('\\usepackage[]{color}',
                                       '\\usepackage{xcolor}', x, fixed = TRUE)
})  # Last command from
    # https://tex.stackexchange.com/questions/148188/knitr-xcolor-incompatible-color-definition
library(kableExtra)
library(tibble)
library(xtable)
```

<!-- So those commands do not show up in the rendered .pdf because of the
     echo=FALSE option in the first line. For future chunks, the R code and
     results again do not show up, because echo = FALSE is set for all chunks in
     the knitr::opts_chunk$set call above. -->

# Introduction

The document produces output based on calculations similar to those
done in `rmark-exercise.Rmd`, but more in the style of a report, or a manuscript to be
submitted to a journal. As such, the output is .pdf and the options are set so
as to hide the underlying code.

Again, read through the .Rmd carefully to
see any comments as well as to understand how the output is being produced. Here
we are
producing a more polished document than in `rmark-exercise.Rmd`. The rendered .pdf
will not show the code, yet the code is still explicitly given
in the .Rmd file. So the R code can easily be changed to re-run the analyses and
update the document automatically, and all calculations, figures, and tables are
traceable back to the code.

# Generate data

<!-- So all chunks, such as this next one, get evaluated but the output is not
     printed in the .pdf -->

```{r generate}
set.seed(42)
n = 50                      # sample size
x = 1:n
sigma = 10
y = 10*x + rnorm(n, 0, sigma)
```

We have generated an example data set with a sample size of $n=$ `r n`, with values
$x_i = 1, 2, 3, ..., n$ and corresponding $y_i$ values given by
\begin{equation}
y_i = 10 x_i + \epsilon_i,
\end{equation}
where
\begin{equation}
\epsilon_i \sim \mbox{Normal}(0, `r sigma`).
\label{epsilon}
\end{equation}


<!-- In the first sentence we used $..$ to have the variables as mathematical
     symbols (x_i is latex for 'x with a subscript i') -- see the .pdf.
     Then for the first equation we used \begin{equation} and \end{equation} to
     display it 'display style', which just means not within a sentence and with
     an automatic equation number. The 'label{epsilon}' gives the equation the
     label 'epsilon', such that we can refer to it in the text. -->

<!-- Also note that the R code and the equations are not 100% in sync. If you change
     the 10 in rnorm(0, 10) to 5, you would have to change both the R code to
     rnorm(0, 5) and the equation write-up to \mbox{Normal}(0, 5). To avoid
     errors, you could automate this as follows: -->

The maximum value of $y_i$ is `r max(y)`, or rounded to two decimal places it is
`r round(max(y),2)`. The equation for $\epsilon$ is shown in (\ref{epsilon}).
<!-- That last sentence has an example of referencing an equation number. The
     numbers are automated - if a new equation was added earier then this would
     be updated to (3) in the .pdf. -->

## Show some data


```{r showdata}
data = tibble::as_tibble(cbind(x, y))
```

```{r tweak}
kable(data[1:10,],
      caption = "The first rows of the data, nicely formatted.",
      booktabs = TRUE,
      label = "thedata") %>%
    kable_styling(latex_options = c("striped"))
```
The first 10 rows of data are shown in Table \ref{tab:thedata}. This was done
using the `kable()` function, which is part of the knitr package.
This basic example shows how to easily add a label (that is then
referred to in the first sentence here), and add a bit of formatting (the
`booktabs` option makes the table look nicer, and the `striped` option is adding
the light-grey shading for readability). See `?knitr::kable()`
for details. For more advanced options, the `kableExtra` package is
recommended. If you have a specific request you can usually find helpful
examples from an internet search.

## Plot the data

```{r plotthedata, fig.cap="A plot of the generated data."}
plot(x, y)
```

It's always good to plot our data, which we do in Figure\ \ref{fig:plotthedata}.
<!-- We use `Figure\` there because that makes the space a non-breaking space,
     which will force `Figure` and the resulting 1 to not split over a
     line. Which is good practice. So you will not get: "which we do in Figure
	 1. Rather this will always force the 1 to be on the same line. Remember
     Latex is a typesetting language designed to make output look
     professional. -->
Note that the figure does not appear exactly where it
does in the .Rmd file -- knitr (via Latex) is automatically choosing the best
place, based on the `fig.pos = "tbp"` option that is set in
`knitr::opts_chunk$set` at the start of the .Rmd file. The "tbp" means first try
and place the figure at the top of the current page, else then try the bottom of
the current page, else then put it on the next page. So your document will look
more professional than having a figure breaking up the text -- this is standard
publishing practice (look at a textbook or journal article), and saves you
having to worry about formatting. However, if necessary you can override this
automation by setting `fig.pos="h"`, for example, in the header of the chunk,
where "h" means to put the figure here, where the code appears. If that doesn't
work (because there isn't really space for it), you can set `fig.pos="H"`, where
the capital "H" forces the position. If you try it here (add `, fig.pos="H"`
after the `fig.cap="...."` in the chunk above) you will see that forcing the
figure onto a new page leaves a lot of ugly white space. The point of this is
that you don't need to worry about the formatting -- knitr and Latex do it for
you (but you can usually always override everything).

## Now to fit a linear regression

Figure\ \ref{fig:plotthedata} suggest we could fit a simple linear regression to
the data.
```{r regression}
fit = lm(y~x)
```
We do this in Figure\ \ref{fig:plotfit}, which shows a decent fit (as expected,
since we simulated the data from a linear relationship with a small amount of noise).
```{r plotfit, fig.cap="Linear regression (red line) fit to the data."}
plot(x, y)
abline(fit, col="red")
```
<!-- So you want to put your code that creates the plot close to where you first
     reference it. The figure should always be on the same page (or as soon as
     possible afterwards) as the first text reference to it. Journals may still
	 insist on all figures being in one place at the end of a manuscript, but if
     they don't insist then it can help your reviewers if the figures are in a
     more sensible position. If you have lots of figures relative to little text
     (as we often have in our fisheries stock assessments), then it may be best
     to stick with keeping all figures at the end.   -->

And in Table\ \ref{tab:fit} we show the results of the regression fit.
```{r fit}
kable(coefficients(summary(fit)),
      caption = "Linear regression fit.",
      booktabs = TRUE,
      label = "fit",
      digits = 2,
      position = "b")  # This position argument forces the table to the bottom
                       #  of the page
```

## Summarise the results

So the maximum value of $y_i$ is `r round(max(y),0)`, which is
`r ifelse(max(y)>400, paste("greater than"), paste("less than"))`
the special value of 400.

So you can actually somewhat automate the text (just be careful and think about
other possibilities -- what if $y=399.9$ in the above example?).

## Now, let's go back and change the data

The *big* feature of dynamically generating reports is when you go back and
change or update the input data.

You could go back and change definition of $y$ with something like this line:
```{r changey, eval=FALSE, echo=TRUE}
y = 10*x^1.5 + rnorm(n, 0, 10)
```
<!-- Note the use of eval=FALSE and echo=TRUE here, we do not want to run this
     chunk of code, but want to show it in the .pdf -->
And then re-run this document. It should build but with your newly generated data.