This page was last updated on September 03, 2019.
Are your experiments conducted and documented in a way that other researchers could follow the steps and get similar results? If yes, then your experiments are considered reproducible. We can’t expect to get exactly the same results in independent experiments. Why? When experiments take place in the lab, and especially in the field, we can’t control all of the potential sources of variation. As long as the results are similar enough that they don’t change the conclusions, we’re OK.
Conducting reproducible research is extremely challenging, more so than most scientists appreciate!
For example, see here for an incredible recent case concerning ageing experiments with C. elegans.
When experiments or analyses are conducted in silico, i.e. on the computer, it should be possible to reproduce them exactly. This is known as computational reproducibility: you can pass all of your data analysis, data sets, and conclusions to someone else, and they get exactly the same results on their machine.
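Random numbers are one place where computational reproducibility can quietly break, because each run (and each knit) draws different values. As a minimal sketch (the seed 202 is an arbitrary choice, not a course requirement), fixing R's random number generator with `set.seed()` makes simulated values identical every time the code runs on the same version of R:

```r
# Fix the random number generator so the "random" draws are repeatable.
set.seed(202)
first_run <- rnorm(5, mean = 10, sd = 2)

# Resetting the same seed reproduces exactly the same draws.
set.seed(202)
second_run <- rnorm(5, mean = 10, sd = 2)

identical(first_run, second_run)  # TRUE
```

Without the two `set.seed()` calls, the two vectors would almost certainly differ.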
A key goal of our Biology program at UBC Okanagan is to help students understand the importance of conducting reproducible research, regardless of where your research takes place (field, lab, computer). We want students (and professors) to get into the habit of making their entire research workflow reproducible from the very beginning. This means we’ll be trying to help you build important habits. This will take practice and will be difficult at times. You’ll see why it is so important for you to keep track of your code and to document it diligently to help yourself later (and any potential collaborators as well).
We will be using R Markdown to create reproducible lab reports.
Why use R Markdown for Lab Reports?
R Markdown is a markup language that provides an easy way to produce a rich, fully-documented reproducible analysis. It lets you share a single file that contains all of the commentary, R code, and metadata needed to reproduce the analysis from beginning to end. R Markdown allows “chunks” of R code to be included along with Markdown text to produce a nicely formatted HTML, PDF, or Word file, without having to know any complicated programming languages or fuss with getting the formatting just right in a Microsoft Word DOCX file.
One R Markdown file can generate a variety of different formats and all of this is done in a single text file with a few bits of formatting. You’ll be pleasantly surprised at how easy it is to write an R Markdown document after your first few encounters.
A good habit to get into whenever you start a new project with R code is to create a new RStudio project to go along with it. RStudio project files have the extension `.Rproj` and store metadata about the documents you’ve saved and about the R environment you are working in. More information about RStudio projects is available from RStudio, Inc. here.
The GIF below shows you how to create a new RStudio project called `initial` and also your first R Markdown file. Note that you may also see, in the Console pane, a description of which version of R is running in your `initial` session, as shown in the GIF below.
***
We have our first_rmarkdown.Rmd file set up.
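The Console in the GIF reports which version of R the session is running. If you want that information recorded in a report itself, base R can provide it directly. A small sketch using `R.version.string` and `sessionInfo()`; putting this in a final chunk is a common habit, not a course requirement:

```r
# The version of R running this session, as a single string.
version_line <- R.version.string
print(version_line)

# sessionInfo() also reports the operating system and any attached
# packages, which helps others recreate your computing environment.
session <- sessionInfo()
print(session)
```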
Whether you’re working on your own laptop or on the lab desktop computers, you should think carefully about how to organize your BIOL202 lab materials: well-organized material will make studying for the final lab test much easier! A good strategy is to create a new directory on your UBC network drive called “BIOL202_labs”. Then create a project for your current lab inside it: in RStudio, choose File > New Project..., select New Directory and then New Project, enter the project name, use Browse to select your BIOL202_labs directory, and click Create Project.
For the first week of labs, you would use the project name “Lab_00”.
NOTE: Recall that for each assignment, you’ll be provided with a new R Markdown document to start with.
IMPORTANT: If you are working on your own laptop, be sure to back up your BIOL202 lab work on your network drive frequently.
ADVICE: Avoid typing your code directly in the Console. Instead, type all of your code (final answers as well as anything you’re just trying out) in the R Markdown file, and run the chunk either with the Run button on the chunk (the green sideways triangle) or by highlighting the code and clicking Run in the top right corner of the R Markdown editor. If at any point you need to start over, you can run all of the chunks above the one you’re working in by clicking the downward-pointing arrow on that chunk (Run All Chunks Above).
Below, you’ll find more information about what R Markdown can do, and some of the options you can configure. Some of the information is more detailed than you need for completing standard assignments.
We now return to the R Markdown file first_rmarkdown.Rmd that we created earlier. We deliberately left some errors in the creation of variables there. While it might seem strange to show you errors, it is good for someone new to R to see the kinds of errors they are likely to run into. We are going to see what happens when we click the Knit HTML button with these errors in place. Then we will clean up the code and see what the knitted file looks like.
***
When you created the R Markdown file, a basic template with some code and text was inserted for you. This gives you a sense of how to create your own R Markdown file with your own R code and commentary. We modified some of that code here. I decided to remove all of the lines in the chunk named `cars`, even though the errors did not occur in the lines that declared the named objects. An HTML file is produced in the Viewer pane because View in Pane was selected.
As you look over the Including Plots text, you may be surprised to see that there is no plot in the R Markdown file itself, yet the HTML file contains a scatter-plot showing how pressure varies with temperature. This is what we alluded to earlier: R Markdown runs the code stored in R chunks and then places the output into the HTML (or PDF, DOCX, etc.) file.
You can also see that the text appears as commentary before and after the R code. You’ll understand in a bit why the text “Including Plots” is so much larger than the other text.
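The scatter-plot comes from a chunk in RStudio’s default R Markdown template that plots R’s built-in `pressure` data set. The template hides that chunk’s code with `echo=FALSE`, which is why the code does not appear in the knitted file. The R inside such a chunk can be as short as `plot(pressure)`. This sketch adds axis labels, which are our addition for illustration:

```r
# `pressure` ships with base R: vapor pressure of mercury (mm Hg)
# at temperatures from 0 to 360 degrees C.
str(pressure)

# A data frame with two numeric columns is drawn as a scatter-plot:
# first column on the x-axis, second column on the y-axis.
plot(pressure,
     xlab = "Temperature (degrees C)",
     ylab = "Pressure (mm Hg)")
```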
Important note: Remember that all of the R code you want to run needs to be stored in a chunk (in the right order) for your analysis to be reproducible AND for you not to receive errors when you Knit. It is easy to do a lot of work in the R Console and then forget to add that work to a chunk in your Rmd file. This is probably the most common error you will see when you first begin working in RStudio. An example of this error is shown in the GIF below.
***
“Object not found” errors are the most frequently encountered errors. Along with misspellings and incomplete R code, they account for the vast majority of issues with R.
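This error is easy to reproduce on purpose. In this minimal sketch, `lab_mean` is a hypothetical object that was never created, which is what happens when code was run only in the Console. `tryCatch()` captures the error message instead of stopping:

```r
# `lab_mean` has never been assigned, so evaluating it raises an error.
msg <- tryCatch(
  print(lab_mean),
  error = function(e) conditionMessage(e)
)
print(msg)  # "object 'lab_mean' not found"

# Creating the object first (for example, in an earlier chunk) fixes it.
lab_mean <- mean(c(2, 4, 6))
print(lab_mean)  # 4
```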
Line breaks / white spacing
Line breaks in combination with white space (an empty line) are incredibly important in Markdown: they denote the start of a new paragraph.
Here is an example of text with only a line break.
You may expect this line to appear in a new paragraph but it doesn't.
Here is an example of text with only a line break. You may expect this line to appear in a new paragraph but it doesn’t.
To start a new paragraph, make sure that an empty line separates the two paragraphs:
Here is an example of text with a line break and white space.

You may expect this line to appear in a new paragraph and it does.
Here is an example of text with a line break and white space.
You may expect this line to appear in a new paragraph and it does.
Horizontal rules
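Another useful way to divide up different parts of your analysis is to include horizontal lines. These are easily added by placing three asterisks (`***`) or three hyphens (`---`) next to each other on a line of their own, with an empty line above it. Without that empty line, three hyphens directly under a line of text turn that text into a header instead.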
Blockquotes
If you’d like to quote someone or produce an indented text block, you can easily do so by adding a `>` before the passage:
> Reproducible research is the idea that data analyses, and more generally,
scientific claims, are published with their data and software code so that
others may verify the findings and build upon them. - Roger Peng
Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. - Roger Peng
It took us a while to get to what I believe is the best part of R Markdown: you can include R code in the document, and that code is run with its output placed in the resulting knitted document. You’ve seen some R chunks in the R Markdown file already. There are a few properties of chunks to get used to:
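- R chunks begin with three backticks followed by `{r`, potentially some names and other chunk options, and then close that first line with a `}`.
- R chunks end with three backticks on a line of their own. Note that spaces in front of these backticks will produce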
In our first_rmarkdown.Rmd file, let’s explore an example of recognizing and creating our own R chunks:
***
This example introduces you to two different ways to create a vector of values. You can see that the code was run automatically when we pressed the Knit HTML button, and its output appears in the knitted file.
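The code in the `mult_vectors` chunk appears only in the GIF, so the following is a hypothetical reconstruction. The object names come from the text, but their contents are assumptions. It shows two ways to create a vector (the `:` operator and `seq()`), and how `*` multiplies two vectors of the same length element by element:

```r
# Two ways to create a vector of values (contents are assumed):
count20 <- 1:20                                   # the integers 1 through 20
count100_by_5 <- seq(from = 5, to = 100, by = 5)  # 5, 10, ..., 100 (20 values)

# Vectors of the same length are multiplied element by element.
prod <- count20 * count100_by_5
head(prod)  # 5 20 45 80 125 180
```

The name `prod` matches base R’s `prod()` function. R still finds the function when you call `prod(...)`, but a more distinctive name avoids confusing readers.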
Important note: Any R chunks after this chunk will have access to the three variables created here: `count20`, `count100_by_5`, and `prod`. Any chunks before the `mult_vectors` chunk WILL NOT have access to these variables yet. The document is read like a book, from top to bottom, so it is important to create objects before you work with them. You’ll receive errors from R if you don’t.
As you saw in the previous example, it is a good habit to highlight the names of your R objects to distinguish them from the usual text. You can do this by enclosing the name in single backticks, as we did with `one_value`.
The most common R chunk options you’ll likely work with are `echo`, `eval`, and `include`. By default, all three of these options are set to `TRUE`.

- `echo` dictates whether the code that produces the result should be printed before the corresponding R output.
- `eval` specifies whether the code should be evaluated, or just displayed without its output.
- `include` specifies whether the code AND its output should be included in the resulting knitted document. If it is set to `FALSE`, the code is run but no remnant of the code or its output will appear in the resulting document.
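Chunk options can also be changed for the whole document at once. RStudio’s default template does this in its `setup` chunk with `knitr::opts_chunk$set()`. Here is a minimal sketch, assuming the knitr package (which R Markdown uses to knit) is installed. Options written in an individual chunk header still override these defaults:

```r
# Set new defaults for every chunk that follows in the document:
# hide the code (echo = FALSE), but still run it (eval = TRUE)
# and keep its output in the knitted file (include = TRUE).
knitr::opts_chunk$set(echo = FALSE, eval = TRUE, include = TRUE)

# Inspect the current defaults.
knitr::opts_chunk$get(c("echo", "eval", "include"))
```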
***
Since we specified `eval=FALSE` in the chunk where we declared the `one_value` variable, we now obtain an error. You can include multiple chunk options by separating them with commas (for example, `echo=FALSE, eval=TRUE`).
***
White space is your friend. Always leave a blank line between R chunks and your Markdown text. It makes your document much more readable and can prevent some errors. Also, leave a blank line between header text and your paragraphs.
Commentary is always good. Explain yourself and your ideas whenever you can. Remember that your greatest collaborator is likely yourself a few months down the road. Be nice to yourself and explain what you are doing so that you can remember!
Remember that the Console and the R Markdown environment used when Knitting don’t interact with each other. This forces you to put in your R chunks exactly the code that produces the results you want to share with others. Don’t inflate your document with extra output. Be concise and clear about exactly what you are doing.
The chunk options can really beautify your documents and customize them exactly to what you’d like the reader of your documents to see. You can find more information on all of the available R chunk options here.
Near the top of your R Markdown editor window sits one of the more useful tools for writing documents: the spell-check button. It is the green check mark with “ABC” above it.
Before you submit a document or share it with someone else, please run a spell-check. You’ll need to add some R commands to the dictionary or ignore them, since they are not words. Still, it is easy to misspell words as we type, and this feature can really help.
RStudio includes really nice cheatsheets that can act as great references to many of the common tasks you will do inside of RStudio. You can get nice PDF versions of the files by going to Help -> Cheatsheets inside RStudio.
***