---
title: "Reproducible Research"
author: "Dr. Hua Zhou @ UCLA"
date: "Jan 11, 2022"
subtitle: Biostat 203B
output:
  # ioslides_presentation: default
  html_document:
    toc: true
    toc_depth: 4
bibliography: ../bib-HZ.bib
---
## Reproducible research in statistics/data science
> An article about computational science in a scientific publication is **not** the scholarship itself, it is merely **advertising** of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
>
> @BuckheitDonoho95ReproRes
## Non-reproducible research
### [Duke Potti scandal](https://en.wikipedia.org/wiki/Anil_Potti)
----
- @Potti06GenomeSignature
- @BaggerlyCoombes09
- [Simply Statistics Blog: The Duke Saga Starter Set](https://simplystatistics.org/posts/2012-02-27-the-duke-saga-starter-set/)
### Microarray studies
Nature Genetics (2015 Impact Factor: 31.616). 20 articles about microarray profiling published in Nature Genetics between Jan 2005 and Dec 2006.
### Bible code
- @WitztumRipsRosenberg94BibleCode
- @McKayBarNatanBarHillelKalai99BibleCode
## Why reproducible research
- Reproducibility has been the foundation of science. It helps accumulate scientific knowledge.
- Greater research impact.
- Better work habits boost the quality of research.
- Better teamwork. For **you** as graduate students, it means better communication with your advisor.
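```{r, eval=FALSE}
# An endless loop: without a reproducible setup, the advisor can only guess.
while (TRUE) {
  message("Student: \"that idea you told me to try - it doesn't work!\"")
  message("Professor: \"ok. how about trying this instead.\"")
}
```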
Unless you reproduce the computing environment (algorithms, dataset, tuning parameters), others cannot help you.
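
One way to make the computing environment concrete is to record it alongside the results. The chunk below is a minimal sketch, not a prescribed workflow: the seed value, the `params` list, and the toy `soft_threshold` "algorithm" are hypothetical placeholders for your own algorithm, dataset, and tuning parameters.

```{r}
# Hypothetical tuning parameters; replace with those of your own analysis.
params <- list(seed = 203, n = 100, lambda = 0.5)

# Fix the random number generator so the simulated data can be regenerated exactly.
set.seed(params$seed)
x <- rnorm(params$n)

# Toy "algorithm": soft-threshold the sample mean at lambda.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
estimate <- soft_threshold(mean(x), params$lambda)

# Re-running with the same seed and parameters reproduces the same estimate.
set.seed(params$seed)
estimate_again <- soft_threshold(mean(rnorm(params$n)), params$lambda)
identical(estimate, estimate_again)

# Record the R version, platform, and attached packages with the results.
sessionInfo()
```

Sharing the seed, the parameter list, and the `sessionInfo()` output together with the code gives a collaborator what they need to rerun the same computation.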
## How to be reproducible in data science?
> When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures.
>
> @BuckheitDonoho95ReproRes
- A good example:
- I **highly** recommend the book _Reproducible Research with R and RStudio_ by Christopher Gandrud.
    - [Amazon](https://www.amazon.com/Reproducible-Research-Studio-Chapman-Hall/dp/1466572841)
    - [GitHub repo](https://github.com/christophergandrud/Rep-Res-Book)
## Tools for reproducible research
- Version control: Git+GitHub.
- Distribute method implementations, e.g., R/Python/Julia packages, on GitHub or Bitbucket.
- Dynamic document: RMarkdown for R or [Jupyter](http://jupyter.org) for Julia/Python/R.
- Docker container for reproducing a computing environment.
- Cloud computing tools.
- We are going to practice reproducible research **now**. That is, make your homework reproducible using Git, GitHub, and RMarkdown.
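
As a starting point for the homework workflow, the sketch below writes a tiny RMarkdown file and renders it to HTML from R. The file name `hw0.Rmd` and its contents are hypothetical, and rendering is attempted only if the `rmarkdown` package and pandoc are available.

```{r, eval=FALSE}
# Build the chunk fence programmatically so this example nests cleanly.
fence <- strrep("`", 3)

# Contents of a minimal RMarkdown document (hypothetical homework file).
rmd_lines <- c(
  "---",
  "title: \"Homework 0\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "set.seed(203)",
  "summary(rnorm(10))",
  fence
)

# Write the document to a temporary directory.
rmd_file <- file.path(tempdir(), "hw0.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only when the rmarkdown package and pandoc are both available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)
}
```

Committing the `.Rmd` source to Git, rather than only the rendered HTML, is what lets someone else rerun the analysis from scratch.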
## References