---
title: "Reproducible Research"
author: "Dr. Hua Zhou @ UCLA"
date: "Jan 15, 2019"
subtitle: Biostat M280
output:
  # ioslides_presentation: default
  html_document:
    toc: true
    toc_depth: 4
bibliography: ../bib-HZ.bib
---

## Reproducible research in statistics/data science

> An article about computational science in a scientific publication is **not** the scholarship itself, it is merely **advertising** of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
>
> @BuckheitDonoho95ReproRes

## Non-reproducible research | [Duke Potti scandal](https://en.wikipedia.org/wiki/Anil_Potti)

----

- @Potti06GenomeSignature:

- @BaggerlyCoombes09:

- [Simply Statistics Blog: The Duke Saga Starter Set](https://simplystatistics.org/2012/02/27/the-duke-saga-starter-set/)

## Non-reproducible research | Microarray studies

Nature Genetics (2015 Impact Factor: 31.616). 20 articles about microarray profiling published in Nature Genetics between Jan 2005 and Dec 2006.

## Non-reproducible research | Bible code

- @WitztumRipsRosenberg94BibleCode

- @McKayBarNatanBarHillelKalai99BibleCode

## Why reproducible research

- Reproducibility has been the foundation of science. It helps accumulate scientific knowledge.

- Greater research impact.

- Better work habits boost the quality of research.

- Better teamwork. For **you** as graduate students, it means better communication with your advisor.

```{r, eval=FALSE}
while true
  Student: "that idea you told me to try - it doesn't work!"
  Professor: "ok. how about trying this instead."
end
```

Unless you make the computing environment (algorithms, dataset, tuning parameters) reproducible, others cannot help you.

## How to be reproducible in statistics/data science?

> When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures.
>
> @BuckheitDonoho95ReproRes

- A good example:

- I **highly** recommend the book _Reproducible Research with R and RStudio_ by Christopher Gandrud.

    - [Amazon](https://www.amazon.com/Reproducible-Research-Studio-Chapman-Hall/dp/1466572841)

    - [GitHub repo](https://github.com/christophergandrud/Rep-Res-Book)

## Tools for reproducible research

- Version control: Git+GitHub.

- Distribute method implementations, e.g., R packages, on GitHub or Bitbucket.

- Dynamic documents: RMarkdown for R or [Jupyter](http://jupyter.org) for Julia/Python/R.

- Docker containers for reproducing a computing environment.

- Cloud computing tools.

- We are going to practice reproducible research **now**. That is, make your homework reproducible using Git, GitHub, and RMarkdown.

## References