Nicer formatting for code
(setq org-latex-listings t)
(setq org-latex-listings 'minted)
'(org-export-latex-listings-langs
  (quote ((emacs-lisp "Lisp") (lisp "Lisp") (clojure "Lisp")
          (c "C") (cc "C++") (fortran "fortran")
          (perl "Perl") (cperl "Perl") (python "Python")
          (ruby "Ruby") (html "HTML") (xml "XML")
          (tex "TeX") (latex "TeX") (shell-script "bash")
          (gnuplot "Gnuplot") (ocaml "Caml") (caml "Caml")
          (sql "SQL") (sqlite "sql") (R-mode "R"))))
(setq org-latex-minted-options
      '(("linenos=true") ("bgcolor=lightgray") ("tabsize=2")))
#+END_SRC

# ----------------------------------------------------------------------
# End preamble
# ----------------------------------------------------------------------

# Start with doublespacing
\clearpage

* Why should we even care?
** Code :ignore:
#+HEADER: :exports none
#+BEGIN_SRC R :results silent :session
#if (grepl("Zurich2018", getwd())) {
#  setwd("../../../src/")
# load fe data to obtain slopes
#-----------------------------------------------------------------------
source("mrr_load.R")
#+END_SRC

\center
\Huge It's not reproducible \\ if it only runs on your laptop!
\vspace{4mm}
\tiny http://www.jonzelner.net/docker/reproducibility/

* We are faced with a replication crisis
#+ATTR_LATEX: :width 0.6\textwidth
[[file:mrr_schoenbrodt2018.pdf]]

\center
\tiny Credit: Felix Schoenbrodt 2018 (@nicebread303)

* We are faced with a replication crisis
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.5\textwidth
[[file:mrr_osc2015.pdf]]

\center
\tiny Open Science Collaboration 2015, Science
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
- Ongoing methodological crisis in science
- Results of many scientific studies are difficult to replicate
- Involves diverse fields (from psychology to cancer research)

\center
\tiny Ioannidis 2005, PLOS Med

* In addition

* In addition
\center
\Huge We are probably also faced \\ with a reproducibility crisis

* We are probably also faced with a reproducibility crisis
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.6\textwidth
[[file:mrr_economist2013.pdf]]

\center
\tiny
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.8\textwidth
[[file:mrr_baker2016.pdf]]

\center
\tiny Baker 2016, Nature

* Replicating vs. reproducing
- Replicating: People going out and collecting new data
- Reproducing: People analyzing the same data
- What cannot be replicated is often difficult to reproduce

\vspace{6mm}
\center
\tiny jblevins.org/log/rep

* What makes it hard to reproduce our own work?

* What makes it hard to reproduce our own work?
\center
\Huge We don't always use \\ reproducible workflows

* What does that mean?
- Using just intuition when organizing data, manuscripts, code
- The same goes for analyzing data
- After a couple of months (sometimes weeks) it is hard to remember:

* What does that mean?
- Using just intuition when organizing data, manuscripts, code
- The same goes for analyzing data
- After a couple of months (sometimes weeks) it is hard to remember:
  1. What we did

* What does that mean?
- Using just intuition when organizing data, manuscripts, code
- The same goes for analyzing data
- After a couple of months (sometimes weeks) it is hard to remember:
  1. What we did
  2. Why we did it

* What does that mean?
- Using just intuition when organizing data, manuscripts, code
- The same goes for analyzing data
- After a couple of months (sometimes weeks) it is hard to remember:
  1. What we did
  2. Why we did it
  3. How we did it

* What can we do about it?
Three simple rules:

* What can we do about it?
Three simple rules:
1. Separate data from analysis

* What can we do about it?
Three simple rules:
1. Separate data from analysis
2. Use version control

* What can we do about it?
Three simple rules:
1. Separate data from analysis
2. Use version control
3. Use code to analyze data (not GUIs)

* What can we do about it?
Three simple rules:
1. *Separate data from analysis*
2. Use version control
3. Use code to analyze data (not GUIs)

* Separating data from analysis
#+ATTR_LATEX: :width 0.6\textwidth
[[file:mrr_example2018.pdf]]

* Was this done here?

* Was this done here?
\center
\animategraphics[autoplay,width=0.3\textwidth]{25}{mrr_devito2018-}{0}{65}

* Separating data from analysis
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.7\textwidth
[[file:mrr_example2018a.pdf]]
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:

* Separating data from analysis
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.7\textwidth
[[file:mrr_example2018a.pdf]]
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
- We want one and only one data set to work with
- Once finalized (cleaned etc.), it is never touched again
- Any analysis reads from but never writes to this data set

* What can we do about it?
Three simple rules:
1. Separate data from analysis
2. *Use version control*
3. Use code to analyze data (not GUIs)

* Use a version control system (= use git)
#+ATTR_LATEX: :width 0.7\textwidth
[[file:mrr_example2018b.pdf]]

\center
\tiny https://www.quora.com/

* What is git and why should I use it?
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
- Version control system for source code management
- Tracks every file in a project
- Keeps track of any change to any file
- Is relatively easy to use
- Downside: it works best with text
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.8\textwidth
[[file:mrr_example2018c.pdf]]

* Example: git

* What can we do about it?
Three simple rules:
1. Separate data from analysis
2. Use version control
3. *Use code to analyze data (not GUIs)*

* Why code?
#+ATTR_LATEX: :width 1.0\textwidth
[[file:mrr_example2018d.pdf]]

* Why code?
** Left
:PROPERTIES:
:BEAMER_col: 0.4
:BEAMER_opt: [T]
:END:
- *To keep track of the workflow*
- To make the analysis transparent
- To improve your skills and get more efficient as you code
** Right
:PROPERTIES:
:BEAMER_col: 0.6
:BEAMER_opt: [T]
:END:
*** Block
\footnotesize
#+NAME: code1
#+BEGIN_SRC R :session :exports code :results silent
parse_msd <- function(m, sd) {
  #
  # this function will
  # produce a nicely formatted string of
  # mean and sd to be used inline in text
  #
  print(paste("M = ", round(m, 2), ", SD = ", round(sd, 2), sep=""))
}
#+END_SRC
\normalsize

* Without code your analysis won't be reproducible
Options:
- R or RStudio (it's free!), ideally also Python (it's free!)
- Alternatively, Matlab (great, but commercial)
- SAS (has been the market leader in commercial analytics, and it does include a free University Edition now)

* Without code your analysis won't be reproducible
Options:
- *R or RStudio (it's free!), ideally also Python (it's free!)*
- Alternatively, Matlab (great, but commercial)
- SAS (has been the market leader in commercial analytics, and it does include a free University Edition now)

* Example: R
** LeftRight
:PROPERTIES:
:BEAMER_col: 1.0
:BEAMER_opt: [T]
:BEAMER_env: block
:END:
*** Block
\tiny
#+NAME: code1
#+BEGIN_SRC R :session :exports code :results silent
#-----------------------------------------------------------------------
# This is a simple R program
# 9/18/18, PH
#-----------------------------------------------------------------------
#
# 1. Load and visualize data
#-----------------------------------------------------------------------
dat <- read.csv("../data/mrr.csv")

# Histograms
hist(dat$y[dat$group=="X"], col="blue")
hist(dat$y[dat$group=="Y"], col="blue")

# 2. Compute linear model, adjusted for age
#-----------------------------------------------------------------------
lmfit <- lm(y ~ group + age, data=dat)

# 3. Visualize residuals to check model assumptions
#-----------------------------------------------------------------------
plot(density(resid(lmfit)))

# 4. Print coefficients
#-----------------------------------------------------------------------
summary(lmfit)
#+END_SRC
\normalsize

* Coding: the good news
- It is easier than you think
- Once one language is learned, it's easy to learn another one

* Summary: How to make research reproducible
Essential:
1. Separate data and analysis
2. Use git to keep track of changes
3. Use R to keep track of your workflow

Optional:
4. Combine coding and writing to produce manuscripts
5. Use Make to build your project

* Summary: How to make research reproducible
Essential:
1. Separate data and analysis
2. Use git to keep track of changes
3. Use R to keep track of your workflow

Optional:
4. *Combine coding and writing to produce manuscripts*
5. Use Make to build your project

* Combining coding and writing
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
Several options:
- knitr (RStudio)
- *org-mode*
- Sweave
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 1.0\textwidth
[[file:mrr_example2018e.pdf]]

* Example: org-mode

* Summary: How to make research reproducible
Essential:
1. Separate data and analysis
2. Use git to keep track of changes
3. Use R to keep track of your workflow

Optional:
4. Combine coding and writing to produce manuscripts
5. *Use Make to build your project*

* Example: Makefile

* Conclusion
- We need transparent and reproducible workflows
- Efficient way to improve analyses and writing
- Sharing data, code, workflows may become a requirement

* Acknowledgments
# \footnotesize
** Left
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
\usebeamerfont{acknowledgments}
\singlespacing
- Jon Zelner (jonzelner.net/docker/reproducibility/)
- Andrew Gelman (andrewgelman.com)
- Papaja package in R (crsh.github.io/papaja_man/)
** Right
:PROPERTIES:
:BEAMER_col: 0.5
:BEAMER_opt: [t]
:END:
#+ATTR_LATEX: :width 0.9\textwidth
[[file:mrr_example2018f.pdf]]

* References :ignore:
\bibliographystyle{npp}
\nobibliography{master}