#+STARTUP: beamer #+TITLE: Making research reproducible: git, R and org-mode * Preamble :ignore: ** Comments :ignore: # ---------------------------------------------------------------------- # - Turn on synonyms by starting synosaurus-mode # - Look up words using C-c sr # - Turn on dictionary by starting flyspell-mode # - Count words by section using org-wc-display # ---------------------------------------------------------------------- ** org specific settings :ignore: # ---------------------------------------------------------------------- #+OPTIONS: email:nil toc:nil num:nil title:t author:nil date:nil tex:t #+STARTUP: align fold logdone #+SEQ_TODO: TODO(t) #+TAGS: figure(f) check(c) noexport(n) ignore(i) #+LANGUAGE: en #+EXCLUDE_TAGS: noexport TODO # ---------------------------------------------------------------------- ** Latex header :ignore: # ---------------------------------------------------------------------- #+LATEX_CLASS: mybeamerfeinstein #+LATEX_HEADER: \usepackage{setspace} #+LATEX_HEADER: \usepackage{amsmath} #+LATEX_HEADER: \usepackage{fontspec} #+LATEX_HEADER: \usepackage{textpos} #+LATEX_HEADER: \usepackage{minted} #+LATEX_HEADER: \usepackage{bibentry} #+LATEX_HEADER: \usepackage[export]{adjustbox} #+LATEX_HEADER: \usepackage{graphicx,caption} #+LATEX_HEADER: \usepackage{eurosym} #+LATEX_HEADER: \usepackage{listings} #+LATEX_HEADER: \usepackage{textcomp} #+LATEX_HEADER: \usepackage{animate} #+LATEX_HEADER: \graphicspath{{../output/figures/}{../ext/logos/}{../lib/}} # ---------------------------------------------------------------------- ** Latex macros :ignore: # ---------------------------------------------------------------------- #+LATEX_HEADER: \newcommand{\auth}{Philipp Homan, MD, PhD} #+LATEX_HEADER: \newcommand{\authemail}{phoman1@northwell.edu} #+LATEX_HEADER: \newcommand{\authtwitter}{@philipphoman} #+LATEX_HEADER: \newcommand{\authgithub}{github.com/philipphoman} # ---------------------------------------------------------------------- ** Beamer-specific header :ignore: # ---------------------------------------------------------------------- #+LaTeX_CLASS_OPTIONS: [aspectratio=169, bigger] # ---------------------------------------------------------------------- ** Authors and affiliations :ignore: # ---------------------------------------------------------------------- #+LATEX_HEADER: \author{Philipp Homan, MD, PhD} #+LATEX_HEADER: \institute[shortinst]{ #+LATEX_HEADER: \footnotesize\vspace{5mm} #+LATEX_HEADER: \url{phoman1@northwell.edu}\\ #+LATEX_HEADER: \url{http://github.com/philipphoman/mrr}} # ---------------------------------------------------------------------- ** Buffer-wide source code blocks :ignore: # ---------------------------------------------------------------------- # Set elisp variables need for nice formatting We want no new lines in # inline results and a paragraph size of 80 characters Important: this # has to be evaluated witch C-c C-c in order to work in the current # buffer #+BEGIN_SRC emacs-lisp :exports none :results silent ; set timestamp format ;(setq org-export-date-timestamp-format "%ft%t%z") (require 'org-wc) (flyspell-mode t) (synosaurus-mode t) (auto-complete-mode t) (linum-mode t) (whitespace-mode t) (setq org-babel-inline-result-wrap "%s") (setq org-export-with-broken-links "mark") (setq fill-column 72) (setq whitespace-line-column 72) ;(setq org-latex-caption-above '(table image)) (setq org-latex-caption-above nil) (org-toggle-link-display) ; don't remove logfiles at export (setq org-latex-remove-logfiles nil) ; keybindings ; (global-set-key (kbd " c") "#+CAPTION: ") (defun setfillcolumn72 () (interactive) (setq fill-column 72) ) (defun setfillcolumn42 () (interactive) (setq fill-column 42) ) (define-key org-mode-map (kbd "C-c #") "#+CAPTION: ") (define-key org-mode-map (kbd "C-c f c 4 2") 'setfillcolumn42) (define-key org-mode-map (kbd "C-c f c 7 2") 'setfillcolumn72) (setq org-odt-category-map-alist '(("__figure__" "*figure*" "value" "figure" org-odt--enumerable-image-p))) ; let ess not ask for starting directory (setq ess-ask-for-ess-directory nil) ;(setq org-latex-pdf-process '("latexmk -pdflatex='xelatex ;-output-directory=../output/tex/ -interaction nonstopmode' -pdf ;-bibtex -f %f")) ;(setq org-latex-pdf-process '("latexmk -pdf ; -pdflatex='xelatex -shell-escape -interaction nonstopmode' -bibtex -f %f ")) (setq org-latex-pdf-process '("latexmk -pdflatex='xelatex -8bit -interaction nonstopmode' -shell-escape -pdf -bibtex -f %f")) (setq org-latex-logfiles-extensions (quote("bcf" "blg" "fdb_latexmk" "fls" "figlist" "idx" "log" "nav" "out" "ptc" "run.xml" "snm" "toc" "vrb" "xdv"))) (add-to-list 'org-structure-template-alist '("ca" "#+CAPTION: ")) (add-to-list 'org-structure-template-alist '("he" "#+LATEX_HEADER: ")) (add-to-list 'org-structure-template-alist '("dc" "src_R[:session]{}")) (add-to-list 'org-structure-template-alist '("sr" "#+HEADER: :exports none ,#+BEGIN_SRC R :colnames yes :results silent :session\n")) (add-to-list 'org-structure-template-alist '("er" "#+END_SRC")) (setq attrlatex "#+ATTR_LATEX: :width 1.0") (define-key org-mode-map (kbd "C-c #") attrlatex) (add-to-list 'org-structure-template-alist '("cl" "\n** Left\n:PROPERTIES: ?\n:BEAMER_col: 0.5 \n:END:")) (add-to-list 'org-structure-template-alist '("cr" "\n** Right\n:PROPERTIES: ?\n:BEAMER_col: 0.5 \n:END:")) (add-to-list 'org-structure-template-alist '("im" "#+ATTR_LATEX: :width 1.0\\textwidth \n[[file:")) (add-to-list 'org-structure-template-alist '("qt" "\\center \n\\tiny\n")) ; Nicer formatting for code (setq org-latex-listings t) (setq org-latex-listings 'minted) '(org-export-latex-listings-langs (quote ((emacs-lisp "Lisp") (lisp "Lisp") (clojure "Lisp") (c "C") (cc "C++") (fortran "fortran") (perl "Perl") (cperl "Perl") (python "Python") (ruby "Ruby") (html "HTML") (xml "XML") (tex "TeX") (latex "TeX") (shell-script "bash") (gnuplot "Gnuplot") (ocaml "Caml") (caml "Caml") (sql "SQL") (sqlite "sql") (R-mode "R")))) (setq org-latex-minted-options '(("linenos=true") ("bgcolor=lightgray") ("tabsize=2"))) #+END_SRC # ---------------------------------------------------------------------- # End preamble # ---------------------------------------------------------------------- # Start with doublespacing \clearpage * Why should we even care? ** Code :ignore: #+HEADER: :exports none #+BEGIN_SRC R :results silent :session #if (grepl("Zurich2018", getwd())) { # setwd("../../../src/") # load fe data to obtain slopes #----------------------------------------------------------------------- source("mrr_load.R") #+END_SRC \center \Huge It's not reproducible \\ if it only runs on your laptop! \vspace{4mm} \tiny http://www.jonzelner.net/docker/reproducibility/ * We are faced with a replication crisis #+ATTR_LATEX: :width 0.6\textwidth [[file:mrr_schoenbrodt2018.pdf]] \center \tiny Credit: Felix Schoenbrodt 2018 (@nicebread303) * We are faced with a replication crisis ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.5\textwidth [[file:mrr_osc2015.pdf]] \center \tiny Open Science Collaboration 2015, Science ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: - Ongoing methodological crisis in science - Results of many scientific studies difficult to replicate - Involves diverse fields (from psychology to cancer research) \center \tiny Ioannidis 2005, PLOS Med * In addition * In addition \center \Huge We are probably also faced \\ with a reproducibility crisis * We are probably also faced with a reproducibility crisis ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.6\textwidth [[file:mrr_economist2013.pdf]] \center \tiny ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.8\textwidth [[file:mrr_baker2016.pdf]] \center \tiny Baker 2016, Nature * Replicating vs. reproducing - Replicating: People going out and collecting new data - Reproducing: People analyzing the same data - What cannot be replicated often difficult to reproduce \vspace{6mm} \center \tiny jblevins.org/log/rep * What makes it hard to reproduce our own work? * What makes it hard to reproduce our own work? \center \Huge We don't always use \\ reproducible work flows * What does that mean? - Using just intuition when organizing data, manuscripts, code - The same goes for analyzing data - After a couple of months (sometimes weeks) it is hard to remember: * What does that mean? - Using just intuition when organizing data, manuscripts, code - The same goes for analyzing data - After a couple of months (sometimes weeks) it is hard to remember: 1. What we did * What does that mean? - Using just intuition when organizing data, manuscripts, code - The same goes for analyzing data - After a couple of months (sometimes weeks) it is hard to remember: 1. What we did 2. Why we did it * What does that mean? - Using just intuition when organizing data, manuscripts, code - The same goes for analyzing data - After a couple of months (sometimes weeks) it is hard to remember: 1. What we did 2. Why we did it 3. How we did it * What can we do about it? Three simple rules: * What can we do about it? Three simple rules: 1. Separate data from analysis * What can we do about it? Three simple rules: 1. Separate data from analysis 2. Use version control * What can we do about it? Three simple rules: 1. Separate data from analysis 2. Use version control 3. Use code to analyze data (not GUIs) * What can we do about it? Three simple rules: 1. *Separate data from analysis* 2. Use version control 3. Use code to analyze data (not GUIs) * Separating data from analysis #+ATTR_LATEX: :width 0.6\textwidth [[file:mrr_example2018.pdf]] * Was this done here? * Was this done here? \center \animategraphics[autoplay,width=0.3\textwidth]{25}{mrr_devito2018-}{0}{65} * Separating data from analysis ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.7\textwidth [[file:mrr_example2018a.pdf]] ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: * Separating data from analysis ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.7\textwidth [[file:mrr_example2018a.pdf]] ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: - We want one and only one data set to work with - Once finalized (cleaned etc.), it is never touched again - Any analysis reads from but never writes to this data set * What can we do about it? Three simple rules: 1. Separate data from analysis 2. *Use version control* 3. Use code to analyze data (not GUIs) * Use a version control system (= use git) #+ATTR_LATEX: :width 0.7\textwidth [[file:mrr_example2018b.pdf]] \center \tiny https://www.quora.com/ * What is git and why should I use it? ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: - Version control system for source code management - Tracks every file in a project - Keeps track of any change to any file - Is relatively easy to use - Downside: it works best with text ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.8\textwidth [[file:mrr_example2018c.pdf]] * Example: git * What can we do about it? Three simple rules: 1. Separate data from analysis 2. Use version control 3. *Use code to analyze data (not GUIs)* * Why code? #+ATTR_LATEX: :width 1.0\textwidth [[file:mrr_example2018d.pdf]] * Why code? ** Left :PROPERTIES: :BEAMER_col: 0.4 :BEAMER_opt: [T] :END: - *To keep track of the workflow* - To make the analysis transparent - To improve your skills and get more efficient as you code ** Right :PROPERTIES: :BEAMER_col: 0.6 :BEAMER_opt: [T] :END: *** Block \footnotesize #+NAME: code1 #+BEGIN_SRC R :session :exports code :results silent parse_msd <- function(m, sd) { # # this function will # produce a nicely formatted string of # mean and sd to be used inline in text # print(paste("M = ", round(m, 2), ", SD = ", round(sd, 2), sep="")) } #+END_SRC \normalsize * Without code your analysis won't be reproducible Options: - R or RStudio (it's free!), ideally also Python (it's free!) - Alternatively, Matlab (great, but commercial) - SAS (has been the market leader in commercial analytics, and it does include a free University Edition now) * Without code your analysis won't be reproducible Options: - *R or R studio (it's free!), ideally also Python (it's free!)* - Alternatively, Matlab (great, but commercial) - SAS (has been the market leader in commercial analytics, and it does include a free University Edition now) * Example: R ** LeftRight :PROPERTIES: :BEAMER_col: 1.0 :BEAMER_opt: [T] :BEAMER_env: block :END: *** Block \tiny #+NAME: code1 #+BEGIN_SRC R :session :exports code :results silent #----------------------------------------------------------------------- # This is a simple R program # 9/18/18, PH #----------------------------------------------------------------------- # # 1. Load and visualize data #----------------------------------------------------------------------- dat <- read.csv("../data/mrr.csv") # Histogramms hist(dat$y[dat$group=="X"], col="blue") hist(dat$y[dat$group=="Y"], col="blue") # 2. Compute linear model, adjusted for age #----------------------------------------------------------------------- lmfit <- lm(y ~ group + age, data=dat) # 3. Visualize residuals to check model assumptions #----------------------------------------------------------------------- plot(density(resid(lmfit))) # 4. Print coefficients #----------------------------------------------------------------------- summary(lmfit) #+END_SRC \normalsize * Coding: the good news - It is easier than you think - Once one language is learned, it's easy to learn another one * Summary: How to make research reproducible Essential: 1. Separate data and analysis 2. Use git to keep track of changes 3. Use R to keep track of your workflow Optional: 4. Combine coding and writing to produce manuscripts 5. Use Make to build your project * Summary: How to make research reproducible Essential: 1. Separate data and analysis 2. Use git to keep track of changes 3. Use R to keep track of your workflow Optional: 4. *Combine coding and writing to produce manuscripts* 5. Use Make to build your project * Combining coding and writing ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: Several Options: - knitr (RStudio) - *org-mode* - sweave ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 1.0\textwidth [[file:mrr_example2018e.pdf]] * Example: org-mode * Summary: How to make research reproducible Essential: 1. Separate data and analysis 2. Use git to keep track of changes 3. Use R to keep track of your workflow Optional: 4. Combine coding and writing to produce manuscripts 5. *Use Make to build your project* * Example: Makefile * Conclusion - We need transparent and reproducible workflows - Efficient way to improve analyses and writing - Sharing data, code, workflows may become a requirement * Acknowledgments # \footnotesize ** Left :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: \usebeamerfont{acknowledgments} \singlespacing - Joe Zellner (jonzelner.net/docker/reproducibility/) - Andrew Gelman (andrewgelman.com) - Papaja package in R (crsh.github.io/papaja_man/) ** Right :PROPERTIES: :BEAMER_col: 0.5 :BEAMER_opt: [t] :END: #+ATTR_LATEX: :width 0.9\textwidth [[file:mrr_example2018f.pdf]] * References :ignore: \bibliographystyle{npp} \nobibliography{master}