{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to reproducible research with `Jupyter Notebook` and `Python` \n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning objectives for this workshop:\n", "1. Understand the concept, importance, and components of reproducible research.\n", "1. Describe the architecture of a `Jupyter notebook` environment, including the notebook server, programming language kernels, and interactive widgets, and situate `Jupyter notebooks` as part of the wider `Jupyter` ecosystem.\n", "1. Use a `Jupyter notebook` to create, manage and report on the execution of a programmatic workflow using a structured document combining narrative text, executable code, executed code output and embedded media items.\n", "1. Be able to access the wider `Jupyter` community, including but not limited to the `Jupyter` [Google Group](https://groups.google.com/forum/#!forum/jupyter), [Stack Overflow](http://stackoverflow.com/) and communities around specific `Python` packages.\n", "\n", "--- " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The sciences have a reproducibility problem. [Nature article](http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970)\n", "- **Many published studies cannot be reproduced.**\n", "\n", "#### **Replication** and **reproduction**:\n", "- Peng (2009) Reproducible research and Biostatistics. 
[Biostatistics 10: 405-408](http://biostatistics.oxfordjournals.org/content/10/3/405.full)\n", " - **Replication** - when independent investigators use independent methods, protocols, data, and equipment to confirm scientific claims.\n", " - **Reproduction** - when data sets and computer code are made available for researchers to verify results.\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ### *Science* retracts paper without agreement of lead author.\n", "- Journal retracted a study of how canvassers can sway people's opinions about gay marriage.\n", "- Reasons given by the [Editor-in-Chief](http://news.sciencemag.org/policy/2015/05/science-retracts-gay-marriage-paper-without-lead-author-s-consent):\n", " + Original survey data not made available for independent reproduction of results\n", " + Survey incentives misrepresented\n", " + Sponsorship statement false\n", "- Two grad students attempted to reproduce the study and could not. \n", "- They concluded that the data must have been fabricated. [538 story](http://fivethirtyeight.com/features/how-two-grad-students-uncovered-michael-lacour-fraud-and-a-way-to-change-opinions-on-transgender-rights/)\n", "## Methods we'll discuss today can't prevent fraud, but they can make it easier to discover such issues.\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ### Retracted, but not fraud\n", "- One researcher had seven papers retracted because of irreproducibility [Retraction Watch](http://retractionwatch.com/2014/11/14/univ-no-misconduct-but-poor-research-practice-in-mgt-profs-work-now-subject-to-7-retractions/#more-23666)\n", "- Couldn't find the data [Wiley](http://onlinelibrary.wiley.com/doi/10.1111/j.1468-1331.2011.03524.x/abstract)\n", "- \"Extensive\" errors force retraction of lymphoma paper [JRO](http://retractionwatch.com/2013/01/14/extensive-errors-force-retraction-of-lymphoma-radiation-paper/)\n", "- Many, many more [Irreproducible 
examples](https://github.com/Reproducible-Science-Curriculum/Reproducible-Science-Hackathon-Dec-08-2014/wiki/Irreproducible-Examples)\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Seizure study retracted after authors realized data were \"terribly mixed\"\n", "- From the authors of *Low Dose Lidocaine for Refractory Seizures in Preterm Neonates*:\n", "\"The article has been retracted at the request of the authors. After carefully re-examining the data presented in the article, they identified that data of two different hospitals got terribly mixed. The published results cannot be reproduced in accordance with scientific and clinical correctness.\" [IJP](http://retractionwatch.com/2013/02/01/seizure-study-retracted-after-authors-realize-data-got-terribly-mixed/)\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bad spreadsheet merge kills depression paper, quick fix resurrects it\n", "- The authors informed the [journal](http://retractionwatch.com/2014/07/01/bad-spreadsheet-merge-kills-depression-paper-quick-fix-resurrects-it/) that merging the lab results with the other survey data used in the paper introduced an error in the identification codes.\n", "- **Original conclusion**: Lower levels of CSF IL-6 were associated with current depression and with future depression.\n", "- **Revised conclusion**: Higher levels of CSF IL-6 and IL-8 were associated with current depression.\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [LIGO](http://www.ligo.org/) (Laser Interferometer Gravitational-Wave Observatory)\n", "\n", "- All [data](https://losc.ligo.org/data/) are publicly available free of charge.\n", "- `Jupyter Notebooks` running `Python` are produced for each [publication](https://www.ligo.caltech.edu/page/detection-companion-papers). 
These notebooks allow full reproducibility: all analyses and figures can be recreated.\n", "- Produce in-depth [Tutorials](https://losc.ligo.org/tutorials/) using `Jupyter Notebooks` and `Python` \n", " - [Signal processing tutorial](https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The *Four Facets of reproducibility*:\n", "- **Documentation**: note the difference between binary files (e.g. docx) and .txt files and why text files are preferred for documentation.\n", "- **Organization**: tools to organize your projects so that you don’t have a single folder with hundreds of files.\n", "- **Automation**: the power of scripting to create automated data analyses.\n", "- **Dissemination**: publishing is not the end of your analysis, rather it is a way station towards your future research and the future research of others.\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Our reproducibility toolkit: `Jupyter Notebook` and the `Python` language\n", "Why `Python`?\n", " - Free!\n", " - Open source\n", " - Widely used and supported across many disciplines\n", " - Can be used on Windows, Mac OS X, or Linux!\n", "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }