Ben Marwick
March 2014
Replicable refers to the ability to produce exactly the same results as published. Other people get exactly the same results when doing exactly the same thing. Technical: cf. validation and verification.
Reproducible refers to the ability to create a workflow that independently upholds the published results using the information provided. Checking the results from the fixed digital form of data and code from the original study. Something similar happens in other people's hands. Substantive: possibly by a new implementation.
“The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.” - Max Kuhn, CRAN Task View: Reproducible Research
Gavish & Donoho, AAAS 2011; Oxberry 2013
“An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” - Claerbout and Karrenbach, Proceedings of the 62nd Annual International Meeting of the Society of Exploration Geophysicists, 1992
“When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures” - Buckheit & Donoho, Wavelab and Reproducible Research, 1995.
Technical
Cultural & personal
Peng 2011, Science 334(6060) pp. 1226-1227
“Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.”– Donald E. Knuth, Literate Programming, 1984
For example… Let's calculate the current time in R.
time <- format(Sys.time(), "%a %d %b %X %Y")
The text and R code are interwoven in the output:
The time is `r time`
The time is Mon 18 Apr 7:57:06 PM 2016
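The same mechanism can be tried without building a full report. Below is a minimal sketch that assumes the knitr package is installed. The document is passed to `knitr::knit()` as a character vector through its `text` argument. knitr runs the code chunk and substitutes the inline expression into the prose. The document text and the variable `answer` are hypothetical. A fixed calculation replaces `Sys.time()` so that the result is predictable.

```r
library(knitr)

# A tiny hypothetical R Markdown document: one code chunk, one line of prose
doc <- c(
  "```{r}",
  "answer <- 6 * 7",
  "```",
  "",
  "The answer is `r answer`."
)

# With `text =`, knit() returns the processed markdown instead of writing a file
out <- knit(text = doc, quiet = TRUE)
cat(out, sep = "\n")
```

The chunk is echoed as a fenced code block, and the inline expression is replaced by its computed value. This is the same interleaving shown in the time example.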
For
Against
The machine-readable part
R: Free, open source, cross-platform, highly interactive, huge user community in academia and private sector
R packages: an ideal 'Compendium'?
“both a container for the different elements that make up the document and its computations (i.e. text, code, data, etc.), and as a means for distributing, managing and updating the collection… allow us to move from an era of advertisement to one where our scholarship itself is published” - Gentleman and Temple Lang 2004
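A compendium in this sense can be sketched with base R alone. The layout below uses a hypothetical compendium called `mycompendium` and the usual package folders: `R/` for functions, `data/` for datasets and `vignettes/` for the narrative. The function, dataset and file names are all illustrative. This sketch only shows the layout; it is not a complete package that would pass `R CMD check`.

```r
# Hypothetical compendium name and location (a temporary directory)
root <- file.path(tempdir(), "mycompendium")

# Conventional package folders: functions, data, and the narrative
for (d in c("R", "data", "vignettes")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# DESCRIPTION: metadata that identifies and versions the compendium
desc <- data.frame(
  Package = "mycompendium",
  Title = "Code, Data and Text for a Hypothetical Study",
  Version = "0.1.0",
  stringsAsFactors = FALSE
)
write.dcf(desc, file.path(root, "DESCRIPTION"))

# One analysis function, one small dataset and one narrative document
writeLines("total_by_site <- function(x) tapply(x$count, x$site, sum)",
           file.path(root, "R", "total_by_site.R"))
sites <- data.frame(site = c("A", "A", "B"), count = c(3, 4, 5))
save(sites, file = file.path(root, "data", "sites.rda"))
writeLines(c("---", "title: \"Analysis\"", "---", "",
             "Narrative text and code chunks go here."),
           file.path(root, "vignettes", "analysis.Rmd"))

# Show the resulting layout
print(list.files(root, recursive = TRUE))
```

Because the folder follows package conventions, the standard package tooling can later be used to document, version and distribute code, data and text together.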
library(rCharts)
open_notebook()
Markdown: lightweight document formatting syntax based on email text formatting. Easy to write, read and publish as-is.
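To show what the syntax looks like, here is a short Markdown snippet converted to HTML. This sketch assumes the commonmark package is installed. Its `markdown_html()` function is only a stand-in converter; the workflow in this presentation uses pandoc for the same job. The snippet text is hypothetical.

```r
# Assumes the commonmark package is installed; used here only as a stand-in converter
library(commonmark)

md <- c(
  "# Results",
  "",
  "Some *emphasis*, a **strong** word and a [link](https://example.com)."
)

# markdown_html() takes Markdown text and returns a single HTML string
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```

The Markdown source stays readable as plain text, while the converter turns `#`, `*...*`, `**...**` and `[...](...)` into a heading, emphasis, strong text and a hyperlink.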
The human-readable part
rmarkdown:
knitr - descendant of Sweave
Engine for dynamic report generation in R
pandoc - a universal document converter, open source, cross-platform
-> Write code and narrative in rmarkdown
-> use knitr to get markdown (with computation of figures and tables)
-> use pandoc to get HTML/PDF/DOCX
…all with a single R function, rmarkdown::render()
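The whole chain can be sketched in a few lines. This assumes the rmarkdown package is installed and pandoc is available; `rmarkdown::pandoc_available()` checks the latter. The file name `report.Rmd` and its contents are hypothetical. `render()` first calls knitr to run the chunk and fill in the inline value. It then hands the resulting markdown to pandoc, which produces the HTML.

```r
library(rmarkdown)

# Hypothetical source file in a temporary directory
rmd_file <- file.path(tempdir(), "report.Rmd")

# Narrative and code written together in R Markdown
writeLines(c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "```",
  "",
  "The mean of x is `r mean(x)`."
), rmd_file)

# knitr -> markdown -> pandoc -> HTML, in one call (skipped if pandoc is missing)
html_file <- NULL
if (pandoc_available()) {
  html_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(html_file)
}
```

Changing `output_format` to `"word_document"` produces a DOCX file from the same source. `"pdf_document"` produces a PDF but also needs a LaTeX installation.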
Payoffs
Costs
RStudio is a free, open source, cross-platform integrated development environment for R
Has an integrated R console, deep support for markdown and git, a file manager, a text editor, a workspace browser, a data viewer, package development tools, etc. etc.
RStudio 'projects' make version control & document preparation simple
Payoffs
Costs
Stodden (IASSIST 2010) sampled American academics registered at the machine learning conference NIPS (134 responses from 593 requests, 23%). Red = communitarian norms, Blue = private incentives
“Abandoning the habit of secrecy in favor of process transparency and peer review was the crucial step by which alchemy became chemistry.”
-Raymond, E. S., 2004, The art of UNIX programming: Addison-Wesley.
Presentation written in Markdown (R Presentation)
Compiled into HTML5 using RStudio
Source code hosting: https://github.com/benmarwick/CSSS-Primer-Reproducible-Research
ORCID: http://orcid.org/0000-0001-7879-4531
Licensing:
See Rpres file on github for full references and sources