Ben Marwick, UW Anthropology
April 2014
Stodden, V., et al. 2013. “Setting the default to reproducible in computational science research.” SIAM News 46: 4-6.
“The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.” - Max Kuhn, CRAN Task View: Reproducible Research
Gavish & Donoho AAAS 2011, Oxberry 2013
“An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” - Claerbout and Karrenbach, Proceedings of the 62nd Annual International Meeting of the Society of Exploration Geophysicists. 1992
“When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures” - Buckheit & Donoho, Wavelab and Reproducible Research, 1995.
Technical
Cultural & personal
Peng 2011, Science 334(6060) pp. 1226-1227
The alternative to point-and-click analyses
“Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do.” - Donald E. Knuth, Literate Programming, 1984
For example… Let's calculate the current time in R.
time <- format(Sys.time(), "%a %d %b %X %Y")
The text and R code are interwoven in the output:
The time is `r time`
The time is Wed 09 Apr 3:08:04 PM 2014
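How the inline expression gets filled in can be illustrated with a toy function in base R. This is only a sketch: `render_inline` is a hypothetical helper written for this example, not part of knitr, and knitr's actual implementation is different and far more capable.

```r
# Toy illustration of inline-code substitution (NOT knitr's implementation).
# render_inline() is a hypothetical helper: it finds `r <expr>` spans in a
# string, evaluates each <expr>, and pastes the result back into the text.
render_inline <- function(text, envir = parent.frame()) {
  force(envir)
  pattern <- "`r ([^`]+)`"
  m <- gregexpr(pattern, text, perl = TRUE)
  pieces <- regmatches(text, m)[[1]]
  values <- vapply(pieces, function(p) {
    code <- sub(pattern, "\\1", p, perl = TRUE)   # keep only the R expression
    as.character(eval(parse(text = code), envir = envir))
  }, character(1), USE.NAMES = FALSE)
  regmatches(text, m) <- list(values)             # swap each span for its value
  text
}

time <- format(Sys.time(), "%a %d %b %X %Y")
rendered <- render_inline("The time is `r time`")
print(rendered)
```

The printed line changes every time the code runs, which is the point: the number in the text is computed, not typed.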
For
Against
The machine-readable part
R: Free, open source, cross-platform, highly interactive, huge user community in academia and private sector
R packages: an ideal 'Compendium'?
“both a container for the different elements that make up the document and its computations (i.e. text, code, data, etc.), and as a means for distributing, managing and updating the collection… allow us to move from an era of advertisement to one where our scholarship itself is published” - Gentleman and Temple Lang 2004
Markdown: lightweight document formatting syntax based on email text formatting. Easy to write, read and publish as-is.
The human-readable part
rmarkdown:
knitr - descendant of Sweave
Engine for dynamic report generation in R
pandoc - a universal document converter, open source, cross-platform
…with a single simple R function render
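A minimal sketch of that single call, assuming the rmarkdown package and pandoc are installed. The file name `time-example.Rmd` and its contents are made up for illustration; the render step is skipped when the tools are missing.

```r
# Write a tiny R Markdown document to a temporary file.
# The file name and document contents are hypothetical, for illustration only.
fence <- strrep("`", 3)  # code-fence marker, built here to keep this listing simple
rmd_path <- file.path(tempdir(), "time-example.Rmd")
writeLines(c(
  "---",
  "title: \"Time example\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "time <- format(Sys.time(), \"%a %d %b %X %Y\")",
  fence,
  "",
  "The time is `r time`"
), rmd_path)

# Render only when rmarkdown and pandoc are available; otherwise skip.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Rendered to: ", html_path)
} else {
  message("rmarkdown or pandoc not available; skipping render.")
}
```

Behind the one call, knitr runs the R code and pandoc converts the resulting Markdown to the requested output format.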
Payoffs
Costs
RStudio is a free, open source, cross-platform IDE for R
With integrated R console, deep support for markdown and git, a text editor, a workspace browser, a data viewer, package development tools, etc.
RStudio 'projects' make version control & literate programming simple
Payoffs
Costs
Stodden (IASSIST 2010) sampled American academics registered at the machine learning conference NIPS (134 responses from 593 requests, 23%). Red = communitarian norms, Blue = private incentives
An incentive to share data and code by acknowledging open practices with badges in publications. Currently used by Psychological Science
“Abandoning the habit of secrecy in favor of process transparency and peer review was the crucial step by which alchemy became chemistry.”
-Raymond, E. S., 2004, The art of UNIX programming: Addison-Wesley.
Presentation written in Markdown (R Presentation)
Compiled into HTML5 using RStudio
Source code hosting: https://github.com/benmarwick/UW-eScience-reproducibility-social-sciences
ORCID: http://orcid.org/0000-0001-7879-4531
Licensing:
See the Rpres file on GitHub for full references and sources