To do reproducible research & encourage others to do so also
Increase credibility: the article is merely an advertisement for the work; show that my results are correct
Increase impact: allow reuse of method; currently the main beneficiary is future me
December 2014
Pre-publication work viewed as trade secrets
Anxiety about exposure to ridicule
Wide variation in data analysis tools
Oberg (1960) popularized the term 'culture shock', describing it as the "anxiety that results from losing all of our familiar signs and symbols of social intercourse"
Dependencies
Imprecise documentation
Code rot
Barriers to adoption and reuse in existing solutions
Repository with code and data (R Markdown file, scripts, RProj, R package)
Review cycle means MS Word is a necessary format
Code is circulated with co-authors, but they don't do anything with it
Cited and described in methods section
Decipher the analysis in Excel or SPSS files
Recompute some or all of it in R
Create repository with R code and data
More like a lab notebook, not cited in manuscript
Keep my contribution self-contained
Create repository for my contribution (from specific commit or release)
Cite the repository in the publication's figure caption (with no explanation)
Require student collaborators to acquire skills (sneak into coursework, require it for graduate student milestones, Software Carpentry)
Normalise scripted analyses by talking about it, showing it, citing it (at appropriate moments…)
Advocate Open Methods; flattery often works (open science may be a bridge too far for some)
Workflow software: elegant but esoteric & nobody uses them
Virtual machine: isolated & intelligible but heavyweight & black box
Linux containers: in their infancy…
Operating-system-level virtualization: a very lightweight Linux VM
My use is inspired by the rocker project of Carl Boettiger & Dirk Eddelbuettel, and by Carl's paper (Boettiger 2014)
Optimized at the level of single applications; for me, these are RStudio and the shell
Less disruptive to established workflows; not a drain on my laptop; I can use my usual text editor, run RStudio in my web browser, etc.
Highly portable, giving identical environments across different machines; images can be snapshotted
Reusing and remixing images is trivial
Docker Hub gives free open hosting and continuous integration for images (builds can be linked to Dockerfiles hosted on GitHub, etc.)
Docker does not provide complete virtualization; it relies on the Linux kernel provided by the host
Docker is limited to 64-bit host machines
On Windows & OS X, Docker must still be run in a fully virtualized environment (VirtualBox). The boot2docker tool helps, but could be smoother
Potential security issues
Will Docker be significantly adopted by any scientific research or teaching community?
Increase visibility of scripted analyses to reduce culture shock
Create opportunities with minimal inessential weirdness for students to learn & peers to familiarize (Software Carpentry, Open Methods)
Research project as R package (rather than RProj, scripts, etc.)
Document dependencies with Dockerfile and use Docker as a common computational environment for research and teaching
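A Dockerfile that documents a project's dependencies in this way might look something like the minimal sketch below. It assumes the rocker/rstudio base image from the rocker project and a research project already structured as an R package; the package list and the path `/home/rstudio/mypaper` are hypothetical placeholders, not taken from the project repository.

```dockerfile
# Minimal sketch: assumes the rocker/rstudio base image (RStudio Server on Debian)
FROM rocker/rstudio

# Install the CRAN packages the analysis depends on (hypothetical list);
# install2.r is the littler helper script shipped in rocker images,
# and --error makes the build fail if any package fails to install
RUN install2.r --error \
    dplyr \
    ggplot2 \
    knitr

# Copy the research project (assumed to be an R package with a DESCRIPTION file)
# into the default rstudio user's home directory; the path is hypothetical
COPY . /home/rstudio/mypaper

# Install the project package itself so its functions are available in RStudio
RUN R CMD INSTALL /home/rstudio/mypaper
```

Such an image could then be built with `docker build -t mypaper .` and started with `docker run -d -p 8787:8787 -e PASSWORD=choose_a_password mypaper`, after which RStudio is reachable in a web browser at http://localhost:8787 (the image name `mypaper` is again hypothetical; recent rocker/rstudio images read the login password from the `PASSWORD` variable). Pinning a versioned tag of the base image, rather than relying on the latest one, would make the environment more stable over time.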
Presentation written in R Markdown using ioslides
Compiled into HTML5 using RStudio & knitr
Source code hosting: https://github.com/benmarwick/UW-eScience-reproducibility-collaboration
ORCID: http://orcid.org/0000-0001-7879-4531
Licensing:
K. Oberg, 1960. Cultural shock: Adjustment to new cultural environments. Practical Anthropology 7, pp. 177-182.
G.R. Weaver, 1994. Understanding and coping with cross-cultural adjustment stress. In: G.R. Weaver (Ed.), Culture, communication and conflict: Readings in intercultural relations. Ginn Press, Needham Heights, MA, pp. 169-189.
C. Boettiger, 2014. An introduction to Docker for reproducible research, with examples from the R environment. arXiv:1410.0846. http://arxiv.org/abs/1410.0846