September 10-11, 2019
10 am - 4 pm
Instructors: Ben Marwick (University of Washington)
Helpers: Liying Wang (University of Washington)
In recent years serious concerns about the reproducibility and transparency of research have arisen in many scientific disciplines. These concerns reveal a wide gap between scientific practice and scientific ideals, and threaten to erode public support for research. In this workshop we will provide hands-on training in robust techniques, tools and services (all free) to improve the reproducibility and transparency of archaeological research. Most of these tools relate to the R programming language, which is central to recent developments in social and natural sciences.
This workshop is suited to novices who have never used R before: no prior experience is necessary. The course is aimed at archaeologists doing research at all career stages.
Where: 2-9-1 Nijo-cho, Nara-shi, Nara 630-8577, Japan (〒630-8577 奈良県奈良市二条町2丁目9−1). Get directions with OpenStreetMap or Google Maps. 🌏
When: September 10-11, 2019. Add to your Google Calendar. 📅
Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). If you have previously installed these programs, please download and install the most recent versions, because an outdated version may not work with the activities in this workshop. If you have problems or questions, please send us an email at bmarwick@uw.edu. Participants are also required to abide by our Code of Conduct.
Contact: Please email bmarwick@uw.edu for more information. ✉️
🎤 Lecture 2: “Open Access, Open Data, and Open Methods: Three steps to transparency that are redefining archaeological science” ⬇️ Download the slides from osf.io: PDF or pptx. Download the accompanying paper “Archaeological science and current trends in research publication, data management, and methods transparency and reproducibility” from osf.io: PDF
🎤 Lecture 3: “Introduction to collaborative reproducible research, the example of the Ocean Health Index” ⬇️ Download the slides from osf.io: PDF or pptx
Start time | End time | Topic |
---|---|---|
15:20 | 15:35 | Lecture: Introduction to Git and GitHub. Define key concepts such as remote, local, commit, push, pull, pull request |
15:35 | 15:50 | Activity: Create a GitHub account and follow some people |
15:50 | 16:10 | Activity: Learn to fork, commit, and pull request on GitHub. Add a new file, add text to that file. Look at commit history and blame view on GitHub |
16:10 | 17:00 | Activity: Learn to collaborate with Git & RStudio. Fork, clone, commit, identify and resolve merge conflicts |
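The commit and merge-conflict steps in the last activity can be rehearsed locally, without GitHub. The following is only a minimal sketch, assuming Git is installed and a Bash-like shell is available (for example Git Bash on Windows). The file name `notes.txt`, the branch name `collaborator`, and the identity values are made up for illustration and are not part of the workshop materials.

```shell
#!/usr/bin/env bash
set -eu

# Work in a throwaway directory so nothing else on your machine is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init --quiet
# Identity for this demo repository only (placeholder values).
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# First commit: add a new file with one line of text.
echo "Site A: 12 sherds" > notes.txt
git add notes.txt
git commit --quiet -m "Add field notes"

# Remember the name of the default branch (it varies between Git setups).
main_branch="$(git symbolic-ref --short HEAD)"

# A collaborator's branch changes the line...
git checkout --quiet -b collaborator
echo "Site A: 15 sherds" > notes.txt
git commit --quiet -am "Recount sherds"

# ...while the default branch changes the same line differently.
git checkout --quiet "$main_branch"
echo "Site A: 12 sherds, 3 lithics" > notes.txt
git commit --quiet -am "Add lithics count"

# Merging now stops because both branches edited the same line.
if git merge collaborator >/dev/null 2>&1; then
  merge_status="clean"
else
  merge_status="conflict"
fi

# Resolve the conflict by writing the agreed text, then commit the merge.
echo "Site A: 15 sherds, 3 lithics" > notes.txt
git add notes.txt
git commit --quiet -m "Resolve conflict in sherd counts"

commit_count="$(git rev-list --count HEAD)"
echo "merge status: $merge_status"
echo "commits reachable from $main_branch: $commit_count"
```

In a real collaboration the two sets of changes would come from different people through a fork or a shared remote, and you would normally resolve the conflict by editing the conflict markers Git writes into the file rather than overwriting it.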
Between each topic we will have a short break for fresh air and a stretch. We will be using Jenny Bryan's Happy Git with R book as our guide and reference. For more in-depth coverage of many of the workshop topics, please refer to that text. Here are some further readings on Git for science:
Start time | End time | Topic |
---|---|---|
10:00 | 11:00 | Introduction to R and RStudio, including customising our .Rprofile file with git config and setting our GitHub PAT |
11:00 | 11:15 | Run `rrtools::use_compendium("pkgname")` and edit our DESCRIPTION file |
11:15 | 11:30 | Run `usethis::use_mit_license(name = "My Name")` and discuss license choices |
11:30 | 12:00 | Run `usethis::use_git()` then `usethis::use_github()` |
12:00 | 12:15 | Run `rrtools::use_readme_rmd()` and discuss CONDUCT.md: a code of conduct for users, CONTRIBUTING.md: basic instructions for people who want to contribute to our compendium, and issue templates in GitHub repository settings |
12:15 | 12:30 | Run `rrtools::use_analysis()` and discuss `usethis::edit_git_ignore()` |
12:30 | 13:30 | Lunch 🍱 |
13:30 | 14:30 | Writing the `paper.Rmd`: code chunks and controlling their output, inline R code, discuss templates provided by the rticles package |
14:30 | 15:00 | Writing the `paper.Rmd`: figures, tables, captions, cross-refs, citations. Discuss references.bib, csl files, Zotero, and updating the DESCRIPTION file with `rrtools::add_dependencies_to_description()` |
15:00 | 15:30 | Containerisation and continuous integration using Binder, Docker, and Travis |
15:30 | 16:00 | Archiving our research compendium with a DOI at the Open Science Framework, discussion of the osfr R pkg to manage large files |
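The first session above mentions git config. As a rough illustration of what that step records, the sketch below writes a name and email to a scratch configuration file and reads them back. It assumes Git is installed; the name, the email address, and the scratch file are placeholders, and the workshop itself may do this step from within R instead.

```shell
#!/usr/bin/env bash
set -eu

# Use a scratch config file so your real ~/.gitconfig is left untouched.
# On your own machine you would replace `--file "$scratch_config"` with `--global`.
scratch_config="$(mktemp)"

# Record the identity that Git attaches to each commit (placeholder values).
git config --file "$scratch_config" user.name "Jane Doe"
git config --file "$scratch_config" user.email "jane@example.com"

# Read the values back to confirm they were stored.
configured_name="$(git config --file "$scratch_config" user.name)"
configured_email="$(git config --file "$scratch_config" user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"
```

Using the same email address here as on your GitHub account lets GitHub link your commits to your profile.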
To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.
We maintain a list of common issues that occur during installation on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.
Git is a version control system that lets you track who made changes to what when and has options for easily updating a shared or public version of your code on github.com. You will need a supported web browser.
You will need an account at github.com for parts of the Git lesson. Basic GitHub accounts are free. We encourage you to create a GitHub account if you don't have one already. Please consider what personal information you'd like to reveal. For example, you may want to review these instructions for keeping your email address private provided at GitHub.
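To set the HOME environment variable on Windows, open a command prompt (type cmd and press [Enter]), then type
setx HOME "%USERPROFILE%"
and press [Enter]. You should see
SUCCESS: Specified value was saved.
Quit the command prompt by typing exit and then pressing [Enter].
This will provide you with both Git and Bash in the Git Bash program.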
Please open the Terminal app, type `git --version` and press Enter/Return. If it's not installed already, follow the instructions to Install the "command line developer tools". Don't click "Get Xcode", because that will take too long and is not necessary for our Git lesson. After installing these tools, there won't be anything in your /Applications folder, as they and Git are command line programs.
For older versions of OS X (10.5-10.8) use the most recent available installer labelled "snow-leopard" available here. Because this installer is not signed by the developer, you may have to right click (control click) on the .pkg file, click Open, and click Open in the pop-up dialog. You can watch a video tutorial about this case.
If Git is not already available on your machine you can try to install it via your distro's package manager. For Debian/Ubuntu run `sudo apt-get install git` and for Fedora run `sudo dnf install git`.
R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.
Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.
You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run `sudo apt-get install r-base` and for Fedora run `sudo dnf install R`). Also, please install the RStudio IDE.
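Once everything is installed, a quick check like the one below can confirm that the command-line tools are visible to your shell. This is only a sketch: `check_tool` is a hypothetical helper written for this example, and it assumes a Bash-like shell (Terminal on macOS or Linux, Git Bash on Windows).

```shell
#!/usr/bin/env bash

# Report whether a command-line tool can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

# Check the two command-line programs this workshop relies on.
for tool in git R; do
  echo "$tool: $(check_tool "$tool")"
done
```

A "missing" result for R on Windows does not necessarily mean the installation failed, because the R installer may not add R to the PATH. RStudio is a desktop application, so it is not covered by this check; opening it is the simplest test.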
Behaviour not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. All interactions should be professional regardless of location: harassment is prohibited whether it occurs on- or offline, and the same standards apply to both.
Enforcement of the Code of Conduct will be respectful and not include any harassing behaviors.
Thank you for helping make this a welcoming, friendly community for all.
This code of conduct is an adaptation of the one used by the Software Carpentry Foundation and is a modified version of that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki. Contributors to this document: Adam Obeng, Aleksandra Pawlik, Bill Mills, Carol Willing, Erin Becker, Hilmar Lapp, Kara Woo, Karin Lagesen, Pauline Barmby, Sheila Miguez, Simon Waldman, Tracy Teal.
Eglen, S. J., Marwick, B., Halchenko, Y. O., Hanke, M., Sufi, S., Gleeson, P., … & Wachtler, T. (2017). Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience 20(6), 770-773. [DOI] [preprint] [PDF]
Marwick, B. 2017 Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory 24(2), 424-450. [DOI] [preprint] [code & data]
Marwick, B. 2017 Using R and Related Tools for Reproducible Research in Archaeology. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]
Marwick, B., & Birch, S. 2018 A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing. Advances in Archaeological Practice 1-19. [DOI] [preprint] [PDF] [code & data]
Marwick, B., Boettiger, C., & Mullen, L. 2017 Packaging data analytical work reproducibly using R (and friends). The American Statistician [DOI] [preprint]
Marwick, B., d’Alpoim Guedes, J., Barton, C. M., Bates, L. A., Baxter, M., Bevan, A., Bollwerk, E. A., Bocinsky, R. K., Brughmans, T., Carter, A. K., Conrad, C., Contreras, D. A., Costa, S., Crema, E. R., Daggett, A., Davies, B., Drake, B. L., Dye, T. S., France, P., Fullagar, R., Giusti, D., Graham, S., Harris, M. D., Hawks, J., Heath, S., Huffer, D., Kansa, E. C., Kansa, S. W., Madsen, M. E., Melcher, J., Negre, J., Neiman, F. D., Opitz, R., Orton, D. C., Przystupa, P., Raviele, M., Riel-Salvatore, J., Riris, P., Romanowska, I., Smith, J., Strupler, N., Ullah, I. I., Van Vlack, H. G., VanValkenburgh, N., Watrall, E. C., Webster, C., Wells, J., Winters, J., and Wren, C. D. (2017) Open science in archaeology. SAA Archaeological Record, 17(4), pp. 8-14. [PDF] [preprint]
Ram, K., & Marwick, B. 2017 Building Towards a Future Where Reproducible, Open Science is the Norm. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]
Rokem, A., Marwick, B., & Staneva, V. 2017 Assessing Reproducibility. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]
Liying Wang is a PhD student of archaeology at the University of Washington. Her PhD research focuses on European culture contact and its impact on indigenous societies in northeastern Taiwan. She recently started using R to analyze archaeological data. She will be a helper in this workshop.