Transparent and Reproducible Archaeological Research Using Open Data Science Tools

Lectures and Workshops at the Nara National Research Institute for Cultural Properties, Japan

September 10-11, 2019

10 am - 4 pm

Instructors: Ben Marwick (University of Washington)

Helpers: Liying Wang (University of Washington)

General Information

In recent years serious concerns about the reproducibility and transparency of research have arisen in many scientific disciplines. These concerns reveal a wide gap between scientific practice and scientific ideals, and threaten to erode public support for research. In this workshop we will provide hands-on training in robust techniques, tools and services (all free) to improve the reproducibility and transparency of archaeological research. Most of these tools relate to the R programming language, which is central to many recent developments in the social and natural sciences.

This workshop is suited to novices who have never used R before: no prior experience is necessary. The course is aimed at archaeologists doing research at all career stages.

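Where: Nara National Research Institute for Cultural Properties, 2-9-1 Nijō-chō, Nara City, Nara 630-8577 (〒630-8577 奈良県奈良市二条町2丁目9−1). Get directions with OpenStreetMap or Google Maps. 🌏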

When: September 10-11, 2019. Add to your Google Calendar. 📅

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). If you have previously installed these programs, please download and install the most recent versions (your version may be outdated and may not work with the activities in this workshop). If you have problems or questions, please send us an email at bmarwick@uw.edu. Participants are also required to abide by our Code of Conduct.

Contact: Please email bmarwick@uw.edu for more information. ✉️


Lectures

🎤 Lecture 1: "Modern tools and approaches to scientific data management, analysis, visualization, and collaboration in archaeology and cultural heritage" ⬇️ Download the slides from osf.io: PDF or pptx. Download the R code and data used in the demonstration from osf.io: zip

🎤 Lecture 2: “Open Access, Open Data, and Open Methods: Three steps to transparency that are redefining archaeological science” ⬇️ Download the slides from osf.io: PDF or pptx. Download the accompanying paper “Archaeological science and current trends in research publication, data management, and methods transparency and reproducibility” from osf.io: PDF

🎤 Lecture 3: "Introduction to collaborative reproducible research, the example of the Ocean Health Index" ⬇️ Download the slides from osf.io: PDF or pptx

Workshop Schedule

10 Sept 15:20-17:00 Git for Archaeological Science

⬇️ Download the slides from osf.io: PDF or pptx

Start time End time Topic
15:20 15:35 Lecture: Introduction to Git and GitHub. Define key concepts such as remote, local, commit, push, pull, pull request
15:35 15:50 Activity: Create a GitHub account and follow some people
15:50 16:10 Activity: learn to fork, commit, and pull request on GitHub. Add a new file, add text to that file. Look at commit history and blame view on GitHub
16:10 17:00 Activity: learn to collaborate with Git & RStudio. Fork, clone, commit, identify and resolve merge conflicts
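The cycle covered in the 15:20 lecture and the 16:10 activity (clone, commit, push, pull, and resolving a merge conflict) can also be rehearsed entirely on your own computer. The following is a minimal sketch, assuming Git is installed and a Bash-style shell is available (Git Bash on Windows). A bare repository in a temporary folder stands in for the GitHub remote, and the collaborators Alice and Bob, the file notes.txt, and the branch name main are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the local/remote cycle: clone, commit, push, pull, and a merge
# conflict. A bare repository in a temporary folder stands in for GitHub.
set -euo pipefail

work="$(mktemp -d)"
cd "$work"

# The "remote": a bare repository (no working files), like a repo on GitHub
git init --quiet --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Alice clones the (still empty) remote, commits a file, and pushes it
git clone --quiet remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/main   # use "main" as the branch name
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Site: Example cave" > notes.txt
git add notes.txt
git commit --quiet -m "Add site notes"
git push --quiet origin main

# Bob clones the remote and so receives Alice's commit
cd "$work"
git clone --quiet remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"

# Alice changes line 1 and pushes first
cd "$work/alice"
echo "Site: Example cave (Layer 2)" > notes.txt
git commit --quiet -am "Describe layer"
git push --quiet origin main

# Bob changes the same line locally, then pulls (fetch + merge)
cd "$work/bob"
echo "Site: Example rockshelter" > notes.txt
git commit --quiet -am "Rename site"
if ! git pull --no-rebase --quiet origin main; then
  echo "Merge conflict in: $(git diff --name-only --diff-filter=U)"
  cat notes.txt   # shows the <<<<<<< / ======= / >>>>>>> conflict markers
fi

# Resolve by writing the version we want, then commit the merge and push
echo "Site: Example rockshelter (Layer 2)" > notes.txt
git add notes.txt
git commit --quiet -m "Merge: resolve site name"
git push --quiet origin main

# Alice pulls the resolved history (a fast-forward for her)
cd "$work/alice"
git pull --no-rebase --quiet origin main
cat notes.txt
git log --oneline --graph
```

In the workshop the remote is a GitHub repository and the activity is done through RStudio, so treat these commands as a command-line view of the same ideas rather than the exact steps we will follow. Resolving a conflict simply means replacing the marked-up lines with the text you actually want, then committing the result.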

Between topics we will have a short break for fresh air and a stretch. We will be using Jenny Bryan's Happy Git with R book as our guide and reference. For more in-depth coverage of many of the topics of the workshop, please refer to that text. Here are some further readings on Git for science:

11 Sept 10:00-16:00 Writing Reproducible Research with R and rrtools

⬇️ View the slides on GitHub or view the R Markdown source document. View the detailed step-by-step instructions at the rrtools GitHub repository. Download the example compendium containing the `paper.Rmd` file used for the demonstration from osf.io: zip

Start time End time Topic
10:00 11:00 Introduction to R and RStudio, including customising our .Rprofile file with our Git configuration and setting our GitHub PAT (personal access token)
11:00 11:15 Run `rrtools::use_compendium("pkgname")` and edit our DESCRIPTION file
11:15 11:30 Run `usethis::use_mit_license(name = "My Name")` and discuss license choices
11:30 12:00 Run `usethis::use_git()` then `usethis::use_github()`
12:00 12:15 Run `rrtools::use_readme_rmd()` and discuss CONDUCT.md (a code of conduct for users), CONTRIBUTING.md (basic instructions for people who want to contribute to our compendium), and issue templates in the GitHub repository settings
12:15 12:30 Run `rrtools::use_analysis()` and discuss `usethis::edit_git_ignore()`
12:30 13:30 Lunch 🍱
13:30 14:30 Writing the `paper.Rmd`: code chunks and controlling their output, inline R code, discuss templates provided by the rticles package
14:30 15:00 Writing the `paper.Rmd`: figures, tables, captions, cross-refs, citations. Discuss references.bib, csl files, Zotero, and updating the DESCRIPTION file with `rrtools::add_dependencies_to_description()`
15:00 15:30 Containerisation and continuous integration using Binder, Docker, and Travis
15:30 16:00 Archiving our research compendium with a DOI at the Open Science Framework, and discussion of the osfr R package for managing large files
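The 10:00 session handles Git configuration from within R. For reference, the same identity settings can be made from a shell, as in the minimal sketch below, assuming Git is installed; the name and email are placeholders. From R, the usethis package offers `use_git_config()` for the same job. Setting the GitHub PAT is not shown here. If you want to keep your address private, GitHub can supply a no-reply email address to use instead.

```shell
#!/usr/bin/env bash
# Sketch: tell Git who you are; every commit records this name and email.
# "Jane Doe" and the address are placeholders: substitute your own before
# running, because --global changes the settings for your user account.
set -euo pipefail

git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# Print the values Git will now use
git config --global --get user.name
git config --global --get user.email
```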
Between topics we will have a short break for fresh air and a stretch. For more in-depth coverage of many of the topics of the workshop, here are some further readings on writing reproducible research; see also our reading list below:



Setup

To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

Git

Git is a version control system that lets you track who made which changes and when, and it makes it easy to update a shared or public version of your code on github.com. You will need a supported web browser.
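To make "who made which changes and when" concrete, here is a minimal sketch, assuming Git is installed (see the instructions for your platform below) and a Bash-style shell such as Git Bash. It builds a throwaway repository with two hypothetical authors, Aiko and Kenji, and asks Git for the commit history and the line-by-line blame view, the command-line counterparts of the history and blame views on GitHub used in the workshop.

```shell
#!/usr/bin/env bash
# Sketch: asking Git who changed what, and when, in a throwaway repository.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# First (hypothetical) author records two lines; -c sets the identity
# for this one command only
printf 'Layer 1: charcoal\nLayer 2: flakes\n' > context.txt
git add context.txt
git -c user.name="Aiko" -c user.email="aiko@example.com" \
  commit --quiet -m "Record layers"

# Second (hypothetical) author edits line 2 only
printf 'Layer 1: charcoal\nLayer 2: flakes and cores\n' > context.txt
git -c user.name="Kenji" -c user.email="kenji@example.com" \
  commit --quiet -am "Add cores to layer 2"

# Who made each commit, when, and why (author, date, message)
git log --date=iso --format='%an  %ad  %s'

# For every line: the commit, author and date that last changed it
git blame context.txt
```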

You will need an account at github.com for parts of the Git lesson. Basic GitHub accounts are free. We encourage you to create a GitHub account if you don't have one already. Please consider what personal information you'd like to reveal. For example, you may want to review GitHub's instructions for keeping your email address private.

Windows

Video Tutorial
  1. Download the Git for Windows installer.
  2. Run the installer and follow the steps below:
    1. Click on "Next" four times (two times if you've previously installed Git). You don't need to change anything in the Information, location, components, and start menu screens.
    2. Select “Use the nano editor by default” and click on “Next”.
    3. Keep "Use Git from the command line and..." selected and click on "Next". If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
    4. Click on "Next".
    5. Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
    6. Select "Use Windows' default console window" and click on "Next".
    7. Click on "Install".
    8. Click on "Finish".
  3. If your "HOME" environment variable is not set (or you don't know what this is):
    1. Open the command prompt (open the Start menu, then type cmd and press [Enter])
    2. Type the following line into the command prompt window exactly as shown:

      setx HOME "%USERPROFILE%"

    3. Press [Enter]; you should see SUCCESS: Specified value was saved.
    4. Quit the command prompt by typing exit and then pressing [Enter]

This will provide you with both Git and Bash in the Git Bash program.

macOS

Please open the Terminal app, type git --version and press Enter/Return. If Git is not installed already, follow the instructions to install the "command line developer tools". Don't click "Get Xcode", because that will take too long and is not necessary for our Git lesson. After installing these tools, there won't be anything in your /Applications folder, as they and Git are command line programs. For older versions of OS X (10.5-10.8), use the most recent installer labelled "snow-leopard", available here. Because this installer is not signed by the developer, you may have to right-click (Control-click) on the .pkg file, click Open, and then click Open in the pop-up dialog. You can watch a video tutorial about this case.

Linux

If Git is not already available on your machine, you can try to install it via your distribution's package manager. For Debian/Ubuntu, run sudo apt-get install git, and for Fedora, run sudo dnf install git.

R & RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

macOS

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo dnf install R). Also, please install the RStudio IDE.
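Once the installers have finished, it can help to confirm that Git and R are reachable from a terminal. Below is a minimal sketch for Linux, assuming a Bash-style shell; `check_tool` is a hypothetical helper, and the package names are the ones given above. RStudio is not checked, since it is usually started from the desktop.

```shell
#!/usr/bin/env bash
# Sketch: check that Git and R are on the PATH; if one is missing, print the
# install command for the package managers mentioned above.
set -euo pipefail

# check_tool COMMAND APT_PACKAGE DNF_PACKAGE  (hypothetical helper)
check_tool() {
  local cmd="$1" apt_pkg="$2" dnf_pkg="$3"
  if command -v "$cmd" >/dev/null 2>&1; then
    # First line of the tool's version report
    "$cmd" --version | sed -n '1p'
  elif command -v apt-get >/dev/null 2>&1; then
    echo "$cmd not found; install it with: sudo apt-get install $apt_pkg"
  elif command -v dnf >/dev/null 2>&1; then
    echo "$cmd not found; install it with: sudo dnf install $dnf_pkg"
  else
    echo "$cmd not found; install it with your distribution's package manager"
  fi
}

check_tool git git git
check_tool R r-base R
```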

Code of Conduct

We are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. However, we recognise that some groups in our community are subject to historical and ongoing discrimination, and may be vulnerable or disadvantaged. Membership in such a specific group can be on the basis of characteristics such as gender, sexual orientation, disability, physical appearance, body size, race, nationality, sex, colour, ethnic or social origin, pregnancy, citizenship, familial status, veteran status, genetic information, religion or belief, political or any other opinion, membership of a national minority, property, birth, age, or choice of text editor. We do not tolerate harassment of participants on the basis of these categories, or for any other reason.

Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent.

Prohibited harassing behaviour includes but is not limited to: written or verbal comments which have the effect of excluding people on the basis of membership of a specific group listed above; causing someone to fear for their safety, such as through

Behaviour not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. All interactions should be professional regardless of location: harassment is prohibited whether it occurs on- or offline, and the same standards apply to both.

Enforcement of the Code of Conduct will be respectful and will not include any harassing behaviour.

Thank you for helping make this a welcoming, friendly community for all.

This code of conduct is an adaptation of the one used by the Software Carpentry Foundation and is a modified version of that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki. Contributors to this document: Adam Obeng, Aleksandra Pawlik, Bill Mills, Carol Willing, Erin Becker, Hilmar Lapp, Kara Woo, Karin Lagesen, Pauline Barmby, Sheila Miguez, Simon Waldman, Tracy Teal.

Further reading 📄👀

Eglen, S. J., Marwick, B., Halchenko, Y. O., Hanke, M., Sufi, S., Gleeson, P., … & Wachtler, T. (2017). Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience 20(6), 770-773. [DOI] [preprint] [PDF]

Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory 24(2), 424-450. [DOI] [preprint] [code & data]

Marwick, B. (2017). Using R and Related Tools for Reproducible Research in Archaeology. In Kitzes, J., Turek, D., & Deniz, F. (Eds.), The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

Marwick, B., & Birch, S. (2018). A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing. Advances in Archaeological Practice, 1-19. [DOI] [preprint] [PDF] [code & data]

Marwick, B., Boettiger, C., & Mullen, L. (2017). Packaging data analytical work reproducibly using R (and friends). The American Statistician. [DOI] [preprint]

Marwick, B., d’Alpoim Guedes, J., Barton, C. M., Bates, L. A., Baxter, M., Bevan, A., Bollwerk, E. A., Bocinsky, R. K., Brughmans, T., Carter, A. K., Conrad, C., Contreras, D. A., Costa, S., Crema, E. R., Daggett, A., Davies, B., Drake, B. L., Dye, T. S., France, P., Fullagar, R., Giusti, D., Graham, S., Harris, M. D., Hawks, J., Heath, S., Huffer, D., Kansa, E. C., Kansa, S. W., Madsen, M. E., Melcher, J., Negre, J., Neiman, F. D., Opitz, R., Orton, D. C., Przystupa, P., Raviele, M., Riel-Salvatore, J., Riris, P., Romanowska, I., Smith, J., Strupler, N., Ullah, I. I., Van Vlack, H. G., VanValkenburgh, N., Watrall, E. C., Webster, C., Wells, J., Winters, J., & Wren, C. D. (2017). Open science in archaeology. SAA Archaeological Record 17(4), 8-14. [PDF] [preprint]

Ram, K., & Marwick, B. (2017). Building Towards a Future Where Reproducible, Open Science is the Norm. In Kitzes, J., Turek, D., & Deniz, F. (Eds.), The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

Rokem, A., Marwick, B., & Staneva, V. (2017). Assessing Reproducibility. In Kitzes, J., Turek, D., & Deniz, F. (Eds.), The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

About the instructors 🍎

Ben Marwick is an Associate Professor of archaeology at the University of Washington. He studies Pleistocene archaeology in mainland Southeast Asia and Australia. He uses R in his day-to-day work and research publications, and has written extensively (including in Nature and the Journal of Archaeological Method and Theory) on the importance of using code to improve the reproducibility of research in archaeology and elsewhere. Ben is the convener of the SAA Open Science Interest Group and maintains an annotated list of R packages useful for archaeologists on GitHub.

Liying Wang is a PhD student in archaeology at the University of Washington. Her PhD research focuses on European culture contact and its impact on indigenous societies in northeastern Taiwan. She recently started using R to analyze archaeological data. She will be a helper in this workshop.