Transparent and Open Archaeological Research Using R

A Short Workshop at the Society for American Archaeology Annual Meeting, Albuquerque Convention Center

April 10, 2019

1-5 pm

Instructors: Ben Marwick (University of Washington)

Helpers: Matt Harris (AECOM), Liying Wang (University of Washington), Clemens Schmid (RGZM)

General Information

In recent years serious concerns about the reproducibility and transparency of research have arisen in many scientific disciplines. These concerns reveal a wide gap between scientific practice and scientific ideals, and threaten to erode public support for research. In this workshop we will provide hands-on training in robust techniques, tools and services (all free) to improve the reproducibility and transparency of archaeological research. Most of these tools relate to the R programming language, which is central to recent developments in social and natural sciences.

This workshop is suited to novices who have never used R before: no prior experience is necessary. The course is aimed at archaeologists doing research at all career stages.

Where: 401 2nd St NW, Albuquerque, NM 87102. Get directions with OpenStreetMap or Google Maps. 🌏

When: April 10, 2019. Add to your Google Calendar. 📅

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below). If you have previously installed these programs, please download and install the most recent versions (your version may be outdated and not work with the activities in this workshop). If you have problems or questions, please send us an email at bmarwick@uw.edu. Participants are also required to abide by our Code of Conduct.

Contact: Please email bmarwick@uw.edu for more information. ✉️

How to register

To participate in the workshop, you need to register for the SAA meeting. While completing the registration steps on the SAA website, you need to browse for this workshop among the items listed on Wednesday, 10 April 2019. When you find it, click 'add' to select the workshop, then click 'proceed to checkout'. Registration for the meeting closes on March 12, 2019. The fees for the workshop are set by the SAA to cover the room hire, audiovisual equipment hire, and internet access for participants. All of the instructors are volunteers.

Schedule

Start time   End time   Topic
1:00         1:45       Introduction to R and RStudio
2:00         2:45       Writing with RMarkdown
3:00         3:45       Git & GitHub
4:00         4:45       Data repositories & Open Science Framework
4:45         5:00       Good enough practices

Each of the first three sessions is followed by a 15-minute break for fresh air and a stretch. We will be using the tidyverse, a modern, unified collection of R packages designed for data science. For more in-depth coverage of many of the workshop topics, you may want to read R for Data Science by Hadley Wickham and Garrett Grolemund.


Syllabus

R & RStudio

  • Introduction to R
  • Working in RStudio
  • Using functions
  • Getting unstuck
  • Getting help

Writing with RMarkdown

  • Writing text and code together
  • Reading in data from Excel sheets
  • Making publication-quality plots
  • Captions, cross-references and citations
  • Creating beautiful Word or PDF documents

Git & GitHub

  • Creating a repository
  • Recording changes to files: add, commit, ...
  • Viewing changes: status, diff, ...
  • Ignoring files
  • Working on the web: clone, pull, push, ...
  • Resolving conflicts
  • Using Git in RStudio
  • Open licenses
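
As a rough preview of the commands named in this list, the sketch below runs a minimal local Git session in a throwaway directory. The file names (notes.md, scratch.tmp) and the identity values are placeholders chosen for illustration and are not part of the workshop materials. It covers only the local steps (creating a repository, ignoring files, add, commit, status, diff); clone, pull and push need a remote such as GitHub and are left for the session itself.

```shell
# Stop at the first error or unset variable.
set -eu

# Work in a temporary directory so no existing files are touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Creating a repository
git init -q

# Placeholder identity, stored only in this repository's config.
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# Ignoring files: anything ending in .tmp stays untracked.
echo "*.tmp" > .gitignore
echo "scratch" > scratch.tmp

# Recording changes: add, commit
echo "Site notes, first draft" > notes.md
git add .gitignore notes.md
git commit -q -m "Add notes and ignore rules"

# Viewing changes: status, diff
echo "A second line" >> notes.md
git status --short
git --no-pager diff --stat

# Summarise what happened: number of commits and the files Git is ignoring.
commit_count="$(git rev-list --count HEAD)"
ignored_files="$(git status --short --ignored | grep '^!!' || true)"
echo "commits: $commit_count"
echo "ignored: $ignored_files"
```

On Windows, these commands are intended to be run in Git Bash rather than in the command prompt.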

Data repositories & OSF

  • DOIs and metadata
  • Connecting GitHub to OSF
  • Snapshotting a version
  • Citing data repositories

Setup

To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

A list of common installation issues, maintained as a reference for instructors, is available on the Configuration Problems and Solutions wiki page and may be useful if you run into trouble.

Git

Git is a version control system that lets you track who changed what and when, and has options for easily updating a shared or public version of your code on github.com. You will need a supported web browser.

You will need an account at github.com for parts of the Git lesson. Basic GitHub accounts are free. We encourage you to create a GitHub account if you don't have one already. Please consider what personal information you'd like to reveal. For example, you may want to review these instructions for keeping your email address private provided at GitHub.

Windows

Video Tutorial
  1. Download the Git for Windows installer.
  2. Run the installer and follow the steps below:
    1. Click on "Next" four times (two times if you've previously installed Git). You don't need to change anything in the Information, location, components, and start menu screens.
    2. Select "Use the nano editor by default" and click on "Next".
    3. Keep "Use Git from the command line and..." selected and click on "Next". If you forget to do this, programs that you need for the workshop will not work properly. If this happens, rerun the installer and select the appropriate option.
    4. Click on "Next".
    5. Keep "Checkout Windows-style, commit Unix-style line endings" selected and click on "Next".
    6. Select "Use Windows' default console window" and click on "Next".
    7. Click on "Install".
    8. Click on "Finish".
  3. If your "HOME" environment variable is not set (or you don't know what this is):
    1. Open command prompt (Open Start Menu then type cmd and press [Enter])
    2. Type the following line into the command prompt window exactly as shown:

      setx HOME "%USERPROFILE%"

    3. Press [Enter]; you should see SUCCESS: Specified value was saved.
    4. Quit command prompt by typing exit then pressing [Enter]

This will provide you with both Git and Bash in the Git Bash program.

macOS

Please open the Terminal app, type git --version and press Enter/Return. If Git is not installed already, follow the instructions to Install the "command line developer tools". Don't click "Get Xcode", because that will take too long and is not necessary for our Git lesson. After installing these tools, there won't be anything in your /Applications folder, as they and Git are command line programs.

For older versions of OS X (10.5-10.8), use the most recent available installer labelled "snow-leopard" available here. Because this installer is not signed by the developer, you may have to right click (control click) on the .pkg file, click Open, and click Open in the pop-up dialog. You can watch a video tutorial about this case.

Linux

If Git is not already available on your machine you can try to install it via your distro's package manager. For Debian/Ubuntu run sudo apt-get install git and for Fedora run sudo dnf install git.

R & RStudio

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

macOS

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo dnf install R). Also, please install the RStudio IDE.
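
Whatever your operating system, it can be worth confirming before the workshop that the command-line tools are reachable. The sketch below is one hedged way to do that from a terminal (Git Bash on Windows); the helper name check_tool is made up for this example. Note that on Windows the R installer often does not add R to the PATH, so a "missing" result for R there does not necessarily mean the installation failed; opening RStudio is the more reliable check in that case.

```shell
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper name used only in this example.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# git and R are the command-line programs installed above;
# RStudio is a desktop application, so it is not checked here.
for tool in git R; do
  check_tool "$tool"
done
```

If a tool is reported as found, running it with --version (for example git --version or R --version) prints the installed version.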

Code of Conduct

We are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. However, we recognise that some groups in our community are subject to historical and ongoing discrimination, and may be vulnerable or disadvantaged. Membership in such a specific group can be on the basis of characteristics such as gender, sexual orientation, disability, physical appearance, body size, race, nationality, sex, colour, ethnic or social origin, pregnancy, citizenship, familial status, veteran status, genetic information, religion or belief, political or any other opinion, membership of a national minority, property, birth, age, or choice of text editor. We do not tolerate harassment of participants on the basis of these categories, or for any other reason.

Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent. Prohibited harassing behaviour includes but is not limited to:

  • written or verbal comments which have the effect of excluding people on the basis of membership of a specific group listed above
  • causing someone to fear for their safety, such as through

Behaviour not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. All interactions should be professional regardless of location: harassment is prohibited whether it occurs on- or offline, and the same standards apply to both.

Enforcement of the Code of Conduct will be respectful and not include any harassing behaviors.

Thank you for helping make this a welcoming, friendly community for all.

This code of conduct is an adaptation of the one used by the Software Carpentry Foundation and is a modified version of that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki. Contributors to this document: Adam Obeng, Aleksandra Pawlik, Bill Mills, Carol Willing, Erin Becker, Hilmar Lapp, Kara Woo, Karin Lagesen, Pauline Barmby, Sheila Miguez, Simon Waldman, Tracy Teal.

Further reading 📄👀

Eglen, S. J., Marwick, B., Halchenko, Y. O., Hanke, M., Sufi, S., Gleeson, P., … & Wachtler, T. (2017). Toward standard practices for sharing computer code and programs in neuroscience. Nature Neuroscience 20(6), 770-773. [DOI] [preprint] [PDF]

Marwick, B. 2017 Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory 24(2), 424-450. [DOI] [preprint] [code & data]

Marwick, B. 2017 Using R and Related Tools for Reproducible Research in Archaeology. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

Marwick, B., & Birch, S. 2018 A Standard for the Scholarly Citation of Archaeological Data as an Incentive to Data Sharing. Advances in Archaeological Practice 1-19. [DOI] [preprint] [PDF] [code & data]

Marwick, B., Boettiger, C., & Mullen, L. 2017 Packaging data analytical work reproducibly using R (and friends). The American Statistician [DOI] [preprint]

Marwick, B., d’Alpoim Guedes, J., Barton, C. M., Bates, L. A., Baxter, M., Bevan, A., Bollwerk, E. A., Bocinsky, R. K., Brughmans, T., Carter, A. K., Conrad, C., Contreras, D. A., Costa, S., Crema, E. R., Daggett, A., Davies, B., Drake, B. L., Dye, T. S., France, P., Fullagar, R., Giusti, D., Graham, S., Harris, M. D., Hawks, J., Heath, S., Huffer, D., Kansa, E. C., Kansa, S. W., Madsen, M. E., Melcher, J., Negre, J., Neiman, F. D., Opitz, R., Orton, D. C., Przystupa, P., Raviele, M., Riel-Salvatore, J., Riris, P., Romanowska, I., Smith, J., Strupler, N., Ullah, I. I., Van Vlack, H. G., VanValkenburgh, N., Watrall, E. C., Webster, C., Wells, J., Winters, J., & Wren, C. D. 2017 Open science in archaeology. SAA Archaeological Record 17(4), 8-14. [PDF] [preprint]

Ram, K., & Marwick, B. 2017 Building Towards a Future Where Reproducible, Open Science is the Norm. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

Rokem, A., Marwick, B., & Staneva, V. 2017 Assessing Reproducibility. In Kitzes, J., Turek, D., & Deniz, F. (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press. [online]

About the instructors 🍎

Ben Marwick is an Associate Professor of archaeology at the University of Washington. He studies Pleistocene archaeology in mainland Southeast Asia and Australia. He uses R in his day-to-day work and research publications, and has written extensively (including in Nature and the Journal of Archaeological Method and Theory) on the importance of using code to improve the reproducibility of research in archaeology and elsewhere. Ben is the convener of the SAA Open Science Interest Group and maintains an annotated list of R packages useful for archaeologists on GitHub.

Matt Harris is the Director of GIS, Data Analysis, & Geoarchaeology in the Cultural Resources Department of AECOM. He is an advanced R user, with a focus on spatial analysis and simulation. He documents many of his explorations using R on his blog. Matt is a member of the SAA Open Science Interest Group, and has previously taught R to archaeologists via the SAA Online workshops and in person.

Clemens Schmid is an early career archaeologist from Germany. He currently works on a research project about the archaeological site of Olympia (Greece) at the Römisch-Germanisches Zentralmuseum (RGZM) in Mainz. Clemens is an avid R developer and a founding member of the ISAAKiel working group. He has developed several R Packages, RStudio Addins and R Shiny Webapps and has conducted multiple workshops about R for archaeologists.