Archaeological Data Science Using R: A Short Workshop

Smithsonian Institution, National Museum of Natural History

13 April 2018

9:30 am - 1:00 pm

Instructors: Ben Marwick (University of Washington)

Helpers: Matt Harris (AECOM), Jon Clindaniel (Harvard University)

General Information

This workshop is for any archaeologist who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data. It is loosely based on the Data Carpentry curriculum (which is highly suitable for self-guided learning).

If you want to do data analysis and visualization more efficiently and with less pain, R can help. R is also excellent for reproducible research, a cornerstone of scientific archaeology. Writing scripts in R makes it easy to keep track of your work, easy to redo analyses on new data, and easy to share your work with others. We will introduce common archaeological data analysis and visualisation tasks using modern and widely used tools in the R programming language. Participants should bring their laptops and plan to participate actively. By the end of the workshop learners should be able to more effectively manage and analyze data and be able to apply the tools and approaches directly to their ongoing research.

Who: The course is aimed at archaeologists at all career stages.

The course is intended for archaeologists with no prior experience with R, statistics, or any other programming language.

Some basic familiarity with spreadsheets would be helpful.

Where: 10th Street and Constitution Avenue NW, Washington, DC. Get directions with OpenStreetMap or Google Maps. Our workshop will be held in the NMNH - SIL Training Room - CE107. This is a secure building, and we will issue visitor name badges at the door at 9:15 am so you can get in and out during the workshop. Please bring a print-out of your Eventbrite ticket to ensure you can get into the workshop. Drinks are allowed, but not food (small snacks are ok).

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. You should have a few specific software packages installed (listed below). You are also required to abide by our Code of Conduct.

Contact: Please email bmarwick@uw.edu for more information.


Preliminary Schedule

An encyclopedic treatment of the topics in the schedule is beyond the scope of this short workshop. Our goal is to use real-world archaeological data and working R code to familiarize participants with how they can get started doing these things in R by themselves.

Start time  End time  Topic
9:30        9:45      Introduction to R and RStudio
9:45        10:00     Importing, inspecting & cleaning data from spreadsheets
10:00       10:15     Exploratory Data Analysis with dplyr & tidyr
10:15       10:30     Exploratory Data Analysis with dplyr & tidyr
10:30       10:45     Visualising data with ggplot & plotly
10:45       11:00     Visualising data with ggplot & plotly
11:00       11:15     Break
11:15       11:30     Hypothesis testing: two samples
11:30       11:45     Hypothesis testing: more than two samples
11:45       12:00     Hypothesis testing: count data
12:00       12:15     Break
12:15       12:30     Importing & mapping GIS data with sf & Google Maps
12:30       12:45     Spatial data analysis: spatial joins
12:45       1:00      Spatial data analysis: point pattern analysis

We will be using the tidyverse, a modern, unified collection of R packages designed for data science. For more in-depth coverage of many of the workshop topics, you may want to read R for Data Science by Hadley Wickham and Garrett Grolemund.


Setup

To participate in this workshop, you will need working copies of R and RStudio. Please make sure to install everything (or at least to download the installers) before the start of the workshop. If you have previously installed these programs, please download and install the most recent versions (your version may be outdated and not work with the activities in this workshop). Participants should bring and use their own laptops to ensure the proper setup of tools for an efficient workflow once they leave the workshop. If you have problems or questions, please send us an email at bmarwick@uw.edu.

Please follow these Setup Instructions, using the section for your operating system.

Windows

Please go through all the installation steps below, and make sure that you not only install the programs but also start them up to check that they are working. If you have any problems, don't hesitate to email the instructors to ask for help, or arrive early on the day of the workshop to get help.

  1. A spreadsheet program
    For this workshop you will need a spreadsheet program. Many people already have Microsoft Excel installed, and if you do, you're set!
    If you need a spreadsheet program, there are a few other options, like OpenOffice and LibreOffice. Install instructions for LibreOffice, which is free and open source, are below.
    • Download the Installer
      Install LibreOffice by going to the installation page. The version for Windows should automatically be selected. Click the "Download Version x.y.z" button below "Main Installer". You will go to a page that asks about a donation, but you don't need to make one. Your download should begin automatically.
    • Install LibreOffice
      Once the installer is downloaded, double click on it and it should install.
    • To use LibreOffice, double click on the icon and it will open.

  2. R
    In the workshop, we will use RStudio. RStudio is a nice interface to the programming language R. To use RStudio, you need to install both R and RStudio.
    • Download R from here
    • Run the .exe file that was just downloaded
    • Go to the RStudio Download page
    • Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10
    • Double click the file to install it
    • Once it's installed, open RStudio to make sure it works and you don't get any error messages.

Mac

Please go through all the installation steps below, and make sure that you not only install the programs but also start them up to check that they are working. If you have any problems, don't hesitate to email the instructors to ask for help, or arrive early on the day of the workshop to get help.

  1. A spreadsheet program
    For this workshop you will need a spreadsheet program. Many people already have Microsoft Excel installed, and if you do, you're set!
    If you need a spreadsheet program, there are a few other options, like OpenOffice and LibreOffice. Install instructions for LibreOffice, which is free and open source, are below.
    • Download the Installer
      Install LibreOffice by going to the installation page. The version for Mac should automatically be selected. Click the "Download Version x.y.z" button below "Main Installer". You will go to a page that asks about a donation, but you don't need to make one. Your download should begin automatically.
    • Install LibreOffice
      Once the installer is downloaded, double click on it and it should install.
    • To use LibreOffice, double click on the icon and it will open.

  2. R
    In the workshop, we will use RStudio. RStudio is a nice interface to the programming language R. To use RStudio, you need to install both R and RStudio.
    • Go to CRAN and click on Download R for (Mac) OS X
    • Select the .pkg file for the version of OS X that you have and the file will download.
    • Double click on the file that was downloaded and R will install
    • Go to the RStudio Download page
    • Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) to download it.
    • Once it's downloaded, double click the file to install it
    • Once it's installed, open RStudio to make sure it works and you don't get any error messages.

Linux

Please go through all the installation steps below, and make sure that you not only install the programs but also start them up to check that they are working. If you have any problems, don't hesitate to email the instructors to ask for help, or arrive early on the day of the workshop to get help.

  1. A spreadsheet program
    For this workshop you will need a spreadsheet program. LibreOffice comes preinstalled with several Linux distributions. If you don't already have it, use your package manager to install it (e.g., sudo apt-get install libreoffice on Ubuntu and other Debian-based distributions).

  2. R
    In the workshop, we will use RStudio. RStudio is a nice interface to the programming language R. To use RStudio, you need to install both R and RStudio.
    • Follow the instructions for your distribution from CRAN. For most distributions, you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora run sudo yum install R) but make sure that you have at least R 3.2.2 (as pre-packaged versions might be out of date).
    • To install RStudio, go to the RStudio Download page
    • Under Installers select the version for your distribution.
    • Once it's downloaded, double click the file to install it (or run sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
    • Once it's installed, open RStudio to make sure it works and you don't get any error messages.
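
The minimum-version requirement in the first step above (at least R 3.2.2) can be checked from the terminal. The script below is only a rough sketch, not part of the official setup instructions. The function names (version_at_least, parse_r_version, check_r) are made up for this illustration. It assumes GNU sort (for its -V version-sort option) and assumes that the first line printed by R --version has the form "R version x.y.z ...".

```shell
#!/bin/sh
# Minimum R version named in the instructions above.
MIN_R_VERSION="3.2.2"

# version_at_least HAVE WANT
# Succeeds if version HAVE is greater than or equal to version WANT.
# Assumption: GNU sort is available, since -V (version sort) is a GNU option.
version_at_least() {
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

# parse_r_version
# Reads the output of `R --version` on standard input and prints x.y.z.
# Assumption: the first line looks like "R version x.y.z (date) ...".
parse_r_version() {
    sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# check_r
# Reports whether the installed R meets the minimum version.
check_r() {
    if ! command -v R >/dev/null 2>&1; then
        echo "R was not found on the PATH"
        return 1
    fi
    have="$(R --version | parse_r_version)"
    if [ -z "$have" ]; then
        echo "Could not read a version number from R --version"
        return 1
    fi
    if version_at_least "$have" "$MIN_R_VERSION"; then
        echo "R $have is recent enough"
    else
        echo "R $have is older than $MIN_R_VERSION; follow the CRAN instructions for a newer version"
        return 1
    fi
}
```

After saving the script and loading it into a terminal session (for example with the shell's . command), running check_r prints a one-line result. If it reports an older version, the pre-packaged R from your distribution is probably out of date.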

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

Code of Conduct

We are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. However, we recognise that some groups in our community are subject to historical and ongoing discrimination, and may be vulnerable or disadvantaged. Membership in such a specific group can be on the basis of characteristics such as gender, sexual orientation, disability, physical appearance, body size, race, nationality, sex, colour, ethnic or social origin, pregnancy, citizenship, familial status, veteran status, genetic information, religion or belief, political or any other opinion, membership of a national minority, property, birth, age, or choice of text editor. We do not tolerate harassment of participants on the basis of these categories, or for any other reason.

Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent. Prohibited harassing behaviour includes but is not limited to:

  • written or verbal comments which have the effect of excluding people on the basis of membership of a specific group listed above
  • causing someone to fear for their safety, such as through stalking, following, or intimidation
  • the display of sexual or violent images
  • unwelcome sexual attention
  • nonconsensual or unwelcome physical contact
  • sustained disruption of talks, events or communications
  • incitement to violence, suicide, or self-harm
  • continuing to initiate interaction (including photography or recording) with someone after being asked to stop
  • publication of private communication without consent

Behaviour not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. All interactions should be professional regardless of location: harassment is prohibited whether it occurs on- or offline, and the same standards apply to both.

Enforcement of the Code of Conduct will be respectful and will not include any harassing behaviours.

Thank you for helping make this a welcoming, friendly community for all.

This code of conduct is an adaptation of the one used by the Software Carpentry Foundation and is a modified version of that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki. Contributors to this document: Adam Obeng, Aleksandra Pawlik, Bill Mills, Carol Willing, Erin Becker, Hilmar Lapp, Kara Woo, Karin Lagesen, Pauline Barmby, Sheila Miguez, Simon Waldman, Tracy Teal.

About the organisers

Ben Marwick is an Associate Professor of archaeology at the University of Washington, and a Senior Research Scientist at the Centre for Archaeological Science at the University of Wollongong. He studies Pleistocene archaeology in mainland Southeast Asia and Australia. He uses R in his day-to-day work and research publications, and has written extensively (including in Nature and the Journal of Archaeological Method and Theory) on the importance of using code to improve the reproducibility of research in archaeology and elsewhere. Ben is the convenor of the SAA Open Science Interest Group.

Matt Harris is the Director of GIS, Data Analysis, & Geoarchaeology in the Cultural Resources Department of AECOM. He is an advanced R user, with a focus on spatial analysis and simulation. He documents many of his explorations using R on his blog. Matt is a member of the SAA Open Science Interest Group, and has previously taught R to archaeologists via the SAA Online workshops and in person.

Jon Clindaniel is a Ph.D. candidate in Anthropology at Harvard University (expected 2019). For his dissertation, Jon uses a combination of archaeological excavation and computational techniques to decipher Inka khipu signs. He uses R for advanced statistical analyses.