Creating a reproducible research compendium

Please start by:

We will run the workshop code on the rstudio.cloud service to save time. Many of the packages we need are already installed there, so we don’t have to wait for them to install during the workshop.

Create a basic R package

Run this line:

rrtools::use_compendium("pkgname")

Notes:

  • this uses usethis::create_package() to create a basic R package with the name pkgname (you should use a different one), and then, if you’re using RStudio, opens the project. If you’re not using RStudio, it sets the working directory to the pkgname directory.
  • we need to:
    • choose a location for the compendium package. We recommend you set the working directory in RStudio using the drop-down menu: Session -> Set Working Directory and then run rrtools::use_compendium("pkgname").
    • edit the DESCRIPTION file (located in your pkgname directory) to include accurate metadata
    • periodically update the Imports: section of the DESCRIPTION file with the names of packages used in the code we write in the Rmd document(s) (e.g., usethis::use_package("dplyr", "imports"))
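To see which packages still need adding to Imports:, one option is to compare the packages you know you use against the DESCRIPTION file. The sketch below uses only base R. Note that missing_imports() is a hypothetical helper written for this illustration (it is not part of rrtools or usethis), and it assumes you run it from the top of the compendium so that DESCRIPTION is in the working directory.

```r
# Hypothetical helper (not part of rrtools or usethis): report which of the
# packages in `needed` are not yet declared in the Imports field.
missing_imports <- function(needed, description = "DESCRIPTION") {
  # read.dcf() returns a character matrix with one column per requested field
  fields <- read.dcf(description, fields = "Imports")
  imports <- fields[1, "Imports"]
  if (is.na(imports)) {
    # No Imports field yet, so nothing is declared
    declared <- character(0)
  } else {
    # Split on commas, then drop version constraints such as "(>= 1.0.0)"
    # and any surrounding whitespace
    declared <- strsplit(imports, ",")[[1]]
    declared <- trimws(sub("\\(.*\\)", "", declared))
  }
  setdiff(needed, declared)
}
```

For example, missing_imports(c("dplyr", "ggplot2")) returns whichever of the two is not yet declared. Each missing package can then be added with usethis::use_package().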

Attach a license to our compendium

Run this line:

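# the name is passed by position here: the argument is called `name` in
# older versions of usethis and `copyright_holder` in more recent ones
usethis::use_mit_license("My Name")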

Notes:

  • this adds a reference to the MIT license in the DESCRIPTION file and generates a LICENSE file listing the name provided as the copyright holder
  • to use a different license, replace this line with another usethis license function, such as usethis::use_gpl3_license() (older versions of usethis also take name = "My Name" here), or follow the instructions for other licenses

Start version control and make a GitHub repository for our compendium

Run this line to tell Git who you are on this computer:

usethis::use_git_config(user.name = "Jane Doe", user.email = "jane@example.com")

Then run these lines:

usethis::use_git()
# open the GitHub page for generating
# your Personal Access Token (PAT)
usethis::browse_github_pat()
# (recent versions of usethis replace this function with
# usethis::create_github_token())
# or get a token directly from https://github.com/settings/tokens

After you get the token from GitHub, save it in your environment:

usethis::edit_r_environ()
# Paste your copied PAT into your .Renviron file as an environment variable:
# GITHUB_PAT=XXXXXX

Restart R, then run this line:

usethis::use_github(protocol = "https", 
                    private = FALSE)
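
If usethis::use_github() reports that it cannot find a token, a quick check is whether the GITHUB_PAT variable is visible to the R session after the restart. This is a minimal base R sketch. The name github_pat_is_set() is a hypothetical helper, and it only tests that the variable is present, not that GitHub accepts the token.

```r
# Hypothetical helper: TRUE if a non-empty GITHUB_PAT environment variable
# is visible to the current R session, FALSE otherwise.
github_pat_is_set <- function() {
  # Sys.getenv() returns "" for an unset variable
  nzchar(Sys.getenv("GITHUB_PAT"))
}

github_pat_is_set()
```

If this returns FALSE, check that the line in .Renviron has no spaces around the = sign and that R was restarted after the file was saved.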

Make a readme document, a code of conduct and a guide for contributors

Run this line:

rrtools::use_readme_rmd()

Then commit and push to GitHub, and browse your files on GitHub to see how the README looks.

Notes:

  • this generates README.Rmd and renders it to README.md, ready to display on GitHub. It contains:
    • a template citation to show others how to cite your project. Edit this to include the correct title and DOI.
    • license information for the text, figures, code and data in your compendium
  • this also adds two other markdown files: CONDUCT.md, a code of conduct for users, and CONTRIBUTING.md, basic instructions for people who want to contribute to your project, including newcomers to Git and GitHub.
  • re-render README.Rmd after each change to refresh README.md, which is the file that GitHub displays on the repository home page

Create a compendium file structure and add template files

Run this line:

rrtools::use_analysis()

Then knit the Rmd document right away to see how the template looks.

Notes:

  • this function has three location = options: top_level to create a top-level analysis/ directory, inst to create an inst/ directory (so that all the sub-directories are available after the package is installed), and vignettes to create a vignettes/ directory (and automatically update the DESCRIPTION). The default is a top-level analysis/.
  • for each option, the contents of the sub-directories are the same, as follows (using the default analysis/ as the example):

    analysis/
    │
    ├── paper/
    │   ├── paper.Rmd       # this is the main document to edit
    │   └── references.bib  # this contains the reference list information
    │
    ├── figures/            # location of the figures produced by the Rmd
    │
    ├── data/
    │   ├── raw_data/       # data obtained from elsewhere
    │   └── derived_data/   # data generated during the analysis
    │
    └── templates/
        ├── journal-of-archaeological-science.csl
        │                   # this sets the style of citations & reference list
        ├── template.docx   # used to style the output of the paper.Rmd
        └── template.Rmd
    
  • the paper.Rmd is ready to write in and render with bookdown. It includes:

    • a YAML header that identifies the references.bib file and the supplied csl file (to style the reference list)
    • a colophon that adds some git commit details to the end of the document. This means that the output file (HTML/PDF/Word) is always traceable to a specific state of the code.
  • the references.bib file has just one item to demonstrate the format. It is ready to insert more reference details.

  • you can replace the supplied csl file with a different citation style from https://github.com/citation-style-language/

  • we recommend using the citr addin and Zotero to efficiently insert citations while writing in an Rmd file

  • remember that the Imports: field in the DESCRIPTION file must include the names of all packages used in analysis documents (e.g. paper.Rmd). The helper function rrtools::add_dependencies_to_description() scans the Rmd file, identifies the packages used there, and adds them to the DESCRIPTION file.

  • rrtools::use_analysis() has a data_in_git = argument, which is TRUE by default. If set to FALSE, files in the data/ directory are excluded from tracking by Git and do not appear on GitHub. Set data_in_git = FALSE if your data files are large (GitHub's limit is 100 MB per file) or if you do not want to make the data files publicly accessible on GitHub.

  • to load your custom code in paper.Rmd, you have a few options. The simplest is to write all your R code in chunks in the Rmd. Alternatively, write R code in script files in the R/ directory and include devtools::load_all(".") at the top of paper.Rmd. Or write functions in R/ and use library(pkgname) at the top of paper.Rmd, or omit library() and prefix each function call with pkgname::. Choose whichever seems most natural to you.
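
To help decide whether data_in_git = FALSE is needed, you can check whether any data file exceeds GitHub's 100 MB limit. The sketch below uses only base R. Note that large_data_files() is a hypothetical helper written for this illustration, and it assumes the default analysis/data location described above.

```r
# Hypothetical helper: list files under `data_dir` that are larger than
# `limit_mb` megabytes. Returns an empty character vector if there are none
# or if the directory does not exist.
large_data_files <- function(data_dir = "analysis/data", limit_mb = 100) {
  files <- list.files(data_dir, recursive = TRUE, full.names = TRUE)
  # file.size() reports bytes, so convert to megabytes
  sizes_mb <- file.size(files) / 1024^2
  files[sizes_mb > limit_mb]
}

large_data_files()
```

If this returns any paths, consider creating the compendium structure with rrtools::use_analysis(data_in_git = FALSE) and sharing the data through another route.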