Data checks and summaries: duplicate records, negative analytic values, numbers of analytic results, percentiles of results

Usage

ps_checkData(
  doc = "ps_checkData",
  data,
  CheckDupVars,
  GroupVar,
  Groups = "All",
  ByGroup = TRUE,
  ID = " ",
  AnalyticVars,
  folder = " "
)

Arguments

doc

A character string written to the output list; default is the function name

data

An R object (data frame) containing analytic data

CheckDupVars

A vector with names of identifying variables, typically group and lab ID

GroupVar

The name of the variable defining the groups (required)

Groups

A character vector of groups by which numbers of samples and statistics will be tabulated or "All"

ByGroup

Logical: default is TRUE. If FALSE, tabulations are for all groups combined

ID

The name of the lab ID variable; default is " " (no lab ID)

AnalyticVars

A character vector of names of analytic variables for which tabulations are done

folder

The path to the folder in which data frames will be saved; default is " ", no path

Value

The function returns a list with the following components:

  • usage: A string with the contents of the argument doc, the date run, and the R version used

  • dataUsed: The data frame specified by the arguments data and GroupVar

  • params: A character vector with the values of CheckDupVars, GroupVar, and Groups

  • analyticVars: The vector of names specified by the argument AnalyticVars

  • Duplicates: A data frame containing the observations with duplicate values

  • NegativeValues: A data frame containing the observations with at least one negative value for a variable in AnalyticVars

  • Nvalues: A data frame containing the number of observations with a value for each analytic variable

  • statistics: A data frame containing the descriptive statistics (by group, if ByGroup = TRUE)

  • location: The value of the parameter folder

Detail

AnalyticVars must be a vector of length at least 2. If Groups specifies selected groups (that is, it is not equal to "All"), it must also be a vector of length at least 2. The returned list includes four data frames: duplicate observations, observations with negative values for one or more analytic variables, numbers of observations for each analytic variable, and descriptive statistics (quantiles and number missing). If the largest value is < 10 (true if log10 transforms are used), the descriptive statistics are rounded to 2 digits; otherwise they are rounded to integers. If ByGroup = TRUE, the numbers of observations and the statistics are tabulated by group.
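
The four checks described above can be approximated in base R. The following is a minimal sketch, not the function's actual implementation: the data frame toy, its values, and the helper roundStats are hypothetical, and applying the rounding rule to the maximum of a single vector of statistics is an assumption about how "the largest value" is determined.

```r
# Hypothetical toy data (not from the package): two groups, two analytic variables.
toy <- data.frame(
  Code = c("A", "A", "A", "B", "B", "B"),
  ID   = c("a1", "a2", "a3", "b1", "b2", "b3"),
  Rb   = c(100, 100, 120, -5, 90, NA),
  Sr   = c(50, 50, 55, 60, 65, 70),
  stringsAsFactors = FALSE
)
analyticVars <- c("Rb", "Sr")
checkDupVars <- c("Code", "Rb", "Sr")

# 1. Duplicate records: every row whose checkDupVars values occur more than once.
key <- toy[, checkDupVars]
dupRows <- duplicated(key) | duplicated(key, fromLast = TRUE)
duplicates <- toy[dupRows, ]

# 2. Negative values: rows with at least one negative analytic value
#    (missing values are ignored, not treated as negative).
negRows <- rowSums(toy[, analyticVars] < 0, na.rm = TRUE) > 0
negativeValues <- toy[negRows, ]

# 3. Number of non-missing observations per analytic variable, by group.
nValues <- aggregate(toy[, analyticVars], by = list(Code = toy$Code),
                     FUN = function(x) sum(!is.na(x)))

# 4. Quantiles of one analytic variable, by group.
quantilesRb <- tapply(toy$Rb, toy$Code, quantile,
                      probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)

# Rounding rule (assumed form): 2 digits if the largest value is < 10,
# otherwise round to integers.
roundStats <- function(x) {
  digits <- if (max(x, na.rm = TRUE) < 10) 2 else 0
  round(x, digits)
}

print(duplicates)
print(negativeValues)
print(nValues)
print(lapply(quantilesRb, roundStats))
```

Using duplicated() in both directions keeps every member of a duplicated set, which makes it easier to compare the matching records side by side.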

Examples

# Load the example data set
data(ObsidianSources)
# Analytic variables to check and summarize
analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
dataCheck <- ps_checkData(data = ObsidianSources, CheckDupVars = analyticVars,
                          GroupVar = "Code", Groups = "All", ByGroup = TRUE,
                          ID = "ID", AnalyticVars = analyticVars)
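
Once the call has run, the data frames listed under Value can be inspected by name. The sketch below assumes the package providing ps_checkData and ObsidianSources is already attached; it skips the call otherwise. The vector dataFrameComponents is a hypothetical helper holding component names copied from the Value section.

```r
# Names of the data frame components, as documented in the Value section.
dataFrameComponents <- c("Duplicates", "NegativeValues", "Nvalues", "statistics")

# Run only if ps_checkData is available (its package must already be attached).
if (exists("ps_checkData", mode = "function")) {
  data(ObsidianSources)
  analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
  dataCheck <- ps_checkData(data = ObsidianSources, CheckDupVars = analyticVars,
                            GroupVar = "Code", Groups = "All", ByGroup = TRUE,
                            ID = "ID", AnalyticVars = analyticVars)

  # Print the first rows of each documented data frame component.
  for (name in dataFrameComponents) {
    cat("\n", name, ":\n", sep = "")
    print(head(dataCheck[[name]]))
  }
}
```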