ps_tree
Fit a recursive partitioning model (classification tree) to data from sources
Usage
ps_tree(
  doc = "ps_tree",
  data,
  GroupVar,
  Groups = "All",
  AnalyticVars,
  wts = NA,
  Seed = 11111,
  CpDigits = 3,
  plotTree = TRUE,
  plotCp = TRUE,
  Model,
  ModelTitle,
  minSplit = 20,
  cP = 0.01,
  predictSources = TRUE,
  predictUnknowns = FALSE,
  unknownData,
  ID = " ",
  unknownID = " ",
  folder = " "
)
Arguments
- doc
A string with documentation added to the definition of usage; default is "ps_tree" (the function name)
- data
A data frame with the source data to be analyzed
- GroupVar
The name of the variable defining groups; grouping is required
- Groups
A vector of codes for the groups to be used; "All" (the default) uses all groups
- AnalyticVars
A vector with the names (character values) of the analytic variables
- wts
Optional weights for the observations: if used, a vector of length nrow(data); if NA (the default), equal weights are assumed
- Seed
A positive integer, to produce a reproducible analysis
- CpDigits
The number of significant digits to display in the Cp table; default is 3
- plotTree
Logical. If TRUE (the default), plot the recursive partitioning tree
- plotCp
Logical. If TRUE (the default), plot the Cp table values
- Model
A character string containing the names of the variables to be considered, separated by + signs
- ModelTitle
The parameter Model as a single character value
- minSplit
The minimum size of a group for splitting; default is 20 (the default in rpart())
- cP
The required improvement in Cp for a group to be split; default is 0.01 (the default in rpart())
- predictSources
Logical: if TRUE, use the tree to predict sources for the source data; default is TRUE
- predictUnknowns
Logical: if TRUE, use the tree to predict sources for observations in unknownData; default is FALSE
- unknownData
A data frame with the data used to predict sources; must contain all variables in AnalyticVars
- ID
If not " " (the default), the name of a variable identifying a sample in data
- unknownID
If not " " (the default), the name of a variable identifying a sample in unknownData
- folder
The path to the folder in which data frames will be saved; default is " "
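As a rough illustration of how GroupVar, Model, minSplit, cP, and Seed could feed into rpart(), the sketch below builds a formula from a "+"-separated string and passes the splitting parameters through rpart.control(). It uses the built-in iris data, and the variable names are hypothetical stand-ins for the arguments; the mapping is an assumption about typical rpart usage, not a description of the ps_tree() source code.

```r
library(rpart)

# Hypothetical stand-ins for the ps_tree() arguments, using the built-in iris data
GroupVar <- "Species"
Model    <- "Petal.Length + Petal.Width + Sepal.Length"
minSplit <- 20
cP       <- 0.01
Seed     <- 11111

# Build the model formula from the grouping variable and the "+"-separated string
tree_formula <- as.formula(paste(GroupVar, "~", Model))

# Fix the random number generator so the cross-validation inside rpart() is reproducible
set.seed(Seed)

# Fit a classification tree with the requested splitting parameters
tree_fit <- rpart(
  tree_formula,
  data    = iris,
  method  = "class",
  control = rpart.control(minsplit = minSplit, cp = cP)
)

print(tree_fit)
```

Raising minSplit or cP in this sketch generally yields a smaller tree, since fewer groups qualify for splitting.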
Value
The function returns a list with the following components:
usage: A string with the contents of the argument doc, the date run, and the version of R used
dataUsed: The contents of the argument data restricted to the groups used
params_grouping: A list with the values of the arguments GroupVar and Groups
analyticVars: A vector with the value of the argument AnalyticVars
params: A list with the values of the grouping, logical, and splitting parameters
Seed: A positive integer to set the random number generator
model: A character string with the value of the argument ModelTitle
treeFit: A list with details of the tree construction
classification: A data frame showing the cross-classification of sources and predicted sources. Rows represent sources, columns represent predicted sources
CpTable: A data frame showing the decrease in Cp with increasing numbers of splits
predictedSource: If predictSources = TRUE, a data frame with the predicted source for each source sample, plus the known source, the sample ID (if given), and the analytic variable values
predictedProbs: If predictSources = TRUE, a data frame with the set of prediction probabilities for each source sample, plus the known source and sample ID (if given)
predictedSourceUnknowns: If predictUnknowns = TRUE, a data frame with the predicted source for each unknown sample, plus the sample ID (if given) and the analytic variable values
predictedProbsUnknowns: If predictUnknowns = TRUE, a data frame with the set of prediction probabilities for each unknown sample, plus the sample ID (if given)
errorRate: If predictSources = TRUE, the proportion of misassigned source samples
errorCount: If predictSources = TRUE, a vector with the number of misassigned sources and total number of sources
predictedTotalsUnknowns: If predictUnknowns = TRUE, a vector with the number of objects predicted to be from each source
location: The value of the argument folder
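To make components such as classification, predictedProbs, errorRate, and errorCount more concrete, the sketch below computes comparable quantities directly from an rpart fit on the built-in iris data. The object names are hypothetical, and this is an assumption about how such values can be derived, not the code ps_tree() runs.

```r
library(rpart)

set.seed(11111)
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Predicted group and prediction probabilities for each training row
predicted_source <- predict(tree_fit, newdata = iris, type = "class")
predicted_probs  <- predict(tree_fit, newdata = iris, type = "prob")

# Rows are known groups, columns are predicted groups
classification <- table(known = iris$Species, predicted = predicted_source)

# Number misassigned and total number of rows, then the proportion misassigned
error_count <- c(
  misassigned = sum(predicted_source != iris$Species),
  total       = nrow(iris)
)
error_rate <- error_count[["misassigned"]] / error_count[["total"]]

print(classification)
print(error_rate)
```

Because the predictions here are made on the same rows used to fit the tree, an error rate computed this way tends to be optimistic.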
Details
The function fits a classification tree model using the R function rpart(). The variables in AnalyticVars are considered in the order in which they appear in the Model argument (from left to right). See the vignette for more details.
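For readers unfamiliar with the Cp table that rpart() produces, the following sketch fits a small tree to the built-in iris data and displays the table with 3 significant digits, mirroring the default CpDigits = 3. The object names are hypothetical, and rounding with signif() is an assumption about how the digits could be applied.

```r
library(rpart)

# The xerror and xstd columns come from cross-validation, so they depend on the seed
set.seed(11111)
tree_fit <- rpart(Species ~ Petal.Length + Petal.Width, data = iris, method = "class")

# The complexity parameter table stored in the fitted object
cp_table <- as.data.frame(tree_fit$cptable)

# Display with 3 significant digits
print(signif(cp_table, 3))
```

Each row of the table corresponds to a candidate tree size; the nsplit column gives the number of splits and CP the complexity parameter at which that size is reached.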
Examples
# Analyze the obsidian source data, with the variables in the model statement
# in order of importance from a random forest analysis
data(ObsidianSources)
analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
# Model is a single character string with the variable names separated by + signs
save_tree <- ps_tree(data = ObsidianSources, GroupVar = "Code", Groups = "All",
  AnalyticVars = analyticVars, Model = "Sr + Nb + Rb + Y + Zr",
  ModelTitle = "Sr + Nb + Rb + Y + Zr", predictSources = TRUE, predictUnknowns = FALSE,
  ID = "ID")
# Predict the sources of the obsidian artifacts
data(ObsidianSources)
data(ObsidianArtifacts)
analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
# Model is a single character string with the variable names separated by + signs
save_tree <- ps_tree(data = ObsidianSources, GroupVar = "Code", Groups = "All",
  AnalyticVars = analyticVars, Model = "Sr + Nb + Rb + Y + Zr",
  ModelTitle = "Sr + Nb + Rb + Y + Zr", predictSources = FALSE, predictUnknowns = TRUE,
  unknownData = ObsidianArtifacts, unknownID = "ID")