MLModel is a function supplied by the MachineShop package. It integrates statistical and machine learning models supplied by other R packages with the MachineShop model fitting, prediction, and performance assessment tools.
The following are guidelines for writing model constructor functions that are wrappers around the MLModel function.
In this context, the term “constructor” refers to the wrapper function and “source package” to the package supplying the original model implementation.
The constructor should produce a valid model if called without any arguments; that is, it should not have any required arguments.
The source package defaults will be used for parameters with NULL values.
Model formula, data, and weights are separate from model parameters and should not be defined as constructor arguments.
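A minimal base-R sketch of these constructor conventions follows. GLMSketchModel and collect_params are hypothetical names. So that the sketch runs without MachineShop, the constructor returns a plain list instead of calling MLModel(). Every parameter defaults to NULL, so the constructor can be called with no arguments. NULL entries are dropped so that the source defaults (here, those of stats::glm.control) apply. collect_params only loosely imitates the new_params(environment()) idiom and is not MachineShop's implementation.

```r
# Hypothetical helper: gather a function's arguments from its evaluation
# environment and drop the NULL ones, so source package defaults apply.
collect_params <- function(envir) {
  params <- as.list(envir)
  params[!vapply(params, is.null, logical(1))]
}

# Hypothetical constructor wrapping stats::glm for binary responses.
# Every argument has a NULL default, so no argument is required, and
# formula, data, and weights are deliberately not constructor arguments.
GLMSketchModel <- function(epsilon = NULL, maxit = NULL) {
  # Plain-list stand-in for a call to MachineShop::MLModel()
  list(
    name = "GLMSketchModel",
    packages = "stats",
    response_types = "binary",
    params = collect_params(environment())
  )
}

GLMSketchModel()$params             # no elements: glm defaults apply
GLMSketchModel(maxit = 50)$params   # list containing only maxit = 50
```

Collecting arguments from environment() means that new constructor arguments are picked up automatically, without maintaining a separate list of parameter names.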
In the packages argument to MLModel, include all external packages whose functions are called directly from within the constructor.
Use :: to reference source package functions.
In the response_types argument, include all response variable types ("binary", "factor", "matrix", "numeric", "ordered", and/or "Surv") that can be analyzed with the model.
Parameter values set by the constructor can typically be obtained with new_params(environment()) if all arguments are to be passed to the source package fit function as supplied. Additional steps may be needed to pass the constructor arguments to the source package in a different format; e.g., when some model parameters must be passed in a control structure, as in C50Model and CForestModel.
The fit function's first three arguments should be formula, data, and weights, followed by an ellipsis (...).
If weights are not supported, the fit function should include a check, or an equivalent, that warns users when case weights other than 1 are supplied, because they will be ignored.
Only add elements to the resulting fit object if they are needed and will be used in the predict or varimp functions.
Return the fit object.
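The sketch below shows a fit function with the conventional signature. It also includes a weights check for source functions that lack case-weight support. The names fit_glm_sketch and check_unit_weights are hypothetical, stats::glm stands in for a source package fit function, and the warning text is illustrative. Model parameters such as maxit arrive through the ellipsis, which glm() uses to build its control list.

```r
# Hypothetical fit function for a binary-response wrapper around stats::glm.
# Signature follows the convention: formula, data, weights, then an ellipsis
# that carries any model parameters set by the constructor.
fit_glm_sketch <- function(formula, data, weights, ...) {
  # model.frame(), called inside glm(), evaluates `weights` in `data` and then
  # in the formula's environment, so point the formula at this function's frame.
  environment(formula) <- environment()
  # stats::glm supports case weights, so they are passed through unchanged.
  stats::glm(formula, data = data, weights = weights,
             family = stats::binomial, ...)
}

# Hypothetical check for a source function that does NOT support case
# weights: warn if any weight differs from 1, since weights will be ignored.
check_unit_weights <- function(weights) {
  if (!all(weights == 1)) warning("weights are unsupported and will be ignored")
  invisible(weights)
}

df <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- fit_glm_sketch(am ~ hp + wt, data = df, weights = rep(1, nrow(df)))
```

Because glm() supports weights, fit_glm_sketch passes them through. A wrapper around a source function without weight support would call check_unit_weights(weights) instead and leave weights out of the source call.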
The predict function's arguments are a model fit object, a newdata frame, optionally times for prediction at survival time points, and an ellipsis.
The predict function should return a vector or column matrix of probabilities for the second level of binary factors; a matrix whose columns contain the probabilities for factors with more than two levels; a matrix of predicted responses for matrix responses; a vector or column matrix of predicted responses for numeric responses; and, for survival responses, a matrix whose columns contain survival probabilities at times if supplied, or a vector of predicted survival means if times are not supplied.
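A predict function for the binary case might be sketched as follows. predict_glm_sketch is a hypothetical name, and the times argument is omitted because it only matters for survival responses. For a binomial glm fit to a factor response, predict(type = "response") returns the probability of the second factor level, which is what the convention asks for. Here that level is "manual".

```r
# Hypothetical predict function for a binary-response glm wrapper.
predict_glm_sketch <- function(object, newdata, ...) {
  # For a binomial glm with a factor response, type = "response" gives the
  # probability of the second factor level.
  stats::predict(object, newdata = newdata, type = "response")
}

df <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- stats::glm(am ~ hp + wt, data = df, family = stats::binomial)
prob_manual <- predict_glm_sketch(fit, newdata = head(df))  # one probability per row
```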
The varimp function should have a single model fit object argument followed by an ellipsis.
Variable importance results should generally be returned as a vector with elements named after the corresponding predictor variables. The package handles conversion to a data frame and a VariableImportance object. If there is more than one set of relevant variable importance measures, they can be returned as a matrix or data frame with predictor variable names as the row names.
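The varimp sketch below returns a named numeric vector. varimp_glm_sketch is a hypothetical name. Absolute Wald z statistics are an assumed, illustrative importance measure and are not prescribed by MachineShop. Coefficient names match predictor names only for numeric predictors. Factor predictors yield level-specific coefficient names, which a real implementation would need to map back to their variables.

```r
# Hypothetical varimp function: absolute Wald z statistics from a binomial
# glm, used only as an illustrative importance measure.
varimp_glm_sketch <- function(object, ...) {
  z <- summary(object)$coefficients[, "z value"]
  # Drop the intercept so the remaining names are the predictor variables.
  abs(z[names(z) != "(Intercept)"])
}

df <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- stats::glm(am ~ hp + wt, data = df, family = stats::binomial)
vi <- varimp_glm_sketch(fit)  # named vector with elements hp and wt
```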
When documenting model parameters, include the first sentences of the parameter descriptions from the source package.
Start sentences with the parameter value type (logical, numeric, character, etc.).
Start sentences with lowercase.
Omit indefinite articles (a, an) from the starting sentences.
In the Details section, include the response types (binary, factor, matrix, numeric, ordered, and/or Surv) that the model supports.
Include the following sentence:
Default values for the arguments and further model details can be found in the source link below.
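Document the return value as an MLModel class object.
In the See Also section, link to the source package fit function and to the fit and resample functions:
\code{\link[<source package>]{<fit function>}}, \code{\link{fit}},
\code{\link{resample}}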
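The documentation conventions can be combined in a roxygen2 block like the one below. This is a hypothetical illustration, and GLMSketchModel is not a MachineShop model. The parameter descriptions recast those of stats::glm.control into the conventional form, and the exact layout of a MachineShop Details section may differ. The function body is a stand-in that returns a plain list rather than an MLModel object.

```r
#' Logistic Regression Sketch Model
#'
#' Hypothetical constructor used only to illustrate the documentation
#' conventions; it returns a plain list instead of calling MLModel().
#'
#' @param epsilon numeric positive convergence tolerance.
#' @param maxit integer giving the maximal number of IWLS iterations.
#'
#' @details
#' Response types: \code{binary}.
#'
#' Default values for the arguments and further model details can be found
#' in the source link below.
#'
#' @return \code{MLModel} class object.
#'
#' @seealso \code{\link[stats]{glm}}, \code{\link{fit}},
#' \code{\link{resample}}
#'
GLMSketchModel <- function(epsilon = NULL, maxit = NULL) {
  # Keep only non-NULL parameters so glm.control defaults apply
  params <- list(epsilon = epsilon, maxit = maxit)
  list(name = "GLMSketchModel", params = Filter(Negate(is.null), params))
}
```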
When adding a new model to the package, save its source code in a file whose name begins with “ML_”, followed by the model name and a .R extension; e.g., "R/ML_CustomModel.R".
Export the model in NAMESPACE.
Add any required packages to the “Suggests” section of DESCRIPTION.
Add the model to R/models.R.
Add the model to R/modelinfo.R.
Add a unit test file to tests/testthat.
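For the NAMESPACE step, exporting a hypothetical model named CustomModel takes a single directive. If NAMESPACE is generated by roxygen2, the same line would instead come from an @export tag in the model's documentation block. Packages listed under Suggests are not installed automatically along with MachineShop. This is consistent with referencing their functions through ::.

```r
# NAMESPACE entry (CustomModel is a hypothetical model name)
export(CustomModel)
```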