Version 1.1.0 makes two changes. First, it enables estimation of the conditional misclassification rate of predictions by classification random forests, as proposed by Lu and Hardin (2021). Second, it compartmentalizes a costly step in the `quantForestError` algorithm: the identification of each training observation's out-of-bag terminal nodes.
The conditional misclassification rate of predictions by classification random forests can now be estimated. To estimate it, simply set the `what` argument of the `quantForestError` function to `"mcr"`. `what` will default to this value if the provided `forest` is a classification random forest. See the example code in the README for a toy demonstration of the performance of this estimator.
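A minimal sketch of this usage, assuming the `randomForest` package is used to fit the classification forest (forestError requires the forest to be trained with `keep.inbag = TRUE`); the toy data split and object names here are illustrative, not from the package documentation:

```r
library(randomForest)
library(forestError)

# Toy classification data from the built-in iris dataset
train_idx <- sample(nrow(iris), 100)
X.train <- iris[train_idx, 1:4]
Y.train <- iris$Species[train_idx]
X.test  <- iris[-train_idx, 1:4]

# keep.inbag = TRUE is needed so forestError can identify OOB observations
rf <- randomForest(X.train, Y.train, keep.inbag = TRUE)

# Because rf is a classification forest, "mcr" is the default,
# but it can also be requested explicitly:
out <- quantForestError(rf, X.train, X.test, Y.train, what = "mcr")
head(out)
```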
The identification of each training observation's out-of-bag terminal nodes is now compartmentalized from the main `quantForestError` function. By isolating this step, Version 1.1.0 allows users to iterate the algorithm more efficiently. Users may wish to feed `quantForestError` batches of test observations iteratively if they have streaming data or a test set too large to process in one go due to memory constraints. In previous versions of this package, doing so required the algorithm to recompute each training observation's out-of-bag terminal nodes in every iteration, which was redundant and costly. By separating this computation from the rest of the `quantForestError` algorithm, Version 1.1.0 allows the user to perform it only once.
As part of this modularization, the `quantForestError` function now has two additional arguments. If `return_train_nodes` is set to `TRUE`, the function will return a `data.table` identifying each training observation's out-of-bag terminal nodes. This `data.table` can then be fed back into `quantForestError` via the `train_nodes` argument to avoid the redundant recomputation.
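The batched workflow described above might look like the following sketch; the name of the list element holding the returned table (`train_nodes`) is an assumption here, so consult the function documentation for the actual return structure:

```r
library(randomForest)
library(forestError)

# Toy regression setup; keep.inbag = TRUE is required by forestError
X.train <- mtcars[, -1]
Y.train <- mtcars$mpg
rf <- randomForest(X.train, Y.train, keep.inbag = TRUE)

# First batch: also compute and return the OOB terminal-node table
first <- quantForestError(rf, X.train, X.test = X.train[1:10, ],
                          Y.train, return_train_nodes = TRUE)
train_nodes <- first$train_nodes  # assumed element name

# Later batches reuse the table instead of recomputing it
rest <- quantForestError(rf, X.train, X.test = X.train[11:20, ],
                         Y.train, train_nodes = train_nodes)
```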
Version 1.1.0 also exports the function that produces the `data.table` identifying each training observation's out-of-bag terminal nodes. It is called `findOOBErrors`. Given the same inputs, `findOOBErrors` will produce the same output that is returned by setting `return_train_nodes` to `TRUE` in the `quantForestError` function.
See the documentation on `quantForestError` and `findOOBErrors` for examples.
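Calling the newly exported function directly might look like this sketch, which assumes `findOOBErrors` takes the forest, training features, and training responses (see its documentation for the exact signature):

```r
library(randomForest)
library(forestError)

X.train <- mtcars[, -1]
Y.train <- mtcars$mpg
rf <- randomForest(X.train, Y.train, keep.inbag = TRUE)

# Compute the OOB terminal-node table once, up front
train_nodes <- findOOBErrors(rf, X.train, Y.train)

# Pass it to quantForestError to skip the internal recomputation
out <- quantForestError(rf, X.train, X.test = X.train[1:10, ],
                        Y.train, train_nodes = train_nodes)
```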
Neither of these changes affects code that relied on Version 1.0.0 of this package, as the changes consist solely of a newly exported function, two optional arguments to `quantForestError` that by default do nothing new, and a new possible input for the `what` argument.
This package has been updated to follow the conventional sign of the bias (mean prediction minus mean response). Previous versions of the package returned the negative of this quantity (mean response minus mean prediction). Any downstream code that performs algebraic operations on the bias returned by this package must therefore flip its sign to preserve the intended behavior.
In the future, we hope to implement a stochastic version of the `quantForestError` function, in which the parameters are estimated from random subsets of the training sample and/or of the trees of the random forest.
Thanks to John Sheffield (GitHub profile) for his helpful improvements to the computational performance of this package (see the issue tracker for details). These changes, which substantially reduce the runtime and memory load of the `quantForestError`, `perror`, and `qerror` functions, were implemented in Version 0.2.0.
Version 0.2.0 also allows the user to generate conditional prediction intervals with different type-I error rates in a single call to the `quantForestError` function.
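A short sketch of requesting several intervals at once; the assumption here is that the type-I error rates are supplied as a vector to an `alpha` argument (check the function documentation for the actual argument name):

```r
library(randomForest)
library(forestError)

X.train <- mtcars[, -1]
Y.train <- mtcars$mpg
rf <- randomForest(X.train, Y.train, keep.inbag = TRUE)

# One call, three nominal coverage levels (99%, 95%, and 90%)
out <- quantForestError(rf, X.train, X.test = X.train[1:5, ],
                        Y.train, what = "interval",
                        alpha = c(0.01, 0.05, 0.10))
```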