* We have reverted the change made in hardhat 1.0.0 that caused recipe preprocessors to drop non-standard roles by default when calling `forge()`. Determining what roles are required at `bake()` time is really something that should be controlled within recipes, not hardhat. This results in the following changes (#207):
    * The new argument, `bake_dependent_roles`, that was added to `default_recipe_blueprint()` in 1.0.0 has been removed. It is no longer needed with the new behavior.
    * By default, `forge()` will pass on all columns from `new_data` to `bake()` except those with roles of `"outcome"` or `"case_weights"`. With `outcomes = TRUE`, it will also pass on the `"outcome"` role. This is essentially the same as the pre-1.0.0 behavior, and means that, by default, all non-standard roles are required at `bake()` time. This assumption is now also enforced by recipes 1.0.0, even if you aren't using hardhat or a workflow.
    * In the development version of recipes, which will become recipes 1.0.0, there is a new `update_role_requirements()` function that can be used to declare that a role is not required at `bake()` time. hardhat now knows how to respect that feature, and in `forge()` it won't pass on columns of `new_data` to `bake()` that have roles that aren't required at `bake()` time. See the sketch after this list.
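As an illustrative sketch (not from the original release notes), here is how `update_role_requirements()` can be used to declare that a custom role is not needed at `bake()` time. The data, recipe, and `"id"` role below are made up:

```r
library(recipes)

df <- data.frame(
  id = c("a", "b", "c"), # identifier column, not a predictor
  x = c(1, 2, 3),
  y = c(2, 4, 6)
)

rec <- recipe(y ~ ., data = df) %>%
  update_role(id, new_role = "id") %>%
  step_normalize(x)

# Declare that the "id" role is not needed at bake() time, so forge()
# won't require an `id` column in new data
rec <- update_role_requirements(rec, role = "id", bake = FALSE)

prepped <- prep(rec, training = df)

# New data without the `id` column now bakes cleanly
bake(prepped, new_data = data.frame(x = c(4, 5)))
```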
* Fixed a bug where the results from calling `mold()` using hardhat < 1.0.0 were no longer compatible with calling `forge()` in hardhat >= 1.0.0. This could occur if you save a workflow object after fitting it, then load it into an R session that uses a newer version of hardhat (#200).
* Internal details related to how blueprints work alongside `mold()` and `forge()` were heavily refactored to support the fix for #200. These changes are mostly internal or developer focused. They include:
    * Blueprints no longer store the clean/process functions used when calling `mold()` and `forge()`. These were stored in `blueprint$mold$clean()`, `blueprint$mold$process()`, `blueprint$forge$clean()`, and `blueprint$forge$process()`, and were strictly for internal use. Storing them in the blueprint caused problems because blueprints created with old versions of hardhat were unlikely to be compatible with newer versions of hardhat. This change means that `new_blueprint()` and the other blueprint constructors no longer have `mold` or `forge` arguments.
    * `run_mold()` has been repurposed. Rather than calling the `$clean()` and `$process()` functions (which, as mentioned above, are no longer in the blueprint), the methods for this S3 generic have been rewritten to directly call the current versions of the clean and process functions that live in hardhat. This should result in fewer accidental breaking changes.
    * New `run_forge()`, which is a `forge()` equivalent to `run_mold()`. It handles the clean/process steps that were previously handled by the `$clean()` and `$process()` functions stored directly in the blueprint. See the sketch after this list.
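A minimal sketch of the two generics, using a toy XY preprocessor; the data here are invented for illustration:

```r
library(hardhat)

x <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
y <- data.frame(out = c(1, 0, 1))

bp <- default_xy_blueprint()

# run_mold() dispatches to the clean/process implementations that live
# in the installed version of hardhat, not to functions stored in the
# blueprint itself
molded <- run_mold(bp, x = x, y = y)

# run_forge() is the forge() counterpart for new data; it uses the
# blueprint returned by molding, which carries the recorded ptypes
run_forge(molded$blueprint, new_data = data.frame(a = 4, b = 7))
```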
* Recipe preprocessors now ignore non-standard recipe roles (i.e., not `"outcome"` or `"predictor"`) by default when calling `forge()`. Previously, it was assumed that all non-standard role columns present in the original training data were also required in the test data when `forge()` was called. It seems to be more often the case that those columns are actually not required to `bake()` new data, and often won't even be present when making predictions on new data. For example, a custom `"case_weights"` role might be required for computing case-weighted estimates at `prep()` time, but won't be necessary at `bake()` time (since the estimates have already been pre-computed and stored). To account for the case when you do require a specific non-standard role to be present at `bake()` time, `default_recipe_blueprint()` has gained a new argument, `bake_dependent_roles`, which can be set to a character vector of non-standard roles that are required. See the sketch after this entry.
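For reference, a sketch of this hardhat 1.0.0 API. Note that, as described at the top of this file, `bake_dependent_roles` was removed again in a later release, so this only runs on the hardhat versions that shipped it; the `"case_weights"` role is a made-up example:

```r
library(hardhat)

# Require the custom "case_weights" role to also be present in new_data
# when forge() calls bake()
bp <- default_recipe_blueprint(bake_dependent_roles = "case_weights")

# A recipe molded with this blueprint, e.g. mold(rec, df, blueprint = bp),
# would then refuse to forge() new data lacking the case weights column.
```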
* New `weighted_table()` for generating a weighted contingency table, similar to `table()` (#191).
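A minimal sketch; the factors and weights are invented:

```r
library(hardhat)

x <- factor(c("a", "b", "a", "b"))
y <- factor(c("x", "x", "y", "y"))

# Like table(x, y), but each observation contributes its weight rather
# than a count of 1
weighted_table(x = x, y = y, weights = c(1.5, 2, 1, 0.5))
```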
* New experimental family of functions for working with case weights. In particular, `frequency_weights()` and `importance_weights()` (#190).
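For illustration:

```r
library(hardhat)

# Frequency weights: integer counts of how many times each row occurs
frequency_weights(c(1L, 5L, 2L))

# Importance weights: non-negative doubles expressing relative influence
importance_weights(c(0.5, 1, 2))
```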
* `use_modeling_files()` and `create_modeling_package()` no longer open the package documentation file in the current RStudio session (#192).
* rlang >= 1.0.2 and vctrs >= 0.4.1 are now required.
* Bumped required R version to `>= 3.4.0` to reflect tidyverse standards.
* Moved `tune()` from tune to hardhat (#181).
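For reference, `tune()` simply creates a quoted placeholder call; how it is consumed (e.g. by parsnip or tune) happens outside hardhat:

```r
library(hardhat)

# A placeholder marking an argument as "to be tuned", e.g.
# parsnip::linear_reg(penalty = tune())
tune()

# An optional id distinguishes multiple tuned parameters of the same type
tune(id = "lambda")
```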
* Added `extract_parameter_dials()` and `extract_parameter_set_dials()` generics to extend the family of `extract_*()` generics.
* `mold()` no longer misinterprets `::` as an interaction term (#174).
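A sketch of the fixed behavior; the formula and data are invented:

```r
library(hardhat)

df <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# `stats::poly(x, degree = 2)` is now parsed as a single namespace-
# qualified inline function, not as a `stats`-by-`poly(...)` interaction
mold(y ~ stats::poly(x, degree = 2), df)$predictors
```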
When indicators = "none"
, mold()
no
longer misinterprets factor columns as being part of an inline function
if there is a similarly named non-factor column also present
(#182).
* Added a new family of `extract_*()` S3 generics for extracting important components from various tidymodels objects. S3 methods will be defined in other tidymodels packages. For example, tune will register an `extract_workflow()` method to easily extract the workflow embedded within the result of `tune::last_fit()`.
* A logical `indicators` argument is no longer allowed in `default_formula_blueprint()`. This was soft-deprecated in hardhat 0.1.4, but will now result in an error (#144).
* `use_modeling_files()` (and therefore, `create_modeling_package()`) now ensures that all generated functions are templated on the model name. This makes it easier to add multiple models to the same package (#152).
* All preprocessors can now `mold()` and `forge()` predictors to one of three output formats (tibble, matrix, or `dgCMatrix` sparse matrix) via the `composition` argument of a blueprint (#100, #150).
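A minimal sketch of requesting a non-default composition; the data are invented:

```r
library(hardhat)

df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

# Request a plain matrix of predictors instead of the default tibble
bp <- default_formula_blueprint(composition = "matrix")

molded <- mold(y ~ x, df, blueprint = bp)
class(molded$predictors)
```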
Setting indicators = "none"
in
default_formula_blueprint()
no longer accidentally expands
character columns into dummy variable columns. They are now left
completely untouched and pass through as characters. When
indicators = "traditional"
or
indicators = "one_hot"
, character columns are treated as
unordered factors (#139).
* The `indicators` argument of `default_formula_blueprint()` now takes character input rather than logical. To update:

    * `indicators = TRUE` -> `indicators = "traditional"`
    * `indicators = FALSE` -> `indicators = "none"`

  Logical input for `indicators` will continue to work, with a warning, until hardhat 0.1.6, where it will be formally deprecated.
* There is also a new `indicators = "one_hot"` option, which expands all factor columns into `K` dummy variable columns corresponding to the `K` levels of that factor, rather than the more traditional `K - 1` expansion.
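An invented example of the one-hot expansion:

```r
library(hardhat)

df <- data.frame(
  f = factor(c("a", "b", "c")),
  y = c(1, 2, 3)
)

# Every level of `f` gets its own indicator column (K = 3 columns),
# rather than the K - 1 columns of a traditional contrast expansion
bp <- default_formula_blueprint(indicators = "one_hot")
mold(y ~ f, df, blueprint = bp)$predictors
```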
* Updated to stay current with the latest vctrs 0.3.0 conventions.
* `scream()` is now stricter when checking ordered factor levels in new data against the `ptype` used at training time. Ordered factors must now have exactly the same set of levels at training and prediction time. See `?scream` for a new graphic outlining how factor levels are handled (#132).
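A sketch of the stricter check; the zero-row slice below stands in for the `ptype` that `mold()` captures at training time:

```r
library(hardhat)

train <- data.frame(
  o = ordered(c("low", "mid", "high"), levels = c("low", "mid", "high"))
)
ptype <- train[0, , drop = FALSE]

new_data <- data.frame(
  o = ordered(c("low", "high"), levels = c("low", "high"))
)

# Errors: the ordered factor's levels must exactly match training levels
try(scream(new_data, ptype))
```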
* The novel factor level check in `scream()` no longer throws a novel level warning on `NA` values (#131).
* `default_recipe_blueprint()` now defaults to prepping recipes with `fresh = TRUE`. This is a safer default, and guards the user against accidentally skipping this preprocessing step when tuning (#122).
* `model_matrix()` now correctly strips all attributes from the result of the internal call to `model.matrix()`.
* `forge()` now works correctly when used with a recipe that has a predictor with multiple roles (#120).
* Require recipes 0.1.8 to incorporate an important bug fix with `juice()` and 0-column selections.
* Added a `NEWS.md` file to track changes to the package.