step_collapse_cart() can pool a predictor’s factor levels using a tree-based method.
step_collapse_stringdist() can pool a predictor’s factor levels using string distances.
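To make the string-distance idea more concrete, here is a base-R sketch. It is a stand-in, not embed's implementation. It uses utils::adist() (generalized Levenshtein distance) and a hypothetical greedy helper, collapse_by_stringdist(). That helper maps every level within max_dist edits of a not-yet-assigned anchor level onto that anchor.

```r
# Hypothetical helper (not part of embed): greedily pool factor levels whose
# Levenshtein distance to an anchor level is at most `max_dist`.
collapse_by_stringdist <- function(levels, max_dist = 1) {
  d <- utils::adist(levels)                   # pairwise edit distances
  dimnames(d) <- list(levels, levels)
  mapping <- stats::setNames(character(length(levels)), levels)
  remaining <- levels
  while (length(remaining) > 0) {
    anchor <- remaining[1]                    # first unassigned level
    group <- remaining[d[anchor, remaining] <= max_dist]
    mapping[group] <- anchor                  # pool the group under the anchor
    remaining <- setdiff(remaining, group)
  }
  mapping
}

lv <- c("apple", "appel", "banana", "bananna", "cherry")
collapse_by_stringdist(lv, max_dist = 2)
```

Because the grouping is greedy, the result depends on the order of the levels. A clustering-based grouping would avoid that order dependence.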
Case weight support has been added to step_discretize_cart(), step_discretize_xgb(), step_lencode_bayes(), step_lencode_glm(), and step_lencode_mixed().
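The following base-R sketch shows what case weights change for a likelihood encoding. It assumes a numeric outcome and a no-intercept linear model, y ~ 0 + level, whose coefficients are the per-level outcome means. With case weights, those coefficients become weighted means. The helper weighted_level_means() is hypothetical and is not how embed computes its encodings internally.

```r
# Hypothetical helper: per-level weighted mean of a numeric outcome.
weighted_level_means <- function(x, y, w) {
  idx <- split(seq_along(y), x)              # row indices for each level
  vapply(idx, function(i) weighted.mean(y[i], w[i]), numeric(1))
}

x <- factor(c("a", "a", "b", "b", "b"))
y <- c(1, 3, 2, 4, 6)
w <- c(1, 3, 1, 1, 2)                         # case weights

weighted_level_means(x, y, w)                 # a = 2.5, b = 4.5
# The same values come from a weighted no-intercept regression:
coef(lm(y ~ 0 + x, weights = w))              # xa = 2.5, xb = 4.5
```

Without weights, the encodings would be the plain means (a = 2, b = 4).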
step_embed() now correctly defaults to a random id containing the word “embed”. (#102)
step_feature_hash() is soft-deprecated in embed in favor of step_dummy_hash() in textrecipes. (#95)
Steps now have a dedicated subsection detailing what happens when tidy() is applied. (#105)
Reorganized documentation for all recipe step tidy methods (#115).
Fixed a bug where woe_table() and step_woe() didn’t respect the factor levels of the outcome. (#109)
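The outcome's factor levels matter because weight of evidence is a log ratio between the two outcome classes. The order of those levels therefore decides the sign of each encoding. This base-R sketch computes WOE per predictor level as log(P(level | first outcome level) / P(level | second outcome level)). The helper woe_sketch() and its small additive laplace constant are assumptions for illustration, not woe_table()'s exact formula.

```r
# Hypothetical helper: weight of evidence for each level of `predictor`.
# The first level of `outcome` is treated as the numerator class.
woe_sketch <- function(predictor, outcome, laplace = 1e-6) {
  outcome <- as.factor(outcome)
  stopifnot(nlevels(outcome) == 2)
  counts <- table(predictor, outcome) + laplace   # columns follow levels(outcome)
  dist1 <- counts[, 1] / sum(counts[, 1])         # P(level | first class)
  dist2 <- counts[, 2] / sum(counts[, 2])         # P(level | second class)
  log(dist1 / dist2)
}

x <- c("a", "a", "a", "b", "b", "b")
y <- c("yes", "yes", "no", "yes", "no", "no")
woe_sketch(x, factor(y, levels = c("yes", "no")))  # a ~ 0.693, b ~ -0.693
woe_sketch(x, factor(y, levels = c("no", "yes")))  # same magnitudes, signs flipped
```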
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
The tunable parameter ranges for step_umap() were changed for neighbors, num_comp, and min_dist to prevent uwot segmentation faults. The step also checks whether the data dimensions are consistent with the argument values.
Two new PCA steps were added, each using sparse techniques for estimation: step_pca_sparse() and step_pca_sparse_bayes().
Updated to use recipes_eval_select() from recipes 0.1.17 (#85).
Added prefix argument to step_umap() to harmonize with other recipes steps (#93).
All embed recipe steps now officially support empty selections to be more aligned with recipes, dplyr and other packages that use tidyselect.
step_woe() no longer warns about high-cardinality predictors when the recipe is estimated. Instead, it warns when categories have fewer than 10 data points in the training set. (#74)
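A check like this can be sketched in base R by counting training rows per category and warning about any category below a threshold. The helper warn_rare_levels() and its message wording are hypothetical. Only the threshold of 10 comes from the entry above.

```r
# Hypothetical helper: warn about categories with fewer than `min_n` rows.
warn_rare_levels <- function(x, min_n = 10) {
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  if (length(rare) > 0) {
    warning("Categories with fewer than ", min_n, " training rows: ",
            paste(rare, collapse = ", "), call. = FALSE)
  }
  invisible(rare)   # return the flagged categories for programmatic use
}

train_x <- c(rep("common", 25), rep("rare", 3))
warn_rare_levels(train_x)   # warns about "rare"
```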
Minor release with changes to tests for cases when CRAN cannot get xgboost to work on their Solaris configuration.
lme4 and rstanarm are now in the Suggests list, so they are not automatically installed with embed. A message is written to the console if those packages are missing and their associated step functions are invoked.
Changes to tests to get out of archive jail.
Updated the plumbing behind step_woe().
Due to a bug in tensorflow, added a “warm start” to initiate a TF session if one does not currently exist.
dplyr 1.0.0
step_discretize_xgb() and step_discretize_cart() can be used to convert numeric predictors to categorical using supervised binning methods based on tree models. Thanks to Konrad Semsch for the contribution.
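Supervised binning can be illustrated with a depth-one regression tree. The tree searches the midpoints between sorted predictor values for the cut that minimizes the outcome's within-bin sum of squares, then discretizes at that cut. This base-R sketch uses a hypothetical helper, best_split(). It stands in for the CART and XGBoost tree models behind these steps and finds only a single break.

```r
# Hypothetical helper: single best split of numeric `x` for numeric outcome `y`,
# chosen to minimize the total within-bin sum of squared errors.
best_split <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  candidates <- unique((x[-1] + x[-length(x)]) / 2)   # midpoints between values
  sse <- function(v) if (length(v) == 0) 0 else sum((v - mean(v))^2)
  cost <- vapply(candidates,
                 function(s) sse(y[x <= s]) + sse(y[x > s]),
                 numeric(1))
  candidates[which.min(cost)]
}

x <- 1:40
y <- rep(c(0, 10), each = 20)
cut_point <- best_split(x, y)                      # 20.5
bins <- cut(x, breaks = c(-Inf, cut_point, Inf))   # two supervised bins
table(bins)
```

Real tree learners recurse on each bin and stop based on tuning parameters such as depth or minimum node size, so they can return several breaks.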
Added step_feature_hash() for creating dummy variables using feature hashing.
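Feature hashing maps each category to one of a fixed number of indicator columns through a hash function. As a result, the column count does not grow with the number of levels, and unseen levels still land in some column. In this base-R sketch, toy_hash() is a simple polynomial string hash chosen only for illustration; it is not the hash embed uses. feature_hash() is a hypothetical helper. Distinct levels can collide in the same column, which is the usual trade-off of the technique.

```r
# Toy polynomial string hash (illustrative only), returning a 1-based column.
toy_hash <- function(s, num_hash) {
  h <- 0
  for (code in utf8ToInt(s)) h <- (h * 31 + code) %% num_hash
  h + 1
}

# Hypothetical helper: one-hot encode `x` into `num_hash` hashed columns.
feature_hash <- function(x, num_hash = 8) {
  cols <- vapply(as.character(x), toy_hash, numeric(1), num_hash = num_hash)
  out <- matrix(0L, nrow = length(x), ncol = num_hash,
                dimnames = list(NULL, paste0("hash_", seq_len(num_hash))))
  out[cbind(seq_along(x), cols)] <- 1L              # one indicator per row
  out
}

feature_hash(c("red", "green", "red", "never_seen"), num_hash = 8)
```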
tidy.step_woe() now has column names consistent with other recipe steps.
stringsAsFactors change.
embed 0.0.5
The example data are now in the modeldata package.
Small TF updates to step_embed().
embed 0.0.4
Methods were added for a future generic called tunable(). This outlines which parameters in a step can or could be tuned.
Small updates to work with different versions of tidyr.
embed 0.0.3
step_umap() was added for both supervised and unsupervised encodings.
step_woe() creates weight of evidence encodings.
embed 0.0.2
A mostly maintenance release to be compatible with version 0.1.3 of recipes.
The package now depends on the generics package to get the broom tidy methods.
Karim Lahrichi added the ability to use callbacks when fitting tensorflow models. PR
embed 0.0.1
First CRAN version.