V 1.0.5, July 2022

FEAT:
- New functions compute_probability_ratio and compute_weight_of_evidence to be used for target encoding
- New function get_most_frequent_element to identify the most frequent element in a list

V 1.0.4

BUGFIX: Fix generate_from_character: when the column contained NAs, the corresponding lines were dropped. This is no longer the case.

V 1.0.3

BUGFIX: Fix bug in fast_is_bijection when a column has multiple classes

FEAT: Harmonize logging levels between functions

V 1.0.2

Remove useless dependencies. Make sure the library works on Windows, macOS, Ubuntu, and R versions from 3.3 to 4.1.

V 1.0.1

Based on CRAN feedback, removed problematic vignettes.

V 1.0.0

Version 1.0.0 brings a lot of changes and is not compatible with previous versions of the package.

Code using a previous version of this package might need some rework (and we are sorry about it), but we strongly believe that this version will be easier to use, faster, and more maintainable over time.

In this version:
- All function names and variables are snake_case (there used to be a mix of camel case and snake case)
- We removed a lot of useless code that was slowing down the package (particularly garbage collection)
- We made the code more readable so that it is easier to contribute to this package
- Logging is more explicit and cleaner
- We took linting into account
- A few more functions are available

We hope that you will like this new version of the package even more. Please don’t hesitate to provide feedback, warn us about bugs, suggest improvements or, even better, develop some improvements for this package. To do so, please go to GitHub (https://github.com/ELToulemonde/dataPreparation/).

V 0.4.3

V 0.4.2

V 0.4.1

V 0.4.0

WARNING:
- In aggregate_by_key, generated column names have changed.
- In aggregate_by_key, the generated column for character is different.

V 0.3.9

V 0.3.8

V 0.3.7

- Code quality:
  - Improving code quality using lintr
  - Suppressing some useless code
  - Meeting new covr standard
  - Improve log of setColAsXXX

V 0.3.6

V 0.3.5

WARNING:
- one_hot_encoder now requires you to run build_encoding first.
- aggregate_by_key now requires functions to be passed by character name.

This version makes transformations reproducible (as much as possible) on train and test sets. This prepares for a future pipeline feature.

V 0.3.4

WARNING:
- which_are_included: in case of bijection (col1 is a bijection of col2), each column is included in the other, but the choice of which one to drop might have changed in this version.

V 0.3.3

WARNING:
- date3 column in messy_adult data set has changed in order to illustrate the recognition of date characters even if there are leading and/or trailing white spaces.
- date4 column in messy_adult data set has changed in order to illustrate the recognition of date characters even if there are multiple separators.

V 0.3.2

V 0.3.1

V 0.3

WARNING:
- date1 column in messy_adult data set has changed in order to illustrate the recognition of date characters even if the leading “0” is missing in the month or day part.

V 0.2

WARNING:
- If you were using diffDates, it is now called generate_date_diffs.
- date2 column in messy_adult data set has changed in order to illustrate new timestamp features.
- set_col_as_factorOrLogical doesn’t exist anymore: it has been split between set_col_as_factor and generateFromCat.
- Considering all those changes, shape_set and prepare_set don’t give the same result anymore.

V 0.1: released on CRAN, July 2017