library(tidybins)
suppressPackageStartupMessages(library(dplyr))
Binning by value is the only original binning method implemented in this package. It is inspired by a common marketing case in which accounts need to be binned by their sales: for example, creating 10 bins, where each bin represents 10% of all market sales. The first bin contains the highest-sales accounts and therefore has the smallest number of accounts, whereas the last bin contains the lowest-sales accounts and therefore needs the largest number of accounts to reach 10% of market sales.
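The idea can be sketched in a few lines of base R. This is only an illustration of binning by cumulative share of the total, not the code tidybins uses internally. The helper name `bin_by_value_sketch` is hypothetical, and the sketch assumes that negative values are clamped to 0 and that the highest values receive the highest bin label.

```r
# Hypothetical helper for illustration only; NOT the tidybins implementation.
bin_by_value_sketch <- function(x, n_bins = 10L) {
  x <- pmax(x, 0)                        # assumption: negative values count as 0
  ord <- order(x, decreasing = TRUE)     # largest values first
  cum_share <- cumsum(x[ord]) / sum(x)   # running share of the grand total
  # Position in the sorted order -> slice index 1..n_bins, one slice per 1/n_bins of the total
  k <- pmin(pmax(ceiling(cum_share * n_bins), 1), n_bins)
  bins <- integer(length(x))
  bins[ord] <- as.integer(n_bins + 1L - k)  # highest values get the label n_bins
  bins
}

set.seed(1)
sales <- as.integer(rnorm(1000L, mean = 10000L, sd = 3000))
bins <- bin_by_value_sketch(sales)

tapply(pmax(sales, 0), bins, sum)  # per-bin totals, each close to 10% of the grand total
table(bins)                        # number of accounts per bin
```

Because each bin closes as soon as the running share crosses the next multiple of 1/`n_bins`, the bins holding large values fill up with fewer rows than the bins holding small values.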
tibble::tibble(SALES = as.integer(rnorm(1000L, mean = 10000L, sd = 3000))) -> sales_data
sales_data %>%
  bin_cols(SALES, bin_type = "value") -> sales_data1
#> Warning: SALES contains negative values. Negative values are treated as 0.
sales_data1
#> # A tibble: 1,000 x 2
#>    SALES SALES_va10
#>    <int>      <int>
#>  1  7979          2
#>  2  5475          1
#>  3 13642          9
#>  4  9723          4
#>  5 17671         10
#>  6  9517          4
#>  7 10351          5
#>  8  2162          1
#>  9 14162          9
#> 10 12246          7
#> # … with 990 more rows
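Notice that the total (`.sum`) is approximately equal across ranks 1 through 10, at roughly 1,000,000 each. The single negative value is reported separately in rank 0.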
sales_data1 %>%
  bin_summary() %>%
  print(width = Inf)
#> # A tibble: 11 x 14
#>    column method      n_bins .rank  .min  .mean  .max .count .uniques
#>    <chr>  <chr>        <int> <int> <int>  <dbl> <int>  <int>    <int>
#>  1 SALES  equal value     10    10 14780 15919. 20855     63       62
#>  2 SALES  equal value     10     9 13553 14046. 14723     72       72
#>  3 SALES  equal value     10     8 12562 13007. 13552     77       76
#>  4 SALES  equal value     10     7 11855 12179. 12546     82       76
#>  5 SALES  equal value     10     6 11110 11502. 11848     87       84
#>  6 SALES  equal value     10     5 10290 10705. 11105     94       88
#>  7 SALES  equal value     10     4  9381  9835. 10289    101       95
#>  8 SALES  equal value     10     3  8366  8872.  9373    113      111
#>  9 SALES  equal value     10     2  7103  7783.  8360    129      120
#> 10 SALES  equal value     10     1  1229  5517.  7094    181      177
#> 11 SALES  equal value     10     0 -1420 -1420  -1420      1        1
#>    relative_value    .sum  .med   .sd width
#>             <dbl>   <int> <dbl> <dbl> <int>
#>  1         100    1002884 15741 1089.  6075
#>  2          88.2  1011335 14017  356.  1170
#>  3          81.7  1001533 13012  281.   990
#>  4          76.5   998703 12140  216.   691
#>  5          72.3  1000691 11511  211.   738
#>  6          67.2  1006264 10736  256.   815
#>  7          61.8   993372  9797  266.   908
#>  8          55.7  1002549  8869  302.  1007
#>  9          48.9  1004040  7803  357.  1257
#> 10          34.7   998607  5727 1239.  5865
#> 11          -8.92   -1420 -1420   NA      0
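As a quick sanity check, the `.sum` values printed above can be turned into shares of the total. This is a small sketch that copies the ten positive-rank sums from the summary by hand. The variable names `bin_sums` and `bin_share` are illustrative and not part of the package.

```r
# .sum values for ranks 10 down to 1, copied from the bin_summary() output above
bin_sums <- c(1002884, 1011335, 1001533, 998703, 1000691,
              1006264, 993372, 1002549, 1004040, 998607)

# Percentage of the combined total held by each bin
bin_share <- 100 * bin_sums / sum(bin_sums)
round(bin_share, 2)

# Largest deviation from the 10% target, in percentage points
max(abs(bin_share - 10))
```

Each share should sit close to 10%, with small deviations because a bin can only close on a whole account.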