santoku

santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Advantages

Here are some advantages of santoku:

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
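
As a small illustration of the point about unknown ranges, the sketch below contrasts `base::cut()` with `chop()` on the same breaks. It assumes santoku is installed; the variable names are made up for this example.

```r
library(santoku)

x <- 1:8
breaks <- c(3, 5, 7)

# base::cut() only covers (3, 7], so 1, 2, 3 and 8 become NA
base_result <- cut(x, breaks)
sum(is.na(base_result))
#> [1] 4

# chop() extends the outer intervals to cover all of the data
santoku_result <- chop(x, breaks)
anyNA(santoku_result)
#> [1] FALSE
```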

Examples

library(santoku)

chop() returns a factor:

chop(1:8, c(3, 5, 7))
#> [1] [1, 3) [1, 3) [3, 5) [3, 5) [5, 7) [5, 7) [7, 8] [7, 8]
#> Levels: [1, 3) [3, 5) [5, 7) [7, 8]
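
Because the result is an ordinary factor, the usual base R factor tools should apply to it. A minimal sketch (the name `chopped` is only illustrative):

```r
library(santoku)

chopped <- chop(1:8, c(3, 5, 7))

is.factor(chopped)
#> [1] TRUE

# One level per interval
nlevels(chopped)
#> [1] 4

# Integer codes index into levels(chopped)
as.integer(chopped)
#> [1] 1 1 2 2 3 3 4 4
```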

Include a number twice to match it exactly:

chop(1:8, c(3, 5, 5, 7))
#> [1] [1, 3) [1, 3) [3, 5) [3, 5) {5}    (5, 7) [7, 8] [7, 8]
#> Levels: [1, 3) [3, 5) {5} (5, 7) [7, 8]

Customize output with lbl_* functions:

chop(1:8, c(3, 5, 7), labels = lbl_dash())
#> [1] 1—3 1—3 3—5 3—5 5—7 5—7 7—8 7—8
#> Levels: 1—3 3—5 5—7 7—8

Chop into fixed-width intervals:

chop_width(runif(10), 0.1)
#>  [1] [0.8278, 0.9278)  [0.8278, 0.9278)  [0.8278, 0.9278)  [0.3278, 0.4278) 
#>  [5] [0.7278, 0.8278)  [0.2278, 0.3278)  [0.9278, 1.028)   [0.02781, 0.1278)
#>  [9] [0.9278, 1.028)   [0.02781, 0.1278)
#> 6 Levels: [0.02781, 0.1278) [0.2278, 0.3278) ... [0.9278, 1.028)
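
Since `runif()` draws random numbers, the output above changes on every run. One way to make such an example repeatable is to fix the random seed first; the seed value below is an arbitrary choice, and no output is shown because it depends on the seed and R version.

```r
library(santoku)

set.seed(1)  # arbitrary seed, chosen only to make the draw repeatable
x <- runif(10)

# Intervals of width 0.1, as in the example above
widths <- chop_width(x, 0.1)

# Count how many values fall in each interval
table(widths)
```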

Or into fixed-size groups:

chop_n(1:10, 5)
#>  [1] [1, 6)  [1, 6)  [1, 6)  [1, 6)  [1, 6)  [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]
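
To check the group sizes, the result can be passed to base R's `table()`. In this sketch (the name `groups` is illustrative), each of the two intervals should hold five elements, matching the output above.

```r
library(santoku)

groups <- chop_n(1:10, 5)

# Tabulate how many elements landed in each group
group_sizes <- table(groups)
all(group_sizes == 5)
#> [1] TRUE
```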

Chop dates by calendar month, then tabulate:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

tab_width(
  as.Date("2021-12-31") + 1:90,
  months(1),
  labels = lbl_discrete(fmt = "%d %b")
)
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar 
#>            31            28            31
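
The same chop-then-tabulate pattern can be sketched with plain numeric breaks. This assumes `tab()` relates to `chop()` as `tab_width()` does to `chop_width()`, so its counts should match calling `table()` on the chopped factor.

```r
library(santoku)

# Chop and tabulate in one step
counts <- tab(1:8, c(3, 5, 7))

# The two-step equivalent: chop, then tabulate
two_step <- table(chop(1:8, c(3, 5, 7)))

# Both approaches should give the same counts per interval
all(as.vector(counts) == as.vector(two_step))
```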

For more information, see the vignette.