A performance example

Before we measure performance of the main functionality of the package, note that something simple as '(a:b)[-i]' can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b,1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time)
, bit = median(microbenchmark(bit_rangediff(c(a,b), i), times=times)$time)
, merge = median(microbenchmark(merge_rangediff(c(a,b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x/x["R"]*100)), caption="% of time relative to R", digits=1)

Table: % of time relative to R

R bit merge
100 22.7 22.6

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 106, sample size small 1000 and big 106.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.”
“Il semble que la perfection soit atteinte non quand il n'y a plus rien à ajouter, mais quand il n'y a plus rien à retrancher”
(Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

Table: selection characteristic

coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

Detailed tables follow.

% memory consumption of filter

Table: % bytes of logical

coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 49.9 1.0 1.0 5.0
which 50.1 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

Table: % time of logical

coin often rare chunk
logical 9.9 4.6 5.2 NA
bit 18.2 18.4 18.7 NA
bitwhich 77.9 44.4 2.6 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

Table: % time of logical

coin often rare chunk
logical 29.6 29.8 29.5 NA
bit 58.9 26.4 16.9 NA
bitwhich 164.7 54.1 45.0 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with 'which'

Table: % time of logical

coin often rare chunk
logical 14.1 19.0 0.4 NA
bit 31.0 63.1 0.8 NA
bitwhich 47.7 94.8 1.4 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with 'which'

Table: % time of logical

coin often rare chunk
logical 11.8 22.2 0.4 NA
bit 19.4 34.4 0.7 NA
bitwhich 90.7 28.5 1.6 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

Table: % time for Boolean NOT

coin often rare chunk
logical 11.5 11.6 11.6 11.8
bit 1.0 1.1 1.0 1.1
bitwhich 20.7 0.8 0.7 2.7
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

Table: % time for Boolean &

coin often rare chunk
logical 47.8 19.8 13.2 12.5
bit 2.7 2.7 2.7 2.8
bitwhich 31.6 3.3 3.0 5.5
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

Table: % time for Boolean |

coin often rare chunk
logical 42.2 12.7 15.1 14.4
bit 2.8 2.7 2.8 2.7
bitwhich 30.7 3.0 3.2 5.8
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

Table: % time for Boolean ==

coin often rare chunk
logical 14.0 14.2 14.3 14.3
bit 2.7 2.7 2.7 2.7
bitwhich 18.2 2.8 2.7 4.0
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

Table: % time for Boolean !=

coin often rare chunk
logical 15.7 15.6 15.8 16.2
bit 3.0 2.9 2.9 2.9
bitwhich 18.3 2.8 2.7 4.1
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

Table: % time for Boolean summary

coin often
logical 100.0 36.9
bit 49.8 12.5

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting”
(Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

% time for sorting

Table: sorted data relative to R's sort

small big
sort 224.6 613.8
sortunique 99.1 46.2

Table: unsorted data relative to R's sort

small big
sort 48.6 68.9
sortunique 21.1 13.0

% time for unique

Table: sorted data relative to R

small big
bit 129.5 27.6
merge 28.2 11.5
sort 0.0 0.0

Table: unsorted data relative to R

small big
bit 135.6 16.5
merge 203.5 51.5
sort 174.0 45.0

% time for duplicated

Table: sorted data relative to R

small big
bit 267.2 28.4
merge 42.5 13.6
sort 0.0 0.0

Table: unsorted data relative to R

small big
bit 261.2 17.3
merge 285.0 60.5
sort 243.0 52.6

% time for anyDuplicated

Table: sorted data relative to R

small big
bit 174.3 32.4
merge 31.5 12.0
sort 0.0 0.0

Table: unsorted data relative to R

small big
bit 181.6 16.2
merge 276.6 59.6
sort 244.4 53.6

% time for sumDuplicated

Table: sorted data relative to R

small big
bit 143.8 26.3
merge 26.3 9.6
sort 0.0 0.0

Table: unsorted data relative to R

small big
bit 132.2 17.2
merge 199.6 57.9
sort 176.0 52.2

% time for match

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 54.5 0 24.7 15.1
sort 0.0 0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 437.2 53.9 119.0 53.7
sort 381.2 53.9 105.7 47.4

% time for in

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 208.7 6.6 15.3 24.7
merge 42.6 0.0 21.7 13.0
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 224.1 3.3 8.5 10.7
merge 388.1 51.9 103.0 52.1
sort 343.8 51.9 90.7 46.4

% time for notin

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 278.0 6.9 13.6 23.8
merge 44.8 0.0 18.1 14.5
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 303.5 3.4 8.0 10.5
merge 330.1 52.0 92.8 51.6
sort 286.0 52.0 82.0 45.2

% time for union

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 84.8 23.2 26.4 13.1
merge 54.3 9.4 9.5 7.1
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 86.4 15.3 15.5 7.4
merge 207.9 47.2 47.5 33.3
sort 154.1 41.8 41.9 29.0

% time for intersect

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 90.1 16 23.9 18.3
merge 31.4 0 0.1 10.0
sort 0.0 0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 86.2 8.6 13.8 9.1
merge 181.8 54.6 98.5 37.2
sort 152.6 54.5 98.5 32.1

% time for setdiff

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 99.3 10.2 17.5 21.7
merge 43.4 0.0 6.5 16.5
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 102.4 5.2 12.2 9.4
merge 277.1 51.3 36.7 53.4
sort 232.0 51.3 32.5 46.0

% time for symdiff

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 76.0 12.4 12.4 19.6
merge 15.9 3.4 3.3 6.4
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 76.6 7.5 7.4 9.0
merge 115.8 17.2 17.0 27.7
sort 100.0 15.1 15.0 24.6

% time for setequal

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 105.5 107.4 11.3 11.8
merge 27.8 26.6 5.5 5.4
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 113.2 117.2 6.4 6.4
merge 200.8 75200.9 18.6 34.0
sort 173.3 75174.1 15.5 31.0

% time for setearly

Table: sorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 107.4 6.5 16.5 11.6
merge 28.4 0.0 0.1 5.9
sort 0.0 0.0 0.0 0.0

Table: unsorted data relative to R

smallsmall smallbig bigsmall bigbig
bit 117.1 3.3 8.7 5.0
merge 206.4 37.9 104.2 26.0
sort 177.8 37.9 104.1 23.5