This document covers working with combinatorial iterators in
RcppAlgos
. Combinatorial iterators in
RcppAlgos
are memory efficient like traditional iterator
objects. They allow traversal of combinations/permutations one by one
without the necessity for storing all results in memory.
Unlike traditional combinatorial iterators, the iterators in
RcppAlgos
offers random access via the [[
operator. This means, we can access the nth lexicographical
order result on demand without having to first iterate over the
previous n - 1 results.
In order to iterate, we must initialize an iterator via
comboIter
or permuteIter
. The interface is
very similar to comboGeneral
and
permuteGeneral
.
library(RcppAlgos)
## Initialize the iterator
= comboIter(5, 3)
a
## Get the first combination
$nextIter()
a#> [1] 1 2 3
## And the next
$nextIter()
a#> [1] 1 2 4
## Set the current iterator to a variable
= a$currIter()
iter = 1
i
## Iterate until there are no more
while (!is.null(iter)) {
cat(i, " ", iter, "\n")
= a$nextIter()
iter = i + 1
i
}#> 1 1 2 4
#> 2 1 2 5
#> 3 1 3 4
#> 4 1 3 5
#> 5 1 4 5
#> 6 2 3 4
#> 7 2 3 5
#> 8 2 4 5
#> 9 3 4 5
#> No more results. To see the last result, use the prevIter method(s)
## See the output of comboGeneral for comparison
comboGeneral(5, 3, lower = 2)
#> [,1] [,2] [,3]
#> [1,] 1 2 4
#> [2,] 1 2 5
#> [3,] 1 3 4
#> [4,] 1 3 5
#> [5,] 1 4 5
#> [6,] 2 3 4
#> [7,] 2 3 5
#> [8,] 2 4 5
#> [9,] 3 4 5
## Call the summary method to see information about our iterator
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
Some of the combinatorial iterators in RcppAlgos
are
bidirectional iterators. This means that not only can we iterate in a
forward manner (i.e. lexicographically), but we can also iterate
backwards (i.e. Reverse
Lexicographical Order) via the prevIter
method(s).
## Using the same iterable from the previous section
$currIter()
a#> No more results. To see the last result, use the prevIter method(s)
#>
#> NULL
## As the comment says, we call the prevIter method to see the last result
$prevIter()
a#> [1] 3 4 5
## Get the previous result
$prevIter()
a#> [1] 2 4 5
## As in the previous example, we set the current iterator to a variable
= a$currIter()
iter
## Defined above
print(i)
#> [1] 10
## Iterate until we are at the very beginning. Note that the
## output is exactly the same as above, but in reverse order
while (!is.null(iter)) {
= i - 1
i cat(i, " ", iter, "\n")
= a$prevIter()
iter
}#> 9 2 4 5
#> 8 2 3 5
#> 7 2 3 4
#> 6 1 4 5
#> 5 1 3 5
#> 4 1 3 4
#> 3 1 2 5
#> 2 1 2 4
#> 1 1 2 3
#> Iterator Initialized. To see the first result, use the nextIter method(s)
## Call the summary method to see information about our iterator
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 0
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 10
There are four methods which allow for obtaining more than one result
at a time: nextNIter
, prevNIter
,
nextRemaining
, and prevRemaining
.
## Reset the iterator
$startOver()
a
## Get the next 4 combinations
$nextNIter(4)
a#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 1 2 4
#> [3,] 1 2 5
#> [4,] 1 3 4
## Get the summary. Note that the index has been updated
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 4
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 6
## View the current combination
$currIter()
a#> [1] 1 3 4
## Get the remaining combinations with nextRemaining
$nextRemaining()
a#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 1 4 5
#> [3,] 2 3 4
#> [4,] 2 3 5
#> [5,] 2 4 5
#> [6,] 3 4 5
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
Now, we look at the opposite direction.
## Get the previous 4 combinations
$prevNIter(4)
a#> [,1] [,2] [,3]
#> [1,] 3 4 5
#> [2,] 2 4 5
#> [3,] 2 3 5
#> [4,] 2 3 4
## Get the summary. Note that the index has been updated
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 7
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 3
## View the current combination
$currIter()
a#> [1] 2 3 4
## Get the remaining previous combinations with prevRemaining
$prevRemaining()
a#> [,1] [,2] [,3]
#> [1,] 1 4 5
#> [2,] 1 3 5
#> [3,] 1 3 4
#> [4,] 1 2 5
#> [5,] 1 2 4
#> [6,] 1 2 3
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 0
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 10
As with the bidirectional iterators, with some of the combinatorial
iterators in RcppAlgos
, we can jump to the
nth result without the need for iterating over the
first n - 1 results.
## Reset the iterator
$startOver()
a
## How many total combinations do we have?
$summary()$totalResults
a#> [1] 10
## Let's get the 3rd combination
3]]
a[[#> [1] 1 2 5
## See the summary. Note that the index has been updated
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 3
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 7
## Let's see the 9th combination
9]]
a[[#> [1] 2 4 5
## What about the first and last combination?
$front()
a#> [1] 1 2 3
$back()
a#> [1] 3 4 5
## Again the index has been updated
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 10
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 0
$currIter()
a#> [1] 3 4 5
We can also easily return a random sample of combinations with the
[[
operator by passing a vector of indices. In these cases,
it should be noted that the current index will not be updated.
## Set the current index to the second combination
2]]
a[[#> [1] 1 2 4
set.seed(121)
= sample(a$summary()$totalResults, 4)
samp
samp#> [1] 4 7 10 1
a[[samp]]#> [,1] [,2] [,3]
#> [1,] 1 3 4
#> [2,] 2 3 4
#> [3,] 3 4 5
#> [4,] 1 2 3
## Note that the current index remains unchanged
$summary()
a#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 2
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 8
Just as with comboGeneral
and
permuteGeneral
, we can pass a user defined function to
comboIter
and permuteIter
.
## Initialize the iterator
= permuteIter(LETTERS[1:4], 3, FUN = function(p) paste(p, collapse = ""),
b FUN.VALUE = "a")
$nextIter()
b#> [1] "ABC"
$nextNIter(5)
b#> [1] "ABD" "ACB" "ACD" "ADB" "ADC"
$back()
b#> [1] "DCB"
$prevIter()
b#> [1] "DCA"
$prevNIter(5)
b#> [1] "DBC" "DBA" "DAC" "DAB" "CDB"
$nextRemaining()
b#> [1] "DAB" "DAC" "DBA" "DBC" "DCA" "DCB"
## Random access
5]]
b[[#> [1] "ADB"
$prevRemaining()
b#> [1] "ACD" "ACB" "ABD" "ABC"
## View the source vector
$sourceVector()
b#> [1] "A" "B" "C" "D"
2.5.0
As of version 2.5.0
, we no longer rely on
Rcpp
as a dependency, which means that we do not utilize
Rcpp
modules for exposing C++ classes. This is now carried
out using external pointers (See External
pointers and weak references) along with S4 Classes. We use the slots
of S4
classes for exposing each method so access is carried
out with the “at sign”, @
. We have also added the ability
to access each method with the “dollar sign”, $
, for
backwards compatibility.
2.5.0
Our tests show that accessing methods is much more efficient in
2.5.0
compared to prior versions. In the below tests, we
measure excecution time of calling nextIter
multiple times
in different versions. We will use the function
test_nextIter
for our testing. If one needs to reproduce,
simply download the 2.4.3
tar here: https://cran.r-project.org/src/contrib/Archive/RcppAlgos/,
change RcppAlgos
to RcppAlgos243
in a few
place (e.g. DESCRIPTION
, NAMESPACE
, etc.), and
rebuild.
<- function(n, m, get_val = FALSE, v = 243) {
test_nextIter <- if (v == 243) {
a ::comboIter(n, m)
RcppAlgos243else {
} comboIter(n, m)
}
<- comboCount(n, m)
total
if (get_val) {
<- matrix(0L, nrow = total, ncol = m)
mat for (i in 1:total) mat[i, ] <- a$nextIter()
return(mat)
else {
} if (v == 243) {
for (i in 1:total) a$nextIter()
else {
} for (i in 1:total) a@nextIter()
}
invisible(NULL)
} }
2.4.3
Using Rcpp
library(microbenchmark)
## Using R version 4.1.3
comboCount(15, 8)
#> [1] 6435
microbenchmark(test_nextIter(15, 8))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> test_nextIter(15, 8) 46.31869 51.43212 57.28727 51.93104 52.92773 516.2674 100
identical(test_nextIter(15, 8, get_val = TRUE), comboGeneral(15, 8))
#> [1] TRUE
comboCount(25, 10)
#> [1] 3268760
system.time(test_nextIter(25, 10))
#> user system elapsed
#> 23.537 0.070 23.614
Rprof("Version243.out", memory.profiling = TRUE)
test_nextIter(25, 10)
Rprof(NULL)
summaryRprof("Version243.out", memory = "both")
#> $by.self
#> self.time self.pct total.time total.pct mem.total
#> "as.environment" 6.98 32.37 6.98 32.37 4045.5
#> "$" 6.28 29.13 16.40 76.07 9433.6
#> "test_nextIter" 2.36 10.95 21.56 100.00 12438.3
#> ".External" 2.02 9.37 2.02 9.37 1172.9
#> "exists" 1.50 6.96 1.50 6.96 823.7
#> "get" 1.50 6.96 1.50 6.96 856.2
#> "a$nextIter" 0.78 3.62 2.80 12.99 1620.6
#> "is.symbol" 0.14 0.65 0.14 0.65 72.5
#>
#> $by.total
#> total.time total.pct mem.total self.time self.pct
#> "test_nextIter" 21.56 100.00 12438.3 2.36 10.95
#> "$" 16.40 76.07 9433.6 6.28 29.13
#> "as.environment" 6.98 32.37 4045.5 6.98 32.37
#> "a$nextIter" 2.80 12.99 1620.6 0.78 3.62
#> ".External" 2.02 9.37 1172.9 2.02 9.37
#> "exists" 1.50 6.96 823.7 1.50 6.96
#> "get" 1.50 6.96 856.2 1.50 6.96
#> "is.symbol" 0.14 0.65 72.5 0.14 0.65
#>
#> $sample.interval
#> [1] 0.02
#>
#> $sampling.time
#> [1] 21.56
2.5.0
(No Rcpp
)microbenchmark(test_nextIter(15, 8, v = 250))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> test_nextIter(15, 8, v = 251) 3.872621 4.105784 4.464846 4.376151 4.647602 9.961372 100
system.time(test_nextIter(25, 10, v = 250))
#> user system elapsed
#> 2.093 0.011 2.110
identical(test_nextIter(15, 8, get_val = TRUE, v = 250), comboGeneral(15, 8))
#> [1] TRUE
Rprof("Version250.out", memory.profiling = TRUE)
test_nextIter(25, 10, v = 250)
Rprof(NULL)
summaryRprof("Version250.out", memory = "both")
#> $by.self
#> self.time self.pct total.time total.pct mem.total
#> "<Anonymous>" 1.10 55.56 1.66 83.84 701.2
#> ".Call" 0.56 28.28 0.56 28.28 255.6
#> "test_nextIter" 0.32 16.16 1.98 100.00 831.8
#>
#> $by.total
#> total.time total.pct mem.total self.time self.pct
#> "test_nextIter" 1.98 100.00 831.8 0.32 16.16
#> "<Anonymous>" 1.66 83.84 701.2 1.10 55.56
#> ".Call" 0.56 28.28 255.6 0.56 28.28
#>
#> $sample.interval
#> [1] 0.02
#>
#> $sampling.time
#> [1] 1.98
It appears that memory is the issue in previous versions. Indeed, if
we look at Memory
statistics from Rprof, and view both files with
memory = "stats"
we see that the C funciton,
duplicate
, appears to be the main culprit.
### Verison 2.4.3
summaryRprof("Version243.out", memory = "stats")
#> index: "test_nextIter"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes max.nodes duplications tot.duplications samples
#> 594132 981272 0 0 11705787 19332264 9200 1085637 118
#> --------------------------------------------------------------------------------------------------------------------------------------------------------
#> index: "test_nextIter":"$"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes max.nodes duplications tot.duplications samples
#> 602919 16615120 19309 15833656 11636867 128195480 9069 7436251 820
#> --------------------------------------------------------------------------------------------------------------------------------------------------------
#> index: "test_nextIter":"a$nextIter"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes max.nodes duplications tot.duplications samples
#> 586247 980856 0 0 11551771 19329128 9169 1283636 140
## Version 2.5.0
summaryRprof("Version250.out", memory = "stats")
#> index: "test_nextIter"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes max.nodes duplications tot.duplications samples
#> 1901132 3243968 0 0 6653888 11353832 0 0 16
#> --------------------------------------------------------------------------------------------------------------------------------------------------------
#> index: "test_nextIter":"<Anonymous>"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes max.nodes duplications tot.duplications samples
#> 2412388 36833808 196363 16298112 9270795 197594096 0 7 83
With verison 2.5.0
there are only 7
tot.duplications
whereas with version 2.4.3
there are millions of tot.duplications
. In fact, there are
a total of 1085637 + 7436251 + 1283636 = 9,805,524
duplications with version 2.4.3
. This together with
comboCount(25, 10) = 3,268,760
implies that the C funciton,
duplicate
, is called about 3 times per iteration with older
versions (i.e. 9805524 / 3268760 ~= 2.999769
).
For most partition cases, we have all of the capabilities of the
standard comboIter
and permuteIter
except for
bidirectionality (i.e. the prevIter
methods). For cases
involving standard multisets we also don’t have random access
methods.
## Similar illustration of comboIter(5, 3) at the top
= partitionsIter(16, 4)
p @nextIter()
p#> [1] 1 2 3 10
@nextIter()
p#> [1] 1 2 4 9
= p@currIter()
iter = 1
i
while (!is.null(iter)) {
cat(i, " ", iter, "\n")
= p@nextIter()
iter = i + 1
i
}#> 1 1 2 4 9
#> 2 1 2 5 8
#> 3 1 2 6 7
#> 4 1 3 4 8
#> 5 1 3 5 7
#> 6 1 4 5 6
#> 7 2 3 4 7
#> 8 2 3 5 6
#> No more results.
partitionsGeneral(16, 4, lower = 2)
#> [,1] [,2] [,3] [,4]
#> [1,] 1 2 4 9
#> [2,] 1 2 5 8
#> [3,] 1 2 6 7
#> [4,] 1 3 4 8
#> [5,] 1 3 5 7
#> [6,] 1 4 5 6
#> [7,] 2 3 4 7
#> [8,] 2 3 5 6
@summary()
p#> $description
#> [1] "Partitions of 16 into 4 parts"
#>
#> $currentIndex
#> [1] 10
#>
#> $totalResults
#> [1] 9
#>
#> $totalRemaining
#> [1] -1
## Using random access
7]]
p[[#> [1] 1 4 5 6
## No previous iterators
@prevIter()
p#> Error: no slot of name "prevIter" for this object of class "Partitions"
Now, the combinatorial iterators have all of the features of their
“general” analogs (I.e. {combo|permute|partitions}General
),
which includes constrained results.
For general constrained cases, these iterators offer huge advantages
over their “general” counterparts. Previously, one had to guess how many
results there would be using the upper
parameter as
executing the function with no constraints meant the user could be
waiting for a while or consume a large amount of resources.
Another drawback is that it difficult to start generating from a
particular point. With the “general” functions, if the
lower
parameter is used, we have to make a decision in
order to disambiguate the use. Without constraints, using
lower
is easy to understand. It simply means to start
generating results starting at a particular lexicographical result,
which we can do efficiently (i.e. no need to generate the first
lower - 1
results). With constraints, it could mean one of
two things:
In RcppAlgos
we have always used the first
interpretation. A big downside for the second point is that we don’t
have any fast algorithms for enumerating the total number of results,
which reduces determining the nth result to a brute
force approach.
With iterators, we can generate n results with
nextNIter(n)
or calling nextIter()
n
times (or some combination of the two). Then, if we want to continue
iterating, we pick up where we left off fetching the (n +
1)th result and beyond (if there are any results left).
This allows us to keep memory low without sacrificing our current
state.
set.seed(55)
= runif(10, -5, 5)
s
print(s)
#> [1] 0.478135161 -2.818403214 -4.650360052 2.915492940
#> [5] 0.602420762 -4.257748260 -3.684770642 -2.058761222
#> [9] 0.007612633 -4.116755421
## Using comboGeneral to retrieve all results
comboGeneral(s, 5, constraintFun = "mean",
comparisonFun = "<", limitConstraints = -3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.650360 -4.257748 -4.116755 -3.684771 -2.818403214
#> [2,] -4.650360 -4.257748 -4.116755 -3.684771 -2.058761222
#> [3,] -4.650360 -4.257748 -4.116755 -3.684771 0.007612633
#> [4,] -4.650360 -4.257748 -4.116755 -3.684771 0.478135161
#> [5,] -4.650360 -4.257748 -4.116755 -3.684771 0.602420762
#> [6,] -4.650360 -4.257748 -4.116755 -2.818403 -2.058761222
#> [7,] -4.650360 -4.257748 -4.116755 -2.818403 0.007612633
#> [8,] -4.650360 -4.257748 -4.116755 -2.818403 0.478135161
#> [9,] -4.650360 -4.257748 -4.116755 -2.818403 0.602420762
#> [10,] -4.650360 -4.257748 -4.116755 -2.058761 0.007612633
#> [11,] -4.650360 -4.257748 -3.684771 -2.818403 -2.058761222
#> [12,] -4.650360 -4.257748 -3.684771 -2.818403 0.007612633
#> [13,] -4.650360 -4.116755 -3.684771 -2.818403 -2.058761222
#> [14,] -4.650360 -4.116755 -3.684771 -2.818403 0.007612633
#> [15,] -4.257748 -4.116755 -3.684771 -2.818403 -2.058761222
## Using comboIter
= comboIter(s, 5, constraintFun = "mean",
a comparisonFun = "<", limitConstraints = -3)
## See the first result
@nextIter()
a#> [1] -4.650360 -4.257748 -4.116755 -3.684771 -2.818403
## Get the next three
@nextNIter(3)
a#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.65036 -4.257748 -4.116755 -3.684771 -2.058761222
#> [2,] -4.65036 -4.257748 -4.116755 -3.684771 0.007612633
#> [3,] -4.65036 -4.257748 -4.116755 -3.684771 0.478135161
## See the summary... Note the totalResults and totalRemaining
## fields are NA as we are not able to calculate this upfront.
@summary()
a#> $description
#> [1] "Combinations of 10 choose 5 where the mean is < -3"
#>
#> $currentIndex
#> [1] 4
#>
#> $totalResults
#> [1] NA
#>
#> $totalRemaining
#> [1] NA
@nextNIter(3)
a#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.65036 -4.257748 -4.116755 -3.684771 0.602420762
#> [2,] -4.65036 -4.257748 -4.116755 -2.818403 -2.058761222
#> [3,] -4.65036 -4.257748 -4.116755 -2.818403 0.007612633
## Get the rest
@nextRemaining()
a#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.650360 -4.257748 -4.116755 -2.818403 0.478135161
#> [2,] -4.650360 -4.257748 -4.116755 -2.818403 0.602420762
#> [3,] -4.650360 -4.257748 -4.116755 -2.058761 0.007612633
#> [4,] -4.650360 -4.257748 -3.684771 -2.818403 -2.058761222
#> [5,] -4.650360 -4.257748 -3.684771 -2.818403 0.007612633
#> [6,] -4.650360 -4.116755 -3.684771 -2.818403 -2.058761222
#> [7,] -4.650360 -4.116755 -3.684771 -2.818403 0.007612633
#> [8,] -4.257748 -4.116755 -3.684771 -2.818403 -2.058761222
They are very efficient as well. Consider the example below where we
use comboGeneral
to generate all results without capping
the output. Again, we are in a situation where we don’t know a
priori how many results we will obtain.
set.seed(77)
= runif(50, 20, 100)
s
## Over one trillion results to sift through
comboCount(s, 15)
#> [1] 2.25083e+12
system.time({
print(
nrow(
comboGeneral(s, 15,
constraintFun = "mean",
comparisonFun = ">",
limitConstraints = 83)
)
)
})#> [1] 38935252
#> user system elapsed
#> 6.140 4.446 11.208
## Over 4 GBs of results
38935252 * 15 * 8) / 2^30
(#> [1] 4.351353
Just over 11 seconds isn’t bad, however 4 GBs could put a strain on your computer.
Let’s use iterators instead and only generate ten thousand at a time to keep memory low. We should mention here that the iterators are “smart” in that there is no fear in requesting more results than what is actually left. For example, if in the problem above, we had iterated to the 38th million result and requested 10 million more, we would only obtain 935,252 results.
system.time({
= comboIter(s, 15,
a constraintFun = "mean",
comparisonFun = ">",
limitConstraints = 83)
while (!is.null(a@nextNIter(1e4))) {}
print(a@summary())
})#> No more results.
#>
#> $description
#> [1] "Combinations of 50 choose 15 where the mean is > 83"
#>
#> $currentIndex
#> [1] 38935252
#>
#> $totalResults
#> [1] NA
#>
#> $totalRemaining
#> [1] NA
#>
#> user system elapsed
#> 2.852 1.051 3.907
## Only 11 MBs per iteration
1e4 * 15 * 8) / 2^20
(#> [1] 1.144409
Wow! Using the iterator approach is almost 3 times faster
(11.208 / 3.907 ~= 2.869
)! Our gains came strictly from
memory efficiency (From over 4 GBs to just over 1 MB) as the underlying
algorithm is exactly the same.