seplyr is an R package that makes it easy to program over dplyr 0.7.*+ without needing to directly use rlang notation.
seplyr
seplyr is a dplyr adapter layer that prefers “slightly clunkier” standard interfaces (or referentially transparent interfaces), which are actually very powerful and can be used to some advantage.
The above description and comparisons can come off as needlessly broad and painfully abstract. Things are much clearer if we move away from theory and return to our example.
Let’s translate the above example into a re-usable function in small (easy) stages. First translate the interactive script from dplyr notation into seplyr notation. This step is a pure re-factoring: we are changing the code without changing its observable external behavior.
The translation is mechanical in that it is mostly using seplyr documentation as a lookup table. What you have to do is:
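- Change dplyr verbs to their matching seplyr “*_se()” adapters.
- Add quote marks around names and expressions.
- Convert sequences of expressions (such as in the summarize()) to explicit vectors by adding the “c()” notation.
- Replace “=” in expressions with “:=”.

Our converted code looks like the following.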
library("dplyr")
library("seplyr")
starwars %>%
  group_by_se("homeworld") %>%
  summarize_se(c("mean_height" := "mean(height, na.rm = TRUE)",
                 "mean_mass" := "mean(mass, na.rm = TRUE)",
                 "count" := "n()"))
## # A tibble: 49 x 4
## homeworld mean_height mean_mass count
## <chr> <dbl> <dbl> <int>
## 1 Alderaan 176. 64 3
## 2 Aleen Minor 79 15 1
## 3 Bespin 175 79 1
## 4 Bestine IV 180 110 1
## 5 Cato Neimoidia 191 90 1
## 6 Cerea 198 82 1
## 7 Champala 196 NaN 1
## 8 Chandrila 150 NaN 1
## 9 Concord Dawn 183 79 1
## 10 Corellia 175 78.5 2
## # … with 39 more rows
This code works the same as the original dplyr code. Also the translation could be performed by following the small set of explicit re-coding rules that we gave above.
Obviously, at this point all we have done is make the code a bit less pleasant looking. We have yet to see any benefit from this conversion (though we can turn this on its head and say that all the original dplyr notation saves us is having to write a few quote marks).
The benefit is: this new code can very easily be parameterized and wrapped in a re-usable function. In fact it is now simpler to do than to describe.
grouped_mean <- function(data,
                         grouping_variables,
                         value_variables,
                         count_name = "count") {
  # names of the result columns, one per value variable
  result_names <- paste0("mean_",
                         value_variables)
  # the expressions to evaluate, as strings
  expressions <- paste0("mean(",
                        value_variables,
                        ", na.rm = TRUE)")
  # pair each result name with its expression
  calculation <- result_names := expressions
  data %>%
    group_by_se(grouping_variables) %>%
    summarize_se(c(calculation,
                   count_name := "n()")) %>%
    ungroup()
}

starwars %>%
  grouped_mean(grouping_variables = c("eye_color", "skin_color"),
               value_variables = c("mass", "birth_year"))
## # A tibble: 53 x 5
## eye_color skin_color mean_mass mean_birth_year count
## <chr> <chr> <dbl> <dbl> <int>
## 1 black green 80.5 44 2
## 2 black grey 78.7 NaN 4
## 3 black none NaN NaN 1
## 4 black orange 80 22 1
## 5 black red, blue, white 57 NaN 1
## 6 black white, blue NaN NaN 1
## 7 blue blue NaN NaN 1
## 8 blue brown 136 NaN 1
## 9 blue dark 50 NaN 1
## 10 blue fair 90 62.6 10
## # … with 43 more rows
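Because the column names are now ordinary character values, the same function should apply unchanged to other data frames. The following is a minimal sketch, assuming dplyr and seplyr are installed and that seplyr supplies the ":=" operator as in the code above. The built-in mtcars data and the choice of columns are illustrative and not part of the original example. The function is repeated only so the block runs on its own.

```r
library("dplyr")
library("seplyr")

# Same function as above, repeated so this block is self-contained.
grouped_mean <- function(data,
                         grouping_variables,
                         value_variables,
                         count_name = "count") {
  result_names <- paste0("mean_", value_variables)
  expressions <- paste0("mean(", value_variables, ", na.rm = TRUE)")
  calculation <- result_names := expressions
  data %>%
    group_by_se(grouping_variables) %>%
    summarize_se(c(calculation,
                   count_name := "n()")) %>%
    ungroup()
}

# Illustrative call on a different data set: one row per distinct cyl value,
# with columns cyl, mean_mpg, mean_hp, and count.
result <- grouped_mean(mtcars,
                       grouping_variables = "cyl",
                       value_variables = c("mpg", "hp"))
print(result)
```

Only the character arguments changed between this call and the starwars call; the body of the function did not.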
We have translated our original interactive or ad-hoc calculation into a parameterized reusable function in two easy steps: translating the dplyr code into seplyr notation, then replacing the fixed values with variables and wrapping the result in a function.
To be sure: there are some clunky details of using paste0() to build up the expressions, but the conversion process is very regular and easy. In seplyr, parametric programming is intentionally easy (just replace values with variables).
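To make the expression-building step concrete, here is a small sketch in base R of the kind of value the function hands to summarize_se(): a named character vector whose names are the result columns and whose values are the expressions as strings. It uses stats::setNames() as a stand-in for seplyr's ":=" so that it runs without any packages. That substitution is an assumption made for illustration, and the variable full_spec is a hypothetical name, not something from the original function.

```r
# Inputs as they would arrive in grouped_mean().
value_variables <- c("mass", "birth_year")
count_name <- "count"

# Build result names and expression strings by pasting text together.
result_names <- paste0("mean_", value_variables)
expressions <- paste0("mean(", value_variables, ", na.rm = TRUE)")

# Stand-in for `result_names := expressions`: a named character vector.
calculation <- stats::setNames(expressions, result_names)

# Append the count term, again as name = expression-string.
full_spec <- c(calculation, stats::setNames("n()", count_name))

print(full_spec)
```

Since the whole specification is plain data, it can be printed, inspected, or assembled in a loop before any summarizing is done.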
The seplyr methodology is simple, easy to teach, and powerful.
There are alternatives that differ in philosophy.
- rlang/tidyeval is often used to build new non-standard interfaces on top of existing non-standard interfaces. This is needed because non-standard interfaces do not naturally compose. The tidy/non-tidy analogy is: it works around a mess by introducing a new mess.
- wrapr::let() converts non-standard interfaces into standard referentially transparent or value-oriented interfaces. It tries to help clean up messes.
- seplyr is a demonstration of a possible variation of dplyr where more of the interfaces expect values. It tries to cut down on the initial mess, which in turn cuts down on the need for tools and training in dealing with messes.

The seplyr package contains a number of worked examples in both its help() and vignette(package='seplyr') documentation.