tidytable
?tidyverse
-like syntaxdata.table
and the tidyverse
’s
vctrs
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("devtools")
::install_github("markfairbanks/tidytable") devtools
tidytable
uses verb.()
syntax to replicate
tidyverse
functions:
library(tidytable)
<- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df
%>%
df select.(x, y, z) %>%
filter.(x < 4, y > 1) %>%
arrange.(x, y) %>%
mutate.(double_x = x * 2,
x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
A full list of functions can be found here.
Group by calls are done by using the .by
argument of any
function that has “by group” functionality.
.by = z
.by = c(y, z)
%>%
df summarize.(avg_x = mean(x),
count = n(),
.by = z)
#> # A tidytable: 2 × 3
#> z avg_x count
#> <chr> <dbl> <int>
#> 1 a 1.5 2
#> 2 b 3 1
.by
vs. group_by()
tidytable
follows data.table
semantics
where .by
must be called each time you want a function to
operate “by group”.
Below is some example tidytable
code that utilizes
.by
that we’ll then compare to its dplyr
equivalent. The goal is to grab the first two rows of each group using
slice.()
, then add a group row number column using
mutate.()
:
library(tidytable)
<- data.table(x = c("a", "a", "a", "b", "b"))
df
%>%
df slice.(1:2, .by = x) %>%
mutate.(group_row_num = row_number(), .by = x)
#> # A tidytable: 4 × 2
#> x group_row_num
#> <chr> <int>
#> 1 a 1
#> 2 a 2
#> 3 b 1
#> 4 b 2
Note how .by
is called in both slice.()
and
mutate.()
.
Compared to a dplyr
pipe chain that utilizes
group_by()
, where each function operates “by group” until
ungroup()
is called:
library(dplyr)
<- tibble(x = c("a", "a", "a", "b", "b"))
df
%>%
df group_by(x) %>%
slice(1:2) %>%
mutate(group_row_num = row_number()) %>%
ungroup()
#> # A tibble: 4 × 2
#> x group_row_num
#> <chr> <int>
#> 1 a 1
#> 2 a 2
#> 3 b 1
#> 4 b 2
Note that the ungroup()
call is unnecessary in
tidytable
.
tidytable
allows you to select/drop columns just like
you would in the tidyverse by utilizing the tidyselect
package
in the background.
Normal selection can be mixed with all tidyselect
helpers: everything()
, starts_with()
,
ends_with()
, any_of()
, where()
,
etc.
<- data.table(
df a = 1:3,
b1 = 4:6,
b2 = 7:9,
c = c("a", "a", "b")
)
%>%
df select.(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
To drop columns use a -
sign:
%>%
df select.(-a, -starts_with("b"))
#> # A tidytable: 3 × 1
#> c
#> <chr>
#> 1 a
#> 2 a
#> 3 b
These same ideas can be used whenever selecting columns in
tidytable
functions - for example when using
count.()
, drop_na.()
, across.()
,
pivot_longer.()
, etc.
A full overview of selection options can be found here.
.by
tidyselect
helpers also work when using
.by
:
<- data.table(
df a = 1:3,
b = c("a", "a", "b"),
c = c("a", "a", "b")
)
%>%
df summarize.(avg_a = mean(a), .by = where(is.character))
#> # A tidytable: 2 × 3
#> b c avg_a
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Tidy evaluation can be used to write custom functions with
tidytable
functions. The embracing shortcut
{{ }}
works, or you can use enquo()
with
!!
if you prefer:
<- data.table(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))
df
<- function(data, add_col) {
add_one %>%
data mutate.(new_col = {{ add_col }} + 1)
}
%>%
df add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 1 a 2
#> 2 1 1 a 2
#> 3 1 1 b 2
The .data
and .env
pronouns also work
within tidytable
functions:
<- 10
var
%>%
df mutate.(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 1 a 11
#> 2 1 1 a 11
#> 3 1 1 b 11
A full overview of tidy evaluation can be found here.
dt()
helperThe dt()
function makes regular data.table
syntax pipeable, so you can easily mix tidytable
syntax
with data.table
syntax:
<- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df
%>%
df dt(, .(x, y, z)) %>%
dt(x < 4 & y > 1) %>%
dt(order(x, y)) %>%
dt(, double_x := x * 2) %>%
dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
For those interested in performance, speed comparisons can be found here.