On the surface, counting sounds simple: you just want to know how many times something occurs. Unfortunately, it’s not that easy. In clinical reports, a good deal of nuance goes into the different types of frequency tables that need to be created. Fortunately, we’ve built a good bit of flexibility into group_count() to help you get what you need, whether you’re creating a demographics table, an adverse events table, or a lab results table.
Let’s start with a basic example. This table demonstrates the distribution of subject disposition across treatment groups. Additionally, we’re sorting by descending total occurrences using the “Total” group.
library(Tplyr)
library(dplyr)
library(knitr)

# adsl is the ADaM subject-level dataset, assumed to be loaded already
t <- tplyr_table(adsl, TRT01P, where = SAFFL == "Y") %>%
  add_total_group() %>%
  add_treat_grps(Treated = c("Xanomeline Low Dose", "Xanomeline High Dose")) %>%
  add_layer(
    group_count(DCDECOD) %>%
      # Order rows by count, taken from the "Total" column
      set_order_count_method("bycount") %>%
      set_ordering_cols(Total)
  ) %>%
  build() %>%
  arrange(desc(ord_layer_1)) %>%
  select(starts_with("row"), var1_Placebo, `var1_Xanomeline Low Dose`,
         `var1_Xanomeline High Dose`, var1_Treated, var1_Total)

kable(t)
row_label1 | var1_Placebo | var1_Xanomeline Low Dose | var1_Xanomeline High Dose | var1_Treated | var1_Total |
---|---|---|---|---|---|
COMPLETED | 58 ( 67.4%) | 25 ( 29.8%) | 27 ( 32.1%) | 52 ( 31.0%) | 110 ( 43.3%) |
ADVERSE EVENT | 8 ( 9.3%) | 44 ( 52.4%) | 40 ( 47.6%) | 84 ( 50.0%) | 92 ( 36.2%) |
WITHDRAWAL BY SUBJECT | 9 ( 10.5%) | 10 ( 11.9%) | 8 ( 9.5%) | 18 ( 10.7%) | 27 ( 10.6%) |
STUDY TERMINATED BY SPONSOR | 2 ( 2.3%) | 2 ( 2.4%) | 3 ( 3.6%) | 5 ( 3.0%) | 7 ( 2.8%) |
PROTOCOL VIOLATION | 2 ( 2.3%) | 1 ( 1.2%) | 3 ( 3.6%) | 4 ( 2.4%) | 6 ( 2.4%) |
LACK OF EFFICACY | 3 ( 3.5%) | 0 ( 0.0%) | 1 ( 1.2%) | 1 ( 0.6%) | 4 ( 1.6%) |
DEATH | 2 ( 2.3%) | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 0.6%) | 3 ( 1.2%) |
PHYSICIAN DECISION | 1 ( 1.2%) | 0 ( 0.0%) | 2 ( 2.4%) | 2 ( 1.2%) | 3 ( 1.2%) |
LOST TO FOLLOW-UP | 1 ( 1.2%) | 1 ( 1.2%) | 0 ( 0.0%) | 1 ( 0.6%) | 2 ( 0.8%) |
Another exceptionally important consideration within count layers is whether you should be using distinct counts, non-distinct counts, or some combination of both. Adverse event tables are a perfect example. Often, you’re concerned with how many subjects had a particular adverse event rather than just the number of occurrences of that adverse event. Similarly, the number of occurrences of an event isn’t necessarily meaningful when compared to the total number of adverse events that occurred. For this reason, what you likely want instead is the number of subjects who experienced an event compared to the total number of subjects in that treatment group.
‘Tplyr’ allows you to focus on these distinct counts and distinct percents within some grouping variable, like subject. You can also mix and match distinct counts with non-distinct counts in the same row. The set_distinct_by() function sets the variables within which occurrences are counted distinctly; with USUBJID, for example, each subject contributes at most once to the distinct count for a given value.
t <- tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_distinct_by(USUBJID) %>%
      set_format_strings(f_str("xxx (xx.xx%) [xxx]", distinct_n, distinct_pct, n))
  ) %>%
  build() %>%
  head()

kable(t)
row_label1 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 |
---|---|---|---|---|---|
ACTINIC KERATOSIS | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 1] | 0 ( 0.00%) [ 0] | 1 | 1 |
ALOPECIA | 1 ( 4.76%) [ 1] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 2 |
BLISTER | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 2] | 5 (11.90%) [ 8] | 1 | 3 |
COLD SWEAT | 1 ( 4.76%) [ 3] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 4 |
DERMATITIS ATOPIC | 1 ( 4.76%) [ 1] | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 | 5 |
DERMATITIS CONTACT | 0 ( 0.00%) [ 0] | 0 ( 0.00%) [ 0] | 1 ( 2.38%) [ 2] | 1 | 6 |
You may have seen tables like the one above before. This display shows the number of subjects who experienced an adverse event, the percentage of subjects within the given treatment group who experienced that event, and then the total number of occurrences of that event. Using set_distinct_by() triggered the derivation of distinct_n and distinct_pct in addition to the n and pct created within group_count(). The display of the values is then controlled by the f_str() call in set_format_strings().
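To make the difference between n, distinct_n, and distinct_pct concrete, here is a rough dplyr sketch on a made-up data frame. It is not how ‘Tplyr’ computes these values internally; the toy data, the denoms helper, and the choice of denominator (distinct subjects that appear in each treatment group of the toy data) are all assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: one row per adverse event occurrence
ae <- data.frame(
  USUBJID = c("01", "01", "02", "03", "04", "04"),
  TRTA    = c("Placebo", "Placebo", "Placebo", "Drug", "Drug", "Drug"),
  AEDECOD = c("BLISTER", "BLISTER", "ALOPECIA", "BLISTER", "BLISTER", "BLISTER"),
  stringsAsFactors = FALSE
)

# Assumed denominator: distinct subjects per treatment group in the toy data
denoms <- ae %>%
  group_by(TRTA) %>%
  summarise(n_subj = n_distinct(USUBJID), .groups = "drop")

counts <- ae %>%
  group_by(TRTA, AEDECOD) %>%
  summarise(
    n          = n(),                  # non-distinct: every occurrence
    distinct_n = n_distinct(USUBJID),  # distinct: each subject once
    .groups    = "drop"
  ) %>%
  left_join(denoms, by = "TRTA") %>%
  mutate(distinct_pct = 100 * distinct_n / n_subj)

print(counts)
```

Subject 01 has two BLISTER events in the Placebo group, so that row has n equal to 2 but distinct_n equal to 1, which is the gap between the bracketed value and the leading count in the table above.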
Certain summary tables present counts within groups. One example is a disposition table where a disposition reason of “Other” is broken down into what those other reasons were. A very common example is an adverse event table that displays counts for body systems, and then the events within those body systems. This is again a nuanced situation, because two variables are being summarized: the body system counts and the adverse event counts.
One way to approach this would be to create two summaries, one for the body system and another for the preferred terms by body system, and then merge the two together. But we don’t want you to have to do that. Instead, we handle this complexity for you. This is done in group_count() by submitting two target variables with dplyr::vars(). The first variable should be the grouping variable that you want summarized, which we refer to as the “Outside” variable, and the second should have the narrower scope, which we call the “Inside” variable.
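For comparison, the manual two-summary approach described above might look roughly like the dplyr sketch below. The toy data and the names by_bodsys, by_term, and is_outer are made up for illustration, and this is only a sketch of the idea, not what ‘Tplyr’ does under the hood.

```r
library(dplyr)

# Hypothetical toy data: one row per adverse event occurrence
ae <- data.frame(
  TRTA     = c("Placebo", "Placebo", "Drug", "Drug", "Drug"),
  AEBODSYS = c("SKIN", "SKIN", "SKIN", "CARDIAC", "CARDIAC"),
  AEDECOD  = c("BLISTER", "ALOPECIA", "BLISTER", "TACHYCARDIA", "TACHYCARDIA"),
  stringsAsFactors = FALSE
)

# Summary 1: the "Outside" variable (body system) on its own.
# The inner label repeats the body system so both summaries share columns.
by_bodsys <- ae %>%
  count(TRTA, AEBODSYS) %>%
  mutate(AEDECOD = AEBODSYS, is_outer = TRUE)

# Summary 2: the "Inside" variable (preferred term) within body system
by_term <- ae %>%
  count(TRTA, AEBODSYS, AEDECOD) %>%
  mutate(is_outer = FALSE)

# Stack the two, keeping each body system row above its terms
nested <- bind_rows(by_bodsys, by_term) %>%
  arrange(TRTA, AEBODSYS, desc(is_outer), AEDECOD)

print(nested)
```

Even in this small form there are two summaries, a label column to reconcile, and a sort flag to carry, which is the bookkeeping that passing two target variables to group_count() is meant to take off your hands.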
The example below demonstrates how to do a nested summary. Look at the first row: here row_label1 and row_label2 are both “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”. This line is the summary for AEBODSYS. In the rows below that, row_label1 continues on with the value “SKIN AND SUBCUTANEOUS TISSUE DISORDERS”, but row_label2 changes. These are the summaries for AEDECOD.
tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD))
  ) %>%
  build() %>%
  head() %>%
  kable()
row_label1 | row_label2 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
---|---|---|---|---|---|---|---|
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 47 (100.0%) | 111 (100.0%) | 118 (100.0%) | 1 | 1 | Inf |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | ACTINIC KERATOSIS | 0 ( 0.0%) | 1 ( 0.9%) | 0 ( 0.0%) | 1 | 1 | 1 |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | ALOPECIA | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 2 |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | BLISTER | 0 ( 0.0%) | 2 ( 1.8%) | 8 ( 6.8%) | 1 | 1 | 3 |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | COLD SWEAT | 3 ( 6.4%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 4 |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | DERMATITIS ATOPIC | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 5 |
This accomplishes what we needed, but it’s not exactly the presentation you might hope for. We have a solution for this as well.
tplyr_table(adae, TRTA) %>%
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>%
      set_nest_count(TRUE) %>%
      set_indentation("--->")
  ) %>%
  build() %>%
  head() %>%
  kable()
row_label1 | var1_Placebo | var1_Xanomeline High Dose | var1_Xanomeline Low Dose | ord_layer_index | ord_layer_1 | ord_layer_2 |
---|---|---|---|---|---|---|
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 47 (100.0%) | 111 (100.0%) | 118 (100.0%) | 1 | 1 | Inf |
--->ACTINIC KERATOSIS | 0 ( 0.0%) | 1 ( 0.9%) | 0 ( 0.0%) | 1 | 1 | 1 |
--->ALOPECIA | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 2 |
--->BLISTER | 0 ( 0.0%) | 2 ( 1.8%) | 8 ( 6.8%) | 1 | 1 | 3 |
--->COLD SWEAT | 3 ( 6.4%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 4 |
--->DERMATITIS ATOPIC | 1 ( 2.1%) | 0 ( 0.0%) | 0 ( 0.0%) | 1 | 1 | 5 |
Using set_nest_count() triggers ‘Tplyr’ to drop row_label1 and indent all of the AEDECOD values within row_label2. The columns are renamed appropriately as well. The default indentation is 3 spaces, but as you can see here, you can set the indentation however you like. This lets you use tab strings for different language-specific output types, stick with spaces, indent wider or narrower, whatever you wish. All of the existing order variables remain, so this has no impact on your ability to sort the table.
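As a rough picture of what the collapsed label column amounts to, the base R sketch below builds a single indented label from two label columns. It is only an illustration of the idea and not the ‘Tplyr’ implementation; the labels data frame and the indent and row_label names are made up.

```r
# Hypothetical two-column labels, shaped like the un-nested output above
labels <- data.frame(
  row_label1 = c("SKIN", "SKIN", "SKIN"),
  row_label2 = c("SKIN", "BLISTER", "ALOPECIA"),
  stringsAsFactors = FALSE
)

indent <- "   "  # three spaces; swap in "\t" or any other string

# Summary rows are the ones where both labels match
is_summary <- labels$row_label1 == labels$row_label2

# Keep summary labels as-is and prefix the inner labels with the indent
row_label <- ifelse(is_summary,
                    labels$row_label1,
                    paste0(indent, labels$row_label2))

print(row_label)
```

Because the indentation is just a string prefix, switching to a tab or a wider run of spaces only changes the indent value.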
There’s a lot more to counting! So be sure to check out our vignettes on sorting, shift tables, and denominators.