dtrackr - Configuration example

Global configuration

Most of dtrackr's behaviour can be specified at the level of an individual call, using the .headline and .messages glue specifications to define a format. However, repeating this for every stage of a flow chart is tedious, and a global configuration is often preferable.

Naming conventions for groups

One area where the default behaviour may be undesirable is the naming of groups. By default the group name {.group} and the group value {.value} are joined with a colon, and multiple groupings are separated by a semicolon, as demonstrated below:

# packages used by all the examples on this page
library(dplyr)
library(dtrackr)

# these are the defaults
old = options(
  dtrackr.strata_glue="{.group}:{.value}",
  dtrackr.strata_sep="; "
)

dtrackr::ILPD %>%
  track() %>%
  group_by(Case_or_Control) %>%
  comment() %>%
  group_by(Gender,.add = TRUE) %>%
  comment(
    .messages = c(
    "{.count} patients",
    "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()
#> [flowchart, top to bottom]
#> 583 items
#> stratify by Case_or_Control
#> Case_or_Control:case (416 items) | Case_or_Control:control (167 items)
#> stratify by Case_or_Control, Gender
#> Case_or_Control:case; Gender:Female (92 patients, 16% of the total) | Case_or_Control:case; Gender:Male (324 patients, 56% of the total)
#> Case_or_Control:control; Gender:Female (50 patients, 9% of the total) | Case_or_Control:control; Gender:Male (117 patients, 20% of the total)
#> 583 items
# reset options 
options(old)
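
To see how these two options combine, the following sketch rebuilds one strata label by hand with glue::glue_data() and paste(). It only illustrates the templates and is not dtrackr's internal implementation; the `groups` list is a hypothetical stand-in for the grouping columns and values of a single subgroup.

```r
library(glue)

# Hypothetical stand-in for one subgroup: grouping columns and their values
groups <- list(
  .group = c("Case_or_Control", "Gender"),
  .value = c("case", "Female")
)

# Apply the strata_glue template once per grouping column ...
parts <- glue_data(groups, "{.group}:{.value}")

# ... then join the pieces with the strata_sep separator
label <- paste(parts, collapse = "; ")
print(label)
```

The result matches the label shown for the corresponding subgroup in the flowchart above.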

In situations like this, where you are faceting on factors or strings, dropping the group name may make the chart clearer. In the following example we include only the group value, force it to lower case, and use a comma to separate multiple facets. We also manually override the messages in the grouping stages to describe what we are faceting by in a more natural way:

# only include the group value in the description of the group
old = options(
  dtrackr.strata_glue="{tolower(.value)}",
  dtrackr.strata_sep=", "
)

dtrackr::ILPD %>%
  track() %>%
  group_by(
    Case_or_Control,
    .messages = "case or control"
  ) %>%
  comment() %>%
  group_by(
    Gender,
    .add = TRUE, 
    .messages = "by {tolower(.cols)}" #.cols contains a csv string of the grouping variables
  ) %>%
  comment(
    .messages = c(
    "{.count} patients",
    "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()
#> [flowchart, top to bottom]
#> 583 items
#> case or control
#> case (416 items) | control (167 items)
#> by case_or_control, gender
#> case, female (92 patients, 16% of the total) | case, male (324 patients, 56% of the total)
#> control, female (50 patients, 9% of the total) | control, male (117 patients, 20% of the total)
#> 583 items
# reset options 
options(old)

N.B. this setting affects the “strata” label of the group, which in turn determines the flowchart branching. If the label is not unique to each group, strange behaviour will be observed.
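
A quick way to check a candidate template before using it is to apply it to the group values and look for duplicates. The sketch below does this with glue::glue_data(); the `values` list is a hypothetical stand-in for the levels of Case_or_Control, and the first-letter template is a deliberately bad example, not something dtrackr uses.

```r
library(glue)

# Hypothetical group values, as produced by grouping on Case_or_Control
values <- list(.value = c("case", "control"))

# A template that keeps the whole value gives distinct labels
ok <- as.character(glue_data(values, "{tolower(.value)}"))

# A template that keeps only the first letter collapses both groups to "c"
clash <- as.character(glue_data(values, "{substr(.value, 1, 1)}"))

print(anyDuplicated(ok) == 0)    # TRUE when every label is unique
print(anyDuplicated(clash) > 0)  # TRUE when at least two labels collide
```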

Default text

With the group strata label defined you can set other defaults. In the flowchart above the “583 items” labels are generated by the default message setting, and the headings for the groups by the default headline setting. In this example we change these to alter the default text.

old = options(
  dtrackr.strata_glue="{tolower(.value)}",
  dtrackr.strata_sep=", ",
  dtrackr.default_message = "containing {.count} patients",
  dtrackr.default_headline = "subgroup: {.strata}"
)

dtrackr::ILPD %>%
  track() %>%
  group_by(
    Case_or_Control,
    .messages = "case or control"
  ) %>%
  comment() %>%
  group_by(
    Gender,
    .add = TRUE, 
    .messages = "by gender"
  ) %>%
  comment(
    .messages = c(
    "{.count} patients",
    "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()
#> [flowchart, top to bottom]
#> subgroup: (containing 583 patients)
#> case or control
#> subgroup: case (containing 416 patients) | subgroup: control (containing 167 patients)
#> by gender
#> subgroup: case, female (92 patients, 16% of the total) | subgroup: case, male (324 patients, 56% of the total)
#> subgroup: control, female (50 patients, 9% of the total) | subgroup: control, male (117 patients, 20% of the total)
#> subgroup: (containing 583 patients)
# N.B. this setting adds unwanted headlines to the ungrouped stages of the flow chart.
# If a headline evaluates to "" it is suppressed, which removes the unwanted headlines.
# An example of doing this is as follows:
# options(dtrackr.default_headline = "{ifelse(.strata != '', glue::glue('subgroup: {.strata}'), '')}")

# reset options 
options(old)
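
The effect of a conditional headline of this kind can be checked outside dtrackr by evaluating a similar template directly with glue. This is a minimal sketch: the `stages` list is a hypothetical stand-in for the strata labels of an ungrouped and a grouped stage, and paste0() is used in place of the nested glue::glue() call purely to keep the example simple.

```r
library(glue)

# Hypothetical strata labels: "" for an ungrouped stage, then a subgroup
stages <- list(.strata = c("", "case, female"))

# Emit a headline only when the strata label is non-empty
headline <- as.character(glue_data(
  stages,
  "{ifelse(.strata != '', paste0('subgroup: ', .strata), '')}"
))
print(headline)
```

The first element is an empty string, which is the case the comment above describes as suppressing the headline.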

Subgroup count

Subgroup counts are a slightly neater way of reporting the breakdown within each group. Their default layout can be modified using dtrackr.default_count_subgroup.

old = options(
  dtrackr.default_headline = "{.strata}",
  dtrackr.default_count_subgroup = "{tolower(.name)}: {.count}/{.subtotal}"
)

dtrackr::ILPD %>%
  track() %>%
  group_by(
    Case_or_Control,
    .messages = "case or control"
  ) %>%
  comment() %>%
  count_subgroup(
    Gender
  ) %>%
  ungroup() %>%
  flowchart()
#> [flowchart, top to bottom]
#> 583 items
#> case or control
#> Case_or_Control:case (416 items) | Case_or_Control:control (167 items)
#> Case_or_Control:case (female: 92/416, male: 324/416) | Case_or_Control:control (female: 50/167, male: 117/167)
#> 583 items
# reset options 
options(old)

Exclusions

Elsewhere we discuss the possibility of capturing excluded items for debugging. This behaviour can be added to any pipeline with the capture_exclusions() function. Alternatively it can be globally enabled with the following option. Usual caveats about performance apply.

options(dtrackr.exclusions=TRUE)
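
For comparison, the per-pipeline form might look like the sketch below. The exclusion criterion is purely illustrative and assumes the ILPD data has an Age column; excluded() is used here to retrieve the captured rows.

```r
library(dplyr)
library(dtrackr)

excluded_rows <- dtrackr::ILPD %>%
  track() %>%
  # enable capture for this pipeline only, instead of the global option
  capture_exclusions() %>%
  exclude_all(
    # illustrative criterion (assumes an Age column), not from the original analysis
    Age < 18 ~ "{.excluded} patients under 18"
  ) %>%
  # retrieve the rows that were excluded, for debugging
  excluded()

head(excluded_rows)
```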

Reporting exclusions which do nothing

Sometimes a pipeline contains an exclusion criterion which is not triggered, or is not triggered for a particular subgroup. In this case the default is not to show the zero items that were excluded. However, sometimes it is reassuring to know that a filter was applied even if it excluded nothing. This reporting is controlled by the following option:

options(dtrackr.show_zero_exclusions=FALSE)

Maximum groupings

In count_subgroup() and group_by() statements a large number of items can be generated if a grouping variable has many possible values. This can cause performance problems and make the resulting graph illegible. It is usually a mistake, arising from an interim stage of the data pipeline (e.g. a dataset %>% group_by(nearly_unique_id) %>% filter(row_number()==1) type operation). The maximum number of groups that dtrackr will attempt to keep track of is configurable, but defaults to 16:

options(dtrackr.max_supported_groupings = 16)
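
When in doubt, the number of groups a variable would create can be compared against this limit before grouping. The sketch below uses only base R; the `df` data frame and its `nearly_unique_id` column are hypothetical, and the check is a suggested precaution rather than a description of what dtrackr does when the limit is exceeded.

```r
# Hypothetical data: an identifier that is unique for every row
df <- data.frame(nearly_unique_id = seq_len(100))

# Read the configured limit, falling back to the documented default of 16
max_groups <- getOption("dtrackr.max_supported_groupings", default = 16)

# Count the groups that group_by(nearly_unique_id) would create
n_groups <- length(unique(df$nearly_unique_id))
too_many <- n_groups > max_groups

if (too_many) {
  message(n_groups, " groups exceeds the configured limit of ", max_groups)
}
```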