Once a dataset is cleaned and ready for statistical analysis, the first step is typically to summarize it. The univariate_table() function makes it easy to create a custom descriptive analysis while consistently producing clean, presentation-ready output. It is built to integrate directly into your analysis workflow (e.g., R Markdown), but it can also be called from the console and rendered in a number of formats.

library(cheese)

heart_disease %>%
  univariate_table()
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)

By default, an HTML table is produced containing descriptive statistics for columns in the dataset.
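As a rough illustration of what the default cells contain, the base R sketch below builds a "median (q1, q3)" string for a numeric vector and a "count (percent%)" string for each level of a factor. The helpers summarize_numeric() and summarize_factor() are hypothetical names and are not part of cheese; the quantile method and rounding used by the package may differ.

```r
# Hypothetical helpers (not part of cheese) that mimic the default cell text

# "median (q1, q3)" for a numeric vector
summarize_numeric <- function(x, digits = 2) {
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  q <- round(q, digits)
  paste0(q[2], " (", q[1], ", ", q[3], ")")
}

# "count (percent%)" for each level of a factor
summarize_factor <- function(x, digits = 2) {
  counts <- table(x)
  percents <- round(100 * as.vector(counts) / sum(counts), digits)
  stats::setNames(paste0(as.vector(counts), " (", percents, "%)"), names(counts))
}

summarize_numeric(1:9)
summarize_factor(factor(c("a", "a", "b")))
```

The toy inputs are made up for the example and are unrelated to heart_disease.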

Custom string templates

In the table above, the summary statistics appear in each cell in a format that depends on the type of data. You can use the _summary arguments to customize both how the results are presented and which values go into them.

Suppose instead of the "median (q1, q3)" being displayed for numeric data, you want the "mean [sd] / median", in that exact format:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        Summary = "mean [sd] / median"
      )
  )
Variable Level Summary
Age 54.44 [9.04] / 56
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 131.69 [17.6] / 130
Cholesterol 246.69 [51.78] / 241
MaximumHR 149.61 [22.88] / 153
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)

The name Summary was used so that the result for the numeric data was bound into the same column as the results for the other data types. If you name it something else, you get a new column with those summaries:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        NewSummary = "mean [sd] / median"
      )
  )
Variable Level NewSummary Summary
Age 54.44 [9.04] / 56
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 131.69 [17.6] / 130
Cholesterol 246.69 [51.78] / 241
MaximumHR 149.61 [22.88] / 153
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)

You can add as many summary columns as you want separately for each type of data:

heart_disease %>%
  univariate_table(
    numeric_summary = 
      c(
        `Numeric only` = "mean [sd] / median",
        Summary = "median (q1, q3)"
      ),
    categorical_summary = 
      c(
        Summary = "count",
        `Categorical only` = "percent = 100 * proportion"
      )
  )
Variable Level Numeric only Summary Categorical only
Age 54.44 [9.04] / 56 56 (48, 61)
Sex Female 97 32.01 = 100 * 0.32
Male 206 67.99 = 100 * 0.68
ChestPain Typical angina 23 7.59 = 100 * 0.08
Atypical angina 50 16.5 = 100 * 0.17
Non-anginal pain 86 28.38 = 100 * 0.28
Asymptomatic 144 47.52 = 100 * 0.48
BP 131.69 [17.6] / 130 130 (120, 140)
Cholesterol 246.69 [51.78] / 241 241 (211, 275)
MaximumHR 149.61 [22.88] / 153 153 (133.5, 166)
ExerciseInducedAngina No 204 67.33 = 100 * 0.67
Yes 99 32.67 = 100 * 0.33
HeartDisease No 164 54.13 = 100 * 0.54
Yes 139 45.87 = 100 * 0.46

Adding multiple summaries is probably more visually appealing when all of the data is the same type:

heart_disease %>%
  univariate_table(
    categorical_types = NULL, #Easily disable categorical data from being summarized
    numeric_summary =
      c(
        `Median (Q1, Q3)` = "median (q1, q3)",
        `Min-Max` = "min - max",
        `Mean (SD)` = "mean (sd)"
      )
  )
Variable Median (Q1, Q3) Min-Max Mean (SD)
Age 56 (48, 61) 29 - 77 54.44 (9.04)
BP 130 (120, 140) 94 - 200 131.69 (17.6)
Cholesterol 241 (211, 275) 126 - 564 246.69 (51.78)
MaximumHR 153 (133.5, 166) 71 - 202 149.61 (22.88)

Or when adding a summary that applies to all columns:

heart_disease %>%
  univariate_table(
    all_summary = 
      c(
        `# obs. non-missing` = "available of length"
      )
  )
Variable Level Summary # obs. non-missing
Age 56 (48, 61) 303 of 303
Sex 303 of 303
Female 97 (32.01%)
Male 206 (67.99%)
ChestPain 303 of 303
Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 130 (120, 140) 303 of 303
Cholesterol 241 (211, 275) 303 of 303
BloodSugar 303 of 303
MaximumHR 153 (133.5, 166) 303 of 303
ExerciseInducedAngina 303 of 303
No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease 303 of 303
No 164 (54.13%)
Yes 139 (45.87%)

These add an extra row for each categorical variable. You may have also noticed that the BloodSugar column didn't show up in the table until the all_summary argument was used. This is because it is not classified as numeric or categorical data, and thus is not evaluated by default. See the “Backend functionality” section to learn more.

Stratification variables

The strata argument takes a formula() that can be used to stratify the analysis by any number of variables. Columns on the left side will appear down the rows, and columns on the right side will spread across the columns. You can use + on either side to specify more than one column. Let's start by stratifying sex across the columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex
  )
Variable Level Female Male
Age 57 (50, 63) 54.5 (47, 59.75)
ChestPain Typical angina 4 (4.12%) 19 (9.22%)
Atypical angina 18 (18.56%) 32 (15.53%)
Non-anginal pain 35 (36.08%) 51 (24.76%)
Asymptomatic 40 (41.24%) 104 (50.49%)
BP 132 (120, 140) 130 (120, 140)
Cholesterol 254 (215, 302) 235 (208.75, 268.5)
MaximumHR 157 (142, 165) 150.5 (132, 167.5)
ExerciseInducedAngina No 75 (77.32%) 129 (62.62%)
Yes 22 (22.68%) 77 (37.38%)
HeartDisease No 72 (74.23%) 92 (44.66%)
Yes 25 (25.77%) 114 (55.34%)

You can do the same thing down the rows:

heart_disease %>%
  univariate_table(
    strata = Sex ~ 1
  )
Sex Variable Level Summary
Female Age 57 (50, 63)
ChestPain Typical angina 4 (4.12%)
Atypical angina 18 (18.56%)
Non-anginal pain 35 (36.08%)
Asymptomatic 40 (41.24%)
BP 132 (120, 140)
Cholesterol 254 (215, 302)
MaximumHR 157 (142, 165)
ExerciseInducedAngina No 75 (77.32%)
Yes 22 (22.68%)
HeartDisease No 72 (74.23%)
Yes 25 (25.77%)
Male Age 54.5 (47, 59.75)
ChestPain Typical angina 19 (9.22%)
Atypical angina 32 (15.53%)
Non-anginal pain 51 (24.76%)
Asymptomatic 104 (50.49%)
BP 130 (120, 140)
Cholesterol 235 (208.75, 268.5)
MaximumHR 150.5 (132, 167.5)
ExerciseInducedAngina No 129 (62.62%)
Yes 77 (37.38%)
HeartDisease No 92 (44.66%)
Yes 114 (55.34%)

Or even both:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease
  )
Sex Variable Level No Yes
Female Age 54 (46, 63.25) 60 (57, 62)
ChestPain Typical angina 4 (5.56%) 0 (0%)
Atypical angina 16 (22.22%) 2 (8%)
Non-anginal pain 34 (47.22%) 1 (4%)
Asymptomatic 18 (25%) 22 (88%)
BP 130 (119.5, 140) 140 (130, 158)
Cholesterol 249 (210.75, 289.5) 268 (236, 307)
MaximumHR 159 (146.75, 167.25) 146 (133, 157)
ExerciseInducedAngina No 64 (88.89%) 11 (44%)
Yes 8 (11.11%) 14 (56%)
Male Age 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 12 (13.04%) 7 (6.14%)
Atypical angina 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (36.96%) 17 (14.91%)
Asymptomatic 21 (22.83%) 83 (72.81%)
BP 130 (120, 140) 130 (120, 140)
Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 77 (83.7%) 52 (45.61%)
Yes 15 (16.3%) 62 (54.39%)

Now suppose you want both stratification variables across the columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease
  )
Female Male
Variable Level No Yes No Yes
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

The levels will span the columns in a hierarchical fashion depending on their order in the formula:

heart_disease %>%
  univariate_table(
    strata = ~ HeartDisease + Sex
  )
No Yes
Variable Level Female Male Female Male
Age 54 (46, 63.25) 52 (44, 57) 60 (57, 62) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 12 (13.04%) 0 (0%) 7 (6.14%)
Atypical angina 16 (22.22%) 25 (27.17%) 2 (8%) 7 (6.14%)
Non-anginal pain 34 (47.22%) 34 (36.96%) 1 (4%) 17 (14.91%)
Asymptomatic 18 (25%) 21 (22.83%) 22 (88%) 83 (72.81%)
BP 130 (119.5, 140) 130 (120, 140) 140 (130, 158) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 229.5 (206.5, 250.75) 268 (236, 307) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 163 (150, 175.75) 146 (133, 157) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 77 (83.7%) 11 (44%) 52 (45.61%)
Yes 8 (11.11%) 15 (16.3%) 14 (56%) 62 (54.39%)

Similarly, the rows also collapse hierarchically:

heart_disease %>%
  univariate_table(
    strata = HeartDisease + Sex ~ 1
  )
HeartDisease Sex Variable Level Summary
No Female Age 54 (46, 63.25)
ChestPain Typical angina 4 (5.56%)
Atypical angina 16 (22.22%)
Non-anginal pain 34 (47.22%)
Asymptomatic 18 (25%)
BP 130 (119.5, 140)
Cholesterol 249 (210.75, 289.5)
MaximumHR 159 (146.75, 167.25)
ExerciseInducedAngina No 64 (88.89%)
Yes 8 (11.11%)
Male Age 52 (44, 57)
ChestPain Typical angina 12 (13.04%)
Atypical angina 25 (27.17%)
Non-anginal pain 34 (36.96%)
Asymptomatic 21 (22.83%)
BP 130 (120, 140)
Cholesterol 229.5 (206.5, 250.75)
MaximumHR 163 (150, 175.75)
ExerciseInducedAngina No 77 (83.7%)
Yes 15 (16.3%)
Yes Female Age 60 (57, 62)
ChestPain Typical angina 0 (0%)
Atypical angina 2 (8%)
Non-anginal pain 1 (4%)
Asymptomatic 22 (88%)
BP 140 (130, 158)
Cholesterol 268 (236, 307)
MaximumHR 146 (133, 157)
ExerciseInducedAngina No 11 (44%)
Yes 14 (56%)
Male Age 57.5 (51, 61)
ChestPain Typical angina 7 (6.14%)
Atypical angina 7 (6.14%)
Non-anginal pain 17 (14.91%)
Asymptomatic 83 (72.81%)
BP 130 (120, 140)
Cholesterol 247.5 (212, 282)
MaximumHR 141 (125, 156)
ExerciseInducedAngina No 52 (45.61%)
Yes 62 (54.39%)

You can use any of the functionality described in the previous section with stratification variables as well:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    numeric_summary = 
      c(
        `Mean (SD)` = "mean (sd)"
      ),
    categorical_summary = 
      c(
        `Count (%)` = "count (percent%)"
      )
  )
Female Male
No Yes No Yes
Variable Level Mean (SD) Count (%) Mean (SD) Count (%) Mean (SD) Count (%) Mean (SD) Count (%)
Age 54.56 (10.27) 59.08 (4.86) 51.04 (8.62) 56.09 (8.39)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 128.74 (16.54) 146.6 (21.12) 129.65 (16.02) 131.93 (17.22)
Cholesterol 256.75 (66.22) 276.16 (59.88) 231.6 (37.64) 246.06 (45.44)
MaximumHR 154.03 (19.25) 143.16 (20.18) 161.78 (18.56) 138.4 (23.08)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

The summary columns simply get added to the column-spanning hierarchy.
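To make the row/column convention of the strata formula concrete, here is a minimal base R sketch that splits a formula into its left-hand (row) and right-hand (column) variables. The helper split_strata() is a hypothetical name used only for illustration; it is not how cheese necessarily parses the formula internally.

```r
# Hypothetical helper (not part of cheese): split a strata formula into
# row variables (left side) and column variables (right side)
split_strata <- function(strata) {
  stopifnot(inherits(strata, "formula"))
  if (length(strata) == 2) {
    # One-sided formula such as ~ Sex: everything is a column stratum
    list(rows = character(0), columns = all.vars(strata[[2]]))
  } else {
    # Two-sided formula: element 2 is the left side, element 3 the right side
    list(rows = all.vars(strata[[2]]), columns = all.vars(strata[[3]]))
  }
}

split_strata(~ Sex)
split_strata(Sex ~ 1)
split_strata(HeartDisease + Sex ~ 1)
split_strata(Sex ~ HeartDisease)
```

The order of the names in each element follows the order in the formula, which is what determines the nesting of the hierarchy in the tables above.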

Adding sample size

The add_n argument will add the sample size to the label for the stratification group:

heart_disease %>%
  univariate_table(
    strata = ~ Sex,
    add_n = TRUE
  )
Variable Level Female (N=97) Male (N=206)
Age 57 (50, 63) 54.5 (47, 59.75)
ChestPain Typical angina 4 (4.12%) 19 (9.22%)
Atypical angina 18 (18.56%) 32 (15.53%)
Non-anginal pain 35 (36.08%) 51 (24.76%)
Asymptomatic 40 (41.24%) 104 (50.49%)
BP 132 (120, 140) 130 (120, 140)
Cholesterol 254 (215, 302) 235 (208.75, 268.5)
MaximumHR 157 (142, 165) 150.5 (132, 167.5)
ExerciseInducedAngina No 75 (77.32%) 129 (62.62%)
Yes 22 (22.68%) 77 (37.38%)
HeartDisease No 72 (74.23%) 92 (44.66%)
Yes 25 (25.77%) 114 (55.34%)

When multiple stratification variables are added on one side of the formula, the sample size will show up on the lowest level of the hierarchy, excluding summary columns:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    add_n = TRUE
  )
Female Male
Variable Level No (N=72) Yes (N=25) No (N=92) Yes (N=114)
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140)
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

A limitation is that when sample size is added in the presence of row and column strata, it is displayed for the marginal groups only:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease,
    add_n = TRUE
  )
Sex Variable Level No (N=164) Yes (N=139)
Female (N=97) Age 54 (46, 63.25) 60 (57, 62)
ChestPain Typical angina 4 (5.56%) 0 (0%)
Atypical angina 16 (22.22%) 2 (8%)
Non-anginal pain 34 (47.22%) 1 (4%)
Asymptomatic 18 (25%) 22 (88%)
BP 130 (119.5, 140) 140 (130, 158)
Cholesterol 249 (210.75, 289.5) 268 (236, 307)
MaximumHR 159 (146.75, 167.25) 146 (133, 157)
ExerciseInducedAngina No 64 (88.89%) 11 (44%)
Yes 8 (11.11%) 14 (56%)
Male (N=206) Age 52 (44, 57) 57.5 (51, 61)
ChestPain Typical angina 12 (13.04%) 7 (6.14%)
Atypical angina 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (36.96%) 17 (14.91%)
Asymptomatic 21 (22.83%) 83 (72.81%)
BP 130 (120, 140) 130 (120, 140)
Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282)
MaximumHR 163 (150, 175.75) 141 (125, 156)
ExerciseInducedAngina No 77 (83.7%) 52 (45.61%)
Yes 15 (16.3%) 62 (54.39%)
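The difference between marginal and joint sample sizes can be checked with a small base R sketch. The data frame below is synthetic: it is rebuilt only from the four cell counts printed in the column-stratified table above, and label_with_n() is a hypothetical helper, not a cheese function.

```r
# Synthetic data rebuilt from the four cell counts shown above
cells <- data.frame(
  Sex = c("Female", "Female", "Male", "Male"),
  HeartDisease = c("No", "Yes", "No", "Yes"),
  n = c(72, 25, 92, 114)
)
toy <- cells[rep(seq_len(nrow(cells)), times = cells$n), c("Sex", "HeartDisease")]

# Hypothetical helper (not part of cheese): append "(N=...)" to each level
label_with_n <- function(x) {
  counts <- table(x)
  paste0(names(counts), " (N=", as.vector(counts), ")")
}

label_with_n(toy$Sex)           # marginal labels for the row strata
label_with_n(toy$HeartDisease)  # marginal labels for the column strata

# Joint counts, which the marginal labels do not show
joint <- table(toy$Sex, toy$HeartDisease)
joint
```

The marginal labels reproduce the headers in the last table, while the joint table holds the per-cell sizes that appear only when both variables are on the same side of the formula.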

Association metrics

Often when a descriptive analysis is stratified by one or more variables, it is also of interest to add statistics that compare each variable across the groups. The associations argument takes a list of any number of functions, each of which returns a scalar value to be placed in the table. First, let's define a function:

#Function for a p-value
pval <-
  function(y, x) {

    #For categorical data use Fisher's Exact test
    if(some_type(x, "factor")) {

      p <- fisher.test(factor(y), factor(x), simulate.p.value = TRUE)$p.value

    #Otherwise use Kruskal-Wallis
    } else {

      p <- kruskal.test(x, factor(y))$p.value

    }

    ifelse(p < 0.001, "<0.001", as.character(round(p, 2)))

  }

The stratification variable will be placed in the second argument of the function(s) provided. Now you can add it to the function call:

heart_disease %>%
  univariate_table(
    strata = ~ HeartDisease,
    associations = list(`P-value` = pval)
  )
Variable Level No Yes P-value
Age 52 (44.75, 59) 58 (52, 62) 0.12
Sex <0.001
Female 72 (43.9%) 25 (17.99%)
Male 92 (56.1%) 114 (82.01%)
ChestPain <0.001
Typical angina 16 (9.76%) 7 (5.04%)
Atypical angina 41 (25%) 9 (6.47%)
Non-anginal pain 68 (41.46%) 18 (12.95%)
Asymptomatic 39 (23.78%) 105 (75.54%)
BP 130 (120, 140) 130 (120, 145) 0.51
Cholesterol 234.5 (208.75, 267.25) 249 (217.5, 283.5) 0.11
MaximumHR 161 (148.75, 172) 142 (125, 156.5) 0.08
ExerciseInducedAngina <0.001
No 141 (85.98%) 63 (45.32%)
Yes 23 (14.02%) 76 (54.68%)

The name of the function in the list becomes the column label.
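The shape such a list can take is sketched below with base R only. Both metrics are hypothetical and deliberately symmetric in their two arguments, so the example does not depend on which argument receives the stratification variable; the vectors are synthetic and unrelated to heart_disease.

```r
# Two small synthetic factors (not the heart_disease data)
a <- factor(rep(c("u", "v"), each = 8))
b <- factor(c(rep("no", 7), "yes", "no", rep("yes", 7)))

# Hypothetical metrics: each takes two vectors and returns a single value
metrics <- list(
  `P-value` = function(y, x) {
    p <- fisher.test(table(y, x))$p.value
    ifelse(p < 0.001, "<0.001", as.character(round(p, 2)))
  },
  `Complete pairs` = function(y, x) sum(!is.na(y) & !is.na(x))
)

# Apply every metric to the same pair of vectors; the list names carry over
results <- vapply(metrics, function(f) as.character(f(a, b)), character(1))
results
```

Each element of results is a single value with the name given in the list, which is the property the table relies on for its extra columns.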

The comparison will take place across the number of subgroups there are within the column stratification:

heart_disease %>%
  univariate_table(
    strata = ~ Sex + HeartDisease,
    associations = list(`P-value` = pval)
  )
Female Male
Variable Level No Yes No Yes P-value
Age 54 (46, 63.25) 60 (57, 62) 52 (44, 57) 57.5 (51, 61) 0.53
ChestPain <0.001
Typical angina 4 (5.56%) 0 (0%) 12 (13.04%) 7 (6.14%)
Atypical angina 16 (22.22%) 2 (8%) 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (47.22%) 1 (4%) 34 (36.96%) 17 (14.91%)
Asymptomatic 18 (25%) 22 (88%) 21 (22.83%) 83 (72.81%)
BP 130 (119.5, 140) 140 (130, 158) 130 (120, 140) 130 (120, 140) 0.55
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 229.5 (206.5, 250.75) 247.5 (212, 282) 0.11
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 163 (150, 175.75) 141 (125, 156) 0.01
ExerciseInducedAngina <0.001
No 64 (88.89%) 11 (44%) 77 (83.7%) 52 (45.61%)
Yes 8 (11.11%) 14 (56%) 15 (16.3%) 62 (54.39%)

However, with a row stratification the comparisons are made within each of those groups:

heart_disease %>%
  univariate_table(
    strata = Sex ~ HeartDisease,
    associations = list(`P-value` = pval)
  )
Sex Variable Level No Yes P-value
Female Age 54 (46, 63.25) 60 (57, 62) 0.17
ChestPain <0.001
Typical angina 4 (5.56%) 0 (0%)
Atypical angina 16 (22.22%) 2 (8%)
Non-anginal pain 34 (47.22%) 1 (4%)
Asymptomatic 18 (25%) 22 (88%)
BP 130 (119.5, 140) 140 (130, 158) 0.37
Cholesterol 249 (210.75, 289.5) 268 (236, 307) 0.58
MaximumHR 159 (146.75, 167.25) 146 (133, 157) 0.15
ExerciseInducedAngina <0.001
No 64 (88.89%) 11 (44%)
Yes 8 (11.11%) 14 (56%)
Male Age 52 (44, 57) 57.5 (51, 61) 0.29
ChestPain <0.001
Typical angina 12 (13.04%) 7 (6.14%)
Atypical angina 25 (27.17%) 7 (6.14%)
Non-anginal pain 34 (36.96%) 17 (14.91%)
Asymptomatic 21 (22.83%) 83 (72.81%)
BP 130 (120, 140) 130 (120, 140) 0.71
Cholesterol 229.5 (206.5, 250.75) 247.5 (212, 282) 0.11
MaximumHR 163 (150, 175.75) 141 (125, 156) 0.26
ExerciseInducedAngina <0.001
No 77 (83.7%) 52 (45.61%)
Yes 15 (16.3%) 62 (54.39%)

In general, there must be at least one column stratification variable in order to use association metrics. See univariate_associations() for more details on the workhorse of this functionality.

Backend functionality

descriptives() is the function that drives the computation behind the statistics for the columns of the input dataset. Any of its arguments can be passed from univariate_table() to add further customization.

Specifying data types

As noted above, one of the columns (BloodSugar) did not appear in the table by default because it is a logical() type. By default, only factor() and numeric() types are placed into the result, though there are (at least) three ways to include it:

Change column type prior to function call

You could simply make the column a conformable type before the call:

heart_disease %>%
  dplyr::mutate(
    BloodSugar = factor(BloodSugar)
  ) %>%
  univariate_table()
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar FALSE 258 (85.15%)
TRUE 45 (14.85%)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)

Change scope of what column types are evaluated by what function sets

The _types arguments allow you to specify the data types that are to be interpreted by the high-level function call. Let's allow logical() types to be treated as a categorical variable:

heart_disease %>%
  univariate_table(
    categorical_types = c("factor", "logical")
  )
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar FALSE 258 (85.15%)
TRUE 45 (14.85%)
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)
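The effect of widening categorical_types can be pictured with a small base R sketch that sorts columns into "categorical", "numeric", or "other" by class. The function classify_columns() and the toy data frame are hypothetical; this is only an assumption about the general idea, not the package's actual implementation.

```r
# Hypothetical helper (not part of cheese): bucket columns by their class
classify_columns <- function(data,
                             categorical_types = "factor",
                             numeric_types = "numeric") {
  vapply(
    data,
    function(col) {
      if (any(class(col) %in% categorical_types)) {
        "categorical"
      } else if (any(class(col) %in% numeric_types)) {
        "numeric"
      } else {
        "other"
      }
    },
    character(1)
  )
}

# Made-up rows with one column of each kind
toy <- data.frame(
  Age = c(63, 41, 57),
  Sex = factor(c("Male", "Female", "Male")),
  BloodSugar = c(TRUE, FALSE, FALSE)
)

classify_columns(toy)
classify_columns(toy, categorical_types = c("factor", "logical"))
```

In the first call the logical column falls into "other"; in the second it is treated as categorical, mirroring the change seen in the table above.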

Allow evaluation by its own set of functions

The most flexible approach is to define its own set of functions. By default, any data type that is not interpreted as categorical or numeric is considered “other”. There is infrastructure in place to supply functions and summaries in the same manner for these columns.

heart_disease %>%
  univariate_table(
    f_other = list(count = function(x) table(x)),
    other_summary = 
      c(
        Summary = "count"
      )
  )
Variable Level Summary
Age 56 (48, 61)
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP 130 (120, 140)
Cholesterol 241 (211, 275)
BloodSugar 258
45
MaximumHR 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)

You would also need to define functions for the percentages, proportions, etc. to exactly match the other examples.
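A sketch of what such a fuller function set might look like for a logical column is shown below, using base R only. The list name f_other_sketch and the function names proportion and percent are illustrative assumptions; whether univariate_table() formats their results exactly like the built-in categorical summaries has not been verified here.

```r
# Hypothetical function set for a logical column; names are illustrative
f_other_sketch <- list(
  count = function(x) table(x),
  proportion = function(x) round(prop.table(table(x)), 2),
  percent = function(x) round(100 * prop.table(table(x)), 2)
)

# Made-up logical vector standing in for a column such as BloodSugar
flag <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# Evaluate every function on the same vector
lapply(f_other_sketch, function(f) f(flag))
```

Each function returns one value per level (FALSE first, then TRUE), matching the shape of the count function used in the example above.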

Adding user-specified functions

You can also add custom functions to make them available for numeric or categorical columns:

heart_disease %>%
  univariate_table(
    categorical_types = NULL,
    f_numeric =
      list(
        cv = ~sd(.x) / mean(.x)
      ),
    numeric_summary = 
      c(
        `Coef. of variation` = "sd / mean = cv"
      )
  )
Variable Coef. of variation
Age 9.04 / 54.44 = 0.17
BP 17.6 / 131.69 = 0.13
Cholesterol 51.78 / 246.69 = 0.21
MaximumHR 22.88 / 149.61 = 0.15

The names of the functions become the patterns that are searched for in the string templates.
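The idea of matching function names inside a template can be sketched in a few lines of base R. The helper fill_template() below is hypothetical and only approximates the behavior described here; the package's own matching and rounding rules may differ.

```r
# Hypothetical helper (not part of cheese): evaluate each named function on x
# and substitute its rounded result wherever the name appears in the template
fill_template <- function(x, template, fns, digits = 2) {
  out <- template
  for (name in names(fns)) {
    value <- as.character(round(fns[[name]](x), digits))
    # \\b restricts the match to whole words
    out <- gsub(paste0("\\b", name, "\\b"), value, out)
  }
  out
}

fns <- list(
  mean = mean,
  sd = stats::sd,
  median = stats::median,
  cv = function(x) stats::sd(x) / mean(x)
)

# Made-up numeric vector
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

fill_template(x, "mean [sd] / median", fns)
fill_template(x, "sd / mean = cv", fns)
```

Everything in the template that is not a function name (brackets, slashes, the equals sign) is left untouched, which is why the same mechanism supports arbitrary cell layouts.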

Additional preferences

Finally, we'll look at a few of the appearance-related arguments. These can be applied with any combination of other arguments.

Rendering format

As mentioned above, the default format for the table is HTML, but you could choose an alternative with the format argument:

heart_disease %>%
  univariate_table(
    format = "none"
  )
## # A tibble: 14 x 3
##    Variable              Level            Summary         
##    <chr>                 <chr>            <chr>           
##  1 Age                   ""               56 (48, 61)     
##  2 Sex                   Female           97 (32.01%)     
##  3 ""                    Male             206 (67.99%)    
##  4 ChestPain             Typical angina   23 (7.59%)      
##  5 ""                    Atypical angina  50 (16.5%)      
##  6 ""                    Non-anginal pain 86 (28.38%)     
##  7 ""                    Asymptomatic     144 (47.52%)    
##  8 BP                    ""               130 (120, 140)  
##  9 Cholesterol           ""               241 (211, 275)  
## 10 MaximumHR             ""               153 (133.5, 166)
## 11 ExerciseInducedAngina No               204 (67.33%)    
## 12 ""                    Yes              99 (32.67%)     
## 13 HeartDisease          No               164 (54.13%)    
## 14 ""                    Yes              139 (45.87%)

There are also options for "latex", "pandoc", and "markdown".

Relabeling, releveling and reordering

You can use the labels and levels arguments to add clean text to any of the variable or categorical level names, and the order argument to change the position of the variables in the result:

heart_disease %>%
  univariate_table(
    labels = 
      c(
        Age = "Age (years)",
        ChestPain = "Chest pain"
      ),
    levels = 
      list(
        Sex =
          c(
            Male = "M"
          )
      ),
    order = 
      c(
        "BP",
        "Age",
        "Cholesterol"
      )
  )
Variable Level Summary
BP 130 (120, 140)
Age (years) 56 (48, 61)
Cholesterol 241 (211, 275)
Chest pain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)
MaximumHR 153 (133.5, 166)
Sex Female 97 (32.01%)
M 206 (67.99%)

Notice you only need to specify values that need to be changed. Also, ordering is done with the original names even when relabeled.
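The two points above can be mimicked with plain character vectors. In this sketch the variable names are copied from the table, while first_vars, ordered_vars, and shown are hypothetical names; the alphabetical ordering of the remaining variables is an assumption read off the output above rather than a documented guarantee.

```r
# Variable names as they appear in the default table
vars <- c("Age", "Sex", "ChestPain", "BP", "Cholesterol",
          "MaximumHR", "ExerciseInducedAngina", "HeartDisease")
labels <- c(Age = "Age (years)", ChestPain = "Chest pain")
first_vars <- c("BP", "Age", "Cholesterol")

# Requested variables first; the rest follow alphabetically by original name
# (assumption based on the output above)
ordered_vars <- c(first_vars, sort(setdiff(vars, first_vars)))

# Swap in a label only where one was supplied
shown <- ordered_vars
hit <- shown %in% names(labels)
shown[hit] <- labels[shown[hit]]
shown
```

Because the ordering step runs on the original names, "Age" can be listed in first_vars even though it is displayed as "Age (years)".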

Headers, fill values, and captions

The variableName and levelName arguments change the headers for the variable names and categorical levels, while fill_blanks determines what goes in empty cells. Finally, the caption argument labels the entire table:

heart_disease %>%
  univariate_table(
    variableName = "THESE ARE VARIABLES",
    levelName = "THESE ARE LEVELS",
    fill_blanks = "BLANK",
    caption = "HERE IS MY CAPTION"
  )
HERE IS MY CAPTION
THESE ARE VARIABLES THESE ARE LEVELS Summary
Age BLANK 56 (48, 61)
Sex Female 97 (32.01%)
Male 206 (67.99%)
ChestPain Typical angina 23 (7.59%)
Atypical angina 50 (16.5%)
Non-anginal pain 86 (28.38%)
Asymptomatic 144 (47.52%)
BP BLANK 130 (120, 140)
Cholesterol BLANK 241 (211, 275)
MaximumHR BLANK 153 (133.5, 166)
ExerciseInducedAngina No 204 (67.33%)
Yes 99 (32.67%)
HeartDisease No 164 (54.13%)
Yes 139 (45.87%)