In addition to the modelling functions, the package includes the subbuild function, which helps define the subgroup covariates to use in the analysis.
We use the prca data set that was analysed in Rosenkranz (2016), https://onlinelibrary.wiley.com/doi/abs/10.1002/bimj.201500147
library(subtee)
################################################################################
# The data comes from a clinical trial of a prostate cancer
# treatment
# Data is loaded from Royston, Patrick, and Willi Sauerbrei.
# Multivariable model-building: a pragmatic approach to
# regression analysis based on fractional polynomials for
# modelling continuous variables. Vol. 777. John Wiley & Sons, 2008.
# https://www.imbi.uni-freiburg.de/Royston-Sauerbrei-book
prca = get_prca_data()
#> Downloading remote dataset.
The subbuild function creates binary subgroup indicator variables. For example, to create the subgroup indicator for subjects older than 65 years, we pass this expression to the function:
subgroups <- subbuild(data = prca, AGE > 65)
head(subgroups)
#> AGE > 65
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
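The idea behind a single-expression indicator can be sketched in base R. This is only an illustration on made-up toy data, not the implementation used by subbuild; the object names toy and subgroups_toy are hypothetical.

```r
# Toy data: values are invented for illustration and are not taken from prca.
toy <- data.frame(AGE = c(60, 66, 71, 65, 80))

# Evaluate the logical condition and convert TRUE/FALSE to 1/0.
ind <- as.integer(toy$AGE > 65)

# Store the indicator in a data frame whose column is named after the
# expression; check.names = FALSE keeps the non-syntactic name unchanged.
subgroups_toy <- data.frame("AGE > 65" = ind, check.names = FALSE)
subgroups_toy
```

Note that a subject aged exactly 65 is not in the subgroup, because the comparison is strict.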
When a continuous covariate is given with no cutoff, the function creates n.cuts + 1 subgroups with approximately equal sample sizes. With n.cuts = 4, for example, this yields five subgroups:
subgroups <- subbuild(data = prca, AGE, n.cuts = 4)
head(subgroups)
#> AGE<=68 68<AGE<=72 72<AGE<=74 74<AGE<=76 AGE>76
#> 1 0 0 0 1 0
#> 2 0 1 0 0 0
#> 3 0 0 0 1 0
#> 4 1 0 0 0 0
#> 5 0 1 0 0 0
#> 6 0 0 0 1 0
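One way to obtain groups of roughly equal size is to place the cutoffs at empirical quantiles. The base-R sketch below makes that assumption; it is not taken from the subtee source, and the toy values and the names toy_age, n_cuts, and grp are hypothetical.

```r
# Toy ages, invented for illustration.
toy_age <- c(55, 60, 62, 66, 68, 70, 71, 73, 75, 77, 79, 82)
n_cuts <- 2  # two cutoffs give n_cuts + 1 = 3 groups

# Interior quantiles (here the 1/3 and 2/3 quantiles) serve as cutoffs.
probs <- seq_len(n_cuts) / (n_cuts + 1)
cutoffs <- quantile(toy_age, probs = probs, names = FALSE)

# Assign each subject to exactly one right-closed interval.
grp <- cut(toy_age, breaks = c(-Inf, cutoffs, Inf), right = TRUE)

# One 0/1 indicator column per interval.
ind <- sapply(levels(grp), function(l) as.integer(grp == l))

# Group sizes.
table(grp)
```

With heavily tied data the quantiles can coincide, in which case the group sizes are only approximately equal or fewer distinct groups are possible.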
The variable BM, which indicates whether the subject had bone metastasis at baseline, already contains only 0s and 1s, but the indicator can still be created using subbuild.
subgroups <- subbuild(data = prca, BM == 1)
head(subgroups)
#> BM == 1
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
Doing this may be useful for consistency, because subbuild accepts several expressions and can therefore define all the candidate subgroups to be analysed at once.
cand.groups <- subbuild(prca,
                        BM == 1, PF == 1, HX == 1,
                        STAGE == 4, AGE > 65, WT > 100)
head(cand.groups)
#> BM == 1 PF == 1 HX == 1 STAGE == 4 AGE > 65 WT > 100
#> 1 0 0 0 0 1 0
#> 2 0 0 1 0 1 1
#> 3 0 1 1 0 1 0
#> 4 0 0 0 0 1 0
#> 5 0 0 0 0 1 0
#> 6 0 0 0 0 1 0
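To show how several unevaluated expressions can be turned into named indicator columns, here is a minimal base-R sketch. The helper build_indicators is hypothetical and is not part of subtee; it only mimics the calling style on toy data.

```r
# Toy data, invented for illustration.
toy <- data.frame(BM  = c(0, 1, 0, 1),
                  AGE = c(60, 70, 66, 58),
                  WT  = c(95, 101, 110, 88))

# Hypothetical helper (not subtee's code): capture the expressions
# unevaluated, evaluate each within the data, and name each column
# after the text of its expression.
build_indicators <- function(data, ...) {
  env <- parent.frame()
  exprs <- as.list(substitute(list(...)))[-1]
  out <- lapply(exprs, function(e) as.integer(eval(e, data, env)))
  names(out) <- vapply(exprs,
                       function(e) paste(deparse(e), collapse = ""),
                       character(1))
  data.frame(out, check.names = FALSE)
}

ind <- build_indicators(toy, BM == 1, AGE > 65, WT > 100)
ind
```

Naming the columns after the expressions keeps the subgroup definitions readable in later output.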
If no expressions are given, subbuild generates the binary subgroup indicators for all covariates in the data set (here restricted to columns 2 to 7) using the default settings:
cand.groups <- subbuild(prca[,2:7])
head(cand.groups)
#> AGE<=71 71<AGE<=75 AGE>75 WT<=93 93<WT<=103 WT>103 SBP<=13 13<SBP<=15 SBP>15
#> 1 0 1 0 1 0 0 0 1 0
#> 2 1 0 0 0 1 0 0 1 0
#> 3 0 1 0 0 1 0 0 1 0
#> 4 1 0 0 0 1 0 0 0 1
#> 5 1 0 0 0 1 0 0 0 1
#> 6 0 1 0 0 1 0 0 1 0
#> DBP<=8 8<DBP<=9 DBP>9 SZ<=7 7<SZ<=17 SZ>17 AP<=6 6<AP<=15 AP>15
#> 1 0 1 0 1 0 0 1 0 0
#> 2 1 0 0 1 0 0 1 0 0
#> 3 1 0 0 1 0 0 0 1 0
#> 4 0 0 1 0 0 1 1 0 0
#> 5 0 0 1 0 1 0 1 0 0
#> 6 0 0 1 0 1 0 0 1 0
Equivalently, subgroup indicators can be created for the named covariates using the default settings:
cand.groups <- subbuild(prca, AGE, WT, SBP, DBP, SZ, AP)
head(cand.groups)
#> AGE<=71 71<AGE<=75 AGE>75 WT<=93 93<WT<=103 WT>103 SBP<=13 13<SBP<=15 SBP>15
#> 1 0 1 0 1 0 0 0 1 0
#> 2 1 0 0 0 1 0 0 1 0
#> 3 0 1 0 0 1 0 0 1 0
#> 4 1 0 0 0 1 0 0 0 1
#> 5 1 0 0 0 1 0 0 0 1
#> 6 0 1 0 0 1 0 0 1 0
#> DBP<=8 8<DBP<=9 DBP>9 SZ<=7 7<SZ<=17 SZ>17 AP<=6 6<AP<=15 AP>15
#> 1 0 1 0 1 0 0 1 0 0
#> 2 1 0 0 1 0 0 1 0 0
#> 3 1 0 0 1 0 0 0 1 0
#> 4 0 0 1 0 0 1 1 0 0
#> 5 0 0 1 0 1 0 1 0 0
#> 6 0 0 1 0 1 0 0 1 0
The matrix with all the candidate subgroups still needs to be combined with the original data.frame (or at least with the response and treatment variables) before it can be used in the fitting functions unadj, modav, and bagged.
fitdat <- cbind(prca[, c("SURVTIME", "CENS", "RX")], cand.groups)
head(fitdat)
#> SURVTIME CENS RX AGE<=71 71<AGE<=75 AGE>75 WT<=93 93<WT<=103 WT>103 SBP<=13
#> 1 72.5 0 0 0 1 0 1 0 0 0
#> 2 40.5 1 1 1 0 0 0 1 0 0
#> 3 20.5 1 0 0 1 0 0 1 0 0
#> 4 65.5 0 0 1 0 0 0 1 0 0
#> 5 24.5 1 0 1 0 0 0 1 0 0
#> 6 46.5 1 0 0 1 0 0 1 0 0
#> 13<SBP<=15 SBP>15 DBP<=8 8<DBP<=9 DBP>9 SZ<=7 7<SZ<=17 SZ>17 AP<=6 6<AP<=15
#> 1 1 0 0 1 0 1 0 0 1 0
#> 2 1 0 1 0 0 1 0 0 1 0
#> 3 1 0 1 0 0 1 0 0 0 1
#> 4 0 1 0 0 1 0 0 1 1 0
#> 5 0 1 0 0 1 0 1 0 1 0
#> 6 1 0 0 0 1 0 1 0 0 1
#> AP>15
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
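Because cbind combines rows by position, it can be worth confirming that the indicators and the original data have the same number of rows before combining them. A small base-R sketch on toy data follows; the values and the names toy and fitdat_toy are hypothetical.

```r
# Toy trial data, invented for illustration.
toy <- data.frame(SURVTIME = c(72.5, 40.5, 20.5),
                  CENS     = c(0, 1, 1),
                  RX       = c(0, 1, 0),
                  AGE      = c(75, 69, 75))

# A single subgroup indicator with an expression as its column name.
ind <- data.frame("AGE > 70" = as.integer(toy$AGE > 70), check.names = FALSE)

# Rows are matched by position, so the row counts must agree.
stopifnot(nrow(ind) == nrow(toy))

# Keep only the response, censoring, and treatment columns plus the indicator.
fitdat_toy <- cbind(toy[, c("SURVTIME", "CENS", "RX")], ind)
names(fitdat_toy)
```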
Note that the names of the subgroup-defining variables are not syntactically valid R names. This can be changed using the option make.valid.names = TRUE.
cand.groups <- subbuild(prca,
                        BM == 1, PF == 1, HX == 1,
                        STAGE == 4, AGE > 65, WT > 100, make.valid.names = TRUE)
head(cand.groups)
#> BM.1 PF.1 HX.1 STAGE.4 AGE.g.65 WT.g.100
#> 1 0 0 0 0 1 0
#> 2 0 0 1 0 1 1
#> 3 0 1 1 0 1 0
#> 4 0 0 0 0 1 0
#> 5 0 0 0 0 1 0
#> 6 0 0 0 0 1 0
However, the fitting functions in the package accept expressions as variable names, and this leads to more informative plots and summary tables.
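Outside the package, columns with non-syntactic names have to be accessed with backticks or double brackets. The base-R sketch below illustrates this on toy data and also shows the base function make.names; that function follows its own renaming rules, so its results differ from the names produced by make.valid.names = TRUE shown above.

```r
# Toy indicators with expressions as column names (values invented).
ind <- data.frame(c(0L, 1L, 1L), c(0L, 0L, 1L))
names(ind) <- c("AGE > 65", "BM == 1")

# Non-syntactic names need backticks with $ or a string with [[ ]].
age_ind <- ind$`AGE > 65`
bm_ind <- ind[["BM == 1"]]

# Base R's make.names() replaces every invalid character with a dot.
valid <- make.names(names(ind))
valid
```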