Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

Load dependencies

library(biostat3) # loads the survival and muhaz packages
library(dplyr)    # for data manipulation

We start by listing the first few observations to get an idea about the data. We then define two 1/0 variables for the events that we are interested in.

## Examine the data
data(melanoma)
head(melanoma)
##      sex age     stage mmdx yydx surv_mm surv_yy       status
## 1 Female  81 Localised    2 1981    26.5     2.5  Dead: other
## 2 Female  75 Localised    9 1975    55.5     4.5  Dead: other
## 3 Female  78 Localised    2 1978   177.5    14.5  Dead: other
## 4 Female  75   Unknown    8 1975    29.5     2.5 Dead: cancer
## 5 Female  81   Unknown    7 1981    57.5     4.5  Dead: other
## 6 Female  75 Localised    9 1975    19.5     1.5 Dead: cancer
##            subsite        year8594         dx       exit agegrp id
## 1    Head and Neck Diagnosed 75-84 1981-02-02 1983-04-20    75+  1
## 2    Head and Neck Diagnosed 75-84 1975-09-21 1980-05-07    75+  2
## 3            Limbs Diagnosed 75-84 1978-02-21 1992-12-07    75+  3
## 4 Multiple and NOS Diagnosed 75-84 1975-08-25 1978-02-08    75+  4
## 5    Head and Neck Diagnosed 75-84 1981-07-09 1986-04-25    75+  5
## 6            Trunk Diagnosed 75-84 1975-09-03 1977-04-19    75+  6
##        ydx    yexit
## 1 1981.088 1983.298
## 2 1975.720 1980.348
## 3 1978.140 1992.934
## 4 1975.646 1978.104
## 5 1981.517 1986.312
## 6 1975.671 1977.296
## Create 0/1 outcome variables
melanoma <- 
    transform(melanoma,
              death_cancer = ifelse( status == "Dead: cancer", 1, 0),
              death_all = ifelse( status == "Dead: cancer" |
                                  status == "Dead: other", 1, 0))
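
To make the coding explicit, here is a minimal sketch on a hypothetical four-element vector (not the real data) that has the same status levels as melanoma. It suggests that "Alive" and "Lost to follow-up" are coded 0 in both indicators, and that deaths from other causes are coded 0 (that is, treated as censored) in death_cancer.

```r
# Hypothetical toy vector with the same four levels as melanoma$status
status <- factor(c("Alive", "Dead: cancer", "Dead: other", "Lost to follow-up"))

# Same logic as the transform() call above, written with logical coercion
death_cancer <- as.numeric(status == "Dead: cancer")
death_all    <- as.numeric(status %in% c("Dead: cancer", "Dead: other"))

data.frame(status, death_cancer, death_all)
```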

(a) Plot estimates of the survivor function and hazard function by stage.

We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.

## Tabulate by stage
Freq <- xtabs(~stage, data=melanoma)
cbind(Freq, Prop=prop.table(Freq))
##           Freq       Prop
## Unknown   1631 0.20977492
## Localised 5318 0.68398714
## Regional   350 0.04501608
## Distant    476 0.06122186

We then plot the survival by stage.

par(mfrow=c(1, 2))
mfit <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)

plot(mfit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival")
## legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

hazards <- muhaz2(Surv(surv_mm, death_cancer)~stage, melanoma)
plot(hazards,
     col=1:4, lty=1, xlim=c(0,250), ylim=c(0,0.08),
     legend.args=list(bty="n"))

Survival depends heavily on stage. It is interesting to note that patients with unknown stage appear to have survival similar to that of patients with localised disease.

(b) Estimate the mortality rates for each stage using, for example, the new survRate command.

survRate(Surv(surv_mm/12, death_cancer) ~ stage, data=melanoma)
##                     stage    tstop event       rate      lower      upper
## stage=Unknown     Unknown 10267.12   274 0.02668712 0.02362045 0.03004141
## stage=Localised Localised 38626.58  1013 0.02622546 0.02463514 0.02789150
## stage=Regional   Regional  1500.25   218 0.14530912 0.12665886 0.16593238
## stage=Distant     Distant   875.75   408 0.46588638 0.42177124 0.51336147
melanoma %>%
    select(death_cancer, surv_mm, stage) %>%
    group_by(stage) %>%
    summarise(D = sum(death_cancer), M = sum(surv_mm/12), Rate = D/M,
              CI_low = stats::poisson.test(D,M)$conf.int[1],
              CI_high = stats::poisson.test(D,M)$conf.int[2]) 
## # A tibble: 4 x 6
##       stage     D        M       Rate     CI_low    CI_high
##      <fctr> <dbl>    <dbl>      <dbl>      <dbl>      <dbl>
## 1   Unknown   274 10267.12 0.02668712 0.02362045 0.03004141
## 2 Localised  1013 38626.58 0.02622546 0.02463514 0.02789150
## 3  Regional   218  1500.25 0.14530912 0.12665886 0.16593238
## 4   Distant   408   875.75 0.46588638 0.42177124 0.51336147

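The time unit is years, since we specified surv_mm/12 as the analysis time. Therefore, the units of the rates shown above are events/person-year. Had we used surv_mm directly, the rates would be events/person-month and we would multiply them by 12 to obtain events/person-year. For comparison, the equivalent Stata commands give the same rates, with slightly different confidence limits: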

        . stset surv_mm, failure(status==1) scale(12)
        . strate stage

                 failure _d:  status == 1
           analysis time _t:  surv_mm/12

        Estimated rates and lower/upper bounds of 95% confidence intervals
        (7775 records included in the analysis)
          +--------------------------------------------------------------+
          |     stage      D          Y       Rate      Lower      Upper |
          |--------------------------------------------------------------|
          |   Unknown    274    1.0e+04   0.026687   0.023707   0.030042 |
          | Localised   1013    3.9e+04   0.026225   0.024659   0.027891 |
          |  Regional    218    1.5e+03   0.145309   0.127245   0.165937 |
          |   Distant    408   875.7500   0.465886   0.422804   0.513359 |
          +--------------------------------------------------------------+
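
The rate and its confidence limits can be reproduced by hand from the events and person-years shown above. The sketch below does this for distant stage. The R limits agree with an exact Poisson interval, as in the dplyr summary; the Stata limits appear consistent with a normal approximation on the log-rate scale, which is an assumption inferred from the printed numbers rather than something stated here.

```r
# Events (D) and person-years (Y) for distant stage, copied from the output above
D <- 408
Y <- 875.75

rate <- D / Y                         # events per person-year

# Exact Poisson interval, as used in the dplyr summary above
exact <- stats::poisson.test(D, Y)$conf.int

# Normal approximation on the log-rate scale: SE(log rate) = 1 / sqrt(D)
# (assumed here to be what produces the Stata limits)
z <- qnorm(0.975)
wald <- rate * exp(c(-1, 1) * z / sqrt(D))

round(c(rate = rate,
        exact_low = exact[1], exact_high = exact[2],
        wald_low = wald[1], wald_high = wald[2]), 6)
```

With several hundred events per stage the two intervals differ only in the third decimal place; the choice matters more when events are few.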

(c) Here we tabulate crude rates per 1000 person years.

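Dividing the follow-up time in years by a further 1000 expresses person-time in units of 1000 years, so the resulting rates are per 1000 person-years: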

survRate(Surv(surv_mm/12/1000, death_cancer) ~ stage, data=melanoma)
##                     stage    tstop event      rate     lower     upper
## stage=Unknown     Unknown 10.26713   274  26.68712  23.62045  30.04141
## stage=Localised Localised 38.62658  1013  26.22546  24.63514  27.89150
## stage=Regional   Regional  1.50025   218 145.30912 126.65886 165.93238
## stage=Distant     Distant  0.87575   408 465.88638 421.77124 513.36147

        Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
        (7775 records included in the analysis)
          +----------------------------------------------------------+
          |     stage      D         Y      Rate     Lower     Upper |
          |----------------------------------------------------------|
          |   Unknown    274   10.2671    26.687    23.707    30.042 |
          | Localised   1013   38.6266    26.225    24.659    27.891 |
          |  Regional    218    1.5003   145.309   127.245   165.937 |
          |   Distant    408    0.8758   465.886   422.804   513.359 |
          +----------------------------------------------------------+
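
The rescaling is plain arithmetic, which the following sketch checks using the events and person-years printed in part (b). The vectors D and Y are typed in from that output, so the results agree with the tables only up to rounding.

```r
# Events and person-years by stage, copied from the survRate output in part (b)
D <- c(Unknown = 274, Localised = 1013, Regional = 218, Distant = 408)
Y <- c(Unknown = 10267.12, Localised = 38626.58, Regional = 1500.25, Distant = 875.75)

rate_per_year <- D / Y            # events per person-year
rate_per_1000 <- D / (Y / 1000)   # person-time measured in units of 1000 years

# The two are the same quantity on different scales
all.equal(rate_per_1000, 1000 * rate_per_year)
round(rate_per_1000, 2)
```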

(d) Below we see that the crude mortality rate is higher for males than for females.

survRate(Surv(surv_mm/12/1000, death_cancer) ~ sex, data=melanoma)
##               sex    tstop event     rate    lower    upper
## sex=Male     Male 21.96892  1074 48.88725 46.00684 51.90076
## sex=Female Female 29.30079   839 28.63404 26.72903 30.63898

This difference is also reflected in the survival and hazard curves.
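
One way to quantify the difference is a crude rate ratio. The sketch below computes it from the events and person-time printed above, and uses the two-sample form of stats::poisson.test to obtain an interval for the ratio. This is an illustrative addition, not part of the original solution, and the ratio is unadjusted for age or stage.

```r
# Events and person-time (in units of 1000 years) by sex, copied from the output above
D <- c(Male = 1074, Female = 839)
Y <- c(Male = 21.96892, Female = 29.30079)

rates <- D / Y                                   # deaths per 1000 person-years
rate_ratio <- unname(rates["Male"] / rates["Female"])

# Two-sample comparison of Poisson rates; conf.int is an interval for the ratio
cmp <- stats::poisson.test(unname(D), unname(Y))

rate_ratio
cmp$conf.int
```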

(e)

The majority of patients are alive at the end of the study: 1,913 died from cancer and 1,134 died from another cause. The cause of death depends strongly on age, since younger patients are less likely to die from other causes. To see this, we tabulate the events by age group.

xtabs(~status+agegrp, melanoma)
##                    agegrp
## status              0-44 45-59 60-74  75+
##   Alive             1615  1568  1178  359
##   Dead: cancer       386   522   640  365
##   Dead: other         39   147   461  487
##   Lost to follow-up    6     1     1    0
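
The age pattern is easier to read as a proportion. As a small sketch, the counts from the table above are typed into a matrix and, among patients who died, the share of deaths from other causes is computed for each age group.

```r
# Counts copied from the cross-tabulation above (rows: status, columns: age group)
counts <- matrix(c(1615, 1568, 1178, 359,
                    386,  522,  640, 365,
                     39,  147,  461, 487,
                      6,    1,    1,   0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(status = c("Alive", "Dead: cancer",
                                            "Dead: other", "Lost to follow-up"),
                                 agegrp = c("0-44", "45-59", "60-74", "75+")))

# Among patients who died, the share of deaths attributed to other causes
deaths <- counts[c("Dead: cancer", "Dead: other"), ]
share_other <- deaths["Dead: other", ] / colSums(deaths)
round(share_other, 3)
```

The share rises steadily with age, from under a tenth in the youngest group to more than half in the oldest.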

(f)

All-cause survival is worse than cause-specific survival, because deaths from other causes now count as events in the Kaplan-Meier estimates. The “other cause” mortality is particularly evident in patients with localised and unknown stage.

par(mfrow=c(1, 1))
afit <- survfit(Surv(surv_mm, death_all) ~ stage, data = melanoma)
plot(afit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nAll-cause")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
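
To see why the all-cause curve can never lie above the cause-specific one, the following sketch computes the product-limit estimate by hand on six hypothetical patients (toy values, not taken from melanoma). The helper km() is written for illustration only and assumes there are no tied times; for real analyses use survfit as above.

```r
# Toy data (hypothetical): follow-up time in months and status for six patients
time   <- c(5, 8, 12, 20, 30, 40)
status <- c("Dead: cancer", "Dead: other", "Alive",
            "Dead: cancer", "Dead: other", "Alive")

# Product-limit (Kaplan-Meier) estimate; assumes no tied times
km <- function(time, event) {
    ord <- order(time)
    time <- time[ord]
    event <- event[ord]
    at_risk <- rev(seq_along(time))        # number at risk just before each time
    surv <- cumprod(1 - event / at_risk)   # censored observations contribute a factor of 1
    data.frame(time = time, surv = surv)
}

km_cancer <- km(time, as.numeric(status == "Dead: cancer"))  # other deaths censored
km_all    <- km(time, as.numeric(status != "Alive"))         # every death is an event

cbind(km_cancer, surv_all = km_all$surv)
```

Each death from another cause multiplies the all-cause estimate by a factor below 1 but leaves the cause-specific estimate unchanged at that time, so the all-cause curve drops further.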

(g)

Comparing the Kaplan-Meier estimates for cancer deaths with those for all-cause mortality among patients aged 75 years or older, we see that “other cause” mortality is particularly influential in patients with localised and unknown stage. Patients with localised disease have a better prognosis (i.e. the cancer does not kill them) and are thus more likely to die from another cause. For regional and distant stage, the cancer is more aggressive and is the cause of death for most of these patients (i.e. the cancer kills these patients before they have “the chance” to die from something else).

par(mfrow=c(1, 2))
mfit75 <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(mfit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

afit75 <- survfit(Surv(surv_mm, death_all) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(afit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.

par(mfrow=c(1, 2))
mfitage <- survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = melanoma)
plot(mfitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\ncancer survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)

afitage <- survfit(Surv(surv_mm, death_all) ~ agegrp, data = melanoma)
plot(afitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)