Z(log) Transformation for Laboratory Measurements

Introduction

The zlog package offers functions to transform laboratory measurements into standardised \(z\) or \(z(log)\)-values as suggested in Hoffmann et al. (2017). The transformation requires the lower and upper reference limits. If these are not known, they can be estimated from a given sample.

Z or Z(log) Transformation

Hoffmann et al. (2017) define \(z\) as follows:

\[z = (x - (limits_1 + limits_2 )/2) * 3.92/(limits_2 - limits_1)\]

Consequently the \(z(log)\) is defined as:

\[zlog = (\log(x) - (\log(limits_1) + \log(limits_2))/2) * 3.92/(\log(limits_2) - \log(limits_1))\]

Here \(x\) is the measured laboratory value, and \(limits_1\) and \(limits_2\) are the lower and upper reference limits, respectively.

Example data and reference limits are taken from Hoffmann et al. (2017), Table 2.

library("zlog")

albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)
z(albumin, limits = c(35, 52))
## [1] -0.345876 -2.190548 -1.268212 -0.115292  1.498796 -0.345876 -3.804636
## [8] -2.882300 -4.496388
zlog(albumin, limits = c(35, 52))
## [1] -0.15472223 -2.24698167 -1.14569028  0.07826303  1.57162335 -0.15472223
## [7] -4.52949253 -3.16160843 -5.69571148
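
The two formulas above can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: the names `z_sketch` and `zlog_sketch` are hypothetical, and the factor is written as `2 * qnorm(0.975)` (approximately 3.92), an assumption that appears consistent with the digits printed above.

```r
# Hypothetical helpers: a sketch of the formulas, not the zlog package code.
z_sketch <- function(x, limits) {
    # 2 * qnorm(0.975) is approximately 3.92 (assumed exact form of the factor)
    (x - mean(limits)) * 2 * qnorm(0.975) / diff(limits)
}

zlog_sketch <- function(x, limits) {
    # same transformation applied to log-transformed values and limits
    z_sketch(log(x), log(limits))
}

albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)
round(z_sketch(albumin, limits = c(35, 52)), 2)
round(zlog_sketch(albumin, limits = c(35, 52)), 2)
```

By construction, a value equal to the lower or upper reference limit maps to roughly -1.96 or 1.96, respectively.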

Inverse Z or Z(log) Transformation/Undo Transformation

izlog(zlog(albumin, limits = c(35, 52)), limits = c(35, 52))
## [1] 42 34 38 43 50 42 27 31 24
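
Solving the \(z(log)\) definition for \(x\) gives \(x = \exp(zlog \cdot (\log(limits_2) - \log(limits_1))/3.92 + (\log(limits_1) + \log(limits_2))/2)\). A base-R sketch of this round trip follows; `izlog_sketch` is a hypothetical name and the code is not taken from the package.

```r
# Hypothetical inverse of the z(log) formula; a sketch, not the package code.
izlog_sketch <- function(z, limits) {
    l <- log(limits)
    exp(z * diff(l) / (2 * qnorm(0.975)) + mean(l))
}

limits <- c(35, 52)
albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)

# forward transformation written out explicitly, then undone again
zl <- (log(albumin) - mean(log(limits))) * 2 * qnorm(0.975) / diff(log(limits))
all.equal(izlog_sketch(zl, limits), albumin)
```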

Z(log) Dependent Colour Gradient

Hoffmann et al. (2017) suggested a colour gradient to visualise laboratory measurements for the user.

It can be used to highlight the values in a table:

Table reproduced from Hoffmann et al. (2017), Table 2; limits used: albumin 35-52 g/l, bilirubin 2-21 µmol/l.

Category                     albumin  zlog(albumin)  bilirubin  zlog(bilirubin)
blood donor                       42          -0.15         11             0.88
blood donor                       34          -2.25          9             0.55
blood donor                       38          -1.15          2            -1.96
hepatitis without cirrhosis       43           0.08          5            -0.43
hepatitis without cirrhosis       50           1.57         22             2.04
hepatitis without cirrhosis       42          -0.15         42             3.12
hepatitis with cirrhosis          27          -4.53         37             2.90
hepatitis with cirrhosis          31          -3.16        200             5.72
hepatitis with cirrhosis          24          -5.70         20             1.88
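
To illustrate the idea of a \(z(log)\)-dependent gradient, the sketch below maps values to colours in base R. It is only an illustration: the helper `zlog_colour_sketch`, the blue-white-red palette, and the clamping range of ±10 are all assumptions made here and are not necessarily the gradient proposed by Hoffmann et al. (2017) or the one used by the package.

```r
# Hypothetical colour mapping; palette and clamping range are assumptions.
zlog_colour_sketch <- function(z, max_abs = 10) {
    # clamp to [-max_abs, max_abs] and rescale to [0, 1]
    t <- (pmin(pmax(z, -max_abs), max_abs) + max_abs) / (2 * max_abs)
    # linear interpolation between blue (low), white (normal) and red (high)
    ramp <- grDevices::colorRamp(c("blue", "white", "red"))
    grDevices::rgb(ramp(t), maxColorValue = 255)
}

# hex colours for a few zlog values from the table above
zlog_colour_sketch(c(-5.70, -0.15, 0, 1.57, 5.72))
```

A value of zero maps to white, and the colour saturates towards blue or red as the value moves away from the reference interval.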

Estimate Reference Limits

The reference_limits function calculates the lower and upper reference limits as the 2.5 % and 97.5 % quantiles of the sample (or as quantiles for user-given probabilities):

reference_limits(albumin)
## lower upper 
##  24.6  48.6
reference_limits(albumin, probs = c(0.05, 0.95))
## lower upper 
##  25.2  47.2
exp(reference_limits(log(albumin)))
##    lower    upper 
## 24.57207 48.51429
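
The numbers above are consistent with plain sample quantiles as computed by base R's `quantile` with its default settings. Whether the package calls `quantile` in exactly this way is an assumption; the sketch only shows the arithmetic, including the variant that estimates the limits on the log scale and transforms them back.

```r
albumin <- c(42, 34, 38, 43, 50, 42, 27, 31, 24)

# central 95 % interval: 2.5 % and 97.5 % sample quantiles
quantile(albumin, probs = c(0.025, 0.975), names = FALSE)

# estimation on the log scale, back-transformed to the original scale
exp(quantile(log(albumin), probs = c(0.025, 0.975), names = FALSE))
```

Estimating on the log scale may be preferable for right-skewed measurements, which is the same motivation as for using \(z(log)\) instead of \(z\).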

Working with Reference Tables

Most laboratories use their own age- and sex-specific reference limits. The lookup_limits function can be used to find the matching reference limits.

# toy example
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    age = c(0, 1, 2, 3, 7),     # days
    sex = "both",
    units = c("g/l", rep("µmol/l", 4)), # ignored
    lower = c(35, rep(NA, 4)),  # no real reference values
    upper = c(52, 5, 8, 13, 18) # no real reference values
)
knitr::kable(reference)
param age sex units lower upper
albumin 0 both g/l 35 52
bilirubin 1 both µmol/l NA 5
bilirubin 2 both µmol/l NA 8
bilirubin 3 both µmol/l NA 13
bilirubin 7 both µmol/l NA 18
# lookup albumin reference values for 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference[reference$param == "albumin",]
)
##         lower upper
## albumin    35    52
# lookup albumin and bilirubin values for 18 year old woman
lookup_limits(
    age = 18 * 365.25,
    sex = "female",
    table = reference
)
##           lower upper
## albumin      35    52
## bilirubin    NA    18
# lookup bilirubin reference values for infants
lookup_limits(
    age = 0:8,
    sex = rep(c("female", "male"), 5:4),
    table = reference[reference$param == "bilirubin",]
)
##           lower upper
## bilirubin    NA    NA
## bilirubin    NA     5
## bilirubin    NA     8
## bilirubin    NA    13
## bilirubin    NA    13
## bilirubin    NA    13
## bilirubin    NA    13
## bilirubin    NA    18
## bilirubin    NA    18
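
The matching rule is not spelled out above. Judging from the output, each row seems to apply from its `age` onwards until a later row for the same parameter takes over, and `sex = "both"` matches any sex. A base-R sketch under those assumptions follows; `lookup_sketch` is a hypothetical helper that handles a single parameter only and is not the package's implementation.

```r
# Toy bilirubin table as above (no real reference values).
bilirubin <- data.frame(
    age = c(1, 2, 3, 7),          # days, start of each age range (assumed)
    sex = "both",
    lower = NA_real_,
    upper = c(5, 8, 13, 18)
)

# Hypothetical lookup: pick the row with the largest age <= queried age.
lookup_sketch <- function(age, sex, table) {
    t(vapply(seq_along(age), function(i) {
        ok <- table$age <= age[i] & table$sex %in% c("both", sex[i])
        if (!any(ok))
            return(c(lower = NA_real_, upper = NA_real_))
        j <- which(ok)[which.max(table$age[ok])]
        c(lower = table$lower[j], upper = table$upper[j])
    }, c(lower = NA_real_, upper = NA_real_)))
}

lookup_sketch(
    age = 0:8,
    sex = rep(c("female", "male"), 5:4),
    table = bilirubin
)
```

An age of 0 days precedes the first bilirubin row, so no limits are found and both entries stay `NA`.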

Missing Reference Limits

Sometimes reference limits are not specified. That is often the case for biomarkers that are related to infection or cancer. Using zero as the lower boundary results in skewed distributions (Hoffmann et al. 2017, fig. 7). Haeckel et al. (2015) suggested setting the lower reference limit to 15 % of the upper one.

# use default fractions
set_missing_limits(reference)
##       param age  sex  units lower upper
## 1   albumin   0 both    g/l 35.00    52
## 2 bilirubin   1 both µmol/l  0.75     5
## 3 bilirubin   2 both µmol/l  1.20     8
## 4 bilirubin   3 both µmol/l  1.95    13
## 5 bilirubin   7 both µmol/l  2.70    18
# set fractions manually
set_missing_limits(reference, fraction = c(0.2, 5))
##       param age  sex  units lower upper
## 1   albumin   0 both    g/l  35.0    52
## 2 bilirubin   1 both µmol/l   1.0     5
## 3 bilirubin   2 both µmol/l   1.6     8
## 4 bilirubin   3 both µmol/l   2.6    13
## 5 bilirubin   7 both µmol/l   3.6    18
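
The rule for a missing lower limit is simple enough to write out in base R. This sketch covers only that case (lower limit set to 15 % of the upper one, as suggested by Haeckel et al. 2015). It does not treat missing upper limits and is not the package's implementation.

```r
# Reduced toy table (no real reference values).
reference <- data.frame(
    param = c("albumin", rep("bilirubin", 4)),
    lower = c(35, rep(NA, 4)),
    upper = c(52, 5, 8, 13, 18)
)

# fill missing lower limits with 15 % of the corresponding upper limit
missing_lower <- is.na(reference$lower)
reference$lower[missing_lower] <- 0.15 * reference$upper[missing_lower]
reference
```

Existing lower limits, such as the 35 for albumin, are left untouched.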

Impute Missing Laboratory Measurements

If laboratory measurements are missing they can be imputed using “normal” values from the reference table. Using the "logmean" (default) or "mean" reference value will result in a \(zlog\) or \(z\)-value of zero, respectively.

x <- data.frame(
    age = c(40, 50),
    sex = c("female", "male"),
    albumin = c(42, NA)
)
x
##   age    sex albumin
## 1  40 female      42
## 2  50   male      NA
z_df(impute_df(x, reference, method = "mean"), reference)
##   age    sex   albumin
## 1  40 female -0.345876
## 2  50   male  0.000000
zlog_df(impute_df(x, reference), reference)
##   age    sex    albumin
## 1  40 female -0.1547222
## 2  50   male  0.0000000
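
Why the imputed values end up at zero can be seen from the formulas: the "logmean" is the point halfway between the limits on the log scale, and the "mean" is the point halfway between them on the original scale. A small base-R check for the albumin limits, written without the package:

```r
limits <- c(35, 52)

logmean <- exp(mean(log(limits)))   # geometric mean of the limits, about 42.66
arithmetic_mean <- mean(limits)     # 43.5

# zlog of the logmean and z of the mean, written out from the definitions
zlog_at_logmean <-
    (log(logmean) - mean(log(limits))) * 2 * qnorm(0.975) / diff(log(limits))
z_at_mean <-
    (arithmetic_mean - mean(limits)) * 2 * qnorm(0.975) / diff(limits)

c(zlog = zlog_at_logmean, z = z_at_mean)
```

Both values are zero up to floating-point error.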

PBC Example

For demonstration we choose the pbc dataset from the survival package and exclude all non-laboratory measurements except age and sex:

library("survival")
data("pbc")
labs <- c(
    "bili", "chol", "albumin", "copper", "alk.phos", "ast", "trig",
    "platelet", "protime"
)
pbc <- pbc[, c("age", "sex", labs)]
knitr::kable(head(pbc), digits = 1)
age sex bili chol albumin copper alk.phos ast trig platelet protime
58.8 f 14.5 261 2.6 156 1718.0 137.9 172 190 12.2
56.4 f 1.1 302 4.1 54 7394.8 113.5 88 221 10.6
70.1 m 1.4 176 3.5 210 516.0 96.1 55 151 12.0
54.7 f 1.8 244 2.5 64 6121.8 60.6 92 183 10.3
38.1 f 3.4 279 3.5 143 671.0 113.2 72 136 10.9
66.3 f 0.8 248 4.0 50 944.0 93.0 63 NA 11.0

Next we estimate all reference limits from the data. We want to use sex-specific values for copper and aspartate aminotransferase ("ast").

## replicate copper and ast 2 times, use the others just once
param <- rep(labs, ifelse(labs %in% c("copper", "ast"), 2, 1))
sex <- rep_len("both", length(param))

## replace sex == both with female and male for copper and ast
sex[param %in% c("copper", "ast")] <- c("f", "m")

## create data.frame, we ignore age-specific values for now and set age to zero
## (means applicable for all ages)
reference <- data.frame(
    param = param, age = 0, sex = sex, lower = NA, upper = NA
)

## estimate reference limits from sample data
for (i in seq_len(nrow(reference))) {
    reference[i, c("lower", "upper")] <-
        if (reference$sex[i] == "both")
            reference_limits(pbc[, reference$param[i]])
        else
            reference_limits(pbc[pbc$sex == reference$sex[i], reference$param[i]])
}
knitr::kable(reference)
param age sex lower upper
bili 0 both 0.4000 17.3150
chol 0 both 174.0750 1086.2250
albumin 0 both 2.5400 4.2200
copper 0 f 12.8250 269.2750
copper 0 m 23.5000 388.0000
alk.phos 0 both 504.7500 9261.7400
ast 0 f 49.6000 249.7438
ast 0 m 55.4775 208.3350
trig 0 both 52.0250 279.8000
platelet 0 both 95.0000 470.4000
protime 0 both 9.5000 13.1625

The pbc dataset contains a few missing values. We impute them with the corresponding logmean reference value (the default of impute_df). In this example the reference limits were estimated from the whole sample; in practice they would come from, e.g., a healthy subpopulation.

pbc[c(6, 14),]
##         age sex bili chol albumin copper alk.phos ast trig platelet protime
## 6  66.25873   f  0.8  248    3.98     50      944  93   63       NA      11
## 14 56.22177   m  0.8   NA    2.27     43      728  71   NA      156      11
pbc <- impute_df(pbc, reference)
pbc[c(6, 14),]
##         age sex bili     chol albumin copper alk.phos ast     trig platelet
## 6  66.25873   f  0.8 248.0000    3.98     50      944  93  63.0000 211.3954
## 14 56.22177   m  0.8 434.8386    2.27     43      728  71 120.6507 156.0000
##    protime
## 6       11
## 14      11
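
The imputed values can be traced back to the estimated limits in the reference table: they match the log-mean (geometric mean) of the lower and upper limit of the respective parameter. A quick base-R check using the limits printed above:

```r
# log-mean of the estimated platelet limits (imputed for row 6)
platelet_imputed <- exp(mean(log(c(95, 470.4))))

# log-mean of the estimated cholesterol limits (imputed for row 14)
chol_imputed <- exp(mean(log(c(174.075, 1086.225))))

c(platelet = platelet_imputed, chol = chol_imputed)
```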

Subsequently we can convert the laboratory measurements into \(z(log)\)-values using the zlog_df function, which applies zlog to every numeric column in a data.frame (except the "age" column):

pbc <- zlog_df(pbc, reference)
age sex bili chol albumin copper alk.phos ast trig platelet protime
58.8 f 1.8 -1.1 -1.8 1.3 -0.3 0.5 0.8 -0.3 1.0
56.4 f -0.9 -0.8 1.8 -0.1 1.7 0.0 -0.7 0.1 -0.6
70.1 m -0.7 -1.9 0.5 1.1 -1.9 -0.3 -1.8 -0.8 0.8
54.7 f -0.4 -1.2 -2.0 0.1 1.4 -1.5 -0.6 -0.4 -1.0
38.1 f 0.3 -1.0 0.6 1.1 -1.6 0.0 -1.2 -1.1 -0.3
66.3 f -1.2 -1.2 1.5 -0.2 -1.1 -0.4 -1.5 0.0 -0.2
55.5 f -1.0 -0.6 1.7 -0.2 -1.3 -1.5 1.3 -0.1 -1.7
53.1 f -2.3 -0.9 1.5 -0.2 1.0 -3.3 1.0 1.4 -0.2
42.5 f 0.2 0.5 -0.5 0.4 0.1 0.6 -0.7 0.4 -0.2
70.6 f 1.6 -1.7 -1.4 1.1 -1.2 0.7 0.4 0.9 0.3
53.7 f -0.7 -1.1 1.8 -0.3 -0.9 -0.8 -1.0 0.5 0.8
59.1 f 0.3 -1.3 0.6 0.6 -1.7 -0.7 -0.6 -2.7 2.4
45.7 f -1.4 -0.9 1.3 -0.5 -0.8 -0.6 0.2 0.4 -0.6
56.2 m -1.2 0.0 -2.8 -1.1 -1.5 -1.2 0.0 -0.7 -0.2
64.6 f -1.2 -1.4 1.3 1.4 1.9 0.3 -0.5 0.8 -0.2
40.4 f -1.4 -1.6 0.9 -1.0 -1.5 -1.0 -1.7 -0.2 -0.4
52.2 f 0.0 -1.0 -0.3 1.3 -0.5 0.1 0.1 0.1 -0.8
53.9 f 1.5 -1.9 -1.2 3.0 -1.1 2.2 1.2 0.7 1.2
49.6 f -1.4 -1.3 0.6 -0.5 -0.2 -0.4 0.0 -0.0 -0.2
60.0 f 0.7 -0.3 0.5 1.1 -0.2 0.2 0.3 1.0 1.8
64.2 m -1.5 -1.2 1.2 -1.2 -1.3 -1.5 -0.9 1.1 0.2
56.3 f 0.3 -1.0 0.8 2.7 -0.6 0.2 -1.8 -0.5 0.4
56.0 f 2.0 -0.2 -0.8 2.9 1.4 1.7 1.1 0.0 0.5
44.5 m -0.2 0.1 1.5 0.4 1.3 2.1 1.5 -2.7 -1.5
45.1 f -1.4 -0.8 1.7 -0.5 -1.6 -0.1 -1.4 1.0 0.1
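
Individual cells of this table can be recomputed from the raw measurement and the matching row of the reference table. The sketch below does this for the first patient (female) without the package; `zlog_sketch` is a hypothetical helper that writes out the \(z(log)\) formula, with the factor 3.92 expressed as `2 * qnorm(0.975)` (an assumption).

```r
# Hypothetical helper: the z(log) formula written out in base R.
zlog_sketch <- function(x, limits) {
    (log(x) - mean(log(limits))) * 2 * qnorm(0.975) / diff(log(limits))
}

# bili 14.5 with the limits estimated for both sexes
bili_1 <- round(zlog_sketch(14.5, c(0.4, 17.315)), 1)

# copper 156 with the female-specific limits
copper_1 <- round(zlog_sketch(156, c(12.825, 269.275)), 1)

c(bili = bili_1, copper = copper_1)
```

The results agree with the first row of the table (1.8 for bili and 1.3 for copper).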

Session information

sessionInfo()
## R Under development (unstable) (2020-07-01 r78759)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux bullseye/sid
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.13.so
## 
## locale:
##  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] survival_3.2-7 zlog_1.0.0    
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.13   knitr_1.31        xml2_1.3.2        magrittr_2.0.1   
##  [5] splines_4.1.0     rvest_0.3.6       munsell_0.5.0     lattice_0.20-41  
##  [9] colorspace_2.0-0  viridisLite_0.3.0 R6_2.5.0          rlang_0.4.10     
## [13] stringr_1.4.0     highr_0.8         httr_1.4.2        tools_4.1.0      
## [17] grid_4.1.0        webshot_0.5.2     xfun_0.21         htmltools_0.5.1.1
## [21] yaml_2.2.1        digest_0.6.27     lifecycle_1.0.0   Matrix_1.3-2     
## [25] kableExtra_1.3.1  glue_1.4.2        evaluate_0.14     rmarkdown_2.6    
## [29] stringi_1.5.3     compiler_4.1.0    scales_1.1.1

References

Haeckel, Rainer, Werner Wosniok, Eberhard Gurr, and Burkhard Peil. 2015. “Permissible Limits for Uncertainty of Measurement in Laboratory Medicine.” Clinical Chemistry and Laboratory Medicine 53 (8): 1161–71. https://doi.org/10.1515/cclm-2014-0874.

Hoffmann, Georg, Frank Klawonn, Ralf Lichtinghagen, and Matthias Orth. 2017. “The Zlog-Value as Basis for the Standardization of Laboratory Results.” LaboratoriumsMedizin 41 (1): 23–32. https://doi.org/10.1515/labmed-2016-0087.