Forest plots are commonly used in medical research publications, especially in meta-analyses. They can also be used to report the coefficients and confidence intervals (CIs) of regression models.
Many packages can be used to draw a forest plot. The most popular one is forestplot. Some packages are specialised for meta-analysis, such as meta, metafor and rmeta. Other packages, like ggforestplot, use ggplot2 to draw a forest plot, but they are not available on CRAN yet.
The main differences of forestploter from the other packages are:
The layout of the forest plot is determined by the dataset provided.
The first step is to provide a data.frame that will be used in the forest plot. The column names of the data will be drawn as the header, and the contents of the data will be displayed in the forest plot. One or more blank columns (containing only spaces) should be provided to draw the confidence intervals. The space available for drawing the CI is determined by the width of this column; increase the number of spaces in the column to give the CI more room.
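As a quick check of how the blank-column width behaves, the sketch below builds two blank strings with the paste(rep(" ", n), collapse = " ") idiom used in this document and compares their widths. The names narrow, wide and exact are made up for this illustration. Note that collapse = " " inserts a separator between the spaces, so 20 spaces yield a 39-character string.

```r
# Build blank strings with the idiom used in this document.
# `narrow`, `wide` and `exact` are illustrative names, not part of forestploter.
narrow <- paste(rep(" ", 10), collapse = " ")
wide   <- paste(rep(" ", 20), collapse = " ")

# collapse = " " adds a separator between elements,
# so n spaces give 2n - 1 characters
nchar(narrow)
#> [1] 19
nchar(wide)
#> [1] 39

# strrep() gives an exact width, if that is easier to reason about
exact <- strrep(" ", 39)
identical(exact, wide)
#> [1] TRUE
```

A wider string gives the CI column more horizontal room; the exact width that looks good depends on the font size and the device.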
First we need to get the data ready to plot.
library(grid)
library(forestploter)
# Read the provided example data
dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter"))
# Keep needed columns
dt <- dt[, 1:6]
# Indent the subgroup if there is a number in the placebo column
dt$Subgroup <- ifelse(is.na(dt$Placebo),
                      dt$Subgroup,
                      paste0(" ", dt$Subgroup))
# Replace NA with a blank string; otherwise NA would be shown as the text "NA"
dt$Treatment <- ifelse(is.na(dt$Treatment), "", dt$Treatment)
dt$Placebo <- ifelse(is.na(dt$Placebo), "", dt$Placebo)
# Standard error on the log scale, derived from the upper limit (used for point sizes)
dt$se <- (log(dt$hi) - log(dt$est))/1.96
# Add a blank column for the forest plot to display the CI.
# Adjust the column width with spaces.
dt$` ` <- paste(rep(" ", 20), collapse = " ")
# Create the confidence interval column to display
dt$`HR (95% CI)` <- ifelse(is.na(dt$se), "",
                           sprintf("%.2f (%.2f to %.2f)",
                                   dt$est, dt$low, dt$hi))
head(dt)
#>       Subgroup Treatment Placebo      est        low       hi        se
#> 1 All Patients       781     780 1.869694 0.13245636 3.606932 0.3352463
#> 2          Sex                         NA         NA       NA        NA
#> 3         Male       535     548 1.449472 0.06834426 2.830600 0.3414741
#> 4       Female       246     232 2.275120 0.50768005 4.042560 0.2932884
#> 5          Age                         NA         NA       NA        NA
#> 6       <65 yr       297     333 1.509242 0.67029394 2.348190 0.2255292
#>                                                   HR (95% CI)
#> 1                                         1.87 (0.13 to 3.61)
#> 2
#> 3                                         1.45 (0.07 to 2.83)
#> 4                                         2.28 (0.51 to 4.04)
#> 5
#> 6                                         1.51 (0.67 to 2.35)
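The se column above is derived from the upper confidence limit on the log scale. As a worked check, the sketch below recomputes it for the first row using the rounded values printed in the output. It appears to assume a symmetric 95% interval on the log scale, so that the upper limit equals est * exp(1.96 * se); that reading is an interpretation of the formula, not something stated by the package.

```r
# Values taken from the first row of the head(dt) output
est <- 1.869694
hi  <- 3.606932

# Same formula as in the data preparation step
se <- (log(hi) - log(est)) / 1.96
round(se, 4)
#> [1] 0.3352

# Reversing the formula recovers the upper limit
est * exp(1.96 * se)
#> [1] 3.606932
```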
The data prepared above defines the basic layout of the forest plot. The example below shows how to draw a simple forest plot with the default theme. A footnote is added as a demonstration.
p <- forest(dt[, c(1:3, 8:9)],
            est = dt$est,
            lower = dt$low,
            upper = dt$hi,
            sizes = dt$se,
            ci_column = 4,
            ref_line = 1,
            arrow_lab = c("Placebo Better", "Treatment Better"),
            xlim = c(0, 4),
            ticks_at = c(0.5, 1, 2, 3),
            footnote = "This is the demo data. Please feel free to change\nanything you want.")
# Print plot
plot(p)
Now we will use the same data as above, pretending it is what we have, and add a summary point. We also want to change the graphical parameters for the confidence intervals and other parts of the plot.
# Move the first row to the bottom and use it as the summary row
dt_tmp <- rbind(dt[-1, ], dt[1, ])
dt_tmp[nrow(dt_tmp), 1] <- "Overall"
# Define theme
tm <- forest_theme(base_size = 10,
                   # Confidence interval point shape, line type/color/width
                   ci_pch = 16,
                   ci_col = "#762a83",
                   ci_lty = 1,
                   ci_lwd = 1.5,
                   ci_Theight = 0.2, # Set a T end at the end of CI
                   # Reference line width/type/color
                   refline_lwd = 1,
                   refline_lty = "dashed",
                   refline_col = "grey20",
                   # Vertical line width/type/color
                   vertline_lwd = 1,
                   vertline_lty = "dashed",
                   vertline_col = "grey20",
                   # Change summary color for filling and borders
                   summary_fill = "#4575b4",
                   summary_col = "#4575b4",
                   # Footnote font size/face/color
                   footnote_cex = 0.6,
                   footnote_fontface = "italic",
                   footnote_col = "blue")
pt <- forest(dt_tmp[, c(1:3, 8:9)],
             est = dt_tmp$est,
             lower = dt_tmp$low,
             upper = dt_tmp$hi,
             sizes = dt_tmp$se,
             is_summary = c(rep(FALSE, nrow(dt_tmp) - 1), TRUE),
             ci_column = 4,
             ref_line = 1,
             arrow_lab = c("Placebo Better", "Treatment Better"),
             xlim = c(0, 4),
             ticks_at = c(0.5, 1, 2, 3),
             footnote = "This is the demo data. Please feel free to change\nanything you want.",
             theme = tm)
# Print plot
plot(pt)
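The row reordering and the is_summary flag used above can be checked in isolation on a small data frame. The sketch below uses made-up values and a hypothetical data frame named toy; it only illustrates the base R steps and does not call forestploter.

```r
# Toy data standing in for dt; the values are made up for illustration
toy <- data.frame(
  Subgroup = c("All Patients", "Male", "Female"),
  est      = c(1.87, 1.45, 2.28),
  stringsAsFactors = FALSE
)

# Move the first row to the end and relabel it
toy_tmp <- rbind(toy[-1, ], toy[1, ])
toy_tmp[nrow(toy_tmp), 1] <- "Overall"

# Flag only the last row as the summary row
is_summary <- c(rep(FALSE, nrow(toy_tmp) - 1), TRUE)

toy_tmp$Subgroup
#> [1] "Male"    "Female"  "Overall"
is_summary
#> [1] FALSE FALSE  TRUE
```

The is_summary vector has one entry per row of the data, so it must be built after any rows are added or reordered.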
The package provides some functions to modify the forest plot after it has been drawn. Below are the functions for editing various aspects of the plot:
The edit_plot function can be used to change the color or font face of some columns or rows.
The add_underline function can be used to add a border to a specific row.
The add_text function can be used to add text to certain rows/columns.
The insert_text function can be used to insert a row before or after a certain row and add text.
# Change text color in row 3
g <- edit_plot(p, row = 3, gp = gpar(col = "red", fontface = "italic"))
# Bold grouping text
g <- edit_plot(g,
               row = c(2, 5, 10, 13, 17, 20),
               gp = gpar(fontface = "bold"))
# Edit background of row 5
g <- edit_plot(g, row = 5, which = "background",
               gp = gpar(fill = "darkolivegreen1"))
# Insert text at top
g <- insert_text(g,
                 text = "Treatment group",
                 col = 2:3,
                 part = "header",
                 gp = gpar(fontface = "bold"))
# Add underline at the bottom of the header
g <- add_underline(g, part = "header")
# Insert text
g <- insert_text(g,
                 text = "This is a long text. Age and gender summarised above.\nBMI is next",
                 row = 10,
                 just = "left",
                 gp = gpar(cex = 0.6, col = "green", fontface = "italic"))
plot(g)
The add_text function simply puts the text in the plot without adding any rows. Adding a blank row to the data before drawing the forest plot and then using add_text to add text to that row has the same effect as insert_text.
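The "blank row" alternative can be prepared with base R before the plot is drawn. The sketch below is only an illustration on made-up data: toy, blank and pos are hypothetical names, and the call to add_text on the resulting row is left out because it needs a drawn plot.

```r
# Toy data; names and values are made up for illustration
toy <- data.frame(
  Subgroup = c("Sex", "Male", "Female", "Age"),
  est      = c(NA, 1.45, 2.28, NA),
  stringsAsFactors = FALSE
)

# A blank row: empty label, no estimate
blank <- data.frame(Subgroup = "", est = NA_real_, stringsAsFactors = FALSE)

# Place the blank row after row 3
pos <- 3
toy_blank <- rbind(toy[seq_len(pos), ], blank, toy[-seq_len(pos), ])
rownames(toy_blank) <- NULL

toy_blank$Subgroup
#> [1] "Sex"    "Male"   "Female" ""       "Age"
```

After drawing the plot from such data, row 4 is empty and is the row that add_text would target.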
If drawing CIs in multiple columns is desired, one only needs to provide a vector with the positions of the columns to be drawn in the data. As seen in the example below, the CIs will be drawn in columns 3 and 5. The first and second elements of est, lower and upper will be drawn in column 3 and column 5, respectively.
For a more complex example, one may want to draw CIs by group. The solution is simple: provide all the values sequentially to est, lower and upper. That is, the first n elements of est, lower and upper are considered the same group, and the same holds for the next n elements. Here n is determined by the length of ci_column. As shown in the example below, est_gp1 and est_gp2 will be drawn in column 3 and column 5 as normal and are considered group 1, while est_gp3 and est_gp4 are considered group 2.
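The assignment rule described above can be written out as a small lookup. The helper map_ci below is hypothetical and not part of forestploter; it only restates the rule in base R so that the column and group of each element can be inspected.

```r
# Hypothetical helper, not part of forestploter: shows which CI column and
# which group each element of est/lower/upper would be assigned to,
# following the rule described in the text.
map_ci <- function(n_elements, ci_column) {
  n <- length(ci_column)
  idx <- seq_len(n_elements)
  data.frame(
    element = idx,
    column  = ci_column[(idx - 1) %% n + 1],  # cycle through the CI columns
    group   = (idx - 1) %/% n + 1             # every n elements form one group
  )
}

map_ci(4, ci_column = c(3, 5))
#>   element column group
#> 1       1      3     1
#> 2       2      5     1
#> 3       3      3     2
#> 4       4      5     2
```

With four elements and two CI columns, elements 1 and 2 form group 1 and elements 3 and 4 form group 2, matching the est_gp1 to est_gp4 description.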
This is an example of multiple CI columns and groups:
dt <- read.csv(system.file("extdata", "example_data.csv", package = "forestploter"))
# Indent the subgroup if there is a number in the placebo column
dt$Subgroup <- ifelse(is.na(dt$Placebo),
                      dt$Subgroup,
                      paste0(" ", dt$Subgroup))
# Replace NA with a blank string; otherwise NA would be shown as the text "NA"
dt$`n` <- ifelse(is.na(dt$Treatment), "", dt$Treatment)
dt$`n ` <- ifelse(is.na(dt$Placebo), "", dt$Placebo)
# Add two blank columns for the CIs
dt$`CVD outcome` <- paste(rep(" ", 20), collapse = " ")
dt$`COPD outcome` <- paste(rep(" ", 20), collapse = " ")
# Set up theme
tm <- forest_theme(base_size = 10,
                   refline_lty = "solid",
                   ci_pch = c(15, 18),
                   ci_col = c("#377eb8", "#4daf4a"),
                   footnote_col = "blue",
                   legend_name = "Group",
                   legend_value = c("Trt 1", "Trt 2"),
                   vertline_lty = c("dashed", "dotted"),
                   vertline_col = c("#d6604d", "#bababa"))
p <- forest(dt[, c(1, 19, 21, 20, 22)],
            est = list(dt$est_gp1,
                       dt$est_gp2,
                       dt$est_gp3,
                       dt$est_gp4),
            lower = list(dt$low_gp1,
                         dt$low_gp2,
                         dt$low_gp3,
                         dt$low_gp4),
            upper = list(dt$hi_gp1,
                         dt$hi_gp2,
                         dt$hi_gp3,
                         dt$hi_gp4),
            ci_column = c(3, 5),
            ref_line = 1,
            vert_line = c(0.5, 2),
            nudge_y = 0.2,
            theme = tm)
plot(p)
If the forest plot has multiple CI columns, one may want different settings for different columns. For example, each CI column can have its own xlim, x-axis ticks, x-axis label, xlog, reference line, vertical lines or arrow labels. This can be done by providing a list or a vector: provide a list for xlim, vert_line, arrow_lab and ticks_at, and an atomic vector for xlab, xlog and ref_line. See the example below.
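Before calling forest, it can help to confirm that every per-column setting supplies one entry per CI column. The sketch below is a base R sanity check with a hypothetical settings list; it is not something the package requires. A plausible reason for the list/vector split is that settings such as xlim need several values per column, so a flat vector would be ambiguous, but that is an inference rather than a documented rule.

```r
# Per-column settings for two CI columns, using the argument types named above.
# `settings` is an illustrative container, not a forestploter object.
ci_column <- c(3, 5)
settings <- list(
  xlim      = list(c(0, 3), c(-1, 3)),  # list: each entry is itself a vector
  ticks_at  = list(c(0.1, 0.5, 1, 2.5), c(-1, 0, 2)),
  arrow_lab = list(c("L1", "R1"), c("L2", "R2")),
  xlab      = c("OR", "Beta"),          # atomic vector: one value per column
  xlog      = c(TRUE, FALSE),
  ref_line  = c(1, 0)
)

# Every setting should supply one entry per CI column
all(lengths(settings) == length(ci_column))
#> [1] TRUE
```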
p <- forest(dt[, c(1, 19, 21, 20, 22)],
            est = list(dt$est_gp1,
                       dt$est_gp2,
                       dt$est_gp3,
                       dt$est_gp4),
            lower = list(dt$low_gp1,
                         dt$low_gp2,
                         dt$low_gp3,
                         dt$low_gp4),
            upper = list(dt$hi_gp1,
                         dt$hi_gp2,
                         dt$hi_gp3,
                         dt$hi_gp4),
            ci_column = c(3, 5),
            ref_line = c(1, 0),
            vert_line = list(c(0.3, 1.4), c(0.6, 2)),
            xlog = c(TRUE, FALSE),
            arrow_lab = list(c("L1", "R1"), c("L2", "R2")),
            xlim = list(c(0, 3), c(-1, 3)),
            ticks_at = list(c(0.1, 0.5, 1, 2.5), c(-1, 0, 2)),
            xlab = c("OR", "Beta"),
            nudge_y = 0.2,
            theme = tm)
plot(p)
One can use the base method or the ggsave function to save the plot. For the ggsave function, do not omit the plot parameter. The width and height should be tuned to get the desired plot. You can also set autofit = TRUE in the print or plot function to autofit the plot, but the result may not be as compact as it should be.
# Base method
png('rplot.png', res = 300, width = 7.5, height = 7.5, units = "in")
plot(p)  # draw explicitly so this also works inside scripts and functions
dev.off()
# ggsave function
ggplot2::ggsave(filename = "rplot.png", plot = p,
                dpi = 300,
                width = 7.5, height = 7.5, units = "in")