A proxy forward model is a numerical model that takes a hypothetical “true” climate time-series and simulates the processes of proxy creation and retrieval. The output is a simulated proxy time-series that should resemble what a true proxy would look like given our assumed true climate.
sedproxy is an R package that implements a forward model for climate proxies archived in marine sediments. The model is described in Dolman, A. M. and Laepple, T.: Sedproxy: a forward model for sediment-archived climate proxies, Climate of the Past, 14(12), 1851–1868, https://doi.org/10.5194/cp-14-1851-2018, 2018.
The tutorial is intended to get you started using sedproxy and to illustrate some of its possible applications.
sedproxy can be installed directly from github. Remove the comments marks “#” on the code below to install. If you are doing this tutorial on the RStudio Cloud sedproxy is already installed.
# if (!require("devtools")) {
# install.packages("devtools")
# }
#
# devtools::install_github("EarthSystemDiagnostics/sedproxy")
The main function in the sedproxy package is ClimToProxyClim
for Climate-To-Proxy-Climate, and its first argument is clim.signal
. This is the “assumed true” past climate from which a proxy record will be modelled. This could be the output from a GCM, ESM, or a simple stochastic time series.
It should be a matrix with rows corresponding to years - with the most recent year at the top, going back in time as you go down the matrix (as if it were a sediment column!). In this example the columns of the matrix represent months of the year, but more generally the columns can also represent “habitats” such as different depths in the ocean at which the proxy creation process might take place.
The sedproxy package contains an example input climate taken from the Trace21k climate model run. It is the closest model grid box to core MD97-2141, from which there is a foraminiferal Mg/Ca record (record 41 in Shakun et al. 2012, Rosenthal et al., 2003).
The model data are in “time-forward” order so we need to reverse the order when we read it, and convert to °C from Kelvin.
::opts_chunk$set(echo=TRUE, message = FALSE, warning = FALSE,
knitrfig.width = 6, fig.height = 4)
library(sedproxy)
library(dplyr)
library(tidyr)
library(ggplot2)
<- N41.t21k.climate[nrow(N41.t21k.climate):1,] - 273.15 clim.in
Additionally we need to turn it into an R time-series object. The Trace simulation runs to the year 1990 AD, therefore the start time for the input climate is -39 years BP.
<- ts(clim.in, start = -39) clim.in
The second argument timepoint
defines the points in time for which you want to model a proxy value. If you have a real proxy record you can use the estimated ages of your actual samples. Here we will define a regular time-series with a sample every 100 years.
Because of the bioturbation process, the modelled proxy value at each timepoint depends on values of the input climate both earlier and later in time than the timepoint itself. This time windows spans from 1 “bioturbation width” younger than the timepoint, and 3 bioturbation widths earlier in time. If timepoints are requested that cannot be returned, the model will still run, but with a warning that some timepoints have been omitted.
<- seq(1, 20000, by = 100) req.timepoints
habitat.weights
define the relative abundance or flux of climate signal carriers (e.g. foraminiferal shells) corresponding to the habitats (here months) in the different columns of the input climate. For example, a particular taxa might be more abundant in the warm summer months than during the winter.
<- 1+cos(seq(pi, 3*pi - 2*pi/12, length.out = 12))
hab.wts <- hab.wts / sum(hab.wts)
hab.wts plot(hab.wts, type = "b")
The bioturbation depth bio.depth
and sedimentation rate sed.acc.rate
control the amount of smoothing that takes place due to mixing of the sediment. For most marine sediment cores we do not currently have a measured estimate of the bioturbation depth. As a default we use a value of 10 cm as this is close to the mean observed for locations where this has been measured by particle tracers. The sedimentation rate can be estimated from a set of down-core depths and age estimates. For this example we will use a sedimentation rate of 10 cm / kyr as together with a sediment slice thickness (called layer.width
in sedproxy) this implies a sampling interval of 100 years.
The sedproxy forward model simulates measurement error as a simple normal/Gaussian distributed random variable, the argument sigma.meas
sets the standard deviation of this error. Here we set it to 0.25 °C
Additionally, inter-individual error is modelled as a normally distributed error term, sigma.ind
, here set to 1 °C. The argument n.samples
controls the number of signal carriers (e.g. forams) measured together in a single sample. The inter-individual error is scaled by the square root of n.samples
as a mean is effectively taken over this number of individuals, here 30.
<- ClimToProxyClim(clim.signal = clim.in,
PFM timepoints = req.timepoints,
habitat.weights = hab.wts,
sed.acc.rate = 50,
sigma.meas = 0.25,
sigma.ind = 1,
n.samples = 30,
# this controls the resolution at which the input climate
# is returned and will be plotted later
plot.sig.res = 1)
There is a plotting function that by default plots all of the modelled stages of proxy creation.
PlotPFMs(PFM)
If we repeat the call to ClimToProxyClim
with n.replicates > 1 we will get multiple realisations of the simulated proxy.
5.reps <- ClimToProxyClim(clim.signal = clim.in,
PFM.timepoints = req.timepoints,
habitat.weights = hab.wts,
sed.acc.rate = 50,
sigma.meas = 0.25,
sigma.ind = 1,
n.samples = 30,
plot.sig.res = 1,
n.replicates = 5)
We can plot just the final simulated proxy records.
PlotPFMs(PFM.5.reps, plot.stages = c("simulated.proxy"))
But they are hard to distinguish, so we will extract the results and make our own custom plot. The output is a list.
attributes(PFM.5.reps)
## $names
## [1] "simulated.proxy" "smoothed.signal" "everything" "calibration.pars"
##
## $class
## [1] "sedproxy.pfm"
is.list(PFM.5.reps)
## [1] TRUE
The first element is a dataframe/tibble with only the first replicate. The second is a smoothed version of the climate signal. The poorly named element “everything” is what we want, this contains all the replicate simulated proxies in a large long format dataframe/tibble.
5.reps$everything PFM.
## # A tibble: 28,839 x 7
## timepoints n.samples replicate stage value scale label
## <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr>
## 1 1 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 2 101 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 3 201 30 1 proxy.bt.sb.sampY 28.8 Proxy units (4) +Alia…
## 4 301 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 5 401 30 1 proxy.bt.sb.sampY 28.6 Proxy units (4) +Alia…
## 6 501 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 7 601 30 1 proxy.bt.sb.sampY 28.6 Proxy units (4) +Alia…
## 8 701 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 9 801 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## 10 901 30 1 proxy.bt.sb.sampY 28.7 Proxy units (4) +Alia…
## # … with 28,829 more rows
5.reps$everything %>%
PFM.filter(stage == "simulated.proxy",
<= 3) %>%
replicate ggplot(aes(x = timepoints, y = value, colour = factor(replicate))) +
geom_line()
<- subset(PFM.5.reps$everything, stage == "simulated.proxy")
reps
plot(value~timepoints,
col = replicate,
type = "n",
data = reps)
<- length(unique(reps$replicate))
n.reps <- rainbow(n.reps)
colrs
for (i in seq_along(unique(reps$replicate))){
lines(value~timepoints,
col = colrs[i],
data = reps[reps$replicate == i,])
}
Here we use sedproxy to examine the influence of the number of forams per sample and the number of samples measured per depth/timepoint/sediment slice.
.30 <- ClimToProxyClim(
pfmclim.signal = clim.in,
timepoints = req.timepoints,
#calibration.type = "MgCa",
sed.acc.rate = 50,
habitat.weights = hab.wts,
sigma.meas = 0.26, sigma.ind = 2,
n.samples = 30,
n.replicates = 300)
.5 <- ClimToProxyClim(
pfmclim.signal = clim.in,
timepoints = req.timepoints,
#calibration.type = "MgCa",
sed.acc.rate = 50,
habitat.weights = hab.wts,
sigma.meas = 0.26, sigma.ind = 2,
n.samples = 5,
n.replicates = 300)
# Here we take a shortcut. Instead of actually simulating the measurement of 6 samples at each timepoint we just shrink sigma.meas by sqrt(6) and multiply the number of forams by 6.
.5x6 <- ClimToProxyClim(
pfmclim.signal = clim.in,
timepoints = req.timepoints,
#calibration.type = "MgCa",
sed.acc.rate = 50,
habitat.weights = hab.wts,
sigma.meas = 0.26 / sqrt(6),
sigma.ind = 2, n.samples = 6*5,
n.replicates = 300)
.120 <- ClimToProxyClim(
pfmclim.signal = clim.in,
timepoints = req.timepoints,
#calibration.type = "MgCa",
sed.acc.rate = 50,
habitat.weights = hab.wts,
sigma.meas = 0.26,
sigma.ind = 2, n.samples = 120,
n.replicates = 300)
.10x12 <- ClimToProxyClim(
pfmclim.signal = clim.in,
timepoints = req.timepoints,
#calibration.type = "MgCa",
sed.acc.rate = 50,
habitat.weights = hab.wts,
sigma.meas = 0.26 / sqrt(10),
sigma.ind = 2, n.samples = 10*12,
n.replicates = 300)
Combine the three different measurement strategies and plot.
<- bind_rows(`5 forams` = pfm.5$everything,
combined.pfms `30 forams` = pfm.30$everything,
#`6*5 forams` = pfm.5x6$everything,
`120 forams` = pfm.120$everything,
`10x12 forams` = pfm.10x12$everything,
.id = "measurement.strategy") %>%
mutate(measurement.strategy =
factor(measurement.strategy,
ordered = TRUE,
levels = c("5 forams", "30 forams", "120 forams", "10x12 forams")))
%>%
combined.pfms filter(stage %in% c("simulated.proxy")) %>%
ggplot(aes(x = timepoints, y = value, group = factor(replicate))) +
geom_line(alpha = 0.01) +
facet_wrap(~measurement.strategy)
We can take the mean and standard deviation across replicates for each timepoint and plot this instead.
<- combined.pfms %>%
mean.error group_by(stage, timepoints, measurement.strategy) %>%
summarise(mean = mean(value),
sd = sd(value))
%>%
mean.error filter(stage %in% c("simulated.proxy")) %>%
ggplot(aes(x = timepoints, y = mean)) +
geom_line() +
geom_ribbon(aes(ymax = mean + sd, ymin = mean - sd),
alpha = 0.2, colour = NA) +
facet_wrap(~measurement.strategy) +
theme_bw()
And finally we can summarise the error for each measurement strategy.
%>%
mean.error filter(stage %in% "simulated.proxy") %>%
group_by(stage, measurement.strategy) %>%
summarise(mean.error = mean(sd)) %>%
::kable() knitr
stage | measurement.strategy | mean.error |
---|---|---|
simulated.proxy | 5 forams | 1.0469763 |
simulated.proxy | 30 forams | 0.4864570 |
simulated.proxy | 120 forams | 0.3320404 |
simulated.proxy | 10x12 forams | 0.2229671 |
Dolman, A. M. and Laepple, T.: Sedproxy: a forward model for sediment-archived climate proxies, Climate of the Past, 14(12), 1851–1868, https://doi.org/10.5194/cp-14-1851-2018, 2018.
Rosenthal, Y., Oppo, D. W. and Linsley, B. K.: The amplitude and phasing of climate change during the last deglaciation in the Sulu Sea, western equatorial Pacific, Geophys. Res. Lett., 30(8), 1428, https://doi.org/10.1029/2002GL016612, 2003.
Shakun, J. D., Clark, P. U., He, F., Marcott, S. A., Mix, A. C., Liu, Z., Otto-Bliesner, B., Schmittner, A. and Bard, E.: Global warming preceded by increasing carbon dioxide concentrations during the last deglaciation, Nature, 484(7392), 49–54, https://doi.org/10.1038/nature10915, 2012.