This page is designed to explain how ?lpdf
objects can
be used for automated learning of hyperparameters and general fitting of
predictors.
library(outerbase)
Let’s set things up for a 3 dimensional example.
om = new(outermod)
d = 3
setcovfs(om, rep("mat25pow",d))
knotlist = list(seq(0.01,0.99,by=0.02),
                seq(0.01,0.99,by=0.02),
                seq(0.01,0.99,by=0.02))
setknot(om, knotlist)
The values of covariance function hyperparameters are extremely
important for successful near-interpolation of surfaces. This impact is
felt in outerbase
and this section is designed to
illustrate that.
Consider the first four basis functions plotted below.
sampsize = 30
design1d = seq(1/(2*sampsize),1-1/(2*sampsize),1/sampsize)
x = cbind(design1d,sample(design1d),sample(design1d))
ob = new(outerbase, om, x)
basis_func0 = ob$getbase(1)
matplot(x[,1],basis_func0[,1:4],
        type='l', ylab="func", xlab="first dim")
The hyperparameters will now be changed in a way that we know will change these basis functions. Note that ob$build is required after updating the hyperparameters for the change to take effect, as ob does not know when om is updated.
hyp0 = gethyp(om)
hyp0[2] = 3 #changing the power on first parameter
om$updatehyp(hyp0)
ob$build() #rebuild after updatehyp
This leads to an asymmetric basis function set for this first dimension because of the power transform in ?covf_mat25pow.
basis_func1 = ob$getbase(1)
matplot(x[,1],basis_func0[,1:4],
        type='l', ylab="func", xlab="first dim",
        main="original hyperparameters")
matplot(x[,1],basis_func1[,1:4],
        type='l', ylab="func", xlab="first dim",
        main="new hyperparameters")
The core building block for outerbase learning is the base class ?lpdf, log probability density functions. This base class forms the backbone behind learning using statistical models. Instances of this class allow us to optimize coefficients, infer on uncertainty, and learn hyperparameters of covariance functions.
A small dataset can illustrate (almost) all core concepts related to lpdf.
y = obtest_borehole3d(x)
Each covariance function has 2 hyperparameters, giving a total of 6 hyperparameters.
gethyp(om)
#> inpt1.scale inpt1.power inpt2.scale inpt2.power inpt3.scale inpt3.power
#>           0           3           0           0           0           0
hyp0 = c(-0.5,0,-0.5,0,-0.5,0)
om$updatehyp(hyp0)
We will use 60
terms to build our approximation.
terms = om$selectterms(60)
The idea is to build a loglik object that represents the log likelihood of our data given the model and coefficients. We will begin with ?loglik_std, although this model is not recommended for speed reasons. We can initialize it and check that we can get gradients with respect to coefficients, covariance hyperparameters, and parameters of the lpdf object itself.
loglik = new(loglik_std, om, terms, y, x)
coeff0 = rep(0,loglik$nterms)
loglik$update(coeff0) # update it to get gradients
loglik$val
#> [1] -160.7842
head(loglik$grad) # dim 60 for number of coefficients
#>            [,1]
#> [1,]  0.1046944
#> [2,] -1.0283251
#> [3,] -0.3458546
#> [4,]  2.9306071
#> [5,] -0.4560907
#> [6,] -0.1768621
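Only the coefficient gradient is printed above. The other two gradients mentioned in the text can be inspected the same way, assuming loglik_std exposes the same gradhyp and gradpara fields that are used on the combined object later (an assumption; only the combined object is exercised below):
loglik$gradhyp  # gradient with respect to the covariance hyperparameters (assumed field)
loglik$gradpara # gradient with respect to this object's own parameters (assumed field)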
A reasonable statistical model also needs a prior on the coefficients. This tells us what distribution we expect on the coefficients.
logpr = new(logpr_gauss, om, terms)
To make the handling of these two objects loglik and logpr easier, the ?lpdfvec class is helpful to tie the objects together. It will share the hyperparameter vector between them, so they need to be based on the same outermod object. But it will concatenate the parameters.
logpdf = new(lpdfvec, loglik, logpr)
para0 = getpara(logpdf)
para0
#> noisescale coeffscale
#>   3.035664   6.000000
para0[2] = 4
logpdf$updatepara(para0)
getpara(logpdf)
#> noisescale coeffscale
#>   3.035664   4.000000
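To see the sharing and concatenation concretely, we can compare lengths using only functions already introduced; attributing noisescale to loglik and coeffscale to logpr is an inference from the names:
length(gethyp(om))      # still 6: the hyperparameters are shared through the common outermod
length(getpara(logpdf)) # 2: the parameters are concatenated, presumably noisescale from loglik and coeffscale from logpr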
The coefficients coeff are considered ancillary parameters that need to be optimized out (or handled with something more sophisticated, a hint at current research). For this class, it is easiest to do this via lpdf$optnewton, which takes a single Newton step to optimize the coefficients.
logpdf$optnewton()
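A single Newton step is exact when the objective is quadratic in the coefficients, which the Gaussian prior logpr_gauss suggests is the case here. One sanity check, assuming the combined object refreshes the same grad field that loglik_std exposed above, is that the coefficient gradient is now essentially zero:
max(abs(logpdf$grad)) # should be numerically zero at the optimized coefficients (assumes lpdfvec exposes $grad)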
Some test data will help illustrate prediction.
testsampsize = 1000
xtest = matrix(runif(testsampsize*d),ncol=d)
ytest = obtest_borehole3d(xtest)
We can see the predictive accuracy using the ?predictor class, which automatically pulls the correct information out of loglik to design predictions.
predt = new(predictor,loglik)
predt$update(xtest)
yhat = as.vector(predt$mean())
varpred = as.vector(predt$var())

plot(yhat,ytest, xlab="prediction", ylab="actual")
hist((ytest-yhat)/sqrt(varpred),
     main="standardized test residuals",
     xlab = "standardized test residuals")
The main value in this approach is the automated pulling of important gradients related to covariance hyperparameters and model parameters.
logpdf$optnewton()
logpdf$gradhyp # dim 6 for all hyperparameters
#>            [,1]
#> [1,] -1.2753801
#> [2,]  0.1352539
#> [3,]  7.4960689
#> [4,]  0.8817342
#> [5,]  8.5644385
#> [6,]  0.9453822
logpdf$gradpara # dim 2 since 2 parameters
#>            [,1]
#> [1,] -18.283833
#> [2,]  -3.063965
This allows us to use custom functions to learn these hyperparameters to give maximum predictive power. The goal right now is a single point estimate of these hyperparameters. One has to be very careful to keep these in good ranges, and the call below will make sure to return -inf if there is a problem with the chosen hyperparameters.
totobj = function(parlist) { #my optimization function for tuning
  regpara = logpdf$paralpdf(parlist$para) # get regularization for lpdf
  reghyp = om$hyplpdf(parlist$hyp) # get regularization for om
  if(is.finite(regpara) && is.finite(reghyp)) { # if they pass
    om$updatehyp(parlist$hyp)       # update hyperparameters
    logpdf$updateom()               # update the outerbase inside
    logpdf$updatepara(parlist$para) # update parameter
    logpdf$optnewton()              # do opt

    gval = parlist #match structure
    gval$hyp = -logpdf$gradhyp-om$hyplpdf_grad(parlist$hyp)
    gval$para = -logpdf$gradpara-logpdf$paralpdf_grad(parlist$para)
    list(val = -logpdf$val-reghyp-regpara, gval = gval)
  } else list(val = Inf, gval = parlist)
}
This works by querying the objects themselves to check if the parameters are reasonable. This package provides a custom deployment of BFGS in ?BFGS_std to optimize functions like this.
parlist = list(para = getpara(logpdf), hyp = gethyp(om))
totobj(parlist)
#> $val
#> [1] 86.02803
#>
#> $gval
#> $gval$para
#> [,1]
#> [1,] 18.283833
#> [2,] 2.563965
#>
#> $gval$hyp
#> [,1]
#> [1,] -4.0817628
#> [2,] -0.1352539
#> [3,] -12.8532117
#> [4,] -0.8817342
#> [5,] -13.9215814
#> [6,] -0.9453822
opth = BFGS_std(totobj, parlist, verbose=3)
#>
#> ########started BFGS#######
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 0 86.028 NA NA 0.1
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 1 83.146 -2.88146 -45.2956 0.1
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 2 78.1594 -4.98609 -0.235414 0.50357
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 3 60.5026 -17.6542 -31.1706 0.539329
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 4 55.385 -5.11704 -2.77443 0.573679
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 5 46.786 -8.59786 -7.32502 0.606459
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 6 38.3531 -8.43183 -5.27123 0.637561
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 7 33.1031 -5.24912 -9.31017 0.666913
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 8 32.2324 -0.870293 -9.3074 0.694484
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 9 31.5246 -0.707647 -1.5764 0.720271
#> num iter: 10 obj start: 86.02803 obj end: 31.49121
#> final learning rate: 0.7442976
#> approx lower bound (not achieved): 31.43811
#> #########finished BFGS########
Then we just have to update our parameters and re-optimize, and we can check that we are at least much closer to a stationary point.
totobj(opth$parlist)
#> $val
#> [1] 31.49121
#>
#> $gval
#> $gval$para
#> [,1]
#> [1,] -0.5105922
#> [2,] -1.4425483
#>
#> $gval$hyp
#> [,1]
#> [1,] 0.02649228
#> [2,] 0.06837665
#> [3,] -0.40623766
#> [4,] 0.06382342
#> [5,] -0.38065302
#> [6,] 0.19144968
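For clarity, updating the parameters can also be written out explicitly with the same calls totobj makes internally; calling totobj(opth$parlist) above has already left om and logpdf in this state, so this block simply makes the bookkeeping visible:
om$updatehyp(opth$parlist$hyp)       # push the optimized covariance hyperparameters into om
logpdf$updateom()                    # refresh the outerbase inside the lpdf objects
logpdf$updatepara(opth$parlist$para) # push the optimized model parameters
logpdf$optnewton()                   # re-optimize the coefficients at the new settings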
These steps can all be nicely wrapped up in another function, ?BFGS_lpdf, which is a simpler call with the same result. Note that, for reasons not fully understood, these numbers will not exactly match those above, but they will be quite close and functionally the same.
opth = BFGS_lpdf(om, logpdf,
                 parlist=parlist,
                 verbose = 3, newt= TRUE)
#>
#> ########started BFGS#######
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 0 86.028 NA NA 0.1
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 1 83.146 -2.88146 -45.2956 0.1
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 2 78.1594 -4.98609 -0.235414 0.50357
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 3 60.5026 -17.6542 -31.1706 0.539329
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 4 55.385 -5.11704 -2.77443 0.573679
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 5 46.786 -8.59786 -7.32502 0.606459
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 6 38.3531 -8.43183 -5.27123 0.637561
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 7 33.1031 -5.24912 -9.31017 0.666913
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 8 32.2324 -0.870293 -9.3074 0.694484
#> iter.no obj.value wolfe.cond.1 wolfe.cond.2 learning.rate
#> 9 31.5246 -0.707647 -1.5764 0.720271
#> num iter: 10 obj start: 86.02803 obj end: 31.49121
#> final learning rate: 0.7442976
#> approx lower bound (not achieved): 31.43811
#> #########finished BFGS########
The revised predictions are then built after hyperparameter optimization, where we find improved predictive accuracy in nearly every category. Here we can see much better alignment between predictions and actual values, alongside a better plot of standardized residuals (closer to a standard normal distribution).
predtt = new(predictor,loglik)
predtt$update(xtest)
yhat = as.vector(predtt$mean())
varpred = as.vector(predtt$var())

plot(ytest,yhat)
hist((ytest-yhat)/sqrt(varpred), main="standardized test residuals",
     xlab = "standardized test residuals")