Building an Artificial Neural Network with ‘oops’ Layers to Forecast Inflation

Christopher Mann

2022-03-02

I discuss how to use the oops package in R to create an artificial neural network by hand, then train the network to forecast price inflation. This is for illustrative purposes and is not intended as an optimal method of either building a neural network or forecasting inflation.

Introduction to Artificial Neural Networks

Most discussions of artificial neural networks start with a graph of interconnected circles to demonstrate how the network connects data inputs to outputs through different “hidden layers”. Those graphs are overly complex and obscure the network’s internal structure. I think that the best way to start is to think about a regression. A regression takes a set of independent variables and relates them to a dependent variable through a specified function, such as relating height, exercise, and eating habits to weight. Mathematically, this is

\[\widehat{y} = f(\beta\cdot X)\]

where \(X\) is a matrix of independent variables, \(\beta\) is a vector of regression coefficients that describe the importance of each variable, \(f(\bullet)\) is the function, and \(\widehat{y}\) is the prediction of the dependent variable. The simple linear regression that plots the best-fitting straight line through a series of data points just uses the linear function \(f(x) = x\). A regression typically solves for \(\beta\) by minimizing some loss function, commonly the sum of squared prediction errors across observations - \(\sum_i \left(y_i - \widehat{y}_i\right)^2\).
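As a concrete illustration, a linear regression in base R fits this framework directly, with \(f\) equal to the identity function. This is a hypothetical sketch using the built-in mtcars data, not part of the application later in this vignette.

fit  <- lm(mpg ~ wt + hp, data = mtcars)   # f(x) = x; beta chosen by least squares
coef(fit)                                  # the estimated beta
yhat <- predict(fit)                       # yhat = f(beta * X)
sum((mtcars$mpg - yhat)^2)                 # the minimized sum of squared errors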

Neural networks expand the idea of regression by nesting multiple functions and sets of coefficient weights.

\[\widehat{y} = f_n \left(\dots f_2\left(\beta_2\cdot f_1\left(\beta_1\cdot X\right)\right)\right)\]

Each of the \(n\) functions above and their weights are the network’s layers, \(n-1\) of which are often considered “hidden”. The nested functions add nonlinearity, allowing the network to capture more complex relationships than a standard regression. Note that the expression simplifies to a single, linear function if all \(f\) are linear, as shown below.
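For example, with two layers and \(f_1(x) = f_2(x) = x\), the weights collapse into a single coefficient vector,

\[\widehat{y} = f_2\left(\beta_2\cdot f_1\left(\beta_1\cdot X\right)\right) = \beta_2\cdot\beta_1\cdot X = \tilde{\beta}\cdot X\]

which is just a linear regression with combined weights \(\tilde{\beta}\).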

The different \(\beta\)’s are estimated by starting with an initial guess, then feeding the input matrix \(X\) forward through each function to end up with a prediction. The prediction for each observation is compared to its observed value to determine the error. The error is then used with the gradient of each function, starting with the last function, to determine how to update each \(\beta\). This is called backpropagation, and the process repeats until the error is sufficiently small or the maximum number of iterations is reached.
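Before building the full network, here is a hypothetical toy sketch (not part of the oops package) of a single feed-forward and backpropagation step for one linear layer; the full, multi-layer version is implemented in the next section.

set.seed(1)
x    <- matrix(rnorm(20), ncol = 2)          # 10 observations, 2 inputs
y    <- x %*% c(1, -2)                       # "true" relationship
beta <- matrix(0, nrow = 2)                  # initial guess

yhat  <- x %*% beta                          # feed forward
error <- yhat - y                            # prediction error
grad  <- t(x) %*% (2 * error / length(y))    # gradient of the mean squared error
beta  <- beta - 0.2 * grad                   # update beta with learning rate 0.2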

Implementation

To implement a neural network, we will use the ‘oops’ library in R to create a Layer Class. oClass() creates a generator function that produces Layer objects. The argument "Layer" gives each instance the S3 class name “Layer” to use with the different methods.

library(oops)

Layer <- oClass("Layer")

Each layer will have a set of weights, the \(\beta\) in the previous equations. The dimensions of beta will vary for each instance of the layer, so they will be provided when the instance is created. Once the dimensions are known, beta could be initialized to a set of random numbers, though we will initialize it to a specific set of values in this example to ensure that the vignette is reproducible. We will also keep track of the output in each layer for backpropagation.

We will create an init method for “Layer” objects that takes the dimensions and populates the object fields accordingly. init() is automatically called on each new oops Class instance.

init.Layer <- function(self, dim){
  self$dim    <- dim
  self$beta   <- matrix(rep(0.01, prod(dim)), nrow = dim[1])
  self$output <- 0
  return(self)
}
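As a quick check (assuming, as with the Layer_Relu(1) call later in this vignette, that the generator forwards its arguments to init()), creating an instance stores the dimensions and the starting weights:

lyr <- Layer( dim = c(3, 2) )
lyr$dim     # 3 2
lyr$beta    # a 3 x 2 matrix filled with 0.01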

Each layer is also associated with a particular activation function and its derivative. We could include these in the init function above, but another solution is to create a different Class for each function. As long as the new class inherits from our Layer Class, the init() function will continue to work.

Most neural networks use the same activation function in every layer. However, this exercise is primarily for demonstration purposes, so we will use three different functions.

Layer_Relu <- oClass(
  inherit = Layer,
  active_fun = function(x) ifelse(x <= 0, 0, x),
  deriv_fun  = function(x) ifelse(x <= 0, 0, 1)
)
#> Warning in create_oclass(dots, name = name, inherit = inherit, portable
#> = portable, : Classes should be named to ensure that methods are properly
#> deployed.

Layer_Gcu <- oClass(
  inherit = Layer, 
  active_fun = function(x) x*cos(x),
  deriv_fun  = function(x) cos(x) - x*sin(x)
)
#> Warning in create_oclass(dots, name = name, inherit = inherit, portable
#> = portable, : Classes should be named to ensure that methods are properly
#> deployed.

Layer_Softplus <- oClass(
  inherit = Layer,
  active_fun = function(x) log(1+exp(x)),
  deriv_fun  = function(x) 1/(1+exp(-x))
)
#> Warning in create_oclass(dots, name = name, inherit = inherit, portable
#> = portable, : Classes should be named to ensure that methods are properly
#> deployed.

Note the difference between each of the Class objects above and the original Layer class. First, I did not provide a name for each class, which issued a warning. This is ok since I do not need to distinguish between the individual layer types - this can be done by examining the activation function if necessary. Also, each new class inherits from Layer, meaning that “Layer” methods will already deploy on each new instance.

class( Layer_Relu(1) )
#> [1] "Layer"    "Instance"

ls( Layer_Relu(1) )
#> [1] "beta"   "dim"    "output"

The other difference is that the values for each function are provided at the time of Class creation. Any named value passed to oClass() is stored as a field that all instances inherit. For example, all instances created by Layer_Relu can automatically access the active_fun field, though they do not store it directly. This is why active_fun did not show up when ls() was used above.
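To see this (a small illustrative check, not from the original vignette), the inherited field can still be called through an instance:

lyr <- Layer_Relu( dim = c(2, 2) )
lyr$active_fun( c(-1, 0, 2) )   # ReLU maps this to 0 0 2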

Now that we have our layers, let us build a Class for our neural network.

NNetwork <- oClass("NNetwork")

Each network needs to be initialized with a dependent variable \(y\) and a matrix of independent variables \(X\). It also needs to store the layers - we will use one of each of the three we created. Finally, we will train the network by fitting our beta weights to the data.

init.NNetwork <- function(self, y, X){
  self$y      <- y
  self$X      <- X
  self$layers <- list(
    Layer_Gcu( dim=c(ncol(X), 4) ),
    Layer_Relu( dim=c(4, 2) ),
    Layer_Softplus( dim=c(2,1) )
  )
  train_nn(self)
}

Fitting a neural network involves two steps: feeding the data through each of the layers, then propagating the error backwards through each layer to adjust the weights. This is repeated until the change in the error is sufficiently small or a maximum number of iterations, or epochs, has been reached. So, let us add some starting values to our neural network Class to store these quantities, along with some additional control variables.

NNetwork$mse        <- Inf
NNetwork$epoch      <- 0
NNetwork$tolerance  <- 1e-08
NNetwork$max_epoch  <- 5000
NNetwork$learn_rate <- 0.2

One of the included control variables is the learning rate. This determines how much we change our beta weights in each epoch. If this value is too high, our betas may oscillate around their optimal values without ever reaching them. If it is too low, training will need a very large number of epochs (possibly hitting the maximum) and we may never reach the best set of values.
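A hypothetical one-dimensional sketch (not part of the package) makes the trade-off concrete: minimizing \((b - 3)^2\) by gradient descent converges for a small learning rate but overshoots for a large one.

descend <- function(rate, steps = 10){
  b <- 0
  for (i in seq_len(steps)) b <- b - rate * 2 * (b - 3)   # gradient of (b - 3)^2 is 2*(b - 3)
  b
}
descend(0.2)   # close to the optimum of 3
descend(1.1)   # overshoots further from 3 with each step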

Next, let us create our train_nn function.

train_nn <- function(nn){
  while(TRUE) {
    nn$epoch <- nn$epoch + 1
    if (nn$epoch >= nn$max_epoch) break
    out  <- feed_forward(nn)          # returns output of last layer
    diff <- back_propagate(nn, out)   # returns change in mse
    if (diff < nn$tolerance){
      nn$flag <- 1
      break
    }
  }
  nn
}

To feed forward, we start with our input X and pass it through each layer’s weights and activation function sequentially, keeping track of the output at each step for adjustment purposes later. We will return the final output so that we can easily pass it into the next backpropagation step.

feed_forward <- function(nn){
  output <- nn$X
  for (layer in nn$layers){
    output <- layer$active_fun(output %*% layer$beta)
    layer$output <- output
  }
  output
}

Feeding the data through the layers is the easy part. Changing the weights is more difficult. We use a method called gradient descent, which takes the derivative of the activation function in each layer and moves along it depending on the size of the error.

For the algorithm, we start at the last layer and calculate the loss gradient, \(\delta\). Let \(\epsilon = \left(\widehat{y} - y\right)\) be the prediction error (in the code below, \(\epsilon\) is also scaled by \(2/n\), the factor that comes from differentiating the mean squared error) and \(f_i^\prime\left(z_i\right)\) be the derivative of the activation function of layer \(i\) applied to its output. Finally, let \(\cdot\) represent element-wise multiplication, while \(\times\) is matrix multiplication.

For the last layer,

\[\delta_L = \epsilon\cdot f_L^\prime\left(z_L\right)\]

The loss gradient for each earlier layer is calculated recursively by

\[\delta_i = \left[\delta_{i+1} \times \beta_{i+1}^T\right]\cdot f_i^\prime\left(z_i\right)\]

Finally, the change in weights for each layer is determined by multiplying the transposed output from the previous layer by the loss gradient and scaling by the learning rate \(\eta\). The initial layer uses the original input matrix, so \(z_0 = X\).

\[\Delta \beta_i = -\eta \, z_{i-1}^T \times \delta_i\]

Since we are calculating the error in this step, we might as well calculate the mean squared error and return how much it changes at each iteration.

back_propagate <- function(nn, output){
  error  <- output - nn$y
  delta  <- 2*error/length(output)     # Loss = sum(error^2)/n => dLoss = 2/n*error
  nlayer <- length(nn$layers)
  
  for (i in nlayer:1){
    layeri <- nn$layers[[i]]
    if (i == nlayer){
      delta <- layeri$deriv_fun(output) * delta
    } else {
      delta <- (delta %*% t(nn$layers[[i+1]]$beta)) *
        layeri$deriv_fun(layeri$output)
    }
    if (i == 1){
      dweight <- t(nn$X) %*% delta
    } else {
      dweight <- t(nn$layers[[i-1]]$output) %*% delta
    }
    layeri$beta <- layeri$beta - nn$learn_rate * dweight
  }
  mse    <- mean(error^2)
  dmse   <- abs(nn$mse - mse)
  nn$mse <- mse
  return(dmse)
}

All we have left now is our prediction function. This will look very similar to the feed_forward() function defined above, except that the output at each layer is not saved.

predict_nn <- function(nn, X){
  output <- X
  for (layer in nn$layers){
    output <- layer$active_fun(output %*% layer$beta)
  }
  output
}

Application: Predicting Inflation

We need some data to apply our artificial neural network to. The data set is available via data(cpi_data) in the oops package, but I will also demonstrate how to construct it. We will use the eFRED package to download the consumer price index.

library(eFRED)
library(dplyr)
dat <- fred("CPIAUCSL") 

To calculate annual inflation, we take the log difference between the CPI and its 12-month lag. We also calculate several lagged values of inflation to use as our independent variables.
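In symbols, annual inflation in month \(t\) is

\[\pi_t = \log\left(CPI_t\right) - \log\left(CPI_{t-12}\right)\]

and the regressors are its lags \(\pi_{t-1}\), \(\pi_{t-2}\), \(\pi_{t-3}\), \(\pi_{t-6}\), and \(\pi_{t-12}\).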

cpi_data <- dat %>%  transmute(
    date,
    pi    = log(CPIAUCSL) - log(lag(CPIAUCSL, 12)),
    pi.1  = lag(pi, 1),
    pi.2  = lag(pi, 2),
    pi.3  = lag(pi, 3),
    pi.6  = lag(pi, 6),
    pi.12 = lag(pi, 12)
  ) %>%
  na.omit

Now, let us split the data set into training and test sets that will be used for fitting and evaluating the network, respectively.

train <- cpi_data[which(cpi_data$date <= as.Date("2019-12-01")),]
test  <- cpi_data[which(cpi_data$date > as.Date("2019-12-01") & cpi_data$date < as.Date("2021-01-01")),]

Now we train an artificial neural network with our training data.

y <- as.matrix(train$pi, ncol=1) 
X <- cbind(1, as.matrix(train[,3:7]))

nn <- NNetwork(y, X)

Note that nn is an oops Class object holding all of the network information. We can check the number of iterations and the mean squared error accordingly. We can also check the weights for each layer, though these can be difficult to interpret compared to linear regression coefficients.

nn$epoch
#> [1] 4337

nn$mse
#> [1] 0.0001611906

nn$layers[[1]]$beta
#>             [,1]       [,2]       [,3]       [,4]
#>        0.4329761  0.4329761  0.4329761  0.4329761
#> pi.1  -0.4688054 -0.4688054 -0.4688054 -0.4688054
#> pi.2  -0.4537249 -0.4537249 -0.4537249 -0.4537249
#> pi.3  -0.4372365 -0.4372365 -0.4372365 -0.4372365
#> pi.6  -0.3863754 -0.3863754 -0.3863754 -0.3863754
#> pi.12 -0.2635620 -0.2635620 -0.2635620 -0.2635620

To predict the level of price inflation in the next period, we take the lagged inflation columns of our test data set and feed them into the neural network. Note that this approach assumes we are only predicting one period ahead, since we are using all of the most recent data for each observation.

Xtest <- cbind(1, as.matrix(test[, 3:7]))
prediction <- predict_nn(nn, Xtest)
prediction
#>           [,1]
#> 877 0.02310848
#> 878 0.02360990
#> 879 0.02400029
#> 880 0.02346047
#> 881 0.02183437
#> 882 0.02040416
#> 883 0.02007325
#> 884 0.02045291
#> 885 0.02074300
#> 886 0.02053071
#> 887 0.02066337
#> 888 0.02092708

mean(prediction - test$pi) # Average error
#> [1] 0.009243088

Note that an ordinary least squares regression actually performs better than the neural network here. Artificial neural networks do not always provide superior forecasts to less sophisticated methods; a lot of tuning of the model parameters and of the number and types of layers is necessary to produce good forecasts.

ols <- lm(y~-1+X, data = train)
mean(Xtest %*% as.matrix(ols$coeff) - test$pi)
#> [1] 0.0009104316