The R package RRRR provides methods for estimating online Robust Reduced-Rank Regression.
If you use any code from the RRRR
package in a publication, please use the following citation:
Yangzhuoran Yang and Ziping Zhao (2020). RRRR: Online Robust Reduced-Rank Regression Estimation. R package version 1.0.0. https://pkg.yangzhuoranyang.com/RRRR/.
This vignette aims to provide illustrations to estimate and update (Online) Reduced-Rank Regression using various methods contained in the package.
The formulation of the reduced-rank regression is as follow: \[y = \mu +AB' x + D z+innov,\] where for each realization
The matrix resulted from \(AB'\) will be a reduced rank coefficient matrix with rank of \(r\). The function estimates parameters \(\mu\), \(A\), \(B\), \(D\), and \(Sigma\), the covariance matrix of the innovation’s distribution.
To simulate example data that can be used to estimate reduced-rank regression, use function RRR_sim
.
data <- RRR_sim()
data
#> Simulated Data for Reduced-Rank Regression
#> ------------
#> Specifications:
#> N P Q R r
#> 1000 3 3 1 1
#>
#> Coefficients:
#> mu A B D Sigma1 Sigma2 Sigma3
#> 1 0.100000 -0.741124 -2.533624 0.313055 1.000000 0.000000 0.000000
#> 2 0.100000 0.965108 -1.360413 0.899096 0.000000 1.000000 0.000000
#> 3 0.100000 0.086855 -0.261861 -1.067664 0.000000 0.000000 1.000000
A number of parameters can be specified. See ?RRR_sim
. The default arguments are set in such a way that the matrix resulted from \(AB'\) will be a reduced rank coefficient matrix with rank of \(r\).
str(data)
#> List of 4
#> $ spec:List of 11
#> ..$ P : num 3
#> ..$ N : num 1000
#> ..$ Q : num 3
#> ..$ R : num 1
#> ..$ A : num [1:3, 1] -0.7411 0.9651 0.0869
#> ..$ B : num [1:3, 1] -2.534 -1.36 -0.262
#> ..$ mu : num [1:3] 0.1 0.1 0.1
#> ..$ D : num [1:3, 1] 0.313 0.899 -1.068
#> ..$ r : num 1
#> ..$ Sigma: num [1:3, 1:3] 1 0 0 0 1 0 0 0 1
#> ..$ innov: num [1:1000, 1:3] 1.2173 0.0473 -0.5319 -0.6484 1.1812 ...
#> $ y : num [1:1000, 1:3] 0.837 -3.285 -0.456 -4.804 1.756 ...
#> $ x : num [1:1000, 1:3] -0.928 -1.36 0.308 -1.466 -0.283 ...
#> $ z : num [1:1000] 1.5043 -0.1134 -0.2133 -1.243 -0.0085 ...
#> - attr(*, "class")= chr [1:2] "RRR_data" "list"
The returned list of RRR_sim
contains the input specifications and the data points \(y\), \(x\) and \(z\).
RRR
The Gaussian Maximum Likelihood Estimation method is described in Johansen, S. (1991). This method is not robust in the sense that it assumes a Gaussian distribution for the innovations which does not take into account the heavy-tailedness of the true distribution and outliers.
res_gmle <- RRR(y=data$y, x=data$x, z = data$z)
res_gmle
#> Reduced-Rank Regression
#> ------------
#> Specifications:
#> N P Q R r
#> 1000 3 3 1 1
#>
#> Coefficients:
#> mu A B D Sigma1 Sigma2 Sigma3
#> 1 0.099621 2.163276 0.848105 0.274346 2.573112 0.573237 -0.215022
#> 2 0.045403 -2.959030 0.495220 0.825019 0.573237 5.746330 -1.546488
#> 3 0.113156 -0.208228 0.098220 -0.992187 -0.215022 -1.546488 3.704359
The matrix \(z\) and the constant \(\mu\) term are optional.
RRRR
The Majorisation-Minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. (2017). This method is robust in the sense that it assumes a heavy-tailed Cauchy distribution for the innovations. As before the matrix \(z\) and the constant term \(\mu\) are optional.
res_mm <- RRRR(y=data$y, x=data$x, z = data$z,
itr = 100,
earlystop = 1e-4)
res_mm
#> Robust Reduced-Rank Regression
#> ------
#> Majorisation-Minimisation
#> ------------
#> Specifications:
#> N P Q R r
#> 1000 3 3 1 1
#>
#> Coefficients:
#> mu A B D Sigma1 Sigma2 Sigma3
#> 1 0.1243295 1.1155298 1.7177603 0.3285943 0.6700157 0.0083308 0.0129760
#> 2 0.0084515 -1.4196819 0.9444727 0.9574126 0.0083308 0.7528924 0.0138218
#> 3 0.1051765 -0.1065252 0.1882104 -1.0488185 0.0129760 0.0138218 0.7323035
Additional arguments that are worth noticing are itr
, which control the maximum number of iteration, and earlystop
, which is the criteria to stop the algorithm early. The algorithm will stop if the improvement on objective value is small than earlystop
\(\times\ objective\_from\_last\_iteration\).
This method is an iterative optimization algorithm so we can use the inbuilt plot.RRRR
method to see the convergence plot of the algorithm.
Argument aes_x
can set the x axis to be the number of iteration or the run time. Argument xlog10
can indicate whether the scale of x axis is log 10 transformed.
ORRRR
The description of the generic Stochastic Successive Upper-bound Minimisation method and the Sample Average Approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. (2016).
There are two major estimation methods supported:
The algorithm is online in the sense that the data is continuously incorporated and the algorithm can update the parameters accordingly. As before the matrix \(z\) and the constant term \(\mu\) are optional.
At each iteration of SAA, a new realisation of the parameters is achieved by solving the minimisation problem of the sample average of the desired objective function using the data currently incorporated. This can be computationally expensive when the objective function is highly nonconvex. The SMM method overcomes this difficulty by replacing the objective function by a well-chosen majorising surrogate function which can be much easier to optimise.
By default the function ORRRR
uses SMM.
res_smm <- ORRRR(y=data$y, x=data$x, z=data$z,
initial_size = 100, addon = 10)
#> Loading required namespace: lazybar
res_smm
#> Online Robust Reduced-Rank Regression
#> ------
#> Stochastic Majorisation-Minimisation
#> ------------
#> Specifications:
#> N P R r initial_size addon
#> 1000 3 1 1 100 10
#>
#> Coefficients:
#> mu A B D Sigma1 Sigma2 Sigma3
#> 1 0.1266684 1.1252397 1.7025433 0.3306562 0.6862673 0.0081127 0.0138327
#> 2 0.0104494 -1.4338496 0.9356531 0.9576452 0.0081127 0.7733818 0.0156449
#> 3 0.1032553 -0.1090974 0.1870791 -1.0498457 0.0138327 0.0156449 0.7507572
The simulated data set is of size 1000. In the above command, in the first iteration 100 realisations are used in the estimation with 10 more data points in each of the following iteration. Because of the increasing data size, the estimation will be slower the longer the algorithm run. Therefore, the estimated time left in the progress bar is not very accurate in this sense.
The output from ORRRR
can also plotted using plot.RRRR
.
When using SAA, there are two sub solver supported in each iteration.
optim
function from the stats
package, andres_saa_optim <- ORRRR(y=data$y, x=data$x, z=data$z,
method = "SAA", SAAmethod = "optim")
res_saa_mm <- ORRRR(y=data$y, x=data$x, z=data$z,
method = "SAA", SAAmethod = "MM")
optim
is a general purpose solver which means it will be quite slow for this specific problem, especially when the number of parameters is large. Embedding majorisation-minimisation into subiteration of SAA is more like a heuristic without solid theory backing up its efficiency. Due to the time constraint we do not show the estimated result here.
update.RRRR
With the result from ORRRR
, user can still update it with newly achieved data using function update
. Note the result from RRRR
can also be updated where it simply takes the result from RRRR
as the starting point in online estimation.
newdata <- RRR_sim()
res2_smm <- update(res_smm, newy=newdata$y, newx=newdata$x, newz=newdata$z)
res2_smm
#> Online Robust Reduced-Rank Regression
#> ------
#> Stochastic Majorisation-Minimisation
#> ------------
#> Specifications:
#> N P R r initial_size addon
#> 2000 3 1 1 1010 10
#>
#> Coefficients:
#> mu A B D Sigma1 Sigma2 Sigma3
#> 1 0.115843 0.491977 2.127719 0.722976 2.544548 0.106334 0.156734
#> 2 -0.013222 -1.183080 0.751260 1.162244 0.106334 1.215185 0.284107
#> 3 0.100806 0.069405 0.231641 -0.059730 0.156734 0.284107 2.456228
Without other arguments specified, update
will just take the original specification in the model. If it applies to output of RRRR
, then the default would be the default arguments in ORRRR
, i.e., with method
set to “SMM” and addon
set to 10.
This package is free and open source software, licensed under GPL-3