This vignette introduces the DTSg package, shows how to create objects of its main class and explains their two interfaces: R6 and S3. Familiarity with the data.table package helps in understanding certain parts of the vignette, but is not essential to follow it.
First, let’s load some data. The package is shipped with a data.table containing a daily time series of river flow:
library(data.table)
library(DTSg)
data(flow)
flow
#>             date   flow
#>    1: 2007-01-01  9.540
#>    2: 2007-01-02  9.285
#>    3: 2007-01-03  8.940
#>    4: 2007-01-04  8.745
#>    5: 2007-01-05  8.490
#>   ---                  
#> 2165: 2012-12-27 26.685
#> 2166: 2012-12-28 28.050
#> 2167: 2012-12-29 23.580
#> 2168: 2012-12-30 18.840
#> 2169: 2012-12-31 17.250
summary(flow)
#>       date                          flow        
#>  Min.   :2007-01-01 00:00:00   Min.   :  4.995  
#>  1st Qu.:2008-07-19 00:00:00   1st Qu.:  8.085  
#>  Median :2010-01-12 00:00:00   Median : 11.325  
#>  Mean   :2010-01-08 23:32:46   Mean   : 16.197  
#>  3rd Qu.:2011-07-08 00:00:00   3rd Qu.: 18.375  
#>  Max.   :2012-12-31 00:00:00   Max.   :290.715
Now that we have a data set, we can create our first object by providing it to the new method of the package’s main class generator DTSg. In addition, we specify an ID in order to give the new object a name:
TS <- DTSg$new(values = flow, ID = "River Flow")
Creating an object with the package’s alternative interface abusing an S4 constructor looks like this:
TS <- new(Class = "DTSg", values = flow, ID = "River Flow")
Printing the object shows the data provided, the specified ID, some further metadata fields, which we left empty, and that the object represents a regular UTC time series with a periodicity of one day. It also shows that the first column has been renamed to .dateTime. This column serves as the time index and cannot be changed at will:
TS$print() # or print(TS) or just TS
#> Values:
#>        .dateTime   flow
#>           <POSc>  <num>
#>    1: 2007-01-01  9.540
#>    2: 2007-01-02  9.285
#>    3: 2007-01-03  8.940
#>    4: 2007-01-04  8.745
#>    5: 2007-01-05  8.490
#>   ---                  
#> 2188: 2012-12-27 26.685
#> 2189: 2012-12-28 28.050
#> 2190: 2012-12-29 23.580
#> 2191: 2012-12-30 18.840
#> 2192: 2012-12-31 17.250
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
With this done, we can move on and further explore our time series with a summary (summary), a report on missing values (nas) and a plot (plot). It suddenly seems to contain several missing values which apparently were not there upon loading the data set (plot requires the xts, dygraphs and RColorBrewer packages to be installed; HTML vignettes unfortunately cannot display interactive elements, hence I included a static image of the JavaScript chart instead):
TS$summary() # or summary(TS)
#>       flow        
#>  Min.   :  4.995  
#>  1st Qu.:  8.085  
#>  Median : 11.325  
#>  Mean   : 16.197  
#>  3rd Qu.: 18.375  
#>  Max.   :290.715  
#>  NA's   :23
TS$nas(cols = "flow") # or nas(TS, cols = "flow")
#>    .col .group      .from        .to .n
#> 1: flow      1 2007-10-12 2007-10-24 13
#> 2: flow      2 2007-10-26 2007-11-03  9
#> 3: flow      3 2007-11-10 2007-11-10  1
if (requireNamespace("xts", quietly = TRUE) &&
    requireNamespace("dygraphs", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  TS$plot(cols = "flow") # or plot(TS, cols = "flow")
}
Looking at the original data set reveals that the missing values were already there implicitly. Putting it into the object simply expanded the data set to the automatically detected periodicity and made them explicit:
flow[date >= as.POSIXct("2007-10-09", tz = "UTC") & date <= as.POSIXct("2007-11-13", tz = "UTC"), ]
#>           date   flow
#>  1: 2007-10-09  9.180
#>  2: 2007-10-10  9.075
#>  3: 2007-10-11  9.000
#>  4: 2007-10-25 18.150
#>  5: 2007-11-04 25.350
#>  6: 2007-11-05 23.550
#>  7: 2007-11-06 23.400
#>  8: 2007-11-07 26.400
#>  9: 2007-11-08 39.150
#> 10: 2007-11-09 37.200
#> 11: 2007-11-11 27.450
#> 12: 2007-11-12 39.750
#> 13: 2007-11-13 31.350
For fairly small gaps like this it might be okay to fill them by means of linear interpolation. Using the colapply method together with the interpolateLinear function will do the trick:
TS <- TS$colapply(fun = interpolateLinear)
# or colapply(TS, fun = interpolateLinear)
TS$nas()
#> Empty data.table (0 rows) of 5 cols: .col,.group,.from,.to,.n
In case no column name is provided through the cols argument, the first numeric column is taken by default. By the way, column names for the cols argument can be queried with the cols method, which supports a class argument:
TS$cols(class = "all") # or cols(TS, class = "all")
#> [1] "flow"
TS$cols(class = "numeric")
#> [1] "flow"
TS$cols(class = "character")
#> character(0)
The time series reaches from the beginning of the year 2007 to the end of the year 2012. Let’s say we are only interested in the first two years. With alter we can shorten, lengthen and/or change the periodicity of a DTSg object. The latter can be achieved through its by argument (no example given):
TS <- TS$alter(from = "2007-01-01", to = "2008-12-31")
# or alter(TS, from = "2007-01-01", to = "2008-12-31")
TS
#> Values:
#>       .dateTime   flow
#>          <POSc>  <num>
#>   1: 2007-01-01  9.540
#>   2: 2007-01-02  9.285
#>   3: 2007-01-03  8.940
#>   4: 2007-01-04  8.745
#>   5: 2007-01-05  8.490
#>  ---                  
#> 727: 2008-12-27 18.180
#> 728: 2008-12-28 16.575
#> 729: 2008-12-29 13.695
#> 730: 2008-12-30 12.540
#> 731: 2008-12-31 11.940
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
In order to get mean monthly river flows as an example, we can use the aggregate method with one of the package’s temporal aggregation level functions as its funby argument:
TSm <- TS$aggregate(funby = byYm____, fun = mean)
#  or aggregate(TS, funby = byYm____, fun = mean)
TSm
#> Values:
#>      .dateTime      flow
#>         <POSc>     <num>
#>  1: 2007-01-01 25.281290
#>  2: 2007-02-01 14.496964
#>  3: 2007-03-01 12.889839
#>  4: 2007-04-01 12.470500
#>  5: 2007-05-01  9.233226
#>  6: 2007-06-01  9.193500
#>  7: 2007-07-01 12.272419
#>  8: 2007-08-01 11.291129
#>  9: 2007-09-01  8.874500
#> 10: 2007-10-01 13.063065
#> 11: 2007-11-01 25.668500
#> 12: 2007-12-01 20.474032
#> 13: 2008-01-01 19.677097
#> 14: 2008-02-01 14.185345
#> 15: 2008-03-01 23.577581
#> 16: 2008-04-01 23.284000
#> 17: 2008-05-01 14.325968
#> 18: 2008-06-01  9.287000
#> 19: 2008-07-01 22.004032
#> 20: 2008-08-01 12.641129
#> 21: 2008-09-01 13.710500
#> 22: 2008-10-01 10.626774
#> 23: 2008-11-01  8.902000
#> 24: 2008-12-01 16.435645
#>      .dateTime      flow
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  TRUE
#> Regular:     FALSE
#> Periodicity: 1 months
#> Min lag:     Time difference of 28 days
#> Max lag:     Time difference of 31 days
#> Time zone:   UTC
Printing the object shows us that its aggregated field has been set to TRUE. This is merely an indicator telling us to now interpret the timestamps of the series as periods between subsequent timestamps and no longer as snapshots.
The package provides two families of temporal aggregation level functions. One family sets a timestamp to the lowest possible time of the corresponding temporal aggregation level, i.e., truncates the timestamp, while the other family extracts a certain part of it. An example of the latter is given for quarters below. By convention the year is set to 2199 in this case:
TSQ <- TS$aggregate(funby = by_Q____, fun = mean)
#  or aggregate(TS, funby = by_Q____, fun = mean)
TSQ
#> Values:
#>     .dateTime     flow
#>        <POSc>    <num>
#> 1: 2199-01-01 18.46127
#> 2: 2199-04-01 12.95266
#> 3: 2199-07-01 13.48924
#> 4: 2199-10-01 15.84620
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  TRUE
#> Regular:     FALSE
#> Periodicity: 3 months
#> Min lag:     Time difference of 90 days
#> Max lag:     Time difference of 92 days
#> Time zone:   UTC
Additional temporal aggregation level functions exist for years, days, hours, minutes and seconds.
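To make the difference between the two families concrete, here is a minimal base R sketch for a single timestamp. It only mimics the idea and is not how the package implements by_Q____ or its truncating counterpart; all variable names are made up for illustration.

```r
# A single example timestamp in UTC
ts <- as.POSIXct("2008-05-17 13:45:00", tz = "UTC")

# First month of the quarter the timestamp falls into (May -> April)
month        <- as.integer(format(ts, "%m"))
quarterMonth <- ((month - 1L) %/% 3L) * 3L + 1L

# Family 1: truncate to the start of the quarter, keeping the year
truncated <- as.POSIXct(
  sprintf("%s-%02d-01", format(ts, "%Y"), quarterMonth),
  tz = "UTC"
)

# Family 2: extract the quarter only; the year is replaced by 2199
extracted <- as.POSIXct(sprintf("2199-%02d-01", quarterMonth), tz = "UTC")

format(truncated, "%Y-%m-%d")
#> [1] "2008-04-01"
format(extracted, "%Y-%m-%d")
#> [1] "2199-04-01"
```

Grouping by the truncated timestamps yields one value per quarter and year, whereas grouping by the extracted ones pools the same quarter across all years, which is why the quarterly example above returns only four rows for two years of data.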
The last thing we want to achieve is the calculation of moving averages for a window of two time steps before and after each timestamp. We can do this with the help of the rollapply method:
TSs <- TS$rollapply(fun = mean, na.rm = TRUE, before = 2, after = 2)
#  or rollapply(TS, fun = mean, na.rm = TRUE, before = 2, after = 2)
TSs
#> Values:
#>       .dateTime    flow
#>          <POSc>   <num>
#>   1: 2007-01-01  9.2550
#>   2: 2007-01-02  9.1275
#>   3: 2007-01-03  9.0000
#>   4: 2007-01-04  8.7720
#>   5: 2007-01-05  8.5710
#>  ---                   
#> 727: 2008-12-27 19.3080
#> 728: 2008-12-28 16.4520
#> 729: 2008-12-29 14.5860
#> 730: 2008-12-30 13.6875
#> 731: 2008-12-31 12.7250
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
On a side note, some of the methods which take a function as an argument (colapply and rollapply) hand over to it an additional list argument called .helpers containing useful data for the development of user-defined functions (please see the respective help pages for further information). This can of course be a problem for functions like sum which do not expect such a thing. A solution is to wrap such a function in an anonymous function with a ... parameter like this: function(x, ...) sum(x).
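The following base R sketch illustrates why the wrapper helps. It does not call the package at all: the helpers list is a made-up stand-in for the .helpers argument, whose real contents are described in the help pages, and sumOnly is just an illustrative name.

```r
x <- c(9.540, 9.285, 8.940)

# Hypothetical stand-in for the list a method would pass as .helpers
helpers <- list(note = "illustrative content only")

# sum() tries to add up every argument it receives,
# so an extra list argument makes it fail
direct <- tryCatch(
  sum(x, .helpers = helpers),
  error = function(e) "error"
)

# The wrapper swallows any extra arguments in ... and passes only x to sum()
sumOnly <- function(x, ...) sum(x)
wrapped <- sumOnly(x, .helpers = helpers)

direct
#> [1] "error"
wrapped
#> [1] 27.765
```

The same pattern applies to any function that would otherwise choke on unexpected named arguments.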
With this said, let’s join the result of the last calculation to the original time series and extract the values as a data.table for further processing in a final step (please note that the .dateTime column got its original name back):
TS <- TS$merge(y = TSs, suffixes = c("_orig", "_movavg"))
# or merge(TS, y = TSs, suffixes = c("_orig", "_movavg"))
TS$values()
#>            date flow_orig flow_movavg
#>   1: 2007-01-01     9.540      9.2550
#>   2: 2007-01-02     9.285      9.1275
#>   3: 2007-01-03     8.940      9.0000
#>   4: 2007-01-04     8.745      8.7720
#>   5: 2007-01-05     8.490      8.5710
#>  ---                                 
#> 727: 2008-12-27    18.180     19.3080
#> 728: 2008-12-28    16.575     16.4520
#> 729: 2008-12-29    13.695     14.5860
#> 730: 2008-12-30    12.540     13.6875
#> 731: 2008-12-31    11.940     12.7250
The fields of a DTSg object, of which the metadata are a part, can be accessed through so-called active bindings:
TS$ID
#> [1] "River Flow"
Valid results are returned for the following fields:
A subset of these fields can also be actively set (please note that reference semantics always apply to fields, hence the “largely” in the title of the package):
# two new variables to demonstrate reference semantics
TSassigned <- TS
TScloned   <- TS$clone(deep = TRUE) # or clone(x = TS, deep = TRUE)
# set new ID
TS$ID <- "Two River Flows"
TS
#> Values:
#>       .dateTime flow_orig flow_movavg
#>          <POSc>     <num>       <num>
#>   1: 2007-01-01     9.540      9.2550
#>   2: 2007-01-02     9.285      9.1275
#>   3: 2007-01-03     8.940      9.0000
#>   4: 2007-01-04     8.745      8.7720
#>   5: 2007-01-05     8.490      8.5710
#>  ---                                 
#> 727: 2008-12-27    18.180     19.3080
#> 728: 2008-12-28    16.575     16.4520
#> 729: 2008-12-29    13.695     14.5860
#> 730: 2008-12-30    12.540     13.6875
#> 731: 2008-12-31    11.940     12.7250
#> 
#> ID:          Two River Flows
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
# due to reference semantics, the new ID is also propagated to TSassigned, but not to TScloned
# (as all data manipulating methods create a clone by default, it is usually best to set or
# update fields after and not before calling such a method)
TSassigned
#> Values:
#>       .dateTime flow_orig flow_movavg
#>          <POSc>     <num>       <num>
#>   1: 2007-01-01     9.540      9.2550
#>   2: 2007-01-02     9.285      9.1275
#>   3: 2007-01-03     8.940      9.0000
#>   4: 2007-01-04     8.745      8.7720
#>   5: 2007-01-05     8.490      8.5710
#>  ---                                 
#> 727: 2008-12-27    18.180     19.3080
#> 728: 2008-12-28    16.575     16.4520
#> 729: 2008-12-29    13.695     14.5860
#> 730: 2008-12-30    12.540     13.6875
#> 731: 2008-12-31    11.940     12.7250
#> 
#> ID:          Two River Flows
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
TScloned
#> Values:
#>       .dateTime flow_orig flow_movavg
#>          <POSc>     <num>       <num>
#>   1: 2007-01-01     9.540      9.2550
#>   2: 2007-01-02     9.285      9.1275
#>   3: 2007-01-03     8.940      9.0000
#>   4: 2007-01-04     8.745      8.7720
#>   5: 2007-01-05     8.490      8.5710
#>  ---                                 
#> 727: 2008-12-27    18.180     19.3080
#> 728: 2008-12-28    16.575     16.4520
#> 729: 2008-12-29    13.695     14.5860
#> 730: 2008-12-30    12.540     13.6875
#> 731: 2008-12-31    11.940     12.7250
#> 
#> ID:          River Flow
#> Parameter:   
#> Variant:     
#> Unit:        
#> Aggregated:  FALSE
#> Regular:     TRUE
#> Periodicity: Time difference of 1 days
#> Time zone:   UTC
These are:
For a full explanation of all the methods, functions, arguments and fields available in the package please consult the help pages.