To begin, we load foqat and look at three datasets that ship with it:
aqi is a time series of air quality data with 1-minute resolution.
voc is a time series of volatile organic compounds with 1-hour resolution.
met is a time series of meteorological conditions with 5-minute resolution.
library(foqat)
head(aqi)
#> Time NO NO2 CO SO2 O3
#> 1 2017-05-01 01:00:00 0.0376578 2.79326 0.256900 NA 56.5088
#> 2 2017-05-01 01:01:00 0.0341483 2.76094 0.254692 NA 57.0546
#> 3 2017-05-01 01:02:00 0.0310285 2.65239 0.265178 NA 57.6654
#> 4 2017-05-01 01:03:00 0.0357016 2.60257 0.269691 NA 58.7863
#> 5 2017-05-01 01:04:00 0.0337507 2.59527 0.273395 NA 59.0342
#> 6 2017-05-01 01:05:00 0.0238120 2.57260 0.276464 NA 59.2240
head(voc)
#> Time Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00 0.233 0.1750 0.544 0.020 0.1020
#> 2 2020-05-01 01:00:00 0.376 0.2025 0.704 0.028 0.1045
#> 3 2020-05-01 02:00:00 0.519 0.2300 0.864 0.036 0.1070
#> 4 2020-05-01 03:00:00 0.805 0.2850 1.184 0.052 0.1120
#> 5 2020-05-01 04:00:00 0.658 0.2920 1.304 0.075 0.1230
#> 6 2020-05-01 05:00:00 0.538 0.3700 0.904 0.049 0.1110
head(met)
#> Time TEM HUM WS WD
#> 1 2017-05-01 00:00:00 21.4 87.0 3.0 39
#> 2 2017-05-01 00:05:00 21.2 86.7 3.6 68
#> 3 2017-05-01 00:10:00 21.0 86.3 3.5 76
#> 4 2017-05-01 00:15:00 20.9 85.8 3.4 73
#> 5 2017-05-01 00:20:00 20.8 86.0 2.8 68
#> 6 2017-05-01 00:25:00 20.8 86.0 2.3 68
statdf() computes summary statistics for each column of a time series:
statdf(aqi)
#> mean sd min 25% 50% 75% max integrity
#> NO 0.33 0.61 -0.08 0.08 0.13 0.37 19.02 0.765
#> NO2 3.06 2.68 -0.15 1.07 2.21 4.13 20.53 0.786
#> CO 0.30 0.09 0.17 0.25 0.27 0.34 0.73 0.709
#> SO2 1.80 2.76 -0.15 0.25 0.97 2.11 34.08 0.734
#> O3 52.86 19.53 7.95 38.89 49.50 64.38 106.61 0.783
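As a rough illustration of what the columns in this table represent, the base R sketch below computes the same kinds of statistics for a small made-up vector. It assumes that integrity is the fraction of non-missing values, which is not stated above; summarise_column is a hypothetical helper, not a foqat function.

```r
# Hypothetical helper (not part of foqat): summary statistics for one column.
summarise_column = function(x) {
  q = quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    q25 = q[1], q50 = q[2], q75 = q[3],
    max = max(x, na.rm = TRUE),
    # Assumption: "integrity" is the share of non-missing values.
    integrity = mean(!is.na(x)))
}

# Made-up values for illustration only; one of five values is missing.
x = c(1, 2, NA, 4, 5)
stats = summarise_column(x)
print(round(stats, 3))
```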
We can resample a time series with trs().
Use bkip to set the new time resolution.
The time series can be clipped with st (start time) and et (end time).
By default, values are resampled with mean. Wind data can be handled by setting wind to TRUE and specifying coliws (the column index of the wind speed) and coliwd (the column index of the wind direction).
new_met = trs(met, bkip = "1 hour", st = "2017-05-01 01:00:00", wind = TRUE, coliws = 4, coliwd = 5)
#> Joining, by = "temp_datetime"
head(new_met)
#> Time TEM HUM WS WD
#> 1 2017-05-01 01:00:00 21.18333 83.15833 4.555427 72.52891
#> 2 2017-05-01 02:00:00 21.54167 77.62500 4.238292 72.02753
#> 3 2017-05-01 03:00:00 20.71667 80.22500 5.287611 82.34847
#> 4 2017-05-01 04:00:00 20.52500 79.80000 5.653918 89.15400
#> 5 2017-05-01 05:00:00 21.12500 61.41667 7.417430 98.62400
#> 6 2017-05-01 06:00:00 21.30000 51.44167 8.401939 89.26818
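Wind direction is a circular quantity, so it cannot simply be averaged like temperature. A common approach is to average the wind as a vector, and the base R sketch below illustrates that idea on made-up values. Whether trs() uses exactly this calculation internally is an assumption, and vector_mean_wind is a hypothetical helper, not a foqat function.

```r
# Hypothetical helper (not part of foqat): vector-average wind speed and direction.
# ws: wind speeds; wd: directions in degrees that the wind blows FROM.
vector_mean_wind = function(ws, wd) {
  rad = wd * pi / 180
  u = mean(-ws * sin(rad), na.rm = TRUE)  # mean eastward component
  v = mean(-ws * cos(rad), na.rm = TRUE)  # mean northward component
  speed = sqrt(u^2 + v^2)
  # Convert the mean vector back to a "from" direction in [0, 360).
  direction = (atan2(-u, -v) * 180 / pi) %% 360
  c(WS = speed, WD = direction)
}

# Two made-up observations on either side of north (350 and 30 degrees).
res = vector_mean_wind(ws = c(2, 2), wd = c(350, 30))
print(res)

# A plain arithmetic mean of the directions would point almost due south.
naive_wd = mean(c(350, 30))
print(naive_wd)
```

In this example the vector average points slightly east of north, between the two observations, whereas the arithmetic mean of 350 and 30 is 190.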
You can also change the resampling function from the default to sum, median, min, max, sd, or quantile. If you choose quantile, you also need to supply probs (e.g., 0.5).
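To make the idea of resampling with a different function concrete, here is a minimal base R sketch that bins a made-up 20-minute series into hours and applies quantile (with probs = 0.5) and max to each bin. It does not call foqat and is only meant to show the concept; the variable names are illustrative.

```r
# Made-up 20-minute series for illustration only.
time = as.POSIXct("2017-05-01 00:00:00", tz = "UTC") + (0:5) * 20 * 60
value = c(1, 2, 9, 4, 5, 6)

# Label each timestamp with the start of its hour.
hour_bin = format(time, "%Y-%m-%d %H:00:00")

# Apply the chosen function within each hourly bin.
hourly_median = tapply(value, hour_bin, quantile, probs = 0.5, names = FALSE)
hourly_max = tapply(value, hour_bin, max)

print(hourly_median)
print(hourly_max)
```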
svri() computes the variation of a time series over a repeating cycle (e.g., the maximum of all values grouped by hour of day).
The parameters bkip, st, et, and fun work the same way as in trs(). Wind data is handled just as in trs().
mode selects the mode of calculation, and value is the sub-parameter of mode. There are three modes, recipes, ncycle, and custom, which are introduced below:
mode = recipes
recipes stands for built-in solutions.
The mode recipes accepts three values: day, week, and month.
day groups the time series by hour of day, from 0 to 23.
week groups the time series by day of week, from 1 to 7.
month groups the time series by day of month, from 1 to 31.
Below is an example that calculates the median values of a time series grouped by hour of day (e.g., 0:00, 1:00 …).
new_voc = svri(voc, bkip="1 hour", mode="recipes", value="day", fun="median")
#> Joining, by = "temp_datetime"
head(new_voc)
#> hour of day Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
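The grouping behind value = "day" can be pictured with a few lines of base R. The sketch below uses made-up values and does not call foqat; it only shows how observations from different days that share the same hour of day end up in the same group.

```r
# Made-up two-day series at 12-hour resolution, for illustration only.
time = as.POSIXct("2020-05-01 00:00:00", tz = "UTC") + (0:3) * 12 * 3600
value = c(1, 10, 3, 20)

# Extract the hour of day (0 to 23) and take the median within each hour.
hour_of_day = as.POSIXlt(time)$hour
by_hour = tapply(value, hour_of_day, median)
print(by_hour)
```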
mode = ncycle
ncycle groups the time series by the position of each row within a cycle of fixed length; value sets the number of rows per cycle.
Below is an example that calculates the median values of a time series grouped by hour (e.g., 0:00, 1:00 …), using a cycle of 24 hourly rows.
new_voc = svri(voc, bkip="1 hour", st="2020-05-01 00:00:00", mode="ncycle", value=24, fun="median")
#> Joining, by = "temp_datetime"
head(new_voc)
#> cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
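Grouping by position within a cycle amounts to a modulo operation on the row number. The base R sketch below illustrates this with made-up values and a cycle length of 2; it is not foqat's implementation, and the zero-based numbering is chosen only to mirror the cycle column shown above.

```r
# Made-up values for illustration only.
value = c(1, 10, 3, 20, 5, 30)
n = 2  # cycle length, playing the role of value in mode = "ncycle"

# Zero-based position of each row within its cycle: 0, 1, 0, 1, ...
position = (seq_along(value) - 1) %% n

# Median of all rows that share the same position.
by_position = tapply(value, position, median)
print(by_position)
```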
mode = custom
custom groups the time series by a reference column in the time series. If you select mode = custom, value is the column index of the reference column.
Below is an example that calculates the median values of a time series grouped by hour (e.g., 0:00, 1:00 …).
#add a new column that holds the hour of day.
voc$hour=lubridate::hour(voc$Time)
#calculate according to the index of the reference column.
new_voc=svri(voc, bkip = "1 hour", mode="custom", value=7, fun="median")
head(new_voc[,-2])
#> custom cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
#remove the modified copy of voc
rm(voc)
avri() is a customized version of svri() that calculates the average variation (with standard deviation) of a time series.
The output is a data frame that contains both the average variations and the standard deviations. For example, for a time series of 3 species, the second to fourth columns hold the average variations and the fifth to seventh columns hold the standard deviations.
new_voc = avri(voc, bkip = "1 hour", st = "2020-05-01 01:00:00")
#> Joining, by = "temp_datetime"
head(new_voc)
#> hour of day Propylene_ave Acetylene_ave n.Butane_ave trans.2.Butene_ave
#> 1 0 0.735375 0.48525 1.1655 0.0695
#> 2 1 0.737650 0.39920 1.0683 0.0575
#> 3 2 0.831800 0.37320 1.1748 0.0534
#> 4 3 1.420300 0.38060 2.2370 0.0910
#> 5 4 1.051800 0.42100 1.8614 0.0664
#> 6 5 1.133200 0.59140 1.8872 0.0604
#> Cyclohexane_ave Propylene_sd Acetylene_sd n.Butane_sd trans.2.Butene_sd
#> 1 0.0910 0.5876756 0.2137766 1.1091851 0.04648835
#> 2 0.1034 0.5677007 0.2099156 0.9505957 0.03992493
#> 3 0.1098 0.6452141 0.1634861 0.8482224 0.02864961
#> 4 0.1210 1.5527906 0.1721926 2.1616755 0.06951619
#> 5 0.1392 0.8127953 0.2090957 1.4807749 0.03008820
#> 6 0.1652 0.8916562 0.2569549 1.6192165 0.02475480
#> Cyclohexane_sd
#> 1 0.02184414
#> 2 0.02442181
#> 3 0.02115892
#> 4 0.02736786
#> 5 0.06293012
#> 6 0.10819057
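The layout of this output, with averages first and standard deviations after, can be mimicked in base R. The sketch below uses a made-up species column A and a precomputed hour column; it does not call foqat, and the _ave and _sd suffixes are copied from the column names shown above.

```r
# Made-up data: one species (A) observed at hours 0 and 1 on two days.
df = data.frame(hour = c(0, 1, 0, 1), A = c(1, 2, 3, 6))

# Mean and standard deviation of A within each hour of day.
ave = aggregate(A ~ hour, data = df, FUN = mean)
sdv = aggregate(A ~ hour, data = df, FUN = sd)

# Combine into one data frame: averages first, then standard deviations.
out = data.frame(hour = ave$hour, A_ave = ave$A, A_sd = sdv$A)
print(out)
```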
prop() converts a time series into a time series of proportions (e.g., it converts a time series of species concentrations into a time series of species contributions).
prop_voc = prop(voc)
head(prop_voc)
#> Time Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00 0.2169460 0.1629423 0.5065177 0.01862197 0.09497207
#> 2 2020-05-01 01:00:00 0.2657244 0.1431095 0.4975265 0.01978799 0.07385159
#> 3 2020-05-01 02:00:00 0.2955581 0.1309795 0.4920273 0.02050114 0.06093394
#> 4 2020-05-01 03:00:00 0.3301887 0.1168991 0.4856440 0.02132896 0.04593929
#> 5 2020-05-01 04:00:00 0.2683524 0.1190865 0.5318108 0.03058728 0.05016313
#> 6 2020-05-01 05:00:00 0.2728195 0.1876268 0.4584178 0.02484787 0.05628803
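The proportions above are consistent with dividing each value by the sum of its row. The base R sketch below applies that calculation to the first two rows of voc as printed earlier; it is an illustration of the arithmetic rather than foqat's own code, and voc_head is just a hand-typed copy of those two rows.

```r
# First two rows of voc (species columns only), copied from the head(voc) output.
voc_head = data.frame(
  Propylene = c(0.233, 0.376),
  Acetylene = c(0.1750, 0.2025),
  n.Butane = c(0.544, 0.704),
  trans.2.Butene = c(0.020, 0.028),
  Cyclohexane = c(0.1020, 0.1045)
)

# Divide every value by the total of its row, so each row sums to 1.
prop_head = voc_head / rowSums(voc_head)
print(round(prop_head, 7))
```

For the first row the species sum to 1.074, so Propylene contributes 0.233 / 1.074, which agrees with the first row of head(prop_voc).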
anylm() runs linear regressions on a time series in batch.
xd gives the indices of the columns to use on the x axis (independent variables).
yd gives the indices of the columns to use on the y axis (dependent variables).
zd gives the indices of the columns to use as color scales. td gives the indices of the columns to use as the basis for grouping.
A simple example is shown below.
It uses the built-in dataset aqi. Grouped by day, it explores the correlation of O3 with NO and NO2 for each day, and examines the effect of CO on these correlations by using CO as the fill color.
df = data.frame(aqi, day = lubridate::day(aqi$Time))
lr_result = anylm(df, xd=c(2,3), yd=6, zd=4, td=7, dign=3)
View(lr_result)
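To show what a batch of grouped regressions involves, the base R sketch below fits one linear model per group on made-up data and collects the slope, intercept, and R squared of each fit. It does not reproduce anylm() or its output format; the column names and the layout of the result are assumptions made for illustration.

```r
# Made-up data: two days, three observations each.
df_demo = data.frame(
  day = c(1, 1, 1, 2, 2, 2),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2.1, 3.9, 6.0, 5.1, 3.9, 3.0)
)

# Fit y ~ x separately for each day and keep a few summary numbers.
fits = lapply(split(df_demo, df_demo$day), function(d) {
  fit = lm(y ~ x, data = d)
  c(slope = unname(coef(fit)["x"]),
    intercept = unname(coef(fit)["(Intercept)"]),
    r_squared = summary(fit)$r.squared)
})

# One row per day, one column per statistic.
result = do.call(rbind, fits)
print(round(result, 3))
```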