tibbletime has been officially retired. We will continue to maintain it, but will not be adding new functionality. Options to get new functionality:
tibble structure
tsibble structure
Built on top of the tidyverse, tibbletime is an extension that allows for the creation of time-aware tibbles by setting a time index.
Some immediate advantages of this include:
Performing compact time-based subsetting on tibbles.
Partitioning an index column by time (like yearly, monthly, every 2 weeks, etc.) so that you can use dplyr’s grouped functionality to summarise and aggregate by time period.
Changing the periodicity of a time-based tibble. This makes it easy to change from a daily dataset to a monthly or yearly dataset.
Easily working with the pipe and packages like dplyr and tidyr to make for a seamless experience with time series and the tidyverse. Each function has also been designed to work with dplyr::group_by(), allowing for powerful data manipulation.
Modifying functions for rolling analysis.
Creating tbl_time time series objects quickly.
Using fully supported Date and POSIXct index columns, along with experimental support for yearmon, yearqtr and hms, which should become more stable as some issues in dplyr are worked out.
If you have been using 0.0.2, the update to 0.1.0 has introduced major breaking changes. This was necessary for the long-term stability of the package, and no attempt was made to support backwards compatibility at this early stage of development. We apologize for any issues this causes. See NEWS for complete details.
The first thing to do is to turn your tibble into a tbl_time object. Notice the specification of the index as the date column of FB.
library(tibbletime)
library(dplyr)
# Facebook stock prices. Comes with the package
data(FB)
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
FB
#> # A time tibble: 1,008 x 8
#> # Index: date
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
#> 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
#> 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
#> 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
#> 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
#> 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
#> 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
#> 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
#> 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
#> 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
#> # … with 998 more rows
There are a number of functions that were designed specifically for tbl_time objects. Some of them are:
filter_time() - Succinctly filter a tbl_time object by date.
as_period() - Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily reduce data to a less granular level.
collapse_by() - Take a tbl_time object and collapse the index so that all observations in an interval share the same date. The most common use of this is to then group on this column with dplyr::group_by() and perform time-based calculations with summarise(), mutate() or any other dplyr function.
collapse_index() - A lower-level version of collapse_by() that directly modifies the index column rather than the entire tbl_time object. It gives the user more flexibility when collapsing, such as the ability to assign the resulting collapsed index to a new column.
rollify() - Modify a function so that it calculates a value (or a set of values) over a rolling window of observations. This can be used for rolling averages and other rolling calculations inside the tidyverse framework.
create_series() - Use shorthand notation to quickly initialize a tbl_time object containing a regularly spaced index column of class Date, POSIXct, yearmon, yearqtr or hms.
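To make the collapsing rule concrete, here is a minimal base-R sketch (not tibbletime's implementation) of the side = "end" behaviour visible in the collapse_by("monthly") output, where March 2013 collapses to 2013-03-28, the last trading day in the data: each date is replaced by the last date observed in its month. The helper collapse_monthly_end() and the sample dates are hypothetical. Inside tibbletime, the analogous step would be something like mutate(FB, month_end = collapse_index(date, "monthly")).

```r
# Base-R sketch, not tibbletime's implementation: collapse a Date index to
# monthly periods, side = "end" style. Each date becomes the last date that
# actually appears in the data for its calendar month.
collapse_monthly_end <- function(dates) {      # hypothetical helper
  month_key <- format(dates, "%Y-%m")          # e.g. "2013-03"
  last_num  <- ave(as.numeric(dates), month_key, FUN = max)
  as.Date(last_num, origin = "1970-01-01")
}

# Hypothetical trading days: 2013-03-29 is absent, so March collapses to the 28th
dates <- as.Date(c("2013-03-27", "2013-03-28", "2013-04-01", "2013-04-30"))
month_end <- collapse_monthly_end(dates)
month_end
#> [1] "2013-03-28" "2013-03-28" "2013-04-30" "2013-04-30"
```

Keeping the result in a separate vector (rather than overwriting the index) mirrors the flexibility described for collapse_index().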
To look at just a few:
# Filter for dates from March 2013 to December 2015
FB %>%
filter_time('2013-03' ~ '2015')
#> # A time tibble: 716 x 8
#> # Index: date
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
#> 2 FB 2013-03-04 27.8 28.1 27.4 27.7 32400700 27.7
#> 3 FB 2013-03-05 27.9 28.2 27.2 27.5 40622200 27.5
#> 4 FB 2013-03-06 28.1 28.1 27.4 27.5 33532600 27.5
#> 5 FB 2013-03-07 27.6 28.7 27.5 28.6 74540200 28.6
#> 6 FB 2013-03-08 28.4 28.5 27.7 28.0 44198900 28.0
#> 7 FB 2013-03-11 28.0 28.6 27.8 28.1 35642100 28.1
#> 8 FB 2013-03-12 28.1 28.3 27.6 27.8 27569600 27.8
#> 9 FB 2013-03-13 27.6 27.6 26.9 27.1 39619500 27.1
#> 10 FB 2013-03-14 27.1 27.4 26.8 27.0 27646400 27.0
#> # … with 706 more rows
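The '2013-03' ~ '2015' formula is shorthand: the left side is read as the start of March 2013 and the right side as the end of 2015, matching the comment in the example. A base-R sketch of the same selection on a hypothetical daily Date vector, assuming that expansion:

```r
# Base-R sketch of the selection made by filter_time('2013-03' ~ '2015'),
# assuming the left side expands to the first day of March 2013 and the
# right side to the last day of 2015.
dates <- seq(as.Date("2013-01-01"), as.Date("2016-12-31"), by = "day")  # hypothetical index
from  <- as.Date("2013-03-01")   # '2013-03' -> start of that month
to    <- as.Date("2015-12-31")   # '2015'    -> end of that year
kept  <- dates[dates >= from & dates <= to]
range(kept)
#> [1] "2013-03-01" "2015-12-31"
```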
# Change from daily to monthly periodicity
# This just reduces the tibble to the last row in each month
FB %>%
as_period("monthly", side = "end")
#> # A time tibble: 48 x 8
#> # Index: date
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-31 29.2 31.5 28.7 31.0 190744900 31.0
#> 2 FB 2013-02-28 26.8 27.3 26.3 27.2 83027800 27.2
#> 3 FB 2013-03-28 26.1 26.2 25.5 25.6 28585700 25.6
#> 4 FB 2013-04-30 27.1 27.8 27.0 27.8 36245700 27.8
#> 5 FB 2013-05-31 24.6 25.0 24.3 24.4 35925000 24.4
#> 6 FB 2013-06-28 24.7 25.0 24.4 24.9 96778900 24.9
#> 7 FB 2013-07-31 38.0 38.3 36.3 36.8 154828700 36.8
#> 8 FB 2013-08-30 42.0 42.3 41.1 41.3 67735100 41.3
#> 9 FB 2013-09-30 50.1 51.6 49.8 50.2 100095000 50.2
#> 10 FB 2013-10-31 47.2 52 46.5 50.2 248809000 50.2
#> # … with 38 more rows
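As the comment in the example notes, as_period("monthly", side = "end") keeps rows rather than averaging them. A base-R sketch of that selection rule on a small hypothetical data frame, assuming the rows are already sorted by date:

```r
# Base-R sketch of as_period("monthly", side = "end"): keep only the last row
# of each calendar month. Assumes rows are already sorted by date.
df <- data.frame(
  date  = as.Date(c("2013-01-30", "2013-01-31", "2013-02-27", "2013-02-28")),
  close = c(10, 11, 12, 13)                 # hypothetical prices
)
month_key <- format(df$date, "%Y-%m")
monthly   <- df[!duplicated(month_key, fromLast = TRUE), ]
monthly
```

Dropping fromLast = TRUE keeps the first row of each month instead, which corresponds to side = "start".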
# Maybe you don't want to lose the rest of the month's information,
# and instead you'd like to take the average of every column for each month
FB %>%
select(-symbol) %>%
collapse_by("monthly") %>%
group_by(date) %>%
summarise_all(mean)
#> # A time tibble: 48 x 7
#> # Index: date
#> date open high low close volume adjusted
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-01-31 30.2 30.8 29.8 30.3 79802462. 30.3
#> 2 2013-02-28 28.3 28.6 27.7 28.1 50402095. 28.1
#> 3 2013-03-28 26.9 27.2 26.5 26.8 36359025 26.8
#> 4 2013-04-30 26.6 27.0 26.2 26.6 33568600 26.6
#> 5 2013-05-31 26.4 26.6 25.9 26.1 44640673. 26.1
#> 6 2013-06-28 24.0 24.3 23.7 23.9 39416575 23.9
#> 7 2013-07-31 27.7 28.2 27.4 27.9 65364414. 27.9
#> 8 2013-08-30 38.7 39.3 38.2 38.7 61136095. 38.7
#> 9 2013-09-30 45.5 46.3 44.9 45.8 79154190 45.8
#> 10 2013-10-31 50.7 51.5 49.7 50.5 88375435. 50.5
#> # … with 38 more rows
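The key idea is collapse first, then group: once every date in a month shares the same value, an ordinary grouped summary does the rest. A minimal base-R sketch of those two steps on hypothetical data, using ave() for the collapse and aggregate() for the grouped mean (a stand-in for the dplyr pipeline, not what tibbletime runs internally):

```r
# Base-R sketch of collapse_by("monthly") followed by group_by(date) and a mean.
df <- data.frame(
  date  = as.Date(c("2013-01-30", "2013-01-31", "2013-02-27", "2013-02-28")),
  close = c(10, 12, 20, 30)                 # hypothetical prices
)

# Step 1 (collapse): give every row the last date observed in its month
month_key <- format(df$date, "%Y-%m")
df$date <- as.Date(ave(as.numeric(df$date), month_key, FUN = max),
                   origin = "1970-01-01")

# Step 2 (group and summarise): average close within each collapsed date
monthly_mean <- aggregate(close ~ date, data = df, FUN = mean)
monthly_mean
```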
# Perform a 5 period rolling average
mean_5 <- rollify(mean, window = 5)
mutate(FB, roll_mean = mean_5(adjusted))
#> # A time tibble: 1,008 x 9
#> # Index: date
#> symbol date open high low close volume adjusted roll_mean
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28 NA
#> 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8 NA
#> 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8 NA
#> 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4 NA
#> 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1 28.6
#> 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6 29.1
#> 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3 29.8
#> 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7 30.4
#> 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0 30.7
#> 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1 30.9
#> # … with 998 more rows
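rollify() wraps a function so that it is applied to a trailing window of observations; the first window - 1 results are NA, as in the roll_mean column above. A base-R sketch of that function-modifier idea; rolling() is a hypothetical helper, not part of tibbletime, and it only handles functions that return a single number per window:

```r
# Base-R sketch of the function-modifier idea behind rollify(): turn a summary
# function into one that is applied to a trailing window of observations.
# rolling() is a hypothetical helper; it only supports functions that return
# a single number per window.
rolling <- function(.f, window) {
  function(x) {
    vapply(seq_along(x), function(i) {
      if (i < window) NA_real_                  # window not yet full
      else .f(x[(i - window + 1):i])            # apply .f to the last `window` values
    }, numeric(1))
  }
}

mean_5 <- rolling(mean, window = 5)
mean_5(c(1, 2, 3, 4, 5, 6, 7))
#> [1] NA NA NA NA  3  4  5
```

The text above notes that rollify() can also produce a set of values per window; this sketch does not cover that case.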
# Create a time series
# Every other day in 2013
create_series(~'2013', '2 day')
#> # A time tibble: 183 x 1
#> # Index: date
#> date
#> <dttm>
#> 1 2013-01-01 00:00:00
#> 2 2013-01-03 00:00:00
#> 3 2013-01-05 00:00:00
#> 4 2013-01-07 00:00:00
#> 5 2013-01-09 00:00:00
#> 6 2013-01-11 00:00:00
#> 7 2013-01-13 00:00:00
#> 8 2013-01-15 00:00:00
#> 9 2013-01-17 00:00:00
#> 10 2013-01-19 00:00:00
#> # … with 173 more rows
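The printed tibble shows a POSIXct (<dttm>) index with 183 rows. Assuming a UTC time zone, base R's seq() produces the same every-other-day grid for 2013; this is a sketch of the result, not of how create_series() parses its shorthand:

```r
# Base-R sketch of create_series(~'2013', '2 day'): a POSIXct index every
# other day across 2013. The UTC time zone is an assumption of this sketch.
start <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2013-12-31 00:00:00", tz = "UTC")
every_other_day <- seq(start, end, by = "2 days")
length(every_other_day)
#> [1] 183
```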
Groups created through dplyr::group_by() are supported throughout the package. Because collapse_index() just adds a column that you can group on, all dplyr functions are supported.
# Facebook, Amazon, Netflix and Google stocks
data(FANG)
# Summarise by period and by group
FANG %>%
as_tbl_time(date) %>%
group_by(symbol) %>%
# Collapse to yearly
collapse_by("year") %>%
# Additionally group by date (yearly)
group_by(symbol, date) %>%
# Perform a yearly summary for each symbol
summarise(
adj_min = min(adjusted),
adj_max = max(adjusted),
adj_range = adj_max - adj_min
)
#> # A time tibble: 16 x 5
#> # Index: date
#> # Groups: symbol [4]
#> symbol date adj_min adj_max adj_range
#> <chr> <date> <dbl> <dbl> <dbl>
#> 1 AMZN 2013-12-31 248. 404. 156.
#> 2 AMZN 2014-12-31 287. 407. 120.
#> 3 AMZN 2015-12-31 287. 694. 407.
#> 4 AMZN 2016-12-30 482. 844. 362.
#> 5 FB 2013-12-31 22.9 58.0 35.1
#> 6 FB 2014-12-31 53.5 81.4 27.9
#> 7 FB 2015-12-31 74.1 109. 35.0
#> 8 FB 2016-12-30 94.2 133. 39.1
#> 9 GOOG 2013-12-31 351. 560. 209.
#> 10 GOOG 2014-12-31 495. 609. 114.
#> 11 GOOG 2015-12-31 493. 777. 284.
#> 12 GOOG 2016-12-30 668. 813. 145.
#> 13 NFLX 2013-12-31 13.1 54.4 41.2
#> 14 NFLX 2014-12-31 44.9 69.2 24.3
#> 15 NFLX 2015-12-31 45.5 131. 85.4
#> 16 NFLX 2016-12-30 82.8 128. 45.6
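The grouped example computes, for each symbol and year, the minimum, maximum and range of the adjusted price, dated at the last observation in that year. A base-R sketch of the same computation on hypothetical tickers and prices, using split() over symbol and year in place of group_by():

```r
# Base-R sketch of the grouped yearly summary: for each symbol and year, the
# min, max and range of `adjusted`, dated at the last observation in that year.
df <- data.frame(
  symbol   = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),   # hypothetical tickers
  date     = as.Date(c("2013-05-01", "2013-11-01", "2014-03-03",
                       "2013-02-01", "2013-09-02", "2014-06-02")),
  adjusted = c(10, 15, 20, 50, 40, 45),                      # hypothetical prices
  stringsAsFactors = FALSE
)
year <- format(df$date, "%Y")

# One data frame per (symbol, year) group, summarised and stacked back together
pieces <- lapply(split(df, list(df$symbol, year), drop = TRUE), function(g) {
  data.frame(
    symbol    = g$symbol[1],
    date      = max(g$date),                 # collapsed "end of year" date
    adj_min   = min(g$adjusted),
    adj_max   = max(g$adjusted),
    adj_range = max(g$adjusted) - min(g$adjusted),
    stringsAsFactors = FALSE
  )
})
yearly <- do.call(rbind, pieces)
yearly <- yearly[order(yearly$symbol, yearly$date), ]
rownames(yearly) <- NULL
yearly
```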
tibbletime assumes that your dates are in ascending order. A warning will be generated if they are not when you use a function where order is relevant. This is done for speed, and to avoid changing the user’s dataset by sorting it for them.
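If your data might be out of order, sorting by the index yourself before calling order-dependent functions avoids the warning. In dplyr this is arrange(df, date); a base-R sketch of the check and the fix, on hypothetical data:

```r
# Base-R sketch: detect an unsorted Date index and sort rows ascending before
# using order-dependent tibbletime functions. Data are hypothetical.
df <- data.frame(
  date  = as.Date(c("2013-01-03", "2013-01-01", "2013-01-02")),
  close = c(3, 1, 2)
)
is.unsorted(df$date)
#> [1] TRUE
df <- df[order(df$date), ]    # same idea as dplyr::arrange(df, date)
is.unsorted(df$date)
#> [1] FALSE
```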