Curate File Collections

Ideally, a file collection is generated by passing the proper file specifications to a single call to collate(). In practice, file collection objects sometimes need post-processing, so pkglite supports a small set of operations on them.

1 Merge file collections

You can merge file collections after they are collated. The operation returns the union of the files, so a file matched by more than one collection is listed only once.

library("pkglite")
pkg <- system.file("examples/pkg1/", package = "pkglite")

fc <- merge(
  pkg %>% collate(file_root_core()),
  pkg %>% collate(file_r()),
  pkg %>% collate(file_r(), file_man())
)

fc
-- File collection -------------------------------------------------------------
-- Package: pkg1 ---------------------------------------------------------------
               path_rel format
1           DESCRIPTION   text
2             NAMESPACE   text
3               NEWS.md   text
4             README.md   text
5              R/data.R   text
6             R/hello.R   text
7      R/pkg1-package.R   text
8         R/sysdata.rda binary
9        man/dataset.Rd   text
10   man/hello_world.Rd   text
11  man/pkg1-package.Rd   text
12 man/figures/logo.png binary

By design, one file collection object only stores metadata of files from a single package. Therefore, merging file collections from different packages will result in an error.
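
As a rough illustration of this constraint, the sketch below tries to merge collections from two different packages and captures the resulting error. It assumes that pkglite also ships a second example package under examples/pkg2, which is not shown in this article. The error message is not reproduced here because its wording may differ between versions.

```r
library("pkglite")

# Assumption: pkglite bundles a second example package, pkg2,
# next to pkg1. Adjust the path if your installation differs.
pkg1 <- system.file("examples/pkg1/", package = "pkglite")
pkg2 <- system.file("examples/pkg2/", package = "pkglite")

fc1 <- pkg1 %>% collate(file_r())
fc2 <- pkg2 %>% collate(file_r())

# Merging across packages is expected to fail, so capture the
# condition instead of letting it stop the script.
result <- tryCatch(
  merge(fc1, fc2),
  error = function(e) e
)

# TRUE if merge() signalled an error
inherits(result, "error")
```

If you need files from several packages, keep one file collection per package rather than trying to combine them.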

2 Prune file collections

To remove files from a file collection, use prune():

fc %>% prune(path = c("NEWS.md", "man/figures/logo.png"))
-- File collection -------------------------------------------------------------
-- Package: pkg1 ---------------------------------------------------------------
              path_rel format
1          DESCRIPTION   text
2            NAMESPACE   text
3            README.md   text
4             R/data.R   text
5            R/hello.R   text
6     R/pkg1-package.R   text
7        R/sysdata.rda binary
8       man/dataset.Rd   text
9   man/hello_world.Rd   text
10 man/pkg1-package.Rd   text

Only the files matching the exact relative path(s) will be removed.
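
To make the exact-match rule concrete, here is a minimal base R sketch of the same idea applied to a plain character vector. It does not reproduce how prune() is implemented; drop_exact() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: drop only the entries that are identical
# to one of the values in `path`.
drop_exact <- function(paths, path) {
  paths[!(paths %in% path)]
}

paths <- c("NEWS.md", "README.md", "man/figures/logo.png")

# A full relative path removes exactly that file.
drop_exact(paths, "NEWS.md")

# A partial file name or a directory prefix matches nothing,
# so every path is kept.
drop_exact(paths, c("NEWS", "man/figures"))
```

By the rule stated above, a call such as prune(path = "NEWS") should likewise leave NEWS.md in the file collection.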

The prune operation is type-stable: if every file in a file collection is removed, an empty file collection is returned, so it can still be merged with other file collections.

pkg %>%
  collate(file_data()) %>%
  prune(path = "data/dataset.rda")
-- File collection -------------------------------------------------------------
-- Package: pkg1 ---------------------------------------------------------------
[1] path_rel format  
<0 rows> (or 0-length row.names)
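
The following sketch shows one way this type stability can be used: the emptied collection is merged with a non-empty one from the same package. The object names (fc_empty, fc_r, fc_merged) are chosen for this example only.

```r
library("pkglite")
pkg <- system.file("examples/pkg1/", package = "pkglite")

# Pruning the only data file leaves an empty file collection.
fc_empty <- pkg %>%
  collate(file_data()) %>%
  prune(path = "data/dataset.rda")

fc_r <- pkg %>% collate(file_r())

# The empty result has the same class as any other file collection,
# which is what allows merge() to accept it.
identical(class(fc_empty), class(fc_r))

fc_merged <- merge(fc_empty, fc_r)
fc_merged
```

This matters in scripts where the set of pruned paths is computed at run time, because the downstream merge() call does not need a special case for an empty result.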

3 Sanitize file collections

A file collection might contain files that should almost always be excluded, such as those matching the patterns returned by pattern_file_sanitize():

pattern_file_sanitize()
[1] "/\\.DS_Store$"     "/Thumbs\\.db$"     "/\\.git$"         
[4] "/\\.svn$"          "/\\.hg$"           "/\\.Rproj\\.user$"
[7] "/\\.Rhistory$"     "/\\.RData$"        "/\\.Ruserdata$"   

You can use sanitize() to remove such files (if any) from a file collection:

fc %>% sanitize()
-- File collection -------------------------------------------------------------
-- Package: pkg1 ---------------------------------------------------------------
               path_rel format
1           DESCRIPTION   text
2             NAMESPACE   text
3               NEWS.md   text
4             README.md   text
5              R/data.R   text
6             R/hello.R   text
7      R/pkg1-package.R   text
8         R/sysdata.rda binary
9        man/dataset.Rd   text
10   man/hello_world.Rd   text
11  man/pkg1-package.Rd   text
12 man/figures/logo.png binary
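
The output above is identical to fc, which suggests that the example package contains none of these files. To see what the patterns would catch, the base R sketch below applies three of them to a few hypothetical paths. It only demonstrates what the regular expressions match and makes no claim about how sanitize() applies them internally.

```r
# Patterns copied from the pattern_file_sanitize() output shown above.
patterns <- c("/\\.DS_Store$", "/Thumbs\\.db$", "/\\.Rhistory$")

# Hypothetical paths, written with a leading slash because
# every pattern starts with "/".
paths <- c(
  "/pkg1/DESCRIPTION",
  "/pkg1/.DS_Store",
  "/pkg1/R/hello.R",
  "/pkg1/R/.Rhistory"
)

# Flag a path if it matches at least one pattern.
is_junk <- grepl(paste(patterns, collapse = "|"), paths)

# Keep only the paths that match none of the patterns.
paths[!is_junk]
```

Because each pattern ends with "$", it matches only when the file name sits at the end of the path, so a file such as R/hello.R is never affected.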