The R community tinyverse
movement is a tryout of creating a new package development standards.
The clue is the TINY part here. The last years od R package development
were full of high number of dependencies temptations.
tinyverse means as least dependencies as possible as R
package dependencies matter. Every dependency you add to your project is
an invitation to break your project.
More information is available on tinyverse
website.
tinyverse badgesThe tidyverse team want to help and motivate R
developers. Thus they created a rest-api which generates a dependencies
badge for each CRAN package. The badge contains 2 numbers the first
number is a direct dependencies and the second one recursive ones. The R
base packages are not counted. The tinyverse badge could
have 4 colors bright green or green or orange or red, to get a green
badge package have to have less than 5 packages (<5) in the
Depends/Imports/LinkingTo fields
(check Dependencies subsection for more description). To have a bright
green a zero dependencies are needed. The orange badge is from 5 to 9
dependencies (>=5 and <=9). And the last oen red when there are
more than 9 dependencies (>= 10). Of course the base packages are not
counted as a dependency, pacs::pacs_base().
Summing up, each badge constraint:
https://tinyverse.netlify.app/
https://tinyverse.netlify.com/badge/<package>
Examples:
The tidyverse is an opinionated collection of R packages
designed for data science. All packages share an underlying design
philosophy, grammar, and data structures. On the other hand the
tinyvere is only a R community movement which try to make a
new programming standard. There is no tinyverse packages
collection, any package which have less than 5 direct dependencies (in
the Depends/Imports/LinkingTo
fields) are treated as a decent one. The best is to have zero
dependencies. Even tidyverse looks to go in the direction
of tinyverse if we check their lower level packages like
purrr, forcats, renv or
rlang.
the core tidyverse packages:
ggplot2:
tibble:
tidyr:
readr:
purrr:
dplyr:
stringr:
forcats:
examples of random tinyverse packages, bright green or
green badges:
TL;DR
install.packages requires
Depends/Imports/LinkingTo
DESCRIPTION fields dependencies, recursively.
pacs::pac_deps_user could be used to get them.R CMD check requires
Depends/Imports/LinkingTo/Suggests
DESCRIPTION fields dependencies, and for them
Depends/Imports/LinkingTo
fields recursively. pacs::pac_deps_dev could be used to get
them.Now you might think of what preciously these R package dependencies mean. The R DESCRIPTION file is the place where we could explore the number and nature of dependencies, the 5 fields are representing different types of dependencies: Depends/Imports/LinkingTo/Suggests/Enhances.
DESCRIPTION file dependencies:
Package: NAME
...
Depends:
R (>= 3.6)
Imports:
dplyr
data.table
LinkingTo:
Rcpp
Suggests:
testthat
ca2cat
Enhances:
Hmisc
...
We could get any installed packages description file with the
packageDescription function. More than that the
pacs::pac_description could get any even not installed
package description file and for any version you want.
packageDescription("pacs")
When we run install.packages (and other install
functions like remotes::install_github) only 3 fields are
installed
Depends/Imports/LinkingTo.
We could easily confirm that by checking its help page and the
dependencies argument definition:
?install.packages
...
dependencies:
...
The default, ‘NA’,
means ‘c("Depends", "Imports", "LinkingTo")’.
Depends are packages library
(attached), before the main package is library (attached).
So when we library() the main package
Depends dependencies functions are available to the end
user in the R console. This could be more convenient for the end user if
the main package is for example offering additional functionality over
the dependency one.
The Imports field lists packages whose namespaces
are imported from (as specified in the NAMESPACE file or when sb is
using ::/::: inside the package) but which do
not need to be attached (library). So when we
library() the main package such dependencies functions are
not available to the user in the R console. Namespaces accessed by the
:: and ::: operators
(e.g. ggplot2::ggplot) must be listed in the
Imports field, or in Suggests (when
used only for tests or examples).
A package that wishes to make use of header files in other packages to compile its C/C++ code needs to declare them as a comma-separated list in the field LinkingTo. Specifying a package in LinkingTo suffices if these are C/C++ headers containing source code or static linking is done at installation: the packages do not need to be (and usually should not be) listed in the Depends or Imports fields.
So what about the rest. Suggests are installed when
we run R CMD CHECK (or higher level like
devtools::check()), they are used for tests (e.g. testthat)
or for examples (roxygen2 @examples). Enhances is used
rarely as these are packages which could extent the usage and are NOT
needed for running examples and tests. If your tests/examples use e.g. a
dataset from another package it should be in Suggests
and not Enhances.
So now we see that a Imports dependency is not equal
to a Suggests dependency. From the perspective of the
end user we focus on
Depends/Imports/LinkingTo
dependencies which they will downlaod with
install.packages.
It’s common for packages to be listed in Imports in DESCRIPTION, but
not in NAMESPACE. The DESCRIPTION file Imports field really has nothing
to do with functions imported into the namespace. The DESCRIPTION file
Imports is mainly used by install.packages. On the other
hand NAMESPACE is a place where we defining what we need to build our
package and what we want to expose to the end users (export). Nowadays
the NAMESPACE file is even more mistearious as it is build automatically
e.g. by roxygen2 package. A package have to be listed in
the Imports in DESCRIPTION file, but not in NAMESPACE
if we will call the dependencies function with :: in the
main package. This explicit calls to dependencies are preferred.
If you are interested “How-R-Searches-And-Finds-Stuff” I recommend a great blog post which have more than 10 years and still is a one of the most valuable R sources.
tinytest vs testthatThis subsection will be a subjective view on the difference between
tinytest and testthat packages. This is a
great example in my opinion as showing that package could have many
dependencies nevertheless not exposed to the end user (these
dependencies are not installed with install.packages), as
is in Suggests field of the DESCRIPTION file.
tinytest was created to offer similar functionality to
testthat package nevertheless tinytest has
zero dependencies. For me tinytest is an interesting
alternative compared to testthat nevertheless not so
obvious replacement. I do not care how many dependencies has the
testthat package as it is located in Suggests
field of DESCRIPTION file. testthat will not be delayed
loaded with requireNampese too. This mean that the higher
number of dependencies from the testthat package is only my
problem (developer one, not the end user) when e.g. I am checking the
package (e.g. with R CMD check). To be fair how many
additional packages have to be downloaded be a
developer (e.g. for R CMD check) when comparing
tinytest and testthat. In the case of tinytest
it is zero packages and for testthat 80 packages now.
Please use pacs::pac_deps_dev("tinytest") and
pacs::pac_deps_dev("testthat") calls to confirm that.
Dependencies from the end user perspective:
One of the method of reducing number of dependencies is to transfer the package from Imports to Suggests or not include it at all and load it in the delayed manner. So we have to identify package functions which will be used optionally or rarely (are not a core of the package). Then we have to apply conditional execution of it if the package is installed (available), if not then ask to install it. If function with such delayed loaded package is used in examples or tests then the package have to be in **Suggests* field.
func <- function() {
if (requireNamespace("PACKAGE", quietly = TRUE)) {
regular code
} else {
stop("Please install the PACKAGE to use the func function")
}
}
parsnip and caret packages are examples
which apply this strategy. It could be easily confirmed by looking for
requireNamespace phrase with github search, from each of
the repo.
pacs packageOne of the functionality of the pacs package is to check
a package complexity. We could check the number of dependencies
(recursively or not) and even check how many MB are allocated for a
package and all its dependencies.
Weight Case Study: devtools
Take into account that packages sizes are appropriate for your local
system (Sys.info()). Installation with
install.packages and some devtools functions
might result in different packages sizes.
If you do not want to install anything in your current library
(.libPaths()) and still inspect a package size, then a
usage of the withr package is recommended.
withr::with_temp_libpaths is recommended to isolate the
download process.
# restart of R session could be needed
withr::with_temp_libpaths({install.packages("devtools"); cat(pacs::pac_true_size("devtools") / 10**6, "MB", "\n")})Installation in your main library.
# if not have
install.packages("devtools")Size of the devtools package:
cat(pacs::pac_size("devtools") / 10**6, "MB", "\n")True size of the package as taking into account its dependencies. At
the time of writing it, it is 113MB for
devtools without base packages
(Mac OS arm64).
cat(pacs::pac_true_size("devtools") / 10**6, "MB", "\n")A reasonable assumption might be to count only dependencies which are
not used by any other package. Then we could use
exclude_joint argument to limit them. However hard to
assume if your local installation is a reasonable proxy for an average
user.
# exclude packages if at least one other package use it too
cat(pacs::pac_true_size("devtools", exclude_joint = 1L) / 10**6, "MB", "\n")It is crucial to check the number of dependencies too:
# 70 recursive dependencies
pacs::pac_deps("devtools", local = TRUE)$Package
# 20 direct dependencies
pacs::pac_deps("devtools", local = TRUE, recursive = FALSE)$PackagePlease read in the order all of the 3 sources to become a R packages developer guru :=)