If you’re writing an R package that uses reticulate as an interface to a Python session, you likely also need to install one or more Python packages on the user’s machine for your package to function. In addition, you’d likely prefer to insulate users from details around how Python and reticulate are configured as much as possible.
This vignette documents a few approaches for accomplishing these
goals.
Previously, packages like tensorflow accomplished this by providing helper functions (e.g. tensorflow::install_tensorflow()), and documenting that users should call this function to prepare the environment. For example:
library(tensorflow)
install_tensorflow()
# use tensorflow
The biggest downside with this approach is that it requires users to
manually download and install an appropriate version of Python. In
addition, if the user has not downloaded an appropriate version
of Python, then the version discovered on the user’s system may not
conform with the requirements imposed by the tensorflow
package – leading to more trouble.
Fixing this often requires instructing the user to install Python, and then use reticulate APIs (e.g. reticulate::use_python() and other tools) to find and use an appropriate Python version and environment. This is, understandably, more cognitive overhead than you might want to impose on users of your package.
Another huge problem with manual configuration is that if different R packages use different default Python environments, then those packages can’t ever be loaded in the same R session (since there can only be one active Python environment at a time within an R session).
With newer versions of reticulate, it’s possible for client packages to declare their Python dependencies directly in the DESCRIPTION file, with the use of the Config/reticulate field.
With automatic configuration, reticulate wants to
encourage a world wherein different R packages wrapping Python packages
can live together in the same Python environment / R session. In
essence, we would like to minimize the number of conflicts that could
arise through different R packages having incompatible Python
dependencies.
For example, if we had a package rscipy that acted as an interface to the SciPy Python package, we might use the following DESCRIPTION:
Package: rscipy
Title: An R Interface to scipy
Version: 1.0.0
Description: Provides an R interface to the Python package scipy.
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy")
    )
  )
< ... other fields ... >
With this, reticulate will take care of automatically configuring a Python environment for the user when the rscipy package is loaded and used (i.e. it’s no longer necessary to provide the user with a special install_tensorflow()-style function).
Specifically, after the rscipy package is loaded, the following will occur:
1. Unless the user has explicitly instructed reticulate to use an existing Python environment, reticulate will prompt the user to download and install Miniconda (if necessary).
2. After this, when the Python session is initialized by reticulate, all declared dependencies of loaded packages in Config/reticulate will be discovered.
3. These dependencies will then be installed into an appropriate Conda environment, as provided by the Miniconda installation.
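To make the discovery step more concrete, the following is a minimal sketch of how a Config/reticulate field could be read back from a DESCRIPTION file using only base R. It is not reticulate's actual implementation. The temporary file and the variable names (desc_path, fields, config, declared) are assumptions made for illustration.

```r
# Write a throwaway DESCRIPTION file mirroring the rscipy example.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: rscipy",
  "Title: An R Interface to scipy",
  "Version: 1.0.0",
  "Config/reticulate:",
  "  list(",
  "    packages = list(",
  "      list(package = \"scipy\")",
  "    )",
  "  )"
), desc_path)

# read.dcf() returns a character matrix with one column per requested field.
fields <- read.dcf(desc_path, fields = c("Package", "Config/reticulate"))

# The field value is R code, so parse and evaluate it to recover the list.
# Evaluating text is acceptable here only because we wrote the file ourselves.
config <- eval(parse(text = fields[1, "Config/reticulate"]))

# Collect the declared Python package names.
declared <- vapply(config$packages, function(p) p$package, character(1))
print(declared)
```

Because the field holds an R list expression, the same structure can carry the optional version and pip entries for each package without any change to the reading code.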
In this case, the end user workflow will be exactly as with an R package that has no Python dependencies:
library(rscipy)
# use the package
If the user has no compatible version of Python available on their
system, they will be prompted to install Miniconda. If they do have
Python already, then the required Python packages (in this case
scipy
) will be installed in the standard shared environment
for R sessions (typically a virtual environment, or a Conda environment
named “r-reticulate”).
In effect, users have to pay a one-time, mostly-automated initialization cost in order to use your package, and then things will work as any other R package would. In particular, users are otherwise insulated from details as to how reticulate works.
In some cases, a user may try to load your package after Python has already been initialized. To ensure that reticulate can still configure the active Python environment, you can include the code:
.onLoad <- function(libname, pkgname) {
  reticulate::configure_environment(pkgname)
}
This will instruct reticulate to immediately try to configure the active Python environment, installing any required Python packages as necessary.
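A package that wraps a Python module will usually also want a handle to that module. As a hedged sketch, the hook can be combined with a delay-loaded import so that Python is not initialized merely by loading the package. The package-level variable scipy is an assumption for illustration, and whether this combination suits a given package depends on how it uses the module.

```r
# Module handle, filled in when the package is loaded (illustrative name).
scipy <- NULL

.onLoad <- function(libname, pkgname) {
  # Ask reticulate to configure the active Python environment for this package.
  reticulate::configure_environment(pkgname)

  # Bind the module lazily: with delay_load = TRUE the import is deferred
  # until the module is first accessed.
  scipy <<- reticulate::import("scipy", delay_load = TRUE)
}
```

With this pattern, functions in the package refer to scipy directly, and the Python session starts only when one of them first touches the module.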
The goal of these mechanisms is to allow easy interoperability between R packages that have Python dependencies, as well as to minimize specialized version/configuration steps for end-users. To that end, reticulate will (by default) track an older version of Python than the current release, giving Python packages time to adapt as is required. Python 2 will not be supported.
Tools for breaking these rules are not yet implemented, but will be provided as the need arises.
Declared Python package dependencies should have the following format:
package: The name of the Python package.
version: The version of the package that should be installed. When left unspecified, the latest available version will be installed. This should only be set in exceptional cases, for example if the most recently released version of a Python package breaks compatibility with your package (or other Python packages) in a fundamental way. If multiple R packages request different versions of a particular Python package, reticulate will signal a warning.
pip: Whether this package should be retrieved from PyPI with pip (if TRUE), or from the Anaconda repositories (if FALSE).
For example, we could change the Config/reticulate directive from above to specify that scipy version 1.3.0 be installed from PyPI (with pip):
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy", version = "1.3.0", pip = TRUE)
    )
  )
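The warning for conflicting version requests can be illustrated with a small base R sketch. This is not how reticulate itself performs the check. The R package names (rscipy, rstats2), the second pinned version, and the helper variables are all hypothetical, chosen only to show what "different R packages request different versions" means in practice.

```r
# Hypothetical declarations from two R packages (names are made up).
declared <- list(
  rscipy  = list(list(package = "scipy", version = "1.3.0", pip = TRUE)),
  rstats2 = list(list(package = "scipy", version = "1.2.1", pip = TRUE),
                 list(package = "numpy"))
)

# Flatten to one row per (R package, Python package) request.
requests <- do.call(rbind, lapply(names(declared), function(rpkg) {
  do.call(rbind, lapply(declared[[rpkg]], function(dep) {
    data.frame(
      r_package = rpkg,
      py_package = dep$package,
      version = if (is.null(dep$version)) NA_character_ else dep$version,
      stringsAsFactors = FALSE
    )
  }))
}))

# A conflict is a Python package with more than one distinct pinned version.
pinned <- requests[!is.na(requests$version), ]
versions_by_pkg <- tapply(pinned$version, pinned$py_package,
                          function(v) length(unique(v)))
conflicts <- names(versions_by_pkg)[versions_by_pkg > 1]

for (pkg in conflicts) {
  warning("Conflicting versions requested for Python package '", pkg, "'",
          call. = FALSE)
}
```

Requests that leave version unset never conflict in this sketch, which is one reason to pin a version only in the exceptional cases described above.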