doconv

The package offers a set of functions for converting ‘Microsoft Word’ or ‘Microsoft PowerPoint’ documents to ‘PDF’ format and for converting them to images in the form of thumbnails. ‘LibreOffice’ must be installed on the machine; ‘python’ and ‘Microsoft Word’ are optional and only needed to convert Word documents with ‘docx2pdf’.


Installation

You can install the latest version from GitHub with:

# install.packages("devtools")
devtools::install_github("ardata-fr/doconv")

Example

library(doconv)

Generate thumbnails from a file

You can generate a thumbnail image of a document with to_miniature():

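docx_file <- system.file(package = "doconv", "doc-examples/example.docx")
to_miniature(
  filename = docx_file,
  row = c(1, 1, 2, 2),  # pages 1-2 on the first row, pages 3-4 on the second
  use_docx2pdf = TRUE)  # requires 'Word' and 'docx2pdf' to be installed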

It uses ‘LibreOffice’ to convert Word or PowerPoint documents to PDF. It may work with other document types, but the package focuses only on PDF, Word and PowerPoint documents. With use_docx2pdf = TRUE, docx2pdf is used instead of ‘LibreOffice’ to convert Word files to PDF; this option can only be used if both ‘Word’ and ‘docx2pdf’ are installed on your machine.
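The `row` argument gives, for each page, the row of the thumbnail grid on which that page is drawn, so `c(1, 1, 2, 2)` places pages 1 and 2 on the first row and pages 3 and 4 on the second. The sketch below builds such a vector for any page count. `miniature_rows()` is a hypothetical helper, not a doconv function, and it assumes the vector needs one entry per page of the document.

```r
# Hypothetical helper (not part of doconv): build a `row` vector that places
# `pages_per_row` consecutive pages on each row of the thumbnail grid.
miniature_rows <- function(n_pages, pages_per_row = 2) {
  stopifnot(n_pages >= 1, pages_per_row >= 1)
  rep(seq_len(ceiling(n_pages / pages_per_row)),
      each = pages_per_row, length.out = n_pages)
}

miniature_rows(4)     # 1 1 2 2, the same layout as the example above
miniature_rows(5, 3)  # 1 1 1 2 2
```

The result can then be passed on, for example `to_miniature(docx_file, row = miniature_rows(4))`.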

Convert a PowerPoint file to PDF

pptx_file <- system.file(package = "doconv", "doc-examples/example.pptx")
to_pdf(pptx_file, output = "pptx_example.pdf")
to_miniature("pptx_example.pdf", width = 1000)  # thumbnail of the resulting PDF

Convert a Word file to PDF

docx_file <- system.file(package = "doconv", "doc-examples/example.docx")
to_pdf(docx_file, output = "docx_example.pdf")
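Converting many files follows the same pattern. The sketch below loops over a folder with `to_pdf()` and its default ‘LibreOffice’ route, which suits unattended runs better than docx2pdf (see the docx2pdf notes at the end of this README). `convert_folder_to_pdf()` and `pdf_output_path()` are hypothetical helpers, not part of doconv.

```r
# Hypothetical helper: map "dir/report.docx" to "<out_dir>/report.pdf".
pdf_output_path <- function(input, out_dir) {
  file.path(out_dir,
            sub("\\.(docx|pptx)$", ".pdf", basename(input), ignore.case = TRUE))
}

# Hypothetical helper: convert every .docx/.pptx file of `in_dir` to PDF
# with doconv::to_pdf(), writing the PDF files into `out_dir`.
convert_folder_to_pdf <- function(in_dir, out_dir = in_dir) {
  inputs <- list.files(in_dir, pattern = "\\.(docx|pptx)$",
                       full.names = TRUE, ignore.case = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  outputs <- pdf_output_path(inputs, out_dir)
  for (i in seq_along(inputs)) {
    doconv::to_pdf(inputs[i], output = outputs[i])  # LibreOffice by default
  }
  invisible(outputs)
}
```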

Setup

First, ‘LibreOffice’ must be available on your machine: visit https://www.libreoffice.org/ and follow the installation instructions.

Use the function check_libreoffice_export() to check that the software is installed and can export to PDF:

check_libreoffice_export()
#> [1] TRUE
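In scripts, it can help to fail early with an explicit message when this check does not pass. A minimal sketch, where `ensure_libreoffice()` is a hypothetical helper and only `check_libreoffice_export()` comes from doconv:

```r
# Hypothetical helper: stop with a clear message before any conversion
# if doconv is missing or LibreOffice cannot export to PDF.
ensure_libreoffice <- function() {
  if (!requireNamespace("doconv", quietly = TRUE)) {
    stop("package 'doconv' is not installed", call. = FALSE)
  }
  if (!isTRUE(doconv::check_libreoffice_export())) {
    stop("LibreOffice is missing or cannot export to PDF; ",
         "see https://www.libreoffice.org/", call. = FALSE)
  }
  invisible(TRUE)
}
```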

If ‘Microsoft Word’ is available on your machine, you can get images or PDF files that look exactly the same as the document rendered by ‘Microsoft Word’. If not, ‘LibreOffice’ is used to convert Word documents to PDF or to images; be aware that ‘LibreOffice’ does not always render a document as ‘Microsoft Word’ would (sections can be misinterpreted, for example).

If ‘Microsoft Word’ is available on your machine, install the Python module ‘docx2pdf’ with docx2pdf_install() (first make sure that Python is available on your machine too):

library(locatexec)
library(doconv)
if (exec_available("python", error = TRUE) && # check that python is available
    !docx2pdf_available()) {                    # docx2pdf is not installed yet
  docx2pdf_install()
}

docx2pdf and batch processing

When docx2pdf processes Word documents that contain a table of contents (TOC) or any other computed Word field, Word opens a dialog box asking the user to confirm the operation. This makes the process unreliable in non-interactive mode.

The docx2pdf option should therefore only be used in interactive sessions, when Word is available.
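A minimal sketch of that rule: `should_use_docx2pdf()` is a hypothetical helper that returns TRUE only in an interactive session where `docx2pdf_available()` reports docx2pdf as installed, and FALSE otherwise so that ‘LibreOffice’ is used. Both conditions are arguments so they can be overridden or tested.

```r
# Hypothetical helper: use docx2pdf (Word) only in an interactive session,
# where Word's confirmation dialog can be answered, and only if docx2pdf
# is installed. The second default is evaluated only when the first is TRUE.
should_use_docx2pdf <- function(
    interactive_session = interactive(),
    docx2pdf_ok = requireNamespace("doconv", quietly = TRUE) &&
      doconv::docx2pdf_available()) {
  isTRUE(interactive_session) && isTRUE(docx2pdf_ok)
}
```

For example: `to_miniature(docx_file, use_docx2pdf = should_use_docx2pdf())`.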