HTML tables are a valuable data source, but extracting and recasting these data into a useful format can be tedious. htmltab is a package for extracting structured information from HTML tables. It is similar to readHTMLTable() of the XML package but provides two major advantages:
Additionally, the function preprocesses the table code and removes unneeded parts, which reduces the need for tedious post-processing.
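A minimal sketch of a call follows. It assumes the `doc` and `which` arguments of `htmltab()` (a parsed document and the table's position in it). The toy table, including its header cell that spans two columns, is hypothetical and is parsed first with `XML::htmlParse()` so the example does not depend on how a raw string is interpreted.

```r
library(XML)
library(htmltab)

# Hypothetical toy table: the first header cell spans two columns.
html <- '
<table>
  <tr><th colspan="2">Name</th><th>Age</th></tr>
  <tr><td>Ann</td><td>Lee</td><td>30</td></tr>
  <tr><td>Bob</td><td>Ray</td><td>41</td></tr>
</table>'

# Parse the string as HTML content rather than as a file name.
doc <- htmlParse(html, asText = TRUE)

# `which = 1` selects the first table in the document by position.
people <- htmltab(doc = doc, which = 1)

# Inspect how the spanned header cell was labelled in the result.
names(people)
str(people)
```

Printing `names(people)` is the easiest way to check how a spanned header was turned into column names for a given package version.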
You can install the released version of htmltab from CRAN with:
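Assuming the package is still published on CRAN under the name htmltab, the standard installer is shown below. The `repos` URL is just one CRAN mirror, set explicitly so the call also works in non-interactive sessions.

```r
# Install the released version from CRAN (requires network access).
# The cloud mirror is one choice; any CRAN mirror works.
install.packages("htmltab", repos = "https://cloud.r-project.org")

# Load the package to confirm the installation succeeded.
library(htmltab)
```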
And the development version from GitHub with:
To see htmltab in action, take a look at the case studies in this blog post, the package vignette, or the package manual.