Title: | Extract Text from Rich Text Format (RTF) Documents |
---|---|
Description: | Wraps the 'unrtf' utility <https://www.gnu.org/software/unrtf/> to extract text from RTF files. Supports document conversion to HTML, LaTeX or plain text. Output in HTML is recommended because 'unrtf' has limited support for converting between character encodings. |
Authors: | Jeroen Ooms [aut, cre], Free Software Foundation, Inc [cph] |
Maintainer: | Jeroen Ooms |
License: | GPL-3 |
Version: | 1.4.7 |
Built: | 2024-11-25 05:54:42 UTC |
Source: | https://github.com/ropensci/unrtf |
Converts an RTF document to HTML, text or LaTeX. Output in HTML is recommended because unrtf has limited support for converting between character encodings, which is problematic for non-ASCII text.
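A minimal sketch of converting a local file that contains a non-ASCII character, using the recommended HTML output alongside plain text. The tiny RTF document, the temporary file and the variable names are illustrative assumptions; `\'e9` is the RTF hex escape for "é" in the Windows-1252 code page declared by `\ansicpg1252`.

```r
library(unrtf)

# Write a tiny, hand-written RTF document to a temporary file (illustrative content).
# In the file this becomes: {\rtf1\ansi\ansicpg1252\deff0 Caf\'e9 au lait\par}
rtf_file <- tempfile(fileext = ".rtf")
writeLines("{\\rtf1\\ansi\\ansicpg1252\\deff0 Caf\\'e9 au lait\\par}", rtf_file)

# HTML is the recommended output format for non-ASCII text like this.
html <- unrtf(rtf_file, format = "html")

# Plain text for comparison; non-ASCII characters may not survive as cleanly.
text <- unrtf(rtf_file, format = "text")

cat(html)
cat(text)
```

Comparing the two outputs for the accented character is a quick way to check how your installed version handles encodings.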
unrtf( file = NULL, format = c("html", "text", "latex"), verbose = FALSE, conf_dir = NULL )
Argument | Description |
---|---|
file | path or URL to the RTF file |
format | output format; must be "html", "text" or "latex" |
verbose | print some output to stderr |
conf_dir | use a custom directory containing .conf files (see below) |
Output can be customized via a set of .conf files which serve as templates for the various formats. The default conf files are located in system.file("share", package = "unrtf"). To modify the output, copy these files to a custom location and pass that directory as the conf_dir argument to unrtf().
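The customization steps above can be sketched as follows. The temporary target directory, the sample RTF content and the variable names are assumptions for illustration; list the copied directory to see which template files your installed version actually ships before editing any of them.

```r
library(unrtf)

# Locate the default templates shipped with the package.
default_dir <- system.file("share", package = "unrtf")

# Copy them to a writable directory of our own (a temp dir here, for illustration).
my_conf <- file.path(tempdir(), "unrtf-conf")
dir.create(my_conf, showWarnings = FALSE)
file.copy(list.files(default_dir, full.names = TRUE), my_conf, recursive = TRUE)

# Inspect the copied templates; edit the relevant .conf file here to change the output.
print(list.files(my_conf))

# Convert a small local RTF file using the custom template directory.
rtf_file <- tempfile(fileext = ".rtf")
writeLines("{\\rtf1\\ansi Hello world\\par}", rtf_file)
out <- unrtf(rtf_file, format = "text", conf_dir = my_conf)
cat(out)
```

Keeping the copies outside the package library means a package update will not overwrite your edited templates.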
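library(unrtf)

# Convert a sample RTF file (fetched from a URL) to plain text and to HTML
text <- unrtf("https://jeroen.github.io/files/sample.rtf", format = "text")
html <- unrtf("https://jeroen.github.io/files/sample.rtf", format = "html")

# Print the plain-text version
cat(text)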