Title: | Extract Text from Microsoft Word Documents |
---|---|
Description: | Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility only supports the old 'doc' format, not the new XML-based 'docx' format. Use the 'xml2' package to read the latter. |
Authors: | Jeroen Ooms [aut, cre], Adri van Os [cph] (Author 'antiword' utility) |
Maintainer: | Jeroen Ooms <[email protected]> |
License: | GPL-2 |
Version: | 1.3.4 |
Built: | 2024-12-02 05:43:10 UTC |
Source: | https://github.com/ropensci/antiword |
Wraps the antiword utility. Takes a path to a Word file and returns the text of the document.
antiword(file = NULL, format = FALSE)
file | path or URL to your Word file |
format | format the output text (-f parameter) |
text <- antiword("https://jeroen.github.io/files/UDHR-english.doc")
cat(text)
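A slightly fuller sketch of the same call follows. It assumes that `antiword()` returns its text as a character string with newline-separated lines, which is consistent with the `cat(text)` call above but is not stated on this page. The names `read_doc_lines` and `split_nonempty_lines` are hypothetical helpers for illustration, not part of the package.

```r
# Hypothetical helper (not part of the antiword package): extract text from a
# legacy .doc file and return it as a character vector of non-empty lines.
read_doc_lines <- function(path, format = FALSE) {
  # antiword only supports the old 'doc' format, so reject 'docx' early.
  if (grepl("\\.docx$", path, ignore.case = TRUE)) {
    stop("'docx' files are not supported by antiword: ", path)
  }
  # 'format' is passed through unchanged (the -f parameter of the utility).
  text <- antiword::antiword(path, format = format)
  split_nonempty_lines(text)
}

# Split text on "\n" or "\r\n" and drop lines that are empty or whitespace-only.
# Assumption: the input is a character vector using newline separators.
split_nonempty_lines <- function(text) {
  lines <- unlist(strsplit(text, "\r?\n"))
  lines[nzchar(trimws(lines))]
}

# Usage (requires the antiword package and network access):
# lines <- read_doc_lines("https://jeroen.github.io/files/UDHR-english.doc")
# head(lines)
```

Rejecting `docx` paths before calling the utility gives a clearer error than letting the wrapped tool fail on a format it does not support. Whether the extension check is strict enough for your inputs is a judgment call, since a file's name does not guarantee its format.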