This repository has been archived. The former README is now in README-NOT.md.
ropensci / antiword
R wrapper for antiword utility
Home Page: https://docs.ropensci.org/antiword
Is there a way to extract content in .doc files?
Thought others may run across a similar issue. I'm processing a large number of MS Word documents, and one of them is apparently corrupted (it's attached). Antiword produced the following error:
System call to 'antiword' failed (1): Read long 0x7a74 not possible
I tried using tryCatch(antiword(file), finally=as.character(NA)), but tryCatch didn't save me. I didn't understand how to get around this. Tried signalCondition() clumsily but with no luck.
So, inside the 'antiword' function there is a 'stop' command. All I really need is a warning, so to hack my way around this, I wrote this myantiword function:
is_windows <- function() {
  identical(.Platform$OS.type, "windows")
}
# Copy of antiword() that warns and returns NA instead of stopping on failure.
myantiword <- function(file = NULL, format = FALSE) {
  args <- if (length(file)) {
    # Download remote documents to a temporary file first.
    if (grepl("^https?://", file)) {
      tmp <- tempfile(fileext = ".doc")
      utils::download.file(file, tmp, mode = "wb")
      file <- tmp
    }
    file <- normalizePath(file, mustWork = TRUE)
    c(ifelse(isTRUE(format), "-f", "-t"),
      ifelse(is_windows(), shQuote(file), file))
  }
  # Run from the package's bin directory; restore the working directory on exit.
  wd <- getwd()
  on.exit(setwd(wd))
  bindir <- system.file("bin", package = "antiword")
  setwd(bindir)
  # On Windows the binary name carries the pointer width (32 or 64).
  postfix <- if (is_windows())
    .Machine$sizeof.pointer * 8
  path <- file.path(bindir, paste0("antiword", postfix))
  # error = FALSE: do not raise an R error on a non-zero exit status.
  out <- sys::exec_internal(path, args, error = FALSE)
  if (out$status == 0) {
    if (length(out$stderr))
      cat(rawToChar(out$stderr), file = stderr())
    return(rawToChar(out$stdout))
  } else {
    warning(sprintf("System call to 'antiword' failed (%d): %s",
                    out$status, rawToChar(out$stderr)))
    return(as.character(NA))
  }
}
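A possible alternative to copying the function: the tryCatch attempt above likely failed because `finally` only runs cleanup code and does not handle the error, so the error still propagates. Supplying an `error =` handler should achieve the same "warn and return NA" behaviour. The sketch below is a minimal illustration; `safe_extract` and its `extractor` argument are hypothetical names introduced here, not part of the antiword package.

```r
# Hypothetical helper: run an extractor on one file and return NA on failure.
# `extractor` is any function that takes a file path and returns a string.
safe_extract <- function(file, extractor) {
  tryCatch(
    extractor(file),
    error = function(e) {
      # Downgrade the error to a warning so a batch run can continue.
      warning(sprintf("Skipping '%s': %s", file, conditionMessage(e)),
              call. = FALSE)
      NA_character_
    }
  )
}

# Usage, assuming the antiword package is installed and `files` is a
# character vector of .doc paths:
# texts <- vapply(files, safe_extract, character(1),
#                 extractor = antiword::antiword)
```

Passing the extractor as an argument keeps the wrapper independent of the package internals, so it does not need updating if the bundled binary's location changes.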
It would be great to be able to use other parameters, for instance:
-w width
In text mode this is the line width in characters. A value of zero puts an entire paragraph on a line, useful when the text is to be used as input for another wordprocessor.
For example, -w 0 would be helpful for extracting text in an NLP pipeline.
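To make the request concrete, here is a rough sketch of how an optional width could be folded into the argument vector passed to the antiword binary. `build_antiword_args` and its `width` parameter are hypothetical and do not exist in the package; only the `-f`, `-t` and `-w` flags come from the text above.

```r
# Hypothetical helper: build the antiword argument vector with an optional
# line width. width = NULL leaves antiword's default untouched.
build_antiword_args <- function(file, format = FALSE, width = NULL) {
  args <- if (isTRUE(format)) "-f" else "-t"
  if (!is.null(width)) {
    stopifnot(is.numeric(width), length(width) == 1, width >= 0)
    args <- c(args, "-w", as.character(as.integer(width)))
  }
  c(args, file)
}

# One paragraph per line, as suggested for NLP pipelines:
build_antiword_args("report.doc", width = 0)
```

The resulting vector could then be handed to something like `sys::exec_internal()` in place of the fixed two-element `args` used in the function shown earlier on this page.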
Hi there,
I forked the antiword library and cloned it to my machine (OSX 10.14.1, R v3.6.3).
To install the library locally, I open an R shell in the antiword directory, load devtools, and run devtools::load_all(); sorry if I'm doing this in an inane manner. I get this error:
$ devtools::load_all()
Loading antiword
Re-compiling antiword
─ installing *source* package ‘antiword’ ...
** using staged installation
** libs
rm -f antiword.so register.o libantiword/main_u.o libantiword/asc85enc.o libantiword/blocklist.o libantiword/chartrans.o libantiword/datalist.o libantiword/depot.o libantiword/dib2eps.o libantiword/doclist.o libantiword/fail.o libantiword/finddata.o libantiword/findtext.o libantiword/fmt_text.o libantiword/fontlist.o libantiword/fonts.o libantiword/fonts_u.o libantiword/hdrftrlist.o libantiword/imgexam.o libantiword/imgtrans.o libantiword/jpeg2eps.o libantiword/listlist.o libantiword/misc.o libantiword/notes.o libantiword/options.o libantiword/out2window.o libantiword/output.o libantiword/pdf.o libantiword/pictlist.o libantiword/png2eps.o libantiword/postscript.o libantiword/prop0.o libantiword/prop2.o libantiword/prop6.o libantiword/prop8.o libantiword/properties.o libantiword/propmod.o libantiword/rowlist.o libantiword/sectlist.o libantiword/stylelist.o libantiword/stylesheet.o libantiword/summary.o libantiword/tabstop.o libantiword/text.o libantiword/unix.o libantiword/utf8.o libantiword/word2text.o libantiword/worddos.o libantiword/wordlib.o libantiword/wordmac.o libantiword/wordole.o libantiword/wordwin.o libantiword/xmalloc.o libantiword/xml.o antiword
gcc -I"/Users/sanjeevsariya/bin/R3.6.3/Rv3.6.3/lib/R/include" -DNDEBUG -I/usr/local/include -fPIC -g -O2 -UNDEBUG -Wall -pedantic -g -O0 -fdiagnostics-color=always -c libantiword/main_u.c -o libantiword/main_u.o
In file included from libantiword/main_u.c:48:
libantiword/antiword.h:13:2: error: Exactly one of the DEBUG and NDEBUG flags MUST be set
#error Exactly one of the DEBUG and NDEBUG flags MUST be set
^
1 error generated.
make: *** [libantiword/main_u.o] Error 1
ERROR: compilation failed for package ‘antiword’
─ removing ‘/private/var/folders/7w/kl6vpf596h738qtnndqqm7k00000gn/T/Rtmpi2skhK/devtools_install_56981013902c/antiword’
Error in (function (command = NULL, args = character(), error_on_status = TRUE, :
System command 'R' failed, exit status: 1, stdout + stderr (last 10 lines):
E> rm -f antiword.so register.o libantiword/main_u.o libantiword/asc85enc.o libantiword/blocklist.o libantiword/chartrans.o libantiword/datalist.o libantiword/depot.o libantiword/dib2eps.o libantiword/doclist.o libantiword/fail.o libantiword/finddata.o libantiword/findtext.o libantiword/fmt_text.o libantiword/fontlist.o libantiword/fonts.o libantiword/fonts_u.o libantiword/hdrftrlist.o libantiword/imgexam.o libantiword/imgtrans.o libantiword/jpeg2eps.o libantiword/listlist.o libantiword/misc.o libantiword/notes.o libantiword/options.o libantiword/out2window.o libantiword/output.o libantiword/pdf.o libantiword/pictlist.o libantiword/png2eps.o libantiword/postscript.o libantiword/prop0.o libantiword/prop2.o libantiword/prop6.o libantiword/prop8.o libantiword/properties.o libantiword/propmod.o libantiword/rowlist.o libantiword/sectlist.
[...]
Type .Last.error.trace to see where the error occured
I compiled R locally with the flags below:
./configure --prefix=~/Rv3.6.3 --enable-R-shlib --enable-BLAS-shlib
Any pointers on getting this initial load/compile/install working would be appreciated.
Platform: x86_64-apple-darwin18.2.0 (64-bit)
Running under: macOS Mojave 10.14.1
other attached packages:
devtools_2.2.2 usethis_1.5.1
Rcpp_1.0.3 rstudioapi_0.11 magrittr_1.5 pkgload_1.0.2
R6_2.4.1 rlang_0.4.5 fansi_0.4.1 tools_3.6.3
pkgbuild_1.0.6 sessioninfo_1.1.1 cli_2.0.2 withr_2.1.2
ellipsis_0.3.0 remotes_2.1.1 assertthat_0.2.1 digest_0.6.25
rprojroot_1.3-2 crayon_1.3.4 processx_3.4.2 callr_3.4.2
fs_1.3.2 ps_1.3.2 testthat_2.3.2 memoise_1.1.0
glue_1.3.1 compiler_3.6.3 desc_1.2.0 backports_1.1.5
prettyunits_1.1.1
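A note on reading the log above: the gcc command passes -DNDEBUG early and -UNDEBUG later. Because the compiler processes -D and -U options in the order given, NDEBUG ends up undefined, and DEBUG is never defined at all, which is exactly the condition the #error in antiword.h rejects. The small sketch below replays that logic on a shortened copy of the command line; `macro_defined` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: report whether a macro is still defined after all
# -D/-U flags on a compiler command line have been applied in order.
macro_defined <- function(cmd, macro) {
  tokens <- strsplit(cmd, "[[:space:]]+")[[1]]
  defined <- FALSE
  for (tok in tokens) {
    if (tok == paste0("-D", macro) ||
        startsWith(tok, paste0("-D", macro, "="))) {
      defined <- TRUE
    } else if (tok == paste0("-U", macro)) {
      defined <- FALSE
    }
  }
  defined
}

# Shortened copy of the failing command from the log.
cmd <- paste("gcc -DNDEBUG -I/usr/local/include -fPIC -g -O2",
             "-UNDEBUG -Wall -pedantic -g -O0 -c libantiword/main_u.c")
macro_defined(cmd, "NDEBUG")  # FALSE: the later -UNDEBUG wins
macro_defined(cmd, "DEBUG")   # FALSE: never defined
```

The trailing -UNDEBUG -Wall -pedantic -g -O0 flags look like ones added for a debug build by the devtools toolchain rather than by R itself. One thing that may be worth trying, assuming the installed pkgbuild version supports a `debug` argument, is running pkgbuild::compile_dll(debug = FALSE) before devtools::load_all(), or installing with devtools::install() instead.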