pridiltal / staplr

PDF Toolkit. :paperclip: :hammer: :wrench: :scissors: :bookmark_tabs: :file_folder::paperclip: :bookmark: :construction: :construction_worker:

Home Page: https://pridiltal.github.io/staplr/

R 1.91% Java 95.42% Shell 0.10% C++ 1.73% HTML 0.11% Roff 0.72%

r pdftk pdf toolkit

staplr's Introduction

staplr

This package provides functions to manipulate PDF files:

fill out PDF forms: get_fields() and set_fields()
merge multiple PDF files into one: staple_pdf()
remove selected pages from a file: remove_pages()
rename multiple files in a directory: rename_files()
rotate an entire PDF document: rotate_pdf()
rotate selected pages of a PDF file: rotate_pages()
select pages from a file: select_pages()
split a single input PDF document into individual pages: split_pdf()
split a single input PDF document into parts at given points: split_from()

This package is still under development and this repository contains a development version of the R package staplr.

Installation

staplr requires a Java installation on your system. You can get the latest version of Java from here. OpenJDK also works.

You can install the stable version from CRAN.

install.packages('staplr', dependencies = TRUE)

You can install the development version of staplr from GitHub with:

# install.packages("devtools")
devtools::install_github("pridiltal/staplr")

Example

library(staplr)
# Merge multiple PDF files into one
staple_pdf()

# This command prompts the user to select the file interactively.
# Remove pages 2 and 3 from the selected file.
remove_pages(rmpages = c(2,3))

# Select pages 1 and 3 from a file
select_pages(selpages = c(1,3))

# This function splits a single input PDF document into individual pages
split_pdf()

# This function writes renamed files back to the directory
# (here assuming the directory contains 3 PDF files)
rename_files(new_names = paste("file",1:3))

# These functions fill out PDF forms
get_fields() 
set_fields()
# The package includes two functions, `get_fields` and `set_fields`,
# and files to use as examples.

# If you get the path to the example file with
pdfFile = system.file('testForm.pdf',package = 'staplr')

# and then run
fields = get_fields(pdfFile)
# you'll get a list of the fields that the PDF contains,
# along with some additional information about each field.

# You can modify any of the fields with
fields$TextField1$value = 'this is text'
set_fields(pdfFile, 'newFile.pdf', fields)

# This will create a filled PDF file

Troubleshooting and 2.11.0 changes

As of version 2.11.0, the package uses pdftk-java instead of the original pdftk. pdftk-java is included with the package, so if you have a working Java installation you shouldn’t have any problems.
While the default Java options should be enough for most use cases, you can change the Java options used to run pdftk if you need to:

options('staplr_java_options' = '-Xmx512m')

This option is not affected by rJava settings.

If you don’t have a working Java installation, your installation will fail since you can’t install rJava. Make sure you follow the proper instructions for installing Java. For OpenJDK on Linux, make sure you get both the JDK and the JRE and run javareconf.

sudo apt update -y
sudo apt install -y openjdk-8-jdk openjdk-8-jre
sudo R CMD javareconf

Also restart your R session after running javareconf.

pdftk-java is built as a faithful representation of the original pdftk, so there shouldn’t be any major differences between the outputs. However, if for any reason you’d prefer to run a local installation of pdftk rather than the version shipped with the package, do

# set staplr_custom_pdftk to the path to local installation
# just setting to pdftk will do if it's already in your path
 options('staplr_custom_pdftk' = 'pdftk')

If you want to do this, you can get the original version of pdftk from here. Note that macOS users with a version newer than “High Sierra” should use this version instead.

Make sure to set the option back to NULL if you want to use the built in pdftk later.
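The two options above can be set and restored together. The following is a minimal sketch that uses only base R's options(); the option names are the ones given in this README, while the values are purely illustrative.

```r
# Set both staplr options, remembering whatever was set before.
old <- options(
  staplr_java_options = "-Xmx512m",  # Java flags used when running pdftk-java
  staplr_custom_pdftk = "pdftk"      # assumes a local pdftk is on the PATH
)

getOption("staplr_custom_pdftk")

# ... call staplr functions here ...

# Restore the previous values. An option that was unset before (NULL)
# becomes unset again, so the built-in pdftk is used afterwards.
options(old)
```

Restoring from the saved list avoids having to remember to set staplr_custom_pdftk back to NULL by hand.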

References

https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

staplr's Issues

Installation issues

Hi guys, I'm trying to install the package but I am running into issues.

> install.packages("staplr")

  There is a binary version available but the source version is later:
       binary source needs_compilation
staplr  1.1.0  2.1.0             FALSE

installing the source package ‘staplr’

trying URL 'https://cran.rstudio.com/src/contrib/staplr_2.1.0.tar.gz'
Content type 'application/x-gzip' length 30644 bytes (29 KB)
==================================================
downloaded 29 KB

* installing *source* package ‘staplr’ ...
** package ‘staplr’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: fun(libname, pkgname)
  error: X11 library is missing: install XQuartz from xquartz.macosforge.org
ERROR: lazy loading failed for package ‘staplr’
* removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/staplr’
Warning in install.packages :
  installation of package ‘staplr’ had non-zero exit status

The downloaded source packages are in
	‘/private/var/folders/y3/fbt4vmln131f4lvf_q5xrzgm0000gn/T/RtmppfuDX1/downloaded_packages’

> devtools::install_github("pridiltal/staplr")
Downloading GitHub repo pridiltal/staplr@master
from URL https://api.github.com/repos/pridiltal/staplr/zipball/master
Installing staplr
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/y3/fbt4vmln131f4lvf_q5xrzgm0000gn/T/RtmppfuDX1/devtools17ae55d0fbcbb/pridiltal-staplr-fb29a5b'  \
  --library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library' --install-tests 

* installing *source* package ‘staplr’ ...
** R
** inst
** preparing package for lazy loading
Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: fun(libname, pkgname)
  error: X11 library is missing: install XQuartz from xquartz.macosforge.org
ERROR: lazy loading failed for package ‘staplr’
* removing ‘/Library/Frameworks/R.framework/Versions/3.4/Resources/library/staplr’
Installation failed: Command failed (1)

More details:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] raster_2.6-7 sp_1.2-7    

loaded via a namespace (and not attached):
 [1] httr_1.3.1      compiler_3.4.3  R6_2.2.2        tools_3.4.3     withr_2.1.1    
 [6] curl_3.1        yaml_2.1.18     memoise_1.1.0   Rcpp_0.12.15    grid_3.4.3     
[11] git2r_0.21.0    digest_0.6.14   devtools_1.13.5 lattice_0.20-35

Does this mean I have to upgrade to 3.5 to use this package?

Support for PDF flattening

Hi there,

thanks for taking care of this project!

I have one feature request for the set_fields() method: Add a parameter to optionally extend the system_command with flatten to disallow editing the PDF afterwards.

I could really use this for one of my projects! Or do you see any other way to do this right now (that I couldn't find in the documentation)?

Really looking forward to having the latest version 2.11 available on CRAN for R 4.0.2.

function name typo

https://github.com/pridiltal/staplr/blob/master/R/fill_pdf.R#L215-L243

I think this function may have a typo: idenfity_ instead of identify_

Merge Not working

Hi,

Thank you for this package, I have a bunch of pdf files to merge, N = 1,900. I do not have access to any merging software for MS Word or PDF documents.

I installed the free version of the PDF toolkit, and ran the example of staple_pdf() below, taken from the documentation:

library(staplr)
library(lattice)

staple_pdf()
## End(Not run)
## Not run:
dir <- '//user/location'
require(lattice)
for(i in 1:3) {
  pdf(file.path(dir, paste("plot", i, ".pdf", sep = "")))
  print(xyplot(iris[,1] ~ iris[,i], data = iris))
  dev.off()
}
output_file <- file.path(dir, paste('_Full_pdf.pdf', ))
staple_pdf(input_directory = NULL, input_files = NULL,
           output_filepath = NULL)

This example creates three pdf files successfully: plots1, plots2, plots3 and is supposed to merge those files into a _full_pdf.pdf file. However, all I get is the prompt window to choose the folder, but no output is generated.

Please advise, thank you!
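In the call quoted above, all three arguments are NULL, which, judging by the prompt window described, makes staple_pdf() ask for a directory interactively. A possible sketch that passes the paths explicitly is shown here. It uses the argument names from the call above, base graphics instead of lattice, and a separate input folder (the folder name is an assumption) so that the merged file is not picked up as an input on a later run.

```r
# Write three example plots into a dedicated input directory.
in_dir <- file.path(tempdir(), "staplr_inputs")
dir.create(in_dir, showWarnings = FALSE)
for (i in 1:3) {
  pdf(file.path(in_dir, paste0("plot", i, ".pdf")))
  plot(iris[, 1], iris[, i])
  dev.off()
}

# Keep the merged file outside the input directory.
output_file <- file.path(tempdir(), "Full_pdf.pdf")

# Only runs when staplr is installed; passing both paths avoids the prompt.
if (requireNamespace("staplr", quietly = TRUE)) {
  staplr::staple_pdf(input_directory = in_dir, output_filepath = output_file)
}
```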

Argument is of length zero when running set_fields

Is there any way to detect which field is the problematic one?
I'm getting the following error message when running the final set_fields function:

Error in if (fieldToFill$type %in% c("Text", "Choice")) { : 
  argument is of length zero

I am working with 177 fields and don't know which one is causing this problem.
Thanks and hope you can give me an idea or add a message with a suggested field!!
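One way to narrow this down, as a sketch only: in R, `if (NULL %in% x)` fails with "argument is of length zero", so a field whose type element is missing would produce exactly this message. Assuming the list has the shape printed by get_fields() (each entry with type, name and value), the entries without a type can be listed with base R. The `fields` list below is a made-up stand-in, not real output.

```r
# Stand-in for the list returned by get_fields(); "Broken" lacks a type.
fields <- list(
  TextField1 = list(type = "Text", name = "TextField1", value = "a"),
  Broken     = list(name = "Broken", value = "b")
)

# Names of all entries whose `type` is NULL or empty.
missing_type <- names(fields)[
  vapply(fields, function(f) length(f$type) == 0, logical(1))
]
missing_type
```

Whether a missing type is the actual cause in this report is an assumption; the same check can be repeated for `name` or `value`.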

staple_pdf example not working on Windows PC

The example for staple_pdf included in the package is not working for me.

The three plot PDFs get written to the tempdir successfully, in my case: "C:\Users\Richard\AppData\Local\Temp\Rtmp6TqGpZ"

The generated output_file path is:
"C:\Users\Richard\AppData\Local\Temp\Rtmp6TqGpZ/Full_pdf.pdf"

However, when running
staple_pdf(input_directory = dir, output_filepath = output_file)

I only get the integer 127 as a result, with no error messages. I have tried manually changing the output_file path, as well as supplying a vector of input files, all with 127 as a result.

Any ideas?

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] staplr_2.9.0 lattice_0.20-38

loaded via a namespace (and not attached):
[1] compiler_3.6.1 magrittr_1.5 assertthat_0.2.1 tools_3.6.1 stringi_1.4.3
[6] grid_3.6.1 stringr_1.4.0 packrat_0.5.0 tcltk_3.6.1

system("java -version")

java version "12.0.2" 2019-07-16
Java(TM) SE Runtime Environment (build 12.0.2+10)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)
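A status of 127 from R's system() typically means the command itself could not be run. A small diagnostic sketch in base R, assuming the executables are expected on the PATH under the names pdftk and java:

```r
# Check whether the external tools can be found on the PATH.
tools <- c("pdftk", "java")
tool_available <- function(cmd) unname(nzchar(Sys.which(cmd)))
status <- setNames(
  vapply(tools, tool_available, logical(1), USE.NAMES = FALSE),
  tools
)
print(status)
```

A FALSE entry suggests that the corresponding program is not installed or not visible to the R session.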

Error when using set_fields 'All unnamed arguments must be length 1'

hi there, I have been trying to run the staplr code on Ubuntu 20.04.5 and keep getting an error (All unnamed arguments must be length 1) when trying to use the set_fields function.

The get_fields function seems to work fine, I am using it in R and able to see that the field variable is updated with the new information. The issue is only when the set_fields function is applied.

I installed java using the code provided and installed rJava (install.packages("rJava")):

sudo apt update -y
sudo apt install -y openjdk-8-jdk openjdk-8-jre
sudo R CMD javareconf

I first installed the package using: install.packages('staplr', dependencies = TRUE)

I uninstalled this version after it produced this error and tried: devtools::install_github("pridiltal/staplr")

However the same issue keeps arising.

As part of my troubleshooting I tried to install pdftools using: sudo apt-get install libpoppler-cpp-dev

I'm at a loss as to what to try next so any insight would be appreciated. Thanks!

Filling forms

Were you planning to wrap the form filling functions of pdftk (generate_fdf and fill_form commands)? I could work on those in my spare time but was wondering if you had any ideas on how they should work.

The generate_fdf command creates a plain text file with the field names of a PDF form. Given an FDF file and a PDF, the fill_form command fills the PDF.

A simple interface would be a getFields function that executes generate_fdf, parses that file and returns a list of fields and their current values. A setFields function would then take that list as input and write it back to the file.

get_fields() hangs for pdf I created, works for package example file

Following the example code, get_fields() works fine:

require(staplr)
#> Loading required package: staplr
pdfFile <- system.file('testForm.pdf',package = 'staplr')

fields <- get_fields(pdfFile)
fields
#> $TextField1
#> $TextField1$type
#> [1] "Text"
#> 
#> $TextField1$name
#> [1] "TextField1"
#> 
#> $TextField1$value
#> [1] "Jone was here"
# .... (more deleted for brevity)

However, when I try the same code with a fillable PDF I've created, get_fields() hangs:

Test 2.pdf

pdf <- "~/temp/Test 2.pdf"
fields <- get_fields(pdf)

I am using:
OSX Mojave (10.14.3),
XQuartz 2.7.11
staplr 2.9.0
pdftk 2.02

Allowing for a vector of input files instead of merging an entire directory

Super cool function!

I have situations where I don't want to bind all the files in a directory.

I currently changed the function and added an input_files argument. Maybe there is a more elegant way to do this?

staple_pdf <- function(input_directory = NULL, input_files = NULL, output_filename = "Full_pdf", 
                        output_directory = NULL) 
{
  if(is.null(input_directory) & is.null(input_files)) {
    input_directory <- tcltk::tk_choose.dir(caption = "Select directory which contains PDF files")
  }
  if(!is.null(input_directory)){input_filepaths <- (Sys.glob(file.path(input_directory, "*.pdf")))}
  if(!is.null(input_files)){input_filepaths <- input_files}
  
  if (is.null(output_directory)) {
    output_directory <- tcltk::tk_choose.dir(caption = "Select directory to save output")
  }
  output_filepath <- file.path(output_directory, paste(output_filename, 
                                                       ".pdf", sep = ""))
  quoted_names <- paste0("\"", input_filepaths, "\"")
  file_list <- paste(quoted_names, collapse = " ")
  output_filepath <- paste0("\"", output_filepath, "\"")
  system_command <- paste("pdftk", file_list, "cat", "output", 
                          output_filepath, sep = " ")
  system(command = system_command)
}

Let me know what you think and I can do a PR.
Cheers
Dan
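As a possible refinement of the quoting step in the function above, base R's shQuote() can build the quoted file list. This is only a sketch: `build_cat_command` is a hypothetical helper name, and the `cat output` syntax is the one used in the function above.

```r
# Build the pdftk command line from a vector of input files.
build_cat_command <- function(input_files, output_filepath) {
  paste(
    "pdftk",
    paste(shQuote(input_files), collapse = " "),  # quote each path for the shell
    "cat", "output",
    shQuote(output_filepath)
  )
}

cmd <- build_cat_command(c("a b.pdf", "c.pdf"), "out dir/Full_pdf.pdf")
cmd
```

shQuote() picks a quoting style suited to the platform's shell, which helps with paths that contain spaces.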

Switch to pdftk-Java and ship a prebuilt jar

I just learned that a pure Java implementation of pdftk exists. If we committed to using this version:

users wouldn't have to follow an external link to install pdftk on their systems
Mac users who follow a wrong link wouldn't get stuck
cran can properly run our tests
this version is in active development which means there are real people that we can talk to when things go wrong

All that will be needed is to place the jar in inst, direct mentions of pdftk to this file, and make sure everything still works. Cran wants Java programs to be compiled into jars anyway so that doesn't leave much for us to do.

Some caveats:

~~we may have change licenses. Pdftk-Java is on gpl2 which based on my understanding is not compatible with gpl3~~.
running jars requires finding JAVA_HOME, which is often problematic on different systems. I have some heuristics that I use for my own packages, and rJava has more advanced ways of locating Java, but problems can still arise, especially when multiple versions of Java are installed (e.g. 32 vs 64 bit). This one doesn't seem to pick between 32 and 64 bit Java, so rJava should be good enough. It's worth doing some tests though.
we may have to fiddle with Java options. Most operations shouldn't be too memory intensive, but for extreme use cases problems can arise with default settings. This also requires some testing. Though if this was going to be an issue, I suspect we would see it with regular pdftk as well

Edit: Also, the original pdftk won't be available in later Ubuntu versions because one of its dependencies is going away, so at least part of the userbase will be using pdftk-java whether we want it or not

Staplr removed for real this time

https://cran.r-project.org/web/packages/staplr/index.html

@pridiltal Did you get a notification about this? A devtools::check seems to think everything's fine

Windows specific encoding issue

There are still a few windows specific issues with encoding. I will close this once I see it working in all platforms, after which it should be ready for a CRAN bump

staplr's set_fields and get_fields running for hours with no results

Hey guys. Great library.. just what I need right now. The thing is that I'm having problems with the basic example, using the included PDF when installed. I'm running:

library(staplr)
pdfFile = system.file('testForm.pdf', package = 'staplr')
fields <- get_fields(pdfFile)
fields$TextField1$value <- "Test"
set_fields(input_filepath = pdfFile,
           output_filepath = 'filledPdf.pdf', 
           fields = fields)

And after hours.. still waiting.

On the other hand, I tried with a personal PDF I have, with lots of filling spaces, and same result.. I import the PDF correctly and when running the get_fields function, nothing happens for hours. Note that it works with the previous code.

Hope you can help me solve this issue!

Thanks, B.

Is it possible to select more than 1 value in a checkbox?

Hello.
I tried with fields$field$value <- c(1,3) but doesn't work. Is there a way to check more than one option in a field with multiple checkboxes?
Thanks!

rotate_pdf doesn't actually turn pages by given degrees

While testing something else, I noticed that the way we describe rotations and the way staplr interprets them are a little different. If we call rotate_pdf with page_rotation = 90, you'll get something like

|--------|
| text   |
|        |
|        |
|        |    
|        |
|        |
|--------|

           ↓  90 (east) rotation

|------------------|
|                t |
|                e |
|                x |
|                t |
|------------------|

Imagine the letters of the text rotated as well. This is expected behaviour. However if you take that output and try to rotate again you get

|------------------|
|                t |
|                e |
|                x |
|                t |
|------------------|

           ↓  90 (east) rotation

|------------------|
|                t |
|                e |
|                x |
|                t |
|------------------|

This happens because pdftk doesn't actually try to rotate the pages if they already have the right orientation. In this context, east doesn't mean "turn this file by 90 degrees" but "turn it so the top of the file is on the right side".

This behaviour was unexpected for me. We should either clarify this in the docs or find a way to get the current orientation of the page and tell pdftk to rotate relative to it. Accepting the pdftk notation (north, east, south, west) could also help clarify what the function is actually doing
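The "rotate relative to the current orientation" idea can be sketched as simple modular arithmetic. This assumes the current orientation of a page is already known in degrees and that the compass names map to degrees as in the "90 (east)" example above; `target_orientation` is a hypothetical helper, not part of staplr.

```r
# Map absolute orientations in degrees to the compass names.
compass <- c("0" = "north", "90" = "east", "180" = "south", "270" = "west")

# Absolute orientation needed to turn a page by `turn` degrees
# starting from its `current` orientation.
target_orientation <- function(current, turn) {
  compass[[as.character((current + turn) %% 360)]]
}

target_orientation(0, 90)    # "east"
target_orientation(90, 90)   # "south"
target_orientation(270, 90)  # "north"
```

With this, asking for a second 90 degree turn on a page that already faces east would request south rather than east again.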

remove_pages: Let users have the same input and output filepath

When you want to manipulate a PDF, it would be great if you could just overwrite the file instead of having to create a new one. I recently supervised a project in which I had to manipulate lots of temporary PDF files. For each of those files I had to create copies for removing certain pages. Please add the option (maybe something like overwrite = TRUE?).
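Until such an option exists, one workaround is to write to a temporary file and copy it over the original. A minimal base R sketch; `overwrite_in_place` is a hypothetical helper, and `transform` stands for any function taking an input path and an output path.

```r
# Apply `transform(input, output)` and replace `path` with the result.
overwrite_in_place <- function(path, transform) {
  tmp <- tempfile(fileext = ".pdf")
  on.exit(unlink(tmp), add = TRUE)
  transform(path, tmp)
  if (!file.exists(tmp)) stop("transform did not create an output file")
  invisible(file.copy(tmp, path, overwrite = TRUE))
}

# Demonstration with a plain text file standing in for a PDF.
p <- tempfile(fileext = ".txt")
writeLines(c("a", "b"), p)
overwrite_in_place(p, function(i, o) writeLines(rev(readLines(i)), o))
readLines(p)

# With staplr, `transform` could wrap remove_pages(); the argument names
# input_filepath and output_filepath are assumed here:
# overwrite_in_place("doc.pdf", function(i, o)
#   staplr::remove_pages(rmpages = c(2, 3), input_filepath = i, output_filepath = o))
```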

overwrite does not work if you run this code twice

if (requireNamespace("lattice", quietly = TRUE)) {
  dir <- tempdir()
  for(i in 1:3) {
    pdf(file.path(dir, paste("plot", i, ".pdf", sep = "")))
    print(lattice::xyplot(iris[,1] ~ iris[,i], data = iris))
    dev.off()
  }
  output_file <- file.path(dir, paste('Full_pdf.pdf', sep = ""))
  staple_pdf(input_directory = dir, output_filepath = output_file)
}

Error: The given output filename: C:\Users\310677\AppData\Local\Temp\Rtmp082zIl\Full_pdf.pdf
matches an input filename. Exiting.
Errors encountered. No output created.
Done. Input errors, so no output created.
[1] 1
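The message suggests that on the second run Full_pdf.pdf, which sits in the same directory as the plots, is itself picked up as an input. One possible workaround, sketched with empty placeholder files and a made-up folder name, is to list the inputs explicitly and drop the output file:

```r
# Simulate the state after a first run: three inputs plus the merged file.
dir <- file.path(tempdir(), "staplr_rerun")
dir.create(dir, showWarnings = FALSE)
output_file <- file.path(dir, "Full_pdf.pdf")
file.create(file.path(dir, c("plot1.pdf", "plot2.pdf", "plot3.pdf")), output_file)

# Collect every PDF in the directory except the output file.
all_pdfs <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE)
input_files <- all_pdfs[basename(all_pdfs) != basename(output_file)]
basename(input_files)

# With real PDFs the filtered vector could then be passed on, e.g.
# staple_pdf(input_files = input_files, output_filepath = output_file)
```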

pdftk: not found

My understanding from the README is that if I have JAVA installed (JDK and JRE), then staplr contains pdftk-java and I do not need to separately install pdftk.

But, when I run the README example, I get "sh: 1: pdftk: not found":

> pdfFile = system.file('testForm.pdf',package = 'staplr')
> pdfFile
[1] "/home/kfowler/R/x86_64-pc-linux-gnu-library/3.6/staplr/testForm.pdf"
> get_fields(pdfFile)
sh: 1: pdftk: not found
Error in file(con, "r") : cannot open the connection
In addition: Warning messages:
1: In system(system_command) : error in running command
2: In file(con, "r") :
  cannot open file '/tmp/RtmpZQhoUu/file165a71ea13f0b': No such file or directory

I am using staplr_2.9.0, with R 3.6.3, on a ubuntu 20.04 system.
I ran javareconf, and restarted R before attempting the example above.

$ sudo R CMD javareconf
Java interpreter : /usr/lib/jvm/default-java/bin/java
Java version     : 11.0.7
Java home path   : /usr/lib/jvm/default-java
Java compiler    : /usr/lib/jvm/default-java/bin/javac
Java headers gen.: /usr/bin/javah
Java archive tool: /usr/lib/jvm/default-java/bin/jar

trying to compile and link a JNI program 
detected JNI cpp flags    : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
detected JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -I/usr/lib/jvm/default-java/include -I/usr/lib/jvm/default-java/include/linux    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c conftest.c -o conftest.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o conftest.so conftest.o -L/usr/lib/jvm/default-java/lib/server -ljvm -L/usr/lib/R/lib -lR


JAVA_HOME        : /usr/lib/jvm/default-java
Java library path: $(JAVA_HOME)/lib/server
JNI cpp flags    : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
Updating Java configuration in /usr/lib/R
Done.
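The README states that pdftk-java is bundled as of version 2.11.0, while this report uses 2.9.0. A quick way to compare versions is sketched below; `bundles_pdftk_java` is a hypothetical helper. Note that comparing the raw strings would be misleading, since "2.9.0" sorts after "2.11.0" as text.

```r
# TRUE when a version is at least 2.11.0, the first release that the
# README describes as shipping pdftk-java.
bundles_pdftk_java <- function(v) {
  package_version(v) >= package_version("2.11.0")
}

bundles_pdftk_java("2.9.0")   # FALSE
bundles_pdftk_java("2.11.0")  # TRUE

# For the installed package (only when staplr is present):
if (requireNamespace("staplr", quietly = TRUE)) {
  print(utils::packageVersion("staplr") >= "2.11.0")
}
```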

pre-cran

@pridiltal If you haven't sent it to CRAN yet, could you wait until the weekend? There is a small change I want to make regarding encoding problems

Rstudio crashes on loading library(staplr)

I wasn't able to load staplr successfully in Rstudio. It would crash. I then tried running the R script I had written in Rgui and everything worked fine. My issue is resolved because of that, but I wanted to let you know it's crashing in Rstudio. Also in case anyone else comes across this issue you can try to run the script outside of Rstudio.

rotate_pdf doesn't seem to work

I ran the following code, but no file is being created or replaced.

library(staplr)
rotate_pdf(page_rotation = 90, input_filepath = "test.pdf", output_filepath = "success.pdf", overwrite = TRUE)

update:
in case it was a coding error, I also created a question in stackoverflow

get_fields() Error in XML::htmlParse(fields, asText = TRUE, encoding = "UTF-8") : empty or no content specified

I am trying to get the fields of the attached pdf

subset_1_part1.pdf

using:

get_fields("path/to/pdf/example.pdf")

the following error is generated:

Error in XML::htmlParse(fields, asText = TRUE, encoding = "UTF-8") : 
  empty or no content specified

files with dots in field names cannot be processed

I have loaded a sample PDF that is already filled in, and using get_fields produces the full list of fields. However, when I use set_fields to create a copy of the filled PDF, certain fields are blank, and some values have been written to the wrong field.

Cran removal

@pridiltal, seems like the package was removed from cran, likely because of the Java version issue. The current version should be fine. Mind submitting it?

R 4.0.0 possible issue

I was having problems with staplr::staple_pdf crashing my session. When I changed R from 4.0.0 to 3.6.2, it worked fine again. Just thought you should know.

get_fields returns an empty list on Ubuntu 18.04

get_fields seems to always return an empty list on Ubuntu 18.04.

> library(staplr)
> pdfFile = system.file('testForm.pdf',package = 'staplr')
> fields = get_fields(pdfFile)
Warning message:
In get_fields(pdfFile) :
  some fields seems to include plain text UTF-8. Setting convert_field_names = TRUE might help. These fields have problematic names: 
 weird #C3#91 characters, #E2#80#93weird#E2#80#93 dash, #E2#80#94weird#E2#80#94 long dash, #C2#BD #C2#BE #E2#86#92 #E2#80#98 #E2#80#99 #E2#80#9D #E2#80#9C #E2#80#A2 characters #E2#86#92nospace
> fields
named list()

I tried this with both the CRAN and github versions of staplr. I'm using pdftk from the 18.10 repository, as suggested at the askubuntu link in staplr's readme.

I finally tracked this down to the fdfAnnotate function. It assumes that values come before names in the FDF file. But in my FDF file the names come first, then values:

%FDF-1.2
%\E2\E3\CF\D3
1 0 obj 
<<
/FDF 
<<
/Fields [
<<
/T (#E2#80#94weird#E2#80#94 long dash)
/V ()
>> 
<<
/T (#C2#BD #C2#BE #E2#86#92 #E2#80#98 #E2#80#99 #E2#80#9D #E2#80#9C #E2#80#A2 characters #E2#86#92nospace)
/V ()
>> 
<<
/T (RadioGroup)
/V /Off
>> 
<<
/T (TextField2)
/V ()
>> 
<<
/T (TextField1)
/V ()
>> 
<<
/T (#E2#80#93weird#E2#80#93 dash)
/V ()
>> 
<<
/T (&weird& and)
/V ()
>> 
<<
/T (weird #C3#91 characters)
/V ()
>> 
<<
/T ({weird} things)
/V ()
>> 
<<
/T (\(weird\) paranthesis)
/V ()
>> 
<<
/T (InterstingChar2)
/V (\FE\FF\00\BD\00 \00\BE\00 !\92\00  �\00  �\00  �\00  �\00  ")
>> 
<<
/T (TextFieldPage2)
/V ()
>> 
<<
/T (InterstingChar1)
/V (\D1, \F1, \C9, \CD, \D3)
>> 
<<
/T (TextFieldPage3)
/V ()
>> 
<<
/T (hierarchy2)
/Kids [
<<
/T (child2)
/Kids [
<<
/T (node2)
/V ()
>> 
<<
/T (node1)
/V ()
>>]
>> 
<<
/T (child)
/Kids [
<<
/T (node2)
/V ()
>> 
<<
/T (node3)
/V ()
>> 
<<
/T (node1)
/V ()
>>]
>>]
>> 
<<
/T (hierarchy)
/Kids [
<<
/T (node4)
/V ()
>> 
<<
/T (node2)
/V ()
>> 
<<
/T (node3)
/V ()
>> 
<<
/T (node1)
/V ()
>>]
>> 
<<
/T ([weird] brackets)
/V ()
>> 
<<
/T (node1)
/V ()
>> 
<<
/T (InterstingChar3)
/V (\FE\FF\00t\00h\00e\00r\00e\00 \00b\00e\00 \00e\00m\00o\00j\00i\00:\00 \D8=\DC�\00 \00a\00n\00d\00 \00s\00o\00m\00e\00 \00w\00i\00s\00e\00 \00m\00o\00n\00k\00e\00y\00 \D8=\DEH\D8=\DEI\D8=\DEJ)
>> 
<<
/T (betweenHierarch)
/V ()
>> 
<<
/T (TextFieldForMoreWeirdChars)
/V ()
>> 
<<
/T ("weird" quotes)
/V ()
>> 
<<
/T (checkBox)
/V /Off
>> 
<<
/T (<weird> tags)
/V ()
>> 
<<
/T (List Box)
/V ()
>>]
>>
>>
endobj 
trailer

<<
/Root 1 0 R
>>
%%EOF
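To illustrate the ordering problem, here is a minimal order-independent reader for flat /T and /V pairs like those in the file above. It is a sketch, not the package's fdfAnnotate: `parse_flat_fdf` is a hypothetical name, /Kids hierarchies are ignored, and values are kept as raw text.

```r
# Collect name/value pairs per dictionary, whichever comes first.
parse_flat_fdf <- function(lines) {
  out <- list()
  name <- NULL
  value <- NULL
  for (line in trimws(lines)) {
    if (startsWith(line, "/T ")) {
      name <- sub("^/T \\((.*)\\)$", "\\1", line)   # strip "/T (" and ")"
    } else if (startsWith(line, "/V ")) {
      value <- sub("^/V ", "", line)                # keep the raw value text
    } else if (startsWith(line, ">>")) {
      # End of a dictionary: store the pair if both parts were seen.
      if (!is.null(name) && !is.null(value)) out[[name]] <- value
      name <- NULL
      value <- NULL
    }
  }
  out
}

fdf_lines <- c("<<", "/T (TextField1)", "/V ()", ">>",
               "<<", "/V /Off", "/T (checkBox)", ">>")
parsed <- parse_flat_fdf(fdf_lines)
parsed
```

Both entries are recovered even though the first lists the name before the value and the second lists the value first.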

`staplr` hex stickers

staplr checked in to useR! 2018 in Brisbane, Australia

@oganm @padpadpadpad @mathesong

get_fields with multi-line value

Hello, when using get_fields on a TextField with a multi-line value, only characters before the first line break are returned.

> in_fields <- get_fields("in.pdf")
> in_fields$multilinetext
$type
[1] "Text"

$name
[1] "multilinetext"

$value
[1] "Hello"

> in_fields$multilinetext$value <- paste0(c("Hello", "World"),
+                                             collapse = '\n')
> in_fields$multilinetext$value
[1] "Hello\nWorld"
> set_fields("in.pdf", "out.pdf", in_fields)
> get_fields("out.pdf")$multilinetext$value
[1] "Hello"

The pdf renders correctly with the full field value (whether manually entered/saved or with set_fields). Still trying to figure out whether the loss is happening in pdftk or get_fields.

Has anyone else hit this limitation? Any ideas for a workaround?

Thank you for a very useful package!

fields filled with set_fields sometimes is not visible initially (a minor edit to the field makes the entire text visible)

As revealed by #17, pdftk seems to be having problems with writing foreign language characters.

Problems are:

dump_data_fields's output encodes special characters in numeric character reference decimals. Stackoverflow told me that I can remedy this fairly easily, though it is somewhat hacky.
generate_fdf's output encodes things correctly but using readLines causes them to be read in hex representations. This can be fixed by changing encoding but I am unsure if it'll happen in all systems. I don't have a good way of testing things in different language devices. This could be fixed by changing the encoding while using readLines or writing a similar function to convert hex values into proper characters after using readLines

This is the 4th piece of code I wrote that was later revealed to be secretly racist. Need to have a decent way to test things in different localisations. Not sure what the best practices are

pdftk doesn't seem to understand ~ as home
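A workaround on the R side is to expand the tilde before the path reaches pdftk, using base R's path.expand(). The example path is the one from the get_fields() report above.

```r
# Expand "~" to the user's home directory before calling staplr.
pdf_path <- path.expand("~/temp/Test 2.pdf")
pdf_path
```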

Header / Footer

Can we use your package to add header / footer to merged pdf file?

get_fields leads to NAs and empty strings

I would like to extract data from pdfs similar to testForm.pdf
With testForm.pdf everything works fine.
When I try it with other pdfs (I tested two types) it leads to NAs in checkboxes and empty strings in text fields.
This is my R version:

platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.2
year 2018
month 12
day 20
svn rev 75870
language R
version.string R version 3.5.2 (2018-12-20)
nickname Eggshell Igloo

I just installed the GitHub version of staplr, and it does not work.
Do you have any ideas?
Thank you!

Warning for pdftk installation

The Mac installation of pdftk from their website does not work and has not been updated for a long time (see this SO conversation). This causes staple_pdf() to hang infinitely.

In the above conversation they give a useable link for the installation of pdftk on Mac that might be worth mentioning in the README. Not really your fault at all but nice to mention. Again I could do the PR for this.

After I installed using the download of that link it worked fine on MacOS High Sierra 10.13.4.

Cheers
Dan

Check which field fills which field on PDF

Hello.
I've been using this awesome library for filling PDFs, but I had a terrible time guessing which field was which. I wrote the following script to create a PDF in which every field is filled with its own name, so you can see which name the library assigned to each blank.
Hope it comes useful for all of you. Thanks to @oganm for this super tool!

# FILL ALL FEATURES WITH FEATURES NAMES ON NEW PDF
library(dplyr)
library(staplr)
buttons <- names(fields)[sapply(seq_along(fields), function(x){"Button" %in% fields[[x]]})]
all <- data.frame(names = names(fields)) %>% 
  mutate(buttons = ifelse(names %in% buttons, TRUE, FALSE))
for (i in 1:nrow(all)) {
  if (all$buttons[i] == TRUE) {
    fields[[all$names[i]]]$value <- 1
  } else {
    fields[[all$names[i]]]$value <- all$names[i] 
  }
}
set_fields(pdfFile,"filled_fields.pdf", fields)
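The same idea can be written without dplyr. This is a sketch assuming `fields` has the shape returned by get_fields(); the list below is a made-up stand-in, `label_fields` is a hypothetical helper, and buttons are left untouched here because valid button states differ between forms.

```r
# Put each non-button field's own name into its value.
label_fields <- function(fields) {
  for (nm in names(fields)) {
    if (!identical(fields[[nm]]$type, "Button")) {
      fields[[nm]]$value <- nm
    }
  }
  fields
}

# Stand-in for get_fields() output.
fields <- list(
  TextField1 = list(type = "Text",   name = "TextField1", value = ""),
  checkBox   = list(type = "Button", name = "checkBox",   value = "Off")
)
labelled <- label_fields(fields)
labelled$TextField1$value
```

The labelled list can then be passed to set_fields() as in the script above.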

Field names with apostrophes do not show up in fields with get_fields()

If a field name has an apostrophe (e.g., "Employee's birthday") the field will simply not show up when you use get_fields().

CRAN update

It seems like a reasonable time has passed since the initial upload to CRAN. Since there are backwards incompatible changes in this version, it might make sense to have it out in the earliest opportunity. Are there things left to do before the next CRAN release?

Edit: I have also been holding out on adding staplr to shinyapps-package-dependencies because I'm lazy and didn't want to go back to fix the tests after the update but will do so after the release

Cran removal

Seems like the package is gone.

split_pdf on Windows PC

I have an issue migrating my code from MacOS to Windows devices.
The following code snippet works just fine on macOS but hangs after the first iteration on Windows:

include
path <- list.files(path = paste0(getwd(),"/raw"),
                   pattern = "*.pdf",
                   recursive = TRUE,
                   full.names =  TRUE)
file <- list.files(path = paste0(getwd(),"/raw"),
                   pattern = "*.pdf",
                   recursive = TRUE)

for(i in 1:length(file)){
  file.path <- str_extract(file[i],".*(?=(KV[0-9]+\\.pdf))")
  dir.create(file.path(paste0(getwd(),"/output/",file.path)), showWarnings = FALSE)
  split_pdf(input_filepath = path[i],
            output_directory = file.path,
            prefix = paste0("p","_"))
}

I installed pdftk on both devices and they work fine, so I guess it's an issue regarding staplr.

Any suggestions to quickly solve the issue?
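One thing worth checking, offered as a guess rather than a confirmed diagnosis: the loop creates the directory under getwd()/output but passes only the relative fragment to output_directory. A sketch that builds the full directory once and reuses it; `out_dir_for` is a hypothetical helper and the example path is made up.

```r
# Output directory for one input file, given its path relative to "raw".
out_dir_for <- function(rel_path, root = file.path(getwd(), "output")) {
  prefix <- sub("KV[0-9]+\\.pdf$", "", rel_path)  # same idea as the str_extract() call
  sub("/+$", "", file.path(root, prefix))         # drop any trailing slash
}

out_dir <- out_dir_for("batch1/KV12.pdf", root = file.path(tempdir(), "output"))
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
dir.exists(out_dir)

# The same full path would then be passed on inside the loop, e.g.
# split_pdf(input_filepath = path[i], output_directory = out_dir, prefix = "p_")
```

Separately, list.files() treats `pattern` as a regular expression, so "\\.pdf$" is a safer choice than the glob-style "*.pdf".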
