
qpdf's Introduction

qpdf

Split, Combine and Compress PDF files

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Content-preserving transformations of PDF files such as split, combine, and compress. This package interfaces directly to the 'qpdf' C++ API and does not require any command line utilities. Note that 'qpdf' does not read actual content from PDF files: to extract text and data you need the 'pdftools' package.

Hello World

All functions take one or more input PDF files and write one or more output PDF files.

library(qpdf)
pdf_compress("~/Downloads/v71i02.pdf")
[1] "/Users/jeroen/Downloads/v71i02_output.pdf"
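A slightly fuller sketch of the other operations, assuming the exported functions pdf_length(), pdf_split(), pdf_subset() and pdf_combine() behave as in current CRAN releases (each returns the path(s) of the file(s) it wrote). To keep it self-contained, the input is a throwaway 3-page PDF drawn with base R graphics; the temporary file names are arbitrary.

```r
library(qpdf)

# Create a throwaway 3-page PDF with base R graphics (one plot per page)
input <- file.path(tempdir(), "example.pdf")
grDevices::pdf(input)
for (i in 1:3) plot(i, main = paste("Page", i))
invisible(grDevices::dev.off())

# Number of pages in the input
n <- pdf_length(input)

# Split into one file per page; `output` is used as the file-name prefix
pages <- pdf_split(input, output = file.path(tempdir(), "page"))

# Keep only pages 1 and 3 in a new file
odd <- pdf_subset(input, pages = c(1, 3),
                  output = file.path(tempdir(), "odd.pdf"))

# Stitch the single-page files back together in reverse order
combined <- pdf_combine(rev(pages),
                        output = file.path(tempdir(), "reversed.pdf"))
```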

qpdf's People

Contributors

jeroen, mmahmoudian, raymondben, tylfin

qpdf's Issues

Error: too many open files

qpdf::pdf_combine usually works fine for a few PDFs, but when I try to combine 500 PDFs I get the error: Too many open files.

I found that this error is related to qpdf keeping the input files open during the process: https://qpdf.readthedocs.io/en/stable/cli.html. The qpdf command-line tool offers a solution with --keep-files-open=[y|n], but I think this option is not exposed in the R package.

Could pdf_combine be modified so that this works?
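One workaround that does not need --keep-files-open is to merge in rounds, so that no single pdf_combine() call has more than a fixed number of inputs open. This is a sketch, assuming pdf_combine() holds every input open until the output is written (which is what the error suggests). The helper name combine_in_batches() is hypothetical, and batch_size = 100 is an arbitrary choice that should stay well below the operating system's open-file limit (ulimit -n on Unix-like systems).

```r
library(qpdf)

# Hypothetical helper (not part of qpdf): merge many PDFs in rounds so that
# no single pdf_combine() call receives more than `batch_size` inputs.
combine_in_batches <- function(files, output, batch_size = 100) {
  stopifnot(length(files) > 0, batch_size >= 2)
  while (length(files) > batch_size) {
    # Consecutive groups of at most `batch_size` files; order is preserved
    groups <- split(files, ceiling(seq_along(files) / batch_size))
    # Merge each group into a temporary intermediate PDF
    files <- vapply(groups, function(batch) {
      pdf_combine(batch, output = tempfile(fileext = ".pdf"))
    }, character(1), USE.NAMES = FALSE)
  }
  pdf_combine(files, output = output)
}

# Example: 500 input files merged 100 at a time
# combine_in_batches(pdf_files, "all.pdf", batch_size = 100)
```

Each round shrinks the list to ceiling(n / batch_size) intermediates, so 500 inputs with batch_size = 100 need one extra round of 5 temporary files.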

Feature Request: Integration of Pages from Multiple PDFs

I have a relatively common use case where I need to modify .pdf files by combining parts of multiple .pdf files. For example, I may want pages 1-2 from input1.pdf, then page 3 from input2.pdf, then page 3 from input1.pdf. According to section 7.8 of the qpdf manual (http://qpdf.sourceforge.net/files/qpdf-manual.pdf), qpdf has a function to do this.

I think this could be implemented by giving pdf_combine() a pages argument: a list of page-number vectors, one per input file.

The same result can be achieved with a combination of pdf_split() and pdf_combine(), but the output .pdf file is less efficient because of PDF object duplication, and the approach creates intermediate files that would otherwise be unnecessary.
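Until such an argument exists, the workaround can use pdf_subset() instead of pdf_split(), so only the needed pages are extracted. This is a sketch: the helper assemble_pages() and its list-of-(file, pages) input format are hypothetical, not a qpdf API, and it still writes the temporary intermediate files the issue would like to avoid.

```r
library(qpdf)

# Hypothetical helper: build one PDF from (file, pages) pieces, in order.
# Each piece is extracted with pdf_subset() into a temporary file, and the
# pieces are then joined with pdf_combine().
assemble_pages <- function(pieces, output) {
  parts <- vapply(pieces, function(p) {
    pdf_subset(p$file, pages = p$pages, output = tempfile(fileext = ".pdf"))
  }, character(1))
  pdf_combine(parts, output = output)
}

# Pages 1-2 of input1.pdf, page 3 of input2.pdf, then page 3 of input1.pdf
pieces <- list(
  list(file = "input1.pdf", pages = 1:2),
  list(file = "input2.pdf", pages = 3),
  list(file = "input1.pdf", pages = 3)
)
# assemble_pages(pieces, "output.pdf")
```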

Bookmarks

Hi,
When I join a few PDF files using the pdf_combine function, their bookmarks are removed.
Please advise.
Thanks

Error: ld: library not found for -ljpeg

Hi, I am trying to install by compiling from source on R 4.0.3 on macOS Big Sur 11.2 and get this error:

ld: library not found for -ljpeg
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [qpdf.so] Error 1
ERROR: compilation failed for package ‘qpdf’

I have libjpeg 9d installed via Homebrew. I tried reinstalling it and restarting R without success; the error persists. Any suggestions? Thanks

Feature Request: pdf_split() appends too many zeros to file names

Hi,

When I use pdf_split to split my PDF file into separate pages, it pads the page number in each file name to as many digits as the PDF has pages. For example, if my PDF has 16 pages, each file name gets a 16-digit number. This works fine for small PDFs, but my PDF has 25k pages, so pdf_split tries to build file names containing 25k digits, which breaks my code and no separate files are created. Could you please change this, either by removing the extra zeros or by allowing a custom file name?
I appreciate all your efforts.
Thanks :)

Feature Request

Hi,

Could you add a version of pdf_split that splits a page down the middle into two separate files? Papers written in two columns have always been a pet peeve of mine, and I tried to implement such a function myself without much success.

jpeglib.h not found

You may need to declare the jpeglib.h headers in SystemRequirements:

In file included from libqpdf/Pl_DCT.cc:1:0:
include/qpdf/Pl_DCT.hh:27:21: fatal error: jpeglib.h: No such file or directory
 #include <jpeglib.h>
                     ^

As in cburgmer/csscritic#69, compiling probably requires libjpeg-dev, so the build may need to check for it.

Question: overlaying OCR'd text in package scope?

Hello,

Thanks so much for pdftools, qpdf, and all the other ropensci packages!

I recently received scanned PDFs and needed to make them searchable. The OCRmyPDF library accomplishes that by running OCR with Tesseract and then adding an invisible text layer over the base raster layer.

It appears that OCRmyPDF uses pikepdf as its primary PDF manipulation tool, and pikepdf is built on QPDF.

I'm not sure whether making PDFs searchable is common enough to be worth building, but if it is, would it be in scope for this package, or would it belong somewhere else?

Best,
Trey

EDIT:
Tesseract has a text-only PDF output option that may allow using qpdf's overlay function to create the searchable text layer. Discussion at Tesseract issue 660.

Apparently OCRmyPDF uses that Tesseract output to create the overlaid PDF page. I can't quite figure out if the "sandwich renderer" is a name they came up with or an actual external tool.

WARNING qpdf is needed for checks on size reduction of PDFs

During package building/checking there is the following warning if qpdf is not installed:
WARNING qpdf is needed for checks on size reduction of PDFs
I tried installing the qpdf R package, but that did not help. I then found out that the warning has nothing to do with this R package: it refers to the qpdf command-line utility, which must be available on the PATH. I suppose other people might be confused by this as well, so maybe it would help to add a note to this repo's README or to the DESCRIPTION? In any case, I figured it wouldn't hurt to have this issue here to explain what's going on.
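A quick way to see whether the command-line tool the warning refers to is reachable from R. This is a sketch: the helper name has_qpdf_cli() is hypothetical, and it assumes R CMD check locates the tool the same way tools::compactPDF() does by default, via the R_QPDF environment variable with "qpdf" as the fallback.

```r
# Hypothetical helper: is the qpdf *command-line tool* (not the R package)
# available? tools::compactPDF() defaults to Sys.getenv("R_QPDF", "qpdf").
has_qpdf_cli <- function() {
  qpdf_cmd <- Sys.getenv("R_QPDF", "qpdf")
  # Sys.which() returns "" when the command cannot be found on the PATH
  unname(nzchar(Sys.which(qpdf_cmd)))
}

if (has_qpdf_cli()) {
  message("qpdf command-line tool found: ",
          Sys.which(Sys.getenv("R_QPDF", "qpdf")))
} else {
  message("qpdf command-line tool not found; install it system-wide or ",
          "set R_QPDF to its full path")
}
```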

Expose overlay feature

qpdf has a nice feature to overlay one page over another. I'm looking for this package's equivalent of

system2("qpdf", args = c("--overlay", overlay_file_path, "--to=1", "--", base_file_path, output_file_path))

I'd like to contribute this feature, but I have no experience with Rcpp, so frankly I'm a bit lost.

Error when installing qpdf in Windows 10

Hello,
I get an error when installing the qpdf package from either CRAN or GitHub.

libqpdf/QPDFXRefEntry.o libqpdf/QTC.o libqpdf/QUtil.o libqpdf/RC4.o libqpdf/rijndael.o libqpdf/SecureRandomDataProvider.o libqpdf/sha2.o libqpdf/sha2big.o
C:/rtools40/mingw32/bin/g++ -shared -s -static-libgcc -o qpdf.dll tmp.def RcppExports.o bindings.o -Llibqpdf -lstatqpdf -ljpeg -lz -LC:/PROGRA~1/R/R-40~1.3/bin/i386 -lR
C:/rtools40/mingw32/bin/../lib/gcc/i686-w64-mingw32/8.3.0/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -ljpeg
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'qpdf'
* removing 'C:/Users/rave1/Documents/R/win-library/4.0/qpdf'

I suspect I am missing a library or application on my system.
Could you please advise?
Attached are my current system configuration and the detailed logs of an installation attempt from GitHub:
qpdf-logs-system-config.txt

Compatibility with qpdf 11

qpdf 11 appears to introduce a few breaking changes that cause build errors when compiling:

g++ -std=gnu++14 -I"/usr/include/R" -DNDEBUG -I/usr/include/p11-kit-1  -I'/usr/lib64/R/library/Rcpp/include' -I/usr/local/include   -fpic  -O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  -c bindings.cpp -o bindings.o
In file included from /usr/include/qpdf/Buffer.hh:26,
                 from /usr/include/qpdf/QPDF.hh:37,
                 from bindings.cpp:1:
/usr/include/qpdf/PointerHolder.hh:31:3: warning: #warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh" [-Wcpp]
   31 | # warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh"
      |   ^~~~~~~
bindings.cpp: In function 'QPDF read_pdf_with_password(const char*, const char*)':
bindings.cpp:27:10: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   27 |   return pdf;
      |          ^~~
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp: In function 'int cpp_pdf_length(const char*, const char*)':
bindings.cpp:32:53: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   32 |   QPDF pdf = read_pdf_with_password(infile, password);
      |                                                     ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_split(const char*, std::string, const char*)':
bindings.cpp:41:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   41 |   QPDF inpdf = read_pdf_with_password(infile, password);
      |                                                       ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp:44:21: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<QPDFPageObjectHelper>::size_type' {aka 'long unsigned int'} [-Wsign-compare]
   44 |   for (int i = 0; i < pages.size(); i++) {
      |                   ~~^~~~~~~~~~~~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_select(const char*, const char*, Rcpp::IntegerVector, const char*)':
bindings.cpp:61:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   61 |   QPDF inpdf = read_pdf_with_password(infile, password);
      |                                                       ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_combine(Rcpp::CharacterVector, const char*, const char*)':
bindings.cpp:82:64: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   82 |     QPDF inpdf = read_pdf_with_password(infiles.at(i), password);
      |                                                                ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp:84:23: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<QPDFPageObjectHelper>::size_type' {aka 'long unsigned int'} [-Wsign-compare]
   84 |     for (int i = 0; i < pages.size(); i++) {
      |                     ~~^~~~~~~~~~~~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_compress(const char*, const char*, bool, const char*)':
bindings.cpp:98:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
   98 |   QPDF inpdf = read_pdf_with_password(infile, password);
      |                                                       ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_rotate_pages(const char*, const char*, Rcpp::IntegerVector, int, bool, const char*)':
bindings.cpp:111:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
  111 |   QPDF inpdf = read_pdf_with_password(infile, password);
      |                                                       ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
  941 |     QPDF(QPDF const&) = delete;
      |     ^~~~

pdf_split creates file names with excessive length

When pdf_split is used on a file with n pages, it numbers the resulting files using an n-digit number.
So a 20-page PDF file gets a name with a 20-digit number on the end. I have to split many files
with a large number of pages, and any file with more than 260 pages gets file names longer than
260 characters. On a Windows machine, this causes the program to error out and fail.

A unique file number for an n-page file should only
require ceiling(log10(n))+1 digits, not n digits.

I am using qpdf Version: 1.1 under R version 4.1.3.
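A base-R sketch of the naming the issue asks for: the zero-padding width is the number of decimal digits in n, which is exactly floor(log10(n)) + 1 (a 20-page file needs 2 digits), so it never exceeds the ceiling(log10(n)) + 1 bound above. The function page_file_names() and the prefix_NN.pdf scheme are hypothetical illustrations, not qpdf's actual code.

```r
# Hypothetical helper: file names for pages 1..n, zero-padded to the number
# of digits in n rather than to n digits.
page_file_names <- function(prefix, n) {
  # as.integer() avoids scientific notation, e.g. 1e5 -> "100000"
  width <- nchar(as.character(as.integer(n)))
  sprintf("%s_%s.pdf", prefix, formatC(seq_len(n), width = width, flag = "0"))
}

head(page_file_names("report", 12), 3)
```

With 25,000 pages this gives names such as report_00001.pdf, 16 characters instead of more than 25,000.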
