ropensci / qpdf
Split, Combine and Compress PDF files
Home Page: https://docs.ropensci.org/qpdf
License: Other
Hi,
When I join a few PDF files using the pdf_combine function, the bookmarks are removed.
Please advise.
Thanks
qpdf::pdf_combine usually works fine for a few PDFs, but when I try to combine 500 PDFs I get the error: Too many open files.
I found that this error is related to the fact that qpdf keeps the input files open during the process: https://qpdf.readthedocs.io/en/stable/cli.html. The qpdf command-line tool offers a solution with --keep-files-open=[y|n]. However, I think this option is not available in the R package.
Could I modify pdf_combine so that it works?
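One workaround that stays within the current API (a suggestion, not something the package documents) is to combine the inputs in batches so that only a bounded number of files is open at once: pass each batch to pdf_combine() with its own temporary output file, then combine the temporary files in a final call. The C++ sketch below only shows the batching step; make_batches and the batch size are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `files` into consecutive batches of at most
// `max_open` names, so that no single combine step has more than
// `max_open` input files open at the same time.
std::vector<std::vector<std::string>> make_batches(
    const std::vector<std::string>& files, std::size_t max_open) {
  std::vector<std::vector<std::string>> batches;
  if (max_open == 0) return batches;  // avoid an endless loop on bad input
  for (std::size_t start = 0; start < files.size(); start += max_open) {
    const std::size_t end = std::min(start + max_open, files.size());
    // Copy the names in [start, end) into a new batch.
    batches.emplace_back(files.begin() + static_cast<std::ptrdiff_t>(start),
                         files.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

With 500 inputs and max_open = 100, this yields 5 batches of 100 files, and the final step combines just 5 intermediate PDFs.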
Hi,
When I use pdf_split to split my PDF file into separate pages, it creates file names in which the page number is zero-padded to as many digits as the PDF has pages. For example, if my PDF file has 16 pages, each page number is padded to 16 digits. This works fine for small PDF files, but my PDF file has 25k pages, so it tries to add a 25k-digit number to each file name, which breaks my code, and no separate files are created. Could you please change this so that the extra zeros are removed, or add an option to set a custom file name?
I appreciate all your efforts.
Thanks :)
It appears there are a few breaking changes in qpdf 11, which cause a few build errors when compiling:
g++ -std=gnu++14 -I"/usr/include/R" -DNDEBUG -I/usr/include/p11-kit-1 -I'/usr/lib64/R/library/Rcpp/include' -I/usr/local/include -fpic -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -c bindings.cpp -o bindings.o
In file included from /usr/include/qpdf/Buffer.hh:26,
from /usr/include/qpdf/QPDF.hh:37,
from bindings.cpp:1:
/usr/include/qpdf/PointerHolder.hh:31:3: warning: #warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh" [-Wcpp]
31 | # warning "POINTERHOLDER_TRANSITION is not defined -- see qpdf/PointerHolder.hh"
| ^~~~~~~
bindings.cpp: In function 'QPDF read_pdf_with_password(const char*, const char*)':
bindings.cpp:27:10: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
27 | return pdf;
| ^~~
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp: In function 'int cpp_pdf_length(const char*, const char*)':
bindings.cpp:32:53: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
32 | QPDF pdf = read_pdf_with_password(infile, password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_split(const char*, std::string, const char*)':
bindings.cpp:41:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
41 | QPDF inpdf = read_pdf_with_password(infile, password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp:44:21: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<QPDFPageObjectHelper>::size_type' {aka 'long unsigned int'} [-Wsign-compare]
44 | for (int i = 0; i < pages.size(); i++) {
| ~~^~~~~~~~~~~~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_select(const char*, const char*, Rcpp::IntegerVector, const char*)':
bindings.cpp:61:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
61 | QPDF inpdf = read_pdf_with_password(infile, password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_combine(Rcpp::CharacterVector, const char*, const char*)':
bindings.cpp:82:64: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
82 | QPDF inpdf = read_pdf_with_password(infiles.at(i), password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp:84:23: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<QPDFPageObjectHelper>::size_type' {aka 'long unsigned int'} [-Wsign-compare]
84 | for (int i = 0; i < pages.size(); i++) {
| ~~^~~~~~~~~~~~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_compress(const char*, const char*, bool, const char*)':
bindings.cpp:98:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
98 | QPDF inpdf = read_pdf_with_password(infile, password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
bindings.cpp: In function 'Rcpp::CharacterVector cpp_pdf_rotate_pages(const char*, const char*, Rcpp::IntegerVector, int, bool, const char*)':
bindings.cpp:111:55: error: use of deleted function 'QPDF::QPDF(const QPDF&)'
111 | QPDF inpdf = read_pdf_with_password(infile, password);
| ^
/usr/include/qpdf/QPDF.hh:941:5: note: declared here
941 | QPDF(QPDF const&) = delete;
| ^~~~
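All of the errors above have the same cause: read_pdf_with_password returns a QPDF by value, and the qpdf 11 headers declare QPDF(QPDF const&) = delete (QPDF.hh:941 in the log), so under -std=gnu++14 the compiler cannot copy the object out of the function or into the callers' local variables. A common fix is to allocate the object on the heap and return a smart pointer. The sketch below shows the pattern with a hypothetical stand-in class, FakeQPDF, so that it compiles without libqpdf; it is not the package's actual code. The -Wsign-compare lines are only warnings and would go away with a std::size_t loop index.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for qpdf's QPDF class, used only so this sketch
// compiles without libqpdf. Like QPDF in qpdf 11, it cannot be copied.
class FakeQPDF {
 public:
  FakeQPDF() = default;
  FakeQPDF(const FakeQPDF&) = delete;
  FakeQPDF& operator=(const FakeQPDF&) = delete;

  // Mimics the shape of QPDF::processFile(filename, password).
  void processFile(const char* filename, const char* password) {
    filename_ = filename;
    password_ = password ? password : "";
  }
  const std::string& filename() const { return filename_; }
  bool has_password() const { return !password_.empty(); }

 private:
  std::string filename_;
  std::string password_;
};

static_assert(!std::is_copy_constructible<FakeQPDF>::value,
              "FakeQPDF, like QPDF in qpdf 11, is not copyable");

// Returning FakeQPDF by value would need the deleted copy constructor
// (declaring it also suppresses the implicit move constructor), which is
// the error in the log. Returning a std::unique_ptr only moves the pointer.
std::unique_ptr<FakeQPDF> read_pdf_with_password(const char* infile,
                                                 const char* password) {
  auto pdf = std::make_unique<FakeQPDF>();
  pdf->processFile(infile, password);
  return pdf;
}
```

In bindings.cpp, callers would then write auto inpdf = read_pdf_with_password(infile, password); and pass *inpdf wherever a QPDF& is expected (for example to QPDFPageDocumentHelper). qpdf 11 also appears to provide a QPDF::create() factory returning std::shared_ptr<QPDF>, which could replace std::make_unique here; check the installed headers before relying on it.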
Hello,
I get an error when installing the qpdf package from either CRAN or GitHub.
libqpdf/QPDFXRefEntry.o libqpdf/QTC.o libqpdf/QUtil.o libqpdf/RC4.o libqpdf/rijndael.o libqpdf/SecureRandomDataProvider.o libqpdf/sha2.o libqpdf/sha2big.o
C:/rtools40/mingw32/bin/g++ -shared -s -static-libgcc -o qpdf.dll tmp.def RcppExports.o bindings.o -Llibqpdf -lstatqpdf -ljpeg -lz -LC:/PROGRA~1/R/R-40~1.3/bin/i386 -lR
C:/rtools40/mingw32/bin/../lib/gcc/i686-w64-mingw32/8.3.0/../../../../i686-w64-mingw32/bin/ld.exe: cannot find -ljpeg
collect2.exe: error: ld returned 1 exit status
no DLL was created
ERROR: compilation failed for package 'qpdf'
* removing 'C:/Users/rave1/Documents/R/win-library/4.0/qpdf'
I am afraid I am missing a library or application on my system.
Would you please advise?
Attached are my current system configuration and the logs from my installation attempt from GitHub:
qpdf-logs-system-config.txt
The qpdf command-line tool has a nice feature to overlay one page on another. I'm looking for this package's equivalent of:
system2("qpdf", args = c("--overlay", overlay_file_path, "--to=1", "--", base_file_path, output_file_path))
I'd like to contribute to this issue, but I have no experience with Rcpp, so frankly I'm a bit lost.
Hi,
Awesome package!
Is there a way to remove PDF pages using qpdf::pdf_subset() without losing all the bookmarks?
I have a relatively common use case where I need to modify .pdf files by combining parts of multiple .pdf files. For example, I may want pages 1-2 from input1.pdf, then page 3 from input2.pdf then page 3 from input1.pdf. According to section 7.8 of the qpdf manual (http://qpdf.sourceforge.net/files/qpdf-manual.pdf), there is a function in qpdf to do this.
I think this would be a modification of the pdf_combine() function to take a pages argument that would be a list of vectors.
The same result is possible with a combination of pdf_split() and pdf_combine(), but the output .pdf file is less efficient because of PDF object duplication (and it creates intermediate files that would otherwise be unnecessary).
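For reference, the qpdf command line exposes this page selection through --pages, e.g. qpdf --empty --pages input1.pdf 1-2 input2.pdf 3 input1.pdf 3 -- out.pdf. As a sketch of how the proposed pages argument (a list pairing each input file with a vector of page numbers) could map onto that syntax, the hypothetical C++ helper below builds the argument list. It does not call qpdf, and PageSpec and build_pages_args are illustrative names, not part of the package.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of the proposed `pages` argument:
// one entry per (input file, pages to take from it), in output order.
struct PageSpec {
  std::string file;
  std::vector<int> pages;  // 1-based page numbers; empty means all pages
};

// Join page numbers into a qpdf page range such as "1,2,5".
std::string join_pages(const std::vector<int>& pages) {
  std::string out;
  for (std::size_t i = 0; i < pages.size(); ++i) {
    if (i > 0) out += ',';
    out += std::to_string(pages[i]);
  }
  return out;
}

// Build arguments for: qpdf --empty --pages <file> [<range>] ... -- <output>
std::vector<std::string> build_pages_args(const std::vector<PageSpec>& specs,
                                          const std::string& output) {
  std::vector<std::string> args = {"qpdf", "--empty", "--pages"};
  for (const PageSpec& spec : specs) {
    args.push_back(spec.file);
    // qpdf includes all pages of a file when its range is omitted.
    if (!spec.pages.empty()) args.push_back(join_pages(spec.pages));
  }
  args.push_back("--");
  args.push_back(output);
  return args;
}
```

For the example above, the specs (input1.pdf: 1, 2), (input2.pdf: 3), (input1.pdf: 3) produce qpdf --empty --pages input1.pdf 1,2 input2.pdf 3 input1.pdf 3 -- out.pdf, with no intermediate files.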
When I use pdf_combine(), the pages in the combined file are not all the same size. I would like them to be uniform; I hope you can consider this request.
You may need a SystemRequirements entry for the jpeglib.h headers:
In file included from libqpdf/Pl_DCT.cc:1:0:
include/qpdf/Pl_DCT.hh:27:21: fatal error: jpeglib.h: No such file or directory
#include <jpeglib.h>
^
As per cburgmer/csscritic#69, libjpeg-dev may be needed for compiling, which the package may need to check for.
Hi,
Could you add a version of pdf_split that splits each page down the middle into two separate files? Papers written in two columns have always been a pet peeve of mine, and I tried to implement such a function myself without much success.
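One possible approach (an assumption, not an existing pdf_split option) is to include every page twice and give the first copy a MediaBox/CropBox covering the left half of the original page and the second copy the right half; the page content stays as-is and viewers clip it to the box. The geometry is shown below with a hypothetical Box type; actually setting the boxes on pages would go through qpdf's object API, which is not shown, and the sketch assumes pages without a /Rotate entry.

```cpp
#include <utility>

// Hypothetical PDF rectangle [llx lly urx ury] in user-space units (points).
struct Box {
  double llx, lly, urx, ury;
};

// Split a page box vertically down the middle into left and right halves.
// Assumes an unrotated page, so "left" is the visual left column.
std::pair<Box, Box> split_vertically(const Box& b) {
  const double mid = (b.llx + b.urx) / 2.0;
  Box left{b.llx, b.lly, mid, b.ury};
  Box right{mid, b.lly, b.urx, b.ury};
  return {left, right};
}
```

For a US Letter page [0 0 612 792], the halves are [0 0 306 792] and [306 0 612 792].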
When pdf_split is used on a file with n pages, it numbers the resulting files using an n-digit number. So a 20-page PDF file produces names ending in a 20-digit number. I have to split many files with a large number of pages, and any file with more than 260 pages produces a file name with more than 260 characters. On a Windows machine, this causes the program to error out and fail.
A unique file number for an n-page file should only require floor(log10(n))+1 digits (the number of digits in n), not n digits.
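A minimal C++ sketch of the suggested naming (C++ being the language of the package's bindings.cpp): pad each page number to the number of digits in n, computed with integer division to avoid floating-point rounding in log10. split_file_name and its prefix_NNNNN.pdf pattern are hypothetical and may not match the package's real output names.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Number of decimal digits in n (n >= 1), i.e. floor(log10(n)) + 1,
// computed with integer division to avoid floating-point rounding.
int digit_count(long long n) {
  int digits = 1;
  while (n >= 10) {
    n /= 10;
    ++digits;
  }
  return digits;
}

// Hypothetical naming scheme: name for page `page` of an `npages`-page
// document, zero-padding the page number to the width of `npages`.
std::string split_file_name(const std::string& prefix, long long page,
                            long long npages) {
  std::ostringstream out;
  out << prefix << '_' << std::setw(digit_count(npages)) << std::setfill('0')
      << page << ".pdf";
  return out.str();
}
```

With fixed-width padding the names still sort in page order, but a 25,000-page file needs only 5 digits (e.g. page 7 becomes prefix_00007.pdf) instead of 25,000.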
I am using qpdf Version: 1.1 under R version 4.1.3.
Hi, I am trying to install by compiling from source with R 4.0.3 on Big Sur 11.2 and get this error:
ld: library not found for -ljpeg
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [qpdf.so] Error 1
ERROR: compilation failed for package ‘qpdf’
I have libjpeg 9d installed via Homebrew. I tried reinstalling and restarting R without success; the error persists. Any suggestions? Thanks
Hello,
Thanks so much for pdftools, qpdf, and all the other ropensci packages!
I recently received scanned PDFs and needed to make them searchable. The OCRmyPDF library accomplishes that by running OCR with Tesseract and then adding an invisible text layer over the base raster layer.
It appears that OCRmyPDF uses pikepdf as its primary PDF manipulation tool, and pikepdf is built on QPDF.
I'm not sure whether making PDFs searchable is common enough to warrant building, but if it is, would it be in scope for this package, or would it belong somewhere else?
Best,
Trey
EDIT:
Tesseract has a text-only PDF output option that may allow using qpdf's overlay function to create the searchable text layer. Discussion at Tesseract issue 660.
Apparently OCRmyPDF uses that Tesseract output to create the overlaid PDF page. I can't quite figure out if the "sandwich renderer" is a name they came up with or an actual external tool.
During package building/checking, the following warning appears if qpdf is not installed:
WARNING qpdf is needed for checks on size reduction of PDFs
I tried installing the qpdf R package, but that did not help. I have since found out that the warning has nothing to do with this R package; it refers to the qpdf command-line utility, which must be available on the PATH. I suppose more people might be confused by this as well, so maybe it would help to add a note to this repo's README or DESCRIPTION? In any case, I figured it wouldn't hurt to have this issue here to explain what's going on.