boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...

License: MIT License

combine_pdf's Introduction

CombinePDF - the ruby way for merging PDF files

CombinePDF is a nifty model, written in pure Ruby, to parse PDF files and combine (merge) them with other PDF files, watermark them or stamp them (all using the PDF file format and pure Ruby code).

Unmaintained - Help Wanted(!)

I decided to stop maintaining this gem and hope someone could take over the PR reviews and maintenance of this gem (or simply open a successful fork).

I wrote this gem because I needed to solve an issue with bates-numbering existing PDF documents.

However, since 2014 I have been maintaining the gem for free and for no reason at all, except that I enjoyed sharing it with the community.

I love this gem, but I cannot keep maintaining it as I have my own projects to focus on and I need both the time and (more importantly) the mindspace.

Install

Install with ruby gems:

gem install combine_pdf

Known Limitations

Quick rundown:

When reading PDF Forms, some form data might be lost. I tried fixing this to the best of my ability, but I'm not sure it all works just yet.
When combining PDF Forms, form data might be unified. I couldn't fix this because this is how PDF forms work (filling a field fills in the data in any field with the same name), but frankly, I kinda liked the issue... it's almost a feature.
When unifying the same TOC data more than once, one of the references will be unified with the other (meaning that if the pages look the same, both references will link to the same page instead of linking to two different pages). You can fix this by adding content to the pages before merging the PDF files (i.e. add empty text boxes to all the pages).
Some links and data (URL links and PDF "Named Destinations") are stored at the root of a PDF and they aren't linked back to from the page. Keeping this information requires merging the PDF objects rather than their pages.

Some links will be lost when ripping pages out of PDF files and merging them with another PDF.
Some encrypted PDF files (usually the ones you can't view without a password) will fail quietly instead of noisily. If you prefer to choose the noisy route, you can specify the raise_on_encrypted option using CombinePDF.load(pdf_file, raise_on_encrypted: true) which will raise a CombinePDF::EncryptionError.
Sometimes CombinePDF will raise an exception even if the PDF could be parsed (i.e., when PDF optional content exists)... I find it better to err on the side of caution, although for optional content PDFs an exception is avoidable using CombinePDF.load(pdf_file, allow_optional_content: true).
The CombinePDF gem runs recursive code to both parse and format the PDF files. Hence, PDF files that have heavily nested objects, as well as those that were combined in a way that results in cyclic nesting, might explode the stack - resulting in an exception or program failure.

CombinePDF is written natively in Ruby and should (presumably) work on all Ruby platforms that follow Ruby 2.0 compatibility.

However, PDF files are quite complex creatures and no guarantee is provided.

For example, PDF Forms are known to have issues and form data might be lost when attempting to combine PDFs with filled form data (also, forms are global objects, not page specific, so one should combine the whole of the PDF for any data to have any chance of being preserved).

The same applies to PDF links and the table of contents, which all have global attributes and could be corrupted or lost when combining PDF data.

If this library causes loss of data or burns down your house, I'm not to blame - as pointed to by the MIT license. That being said, I'm using the library happily after testing against different solutions.

Combine/Merge PDF files or Pages

To combine PDF files (or data):

pdf = CombinePDF.new
pdf << CombinePDF.load("file1.pdf") # one way to combine, very fast.
pdf << CombinePDF.load("file2.pdf")
pdf.save "combined.pdf"

Or even a one liner:

(CombinePDF.load("file1.pdf") << CombinePDF.load("file2.pdf") << CombinePDF.load("file3.pdf")).save("combined.pdf")

You can also add just odd or even pages:

pdf = CombinePDF.new
i = 0
CombinePDF.load("file.pdf").pages.each do |page|
  i += 1
  pdf << page if i.even?
end
pdf.save "even_pages.pdf"

Notice that adding all the pages one by one is slower than adding the whole file.
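
The counter-based loop above can also be written with an index-aware select. The following is a minimal sketch: pages_with_parity is a hypothetical helper name (not part of CombinePDF), and it counts pages from 1, as the loop above does.

```ruby
# Hypothetical helper (not part of CombinePDF): keep only the odd or even
# pages, counting from 1 as the loop above does.
def pages_with_parity(pages, parity)
  unless %i[odd even].include?(parity)
    raise ArgumentError, "parity must be :odd or :even"
  end

  pages.select.with_index(1) do |_page, number|
    parity == :even ? number.even? : number.odd?
  end
end
```

With the gem loaded, pages_with_parity(CombinePDF.load("file.pdf").pages, :even).each { |page| pdf << page } should add the same pages as the loop above.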

Add content to existing pages (Stamp / Watermark)

To add content to existing PDF pages, first import the new content from an existing PDF file. After that, add the content to each of the pages in your existing PDF.

In this example, we will add a company logo to each page:

company_logo = CombinePDF.load("company_logo.pdf").pages[0]
pdf = CombinePDF.load "content_file.pdf"
pdf.pages.each {|page| page << company_logo} # notice the << operator is on a page and not a PDF object.
pdf.save "content_with_logo.pdf"

Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.

The << operator defaults to secure injection by renaming references to avoid conflicts. For overlaying pages using compressed data that might not be editable (due to limited filter support), you can use:

pdf.pages(nil, false).each {|page| page << stamp_page}

Page Numbering

Adding page numbers to a PDF object or file is as simple as can be:

pdf = CombinePDF.load "file_to_number.pdf"
pdf.number_pages
pdf.save "file_with_numbering.pdf"

Numbering can be done with many different options, with different formatting, with or without a box object, and even with opacity values - see documentation.

For example, should you prefer to place the page number on the bottom right side of all PDF pages, do:

pdf.number_pages(location: [:bottom_right])

As another example, the dashes around the number are removed and a box is placed around it. The numbering is semi-transparent and the first 3 pages are numbered using letters (a,b,c) rather than numbers:

# number first 3 pages as "a", "b", "c"
pdf.number_pages(number_format: " %s ",
                 location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
                 start_at: "a",
                 page_range: (0..2),
                 box_color: [0.8,0.8,0.8],
                 border_color: [0.4, 0.4, 0.4],
                 border_width: 1,
                 box_radius: 6,
                 opacity: 0.75)
# number the rest of the pages as 4, 5, etc.
pdf.number_pages(number_format: " %s ",
                 location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
                 start_at: 4,
                 page_range: (3..-1),
                 box_color: [0.8,0.8,0.8],
                 border_color: [0.4, 0.4, 0.4],
                 border_width: 1,
                 box_radius: 6,
                 opacity: 0.75)

pdf.number_pages(number_format: " %s ", location: :bottom_right, font_size: 44)
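
The two-pass example above (letters first, then numbers) repeats the same styling options in both calls. As a sketch of one way to avoid the repetition, the shared options can live in a single hash that each pass merges its own start_at and page_range into. BOX_STYLE and number_with_lettered_front are hypothetical names, not part of CombinePDF; the option values are copied from the example above.

```ruby
# Options shared by both numbering passes (values copied from the example above).
BOX_STYLE = {
  number_format: " %s ",
  location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
  box_color: [0.8, 0.8, 0.8],
  border_color: [0.4, 0.4, 0.4],
  border_width: 1,
  box_radius: 6,
  opacity: 0.75
}.freeze

# Hypothetical helper: letters for the first `front` pages, numbers for the rest.
# `pdf` is any object that responds to number_pages (such as a CombinePDF object).
def number_with_lettered_front(pdf, front: 3)
  pdf.number_pages(BOX_STYLE.merge(start_at: "a", page_range: (0..front - 1)))
  pdf.number_pages(BOX_STYLE.merge(start_at: front + 1, page_range: (front..-1)))
  pdf
end
```

Keeping the styling in one place means a change to the box or opacity applies to both passes.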

Loading and Parsing PDF data

Loading PDF data can be done from the file system or directly from memory.

Loading data from a file is easy:

pdf = CombinePDF.load("file.pdf")

You can also parse PDF files from memory. Loading from memory is especially effective for importing PDF data received through the internet or from a different authoring library such as Prawn:

pdf_data = prawn_pdf_document.render # Import PDF data from Prawn
pdf = CombinePDF.parse(pdf_data)

Using parse is also effective when loading data from a remote location, circumventing the need for unnecessary temporary files. For example:

require 'combine_pdf'
require 'net/http'

url = "https://example.com/my.pdf"
pdf = CombinePDF.parse Net::HTTP.get_response(URI.parse(url)).body

Rendering PDF data

Similarly to loading and parsing, rendering can also be performed either to memory or to a file.

You can output a string of PDF data using .to_pdf. For example, to let a user download the PDF from either a Rails application or a Plezi application:

# in a controller action
send_data combined_file.to_pdf, filename: "combined.pdf", type: "application/pdf"

In Sinatra:

# in your path's block
status 200
body combined_file.to_pdf
headers 'content-type' => "application/pdf"

If you prefer to save the PDF data to a file, you can always use the save method as we did in our earlier examples.

Some PDF files contain optional content sections which cannot always be merged reliably. By default, an exception is raised if one of these files is detected. You can optionally pass an allow_optional_content parameter to the PDFParser.new, CombinePDF.load and CombinePDF.parse methods:

new_pdf = CombinePDF.new
new_pdf << CombinePDF.load(pdf_file, allow_optional_content: true)
attachments.each { |att| new_pdf << CombinePDF.load(att, allow_optional_content: true) }

Demo

You can see a Demo for a "Bates stamping web-app" and read through its code. Good luck :)

Decryption & Filters

Some PDF files are encrypted and some are compressed (the use of filters)...

There is very little support for encrypted files and very very basic and limited support for compressed files.

I need help with that.

Comments and file structure

If you want to help with the code, please be aware:

I'm a self-taught hobbyist at heart. The documentation is lacking and the comments in the code are poor guidelines.

The code itself should be very straightforward, but feel free to ask whatever you want.

Credit

Stefan Leitner (@sLe1tner) wrote the outline merging code supporting PDFs which contain a ToC.

Caige Nichols wrote an amazing RC4 gem which I used in my code.

I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.

Credit to his wonderful work is given here. Please respect his license and copyright... and mine.

License

MIT

Contributions

You can look at the GitHub Issues Page and see the "help wanted" tags.

If you're thinking of donations or sending me money - no need. This project can sustain itself without your money.

What this project needs is the time given by caring developers who keep it up to date and fix any documentation errors or issues they notice ... having said that, gifts (such as free coffee or iTunes gift cards) are always fun. But I think there are those in real need that will benefit more from your generosity.

combine_pdf's Issues

Umlaut handling

I want to add textboxes to an existing page and everything is working fine, except German special characters are not output correctly.

page.textbox("ßöäüÖÄÜ", address_options)

becomes

in the pdf file. Is there some sort of workaround?

File parsing fails when the `endobj` keyword is missing

Hi @boazsegev,

Sometimes I get this error when combining two PDF files:
no implicit conversion of Symbol into Integer
.../ruby/2.0.0/gems/combine_pdf-0.1.16/lib/combine_pdf/combine_pdf_pdf.rb:582:in `[]'

Please help me solve it.

undefined method errors after 0.2.18 update

I have a script that combines the latest version of a PDF from several folders in a hierarchy into a single PDF.

Under 0.2.17 it works as expected - under 0.2.18 I get an error when adding the second pdf to the new document.
error:

C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/combine_pdf-0.2.18/lib/combine_pdf/pdf_public.rb:303:in `insert': undefined method `update' for nil:NilClass (NoMethodError)
        from C:/Ruby22-x64/lib/ruby/gems/2.2.0/gems/combine_pdf-0.2.18/lib/combine_pdf/pdf_public.rb:275:in `<<'
        from C:/Users/User02/Documents/combinePDF/combine.rb:62:in `block in <main>'
        from C:/Users/User02/Documents/combinePDF/combine.rb:57:in `each'
        from C:/Users/User02/Documents/combinePDF/combine.rb:57:in `<main>'

ruby code

# file_list is an array of PDF files - something like:
# ["R:/Project DS/C/A1_3_151203_Construction.pdf", "R:/Project DS/B/A1_2_160410_Permit_Revisions.pdf","R:/Project DS/C/A1_3_160412_More_Permit_Revisions.pdf"]

pdf_merge = CombinePDF.new
file_list.each do |pdf_file|
  pdf_merge << CombinePDF.load(pdf_file)    
end

root is unknown - cannot determine if file is Encrypted - only in production

Hello, I'm having some problems reading a PDF. The error is the one reported in the subject. The strange thing is that it works just fine in development, but in production I get this error. Also, reading the file from the console works correctly in both the development and production environments.

Fail to load pdf document after combine files

I am using Rails Prawn to generate two files. Both of them work fine. After I tried to combine them into one file, it gives the error 'Failed to load PDF document'. I am able to open it with Preview or Adobe Acrobat. It gives the error only when I try to display it in the browser. Any ideas?

pdf = CombinePDF.new
pdf << CombinePDF.parse(pdf1.render)
pdf << CombinePDF.parse(pdf2.render)
respond_to do |format|
    format.pdf do
        send_data pdf, :filename => "Receipt.pdf", :type => "application/pdf", :disposition => "inline"
    end
end
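
One thing worth checking here, offered as an assumption rather than a confirmed diagnosis: the snippet passes the CombinePDF object itself to send_data, while the README's Rails example sends the string returned by to_pdf. A minimal sketch of that change follows; inline_pdf_response is a hypothetical helper, not part of Rails or CombinePDF.

```ruby
# Hypothetical helper: builds the two arguments for Rails' send_data from any
# object that responds to to_pdf (such as a CombinePDF object).
def inline_pdf_response(pdf, filename)
  data = pdf.to_pdf # a String of PDF bytes, not the PDF object itself
  [data, { filename: filename, type: "application/pdf", disposition: "inline" }]
end

# In the controller action (assumes Rails):
#   format.pdf { send_data(*inline_pdf_response(pdf, "Receipt.pdf")) }
```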

Merge result might not open on Adobe Reader with error 14

File opens in all other readers, but not in Adobe's. Unfortunately, this is the most popular reader.

I was able to fix files with ghostscript (installed in ubuntu server by default I think) with code like this:

      pdf = CombinePDF.new
      files.each do |file|
        pdf << CombinePDF.new(file)
        File.unlink file
      end
      temp_file_name = Rails.root.join('tmp', Dir::Tmpname.make_tmpname(['fwsrp', '.pdf'], nil))
      pdf.save(temp_file_name)
      `gs -o #{file_name} -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress #{temp_file_name}`    # fix corrupt file output

This is the ghostscript output:

   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.

After this file opens in Adobe Reader, hope it helps to find the issue, or at least as a temporary fix for generated files.
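
The Ghostscript workaround above interpolates file names into a shell string. A hedged variant, assuming gs is on the PATH, passes the same flags as an argument list so that paths containing spaces are not split by the shell. Both method names below are hypothetical.

```ruby
# Builds the argument list for the Ghostscript rewrite used in the workaround
# above (same flags; only the calling convention differs).
def gs_rewrite_command(input_path, output_path)
  ["gs", "-o", output_path.to_s,
   "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress",
   input_path.to_s]
end

# Runs Ghostscript without a shell; raises if gs is missing or exits non-zero.
def rewrite_with_ghostscript(input_path, output_path)
  ok = system(*gs_rewrite_command(input_path, output_path))
  raise "ghostscript failed for #{input_path}" unless ok

  output_path
end
```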

Fix YARD Documentation

RubyGems.org uses YARD documentation that ignores the @private tag.

In order to make the library more accessible for developers and collaborators, the documentation needs to be re-organized.

Some PDF data will be ignored by Acrobat Reader on Mac (and maybe on Windows).

Some PDF objects will be ignored by Acrobat Reader on Mac (and maybe on Windows), so that empty pages or missing data (when stamped) might be seen when using Acrobat Reader to read the PDF data.

The cause is yet unknown.

The issue was discovered with Imanol's help from http://unoycero.com

Problem parsing a certain pdf

Hi, I'm getting an error when trying to load the pdf data for a certain pdf:
https://docs.google.com/file/d/0B4AGXAJrQz1RNE5OZHFTdWIycHc/edit?pli=1

pdf = CombinePDF.new('/home/stefan/Useful_KI_Information.pdf')
didn't find reference {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34}
Couldn't connect all values from references - didn't find reference {:DecodeParms=>{:Columns=>5, :Predictor=>12}, :Filter=>:FlateDecode, :ID=>["\xE9\xE3\xD7l\x19\v\xC9\x11\xB1\xD9\x99yM){\x1F", "'J\xB0\xCCk\xB4\xDFO\xA6\x83V\x9F\x1DM\x13\xB5"], :Index=>[35, 34], :Info=>nil, :Length=>112, :Prev=>253565, :Root=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>36}, :Size=>35, :Type=>:XRef, :W=>[1, 3, 1], :raw_stream_content=>"h\xDEbb\x00\x01&F\x86\x1D\xB5\fL\f\f\x8C\xC7\x81$\xA3\x18\x0F\x98}\eD2\x80E\xA6\xBEG\x88\x80\xD50L\x9A\x0E\"\x99\xD7\x81H&\x7F\x90\x9A\xFD%`\xF6\x15\xB0\x9AV\x10\xC9\xCD\vVs\n,\xD2\x05\"\xF9\x0E\x81\xCD\x04\xEBe\xBC\x0F\xB4\xF7\xAFR\eX\x84\x19L\xB2\x81I\x06Ft\x92\xF9/vqF$q\xA6\xFF`\x11\x06\x80\x00\x03\x00\xBD\xCF\x14\\\r", :indirect_generation_number=>0, :indirect_reference_id=>20}!!!
didn't find reference {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34}
Couldn't connect all values from references - didn't find reference {:DecodeParms=>{:Columns=>5, :Predictor=>12}, :Filter=>:FlateDecode, :ID=>["\xE9\xE3\xD7l\x19\v\xC9\x11\xB1\xD9\x99yM){\x1F", "'J\xB0\xCCk\xB4\xDFO\xA6\x83V\x9F\x1DM\x13\xB5"], :Index=>[35, 34], :Info=>nil, :Length=>112, :Prev=>253565, :Root=>{:Metadata=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>17}, :PageLabels=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>31}, :Pages=>{:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33}, :Type=>:Catalog, :indirect_generation_number=>0, :indirect_reference_id=>36}, :Size=>35, :Type=>:XRef, :W=>[1, 3, 1], :raw_stream_content=>"h\xDEbb\x00\x01&F\x86\x1D\xB5\fL\f\f\x8C\xC7\x81$\xA3\x18\x0F\x98}\eD2\x80E\xA6\xBEG\x88\x80\xD50L\x9A\x0E\"\x99\xD7\x81H&\x7F\x90\x9A\xFD%`\xF6\x15\xB0\x9AV\x10\xC9\xCD\vVs\n,\xD2\x05\"\xF9\x0E\x81\xCD\x04\xEBe\xBC\x0F\xB4\xF7\xAFR\eX\x84\x19L\xB2\x81I\x06Ft\x92\xF9/vqF$q\xA6\xFF`\x11\x06\x80\x00\x03\x00\xBD\xCF\x14\\\r", :indirect_generation_number=>0, :indirect_reference_id=>20}!!!
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>34, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>31, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>55, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>21, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>22, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>33, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>23, :referenced_object=>nil}

Using Ruby 2.1.5. It's working for every PDF I've tried except this one. Also, looking through the issues, it's potentially related to #6.

Some PDF pages cannot be injected with text objects

Some PDFs, specifically certain scanned PDF files (usually those with text recognition), cannot be injected with text objects.

The text injected into those PDF pages (I tried numbering the pages) was not rendered - or, as far as I could tell, was rendered as transparent (render mode 3) even when the mode operator (render modes 1 and 2) was present.

Shapes injected into the same pages were rendered correctly.

Seems as if the previous content streams (the ones after which the content is injected) override the later content streams (the injected data).

Reversing the streams order could not be tested, because the white page - although scanned - isn't transparent, so that the content "behind" it could not be observed.

add font support for text box / numbering feature

The standard 14 PDF fonts have an uncertain future and the recommendation is to embed fonts, even for those 14 standard fonts.

For the future of the numbering feature, we need to enable embedding fonts as part of the PDFWriter or PDF object.

There are a number of issues to think about:

Where would the fonts be embedded?
The natural place would be the PDFWriter object... but... embedding fonts in the PDFWriter object might cause duplication of fonts when importing multiple pages. Hence, the embedding of fonts should be carefully coded, so that the add_referenced method of the PDF object could recognize any duplication.

On this matter, thought should be given to the question of whether the "add_referenced" method should have a better duplication-recognition algorithm than the basic one currently in place. Speed performance should overrule space conservation.

Metrics calculations
The current metrics calculations will be broken with any external fonts.
Adding font support will require adding a new metrics calculation for the "dimensions_of" method currently defined in the metrics_dictionary.rb file.

Unicode support
Some languages are written from right to left or top to bottom (or bottom to top, but that is mostly historical and is quite a non-issue at the current stage).

Limitations of Font support
The font support should, by definition, be limited. Libraries like Prawn provide PDF authoring, whereas the CombinePDF library aims to provide a pdftk alternative.

Hence, it is more important to have pristine support for a limited number of fonts (maybe only TrueType fonts, maybe only OpenType fonts or maybe a closed library of specific Unicode royalty-free fonts) rather than an extensive font library.

It seems to me that having a library of hard-coded fonts might be the best solution for allowing the use of Unicode in the page numbering and text box features.

adding pagenumbers broke PDF generated by convert-jpg-to-pdf.net

Hello,

I have problem with PDF generated by convert-jpg-to-pdf.net (created not by me, but our users),

Example:

require "combine_pdf"
pdf = CombinePDF.load("lena.pdf")
pdf.number_pages
pdf.save("lena-pagenumbers.pdf")

The file lena-pagenumbers.pdf is broken and displays an empty page.

BTW: PDFinfo shows "producer":

pdfinfo lena.pdf|grep Producer
Producer:       PDFlib Lite 7.0.5p3 (PHP5/Linux-x86_64)
pdfinfo lena-pagenumbers.pdf|grep Producer
Producer:       Ruby CombinePDF 0.2.9 Library

How to reproduce:

choose your favourite .jpg (http://www.cs.cmu.edu/~chuck/lennapg/len_std.jpg)
create a PDF via http://www.convert-jpg-to-pdf.net
use CombinePDF to add page numbers (or whatever) and save the result
the result will be an empty page (but not an empty file; the file contains the original PDF file and something else, probably the page numbers).

Documentation incomplete? – overlay with position

Hi,

I'm considering using this gem and will give it a shot when I have some time, but I am not sure whether it suits my needs, as the documentation is overly simplistic.

I need to overlay a smaller PDF over a larger PDF, in a specific location. Is this possible? The samples simply "overlay" a PDF on top of another and that's it.

I suppose a workaround that might work is to ensure the "overlay" PDF is the same size as the "background" PDF, and just have the overlay PDF mostly transparent. Is this the suggested workaround?

Issues with Adobe Acrobat Reader DC

When I open a PDF generated by combine_pdf in Adobe Acrobat Reader DC, there are a couple issues:

If I add a Sticky Note to it and save the file, the Sticky Note shows on every page of the PDF.

On a PC, I get an error from Acrobat: "The document could not be saved. There was a problem reading this document. (23)"

Has anyone else come across this?

Thanks

Null Byte

So I'm pulling down a remote PDF file on Heroku. I want to add a watermark on it. This works perfect locally on Mac, but in production on Heroku it's failing. This is how I'm doing it

def pdf(number)
  require 'open-uri'
  company_logo = CombinePDF.new("logo.pdf").pages[0]
  open("#{number}.pdf", 'wb') do |file|
    file << open("http://patentimages.storage.googleapis.com/pdfs/#{number}.pdf").read
  end

  pdf = CombinePDF.new open("#{number}.pdf").read
  pdf.pages.each {|page| page << company_logo}
  pdf.save "#{number}.pdf"
end
However this line is failing:
pdf = CombinePDF.new contents
With this:
ArgumentError: string contains null byte

Also before the error I see this:
PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects.
Again this works perfectly locally on Mac without any changes in code. What am I missing?

Btw, number in this case is "US5409533"
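
A possible explanation, stated as an assumption: in the snippet above the file contents (not a file name) are passed to CombinePDF.new, and Ruby's file APIs reject strings that contain null bytes. The README documents CombinePDF.load for paths and CombinePDF.parse for in-memory data. The sketch below picks between the two; looks_like_pdf_data? and open_pdf are hypothetical helpers, not part of the gem.

```ruby
# Hypothetical helper: true when the string holds PDF bytes rather than a path.
# PDF files normally begin with a "%PDF-" header, and paths cannot contain
# null bytes.
def looks_like_pdf_data?(input)
  bytes = input.b # binary copy, so invalid UTF-8 cannot raise
  bytes.start_with?("%PDF-") || bytes.include?("\0")
end

# Hypothetical helper: parse in-memory data, load anything else as a path.
def open_pdf(input)
  require 'combine_pdf' # loaded lazily so the check above has no dependencies
  looks_like_pdf_data?(input) ? CombinePDF.parse(input) : CombinePDF.load(input)
end
```

This does not explain why the same code worked locally; a difference in gem versions between the two environments is one thing to rule out.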

page.copy v.s. page.copy(true) issue

I'm not sure if this is a genuine issue or if I'm doing something incorrectly.

I have this code which worked correctly in 0.2.6. It loads a prawn pdf then loads blank.pdf and stamps a different prawn generated pdf onto it and combines them

prawn_pdf = render_to_string "/forms/show.pdf"
pdf = CombinePDF.parse(prawn_pdf)
blank_declaration = CombinePDF.load("#{Rails.root}/public/blank.pdf")
signature_pdf_data = render_to_string "/electronic_signatures/signature.pdf"
signature_pdf = CombinePDF.parse(signature_pdf_data)
blank_declaration.pages.first << signature_pdf.pages.first # error on this line
pdf << blank_declaration

After updating to 0.2.8 I get this error on the line mentioned above

NoMethodError (undefined method `[]' for nil:NilClass):

This seems to relate to the changes mentioned in issue #32

I've reverted to 0.2.6 for now which is working fine anyway.

Cheers for the awesome gem!

Unknown PDF parsing error - maleformed PDF file?

I tried the following code.

combine_pdf = CombinePDF.new
combine_pdf << CombinePDF.parse(pdf_data)

and got the following error:

Warning: parser advnacing for unknown reason. Potential data-loss.
Warning: parser advnacing for unknown reason. Potential data-loss.
RuntimeError: Unknown PDF parsing error - maleformed PDF file?

spec

combine_pdf 0.2.14

I suspect the pdf_data is malformed or not supported yet.

"%PDF-1.4\n1 0 obj\n<<\n/Title (\xFE\xFFbSSpg\rR\xA1)\n/Producer (wkhtmltopdf)\n/CreationDate (D:20160224104241)\n>>\nendobj\n4 0 obj\n<<\n/Type /ExtGState\n/SA true\n/SM 0.02\n/ca 1.0\n/CA 1.0\n/AIS false\n/SMask /None>>\nendobj\n5 0 obj\n[/Pattern /DeviceRGB]\nendobj\n7 0 obj\n<<\n/Type /XObject\n/Subtype /Image\n/Width 14\n/Height 99\n/BitsPerComponent 8\n/ColorSpace /DeviceRGB\n/Length 8 0 R\n/Filter /DCTDecode\n>>\nstream\n\xFF\xD8\xFF\xE0\u0000\u0010JFIF\u0000\u0001\u0001\u0001\u0000`\u0000`\u0000\u0000\xFF\xDB\u0000C\u0000\u0002\u0001\u0001\u0002\u0001\u0001\u0002\u0002\u0002\u0002\u0002\u0002\u0002\u0002\u0003\u0005\u0003\u0003\u0003\u0003\u0003\u0006\u0004\u0004\u0003\u0005\a\u0006\a\a\a\u0006\a\a\b\t\v\t\b\b\n\b\a\a\n\r\n\n\v\f\f\f\f\a\t\u000E\u000F\r\f\u000E\v\f\f\f\xFF\xDB\u0000C\u0001\  (to be continued)"

parsing of PaperPort PDFs fails

Hi,

I have a PDF created by PaperPort 12 (probably http://www.nuance.com/imaging/paperport/paperport-upgrade-to-12.asp, but not sure), and convert_pdf creates a document with empty pages.

How to reproduce:

Download pdf: http://vitas.matfyz.cz/tmp/couldnt-connect-refrence.pdf (contains 2 pages 2 images of scanned papers)
run

require "combine_pdf"
require "pp"

pdf =  CombinePDF.load("./couldnt-connect-refrence.pdf");
pdf.number_pages
pdf.save "out.pdf"

output is:

$ ruby com.rb 
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>8, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>7, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>11, :referenced_object=>nil}
couldn't connect a reference!!! could be a null or removed (empty) object, Silent error!!!
 Object raising issue: {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>10, :referenced_object=>nil}

more info:

$ pdfinfo couldnt-connect-refrence.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        PaperPort 12
Producer:       PaperPort 12
CreationDate:   Fri Nov 13 13:42:58 2015
ModDate:        Fri Nov 13 13:42:58 2015
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      612 x 841.68 pts
Page rot:       0
File size:      466188 bytes
Optimized:      no
PDF version:    1.3

$ pdfinfo out.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        PaperPort 12
Producer:       Ruby CombinePDF 0.2.11 Library
CreationDate:   Wed Nov 18 12:57:23 2015
ModDate:        Wed Nov 18 12:57:23 2015
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      612 x 841.68 pts
Page rot:       0
File size:      240627 bytes
Optimized:      no
PDF version:    1.3

undefined method `[]' for nil:NilClass error on windows


CombinePDF.new('mypdf.pdf')

.../Ruby200/lib/ruby/gems/2.0.0/gems/combine_pdf-0.1.18/lib/combine_pdf/combine_pdf_parser.rb:163:in `_parse_': undefined method `[]' for nil:NilClass (NoMethodError)

Ruby 2.0.0p481 / Win7 / Version 0.1.18

Works fine on ruby 2.2.0p0@debian

I will try to give you more debug info later.

Check for corrupt pdf file

Firstly thanks for the awesome gem!

I have a pdf that returns this error Unknown PDF parsing error - maleformed PDF file? when parsing.

Is it possible to check if a pdf is corrupt and not parse it if that is the case, so not to break the whole request?

I'm using version 0.2.14. The pdf in question is version 1.2.

Cheers!
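
The README above does not describe a separate validity check, so one hedged approach is to attempt the parse and treat an exception as "unreadable". In the sketch below, parse_or_nil and its parser: keyword are hypothetical; the keyword exists only so the rescue logic can be exercised without a real file, and it defaults to CombinePDF.parse. A PDF can also parse without raising and still be damaged, so this is a guard for the request, not a full corruption test.

```ruby
# Hypothetical helper: returns the parsed PDF, or nil when parsing raises.
def parse_or_nil(pdf_data, parser: nil)
  parser ||= begin
    require 'combine_pdf' # loaded lazily; only needed for the default parser
    CombinePDF.method(:parse)
  end
  parser.call(pdf_data)
rescue StandardError => e
  warn "skipping unreadable PDF: #{e.message}"
  nil
end
```

A caller can then skip nil results, for example: combined << parsed if (parsed = parse_or_nil(data)).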

PDF files with more than one Catalog get duplicated page objects

Some PDF files have (through history, or malformed authoring tools) more than one Catalog object.

This causes page objects to be duplicated more than once (as the PDF object merges the different catalogs)...

The pages shouldn't be duplicated, as there should be only one active Catalog object per file - the rest should be discarded.
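
The idea of keeping only the active Catalog can be sketched as follows. This assumes parsed objects are plain Ruby hashes keyed by symbols (similar to the dumps shown in other reports here) and that the active catalog, the one the trailer's /Root entry points to, has already been located; drop_inactive_catalogs is a hypothetical name and not the gem's code.

```ruby
# Keep only the catalog that the trailer's /Root entry points to.
# `objects` is assumed to be an array of parsed PDF objects represented as
# Ruby hashes (e.g. { Type: :Catalog, ... }); `root` is the active catalog.
def drop_inactive_catalogs(objects, root)
  objects.reject do |object|
    object.is_a?(Hash) && object[:Type] == :Catalog && !object.equal?(root)
  end
end
```

Identity comparison (equal?) is used on purpose, so that a stale catalog with the same contents as the active one is still discarded.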

Bookmarks/Links drop when combining PDFs

Combining two PDFs with appropriate bookmarks and links to named destinations creates a PDF where:

bookmarks are gone, and
clicking on one of the previous in-document links throws a "The document's page tree contains an invalid node." (in Acrobat).

I see issue #31, but that shouldn't apply here because the PDF has different named destinations (resolved with GUIDs).

Page number overlays table content

Thanks for writing this awesome gem! When I add page numbers to my PDF, say at the bottom, it overlays whatever content may already exist there. How can I adjust the page number so it displays at a certain position in the margin/footer/header?

Thanks for your help.

Extend Flate Filter support

The current Flate Filter support is very basic and is limited to plain zlib compression.

We need to add support for zlib with parameters (the TIFF and PNG predictor variations).

After completing the Flate filter, support for more filters might be explored, although it is rarely needed: currently filters are only required for extracting Content Stream objects from the PDF.
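
To illustrate what "zlib with parameters" involves, here is a minimal sketch of inflating a stream and then undoing PNG row filters, where every row starts with a filter-type byte. It is an illustration under stated assumptions, not the gem's implementation: the names flate_decode and undo_png_predictor are hypothetical, only filter types 0, 1 and 2 are handled, and the TIFF predictor is not covered.

```ruby
require "zlib"

# Reverse PNG row filters 0 (None), 1 (Sub) and 2 (Up) on inflated data.
# Filters 3 (Average) and 4 (Paeth) are left out to keep the sketch short.
def undo_png_predictor(data, columns:, bytes_per_pixel: 1)
  row_size = columns * bytes_per_pixel
  previous = Array.new(row_size, 0)
  output = []
  data.bytes.each_slice(row_size + 1) do |filter, *row|
    decoded =
      case filter
      when 0 then row
      when 1
        row.each_with_index.each_with_object([]) do |(byte, i), acc|
          left = i >= bytes_per_pixel ? acc[i - bytes_per_pixel] : 0
          acc << ((byte + left) & 0xFF)
        end
      when 2
        row.each_with_index.map { |byte, i| (byte + previous[i]) & 0xFF }
      else
        raise ArgumentError, "unsupported PNG filter type #{filter}"
      end
    output.concat(decoded)
    previous = decoded
  end
  output.pack("C*")
end

# Inflate a Flate stream; `columns` stands in for /DecodeParms /Columns.
def flate_decode(stream, columns: nil, bytes_per_pixel: 1)
  inflated = Zlib::Inflate.inflate(stream)
  return inflated unless columns
  undo_png_predictor(inflated, columns: columns, bytes_per_pixel: bytes_per_pixel)
end
```

When no columns value is given the data is returned as inflated, which matches the basic zlib-only case described above.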

Memory leak

I'm running this gem in a project where I need to combine multiple PDFs into a single PDF file.
It seems the Ruby process takes more and more memory with every PDF.

PDF Strings can cause corrupt PDF output

This issue was reported by Diyei Gomi, who showed how certain PDF files, when stamped one on top of the other, can result in a malformed PDF being rendered.

Looking into the issue, I discovered that hex strings were parsed using a case-sensitive method (which was a bug). Fixing this seems to have fixed the issue on the PDF files used for testing.

Further testing and verification is required. A patched version will be released soon (hopefully one correcting the issue).
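
For reference, PDF hex strings may use upper- or lower-case digits, ignore whitespace, and treat a missing final digit as 0. A minimal sketch of a case-insensitive decoder follows; decode_pdf_hex_string is a hypothetical helper and not the parser's actual code.

```ruby
# Decode a PDF hex string token such as "<48656C6C6F>" into raw bytes.
# Upper- and lower-case digits are treated the same, whitespace is ignored,
# and an odd number of digits is padded with a trailing 0.
def decode_pdf_hex_string(token)
  hex = token.delete("<>").gsub(/\s+/, "")
  hex << "0" if hex.length.odd?
  [hex].pack("H*")
end
```

Array#pack with the "H*" directive accepts digits in either case, which is what makes the decoder case-insensitive.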

PDF page stamping and text writing might fail when the resources dictionary contains reference (indirect) objects

I am trying to add the first page of one PDF file as the background of every page of another PDF file. This works fine for most files, but I have found some files for which it doesn't work. You can find one example at https://www.dropbox.com/s/p0m6fd59st42kkd/bg_ftc.pdf?dl=0
The file was created with Word, as far as I know (sorry for that).

You can reproduce my problem with the following minimal example:

b = CombinePDF.load("bg_ftc.pdf").pages[0].copy(true)
o = CombinePDF.new
o << [b]
o.save "o.pdf"

Combine pdfs from PaperClip saved to S3

I'm trying to combine PDFs that were saved on S3 using PaperClip.

Controller code:

    pdf = CombinePDF.new
    @costproject.attachments.each do |attachment|
      pdf << CombinePDF.new(attachment.attach.path)
    end
    send_file pdf, :disposition => 'inline', :type => "application/pdf"

I'm getting "stack level too deep"

Is there a way to make this work?

Thanks!

Latest release breaks page combinations

After upgrading to the Nov 4th release, I now receive this error:

"no implicit conversion of Symbol into Integer"

Here is a portion of the backtrace:

combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:616:in `[]'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:616:in `init_contents'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:601:in `block in copy'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:601:in `instance_exec'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:601:in `copy'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:58:in `inject_page'
combine_pdf (0.2.10) lib/combine_pdf/page_methods.rb:49:in `<<'

Clarify error optional content

I get the following runtime error: "Optional Content PDF files aren't supported and their pages cannot be safely extracted". This is a one-off error that doesn't normally happen, so it looks to be specific to this PDF.

I've had a look at the code, but could you quickly clarify the issue for me, please? Thanks.

lazy load of pdf document

Hello! I want to use combine_pdf to parse a PDF of about 300 pages. But when I try to load this relatively big document via CombinePDF.load(), memory usage increases drastically and it returns a "memory allocation usage: broken pipe()" error. It looks like I need to load just the 1st page, edit it, then the 2nd page, edit it, and so on. Is there any way to do this?

Footer template with page numbering

Thanks for the quick reply to my previous question.

I have another question or two. Can you please tell me if I can achieve this with combine_pdf?

Can I use a footer PDF template and somehow apply the page numbering on it?
I want to generate a PDF footer template from HTML with another library and put it over the existing footer of other PDF documents (unfortunately I won't know the format of the other PDFs, but I can generate the HTML footer template on the fly).

A very slow method would be to iterate over each page, get the current page number, replace the page numbering placeholder in the HTML footer with the current page number, convert the HTML template to PDF and then merge it. However, performance would be very poor.

I'm wondering whether I can calculate the position of the page numbering placeholder on the footer and use those coordinates in your page_numbering function, but I'm not sure how it works.

The footer will also have links or links with images to external websites.

I want to replace first and last page with my own pdf pages. Does this break the TOC or other bookmarks?

Thank you

Output file cannot be saved from Adobe Reader with "Save As optimizes for Fast Web View" preference enabled.

Getting error "The document could not be saved. There was a problem reading this document (23)." while trying to save the output file from Adobe Reader with "Save As optimizes for Fast Web View" preference enabled.

Combination of Documents with embedded fonts

We combine a background document with a letter. This document contains text in an embedded font. On its own it looks just fine, but after the merge single characters are missing and replaced by a box.

Undefined method #empty? when parsing PDF

Hi - I'm looking at combine_pdf to get around the issues with prawn-templates. I have a PDF document that is a form (not a fillable PDF form, just an ordinary PDF created in Word that makes a printed paper form). Then I'm creating a PDF in prawn that looks like the form is filled out when the form is in the background and the user data is laid over it. Prawn-templates worked great for that, but I don't want to be stuck at 0.15 forever - so here we are.

I'm doing some very preliminary experiments with combine_pdf - when I do the following with one of my PDFs that will act as the background:

CombinePDF.new(template_file)

I get this error:

undefined method `empty?' for #<Enumerator: "Identity":bytes>
combine_pdf (0.1.9) lib/combine_pdf/combine_pdf_parser.rb:210:in `_parse_'

and in the log file, there are the messages:

Couldn't connect all values from references - didn't find reference {a big hash}!!!
PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects.

Is there perhaps a way to generate my template file, or to use Acrobat, so as to get rid of the object streams (e.g. saving it while forcing compatibility with older versions of Acrobat)? I have control of the template file, so if it's possible to generate it in a way that doesn't use object streams, I can do that.

Thanks for your help.

Inheritance fails for some Catalog and Pages properties.

Some PDF files contain the Page's Resources dictionary (/Resources) in the Catalog instead of the Page objects.

These resources are wrongfully ignored by CombinePDF during the catalog rebuilding process...

These PDF files are rare, but it is a known issue.

This will take some work to resolve... I'm working on this.
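
For context, PDF lets a page inherit attributes such as /Resources, /MediaBox, /CropBox and /Rotate from its ancestor nodes through the /Parent chain. A minimal sketch of that lookup, assuming parsed objects are plain Ruby hashes whose :Parent key holds the parent hash directly (the gem's reference wrappers are ignored here); inherited_value is a hypothetical helper.

```ruby
# Look up `key` on the page itself, then on each ancestor reachable through
# :Parent, returning the first value found (or nil when nothing defines it).
# No cycle protection is included; a malformed /Parent loop would not end.
def inherited_value(page, key)
  node = page
  while node
    return node[key] if node.key?(key)
    node = node[:Parent]
  end
  nil
end
```

A rebuild step could call such a lookup for each inheritable key and copy the result onto the page before the old tree is discarded.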

PDF 1.5 Object Stream Error when combining file created with Prawn

So I'm trying to watermark on demand in my Rails app. I'm using Prawn to generate the stamp across a blank page, and then combine_pdf to combine that watermark with an existing PDF file. The combine process has been successful so far, but I keep getting the warning "PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects."

The watermarks from Prawn are only text objects, but is there anything I should do to avoid this warning message?
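
To find out which of the input files actually triggers the warning, the raw bytes can be scanned for an object stream declaration. This is a rough sketch: object_streams? is a hypothetical helper, and a plain text scan like this can report false positives when the same byte sequence happens to appear inside other data.

```ruby
# True when the raw PDF bytes declare at least one object stream (/Type /ObjStm).
def object_streams?(pdf_bytes)
  pdf_bytes.b.match?(%r{/Type\s*/ObjStm})
end

# Assumed usage:
#   puts "existing.pdf uses object streams" if object_streams?(File.binread("existing.pdf"))
```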

CombinePDF.parse fails when a PDF has comments containing certain strings

Hello,

First of all, I'd like to thank you for providing this great PDF merge tool.

My client uses some tools to generate PDFs, and it turns out they insert some random comments into the PDF file. Here is an example:

5 0 obj
% [8496] 
<<
/Filter /FlateDecode
/Length 3739
>>
stream
...
endstream
endobj

Line 2 (the comment) causes a RuntimeError with the message "Unknown PDF parsing error - maleformed PDF file?". My temporary solution is to just ignore all comments, but I'm not sure whether that breaks something else. Please advise, and thanks again.

lgn21st@a0cc17e
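
In PDF syntax a comment runs from % to the end of the line and counts as whitespace when it appears outside strings and streams. A minimal sketch of skipping comments at such a position with StringScanner follows; skip_whitespace_and_comments is a hypothetical helper, not the code from the commit referenced above, and it must not be applied inside string or stream data.

```ruby
require "strscan"

# Advance the scanner past any run of PDF whitespace and % comments.
# Only safe between tokens, never inside strings or stream bodies.
def skip_whitespace_and_comments(scanner)
  scanner.skip(/(?:[\x00\t\n\f\r ]+|%[^\r\n]*)+/)
  scanner
end
```

With the example object above, calling this right after the "5 0 obj" header leaves the scanner positioned at the opening << of the dictionary.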

Adding page numbers places them in the wrong position

Hello!

Please advise me on the number_pages method. I used the following code:

# erase the previously generated file
File.delete('public/pdf/test_full.pdf') if File.exist?('public/pdf/test_full.pdf')

# create an empty PDF object
pdf = CombinePDF.new

# add a blank page
pdf.new_page

# set up the options for number_pages
opt = {
  :number_format => '%s',
  :start_at => 1,
  :font => :Times,
  :margin_from_side => 0,
  :margin_from_height => 0,
  :location => [:bottom],
  :font_size => 12
}

# run the numbering
pdf.number_pages(opt)

# save the result
pdf.save('public/pdf/test_full.pdf')

What I expected: a blank page with "1" at the bottom-center of the page.

What I got: "1" in a strange position, as if I had asked to place it at the center-right of the page. I attached the resulting file: test_full.pdf

NoMethodError: undefined method `[]' for nil:NilClass

With some PDFs I get a strange error:

NoMethodError: undefined method `[]' for nil:NilClass
    from /path_to_gem/ruby/2.2.0/gems/combine_pdf-0.2.17/lib/combine_pdf/page_methods.rb:876:in `block (2 levels) in should_secure?'

I dug into that code and, while debugging, found that the v variable has this value:

{:F6=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:"Arial-BoldMT", :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[750, 750, 278, 333, 474, 556, 556, 889, 722, 238, 333, 333, 389, 584, 278, 333, 278, 278, 556, 556, 556, 556, 556, 556, 556, 556, 556, 556, 333, 333, 584, 584, 584, 611, 975, 722, 722, 722, 722, 667, 611, 778, 722, 278, 556, 722, 611, 833, 722, 778, 667, 778, 722, 667, 611, 722, 667, 944, 667, 667, 611, 333, 278, 333, 584, 556, 333, 556, 611, 556, 611, 556, 333, 611, 611, 278, 278, 556, 278, 889, 611, 611, 611, 611, 389, 556, 333, 611, 556, 778, 556, 556, 500, 389, 280, 389, 584, 750, 556, 750, 278, 556, 500, 1000, 556, 556, 333, 1000, 667, 333, 1000, 750, 611, 750, 750, 278, 278, 500, 500, 350, 556, 1000, 333, 1000, 556, 333, 944, 750, 500, 667, 278, 333, 556, 556, 556, 556, 280, 556, 333, 737, 370, 556, 584, 333, 737, 552, 400, 549, 333, 333, 333, 576, 556, 333, 333, 333, 365, 556, 834, 834, 834, 611, 722, 722, 722, 722, 722, 722, 1000, 722, 667, 667, 667, 667, 278, 278, 278, 278, 722, 722, 778, 778, 778, 778, 778, 584, 778, 722, 722, 722, 722, 667, 667, 611, 556, 556, 556, 556, 556, 556, 889, 556, 556, 556, 556, 556, 278, 278, 278, 278, 611, 611, 611, 611, 611, 611, 611, 549, 611, 611, 611, 611, 611, 556, 611, 556], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>728, :CapHeight=>0, :Descent=>-210, :Flags=>42, :FontBBox=>[-628, -376, 2000, 1018], :FontName=>:Arial_Bold, :ItalicAngle=>0, :StemV=>0}}}}, :F7=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:ArialMT, :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[750, 750, 278, 278, 355, 556, 556, 889, 667, 191, 333, 333, 389, 584, 278, 333, 278, 278, 556, 556, 556, 556, 556, 556, 556, 556, 556, 556, 278, 278, 584, 584, 584, 556, 1015, 667, 667, 722, 722, 667, 611, 778, 722, 278, 500, 667, 556, 833, 722, 778, 667, 
778, 722, 667, 611, 722, 667, 944, 667, 667, 611, 278, 278, 278, 469, 556, 333, 556, 556, 500, 556, 556, 278, 556, 556, 222, 222, 500, 222, 833, 556, 556, 556, 556, 333, 500, 278, 556, 500, 722, 500, 500, 500, 334, 260, 334, 584, 750, 556, 750, 222, 556, 333, 1000, 556, 556, 333, 1000, 667, 333, 1000, 750, 611, 750, 750, 222, 222, 333, 333, 350, 556, 1000, 333, 1000, 500, 333, 944, 750, 500, 667, 278, 333, 556, 556, 556, 556, 260, 556, 333, 737, 370, 556, 584, 333, 737, 552, 400, 549, 333, 333, 333, 576, 537, 333, 333, 333, 365, 556, 834, 834, 834, 611, 667, 667, 667, 667, 667, 667, 1000, 722, 667, 667, 667, 667, 278, 278, 278, 278, 722, 722, 778, 778, 778, 778, 778, 584, 778, 722, 722, 722, 722, 667, 667, 611, 556, 556, 556, 556, 556, 556, 889, 500, 556, 556, 556, 556, 278, 278, 278, 278, 556, 556, 556, 556, 556, 556, 556, 549, 611, 556, 556, 556, 556, 500, 556, 500], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>728, :CapHeight=>0, :Descent=>-210, :Flags=>42, :FontBBox=>[-665, -325, 2000, 1006], :FontName=>:Arial, :ItalicAngle=>0, :StemV=>0}}}}, :F21=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:"Arial-ItalicMT", :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[750, 750, 278, 278, 355, 556, 556, 889, 667, 191, 333, 333, 389, 584, 278, 333, 278, 278, 556, 556, 556, 556, 556, 556, 556, 556, 556, 556, 278, 278, 584, 584, 584, 556, 1015, 667, 667, 722, 722, 667, 611, 778, 722, 278, 500, 667, 556, 833, 722, 778, 667, 778, 722, 667, 611, 722, 667, 944, 667, 667, 611, 278, 278, 278, 469, 556, 333, 556, 556, 500, 556, 556, 278, 556, 556, 222, 222, 500, 222, 833, 556, 556, 556, 556, 333, 500, 278, 556, 500, 722, 500, 500, 500, 334, 260, 334, 584, 750, 556, 750, 222, 556, 333, 1000, 556, 556, 333, 1000, 667, 333, 1000, 750, 611, 750, 750, 222, 222, 333, 333, 350, 556, 1000, 333, 1000, 500, 333, 944, 750, 500, 667, 278, 333, 556, 556, 556, 556, 260, 
556, 333, 737, 370, 556, 584, 333, 737, 552, 400, 549, 333, 333, 333, 576, 537, 333, 333, 333, 365, 556, 834, 834, 834, 611, 667, 667, 667, 667, 667, 667, 1000, 722, 667, 667, 667, 667, 278, 278, 278, 278, 722, 722, 778, 778, 778, 778, 778, 584, 778, 722, 722, 722, 722, 667, 667, 611, 556, 556, 556, 556, 556, 556, 889, 500, 556, 556, 556, 556, 278, 278, 278, 278, 556, 556, 556, 556, 556, 556, 556, 549, 611, 556, 556, 556, 556, 500, 556, 500], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>728, :CapHeight=>0, :Descent=>-208, :Flags=>106, :FontBBox=>[-517, -325, 1359, 998], :FontName=>:Arial_Italic, :ItalicAngle=>-12, :StemV=>0}}}}, :F37=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:"Arial-BoldItalicMT", :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[750, 750, 278, 333, 474, 556, 556, 889, 722, 238, 333, 333, 389, 584, 278, 333, 278, 278, 556, 556, 556, 556, 556, 556, 556, 556, 556, 556, 333, 333, 584, 584, 584, 611, 975, 722, 722, 722, 722, 667, 611, 778, 722, 278, 556, 722, 611, 833, 722, 778, 667, 778, 722, 667, 611, 722, 667, 944, 667, 667, 611, 333, 278, 333, 584, 556, 333, 556, 611, 556, 611, 556, 333, 611, 611, 278, 278, 556, 278, 889, 611, 611, 611, 611, 389, 556, 333, 611, 556, 778, 556, 556, 500, 389, 280, 389, 584, 750, 556, 750, 278, 556, 500, 1000, 556, 556, 333, 1000, 667, 333, 1000, 750, 611, 750, 750, 278, 278, 500, 500, 350, 556, 1000, 333, 1000, 556, 333, 944, 750, 500, 667, 278, 333, 556, 556, 556, 556, 280, 556, 333, 737, 370, 556, 584, 333, 737, 552, 400, 549, 333, 333, 333, 576, 556, 333, 333, 333, 365, 556, 834, 834, 834, 611, 722, 722, 722, 722, 722, 722, 1000, 722, 667, 667, 667, 667, 278, 278, 278, 278, 722, 722, 778, 778, 778, 778, 778, 584, 778, 722, 722, 722, 722, 667, 667, 611, 556, 556, 556, 556, 556, 556, 889, 556, 556, 556, 556, 556, 278, 278, 278, 278, 611, 611, 611, 611, 611, 611, 611, 549, 611, 611, 611, 611, 
611, 556, 611, 556], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>728, :CapHeight=>0, :Descent=>-210, :Flags=>106, :FontBBox=>[-560, -376, 1390, 1018], :FontName=>:Arial_Bold_Italic, :ItalicAngle=>-12, :StemV=>0}}}}, :F63=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:"TimesNewRomanPS-BoldMT", :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[778, 778, 250, 333, 555, 500, 500, 1000, 833, 278, 333, 333, 500, 570, 250, 333, 250, 278, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 333, 333, 570, 570, 570, 500, 930, 722, 667, 722, 722, 667, 611, 778, 778, 389, 500, 778, 667, 944, 722, 778, 611, 778, 722, 556, 667, 722, 722, 1000, 722, 722, 667, 333, 278, 333, 581, 500, 333, 500, 556, 444, 556, 444, 333, 500, 556, 278, 333, 556, 278, 833, 556, 500, 556, 556, 444, 389, 333, 556, 500, 722, 500, 500, 444, 394, 220, 394, 520, 778, 500, 778, 333, 500, 500, 1000, 500, 500, 333, 1000, 556, 333, 1000, 778, 667, 778, 778, 333, 333, 500, 500, 350, 500, 1000, 333, 1000, 389, 333, 722, 778, 444, 722, 250, 333, 500, 500, 500, 500, 220, 500, 333, 747, 300, 500, 570, 333, 747, 500, 400, 549, 300, 300, 333, 576, 540, 333, 333, 300, 330, 500, 750, 750, 750, 500, 722, 722, 722, 722, 722, 722, 1000, 722, 667, 667, 667, 667, 389, 389, 389, 389, 722, 722, 778, 778, 778, 778, 778, 570, 778, 722, 722, 722, 722, 722, 611, 556, 500, 500, 500, 500, 500, 500, 722, 444, 444, 444, 444, 444, 278, 278, 278, 278, 500, 556, 500, 500, 500, 500, 500, 549, 500, 556, 556, 556, 556, 500, 556, 500], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>677, :CapHeight=>0, :Descent=>-216, :Flags=>42, :FontBBox=>[-558, -307, 2000, 1026], :FontName=>:Times_New_Roman_Bold, :ItalicAngle=>0, :StemV=>0}}}}, :F65=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:"TimesNewRomanPS-ItalicMT", 
:FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[778, 778, 250, 333, 420, 500, 500, 833, 778, 214, 333, 333, 500, 675, 250, 333, 250, 278, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 333, 333, 675, 675, 675, 500, 920, 611, 611, 667, 722, 611, 611, 722, 722, 333, 444, 667, 556, 833, 667, 722, 611, 722, 611, 500, 556, 722, 611, 833, 611, 556, 556, 389, 278, 389, 422, 500, 333, 500, 500, 444, 500, 444, 278, 500, 500, 278, 278, 444, 278, 722, 500, 500, 500, 500, 389, 389, 278, 500, 444, 667, 444, 444, 389, 400, 275, 400, 541, 778, 500, 778, 333, 500, 556, 889, 500, 500, 333, 1000, 500, 333, 944, 778, 556, 778, 778, 333, 333, 556, 556, 350, 500, 889, 333, 980, 389, 333, 667, 778, 389, 556, 250, 389, 500, 500, 500, 500, 275, 500, 333, 760, 276, 500, 675, 333, 760, 500, 400, 549, 300, 300, 333, 576, 523, 250, 333, 300, 310, 500, 750, 750, 750, 500, 611, 611, 611, 611, 611, 611, 889, 667, 611, 611, 611, 611, 333, 333, 333, 333, 722, 667, 722, 722, 722, 722, 722, 675, 722, 722, 722, 722, 722, 556, 611, 500, 500, 500, 500, 500, 500, 500, 667, 444, 444, 444, 444, 444, 278, 278, 278, 278, 500, 500, 500, 500, 500, 500, 500, 549, 500, 500, 500, 500, 500, 444, 500, 444], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>694, :CapHeight=>0, :Descent=>-216, :Flags=>106, :FontBBox=>[-498, -307, 1333, 1023], :FontName=>:Times_New_Roman_Italic, :ItalicAngle=>-16, :StemV=>0}}}}, :F68=>{:is_reference_only=>true, :referenced_object=>{:Type=>:Font, :Subtype=>:TrueType, :BaseFont=>:TimesNewRomanPSMT, :FirstChar=>30, :LastChar=>255, :Encoding=>:WinAnsiEncoding, :Widths=>[778, 778, 250, 333, 408, 500, 500, 833, 778, 180, 333, 333, 500, 564, 250, 333, 250, 278, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 278, 278, 564, 564, 564, 444, 921, 722, 667, 667, 722, 611, 556, 722, 722, 333, 389, 722, 611, 889, 722, 722, 556, 722, 667, 556, 611, 722, 722, 944, 722, 722, 611, 333, 278, 333, 469, 500, 333, 444, 500, 444, 
500, 444, 333, 500, 500, 278, 278, 500, 278, 778, 500, 500, 500, 500, 333, 389, 278, 500, 500, 722, 500, 500, 444, 480, 200, 480, 541, 778, 500, 778, 333, 500, 444, 1000, 500, 500, 333, 1000, 556, 333, 889, 778, 611, 778, 778, 333, 333, 444, 444, 350, 500, 1000, 333, 980, 389, 333, 722, 778, 444, 722, 250, 333, 500, 500, 500, 500, 200, 500, 333, 760, 276, 500, 564, 333, 760, 500, 400, 549, 300, 300, 333, 576, 453, 333, 333, 300, 310, 500, 750, 750, 750, 444, 722, 722, 722, 722, 722, 722, 889, 667, 611, 611, 611, 611, 333, 333, 333, 333, 722, 722, 722, 722, 722, 722, 722, 564, 722, 722, 722, 722, 722, 722, 556, 500, 444, 444, 444, 444, 444, 444, 667, 444, 444, 444, 444, 444, 278, 278, 278, 278, 500, 500, 500, 500, 500, 500, 500, 549, 500, 500, 500, 500, 500, 500, 500, 500], :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :Ascent=>693, :CapHeight=>0, :Descent=>-216, :Flags=>42, :FontBBox=>[-568, -307, 2000, 1007], :FontName=>:Times_New_Roman, :ItalicAngle=>0, :StemV=>0}}}}, :R8=>{:is_reference_only=>true, :referenced_object=>{:BaseFont=>:"OKDLIF+Helvetica,Bold", :FontDescriptor=>{:is_reference_only=>true, :referenced_object=>{:Type=>:FontDescriptor, :FontName=>:"OKDLIF+Helvetica,Bold", :FontBBox=>[-22, -218, 703, 729], :Flags=>32, :Ascent=>729, :CapHeight=>729, :Descent=>-218, :ItalicAngle=>0, :StemV=>105, :MissingWidth=>500, :XHeight=>549, :CharSet=>"/A/D/T/d/f/five/nine/one/p/two/underscore/zero", :FontFile3=>{:is_reference_only=>true, :referenced_object=>{:Filter=>:FlateDecode, :Subtype=>:Type1C, :Length=>1132, :raw_stream_content=>"x\x9CU\x92}LSW\x18\xC6\xCF\xA5\xED\xA5\xB2R(\xE4\xA2\x13\xA1W7\x04*\x90B`\xC0\x808\xF9(\r\x12\x1D\xA2\xA2\xE0\xE4\xB3\xC0\r-E)\fp\f'[\x82\x1C]\x96\x00c\x0E\xA4\x01\x01\x91\x8D\xAF\x11X\x14\xE7`S7]\x18\xE0\x12!\v\x92\x18\x83(\x1A$\x83\xEC\xBD\xE4\xD4d\xA7\x9B\xCB\xB2?\xEE\xCD=\xCF9\xEF\xB9\xBF\xE7y_\x06I\x1D\x10\xC30.z\x83\xB1\xDC`\x11r\xB3\x03b\xCD\xC6<\xBB\x14 
z2\xE26\a\xD1KRF\x8A7R7L2/\x94\xDC\xFA\xC4\x19+$X!\xED\xDE&\xEDu\x83v\x15\xD4\xB9@\x95+\x920LE}s\x9C\xB9\xA4\xF2\xA4PPh\xE1\xFD\x0E\x1DH\xF3\xDF\xB5+\xE0?%822\x92\xCF\xA9\xFCw\x87\x8F7\x94\n\x05\xC5\xFCN\xFAQn0\x9AKL\x86bK\x14\x1FGO\e\x8DB._`\xAC,),\xE5\xB3\xF3\xF2\fy\xF6\xB2\xC3\xD9FC\x11\xAF\x13\x8CBI\x89\xB9\x9C\xF7\x8B\xF3\xE7C\xB4\xDA\xE0@\xFA\n\xDB'\x98r\xCAJ\xF9\xD4\xEC\xE2R>\x99\xB7\xF3\xFFOA\b)\xF6\x94\xC4\xE7\xE5\x1F\x8C\xCC\xD4\x06\x87\x84!\xB4\x03\xA5 \x1F\x94\x80\x12Q\x18\xDA\x8A\xDEA*\xE4\x86\xDC\x91\ar\xA6Y )\x15Z\xD02\x13\xC7|\xC8\xB4;\xB8;\xF88t;\xFC,yCR#\xD6)7\xF2\xB1\x15\xDCA\x05\x17\xAD\x10`e\xC4\xB3\xE0\xCB\x8D\x11_\x19\x04\xB1\xE43\x9A\x10y\x9B%\xCD/M2\bd\xAF\x03\x95?\x00\x15G\x0E\xB0\xCB0$S\x8A@\x8A\xAD\x10\v\x9B`\t61\xB7\x80\x87\xD3\xC0K ]\x1C\xE4\b\xFF\xB2\x9E\xAE\xFB\xD9\xBE\xFDc\xE6),\a\x8Fg\xA0\x84\x88\xD9\xB2\xD9\x82\xEF\xD4\x057\xF4}I8\x11g\x14\v\xFB\xE5\xE0d\xE6\xEE]\xD3\xBD\xE9\x9F\xAE\xD3%dL\xAD\xAE\x8EM\xCF\xA8\xEDX\xA0\x12KA\xC5\xCC\x81\x06n\x82F2\xE7\x01C,H\xF0L\xD7\xF5\xDB\xB7\xA6zW0\xC80H\x8D+)\xD3\xE9\xB7u\x9DD\x8A\xE5\x84\xEEkl\xAB\x1C-\xF3t\x04\xD9\x8F\xD1$\x8C\x84\xA6F\x13\x99\x9A\x92Z(\xE5c\xFA\x84Z\x99\x9EW\xA0\xA2\xBF\a\xF0\e&\xC2\x93f\xB6p\xE2X\xA7\x9E^\xC1\xF9\x13W\x12\x1E\xDB\xBEg M=\x986)L\xE2_\xF1\xB5\xCE\xFEq9q\x12\x9D\xFF\xE1\xCCHLH\xC8\x98^y16C9\xE1!\xB6\x8A\xA1O\x96\xD7\x99E\xBB\xF5I\xB1\x8B\x8B\x18g/7_\xE8jn\xA9?\xDB\xEC\xBD\xEEx\xE2|\xE1\xB9\n,\x0F:\x9E\xA9U'\xED\x0E\x9A\xB5\xE9\x81\x17\xF5\v\x8E\x94\x89\x96\xCES\x87PC\xED\x89\x9C\xB8\xCA\x81\x00*\xDB<Q\xC1\t\xDB*hX\xA58^\n\n\xD8\x02rh\x03\x96\xB9\f\xAFK\x86 
\x92\xFB\x940KQ\xA0\xC4\x8F\xF1\xF4\xD0\xC8o\x83w\xDB\xFF\xC4\xC0\xE2?\xAA\x1E\x14\x8D\x1F\x9FM\x18~\x8B\xDA\xD8\xA1\xF1!~d\xE7C_p\e\xE8kjiW\x7F\xD1\xD8\xD0\xD8\xD1%'\x9E\x06}t\xBA\xD0=R\xE7\xBD(\xA6pS#\x87#b2\xD3\xF6\xEE\xCD\xBD1\xFF`\xF8\xFB;4\xA8q\x922\xCD\x88\xF7\x89\x82\xB3=\xD7\x89\xCF\xED\f\x16P\x88\x9F\x83\xEC\xEF\xDC\xF2\xEC.\xBB6\n\xB9\xDD8\xA9(\xEB\xDD\x9C\x83\x15ZL\xBC0Q\xB5\x86|}hD?mX\xC0s\xF8n\xEF\xB7?\rOX\x970l\xC1\xE0q\xEA\x91\xF0C\xF6\x94n0\x82r\xC9\xA4\x03\xB8\xE7\xFD\xD6\xFC\v\xA6\x86\x18\xAC\xC1\xD1\xB5Q\x1F\x99Oe\x9B,\xB98\x1F\x9B\xDB\xAA\x06\xAA\xAF\xD4\xDE\xC7O\xF1B\xC3|SO\xCB\xE8\x95K\xC3XN\x01\xB0\x95\x813\xF6\x1F\x9F\x11\xAB9[5m\xDA\xD1 \xC7\xB2\xF0\xD8,\xA2\xA4\xC1:\xDAO\xD0a\xA9\xA4\x11}\x03\x9E\xF0\x9E\xBD\xB5\x8D\xF61\x1Dg/\x0E\xB5[\xD7f\xC1eq`\x02?\x93\x83W\xE0#\xE2O|\xC3B\x89\xE6\x13\\{\xEE\xB47l\xED`\x7F\x1F\xBC\xFA\xCB\x9D\xA1\xAC\xF8\x98\x93F\xA2%\x12o\xA2\x88\xCCO\xFE\x98l\x96\x8B\xE5\xAF\xDC\xC3%\x90\xAC\xD3F\x89\xCE\xF6\xAB\xAB7\x8Er6\x176\xD9\xA6\x96\x81\e\xFBe\xFFWm\xA3X~o\xF4HXD\xE6\x91\x84}B\xEF\xCD:5\xF1f\xCF\x13\xD7\xA7\xE1\xE0M'?`m\r\x02\xC1'\xF8\x05\xD9|\xAC\xB0\xC6\x98\xA3\xEE\x00?\x19\\e\x95\x96\x0Eq\xD0\n1\xD6\x8C\x0E\x16\x9C\x9C`\xFBk\xE0\xD4\xA8P\xC0\xF6&\x853B\x7F\x01\xE0s6%"}}}}, :Type=>:Font, :FirstChar=>48, :LastChar=>112, :Widths=>[556, 556, 556, 0, 0, 556, 0, 0, 0, 556, 0, 0, 0, 0, 0, 0, 0, 722, 0, 0, 722, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 611, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 556, 0, 0, 0, 0, 611, 0, 333, 0, 0, 0, 0, 0, 0, 0, 0, 0, 611], :Encoding=>:WinAnsiEncoding, :Subtype=>:Type1}}}

while k is equal to :Font.

So, as a workaround, I added an additional check, actual_value(v[k]), to this line:

v.keys.each do |name| return true if actual_value(foreign_res[k]) && actual_value(foreign_res[k])[name] && actual_value(v[k]) && actual_value(foreign_res[k])[name] != actual_value(v[k])[name]

Open and save a signed PDF

I need to stamp a logo on a signed file, but when I use

pdf = CombinePDF.load "signed_document.pdf"
pdf.save "signed_stamped_document.pdf"

the signature is deleted.

So... does this gem support signed files, or will it support them?

undefined method `length' for #<CombinePDF

Boaz,
This may be related to issue #10.

I have successfully created a combined pdf that includes costprojects and their S3 attachments.

It works fine, if I display the pdf in the web browser. But, I'm now trying to email the pdf to the user.

I'm getting "undefined method 'length' for #CombinePDF::PDF:0x007fa87c8de3a8" on this line of code attachments['Report.pdf'] = pdf

Here is my controller:

def pdfemail
  @costprojects = Costproject.find(params[:costproject_ids])
  respond_to do |format|
    format.html
    format.pdf do
      pdf = CombinePDF.new
      @costprojects.each do |costproject|
        @costproject = costproject
        pdf2 = render_to_string pdf: "Costproject.pdf", template: "costprojects/pdfemail", encoding: "UTF-8"
        pdf << CombinePDF.parse(pdf2)
        costproject.attachments.each do |attachment|
          pdf << CombinePDF.parse(Net::HTTP.get(URI.parse(attachment.attach.url)))
        end
      end

      SendReport.send_report(pdf).deliver
      redirect_to :back
      flash[:notice] = 'Email containing pdf has been sent to you!'
    end
  end
end

This is the mailer:

class SendReport < ActionMailer::Base
  def send_report(pdf)
    tomail = "[email protected]"
    frommail = "[email protected]"
    attachments['Report.pdf'] = pdf
    mail(
        :to => tomail,
        :from => frommail,
        :subject => "Report pdf")
  end
end

I am using Ruby 1.9.3

pdf includes:

<CombinePDF::PDF:0x007fa87c8de3a8 @objects=[{:Type=>:ExtGState, :SA=>true, :SM=>0.02, :ca=>1.0, :CA=>1.0, :AIS=>false, :SMask=>:None}, {:indirect_without_dictionary=>[:Pattern, :DeviceRGB]}, {:Type=>:XObject, :Subtype=>:Image, :Width=>71, :Height=>75, :BitsPerComponent=>8, :ColorSpace=>:DeviceRGB, :Length=>{:is_reference_only=>true, :referenced_object=>{:indirect_without_dictionary=>5776}}, :Filter=>:DCTDecode, :raw_stream_content=>"\xFF\xD8\xFF

(lots of stuff)

currentdict /CMap defineresource pop\nend\nend"}}}}}, :XObject=>{}}}, :Annots=>{:is_reference_only=>true, :referenced_object=>{:indirect_without_dictionary=>[]}}, :MediaBox=>[0, 0, 595, 842]}}], :Count=>2, :ProcSet=>[:PDF, :Text, :ImageB, :ImageC]}}}], @Version=1.4, @info={:Producer=>"Ruby CombinePDF Library by Boaz Segev"}, @string_output=:literal, @set_start_id=1>
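
The dump above shows that pdf is a CombinePDF::PDF object rather than a String, which is the likely reason a String method such as length is missing. A hedged sketch of converting it before attaching follows; pdf_bytes is a hypothetical helper that relies on the object responding to to_pdf.

```ruby
# Hypothetical helper: return raw PDF bytes for anything that is either
# already a String or responds to to_pdf (as CombinePDF::PDF objects do).
def pdf_bytes(pdf)
  return pdf if pdf.is_a?(String)
  pdf.to_pdf
end

# Assumed usage inside the mailer from the report:
#   attachments['Report.pdf'] = pdf_bytes(pdf)
```

Passing the rendered String to the mailer, instead of the PDF object, also keeps the mailer independent of the gem.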

Is it possible to combine multiple PDFs with the Destination property?

Hello

Is it possible to combine multiple PDFs with the Destination property?

I have two PDFs. One is a PDF which has a destination property. (link text)
The other is a PDF which has no destination property.

I can combine the two PDFs, but the destination property is broken.

This is sample.
https://gist.github.com/hiroyuki-sato/ce48110f34accec1f837

Thank you for your advice.

Hiroyuki Sato.

PDF files created with Wicked PDF would not be stamped / numbered correctly

This issue was reported by Saba.

Looking into the issue, I discovered that the Wicked PDF's engine ( wkhtmltopdf ) doesn’t wrap the PDF content streams correctly.

To be more technical:

Immediately at the start of each page’s content stream, wkhtmltopdf creates a 'transformation' to the drawing engine (using the PDF command: 0.060000000 0 0 -0.060000000 28.3200000 813.679999 cm)…

This transformation isn’t wrapped in a container (the PDF commands q and Q), so the transformation affects all the following content streams as well as the original content.

The end result is that the stamped content (page numbering / watermark / overlaid page) is resized and appears inverted (upside down and mirror like) in the top left corner of the PDF page - which is an unexpected bug caused by this malformed PDF.

This was fixed by injecting a wrapper around all the content streams, each time content is injected into an existing PDF page.
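
The effect of the unwrapped transformation, and the q/Q wrapper that contains it, can be sketched in a few lines. The matrix values are rounded from the command quoted above; apply_cm and isolate_streams are hypothetical helpers for illustration and not the gem's implementation.

```ruby
# Apply a PDF "cm" matrix [a b c d e f] to a point (x, y).
def apply_cm(matrix, x, y)
  a, b, c, d, e, f = matrix
  [a * x + c * y + e, b * x + d * y + f]
end

# Isolate existing page content between q (save) and Q (restore) so its
# transformations cannot leak into content streams appended afterwards.
def isolate_streams(existing_streams, stamp_stream)
  ["q\n", *existing_streams, "\nQ\n", stamp_stream]
end

# A page number meant for the bottom of the page, e.g. at (300, 30):
x, y = apply_cm([0.06, 0, 0, -0.06, 28.32, 813.68], 300, 30)
puts format("%.2f, %.2f", x, y)
```

With the leaked matrix the point lands at roughly (46, 812), close to the top left corner and with a flipped y axis, which matches the symptom described above.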

Empty PDF file created when PDF 1.5 Object streams are found

The current combine function doesn't support PDF 1.5 object streams.

We get the output like this:

Starting to parse PDF data.
...
PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects.
Attempting {:Type=>:ObjStm, :N=>18, :First=>127, :Length=>1253, :Filter=>:FlateDecode, :indirect_generation_number=>0, :indirect_reference_id=>8}
didn't find reference {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>25}
Couldn't connect all values from references - didn't find reference {:Type=>:XRef, :Index=>[0, 28], :Size=>28, :W=>[1, 2, 1], :Root=>nil, :Info=>{:Producer=>"pdfTeX-1.40.15", :Creator=>"TeX", :CreationDate=>"D:20141023222449+07'00'", :ModDate=>"D:20141023222449+07'00'", :Trapped=>:False, :"PTEX.Fullbanner"=>"This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014/Arch Linux) kpathsea version 6.2.0", :indirect_generation_number=>0, :indirect_reference_id=>26}, :ID=>[25531, 1, 5, 56, 97, 0, 36, 2912, 6, 25531, 1, 5, 56, 97, 0, 36, 2912, 6], :Length=>88, :Filter=>:FlateDecode, :raw_stream_content=>"x\xDA\x15\xC8\xBB\x15@@\x14\x84\xE1\x99\xF5\xD8]\xD6\xAB#=H\xB5!\x17(@\x0F\x1A\xD2\x87#\x11J\x98\e|\xE7?3\x00>\x87@\x01\xD1Am\xA4\x95$\x1D\xEFh_/\x998\xFA\xCDv.\x85\x94\xE2\x99V\xFB\x02\xA7\xC3\x1A\xB9\xEC\xD6\x8A\xE7h\xADe\xE0\xF5\x82\xCF\x8C\x1F\xA4\xA4\n\xB0", :indirect_generation_number=>0, :indirect_reference_id=>27}!!!
setting parsed collection and returning collection.
connecting objects with their references (serialize_objects_and_references).
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
couldn't connect a reference!!! could be a null object, Silent error!!!
finished to initialize PDF object.
Resources re-building disabled as it isn't worth the price in peformance as of yet.
Formatting PDF output
Building XREF

How can I handle this problem?

CombinePDF number_pages method does not work

First of all, thank you very much for this great library. I can't figure out why it has so few stars.

I have one issue.

@pdf_combiner = CombinePDF.new
#loop start
pdf = CombinePDF.new(pdf_path)
@pdf_combiner << pdf
#loop end

@pdf_combiner.number_pages
@pdf_combiner.save book_pdf_path if @pdf_combiner

#done!

This code works: all pages are merged and the final PDF is stored in the file system.
Everything works except page numbering.
I cannot figure out what the problem is.

Unable to load Adobe LiveCycle PDF file

Hi @boazsegev

It seems the library has an issue loading a PDF file that was created using Adobe LiveCycle (containing an XML file).

Link to download sample pdf

here is the error I got

Can i use Combine PDF to only replace links location?

Hey,

I've been looking for a solution for days and couldn't find one. I know you can stamp and merge two PDFs, but is there a way to change only the link addresses in the document?

Visually the document should be the same, just the links should be different.

Thanks a lot

Skipping pages when numbering

Is there currently an easy way to skip pages for numbering? For example specifying pages or skipping odd/even pages?
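
One possible approach is to decide up front which pages receive a number and then stamp only those pages. A minimal sketch of the selection step follows; numbering_plan is a hypothetical helper, page indexes are zero-based, and the stamping itself is left to whatever call is used to write text on a page.

```ruby
# Build a map of page index => printed number, leaving out the indexes in
# `skip`. Skipped pages do not consume a number.
def numbering_plan(page_count, skip: [], start_at: 1)
  number = start_at
  (0...page_count).each_with_object({}) do |index, plan|
    next if skip.include?(index)
    plan[index] = number
    number += 1
  end
end

# Assumed usage: skip the cover and the last page of a 5-page file.
#   numbering_plan(5, skip: [0, 4])
# Odd or even pages can be skipped by passing e.g. (0...5).select(&:odd?).
```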

Combined doc starting with a top margin

I am writing to one file and then combining it with another. The blank file I write on has the correct coordinates and the text lands in the correct place, but when it is combined with the original file a top margin seems to be added (see screenshot below).

Can I remove the top margin when combining? Or any ideas why this happens?