
Comments (3)

MatthiasValvekens commented on June 1, 2024

Hmmm, this is a tough one.

First, some theoretical analysis/rambling: I guess Okular and evince use the same PDF writing backend, then. I tried filling out the form with Okular, and the revision made by Okular overrides the form field's appearance stream while also incrementing the generation number. This is a misinterpretation of what generation numbers are for: when overriding an existing object, you can simply clobber the previous incarnation without bumping the gen number --- in fact Okular does exactly that when overriding the form field itself. You only need to start a new generation if you're reusing the object ID of a previously freed object. However, since object IDs are not in short supply, pretty much no-one does that these days.

PyHanko's cross-reference table parsing logic (which was written with forensic analysis on signed files in mind) treats this as an error because the spec doesn't tell you what to do when there are multiple "currently active"/unfreed generations with the same object ID in a file, and different implementations do different things there. This can be abused to craft a PDF file that renders differently with different viewers, which I'm sure you'll agree is a problem when dealing with signed files ;).
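To make the ambiguity concrete, here is a minimal sketch of that kind of check. It is not pyHanko's actual parser code: `XrefEntry` and `find_ambiguous_objects` are hypothetical names, and the model assumes each revision is simply a list of cross-reference entries, with a free entry retiring all earlier generations of that object ID.

```python
from collections import defaultdict
from typing import List, NamedTuple


class XrefEntry(NamedTuple):
    """Simplified cross-reference entry (hypothetical model, not pyHanko's)."""
    obj_id: int
    generation: int
    in_use: bool  # False models a free ('f') entry


def find_ambiguous_objects(revisions: List[List[XrefEntry]]) -> List[int]:
    """Return object IDs that end up with more than one unfreed generation.

    `revisions` is ordered oldest first; each item holds the entries of one
    cross-reference section.
    """
    live = defaultdict(set)  # obj_id -> generations that were never freed
    for revision in revisions:
        for entry in revision:
            if entry.in_use:
                live[entry.obj_id].add(entry.generation)
            else:
                # Assumption: freeing an ID retires all earlier generations.
                live[entry.obj_id].clear()
    return sorted(obj_id for obj_id, gens in live.items() if len(gens) > 1)


# Overriding object 12 while bumping the generation, without freeing it first
# (the pattern described above for the appearance stream):
bumped = [[XrefEntry(12, 0, True)], [XrefEntry(12, 1, True)]]

# Overriding object 12 in place, keeping generation 0:
clobbered = [[XrefEntry(12, 0, True)], [XrefEntry(12, 0, True)]]

# Freeing object 12 first, then reusing its ID with a new generation:
reused = [
    [XrefEntry(12, 0, True)],
    [XrefEntry(12, 1, False)],
    [XrefEntry(12, 1, True)],
]

ambiguous_bumped = find_ambiguous_objects(bumped)
ambiguous_clobbered = find_ambiguous_objects(clobbered)
ambiguous_reused = find_ambiguous_objects(reused)
```

Only the first case is flagged: generations 0 and 1 of object 12 are both unfreed, so a reader has to pick one, and different readers may pick differently.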

Anyway, the whole object freeing/generation number thing is confusingly specified in general, so I concede that my own interpretation may very well be wrong. The issues with differing interpretations "in the wild" are real, though.

TL;DR: The form file looks fine. This is probably a bug in Okular/evince/...'s form filling logic (which is an interesting data point in itself).


So, what can we do about that? Whether anything needs to be done depends on what you want to achieve:

  • PyHanko never writes nonzero generation numbers and never frees objects, so any putative form filling implementation in pyHanko would not be exposed to this class of bugs. If you were intending to use pyHanko if/when it gains generic form filling capabilities, this won't be an issue. :)
  • If you need a way to test signing behaviour with filled forms in the meantime, there are other viewers out there that support form filling: you could use Adobe Reader (or Acrobat, obviously), Foxit's viewer, or even your browser's built-in PDF viewer (Firefox, Chrome and their derivatives should be OK).
  • I could also try to figure out whether it's possible to add a flag to the xref parser to ignore this sort of quirk, but I can't say off-hand whether that will be easy. It would also have to be disabled by default, for the reasons that I mentioned above.

Anyway, that turned out quite a bit longer than I anticipated... Thanks for the test files, and let me know what you think.

fredericoschardong commented on June 1, 2024

Thank you for the explanation.

> I could also try to figure out whether it's possible to add a flag to the xref parser to ignore this sort of quirks, but I can't say off-hand whether that will be easy. It would also have to be disabled by default, for the reasons that I mentioned above.

Nah, don't worry about it. Indeed, I was able to sign the filled form using Adobe Reader. Interestingly enough, if I use Chrome, it fails in the same manner as Okular, whereas it works as expected with Firefox.

Nonetheless, I couldn't get the expected results when setting the second signature to docmdp_permissions=MDPPerm.NO_CHANGES. Adobe Reader still allows changes to the form; please see example2.zip for a working example. Here is the output in Adobe Reader, asking to fill the (already filled) form:

[screenshot: output]

Am I missing something on PyHanko's side, or am I wrong to expect the reader not to allow any further changes?

MatthiasValvekens commented on June 1, 2024

That's a valid question. The logic for DocMDP policies on an approval signature (i.e. a signature that's not a certification signature) is a bit wonky, and in case of NO_CHANGES, I should probably tighten up this piece of fallback logic that's used when there's no field lock dictionary on the field:

    if not meta_certify and docmdp_perms is not None:
        if lock_dict is None:
            # set a field lock that doesn't do anything
            sig_field['/Lock'] = lock_dict = generic.DictionaryObject({
                pdf_name('/Action'): pdf_name('/Include'),
                pdf_name('/Fields'): generic.ArrayObject()
            })
        lock_dict['/P'] = generic.NumberObject(docmdp_perms.value)

I guess it makes sense to call that a bug.

In the meantime, you can try creating a signature field with a field lock dictionary that locks everything, and then sign that. Here's the documentation on how to do field locking in pyHanko: https://pyhanko.readthedocs.io/en/latest/lib-guide/sig-fields.html#document-modification-policy-settings. You can combine that with a DocMDP policy setting, which is very common, but it's a PDF 2.0 thing, and it's pretty poorly specified in general (it's a common source of complaints in the industry).
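To illustrate the difference between the fallback lock and a lock that covers everything, here is a rough sketch using plain Python dicts as stand-ins for PDF dictionaries. It does not use pyHanko's API (see the linked documentation for that); `make_lock_dict` is a hypothetical helper, and the key names and permission values follow the field lock dictionary as I understand it from the PDF specification.

```python
from typing import Optional, Sequence

# DocMDP permission levels used for the /P entry:
# 1 = no changes, 2 = form filling and signing, 3 = additionally annotations.
NO_CHANGES, FILL_FORMS, ANNOTATE = 1, 2, 3


def make_lock_dict(
    action: str, perm: int, field_names: Optional[Sequence[str]] = None
) -> dict:
    """Build a dict mirroring a signature field's /Lock dictionary.

    Hypothetical helper for illustration only; a real implementation would
    emit PDF objects rather than Python strings and ints.
    """
    if action not in ("/All", "/Include", "/Exclude"):
        raise ValueError(f"unknown lock action: {action}")
    lock = {"/Action": action, "/P": perm}
    if action != "/All":
        # /Fields is only meaningful for /Include and /Exclude.
        lock["/Fields"] = list(field_names or [])
    return lock


# What the fallback above produces: includes no fields, so nothing is locked
# at the field level even though /P says NO_CHANGES.
fallback_lock = make_lock_dict("/Include", NO_CHANGES)

# The suggested workaround: lock every field in the document.
lock_everything = make_lock_dict("/All", NO_CHANGES)
```

With `/Include` and an empty `/Fields` array, the lock restricts no fields at all, which may be why a viewer still offers to fill the form; `/All` leaves no room for that reading.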

Also, in case you're not already doing that: in multi-signer workflows, it's good practice to create all visible signature fields before filling them. The rationale behind that is that the initial signer should have the right to know where all the other signatures will go, so they can make sure that the visual appearances of later signatures won't change the meaning of any document content. Not all validators enforce that principle, though. PyHanko's validator only enforces it by default if a certification signature is present.

So the ideal workflow is more or less this:

  • Set up all form fields, with field lock dictionaries for the signature fields if necessary. You can put a DocMDP NO_CHANGES trigger on the field that will be signed last (subject to the caveats from above)
  • Optionally put in a visible or invisible certification signature with DocMDP set to "form filling" (if that makes sense in your context)
  • Do form filling operations
  • Put in the "regular" signatures

I can't do any real testing right now, but modulo some details I think this should work. :)

