Hmmm, this is a tough one.
First, some theoretical analysis/rambling: I guess Okular and evince use the same PDF writing backend, then. I tried filling out the form with Okular, and the revision made by Okular overrides the form field's appearance stream while also incrementing the generation number. This is a misinterpretation of what generation numbers are for: when overriding an existing object, you can simply clobber the previous incarnation without bumping the generation number. In fact, Okular does exactly that when overriding the form field itself. You only need to start a new generation if you're reusing the object ID of a previously freed object. However, since object IDs are not in short supply, pretty much no one does that these days.
PyHanko's cross-reference table parsing logic (which was written with forensic analysis on signed files in mind) treats this as an error because the spec doesn't tell you what to do when there are multiple "currently active"/unfreed generations with the same object ID in a file, and different implementations do different things there. This can be abused to craft a PDF file that renders differently with different viewers, which I'm sure you'll agree is a problem when dealing with signed files ;).
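To make the ambiguity concrete, here is a deliberately simplified model of cross-reference resolution. It is not pyHanko's actual parser: `XrefEntry`, `live_generations`, `resolve_xref` and the `strict` flag are hypothetical names made up for illustration. It applies xref sections oldest-first and tracks which generations of each object ID are still live. Rewriting the same (ID, generation) pair just clobbers the old version, freeing the ID ends all earlier incarnations, but bumping the generation without a free leaves two live objects behind, which is the Okular case. The lenient mode mimics the kind of opt-in flag discussed further down, and it has to pick a winner arbitrarily.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class XrefEntry:
    obj_id: int
    gen: int
    in_use: bool  # True for an 'n' entry, False for an 'f' (free) entry


class AmbiguousObjectError(ValueError):
    """Raised when one object ID has several live generations."""


def live_generations(sections: Iterable[List[XrefEntry]]) -> Dict[int, Set[int]]:
    """Apply xref sections oldest-first; return the live generations per object ID."""
    live: Dict[int, Set[int]] = {}
    for section in sections:
        for entry in section:
            gens = live.setdefault(entry.obj_id, set())
            if entry.in_use:
                # Re-adding the same (id, gen) clobbers the earlier version;
                # a new gen without an intervening free adds a second live object.
                gens.add(entry.gen)
            else:
                # Freeing the ID ends every earlier incarnation.
                gens.clear()
    return live


def resolve_xref(sections: Iterable[List[XrefEntry]], strict: bool = True) -> Dict[int, int]:
    """Map each object ID to the generation a reader should use."""
    resolved: Dict[int, int] = {}
    for obj_id, gens in live_generations(sections).items():
        if not gens:
            continue  # freed and never reused
        if len(gens) > 1 and strict:
            raise AmbiguousObjectError(
                f"object {obj_id} has live generations {sorted(gens)}"
            )
        # Lenient mode: arbitrarily prefer the newest generation.
        resolved[obj_id] = max(gens)
    return resolved
```

With `sections = [[XrefEntry(12, 0, True)], [XrefEntry(12, 1, True)]]` (the pattern described above), the strict mode raises, while `strict=False` silently picks generation 1. A different viewer could just as well pick generation 0, which is exactly why this cannot be the default.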
Anyway, the whole object freeing/generation number thing is confusingly specified in general, so I concede that my own interpretation may very well be wrong. The issues with differing interpretations "in the wild" are real, though.
TL;DR: The form file looks fine. This is probably a bug in Okular/evince/...'s form filling logic (which is an interesting data point in itself).
So, what can we do about that? Whether anything needs to be done depends on what you want to achieve:
- PyHanko never writes nonzero generation numbers and never frees objects, so any putative form filling implementation in pyHanko would not be exposed to this class of bugs. If you were intending to use pyHanko if/when it gains generic form filling capabilities, this won't be an issue. :)
- If you need a way to test signing behaviour with filled forms in the meantime, there are other viewers out there that support form filling: you could use Adobe Reader (or Acrobat, obviously), Foxit's viewer, or even your browser's built-in PDF viewer (Firefox, Chrome and their derivatives should be OK).
- I could also try to figure out whether it's possible to add a flag to the xref parser to ignore this sort of quirk, but I can't say off-hand whether that will be easy. It would also have to be disabled by default, for the reasons that I mentioned above.
Anyway, that turned out quite a bit longer than I anticipated... Thanks for the test files, and let me know what you think.
Thank you for the explanation.
"I could also try to figure out whether it's possible to add a flag to the xref parser to ignore this sort of quirk, but I can't say off-hand whether that will be easy. It would also have to be disabled by default, for the reasons that I mentioned above."
Nah, don't worry about it. Indeed, I was able to sign the filled form using Adobe Reader. Interestingly enough, Chrome fails in the same way as Okular, whereas Firefox works as expected.
Nonetheless, I couldn't get the expected results when setting docmdp_permissions=MDPPerm.NO_CHANGES on the second signature. Adobe Reader still allows changes to the form; please see example2.zip for a working example. Here is what Adobe Reader shows, prompting me to fill in the (already filled) form:
Am I missing something on pyHanko's side, or is my expectation that the reader won't allow any further changes wrong?
That's a valid question. The logic for DocMDP policies on an approval signature (i.e. a signature that's not a certification signature) is a bit wonky, and in the case of NO_CHANGES, I should probably tighten up this piece of fallback logic that's used when there's no field lock dictionary on the field:
pyHanko/pyhanko/sign/signers/pdf_signer.py, lines 1288 to 1295 in 6eef056
In the meantime, you can try creating a signature field with a field lock dictionary that locks everything, and then sign that. Here's the documentation on how to do field locking in pyHanko: https://pyhanko.readthedocs.io/en/latest/lib-guide/sig-fields.html#document-modification-policy-settings. You can combine that with a DocMDP policy setting, which is very common, but it's a PDF 2.0 thing, and it's pretty poorly specified in general (it's a common source of complaints in the industry).
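For intuition, here is a rough, stdlib-only model of how a field lock dictionary constrains later changes once its signature field has been signed. It sketches the PDF-level semantics (an `/Action` of `/All`, `/Include` or `/Exclude`, plus the PDF 2.0 `/P` entry) and is not pyHanko's API: `LockAction`, `FieldLock` and `may_modify` are hypothetical names. In pyHanko itself you would configure the lock through the options described in the documentation linked above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, Optional


class LockAction(Enum):
    ALL = "All"          # lock every field in the document
    INCLUDE = "Include"  # lock only the listed fields
    EXCLUDE = "Exclude"  # lock everything except the listed fields


@dataclass(frozen=True)
class FieldLock:
    action: LockAction
    fields: FrozenSet[str] = frozenset()
    doc_mdp: Optional[int] = None  # PDF 2.0 /P entry (1 = no changes at all)

    def covers(self, field_name: str) -> bool:
        """Does this lock protect the given field?"""
        if self.action is LockAction.ALL:
            return True
        if self.action is LockAction.INCLUDE:
            return field_name in self.fields
        return field_name not in self.fields


def may_modify(field_name: str, active_locks: Iterable[FieldLock]) -> bool:
    """active_locks: lock dictionaries of signature fields that are already signed."""
    for lock in active_locks:
        # A /P of 1 freezes the whole document, regardless of the field list.
        if lock.doc_mdp == 1 or lock.covers(field_name):
            return False
    return True
```

In this model, the "lock everything" setup suggested above corresponds to `FieldLock(LockAction.ALL)`: once that field is signed, `may_modify` returns `False` for every field.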
Also, in case you're not already doing that: in multi-signer workflows, it's good practice to create all visible signature fields before filling them. The rationale behind that is that the initial signer should have the right to know where all the other signatures will go, so they can make sure that the visual appearances of later signatures won't change the meaning of any document content. Not all validators enforce that principle, though. PyHanko's validator only does so by default if a certification signature is present.
So the ideal workflow is more or less this:
- Set up all form fields, with field lock dictionaries for the signature fields if necessary. You can put a DocMDP NO_CHANGES trigger on the field that will be signed last (subject to the caveats from above).
- Optionally, put in a visible or invisible certification signature with DocMDP set to "form filling" (if that makes sense in your context).
- Do form filling operations
- Put in the "regular" signatures
I can't do any real testing right now, but modulo some details I think this should work. :)
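The workflow above can be sketched as a toy state machine. This is a hypothetical, stdlib-only model (`ToyDocument` and its methods are made up, and it is not how pyHanko's validator works). It tracks the strictest DocMDP level in force, treats a lock dictionary's PDF 2.0 `/P` entry as taking effect only once its field is signed, and refuses new fields after a certification signature, mirroring the default behaviour described above. The numeric levels 1, 2 and 3 are the DocMDP `/P` values from the PDF specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# DocMDP /P values as defined in the PDF specification.
NO_CHANGES, FILL_FORMS, ANNOTATE = 1, 2, 3


@dataclass
class ToyDocument:
    """Hypothetical model of a multi-signer workflow (not pyHanko's validator)."""
    field_names: Set[str] = field(default_factory=set)
    signed: List[str] = field(default_factory=list)
    certified: bool = False
    mdp_level: Optional[int] = None  # strictest DocMDP level in force, if any

    def _tighten(self, level: Optional[int]) -> None:
        # Permissions only ever get stricter as signatures accumulate.
        if level is not None:
            self.mdp_level = level if self.mdp_level is None else min(self.mdp_level, level)

    def _check_writable(self, what: str) -> None:
        if self.mdp_level == NO_CHANGES:
            raise PermissionError(f"{what}: document is locked (NO_CHANGES)")

    def add_field(self, name: str) -> None:
        # Once certified, new fields should no longer appear.
        if self.certified:
            raise PermissionError(f"adding {name!r}: fields must exist before certification")
        self._check_writable(f"adding {name!r}")
        self.field_names.add(name)

    def fill(self, name: str) -> None:
        if name not in self.field_names:
            raise KeyError(name)
        self._check_writable(f"filling {name!r}")

    def sign(self, name: str, certify_level: Optional[int] = None,
             lock_level: Optional[int] = None) -> None:
        if name not in self.field_names or name in self.signed:
            raise ValueError(f"{name!r} is not an unsigned signature field")
        if certify_level is not None and self.signed:
            raise ValueError("the certification signature must come first")
        self._check_writable(f"signing {name!r}")
        self.signed.append(name)
        self.certified = self.certified or certify_level is not None
        self._tighten(certify_level)
        # lock_level stands in for the PDF 2.0 /P entry in this field's lock
        # dictionary, which only takes effect once the field is signed.
        self._tighten(lock_level)


# The workflow from the list above.
doc = ToyDocument()
for name in ("Name", "Date", "Cert", "Sig1", "Sig2"):
    doc.add_field(name)
doc.sign("Cert", certify_level=FILL_FORMS)  # optional certification step
doc.fill("Name")
doc.fill("Date")
doc.sign("Sig1")
doc.sign("Sig2", lock_level=NO_CHANGES)  # last signer freezes the document
```

In this model, the NO_CHANGES trigger on the last field is what finally blocks further form filling; a plain approval signature without a lock leaves the level untouched.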