Comments (3)

gchehab commented on June 7, 2024

Hi Matthias,

It did the trick: the validation behaved as expected. The first two were considered valid and the last one invalid.

Regarding the PDF standard's 'openness', I know what you mean. I made a wrapper to enable multithreaded/multiserver batch OCR scanning (https://github.com/gchehab/ocr-server), and the many different ways an embedded image may be encoded almost drove me nuts. I am quite sure there are a lot of cases I failed to address.

I just got my hands on another PDF, one a colleague is working on, that raises a type error while being parsed during validation. However, I am not sure whether the issue is an error in the PDF file's signature itself. I'll open a separate issue for this error.

Thank you for your quick support,
Guilherme

MatthiasValvekens commented on June 7, 2024

Hi, thanks for the report!

The first issue has to do with incremental update validation in pyHanko, and can be solved by whitelisting one more key. Someone else had the same issue; I'll do a bugfix release to correct that particular problem soon. In the meantime, you can work around the problem by defining your own diff policy.

For reference, here's the declaration of the default diff analysis policy:

DEFAULT_DIFF_POLICY = StandardDiffPolicy(
    ...
)
You can define a custom diff policy to get around this kind of thing, and pass it as the diff_policy parameter to the various validation methods.

Start by copying the default rules, since you probably want to preserve most of those. The FormUpdatingRule class actually takes another parameter called ignored_acroform_keys, where you can pass in keys that are to be ignored in the "strict" part of the comparison analysis. Passing in {'/Fields', '/DA', '/DR'} should be OK.

Alternatively, you can cheat and add '/DA' to the list of keys in pyhanko.sign.diff_analysis.ACROFORM_EXEMPT_STRICT_COMPARISON at runtime. Then it will work automagically with the default diff policy.
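A toy model may help show why widening the ignored-key set makes the strict check pass. This is not pyHanko's implementation. Everything in it is hypothetical:

* `strict_acroform_diff`.
* The plain-dict stand-in for the AcroForm dictionary.
* The module-level `EXEMPT_KEYS` default. Its contents here are assumed.

It only mirrors the roles of `ignored_acroform_keys` and `ACROFORM_EXEMPT_STRICT_COMPARISON` described above.

```python
from typing import AbstractSet, Any, Mapping, Optional, Set

# Hypothetical stand-in for ACROFORM_EXEMPT_STRICT_COMPARISON;
# the contents are an assumption made for this sketch.
EXEMPT_KEYS: Set[str] = {"/Fields", "/DR"}


def strict_acroform_diff(
    old: Mapping[str, Any],
    new: Mapping[str, Any],
    ignored_keys: Optional[AbstractSet[str]] = None,
) -> Set[str]:
    """Return the keys whose values changed, skipping ignored keys.

    Toy model only: real diff analysis walks PDF object graphs.
    Without an explicit set, the module-level default is read at
    call time, so mutating EXEMPT_KEYS changes the outcome.
    """
    skip = EXEMPT_KEYS if ignored_keys is None else ignored_keys
    return {
        k
        for k in set(old) | set(new)
        if k not in skip and old.get(k) != new.get(k)
    }


before = {"/Fields": [1], "/DA": "/Helv 0 Tf 0 g"}
after = {"/Fields": [1, 2], "/DA": "/Helv 12 Tf 0 g", "/DR": {"/Font": {}}}

print(strict_acroform_diff(before, after))  # {'/DA'}
# Custom-policy route: pass an explicit, wider set of ignored keys.
print(strict_acroform_diff(before, after, {"/Fields", "/DA", "/DR"}))  # set()

# Runtime "cheat": widen the shared default instead.
EXEMPT_KEYS.add("/DA")
print(strict_acroform_diff(before, after))  # set()
```

Passing an explicit set corresponds to the custom-policy route. Mutating the shared default corresponds to the runtime cheat, which only works if the default is looked up at comparison time, as it is in this sketch.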

The second issue is indeed certvalidator-related. It tries to fetch some related certs from a CRL AIA record, but the server isn't setting the Content-Type header. That's a bit annoying, because there are multiple ways to encode certs in a situation like this, and ordinarily you'd rely on the Content-Type header to select the correct parser. I suspect that this particular fetch wasn't working all along, and that my move from urllib to requests in my fork of certvalidator simply broke the error handling code. I'd guess that the certs certvalidator was trying to fetch were already available in the local cache anyway.
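The parser-selection dilemma can be sketched roughly as follows. This is not certvalidator's code. The function `pick_cert_parser` and its string labels are hypothetical, and the byte-sniffing fallback is only a heuristic.

The media types checked are the ones commonly used for two cases:

* a single DER certificate (`application/pkix-cert`);
* a certs-only PKCS#7/CMS bundle (`application/pkcs7-mime`).

```python
from typing import Optional

# OBJECT IDENTIFIER tag (0x06), length (0x09) and the DER body of
# OID 1.2.840.113549.1.7.2 (PKCS#7 signedData).
SIGNED_DATA_OID = bytes.fromhex("06092a864886f70d010702")


def pick_cert_parser(body: bytes, content_type: Optional[str]) -> str:
    """Guess how a fetched certificate payload is encoded.

    Hypothetical helper: returns a label, not parsed certificates.
    """
    if content_type:
        # Drop parameters such as "; smime-type=certs-only", normalise case.
        ctype = content_type.split(";", 1)[0].strip().lower()
        if ctype == "application/pkix-cert":
            return "der-cert"
        if ctype in ("application/pkcs7-mime", "application/x-pkcs7-certificates"):
            return "pkcs7"
    # No usable Content-Type header: fall back to sniffing the bytes.
    if body.lstrip().startswith(b"-----BEGIN"):
        return "pem"
    if body[:1] == b"\x30":  # ASN.1 SEQUENCE, i.e. some DER structure
        # A PKCS#7 ContentInfo carries the signedData OID right after
        # the outer SEQUENCE header; a bare certificate does not.
        if SIGNED_DATA_OID in body[:32]:
            return "pkcs7"
        return "der-cert"
    return "unknown"


print(pick_cert_parser(b"-----BEGIN CERTIFICATE-----\n", None))  # pem
```

Sniffing like this is fragile, which is why the server's Content-Type header is the preferable signal when it is present.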

I'll look into this one more closely, and do a bugfix release if necessary. Thanks for bringing it to my attention!

MatthiasValvekens commented on June 7, 2024

Hi, can you reproduce the problem with the latest HEAD for pyHanko and pyhanko-certvalidator? Your file passes validation on my machine now (well, the last signer's certificate doesn't jibe with pyHanko's default key usage policy, but you can easily change that in the configuration file).
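As a rough illustration, a key usage policy check works like this. The policy names the key usage bits a signer's certificate must carry, and validation fails if the certificate lacks them.

Several names here are hypothetical:

* `key_usage_ok`
* its `match_all` flag
* the required set used below

None of these reflect pyHanko's actual defaults or configuration options. The bit names are RFC 5280 keyUsage bits written in snake_case.

```python
from typing import AbstractSet


def key_usage_ok(
    cert_key_usage: AbstractSet[str],
    required: AbstractSet[str],
    match_all: bool = True,
) -> bool:
    """Check a certificate's key usage bits against a policy.

    Hypothetical helper: with match_all=True every required bit must be
    present; otherwise a single matching bit is enough.
    """
    if match_all:
        return required <= cert_key_usage
    return bool(required & cert_key_usage)


# Assumed policy for illustration only, not pyHanko's real default.
policy = {"non_repudiation"}
print(key_usage_ok({"digital_signature"}, policy))  # False
print(key_usage_ok({"digital_signature", "non_repudiation"}, policy))  # True
```

In pyHanko itself the comment above says this is adjusted in the configuration file. The actual option names should be taken from pyHanko's documentation rather than from this sketch.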

If this fixes your issue, then I'll do a bugfix release for pyhanko-certvalidator, update the dependency in pyHanko itself, and do a bugfix release here as well. :)

PS: If you face similar diff analysis issues in the future, please report them if you can. This aspect of PDF signature validation isn't standardised anywhere (yet), so I'm always on the lookout for interesting corner cases.
