
danielfett avatar danielfett commented on June 15, 2024

FYI: Remote signing with pyHanko has been integrated into a library for the yes® banking ecosystem: https://github.com/yescom/pyyes

Thanks again for the support!

from pyhanko.

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Hi Daniel,

Thank you for taking interest in pyHanko! Your use case is an interesting one; making allowances for remote signing crossed my mind before, but I didn't consider the possibility of the signer's cert not being available to the Signer class. Currently, Signer only delegates the actual "raw" signing operation to subclasses, but constructs the CMS object by itself.

Let me take another look at how essential that assumption currently is. Probably pyHanko only uses the cert to build the CMS object, but if your interface spits out entire CMS objects that shouldn't be a problem. I'll get back to you as soon as I know more.

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Hi Daniel,

I've given it some thought, and I think I can factor out the bits of PdfSigner that perform the following tasks:

  • setting up a form field and a signature placeholder,
  • (optionally) providing a visual appearance for the signature,
  • computing the document hash,
  • passing said hash to an external CMS provider,
  • and finally embedding the resulting CMS into the document.

PdfSigner currently performs a number of intermediate operations between these steps, some of which do require access to the signer's certificate. This means that (part of) the bring-your-own-CMS implementation will probably end up being structured like a coroutine, where control is yielded to the caller every so often. I'm still ironing out the details on how I'm going to do that, but the refactor looks doable.
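To illustrate the coroutine-style control flow described above, here is a toy sketch using a plain Python generator. It is not pyHanko code: `embed_cms` and `fake_remote_signer` are hypothetical names, and the "document" is just a byte string. The point is only to show how a generator can hand a digest to the caller, pause, and resume once the caller sends back the externally produced CMS bytes.

```python
import hashlib


def embed_cms(document: bytes, bytes_reserved: int = 64):
    """Toy generator (hypothetical): yields a digest, then waits for CMS bytes."""
    # Step 1: hash the document and hand the digest to the caller.
    digest = hashlib.sha256(document).digest()
    # Execution pauses here until the caller calls .send(cms_bytes).
    cms_bytes = yield digest
    if len(cms_bytes) > bytes_reserved:
        raise ValueError("CMS object does not fit into the reserved space")
    # Step 2: "embed" the CMS by appending it, zero-padded to the reserved size.
    yield document + cms_bytes.ljust(bytes_reserved, b"\x00")


def fake_remote_signer(digest: bytes) -> bytes:
    """Stand-in for an external CMS provider; returns dummy bytes, not real CMS."""
    return b"CMS:" + digest


writer = embed_cms(b"%PDF-toy")
digest = next(writer)                      # run until the first yield
signed = writer.send(fake_remote_signer(digest))  # resume with the "CMS"
```

Between `next()` and `send()` the caller is free to do anything it likes with the digest, including a round trip to a remote service.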

In this case, you wouldn't have to implement Signer, but rather an interface that supplies CMS objects given some hash, which seems to be what you want, if I'm interpreting your use case correctly.
That being said, you will have to get pyHanko to agree with the external CMS provider on what digest algorithm to use to hash the document. This bit of info has to be attached to the corresponding signerInfo entry in the CMS object, since it's necessary for the validator to know which hashing algorithm to use (see e.g. here).

You would also have to be able to estimate the size of the CMS object ahead of time. My own implementation does that by creating a "dummy" CMS object first, but obviously that doesn't make much sense if the final CMS object is generated remotely.
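As a rough illustration of the size constraint: the CMS object ends up as a hexadecimal string in the PDF, so every DER byte costs two characters of placeholder space. The sketch below assumes the reservation is counted in hexadecimal characters (an assumption; check how your pyHanko version interprets its size parameter). The helper names are hypothetical and not part of any library.

```python
def hex_length(cms_der: bytes) -> int:
    """Number of hex characters needed to write the DER bytes as a hex string."""
    return 2 * len(cms_der)


def pad_to_placeholder(cms_der: bytes, reserved_hex_chars: int) -> str:
    """Hex-encode the CMS and right-pad it with zeros to fill the placeholder.

    Assumption: reserved_hex_chars counts hex characters, not raw bytes.
    """
    needed = hex_length(cms_der)
    if needed > reserved_hex_chars:
        raise ValueError(
            f"CMS needs {needed} hex characters, "
            f"but only {reserved_hex_chars} were reserved"
        )
    return cms_der.hex().ljust(reserved_hex_chars, "0")


# A 5-byte dummy DER blob in a 16-character placeholder.
padded = pad_to_placeholder(b"\x30\x03\x02\x01\x00", 16)
```

If the remote service can return signatures of varying size (for example because of embedded timestamps), it is safer to over-reserve than to guess tightly.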

Are these constraints that you can live with? :)

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

The refactor was easier than I expected. Here's some code from the test suite that demonstrates the way the new "CMS-agnostic" API would work. As you can see, it's based on (generator) coroutines, and heavily uses yield/send to manage control flow.

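    # CMS-agnostic signing example
    #
    # write an in-place certification signature using the PdfCMSEmbedder
    # low-level API directly.

    from datetime import datetime
    from io import BytesIO

    import tzlocal
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
    from pyhanko.sign import fields, signers

    # MINIMAL (raw PDF bytes) and FROM_CA (a SimpleSigner) are fixtures
    # from the pyHanko test suite.
    input_buf = BytesIO(MINIMAL)
    w = IncrementalPdfFileWriter(input_buf)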

    # Phase 1: coroutine sets up the form field
    cms_writer = signers.PdfCMSEmbedder().write_cms(
        field_name='Signature', writer=w
    )
    sig_field_ref = next(cms_writer)

    # just for kicks, let's check
    assert sig_field_ref.get_object()['/T'] == 'Signature'

    # Phase 2: make a placeholder signature object,
    # wrap it up together with the MDP config we want, and send that
    # on to cms_writer
    timestamp = datetime.now(tz=tzlocal.get_localzone())
    sig_obj = signers.SignatureObject(timestamp=timestamp, bytes_reserved=8192)

    md_algorithm = 'sha256'
    cms_writer.send(
        signers.SigObjSetup(
            sig_placeholder=sig_obj,
            mdp_setup=signers.SigMDPSetup(
                md_algorithm=md_algorithm, certify=True,
                docmdp_perms=fields.MDPPerm.NO_CHANGES
            )
        )
    )

    # Phase 3: write & hash the document (with placeholder)
    document_hash = cms_writer.send(
        signers.SigIOSetup(md_algorithm=md_algorithm, in_place=True)
    )

    # Phase 4: construct CMS signature object, and pass it on to cms_writer

    # NOTE: I'm using a regular SimpleSigner here, but you can substitute
    # whatever CMS supplier you want.

    signer: signers.SimpleSigner = FROM_CA
    # let's supply the CMS object as a raw bytestring
    cms_bytes = signer.sign(
        data_digest=document_hash, digest_algorithm=md_algorithm,
        timestamp=timestamp
    ).dump()
    output, sig_contents = cms_writer.send(cms_bytes)

It's necessarily a bit more low-level and verbose than the usual signing API, and it lacks some of the more advanced PDF-specific features (mainly seed value checking). That said, the refactored PdfSigner class uses this exact API under the hood now. It should therefore be fairly robust, but the new API also takes basically all input parameters at face value (AKA garbage in -> garbage out), so be careful. :)

Would that serve your needs? If so, I'll merge the refactor commit into the main branch.

EDIT about where to go from here: I'm not sure if the remote signing API you have in mind follows a particular standard, but as long as it's clear to what service it caters, I'd be happy to accept a pull request!

Putting the new functionality in a separate module under pyhanko.sign seems reasonable then, especially since you won't be plugging into the PdfSigner class or the Signer hierarchy directly---the new API necessarily operates at a lower abstraction level.

danielfett avatar danielfett commented on June 15, 2024

That was quick! Looks good to me at first glance. I'm not able to supply a timestamp to the remote signing service, and I'll need to save the document to some temporary place, make some HTTP requests, and then resume the signing process. I'll try to figure out how to do that with the proposed solution.

The remote signing service implements the signDoc interface described in Section 8.2.1.1 of ETSI Standard ETSI TS 119 432. I therefore suppose that it will be interesting for others as well.

More precisely, a request as follows is sent to the remote signing service:

POST 5634/csc/v1/signatures/signDoc
Content-Type: application/json
host: qtsp.com

{
   "credentialID":"qes_eidas",
   "SAD":"eyJraWQiOiJDWHVwIiwiYWxnI...",
   "documentDigests":{
      "hashes":[
         "sTOgwOm+474gFj0q0x1iSNspKqbcse4IeiqlDg/HWuI=",
         "HZQzZmMAIWekfGH0/ZKW1nsdt0xg3H6bZYztgsMTLw0="
      ],
      "hashAlgorithmOID":"2.16.840.1.101.3.4.2.1"
   },
   "profile":"http://uri.etsi.org/19432/v1.1.1#/creationprofile#",
   "signature_format":"P",
   "conformance_level":"AdES-B-T"
}

And the response looks as follows:

HTTP/1.1 200 OK
{
   "SignatureObject":[
      "KedJuTob5gtvYx9qM3k3gm7kbLBwV…bEQRl26S2tmXjqNND7MRGtoew==",
      "AedJuTob5gtvYx9qM3k3gm7kbLBwV…bEQRl26S2tmXjqNND7MRGtoes=="
   ],
   "revocationInfo":{
      "ocsp":[
         "MIIJg...jSc="
      ],
      "crl":[
         "MIIC4...X7M="
      ]
   }
}

The objects in these examples are shortened; I can provide a full response upon request. The examples are taken from the yes® Signature Service which I'm testing with Python.
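A minimal sketch of how such a request body could be assembled from a document digest, and how the CMS objects could be pulled out of the response. The field names and fixed values are copied from the examples above; the function names, the `HASH_OIDS` table and the placeholder SAD value are assumptions made for illustration, and no HTTP call is performed here.

```python
import base64
import hashlib
import json
from typing import Dict, List

# Standard NIST OIDs for the SHA-2 family.
HASH_OIDS: Dict[str, str] = {
    "sha256": "2.16.840.1.101.3.4.2.1",
    "sha384": "2.16.840.1.101.3.4.2.2",
    "sha512": "2.16.840.1.101.3.4.2.3",
}


def build_signdoc_body(digests: List[bytes], md_algorithm: str,
                       credential_id: str, sad: str) -> dict:
    """Build a signDoc JSON body shaped like the request example above."""
    return {
        "credentialID": credential_id,
        "SAD": sad,
        "documentDigests": {
            # Digests travel as standard base64 strings.
            "hashes": [base64.b64encode(d).decode("ascii") for d in digests],
            "hashAlgorithmOID": HASH_OIDS[md_algorithm],
        },
        "profile": "http://uri.etsi.org/19432/v1.1.1#/creationprofile#",
        "signature_format": "P",
        "conformance_level": "AdES-B-T",
    }


def extract_cms_objects(response: dict) -> List[bytes]:
    """Decode the base64 CMS objects from a response shaped like the one above."""
    return [base64.b64decode(s) for s in response["SignatureObject"]]


digest = hashlib.sha256(b"example document bytes").digest()
body = build_signdoc_body([digest], "sha256", "qes_eidas", "<SAD token>")
print(json.dumps(body, indent=3))
```

The digest algorithm name used here has to match the one pyHanko used to hash the document, since the OID sent to the service is derived from it.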

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Interesting! I wasn't aware of that standard, actually, so I'll give it a read once I have some time.

I realised that I didn't tell you on which branch the changes live, so here's a link: https://github.com/MatthiasValvekens/pyHanko/compare/feature/cms-agnostic-sign.
If you're happy with this API, I can merge it into master too (just not right now), if that's more convenient. :)

Concerning timestamps, the timestamp that you pass to the signature placeholder is intended to be stored in the PDF file itself, not the CMS. It's an optional entry in the (PDF) signature object, and not intended to be an authoritative record of the signing time anyhow (but PAdES allows it). I should probably tweak the type hints in the API to make that more obvious, thanks.

danielfett avatar danielfett commented on June 15, 2024

That seems to work perfectly. I'll prepare a full code example for publication. A signed PDF is attached (from a sandbox CA environment). If you're interested in getting access to this remote signing service sandbox, I can set that up for you. Thanks again for the modifications!
test-out.pdf

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

That's very good to hear! I don't need access to the sandbox right now, but that'd certainly be useful to review a (potential) pull request later, in case you're planning to submit one.

danielfett avatar danielfett commented on June 15, 2024

Quick update on the info above: It seems that I need to embed the revocation information as well. I'm trying to figure out how to do that.

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

The PDF standard gives you two choices. You can embed them into the signature CMS object directly as an "Adobe-style" revocation information attribute (OID 1.2.840.113583.1.1.8). Unfortunately this attribute must be a signed attribute, which is disqualifying in your use case, I presume.

The alternative (more modern) way is to embed the revocation info into the document security store (DSS) in an (unsigned!) incremental update after signing. The part of PdfSigner that does that is the following:

        # [...snip]
        output, sig_contents = cms_writer.send(timestamp_cms)

        # update the DSS
        from pyhanko.sign import validation
        validation.DocumentSecurityStore.add_dss(
            output_stream=output, sig_contents=sig_contents,
            paths=validation_paths, validation_context=validation_context
        )

I concede that this API isn't terribly convenient if all you have is the raw revocation data (the API was built to consume output from certvalidator), but it should be workable: you can instantiate a certvalidator.ValidationContext manually, using the revocation data received from the signing service & any intermediate certs that might be relevant. As far as validation paths go: passing in an empty list probably won't cause any trouble (since the relevant certificates should be available in the CMS anyway).

I could tweak this a little to make it easier to work with given raw revocation info (wouldn't be terribly hard), I'll see if I can get to it tonight. :)

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Update: I tweaked the internal DSS API a bit in commit ff1a225. On the current master branch, add_dss now also accepts certs, ocsps and crls as (optional) keyword args, so you don't have to go through the trouble to present your revocation info in certvalidator-compatible format.

Note: the CMS objects are expected in their asn1crypto "parsed" form, though, not as bytes objects. To parse an OCSP response, for example, you'd call OCSPResponse.load(...) on the DER-encoded bytes representation you got from the signing service.
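A sketch of that conversion, assuming the revocation data arrives as base64 strings under "ocsp" and "crl" keys as in the response example earlier in this thread. The function names are hypothetical. The asn1crypto import is kept inside the parsing function so that the base64 step can be run on its own; `asn1crypto.ocsp.OCSPResponse` and `asn1crypto.crl.CertificateList` are the relevant asn1crypto classes.

```python
import base64
from typing import List, Tuple


def decode_revocation_info(revocation_info: dict) -> Tuple[List[bytes], List[bytes]]:
    """Turn base64 strings into DER bytes: (ocsp_ders, crl_ders)."""
    ocsp_ders = [base64.b64decode(s) for s in revocation_info.get("ocsp", [])]
    crl_ders = [base64.b64decode(s) for s in revocation_info.get("crl", [])]
    return ocsp_ders, crl_ders


def parse_revocation_info(ocsp_ders: List[bytes], crl_ders: List[bytes]):
    """Parse DER bytes into asn1crypto objects (requires asn1crypto)."""
    from asn1crypto import crl, ocsp  # third-party dependency

    ocsps = [ocsp.OCSPResponse.load(der) for der in ocsp_ders]
    crls = [crl.CertificateList.load(der) for der in crl_ders]
    return ocsps, crls


# Dummy payload; a real response would carry actual DER-encoded structures.
ocsp_ders, crl_ders = decode_revocation_info({"ocsp": ["AAEC"], "crl": []})
```

The two lists returned by `parse_revocation_info` are what would then be passed as the ocsps and crls keyword arguments mentioned above.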

EDIT: oh, if you're embedding the revocation info into the DSS, you'll also want to pass SigSeedSubFilter.PADES in the subfilter parameter when constructing the signature object, to signal that you're using a PAdES-style signature.

danielfett avatar danielfett commented on June 15, 2024

Thanks for the tweaks. I currently do

    yesresponse = signer.sign(
        data_digest=document_hash, digest_algorithm=md_algorithm, timestamp=timestamp
    )
    cms_bytes = yesresponse.dump_cms()
    output, sig_contents = cms_writer.send(cms_bytes)

    validation.DocumentSecurityStore.add_dss(
        output_stream=output,
        sig_contents=sig_contents,
        certs=[],
        ocsps=yesresponse.ocsps_parsed(),
        crls=yesresponse.crls_parsed(),
    )

    Path("test-out.pdf").write_bytes(output.read())

But I end up with an empty output stream.

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Ah, that's probably because the stream pointer is still at the end of the file. Inserting output.seek(0) before the read() call would resolve that.
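The effect is easy to reproduce with a plain in-memory stream. This sketch uses `io.BytesIO` as a stand-in for the output stream (an assumption; the actual stream type depends on how the writer was set up):

```python
from io import BytesIO

output = BytesIO()
output.write(b"%PDF-1.7 signed bytes")

# After writing, the stream position sits at the end, so read() finds nothing.
empty = output.read()

# Rewind to the start before reading the whole stream back.
output.seek(0)
data = output.read()
```

For a `BytesIO` specifically, `output.getvalue()` returns the full contents regardless of the current position.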

danielfett avatar danielfett commented on June 15, 2024

That was it, excellent!

stale avatar stale commented on June 15, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions!

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Great! Thanks for the update. :)

Just a heads-up, in case you haven't pinned your pyHanko version, or are planning to upgrade in the near future: the PdfCMSEmbedder API you're using will change very slightly in the upcoming 0.7.0 release (due later this month). I've updated the relevant example in the docs, see here. The change will obviously also be listed in the release notes, but I thought I'd mention it here as well.

The modifications required on your end should be minimal, but it's still a breaking change.

The reason for the change is to make it easier to interrupt the signing process and resume it later (possibly in a different process, or even on a different machine entirely). This is relevant in some remote signing workflows that rely on callbacks through webhooks and the like. Not sure if that applies to your use case, but in case it matters: please feel free to ask for further info. Such questions are also useful for me to know what to focus on when writing the documentation. :)

danielfett avatar danielfett commented on June 15, 2024

That is a great improvement, looking forward to adapting my code to it!

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Hi Daniel,

FYI: pyHanko 0.7.0 was just released on PyPI. I noticed that your setup.py doesn't pin an exact pyHanko version, which will probably cause problems for people trying to install it, with the breaking change in the new version that I mentioned in my previous comment. :)

I recommend adding an ==0.6.1 constraint to your dependency list, that would give you time to update. Actually, even after updating, it would probably be a good idea to use pinned pyHanko versions until we hit 1.0.0, just to be on the safe side.
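As a sketch, such a pin could look like this in a setup.py. This is a hypothetical excerpt, not the actual pyyes file; the variable name is an assumption.

```python
# setup.py (hypothetical excerpt)
INSTALL_REQUIRES = [
    # Exact pin: the PdfCMSEmbedder API changes in pyHanko 0.7.0.
    "pyHanko==0.6.1",
]

# The list would then be passed on to setuptools:
# setup(..., install_requires=INSTALL_REQUIRES)
```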

danielfett avatar danielfett commented on June 15, 2024

Hi @MatthiasValvekens,

I have updated pyyes and my signatures are validating correctly. However, I'm not sure if my current use of your library is correct, mostly because I'm using the signature data twice. May I ask you to give this a quick look?

https://github.com/yescom/pyyes/blob/16bd4b9165980a3c0a6ed3d9ae0f476697038534/yes/documents.py#L234-L249

Again, if there's anything I can do for you - in particular providing a test environment for remote signatures - please let me know.

Thanks a lot,
Daniel

MatthiasValvekens avatar MatthiasValvekens commented on June 15, 2024

Hi Daniel,

Thanks for checking in!

First, the fact that signature_bytes is used twice isn't that weird by itself: it's used both to embed the actual signature, and to compute a hash to key an entry in the VRI section of the Document Security Store. Having said that, pretty much no-one bothers with VRI these days. It was intended as a mechanism to optimise validation, but it ended up largely unused, and is pretty much deprecated in recent PAdES if I recall correctly. VRI generation is also togglable since 0.8.0. Anyway, that's just a minor background detail.
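For background, a sketch of how such a VRI key can be derived. In PAdES, the key of a VRI entry is the uppercase hexadecimal SHA-1 digest of the signature contents; the function name below is hypothetical, and the input is a dummy byte string rather than a real signature.

```python
import hashlib


def vri_key(sig_contents: bytes) -> str:
    """Uppercase hex SHA-1 digest of the signature contents, used as the VRI key."""
    return hashlib.sha1(sig_contents).hexdigest().upper()


# Dummy input; in practice this would be the bytes of the embedded signature.
key = vri_key(b"abc")
```

SHA-1 serves only as an identifier here, not as a security mechanism.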

However, if you do things right, finish_signing() already calls add_dss() for you in the background, provided that you set up the signing process to do PAdES-style signing. This wasn't the case back when we had our first conversation on this topic. Actually, the way you're currently doing things (calling add_dss() before finish_signing()) kind of only works by accident. The resulting document is likely correct right now, but the way you're calling the API is sort of fragile.

Ordinarily, I would recommend letting pyHanko do the revocation info bookkeeping entirely on its own, but from what I remember, you don't have access to the certificate & revocation information until after submitting a hash to the server, right?
Either way, I would at least swap the add_dss() and finish_signing() calls. If you call add_dss() yourself, there's no need to keep tabs on the value for post_sign_instr you got from the API earlier. You also don't need to pass in a validation context to finish_signing(), that only matters for document timestamps.

As an alternative to calling add_dss() manually, you may want to instantiate your own value for post_sign_instr in the call to finish_signing() instead, in order to get something slightly more declarative. See here for further documentation on how to set that up. Essentially, if all you need is PAdES-B-LT (without the extra document timestamp) then you only need to supply a value for the validation_info field. That one also has lots of subfields, but in your case, passing in the CRL and OCSP values is probably enough (you can probably even leave signer_path empty).

Hope that helps, and if anything else is unclear: let me know! These low(ish)-level APIs are tricky to document holistically, since they have to accommodate so many different workflows...
