FYI: Remote signing with pyHanko has been integrated into a library for the yes® banking ecosystem: https://github.com/yescom/pyyes
Thanks again for the support!
from pyhanko.
Hi Daniel,
Thank you for taking interest in pyHanko! Your use case is an interesting one; making allowances for remote signing crossed my mind before, but I didn't consider the possibility of the signer's cert not being available to the Signer class. Currently, Signer only delegates the actual "raw" signing operation to subclasses, but constructs the CMS object by itself.
Let me take another look at how essential that assumption currently is. Probably pyHanko only uses the cert to build the CMS object, but if your interface spits out entire CMS objects that shouldn't be a problem. I'll get back to you as soon as I know more.
from pyhanko.
Hi Daniel,
I've given it some thought, and I think I can factor out the bits of PdfSigner that perform the following tasks:
- setting up a form field and a signature placeholder,
- (optionally) providing a visual appearance for the signature,
- computing the document hash,
- passing said hash to an external CMS provider,
- and finally embedding the resulting CMS into the document.
PdfSigner currently performs a number of intermediate operations between these steps, some of which do require access to the signer's certificate. This means that (part of) the bring-your-own-CMS implementation will probably end up being structured like a coroutine, where control is yielded to the caller every so often. I'm still ironing out the details on how I'm going to do that, but the refactor looks doable.
In this case, you wouldn't have to implement Signer, but rather an interface that supplies CMS objects given some hash, which seems to be what you want, if I'm interpreting your use case correctly.
That being said, you will have to get pyHanko to agree with the external CMS provider on what digest algorithm to use to hash the document. This bit of info has to be attached to the corresponding signerInfo entry in the CMS object, since it's necessary for the validator to know which hashing algorithm to use (see e.g. here).
You would also have to be able to estimate the size of the CMS object ahead of time. My own implementation does that by creating a "dummy" CMS object first, but obviously that doesn't make much sense if the final CMS object is generated remotely.
Are these constraints that you can live with? :)
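As a rough illustration of the size constraint: if the remote service returns CMS objects of fairly stable size, one option is to measure a sample response once and reserve space with a safety margin. The sketch below is not pyHanko code; the helper name and the 50% margin are hypothetical. It assumes the placeholder is written as a hexadecimal string, so every DER byte takes two characters (check how your pyHanko version counts bytes_reserved before relying on this).

```python
import math


def estimate_reserved_size(sample_cms_len: int, margin: float = 0.5) -> int:
    """Estimate how many hex characters to reserve for a CMS object.

    sample_cms_len is the DER-encoded length (in bytes) of a CMS object
    previously returned by the signing service; margin is the fraction of
    extra room to leave for variation between responses.
    """
    if sample_cms_len <= 0:
        raise ValueError("sample_cms_len must be positive")
    if margin < 0:
        raise ValueError("margin must not be negative")
    padded_bytes = math.ceil(sample_cms_len * (1 + margin))
    # Two hexadecimal characters are needed per DER byte.
    return 2 * padded_bytes


reserved = estimate_reserved_size(3000)
```

Overestimating only costs a few kilobytes of padding, while underestimating makes the final embedding step fail, so a generous margin is usually the safer choice.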
from pyhanko.
The refactor was easier than I expected. Here's some code from the test suite that demonstrates the way the new "CMS-agnostic" API would work. As you can see, it's based on (generator) coroutines, and heavily uses yield/send to manage control flow.
# CMS-agnostic signing example
#
# write an in-place certification signature using the PdfCMSEmbedder
# low-level API directly.
from datetime import datetime
from io import BytesIO

import tzlocal
from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
from pyhanko.sign import fields, signers

# NOTE: MINIMAL (the bytes of a small PDF file) and FROM_CA (a SimpleSigner)
# are fixtures from the test suite; they are not defined in this snippet.
input_buf = BytesIO(MINIMAL)
w = IncrementalPdfFileWriter(input_buf)

# Phase 1: coroutine sets up the form field
cms_writer = signers.PdfCMSEmbedder().write_cms(
    field_name='Signature', writer=w
)
sig_field_ref = next(cms_writer)
# just for kicks, let's check
assert sig_field_ref.get_object()['/T'] == 'Signature'

# Phase 2: make a placeholder signature object,
# wrap it up together with the MDP config we want, and send that
# on to cms_writer
timestamp = datetime.now(tz=tzlocal.get_localzone())
sig_obj = signers.SignatureObject(timestamp=timestamp, bytes_reserved=8192)
md_algorithm = 'sha256'
cms_writer.send(
    signers.SigObjSetup(
        sig_placeholder=sig_obj,
        mdp_setup=signers.SigMDPSetup(
            md_algorithm=md_algorithm, certify=True,
            docmdp_perms=fields.MDPPerm.NO_CHANGES
        )
    )
)

# Phase 3: write & hash the document (with placeholder)
document_hash = cms_writer.send(
    signers.SigIOSetup(md_algorithm=md_algorithm, in_place=True)
)

# Phase 4: construct CMS signature object, and pass it on to cms_writer
# NOTE: I'm using a regular SimpleSigner here, but you can substitute
# whatever CMS supplier you want.
signer: signers.SimpleSigner = FROM_CA
# let's supply the CMS object as a raw bytestring
cms_bytes = signer.sign(
    data_digest=document_hash, digest_algorithm=md_algorithm,
    timestamp=timestamp
).dump()
output, sig_contents = cms_writer.send(cms_bytes)
It's necessarily a bit more low-level and verbose than the usual signing API, and it lacks some of the more advanced PDF-specific features (mainly seed value checking). That said, the refactored PdfSigner class uses this exact API under the hood now. It should therefore be fairly robust, but the new API also takes basically all input parameters at face value (AKA garbage in -> garbage out), so be careful. :)
Would that serve your needs? If so, I'll merge the refactor commit into the main branch.
EDIT about where to go from here: I'm not sure if the remote signing API you have in mind follows a particular standard, but as long as it's clear to what service it caters, I'd be happy to accept a pull request!
Putting the new functionality in a separate module under pyhanko.sign seems reasonable then, especially since you won't be plugging into the PdfSigner class or the Signer hierarchy directly; the new API necessarily operates at a lower abstraction level.
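For readers less familiar with generator-based coroutines, the yield/send control flow used in the example above can be illustrated with a toy generator. This is only a stand-in using the standard library, not pyHanko's implementation: the phase structure, the names, and the null-byte padding are all hypothetical.

```python
import hashlib


def phased_embedder():
    """Toy multi-phase coroutine that mimics a yield/send handshake."""
    # Phase 1: hand a handle to the caller, receive the reserved size.
    reserved = yield "field-ref"
    # Phase 2: nothing to report yet, receive the data to hash.
    data = yield None
    digest = hashlib.sha256(data).hexdigest()
    # Phase 3: hand the digest to the caller, receive the signature.
    signature = yield digest
    if len(signature) > reserved:
        raise ValueError("signature does not fit in the reserved space")
    # Phase 4: hand back the data followed by the padded signature.
    yield data + signature.ljust(reserved, b"\0")


gen = phased_embedder()
field_ref = next(gen)              # runs up to the first yield
gen.send(16)                       # supplies the reserved size
digest = gen.send(b"document")     # supplies the data, gets the digest back
result = gen.send(b"sig")          # supplies the signature, gets the output
```

Each send() call resumes the generator where it last paused, which is what lets the caller do arbitrary work (such as contacting a remote service) between phases.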
from pyhanko.
That was quick! Looks good to me at first glance. I'm not able to supply a timestamp to the remote signing service, and I'll need to save the document to some temporary place, make some HTTP requests, and then resume the signing process. I'll try to figure out how to do that with the proposed solution.
The remote signing service implements the signDoc interface described in Section 8.2.1.1 of ETSI Standard ETSI TS 119 432. I therefore suppose that it will be interesting for others as well.
More precisely, a request as follows is sent to the remote signing service:
POST 5634/csc/v1/signatures/signDoc
Content-Type: application/json
host: qtsp.com
{
  "credentialID":"qes_eidas",
  "SAD":"eyJraWQiOiJDWHVwIiwiYWxnI...",
  "documentDigests":{
    "hashes":[
      "sTOgwOm+474gFj0q0x1iSNspKqbcse4IeiqlDg/HWuI=",
      "HZQzZmMAIWekfGH0/ZKW1nsdt0xg3H6bZYztgsMTLw0="
    ],
    "hashAlgorithmOID":"2.16.840.1.101.3.4.2.1"
  },
  "profile":"http://uri.etsi.org/19432/v1.1.1#/creationprofile#",
  "signature_format":"P",
  "conformance_level":"AdES-B-T"
}
And the response looks as follows:
HTTP/1.1 200 OK +
{
  "SignatureObject":[
    "KedJuTob5gtvYx9qM3k3gm7kbLBwV…bEQRl26S2tmXjqNND7MRGtoew==",
    "AedJuTob5gtvYx9qM3k3gm7kbLBwV…bEQRl26S2tmXjqNND7MRGtoes=="
  ],
  "revocationInfo":{
    "ocsp":[
      "MIIJg...jSc="
    ],
    "crl":[
      "MIIC4...X7M="
    ]
  }
}
The objects in these examples are shortened; I can provide a full response upon request. The examples are taken from the yes® Signature Service which I'm testing with Python.
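To connect this with the document hash produced by the signing API, here is a minimal sketch of how the JSON body of such a request could be assembled. The function name and its parameters are hypothetical; the field names and fixed values are copied from the example request above, and the digest OIDs are the standard NIST identifiers for the SHA-2 family.

```python
import base64
import hashlib
import json

# Standard OIDs for the SHA-2 digest algorithms.
DIGEST_OIDS = {
    "sha256": "2.16.840.1.101.3.4.2.1",
    "sha384": "2.16.840.1.101.3.4.2.2",
    "sha512": "2.16.840.1.101.3.4.2.3",
}


def build_sign_doc_body(document_hashes, md_algorithm, credential_id, sad):
    """Assemble a signDoc-style request body from raw digest bytes."""
    return {
        "credentialID": credential_id,
        "SAD": sad,
        "documentDigests": {
            # The service expects base64 text, not raw bytes or hex.
            "hashes": [
                base64.b64encode(h).decode("ascii") for h in document_hashes
            ],
            "hashAlgorithmOID": DIGEST_OIDS[md_algorithm],
        },
        "profile": "http://uri.etsi.org/19432/v1.1.1#/creationprofile#",
        "signature_format": "P",
        "conformance_level": "AdES-B-T",
    }


example_hash = hashlib.sha256(b"example document bytes").digest()
body = build_sign_doc_body([example_hash], "sha256", "qes_eidas", "<SAD token>")
payload = json.dumps(body)
```

Deriving hashAlgorithmOID from the same md_algorithm value that is passed to the PDF signing code keeps both sides in agreement on the digest algorithm.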
from pyhanko.
Interesting! I wasn't aware of that standard, actually, so I'll give it a read once I have some time.
I realised that I didn't tell you on which branch the changes live, so here's a link: https://github.com/MatthiasValvekens/pyHanko/compare/feature/cms-agnostic-sign.
If you're happy with this API, I can merge it into master too (just not right now), if that's more convenient. :)
Concerning timestamps, the timestamp that you pass to the signature placeholder is intended to be stored in the PDF file itself, not the CMS. It's an optional entry in the (PDF) signature object, and not intended to be an authoritative record of the signing time anyhow (but PAdES allows it). I should probably tweak the type hints in the API to make that more obvious, thanks.
from pyhanko.
That seems to work perfectly. I'll prepare a full code example for publication. A signed PDF is attached (from a sandbox CA environment). If you're interested in getting access to this remote signing service sandbox, I can set that up for you. Thanks again for the modifications!
test-out.pdf
from pyhanko.
That's very good to hear! I don't need access to the sandbox right now, but that'd certainly be useful to review a (potential) pull request later, in case you're planning to submit one.
from pyhanko.
Quick update on the info above: It seems that I need to embed the revocation information as well. I'm trying to figure out how to do that.
from pyhanko.
The PDF standard gives you two choices. You can embed them into the signature CMS object directly as an "Adobe-style" revocation information attribute (OID 1.2.840.113583.1.1.8). Unfortunately this attribute must be a signed attribute, which is disqualifying in your use case, I presume.
The alternative (more modern) way is to embed the revocation info into the document security store (DSS) in an (unsigned!) incremental update after signing. The part of PdfSigner that does that is the following:
# [...snip]
output, sig_contents = cms_writer.send(timestamp_cms)
# update the DSS
from pyhanko.sign import validation
validation.DocumentSecurityStore.add_dss(
    output_stream=output, sig_contents=sig_contents,
    paths=validation_paths, validation_context=validation_context
)
I concede that this API isn't terribly convenient if all you have is the raw revocation data (the API was built to consume output from certvalidator), but it should be workable: you can instantiate a certvalidator.ValidationContext manually, using the revocation data received from the signing service & any intermediate certs that might be relevant. As far as validation paths go: passing in an empty list probably won't cause any trouble (since the relevant certificates should be available in the CMS anyway).
I could tweak this a little to make it easier to work with given raw revocation info (wouldn't be terribly hard); I'll see if I can get to it tonight. :)
from pyhanko.
Update: I tweaked the internal DSS API a bit in commit ff1a225. On the current master branch, add_dss now also accepts certs, ocsps and crls as (optional) keyword args, so you don't have to go through the trouble to present your revocation info in certvalidator-compatible format.
Note: the CMS objects are expected in their asn1crypto "parsed" form, though, not as bytes objects. To parse an OCSP response, for example, you'd call OCSPResponse.load(...) on the DER-encoded bytes representation you got from the signing service.
EDIT: oh, if you're embedding the revocation info into the DSS, you'll also want to pass SigSeedSubFilter.PADES in the subfilter parameter when constructing the signature object, to signal that you're using a PAdES-style signature.
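As a small illustration of the step before parsing: assuming the revocationInfo entries arrive as base64-encoded DER strings, as the example response earlier in the thread suggests, they first need to be decoded to bytes. The helper name below is hypothetical and only uses the standard library.

```python
import base64


def decode_revocation_info(response: dict):
    """Return (ocsp_der_list, crl_der_list) from a signDoc-style response."""
    info = response.get("revocationInfo", {})
    ocsps = [base64.b64decode(item) for item in info.get("ocsp", [])]
    crls = [base64.b64decode(item) for item in info.get("crl", [])]
    return ocsps, crls


# Synthetic payloads stand in for real DER data here.
sample_response = {
    "revocationInfo": {
        "ocsp": [base64.b64encode(b"fake-ocsp-der").decode("ascii")],
        "crl": [base64.b64encode(b"fake-crl-der").decode("ascii")],
    }
}
ocsp_der, crl_der = decode_revocation_info(sample_response)
```

With real data, each decoded OCSP item would then go through asn1crypto's ocsp.OCSPResponse.load(...) and each CRL through crl.CertificateList.load(...) before being handed to add_dss.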
from pyhanko.
Thanks for the tweaks. I currently do
yesresponse = signer.sign(
    data_digest=document_hash, digest_algorithm=md_algorithm, timestamp=timestamp
)
cms_bytes = yesresponse.dump_cms()
output, sig_contents = cms_writer.send(cms_bytes)
validation.DocumentSecurityStore.add_dss(
    output_stream=output,
    sig_contents=sig_contents,
    certs=[],
    ocsps=yesresponse.ocsps_parsed(),
    crls=yesresponse.crls_parsed(),
)
Path("test-out.pdf").write_bytes(output.read())
But I end up with an empty output stream.
from pyhanko.
Ah, that's probably because the stream pointer is still at the end of the file. Inserting output.seek(0) before the read() call would resolve that.
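The effect is easy to reproduce with a plain in-memory stream, independent of pyHanko:

```python
from io import BytesIO

buf = BytesIO()
buf.write(b"%PDF-1.7 example")

# After writing, the stream position sits at the end, so read() finds nothing.
at_end = buf.read()

# Rewinding first makes read() return everything that was written.
buf.seek(0)
from_start = buf.read()

# For BytesIO specifically, getvalue() ignores the current position.
whole = buf.getvalue()
```

Here at_end is empty, while from_start and whole both contain the full content. Note that getvalue() only exists on in-memory streams, so seek(0) is the more general fix.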
from pyhanko.
That was it, excellent!
from pyhanko.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions!
from pyhanko.
Great! Thanks for the update. :)
Just a heads-up, in case you haven't pinned your pyHanko version, or are planning to upgrade in the near future: the PdfCMSEmbedder API you're using will change very slightly in the upcoming 0.7.0 release (due later this month). I've updated the relevant example in the docs, see here. The change will obviously also be listed in the release notes, but I thought I'd mention it here as well.
The modifications required on your end should be minimal, but it's still a breaking change.
The reason for the change is to make it easier to interrupt the signing process and resume it later (possibly in a different process, or even on a different machine entirely). This is relevant in some remote signing workflows that rely on callbacks through webhooks and the like. Not sure if that applies to your use case, but in case it matters: please feel free to ask for further info. Such questions are also useful for me to know what to focus on when writing the documentation. :)
from pyhanko.
That is a great improvement, looking forward to adapting my code to it!
from pyhanko.
Hi Daniel,
FYI: pyHanko 0.7.0 was just released on PyPI. I noticed that your setup.py doesn't pin an exact pyHanko version, which will probably cause problems for people trying to install it, given the breaking change in the new version that I mentioned in my previous comment. :)
I recommend adding an ==0.6.1 constraint to your dependency list; that would give you time to update. Actually, even after updating, it would probably be a good idea to use pinned pyHanko versions until we hit 1.0.0, just to be on the safe side.
from pyhanko.
I have updated pyyes and my signatures are validating correctly. However, I'm not sure if my current use of your library is correct, mostly because I'm using the signature data twice. May I ask you to give this a quick look?
Again, if there's anything I can do for you - in particular providing a test environment for remote signatures - please let me know.
Thanks a lot,
Daniel
from pyhanko.
Hi Daniel,
Thanks for checking in!
First, the fact that signature_bytes is used twice isn't that weird by itself: it's used both to embed the actual signature, and to compute a hash to key an entry in the VRI section of the Document Security Store. Having said that, pretty much no-one bothers with VRI these days. It was intended as a mechanism to optimise validation, but it ended up largely unused, and is pretty much deprecated in recent PAdES if I recall correctly. VRI generation is also togglable since 0.8.0. Anyway, that's just a minor background detail.
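For background, here is a minimal sketch of how such a VRI key can be derived, assuming the usual PAdES convention that the key is the uppercase hexadecimal SHA-1 digest of the signature contents. The function name is hypothetical, and exactly which bytes get hashed (for instance, whether placeholder padding is included) is something to verify against the specification and pyHanko's own implementation.

```python
import hashlib


def vri_key(signature_contents: bytes) -> str:
    """Uppercase hex SHA-1 digest, used as a dictionary key in the VRI."""
    return hashlib.sha1(signature_contents).hexdigest().upper()


key = vri_key(b"example signature bytes")
```

SHA-1 serves purely as an identifier here, not as a security mechanism, which is why its use in this spot is unrelated to the digest algorithm chosen for the signature itself.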
However, if you do things right, finish_signing() already calls add_dss() for you in the background, provided that you set up the signing process to do PAdES-style signing. This wasn't the case back when we had our first conversation on this topic. Actually, the way you're currently doing things (calling add_dss() before finish_signing()) kind of only works by accident. The resulting document is likely correct right now, but the way you're calling the API is sort of fragile.
Ordinarily, I would recommend letting pyHanko do the revocation info bookkeeping entirely on its own, but from what I remember, you don't have access to the certificate & revocation information until after submitting a hash to the server, right?
Either way, I would at least swap the add_dss() and finish_signing() calls. If you call add_dss() yourself, there's no need to keep tabs on the value for post_sign_instr you got from the API earlier. You also don't need to pass in a validation context to finish_signing(); that only matters for document timestamps.
As an alternative to calling add_dss() manually, you may want to instantiate your own value for post_sign_instr in the call to finish_signing() instead, in order to get something slightly more declarative. See here for further documentation on how to set that up. Essentially, if all you need is PAdES-B-LT (without the extra document timestamp) then you only need to supply a value for the validation_info field. That one also has lots of subfields, but in your case, passing in the CRL and OCSP values is probably enough (you can probably even leave signer_path empty).
Hope that helps, and if anything else is unclear: let me know! These low(ish)-level APIs are tricky to document holistically, since they have to accommodate so many different workflows...
from pyhanko.