Comments (16)
I am more in favor of using a non-caching client when reconciling certificaterequests in approver-policy.
Yes, I agree that would solve the problem.
You'd GET the CertificateRequest using a non-caching client, here:
approver-policy/pkg/internal/controllers/certificaterequests.go
Lines 174 to 177 in c8f9506
Ignoring a custom error message will never be clean, and quite fragile. If anyone changes the literal error message in the cert-manager webhook, this will break again. The HTTP status code is just Forbidden - which could mean many things.
The full error message is:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "admission webhook \"webhook.cert-manager.io\" denied the request: status.conditions: Forbidden: 'Approved' condition may not be modified once set",
"reason": "NotAcceptable",
"code": 406
}
Given the following patch.json
:
{
"apiVersion": "cert-manager.io/v1",
"kind": "CertificateRequest",
"status": {
"conditions": [
{
"message": "new message",
"type": "Approved"
}
]
}
}
curl 'https://127.0.0.1:44121/apis/cert-manager.io/v1/namespaces/default/certificaterequests/foo/status?fieldManager=cmctl&force=true' \
--silent \
--insecure \
--header "Accept: application/json" \
--header "Authorization: Bearer $(kubectl create token -n cert-manager cert-manager)" \
--header "Content-Type: application/apply-patch+yaml" \
--request PATCH \
--data @patch.json
So that error response might be unique and stable enough to reliably discern it from other errors.
I'd probably choose this over adding a non-caching client because the code solving the problem would be easy to understand and close to the site of the problem.
The configuration of the non-caching client would be here, I suppose, which is a further away from where the site of the problem:
approver-policy/pkg/internal/cmd/cmd.go
Lines 76 to 98 in c8f9506
Another option might be to use the cached CR resourceVersion
in the patch request, which would cause a HTTP conflict error if the cached CR had since been modified by cert-manager or by an external issuer,
approver-policy could handle the conflict by requeuing the CR, by returning an error and relying on the error backoff mechanism or by explicitly requeuing the CR.
Some future reconcile attempt would eventually succeed once the approver-policy cache had the latest version of the CR.
E.g. the following json patch applied using the previous curl command:
{
"apiVersion": "cert-manager.io/v1",
"kind": "CertificateRequest",
"metadata": {
"resourceVersion": "999"
},
"status": {
"conditions": [
{
"message": "new message",
"type": "Approved"
}
]
}
}
Gives:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Operation cannot be fulfilled on certificaterequests.cert-manager.io \"foo\": the object has been modified; please apply your changes to the latest version and try again",
"reason": "Conflict",
"details": {
"name": "foo",
"group": "cert-manager.io",
"kind": "certificaterequests"
},
"code": 409
}
from approver-policy.
To all following this issue, I can report that it seems to work a lot better now! 🎉 I had a feeling the fix would not matter, but it has an effect....
from approver-policy.
See also this slack thread
from approver-policy.
Based on our certificate renewal policy, the frequency of this error should have decreased if this was a transient error. This is not the case, so I suspect this to be a real bug.
It seems like cert-manager/cert-manager#3735 introduced this strict validation in cert-manager: https://github.com/cert-manager/cert-manager/blob/5141dddf2c0c5e10c5d4452c99a2e260c6eb2983/internal/apis/certmanager/validation/certificaterequest.go#L202-L214. I wonder if the validation was intentionally made extremely strict (DeepEqual
), or if the validation could be loosened up a bit? I am pretty sure it's not vital to validate that conditions do not change at all, or?
from approver-policy.
Looking closer at the managedFields
in the example request, I wonder if this might be a bug in cert-manager:
managedFields:
- apiVersion: cert-manager.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:conditions':
'k:{"type":"Approved"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
manager: approver-policy
operation: Apply
subresource: status
time: '2023-10-30T13:45:09Z'
- apiVersion: cert-manager.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:ca': {}
'f:certificate': {}
'f:conditions':
'k:{"type":"Approved"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
'k:{"type":"Ready"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
manager: cert-manager-certificaterequests-issuer-vault
operation: Apply
subresource: status
time: '2023-10-30T13:45:09Z'
I do not expect the cert-manager Vault issuer to approve certificate requests, but it seems like it (cert-manager-certificaterequests-issuer-vault
) did from the managed fields.
from approver-policy.
Checking if cert-manager is configured correctly, since I am not expecting cert-manager to approve any requests:
I1103 11:05:17.741662 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-vault"
I1103 11:05:17.742502 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-acme"
I1103 11:05:17.742515 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="ingress-shim"
I1103 11:05:17.742521 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificaterequests-approver"
I1103 11:05:17.743122 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-metrics"
I1103 11:05:17.743150 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-trigger"
I1103 11:05:17.743166 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-key-manager"
I1103 11:05:17.743181 1 controller.go:215] "cert-manager/controller: starting controller" controller="issuers"
I1103 11:05:17.743197 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-revision-manager"
I1103 11:05:17.743214 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-issuing"
I1103 11:05:17.743319 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-request-manager"
I1103 11:05:17.746431 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-ca"
I1103 11:05:17.746998 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-selfsigned"
I1103 11:05:17.747009 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-venafi"
I1103 11:05:17.747549 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="gateway-shim"
I1103 11:05:17.748065 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-selfsigned"
I1103 11:05:17.748113 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-vault"
I1103 11:05:17.748131 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-venafi"
I1103 11:05:17.748147 1 controller.go:215] "cert-manager/controller: starting controller" controller="clusterissuers"
I1103 11:05:17.748159 1 controller.go:215] "cert-manager/controller: starting controller" controller="challenges"
I1103 11:05:17.748174 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-readiness"
I1103 11:05:17.748190 1 controller.go:215] "cert-manager/controller: starting controller" controller="orders"
I1103 11:05:17.748632 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-acme"
I1103 11:05:17.748656 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-ca"
from approver-policy.
Here is a newer request from today, which I think proves that there is a bug in cert-manager: the cert-manager Vault issuer sets/owns the Approved condition - with a timestamp after approver-policy approval. No idea why the error ends up in approver-policy tho. I cannot find any related errors/warnings in cert-manager logs.
apiVersion: cert-manager.io/v1
kind: CertificateRequest
metadata:
annotations:
cert-manager.io/certificate-name: prefixverification-forecast-selector
cert-manager.io/certificate-revision: '7'
cert-manager.io/private-key-secret-name: prefixverification-forecast-selector-9qvxg
resourceVersion: '3086765553'
name: prefixverification-forecast-selector-7
uid: 05ae7f2b-8672-4185-87d5-bc5e6ba1cf4e
creationTimestamp: '2023-11-05T09:57:44Z'
generation: 1
managedFields:
- apiVersion: cert-manager.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:conditions':
'k:{"type":"Approved"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
manager: approver-policy
operation: Apply
subresource: status
time: '2023-11-05T09:57:44Z'
- apiVersion: cert-manager.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:ca': {}
'f:certificate': {}
'f:conditions':
'k:{"type":"Approved"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
'k:{"type":"Ready"}':
.: {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
manager: cert-manager-certificaterequests-issuer-vault
operation: Apply
subresource: status
time: '2023-11-05T09:57:45Z'
- apiVersion: cert-manager.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:cert-manager.io/certificate-name': {}
'f:cert-manager.io/certificate-revision': {}
'f:cert-manager.io/private-key-secret-name': {}
'f:labels':
.: {}
'f:app.kubernetes.io/managed-by': {}
'f:app.kubernetes.io/name': {}
'f:ownerReferences':
.: {}
'k:{"uid":"fc04c6eb-d67e-466c-93f2-0b0fa6a12f9f"}': {}
'f:spec':
.: {}
'f:duration': {}
'f:issuerRef':
.: {}
'f:group': {}
'f:kind': {}
'f:name': {}
'f:request': {}
'f:usages': {}
manager: cert-manager-certificates-request-manager
operation: Update
time: '2023-11-05T09:57:44Z'
namespace: preprod-fifty-for-imbalance-forecast-postprocessing
ownerReferences:
- apiVersion: cert-manager.io/v1
blockOwnerDeletion: true
controller: true
kind: Certificate
name: prefixverification-forecast-selector
uid: fc04c6eb-d67e-466c-93f2-0b0fa6a12f9f
labels:
app.kubernetes.io/managed-by: identity-provider
app.kubernetes.io/name: prefixverification-forecast-selector
spec:
duration: 144h0m0s
extra:
authentication.kubernetes.io/pod-name:
- cert-manager-5f86f8b489-6k548
authentication.kubernetes.io/pod-uid:
- f5d36458-dfb1-458e-8c79-31c78c87a0ea
groups:
- 'system:serviceaccounts'
- 'system:serviceaccounts:cert-manager'
- 'system:authenticated'
issuerRef:
group: cert-manager.io
kind: ClusterIssuer
name: vault-spiffe-issuer
request: >-
<REDACTED>
uid: 256a8935-1197-4f9a-915d-5dfd38d5f0c6
usages:
- client auth
username: 'system:serviceaccount:cert-manager:cert-manager'
status:
ca: >-
<REDACTED>
certificate: >-
<REDACTED>
conditions:
- lastTransitionTime: '2023-11-05T09:57:44Z'
message: 'Approved by CertificateRequestPolicy: "vault-spiffe"'
reason: policy.cert-manager.io
status: 'True'
type: Approved
- lastTransitionTime: '2023-11-05T09:57:45Z'
message: Certificate fetched from issuer successfully
reason: Issued
status: 'True'
type: Ready
from approver-policy.
@erikgb I looked at the code and my theory is that there are two problems:
-
approver-policy sometimes attempts to approve an already approved CertificateRequest
I'm hopeless at describing these sorts of interactions, but bear with me:
a. cert-manager creates CR v0
b. cert-manager and approver-policy both reconcile CR v0 in parallel:- approver-policy begins reconciling CR v0 and performs policy checks , meanwhile
- cert-manager reconciles CR v0 and adds the Ready condition with status False,
- approver-policy cache receives the updated CR v1 with Ready=False and re-queues CR v1 to be re-reconciled
- approver-policy completes policy checks and sets Approved condition.
c. approver-policy reconciles CR v1
- approver-policy completes policy checks and again attempts to set Approved condition
- cert-manager webhook denies the request and approver-policy logs the error
d. approver-policy cache receives the updated CR v2 with Approved=True, Ready=False and re-queues CR v2 to be re-reconciled
e. approver-policy ignores CR v2 because it already has an Approved condition.
One solution would be to modify approver-policy to expect this situation and ignore the error, let's call it ApprovedConditionAlreadySet
, in much the same way that we might IgnoreNotFound
or IgnoreAlreadyExists
errors.
-
cert-manager server side apply is setting (but not changing) the Approved condition and in so doing, taking joint ownership of that field.
I haven't looked into this, but I would guess that cert-manager is creating a server-side apply patch which contains the existing conditions from the CR, rather than only the Ready condition.
from approver-policy.
@erikgb I looked at the code and my theory is that there are two problems:
approver-policy sometimes attempts to approve an already approved CertificateRequest
I'm hopeless at describing these sorts of interactions, but bear with me:
a. cert-manager creates CR v0
b. cert-manager and approver-policy both reconcile CR v0 in parallel:
- approver-policy begins reconciling CR v0 and performs policy checks , meanwhile
- cert-manager reconciles CR v0 and adds the Ready condition with status False,
- approver-policy cache receives the updated CR v1 with Ready=False and re-queues CR v1 to be re-reconciled
- approver-policy completes policy checks and sets Approved condition.
c. approver-policy reconciles CR v1
- approver-policy completes policy checks and again attempts to set Approved condition
- cert-manager webhook denies the request and approver-policy logs the error
d. approver-policy cache receives the updated CR v2 with Approved=True, Ready=False and re-queues CR v2 to be re-reconciled
e. approver-policy ignores CR v2 because it already has an Approved condition.One solution would be to modify approver-policy to expect this situation and ignore the error, let's call it
ApprovedConditionAlreadySet
, in much the same way that we mightIgnoreNotFound
orIgnoreAlreadyExists
errors.
cert-manager server side apply is setting (but not changing) the Approved condition and in so doing, taking joint ownership of that field.
I haven't looked into this, but I would guess that cert-manager is creating a server-side apply patch which contains the existing conditions from the CR, rather than only the Ready condition.
I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time
from approver-policy.
I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time
Yes, and that is how it is in the scenario I described.
approver-policy is re-reconciling the CR at v1 before its cache has been updated, with the result of its previous reconcile.
I.e. approver-policy cache contains CR with the condition Ready=False, but not yet Approved=true/false.
from approver-policy.
I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time
Yes, and that is how it is in the scenario I described.
approver-policy is re-reconciling the CR at v1 before its cache has been updated, with the result of its previous reconcile.
I.e. approver-policy cache contains CR with the condition Ready=False, but not yet Approved=true/false.
Thanks for explaining. I would expect the patch to be identical, so the webhook shouldn't complain about the condition changing, right?
from approver-policy.
Thanks for explaining. I would expect the patch to be identical, so the webhook shouldn't complain about the condition changing, right?
The cache is only updated from watch push events, ref. kubernetes-sigs/controller-runtime#1622, so if the cache is not updated from the previous round of reconciliation, the patch will not
be identical. The lastTransitionTime
field is the culprit.
Thinking a bit more about the suggested solution from @wallrj, I am more in favor of using a non-caching client when reconciling certificaterequests in approver-policy. Ignoring a custom error message will never be clean, and quite fragile. If anyone changes the literal error message in the cert-manager webhook, this will break again. The HTTP status code is just Forbidden
- which could mean many things.
from approver-policy.
Wow! 👏 Thanks for your deep analysis and suggestions, @wallrj! I didn't know you could use resourceVersion
with SSA as with CSA. The difference is probably that it's required with CSA, but optional with SSA. I am more in favor of your last suggestion now. It will address the root cause of this issue. Handling the error from cert-manager is just a workaround IMO. WDYT? @inteon
from approver-policy.
I'm confused, does #303 not solve this issue? Do we still get these errors after merging that PR?
from approver-policy.
Confused? Why? I think it might solve this issue, but we won't know for sure. I do not have the intention to build and provision a snapshot version in our busiest cluster.
from approver-policy.
@inteon Yeah, I realise that I was over complicating...I hadn't looked carefully at the code. I still think the situation I described can happen. It will be interesting to see whether @erikgb observes these errors after he deploys this fix (albeit fewer of them).
And still think that there is a bug in the cert-manager server-side apply code which causes it to take joint field ownership of the approval conditions.
from approver-policy.
Related Issues (20)
- Regex to disallow wildcard certificates HOT 2
- CertificateRequest approved but stuck with empty status HOT 1
- Webhook Custom CA
- Allow Custom Labels to be added to Resources
- group 'cert-manager.io' does not work HOT 4
- Error: YAML parse error on cert-manager-approver-policy/templates/deployment.yaml: error converting YAML to JSON: yaml: line 48: mapping values are not allowed in this context HOT 6
- Add Custom Annotations
- Improve CRD fields for specifying key requirements
- Setting .Values.nameOverride makes the pod not have rights to update secret cert-manager-approver-policy-tls HOT 1
- Helm chart rendering error: converting YAML to JSON: yaml: line 61: did not find expected key
- Simplify configuration by creating RBAC by default
- [CertificateRequestPolicy] `selector.issuerRef` incorrect example list instead of map
- Should initialize controller-runtime logging
- Include binary artifacts your releases.
- Add Helm option to create RBAC allowing approval for all issuers HOT 1
- Feature: Take control of approval for the whole cluster HOT 2
- failed to create subjectaccessreview HOT 11
- Typo in error message: connection patch should say CertificateRequest.Status patch HOT 1
- Infinite loop causing increase in certificaterequestpolicies version HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from approver-policy.