Comments (18)
from kube-mgmt.
We've confirmed that v0.11 fixes the logging issue. Since the logging problem and the resource sync issue have been resolved, I'm going to close this for now.
from kube-mgmt.
@muzcategui1106-gs can you show an example of a configmap that's annotated with policy-status that should not be? From looking at the configmap replication code in kube-mgmt, it's not obvious how this could happen.
RE: The sync channel closure, that is to be expected with the Kubernetes client. However, the delay on upserting into OPA is NOT expected. I'll try to reproduce the problem.
One thing that might help would be to update the data replicator to use the PATCH method on OPA instead of DELETE followed by a series of PUTs. Here is the code in question: https://github.com/open-policy-agent/kube-mgmt/blob/master/pkg/data/generic.go#L147
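As a rough illustration of the PATCH idea, here is a minimal Go sketch (not the kube-mgmt implementation) that builds an RFC 6902 JSON Patch and sends it to OPA's Data API in one request. The helper names `buildPatch` and `newPatchRequest`, the OPA address, and the `kubernetes/namespaces` path are assumptions made for the example. Note that a JSON Patch `add` requires the parent document to already exist in OPA.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// escapePointer escapes a key for use as an RFC 6901 JSON Pointer segment.
func escapePointer(key string) string {
	key = strings.ReplaceAll(key, "~", "~0")
	return strings.ReplaceAll(key, "/", "~1")
}

// buildPatch (hypothetical helper) emits one "add" operation per object,
// keyed by object name. "add" replaces the value if the key already exists.
func buildPatch(objs map[string]interface{}) []patchOp {
	ops := make([]patchOp, 0, len(objs))
	for name, obj := range objs {
		ops = append(ops, patchOp{Op: "add", Path: "/" + escapePointer(name), Value: obj})
	}
	return ops
}

// newPatchRequest (hypothetical helper) builds PATCH <opaURL>/v1/data/<path>
// with the JSON Patch content type expected by OPA's Data API.
func newPatchRequest(opaURL, path string, ops []patchOp) (*http.Request, error) {
	body, err := json.Marshal(ops)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(opaURL, "/") + "/v1/data/" + path
	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json-patch+json")
	return req, nil
}

func main() {
	// The address and data path below are assumptions for illustration only.
	ops := buildPatch(map[string]interface{}{
		"default": map[string]string{"kind": "Namespace"},
	})
	req, err := newPatchRequest("http://localhost:8181", "kubernetes/namespaces", ops)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Compared with DELETE followed by many PUTs, a single PATCH avoids the window in which the data is missing from OPA, which is the motivation given above.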
from kube-mgmt.
Sure. I'm not allowed to show the entire configmap, only the error that I get (I changed the filename in the error message).
Version Info:
OPA: 0.14.0
Kube-mgmt: 0.9
Flags on kube-mgmt:
args:
- "--replicate-cluster=v1/namespaces"
- "--replicate-cluster=v1/persistentvolumes"
- "--replicate=v1/persistentvolumeclaims"
- "--replicate=v1/resourcequotas"
- "--require-policy-label"
Error on configmap
openpolicyagent.org/policy-status: '{"status":"error","error":{"code":"invalid_parameter","message":"error(s) occurred while compiling module(s)","errors":[{"code":"rego_parse_error","message":"no match found, expected: [ \\t\\r\\n] or [A-Za-z_]","location":{"file":"opa/example.conf","row":7,"col":11}}]}}'
Regarding resource utilization, I don't have any specification on my deployment about that. I should have, but from what I see the pods don't seem to be starving for resources:
NAME CPU(cores) MEMORY(bytes)
opa-5985f8cf78-kv45t 1571m 132Mi
opa-5985f8cf78-rbq4p 1m 151Mi
opa-5985f8cf78-rcrds 1369m 156Mi
policy-engine-6b6ffbb5f9-5kznf 914m 78Mi
policy-engine-6b6ffbb5f9-64l8d 503m 76Mi
policy-engine-6b6ffbb5f9-bhlht 329m 81Mi
What should be the recommended values in your opinion?
Regarding the code you mention (https://github.com/open-policy-agent/kube-mgmt/blob/master/pkg/data/generic.go#L147), I observed the same when I was looking at it from a performance point of view. It does seem like a good idea to do a PATCH as opposed to a DELETE and PUTs.
from kube-mgmt.
@muzcategui1106-gs can you confirm whether the openpolicyagent.org/policy label is set on that configmap? I have a test deployment that is emitting the "Unable to decode..." error log but the configmaps are not being loaded incorrectly:
$ kubectl -n torin-test get configmap -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    x.txt: |
      hello
  kind: ConfigMap
  metadata:
    creationTimestamp: "2019-09-19T15:26:12Z"
    name: notpolicy
    namespace: torin-test
    resourceVersion: "7431454"
    selfLink: /api/v1/namespaces/torin-test/configmaps/notpolicy
    uid: d01934e1-daf1-11e9-ae65-024400571c2a
- apiVersion: v1
  data:
    x.rego: |-
      package foo
      p = 1
  kind: ConfigMap
  metadata:
    annotations:
      openpolicyagent.org/policy-status: '{"status":"ok"}'
    creationTimestamp: "2019-09-19T15:26:06Z"
    labels:
      openpolicyagent.org/policy: rego
    name: policy
    namespace: torin-test
    resourceVersion: "7431761"
    selfLink: /api/v1/namespaces/torin-test/configmaps/policy
    uid: cc69ae52-daf1-11e9-ae65-024400571c2a
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
Logs:
E0923 13:10:34.000910 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: no kind "Status" is registered for version "v1" in scheme "github.com/open-policy-agent/kube-mgmt/pkg/configmap/configmap.go:102"
E0923 13:32:19.022065 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: no kind "Status" is registered for version "v1" in scheme "github.com/open-policy-agent/kube-mgmt/pkg/configmap/configmap.go:102"
E0923 13:49:55.032696 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: no kind "Status" is registered for version "v1" in scheme "github.com/open-policy-agent/kube-mgmt/pkg/configmap/configmap.go:102"
time="2019-09-23T13:51:44Z" level=info msg="Sync channel for v1/pods closed. Restarting immediately."
time="2019-09-23T13:51:44Z" level=info msg="Syncing v1/pods."
time="2019-09-23T13:51:44Z" level=info msg="Listed v1/pods and got 633 resources with resourceVersion 9361168. Took 294.467604ms."
time="2019-09-23T13:51:45Z" level=info msg="Loaded v1/pods resources into OPA. Took 524.414201ms. Starting watch at resourceVersion 9361168."
from kube-mgmt.
The openpolicyagent.org/policy label is not set, only the annotation.
from kube-mgmt.
@muzcategui1106-gs we've cut v0.10 which contains #50. This should reduce CPU usage during resync. As of yet, the configmap annotation problem is still unclear and I've not been able to reproduce it. Any hints you can provide would be helpful. Finally, @patrick-east has looked into the decode errors and has some thoughts.
from kube-mgmt.
@tsandall this is awesome, so hopefully this should reduce CPU usage and speed up resource sync. Will give it a try.
Regarding the configmap sync, let me see what I can dig up in the next couple of days :)
In the meantime I will start running 0.10 in my dev environment.
from kube-mgmt.
@patrick-east has looked into the decode errors and has some thoughts.
Right, so looking into it, these types of errors occur if the scheme wasn't registered with our client (which happens over here: https://github.com/open-policy-agent/kube-mgmt/blob/master/pkg/configmap/configmap.go#L104-L112 ). What isn't clear is what object came over the wire that it was unable to decode.
The current theory is that some object version/schema changed while kube-mgmt was running, or the client code we're using is out of date or has a bug in it and is missing part of the schema for some object.
As of the last chat with @tsandall it seems like this particular issue isn't likely a big deal as kube-mgmt will just continue processing the objects we care about.
from kube-mgmt.
Having just installed OPA and kube-mgmt in our 3 dev clusters I am also seeing this in the logs for each environment (131 "Unable to decode.." messages last hour).
Also seem to be having some (possibly related?) issues syncing kubernetes resources, leading to data being seemingly unavailable to some of the OPA instances, which in turn leads to authorization decisions being allowed or denied based on which of the instances is consulted (the default of course set to deny).
Let me know if I can provide any data to help you look into this.
from kube-mgmt.
Hi @anderseknert, please add more details, such as the versions of OPA/kube-mgmt, the startup arguments, and the full error.
from kube-mgmt.
Hi @rtoma, and thanks for reaching out. Here are some of the details. Let me know if you need more.
OPA image: docker.io/openpolicyagent/opa:0.14.2
Startup arguments: run --server --authentication=token --authorization=basic --ignore=.* /policies/authz.rego
kube-mgmt image: openpolicyagent/kube-mgmt:0.10
Startup arguments: --opa-auth-token-file=/policies/token --require-policy-label=true --replicate=v1/services --replicate=v1/pods --replicate=v1/configmaps --replicate=v1/persistentvolumeclaims --replicate=apps/v1/deployments --replicate=apps/v1/statefulsets --replicate=autoscaling/v1/horizontalpodautoscalers --replicate=batch/v1/jobs --replicate=batch/v1beta1/cronjobs --replicate=extensions/v1beta1/ingresses --replicate=extensions/v1beta1/replicasets --replicate=networking.k8s.io/v1/networkpolicies
All in all, the payload returned when hitting /v1/data/ is about 25MB of JSON, so it's definitely syncing. It just seems to have intermittent problems on some of the instances.
Full error:
E1030 18:02:45.864525 1 streamwatcher.go:109] Unable to decode an event from the watch stream: unable to decode watch event: no kind "Status" is registered for version "v1" in scheme "github.com/open-policy-agent/kube-mgmt/pkg/configmap/configmap.go:102"
from kube-mgmt.
Hi, our kube-mgmt logging is full of these errors without functional impact, which is confirmed by:
As of the last chat with @tsandall it seems like this particular issue isn't likely a big deal as kube-mgmt will just continue processing the objects we care about.
So maybe focus on the 'issues syncing kubernetes resources' which clearly has a functional impact. Maybe it is an idea to create a script that makes /v1/data calls on all pods every minute and compare the results? We've created a metrics collector (which I can not share) for this purpose. Another suggestion because it seems you want to seriously use OPA: develop an OPA regression tester that periodically POSTs artificial AdmissionReview payloads (extract from OPA's decision log) to the OPA webhook endpoint and match the result against the expected results. That way you can verify OPA and its policies are behaving as you expect.
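The first suggestion, comparing /v1/data across pods, could look roughly like the Go sketch below. It is only an illustration under stated assumptions: the pod URLs are passed as command-line arguments, the bearer token is read from a hypothetical `OPA_TOKEN` environment variable, and the helper names `digest`, `fetchDigest`, and `inSync` are invented for the example. Only the `result` field of each response is hashed, so per-request fields do not cause false mismatches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// digest hashes the "result" field of a /v1/data response. Re-marshalling
// sorts object keys, so key order and whitespace do not affect the hash.
func digest(raw []byte) (string, error) {
	var doc struct {
		Result interface{} `json:"result"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return "", err
	}
	canon, err := json.Marshal(doc.Result)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canon)
	return hex.EncodeToString(sum[:]), nil
}

// fetchDigest GETs <base>/v1/data and returns the digest of the result.
func fetchDigest(client *http.Client, base, token string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, base+"/v1/data", nil)
	if err != nil {
		return "", err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return digest(raw)
}

// inSync reports whether every instance returned the same digest.
func inSync(digests map[string]string) bool {
	first, seen := "", false
	for _, d := range digests {
		if !seen {
			first, seen = d, true
			continue
		}
		if d != first {
			return false
		}
	}
	return true
}

func main() {
	// Each argument is the base URL of one OPA pod (assumed reachable directly).
	digests := map[string]string{}
	for _, base := range os.Args[1:] {
		d, err := fetchDigest(http.DefaultClient, base, os.Getenv("OPA_TOKEN"))
		if err != nil {
			fmt.Fprintln(os.Stderr, base, err)
			os.Exit(1)
		}
		digests[base] = d
	}
	fmt.Println("in sync:", inSync(digests))
}
```

Run periodically (for example once a minute, as suggested), a mismatch that persists across several runs would point to a real sync problem rather than a resync that happened to be in progress.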
from kube-mgmt.
Thanks @rtoma. Yeah, we are definitely looking at using OPA at a larger scale. So far I have some tests in place, and more definitely to come, though currently they all point to an ingress controller, just like the actual authorization webhook that we're looking at using OPA for as our first proof of concept. It's from these tests that I've noticed different results on re-running tests, though nothing in the kubernetes data actually changed during that time frame. Will see about running tests targeting the pods directly, thanks for the pointer. Even if I can verify the sync issue, I'm not sure how to proceed from there though.
from kube-mgmt.
So my tests work something like this:
- In the setup, create and deploy a kubernetes resource.
- Verify that I can run policy queries targeting OPA with the aforementioned resource as data.
- Teardown, delete kubernetes resource.
Sleeping 60 seconds between 1 and 2 seems to solve the problem I had with data getting out of sync, so this seems like a problem with my tests rather than with kube-mgmt. Will extend the test suite to also target individual pods, but for now this seems solved for me.
The error message originally reported is still very much present and annoying, but it did not cause sync issues.
from kube-mgmt.
Will anything be done about the no kind "Status" is registered for version "v1" error? It's kind of confusing and can make debugging harder.
Is it an issue with the Kubernetes version? I'm seeing it on both a 1.12 and a 1.15 cluster.
from kube-mgmt.
I took another look into the no kind "Status" is registered error message and found (what I think is) a solution. I've posted a PR w/ the one-line fix #58. I'll merge this fix later today and cut a release.
@muzcategui1106-gs if you can test this out, that would be great.
from kube-mgmt.
@tsandall yes when the release comes out, I will test out the changes in our dev environment
from kube-mgmt.