Comments (18)

tmjd commented on August 11, 2024

I think #1513 will address this, though really I've moved this to doing GETs instead of DELETEs, so if GETs are being audited there would still be lots of logs.
Another possibility would be to delete only once, when we know it needs to be deleted or at start-up, and just assume that nothing else will be creating the tigerastatus.
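The "delete only once at start-up" idea could look roughly like the sketch below. It is a hypothetical illustration, not the operator's code: startupCleaner and EnsureRemoved are made-up names, and the delete call is injected as a plain function so that no particular Kubernetes client is assumed.

```go
package main

import (
	"fmt"
	"sync"
)

// startupCleaner is a hypothetical helper: it removes a stale status object
// at most once per operator process ("start of day") instead of on every
// reconcile. The delete function is injected, so no Kubernetes client is
// assumed here.
type startupCleaner struct {
	once   sync.Once
	delete func(name string) error
	err    error
}

// EnsureRemoved runs the delete only on the first call; later calls are
// no-ops that return the first call's result.
func (c *startupCleaner) EnsureRemoved(name string) error {
	c.once.Do(func() {
		c.err = c.delete(name)
	})
	return c.err
}

func main() {
	calls := 0
	c := &startupCleaner{delete: func(name string) error {
		calls++
		return nil
	}}
	for i := 0; i < 5; i++ { // simulate five reconciles
		_ = c.EnsureRemoved("apiserver")
	}
	fmt.Println("delete calls:", calls) // delete calls: 1
}
```

One trade-off of sync.Once: if the first delete fails, later calls will not retry, so a real implementation would probably want to retry on error.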

from operator.

djsly commented on August 11, 2024

In our case, we look at 4xx and 5xx return codes in general, for any verb. This helps us detect misconfigured services or clients that might be doing things wrong.

This means that we would indeed still report it, just with a different verb.
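To illustrate why switching from DELETE to GET does not help a verb-agnostic alert: in Kubernetes, a GET of a named object that does not exist also returns 404, while a LIST with a label selector that matches nothing returns 200. The sketch below is a minimal, hypothetical filter; auditRecord is a made-up, trimmed-down view of an audit event (verb plus response code), not the real audit schema.

```go
package main

import "fmt"

// auditRecord is a hypothetical, trimmed-down view of an API server audit
// event: just the verb and the HTTP response code.
type auditRecord struct {
	Verb string
	Code int
}

// isErrorResponse reports whether a response code is in the 4xx or 5xx
// range, regardless of the verb.
func isErrorResponse(code int) bool {
	return code >= 400 && code <= 599
}

// countErrors counts the records that a verb-agnostic 4xx/5xx alert would flag.
func countErrors(records []auditRecord) int {
	n := 0
	for _, r := range records {
		if isErrorResponse(r.Code) {
			n++
		}
	}
	return n
}

func main() {
	records := []auditRecord{
		{Verb: "delete", Code: 404}, // delete of a missing tigerastatus
		{Verb: "get", Code: 404},    // GET of the same missing object
		{Verb: "list", Code: 200},   // LIST with a labelSelector, no matches
	}
	fmt.Println(countErrors(records)) // 2
}
```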

Another possibility would be to delete only once, when we know it needs to be deleted or at start-up, and just assume that nothing else will be creating the tigerastatus.

This does seem like a cleaner solution. I haven't looked into the tigerastatus and its lifecycle, but in the case of AKS, I believe they are not creating any.

From looking at the CRD definition, https://docs.tigera.io/manifests/ocp/crds/01-crd-tigerastatus.yaml, should we expect the status to be created after the first install and stay present to reflect the live status of the operator?

djsly commented on August 11, 2024

Actually, from this comment in your code:

// removeTigeraStatus returns true and removes the status displayed in TigeraStatus if corresponding CR not found

Could there be an issue where the CR should be present on the AKS cluster?
I'm actually lost, since I cannot find the operator running in my cluster,
and I only see a single CRD related to the Tigera operator:

❯ k get crd | grep tigera
installations.operator.tigera.io                                   2021-01-19T01:37:17Z

No Installation CR exists on the cluster.

I'm asking the AKS folks whether they are running the operator on their side and targeting clusters from there, because from your code it seems that the deletion should only occur when the statusManager isn't enabled.

djsly commented on August 11, 2024

So, I was tired and was looking at a cluster that hadn't been updated to 1.20 yet. On the updated cluster I do have a tigera-operator namespace, and the logs were showing the following.

It seems that the status manager isn't starting, which explains why it keeps trying to delete the status CR:

{"level":"info","ts":1631506159.7930598,"logger":"setup","msg":"Checking type of cluster","provider":""}
{"level":"info","ts":1631506159.7975225,"logger":"setup","msg":"Checking if TSEE controllers are required","required":false}
{"level":"info","ts":1631506160.0142817,"logger":"typha_autoscaler","msg":"Starting typha autoscaler","syncPeriod":10}
{"level":"info","ts":1631506160.0143795,"logger":"setup","msg":"starting manager"}
I0913 04:09:20.015304       1 leaderelection.go:243] attempting to acquire leader lease  tigera-operator/operator-lock...
{"level":"info","ts":1631506160.1331236,"logger":"typha_autoscaler","msg":"Updating typha replicas from 3 to 0"}
{"level":"info","ts":1631506170.0432284,"logger":"typha_autoscaler","msg":"Updating typha replicas from 0 to 3"}
I0913 04:09:37.643890       1 leaderelection.go:253] successfully acquired lease tigera-operator/operator-lock
{"level":"info","ts":1631506177.644897,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506177.6461124,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506177.7463608,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506177.7476687,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=ConfigMap"}
{"level":"info","ts":1631506177.74758,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=Secret"}
{"level":"info","ts":1631506178.953494,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=Secret"}
{"level":"info","ts":1631506180.9547164,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=ConfigMap"}
{"level":"info","ts":1631506180.9582217,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=Secret"}
{"level":"info","ts":1631506180.9617813,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506180.972377,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=ConfigMap"}
{"level":"info","ts":1631506180.981175,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /V1, Kind=ConfigMap"}
{"level":"info","ts":1631506180.9844275,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506180.9886284,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.0789032,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting Controller"}
{"level":"info","ts":1631506181.0797353,"logger":"controller-runtime.manager.controller.apiserver-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1631506181.0802593,"logger":"controller_apiserver","msg":"Reconciling APIServer","Request.Namespace":"","Request.Name":"default"}
{"level":"info","ts":1631506181.0815806,"logger":"controller_apiserver","msg":"APIServer config not found","Request.Namespace":"","Request.Name":"default"}
{"level":"info","ts":1631506181.0896528,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.1946084,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.2977993,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.3995752,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.5003502,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.6088758,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting EventSource","source":"kind source: /, Kind="}
{"level":"info","ts":1631506181.7110636,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting Controller"}
{"level":"info","ts":1631506181.7465715,"logger":"controller-runtime.manager.controller.tigera-installation-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1631506185.01433,"logger":"status_manager.calico","msg":"Status manager is not ready to report component statuses."}
{"level":"info","ts":1631506191.3367884,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631537615.0556335,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631552145.0310025,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631552190.0333617,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631552210.0374384,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631553660.0494459,"logger":"status_manager","msg":"Failed to update tigera status","error":"Operation cannot be fulfilled on tigerastatuses.operator.tigera.io \"calico\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"info","ts":1631576174.2103264,"logger":"controller_apiserver","msg":"Reconciling APIServer","Request.Namespace":"","Request.Name":"default"}
{"level":"info","ts":1631576174.2106676,"logger":"controller_apiserver","msg":"APIServer config not found","Request.Namespace":"","Request.Name":"default"}
{"level":"info","ts":1631611172.4854195,"logger":"controller_apiserver","msg":"Reconciling APIServer","Request.Namespace":"","Request.Name":"default"}
{"level":"info","ts":1631611172.489367,"logger":"controller_apiserver","msg":"APIServer config not found","Request.Namespace":"","Request.Name":"default"}

djsly commented on August 11, 2024

CC: @paulgmiller, since you might be interested in this.

tmjd commented on August 11, 2024

@djsly You could add an APIServer CR, which will cause the calico-apiserver to be deployed; the operator will then stop trying to remove the tigerastatus for it:

apiVersion: operator.tigera.io/v1
kind: APIServer 
metadata: 
  name: default 
spec: {}

From looking at the CRD definition, https://docs.tigera.io/manifests/ocp/crds/01-crd-tigerastatus.yaml, should we expect the status to be created after the first install and stay present to reflect the live status of the operator?

There will be a tigerastatus only when the corresponding Tigera operator CR is created. That is why you're seeing a delete for the apiserver tigerastatus: no apiserver.operator.tigera.io 'default' has been created, so the operator is ensuring the tigerastatus is cleaned up.

djsly commented on August 11, 2024

Thanks @tmjd, so I guess there are two issues:

  1. Issue for the AKS team: why aren't they creating the default APIServer CR? Was this by design, or something they weren't aware of?
  2. Issue on the operator side: does it need to constantly try to delete something that was never created? Could the logic be improved to try only once, or to run a GET with a labelSelector instead? That would return "No resources found" with a 200 OK:
❯ k get APIServer -l name=tigera-default -v 6
I0914 11:40:11.106136   90335 loader.go:372] Config loaded from file:  /Users/sylvain_boily/.kube/config
I0914 11:40:11.416221   90335 round_trippers.go:454] GET https://<apiserver>:443/apis/operator.tigera.io/v1/apiservers?labelSelector=name%3Dtigera-default&limit=500 200 OK in 300 milliseconds
No resources found
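The labelSelector request in the kubectl -v 6 output above is a LIST on the collection URL, which returns 200 with an empty list when nothing matches, unlike a GET by name, which returns 404. A minimal sketch of building that URL with Go's net/url follows; the host apiserver.example is a placeholder, and the label name=tigera-default is the one from djsly's example, not a label the operator is known to set.

```go
package main

import (
	"fmt"
	"net/url"
)

// listURL builds the collection (LIST) URL for a cluster-scoped custom
// resource, filtered by a label selector. The values used in main mirror
// the kubectl -v 6 output; the host is a placeholder.
func listURL(host, group, version, resource, selector string, limit int) string {
	q := url.Values{}
	q.Set("labelSelector", selector)
	q.Set("limit", fmt.Sprint(limit))
	u := url.URL{
		Scheme:   "https",
		Host:     host,
		Path:     "/apis/" + group + "/" + version + "/" + resource,
		RawQuery: q.Encode(), // keys are sorted and values percent-encoded
	}
	return u.String()
}

func main() {
	fmt.Println(listURL("apiserver.example:443", "operator.tigera.io", "v1",
		"apiservers", "name=tigera-default", 500))
	// https://apiserver.example:443/apis/operator.tigera.io/v1/apiservers?labelSelector=name%3Dtigera-default&limit=500
}
```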

tmjd commented on August 11, 2024
  1. I'm not sure whether they were aware of it or whether it was a choice they made.
  2. It doesn't need to constantly try that delete. The GET with a labelSelector is a good idea and an easy change: we'd need to add a label, but that is easy enough and I don't think it would cause any issues. Still, I think I'd prefer to adjust how we do the removal, either with a watch to see whether it ever got created, or by deleting it at "start of day" or when we know it should be removed.

djsly commented on August 11, 2024

So I added the following to my cluster:

❯ cat default-apiserver 
apiVersion: operator.tigera.io/v1
kind: APIServer 
metadata:
  name: default 
spec: {}

and I'm getting 10x more 404s now :)

[Screenshots: Screen Shot 2021-09-14 at 1:59:30 PM and Screen Shot 2021-09-14 at 1:59:36 PM]

tmjd commented on August 11, 2024

Does that continue forever?
Does kubectl get tigerastatus apiserver -o yaml show that the apiserver is ready? I would expect those to stop once the apiserver is available.

djsly commented on August 11, 2024
status:
  conditions:
  - lastTransitionTime: "2021-09-14T17:49:50Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-09-14T17:49:45Z"
    message: 'Pod calico-apiserver/calico-apiserver-f878b8657-gk2lx failed to pull
      container image for: calico-apiserver'
    reason: Some pods are failing
    status: "True"
    type: Degraded
  - lastTransitionTime: "2021-09-14T17:49:45Z"
    status: "False"
    type: Available
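A minimal sketch of reading those conditions programmatically, using the values from the status above. The condition struct only mirrors the fields shown (lastTransitionTime omitted), and the readiness rule in isReady (Available is True and Degraded is not True) is an assumption for illustration, not the operator's definition.

```go
package main

import "fmt"

// condition mirrors the fields shown in the tigerastatus output
// (type, status, reason, message); lastTransitionTime is omitted.
type condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// findCondition returns the condition of the given type, if present.
func findCondition(conds []condition, typ string) (condition, bool) {
	for _, c := range conds {
		if c.Type == typ {
			return c, true
		}
	}
	return condition{}, false
}

// isReady treats a component as ready when Available is True and Degraded
// is not True. This rule is an assumption for illustration.
func isReady(conds []condition) bool {
	avail, ok := findCondition(conds, "Available")
	if !ok || avail.Status != "True" {
		return false
	}
	deg, ok := findCondition(conds, "Degraded")
	return !ok || deg.Status != "True"
}

func main() {
	// Values copied from the apiserver tigerastatus in this thread.
	conds := []condition{
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "True", Reason: "Some pods are failing",
			Message: "Pod calico-apiserver/calico-apiserver-f878b8657-gk2lx failed to pull container image for: calico-apiserver"},
		{Type: "Available", Status: "False"},
	}
	if deg, ok := findCondition(conds, "Degraded"); ok && deg.Status == "True" {
		fmt.Println("degraded:", deg.Message)
	}
	fmt.Println("ready:", isReady(conds)) // ready: false
}
```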

I guess something isn't configured correctly with the image... I'm not seeing where this container is getting created.

Actually, it went into the calico-apiserver namespace... Can I have it run in the tigera-operator namespace?

Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Warning  Failed   11m (x8568 over 32h)   kubelet  Error: ImagePullBackOff
  Normal   BackOff  108s (x8613 over 32h)  kubelet  Back-off pulling image "mcr.microsoft.com/oss/calico/apiserver:v3.20.0"

I'm not sure if the image was pushed to the mcr.microsoft.com repo...

djsly commented on August 11, 2024

It was confirmed that the image was missing. I will retry once it is available on mcr.microsoft.com; they are working on it.

djsly commented on August 11, 2024

@tmjd sorry for the late reply, the image is now present in the Microsoft repo.

The APIServer started:

❯ k describe apiserver default 
Name:         default
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  operator.tigera.io/v1
Kind:         APIServer
Metadata:
  Creation Timestamp:  2021-09-16T13:13:58Z
  Generation:          1
  Managed Fields:
    API Version:  operator.tigera.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2021-09-16T13:13:58Z
    API Version:  operator.tigera.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:state:
    Manager:         operator
    Operation:       Update
    Time:            2021-09-21T20:41:21Z
  Resource Version:  1078163162
  UID:               a2c5f2f4-f6f7-40cc-9663-0de3001aa195
Spec:
Status:
  State:  Ready
Events:   <none>

and the pod is running:

❯ k get pods -n calico-apiserver 
NAME                                READY   STATUS    RESTARTS   AGE
calico-apiserver-64c74c9dc5-5wgp9   1/1     Running   0          5d7h

It also looks like the 404s are gone now!

One question: is there a need to have the apiserver pods run in the calico-apiserver namespace? Could we configure them to run in the same tigera-operator namespace?

tmjd commented on August 11, 2024

The tigera-operator is opinionated, and we think it is best practice to use separate namespaces, so the operator does not provide an option to run the apiserver in a different namespace.

djsly commented on August 11, 2024

@tmjd / @caseydavenport, what was the fix in the end?

caseydavenport commented on August 11, 2024

Sorry, I was just skimming issues and it looked like this one was fixed based on your comment:

It also looks like the 404s are gone now!

Is it not fixed, @djsly?

djsly commented on August 11, 2024

caseydavenport commented on August 11, 2024

I don't think it's worth jumping through hoops to avoid 404s showing up in the API server logs. That's a normal and expected response in a distributed system and shouldn't be used as a metric of whether something is functioning.

However, it does make sense to avoid calling Delete() unnecessarily. This PR updates the operator to track whether or not it has created / deleted the CR, so it knows when it needs to delete it and when it does not: #1654

It has the side effect of not noticing when a user creates or deletes the CR out-of-band, which isn't ideal. We need to think about that more.
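The "track whether we created it" approach can be sketched as below. This is a hypothetical, in-memory illustration of the idea, not the code in #1654: statusTracker and Reconcile are made-up names, and the create, update and delete calls are replaced by comments and a counter.

```go
package main

import "fmt"

// statusTracker is a hypothetical sketch: it records, in memory, which
// status objects this operator process created, so it only issues a delete
// when it knows there is something to remove.
type statusTracker struct {
	created map[string]bool
	deletes int // number of delete calls issued, for illustration
}

func newStatusTracker() *statusTracker {
	return &statusTracker{created: map[string]bool{}}
}

// Reconcile is called with whether the owning CR (e.g. APIServer "default")
// currently exists.
func (t *statusTracker) Reconcile(name string, crExists bool) {
	switch {
	case crExists:
		// A real controller would create or update the status object here.
		t.created[name] = true
	case t.created[name]:
		// A real controller would delete the status object here.
		t.deletes++
		delete(t.created, name)
	default:
		// Nothing was created by us: skip the delete (and its 404).
	}
}

func main() {
	t := newStatusTracker()
	for i := 0; i < 3; i++ {
		t.Reconcile("apiserver", false) // CR never created: no deletes
	}
	t.Reconcile("apiserver", true)     // CR created: status recorded
	t.Reconcile("apiserver", false)    // CR removed: one delete
	t.Reconcile("apiserver", false)    // already cleaned up: no delete
	fmt.Println("deletes:", t.deletes) // deletes: 1
}
```

Because the state lives only in memory, a status object left over from before a restart, or created out-of-band, is never removed by this sketch; pairing it with a one-off delete at start-up is one possible way to cover that case.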
