adleong commented on August 15, 2024

Thank you for this very helpful data. Using it, I was able to reproduce the issue and found the root cause to be a missing field in the HTTPRoute CRD schema. I've added the missing field in #12454 and confirmed that this resolves the issue.

from linkerd2.

aminafshar commented on August 15, 2024

[Screenshot: 2024-03-21 at 15:45:37]

aminafshar commented on August 15, 2024

Another HTTPRoute, controller-route-user-5186, was created and the one above, controller-route-user-476722, was deleted, and the policy controller keeps throwing hundreds of the same errors:

{"timestamp":"2024-03-21T12:00:12.113301Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-476722\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-476722\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-476722\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
{"timestamp":"2024-03-21T12:00:12.430386Z","level":"ERROR","fields":{"message":"Failed to patch HTTPRoute","namespace":"my-sandbox","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-5186\" }","error":"ApiError: httproutes.policy.linkerd.io \"controller-route-user-5186\" not found: NotFound (ErrorResponse { status: \"Failure\", message: \"httproutes.policy.linkerd.io \\\"controller-route-user-5186\\\" not found\", reason: \"NotFound\", code: 404 })"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Controller"}]}
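To see which routes these repeated errors belong to, the JSON log records quoted above can be tallied per route. The Python sketch below is only an illustration: it assumes one JSON record per line, optionally preceded by a timestamp prefix as in a later excerpt in this thread, and the function name tally_patch_errors is hypothetical. It extracts the route name from the GroupKindName debug string in the route field and counts ERROR records by (route, message).

```python
import json
import re
from collections import Counter

# The route field is a debug string such as:
# GroupKindName { group: "policy.linkerd.io", kind: "HTTPRoute", name: "controller-route-user-5186" }
ROUTE_NAME = re.compile(r'name: "([^"]+)"')


def tally_patch_errors(lines):
    """Count ERROR log records per (route name, message). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        start = line.find("{")  # skip any non-JSON prefix, e.g. a leading timestamp
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log record
        if record.get("level") != "ERROR":
            continue
        fields = record.get("fields", {})
        match = ROUTE_NAME.search(fields.get("route", ""))
        route = match.group(1) if match else "<unknown>"
        counts[(route, fields.get("message", ""))] += 1
    return counts
```

For example, passing an open handle to the attached linkerd-destination-56f85576c7-tpx4h_policy.log and calling .most_common(10) on the result would list the routes with the most errors, which makes it easy to spot patches still being attempted for routes that no longer exist.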

And it took about 40 minutes for the status field to be updated. Note the creationTimestamp and the status update time:

apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  creationTimestamp: '2024-03-21T13:09:58Z'
  generation: 1
  managedFields:
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:parentRefs: {}
          f:rules: {}
      manager: fabric8
      operation: Apply
      time: '2024-03-21T13:09:58Z'
    - apiVersion: policy.linkerd.io/v1beta3
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: policy.linkerd.io
      operation: Update
      subresource: status
      time: '2024-03-21T13:49:16Z'
  name: controller-route-user-5186
  namespace: my-sandbox
  resourceVersion: '648189883'
  uid: d350c128-3751-4b20-8f85-d0959ffa6c21
  selfLink: >-
    /apis/policy.linkerd.io/v1beta3/namespaces/my-sandbox/httproutes/controller-route-user-5186
status:
  parents:
    - conditions:
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2024-03-21T13:14:16Z'
          message: ''
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: linkerd.io/policy-controller
      parentRef:
        group: core
        kind: Service
        name: my-controller
        namespace: my-sandbox
spec:
  parentRefs:
    - group: core
      kind: Service
      name: my-controller
      port: 5051
  rules:
    - backendRefs:
        - group: core
          kind: Service
          name: my-app-0
          port: 3004
          weight: 1
      matches:
        - headers:
            - name: x-user-id
              type: Exact
              value: '5186'
          path:
            type: PathPrefix
            value: /
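The delay can be read from the resource itself by comparing metadata.creationTimestamp with the time of the managedFields entry whose subresource is status (written by the policy.linkerd.io manager): here 13:09:58Z to 13:49:16Z, i.e. 39 minutes 18 seconds (2358 s). Below is a minimal Python sketch, assuming the HTTPRoute has already been loaded into a dict (for example with PyYAML's yaml.safe_load on the kubectl get -o yaml output); status_write_delay and parse_k8s_time are hypothetical helper names. One caveat: per the Kubernetes API reference, a managedFields entry's time is also updated when that manager later changes fields it owns, so it may reflect the most recent status write rather than the first. In this resource, the conditions' lastTransitionTime is 13:14:16Z.

```python
from datetime import datetime, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as '2024-03-21T13:09:58Z' (UTC, second precision)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def status_write_delay(route):
    """Seconds from creation to the latest status-subresource write, or None if none."""
    metadata = route["metadata"]
    created = parse_k8s_time(metadata["creationTimestamp"])
    status_writes = [
        parse_k8s_time(entry["time"])
        for entry in metadata.get("managedFields", [])
        if entry.get("subresource") == "status"
    ]
    if not status_writes:
        return None  # the status subresource has not been written yet
    return (max(status_writes) - created).total_seconds()
```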

linkerd-destination-56f85576c7-tpx4h_policy.log

adleong commented on August 15, 2024

@aminafshar this looks like it is likely the same issue as #12104 and is fixed in #12215

olix0r commented on August 15, 2024

This was fixed in https://github.com/linkerd/linkerd2/releases/tag/edge-24.3.4. Please let us know if issues persist.

aminafshar commented on August 15, 2024

@adleong, @olix0r
We're now running edge-24.4.1 (Kubernetes version: v1.28.8).
Resource-wise, the policy controller seems to be running normally and the memory leak issue appears resolved, but we still see a delay of several minutes between HTTPRoute creation and the status field update, and we see lots of errors like the one below:

{"timestamp":"2024-04-15T08:18:19.231788Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-4288\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"status::Index"}]}

[Screenshot: 2024-04-15 at 11:17:09]

aminafshar commented on August 15, 2024

As you can see, memory usage has been a flat line for the last 7 hours, and the policy controller seems to be stuck in that state, repeatedly throwing the same error:

2024-04-15T13:53:09+03:00 {"timestamp":"2024-04-15T10:53:09.703054Z","level":"ERROR","fields":{"message":"Failed to send HTTPRoute patch","id.namespace":"pangolin","route":"GroupKindName { group: \"policy.linkerd.io\", kind: \"HTTPRoute\", name: \"controller-route-user-0123\" }","error":"no available capacity"},"target":"linkerd_policy_controller_k8s_status::index","spans":[{"name":"httproutes.policy.linkerd.io"}]}

adleong commented on August 15, 2024

Hi @aminafshar, sorry to hear you're still experiencing this.

Those error messages indicate that the policy controller is generating HTTPRoute status patches more quickly than the Kubernetes API can keep up with. The policy controller will only generate a patch for an HTTPRoute if the HTTPRoute's status is out of date and needs to be updated. I've attempted to reproduce this with 1000 HTTPRoutes, but I only see patches generated when the HTTPRoutes are first created, not continuously as you seem to be experiencing. Are HTTPRoutes being created or updated rapidly by some controller or automated process?
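The behaviour described above can be pictured as a producer, which decides whether a route's status needs patching, feeding a bounded queue that is drained by writes to the Kubernetes API. The Python sketch below is a hypothetical model, not the policy controller's actual Rust code; the class name StatusPatcher, its methods, and the capacity are all assumptions. It shows the two properties from the comment: a route whose status is already up to date produces no patch, and once the queue is full, further patches are rejected immediately instead of waiting, which is the kind of situation a "no available capacity" error describes.

```python
import queue


class StatusPatcher:
    """Hypothetical model of a status writer with a bounded patch queue (not Linkerd's code)."""

    def __init__(self, capacity):
        self.patches = queue.Queue(maxsize=capacity)
        self.rejected = []

    def reconcile(self, route_name, current_status, desired_status):
        """Enqueue a status patch only when the stored status is out of date."""
        if current_status == desired_status:
            return False  # already up to date: no patch is generated
        try:
            # Non-blocking send: fails at once if the queue is full.
            self.patches.put_nowait((route_name, desired_status))
        except queue.Full:
            # The consumer sending patches to the API has fallen behind the producer.
            self.rejected.append(route_name)
            return False
        return True
```

In this model, a rejected patch is not retried on its own; it is only enqueued again the next time reconcile runs for that route, so a route that is reconciled continuously keeps pressure on the queue while one that is reconciled once may wait for its status.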

If you can provide the output of linkerd diagnostics controller-metrics, it can help us confirm what we're seeing. If you can also share the YAML-formatted output of one of these HTTPRoutes (e.g. kubectl get httproute/X -o yaml), we can check whether anything seems unexpected about the resource itself or its status.

aminafshar commented on August 15, 2024

Hi @adleong, I've asked our developers to provide info on how they create and manage HTTPRoutes.

At the time of writing, there are ~60 HTTPRoutes on the cluster, and only a few have been deleted or created recently.
The linkerd-destination pods were restarted and have been running for the last ~2 hours. Logs, the diagnostics output, and the YAML output of some recent HTTPRoutes are attached:
linkerd-diagnostics-controller-metrics.txt
policy_linkerd-destination-887769595-492pk.log
policy_linkerd-destination-887769595-hdmzp.log
policy_linkerd-destination-887769595-gttn5.log
httproutes.yml.txt
