Giter VIP home page Giter VIP logo

Comments (8)

saad-ali avatar saad-ali commented on May 18, 2024

CC @msau42 @jsafrane

from external-attacher.

jsafrane avatar jsafrane commented on May 18, 2024

Looking at the logs, exponential backoff is weird. Controller got synced at these intervals:

I1128 12:32:23.206454       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.233230       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.247519       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.257638       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.317422       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.420913       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.587099       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:23.913460       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:24.561902       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:24.572704       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:25.852872       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:25.860689       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:30.980237       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:31.009714       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:31.030087       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:51.490190       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:32:51.508139       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:35.348500       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.998133       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:37:28.546844       1 controller.go:167] Started VA processing "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"

Recalculated to 10 of milliseconds between calls:

2.7
1.4
1.0
6.0
10.3
16.6
32.6
64.8
1.1
128.0
0.8
512.0
2.9
2.0
2046.0
1.8
16384.0
1265.0
10054.9

It shows some exponential growth from 16 through 32, 64, 128, but it misses 256, 1024 and lot of numbers after 2048.

Unfortunately, informer does not log anything.

As side note, these short 1-2ms intervals in between are caused by timestamp in VolumeAttachment.Status.Error - when backoff timer fires up, the controller re-tries attach and writes new timestamp to Status. This triggers new "VolumeAttachment changed" event and the controller reconciles the object again. It tries to attach the volume again and writes new .Status. At this point, the timestamp is the same as before (it's rounded to seconds), so it does not generate new "VolumeAttachment changed" event. IMO we could do something about it and not to try attaching the volume so quickly, however, it most probably is not root cause of this bug.

from external-attacher.

jsafrane avatar jsafrane commented on May 18, 2024

This "VolumeAttachment changed" is related to the bug. New attach fails and VolumeAttachment is re-enqueued with new interval, overwriting the old one. So VolumeAttachment is enqueued with timeout 2*N instead of N, as we can see aboce. I thought that workqueue is smart enough not to enqueue an item if the item is still in the queue. I'll ignore "VolumeAttachment updated" events if the only difference is VolumeAttachment.Status.Error.

That does not explain why there is such a long gap between 2046.0 (=20s) and 16384 (=168s) though. And that's where these 2.5 minutes are, the previous syncs take ~30 seconds together.

from external-attacher.

jsafrane avatar jsafrane commented on May 18, 2024

I create #104, it does not retry to attach/detach on VolumeAttachment changed events caused by AttachError write.

from external-attacher.

fejta-bot avatar fejta-bot commented on May 18, 2024

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

from external-attacher.

fejta-bot avatar fejta-bot commented on May 18, 2024

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

from external-attacher.

fejta-bot avatar fejta-bot commented on May 18, 2024

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

from external-attacher.

k8s-ci-robot avatar k8s-ci-robot commented on May 18, 2024

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

from external-attacher.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.