Comments (8)
When the issue occurs, the affected volumes cannot be deleted:
╰─$ kl get volumes
NAME                                       DATA ENGINE   STATE      ROBUSTNESS   SCHEDULED   SIZE         NODE              AGE
pvc-2c181828-451f-4e5a-8423-57b2bb51eb1c   v2            deleting   faulted                  4294967296   c3-small-x86-01   17m
pvc-32c2a5f3-4ffd-457c-8633-02397aabae03   v2            deleting   faulted                  4294967296   c3-small-x86-01   17m
pvc-49aaa930-dbcb-49fb-8804-c27ada3a6e08   v2            deleting   faulted                  4294967296   c3-small-x86-01   17m
pvc-c61eb69e-8dc1-4852-9deb-fce50b915007   v2            deleting   faulted                  4294967296   c3-small-x86-01   17m
pvc-fead3001-9a23-40fd-be1f-c899fe54806b   v2            deleting   faulted                  4294967296   c3-small-x86-01   17m
In addition, nvmf_subsystem_remove_listener showed a "subsystem busy, retry later." error.
I also attached a support bundle in #7703 (comment).
cc @DamiaSan
from longhorn.
Just a side request: we need to have the SPDK tgt log in the support bundle, or even be able to monitor it at runtime.
I believe we currently don't have logs of the v1 iSCSI tgt either.
The tgt log messages are redirected to STDOUT and recorded in the log of the instance-manager pod.
https://github.com/longhorn/longhorn-instance-manager/blob/master/package/instance-manager#L25-L41
Do you have SPDK logs?
Sorry, I didn't keep the environment or the logs. But spdk_tgt was constantly emitting only this message:
[2024-01-16 06:45:25.210261] ctrlr.c: 178:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6be298ca from subsystem nqn..............
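Since the tgt messages end up mixed into the instance-manager pod log, a small filter can help pull them out. The following Go sketch parses lines shaped like the one above. The pattern is inferred from that single sample line only, so other SPDK versions or log settings may need a different expression; `spdkLine` and `parseSPDKLine` are hypothetical names, not part of Longhorn or SPDK.

```go
package main

import (
	"fmt"
	"regexp"
)

// spdkLine holds the fields of one spdk_tgt log line (hypothetical type).
type spdkLine struct {
	Timestamp, File, Line, Func, Level, Message string
}

// Assumption: the layout "[timestamp] file: line:function: *LEVEL*: message"
// is inferred from the one sample line quoted in this thread.
var spdkLineRE = regexp.MustCompile(
	`^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] (\S+):\s*(\d+):(\w+): \*(\w+)\*: (.*)$`)

// parseSPDKLine returns the parsed fields, or false if the line does not
// look like an spdk_tgt message (for example a logrus line from Go code).
func parseSPDKLine(s string) (spdkLine, bool) {
	m := spdkLineRE.FindStringSubmatch(s)
	if m == nil {
		return spdkLine{}, false
	}
	return spdkLine{m[1], m[2], m[3], m[4], m[5], m[6]}, true
}

func main() {
	sample := "[2024-01-16 06:45:25.210261] ctrlr.c: 178:nvmf_ctrlr_keep_alive_poll: " +
		"*NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6be298ca from subsystem nqn.............."
	if l, ok := parseSPDKLine(sample); ok {
		fmt.Println(l.Timestamp, l.File, l.Line, l.Func, l.Level)
	}
}
```

Feeding the pod log through this line by line and keeping only the matches would give a tgt-only view; counting matches per `Func` would show how often the keep-alive disconnect fires.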
I encountered the error once when testing this issue.
[longhorn-instance-manager] time="2024-01-17T09:42:01Z" level=error msg="Failed to delete replica with cleanupRequired flag false" func="spdk.(*Replica).Delete.func1" file="replica.go:682" error="error sending message, id 7606, method nvmf_subsystem_remove_listener, params {nqn.2023-01.io.longhorn.spdk:vol-r-1c41559a {TCP IPv4 10.42.1.237 20001} }: {\"code\": -32603,\"message\": \"subsystem busy, retry later.\n\"}" lvsName=disk-2 lvsUUID=14821ea5-b73e-45de-9574-3d4628637986 replicaName=vol-r-1c41559a
In addition, nvmf_subsystem_remove_listener showed a "subsystem busy, retry later." error.
Something similar happens here: spdk/spdk#3079