Comments (18)
Let me see if I understand this. When I say "node", I mean "control plane node":

1. node A is in a good state
2. node B is brought up
3. node A needs to be brought down
4. node A's apiserver goes down
5. CCM sees node A's apiserver is down, switches the EIP to node B
6. CAPI kills etcd on node A
7. node A still has some processes that need to talk to etcd; they can no longer talk locally, so they try the load balancer EIP
8. node A still has the EIP configured locally, so it tries to reach etcd locally and fails

Is that correct?
from cluster-api-provider-packet.
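The stale condition in step 8 is detectable from the node itself: if the EIP is still assigned to a local interface, the kernel's local routing table short-circuits connections to it back into the dead node. A minimal Go sketch of that check (the EIP value is the illustrative one used later in this thread, not a real address):

```go
package main

import (
	"fmt"
	"net"
)

// hasLocalAddr reports whether ip is still assigned to one of this host's
// interfaces. If the EIP shows up here after failover, connections to it
// never leave the host and hit the dead local etcd instead of node B.
func hasLocalAddr(ip string) (bool, error) {
	target := net.ParseIP(ip)
	if target == nil {
		return false, fmt.Errorf("invalid IP: %s", ip)
	}
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return false, err
	}
	for _, a := range addrs {
		if ipNet, ok := a.(*net.IPNet); ok && ipNet.IP.Equal(target) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// 100.10.10.10 is the hypothetical EIP from the discussion below.
	stale, err := hasLocalAddr("100.10.10.10")
	if err != nil {
		panic(err)
	}
	fmt.Println("EIP still assigned locally:", stale)
}
```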
Yes, that is mostly correct. I believe step 6 actually happens right after step 3, which is what causes the API server to die as well.
So what really needs to happen is, once node A goes down (step 4), it needs the local IP routing removed. Correct?
Correct.
Thanks for the clarity. It would be nice not to have to deal with the IP locally at all. E.g. if the EIP were 100.10.10.10, and the node IPs were 100.10.10.20 and 100.10.10.30, then it would work perfectly. The problem is you need a real load balancer doing inbound NAT (changing the dst IP on the packet that hits the host) in front of it to get there, rather than lower-level network primitives (routers and switches).
BGP helps, but doesn't completely solve it. Same with EIP. FWIW, the Kubernetes kube-proxy also helps, as it sets up iptables rules, independent of the local routes. I wouldn't mind trying to leverage that, but kube-proxy is, essentially, global. All hosts have it, and the rules are the same across all of them.
CCM itself is a Deployment with `replicas=1`, so it cannot control the IP addresses, routes, or iptables rules on a different host, unless we deploy another DaemonSet.
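Very roughly, what a per-node DaemonSet agent along those lines might look like. Everything here is an assumption for illustration (the probe URL, device name, and EIP are hypothetical, and this is not how CPEM actually behaves): probe the local apiserver, and drop the local EIP binding only when the apiserver is dead and the EIP is still assigned.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"os/exec"
	"time"
)

// apiserverHealthy probes the local kube-apiserver health endpoint.
func apiserverHealthy(url string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// eipAssigned reports whether eip is bound to a local interface.
func eipAssigned(eip string) bool {
	target := net.ParseIP(eip)
	addrs, _ := net.InterfaceAddrs()
	for _, a := range addrs {
		if n, ok := a.(*net.IPNet); ok && n.IP.Equal(target) {
			return true
		}
	}
	return false
}

// reconcileOnce drops the local EIP binding when the local apiserver is dead,
// so traffic to the EIP can reach the surviving control plane node instead.
func reconcileOnce(eip, dev, healthURL string) error {
	if apiserverHealthy(healthURL) || !eipAssigned(eip) {
		return nil // healthy, or nothing to clean up
	}
	out, err := exec.Command("ip", "addr", "del", eip+"/32", "dev", dev).CombinedOutput()
	if err != nil {
		return fmt.Errorf("drop EIP %s: %v: %s", eip, err, out)
	}
	return nil
}

func main() {
	// Hypothetical values; a real agent would loop on a ticker and read its
	// config from the environment. Deleting the address needs CAP_NET_ADMIN.
	if err := reconcileOnce("100.10.10.10", "lo", "https://localhost:6443/healthz"); err != nil {
		fmt.Println("reconcile:", err)
	}
}
```

Because it runs on every control plane node, each replica only ever touches its own host, which sidesteps the `replicas=1` limitation of CCM itself.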
Also, your fix works well when installing via CAPP (hence the issue on this repo), but the EIP is controlled via CCM, and needs to account for non-CAPP situations.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
/remove-lifecycle rotten
@cprivitere: Reopened this issue.

In response to this:

> /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This should be tested with the latest CPEM to see if the DaemonSet changes resolve it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.