Comments (18)
Let me see if I understand this. When I say "node", I mean "control plane node":

1. node A is in a good state
2. node B is brought up
3. node A needs to be brought down
4. node A's apiserver goes down
5. CCM sees node A's apiserver is down, switches the EIP to node B
6. CAPI kills etcd on node A
7. node A still has some processes that need to talk to etcd; they can no longer talk locally, so they try the load balancer EIP
8. node A still has the EIP configured locally, so it tries to reach etcd locally and fails

Is that correct?
from cluster-api-provider-packet.
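The stale condition in step 8 is detectable from the node itself: if the EIP is still assigned to a local interface, the kernel's local routing table short-circuits connections to it back into the dead node. A minimal Go sketch of that check (the EIP value is the illustrative one used later in this thread, not a real address):

```go
package main

import (
	"fmt"
	"net"
)

// hasLocalAddr reports whether ip is still assigned to one of this host's
// interfaces. If the EIP shows up here after failover, connections to it
// never leave the host and hit the dead local etcd instead of node B.
func hasLocalAddr(ip string) (bool, error) {
	target := net.ParseIP(ip)
	if target == nil {
		return false, fmt.Errorf("invalid IP: %s", ip)
	}
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return false, err
	}
	for _, a := range addrs {
		if ipNet, ok := a.(*net.IPNet); ok && ipNet.IP.Equal(target) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// 100.10.10.10 is the hypothetical EIP from the discussion below.
	stale, err := hasLocalAddr("100.10.10.10")
	if err != nil {
		panic(err)
	}
	fmt.Println("EIP still assigned locally:", stale)
}
```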
Yes, that is mostly correct. I believe step 6 actually happens right after step 3, which is what causes the API server to die as well.
So what really needs to happen is, once node A goes down (step 4), it needs the local IP routing removed. Correct?
Correct.
Thanks for the clarity. It would be nice not to have to deal with the IP locally at all. E.g. if the EIP were 100.10.10.10, and the node IPs were 100.10.10.20 and 100.10.10.30, then it would work perfectly. The problem is you need a real load balancer doing inbound NAT (changing the dst IP on the packet that hits the host) in front of it to get there, rather than lower-level network primitives (routers and switches).
BGP helps, but doesn't completely solve it. Same with EIP. FWIW, the Kubernetes kube-proxy also helps, as it sets up iptables rules, independent of the local routes. I wouldn't mind trying to leverage that, but kube-proxy is, essentially, global. All hosts have it, and the rules are the same across all of them.
CCM itself is a Deployment with `replicas=1`, so it cannot control the IP addresses, routes, or iptables rules on a different host, unless we deploy another DaemonSet.
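Very roughly, what a per-node DaemonSet agent along those lines might look like. Everything here is an assumption for illustration (the probe URL, device name, and EIP are hypothetical, and this is not how CPEM actually behaves): probe the local apiserver, and drop the local EIP binding only when the apiserver is dead and the EIP is still assigned.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"os/exec"
	"time"
)

// apiserverHealthy probes the local kube-apiserver health endpoint.
func apiserverHealthy(url string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// eipAssigned reports whether eip is bound to a local interface.
func eipAssigned(eip string) bool {
	target := net.ParseIP(eip)
	addrs, _ := net.InterfaceAddrs()
	for _, a := range addrs {
		if n, ok := a.(*net.IPNet); ok && n.IP.Equal(target) {
			return true
		}
	}
	return false
}

// reconcileOnce drops the local EIP binding when the local apiserver is dead,
// so traffic to the EIP can reach the surviving control plane node instead.
func reconcileOnce(eip, dev, healthURL string) error {
	if apiserverHealthy(healthURL) || !eipAssigned(eip) {
		return nil // healthy, or nothing to clean up
	}
	out, err := exec.Command("ip", "addr", "del", eip+"/32", "dev", dev).CombinedOutput()
	if err != nil {
		return fmt.Errorf("drop EIP %s: %v: %s", eip, err, out)
	}
	return nil
}

func main() {
	// Hypothetical values; a real agent would loop on a ticker and read its
	// config from the environment. Deleting the address needs CAP_NET_ADMIN.
	if err := reconcileOnce("100.10.10.10", "lo", "https://localhost:6443/healthz"); err != nil {
		fmt.Println("reconcile:", err)
	}
}
```

Because it runs on every control plane node, each replica only ever touches its own host, which sidesteps the `replicas=1` limitation of CCM itself.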
Also, your fix works well when installing via CAPP (hence the issue on this repo), but the EIP is controlled via CCM, and needs to account for non-CAPP situations.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
/remove-lifecycle rotten
@cprivitere: Reopened this issue.

In response to this:

> /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This should be tested with the latest CPEM to see if the DaemonSet changes resolve it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.