Comments (10)
OK, what I've described above definitely seems to be what's happening, as I just had the issue again.
Stuff broke just as described above. After the 're-election', the state of my system was this:
My router-001:
8: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 52:54:00:76:99:24 brd ff:ff:ff:ff:ff:ff
    inet 10.1.3.240/24 brd 10.1.3.255 scope global eth6
    inet 10.1.3.254/24 scope global secondary eth6
    inet6 fe80::5054:ff:fe76:9924/64 scope link
       valid_lft forever preferred_lft forever
router-002:
8: eth6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 52:54:00:9f:9d:41 brd ff:ff:ff:ff:ff:ff
    inet 10.1.3.241/24 brd 10.1.3.255 scope global eth6
    inet6 fe80::5054:ff:fe9f:9d41/64 scope link
       valid_lft forever preferred_lft forever
From the arp cache of one of my app servers:
10.1.3.254 ether 52:54:00:9f:9d:41 C eth1
10.1.3.254 is its default gateway. So it was receiving request packets via router-001 (which holds 10.1.3.254), but was trying to send the replies back via router-002, whose MAC was in its ARP cache for that address.
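For anyone wanting to check for the same mismatch, here is a minimal sketch in Python that compares an ARP cache entry with the router that actually holds the VIP. The `routers` table is typed in by hand from the output above, and the helper names (`parse_arp_line`, `vip_owners`) are made up for illustration; the column layout of the ARP line is assumed to match the entry shown.

```python
def parse_arp_line(line):
    """Split one ARP cache line into (ip, mac, iface).

    Assumes the column layout of the entry shown above:
    IP, hardware type, MAC, flags, interface.
    """
    fields = line.split()
    return fields[0], fields[2].lower(), fields[-1]


def vip_owners(routers, vip):
    """Names of the routers whose configured addresses include the VIP."""
    return [name for name, info in routers.items() if vip in info["addrs"]]


# Hand-copied from the `ip addr` output of the two routers.
routers = {
    "router-001": {"mac": "52:54:00:76:99:24", "addrs": ["10.1.3.240", "10.1.3.254"]},
    "router-002": {"mac": "52:54:00:9f:9d:41", "addrs": ["10.1.3.241"]},
}

ip, mac, iface = parse_arp_line("10.1.3.254 ether 52:54:00:9f:9d:41 C eth1")
owners = vip_owners(routers, ip)
cached = [name for name, info in routers.items() if info["mac"] == mac]
stale = owners != cached

print(f"{ip} is held by {owners}, but the ARP cache on {iface} points at {cached}")
```

If `stale` is true, the host is sending to a router that no longer holds the VIP, which is the situation described here.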
This seems very badly broken :(
from keepalived.
Any update on this? We are suffering this issue using Keepalived 1.2.8.
from keepalived.
I don't think so, I had no response :(
We spent a short time trying to work out whether we could use the script-running functionality to produce GARPs ourselves, but ran out of time and had other work prioritised over it.
We have ended up with a setup where, if one instance goes into a fault state, we simply have that keepalived stop itself using the script-running functionality. This has mostly worked, although once or twice it has still caused some downtime of the services behind it. But so far it's better than it breaking every time there's a fault.
from keepalived.
hi guys,
hmm, sounds strange. During master transition, the code sends "updates", which are GARPs for IPv4 and unsolicited Neighbor Advertisements for IPv6. The first thing to try is to run tcpdump during the master state transition to see if GARP packets are sent on the wire... (the daemon will log "VRRP_Instance(%s) Sending gratuitous ARPs on %s for %s"). If you see those logs and the packets on the wire... then maybe (for sure) your layer 2 remote party is not honouring GARP :/ which is bad.
if so, maybe an ICMP will fix the issue... I was considering some time ago adding the ability to send ICMP in addition to GARP.
Please let me know what your debugging shows.
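As a rough aid for the log check, the following Python sketch scans log lines for the message quoted above. The regular expression is built only from that quoted format string; the sample lines (the `Keepalived_vrrp:` prefix, the instance name `VI_1`, and the transition message) are hypothetical and only there to exercise the pattern.

```python
import re

# Built from the format string quoted in this comment:
# "VRRP_Instance(%s) Sending gratuitous ARPs on %s for %s"
GARP_LOG = re.compile(
    r"VRRP_Instance\((?P<instance>[^)]*)\) Sending gratuitous ARPs "
    r"on (?P<iface>\S+) for (?P<addr>\S+)"
)


def find_garp_events(lines):
    """Return (instance, interface, address) for every GARP log line found."""
    events = []
    for line in lines:
        match = GARP_LOG.search(line)
        if match:
            events.append((match["instance"], match["iface"], match["addr"]))
    return events


# Hypothetical sample lines, not taken from a real log.
sample = [
    "Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE",
    "Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth6 for 10.1.3.254",
]

for event in find_garp_events(sample):
    print(event)
```

An empty result around a master transition would suggest the GARPs were never sent; a non-empty one means the next thing to look at is the wire, with tcpdump.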
Regs,
Alexandre
from keepalived.
Could someone experiencing the same issue help me reproduce it in my lab, so I can check it and fix it? (I am right now in a coding phase to fix all reported issues.)
Regs,
Alexandre
from keepalived.
Hi,
I spent time on this issue and extended the gratuitous ARP handling to work around some corner cases.
I just committed a patch fixing this issue under "vrrp: fix/extend gratuitous ARP handling"
Please give it a try and report.
Best regs,
Alexandre
from keepalived.
Hello,
We fixed the issue in our scenario with a workaround: a small daemon that sends gratuitous ARPs while the node is the master, rather than messing with the Keepalived code in a way that would probably not be merged.
For several reasons we can't spend time testing this patch now, but I can say it goes in the right direction. Any feedback from anyone else would be highly appreciated.
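The daemon itself is not shown in this thread, so the following is only a sketch of what sending a gratuitous ARP can look like in Python, not the code described above. It builds one common form of GARP (an ARP request whose sender and target IP are both the VIP, sent to the Ethernet broadcast address). The function names are made up for illustration, and `send_garp` assumes Linux (`AF_PACKET`) and root privileges.

```python
import socket
import struct


def build_garp(mac, ip):
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr            # sender MAC and IP (the VIP)
    arp += b"\x00" * 6 + addr   # target MAC unset, target IP = the VIP again
    return eth + arp


def send_garp(iface, mac, ip):
    """Send one GARP on `iface`. Linux only, needs root; not called here."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        sock.send(build_garp(mac, ip))


frame = build_garp("52:54:00:76:99:24", "10.1.3.254")
print(len(frame), frame.hex())
```

A daemon along these lines would call `send_garp` in a loop with a sleep in between while the node is master; how master state is detected is left out of the sketch.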
from keepalived.
hello,
This is exactly the code I included in mainline: it adds the possibility to periodically send GARPs while in MASTER state (using garp_master_refresh), because in some corner cases sending gratuitous ARPs only during the MASTER transition (as specified by the RFC) may not be enough.
regs,
Alexandre
from keepalived.
Just FYI, over the years, we have often seen artifacts of various switch problems or network topology changes, where STP or similar could eat the packets from the master for some seconds, isolating it from the backup node(s). The backup nodes would become master and GARP, while the master thought nothing changed and would continue VRRPing to itself and possibly some servers on that switch. Once the network converges, the backup sees the master's advertisements and stops, but no other GARPs occur, so the isolated segment sees no recovery GARP and the two sets of servers can stay unmatched.
Combined with the way positive feedback from higher layers can be used in Linux (and now newer versions of Windows), certain situations can cause the hosts to keep their existing ARP entry even though the traffic is partially broken as a result of the gateway mismatch. I have actually influenced the ARP behaviour (and fixed the problem) at this point by adding a UDP iptables firewall rule (i.e. not at the ARP layer) on the backup node to stop the positive feedback.
So, I think periodic GARPs are required in many cases...or static MACs, where the MAC address of the multicast advertisement is enough to update the CAM tables on all of the networking equipment. In the latter case, the above scenario would mend itself purely as the VRRP packets pass from the new master.
from keepalived.
Hi Simon !
Agreed... At first I chose to disable garp_master_refresh by default. Maybe we need to turn it on by default with a long timer (say, every 5 min).
from keepalived.