Comments (10)

cron2 avatar cron2 commented on May 28, 2024

Bandaid patch v3 has been merged and will be part of the upcoming 2.6_rc1 version. With that, @schwabe and I can no longer break TCP servers this way.

from openvpn.

bernhardschmidt avatar bernhardschmidt commented on May 28, 2024

Possibly related: a few minutes after a restart we have an openvpn process that is pegged at 100% CPU:

recvfrom(-1, 0x55b337b832b8, 2330, 0, 0x55b337b66810, [28]) = -1 EBADF (Bad file descriptor)
recvmsg(-1, {msg_namelen=128}, MSG_ERRQUEUE) = -1 EBADF (Bad file descriptor)
write(1, "9b24ef5845c205e04e757b933ffa70e6"..., 109) = 109
recvfrom(-1, 0x55b337b832b8, 2330, 0, 0x55b337b66810, [28]) = -1 EBADF (Bad file descriptor)
recvmsg(-1, {msg_namelen=128}, MSG_ERRQUEUE) = -1 EBADF (Bad file descriptor)
write(1, "9b24ef5845c205e04e757b933ffa70e6"..., 109) = 109
recvfrom(-1, 0x55b337b832b8, 2330, 0, 0x55b337b66810, [28]) = -1 EBADF (Bad file descriptor)
recvmsg(-1, {msg_namelen=128}, MSG_ERRQUEUE) = -1 EBADF (Bad file descriptor)
write(1, "9b24ef5845c205e04e757b933ffa70e6"..., 109) = 109
recvfrom(-1, 0x55b337b832b8, 2330, 0, 0x55b337b66810, [28]) = -1 EBADF (Bad file descriptor)
recvmsg(-1, {msg_namelen=128}, MSG_ERRQUEUE) = -1 EBADF (Bad file descriptor)
write(1, "9b24ef5845c205e04e757b933ffa70e6"..., 109) = 109
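The trace above shows every recvfrom() and recvmsg() call on descriptor -1 failing immediately with EBADF. A minimal C sketch of why that turns into a busy loop: a read on an invalid descriptor never blocks, so code that keeps retrying it spins at full speed. The helper names `try_recv_once` and `count_failed_reads` are made up for illustration and are not OpenVPN functions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Attempt a single read on fd.
 * Returns 0 on success, or the errno value if recvfrom() failed. */
static int try_recv_once(int fd)
{
    char buf[64];
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
    return (n < 0) ? errno : 0;
}

/* Retry the read max_tries times without ever checking whether the
 * descriptor is still valid, and count how many attempts failed.
 * With fd == -1 every attempt fails at once, so nothing slows the loop. */
static int count_failed_reads(int fd, int max_tries)
{
    int failures = 0;
    for (int i = 0; i < max_tries; i++) {
        if (try_recv_once(fd) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Calling `try_recv_once(-1)` yields EBADF, matching the trace. An event loop that treats this as a transient error and polls again would burn CPU in the same way.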

cron2 avatar cron2 commented on May 28, 2024

bernhardschmidt avatar bernhardschmidt commented on May 28, 2024

Unfortunately no. For the first error, this is all I got in the log:

Dec 7 12:21:12 eduvpn-n09 openvpn[147602]: d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx MULTI: Learn: 2001:4ca0:2fff:2:3:0:7:1005 -> d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx
Dec 7 12:21:12 eduvpn-n09 openvpn[147602]: d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx MULTI: primary virtual IPv6 for d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx: 2001:4ca0:2fff:2:3:0:7:1005
Dec 7 12:21:12 eduvpn-n09 openvpn[147602]: d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx read TCPv6_SERVER []: Bad file descriptor (fd=-1,code=9)
Dec 7 12:21:12 eduvpn-n09 openvpn[147602]: d400821cdfd1c0294d1ec1b8bd15b768/2001:9e8:xxx read TCPv6_SERVER []: Bad file descriptor (fd=-1,code=9)

In the second case logging was disabled, so I attached to the running openvpn process using strace.

cron2 avatar cron2 commented on May 28, 2024

cron2 avatar cron2 commented on May 28, 2024

So, looking a bit more closely into our code (socket.c and mtu.c) - the combination of recvfrom() and recvmsg() is only ever done for UDP, so it seems the second log might be a different issue. We currently have no idea why the file descriptor might change to "-1" without (especially in the UDP case) the process just ending - there are no races with "pass socket to kernel, userland must no longer use it" anymore...

cron2 avatar cron2 commented on May 28, 2024

0001-WIP-ASSERT-if-sock-fd-passed-to-recv-is-not-0-GH-iss.txt

I have attached a patch that adds ASSERT(sock && sock->sd >= 0) to every place where we recv() or recvfrom() etc. from a file descriptor - one for TCP, two for UDP (ignoring the one in mtu.c). This is not a bugfix, but if you could run a server instance with verb 6 and DCO enabled, and this problem happens again, the server will stop and the log file should hopefully give us some more hints as to how we managed to break things...
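As an illustration of this kind of guard (not the attached patch itself), here is a small C sketch that checks the descriptor before reading. `struct demo_socket`, `demo_socket_valid` and `demo_checked_recv` are hypothetical stand-ins, and the standard `assert()` is used in place of OpenVPN's own ASSERT macro. The sketch treats any non-negative descriptor as valid, since 0 is a legal descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for the real socket structure;
 * only the descriptor matters for this sketch. */
struct demo_socket {
    int sd;
};

/* True if the socket pointer is set and holds a usable descriptor. */
static bool demo_socket_valid(const struct demo_socket *sock)
{
    return sock != NULL && sock->sd >= 0;
}

/* Read from the socket, but stop the whole process first if the
 * descriptor is already invalid. This is a diagnostic aid, not a fix:
 * it turns a silent EBADF loop into an immediate, logged abort. */
static ssize_t demo_checked_recv(const struct demo_socket *sock,
                                 void *buf, size_t len)
{
    assert(demo_socket_valid(sock));
    return recv(sock->sd, buf, len, 0);
}
```

The trade-off is deliberate: aborting takes the server down, but it pins the failure to the exact call site.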

cron2 avatar cron2 commented on May 28, 2024

JFTR, I managed to reproduce the crash for a TCP-based server, with the ASSERT() patch above:

Dec 18 12:30:26 ubuntu2004 tun-tcp-p2mp-username-cn[1515354]: gremlin52251/2001:608:0:814::f000:21 dco_install_key: peer_id=106 keyid=0, currently 0 keys installed
Dec 18 12:30:26 ubuntu2004 tun-tcp-p2mp-username-cn[1515354]: gremlin52251/2001:608:0:814::f000:21 dco_new_key: slot 0, key-id 0, peer-id 106, cipher AES-256-GCM
Dec 18 12:30:26 ubuntu2004 tun-tcp-p2mp-username-cn[1515354]: gremlin52251/2001:608:0:814::f000:21 SENT CONTROL [gremlin52251]: 'PUSH_REPLY,route 10.220.0.0 255.255.255.0,route 10.220.128.0 255.255.128.0,route-ipv6 fd00:abcd:220::/48,tun-ipv6,route-gateway 10.220.112.1,topology subnet,ping 10,ping-restart 30,ifconfig-ipv6 fd00:abcd:220:112::11d0/64 fd00:abcd:220:112::1,ifconfig 10.220.113.210 255.255.252.0,peer-id 106,cipher AES-256-GCM,protocol-flags cc-exit tls-ekm' (status=1)
Dec 18 12:30:26 ubuntu2004 tun-tcp-p2mp-username-cn[1515354]: gremlin52251/2001:608:0:814::f000:21 Assertion failed at socket.c:3361 (sock && sock->sd >= 0)
Dec 18 12:30:26 ubuntu2004 tun-tcp-p2mp-username-cn[1515354]: gremlin52251/2001:608:0:814::f000:21 Exiting due to fatal error

(won't help someone who needs a working server :-) - but I hope this gives us some logging to understand what weird flow of events made us arrive there - there's nothing in the code that would ever set sock->sd to -1...)

I have not been able to make a UDP server crash, but brute-forcing ("5.000 client connects in 90 minutes") uncovered some other interesting misbehaviours...

cron2 avatar cron2 commented on May 28, 2024

https://patchwork.openvpn.net/project/openvpn2/patch/[email protected]/

I do have a patch that bandaid-fixes the issue for me - that is, I can reproduce the TCP server crash, and with the fix, it will just kill the "broken" client instance.

Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 dco_install_key: peer_id=258 keyid=0, currently 0 keys installed
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 dco_new_key: slot 0, key-id 0, peer-id 258, cipher AES-256-GCM
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 SENT CONTROL [gremlin50083]: 'PUSH_REPLY,route 10.220.0.0 255.255.255.0,route 10.220.128.0 255.255.128.0,route-ipv6 fd00:abcd:220::/48,tun-ipv6,route-gateway 10.220.112.1,topology subnet,ping 10,ping-restart 30,ifconfig-ipv6 fd00:abcd:220:112::110c/64 fd00:abcd:220:112::1,ifconfig 10.220.113.14 255.255.252.0,peer-id 258,cipher AES-256-GCM,protocol-flags cc-exit' (status=1)
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 BUG: link_socket_read_tcp(): sock->sd==-1, reset client instance
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 Connection reset, restarting [0]
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: gremlin50083/2001:608:0:814::f000:21 SIGUSR1[soft,connection-reset] received, client-instance restarting
Dec 22 10:49:25 ubuntu2004 tun-tcp-p2mp-username-cn[1659541]: dco_del_peer: peer-id 258

The current theory about the underlying issue is "an incoming TCP session close at just the wrong moment", so that client's session would be broken anyway. I consider this to be a bandaid, because it would be preferable to never be in this situation in the first place - but fixing the underlying issue might take longer. So, for 2.6.0 with DCO, this should get the job done.
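The general shape of such a bandaid can be sketched in C as follows. This is not the patch linked above: `struct demo_client`, `enum demo_read_status` and `demo_read_tcp` are invented for illustration. The idea is that an invalid descriptor is reported as "reset this client" instead of being passed to recv() or triggering a fatal assertion.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for one read attempt. */
enum demo_read_status {
    DEMO_READ_OK,            /* data (or EOF) was read normally */
    DEMO_READ_ERROR,         /* recv() itself reported an error */
    DEMO_READ_RESET_CLIENT   /* descriptor already gone: drop this client */
};

/* Hypothetical per-client state; only the descriptor is modelled. */
struct demo_client {
    int sd;
};

/* Read from one client's TCP socket. If the descriptor is already
 * invalid, ask the caller to tear down just this client rather than
 * retrying forever or aborting the whole server. */
static enum demo_read_status demo_read_tcp(const struct demo_client *c,
                                           void *buf, size_t len,
                                           ssize_t *out_len)
{
    *out_len = 0;
    if (c == NULL || c->sd < 0) {
        return DEMO_READ_RESET_CLIENT;
    }
    ssize_t n = recv(c->sd, buf, len, 0);
    if (n < 0) {
        return DEMO_READ_ERROR;
    }
    *out_len = n;
    return DEMO_READ_OK;
}
```

A caller would map `DEMO_READ_RESET_CLIENT` to whatever per-client restart mechanism it has, leaving all other clients connected.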

@bernhardschmidt please see if you can still break it :-)

bernhardschmidt avatar bernhardschmidt commented on May 28, 2024

We currently do not run DCO due to other bugs, but I remember this being fixed with rc1 (before that, it hit us within minutes).

Closing as suggested by cron2, thanks for the fix.
