Giter VIP home page Giter VIP logo

Comments (11)

comrada avatar comrada commented on September 13, 2024 3

@jpechane no, 0.29.2 does not fix this problem, because as I wrote in my comment, the infinite wait in my case happens while trying to close the socket for writing, and the 0.29.2 fix sets the SO_LINGER property after this call.
In my fix (based on 0.29.2), I moved the property setting to the beginning of the close() method and it works for me.
image

from mysql-binlog-connector-java.

phamlehieu avatar phamlehieu commented on September 13, 2024 2

Hi @comrada , does the 0.29.2 work for you? I'm facing a similar issue after upgrading to Aurora v3.

from mysql-binlog-connector-java.

Naros avatar Naros commented on September 13, 2024

Hi, so I'm wondering, based on looking at the JDK11 bug history, if this may not be related to some past issues with JDK11 and SSL connections hitting a deadlock scenario where you have this unique combination of the socket shutting down:

https://bugs.openjdk.org/browse/JDK-8207004
https://bugs.openjdk.org/browse/JDK-8207009

Can you please share what JDK vendor you're using? Would it be possible to upgrade to 11.0.20 and see whether this problem goes away, or better yet move to JDK17 as JDK11 is being EoL in October 2024?

from mysql-binlog-connector-java.

aadant avatar aadant commented on September 13, 2024

@Naros thanks, looks like it, indeed we use this

# java --version
openjdk 11.0.16 2022-07-19
OpenJDK Runtime Environment 18.9 (build 11.0.16+8)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.16+8, mixed mode, sharing)

from mysql-binlog-connector-java.

aadant avatar aadant commented on September 13, 2024

@Naros unfortunately the issue continues to happen with OpenJDK 17 ...

bash-4.4# java --version
openjdk 17.0.2 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)

from mysql-binlog-connector-java.

aadant avatar aadant commented on September 13, 2024

stack trace involving shyiko :

"blc-fpif1-txnrepodbl1:3306" #77 prio=5 os_prio=0 cpu=3406817.79ms elapsed=71952.46s tid=0x00007feb4423f7a0 nid=0x11a runnable  [0x00007feb5affd000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.write0([email protected]/Native Method)
        at sun.nio.ch.SocketDispatcher.write([email protected]/SocketDispatcher.java:62)
        at sun.nio.ch.NioSocketImpl.tryWrite([email protected]/NioSocketImpl.java:398)
        at sun.nio.ch.NioSocketImpl.implWrite([email protected]/NioSocketImpl.java:413)
        at sun.nio.ch.NioSocketImpl.write([email protected]/NioSocketImpl.java:440)
        at sun.nio.ch.NioSocketImpl$2.write([email protected]/NioSocketImpl.java:826)
        at java.net.Socket$SocketOutputStream.write([email protected]/Socket.java:1035)
        at sun.security.ssl.SSLSocketOutputRecord.flush([email protected]/SSLSocketOutputRecord.java:271)
        at sun.security.ssl.OutputRecord.changeWriteCiphers([email protected]/OutputRecord.java:224)
        at sun.security.ssl.KeyUpdate$KeyUpdateProducer.produce([email protected]/KeyUpdate.java:321)
        at sun.security.ssl.KeyUpdate$KeyUpdateKickstartProducer.produce([email protected]/KeyUpdate.java:171)
        at sun.security.ssl.SSLHandshake.kickstart([email protected]/SSLHandshake.java:542)
        at sun.security.ssl.PostHandshakeContext.kickstart([email protected]/PostHandshakeContext.java:69)
        at sun.security.ssl.TransportContext.kickstart([email protected]/TransportContext.java:251)
        at sun.security.ssl.SSLSocketImpl.startHandshake([email protected]/SSLSocketImpl.java:443)
        at sun.security.ssl.SSLSocketImpl.startHandshake([email protected]/SSLSocketImpl.java:421)
        at sun.security.ssl.SSLSocketImpl.tryKeyUpdate([email protected]/SSLSocketImpl.java:1543)
        at sun.security.ssl.SSLSocketImpl.decode([email protected]/SSLSocketImpl.java:1515)
        at sun.security.ssl.SSLSocketImpl.readApplicationRecord([email protected]/SSLSocketImpl.java:1474)
        at sun.security.ssl.SSLSocketImpl$AppInputStream.read([email protected]/SSLSocketImpl.java:1059)
        at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.readWithinBlockBoundaries(ByteArrayInputStream.java:266)
        at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.read(ByteArrayInputStream.java:245)
        at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.fill(ByteArrayInputStream.java:112)
        at com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.read(ByteArrayInputStream.java:105)
        at com.github.shyiko.mysql.binlog.event.deserialization.AbstractRowsEventDataDeserializer.deserializeNewDecimal(AbstractRowsEventDataDeserializer.java:258)
        at com.github.shyiko.mysql.binlog.event.deserialization.AbstractRowsEventDataDeserializer.deserializeCell(AbstractRowsEventDataDeserializer.java:169)
        at com.github.shyiko.mysql.binlog.event.deserialization.AbstractRowsEventDataDeserializer.deserializeRow(AbstractRowsEventDataDeserializer.java:143)
        at com.github.shyiko.mysql.binlog.event.deserialization.UpdateRowsEventDataDeserializer.deserializeRows(UpdateRowsEventDataDeserializer.java:72)
        at com.github.shyiko.mysql.binlog.event.deserialization.UpdateRowsEventDataDeserializer.deserialize(UpdateRowsEventDataDeserializer.java:58)
        at com.github.shyiko.mysql.binlog.event.deserialization.UpdateRowsEventDataDeserializer.deserialize(UpdateRowsEventDataDeserializer.java:33)
        at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:337)
        at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:246)
        at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$1.nextEvent(MySqlStreamingChangeEventSource.java:238)
        at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1068)
        at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:648)
        at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:949)
        at java.lang.Thread.run([email protected]/Thread.java:833)

"blc-keepalive-fpif1-txnrepodbl1:3306" #79 prio=5 os_prio=0 cpu=30.29ms elapsed=71952.43s tid=0x00007feb484dd850 nid=0x11c waiting on condition  [0x00007feb5aefd000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00007fefd423e348> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantLock$Sync.lock([email protected]/ReentrantLock.java:153)
        at java.util.concurrent.locks.ReentrantLock.lock([email protected]/ReentrantLock.java:322)
        at sun.security.ssl.SSLSocketImpl.closeNotify([email protected]/SSLSocketImpl.java:726)
        at sun.security.ssl.TransportContext.closeNotify([email protected]/TransportContext.java:280)
        at sun.security.ssl.TransportContext.initiateOutboundClose([email protected]/TransportContext.java:591)
        at sun.security.ssl.TransportContext.closeOutbound([email protected]/TransportContext.java:575)
        at sun.security.ssl.SSLSocketImpl.shutdownOutput([email protected]/SSLSocketImpl.java:865)
        at com.github.shyiko.mysql.binlog.network.protocol.PacketChannel.close(PacketChannel.java:125)
        at com.github.shyiko.mysql.binlog.BinaryLogClient.disconnectChannel(BinaryLogClient.java:1361)
        at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect(BinaryLogClient.java:1345)
        at com.github.shyiko.mysql.binlog.BinaryLogClient.access$800(BinaryLogClient.java:96)
        at com.github.shyiko.mysql.binlog.BinaryLogClient$5.run(BinaryLogClient.java:894)
        at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:539)
        at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:833)

from mysql-binlog-connector-java.

Naros avatar Naros commented on September 13, 2024

Hi @aadant, can you describe the steps on how to reproduce it reliably?

from mysql-binlog-connector-java.

aadant avatar aadant commented on September 13, 2024

It happens in the context of debezium embedded
(Altinity sink connector). The problem occurs several times a week. There is an open issue there. It is a random issue though.

from mysql-binlog-connector-java.

comrada avatar comrada commented on September 13, 2024

Hi, bringing this topic up again because we have encountered it in our company.

Given: AWS Aurora 3, AWS MSK, Confluent Kafka Connect 7.6.1, Debezium 2.6.1.
openjdk version "11.0.22" 2024-01-16 LTS
OpenJDK Runtime Environment Zulu11.70+15-CA (build 11.0.22+7-LTS)
OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode)

Problems started after MySQL upgrade from version 5.7 to 8 i.e. Aurora 2 -> 3. In the logs we see the message
Keepalive: Trying to restore lost connection to xxxx-xxxx-xxx.rds.amazonaws.com:3306
and silence.
First of all I tried setting the use.nongraceful.disconnect flag, but that had no effect. Then I started digging deeper. I modified your connector code a bit, added logs to it after each step and found two things:

  1. For some reason, setting the use.nongraceful.disconnect flag in the Debezium connector does not cause the useNonGracefulDisconnect field to be set to the appropriate value. I verified this by adding the output of the field values to the log here. So I just hardcoded true wherever that flag is used.
  2. The hang occurs when attempting a half-close here, and the workaround implemented here added setting SO_LINGER after the half-close attempt but before closing. I'm not an expert in TCP, but I assumed that this option had something to do with the half-close, so I moved this block up to the beginning of the close() method and it worked, the connection immediately started to close and the connector re-established the connection.

About reliably reproducing the bug. Based on what I described above, namely that the hang happens on half-close, I'm not sure how it can be reproduced. In our case, Aurora DB sometimes has such extreme loads that it just dies for a few minutes, at which point Debezium stops getting heartbeats in response and tries to restart BinaryLogClient. In the case of a read socket close, an SSLException occurs:

javax.net.ssl.SSLException: closing inbound before receiving peer's close_notify
	at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:848)
	at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:826)
	at com.github.shyiko.mysql.binlog.network.protocol.PacketChannel.close(PacketChannel.java:141)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.disconnectChannel(BinaryLogClient.java:1386)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect(BinaryLogClient.java:1361)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect(BinaryLogClient.java:1356)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.disconnect(BinaryLogClient.java:1327)
	at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1132)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1106)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:655)
	at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:958)
	at java.base/java.lang.Thread.run(Thread.java:829)

it is ignored and proceeds to a write socket close, where the endless waiting takes place, because in our case the server is in a coma and not responding. How to reproduce this, I don't know.
I hope my experiment will give someone useful insights.

P.S. Recommendation from me: please add more logs, at least with DEBUG level.

from mysql-binlog-connector-java.

jpechane avatar jpechane commented on September 13, 2024

@comrada Debezium 2.6.1 contains 0.29.1 of binlog client. The issue should be fixed in 0.29.2. Could you please try to use it in yor deployment if it helps with the problem?

from mysql-binlog-connector-java.

Vittalgc avatar Vittalgc commented on September 13, 2024

Hi @comrada We're also on Aurora v3 and running into the same deadlock issue, although I'm not sure if the disconnect happens because of aurora dying sometimes like you mentioned in your case. For this change you mention though, closing the socket without linger is generally discouraged from what I've read online. So I guess this should be directed towards the people that implemented this initially to address this issue.

But I wanted to ask if this can lead to any data loss? Has this fix been working fine for you ?

from mysql-binlog-connector-java.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.