
Comments (5)

volgariver6 commented on May 29, 2024

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22PBJ%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-2eeef92-20240509232129%5C%22,%20matrixorigin_io_component%3D%5C%22ProxySet%5C%22%7D%20%21%3D%20%60gc%60%20%21%3D%20%60DEBUG%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22,%22maxLines%22:5000%7D%5D,%22range%22:%7B%22from%22:%221715308557000%22,%22to%22:%221715308560000%22%7D%7D%7D&schemaVersion=1&orgId=1

According to the logs, the proxy received no requests at all at 10:35:58. We suspect a network problem caused this.

from matrixone.

guguducken commented on May 29, 2024

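The outage at 10:35:58 was very short: the service was unreachable for less than a second. It may be more useful to look at what happened between 2024-05-09 23:42:10 and 2024-05-09 23:42:12.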
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22DQF%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-2eeef92-20240509232129%5C%22,%20matrixorigin_io_component%3D%5C%22ProxySet%5C%22%7D%20%7C%3D%20%60new%20connection%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221715269330000%22,%22to%22:%221715269332000%22%7D%7D%7D&schemaVersion=1&orgId=1


volgariver6 commented on May 29, 2024

@guguducken please comment the conclusion and forward it back to @aressu1985


guguducken commented on May 29, 2024

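Root cause: on the client, net.ipv4.tcp_tw_reuse was set to 2, which allows TIME_WAIT sockets to be reused only for loopback connections. Short connections created via 10.222.1.128 therefore could not reuse TIME_WAIT sockets, so when the connection rate spiked, the client ran out of usable local ports and could no longer create new connections.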
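For context, the Linux kernel's ip-sysctl documentation defines three values for net.ipv4.tcp_tw_reuse: 0 disables reuse, 1 enables it globally, and 2 (the documented default) enables it for loopback traffic only. The setting only affects outgoing connections, which may explain why changing it on the server nodes during troubleshooting did not help. The minimal sketch below is not part of the original investigation (the function name describe_tw_reuse is hypothetical); it reports the current value on a Linux host:

```shell
#!/usr/bin/env bash
# Map a net.ipv4.tcp_tw_reuse value to its meaning, following the Linux
# ip-sysctl documentation: 0 = off, 1 = global enable, 2 = loopback only.
describe_tw_reuse() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "enabled for all outgoing connections" ;;
    2) echo "enabled for loopback traffic only" ;;
    *) echo "unknown value: $1" ;;
  esac
}

# On a Linux host, report the live setting read from /proc.
if [ -r /proc/sys/net/ipv4/tcp_tw_reuse ]; then
  value=$(cat /proc/sys/net/ipv4/tcp_tw_reuse)
  echo "net.ipv4.tcp_tw_reuse=$value ($(describe_tw_reuse "$value"))"
fi
```

On the affected client this would have reported the loopback-only setting.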

Reproduction steps:

  1. Start sysbench and begin the short-connection test.
  2. Use the following command to monitor whether the MO service port is reachable:
    while true; do mysql -h 10.222.1.132 -udump -P30015 -p111 -e 'select 1;' > /dev/null;sleep 1; done
  3. After a while, the following error appears:
    ERROR 2003 (HY000): Can't connect to MySQL server on '10.222.1.132:30015' (99)
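To correlate connection failures with client-side TIME_WAIT buildup, the probe loop from step 2 could be extended as in this sketch. It is an illustration, not the script used in the investigation: the function names count_time_wait and probe_mo and the --run flag are hypothetical, and the host, port, and credentials are copied from the command above.

```shell
#!/usr/bin/env bash
# Count sockets in TIME-WAIT state in `ss -tan` output read from stdin.
# The first column of `ss -tan` output is the socket state.
count_time_wait() {
  awk '$1 == "TIME-WAIT" { n++ } END { print n + 0 }'
}

# Probe loop based on the reproduction command. mysql still prints its
# ERROR line to stderr; on each failure we also print a timestamp and the
# client's current TIME-WAIT count.
probe_mo() {
  while true; do
    if ! mysql -h 10.222.1.132 -udump -P30015 -p111 -e 'select 1;' > /dev/null; then
      echo "$(date '+%F %T') connect failed, client TIME-WAIT=$(ss -tan | count_time_wait)"
    fi
    sleep 1
  done
}

# Only start the endless loop when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  probe_mo
fi
```

Run it on the client with --run; a steadily climbing TIME-WAIT count just before each ERROR 2003 would match the pattern described in the troubleshooting steps.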

Troubleshooting process:

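  1. Tried connecting directly to the proxy pod; no problem was found.
  2. Changed the Service type to NodePort; the problem persisted in testing.
  3. Changed kernel parameters on the server-side cluster nodes (132 and 134), including: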
    ulimit -n 65535
    sysctl net.ipv4.tcp_tw_reuse=1
    sysctl net.ipv4.tcp_fin_timeout=5
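  4. Captured packets with tcpdump on nodes 132 and 134 to analyze network traffic and check whether the CNI was at fault.
  5. Restarted nodes 132 and 134.
  6. Reproduced the issue again. ss -s showed that when the problem occurred, the server had few TIME_WAIT sockets, but the number of TIME_WAIT sockets on the client spiked.
  7. Added a control client that ran the connectivity-check script from the reproduction steps against the MO cluster, while continuously monitoring that client's TIME_WAIT count.
  8. Changed the client kernel parameter net.ipv4.tcp_tw_reuse to 1 and tested again, trying to reproduce the issue.
  9. After the kernel parameter change, the error no longer occurred.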
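The client-side fix from step 8 could be applied and persisted roughly as follows. This is a sketch assuming a Linux client with root access whose boot process reads /etc/sysctl.d/*.conf; the helper name persist_tw_reuse, the --apply flag, and the file name 99-tcp-tw-reuse.conf are assumptions, not taken from the thread.

```shell
#!/usr/bin/env bash
# Write a sysctl.d drop-in so tcp_tw_reuse=1 survives reboots.
# The directory is a parameter (default /etc/sysctl.d) so it can be tested
# elsewhere; the file name is a hypothetical choice.
persist_tw_reuse() {
  local conf_dir="${1:-/etc/sysctl.d}"
  local conf_file="$conf_dir/99-tcp-tw-reuse.conf"
  echo "net.ipv4.tcp_tw_reuse = 1" > "$conf_file"
  echo "$conf_file"
}

# Requires root. `sysctl -w` changes the running kernel immediately but is
# lost on reboot; the drop-in file re-applies the setting at boot.
if [ "${1:-}" = "--apply" ]; then
  sysctl -w net.ipv4.tcp_tw_reuse=1
  persist_tw_reuse
fi
```

Running sysctl --system afterwards reloads all drop-in files and is a quick way to confirm the new file is picked up.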


aressu1985 commented on May 29, 2024

fixed
