
Comments (7)

jparisu commented on June 12, 2024

Hi @guni9191 ,
Glad to know you are working with DDS Router.
As for your problem, without further information we cannot give you an answer or a solution.
In order to help you, we would need more information regarding your scenario:

Error case

Does the error occur only when you try to close the DDS Router application, or does the application stop on its own, after which you are unable to stop it?
There is an echo participant that would be helpful to debug if the router is frozen, or if it is only incapable of stopping: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/participants/echo.html

DDS network

Please, let us know the data types and rates that you are using, and also the QoS of your topics.
Some restrictive QoS with huge data loads may slow down the application drastically.

Network architecture

Are you working locally, over WAN, or on the same host? What is your bandwidth?

All the information that you are able to give us will help us to solve your problem.

from dds-router.

guni9191 commented on June 12, 2024

The error is occurring only when trying to close the DDS Router application, or the application just stops and then you are not able to stop it?
There is an echo participant that would be helpful to debug if the router is frozen, or if it is only incapable of stopping: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/participants/echo.html

=> The application just stops and then I am not able to stop it. I tried the echo participant you pointed me to, and it stops showing information too. When ^C is pressed, only "Stopping DDS Router" shows up.

data types and rates that you are using

=> I am using custom ROS 2 msg types and 25 topics (nine at 2 Hz, five at 5 Hz, nine at 1 Hz, two at 0.1 Hz). I'm not sure about the data length, but Wireshark shows 7052 frames/sec, with a single frame containing 1304 bytes. Most of the QoS settings are the ROS 2 defaults, except for one topic that uses a liveliness QoS. This is an unusually large amount of data, since my local RTPS frames are only 300 bytes on average and there are far fewer of them (only about 200, compared to 7052 frames). Also, in Wireshark I see single frames that contain multiple duplicated messages (TCP payload). Is this normal?
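
As a rough sanity check on the figures above, the capture numbers can be converted into throughput. The sketch below assumes the local figure of about 200 frames is also per second (the comment does not say so explicitly) and counts whole frames, headers included. `throughput_mbps` is a name made up for this illustration.

```python
def throughput_mbps(frames_per_sec: float, bytes_per_frame: float) -> float:
    """Convert a frame rate and a frame size into megabits per second."""
    return frames_per_sec * bytes_per_frame * 8 / 1_000_000


# Figures reported from the Wireshark captures.
wan = throughput_mbps(7052, 1304)   # capture on the WAN (TCP) side
local = throughput_mbps(200, 300)   # local RTPS capture, assumed to be per second

print(f"WAN capture:   {wan:.2f} Mbit/s")    # 73.57 Mbit/s
print(f"Local capture: {local:.2f} Mbit/s")  # 0.48 Mbit/s
print(f"Ratio:         {wan / local:.0f}x")  # 153x
```

If these assumptions hold, the WAN side carries roughly 74 Mbit/s, which is already close to the 100 Mbps guessed for the link.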

Are you working in local, WAN, in the same host? What is your bandwidth?

=> I'm not sure how to check the bandwidth, but I'm guessing it's at least 100 Mbps. There seems to be some kind of firewall on my Wi-Fi, but I'm not sure about my environment. As I said earlier, I'm using an Azure cloud server, so it's WAN. I was measuring the round-trip time using system timestamps, and when it stops, the RTT reaches almost 20 seconds.


guni9191 commented on June 12, 2024

Also, my config for the TCP client is as below:

version: v3.0 # 0

allowlist:
  - name: rt/*
    type: A_msgs*
  - name: rt/*
    type: B_msgs*
  - name: rt/*
    type: C_msgs*
  ...

participants:

  - name: SimpleParticipant # 3
    kind: local # 4
    domain: 0 # 5

  - name: WanParticipant # 6
    kind: wan # 7
    connection-addresses: # 8
      - ip: azure_cloud_server_public_ip
        port: my_port
        transport: tcp


jparisu commented on June 12, 2024

@guni9191, thank you for the detailed information.
So far, we do not know what could be causing this issue. We will try to extend our test battery.

If you can help us further, it would be important to know whether the freeze is caused by CPU and/or memory usage. An htop analysis would be useful.
Also, if the application stops due to a deadlock, it would be useful to get the back-trace (using gdb, for instance) to know whether it is a transport issue or something related to the DDS Router application.
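
One way to capture such a back-trace, sketched here under assumptions: attach gdb to the frozen process in batch mode and dump the stacks of all threads. The helper name `gdb_backtrace_cmd` and the PID below are hypothetical, and attaching to a running process may require elevated privileges depending on the system's ptrace settings.

```python
import shlex
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a gdb command that attaches to `pid`, prints every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


# Hypothetical PID of the frozen router process (look it up with htop or pgrep).
cmd = gdb_backtrace_cmd(12345)

# Print the command in copy-pasteable shell form.
print(shlex.join(cmd))  # gdb -p 12345 -batch -ex 'thread apply all bt'
```

The same argument list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to save the output to a file and attach it to the issue.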

Finally, I guess the large size of your frames is related to TCP. Would you be able to run it with UDP?


guni9191 commented on June 12, 2024

@jparisu

  1. The htop analysis didn't show any difference in CPU or RAM usage.
  2. The problem might be related to network bandwidth and latency.
    -> It seems that when I use the faster 1100 Mbps Wi-Fi instead of the 433 Mbps Wi-Fi, the stopping behavior does not occur. Also, since not many frames were generated when using 1100 Mbps, I'm guessing that TCP packets on a low-bandwidth link are more likely to be split and reassembled, generating many more unnecessary frames and heavy traffic.
  3. I don't think it is a UDP/TCP matter, although using UDP made things twice as fast. I am not able to test the application with UDP since I cannot modify the router port forwarding in my test environment. Testing a simple pub/sub at my house showed an RTT that was twice as fast, though.

Can you test fastdds router in a heavy-traffic, low-bandwidth environment? As far as I know, DDS should work robustly in such a difficult situation, and above all, the application should not stop. Thanks in advance for your response.


guni9191 commented on June 12, 2024

@jparisu
I think I found the reason. As I expected, it was a bandwidth problem.

Let me explain how I found out.

  • Let's say there are two PCs, A and B, and B has a public address.
  • Both of them are running fastdds router.
  • "PC A" runs a ROS 2 node "node A" that publishes "topic A" at a total bandwidth of 4 Mbps.
  • "PC B" runs a ROS 2 node "node B" that subscribes to "topic A" and then publishes "topic B" with the same size.
  • "PC A" runs another ROS 2 node "node C" that subscribes to "topic B".

To limit the bandwidth intentionally, I used the "wondershaper" tool and limited "PC A" to a download speed of 6 Mbps and an upload speed of 2 Mbps.

Then "node C" on "PC A" received some of the messages from "topic B", and eventually it stopped receiving any messages. When I tried to stop the fastdds router on "PC A" in this state, I got the message "Stopping DDS Router" but it did not stop gracefully. If I closed "node A", the router stopped correctly, but closing "node C" did not allow the router to stop gracefully.

Can you guess why "node C" gradually stopped receiving messages and why the router also got stuck on ^C? If my environment has such limited bandwidth, is there another way to avoid this behavior?


jparisu commented on June 12, 2024

Hi @guni9191

I think we know what could happen in your scenario. We see two problems here:

Bandwidth

In a scenario with limited bandwidth, the DDS Router may receive messages faster than it can route them. This slows down the whole application, up to the point where some messages have to be discarded because of memory constraints. Check the following documentation: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#maximum-history-depth
In this case, there is little that either you or we can do to improve this. Try to limit the number of topics that are forwarded in order to reduce the traffic: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#id1 .
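
The effect described above can be illustrated with a toy model. This is only a sketch, not the DDS Router's actual implementation: it assumes a bounded history that drops the oldest sample when full, and it borrows the 4 Mbps publish rate and the 2 Mbps upload limit from the test earlier in this thread, expressed as messages per tick. The function and parameter names are invented for the illustration.

```python
from collections import deque


def simulate(arrivals_per_tick, sends_per_tick, history_depth, ticks):
    """Return (delivered, dropped, still_queued) for a bounded forwarding queue."""
    queue = deque()
    delivered = 0
    dropped = 0
    seq = 0
    for _ in range(ticks):
        # Messages arrive from the local side.
        for _ in range(arrivals_per_tick):
            if len(queue) == history_depth:
                queue.popleft()  # assumed policy: discard the oldest sample
                dropped += 1
            queue.append(seq)
            seq += 1
        # The limited uplink can only forward a few of them per tick.
        for _ in range(min(sends_per_tick, len(queue))):
            queue.popleft()
            delivered += 1
    return delivered, dropped, len(queue)


# 4 messages arrive per tick, but the uplink carries only 2 per tick.
delivered, dropped, queued = simulate(4, 2, history_depth=10, ticks=100)
print(f"delivered={delivered} dropped={dropped} queued={queued}")
```

In this model the queue fills within a few ticks and then stays near its limit, so close to half of the samples are discarded no matter how large the history depth is. A larger depth only delays the first drop and increases latency.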

DDS Router closure

We think we found a bug in the DDS Router thread management that prevents the application from closing until all messages have been forwarded. Thus, if messages arrive faster than they are delivered, this behavior could occur. (We are not sure about this, but it could be the case.)

New DDS Router update

It is not related to this issue, but we have substantially updated the DDS Router so that the core logic is moved to a different repository (https://github.com/eProsima/DDS-Pipe).
This issue should be fixed in this new version. The release is not ready yet, but the Router can be used the same way as before by adding the new dependency and compiling again.
If you want to try it out, it would help us a lot.

Comment

Are you using different domains or a Discovery Server in order to force the different nodes to communicate through the router?
I suppose you are, because otherwise you would be experiencing a loop between the routers that would replicate all your messages indefinitely. Just in case, check this: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#participant-configuration
