Comments (7)
Hi @guni9191 ,
Glad to know you are working with DDS Router.
About your problem: without further information we are not able to give you an answer or a solution.
In order to help you, we would need more information regarding your scenario:
Error case
Is the error occurring only when trying to close the DDS Router application, or does the application just stop and then you are not able to stop it?
There is an echo participant that would be helpful to debug whether the router is frozen or only incapable of stopping: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/participants/echo.html
DDS network
Please, let us know the data types and rates that you are using, and also the QoS of your topics.
Some restrictive QoS with huge data loads may slow down the application drastically.
Network architecture
Are you working locally, over WAN, or on the same host? What is your bandwidth?
All the information that you are able to give us will help us to solve your problem.
from dds-router.
Is the error occurring only when trying to close the DDS Router application, or does the application just stop and then you are not able to stop it?
There is an echo participant that would be helpful to debug if the router is frozen, or if it is only incapable of stopping: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/participants/echo.html
=> The application just stops, and then I am not able to stop it. I tried the echo participant you pointed me to, and it stops showing information too. When ^C is pressed, only "Stopping DDS Router" shows up.
data types and rates that you are using
=> I am using custom ROS 2 msg types on 25 topics (nine at 2 Hz, five at 5 Hz, nine at 1 Hz, two at 0.1 Hz). I'm not sure about the data length, but Wireshark shows 7052 frames/sec, with a single frame containing 1304 bytes. Most of the QoS settings are the ROS 2 defaults, except one topic that uses liveliness QoS. This is an unusually large amount of data, since my local RTPS frames are only 300 bytes on average and there are far fewer of them (only about 200, compared to 7052). Also, in Wireshark I see single frames that contain multiple duplicated messages (TCP payload). Is this normal?
Are you working locally, over WAN, or on the same host? What is your bandwidth?
=> I'm not sure how to check the bandwidth, but I'm guessing it's at least 100 Mbps. There seems to be some kind of firewall on my Wi-Fi, but I'm not sure about my environment. As I said earlier, I'm using an Azure cloud server, so it's WAN. I was measuring the round-trip time using system timestamps, and when it stops, the RTT reaches almost 20 seconds.
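To put the Wireshark figures quoted above on the same scale as the guessed 100 Mbps link, they can be converted into throughput. This is only a rough sketch: it assumes the "about 200" local frames are also per second, and it ignores any difference between captured frame size and payload size.

```python
# Figures reported in the thread (Wireshark capture and topic list).
WAN_FRAMES_PER_SEC = 7052
WAN_FRAME_BYTES = 1304
LOCAL_FRAMES_PER_SEC = 200   # assumption: "about 200" read as frames per second
LOCAL_FRAME_BYTES = 300

# (number of topics, publish rate in Hz)
TOPIC_RATES = [(9, 2.0), (5, 5.0), (9, 1.0), (2, 0.1)]


def throughput_mbps(frames_per_sec: float, frame_bytes: float) -> float:
    """Convert a frame rate and frame size into megabits per second."""
    return frames_per_sec * frame_bytes * 8 / 1e6


wan_mbps = throughput_mbps(WAN_FRAMES_PER_SEC, WAN_FRAME_BYTES)
local_mbps = throughput_mbps(LOCAL_FRAMES_PER_SEC, LOCAL_FRAME_BYTES)
messages_per_sec = sum(count * rate for count, rate in TOPIC_RATES)
topic_count = sum(count for count, _ in TOPIC_RATES)

print(f"WAN side:   {wan_mbps:.1f} Mbps")
print(f"Local side: {local_mbps:.2f} Mbps")
print(f"{topic_count} topics, {messages_per_sec:.1f} messages/s in total")
```

Under these assumptions the WAN side carries about 73.6 Mbps against 0.48 Mbps locally, roughly 150 times more, for only about 52 messages per second across the 25 topics.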
Also, my config for the TCP client is as below:
version: v3.0
allowlist:
  - name: rt/*
    type: A_msgs*
  - name: rt/*
    type: B_msgs*
  - name: rt/*
    type: C_msgs*
  ...
participants:
  - name: SimpleParticipant
    kind: local
    domain: 0
  - name: WanParticipant
    kind: wan
    connection-addresses:
      - ip: azure_cloud_server_public_ip
        port: my_port
        transport: tcp
      - ip: azure_cloud_server_public_ip
@guni9191, thank you for the detailed information.
So far, we do not know what could be producing this issue. We will try to extend our test battery.
If you could help us further, it would be important to know whether the freeze is caused by CPU and/or memory usage. An htop analysis would be interesting.
Also, if the application stops due to a deadlock, it would be interesting to get the backtrace (using gdb, for instance) to know whether it is a transport issue or something related to the DDS Router application.
Finally, I guess the large size of your frames is related to TCP. Would you be able to run it with UDP?
- The htop analysis didn't show any difference in CPU or RAM usage.
- The problem might be related to network bandwidth and latency.
-> It seems that when I use the faster 1100 Mbps Wi-Fi instead of the 433 Mbps Wi-Fi, the stopping behavior does not occur. Also, since far fewer frames were generated when using 1100 Mbps, I'm guessing that TCP packets on a low-bandwidth link are more likely to be disassembled and reassembled, generating many more unnecessary frames and heavy traffic.
- I think it is not a UDP/TCP matter, although using UDP made things twice as fast. I am not able to test the application over UDP since I cannot modify the router port forwarding in my test environment. Testing a simple pub/sub at my house showed an RTT twice as fast, though.
Can you test fastdds router in a heavy-traffic, low-bandwidth environment? As far as I know, DDS should work robustly in such a difficult situation, and above all, the application should not stop. Thanks in advance for your response.
@jparisu
I think I found the reason. As I expected, it was a bandwidth problem.
Let me explain how I found out.
- Let's say there are PC A and PC B, and PC B has a public address.
- Both of them are running fastdds router.
- "PC A" runs a ROS 2 node "node A" that publishes "topic A" at a total bandwidth of 4 Mbps.
- "PC B" runs a ROS 2 node "node B" that subscribes to "topic A" and then publishes "topic B" of the same size.
- "PC A" runs another ROS 2 node "node C" that subscribes to "topic B".
To limit the bandwidth intentionally, I used the "wondershaper" tool and limited "PC A" to 6 Mbps download and 2 Mbps upload.
Then "node C" on "PC A" received some of the messages from "topic B", and eventually it stopped receiving any messages. When I tried to stop the fastdds router on "PC A" in this state, I got the message "Stopping DDS Router" but it did not stop gracefully. If I closed "node A", the router stopped correctly, but closing "node C" did not make the router stop gracefully.
Can you guess why "node C" gradually stopped receiving the topic and why the router also got stuck on ^C? If my environment has such limited bandwidth, is there another way to avoid this behavior?
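The numbers in this test already suggest why the link saturates: "topic A" is offered at 4 Mbps while the upload limit on "PC A" is 2 Mbps. A minimal sketch of the resulting backlog follows. It assumes nothing is dropped and ignores protocol overhead and retransmissions, so the real figures would likely be worse; the function name is hypothetical.

```python
def backlog_megabits(offered_mbps: float, capacity_mbps: float, seconds: float) -> float:
    """Data queued on the sender after `seconds`, if nothing is dropped."""
    surplus = max(offered_mbps - capacity_mbps, 0.0)
    return surplus * seconds


TOPIC_A_MBPS = 4.0   # publish rate of "topic A" reported in the thread
UPLOAD_MBPS = 2.0    # wondershaper upload limit on "PC A"

for seconds in (1, 10, 60):
    queued = backlog_megabits(TOPIC_A_MBPS, UPLOAD_MBPS, seconds)
    print(f"after {seconds:>2} s: {queued:.0f} Mb queued ({queued / 8:.2f} MB)")
```

With a 2 Mbps surplus the unsent data grows without bound, reaching 120 Mb (15 MB) after one minute, so something between the publisher and the WAN link has to buffer or discard it.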
Hi @guni9191
I think we know what could happen in your scenario. We see two problems here:
Bandwidth
In a scenario with limited bandwidth, it could happen that the DDS Router receives messages faster than it can route them. This will slow down the whole application, reaching a point where some messages have to be discarded because of memory limits. Check the following documentation: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#maximum-history-depth
In this case, there is little that either you or we can do to improve this. Try to limit the number of topics that are forwarded to reduce the traffic: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#id1 .
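The effect of a bounded history when messages arrive faster than they can be sent can be illustrated with a toy model. This is a sketch of a generic keep-last queue, not the DDS Router implementation; the function name, tick-based timing, and parameters are all assumptions made for illustration.

```python
from collections import deque


def simulate(arrivals_per_tick: int, sends_per_tick: int, depth: int, ticks: int):
    """Return (delivered, dropped) for a keep-last history of size `depth`."""
    history = deque(maxlen=depth)
    delivered = dropped = 0
    next_sample = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if len(history) == depth:
                dropped += 1  # the oldest sample is about to be overwritten
            history.append(next_sample)
            next_sample += 1
        for _ in range(min(sends_per_tick, len(history))):
            history.popleft()
            delivered += 1
    return delivered, dropped


delivered, dropped = simulate(arrivals_per_tick=4, sends_per_tick=2, depth=10, ticks=100)
print(f"delivered={delivered} dropped={dropped}")
```

With arrivals at twice the send rate, the history fills after a few ticks and 192 of the 400 samples are dropped. Lowering the arrival rate to the send rate or below (for example by forwarding fewer topics) brings the drop count to zero in this model.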
DDS Router closure
We think we found a bug in the DDS Router thread management that prevents the application from closing until all messages have been forwarded. Thus, if messages arrive faster than they are delivered, this behavior could happen. (We are not sure about this, but it could be the case.)
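As a general illustration of the pattern described here (not the DDS Router's actual thread management, which is not shown in this thread), a forwarding thread that checks a stop flag between messages can shut down promptly even with a large backlog. The names `forward`, `pending`, and `stop` are hypothetical, and the sleep is a stand-in for a slow network send.

```python
import queue
import threading
import time


def forward(pending: queue.Queue, stop: threading.Event, forwarded: list) -> None:
    """Forward queued messages until `stop` is set, even if messages remain."""
    while not stop.is_set():
        try:
            message = pending.get(timeout=0.01)
        except queue.Empty:
            continue
        time.sleep(0.001)  # stand-in for a slow network send
        forwarded.append(message)


pending = queue.Queue()
for i in range(10_000):
    pending.put(i)

stop = threading.Event()
forwarded = []
worker = threading.Thread(target=forward, args=(pending, stop, forwarded))
worker.start()

time.sleep(0.05)          # let the worker forward a few messages
stop.set()                # request shutdown, as Ctrl+C would
worker.join(timeout=1.0)

print("worker stopped:", not worker.is_alive())
print("messages left unsent:", pending.qsize())
```

A loop written as "run until the queue is empty" would instead keep the process alive for as long as the backlog lasts, which matches the symptom of "Stopping DDS Router" being printed without the application exiting.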
New DDS Router update
It is not related to this issue, but we have made a major update to the DDS Router, so the core logic has been moved to a different repository (https://github.com/eProsima/DDS-Pipe).
This issue should be fixed in this new version. Its release is not ready yet, but the Router can be used as before by adding the new dependency and compiling again.
If you want to try it out, it would help us a lot.
Comment
Are you using different domains or Discovery-Server in order to force different nodes to communicate through the router?
I suppose you are, since if you weren't you would be experiencing a loop between the routers that would replicate all your messages indefinitely. Just in case, check this: https://eprosima-dds-router.readthedocs.io/en/latest/rst/user_manual/configuration.html#participant-configuration