Comments (11)

maqi commented on June 12, 2024

Further investigation indicates this issue only happens when a newly added vault node tries to send RPC 1103 to a group that it is itself part of.
There are situations where a newly added vault node sends RPC 1200 to a group containing itself, and all 4 responses are correctly received.
There are also situations where an original vault node (created during Setup) sends RPC 1103 to a group containing itself, and all 4 responses are correctly received.

Does this ring any bells that might help get the problem solved?

from maidsafe.

dirvine commented on June 12, 2024

Ah, in this case the vault will not receive its own request, I think. The 4 returned nodes will be the 4 closest, excluding itself?

chandraprakash commented on June 12, 2024

I think in this case the group ID is a non-existing ID, so the sender should also receive the group message. We only restrict the case where the group ID matches an existing node ID. If the node is the group leader, the message is sent to the next closest node and then returned to this node for replication.
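
To make the rule above concrete, here is a minimal sketch of group selection. It assumes node IDs are integers compared by XOR distance and that a group is the 4 closest nodes; `closest_nodes` and its parameters are hypothetical names for illustration, not the Routing library's API.

```python
def closest_nodes(group_id, node_ids, count=4, exclude_exact_match=True):
    """Return the `count` node IDs closest to `group_id` by XOR distance.

    Assumption: when `group_id` equals an existing node's ID, that node is
    left out of the group, which mirrors the restriction described above.
    """
    candidates = [
        n for n in node_ids
        if not (exclude_exact_match and n == group_id)
    ]
    return sorted(candidates, key=lambda n: n ^ group_id)[:count]


nodes = [0b0001, 0b0010, 0b0100, 0b0111, 0b1000, 0b1100]

# Group ID that matches no node: a sender such as 0b0001 can end up in the group.
group_for_free_id = closest_nodes(0b0011, nodes)

# Group ID equal to an existing node ID: that node is excluded.
group_for_node_id = closest_nodes(0b0010, nodes)
```

In the first call the sender can legitimately be one of the 4 members, which is the situation that matters for this issue.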

maqi commented on June 12, 2024

Further investigation of the log messages reveals the scenario that causes this problem:
A vault node S wants to send an RPC to group T, so node S picks a random node R and sends the request ->
The random node R relays the request to another node GL ->
Node GL realises it is among the closest nodes of group T, so it replicates the request to the other 3 members (GM1, GM2 and S (yes, S is IN the group!)) ->
Node GL then sends its response to node R, which relays the response to S ->
GM1 and GM2 send their responses to node R, which relays them to S ->
Node S receives the responses from GL, GM1 and GM2 via node R, but it never receives a response from itself!
(According to the log, S received a request from R (replicated from GL), but never sent out a response.) ->
After a while the timer in S times out and reports that only 3 responses were received.
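
The flow above can be reproduced with a toy model. This is only a sketch under stated assumptions: the `Node` class, its `ready` flag and `send_to_group` are hypothetical, and relaying through R is collapsed into a direct call. It is not the Routing implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ready: bool = True  # False models a node that cannot answer requests yet

    def handle_request(self, request_id):
        """Return a (name, request_id) response, or None if the request is dropped."""
        if not self.ready:
            return None
        return (self.name, request_id)


def send_to_group(group, request_id):
    """Replicate a request to every group member and collect the responses."""
    responses = []
    for member in group:
        reply = member.handle_request(request_id)
        if reply is not None:
            responses.append(reply)
    return responses


# S is both the sender and a member of the group, but is not ready to respond.
s = Node("S", ready=False)
group = [Node("GL"), Node("GM1"), Node("GM2"), s]
responses = send_to_group(group, 952949398)
expected = 4
timed_out = len(responses) < expected
```

With S unable to answer its own replicated request, only 3 of the 4 expected responses arrive, so the sender's timer is what ends the task.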

The detailed log messages follow:
S - [0E755B3..15EE8C0]
T - [3479713..5A3DDB7]
R - [D3A6FC9..71F9D9A]
GL - [00BF68C..EFF47B1]
GM1 - [5A9049A..182A5D4]
GM2 - [5BF23D1..7618C3E]

V 2848942912 15:30:36.226484 pd/common/rpc_handler.cc:58] Send - 0E755B3..15EE8C0 - RPC 1103 to group around 3479713..5A3DDB7
I 2848942912 15:30:36.226702 routing/timer.cc:69] AddTask added a task, with id 952949398
...
I 2597161792 15:30:36.234240 routing/routing_private.cc:493] This node [D3A6FC9..71F9D9A] received message type: kNodeLevel Request from 0E755B3..15EE8C0 -- RELAY REQUEST id: 952949398
V 2597161792 15:30:36.234585 routing/network_utils.cc:352] Rudp recursive send message to 00BF68C..EFF47B1
I 2613947200 15:30:36.257132 routing/network_utils.cc:322] Type kNodeLevel Request message successfully sent from D3A6FC9..71F9D9A to 00BF68C..EFF47B1 with destination ID 3479713..5A3DDB7 id: 952949398
...
I 2244668224 15:30:36.244563 routing/routing_private.cc:493] This node [00BF68C..EFF47B1] received message type: kNodeLevel Request from D3A6FC9..71F9D9A id: 952949398
V 2244668224 15:30:36.244812 routing/message_handler.cc:141] This node is in closest proximity to this message destination ID [ 3479713..5A3DDB7 ]. id: 952949398
I 2244668224 15:30:36.245064 routing/message_handler.cc:205] Group members for group_id 3479713..5A3DDB7 are: [00BF68C..EFF47B1][0E755B3..15EE8C0][5A9049A..182A5D4][5BF23D1..7618C3E]
I 2244668224 15:30:36.245161 routing/message_handler.cc:209] Replicating message to : 0E755B3..15EE8C0 [ group_id : 3479713..5A3DDB7] id: 952949398
V 2244668224 15:30:36.245266 routing/network_utils.cc:263] >>>>>>>>> rudp send message to connection id 0E755B3..15EE8C0
I 2244668224 15:30:36.245414 routing/message_handler.cc:209] Replicating message to : 5A9049A..182A5D4 [ group_id : 3479713..5A3DDB7] id: 952949398
V 2244668224 15:30:36.259362 routing/network_utils.cc:263] >>>>>>>>> rudp send message to connection id 5A9049A..182A5D4
I 2244668224 15:30:36.259844 routing/message_handler.cc:209] Replicating message to : 5BF23D1..7618C3E [ group_id : 3479713..5A3DDB7] id: 952949398
V 2244668224 15:30:36.260958 routing/network_utils.cc:263] >>>>>>>>> rudp send message to connection id 5BF23D1..7618C3E
I 2244668224 15:30:36.261551 routing/message_handler.cc:78] Node Level Request for 00BF68C..EFF47B1 from D3A6FC9..71F9D9A id: 952949398
I 2261453632 15:30:36.310258 routing/network_utils.cc:322] Type kNodeLevel Response message successfully sent from 00BF68C..EFF47B1 to D3A6FC9..71F9D9A with destination ID D3A6FC9..71F9D9A id: 952949398
...
I 2916084544 15:30:36.250018 routing/routing_private.cc:493] This node [0E755B3..15EE8C0] received message type: kNodeLevel Request from D3A6FC9..71F9D9A id: 952949398
I 2916084544 15:30:36.259917 routing/message_handler.cc:78] Node Level Request for 0E755B3..15EE8C0 from D3A6FC9..71F9D9A id: 952949398
I 2127170368 15:30:36.269221 routing/routing_private.cc:493] This node [5A9049A..182A5D4] received message type: kNodeLevel Request from D3A6FC9..71F9D9A id: 952949398
I 2127170368 15:30:36.269838 routing/message_handler.cc:78] Node Level Request for 5A9049A..182A5D4 from D3A6FC9..71F9D9A id: 952949398
I 2840550208 15:30:36.281443 routing/routing_private.cc:493] This node [5BF23D1..7618C3E] received message type: kNodeLevel Request from D3A6FC9..71F9D9A id: 952949398
I 2840550208 15:30:36.312671 routing/message_handler.cc:78] Node Level Request for 5BF23D1..7618C3E from D3A6FC9..71F9D9A id: 952949398
...
I 2597161792 15:30:36.276925 routing/routing_private.cc:493] This node [D3A6FC9..71F9D9A] received message type: kNodeLevel Response from 00BF68C..EFF47B1 id: 952949398
V 2597161792 15:30:36.277659 routing/message_handler.cc:438] Relaying response to 0E755B3..15EE8C0 id: 952949398
I 2916084544 15:30:36.294223 routing/routing_private.cc:493] This node [0E755B3..15EE8C0] received message type: kNodeLevel Response from 00BF68C..EFF47B1 id: 952949398
I 2916084544 15:30:36.311591 routing/message_handler.cc:123] Node Level Response for 0E755B3..15EE8C0 from 00BF68C..EFF47B1 id: 952949398
I 2916084544 15:30:36.311688 routing/timer.cc:155] Received 1 response(s). Waiting for 3 responses for task 952949398
...
I 2143955776 15:30:36.320318 routing/network_utils.cc:322] Type kNodeLevel Response message successfully sent from 5A9049A..182A5D4 to D3A6FC9..71F9D9A with destination ID D3A6FC9..71F9D9A id: 952949398
I 2605554496 15:30:36.316501 routing/routing_private.cc:493] This node [D3A6FC9..71F9D9A] received message type: kNodeLevel Response from 5A9049A..182A5D4 id: 952949398
V 2605554496 15:30:36.342588 routing/message_handler.cc:438] Relaying response to 0E755B3..15EE8C0 id: 952949398
I 2916084544 15:30:36.416228 routing/routing_private.cc:493] This node [0E755B3..15EE8C0] received message type: kNodeLevel Response from 5A9049A..182A5D4 id: 952949398
I 2916084544 15:30:36.426297 routing/message_handler.cc:123] Node Level Response for 0E755B3..15EE8C0 from 5A9049A..182A5D4 id: 952949398
I 2916084544 15:30:36.426447 routing/timer.cc:155] Received 2 response(s). Waiting for 2 responses for task 952949398
...
I 2882513728 15:30:36.391796 routing/network_utils.cc:322] Type kNodeLevel Response message successfully sent from 5BF23D1..7618C3E to D3A6FC9..71F9D9A with destination ID D3A6FC9..71F9D9A id: 952949398
I 2597161792 15:30:36.353944 routing/routing_private.cc:493] This node [D3A6FC9..71F9D9A] received message type: kNodeLevel Response from 5BF23D1..7618C3E id: 952949398
V 2597161792 15:30:36.354092 routing/message_handler.cc:438] Relaying response to 0E755B3..15EE8C0 id: 952949398
I 2848942912 15:30:36.467763 routing/routing_private.cc:493] This node [0E755B3..15EE8C0] received message type: kNodeLevel Response from 5BF23D1..7618C3E id: 952949398
I 2848942912 15:30:36.468002 routing/message_handler.cc:123] Node Level Response for 0E755B3..15EE8C0 from 5BF23D1..7618C3E id: 952949398
I 2848942912 15:30:36.468102 routing/timer.cc:155] Received 3 response(s). Waiting for 1 responses for task 952949398
... ... ... ...
E 2848942912 15:30:56.226880 routing/timer.cc:96] Timed out waiting for task 952949398
V 2993683264 15:30:56.227663 pd/common/rpc_handler.cc:93] CheckResponse - 0E755B3..15EE8C0 - Response for RPC 1103 from group around 3479713..5A3DDB7 with 3 messages

muecs commented on June 12, 2024

Great work Qi!

maqi commented on June 12, 2024

It turned out that the on_message_received_ of pd::RoutingMessageHandler is not yet initialised when that request is received during the start-up procedure. (The node should not receive a request at that point, as it has not fully joined yet.)

Possible solutions:
1, at the pd level, respond with a dummy message
2, at the Routing level, avoid sending out that request (a node that has not fully joined should not be counted as a member of a group)
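
As an illustration of the first option, here is a minimal sketch of a handler that replies with a dummy message while its callback is still unset. The class, the `DUMMY_RESPONSE` constant and the method names are hypothetical and only loosely mirror `on_message_received_`; the real pd::RoutingMessageHandler is not shown in this thread.

```python
from typing import Callable, Optional

DUMMY_RESPONSE = ""  # assumption: an empty payload stands in for the dummy message


class MessageHandler:
    def __init__(self) -> None:
        # Stays None until start-up has finished and a callback is registered.
        self.on_message_received: Optional[Callable[[str], str]] = None

    def handle(self, request: str) -> str:
        """Always produce a response so the sender's response count can complete."""
        if self.on_message_received is None:
            return DUMMY_RESPONSE
        return self.on_message_received(request)


handler = MessageHandler()
early_reply = handler.handle("RPC 1103")   # callback not registered yet

handler.on_message_received = lambda request: "handled " + request
late_reply = handler.handle("RPC 1103")    # callback registered
```

The trade-off is that the sender then has to tell a dummy reply apart from a real one, which is one reason the second option may be preferable.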

maqi commented on June 12, 2024

Commit
maidsafe-archive/MaidSafe-Routing@4ed97ec
solved this problem by avoiding relaying a request to the original sender if it is already inside the group
(instead, a node outside the group is picked and the request is passed on to it).
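
One possible reading of that description, sketched below, is that the original sender is skipped when choosing recipients and the next closest node takes its slot. This is an assumption made for illustration, not code taken from the commit; `pick_recipients` is a hypothetical name.

```python
def pick_recipients(closest_first, original_sender, group_size=4):
    """Choose `group_size` recipients from IDs ordered closest-first to the group ID.

    Assumption: the original sender is skipped, so the next node in the
    ordering (one that was outside the group) receives the request instead.
    """
    recipients = []
    for node in closest_first:
        if node == original_sender:
            continue
        recipients.append(node)
        if len(recipients) == group_size:
            break
    return recipients


ordered = ["GL", "S", "GM1", "GM2", "X", "Y"]
with_sender_in_group = pick_recipients(ordered, original_sender="S")
with_sender_outside = pick_recipients(ordered, original_sender="R")
```

Under this reading the sender still gets 4 responses, because a node that can actually answer replaces it in the group.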

chandraprakash commented on June 12, 2024

We are changing the behaviour of the Send API to deal with time-out delays.
While a node is joining the network, it will get only 3 responses for group messages. Once it reaches a health of 8 nodes, it will start firing the functor on 4 responses. This accommodates the case above, where a request arrives at a node that has no nodes in its routing table.
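
A minimal sketch of that threshold follows. It assumes "a health of 8 nodes" means the routing table holds at least 8 entries; `expected_responses`, `GroupTask` and their parameters are hypothetical names, not the Send API itself.

```python
def expected_responses(routing_table_size, healthy_size=8, full=4, joining=3):
    """Number of group responses to wait for before firing the functor."""
    return full if routing_table_size >= healthy_size else joining


class GroupTask:
    """Collects responses and fires the functor once enough have arrived."""

    def __init__(self, functor, routing_table_size):
        self.functor = functor
        self.needed = expected_responses(routing_table_size)
        self.responses = []
        self.fired = False

    def add_response(self, response):
        if self.fired:
            return  # late responses are ignored in this sketch
        self.responses.append(response)
        if len(self.responses) >= self.needed:
            self.fired = True
            self.functor(list(self.responses))


results = []
joining_task = GroupTask(results.append, routing_table_size=0)
for reply in ["GL", "GM1", "GM2"]:
    joining_task.add_response(reply)
```

A joining node's task completes on the third response instead of waiting for the time-out, while a node with a full routing table still waits for all 4.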

Reopening issue until this change is in place.

chandraprakash commented on June 12, 2024

maidsafe-archive/MaidSafe-Routing@268439d

dirvine commented on June 12, 2024

I think this is now complete?

chandraprakash commented on June 12, 2024

The routing changes are still not in place, so re-opening.
