
Comments (9)

hdtjiang avatar hdtjiang commented on August 28, 2024 4

The problem is caused by a parameter in point-to-point/model/qbb-net-device.h, where you will find:
static const uint32_t fCnt = 128; // Max number of flows on a NIC, for TX and RX respectively. TX+RX=fCnt*2.
You can increase this, but there is still a problem: when you finish a flow and start a new one, the problem appears again, because there is no queue recovery mechanism.
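To illustrate what a queue recovery mechanism could look like, here is a minimal sketch of a fixed-size slot table that hands a slot back when a flow finishes. Only the name and value of `fCnt` come from the header quoted above; `FlowSlotTable`, `Acquire`, and `Release` are hypothetical names, and this is not how ns3-rdma is actually structured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Same name and value as the constant quoted above; everything else in this
// block is a hypothetical illustration, not code from ns3-rdma.
static const uint32_t fCnt = 128;

class FlowSlotTable {
public:
    // Returns the lowest free slot index, or -1 when all fCnt slots are in use.
    int Acquire() {
        for (uint32_t i = 0; i < fCnt; ++i) {
            if (!m_inUse[i]) {
                m_inUse[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Marks a slot as free again once its flow has finished.
    void Release(uint32_t slot) {
        assert(slot < fCnt);
        m_inUse[slot] = false;
    }

private:
    std::array<bool, fCnt> m_inUse{};  // value-initialized: all slots start free
};
```

Without a call like `Release`, every flow ever started keeps its slot, so the table fills up even if only a few flows are active at any one time. Checking the return value of `Acquire` also turns an out-of-range access into an error the caller can handle.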

from ns3-rdma.

bobzhuyb avatar bobzhuyb commented on August 28, 2024

I never tried that. Sorry.


wangshuaizs avatar wangshuaizs commented on August 28, 2024

OK, thank you anyway!

I have run into another problem. I ran a simulation in which server nodes 0 - 126 connect to a Broadcom switch, and server node 0 sends 1 packet (payload size = 1000) to each of the other server nodes. The output prints some warnings: "WARNING: Drop because egress Port buffer full, WARNING: Drop because egress Q buffer full, WARNING: Drop because egress SP buffer full". I expected to see retransmissions, but I cannot find any retransmission in mix.tr.

Moreover, when I increase the number of server nodes to 129, so that server node 0 sends 1 packet to each of server nodes 1 - 128, main.exe crashes with an error message like “0x0000010000001000 access violation occurs when the reading position.”

Does that mean I cannot simulate more than 127 flows from one server simultaneously? I have tried digging into your source code, but I found nothing to support this assumption. Could you please give me some suggestions? Thank you!


bobzhuyb avatar bobzhuyb commented on August 28, 2024

The main issue is on the switch node, not on the servers/flows.

I hard-coded a max port number of 64 per switch because this is what we had in practice (64-port switches). You may try to raise this.
https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.h#L59

Once you raise this, the switch buffer may run out easily -- remember PFC requires certain buffer headroom per port to operate, otherwise PFC cannot prevent packet losses. You may need to reconfigure buffer thresholds/capacity in https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.cc
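The headroom point can be made concrete with a rough budget check: if each port reserves a fixed amount of headroom, doubling the port count doubles the reservation, and it may no longer fit in the same buffer. The sketch below assumes a single fixed headroom value per port; the struct, the function names, and all numbers in the usage note are illustrative assumptions, not values taken from broadcom-node.cc.

```cpp
#include <cstdint>

// Illustrative buffer budget; field names and values are assumptions and are
// not read from broadcom-node.h or broadcom-node.cc.
struct BufferBudget {
    uint64_t totalBytes;            // total packet buffer of the switch
    uint64_t headroomPerPortBytes;  // reserved per port to absorb in-flight packets after a PAUSE
    uint32_t portCount;             // number of switch ports
};

// Bytes reserved for PFC headroom across all ports.
uint64_t ReservedHeadroomBytes(const BufferBudget& b) {
    return b.headroomPerPortBytes * b.portCount;
}

// True when the headroom reservation fits and some shared buffer remains.
bool HeadroomFits(const BufferBudget& b) {
    return ReservedHeadroomBytes(b) < b.totalBytes;
}

// Bytes left for the shared pool after the headroom reservation (0 if it does not fit).
uint64_t SharedBytes(const BufferBudget& b) {
    return HeadroomFits(b) ? b.totalBytes - ReservedHeadroomBytes(b) : 0;
}
```

With a made-up 9 MiB buffer and 100 KiB of headroom per port, 64 ports reserve about 6.25 MiB and leave room for a shared pool, while 128 ports would need 12.5 MiB and no longer fit. That is the kind of mismatch that calls for reconfiguring thresholds or capacity after raising the port limit.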

If you want to test 128->1 or even more intensive incast, I recommend that you stick with 64-port switches and use a multi-hop topology. The congestion point will be at the last hop anyway. Then you don't need to worry about the above issues on the switch.


wangshuaizs avatar wangshuaizs commented on August 28, 2024

@bobzhuyb

I tried to create a topology with 2 servers, server 0 and server 1, connected to each other directly. Server 1 established 200 RDMA flows to server 0 at the same time, but Visual Studio reported a memory access violation. Is it a bug?

Thank you!


bobzhuyb avatar bobzhuyb commented on August 28, 2024

I don't remember any hard-coded limitation for the number of flows per server... but I may be wrong. What is the maximum number of flows that does not have this problem? 128? 64?


wangshuaizs avatar wangshuaizs commented on August 28, 2024

@bobzhuyb

In my test, 127 flows are ok, but 128 flows aren't.


bobzhuyb avatar bobzhuyb commented on August 28, 2024

Thanks @hdtjiang for the explanation. This is indeed something that needs to be improved.


wangshuaizs avatar wangshuaizs commented on August 28, 2024

Thanks @hdtjiang for your reply. I think the parameter in network/utils/broadcom-egress-queue.h should also be increased accordingly:

static const unsigned fCnt = 128; //max number of queues, 128 for NICs
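Since the two `fCnt` constants live in different headers and have to be raised together, one way to catch a mismatch at compile time is a `static_assert`. The sketch below uses two stand-in structs, `QbbNetDeviceSketch` and `BEgressQueueSketch`, which are hypothetical names; only the constant name, its two declared types, and the value 128 come from this thread.

```cpp
#include <cstdint>

// Stand-in for the class declared in point-to-point/model/qbb-net-device.h.
struct QbbNetDeviceSketch {
    static const uint32_t fCnt = 128;  // max number of flows on a NIC
};

// Stand-in for the class declared in network/utils/broadcom-egress-queue.h.
struct BEgressQueueSketch {
    static const unsigned fCnt = 128;  // max number of queues
};

// Fails the build if someone raises one limit and forgets the other.
static_assert(QbbNetDeviceSketch::fCnt == BEgressQueueSketch::fCnt,
              "fCnt must match in qbb-net-device.h and broadcom-egress-queue.h");
```

A single shared constant included by both headers would remove the duplication entirely, but the assertion is the smaller change if the headers need to stay independent.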

