Comments (6)
from drtmh.
Hi,
Thanks for the reply.
Based on the output, it seems the cluster has two devices per machine, and port 1 of the second device is inactive (at least that is my understanding). I am afraid I do not have physical access to the cluster to confirm this, but I can double-check with someone who does. Is there a way to bypass this issue, i.e., to use only one device and one port per machine?
Thank you.
Running ibstatus on each machine returns the following:
Infiniband device 'mlx5_0' port 1 status:
default gid: XXX
base lid: 0x6
sm lid: 0x4
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: InfiniBand

Infiniband device 'mlx5_1' port 1 status:
default gid: XXX
base lid: 0xffff
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 10 Gb/sec (4X SDR)
link_layer: InfiniBand
Hi,
Thank you for your feedback.
Just to make sure I understand: if the name mlx5_X in the output I sent earlier indicates the port number, then port 0 (mlx5_0) is the one that is active, correct?
If that is the case, as I mentioned earlier (first comment), I have set use_port_ to 0 in RWorker::choose_rnic_port(), as suggested in #2. Is this what you are suggesting? I tried this change and rebuilt the project, but I get the same output.
Please let me know if I have misunderstood something.
Thank you.
Hi,
Thanks for the feedback.
I tried the following and it seems to have worked.
In init_rdma() in src/core/rworker.cc, I set idx to a fixed value (dev_id = 0, port_id = 1) instead of using cm_->convert_port_idx(). More specifically:
RdmaCtrl::DevIdx idx = RdmaCtrl::DevIdx{.dev_id = 0, .port_id = 1};
Thank you for your help! :)
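For anyone who would rather not hard-code the index: a minimal sketch, assuming you can capture the ibstatus text on each machine, of finding the first port that reports ACTIVE. The ActivePort struct and first_active_port() function are hypothetical helpers written for this illustration; they are not part of drtmh or of any RDMA library, and the parsing relies only on the output layout pasted earlier in this thread.

```cpp
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper type for this sketch (not a drtmh or libibverbs type).
struct ActivePort {
    std::string device;  // device name as printed by ibstatus, e.g. "mlx5_0"
    int port;            // port number as printed by ibstatus
};

// Scans ibstatus-style text and returns the first device/port whose
// "state:" line contains ACTIVE, or std::nullopt if there is none.
std::optional<ActivePort> first_active_port(const std::string& ibstatus_text) {
    const std::string header = "Infiniband device '";
    const std::string port_tag = "port ";
    std::istringstream in(ibstatus_text);
    std::string line;
    std::optional<ActivePort> current;

    while (std::getline(in, line)) {
        // Header line: Infiniband device '<name>' port <n> status:
        const auto h = line.find(header);
        if (h != std::string::npos) {
            current.reset();
            const auto name_begin = h + header.size();
            const auto name_end = line.find('\'', name_begin);
            if (name_end == std::string::npos) continue;
            const auto port_pos = line.find(port_tag, name_end);
            if (port_pos == std::string::npos) continue;
            const auto digit_pos = port_pos + port_tag.size();
            if (digit_pos >= line.size() ||
                !std::isdigit(static_cast<unsigned char>(line[digit_pos]))) {
                continue;
            }
            current = ActivePort{line.substr(name_begin, name_end - name_begin),
                                 std::stoi(line.substr(digit_pos))};
            continue;
        }
        // Logical state line; skip "phys state:", which also contains "state:".
        if (current &&
            line.find("state:") != std::string::npos &&
            line.find("phys state:") == std::string::npos &&
            line.find("ACTIVE") != std::string::npos) {
            return current;
        }
    }
    return std::nullopt;
}
```

With the output from this cluster, the function would report device mlx5_0, port 1, which lines up with the dev_id = 0, port_id = 1 values that worked above. Mapping a device name to a dev_id is left out on purpose, since that ordering depends on how the library enumerates devices.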