
Comments (6)

wxdwfc commented on June 18, 2024

from drtmh.

psistakis commented on June 18, 2024

Hi,

Thanks for the reply.

Based on the output, it seems the cluster has two devices per machine, and port 1 of the second device (mlx5_1) is inactive (at least that is my understanding). I am afraid I do not have physical access to the cluster to confirm this, but I can double-check with someone who does. Is there a way to bypass this issue, i.e., use only one device and one port per machine?

Thank you.

Running ibstatus on each machine returns the following:

Infiniband device 'mlx5_0' port 1 status:
    default gid: XXX
    base lid: 0x6
    sm lid: 0x4
    state: 4: ACTIVE
    phys state: 5: LinkUp
    rate: 100 Gb/sec (4X EDR)
    link_layer: InfiniBand

Infiniband device 'mlx5_1' port 1 status:
    default gid: XXX
    base lid: 0xffff
    sm lid: 0x0
    state: 1: DOWN
    phys state: 3: Disabled
    rate: 10 Gb/sec (4X SDR)
    link_layer: InfiniBand
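
As a rough illustration of how the listing above can be checked programmatically, here is a minimal C++ sketch that scans ibstatus text and reports the first port whose state contains ACTIVE. The names PortStatus, parse_ibstatus and first_active_index are hypothetical helpers written for this example and are not part of drtmh. The parser assumes the output layout shown in this thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One "Infiniband device '<name>' port <n> status:" section of ibstatus output.
struct PortStatus {
    std::string device;  // e.g. "mlx5_0"
    int port = 0;        // port number as printed by ibstatus
    std::string state;   // e.g. "4: ACTIVE" or "1: DOWN"
};

// Parse ibstatus text into one PortStatus per section (hypothetical helper).
std::vector<PortStatus> parse_ibstatus(const std::string& text) {
    std::vector<PortStatus> ports;
    std::istringstream in(text);
    std::string line;
    const std::string header = "Infiniband device '";
    while (std::getline(in, line)) {
        // Drop leading whitespace so indented and flattened output both parse.
        const std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos) continue;
        line.erase(0, first);

        if (line.compare(0, header.size(), header) == 0) {
            const std::size_t name_end = line.find('\'', header.size());
            if (name_end == std::string::npos) continue;
            const std::size_t port_pos = line.find("port ", name_end);
            if (port_pos == std::string::npos) continue;
            PortStatus p;
            p.device = line.substr(header.size(), name_end - header.size());
            p.port = std::stoi(line.substr(port_pos + 5));
            ports.push_back(p);
        } else if (line.compare(0, 6, "state:") == 0 && !ports.empty()) {
            // "phys state:" lines do not start with "state:", so they are skipped.
            const std::size_t value = line.find_first_not_of(" \t", 6);
            if (value != std::string::npos) ports.back().state = line.substr(value);
        }
    }
    return ports;
}

// Index of the first port whose state mentions ACTIVE, or -1 if none does.
int first_active_index(const std::vector<PortStatus>& ports) {
    for (std::size_t i = 0; i < ports.size(); ++i) {
        if (ports[i].state.find("ACTIVE") != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Fed the two sections above, this would select mlx5_0 port 1, which matches the reading that only the first device is usable.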


wxdwfc commented on June 18, 2024


psistakis commented on June 18, 2024

Hi,

Thank you for your feedback.

Just to make sure I understand: if the X in the name mlx5_X from the output I sent earlier indicates the port number, then port 0 (mlx5_0) is the one that is active, correct?

If that is the case, as I mentioned earlier (first comment), I have set use_port_ to 0 in RWorker::choose_rnic_port(), as suggested in #2. Is this your suggestion? I already tried this change and rebuilt the project, but I get the same output.

Please let me know if I have misunderstood something.

Thank you.


wxdwfc commented on June 18, 2024


psistakis commented on June 18, 2024

Hi,

Thanks for the feedback.

I tried the following, and it seems to have worked.

In init_rdma() in src/core/rworker.cc, I set idx to a fixed value (dev_id = 0, port_id = 1) instead of using cm_->convert_port_idx(). More specifically:

RdmaCtrl::DevIdx idx = RdmaCtrl::DevIdx{.dev_id = 0, .port_id = 1};

Thank you for your help! :)
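
To show why a fixed (dev_id = 0, port_id = 1) pair lines up with the ibstatus listing, here is a small C++ sketch of one way a flat port index could be mapped onto a (device, port) pair. It is not the actual implementation of convert_port_idx(), which this thread does not show. The DevIdx struct below is a hypothetical stand-in that only borrows the two field names from the fix, and flat_to_dev_idx is an invented helper. It assumes devices are indexed from 0 and ports are numbered from 1, as in the ibstatus output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RdmaCtrl::DevIdx; only the field names
// dev_id and port_id come from the thread.
struct DevIdx {
    int dev_id;   // 0-based device index (mlx5_0 -> 0, mlx5_1 -> 1)
    int port_id;  // 1-based port number on that device
};

// Map a flat, 0-based port index onto (device, port), given how many ports
// each device exposes. Returns {-1, -1} when the index is out of range.
DevIdx flat_to_dev_idx(int flat, const std::vector<int>& ports_per_device) {
    if (flat < 0) return DevIdx{-1, -1};
    for (std::size_t dev = 0; dev < ports_per_device.size(); ++dev) {
        if (flat < ports_per_device[dev]) {
            // Ports are numbered from 1, so shift the remaining offset by one.
            return DevIdx{static_cast<int>(dev), flat + 1};
        }
        flat -= ports_per_device[dev];
    }
    return DevIdx{-1, -1};
}
```

With two single-port devices, flat index 0 maps to (dev_id = 0, port_id = 1), the active mlx5_0 port, while flat index 1 maps to (dev_id = 1, port_id = 1), the port that ibstatus reports as DOWN. Under that assumption, hardcoding the first pair avoids ever selecting the disabled device.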

