Comments (6)

jackalcooper avatar jackalcooper commented on July 28, 2024
checking for unistd.h... (cached) yes
checking rdma/fabric.h usability... no
checking rdma/fabric.h presence... no
checking for rdma/fabric.h... no
configure: error: unable to find required headers

This happens when building from a git clone: configure checks for the RDMA header files and fails.

from aws-ofi-nccl.

rashikakheria avatar rashikakheria commented on July 28, 2024

For the first reported issue (based on release v0.9), the build complains about a missing nccl_net.h. This header is provided by the NCCL repository. Once you have cloned and built NCCL, you can use the NCCL_HOME option to point at a non-standard installation. For example:

make NCCL_HOME=/home/ec2/nccl/build

For the second one, configure complains about a missing rdma/fabric.h, which is provided by the libfabric repository. You need a working installation of libfabric; if it is installed in a non-standard location, specify its path with the --with-libfabric flag.
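A minimal sketch of that second fix, assuming a hypothetical install prefix of /opt/libfabric and the usual layout where headers live under include/ in the prefix. Neither the prefix nor the helper name has_fabric_header comes from this thread. It only checks that the header configure looks for is present before suggesting the flag:

```shell
#!/bin/sh
# Hypothetical prefix (an assumption); replace with your libfabric location.
LIBFABRIC_PREFIX="${LIBFABRIC_PREFIX:-/opt/libfabric}"

# Hypothetical helper: succeeds if rdma/fabric.h exists under the given prefix.
has_fabric_header() {
    [ -f "$1/include/rdma/fabric.h" ]
}

if has_fabric_header "$LIBFABRIC_PREFIX"; then
    # Header is present, so this prefix is a candidate for the flag.
    echo "found: try ./configure --with-libfabric=$LIBFABRIC_PREFIX"
else
    echo "missing: $LIBFABRIC_PREFIX/include/rdma/fabric.h"
fi
```

If the header is reported missing, libfabric is probably installed under a different prefix, or not installed at all.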

Please let me know if this still doesn't work for you.

jackalcooper avatar jackalcooper commented on July 28, 2024

Thank you @rashikakheria! I will give it another try later. Besides getting it running, I am curious about the performance this plug-in brings to the table. We are trying to get rid of the bottleneck on 100 Gbps AWS ENA caused by the single-core CPU limitation. It would be wonderful if you could post some benchmark results and the performance boost over a naive NCCL setup.

bwbarrett avatar bwbarrett commented on July 28, 2024

@jackalcooper, the primary driver for NCCL over libfabric is to support EFA when it is officially released. EFA will still be 100 Gbps, but it should have lower CPU utilization than ENA and will avoid the 10 Gbps flow limiters that require a high number of rings in NCCL to drive the full 100 Gbps. Of course, we made the ofi-nccl package open source to encourage others in the libfabric community to also provide NCCL support for their networks.

rashikakheria avatar rashikakheria commented on July 28, 2024

Are there any other open questions @jackalcooper? If not, could we close this issue?

jackalcooper avatar jackalcooper commented on July 28, 2024

@rashikakheria Thanks for asking. There are no other issues for now. I have built it and applied for the EFA preview. I will test it again and report back if the application is approved.
