Comments (6)
checking for unistd.h... (cached) yes
checking rdma/fabric.h usability... no
checking rdma/fabric.h presence... no
checking for rdma/fabric.h... no
configure: error: unable to find required headers
This happens when building from a git clone: configure checks for the rdma header files and fails.
from aws-ofi-nccl.
For the first reported issue (based on release v0.9), the build complains about a missing nccl_net.h. This header is provided by the NCCL repository. Once you have cloned and built NCCL, you can use the NCCL_HOME option to point to a non-standard installation. For example:
make NCCL_HOME=/home/ec2/nccl/build
For the second one, configure complains about a missing rdma/fabric.h, which is provided by the libfabric repository. You need to point the build at a working installation of libfabric; if it is installed in a non-standard location, specify its path with the --with-libfabric flag.
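As a rough sketch of how to confirm the header is where configure will look, the snippet below checks for rdma/fabric.h under a given prefix before running configure. The has_header helper, the LIBFABRIC_PREFIX variable, and the /opt/libfabric default are all hypothetical and only illustrate the idea; it also assumes headers live under PREFIX/include, which is the usual layout but may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PREFIX/include/HEADER exists.
# Usage: has_header PREFIX RELATIVE_HEADER_PATH
has_header() {
    [ -f "$1/include/$2" ]
}

# Assumed install location; replace with your actual libfabric prefix.
LIBFABRIC_PREFIX="${LIBFABRIC_PREFIX:-/opt/libfabric}"

if has_header "$LIBFABRIC_PREFIX" rdma/fabric.h; then
    echo "found rdma/fabric.h; try: ./configure --with-libfabric=$LIBFABRIC_PREFIX"
else
    echo "rdma/fabric.h not found under $LIBFABRIC_PREFIX/include"
fi
```

If the header is reported missing, libfabric is probably not installed under that prefix, and configure would fail with the same "unable to find required headers" error.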
Please let me know if this still doesn't work for you.
Thank you @rashikakheria! I will give it another try later. Besides getting it running, I am curious about the performance this plug-in brings to the table. We are trying to get rid of the bottleneck on the 100 Gbps AWS ENA caused by the single-core CPU limitation. It would be wonderful if you could post some benchmark results and the performance boost over naive NCCL incorporation.
@jackalcooper, the primary driver for NCCL over libfabric is to support EFA when it is officially released. EFA will still be 100 Gbps, but should have lower CPU utilization than using ENA and will avoid the 10 Gbps flow limiters that require the use of a high number of rings in NCCL to drive the full 100 Gbps. Of course, we made the ofi-nccl package Open Source to encourage others in the Libfabric community to also provide NCCL support for their networks.
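To make the arithmetic behind the flow limiters concrete, here is a small sketch that computes the minimum number of flows needed to fill a link when each flow is capped. The min_flows helper is hypothetical, and the mapping of one NCCL ring to one network flow is an assumption made only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: minimum number of capped flows needed to fill a link.
# Assumption: each NCCL ring contributes one network flow between two nodes.
# Usage: min_flows LINK_GBPS PER_FLOW_GBPS
min_flows() {
    link_gbps=$1
    per_flow_gbps=$2
    # Integer ceiling division.
    echo $(( (link_gbps + per_flow_gbps - 1) / per_flow_gbps ))
}

min_flows 100 10   # prints 10
```

Under that assumption, a 100 Gbps link with a 10 Gbps per-flow cap needs at least 10 rings to be driven at full rate.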
Are there any other open questions @jackalcooper? If not, could we close this issue?
@rashikakheria Thanks for asking. There is no other issue for now. I have built it and applied for EFA preview. Will test it again and report more on this if it is approved.