Comments (6)

AddyLaddy commented on August 20, 2024

In the NVIDIA-provided containers, the tests built with MPI=1 are usually named:

/opt/nccl-tests/build/all_reduce_perf_mpi

Did you build /opt/nccl-tests/build/all_reduce_perf yourself using MPI=1?

from nccl-tests.

FrankLeeeee commented on August 20, 2024

I encountered the same problem on H100 as well. I built nccl-tests with OpenMPI and did not use Docker.

jianh619 commented on August 20, 2024

In the NVIDIA-provided containers, the tests built with MPI=1 are usually named:

/opt/nccl-tests/build/all_reduce_perf_mpi

Did you build /opt/nccl-tests/build/all_reduce_perf yourself using MPI=1?

Yes, I built the image myself and compiled nccl-tests with MPI=1.

BTW, is there an official container provided by NVIDIA? Could you let me know where I can find the download link?

FrankLeeeee commented on August 20, 2024

I solved this issue by switching to OpenMPI 4.1. I originally built nccl-tests with OpenMPI 5.0, but the test ran separately on each node. After switching to OpenMPI 4.1 and rebuilding, it works as expected.

kiskra-nvidia commented on August 20, 2024

Yes, I built the image myself and compiled nccl-tests with MPI=1.

It sure looks like, for whatever reason, either your MPI compilation or your MPI installation does not work as expected.

Does a simple MPI "hello world" type program work correctly (you know, one that would report the rank and size of MPI_COMM_WORLD from each launched process)?
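As a rough aid for reading the output of such a hello-world run, here is a minimal Python sketch. It assumes each process prints a line of the form "rank <r> of <n>"; that format and the function name `diagnose` are hypothetical, not something from nccl-tests or any MPI distribution. A working launch should show ranks 0..n-1 all reporting the same size n, whereas processes that never joined a common job typically each report rank 0 of 1.

```python
import re
from typing import Iterable

# Hypothetical output format: each process prints "rank <r> of <n>".
_LINE = re.compile(r"rank\s+(\d+)\s+of\s+(\d+)")


def diagnose(lines: Iterable[str]) -> str:
    """Classify hello-world output as 'ok', 'singletons', or 'inconsistent'."""
    pairs = [
        (int(m.group(1)), int(m.group(2)))
        for m in (_LINE.search(line) for line in lines)
        if m
    ]
    if not pairs:
        return "inconsistent"

    ranks = sorted(rank for rank, _ in pairs)
    sizes = {size for _, size in pairs}

    # Healthy launch: one shared size equal to the process count, ranks 0..n-1.
    if sizes == {len(pairs)} and ranks == list(range(len(pairs))):
        return "ok"

    # Several processes that each see themselves as rank 0 of 1:
    # they started, but never formed a single MPI job.
    if len(pairs) > 1 and all(pair == (0, 1) for pair in pairs):
        return "singletons"

    return "inconsistent"
```

A "singletons" result would be consistent with the symptom in this thread (each node running on its own), and would point at the MPI build or launcher rather than at NCCL.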

Can you verify if your all_reduce_perf actually uses MPI? Say, check with ldd if it links with the MPI library:

ldd all_reduce_perf | grep mpi

Or check with nm if it has any MPI symbols:

nm all_reduce_perf | grep MPI
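If you want to script the ldd check, for example across several binaries or nodes, something like the following Python sketch may help. The function names are made up for illustration, and matching on a `libmpi` prefix is an assumption that is narrower than `grep mpi`. A statically linked MPI would not show up in ldd output at all, which is one reason to also try the nm check.

```python
import subprocess


def links_mpi(ldd_output: str) -> bool:
    """Return True if any library line in ldd output names an MPI library.

    Assumption: the library file name starts with "libmpi"
    (e.g. libmpi.so.40, libmpich.so.12).
    """
    for line in ldd_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # ldd lines look like: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        library_name = stripped.split(" ", 1)[0]
        if library_name.startswith("libmpi"):
            return True
    return False


def binary_links_mpi(path: str) -> bool:
    """Run ldd on the given binary and check its output for an MPI library."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return links_mpi(result.stdout)
```

Usage would be along the lines of `binary_links_mpi("./all_reduce_perf")`. A `False` result suggests the binary was built without MPI=1, or that the MPI library was linked statically.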

BTW, is there an official container provided by NVIDIA? Could you let me know where I can find the download link?

Docker container nvidia/cuda:12.2.2-devel-ubuntu22.04 contains NCCL 2.19.3. A number of containers in Nvidia's NGC catalog (https://catalog.ngc.nvidia.com/) contain NCCL as well. I believe TensorRT does (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt), and PyTorch (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). Also https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks, https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nvhpc...

jianh619 commented on August 20, 2024

Thanks, everyone. It must have been a compilation problem: I rebuilt the image and it works now.
