Comments (2)
The reported time is the time of the NCCL group call, i.e.
ncclGroupStart();
ncclRecv(...); // from prev rank
ncclSend(...); // to next rank
ncclGroupEnd();
It is not a ping-pong test; it is more like a single ring in which each rank receives from the previous rank and sends to the next rank.
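To make the ring pattern concrete, here is a minimal sketch of how each rank's two peers could be derived. The helper name `ring_peers_for` and the struct `ring_peers` are hypothetical and not taken from nccl-tests. The modular formulas assume that ranks are numbered 0..nranks-1 and that the ring wraps around from the last rank back to rank 0.

```c
#include <assert.h>

/* Hypothetical helper types and names, for illustration only. */
typedef struct {
    int recv_peer; /* rank this rank receives from (previous in the ring) */
    int send_peer; /* rank this rank sends to (next in the ring) */
} ring_peers;

/* Assumption: ranks are 0..nranks-1 and the ring wraps around. */
ring_peers ring_peers_for(int rank, int nranks) {
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    ring_peers p;
    /* Adding nranks before the modulo keeps the result non-negative for rank 0. */
    p.recv_peer = (rank - 1 + nranks) % nranks;
    p.send_peer = (rank + 1) % nranks;
    return p;
}

/*
 * Inside the timed region, the peers would then be used roughly as:
 *
 *   ncclGroupStart();
 *   ncclRecv(...);  // peer = p.recv_peer
 *   ncclSend(...);  // peer = p.send_peer
 *   ncclGroupEnd();
 */
```

With 4 ranks, rank 0 would receive from rank 3 and send to rank 1, so every rank issues one send and one receive in the same group call, and the reported time covers both.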
from nccl-tests.
@sjeaugey Thank you. Assuming the alpha-beta cost model, would you agree that the following accurately describes the total time per rank?
That is, for GPU 0,