
Comments (2)

sjeaugey commented on July 20, 2024

The reported time is the time of the NCCL group call, i.e.

ncclGroupStart();
ncclRecv(...); // from prev rank
ncclSend(...); // to next rank
ncclGroupEnd();

It is not a ping-pong test; it is more like a single ring in which each rank receives from its previous rank and sends to its next rank.
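To make the ring pattern concrete, here is a minimal sketch of how each rank's peers could be computed. The helper name `ring_neighbors` and the modular wraparound are assumptions for illustration, not code taken from nccl-tests.

```python
def ring_neighbors(rank: int, world_size: int) -> tuple[int, int]:
    """Return (prev_rank, next_rank) for a ring with wraparound.

    Hypothetical helper: assumes ranks are numbered 0..world_size-1
    and that the last rank sends to rank 0.
    """
    prev_rank = (rank - 1) % world_size  # peer this rank receives from
    next_rank = (rank + 1) % world_size  # peer this rank sends to
    return prev_rank, next_rank


if __name__ == "__main__":
    # Show the peers of every rank in a 4-rank ring.
    for r in range(4):
        print(r, ring_neighbors(r, 4))
```

With only two ranks, the previous and next peer are the same rank, so the pattern looks like a simultaneous exchange rather than a ping-pong.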

from nccl-tests.

osayamenja commented on July 20, 2024

@sjeaugey Thank you. Assuming the alpha-beta cost model, would you agree that the following accurately describes the total time per rank $t_r$ and the ideal reported time $t_R$?

$$t_r = \alpha_{r-1, r} + n \cdot \beta_{r-1, r} = \frac{RTT}{2}$$

$$t_R = \max_{r \in W} t_r$$

Here $\alpha_{ij}$ and $\beta_{ij}$ denote the latency and the inverse bandwidth (time per unit of data) of sending from $i$ to $j$, $n$ is the data size, and $W$ is the process world.

That is, for GPU 0 in a two-GPU world, $t_0 = \alpha_{1, 0} + n \cdot\beta_{1, 0}$: GPU 0 and GPU 1 send simultaneously through the ring, so GPU 0 has to wait until it has received from GPU 1, and vice versa.
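The model proposed above can be evaluated numerically with a short sketch. It only encodes the formulas as written in this question; whether they match what nccl-tests actually reports is not confirmed here. The function names and the latency and per-byte values are hypothetical, and `beta` is treated as time per byte.

```python
def per_rank_time(alpha, beta, n, rank):
    """t_r = alpha[r-1][r] + n * beta[r-1][r], with ring wraparound.

    alpha[i][j]: latency in seconds for sending from rank i to rank j.
    beta[i][j]:  time per byte in seconds for sending from i to j.
    n:           message size in bytes.
    """
    world_size = len(alpha)
    prev_rank = (rank - 1) % world_size  # rank r receives from r-1
    return alpha[prev_rank][rank] + n * beta[prev_rank][rank]


def reported_time(alpha, beta, n):
    """t_R = max over all ranks of t_r."""
    return max(per_rank_time(alpha, beta, n, r) for r in range(len(alpha)))


if __name__ == "__main__":
    # Hypothetical two-GPU example with asymmetric links.
    alpha = [[0.0, 5e-6], [7e-6, 0.0]]      # seconds
    beta = [[0.0, 1e-10], [2e-10, 0.0]]     # seconds per byte
    n = 1_000_000                           # bytes
    for r in range(2):
        print(f"t_{r} = {per_rank_time(alpha, beta, n, r):.3e} s")
    print(f"t_R = {reported_time(alpha, beta, n):.3e} s")
```

In this example the slower link (from GPU 1 to GPU 0) determines $t_R$, which is the point of taking the maximum over ranks.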

