google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
License: Other
Environment
Command
mpirun --allow-run-as-root -np 8 \
--hostfile centos8-hostfile \
--mca orte_base_help_aggregate 0 \
--mca btl tcp,vader,self \
--mca plm_rsh_args "-p 8022" \
--mca btl_tcp_if_include eth0 \
-bind-to none -oversubscribe \
--map-by slot \
-x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH \
-x NCCL_SOCKET_IFNAME=eth0 \
-x NCCL_IB_DISABLE=1 \
nccl-tests/build/all_reduce_perf -b 8 -e 1024M -f 5 -g 1 -o all -n 500 -w 10
Performance with FastSocket plugin enabled
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 5(factor) warmup iters: 10 iters: 500 validation: 1
#
# Using devices
# Rank 0 Pid 418 on ml-gpu-ser423 device 0 [0x02] Tesla P40
# Rank 1 Pid 419 on ml-gpu-ser423 device 1 [0x03] Tesla P40
# Rank 2 Pid 420 on ml-gpu-ser423 device 2 [0x83] Tesla P40
# Rank 3 Pid 421 on ml-gpu-ser423 device 3 [0x84] Tesla P40
# Rank 4 Pid 488 on ml-gpu-ser604 device 0 [0x02] Tesla P40
# Rank 5 Pid 489 on ml-gpu-ser604 device 1 [0x03] Tesla P40
# Rank 6 Pid 490 on ml-gpu-ser604 device 2 [0x83] Tesla P40
# Rank 7 Pid 491 on ml-gpu-ser604 device 3 [0x84] Tesla P40
# out-of-place in-place
# size count type redop time algbw busbw error time algbw busbw error
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
8 2 float avg 77.44 0.00 0.00 9e-10 77.03 0.00 0.00 9e-10
40 10 float avg 77.25 0.00 0.00 9e-10 77.08 0.00 0.00 9e-10
200 50 float avg 78.30 0.00 0.00 9e-10 78.19 0.00 0.00 9e-10
1000 250 float avg 87.68 0.01 0.02 3e-08 87.59 0.01 0.02 3e-08
5000 1250 float avg 108.3 0.05 0.08 3e-08 108.4 0.05 0.08 3e-08
25000 6250 float avg 261.9 0.10 0.17 3e-08 271.6 0.09 0.16 3e-08
125000 31250 float avg 411.9 0.30 0.53 3e-08 420.6 0.30 0.52 3e-08
625000 156250 float avg 999.8 0.63 1.09 3e-08 977.8 0.64 1.12 3e-08
3125000 781250 float avg 4749.9 0.66 1.15 3e-08 4835.7 0.65 1.13 3e-08
15625000 3906250 float avg 15131 1.03 1.81 3e-08 15210 1.03 1.80 3e-08
78125000 19531250 float avg 71686 1.09 1.91 3e-08 71619 1.09 1.91 3e-08
390625000 97656250 float avg 336844 1.16 2.03 3e-08 337039 1.16 2.03 3e-08
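For readers unfamiliar with the columns: as far as I know from the nccl-tests performance notes, algbw is simply size divided by time, and for all-reduce busbw is algbw scaled by 2(n-1)/n, where n is the number of ranks. The sketch below recomputes both for the three largest rows of the table above, assuming n = 8 (ranks 0 to 7 in the header) and 1 GB = 1e9 bytes. The function names are mine, not part of nccl-tests.

```python
# Sketch: recompute algbw and busbw for three rows of the table above.
# Assumptions: 8 ranks (the "Using devices" header lists ranks 0-7),
# 1 GB = 1e9 bytes, and the all-reduce factor 2 * (n - 1) / n.

def algbw_gbs(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s: bytes moved divided by elapsed time."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_busbw_gbs(algbw: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce: algbw * 2 * (n - 1) / n."""
    return algbw * 2 * (n_ranks - 1) / n_ranks


# (size in bytes, out-of-place time in us, reported algbw, reported busbw)
ROWS = [
    (15625000, 15131.0, 1.03, 1.81),
    (78125000, 71686.0, 1.09, 1.91),
    (390625000, 336844.0, 1.16, 2.03),
]
N_RANKS = 8


def recompute(rows, n_ranks):
    """Return (size, computed algbw, computed busbw) for each row."""
    result = []
    for size, time_us, _reported_algbw, _reported_busbw in rows:
        algbw = algbw_gbs(size, time_us)
        result.append((size, algbw, allreduce_busbw_gbs(algbw, n_ranks)))
    return result


if __name__ == "__main__":
    for (size, algbw, busbw), row in zip(recompute(ROWS, N_RANKS), ROWS):
        print(f"{size:>10d} B  algbw {algbw:.2f} (reported {row[2]:.2f})  "
              f"busbw {busbw:.2f} (reported {row[3]:.2f})")
```

The recomputed values should agree with the reported columns to within rounding, which is a quick sanity check that the table is being read correctly.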
Performance with FastSocket plugin disabled
# nThread 1 nGpus 1 minBytes 8 maxBytes 1073741824 step: 5(factor) warmup iters: 10 iters: 500 validation: 1
#
# Using devices
# Rank 0 Pid 418 on ml-gpu-ser423 device 0 [0x02] Tesla P40
# Rank 1 Pid 419 on ml-gpu-ser423 device 1 [0x03] Tesla P40
# Rank 2 Pid 420 on ml-gpu-ser423 device 2 [0x83] Tesla P40
# Rank 3 Pid 421 on ml-gpu-ser423 device 3 [0x84] Tesla P40
# Rank 4 Pid 488 on ml-gpu-ser604 device 0 [0x02] Tesla P40
# Rank 5 Pid 489 on ml-gpu-ser604 device 1 [0x03] Tesla P40
# Rank 6 Pid 490 on ml-gpu-ser604 device 2 [0x83] Tesla P40
# Rank 7 Pid 491 on ml-gpu-ser604 device 3 [0x84] Tesla P40
# out-of-place in-place
# size count type redop time algbw busbw error time algbw busbw error
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
8 2 float avg 149.6 0.00 0.00 9e-10 138.9 0.00 0.00 9e-10
40 10 float avg 138.0 0.00 0.00 9e-10 102.3 0.00 0.00 9e-10
200 50 float avg 96.76 0.00 0.00 9e-10 96.16 0.00 0.00 9e-10
1000 250 float avg 82.18 0.01 0.02 3e-08 82.30 0.01 0.02 3e-08
5000 1250 float avg 103.8 0.05 0.08 3e-08 102.4 0.05 0.09 3e-08
25000 6250 float avg 225.3 0.11 0.19 3e-08 225.4 0.11 0.19 3e-08
125000 31250 float avg 346.6 0.36 0.63 3e-08 345.5 0.36 0.63 3e-08
625000 156250 float avg 961.4 0.65 1.14 3e-08 968.0 0.65 1.13 3e-08
3125000 781250 float avg 4677.0 0.67 1.17 3e-08 4684.6 0.67 1.17 3e-08
15625000 3906250 float avg 13943 1.12 1.96 3e-08 13941 1.12 1.96 3e-08
78125000 19531250 float avg 68384 1.14 2.00 3e-08 68389 1.14 2.00 3e-08
390625000 97656250 float avg 333850 1.17 2.05 3e-08 333890 1.17 2.05 3e-08
Does anyone have any suggestions? Am I running the right performance tests?
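To make the two runs easier to compare, here is a small sketch that puts the out-of-place times from both tables side by side and prints the enabled/disabled ratio per message size. The numbers are copied from the tables above; the variable and function names are only illustrative. A ratio below 1 means the run with the plugin enabled was faster.

```python
# Sketch: compare out-of-place all-reduce times (us) from the two runs above.
# The numbers are copied from the posted tables; names are illustrative only.

SIZES = [8, 40, 200, 1000, 5000, 25000, 125000, 625000,
         3125000, 15625000, 78125000, 390625000]

TIME_ENABLED_US = [77.44, 77.25, 78.30, 87.68, 108.3, 261.9, 411.9, 999.8,
                   4749.9, 15131.0, 71686.0, 336844.0]

TIME_DISABLED_US = [149.6, 138.0, 96.76, 82.18, 103.8, 225.3, 346.6, 961.4,
                    4677.0, 13943.0, 68384.0, 333850.0]


def time_ratios(enabled, disabled):
    """Return enabled/disabled time ratios; below 1.0 means enabled is faster."""
    if len(enabled) != len(disabled):
        raise ValueError("both runs must cover the same message sizes")
    return [e / d for e, d in zip(enabled, disabled)]


if __name__ == "__main__":
    ratios = time_ratios(TIME_ENABLED_US, TIME_DISABLED_US)
    for size, ratio in zip(SIZES, ratios):
        verdict = "enabled faster" if ratio < 1.0 else "disabled faster"
        print(f"{size:>10d} B  ratio {ratio:.3f}  ({verdict})")
```

In these two runs the plugin-enabled times are lower only for the three smallest sizes, and at the largest size the two times differ by roughly 1%, so a single pair of runs like this may not be enough to separate the plugin's effect from run-to-run noise.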
Dear developers, could you please help with the following errors? Thank you!
$ git clone https://github.com/google/nccl-fastsocket.git
$ cd nccl-fastsocket
$ bazel build :all
WARNING: Output base '/home/user/.cache/bazel/_bazel_user/1340a46a9e7502c5cf03e1a0a087e4f3' is on NFS. This may lead to surprising failures and undetermined behavior.
Starting local Bazel server and connecting to it...
ERROR: Traceback (most recent call last):
File "/home/user/nccl/ext-net/google-fastsocket/nccl-fastsocket/BUILD", line 104, column 8, in <toplevel>
pkg_tar(
File "/home/user/.cache/bazel/_bazel_user/1340a46a9e7502c5cf03e1a0a087e4f3/external/rules_pkg/pkg/private/tar/tar.bzl", line 318, column 38, in pkg_tar
private_stamp_detect = select({
Error in select: select: got Label for dict key, want a label string
ERROR: error loading package '': Package '' contains errors
INFO: Elapsed time: 3.352s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded)
With a working installation of CUDA:
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
The build doesn't get very far:
bazel build :all
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: SHA256 (https://github.com/bazelbuild/rules_pkg/archive/main.zip) = 4c9d7c26c8f1969f6518e5d7d52e947668107eb537c73c724b5a4b2f61646a08
DEBUG: Rule 'rules_pkg' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "4c9d7c26c8f1969f6518e5d7d52e947668107eb537c73c724b5a4b2f61646a08"
DEBUG: Repository rules_pkg instantiated at:
/home/e/nccl-fastsocket/WORKSPACE.bazel:25:13: in <toplevel>
Repository rule http_archive defined at:
/home/e/.cache/bazel/_bazel_e/f1e3dc12e04fc3258fc15fe372504351/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in <toplevel>
ERROR: error loading package '': cannot load '@rules_pkg//toolchains:rpmbuild_configure.bzl': no such file
INFO: Elapsed time: 4.119s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
I am trying out the Fast Socket NCCL plugin on GCP (specifically a GCE SLURM cluster built from 2x(8xA100) nodes with gVNICs). I see these warnings in the logs, specifically "NCCL WARN Cannot get incoming CPU." and "NCCL WARN Maximum retry reached for accept 3." Do they mean something specific, or can they be safely ignored?
The code runs despite the warnings, although performance with and without the plugin looks very similar.
full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.
full-debug2-test-0:4300:4325 [0] net_fastsocket.cc:785 NCCL WARN Maximum retry reached for accept 3.
full-debug2-test-1:4024:4055 [0] net_fastsocket.cc:674 NCCL WARN Maximum retry reached for connect 3.
full-debug2-test-0:4300:4325 [0] NCCL INFO accept qid: 3, rqid: 3
full-debug2-test-0:4300:4325 [0] NCCL INFO accept incoming cpu: 0
full-debug2-test-0:4300:4325 [0] NCCL INFO NET/FastSocket : Connected after 1000 retries.
full-debug2-test-0:4300:4325 [0] NCCL INFO NET/FastSocket : Accepted data socket 3
full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU.
full-debug2-test-1:4024:4055 [0] NCCL INFO connect incoming cpu: 0
full-debug2-test-1:4024:4055 [0] NCCL INFO connect qid: 3, rqid: 3
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Connected after 1000 retries.
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Connected data socket 3
full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.
full-debug2-test-1:4024:4055 [0] NCCL INFO NET/FastSocket : Async connect done
full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU.
full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.
full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU
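When comparing runs with and without the plugin, it can help to summarize a log like the one above: count the lines mentioning NET/FastSocket (which suggests the plugin was actually in use for that run) and tally the distinct NCCL WARN messages. The sketch below does this on a few lines copied from the log; the function name and the sample constant are hypothetical, and this is only a text filter, not an NCCL API.

```python
# Sketch: summarize an NCCL_DEBUG log. Counts lines that mention the
# NET/FastSocket transport and tallies distinct "NCCL WARN" messages.
# SAMPLE_LOG holds a few lines copied from the log above.
import re
from collections import Counter

SAMPLE_LOG = """\
full-debug2-test-1:4024:4048 [0] net_fastsocket.cc:765 NCCL WARN Cannot get incoming CPU.
full-debug2-test-0:4300:4325 [0] net_fastsocket.cc:785 NCCL WARN Maximum retry reached for accept 3.
full-debug2-test-0:4300:4325 [0] NCCL INFO NET/FastSocket : Connected after 1000 retries.
full-debug2-test-0:4300:4348 [0] net_fastsocket.cc:652 NCCL WARN Cannot get incoming CPU
"""

WARN_RE = re.compile(r"NCCL WARN (.*)")


def summarize_log(log_text: str):
    """Return (number of NET/FastSocket lines, Counter of warning messages)."""
    plugin_lines = 0
    warnings = Counter()
    for line in log_text.splitlines():
        if "NET/FastSocket" in line:
            plugin_lines += 1
        match = WARN_RE.search(line)
        if match:
            # Drop the trailing period so identical warnings group together.
            warnings[match.group(1).rstrip(". ")] += 1
    return plugin_lines, warnings


if __name__ == "__main__":
    plugin_lines, warnings = summarize_log(SAMPLE_LOG)
    print(f"lines mentioning NET/FastSocket: {plugin_lines}")
    for message, count in warnings.most_common():
        print(f"{count:>4d}  {message}")
```

Running the same summary on the log from the run without the plugin gives a quick check that the two runs really used different transports.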
Collecting package metadata (current_repodata.json): done
Solving environment: done
Cloning into 'nccl-fastsocket'...
remote: Enumerating objects: 86, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 86 (delta 49), reused 61 (delta 24), pack-reused 0
Receiving objects: 100% (86/86), 39.02 KiB | 539.00 KiB/s, done.
Resolving deltas: 100% (49/49), done.
Starting local Bazel server and connecting to it...
INFO: SHA256 (https://github.com/bazelbuild/rules_pkg/archive/main.zip) = a73b8dd453c788f2fc994b1714664c1a0d295b05144daa84d8b2d08603f5ac32
DEBUG: Rule 'rules_pkg' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "a73b8dd453c788f2fc994b1714664c1a0d295b05144daa84d8b2d08603f5ac32"
DEBUG: Repository rules_pkg instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
/root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in
ERROR: /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/private/pkg_files.bzl:588:16: name 'json' is not defined
ERROR: /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/private/pkg_files.bzl:590:16: name 'json' is not defined
INFO: Repository rules_license instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
/root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in
ERROR: Skipping ':all': while parsing ':all': error loading package '': in /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/tar.bzl: in /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/private/tar/tar.bzl: Extension 'pkg/private/pkg_files.bzl' has errors
WARNING: Target pattern parsing failed.
ERROR: while parsing ':all': error loading package '': in /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/tar.bzl: in /root/.cache/bazel/_bazel_root/70a63a26ed5cd2b68e457225637f1b0c/external/rules_pkg/pkg/private/tar/tar.bzl: Extension 'pkg/private/pkg_files.bzl' has errors
INFO: Elapsed time: 4.993s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded)
As the README states:
NCCL Fast Socket is based on TCP/IP communication and uses a number of techniques to achieve better and more consistent performance, especially with 100 Gbps networking on Google Cloud