baidu-research / deepbench
Benchmarking Deep Learning operations on different hardware
License: Apache License 2.0
The DeepBench makefiles assume that certain libraries and header files (e.g. libmpi) are located in the same directory tree. For reasons I have yet to discern, RHEL installs the libraries in one directory structure and the headers in another (in the -devel packages). By applying the patch below and adding "MPI_INCLUDE_PATH=/usr/include/openmpi-x86_64", DeepBench now builds cleanly on RHEL. I've coded the change so that it shouldn't require any changes on systems that keep libraries and header files in the same directory.
diff -ur DeepBench.orig/code/baidu_allreduce/Makefile DeepBench/code/baidu_allreduce/Makefile
--- DeepBench.orig/code/baidu_allreduce/Makefile 2017-12-14 14:53:03.255428367 -0500
+++ DeepBench/code/baidu_allreduce/Makefile 2017-12-12 12:15:19.396655494 -0500
@@ -6,6 +6,7 @@
CUDA_PATH?=/usr/local/cuda
CUDA_LIB64=$(CUDA_PATH)/lib64
MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/local/openmpi
BAIDU_ALLREDUCE_PATH?=/local/baidu-allreduce
BIN_DIR?=bin
MKDIR=mkdir -p
@@ -21,8 +22,8 @@
ring_all_reduce:
clean:
diff -ur DeepBench.orig/code/nvidia/Makefile DeepBench/code/nvidia/Makefile
--- DeepBench.orig/code/nvidia/Makefile 2017-12-14 14:53:03.257428373 -0500
+++ DeepBench/code/nvidia/Makefile 2017-12-12 11:49:20.379360097 -0500
@@ -7,6 +7,7 @@
CUDNN_PATH?=/usr/local/cudnn
NCCL_PATH?=/usr/local/nccl
MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/include/openmpi
BIN_DIR?=bin
MKDIR=mkdir -p
#BLAS
@@ -45,7 +46,7 @@
nccl_mpi:
sparse:
diff -ur DeepBench.orig/code/osu_allreduce/Makefile DeepBench/code/osu_allreduce/Makefile
--- DeepBench.orig/code/osu_allreduce/Makefile 2017-12-14 14:53:03.258428376 -0500
+++ DeepBench/code/osu_allreduce/Makefile 2017-12-12 11:51:07.655654032 -0500
@@ -3,6 +3,7 @@
CC_FLAGS= -c -O2 -pthread -Wall -march=native
MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/local/openmpi
CUDA_PATH?=/usr/local/cuda
MKDIR=mkdir -p
BIN_DIR?=bin
@@ -17,10 +18,10 @@
coll:
allreduce:
clean:
rm -rf $(BIN_DIR)
Hi,
I got an error at run time. I compiled with CUDA 7.5, gcc 4.9.4 and cuDNN 5.0.5, and I am trying to run the generated binaries
"nccl_mpi_all_reduce" and
"nccl_single_all_reduce"
like
mpirun -np 1 bin/nccl_mpi_all_reduce
and
srun ./bin/nccl_single_all_reduce 1
respectively, and I am getting the following errors:
load gcc/4.9.4 (PATH, MANPATH, LD_LIBRARY_PATH)
Set GNU compilers as MPI wrappers backend
load CUDNN/5.0.5 (LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH)
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: invalid device function
[nva10:06393] *** Process received signal ***
[nva10:06393] Signal: Aborted (6)
[nva10:06393] Signal code: (-6)
[nva10:06393] [ 0] /lib64/libpthread.so.0() [0x3b3ee0f790]
[nva10:06393] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3b3e632625]
[nva10:06393] [ 2] /lib64/libc.so.6(abort+0x175) [0x3b3e633e05]
[nva10:06393] [ 3] /apps/GCC/4.9.4/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d) [0x2b3c2f147acd]
and
load gcc/4.9.4 (PATH, MANPATH, LD_LIBRARY_PATH)
Set GNU compilers as MPI wrappers backend
load CUDNN/5.0.5 (LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH)
terminate called after throwing an instance of 'thrust::system::system_error'
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: invalid device function what(): function_attributes(): after cudaFuncGetAttributes: invalid device function
respectively.
Is there a solution for this?
Thank you so much.
Hi
Is there any way to do that?
In code/intel/gemm/bench.cpp:128, the size of matrix b is calculated as std::max(sizea, max_sizeb). This is wrong and leads to a segmentation fault if the test is run with a subset of matrix sizes from the default input set. Line 128 should read:
max_sizeb = std::max(sizeb, max_sizeb);
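The effect of the one-word fix can be seen in a small model of the sizing loop. The Python sketch below is an illustration only and is not the code in bench.cpp: the function names and the (m, n, k) problem list are hypothetical, and it assumes a holds m x k elements and b holds k x n elements.

```python
# Illustrative model of the buffer-sizing loop. All names are hypothetical
# and are not taken from code/intel/gemm/bench.cpp.

def max_sizes_buggy(problems):
    """Mimic the reported bug: b's running maximum is updated from a's size."""
    max_sizea = max_sizeb = 0
    for m, n, k in problems:
        sizea = m * k  # elements in a (assumed m x k)
        sizeb = k * n  # elements in b (assumed k x n)
        max_sizea = max(sizea, max_sizea)
        max_sizeb = max(sizea, max_sizeb)  # bug: should use sizeb
    return max_sizea, max_sizeb


def max_sizes_fixed(problems):
    """Apply the proposed fix: b's running maximum tracks b's own size."""
    max_sizea = max_sizeb = 0
    for m, n, k in problems:
        sizea = m * k
        sizeb = k * n
        max_sizea = max(sizea, max_sizea)
        max_sizeb = max(sizeb, max_sizeb)
    return max_sizea, max_sizeb


if __name__ == "__main__":
    subset = [(512, 48000, 2816)]  # hypothetical subset with a large n
    print("buggy:", max_sizes_buggy(subset))
    print("fixed:", max_sizes_fixed(subset))
```

With this subset the buggy version reserves far fewer elements for b than the problem needs, which is consistent with the out-of-bounds access described in the report.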
I was able to compile and run the benchmarks, but I have trouble interpreting the results. Copied below is the output of the Intel benchmarks. I am also not sure how to specify the variables for the benchmark.
Intel GEMM Benchmark:
~~~~~~~~~~~~~~~~~~
SGEMM(N,N,512,32,512) 24.6 usec 682.47345 GFlop/sec
SGEMM(N,N,1024,32,512) 29.6 usec 1133.19128 GFlop/sec
SGEMM(T,N,512,48000,2816) 30200.0 usec 4583.18308 GFlop/sec
SGEMM(T,N,512,48000,2048) 22696.9 usec 4435.11591 GFlop/sec
SGEMM(T,N,512,48000,2560) 27688.0 usec 4544.54369 GFlop/sec
SGEMM(T,N,512,48000,1536) 17718.4 usec 4260.96345 GFlop/sec
SGEMM(T,N,1024,48000,2816) 59694.3 usec 4637.35822 GFlop/sec
SGEMM(T,N,1024,48000,2048) 44873.5 usec 4486.53753 GFlop/sec
SGEMM(T,N,1024,48000,2560) 54299.2 usec 4634.65544 GFlop/sec
SGEMM(T,N,1024,48000,1536) 34313.9 usec 4400.40483 GFlop/sec
SGEMM(N,T,512,32,512) 25.9 usec 647.68659 GFlop/sec
SGEMM(N,T,1024,32,512) 29.5 usec 1136.88506 GFlop/sec
Total time 7512760.1 usec, Overall Performance: 3554.80515 GFlop/sec
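One way to read the GEMM lines above: the last column appears to be 2*m*n*k floating-point operations divided by the measured time. This reading is an assumption checked against the printed numbers, not something taken from the benchmark source, and the helper name below is hypothetical.

```python
# Sketch: recompute the GFlop/sec column from the three sizes and the time.
# Assumes one multiply and one add per (m, n, k) triple, i.e. 2*m*n*k flops.

def gemm_gflops(m, n, k, time_usec):
    """Return GFlop/sec for one GEMM that took time_usec microseconds."""
    flops = 2.0 * m * n * k
    seconds = time_usec * 1e-6
    return flops / seconds / 1e9


if __name__ == "__main__":
    # SGEMM(T,N,512,48000,2816) was reported at 30200.0 usec, 4583.18308 GFlop/sec
    print(gemm_gflops(512, 48000, 2816, 30200.0))
    # SGEMM(N,N,512,32,512) was reported at 24.6 usec, 682.47345 GFlop/sec
    print(gemm_gflops(512, 32, 512, 24.6))
```

The recomputed values agree with the report to within the rounding of the printed times, which suggests the small 32-column problems are simply too short to reach the throughput of the large ones.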
Intel Convolution Benchmark
~~~~~~~~~~~~~~~~~~~~~~
##########################################
# Performance - FWD (custom-Storage) #
##########################################
GFLOP = 1.6442
fp time = 0.00063374
GFLOPS = 2594.4
PERFDUMP,FP,1.8.2-185,64,16,2048,512,7,7,1,1,1,0,0,0.00063374,2594.4,15346.564300,15346.564276,0.000037,1.749877,0.000001,1.745342,0.000031
##########################################
# Performance - BWD (custom-Storage) #
##########################################
GFLOP = 1.6442
bp time = 0.047238
GFLOPS = 34.806
PERFDUMP,BP,1.8.2-185,64,16,2048,512,7,7,1,1,1,0,0,0.047238,34.806,45863.489576,45863.489619,0.000034,0.138520,0.000000,0.101015,0.000031
##########################################
# Performance - UPD (custom-Storage) #
##########################################
GFLOP = 1.6442
wu time = 0.00059244
GFLOPS = 2775.2
PERFDUMP,WU,1.8.2-185,64,16,2048,512,7,7,1,1,1,0,0,0.00059244,2775.2,31919.598646,31919.598646,0.000000,0.000000,0.000000,0.000000,0.000000
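The convolution numbers above can be cross-checked in the same spirit. The sketch below assumes a particular reading of the PERFDUMP fields (minibatch 16, 2048 input channels, 512 output channels, 7x7 output, 1x1 kernel); that reading is a guess that happens to reproduce the printed GFLOP value and is not confirmed by the source. The function names are hypothetical.

```python
# Sketch: reproduce "GFLOP" and "GFLOPS" under an assumed field layout.

def conv_gflop(batch, in_ch, out_ch, out_h, out_w, ker_h, ker_w):
    """Multiply-add work of one direct convolution pass, in GFLOP."""
    return 2.0 * batch * in_ch * out_ch * out_h * out_w * ker_h * ker_w / 1e9


def rate_gflops(gflop, seconds):
    """Throughput in GFLOPS for a pass that took the given time in seconds."""
    return gflop / seconds


if __name__ == "__main__":
    work = conv_gflop(16, 2048, 512, 7, 7, 1, 1)
    print("GFLOP:", work)
    print("FWD GFLOPS:", rate_gflops(work, 0.00063374))
    print("BWD GFLOPS:", rate_gflops(work, 0.047238))
```

Under this reading the same work figure is divided by the forward, backward and update times in turn, so the three GFLOPS values differ only because the times differ.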
Hi,
I can provide results for a Tesla P100 and several Intel CPUs, but I was wondering what the best way to submit them is.
Thanks,
Jerome
os: ubuntu 16.4
GPU: Tesla V100-SXM2 *4 ,one node
cuda:9.1
nccl: 2.1.15
cudnn:7.0
cmd:
make CUDA_PATH=/usr/local/cuda CUDNN_PATH=/usr/lib/x86_64-linux-gnu MPI_PATH=/usr/local/openmpi-1.10.2_cuda9.1 NCCL_PATH=/usr/lib/x86_64-linux-gnu USE_TENSOR_CORES=1 ARCH=sm_70
While compiling the benchmark, I hit the following error:
./kernels/gemm_problems.h:2:0: required from here
/usr/include/c++/6/tuple:489:65: error: mismatched argument pack lengths while expanding ‘std::is_convertible<_UElements&&, _Elements>’
return _and<is_convertible<_UElements&&, _Elements>...>::value;
^~~~~
/usr/include/c++/6/tuple:490:1: error: body of constexpr function ‘static constexpr bool std::_TC<, _Elements>::_ImplicitlyMoveConvertibleTuple() [with _UElements = {const std::tuple<int, int, int, bool, bool>&}; bool = true; _Elements = {int, int, int, bool, bool}]’ not a return-statement
}
^
Makefile:30: recipe for target 'gemm' failed
make[1]: *** [gemm] Error 1
make[1]: Leaving directory '/root/DeepBench/code/nvidia'
Can you give me some suggestions?
Hi, I'm a beginner with DeepBench and I'm confused about its working environment:
1. Which exact versions of ICC, MKL and MPI are needed? Is Ubuntu 16.1 (GNU/Linux 4.8.0-22-generic x86_64) suitable, or which Ubuntu version do you suggest?
2. I'm very interested in All-Reduce, but in the command "run_allreduce_ia.sh <osu_allreduce binary> ", what do the osu_allreduce binary and the hostfile refer to?
I look forward to your reply. Thank you!
Hi, in DeepBench/code/intel/convolution/mkl_conv/std_conv_bench.cpp, the function calc_flops only calculates the forward FLOPs and does not include the backward FLOPs. Is this a bug?
I'm looking to keep things consistent when running your benchmark against a GTX 1060; were any of the library versions changed from those listed (such as using cuda 8.0 instead of 7.5) when testing the TitanX Pascal?
I noticed that the XLS files use a formula to calculate the TFLOPS of each operator.
For recurrent layers (LSTM), it is calculated as =(8*$E296*$D296*$C296*$C296)/(G296/1000)/10^12.
It seems to include only the GEMM computation and not the computation of sigmoid or tanh.
Could you explain the formula in more detail, especially for the LSTM/GRU items? Thanks.
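For readers who want to evaluate the spreadsheet expression outside Excel, here is a direct transliteration in Python. The mapping of columns to quantities (C = hidden units, D = batch size, E = time steps, G = time in milliseconds) is an assumption, since the post does not say what each column holds, and the input values are made up. The sketch only restates the quoted formula and does not explain where the factor 8 comes from.

```python
# Transliteration of =(8*E*D*C*C)/(G/1000)/10^12 with assumed column meanings.

def lstm_tflops(hidden, batch, timesteps, time_ms):
    """Evaluate the quoted spreadsheet formula; returns TFLOPS."""
    flops = 8 * timesteps * batch * hidden * hidden  # matrix-multiply work only
    seconds = time_ms / 1000                         # G is assumed to be in ms
    return flops / seconds / 10**12


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the DeepBench result files.
    print(lstm_tflops(hidden=1024, batch=32, timesteps=50, time_ms=10.0))
```

As the post observes, nothing in this expression depends on the elementwise sigmoid or tanh work, whose cost grows with hidden size rather than with its square.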
In DeepBench/code/intel/convolution/mkl_conv/run_mkl_conv_ia.sh:
if lscpu | grep Flags | grep -qs avx512dq; then
  ./run_mkl_conv_ia_SKX.sh
elif lscpu | grep Flags | grep -qa avx512f; then
  ./run_mkl_conv_ia_KNL.sh
else

fi
However, run_mkl_conv_ia_SKX.sh and run_mkl_conv_ia_KNL.sh cannot be found in the repo.
Hello,
I want to use DeepBench for matrix multiplications and convolutions in low precision (8-bit, float16) with Theano and Keras. Can you please let me know how to integrate and use this library with Theano and Keras? What do I need to change in Theano and Keras? I am only doing inference, not training, in 8-bit etc., i.e. I don't need to calculate backward propagation in low precision.
Setup: a CNN trained in Keras/Theano with FP32 weights, biases and activations, which is then quantized to 8-bit, 16-bit etc., so I have a quantized model. I want to do inference (the forward path) by matrix multiplication of these 8-bit or 16-bit quantized weights, biases and activations (i.e. the matrices hold 8-bit values). This means I have to do 8-bit multiply and 32-bit accumulate using DeepBench. I will use cuDNN v6 and CUDA 8.0 on an Nvidia GTX 1080 Ti (Pascal architecture), which supports DP4A.
./run_gemm_bench.sh
mkdir -p bin
g++ -O3 --std=c++11 -I ../kernels/ -lpthread -mfpu=neon-fp-armv8 -o bin/gemm_bench gemm_bench.cc
g++: error: unrecognized command line option ‘-mfpu=neon-fp-armv8’
make: *** [gemm] Error 1
build success!
start running!
./run_gemm_bench.sh: line 8: bin/gemm_bench: No such file or directory
running complete!
Sorry, I have been debugging for a long time, but I don't know where the problem is.
It's exciting to see benchmarks for the V100, especially given the recent release of the Titan V, which is expected to have similar performance.
The numbers in DeepBench look amazingly good: basically V100 is 3 times faster than 1080ti in RNNs.
In fact, numbers are so good I started to doubt: did you benchmark a single V100 or a full DGX-1 with 8 Tesla V100's?
What is the reason for not incorporating/benchmarking BackwardWeights, at least for NVIDIA? There is no use of cudnnRNNBackwardWeights.
m n k a_t b_t precision time (usec)
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: invalid device function
1760 16 1760 0 0Aborted
I don't know what goes wrong.
It has been running for several hours on a KNL machine. Can someone advise what configuration should be in place before running, from the BIOS, software and hardware standpoints?
Recent mobile processors have CPUs, GPUs, DSPs and NPUs that are capable of Teraflop-level deep learning operations.
It would be great to have support for high-end mobile processors, such as the Snapdragon 845 with its acceleration cores.
I'm trying to run DeepBench on my Nvidia GTX 1080 with CUDA 8, cuDNN 5, NCCL and Open MPI 2.1, but I ran into the problems shown below. Can anyone help with this?
/DeepBench/code# make CUDA_PATH=/usr/local/cuda-8.0 CUDNN_PATH=/usr/local/cuda-8.0/targets/x86_64-linux MPI_PATH=/usr/local/open[196/1916]
ATH=/usr/local/nccl
mkdir -p bin
make -C nvidia
make[1]: Entering directory '/DeepBench/code/nvidia'
mkdir -p bin
/usr/local/cuda-8.0/bin/nvcc gemm_bench.cu -DUSE_TENSOR_CORES=0 -DPAD_KERNELS=1 -o bin/gemm_bench -I ../kernels/ -I /usr/local/cuda-8.0/include -L /usr/loc$
l/cuda-8.0/lib64 -lcublas -L /usr/local/cuda-8.0/lib64 -lcurand --generate-code arch=compute_52,code=sm_52 -std=c++11
mkdir -p bin
/usr/local/cuda-8.0/bin/nvcc conv_bench.cu -DUSE_TENSOR_CORES=0 -DPAD_KERNELS=1 -o bin/conv_bench -I ../kernels/ -I /usr/local/cuda-8.0/include -I /usr/loc$
l/cuda-8.0/targets/x86_64-linux/include/ -L /usr/local/cuda-8.0/targets/x86_64-linux/lib64/ -L /usr/local/cuda-8.0/lib64 -lcurand -lcudnn --generate-code a$
ch=compute_52,code=sm_52 -std=c++11
mkdir -p bin
/usr/local/cuda-8.0/bin/nvcc rnn_bench.cu -DUSE_TENSOR_CORES=0 -o bin/rnn_bench -I ../kernels/ -I /usr/local/cuda-8.0/include -I /usr/local/cuda-8.0/target$
/x86_64-linux/include/ -L /usr/local/cuda-8.0/targets/x86_64-linux/lib64/ -L /usr/local/cuda-8.0/lib64 -lcurand -lcudnn --generate-code arch=compute_52,cod$
=sm_52 -std=c++11
mkdir -p bin
/usr/local/cuda-8.0/bin/nvcc nccl_single_all_reduce.cu -o bin/nccl_single_all_reduce -I ../kernels/ -I /usr/local/nccl/include/ -I /usr/local/cuda-8.0/targ$
ts/x86_64-linux/include/ -L /usr/local/nccl/lib/ -L /usr/local/cuda-8.0/targets/x86_64-linux/lib64 -lnccl -lcudart -lcurand --generate-code arch=compute_52$
code=sm_52 -std=c++11
mkdir -p bin
/usr/local/cuda-8.0/bin/nvcc nccl_mpi_all_reduce.cu -o bin/nccl_mpi_all_reduce -I ../kernels/ -I /usr/local/nccl/include/ -I /usr/local/cuda-8.0/targets/x86
_64-linux/include/ -I /usr/local/openmpi/include -L /usr/local/nccl/lib/ -L /usr/local/cuda-8.0/targets/x86_64-linux/lib64 -L /usr/local/openmpi/lib -lnccl
-lcurand -lcudart -lmpi --generate-code arch=compute_52,code=sm_52 -std=c++11
/usr/bin/ld: warning: libopen-rte.so.20, needed by /usr/local/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libopen-pal.so.20, needed by /usr/local/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_append'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_get'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_strerror'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_process_info'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_getcwd'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix_pdata_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_get_first_key_uint32'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_free_list_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_show_help'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_allocator_component_lookup'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_bitmap_compare'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_backtrace_buffer'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_predefined_elem_desc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_mpool_base_framework'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_get_value_ptr'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_thread_get_self'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_process_name_print'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_session_dir_cleanup'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_base_get_topology'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_pack'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_base_single_cpu'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_path_nfs'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_errmgr'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_namelist_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_close_components'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_mpool_base_alloc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_value_load'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_clear_bit'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_mpool_base_free'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_proc_local_set'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pointer_array_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_find_by_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_create_desc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_prepare_for_send'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_enum_create_flag'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_set_event_poll_rate'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_info_show_orte_version'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_set_value_uint32'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pointer_array_test_and_set_item'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_free_list_grow_st'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_mutex_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_set_element_count'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_commit'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_finalize_util'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_set_yield_when_idle'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix_collect_all_data'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_append_unique_nosize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_bitmap_isincluded'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_init_util'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_name_wildcard'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_register'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_remove_value_uint32'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_rml'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_reset'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix_base_exchange'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_set_value_ptr'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_info_close_components'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_built_with_cuda_support'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_in_parallel_debugger'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_mpool_base_tree_print'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_framework_components_close'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_strncpy'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_enum_create'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_raw'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_set_bit'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_rand'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_proc_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_unregister'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_free'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_out'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_util_print_name_args'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_stop'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_arch_set_fortran_logical_size'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_memory_base_malloc_init_hook'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_component_close'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_topology'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_group_get_stamp'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_allocator_base_framework'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_get_obj_by_depth'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_framework_is_open'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_base_get_available_cpus'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_read_value'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_install_dirs'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_get_count'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_thread_self_compare'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_get_type_depth'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_local_arch'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_proc_is_bound'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_free_list_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_group_get'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_register_framework_params'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_rml_recv_cb_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix_base_async_modex'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_event_users_increment'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convert_process_name_to_string'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_output_close'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_backtrace_print'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_set_value_uint64'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convert_string_to_process_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_set_event_flag'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_rml_recv_callback'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_dump_data_desc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_value_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_util_convert_process_name_to_string'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_cuda_support'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_output_stream_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_dump'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_resize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_split'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_srand'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_dump_data_flags'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_get_cpubind'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_list_item_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_find_by_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_select'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_btl_base_framework'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_remove_value_ptr'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_event_users_decrement'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_setenv'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_is_set_bit'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_get_value_uint64'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_util_compare_name_fields'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_component_to_string'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_free_list_item_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_set_value'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_proc_for_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_get_next_key_uint32'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_write_value'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_proc_applied_binding'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_finalize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_join'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_condition_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_components_close'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_get_value'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_register_synonym'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_list_sort'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_component_repository_release'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pointer_array_add'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_output'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_finalize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_contain_basic_datatypes'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_abort_print_stack'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_bitmap_alloc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_component_var_register'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_compare_proc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_group_get_count'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_register_project_frameworks'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_framework_close'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_abort_delay'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_get_count'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_copy_content_same_ddt'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_create'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_framework_open'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_output_verbose'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_output_open'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_session_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_odls'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc1112_hwloc_bitmap_free'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pointer_array_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_make_version_str'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_argv_append_nosize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_alloc'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_get_value_uint32'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_get'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_free'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_value_unload'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_info_register_framework_params'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_session_dir_finalize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_class_initialize'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_util_convert_string_to_process_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hash_table_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_component_list_item_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_progress_register'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_find'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_init'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_add'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_base_cset2mapstr'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_get_element_count'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_clone'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_info_show_opal_version'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_var_group_find_by_name'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_buffer_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_prepare_for_recv'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_framework_components_open'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_datatype_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_find_and_set_first_unset_bit'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `mca_base_pvar_handle_start'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pointer_array_set_item'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_dss'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_rcache_base_framework'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_process_info'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_bitmap_set_max_size'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_object_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_list_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_standalone_operation'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_pmix_app_t_class'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `orte_ess'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_hwloc_base_cset2str'
/usr/local/openmpi/lib/libmpi.so: undefined reference to `opal_convertor_unpack'
collect2: error: ld returned 1 exit status
Makefile:47: recipe for target 'nccl_mpi' failed
make[1]: *** [nccl_mpi] Error 1
make[1]: Leaving directory '/DeepBench/code/nvidia'
Makefile:6: recipe for target 'nvidia' failed
make: *** [nvidia] Error 2
I'm trying to run DeepBench on my arm64 platform. I installed the ARM Compute Library and pointed ARM_COMPUTE_INCLUDE_PATH and ARM_COMPUTE_LIB_PATH to the correct path, as follows:
ARM_COMPUTE_INCLUDE_PATH?=/root/DeepBench/code/arm/arm_compute-v17.06-bin
ARM_COMPUTE_LIB_PATH?=/root/DeepBench/code/arm/arm_compute-v17.06-bin
ARM_COMPUTE_LIB=$(ARM_COMPUTE_LIB_PATH)/lib/linux-arm64-v8a-neon
The build log follows. Can someone help with this?
root@localhost:~/DeepBench/code/arm# make conv
mkdir -p bin
g++ -O3 -std=c++11 -I /root/DeepBench/code/arm/arm_compute-v17.06-bin -I ../kernels/ -std=c++11 -larm_compute -L /root/DeepBench/code/arm/arm_compute-v17.06-bin/lib/linux-arm64-v8a-neon convolution.cpp -o bin/conv_bench
/tmp/ccGfptls.o: In function `time_cnn(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int, unsigned int, unsigned int, unsigned int, unsigned int, int)':
convolution.cpp:(.text+0x5c): undefined reference to `arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0x64): undefined reference to `arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0x6c): undefined reference to `arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0x74): undefined reference to `arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0xb0): undefined reference to `arm_compute::NEConvolutionLayer::NEConvolutionLayer()'
convolution.cpp:(.text+0x104): undefined reference to `arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x120): undefined reference to `arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0x12c): undefined reference to `arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0x1e8): undefined reference to `arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x204): undefined reference to `arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0x210): undefined reference to `arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0x218): undefined reference to `arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x234): undefined reference to `arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0x240): undefined reference to `arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0x248): undefined reference to `arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x264): undefined reference to `arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0x270): undefined reference to `arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0x2cc): undefined reference to `arm_compute::NEConvolutionLayer::configure(arm_compute::ITensor const*, arm_compute::ITensor const*, arm_compute::ITensor const*, arm_compute::ITensor*, arm_compute::PadStrideInfo const&, arm_compute::WeightsInfo const&)'
convolution.cpp:(.text+0x2d4): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x2e8): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0x2fc): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0x310): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0x324): undefined reference to
arm_compute::NEConvolutionLayer::run()'
convolution.cpp:(.text+0x340): undefined reference to arm_compute::NEConvolutionLayer::run()' convolution.cpp:(.text+0x364): undefined reference to
vtable for arm_compute::NEConvolutionLayer'
convolution.cpp:(.text+0x368): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x36c): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x374): undefined reference to vtable for arm_compute::NEConvolutionLayer' convolution.cpp:(.text+0x378): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x37c): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x3c4): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x3c8): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x400): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x404): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x43c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x440): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x474): undefined reference to
vtable for arm_compute::NEConvolutionLayerReshapeWeights'
convolution.cpp:(.text+0x47c): undefined reference to vtable for arm_compute::NEConvolutionLayerReshapeWeights' convolution.cpp:(.text+0x480): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x484): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x4c0): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x4c4): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x4fc): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x500): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x538): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x53c): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0x574): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0x578): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xb3c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xb40): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xb48): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xb4c): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xb64): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xb68): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xb80): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xb84): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xb9c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xba0): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xbc0): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xbc4): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xbd0): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xbd4): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xbf0): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xbf4): undefined reference to vtable for arm_compute::TensorAllocator' /tmp/ccGfptls.o: In function
time_cnn(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int, unsigned int, unsigned int, unsigned int, unsigned int, int) [clone .constprop.22]':
convolution.cpp:(.text+0xc78): undefined reference to arm_compute::Tensor::Tensor()' convolution.cpp:(.text+0xc80): undefined reference to
arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0xc88): undefined reference to arm_compute::Tensor::Tensor()' convolution.cpp:(.text+0xc90): undefined reference to
arm_compute::Tensor::Tensor()'
convolution.cpp:(.text+0xcc8): undefined reference to arm_compute::NEConvolutionLayer::NEConvolutionLayer()' convolution.cpp:(.text+0xd1c): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0xd38): undefined reference to arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)' convolution.cpp:(.text+0xd44): undefined reference to
arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0xe00): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0xe1c): undefined reference to
arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0xe28): undefined reference to arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)' convolution.cpp:(.text+0xe30): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0xe4c): undefined reference to arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)' convolution.cpp:(.text+0xe58): undefined reference to
arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)'
convolution.cpp:(.text+0xe60): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0xe7c): undefined reference to
arm_compute::TensorInfo::TensorInfo(arm_compute::TensorShape const&, unsigned long, arm_compute::DataType, int)'
convolution.cpp:(.text+0xe88): undefined reference to arm_compute::ITensorAllocator::init(arm_compute::TensorInfo const&)' convolution.cpp:(.text+0xee4): undefined reference to
arm_compute::NEConvolutionLayer::configure(arm_compute::ITensor const*, arm_compute::ITensor const*, arm_compute::ITensor const*, arm_compute::ITensor*, arm_compute::PadStrideInfo const&, arm_compute::WeightsInfo const&)'
convolution.cpp:(.text+0xeec): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0xf00): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0xf14): undefined reference to arm_compute::Tensor::allocator()' convolution.cpp:(.text+0xf28): undefined reference to
arm_compute::Tensor::allocator()'
convolution.cpp:(.text+0xf3c): undefined reference to arm_compute::NEConvolutionLayer::run()' convolution.cpp:(.text+0xf50): undefined reference to
arm_compute::NEConvolutionLayer::run()'
convolution.cpp:(.text+0xf70): undefined reference to vtable for arm_compute::NEConvolutionLayer' convolution.cpp:(.text+0xf74): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text+0xf78): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text+0xf80): undefined reference to
vtable for arm_compute::NEConvolutionLayer'
convolution.cpp:(.text+0xf84): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0xf88): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0xfd0): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0xfd4): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x100c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1010): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1048): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x104c): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1080): undefined reference to vtable for arm_compute::NEConvolutionLayerReshapeWeights' convolution.cpp:(.text+0x1088): undefined reference to
vtable for arm_compute::NEConvolutionLayerReshapeWeights'
convolution.cpp:(.text+0x108c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1090): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x10cc): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x10d0): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1108): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x110c): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1144): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1148): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1180): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1184): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x174c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1750): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1758): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x175c): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1774): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1778): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1790): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1794): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x17ac): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x17b0): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x17d0): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x17d4): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x17e0): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x17e4): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text+0x1800): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text+0x1804): undefined reference to
vtable for arm_compute::TensorAllocator'
/tmp/ccGfptls.o: In function arm_compute::NEConvolutionLayer::~NEConvolutionLayer()': convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x14): undefined reference to
vtable for arm_compute::NEConvolutionLayer'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x1c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x20): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x28): undefined reference to vtable for arm_compute::NEConvolutionLayer' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x2c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x30): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x6c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x70): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xa8): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xac): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xe4): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xe8): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x11c): undefined reference to
vtable for arm_compute::NEConvolutionLayerReshapeWeights'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x124): undefined reference to vtable for arm_compute::NEConvolutionLayerReshapeWeights' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x128): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD2Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x12c): undefined reference to vtable for arm_compute::TensorAllocator' /tmp/ccGfptls.o: In function
arm_compute::NEConvolutionLayer::~NEConvolutionLayer()':
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x14): undefined reference to vtable for arm_compute::NEConvolutionLayer' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x1c): undefined reference to
vtable for arm_compute::Tensor'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x20): undefined reference to vtable for arm_compute::TensorAllocator' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x28): undefined reference to
vtable for arm_compute::NEConvolutionLayer'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x2c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x30): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x6c): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x70): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xa8): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xac): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xe4): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0xe8): undefined reference to
vtable for arm_compute::TensorAllocator'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x11c): undefined reference to vtable for arm_compute::NEConvolutionLayerReshapeWeights' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x124): undefined reference to
vtable for arm_compute::NEConvolutionLayerReshapeWeights'
convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x128): undefined reference to vtable for arm_compute::Tensor' convolution.cpp:(.text._ZN11arm_compute18NEConvolutionLayerD0Ev[_ZN11arm_compute18NEConvolutionLayerD5Ev]+0x12c): undefined reference to
vtable for arm_compute::TensorAllocator'
collect2: error: ld returned 1 exit status
Makefile:21: recipe for target 'conv' failed
make: *** [conv] Error 1
According to the benchmark results, MKL's convolution (not GEMM) is much slower in the backward pass than in the forward pass.
For example, for W=341, H=79, C=32, N=4, K=32, R=5, S=10 on the KNL 7250 platform, forward takes 0.91 ms, backward with respect to the input takes 68.79 ms, and backward with respect to the weights takes 74.98 ms! So backward is roughly 75 to 82 times slower than forward.
As a comparison, on a Titan X, forward takes 0.74 ms, backward with respect to the input takes 3.09 ms, and backward with respect to the weights takes 0.76 ms. For forward, the KNL 7250 is only a little slower than the Titan X, but for backward it is much, much slower. Other W, H, C configurations behave similarly.
Can anyone give me the reason? Is it because MKL has not been optimized much for the backward pass yet?
Hi,
I want to build the sbench.c file. Does any tool or package need to be installed before doing this?
There seems to be a discrepancy between the dimensions printed and those passed to cublasSgemm for the TN case. I'm looking at the file gemm_bench.cu.
Effectively, at line 177
C is k_printed by n_printed
B is m_printed by n_printed
A is m_printed by k_printed
so that
m_to_cublas, k_to_cublas = k_printed, m_printed.
This possible discrepancy applies to 15 of the 78 cases.
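For reference, the standard BLAS convention that cublasSgemm follows is C (m x n) = op(A) (m x k) * op(B) (k x n), so a transposed operand is stored with its two dimensions swapped. The sketch below only tabulates those stored shapes and makes no claim about what gemm_bench.cu actually allocates; `gemm_shapes` is a hypothetical helper, and the sizes 6, 4, 2 are made-up values chosen for illustration.

```python
def gemm_shapes(m, n, k, a_t, b_t):
    """Stored (rows, cols) of A, B, C for C = op(A) * op(B) in BLAS terms.

    op(A) is m x k and op(B) is k x n, so a transposed operand is
    stored with its dimensions swapped.
    """
    a = (k, m) if a_t else (m, k)
    b = (n, k) if b_t else (k, n)
    return {"A": a, "B": b, "C": (m, n)}

# Hypothetical printed sizes for a TN case.
m_printed, n_printed, k_printed = 6, 4, 2

# Shapes if the printed m and k are passed through unchanged.
as_printed = gemm_shapes(m_printed, n_printed, k_printed, a_t=True, b_t=False)

# Shapes if m and k are exchanged before the call, as the report describes.
swapped = gemm_shapes(k_printed, n_printed, m_printed, a_t=True, b_t=False)

print(as_printed)
print(swapped)
```

With m and k exchanged, A comes out as m_printed by k_printed, B as m_printed by n_printed, and C as k_printed by n_printed, which is the set of shapes listed in the report.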
I am new to LSTMs and GRUs. As far as I understand, SGEMM (though on small matrices) can still take up a significant amount of time in the end-to-end execution of RNNs.
I was hoping to find the matrix sizes of the multiplications in the XLS file, but it only mentions hidden units and timesteps. Is there a way to figure out the SGEMM call parameters from the Excel sheet, or maybe from the code?
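As a rough guide, the recurrent GEMM sizes follow from the hidden size and batch size under the textbook cell definitions. The sketch below is not taken from the DeepBench code. It assumes one fused recurrent GEMM per timestep and the usual gate counts (1 for a vanilla RNN, 4 for an LSTM, 3 for a GRU). `recurrent_gemm_dims` and `total_recurrent_flops` are hypothetical helpers, and a library such as cuDNN may split or batch these multiplications differently.

```python
# Number of weight blocks stacked into one recurrent matrix per cell type.
GATES = {"vanilla": 1, "lstm": 4, "gru": 3}

def recurrent_gemm_dims(cell, hidden, batch):
    """(M, N, K) for W_rec [gates*hidden x hidden] times h_prev [hidden x batch]."""
    return (GATES[cell] * hidden, batch, hidden)

def total_recurrent_flops(cell, hidden, batch, timesteps):
    """Multiply-add count (2*M*N*K per GEMM) summed over all timesteps."""
    m, n, k = recurrent_gemm_dims(cell, hidden, batch)
    return 2 * m * n * k * timesteps

# Example sizes chosen for illustration only.
lstm_dims = recurrent_gemm_dims("lstm", hidden=512, batch=16)
print(lstm_dims)
```

The input-to-hidden multiplication has the same M and N, with K equal to the input size; it can be computed for all timesteps at once because it does not depend on the previous hidden state.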
I noticed that DeepBench is still using NCCL 1 for its benchmarking. Is anyone interested in NCCL 2 benchmarks or already working on them?
One of the interesting things about NCCL 2 is the ability to communicate across nodes (not just within a node). Unfortunately, I only have immediate access to a cluster of K80 machines, so my setup is not ideal for evaluating it on state-of-the-art processors. I'm curious whether anyone else is interested in working on NCCL 2.
For the recurrent layer benchmarks, I have seen that you have updated the results for Intel hardware, but I cannot find the scripts to run this workload on Intel hardware. Could you please add them?
Thanks.
Hi everyone,
We will soon have an NVIDIA DGX-1 for benchmarking and I was wondering which modifications could be applied to DeepBench (parameters and so on) so that it can be used effectively on this supercomputer. We will make our results available for the community. Thanks!
Is there any explanation of how the benchmarking is done? Knowing exactly which parameters it focuses on would give better insight. I am very interested in how DeepBench works, and I would also like to know what the P and Q parameters in the convolution results mean.
The parameter should be fwd_perf in line 137 at file code/nvidia/conv_bench.cu
I am trying to benchmark my GPU, but I have a problem:
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED in cudnn_helper.h at line: 32
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED in cudnn_helper.h at line: 32
Aborted (core dumped)
root@fe903e806138:/DeepBench/code/bin# ^C
root@fe903e806138:/DeepBench/code/bin# ./rnn_bench
terminate called after throwing an instance of 'std::runtime_error'
what(): CUDNN failure: CUDNN_STATUS_NOT_INITIALIZED in rnn_bench.cu at line: 254
Aborted (core dumped)
root@fe903e806138:/DeepBench/code/bin#
If you have a solution, please tell me.
Example below. The padding is shown to be 3x3 but the actual config in problems file shows a 1x1 padding.
Spreadsheet -
7 | 7 | 2048 | 8 | 512 | 1 | 1 | 3 | 3 | 2 | 2
File -
std::make_tuple(7, 7, 2048, 8, 512, 1, 1, 0, 0, 1, 1),
One example:
This spreadsheet has data for the config below:
6144 | 48000 | 2048 | T | N
But the above config is not found in the training_set here:
https://github.com/baidu-research/DeepBench/blob/master/code/kernels/gemm_problems.h
Hello!
I have just done a preliminary tuning of ISAAC (https://github.com/ptillet/isaac) for the Pascal Titan X, and it seems to outperform cuBLAS on many (but not all) shapes. What is the policy concerning which library should be used for the benchmarks (especially when there is no clear winner)?
The benchmark is below. Non-overclocked Pascal Titan X
https://gist.github.com/ptillet/92caaeb0036cb2022e021da87e38b096
Hi,
When trying to run ncc_mpi_all_reduce on a system with two P100s, I get:
Any idea on what could be the problem?
Sorry, I read the instructions wrong.
Thanks
Intel's mkl_conv Makefile is missing the -xMIC-AVX512 flag, which is required for evaluating this benchmark on KNL. Is this intentional?
Hi,
Sorry, this might be a basic question, as I am a beginner.
I am getting a "no such file or directory" error on the include of nccl.h, and I am a little confused about NCCL_PATH.
Could someone explain this issue?
Thank you so much.
Hi,
While looking at the results, I came across forward (ms) w.r.t. inputs and parameters. What does that actually mean? Does it mean testing the forward pass while varying the image size?
Could I get some insight into what exactly the Excel file means? Thanks.
Hi,
Looks like there is a typo in spreadsheets:
Output of the first DeepSpeech convolution does not fit the second layer:
out_W = (W + 2 * pad_w - filter_w + 1) / stride_w = (700 + 0 - 5 + 1) / 2 = 348
out_H = (H + 2 * pad_h - filter_h + 1) / stride_h = (161 + 0 - 20 + 1) / 2 = 71
I guess R should be filter height and S should be filter width. In that case DeepSpeech layers fit perfectly:
out_W = (700 + 0 - 20 + 1) / 2 = 341
out_H = (161 + 0 - 5 + 1) / 2 = 79
Please also check the KWS case for the same issue.
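The arithmetic in this report can be checked with a short script. This is a minimal sketch, assuming the common floor-based output-size formula `(in + 2*pad - filter) // stride + 1`, which gives the same values as the rounded expressions above. `conv_out_dim` is a hypothetical helper, not a DeepBench function.

```python
def conv_out_dim(in_size, filter_size, pad=0, stride=1):
    """Output length along one axis of a strided, padded convolution."""
    return (in_size + 2 * pad - filter_size) // stride + 1

# First DeepSpeech layer from the report: W=700, H=161, no padding, stride 2.
# Filter of width 5 and height 20, as the spreadsheet labels suggest.
as_listed = (conv_out_dim(700, 5, stride=2), conv_out_dim(161, 20, stride=2))

# Filter of width 20 and height 5, with R and S exchanged.
swapped = (conv_out_dim(700, 20, stride=2), conv_out_dim(161, 5, stride=2))

print(as_listed)  # (348, 71)
print(swapped)    # (341, 79)
```

Only the swapped reading produces the 341 x 79 input expected by the second layer.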
I see the following header files present in the Intel benchmarks representing the kernels:
https://github.com/baidu-research/DeepBench/blob/master/code/intel/convolution/mkl_conv/input.h
https://github.com/baidu-research/DeepBench/blob/master/code/intel/convolution/mkl_conv/input_topologies.h
https://github.com/baidu-research/DeepBench/blob/master/code/intel/gemm/input.h
https://github.com/baidu-research/DeepBench/blob/master/code/intel/sgemm/input.h
These are still being used in the benchmark code. Could you please switch to using the header files in the kernels folder? It would make it a lot easier to maintain the kernels going forward. @dmudiger, could you please help make these changes?
My system has 4 P100 GPUs and CUDA 8.0 installed. NCCL runs well. I compiled the benchmark with 'make CUDA_PATH=/usr/local/cuda CUDNN_PATH=/usr/local/cuda MPI_PATH=/home/userid/ompi NCCL_PATH=/home/userid/weike/nccl/ ARCH=sm_61'
Can anyone help me with this core dump?
m n k a_t b_t time (usec)
main: #1.
main: #2.
main: #3.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): function_attributes(): after cudaFuncGetAttributes: invalid device function
Aborted (core dumped)
userid@ubuntu-WK-4xP100:~/weike/DeepBench/code$
Hello,
I ran the RNN bench on both an M40 and a P100 GPU. The results show that the vanilla RNN TeraFLOPS figure for the P100 is lower than for the M40.
LSTM values are better.
Do you have any insights on this? Have you faced this issue? I re-ran the benchmark but got similar results. The conv bench and GEMM bench results favor the P100 by a huge margin.
Thanks,
Arun
Hello
I ran the above script run_DeepBench_ia.sh on an Intel KNL machine and it finished after 4.5 hours, but I am unable to find the results or a summary in any of the folders. Can someone advise how I can benchmark an Intel KNL machine against another, lower-end Intel machine?
Thanks
Jay Mahalingam
Calligo Technologies
Bangalore/ India