davidrohr / hpl-gpu

High Performance Linpack for GPUs (Using OpenCL, CUDA, CAL)

License: Other

Makefile 0.46% Shell 0.03% HTML 0.41% C 87.40% C++ 11.70%

hpl-gpu's Issues

Does hpl-gpu with the CUDA backend work on a pure CPU node?

I thought it should; at least that is the impression I got from reading the wiki. In practice, though, I get this:

CUDA Error 30: unknown error
caldgemm_cuda.cu:154
Getting Device Count
Error initializing CALDGEMM, abborting run

dgemm_bench on its own runs on the CPU, on the GPU, and on a hybrid of both. The hpl-gpu build runs on a CPU+GPU hybrid node. However, I am trying to test a cluster that has some pure CPU nodes and some hybrid nodes, and hpl-gpu does not run on the CPU-only nodes. Did I do something wrong? Or is there some special tuning I need to do, as with dgemm_bench?
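
The log above stops at "Getting Device Count", which suggests the CUDA runtime could not initialise on the CPU-only node. On a mixed cluster, one way to see in advance which nodes would hit this is to probe each node for a GPU before launching. The sketch below is only an illustration: `node_has_gpu` and the `GPU_PROBE` override are hypothetical names, and it assumes that GPU nodes have the NVIDIA driver's `nvidia-smi` tool installed. It does not change how hpl-gpu itself behaves.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if this node exposes at least one NVIDIA GPU.
# Assumption: GPU nodes ship nvidia-smi; GPU_PROBE lets you substitute another
# command (or a shell function) that prints lines in the same "GPU N: ..." form.
node_has_gpu() {
    probe="${GPU_PROBE:-nvidia-smi}"
    # No probe command at all: treat the node as CPU-only.
    command -v "$probe" >/dev/null 2>&1 || return 1
    # "nvidia-smi -L" prints one "GPU <index>: <name> ..." line per device.
    "$probe" -L 2>/dev/null | grep -q '^GPU '
}

if node_has_gpu; then
    echo "gpu node: $(uname -n)"
else
    echo "cpu-only node: $(uname -n)"
fi
```

Running this on every node (for example through the MPI launcher) gives a list of the hosts on which the CUDA backend cannot find a device.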

Problem on hpl-gpu compilation

David,

I'm having trouble compiling the hpl-gpu code while following your tutorial. I believe I installed Intel MKL and CALDGEMM correctly, so the problem may be in the environment configuration. The linker reports undefined references in the recipe for 'dexe.grd'. Here is what I get when I run make:

/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_ctrmm'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_ch_blkldlslvs_ooc_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_chptrd'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_zskymv'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_slv_omp_nrhs_real'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_zungqr'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_serv_default_progress'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_pds_slv_nrhs_par_real'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sssslv_thr_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_slv_omp_real'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sp_assemble_csr_full'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_iter_ref_seq_real'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_dskymm'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_pds_slv_omp_driver_nrhs_cmplx'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_lapack_lp64_cgetrf'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_clansy'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_zpbtrs'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sp_pds_create_pattern_for_metis_omp'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_zcoomm'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_sparse_s_qr_i4'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_c_pre_cgs_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_gemm_s16s16s32_pack'
...
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_ztrmm'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_cgepack_compact'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_sp_pds_copy_a2l_value_omp_cmplx'
collect2: error: ld returned 1 exit status
Makefile:98: recipe for target 'dexe.grd' failed
make[2]: *** [dexe.grd] Error 1

Have you seen this error before? Could you help me figure this out, please?
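
Undefined references from one MKL shared library into another can mean that the libraries on the link line do not belong together (for example, files from two different MKL installations) or that one MKL layer is missing. That is a guess, not something confirmed in this thread. A quick check is to ask which library, if any, actually defines one of the missing symbols. `which_lib_defines` below is a hypothetical helper built on binutils `nm`; the library path is the one from the log.

```shell
#!/bin/sh
# Hypothetical helper: which_lib_defines SYMBOL LIB...
# Prints every readable library whose dynamic symbol table defines SYMBOL.
which_lib_defines() {
    sym="$1"; shift
    for lib in "$@"; do
        [ -r "$lib" ] || continue
        # nm -D lists dynamic symbols; --defined-only drops undefined ones.
        # The symbol name is the last field; strip any "@VERSION" suffix.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                             END { exit !found }'; then
            echo "$lib"
        fi
    done
}

# Symbol taken from the first error line of the log above.
which_lib_defines mkl_blas_ctrmm /opt/intel/mkl/lib/intel64/libmkl_*.so
```

If nothing is printed, none of the libraries in that directory provides the symbol, which would point towards an incomplete or mixed installation rather than a wrong link order.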

Running './mem -g -2 -c -1 -x -z -l -lh 3072 -lw 3072 -lx 20 -ly 20 -a -u' returns an error message

Using interleaved memory
Running linear and strided tests
Linpack Mode enabled: 20 tiles of size 3072 x 3072 doubles
Running dma-mem-bench, settings: Data Size 30198988800, Data Size GPU 75497472, Map GPU -2, CPU Core -1, Use Only Mapped GPUs 0, Iterations 16, Strided Test: Matrix 3072 x 24576 - Stride: 491520
1 OpenCL Platforms found
Platform 0 Device 0: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 1: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 2: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 3: NVIDIA Corporation Tesla K80 (64 bits)
No CPU device found

I have two CPU cores on this node, but it still returns this error message. What causes this?
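
The listing shows a single OpenCL platform (NVIDIA) that exposes only the four GPU devices, so the message most likely refers to a missing OpenCL CPU device rather than to the host's CPU cores. That reading is an assumption based on the output alone. A sketch for checking it, assuming the `clinfo` utility is installed (`count_cpu_devices` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: read clinfo-style text on stdin and count the devices
# whose "Device Type" line mentions CPU.
count_cpu_devices() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c 'Device Type.*CPU' || true
}

if command -v clinfo >/dev/null 2>&1; then
    echo "OpenCL CPU devices: $(clinfo 2>/dev/null | count_cpu_devices)"
else
    echo "clinfo is not installed on this node"
fi
```

A count of 0 would mean that no installed OpenCL runtime offers a CPU device on this node, regardless of how many cores the machine has.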

I would like to use hpl-gpu with the CUDA backend, but compilation fails.

Compiling caldgemm fails. The log is:

(tensorrt) nvidia@Hewlett-Packard:~/caldgemm$ make -j8
/bin/sh: 1: Syntax error: redirection unexpected
/bin/sh: 1: [: -a: unexpected operator
makefiles/makefile:7: Unknown Architecture:  0, defaulting to x86_64-pc-linux-gnu
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.d: No such file or directory
/bin/sh: 1: Syntax error: redirection unexpected
/bin/sh: 1: [: -a: unexpected operator
makefiles/makefile:7: Unknown Architecture:  0, defaulting to x86_64-pc-linux-gnu
/usr/local/cuda/bin/nvcc --compiler-bindir c++ --use_fast_math --maxrregcount 255 -O4 -Xptxas -v -Xptxas -O4 -Xcompiler -O4 -m64 `for i in 35 61; do echo -n -gencode arch=compute_$i,code=sm_$i\ ;done`  --compiler-options -I/home/nvidia/intel/mkl/include --compiler-options -I/usr/local/openmpi/include/vampirtrace --compiler-options -I"/usr/local/cuda/include" --compiler-options -I"/usr/local/cuda/sdk/common/inc" --compiler-options -DCALDGEMM_CUDA --compiler-options -DCALDGEMM_CUDA_CUBLAS --compiler-options -DUSE_MKL --compiler-options -D_64BIT  --cuda --output-file "release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp" caldgemm_cuda.cu
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT  -Wno-strict-aliasing -c caldgemm.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT  -Wno-strict-aliasing -c benchmark.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/timer.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/qmalloc.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/affinity.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/threadserver.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c caldgemm_cpu.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/qsem.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c caldgemm_adl.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.o
caldgemm_cuda.cu(364): warning: variable "threads" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "blocks" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "threads" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "blocks" was declared but never referenced

ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z20CUDAConversionKernelPKdPdmm' for 'sm_35'
ptxas info    : Function properties for _Z20CUDAConversionKernelPKdPdmm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 352 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17CUDAKernelLinpackPdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z17CUDAKernelLinpackPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z16CUDAKernelALPHA1PdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z16CUDAKernelALPHA1PdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z10CUDAKernelPdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z10CUDAKernelPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z20CUDAConversionKernelPKdPdmm' for 'sm_61'
ptxas info    : Function properties for _Z20CUDAConversionKernelPKdPdmm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 25 registers, 352 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17CUDAKernelLinpackPdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z17CUDAKernelLinpackPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z16CUDAKernelALPHA1PdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z16CUDAKernelALPHA1PdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z10CUDAKernelPdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z10CUDAKernelPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
cat release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp | grep -v NVCC_GREP | sed "s/#pragma detect_mismatch(\"_MSC_VER\", \"1600\")//g" > release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp.tmp
mv -f release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp.tmp release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp
if [ -e "caldgemm_cuda.cu.x86_64-pc-linux-gnu.patch" ]; then patch -r /dev/null -s --no-backup-if-mismatch -i caldgemm_cuda.cu.x86_64-pc-linux-gnu.patch release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp; fi
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -x c++ -Wno-effc++ -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.o
caldgemm_cuda.cu: In member function ‘virtual int caldgemm_cuda::RunCALDGEMM_Exit()’:
caldgemm_cuda.cu:738:55: warning: ‘cudaError_t cudaThreadSynchronize()’ is deprecated [-Wdeprecated-declarations]
  CHKRET(cudaThreadSynchronize(), "Synchronizing CUDA Thread");
                                                       ^
/usr/local/cuda/include/cuda_runtime_api.h:957:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
                                              ^~~~~~~~~~~~~~~~~~~~~
c++ -m64 -Wall -ggdb -fopenmp -flto  -L/usr/local/cuda/lib64 -L/opt/intel/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -L/home/nvidia/intel/mkl/lib/intel64/ -L/home/nvidia/intel/lib/intel64/ release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.o          release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.o       -lrt -ldl -lpthread -lcudart -lcuda -lcublas -liomp5 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread  -o dgemm_bench 
/tmp/cccjW1s5.ltrans1.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'
collect2: error: ld returned 1 exit status
makefiles/makefile:191: recipe for target 'dgemm_bench' failed
make: *** [dgemm_bench] Error 1

Looking forward to your reply.
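
Separately from the final link error, the first lines of the log ("/bin/sh: 1: Syntax error: redirection unexpected" and "[: -a: unexpected operator") have the form of messages printed by dash, which is what /bin/sh usually points to on Debian and Ubuntu. That would mean the makefile's shell snippets expect bash, and it may explain the "Unknown Architecture" fallback; this is an inference from the log, not a confirmed diagnosis. A minimal sketch for checking what /bin/sh resolves to (`sh_target` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: print the real binary behind a shell path (default /bin/sh).
sh_target() {
    readlink -f "${1:-/bin/sh}"
}

echo "/bin/sh -> $(sh_target)"

# If this prints a path ending in "dash", one thing to try is telling make to
# use bash for recipes and $(shell ...) calls. SHELL is a standard make variable:
#   make SHELL=/bin/bash -j8
```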

undefined reference to `fatbinData'

After compiling caldgemm successfully, I get a link error when compiling HPL-GPU.

The log is as follows:

-rpath=~/hpl-gpu/lib -ldl -L/root/cuda-8.0/lib64 -lcudart -lcudadevrt -lcublas -L ~/softwares/software_install/OpenMPI/lib64 -lmpi -lmpi_cxx
/tmp/ccSp3tGD.ltrans28.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'
collect2: error: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1

Environment:
MKL, CUDA 8.0, OpenMPI, CentOS 7

I get the same error with both CUDA 8 and CUDA 9.

What am I doing wrong? Could you give me some advice?
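
The undefined symbol is reported from an object named `*.ltrans*.o`, which is a partition produced by GCC link-time optimisation. One plausible explanation (an assumption, not confirmed here) is that LTO separates the nvcc-generated reference to `fatbinData` from its definition, in which case building without `-flto` is worth trying. Where the flags are set depends on your build configuration, so the sketch below only shows the flag rewrite itself; `drop_lto` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: drop_lto FLAG...
# Prints the flags with -flto / -flto=<n> removed and -fno-lto appended.
drop_lto() {
    out=""
    for flag in "$@"; do
        case "$flag" in
            -flto|-flto=*) ;;            # skip any request for LTO
            *) out="$out $flag" ;;
        esac
    done
    # Strip the single leading space left by the loop, then disable LTO explicitly.
    echo "${out# } -fno-lto"
}

drop_lto -m64 -Wall -ggdb -fopenmp -flto
# prints: -m64 -Wall -ggdb -fopenmp -fno-lto
```

The same change has to reach both the compile and the link commands, since `-flto` on either one can trigger the LTO link step.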
