hpl-gpu's People

Contributors

davidrohr, ddemidov, jbornschein, mattkretz, themarix

hpl-gpu's Issues

Does hpl-gpu with the CUDA backend work on a pure CPU node?

I thought it should; at least that's the impression I got from reading the wiki. But in practice I got this:

CUDA Error 30: unknown error
caldgemm_cuda.cu:154
Getting Device Count
Error initializing CALDGEMM, abborting run

dgemm_bench alone runs on the CPU, on the GPU, and on a hybrid of both. The hpl-gpu build runs on a hybrid CPU+GPU node. But I was trying to test a cluster with some pure CPU nodes and some hybrid nodes, and found that hpl-gpu does not run on the pure CPU ones. Did I do something wrong? Or is there special tuning that I need to do, as with dgemm_bench?
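
The error above is raised while getting the device count, which suggests the CUDA runtime could not initialize on the GPU-less node. Whether the CUDA build is meant to run there is the open question. As a workaround sketch only, a launcher script could check each node for GPUs before choosing which build to start. The helper names `count_nvidia_gpus` and `pick_backend` and the labels "cuda" and "cpu" are hypothetical; `nvidia-smi -L` is the NVIDIA tool call that prints one `GPU n: ...` line per device.

```python
import subprocess


def count_nvidia_gpus(run=subprocess.run):
    """Return the number of GPUs listed by `nvidia-smi -L`, or 0 if unavailable.

    `run` is injectable so the function can be exercised without a GPU.
    """
    try:
        result = run(["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No driver tools installed (typical for a pure CPU node) or the tool hung.
        return 0
    if result.returncode != 0:
        return 0
    # Each device is reported on its own line, e.g. "GPU 0: Tesla K80 (UUID: ...)".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def pick_backend(gpu_count):
    """Hypothetical policy: use the CUDA build only where a GPU was found."""
    return "cuda" if gpu_count > 0 else "cpu"
```

A per-node wrapper could call `pick_backend(count_nvidia_gpus())` and start a CPU-only build on nodes where it returns "cpu". This assumes a separate CPU-only build exists, which the report above does not confirm.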

Problem on hpl-gpu compilation

David,

I'm having trouble compiling hpl-gpu while following your tutorial. I believe I installed Intel MKL and CALDGEMM correctly, so the problem may be in my environment configuration. The linker reports undefined references while building the 'dexe.grd' target. Here's what I get when I run make:

/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_ctrmm'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_ch_blkldlslvs_ooc_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_chptrd'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_zskymv'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_slv_omp_nrhs_real'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_zungqr'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_serv_default_progress'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_pds_slv_nrhs_par_real'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sssslv_thr_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_slv_omp_real'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sp_assemble_csr_full'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_iter_ref_seq_real'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_dskymm'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_pds_slv_omp_driver_nrhs_cmplx'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_lapack_lp64_cgetrf'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_clansy'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_lapack_zpbtrs'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_lp64_sp_pds_create_pattern_for_metis_omp'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_spblas_lp64_mkl_zcoomm'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_sparse_s_qr_i4'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_c_pre_cgs_pardiso'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_gemm_s16s16s32_pack'
...
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_ztrmm'
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so: undefined reference to `mkl_blas_cgepack_compact'
/opt/intel/mkl/lib/intel64/libmkl_core.so: undefined reference to `mkl_pds_sp_pds_copy_a2l_value_omp_cmplx'
collect2: error: ld returned 1 exit status
Makefile:98: recipe for target 'dexe.grd' failed
make[2]: *** [dexe.grd] Error 1

Have you seen this error before? Could you help me figure it out, please?
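
With a list of unresolved symbols this long, it can help to collapse the linker output by library and by symbol family. The sketch below is a hypothetical helper, not part of hpl-gpu or MKL. It parses lines of the form shown above and groups them. In this log the missing names all appear to be MKL-internal (BLAS, LAPACK, PARDISO, sparse BLAS) and are referenced from MKL's own shared libraries. This may point to an incomplete or version-mismatched set of MKL libraries on the link line, but that reading is an assumption, not a confirmed diagnosis.

```python
import os
import re
from collections import defaultdict

# Matches "<path>: undefined reference to `symbol'" as printed by GNU ld.
UNDEF_RE = re.compile(r"^(\S+): undefined reference to [`'](.+?)'$")


def summarize_undefined(log_text):
    """Map each library name to the sorted symbol families it fails to resolve.

    The "family" is the first two underscore-separated parts of the symbol,
    e.g. "mkl_pds" for "mkl_pds_slv_omp_nrhs_real".
    """
    by_lib = defaultdict(set)
    for line in log_text.splitlines():
        match = UNDEF_RE.match(line.strip())
        if not match:
            continue  # skip non-matching lines such as "collect2: error: ..."
        lib = os.path.basename(match.group(1))
        family = "_".join(match.group(2).split("_")[:2])
        by_lib[lib].add(family)
    return {lib: sorted(families) for lib, families in by_lib.items()}
```

Feeding it the full make output (for example, saved with `make 2>&1 | tee build.log`) gives a short per-library summary that is easier to compare against the libraries on the link line.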

Running './mem -g -2 -c -1 -x -z -l -lh 3072 -lw 3072 -lx 20 -ly 20 -a -u' returns an error message

Using interleaved memory
Running linear and strided tests
Linpack Mode enabled: 20 tiles of size 3072 x 3072 doubles
Running dma-mem-bench, settings: Data Size 30198988800, Data Size GPU 75497472, Map GPU -2, CPU Core -1, Use Only Mapped GPUs 0, Iterations 16, Strided Test: Matrix 3072 x 24576 - Stride: 491520
1 OpenCL Platforms found
Platform 0 Device 0: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 1: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 2: NVIDIA Corporation Tesla K80 (64 bits)
Platform 0 Device 3: NVIDIA Corporation Tesla K80 (64 bits)
No CPU device found

This node has two CPU cores, yet it returns this error message. What causes this?

I would like to use hpl-gpu with the CUDA backend, but compilation fails.

Compiling caldgemm failed. The log is:

(tensorrt) nvidia@Hewlett-Packard:~/caldgemm$ make -j8
/bin/sh: 1: Syntax error: redirection unexpected
/bin/sh: 1: [: -a: unexpected operator
makefiles/makefile:7: Unknown Architecture:  0, defaulting to x86_64-pc-linux-gnu
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.d: No such file or directory
makefiles/makefile:334: release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.d: No such file or directory
/bin/sh: 1: Syntax error: redirection unexpected
/bin/sh: 1: [: -a: unexpected operator
makefiles/makefile:7: Unknown Architecture:  0, defaulting to x86_64-pc-linux-gnu
/usr/local/cuda/bin/nvcc --compiler-bindir c++ --use_fast_math --maxrregcount 255 -O4 -Xptxas -v -Xptxas -O4 -Xcompiler -O4 -m64 `for i in 35 61; do echo -n -gencode arch=compute_$i,code=sm_$i\ ;done`  --compiler-options -I/home/nvidia/intel/mkl/include --compiler-options -I/usr/local/openmpi/include/vampirtrace --compiler-options -I"/usr/local/cuda/include" --compiler-options -I"/usr/local/cuda/sdk/common/inc" --compiler-options -DCALDGEMM_CUDA --compiler-options -DCALDGEMM_CUDA_CUBLAS --compiler-options -DUSE_MKL --compiler-options -D_64BIT  --cuda --output-file "release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp" caldgemm_cuda.cu
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT  -Wno-strict-aliasing -c caldgemm.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT  -Wno-strict-aliasing -c benchmark.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/timer.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/qmalloc.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/affinity.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/threadserver.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c caldgemm_cpu.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c cmodules/qsem.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.o
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c caldgemm_adl.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.o
caldgemm_cuda.cu(364): warning: variable "threads" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "blocks" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "threads" was declared but never referenced

caldgemm_cuda.cu(364): warning: variable "blocks" was declared but never referenced

ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z20CUDAConversionKernelPKdPdmm' for 'sm_35'
ptxas info    : Function properties for _Z20CUDAConversionKernelPKdPdmm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 352 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17CUDAKernelLinpackPdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z17CUDAKernelLinpackPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z16CUDAKernelALPHA1PdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z16CUDAKernelALPHA1PdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z10CUDAKernelPdS_S_mmmddm' for 'sm_35'
ptxas info    : Function properties for _Z10CUDAKernelPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 101 registers, 392 bytes cmem[0]
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z20CUDAConversionKernelPKdPdmm' for 'sm_61'
ptxas info    : Function properties for _Z20CUDAConversionKernelPKdPdmm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 25 registers, 352 bytes cmem[0]
ptxas info    : Compiling entry function '_Z17CUDAKernelLinpackPdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z17CUDAKernelLinpackPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z16CUDAKernelALPHA1PdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z16CUDAKernelALPHA1PdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
ptxas info    : Compiling entry function '_Z10CUDAKernelPdS_S_mmmddm' for 'sm_61'
ptxas info    : Function properties for _Z10CUDAKernelPdS_S_mmmddm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 95 registers, 392 bytes cmem[0]
cat release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp | grep -v NVCC_GREP | sed "s/#pragma detect_mismatch(\"_MSC_VER\", \"1600\")//g" > release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp.tmp
mv -f release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp.tmp release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp
if [ -e "caldgemm_cuda.cu.x86_64-pc-linux-gnu.patch" ]; then patch -r /dev/null -s --no-backup-if-mismatch -i caldgemm_cuda.cu.x86_64-pc-linux-gnu.patch release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp; fi
c++ -m64 -D"_AMD64_" -D"_X64_"  -pipe -DGCC_RUNTIME  -flto -Wall -Wno-write-strings -fopenmp -O3 -march=native -msse4.2 -m64 -fweb -frename-registers -minline-all-stringops -mfpmath=sse -ftracer -funroll-loops -fpeel-loops -fprefetch-loop-arrays -ffast-math -fno-stack-protector -ggdb  -x c++ -Wno-effc++ -I/home/nvidia/intel/mkl/include -I/usr/local/openmpi/include/vampirtrace -I"/usr/local/cuda/include" -I"/usr/local/cuda/sdk/common/inc"  -DCALDGEMM_CUDA -DCALDGEMM_CUDA_CUBLAS -DUSE_MKL -D_64BIT   -c release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.cpp -o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.o
caldgemm_cuda.cu: In member function ‘virtual int caldgemm_cuda::RunCALDGEMM_Exit()’:
caldgemm_cuda.cu:738:55: warning: ‘cudaError_t cudaThreadSynchronize()’ is deprecated [-Wdeprecated-declarations]
  CHKRET(cudaThreadSynchronize(), "Synchronizing CUDA Thread");
                                                       ^
/usr/local/cuda/include/cuda_runtime_api.h:957:46: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
                                              ^~~~~~~~~~~~~~~~~~~~~
c++ -m64 -Wall -ggdb -fopenmp -flto  -L/usr/local/cuda/lib64 -L/opt/intel/compilers_and_libraries_2016.2.181/linux/compiler/lib/intel64 -L/home/nvidia/intel/mkl/lib/intel64/ -L/home/nvidia/intel/lib/intel64/ release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cu/caldgemm_cuda.o          release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/benchmark.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/timer.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qmalloc.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_cpu.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/affinity.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/threadserver.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/cmodules/qsem.o release/x86_64-pc-linux-gnu_64EXECUTABLE_dgemm_bench/cpp/caldgemm_adl.o       -lrt -ldl -lpthread -lcudart -lcuda -lcublas -liomp5 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread  -o dgemm_bench 
/tmp/cccjW1s5.ltrans1.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'
collect2: error: ld returned 1 exit status
makefiles/makefile:191: recipe for target 'dgemm_bench' failed
make: *** [dgemm_bench] Error 1

Looking forward to your reply.
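
One detail in the log above may be worth checking separately from the final link error: the `/bin/sh: 1: ...` messages at the top. That message shape is typical of dash rather than bash, so the makefile's shell snippets may be running under a different shell than they expect. The sketch below uses hypothetical helper names. It pulls those messages out of a build log and shows what `/bin/sh` resolves to on the local machine. If it turns out not to be bash, running `make SHELL=/bin/bash` might be worth trying; that is an assumption, not a confirmed fix.

```python
import os
import re

# Matches shell diagnostics of the form "/bin/sh: <line>: <message>".
SH_ERROR = re.compile(r"^/bin/sh: (\d+): (.+)$")


def sh_errors(log_text):
    """Collect (line_number, message) pairs reported by /bin/sh in a build log."""
    errors = []
    for line in log_text.splitlines():
        match = SH_ERROR.match(line.strip())
        if match:
            errors.append((int(match.group(1)), match.group(2)))
    return errors


def sh_target(path="/bin/sh"):
    """Resolve symlinks to show which shell binary `path` actually points to."""
    return os.path.realpath(path)
```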

undefined reference to `fatbinData'

After compiling caldgemm successfully, I got a link error while compiling HPL-GPU.

Log as follows:

-rpath=~/hpl-gpu/lib -ldl -L/root/cuda-8.0/lib64 -lcudart -lcudadevrt -lcublas -L ~/softwares/software_install/OpenMPI/lib64 -lmpi -lmpi_cxx
/tmp/ccSp3tGD.ltrans28.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'
collect2: error: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1

Environment:
MKL, CUDA 8.0, OpenMPI, CentOS 7

I got the same error with CUDA 8 and CUDA 9.

What am I doing wrong? Can you give me some advice?
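
The object named in the error, `ccSp3tGD.ltrans28.ltrans.o`, is a temporary file of the kind GCC produces during link-time optimization, so the unresolved `fatbinData` reference appears to arise in LTO-generated code rather than in a regular object file. The sketch below is a hypothetical helper that flags such lines in a build log. One thing worth trying is a build without `-flto` in the compile and link flags; that is an assumption, not a confirmed fix.

```python
import re

# Matches e.g.
# "/tmp/ccXXXX.ltrans28.ltrans.o:(.nvFatBinSegment+0x8): undefined reference to `fatbinData'"
LTRANS_UNDEF = re.compile(
    r"^(?P<obj>\S+\.ltrans\d*\.ltrans\.o):\((?P<section>[^)]+)\): "
    r"undefined reference to [`'](?P<sym>[^']+)'"
)


def lto_undefined(log_text):
    """Return (symbol, section) pairs for undefined references in LTO temp objects."""
    found = []
    for line in log_text.splitlines():
        match = LTRANS_UNDEF.match(line.strip())
        if match:
            found.append((match.group("sym"), match.group("section")))
    return found
```

An empty result would mean the undefined references come from ordinary objects or libraries, in which case LTO is less likely to be involved.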
