Comments (10)
Hi, can you specify the error you are getting?
Thanks for your reply. Here is the error log on Linux:
rm: cannot remove 'lattice_filter.so': No such file or directory
-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/g++-5
-- Check for working CXX compiler: /usr/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /hdd2/PythonCodes//Modules/Networks/CRF/CRFasRNN_tensorflow/CRFasRNNLayer/build_dir
Scanning dependencies of target lattice_filter
[ 25%] Building CXX object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o
[ 50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(208): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(213): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(63): warning: calling a constexpr host function("real") from a host device function("bfloat16") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(63): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(66): warning: calling a constexpr host function("real") from a host device function("bfloat16") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(66): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(157): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/lib/bfloat16/bfloat16.h(161): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(57): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(304): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(305): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(57): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(304): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/arena_impl.h(305): warning: integer conversion resulted in a change of sign
/home/schen/anaconda3/lib/python3.6/site-packages/tensorflow/include/google/protobuf/generated_message_reflection.h(685): warning: variable "unused" was set but never used
[ 75%] Linking CUDA device code CMakeFiles/lattice_filter.dir/cmake_device_link.o
nvlink error : Entry function '__nv_static_67__54_tmpxft_00000a82_00000000_6_LatticeFilterKernel_cpp1_ii_46244a3b__Z10splatCacheIdLi3ELi33EEviPKT_P11MatrixEntryIS0_E12HashTableGPUIS0_XT0_EXT1_EE' uses too much shared data (0x10c00 bytes, 0xc000 max)
nvlink error : Entry function '__nv_static_67__54_tmpxft_00000a82_00000000_6_LatticeFilterKernel_cpp1_ii_46244a3b__Z10splatCacheIdLi4ELi33EEviPKT_P11MatrixEntryIS0_E12HashTableGPUIS0_XT0_EXT1_EE' uses too much shared data (0x10c00 bytes, 0xc000 max)
CMakeFiles/lattice_filter.dir/build.make:98: recipe for target 'CMakeFiles/lattice_filter.dir/cmake_device_link.o' failed
make[2]: *** [CMakeFiles/lattice_filter.dir/cmake_device_link.o] Error 255
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/lattice_filter.dir/all' failed
make[1]: *** [CMakeFiles/lattice_filter.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
cp: cannot stat 'lattice_filter.so': No such file or directory
So it's a compilation error you get, but only when you set INPUT_CHANNELS greater than 30?
Yes. I haven't checked the exact threshold between 20 and 30, but when I set it to 3 or 20 it works fine. Values like 26-30 give compilation errors.
From the error you posted, it seems that you are running out of memory.
There is no easy fix for this, as it requires deep changes to the program.
Perhaps the "easy way" would be to try a better GPU if you have one around, but even then, if you kept increasing the channels you would run into the same problem again.
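To be precise, the nvlink message refers to per-block shared memory rather than total GPU memory. Decoding the hex values from your log (just a quick sanity check, not code from this repo):

    # Values copied from the nvlink error above.
    requested = 0x10c00  # shared memory the splatCache kernel wants per block
    limit = 0xc000       # per-block limit nvlink reports

    print(requested)          # 68608 bytes (67 KB)
    print(limit)              # 49152 bytes (48 KB)
    print(requested - limit)  # 19456 bytes over the limit

The 48 KB figure is the usual per-block limit for statically allocated shared memory on NVIDIA GPUs, and the kernel's shared-memory footprint grows with the number of channels, which is why larger INPUT_CHANNELS values fail at the device-link step.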
I have the same feeling that it's a limitation of GPU memory, so maybe I could try the CPU version and see whether I can use more input channels.
You said that we could change the CMakeLists.txt to compile only the CPU version, but I don't have much experience with C++. Would you mind showing me how to modify it?
It has nothing to do with C++; it is the CMakeLists.txt that would have to be modified, and that is a meta-language telling the compiler what to do.
That being said, it is not necessary to change anything. The CPU version works for any number of input and reference channels, regardless of how the GPU code is compiled.
The INPUT_CHANNELS flag only influences the GPU version.
As a result, you can compile the GPU version for a small number of channels and still use the CPU version with 30 channels.
However, if you use 30 channels with the CPU version, it's likely going to take forever...
Does that mean the CPU version will automatically be applied only if we don't import lattice_filter_op_loader.module, just like what you did in the first test.py in the CRFasRNN layer?
No, it is not automatic. TensorFlow uses the GPU by default; you have to tell it to use the CPU explicitly.
Something like:
    with tf.device('/cpu:0'):
        # code here
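For a fuller picture, here is a minimal sketch; the op name, its keyword argument, and the tensor shapes below are illustrative assumptions, so check lattice_filter_op_loader and the op registration for the real signature:

    import tensorflow as tf
    import lattice_filter_op_loader

    module = lattice_filter_op_loader.module  # wraps the compiled lattice_filter.so

    # Hypothetical inputs: unaries to be filtered and an RGB reference image.
    unaries = tf.placeholder(tf.float32, shape=[1, 128, 128, 21])
    reference = tf.placeholder(tf.float32, shape=[1, 128, 128, 3])

    # Ops created inside this context are pinned to the CPU, so the op's CPU
    # kernel runs regardless of how the GPU code was compiled.
    with tf.device('/cpu:0'):
        # op name and keyword are assumptions -- see the repo for the exact call
        output = module.lattice_filter(unaries, reference, bilateral=True)

Anything created outside the with block keeps TensorFlow's default GPU-first placement.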
Oh, that will make the deep learning super slow... But thanks for the explanation. I will try not to use too many input channels.