miguelmonteiro / permutohedral_lattice

Permutohedral Lattice C++/CUDA implementation + TensorFlow Op (CPU/GPU)

C++ 98.78% CMake 0.06% Cuda 0.84% Shell 0.03% Python 0.29%

tensorflow bilateral-filter gaussian-filter permutohedral-lattice-algorithm gpu cuda cpp conditional-random-fields filter

permutohedral_lattice's Introduction

This code implements the Permutohedral Lattice for high dimensional filtering. Read the original paper. If you use this work please consider citing our paper in addition to the original one.

The code contains:

A CPU implementation (C++);
A GPU implementation (C++/CUDA);
TensorFlow Op Kernels that wrap the CPU and GPU implementations to be used in Python/TensorFlow;

This code can be used to perform (approximate) bilateral filtering, Gaussian filtering, non-local means, etc. It also supports an arbitrary number of spatial dimensions, input channels and reference channels.
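To make concrete what is being approximated, here is a minimal sketch of an exact, brute-force bilateral filter on a 1D signal. It is not the permutohedral lattice algorithm and is not taken from this repository. The function name bilateral_filter_1d is hypothetical, and the reading of theta_alpha as the spatial scale and theta_beta as the reference-value scale is an assumption borrowed from the parameter names of the op.

```python
import math


def bilateral_filter_1d(values, reference, theta_alpha, theta_beta):
    """Exact O(n^2) bilateral filter of `values`, guided by `reference`.

    Assumption: theta_alpha scales spatial distance and theta_beta scales
    differences in the reference signal.
    """
    n = len(values)
    out = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(n):
            # Gaussian weight on spatial distance and on reference difference.
            spatial = ((i - j) / theta_alpha) ** 2
            appearance = ((reference[i] - reference[j]) / theta_beta) ** 2
            w = math.exp(-0.5 * (spatial + appearance))
            weighted_sum += w * values[j]
            weight_total += w
        # Normalise so each output is a weighted average of the inputs.
        out.append(weighted_sum / weight_total)
    return out


# A noisy step edge, filtered using itself as the reference.
signal = [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9]
smoothed = bilateral_filter_1d(signal, signal, theta_alpha=2.0, theta_beta=0.3)
```

The noise on each side of the step is averaged out while the step itself is kept, because samples across the edge get very small weights. The cost grows quadratically with the number of samples, which is the motivation for approximate schemes such as the lattice.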

The TensorFlow op has gradients implemented, so it can be used with backpropagation, and it supports batch_size >= 1. This code was made to be used as part of larger algorithms such as Conditional Random Fields (CRFs).

How to compile and use

Install CMake (version >= 3.9).
Open the file build.sh and change the variables CXX_COMPILER and CUDA_COMPILER to the path of the C++ and nvcc (CUDA) compilers on your machine.
To compile the code run:

sh build.sh

This will create a directory called build_dir which will contain the compiled code.

Caveats

This script will try to compile code for both CPU and GPU at the same time, so if you don't want the GPU part (and want the script to run) you must change CMakeLists.txt.

Because of the way the GPU (CUDA) code is implemented, the number of spatial dimensions and the number of channels of the input and reference images must be known at compile time. These can be set in the build.sh script by changing the variables SPATIAL_DIMS, INPUT_CHANNELS and REFERENCE_CHANNELS. If you only need the CPU version, these variables have no effect on it and the values can be chosen at run time.
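Since a GPU build is tied to one combination of these three values, it can help to check tensor shapes against them before calling the op. The following is a minimal sketch, not part of this repository: the helper check_shapes is hypothetical, and it assumes the (batch, spatial..., channels) layout used in the TensorFlow example in this README.

```python
def check_shapes(input_shape, reference_shape,
                 spatial_dims, input_channels, reference_channels):
    """Return a list of problems; an empty list means the shapes look consistent.

    Assumed layout: (batch, spatial..., channels).
    """
    problems = []
    expected_rank = spatial_dims + 2  # batch + spatial dimensions + channels
    for name, shape, channels in (
        ("input", input_shape, input_channels),
        ("reference", reference_shape, reference_channels),
    ):
        if len(shape) != expected_rank:
            problems.append(f"{name}: rank {len(shape)}, expected {expected_rank}")
        elif shape[-1] != channels:
            problems.append(f"{name}: {shape[-1]} channels, expected {channels}")
    # Batch and spatial sizes must agree between the two tensors.
    if (len(input_shape) == len(reference_shape)
            and tuple(input_shape[:-1]) != tuple(reference_shape[:-1])):
        problems.append("input and reference differ in batch or spatial size")
    return problems


# Values matching SPATIAL_DIMS=2, INPUT_CHANNELS=1, REFERENCE_CHANNELS=3.
ok = check_shapes((2, 32, 32, 1), (2, 32, 32, 3), 2, 1, 3)
bad = check_shapes((2, 32, 32, 3), (2, 32, 32, 3), 2, 1, 3)
```

Here ok is an empty list, while bad reports that the input has 3 channels where the build expects 1.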

Example Usage

CPU C++

./build_dir/test_bilateral_cpu Images/input.bmp Images/output.bmp 8 0.125

GPU C++/CUDA

./build_dir/test_bilateral_gpu Images/input.bmp Images/output.bmp 8 0.125

TensorFlow Python

Look into TFOpTests for actual working examples.

Example of bilateral filtering a 2D grayscale image guided by an RGB image. On GPU, compile with SPATIAL_DIMS=2, INPUT_CHANNELS=1 and REFERENCE_CHANNELS=3.

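import tensorflow as tf
import lattice_filter_op_loader

# The loader exposes the compiled op library as `module`.
module = lattice_filter_op_loader.module

# Example sizes only; replace with the dimensions of your images.
batch_size, width, height = 1, 64, 64

# tf.placeholder requires a dtype as its first argument.
input = tf.placeholder(tf.float32, shape=(batch_size, width, height, 1))
reference = tf.placeholder(tf.float32, shape=(batch_size, width, height, 3))

output = module.lattice_filter(input, reference, bilateral=True, theta_alpha=8, theta_beta=0.125)

# Then run the graph, load, save images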

Known Issues

The GPU version must know SPATIAL_DIMS, INPUT_CHANNELS and REFERENCE_CHANNELS at compile time.
Sometimes the op does not default to the GPU. The cause is unknown.
The CPU and GPU versions don't produce exactly the same result (about 0.2% difference). This has to do with the implementation.
The gradients of the TensorFlow Op don't match numerically calculated gradients for some values of the various theta parameters. I suspect it has something to do with numerical issues when dividing by numbers close to zero.
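For reference, a gradient check of the kind mentioned in the last item can be sketched with central differences. This is a generic illustration in plain Python, not the test code from this repository: the helpers numerical_gradient and max_relative_error are hypothetical, and the quadratic function stands in for the filter.

```python
def numerical_gradient(f, x, eps=1e-4):
    """Central-difference estimate of the gradient of scalar f at point x."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def max_relative_error(analytic, numeric, floor=1e-8):
    """Largest relative difference; `floor` avoids dividing by values near zero."""
    return max(
        abs(a - n) / max(abs(a), abs(n), floor)
        for a, n in zip(analytic, numeric)
    )


# Stand-in function: f(x) = sum(x_i^2), whose exact gradient is 2 * x.
def f(x):
    return sum(v * v for v in x)


x = [0.5, -1.0, 2.0]
analytic = [2.0 * v for v in x]
numeric = numerical_gradient(f, x)
error = max_relative_error(analytic, numeric)
```

The floor in the denominator is there because a relative error blows up when both gradients are close to zero, which may be related to the numerical issues suspected above.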

Collaborators are welcome


permutohedral_lattice's Issues

About Speed acceleration

Your work is very valuable. I have a stupid question: how many times faster is the CUDA version compared to the CPU version? I ask because I don't have a GPU right now.

Limitation of the input_channels

Thanks for the CUDA implementation! It works well in my case.
But there seems to be a limit on the number of input channels. When I set it above 30, I get an error about the Hashtable when building. My GPU is an Nvidia 1080. Do you have the same problem?

Error after compilation

Hi,
I have an issue that makes me unable to use the code. The error comes up during the build. If I compile with CXX_COMPILER=/usr/bin/g++-4.8, I get this error message:

-- The CXX compiler identification is GNU 4.8.5
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CXX compiler: /usr/bin/g++-4.8
-- Check for working CXX compiler: /usr/bin/g++-4.8 -- broken
CMake Error at /usr/local/share/cmake-3.13/Modules/CMakeTestCXXCompiler.cmake:45 (message):
 The C++ compiler

    "/usr/bin/g++-4.8"

  is not able to compile a simple test program.

  It fails with the following output:


Change Dir: /home/path/CRFasRNNLayer/permutohedral_lattice/build_dir/CMakeFiles/CMakeTmp
    Run Build Command:"/usr/bin/make" "cmTC_756dd/fast"
    /usr/bin/make -f CMakeFiles/cmTC_756dd.dir/build.make CMakeFiles/cmTC_756dd.dir/build
    make[1]: Entering directory '/home/path/CRFasRNNLayer/permutohedral_lattice/build_dir/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_756dd.dir/testCXXCompiler.cxx.o
    /usr/bin/g++-4.8    -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe    -o CMakeFiles/cmTC_756dd.dir/testCXXCompiler.cxx.o -c /home/path/CRFasRNNLayer/permutohedral_lattice/build_dir/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    g++-4.8: error: unrecognized command line option ‘-std=c++17’
    g++-4.8: error: unrecognized command line option ‘-fstack-protector-strong’
    g++-4.8: error: unrecognized command line option ‘-fno-plt’
    CMakeFiles/cmTC_756dd.dir/build.make:65: recipe for target 'CMakeFiles/cmTC_756dd.dir/testCXXCompiler.cxx.o' failed
    make[1]: *** [CMakeFiles/cmTC_756dd.dir/testCXXCompiler.cxx.o] Error 1
    make[1]: Leaving directory '/home/path/CRFasRNNLayer/permutohedral_lattice/build_dir/CMakeFiles/CMakeTmp'
    Makefile:121: recipe for target 'cmTC_756dd/fast' failed
    make: *** [cmTC_756dd/fast] Error 2
   
  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:3 (project)


-- Configuring incomplete, errors occurred!

But if I use g++-7, I can compile but I get this error message when I run the test.

Traceback (most recent call last):
  File "Tests/greyscale_test.py", line 8, in <module>
    import lattice_filter_op_loader
  File "/home/path/CRFasRNNLayer/permutohedral_lattice/lattice_filter_op_loader.py", line 29, in <module>
    module = tf.load_op_library(path.join(path.dirname(path.abspath(__file__)), 'lattice_filter.so'))
  File "/home/me/anaconda3/envs/test/lib/python3.7/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/path/CRFasRNNLayer/permutohedral_lattice/lattice_filter.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

I am on Ubuntu 18, python 3.7.
Thanks in advance!
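One possible cause, offered as a guess rather than a confirmed diagnosis: the trailing Ss in the missing symbol is the Itanium C++ mangling for the pre-C++11-ABI std::string, so the op and the installed TensorFlow binary may have been built with different values of _GLIBCXX_USE_CXX11_ABI. The sketch below reads that value from a list of compiler flags. The helper names are hypothetical; tf.sysconfig.get_compile_flags() is the TensorFlow call that reports the flags the installed binary expects.

```python
def cxx11_abi_flag(compile_flags):
    """Return the value of -D_GLIBCXX_USE_CXX11_ABI in a flag list, or None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return flag[len(prefix):]
    return None


def tensorflow_abi_flag():
    """Look up the flag for the installed TensorFlow (TensorFlow must be importable)."""
    import tensorflow as tf  # imported lazily so the helper above works without it
    return cxx11_abi_flag(tf.sysconfig.get_compile_flags())


# Example with a hand-written flag list (hypothetical values).
example_flags = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
example_value = cxx11_abi_flag(example_flags)
```

If tensorflow_abi_flag() returns a value, compiling the op with the same define is worth trying. Whether build.sh already passes this define is not stated here, so check the script before changing it.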

About 3D Version

Hi,
Thanks for your nice work.
By the way, I want to know whether the 3D version of this implementation can be used directly for irregular discrete points?

Changes for CPU compilation

Could you please list all the changes required in CMakeLists.txt for a CPU-only version of this code?

About kernel normalization

Hi @MiguelMonteiro,

Thanks for sharing your code!

I know you no longer work on this project, but I would like to ask just a quick question that I hope wouldn't take much of your time.

In Krähenbühl's paper, first paragraph of Section 5, he mentioned two types of normalization (symmetric and asymmetric) that greatly increase the accuracy of the approximate filtering. Do you remember doing anything like that in your code?

Thank you in advance for your response!

Kernel size

Thanks for the effort of implementing the code; it is good to know it supports multiple batches. After reading the code, can I assume this bilateral filter is applied on a densely connected graph? Say, pixel (0,0) is jointly computed with all other pixels (i,j) except itself. Please correct me if I am wrong.

Regards,

Could you share this project's environment?

I successfully built this project, but encountered this problem:
tensorflow.python.framework.errors_impl.NotFoundError: /home/data/lixg/Mig_CRF/permutohedral_lattice/lattice_filter.so: undefined symbol: _ZN10tensorflow7strings6StrCatERKNS0_8AlphaNumE
So I guess it is related to the environment.

SPATIAL_DIMS, INPUT_CHANNELS and REFERENCE_CHANNELS setting

Thanks for sharing your source code. I'm doing medical image segmentation, and I'd like to add a CRF-RNN layer to the end of a U-Net model. My input data is a 3D MRA image and the ground truth includes only one label. Is it correct to set SPATIAL_DIMS=3, INPUT_CHANNELS=3 and REFERENCE_CHANNELS=3?

Looking forward to your reply.
Thanks

Permutohedral Lattice for 3D pointcloud

Hi,

Thanks for your work in this repo. I am currently implementing CRF-RNN for 3D pointcloud, but it's very hard for me to understand the Permutohedral Lattice. I decided to implement CRF-RNN without the lattice, but it seems very slow and the improvement is not significant.

Would you be kind enough to point me to how to adapt the lattice implementation for 3D points?
They have X, Y, Z as spatial features and intensity as a non-spatial feature, so the feature space would be at least 4D.

Hasan
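As an illustration of the 4D feature space described in this question, here is a minimal sketch that builds one feature vector per point by dividing the coordinates and the intensity by a scale each, which is a common construction in bilateral filtering. The function lattice_features is hypothetical, the use of theta_alpha and theta_beta as those scales is an assumption, and nothing here shows that the op in this repository accepts a sparse list of points.

```python
def lattice_features(points, intensities, theta_alpha, theta_beta):
    """Build one 4D feature vector per point: scaled (x, y, z) plus scaled intensity.

    Assumption: theta_alpha is the spatial scale and theta_beta the intensity scale.
    """
    features = []
    for (x, y, z), intensity in zip(points, intensities):
        features.append([
            x / theta_alpha,
            y / theta_alpha,
            z / theta_alpha,
            intensity / theta_beta,
        ])
    return features


# Two sample points with made-up coordinates and intensities.
points = [(8.0, 16.0, 0.0), (0.0, 4.0, 2.0)]
intensities = [0.25, 0.5]
features = lattice_features(points, intensities, theta_alpha=8.0, theta_beta=0.125)
```

Points that are close in this scaled space are the ones a bilateral-style filter would mix most strongly.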

15 errors detected in the compilation of "/tmp/tmpxft_00002c51_00000000-6_LatticeFilterkernal.cpp1.ii"

Hi, I tried to use your source code to compile lattice_filter.so, but there are many errors when I execute
sh build.sh
The final message is 15 errors detected in the compilation of "/tmp/tmpxft_00002c51_00000000-6_LatticeFilterkernal.cpp1.ii". Could you help me find what leads to these errors?
Thank you

Tensorflow version

Hello, what is the version of tensorflow and cuda you use?

one issue when compiling

Hello!
I use Ubuntu 16.04, Python 3.5 and TensorFlow 1.4.
My g++ version is 5 and my CUDA version is 8.0.

At the beginning of compilation, several messages show that both compilers work well.
All the errors occur after 50% of the compile task is done, and I see this:
[50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterkernel.cu.o

Scanning dependencies of target lattice_filter
[ 25%] Building CXX object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o
[ 50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o

About the build type

Hi Miguel,

I noticed that in the build script you used cmake to build the project in debug mode: cmake -DCMAKE_BUILD_TYPE=Debug.

Should we expect better performance if we build it in release mode? I tried setting -DCMAKE_BUILD_TYPE=Release but this produced errors.

Sorry if the question is stupid.

Thank you in advance for your answer!

The purpose of the reverse in the blur stage

First of all, thank you so much for sharing your implementation.
Could you please explain why the parameter reverse is set to false in the bilateral filter and true in the back propagation stage?

cuda_runtime.h: No such file or directory

Hello! thanks for sharing this code! 😁 😁

Unfortunately, I cannot compile the code. When I run bash build.sh I obtain the following error:

-- The CXX compiler identification is GNU 5.5.0
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CXX compiler: /usr/bin/g++-5
-- Check for working CXX compiler: /usr/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /home/gabriele/venvs/conda/miniconda3/envs/tf114/bin/nvcc
-- Check for working CUDA compiler: /home/gabriele/venvs/conda/miniconda3/envs/tf114/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: path/to/crf_as_rnn/permutohedral_lattice/build_dir
Scanning dependencies of target lattice_filter
[ 25%] Building CXX object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o
In file included from path/to/crf_as_rnn/permutohedral_lattice/src/LatticeFilterKernel.h:24:0,
from path/to/crf_as_rnn/permutohedral_lattice/src/LatticeFilterKernel.cpp:21:
path/to/crf_as_rnn/permutohedral_lattice/src/DeviceMemoryAllocator.h:27:26: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
CMakeFiles/lattice_filter.dir/build.make:62: recipe for target 'CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o' failed
make[2]: *** [CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/lattice_filter.dir/all' failed
make[1]: *** [CMakeFiles/lattice_filter.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

In particular, I use a conda virtual environment with tensorflow 1.14 (at /home/gabriele/venvs/conda/miniconda3/envs/tf114), and I have CUDA Version: 10.2.
Inside the build.sh, I modified the following variables:

CUDA_COMPILER=$CONDA_PREFIX/bin/nvcc
CXX_COMPILER=/usr/bin/g++-5
CUDA_INCLUDE=$CONDA_PREFIX/lib/python3.7/site-packages/tensorflow/include

Any idea what could raise the error? It seems there is no cuda_runtime.h

Thanks a lot!

about Batch_size

Hi, Miguel,

When batch_size > 1, is the memory that needs to be allocated the same as for batch_size = 1, or does it grow with the batch size?

Thanks,

Pytorch version

Dear author, this is such a useful resource. Is it possible to release a PyTorch version soon? Thanks.

lattice_filter.so

Hi,

Where is 'lattice_filter.so'?

Thanks,
ZZW

3D pointcloud

Hi,
I would like to know how could I use this code for 3D point cloud.

From what I understand, I would need to set SPATIAL_DIMS=3, but this would require me to have a 3D point everywhere. What I have instead is a list of sparse 3D points that do not fill the whole space and are not regularly spaced. Each point has a 3D feature vector (RGB).

So basically, my question is: can I use your code if I have sparsity in the spatial dimensions?

Thanks

Build.sh runs successfully, but no ./build_dir is created

Hi,

I managed to run build.sh after changing the file to use the default g++ and nvcc I have installed. However, at some point I get the following messages:

[ 50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o                                                                                

/home/gustavo/.local/lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(495): warning: expression has no effect
/home/gustavo/.local/lib/python3.6/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eige/src/Core/MathFunctions.h(1288): warning: calling a constexpr __host__ function("real") from a __host__ __device__ function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

After this, I tried to update the CMakeLists.txt file with the flag
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} --expt-relaxed-constexpr")

However, this did not fix it. Would you have any idea how to solve this?

Thanks

Get weird error when using Bilateral Filter

Hello, when I run a test on a small demo, I get this error:

2018-10-12 10:54:42.372961: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-10-12 10:54:42.487787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.86
pciBusID: 0000:02:00.0
totalMemory: 7.92GiB freeMemory: 26.00MiB
2018-10-12 10:54:42.487833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-10-12 10:54:42.489801: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 26.00M (27262976 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-10-12 10:54:42.557414: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-10-12 10:54:42.557467: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-10-12 10:54:42.557484: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)

and this is my small demo:

import numpy as np
import tensorflow as tf
from lattice_filter_op_loader import module

ims = tf.placeholder(tf.float32, [None, None, None, 3], name='img')
labels = tf.placeholder(tf.int32, [None, None, None, 1], name='labels')

feat = tf.layers.conv2d(ims, 3, [3, 3])
a = tf.nn.relu(feat)
feat = tf.layers.conv2d(a, 3, [3, 3])
b = tf.nn.relu(feat)
_a = tf.split(a, 3, axis=-1)
_b = tf.split(b, 3, axis=-1)
tmp = []
for __a, __b in zip(_a, _b):
    tmp.append(module.lattice_filter(__b, __a, bilateral=True, theta_alpha=8.0, theta_beta=0.125))
final = tf.concat(tmp, axis=-1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(final, feed_dict={ims: np.random.rand(2, 32, 32, 3)})

Please help me.

Cuda Debug informations

Hello,

Inside the memory allocator, on lines 96, 102 and 106, there are naked CUDA calls. They don't deliver any debug information. For example, if the GPU runs out of memory, the program exits when it checks the createLattice function. This error is hard to find. A simple solution would be to add some checks after the CUDA calls.

greets Sebastian

Pytorch?

Hi,
Just to know, will there be a PyTorch extension of this code?
Thanks

Possibly incorrect number of blocks/blockSize?

I've been working on some adaptation of your code and noticed something inconsistent:

permutohedral_lattice/src/PermutohedralLatticeGPU.cuh

Line 476 in fe04895

blockSize.y = 1;

Shouldn't this be blocks.y = 1;? blockSize.y seems to be equal to one already. As I understand it, the result is that the same computation is run pd + 1 times, though it's not entirely clear whether this is harmful.

Questions about REFERENCE_CHANNELS

Hello author, I don't understand REFERENCE_CHANNELS. Could you please explain what that variable means?

Thank you very much!