Different error when running test: Undefined symbol: _ZN10tensorflow3PadERKN5Eigen9GpuDeviceEPKfiiiiiiPf (flownet2-tf)

fperezgamonal commented on August 24, 2024

from flownet2-tf.

Comments (10)

ahmedshingaly commented on August 24, 2024

In the file stylegan2/dnnlib/tflib/custom_ops.py, line 127, change
compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=0\''
to
compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=1\''
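Instead of hard-coding 0 or 1, the value could be taken from the installed TensorFlow build, so the custom op is always compiled with the same C++ ABI as TensorFlow itself. A minimal sketch, assuming your TensorFlow provides tf.sysconfig.get_compile_flags() and that its output includes a -D_GLIBCXX_USE_CXX11_ABI=... entry; the helper names abi_flag and nvcc_compiler_options are hypothetical and are not part of stylegan2's custom_ops.py:

```python
import re
from typing import Iterable, Optional

# Matches exactly "-D_GLIBCXX_USE_CXX11_ABI=0" or "-D_GLIBCXX_USE_CXX11_ABI=1".
ABI_RE = re.compile(r"-D_GLIBCXX_USE_CXX11_ABI=[01]")


def abi_flag(compile_flags: Iterable[str]) -> Optional[str]:
    """Return the ABI define from a list of compiler flags, or None if absent."""
    for flag in compile_flags:
        if ABI_RE.fullmatch(flag):
            return flag
    return None


def nvcc_compiler_options(compile_flags: Iterable[str]) -> str:
    """Build the string appended to compile_opts: -fPIC plus the ABI define
    (when TensorFlow reports one), forwarded to the host compiler by nvcc."""
    host_flags = ["-fPIC"]
    flag = abi_flag(compile_flags)
    if flag is not None:
        host_flags.append(flag)
    return " --compiler-options '%s'" % " ".join(host_flags)


if __name__ == "__main__":
    try:
        import tensorflow as tf
        # Assumption: this TensorFlow build provides get_compile_flags().
        flags = tf.sysconfig.get_compile_flags()
    except (ImportError, AttributeError):
        # Illustrative fallback so the sketch runs without TensorFlow.
        flags = ["-D_GLIBCXX_USE_CXX11_ABI=1"]
    print(nvcc_compiler_options(flags))
```

In custom_ops.py this would amount to something like compile_opts += nvcc_compiler_options(tf.sysconfig.get_compile_flags()) in place of the hard-coded line, untested against that file.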

Vedant2311 commented on August 24, 2024

Hi.

I am facing a similar issue as well. I am trying to run a pre-trained StyleGAN model (https://github.com/NVlabs/stylegan2) in JupyterLab, in a TensorFlow 1.14 GPU environment.

So, when I run the command given in the link,

python run_generator.py generate-images --network=gdrive:networks/stylegan2-ffhq-config-f.pkl --seeds=6600-6625 --truncation-psi=0.5

I get the following error:

tensorflow.python.framework.errors_impl.NotFoundError: /trainman-mount/trainman-storage-d2b580e4-067b-44d3-9be3-be48cc5f0d71/stylegan2/dnnlib/tflib/_cudacache/fused_bias_act_1ac15fee5b354fc0d3aa1e7f98502e64.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputESs

I have no idea what this _ZN10tensorflow12OpDefBuilder6OutputESs means, but it seems similar to the one raised in this thread. I also searched for solutions to this error, but all of them revolve around modifying a Makefile, and no Makefile is involved in my case since I am just running Python code.

Any help will be much appreciated :)
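For anyone wondering the same thing: these are Itanium C++ ABI mangled names, and c++filt (GNU binutils) turns them back into signatures; _ZN10tensorflow12OpDefBuilder6OutputESs is tensorflow::OpDefBuilder::Output(std::string). The trailing Ss is how the old (pre-C++11) libstdc++ ABI mangles std::string, while code built with -D_GLIBCXX_USE_CXX11_ABI=1 mangles it as std::__cxx11::basic_string, so an old-ABI reference cannot bind to a new-ABI definition. A rough heuristic sketch; the name string_abi is hypothetical and it is not a demangler (it only skips length-prefixed identifiers so letters inside names are not misread):

```python
def string_abi(symbol: str) -> str:
    """Heuristically report which libstdc++ std::string ABI a mangled symbol
    uses: "cxx11", "pre-cxx11", or "unknown" (no std::string detected)."""
    saw_old_string = False
    i = 0
    while i < len(symbol):
        if symbol[i].isdigit():
            # <source-name> ::= <length> <identifier>; skip the identifier.
            j = i
            while j < len(symbol) and symbol[j].isdigit():
                j += 1
            name = symbol[j:j + int(symbol[i:j])]
            # New-ABI strings live in std::__cxx11 or carry an [abi:cxx11] tag.
            if name in ("__cxx11", "cxx11"):
                return "cxx11"
            i = j + len(name)
        elif symbol.startswith("Ss", i):
            # "Ss" abbreviates the old-ABI std::basic_string<char, ...>.
            saw_old_string = True
            i += 2
        else:
            i += 1
    return "pre-cxx11" if saw_old_string else "unknown"


if __name__ == "__main__":
    print(string_abi("_ZN10tensorflow12OpDefBuilder6OutputESs"))
```

The flownet2-tf symbol from the issue title has no std::string parameter, so this check reports "unknown" for it and says nothing about that particular error.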

fperezgamonal commented on August 24, 2024

Final update: after fighting with it for quite a few days, and with help from my university's IT staff, I got it solved. Creating a soft link for cuda.h was the solution (and, if I am not mistaken, keeping the Makefile as shown above).

I will close this issue now. Feel free to reopen it if you encounter a similar problem and I'll try to help you as much as possible.

Cheers.

seni04 commented on August 24, 2024

Hello sir, what do you mean by "A soft link for cuda.h was the solution"?

How did you do it?

fperezgamonal commented on August 24, 2024

Hello @seni04, the technical staff told me they fixed it by creating a soft link between the actual CUDA version on the PC and the "standard" path where CUDA is normally installed.

I assume they did something like:

ln -s /usr/bin/cuda-10.0 /usr/bin/cuda
Replace the first argument with the actual path where you installed CUDA.
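The same soft link can also be created from Python, which makes it easy to check first that the real CUDA directory exists and that nothing already occupies the "standard" path. A minimal sketch; link_cuda is a hypothetical helper, and the paths in the usage line are placeholders for your own install:

```python
import os


def link_cuda(actual_dir: str, standard_path: str) -> bool:
    """Create standard_path as a symbolic link to actual_dir, the Python
    equivalent of `ln -s actual_dir standard_path`.

    Returns True if the link was created, False if standard_path already
    exists (an existing file, directory or link is left untouched).
    """
    if not os.path.isdir(actual_dir):
        raise FileNotFoundError("CUDA directory not found: %s" % actual_dir)
    if os.path.lexists(standard_path):
        return False
    os.symlink(actual_dir, standard_path, target_is_directory=True)
    return True
```

For example, link_cuda("/usr/local/cuda-9.0", "/usr/local/cuda") with placeholder paths; creating links under system directories usually requires root, just like the ln -s command.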
I'm sorry I cannot give you more details, but I've just checked my IT tickets and found nothing more.
I hope this helps you,
PS: in any case, here is the actual (final) Makefile I used (rename it back to Makefile):
Makefile.txt

Cheers,

Ferran.

seni04 commented on August 24, 2024

nvcc -c --expt-relaxed-constexpr -g -std=c++11 -DNDEBUG -I/usr/local/lib/python2.7/dist-packages/tensorflow/include -I"/usr/local/cuda-9.0/include" -DGOOGLE_CUDA=1 -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__ -D_GLIBCXX_USE_CXX11_ABI=0 src/ops/preprocessing/kernels/data_augmentation.cu.cc -x cu -Xcompiler -fPIC -o src/ops/build/data_augmentation.o
In file included from /usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h:21:0,
from src/ops/preprocessing/kernels/data_augmentation.cu.cc:7:
/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_device_functions.h:32:31: fatal error: cuda/include/cuda.h: No such file or directory
compilation terminated.
Makefile:68: recipe for target 'preprocessing' failed
make: *** [preprocessing] Error 1

I am still getting this error, even though I am using the same Makefile as yours.
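The error says a TensorFlow header includes cuda/include/cuda.h, so compilation only succeeds if one of the -I directories contains that relative path. In the nvcc command above, -I"/usr/local/cuda-9.0/include" points inside the CUDA tree and would only satisfy an include of plain cuda.h. A small diagnostic sketch (find_include is a hypothetical helper) that reports which include directory, if any, resolves the header; adding /usr/local to the list is an assumption that only helps if /usr/local/cuda exists, for instance as a soft link to the real install:

```python
import os
from typing import Iterable, Optional

HEADER = "cuda/include/cuda.h"  # the path named in the compiler error


def find_include(include_dirs: Iterable[str], header: str = HEADER) -> Optional[str]:
    """Return the first directory under which `header` exists, roughly how a
    compiler walks its -I paths in order, or None if none of them has it."""
    for inc in include_dirs:
        if os.path.isfile(os.path.join(inc, header)):
            return inc
    return None


if __name__ == "__main__":
    dirs = [
        # -I paths from the failing nvcc command.
        "/usr/local/lib/python2.7/dist-packages/tensorflow/include",
        "/usr/local/cuda-9.0/include",
        # Hypothetical extra -I path; resolves only if /usr/local/cuda exists.
        "/usr/local",
    ]
    print(find_include(dirs))
```

If this prints None, no amount of Makefile tweaking elsewhere will help until some -I directory contains cuda/include/cuda.h.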

fperezgamonal commented on August 24, 2024

Hello again,

I am very sorry to see you are still facing the same issue. I totally understand your frustration, since I was completely unable to compile the ops on another computer I wanted to use to run more experiments in parallel (and it had the same configuration and Makefile!).

The only thing I can think of is searching for this error, since it comes up very often, and trying some of the proposed solutions to see if any of them works.
By the way, if you manage to solve this issue and then run into a missing library (libcupti), here is how I solved that: I added the path to the library to the LD_LIBRARY_PATH environment variable, as follows:
export LD_LIBRARY_PATH=/soft/easybuild/debian/8.8/Broadwell/software/CUDA/9.0.176/extras/CUPTI/lib64:$LD_LIBRARY_PATH
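One caveat with that export line: if LD_LIBRARY_PATH was previously unset or empty, it leaves a trailing colon, and the glibc dynamic loader treats an empty entry as the current working directory. A minimal sketch that prepends the directory without creating empty entries (prepend_path is a hypothetical helper; the CUPTI path is the one from the command above). The loader reads LD_LIBRARY_PATH when a process starts, so the updated environment is passed to a child process rather than set inside an already running interpreter:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping


def prepend_path(env: Mapping[str, str], var: str, directory: str) -> Dict[str, str]:
    """Return a copy of env with directory placed first in the colon-separated
    variable var (moved to the front if already present), dropping empty
    entries, which the loader would read as the current directory."""
    new_env = dict(env)
    parts = [p for p in new_env.get(var, "").split(":") if p]
    if directory in parts:
        parts.remove(directory)
    new_env[var] = ":".join([directory] + parts)
    return new_env


if __name__ == "__main__":
    cupti = "/soft/easybuild/debian/8.8/Broadwell/software/CUDA/9.0.176/extras/CUPTI/lib64"
    env = prepend_path(os.environ, "LD_LIBRARY_PATH", cupti)
    # The child's loader sees the new LD_LIBRARY_PATH at startup.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
        env=env,
        check=True,
    )
```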

If I can find any more information on how to solve your error, I will post it here.
I wish you luck!

PS: I'll leave this open so more people can see this issue and hopefully provide a solution.
Cheers,
Ferran.

stefanuddenberg commented on August 24, 2024

I am facing the same issue while trying to get this to work on my university's cluster. I was able to get it working fine on my Windows machine, and my group has been able to get it to work on an EC2 instance, so I have no idea what exactly the issue is. From what I can tell, all the correct dependencies are installed... @Vedant2311, did you come up with a solution?

AliRashidnejad commented on August 24, 2024

Thanks @ahmedshingaly, this solved a similar issue for me.

justusgraham commented on August 24, 2024

Also solved the issue for me. Would've been impossible to debug; thank you!
