Comments (9)

t-kalinowski commented on August 26, 2024

Thanks, I can reproduce. This seems to be specific to TF 2.16; the GPU is visible with the identical setup using TF 2.15.

It seems we need to do more work on WSL to help TensorFlow discover the NVIDIA shared libraries. (Note: we already work around some deficiencies by creating symlinks to the NVIDIA shared libraries in the TensorFlow virtual env. This works on Linux, but is apparently not sufficient on WSL.)

For now, you can fix this by running the following in WSL before starting the R session (or by setting the env vars in the R session before reticulate has initialized Python).

#!/bin/sh
# Source this script rather than executing it, so the exports persist
# in the shell that starts R.

# Store original LD_LIBRARY_PATH
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"

# Get the CUDNN directory
CUDNN_DIR=$(dirname "$(dirname "$(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")")")

# Set LD_LIBRARY_PATH to include CUDNN directory
# (find already emits a trailing ':' after each directory, so no extra ':' is added)
export LD_LIBRARY_PATH="$(find "${CUDNN_DIR}"/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH}"

# Get the ptxas directory
PTXAS_DIR=$(dirname "$(dirname "$(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")")")

# Set PATH to include the directory containing ptxas
export PATH="$(find "${PTXAS_DIR}"/*/bin/ -type d -printf "%p:")${PATH}"

from: https://discuss.tensorflow.org/t/what-versions-of-cuda-and-cudnn-are-required-for-tensorflow-2-16/24711/3

Note, there is nothing specific to conda here. We still recommend using a virtualenv if possible.

I'll push an update soon making sure that the R package does this work so users don't have to.
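
One practical detail when applying the workaround: the exports only reach R if they are set in the shell that launches R (or are passed directly to the R process). As a rough sketch of the second option, the helper below starts a command with the library directories prepended. The name run_with_nvidia_libs is hypothetical and not part of keras3, reticulate, or TensorFlow; it assumes the pip-installed nvidia packages use the <nvidia_dir>/<package>/lib layout that the script above relies on.

```shell
#!/bin/sh
# Hypothetical helper (not part of keras3, reticulate, or TensorFlow):
# run an external command with every <nvidia_dir>/*/lib directory prepended
# to LD_LIBRARY_PATH, passing the value only to that command.
run_with_nvidia_libs() {
    _nvidia_dir=$1
    shift
    _libs=""
    for _d in "$_nvidia_dir"/*/lib; do
        [ -d "$_d" ] || continue            # skip unmatched globs and non-directories
        _libs="${_libs:+$_libs:}$_d"        # join with ':' and no trailing colon
    done
    # Append the existing value only if it is non-empty, to avoid empty entries.
    LD_LIBRARY_PATH="${_libs}${LD_LIBRARY_PATH:+${_libs:+:}${LD_LIBRARY_PATH}}" "$@"
}

# Example usage (assumes the virtualenv's python is first on PATH):
# NVIDIA_DIR=$(dirname "$(dirname "$(python -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")")
# run_with_nvidia_libs "$NVIDIA_DIR" R
```

Building the list with a loop instead of find -printf avoids the trailing colon, which the dynamic loader would otherwise read as an empty entry meaning the current directory.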

from keras.

evanliu3594 commented on August 26, 2024

Thanks a lot! That saves me from learning python again...😂

t-kalinowski commented on August 26, 2024

Can you confirm that the R session is indeed finding the correct python env? What is the output of reticulate::py_config()?

evanliu3594 commented on August 26, 2024

Yes, I only created one conda env, called keras.

> reticulate::py_config()
python:         /home/evan/pyEnv/keras/bin/python
libpython:      /home/evan/pyEnv/keras/lib/libpython3.11.so
pythonhome:     /home/evan/pyEnv/keras:/home/evan/pyEnv/keras
version:        3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
numpy:          /home/evan/pyEnv/keras/lib/python3.11/site-packages/numpy
numpy_version:  1.26.4
keras:          /home/evan/pyEnv/keras/lib/python3.11/site-packages/keras

NOTE: Python version was forced by use_python() function
> tf$config$list_physical_devices()
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

t-kalinowski commented on August 26, 2024

What a curious bug, thanks for reporting.

Just to rule some things out:

  • Do you have any startup code in .Rprofile or .Renviron that might be interfering with GPU visibility? What is the output from Sys.getenv("CUDA_VISIBLE_DEVICES") in R?
  • Does the same happen outside conda? Can you try with a venv and see if things work that way?
R -q -e 'keras3::install_keras()'
R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'

evanliu3594 commented on August 26, 2024

Thanks for the reply.
I only used the .Rprofile to set the CRAN repo to a nearer mirror site to speed up downloading, so it is quite clean.

> Sys.getenv("CUDA_VISIBLE_DEVICES")
[1] ""

I tried the shell commands to install keras, and it ended up the same.

evan@DESKTOP-KGBNUBC:~$ R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'
> library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()
2024-06-12 03:25:20.236712: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-12 03:25:20.782594: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-12 03:25:21.546573: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

One difference: in this run, with install_keras() using its default parameters, the cuda packages and tensorflow-gpu were not installed.
It seems R truly did not detect my GPU.

evan@DESKTOP-KGBNUBC:~$ source .virtualenvs/r-keras/bin/activate
(r-keras) evan@DESKTOP-KGBNUBC:~$ pip list | grep tensor
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow-cpu               2.16.1
tensorflow-datasets          4.9.6
tensorflow-io-gcs-filesystem 0.37.0
tensorflow-metadata          1.15.0
(r-keras) evan@DESKTOP-KGBNUBC:~$ pip list | grep cuda
(r-keras) evan@DESKTOP-KGBNUBC:~$

I dug a little and found that lspci can't see the GPU in WSL2 Ubuntu 22.04, but nvidia-smi works.

evan@DESKTOP-KGBNUBC:~$ lspci
4d66:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
6e30:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)
d98b:00:00.0 3D controller: Microsoft Corporation Device 008e
evan@DESKTOP-KGBNUBC:~$ nvidia-smi
Wed Jun 12 03:43:34 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 39%   37C    P8             10W /  160W |    1657MiB /   8188MiB |     20%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        66      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
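
For context, this combination is expected on WSL2 as far as I know: the GPU is paravirtualized, so lspci lists it as a Microsoft 3D controller rather than an NVIDIA device, while CUDA reaches it through the /dev/dxg device node and the driver libraries that Windows maps into /usr/lib/wsl/lib. A rough sketch of a check for those two pieces follows. The function name wsl_gpu_check and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether the WSL2 GPU passthrough pieces exist.
# $1 is an optional filesystem root prefix (leave empty to check the real system).
wsl_gpu_check() {
    _root=${1:-}
    _status=0

    # Device node used by WSL2 GPU paravirtualization.
    if [ -e "$_root/dev/dxg" ]; then
        echo "ok: /dev/dxg present"
    else
        echo "missing: /dev/dxg"
        _status=1
    fi

    # CUDA driver library mapped in from the Windows host.
    _found=no
    for _f in "$_root"/usr/lib/wsl/lib/libcuda.so*; do
        if [ -e "$_f" ]; then
            _found=yes
            break
        fi
    done
    if [ "$_found" = yes ]; then
        echo "ok: libcuda found in /usr/lib/wsl/lib"
    else
        echo "missing: libcuda in /usr/lib/wsl/lib"
        _status=1
    fi

    return "$_status"
}

# Example usage on the real system:
# wsl_gpu_check && echo "WSL2 GPU passthrough looks available"
```

If both checks pass but TensorFlow still reports only the CPU, the driver side is probably fine and the problem is more likely in how the CUDA user-space libraries are located.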

Then I installed keras again with gpu=TRUE. Not surprisingly, this resulted in the same problem as the initial one: the GPU disappeared in R but appeared in Python. 🤦‍♂️

t-kalinowski commented on August 26, 2024

I'll try to get on a Windows machine tomorrow and see if I can reproduce.

evanliu3594 commented on August 26, 2024

Just an update about what I've tried.

After a whole system reinstall (including the WSL Ubuntu), I found that I can't see the GPU in Python either.
Sorry for overlooking this: only then did I recall that, before using the R function install_keras(), I had used pip to install the tensorflow package and added some lines to the conda activate.d bash script to make sure these nvidia packages are added to the system environment.

NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))

for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done

I'm not sure whether this is vital for Python to see the GPU, but it apparently does not affect R.
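
A side note on the loop above: an activate.d script runs on every conda activate, so the same directories get prepended again each time, and when LD_LIBRARY_PATH starts out empty the result ends in a trailing colon (an empty entry, which the dynamic loader treats as the current directory). A minimal sketch of a guard against both, using a hypothetical helper name prepend_ld_library_path:

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only if it is
# not already listed, and never leave an empty entry behind.
prepend_ld_library_path() {
    [ -n "$1" ] || return 0                 # ignore empty arguments
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example usage inside the loop from the comment above:
# for dir in "$NVIDIA_DIR"/*; do
#     if [ -d "$dir/lib" ]; then
#         prepend_ld_library_path "$dir/lib"
#     fi
# done
```

Wrapping the current value in colons before matching lets one pattern cover a directory at the start, middle, or end of the list.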

t-kalinowski commented on August 26, 2024

This is fixed on main now; the workaround should no longer be necessary. Please install the development version and reinstall keras+tensorflow to test it out.

remotes::install_github("rstudio/keras3")
keras3::install_keras()
# new R session
library(keras3) # load hook hints to reticulate to use_virtualenv("r-keras")
tensorflow::tf$config$list_physical_devices()
