Comments (7)

ligun commented on August 25, 2024

I have the same issue in my environment.

  • Linux Mint 20.3 (Based on Ubuntu 20.04)
  • RX580
  • ROCm 5.2.0
    • sudo amdgpu-install --usecase=rocm
    • sudo dpkg -i rocblas_2.44.0.50200-65_amd64.deb
    • pip3 install torchvision-0.12.0a0+2662797-cp38-cp38-linux_x86_64.whl
    • pip3 install torch-1.11.0a0+git503a092-cp38-cp38-linux_x86_64.whl

Additionally, I had to install libopenmpi-dev, miopen-hip, libopenblas-dev, rocm-libs and rocm-dev before I could import PyTorch.

How did you resolve the issue?

from rocm-gfx803.

xuhuisheng commented on August 25, 2024

@ligun
This may be a PCIe atomics issue. You can run dmesg | grep kfd to check whether the card was added successfully.

ligun commented on August 25, 2024

@xuhuisheng
Thank you for the reply. I checked the log.
It seems to be added successfully.

$ dmesg|grep kfd
[    3.098492] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    3.098647] kfd kfd: amdgpu: added device 1002:67df
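
The check above can also be scripted. The following is a minimal sketch, assuming the kfd messages are worded as in the log quoted here; other kernel versions may phrase them differently. The function name `kfd_added_devices` is hypothetical and only for illustration.

```python
import re

# Pattern modeled on the kfd lines quoted in this thread (assumption:
# other kernel versions may word the message differently).
_KFD_ADDED = re.compile(
    r"kfd kfd: amdgpu: added device ([0-9a-f]{4}):([0-9a-f]{4})"
)


def kfd_added_devices(dmesg_text):
    """Return (vendor_id, device_id) pairs for each 'added device' kfd line."""
    return [m.groups() for m in _KFD_ADDED.finditer(dmesg_text)]


# Sample input copied from the log in this thread.
sample = """\
[    3.098492] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    3.098647] kfd kfd: amdgpu: added device 1002:67df
"""

print(kfd_added_devices(sample))  # [('1002', '67df')]
```

An empty list would suggest that kfd did not add the card. To use it on a live system, pass in the text of the kernel log, for example as captured from `dmesg`.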

xuhuisheng commented on August 25, 2024

You can run rocminfo to check whether ROCm works properly.

I currently have a problem of my own: with the latest ROCm 5.2.3 DKMS driver installed, rocminfo will not run on gfx803. After I uninstalled the DKMS driver and switched to the amdgpu driver built into the upstream kernel, ROCm ran properly.

ligun commented on August 25, 2024

rocminfo always returns an error.

$ rocminfo 
ROCk module is loaded
hsa api call failure at: /long_pathname_so_that_rpms_can_package_the_debug_info/src/rocminfo/rocminfo.cc:1140
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
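
For anyone who wants to detect this failure from a script, here is a minimal sketch that scans captured rocminfo output for the error shown above. It assumes the failure text matches what is quoted in this thread; the function name `rocminfo_failure` is hypothetical.

```python
import re

# Matches status names such as HSA_STATUS_ERROR_OUT_OF_RESOURCES.
_HSA_STATUS = re.compile(r"\bHSA_STATUS_[A-Z_]+\b")


def rocminfo_failure(output):
    """Return the HSA status name if the output reports an HSA API failure.

    Returns None when no failure marker is found, and "unknown" when the
    marker is present but no status name can be extracted.
    """
    if "hsa api call failure" not in output:
        return None
    match = _HSA_STATUS.search(output)
    return match.group(0) if match else "unknown"


# Sample input abbreviated from the output in this thread.
sample = (
    "ROCk module is loaded\n"
    "hsa api call failure at: rocminfo.cc:1140\n"
    "Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed "
    "to allocate the necessary resources.\n"
)

print(rocminfo_failure(sample))  # HSA_STATUS_ERROR_OUT_OF_RESOURCES
```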

xuhuisheng commented on August 25, 2024

That looks the same as my problem.
My suggestion is to uninstall amdgpu-dkms and amdgpu-dkms-firmware, then reboot. This falls back to the amdgpu driver built into the upstream Linux kernel. Then try rocminfo again; it may pass.

ligun commented on August 25, 2024

@xuhuisheng
Thank you very much for all your advice. rocminfo finally ran successfully and torch.cuda.is_available() returned True!
I can use ROCm now.
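
The final check can be wrapped in a small script. This is a sketch only, assuming a PyTorch build for ROCm, where `torch.version.hip` is expected to hold a version string and the GPU is exposed through the `torch.cuda` API. The function name `gpu_report` is hypothetical.

```python
def gpu_report():
    """Summarize whether PyTorch imports and whether it can see a GPU."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A missing package or missing shared library ends up here.
        return {"torch_imported": False, "error": str(exc)}
    return {
        "torch_imported": True,
        "torch_version": torch.__version__,
        # None on builds without HIP support.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds report the GPU through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }


print(gpu_report())
```

If `gpu_available` is False while rocminfo succeeds, the installed wheel may not match the installed ROCm version.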
