
rocm_pytorch_informations's Introduction

PyTorch 1.6.0a + ROCm 3.3 for AMD Radeon GPU @ Apr 20th, 2020

Update and upgrade to the latest packages.

sudo apt update
sudo apt -y dist-upgrade

Install the NUMA ("Non-Uniform Memory Access") development package.

sudo apt install -y libnuma-dev

Add the ROCm apt repository

wget -q -O - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

Install the ROCm driver.

sudo apt update
sudo apt install -y rocm-dkms rocm-libs hipcub miopen-hip rccl
sudo reboot

GPU info

AMDGPU     NVIDIA       Description
rocm-smi   nvidia-smi   GPU information command.
clinfo     clinfo       OpenCL GPU information command.

Test the ROCm installation

# Make sure the GPU is exposed as a device file.
ls /dev/dri/

# >> card0  renderD128
# Make sure rocm-smi recognizes the GPU.
/opt/rocm/bin/rocm-smi

# ========================ROCm System Management Interface========================
# ================================================================================
# GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
# 0    35.0c  18.0W   808Mhz  350Mhz  21.96%  auto  250.0W    0%   0%
# ================================================================================
# ==============================End of ROCm SMI Log ==============================
# Make sure OpenCL recognizes the GPU.
/opt/rocm/opencl/bin/x86_64/clinfo

# Number of platforms:				 1
#   Platform Profile:				 FULL_PROFILE
#   Platform Version:				 OpenCL 2.1 AMD-APP (3098.0)
#   Platform Name:				 AMD Accelerated Parallel Processing
#   Platform Vendor:				 Advanced Micro Devices, Inc.
#   Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
# 
# 
#   Platform Name:				 AMD Accelerated Parallel Processing
# Number of devices:				 1
#   Device Type:					 CL_DEVICE_TYPE_GPU
#   Vendor ID:					 1002h
#   Board name:					 Vega 20
#   Device Topology:				 PCI[ B#3, D#0, F#0 ]
#   Max compute units:				 60
#   .
#   .
#   .
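The rocm-smi check above can also be scripted. The following is a minimal sketch that parses the table layout shown in the sample output; `parse_rocm_smi` is a hypothetical helper, and the column layout is assumed to match the ROCm 3.3 output quoted here (other ROCm versions may print different columns).

```python
def parse_rocm_smi(text):
    """Parse the table printed by rocm-smi into a list of dicts (one per GPU)."""
    rows = []
    header = None
    for line in text.splitlines():
        # Tolerate the leading "# " used when the output is quoted as a comment.
        line = line.lstrip("# ").strip()
        if not line or line.startswith("="):
            continue  # skip blank lines and the "=====" banner lines
        fields = line.split()
        if fields[0] == "GPU":
            header = fields
        elif header and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


# Sample copied from the rocm-smi output shown in this README.
SAMPLE = """\
========================ROCm System Management Interface========================
================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    35.0c  18.0W   808Mhz  350Mhz  21.96%  auto  250.0W    0%   0%
================================================================================
==============================End of ROCm SMI Log ==============================
"""

gpus = parse_rocm_smi(SAMPLE)
print(len(gpus), gpus[0]["Temp"])  # prints: 1 35.0c
```

To use it on a live system, the text could come from `subprocess.run(["/opt/rocm/bin/rocm-smi"], capture_output=True, text=True).stdout`; an empty result would suggest that no GPU was recognized.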

Add the ROCm binary paths to PATH

echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' |
sudo tee -a /etc/profile.d/rocm.sh
sudo reboot

Set permissions. To access the GPU, your user must be a member of the video group.

To add your user to the video group, run:

sudo usermod -a -G video $LOGNAME
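Group changes only take effect after logging in again, so it can help to verify the membership afterwards. This is a small sketch using only the Python standard library; `user_in_group` is a hypothetical helper name, and it checks both supplementary membership and the user's primary group.

```python
import grp
import pwd


def user_in_group(username, groupname):
    """Return True if the user belongs to the group (supplementary or primary)."""
    try:
        group = grp.getgrnam(groupname)
    except KeyError:
        return False  # the group does not exist on this system
    if username in group.gr_mem:
        return True
    try:
        # The primary group is not listed in gr_mem, so compare GIDs as well.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # the user does not exist on this system
```

For example, `user_in_group(getpass.getuser(), "video")` (with `import getpass`) should return True once the `usermod` command above has been applied.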

Prepare to install PyTorch

Install MKL

cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB

sudo sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
sudo apt-get update && sudo apt-get install intel-mkl-64bit-2018.2-046

Installing PyTorch

You can either install from a wheel or build from source.

Install from wheel

Python  ROCm  PyTorch  GPU     S
3.7     3.3   1.6.0a   GFX906
3.7     3.3   1.6.0a   GFX900  -
3.5     2.9   1.3.0a   GFX900
3.5     2.9   1.3.0a   GFX906
3.7     2.9   1.3.0a   GFX900  -
3.7     2.9   1.3.0a   GFX906  -

GFX Code  Architecture    Products
GFX803    Polaris Series  RX550/RX560/RX570/RX580/RX590 ...
GFX900    Vega10 Series   Vega64/Vega56/MI25/WX9100/FrontierEdition ...
GFX906    Vega20 Series   RadeonVII/MI50/MI60 ...

# ROCm3.3 PyTorch1.6.0a
sudo pip3 install http://install.aieater.com/libs/pytorch/rocm3.3/gfx906/torch-1.6.0a0-cp37-cp37m-linux_x86_64.whl torchvision
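A wheel only installs on the Python version named in its file name (`cp37` means CPython 3.7). The sketch below reads the tags from a wheel file name and compares them with the running interpreter; `wheel_tags` and `matches_interpreter` are hypothetical helpers written for this illustration, not part of pip.

```python
import sys


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel file name or URL."""
    name = filename.rsplit("/", 1)[-1]
    if not name.endswith(".whl"):
        raise ValueError("not a wheel file: " + name)
    parts = name[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + name)
    # The last three dash-separated fields are always the tags,
    # even when an optional build tag is present.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def matches_interpreter(filename, version_info=sys.version_info):
    """Check whether a CPython wheel targets the given interpreter version."""
    python_tag, _, _ = wheel_tags(filename)
    return python_tag == "cp{}{}".format(version_info[0], version_info[1])


WHEEL = "torch-1.6.0a0-cp37-cp37m-linux_x86_64.whl"
print(wheel_tags(WHEEL))  # prints: ('cp37', 'cp37m', 'linux_x86_64')
```

If `matches_interpreter(WHEEL)` is False, pip will refuse the wheel as unsupported on this platform, and a Python 3.7 interpreter is needed instead.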

Continue to test the installation.


Source build for developers

Install dependencies

sudo apt install -y gcc cmake clang ccache llvm ocl-icd-opencl-dev python3-pip
sudo apt install -y rocrand rocblas miopen-hip miopengemm rocfft rocprim rocsparse rocm-cmake rocm-dev rocm-device-libs rocm-libs rccl hipcub rocthrust

export PATH=/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/opencl/bin:$PATH
export USE_LLVM=/opt/llvm
export LLVM_DIR=/opt/llvm/lib/cmake/llvm

Clone PyTorch repository

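git clone https://github.com/pytorch/pytorch.git
cd pytorch
# Fetch the third-party submodules; the build fails without them.
git submodule update --init --recursive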

'Hipify' the PyTorch source by running tools/amd_build/build_amd.py.

python3 tools/amd_build/build_amd.py

Alternatively, download a fixed version.

wget http://install.aieater.com/libs/pytorch/sources/pytorch1.6.0.tar.gz

This PyTorch source tree has already been hipified and includes the cloned submodules.

Required environment variables

GFX Code  Architecture    Products
GFX803    Polaris Series  RX550/RX560/RX570/RX580/RX590 ...
GFX900    Vega10 Series   Vega64/Vega56/MI25/WX9100/FrontierEdition ...
GFX906    Vega20 Series   RadeonVII/MI50/MI60 ...

#export HCC_AMDGPU_TARGET=gfx803 #(RX550/RX560/RX570/RX580/RX590 ...)
export HCC_AMDGPU_TARGET=gfx900 #(Vega64/Vega56/MI25/WX9100/FrontierEdition ...)
#export HCC_AMDGPU_TARGET=gfx906 #(RadeonVII/MI50/MI60 ...)

export USE_NINJA=1
export MAX_JOBS=8
export HIP_PLATFORM=hcc

echo $HCC_AMDGPU_TARGET
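Choosing the right value for HCC_AMDGPU_TARGET amounts to a table lookup. Here is a sketch of that lookup; `GFX_TARGETS` and `gfx_target_for` are hypothetical names, the product lists are copied from the table in this README, and Polaris is listed under gfx803, the target ID ROCm uses for that family.

```python
# Product names per GFX target, taken from the table in this README.
GFX_TARGETS = {
    "gfx803": ["RX550", "RX560", "RX570", "RX580", "RX590"],
    "gfx900": ["Vega64", "Vega56", "MI25", "WX9100", "FrontierEdition"],
    "gfx906": ["RadeonVII", "MI50", "MI60"],
}


def gfx_target_for(product):
    """Return the GFX target for a product name, or None if it is not listed."""
    key = product.replace(" ", "").lower()
    for target, products in GFX_TARGETS.items():
        if key in (name.lower() for name in products):
            return target
    return None


print(gfx_target_for("Radeon VII"))  # prints: gfx906
```

The returned string is what would be exported as HCC_AMDGPU_TARGET before the build; a None result means the card is not in the (incomplete) list and its target has to be looked up elsewhere.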

Install the requirements, then build and install

pip3 install -r requirements.txt
python3 setup.py install --user

Distribution build for wheel

python3 setup.py build

Cleanup

python3 setup.py clean

GPU visibility masking and multiple GPUs

AMDGPU                          NVIDIA                           Description
export HIP_VISIBLE_DEVICES=     export CUDA_VISIBLE_DEVICES=     CPU
export HIP_VISIBLE_DEVICES=0    export CUDA_VISIBLE_DEVICES=0    Single GPU
export HIP_VISIBLE_DEVICES=0,1  export CUDA_VISIBLE_DEVICES=0,1  Multiple GPUs
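The mask can also be set per process rather than for the whole shell. The following sketch builds an environment for a child process; `env_with_visible_gpus` is a hypothetical helper, and an empty list yields an empty HIP_VISIBLE_DEVICES value, which corresponds to the CPU row of the table.

```python
import os


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty list gives an empty string, which hides every GPU (CPU only).
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    return env


print(env_with_visible_gpus([0, 1], base_env={})["HIP_VISIBLE_DEVICES"])  # prints: 0,1
```

The result can be passed as the `env` argument of `subprocess.run`, for example to start one training script per GPU. The variable has to be set before the process initializes the GPU runtime, which is why it is applied to a child process here.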

Make sure PyTorch recognizes the GPU device.

# Check whether a GPU is available.
python3 -c 'import torch;print("GPU:",torch.cuda.is_available())'

# GPU: True
python3 -c 'import torch;print("DeviceID:",str(torch.cuda.current_device()))'


# DeviceID: 0
python3 -c 'import torch;print("DeviceName:",str(torch.cuda.get_device_name(torch.cuda.current_device())))'

# DeviceName: Vega 20
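The three one-liners above can be combined into a single script. This is a sketch that degrades gracefully when PyTorch is missing; `gpu_report` is a hypothetical helper, and `torch.version.hip` is read with `getattr` on the assumption that it may be absent on some builds.

```python
def gpu_report():
    """Collect the same facts as the one-liners above into one dict."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Expected to be set on ROCm builds; read defensively in case it is absent.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["gpu_available"]:
        # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
        for index in range(torch.cuda.device_count()):
            report["devices"].append(torch.cuda.get_device_name(index))
    return report


print(gpu_report())
```

On a working installation, `gpu_available` should be True and `devices` should list the board name, as in the DeviceName output above.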

Make sure everything is working

PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose

See also: AMDGPU - ROCm Caffe/PyTorch/Tensorflow 1.x installation, official, introduction on docker


rocm_pytorch_informations's Issues

Download Pytorch sub-modules

I am trying to build PyTorch and ran into an error saying that files in the PyTorch repository weren't found. With a bit of googling, I found out that this was because I hadn't fetched all the submodules, a step that wasn't in the directions.

git submodule update --init

Running that right after cloning the PyTorch GitHub repository seems to fix it. I was too lazy to create a pull request, but tomorrow I will add this step at the appropriate position.
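Whether the submodules were actually fetched can be checked before starting a long build. The sketch below reads the `path = ...` entries from `.gitmodules` and reports those whose directory is missing or empty; `submodule_paths` and `missing_submodules` are hypothetical helpers, and an empty directory is assumed to mean "not checked out".

```python
import os
import re

# Matches lines such as "\tpath = third_party/pybind11" in .gitmodules.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(\S.*?)\s*$")


def submodule_paths(gitmodules_text):
    """Return the submodule paths declared in the text of a .gitmodules file."""
    matches = (PATH_LINE.match(line) for line in gitmodules_text.splitlines())
    return [match.group(1) for match in matches if match]


def missing_submodules(repo_root):
    """Return declared submodule paths whose directory is missing or empty."""
    gitmodules = os.path.join(repo_root, ".gitmodules")
    if not os.path.isfile(gitmodules):
        return []
    with open(gitmodules, encoding="utf-8") as handle:
        paths = submodule_paths(handle.read())
    missing = []
    for path in paths:
        full_path = os.path.join(repo_root, path)
        if not os.path.isdir(full_path) or not os.listdir(full_path):
            missing.append(path)
    return missing
```

Running `missing_submodules("pytorch")` after cloning should return an empty list once `git submodule update --init` has completed.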

Some rocm queries

It seems AMD is catching up with Nvidia. I am considering AMD GPUs instead of Nvidia because of the scarcity of the 30 series.
I need some information.

  1. How is the fp16 performance for ROCm pytorch?
  2. What are the known issues with the rocm_pytorch?
  3. Is there any performance benchmark?

ImportError: libhip_hcc.so.3: cannot open shared object file: No such file or directory

I installed everything as described in the README.

torch seems to be successfully installed, but when I try to import it I get

ImportError: libhip_hcc.so.3: cannot open shared object file: No such file or directory

Do you have any ideas what might be causing that?

Something may have gone wrong earlier, because when I type

/opt/rocm/opencl/bin/x86_64/clinfo

I get

bash: /opt/rocm/opencl/bin/x86_64/clinfo: No such file or directory

arch linux 5.7.10 kernel panic

Hi!

Nice tutorial, official docs are really bad

but after I installed some of these packages (rocm-dkms rocm-libs hipcub rccl miopen-hip) I got a kernel panic.

Uninstalling didn't help; only the LTS kernel was able to boot my system.

[   25.053990] kauditd_printk_skb: 12 callbacks suppressed
[   25.053992] audit: type=1131 audit(1595939464.843:24): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   44.490842] audit: type=1131 audit(1595939484.280:25): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   44.642536] audit: type=1334 audit(1595939484.433:26): prog-id=10 op=UNLOAD
[   44.643392] audit: type=1334 audit(1595939484.433:27): prog-id=9 op=UNLOAD
[  319.355039] audit: type=1130 audit(1595939759.143:28): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  321.613645] audit: type=1100 audit(1595939761.403:29): pid=438 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:authentication grantors=pam_securetty,pam_tally2,pam_shells,pam_unix,pam_permit acct="root" exe="/usr/bin/login" hostname=reache addr=? terminal=tty2 res=success'
[  321.619593] audit: type=1101 audit(1595939761.410:30): pid=438 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_tally2,pam_access,pam_unix,pam_permit,pam_time acct="root" exe="/usr/bin/login" hostname=reache addr=? terminal=tty2 res=success'
[  321.620133] audit: type=1103 audit(1595939761.410:31): pid=438 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_securetty,pam_tally2,pam_shells,pam_unix,pam_permit acct="root" exe="/usr/bin/login" hostname=reache addr=? terminal=tty2 res=success'
[  321.620209] audit: type=1006 audit(1595939761.410:32): pid=438 uid=0 old-auid=4294967295 auid=0 tty=tty2 old-ses=4294967295 ses=1 res=1
[  321.620250] audit: type=1300 audit(1595939761.410:32): arch=c000003e syscall=1 success=yes exit=1 a0=3 a1=7ffff4b85ae0 a2=1 a3=7ffff4b857f7 items=0 ppid=1 pid=438 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=tty2 ses=1 comm="login" exe="/usr/bin/login" key=(null)
[  321.620325] audit: type=1327 audit(1595939761.410:32): proctitle=2F62696E2F6C6F67696E002D70002D2D0020202020
[  321.633869] audit: type=1130 audit(1595939761.423:33): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  321.638862] audit: type=1101 audit(1595939761.427:34): pid=441 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_tally2,pam_access,pam_unix,pam_permit,pam_time acct="root" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  321.638938] audit: type=1103 audit(1595939761.427:35): pid=441 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=? acct="root" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

Can you help, or at least point me in the right direction?

Installed ROCm4.0, but how to use it?

Hey,
I just installed ROCm4.0 from the official pytorch website. How can I now use my radeon gpu for training?
I tried your command:

python3 -c 'import torch;print("DeviceName:",str(torch.cuda.get_device_name(torch.cuda.current_device())))'

which gives the error

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/bryan/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 388, in current_device
    _lazy_init()
  File "/home/bryan/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Any clues about what I am doing wrong, or could you give me a hint on how to use my GPU after installing ROCm4.0?

Thanks in advance
