
Comments (8)

jnorwood avatar jnorwood commented on August 26, 2024

On the same GPU, the fig_15_5 example runs at about 0.11 seconds per iteration and fig_15_7 at about 0.035 seconds per iteration, so the 7 seconds per iteration of the fig_15_3 single task example seems extremely slow.

from data-parallel-cpp.

bashbaug avatar bashbaug commented on August 26, 2024

Interesting, the "single task" version is not going to run very well on most GPUs, but the time you are seeing is excessive.
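
For context on why a single-task kernel is a poor fit for a GPU: such a kernel executes as one work-item, so the whole matrix multiplication runs serially and the rest of the device sits idle. The sketch below is a plain C++ stand-in for that pattern, not the book's fig_15_3 source; the function names `single_task_matmul` and `serial_iterations` and the row-major layout are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for the body of a "single task" kernel:
// one thread of execution computes every element of C = A * B.
// Matrices are n x n, stored row-major (an assumption for this sketch).
void single_task_matmul(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::vector<float>& c, std::size_t n) {
  for (std::size_t row = 0; row < n; ++row) {
    for (std::size_t col = 0; col < n; ++col) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < n; ++k) {
        sum += a[row * n + k] * b[k * n + col];
      }
      c[row * n + col] = sum;
    }
  }
}

// Number of inner-loop steps that the single thread performs serially.
std::size_t serial_iterations(std::size_t n) { return n * n * n; }
```

For matrixSize=128 this is 128^3 = 2,097,152 inner-loop steps, all performed by one thread of execution.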

Could you please include:

  • What version of the dpcpp compiler you are using, from dpcpp --version?
  • What driver versions you have installed, from sycl-ls or sycl-ls --verbose?

As a data point, you may also want to try using the OpenCL GPU backend instead of the Level Zero GPU backend. You can do this with the SYCL_BE or SYCL_DEVICE_FILTER environment variables - see here. I don't think this will make a difference (it doesn't on my similar Intel(R) HD Graphics 620 system), but it's worth a try.

Thanks!


jnorwood avatar jnorwood commented on August 26, 2024

I'm using the most recently released Docker image.

root@33541cf26757:/workspaces/data-parallel-CPP-main/build# dpcpp --version
Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.2.0/linux/bin

root@33541cf26757:/workspaces/data-parallel-CPP-main/build# sycl-ls
ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.11.3.0.17_160000]
CPU : Intel(R) OpenCL 2.1 [2021.11.3.0.17_160000]
GPU : Intel(R) OpenCL HD Graphics 3.0 [21.11.19310]
GPU : Intel(R) Level-Zero 1.0 [1.0.19310]
HOST: SYCL host platform 1.2 [1.2]

I retried using
export SYCL_DEVICE_FILTER=opencl:gpu:2
based on the documentation on GitHub.
It still hangs for the original matrixSize=128, but it made it through four iterations for matrixSize=100 at about 6.3 sec/iteration.


bashbaug avatar bashbaug commented on August 26, 2024

I got the most recent Docker image working on my system also. Note that there appears to be a slightly newer version than the one you are using. I'm not able to reproduce this issue on my end:

root@55f9fbe3ec3b:/workspaces/sycl-book-samples/build# dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2021.3.0 (2021.3.0.20210619)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.3.0/linux/bin
root@55f9fbe3ec3b:/workspaces/sycl-book-samples/build# sycl-ls
0. ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.12.6.0.19_160000]
1. CPU : Intel(R) OpenCL 2.1 [2021.12.6.0.19_160000]
2. GPU : Intel(R) OpenCL HD Graphics 3.0 [21.23.20043]
3. GPU : Intel(R) Level-Zero 1.1 [1.1.20043]
4. HOST: SYCL host platform 1.2 [1.2]
root@55f9fbe3ec3b:/workspaces/sycl-book-samples/build# samples/Ch15_gpus/fig_15_3_single_task_matrix_multiplication 
Running on device: Intel(R) HD Graphics 620 [0x5916]
Success!
GFlops: 0.0249472
root@55f9fbe3ec3b:/workspaces/sycl-book-samples/build# SYCL_DEVICE_FILTER=opencl:gpu samples/Ch15_gpus/fig_15_3_single_task_matrix_multiplication 
Running on device: Intel(R) HD Graphics 620 [0x5916]
Success!
GFlops: 0.0249608
root@55f9fbe3ec3b:/workspaces/sycl-book-samples/build# 

A couple of possibilities:

  1. Perhaps there was an issue in the older docker image that has been fixed? This would be the best-case scenario. Can you please try grabbing the latest docker image and give it a try?
  2. Maybe there is an issue with your Iris Pro Graphics 580 that does not appear on my HD Graphics 620? I think this is unlikely - if anything your GPU should be faster! - but I suppose it is possible.
  3. Could there be anything else odd happening with your system? Is everything else running OK?

Since (1) is the easiest to check, let's start there first.


jnorwood avatar jnorwood commented on August 26, 2024

OK, thanks. Yes, I had pulled the latest Docker images, but neglected to rebuild my Docker environment in VS Code and update its compiler paths.

After doing that, I deleted my build directory and re-created the makefiles. Here are my cmake configure options as well: no optimizations, with debug enabled. Maybe that has something to do with the issue.

mkdir build
cd build
CXX=dpcpp cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_CXX_FLAGS="-O0" -D NODPL=1 ../

Here are the compiler version, showing the update to the latest release, and the sycl-ls output:
root@4578405bdff6:/workspaces/data-parallel-CPP-main/build# dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2021.3.0 (2021.3.0.20210619)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.3.0/linux/bin
root@4578405bdff6:/workspaces/data-parallel-CPP-main/build#

root@4578405bdff6:/workspaces/data-parallel-CPP-main/build# sycl-ls
0. ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.12.6.0.19_160000]
1. CPU : Intel(R) OpenCL 2.1 [2021.12.6.0.19_160000]
2. GPU : Intel(R) OpenCL HD Graphics 3.0 [21.23.20043]
3. GPU : Intel(R) Level-Zero 1.1 [1.1.20043]
4. HOST: SYCL host platform 1.2 [1.2]

However, the end result with matrixSize=128 is still a hang.
With matrixSize=100 it completes, although with long iterations.
With matrixSize=110 it hangs after one 9.2 second iteration.

I'm attaching the screen captures with the iterations showing the matrixSize for 100 and 110 cases.

fig_15_3_single_task_mat100
fig_15_3_single_task_mat110


jnorwood avatar jnorwood commented on August 26, 2024

I confirmed that the problem is associated with the disabled optimization. If I override the build optimization level to -O2 with
make CXX_FLAGS="-O2", then the MatrixSize:128 case completes.
I normally build with -O0 because debugger support is poor with -O2 optimization.

Running on device: Intel(R) Iris(TM) Pro Graphics 580 [0x193b]
MatrixSize:128
time:0.979357
time:0.807278
time:0.823998
time:0.853383
Success!
GFlops: 0.00519562
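
The GFlops line can be related to the printed times with a little arithmetic. The sketch below assumes the sample counts 2*N^3 floating-point operations (one multiply and one add per inner-loop step) and divides by the fastest iteration; that formula is an assumption, since the sample's source is not shown in this thread, and `gflops` is a hypothetical helper name. With N=128 and the four times above, it gives roughly 0.0052, which agrees with the reported value to within the rounding of the printed times.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed formula (not confirmed by the thread): an n x n matrix multiply
// costs 2*n^3 floating-point operations, and the rate is reported for the
// fastest of the timed iterations.
double gflops(std::size_t n, const std::vector<double>& seconds) {
  if (seconds.empty()) {
    return 0.0;  // nothing was timed
  }
  const double best = *std::min_element(seconds.begin(), seconds.end());
  const double flops = 2.0 * static_cast<double>(n) * n * n;
  return flops / best * 1e-9;  // convert flop/s to GFlop/s
}
```

Under the same assumption, the 0.0249472 GFlops reported earlier for the HD Graphics 620 would correspond to a best iteration of roughly 0.17 seconds, if that run also used a matrix size of 128.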


bashbaug avatar bashbaug commented on August 26, 2024

Thanks for investigating further. I can reproduce the excessive execution time using -O0 also.

I'm checking to see if there is a way to compile the host code with -O0 for easier debugging but to keep the device code (that executes on the GPU, and is leading to the excessive execution time) using a different optimization level.

Would this satisfy your use-case? I see you mentioned above:

"I normally build with -O0 due to the poor debugger support with -O2 optimization."


jnorwood avatar jnorwood commented on August 26, 2024

I already have the work-arounds of reducing MatrixSize and/or using Q{cpu_selector{}}.

With MatrixSize==128 and using gpu_selector I can wait for 4 minutes without executing a single iteration, so I presume something is hung.

Using cpu_selector, fig_15_3_single_task completes an iteration in about 0.4 sec.

There is a document on using gdb with the GPU, gpu_debug, which I'm linking here for reference. It mentions the heartbeat_interval, enable_hangcheck and preempt_timeout settings, which I haven't explicitly set.

I'll come back to this problem after finishing the dpc++ book examples and see if I can debug the gpu hang further.

