compute-runtime's Introduction

Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver

Introduction

The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is an open source project providing compute API support (Level Zero, OpenCL) for Intel graphics hardware architectures (HD Graphics, Xe).

What is NEO?

NEO is the shorthand name for Compute Runtime contained within this repository. It is also a development mindset that we adopted when we first started the implementation effort for OpenCL.

The project evolved beyond a single API and NEO no longer implies a specific API. When talking about a specific API, we will mention it by name (e.g. Level Zero, OpenCL).

License

The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is distributed under the MIT License.

You may obtain a copy of the License at: https://opensource.org/licenses/MIT

Supported Platforms

Platform | OpenCL | Level Zero
Intel Core Processors with Gen8 graphics devices (formerly Broadwell) | 3.0 | -
Intel Core Processors with Gen9 graphics devices (formerly Skylake, Kaby Lake, Coffee Lake) | 3.0 | Y
Intel Atom Processors with Gen9 graphics devices (formerly Apollo Lake, Gemini Lake) | 3.0 | -
Intel Core Processors with Gen11 graphics devices (formerly Ice Lake) | 3.0 | Y
Intel Atom Processors with Gen11 graphics devices (formerly Elkhart Lake) | 3.0 | -
Intel Core Processors with Gen12 graphics devices (formerly Tiger Lake, Rocket Lake, Alder Lake) | 3.0 | Y

Release cadence

Release cadence changed from weekly to monthly in late 2022.

  • At the beginning of each calendar month, we identify a well-tested driver version from the previous month as a release candidate for our monthly release.
  • We create a release branch and apply selected fixes for significant issues.
  • The branch naming convention is releases/yy.ww (yy - year, ww - work week of release candidate)
  • The builds are tagged using the following format: yy.ww.bbbbb.hh (yy - year, ww - work week, bbbbb - incremental build number from the master branch, hh - incremental commit number on release branch).
  • We publish and document a monthly release from the tip of that branch.
  • During subsequent weeks of a given month, we continue to cherry-pick fixes to that branch and may publish a hotfix release.
  • Quality level of the driver (per platform) will be provided in the Release Notes.
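
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not part of the project's tooling: the names `ReleaseTag` and `parse_release_tag` are hypothetical, the field widths in the pattern are an assumption (two digits for year and work week, any number of digits for the two counters), and the tag used in the usage line is a made-up value.

```python
import re
from typing import NamedTuple

# Tags look like yy.ww.bbbbb.hh. The exact field widths are an assumption:
# two digits for year and work week, one or more digits for the counters.
TAG_PATTERN = re.compile(r"^(\d{2})\.(\d{2})\.(\d+)\.(\d+)$")


class ReleaseTag(NamedTuple):
    year: int        # yy: year of the release candidate
    work_week: int   # ww: work week of the release candidate
    build: int       # bbbbb: incremental build number from the master branch
    hotfix: int      # hh: incremental commit number on the release branch

    @property
    def branch(self) -> str:
        # Release branches follow the releases/yy.ww convention.
        return f"releases/{self.year:02d}.{self.work_week:02d}"


def parse_release_tag(tag: str) -> ReleaseTag:
    """Split a build tag into its four numeric fields."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a yy.ww.bbbbb.hh tag: {tag!r}")
    return ReleaseTag(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    tag = parse_release_tag("23.05.25593.11")  # made-up example tag
    print(tag, tag.branch)
```

Keeping the branch name as a derived property reflects that, per the list above, a tag and its release branch share the same yy.ww prefix.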

Installation Options

To allow NEO to access the GPU device, make sure the user has permissions for the /dev/dri/renderD* files.
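
One way to check this before installing anything is a short script like the following. It is a minimal sketch: the helper name `inaccessible_render_nodes` is hypothetical, and it assumes that "permissions" here means the current user can open the render nodes for both reading and writing.

```python
import glob
import os


def inaccessible_render_nodes(pattern="/dev/dri/renderD*"):
    """Return the render nodes the current user cannot read and write."""
    return [
        node
        for node in sorted(glob.glob(pattern))
        if not os.access(node, os.R_OK | os.W_OK)
    ]


if __name__ == "__main__":
    if not glob.glob("/dev/dri/renderD*"):
        print("no render nodes found under /dev/dri")
    for node in inaccessible_render_nodes():
        print(f"no read/write access to {node}")
```

If a node is listed, the usual remedy on many distributions is adding the user to the group that owns the node, but the group name varies by distribution, so check the node's ownership first.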

Via system package manager

NEO is available for installation on a variety of Linux distributions and can be installed via the distro's package manager.

For example on Ubuntu* 22.04:

apt-get install intel-opencl-icd

Manual download

.deb packages for Ubuntu are provided, along with installation instructions and Release Notes, on the release page.

Linking applications

Directly linking to the runtime library is not supported.
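
In practice this means an OpenCL application talks to a loader library, which in turn finds the installed driver. The sketch below illustrates that path from Python using ctypes. It assumes the OpenCL ICD loader is installed under the conventional name libOpenCL.so.1 (or can be found by `ctypes.util.find_library`); the helper name `count_opencl_platforms` is hypothetical. Only the standard `clGetPlatformIDs` entry point is called.

```python
import ctypes
import ctypes.util

CL_SUCCESS = 0
CL_PLATFORM_NOT_FOUND_KHR = -1001  # returned by ICD loaders when no driver is registered


def count_opencl_platforms():
    """Return the number of OpenCL platforms seen through the loader,
    or None if no loader library could be opened."""
    # Assumption: the loader is installed as libOpenCL (libOpenCL.so.1 on Linux).
    name = ctypes.util.find_library("OpenCL") or "libOpenCL.so.1"
    try:
        loader = ctypes.CDLL(name)
    except OSError:
        return None

    # cl_int clGetPlatformIDs(cl_uint num_entries,
    #                         cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    get_platform_ids = loader.clGetPlatformIDs
    get_platform_ids.restype = ctypes.c_int32
    get_platform_ids.argtypes = [
        ctypes.c_uint32,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_uint32),
    ]

    count = ctypes.c_uint32(0)
    status = get_platform_ids(0, None, ctypes.byref(count))
    if status == CL_PLATFORM_NOT_FOUND_KHR:
        return 0
    if status != CL_SUCCESS:
        raise RuntimeError(f"clGetPlatformIDs failed with status {status}")
    return count.value


if __name__ == "__main__":
    print("OpenCL platforms visible through the loader:", count_opencl_platforms())
```

The equivalent for a compiled application is to link against the loader (for example with -lOpenCL) rather than against the runtime's own shared object.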

Dependencies

In addition, to enable performance counters support, the following packages are needed:

How to provide feedback

By default, please submit an issue using the native github.com interface.

How to contribute

Create a pull request on github.com with your patch. Make sure your change builds cleanly and passes the unit-level tests (ULTs). A maintainer will contact you if there are questions or concerns. See contribution guidelines for more details.

See also

Level Zero specific

OpenCL specific

(*) Other names and brands may be claimed as property of others.

compute-runtime's People

Contributors

adamcetnerowski, arturharasimiuk, bartoszdunajski, compute-runtime-automation, compute-runtime-validation, ddabek-i, dziubanmaciejintel, fhazubski-intel, fzwolinski, hoppemateusz, ivvenevt, jablonskimateusz, jacekdanecki, jchodor, jitendrasharma1989, joshuaranjan, kamdiedrich, kamilkoprykintel, kcencele, kgibala, km-nowak, krystianchmielewski, lukaszjobczyk, maciejplewka, michalmrozek, pawel-cieslak, smilczar, smorek-intel, wallted, zzdanowicz

compute-runtime's Issues

build failure

The "Copying OCaml library component llvm_X86.cmx to intermediate area" step, invoked from cmake -DBUILD_TYPE=Release -DCMAKE_BUILD_TYPE=Release ../neo when following the build instructions, gives the following error:

CMake Error at bindings/ocaml/llvm/cmake_install.cmake:49 (file):
  file cannot create directory: /usr/lib/ocaml/llvm.  Maybe need
  administrative privileges.
Call Stack (most recent call first):
  bindings/ocaml/cmake_install.cmake:42 (include)
  cmake_install.cmake:66 (include)


make[3]: *** [Makefile:129: install] Error 1
make[2]: *** [igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/build.make:131: igc/IGC/BiFModule/clang_build/src/src/cclang-stamp/cclang-install] Error 2
make[1]: *** [CMakeFiles/Makefile2:1515: igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/all] Error 2
make: *** [Makefile:152: all] Error 2

which seems to imply that configuration requires root. This looks like a minor bug, but it does cause fairly significant worry for packagers.

GPU hang on Intel Core i7-7500U CPU

With integrated gpu cycles free to burn on my Ubuntu system, I decided to run https://github.com/gcp/leela-zero with this driver. In the middle of the self-tuning process, it hung, with the following dmesg messages shortly after about 6 seconds of screen freeze:

[ 6583.292439] [drm] GPU HANG: ecode 9:0:0x8ed9fff2, in leelaz [9165], reason: Hang on rcs0, action: reset
[ 6583.292441] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 6583.292442] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 6583.292442] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 6583.292442] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 6583.292443] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 6583.292448] i915 0000:00:02.0: Resetting rcs0 after gpu hang

The contents of /sys/class/drm/card0/error I copypasted to https://gist.github.com/eaglgenes101/2c30c93d15953c75d476c2486a0b608a .

When I run the self-tuning process with beignet, it runs fine, but with repeated messages of "Beignet: "Work group size exceed Kernel's work group size."" during the tuning process, increasing in frequency as the tuning process goes on.

Where should I start digging deeper about this problem? And what further information should I provide to help solve this crash?

Build failure on gcc 7.2.1

/home/raun/workspace/neo/runtime/mem_obj/image.cpp: In static member function ‘static OCLRT::Image* OCLRT::Image::create(OCLRT::Context*, cl_mem_flags, const OCLRT::SurfaceFormatInfo*, const cl_image_desc*, const void*, cl_int&)’:
/home/raun/workspace/neo/runtime/mem_obj/image.cpp:146:24: error: this statement may fall through [-Werror=implicit-fallthrough=]
             imageDepth = imageDesc->image_depth;
             ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/home/raun/workspace/neo/runtime/mem_obj/image.cpp:147:9: note: here
         case CL_MEM_OBJECT_IMAGE2D:

Maybe have a way to disable warnings as errors?

kernel: Failed to release pages: bind_count=1, pages_pin_count=1, pin_global=0

Trying to run on current master of the driver, on Ubuntu 18.04 (kernel 4.15), and using mmap for the file, I got this in the kernel logs:

May  3 14:32:33 portable-alex kernel: [ 6993.616308] ------------[ cut here ]------------
May  3 14:32:33 portable-alex kernel: [ 6993.616310] Failed to release pages: bind_count=1, pages_pin_count=1, pin_global=0
May  3 14:32:33 portable-alex kernel: [ 6993.616395] WARNING: CPU: 1 PID: 8071 at /build/linux-QLn4bB/linux-4.15.0/drivers/gpu/drm/i915/i915_gem_userptr.c:89 cancel_userptr+0xe8/0xf0 [i915]
May  3 14:32:33 portable-alex kernel: [ 6993.616396] Modules linked in: msr ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables devlink cmac iptable_filter snd_hrtimer ipmi_devintf ipmi_msghandler bnep binfmt_misc nls_iso8859_1 arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_skl snd_soc_skl_ipc kvm_intel snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_hda_codec_hdmi snd_soc_acpi kvm uvcvideo videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 irqbypass cdc_mbim btrtl cdc_wdm crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul cdc_ncm
May  3 14:32:33 portable-alex kernel: [ 6993.616427]  videobuf2_core ghash_clmulni_intel videodev btbcm snd_hda_codec_generic usbnet pcbc iwlmvm snd_soc_core cdc_acm btintel mii media mac80211 snd_compress ac97_bus snd_pcm_dmaengine bluetooth aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ecdh_generic intel_cstate intel_rapl_perf iwlwifi idma64 virt_dma snd_seq_midi snd_hda_intel snd_seq_midi_event input_leds joydev cfg80211 snd_hda_codec serio_raw snd_hda_core snd_hwdep intel_lpss_pci thinkpad_acpi mei_me intel_wmi_thunderbolt wmi_bmof snd_pcm nvram snd_rawmidi mei processor_thermal_device intel_lpss shpchp intel_pch_thermal ucsi_acpi typec_ucsi intel_soc_dts_iosf snd_seq typec snd_seq_device snd_timer snd soundcore int3403_thermal int340x_thermal_zone int3400_thermal tpm_crb acpi_pad acpi_thermal_rel mac_hid sch_fq_codel parport_pc
May  3 14:32:33 portable-alex kernel: [ 6993.616457]  ppdev lp parport ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage i915 i2c_algo_bit drm_kms_helper e1000e psmouse syscopyarea sysfillrect nvme sysimgblt ptp pps_core fb_sys_fops nvme_core thunderbolt drm wmi video
May  3 14:32:33 portable-alex kernel: [ 6993.616477] CPU: 1 PID: 8071 Comm: kworker/u16:2 Tainted: G           OE    4.15.0-21-generic #22-Ubuntu
May  3 14:32:33 portable-alex kernel: [ 6993.616478] Hardware name: LENOVO 20L7CTO1WW/20L7CTO1WW, BIOS N22ET34W (1.11 ) 03/13/2018
May  3 14:32:33 portable-alex kernel: [ 6993.616499] Workqueue: i915-userptr-release cancel_userptr [i915]
May  3 14:32:33 portable-alex kernel: [ 6993.616515] RIP: 0010:cancel_userptr+0xe8/0xf0 [i915]
May  3 14:32:33 portable-alex kernel: [ 6993.616516] RSP: 0000:ffffa45552e57e60 EFLAGS: 00010282
May  3 14:32:33 portable-alex kernel: [ 6993.616518] RAX: 0000000000000000 RBX: ffff8b3a5115fc00 RCX: 0000000000000006
May  3 14:32:33 portable-alex kernel: [ 6993.616518] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8b3ac1456490
May  3 14:32:33 portable-alex kernel: [ 6993.616519] RBP: ffffa45552e57e78 R08: 0000000000000001 R09: 0000000000000577
May  3 14:32:33 portable-alex kernel: [ 6993.616520] R10: ffffa45552e57e38 R11: 0000000000000000 R12: ffff8b3a5115fda8
May  3 14:32:33 portable-alex kernel: [ 6993.616521] R13: 0000000000000000 R14: ffff8b38e1ad0700 R15: 0000000000000000
May  3 14:32:33 portable-alex kernel: [ 6993.616522] FS:  0000000000000000(0000) GS:ffff8b3ac1440000(0000) knlGS:0000000000000000
May  3 14:32:33 portable-alex kernel: [ 6993.616523] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  3 14:32:33 portable-alex kernel: [ 6993.616523] CR2: 00007f8b333c3000 CR3: 00000004e0c0a001 CR4: 00000000003606e0
May  3 14:32:33 portable-alex kernel: [ 6993.616524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  3 14:32:33 portable-alex kernel: [ 6993.616525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May  3 14:32:33 portable-alex kernel: [ 6993.616526] Call Trace:
May  3 14:32:33 portable-alex kernel: [ 6993.616530]  process_one_work+0x1de/0x410
May  3 14:32:33 portable-alex kernel: [ 6993.616532]  worker_thread+0x32/0x410
May  3 14:32:33 portable-alex kernel: [ 6993.616534]  kthread+0x121/0x140
May  3 14:32:33 portable-alex kernel: [ 6993.616535]  ? process_one_work+0x410/0x410
May  3 14:32:33 portable-alex kernel: [ 6993.616537]  ? kthread_create_worker_on_cpu+0x70/0x70
May  3 14:32:33 portable-alex kernel: [ 6993.616539]  ret_from_fork+0x35/0x40
May  3 14:32:33 portable-alex kernel: [ 6993.616540] Code: bf 46 ff ff eb c9 8b 93 c8 01 00 00 8b 8b a4 01 00 00 48 c7 c7 70 d2 61 c0 8b b3 9c 01 00 00 c6 05 f5 56 10 00 01 e8 38 85 94 ee <0f> 0b eb bc 0f 1f 40 00 0f 1f 44 00 00 55 ba 08 00 00 00 48 89 
May  3 14:32:33 portable-alex kernel: [ 6993.616566] ---[ end trace 5d962f9c324bf541 ]---

So far, a one-time message, but this might be a hint of something?

Segfault in clReleaseProgram when using shared_ptr

I'm on a Debian 9 system with an Intel Core i5-6200U CPU with a HD Graphics 520 (Skylake ULT GT2) GPU. I've installed the .deb driver from https://github.com/intel/compute-runtime/releases/tag/2018ww16-010750

I'm wrapping a cl_program object in a class with constructor/destructor. When I use that object with a std::shared_ptr and store it in a cache, I'm encountering a segfault in clReleaseProgram when cleaning-up. The exact same code works well using the old Intel driver and Intel's Beignet.

I've created a single-source ~100 line minimal reproducible example here. It can be compiled and run e.g. as follows: g++ --std=c++11 -g clReleaseProgram_segfault.cpp -o test -lOpenCL && ./test. The first few lines have 3 defines: if any of them is enabled, some small changes are made to the code and the segfault is gone. So it really seems to be a combination of three things: std::shared_ptr, the cache, and building the program.

The stack trace just points to clReleaseProgram, there are no debug symbols further in the libraries it seems:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6757a30 in ?? () from /opt/intel/opencl/libigdrcl.so
#0  0x00007ffff6757a30 in ?? () from /opt/intel/opencl/libigdrcl.so
#1  0x00007ffff67582b0 in ?? () from /opt/intel/opencl/libigdrcl.so
#2  0x00007ffff6727ab2 in ?? () from /opt/intel/opencl/libigdrcl.so
#3  0x00007ffff672897e in ?? () from /opt/intel/opencl/libigdrcl.so
#4  0x00007ffff6728b29 in ?? () from /opt/intel/opencl/libigdrcl.so
#5  0x00007ffff6710bcb in ?? () from /opt/intel/opencl/libigdrcl.so
#6  0x0000555555555972 in Program::~Program (this=0x55555585dd30, __in_chrg=<optimized out>) at clReleaseProgram_segfault.cpp:39
#7  0x0000555555557ffa in __gnu_cxx::new_allocator<Program>::destroy<Program> (this=0x55555585dd30, __p=0x55555585dd30) at /usr/include/c++/6/ext/new_allocator.h:124
#8  0x0000555555557fcd in std::allocator_traits<std::allocator<Program> >::destroy<Program> (__a=..., __p=0x55555585dd30) at /usr/include/c++/6/bits/alloc_traits.h:487
(...)
#21 0x00007ffff6fb52e8 in __libc_start_main (main=0x5555555554e1 <main()>, argc=1, argv=0x7fffffffe0a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe098) at ../csu/libc-start.c:325
#22 0x00005555555552ba in _start ()

Valgrind doesn't give much extra information. Anything I can still try? Hopefully with this example you can easily reproduce it and investigate.

Any ETA for Fine grained SVM on Skylake?

Hi,
first, thanks for providing precompiled binaries on a weekly basis!
I see you keep adding "Fine grained SVM is not supported in this release" to all releases.
Is it coming soon? Any ETA to share?

Thanks.

In-kernel debugging fails with vtune

Not sure if this is a bug in vtune, compute-runtime or something else, so I'm posting here.

I finally managed to get vtune to work with compute-runtime (the documentation needs to be more explicit! Only one slide presentation mentions the need for CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y, and I had to put libmd.so from intel/metrics-discovery in the vtune dir, which is documented nowhere!). gpu-hotspots works well, but gpu-profile does not.

Indeed, gpu-profiling produces "Error: Internal collection error has occurred. Please contact the technical support." which results in no in-kernel profiling.

cloc_tests segfaults

After a build, cloc_tests segfaults. It looks like some files are not copied into the bin directory. All other tests run and pass.

Log below:

[raun@localhost bin]$ ./cloc_tests
Iteration: 1
Error: Input file test_files/copybuffer.cl missing.

/home/raun/workspace/neo/unit_tests/offline_compiler/offline_compiler_tests.cpp:59: Failure
Expected: (nullptr) != (pOfflineCompiler), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL

/home/raun/workspace/neo/unit_tests/offline_compiler/offline_compiler_tests.cpp:60: Failure
Value of: retVal
  Actual: -5151
Expected: 0
[  FAILED  ] OfflineCompilerTests.GoodArgTest
Error: Input file test_files/copybuffer.cl missing.

/home/raun/workspace/neo/unit_tests/offline_compiler/offline_compiler_tests.cpp:77: Failure
Value of: internalOptions
Expected: has substring "cl_khr_3d_image_writes"
  Actual: ""
[  FAILED  ] OfflineCompilerTests.TestExtensions
Error: Input file test_files/copybuffer.cl missing.

/home/raun/workspace/neo/unit_tests/offline_compiler/offline_compiler_tests.cpp:90: Failure
Expected: (nullptr) != (pOfflineCompiler), actual: 8-byte object <00-00 00-00 00-00 00-00> vs NULL

/home/raun/workspace/neo/unit_tests/offline_compiler/offline_compiler_tests.cpp:91: Failure
Value of: retVal
  Actual: -5151
Expected: 0
Segmentation fault (core dumped)
[raun@localhost bin]$

Can neo be built on Windows?

I tried to build with Windows cmake and it seems some Linux-related headers are missing:
:\Neo-github\build\CMakeFiles\CMakeTmp\CheckIncludeFile.c(1): fatal error C1083: Cannot open include file: 'pthread.h': No such file or directory [D:\Neo-github\build\CMakeFiles\CMakeTmp\cmTC_e39e3.vcxproj]

Not sure if something is missing from my Windows cmake environment or if a Windows build is not in your plans.
Thanks,
Mariam

segfault in OCLRT::DrmMemoryManager::allocUserptr when not logged in graphically

I'm getting a segfault in allocUserptr when I'm not logged in to a graphical Xorg session. After a fresh reboot, GDM is running but I'm not logged in graphically yet. I then ssh into the machine and this happens:

[raun@localhost test]$ LD_LIBRARY_PATH=. gdb clinfo
GNU gdb (GDB) Fedora 8.0.1-35.fc27

(gdb) r
Starting program: /home/raun/test/clinfo
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26-24.fc27.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
KHR ICD trace at /home/raun/OpenCL-ICD-Loader/icd.c:68: attempting to add vendor /home/raun/workspace/build/bin/libigdrcl.so...

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff690cd19 in OCLRT::DrmMemoryManager::allocUserptr(unsigned long, unsigned long, unsigned long, bool) ()
   from /home/raun/workspace/build/bin/libigdrcl.so
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.2.1-2.fc27.x86_64 libstdc++-7.2.1-2.fc27.x86_6

If I log in graphically, the segfault when running OpenCL applications from my ssh session no longer happens. So the sequence is:

  • reboot
  • ssh to machine
  • run clinfo
    • get above error
  • log in graphically to the physical machine
  • return to ssh session
    • clinfo runs perfectly.

Being able to run OpenCL applications remotely without Xorg running would be very nice to have.

NEO is slower than Beignet with Intel HD Graphics 500

Hi,

I use BOINC on my NAS (Intel J3455 CPU with Intel HD Graphics 500) and it shows that the NEO driver (72 GFLOPS) is slower than the Beignet driver (96 GFLOPS).

j3455 - beignet

j3455 - neo

However, my laptop (Intel 8250U CPU with Intel UHD Graphics 620) is faster with the NEO driver (211 GFLOPS) than with Beignet (192 GFLOPS).

8250u - beignet

8250u - neo

Does Neo support work_group_reduce_min?

Hello.

1:16:24: error: implicit declaration of function 'work_group_reduce_min' is invalid in OpenCL
            dist_i_j = work_group_reduce_min(dist_i_j);
                       ^
1:16:24: note: did you mean 'sub_group_reduce_min'?
CTHeader.h:5863:39: note: 'sub_group_reduce_min' declared here
double  __attribute__((overloadable)) sub_group_reduce_min( double x );
                                      ^

Thank you.

Missing documentation on how to build and run an OpenCL example program

I managed to successfully build the runtime as a deb package and after its installation clinfo reports platforms and devices correctly. But now I'm struggling to compile an example program like [1]. It builds and works perfectly when linked to Beignet.

With the current runtime I get

$ gcc -g -w fft.c -o fft -I../opencl_headers -L/opt/intel/opencl -ligdrcl -lm
/tmp/ccYHHFKV.o: In function `fftCore':
/home/rojkov/work/vpg/opencl-book-example/fft.c:63: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:68: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:73: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:91: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:96: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:101: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:106: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:112: undefined reference to `clSetKernelArg'
/tmp/ccYHHFKV.o:/home/rojkov/work/vpg/opencl-book-example/fft.c:117: more undefined references to `clSetKernelArg' follow
/tmp/ccYHHFKV.o: In function `fftCore':
/home/rojkov/work/vpg/opencl-book-example/fft.c:151: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:163: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:169: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:176: undefined reference to `clWaitForEvents'
/home/rojkov/work/vpg/opencl-book-example/fft.c:182: undefined reference to `clGetEventProfilingInfo'
/home/rojkov/work/vpg/opencl-book-example/fft.c:190: undefined reference to `clGetEventProfilingInfo'
/home/rojkov/work/vpg/opencl-book-example/fft.c:204: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:211: undefined reference to `clWaitForEvents'
/home/rojkov/work/vpg/opencl-book-example/fft.c:219: undefined reference to `clReleaseKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:224: undefined reference to `clReleaseKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:229: undefined reference to `clReleaseKernel'
/tmp/ccYHHFKV.o: In function `main':
/home/rojkov/work/vpg/opencl-book-example/fft.c:306: undefined reference to `clGetPlatformIDs'
/home/rojkov/work/vpg/opencl-book-example/fft.c:311: undefined reference to `clGetDeviceIDs'
/home/rojkov/work/vpg/opencl-book-example/fft.c:319: undefined reference to `clCreateContext'
/home/rojkov/work/vpg/opencl-book-example/fft.c:326: undefined reference to `clCreateCommandQueue'
/home/rojkov/work/vpg/opencl-book-example/fft.c:334: undefined reference to `clCreateBuffer'
/home/rojkov/work/vpg/opencl-book-example/fft.c:340: undefined reference to `clCreateBuffer'
/home/rojkov/work/vpg/opencl-book-example/fft.c:346: undefined reference to `clCreateBuffer'
/home/rojkov/work/vpg/opencl-book-example/fft.c:355: undefined reference to `clEnqueueWriteBuffer'
/home/rojkov/work/vpg/opencl-book-example/fft.c:364: undefined reference to `clCreateProgramWithSource'
/home/rojkov/work/vpg/opencl-book-example/fft.c:372: undefined reference to `clBuildProgram'
/home/rojkov/work/vpg/opencl-book-example/fft.c:379: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:384: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:389: undefined reference to `clCreateKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:396: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:397: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:399: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:410: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:411: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:412: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:414: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:426: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:427: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:428: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:430: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:443: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:444: undefined reference to `clSetKernelArg'
/home/rojkov/work/vpg/opencl-book-example/fft.c:446: undefined reference to `clEnqueueNDRangeKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:457: undefined reference to `clEnqueueReadBuffer'
/home/rojkov/work/vpg/opencl-book-example/fft.c:482: undefined reference to `clFlush'
/home/rojkov/work/vpg/opencl-book-example/fft.c:487: undefined reference to `clFinish'
/home/rojkov/work/vpg/opencl-book-example/fft.c:492: undefined reference to `clReleaseKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:497: undefined reference to `clReleaseKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:502: undefined reference to `clReleaseKernel'
/home/rojkov/work/vpg/opencl-book-example/fft.c:507: undefined reference to `clReleaseProgram'
/home/rojkov/work/vpg/opencl-book-example/fft.c:512: undefined reference to `clReleaseMemObject'
/home/rojkov/work/vpg/opencl-book-example/fft.c:518: undefined reference to `clReleaseMemObject'
/home/rojkov/work/vpg/opencl-book-example/fft.c:524: undefined reference to `clReleaseMemObject'
/home/rojkov/work/vpg/opencl-book-example/fft.c:530: undefined reference to `clReleaseCommandQueue'
/home/rojkov/work/vpg/opencl-book-example/fft.c:535: undefined reference to `clReleaseContext'
collect2: error: ld returned 1 exit status

readelf shows that the missing functions exist in the lib but are not exported for some reason. The only exported functions seem to be clIcdGetPlatformIDsKHR, clGetPlatformInfo and clGetExtensionFunctionAddress. How is the lib supposed to be used then?

Could you please provide a short intro on how to use the runtime?

[1] https://github.com/rojkov/opencl-fft-example

Android x86 support (aka Android-IA) for Neo driver?

Hi,
I ask because, although I never owned an Android device with an x86 processor, either tablet or phone,
it seems Intel provided OpenCL support on these devices.
I even seem to remember that some years ago Intel shipped an Android OpenCL library and info on linking it with the Intel Compiler for Android and/or some Intel OpenCL SDK version.
Anyway, I believe these devices used closed source binary components similar to, say, the Windows driver at the time.
Now the Android-IA efforts seem to have moved towards an open source stack using Mesa as the graphics driver component of Android-IA.
As Mesa's CLOVER OpenCL driver is very "unadvanced", to put it politely, Intel users have the beignet option, which has had Android support for at least a couple of years:
https://cgit.freedesktop.org/beignet/log/?qt=grep&q=android
https://cgit.freedesktop.org/beignet/commit/?id=be0ae741a9064cebf5b5c9a277ae895086bdbddd
"docs/howto/android-build-howto.mdwn"
The case I'm interested in is Android x86 ISOs for PCs/laptops, where we have modern releases like:
http://www.android-x86.org/releases/releasenote-7-1-r1
which, using Mesa and VAAPI, expose OpenGL ES 3.2 and HW accelerated video decode (encode?) on computers with modern Intel HD graphics running these Android-x86 ISOs.
Lastly, builds with Vulkan support have very recently started appearing too, using the Mesa ANVIL driver. Vulkan support is very early: only VulkanCapsViewer works, while graphical demos or apps with Vulkan support, like the PPSSPP emulator, still have issues.
These are experimental Android 8.1 (Oreo) builds provided by @maurossi here:
https://drive.google.com/drive/folders/0B_OFHiIqgpSFTFpkQWc1eXV3ME0
which ship with the latest kernel 4.16rc1 + Mesa 18.1dev, using LLVM 6.0 for amdgpu DC.
Sadly, I have yet to see any of these ISOs integrate the beignet OpenCL driver, even though, while Google doesn't officially ship/endorse OpenCL on its phones, many (ARM64) devices ship with modern OpenCL drivers
(for example, Xiaomi devices with modern Qualcomm SoCs and, say, Adreno 5xx have OpenCL 2.0 support).
Of course there are also apps on the Play store, so Google isn't blocking them either (for example, the non-free version of Geekbench Compute has OpenCL support, similar to the Compubench CL app).

Sorry for the long story.
All of this is meant to motivate the Neo devs to also include Android-IA (Android-x86, or whatever you want to call it) support for this new driver, plus a possible build guide for integrating the NEO OpenCL driver into the standard Android-x86 ISO build process, so that ISOs can start appearing that support:
OpenGL ES 3.2 + Vulkan + OpenCL 2.x,
basically all APIs.

Also, not related to you, but some Intel Android devs or Intel Mesa devs should try to enable full OpenGL on Android.
Nvidia already does this on its Android devices (Shield tablet and TV), so I'm not asking for something out of this world.
Of course, full OpenGL support is already provided by the Intel Mesa driver; we only need to keep it built for Android-IA as well, and that should align perfectly with the new Wine for Android project (https://dl.winehq.org/wine-builds/android/):

extracted from private communication with Mauro Rossi:

more info on full GL on Android:
(by nvidia devtech)
http://jamesdolan.blogspot.com.es/2014/06/opengl-44-and-beyond-on-android.html
also seems Mesa full OpenGL support for Android was removed a while ago:
(egl: treat EGL_OPENGL_API as invalid on Android)
https://github.com/intel/external-mesa/commit/7563c39641d07a604f22a9d47861121ce80116f9
It says " /* OpenGL is not a valid/supported API on Android */ " but Nvidia is able to overcome this.
I assume this is similar to the OpenCL situation on Android: not officially supported by Google,
but some vendors and IHVs ship phones with it.
For example, Xiaomi phones with modern Qualcomm SoCs ship with an OpenCL 2.0 driver on Android.
I don't know if reverting this patch is all that is needed to have full GL on Mesa drivers on Android.

from earlier private communication:

Also, the Wine Android x86 project ships with OpenGL support, but it seems limited to passing calls through to the underlying OpenGL ES driver.
I don't know if this has been asked before, but Mesa having full GL support (albeit core profile only) would be a dream come true:
seeing full OpenGL support built too (not only OpenGL ES). Nvidia Shield devices (tablet and TV) support that as well. You can check by using
https://play.google.com/store/apps/details?id=de.saschawillems.glescapsviewer
On the Shield tablet it shows under EGL:
Client APIs(2):
OpenGL_ES
OpenGL
while Android-x86 ISOs show:
Client APIs(1):
OpenGL_ES
(This seems to be controlled by an EGL attribute, EGL_RENDERABLE_TYPE, set during EGL API context creation.)
With that, and a minor modification to Wine for Android so that it loads the underlying full GL Android driver, we should also be able to run lots of OpenGL Windows apps and games.
For example, the RPCS3 and Cemu emulators use OpenGL 4.5 (Cemu requires mesa-mild from GitHub, which adds minimal modifications for compatibility profile support).
I tested these on Vega with Mesa 18.1-dev and they work via Wine on Ubuntu, so they should also work on Android-x86.
Full GL support in Wine should also enable its D3D11 emulation layers, which since Wine 3.0 work on OpenGL implementations with core-profile-only support, like Mesa.

Slice/Subslice powergating

Hi,

We're currently working on a patch series in the i915 kernel driver to enable per context slice & subslice powergating : https://patchwork.freedesktop.org/series/42285/

Currently this feature is disabled by default in the i915 series and can be turned on by setting a sysfs entry. This means that the per-context settings are ignored by default (and by default nothing is powergated, as is the case on current kernels).
As far as I know this feature is only useful for performance gains on some media workloads.
Mesa is not planning to make use of this; there it only makes sense to run workloads on the maximum number of slices/subslices (as far as we're aware).

We're wondering what's the take from the compute developers?
Would this feature be of any use?

There is a side effect to enabling this feature: switching from one powergating configuration to another takes time (we measured a 50~60us delay on Skylake GT4; the delay grows with the size of the GT), and this delay will occur every time a context switch happens between two contexts with different powergating configurations.
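
To put the delay in perspective, here is a back-of-the-envelope sketch in Python. Only the 50~60us figure comes from the measurement above. The switch rates are made-up illustration values, the function name is hypothetical, and the model assumes the worst case in which every counted context switch crosses between two different powergating configurations.

```python
def powergating_overhead(switches_per_second, delay_seconds):
    """Fraction of wall-clock time spent reconfiguring powergating.

    Worst-case simplification: every counted context switch is assumed to
    cross between two different powergating configurations.
    """
    if switches_per_second < 0 or delay_seconds < 0:
        raise ValueError("rates and delays must be non-negative")
    # The share of time cannot exceed 100%.
    return min(1.0, switches_per_second * delay_seconds)


if __name__ == "__main__":
    # 55us is the midpoint of the 50~60us range measured on Skylake GT4.
    # The switch rates are hypothetical, not measured.
    for rate in (100, 1000, 5000):
        share = powergating_overhead(rate, 55e-6)
        print(f"{rate:>5} switches/s -> {share:.2%} of time")
```

Whether this cost matters depends on how often contexts with different configurations actually alternate, which this sketch does not try to model.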

Thanks for your comments!

segfault during compilation

The following build step (and other similar steps) crashes:

.../compute-runtime/build/bin/cloc -q -file copy_buffer_rect.igdrcl_built_in -device cfl -64 -out_dir .../compute-runtime/build/bin/built_ins/x64/cfl -cpp_file -options -cl-kernel-arg-info

Backtrace:

#0  0x00007f0585cd2c62 in llvm::MemoryBuffer::getMemBufferRef() const () from .../compute-runtime/build/bin/libigdccl.so
#1  0x00007f05850cf8ca in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7ffdcd9729d0, pOutputArgs=pOutputArgs@entry=0x7ffdcd9729a0, 
    inputDataFormatTemp=<optimized out>, IGCPlatform=..., profilingTimerResolution=<optimized out>)
    at .../compute-runtime/igc/IGC/AdaptorOCL/dllInterfaceCompute.cpp:729
#2  0x00007f0585174408 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate (this=<optimized out>, outVersion=<optimized out>, src=<optimized out>, 
    options=<optimized out>, internalOptions=<optimized out>, tracingOptions=<optimized out>, tracingOptionsCount=<optimized out>)
    at .../compute-runtime/igc/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:173
#3  0x0000557a3d0ceb45 in OCLRT::OfflineCompiler::buildSourceCode() ()
#4  0x0000557a3d0d43d2 in OCLRT::OfflineCompiler::build() ()
#5  0x0000557a3d0cd66c in main ()

Clarify delta and my own Intel Windows OpenCL Neo vs classic comparison and issues found..

To start, please clarify this sentence:
"The driver has the following functional delta compared to previously released drivers"
Does "delta" mean that the functionality is added relative to the classic driver, or removed?

From what I see, "delta" seems to mean removed relative to classic;
at least that's what I see on a Windows driver that ships NEO.

Comparison of Intel HD 530 OpenCL, classic on driver 23.20.16.4911 vs NEO on driver 23.20.16.4933, based on clinfo:

REGRESSIONS:
"Max read/write image args" goes from 128 to 0 (not supported?)
Preferred platform/global/local atomic alignment goes from 64 to 0 (is this good or bad?)
No support for the cl_khr_mipmap_image, cl_khr_mipmap_image_writes and cl_khr_throttle_hints extensions
ENHANCEMENTS:
Max memory allocation size doubled from 2 GB to approximately 4 GB;
the max constant buffer size changed similarly, on my 32 GB RAM system.
"Device OpenCL C version" goes from 2.0 to 2.1, although "Version" was already 2.1.
Are there any new OpenCL C features in 2.1?
Added the cl_khr_il_program extension. I'm not sure what to make of it: SPIR-V was already supported in the classic driver, so it seems the extension is simply exposed now.

Full clinfo reports are below.

Finally, it's strange that "Fine grain system" remains "No" on both the Intel CPU and Intel GPU devices, on both drivers.
Then driver 23.20.16.4944 appeared a few days ago. I installed it and now clinfo no longer detects the Intel GPU device. It goes from:
Platform Name: Intel(R) OpenCL
Number of devices: 2
to:
Platform Name: Intel(R) OpenCL
Number of devices: 1
but the CPU device (the only Intel device detected) now reports:
Fine grain system: Yes

(Note: I don't include the CPU device report, as the only other changes are:
"Driver version" field goes from 7.5.0.2 to 7.6.0.611
"Version" field goes from "OpenCL 2.1 (Build 2)" to "OpenCL 2.1 (Build 611)"
)

drv 23.20.16.4911
Platform Name: Intel(R) OpenCL
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 8086h
Max compute units: 24
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1150Mhz
Address bits: 64
Max memory allocation: 2147483647
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 128
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 524288
Global memory size: 13687249306
Constant buffer size: 2147483647
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 1
Max pipe packet size: 1024
Max global variable size: 65536
Max global variable preferred total size: 2147483647
Max read/write image args: 128
Max on device events: 1024
Queue on device max size: 67108864
Max on device queues: 1
Queue on device preferred size: 131072
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: Yes
Preferred platform atomic alignment: 64
Preferred global atomic alignment: 64
Preferred local atomic alignment: 64
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 83
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 000001B894BC4640
Name: Intel(R) HD Graphics 530
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 2.0
Driver version: 23.20.16.4911
Profile: FULL_PROFILE
Version: OpenCL 2.1
Extensions: cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_d3d11_nv12_media_sharing cl_intel_device_side_avc_motion_estimation cl_intel_driver_diagnostics cl_intel_dx9_media_sharing cl_intel_media_block_io cl_intel_motion_estimation cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_required_subgroup_size cl_intel_simultaneous_sharing cl_intel_subgroups cl_intel_subgroups_short cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_fp64 cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir cl_khr_subgroups cl_khr_throttle_hints

drv 23.20.16.4933
Platform Name: Intel(R) OpenCL
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 8086h
Max compute units: 24
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1150Mhz
Address bits: 64
Max memory allocation: 4294959104
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 128
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 524288
Global memory size: 13695635456
Constant buffer size: 4294959104
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 1
Max pipe packet size: 1024
Max global variable size: 65536
Max global variable preferred total size: 4294959104
Max read/write image args: 0
Max on device events: 1024
Queue on device max size: 67108864
Max on device queues: 1
Queue on device preferred size: 131072
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: Yes
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 83
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 00000225D9FC7700
Name: Intel(R) HD Graphics 530
Vendor: Intel(R) Corporation
Device OpenCL C version: OpenCL C 2.1
Driver version: 23.20.16.4933
Profile: FULL_PROFILE
Version: OpenCL 2.1 NEO
Extensions: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_simultaneous_sharing
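
The comparison above was made by reading the two reports side by side. A small Python sketch along these lines could automate it. The function names are hypothetical, and the parser only assumes the "Key: value" layout of the dumps in this issue; repeated keys such as "Out-of-Order" are disambiguated with a numeric suffix.

```python
def parse_clinfo(text):
    """Parse 'Key: value' lines from a clinfo dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon carry no value
        key = key.strip()
        # Keys such as "Out-of-Order" appear more than once in a dump;
        # suffix later occurrences so that none is silently overwritten.
        name, n = key, 1
        while name in fields:
            n += 1
            name = f"{key} #{n}"
        fields[name] = value.strip()
    return fields


def diff_clinfo(old_text, new_text):
    """Return {key: (old_value, new_value)} for every field that differs."""
    old, new = parse_clinfo(old_text), parse_clinfo(new_text)
    changed = {}
    for key in sorted(old.keys() | new.keys()):
        a, b = old.get(key), new.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


def diff_extensions(old_extensions, new_extensions):
    """Return (removed, added) extension names, each sorted."""
    old, new = set(old_extensions.split()), set(new_extensions.split())
    return sorted(old - new), sorted(new - old)
```

Feeding it the two reports would list fields like the ones called out under REGRESSIONS and ENHANCEMENTS, and passing the two "Extensions" values to diff_extensions would separate removed from newly exposed extensions. Fields that legitimately differ between runs, such as "Platform ID", would show up as well and need to be ignored by hand.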

clGetDeviceInfo(CL_DEVICE_VERSION) demands too large of a buffer

The spec says that CL_INVALID_VALUE will be returned if the size is less than the size of the declared return type.

The NEO driver (.5018) on Windows 10 returns CL_INVALID_VALUE if the buffer isn't at least 16 bytes which happens to be large enough for the complete device string: "OpenCL 2.1 NEO " + NUL.

I expect clGetDeviceInfo(char[]) to record a trailing NUL but otherwise work on a buffer of 1 byte or more.

Use case: I only want to obtain the major.minor values.

Trimmed down example:

  struct {
    cl_uchar opencl_space[7]; // "OpenCL_"
    cl_uchar major;
    cl_uchar dot;
    cl_uchar minor;
#if 1 // Intel NEO is too restrictive
    cl_uchar space;
    cl_uchar vendor[32];
#endif
  } device_version;

  cl(GetDeviceInfo(device_id,
                   CL_DEVICE_VERSION,
                   16, // sizeof(device_version),
                   &device_version,
                   NULL));
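
One possible workaround is to ask clGetDeviceInfo for the required size first (via its param_value_size_ret argument), fetch the whole string, and then parse it. The parsing step can be sketched in Python as below. parse_device_version is a hypothetical helper name, and the layout it assumes, "OpenCL <major>.<minor> <vendor-specific information>", is the one the OpenCL specification gives for CL_DEVICE_VERSION.

```python
import re

# "OpenCL", a space, major.minor, then either a space or the end of string.
_VERSION_RE = re.compile(r"OpenCL (\d+)\.(\d+)(?: |$)")


def parse_device_version(raw):
    """Return (major, minor) from a CL_DEVICE_VERSION string.

    Accepts either str or the raw NUL-terminated bytes copied out of the
    buffer filled by clGetDeviceInfo.
    """
    if isinstance(raw, bytes):
        # Drop the trailing NUL and anything after it.
        raw = raw.split(b"\0", 1)[0].decode("ascii")
    match = _VERSION_RE.match(raw)
    if match is None:
        raise ValueError(f"unexpected CL_DEVICE_VERSION string: {raw!r}")
    return int(match.group(1)), int(match.group(2))
```

This sidesteps the buffer-size question rather than answering it: it does not settle whether the driver should accept a shorter buffer.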

Request for an extension like cl_qcom_compressed_image..

Hi,
Now that Intel OpenCL development is open source for all platforms, anyone can, with some effort, implement what they want on top of it. Still, I'm requesting an extension that, surprisingly, almost no vendor has exposed yet.
I mean support for OpenCL kernels reading from compressed textures (images, in OpenCL terms), that is, textures compressed with formats such as ASTC, DXTx, BPTC, etc. that the GPU running OpenCL already supports. Note that, AFAIK, this functionality is already supported by all hardware, since OpenGL/OpenGL ES/DirectX compute shaders can read from such textures. It still puzzles me that neither Khronos nor the vendors expose it in OpenCL. Qualcomm seems to be the only one, with cl_qcom_compressed_image. I can't find information on that extension and it doesn't seem to be published, but from the name it seems clear that it supports compressed textures. Such images would of course have to be read_only, as writing to compressed textures from compute shaders or OpenCL kernels seems very difficult to support in hardware and there seems to be no strong reason to support it.

Of course, reading from compressed textures in CL kernels has its uses. I'm thinking of custom render pipelines, rasterizers or raytracers implemented with OpenCL kernels. If we want these novel/hybrid pipelines to use the same assets (stored efficiently as compressed textures) as the current GPU pipeline, we need this functionality.
Projects like these were tested with some success years ago:
http://research.nvidia.com/publication/high-performance-software-rasterization-gpus
although that one is in CUDA. I remember the paper noting, under current limitations/future work, that compute APIs like CUDA didn't expose compressed textures at the time. AFAIK that still hasn't changed, even though the limitation was pointed out by Nvidia's own researchers.
Also, for example, AMD has the
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonProRender-Baikal
https://github.com/GPUOpen-LibrariesAndSDKs/RadeonRays_SDK
projects, which are raytracers on OpenCL and show complex scenes. But these scenes can't use compressed textures, as they could if rendered natively through GPU graphics APIs, because OpenCL doesn't allow it.

What do you think?

Build failure with version 18.29.11114

We tried to compile this pre-release https://github.com/intel/compute-runtime/releases/tag/18.29.11114 and we have problems with KhronosGroup/OpenCL-Headers@f039db6.

We got the following failure:

In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:19:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:20:10: fatal error: 'CL/cl.h' file not found
#include "CL/cl.h"
         ^~~~~~~~~
In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/link.cpp:19:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:20:10: fatal error: 'CL/cl.h' file not found
#include "CL/cl.h"
         ^~~~~~~~~
1 error generated.
make[5]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/link.cpp.o] Error 1
make[5]: *** Waiting for unfinished jobs....
1 error generated.
make[5]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/common_clang.cpp.o] Error 1
make[4]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/all] Error 2
make[3]: *** [all] Error 2
make[2]: *** [igc/IGC/BiFModule/clang_build/src/src/cclang-stamp/cclang-build] Error 2
make[1]: *** [igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/all] Error 2
make: *** [all] Error 2

As you can see, the output is fatal error: 'CL/cl.h' file not found. And it is true: the folder hierarchy of that repository at this commit differs from its HEAD state. We then moved opencl11/CL/ to CL/ and tried to build (which failed again):

In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:19:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:77:11: error: unknown type name 'cl_kernel_arg_address_qualifier'
  virtual cl_kernel_arg_address_qualifier
          ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:79:11: error: unknown type name 'cl_kernel_arg_access_qualifier'
  virtual cl_kernel_arg_access_qualifier
          ^
In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/link.cpp:19:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:77:11: error: unknown type name 'cl_kernel_arg_address_qualifier'
  virtual cl_kernel_arg_address_qualifier
         ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:81:11: error: unknown type name 'cl_kernel_arg_type_qualifier'
  virtual cl_kernel_arg_type_qualifier
          ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:79:11: error: unknown type name 'cl_kernel_arg_access_qualifier'
  virtual cl_kernel_arg_access_qualifier
          ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:81:11: error: unknown type name 'cl_kernel_arg_type_qualifier'
  virtual cl_kernel_arg_type_qualifier
          ^
In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:24:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/options.h:64:7: warning: 'OpenCLArgList' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
class OpenCLArgList : public llvm::opt::ArgList {
      ^
In file included from /root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/link.cpp:21:
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/options.h:64:7: warning: 'OpenCLArgList' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
class OpenCLArgList : public llvm::opt::ArgList {
      ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/link.cpp:361:24: error: use of undeclared identifier 'CL_LINK_PROGRAM_FAILURE'
    pResult->setResult(CL_LINK_PROGRAM_FAILURE);
                       ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/link.cpp:370:24: error: use of undeclared identifier 'CL_LINK_PROGRAM_FAILURE'
    pResult->setResult(CL_LINK_PROGRAM_FAILURE);
                       ^
1 warning and 5 errors generated.
make[5]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/link.cpp.o] Error 1
make[5]: *** Waiting for unfinished jobs....
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:252:14: error: use of undeclared identifier 'CL_COMPILE_PROGRAM_FAILURE'
      return CL_COMPILE_PROGRAM_FAILURE;
             ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:292:35: error: use of undeclared identifier 'CL_COMPILE_PROGRAM_FAILURE'
    return success ? CL_SUCCESS : CL_COMPILE_PROGRAM_FAILURE;
                                  ^
/root/opencl_build/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:303:12: error: use of undeclared identifier 'CL_COMPILE_PROGRAM_FAILURE'
    return CL_COMPILE_PROGRAM_FAILURE;
           ^
1 warning and 6 errors generated.
make[5]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/common_clang.cpp.o] Error 1
make[4]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/all] Error 2
make[3]: *** [all] Error 2
make[2]: *** [igc/IGC/BiFModule/clang_build/src/src/cclang-stamp/cclang-build] Error 2
make[1]: *** [igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/all] Error 2
make: *** [all] Error 2

Then we moved opencl22/CL/ to CL/. It compiled but tests failed:

/root/opencl_build/neo/unit_tests/event/event_tests.cpp:578: Failure
Value of: output.c_str()
  Actual: "waitForCompletionWithTimeout counter 0\ntest"
Expected: "test"

/root/opencl_build/neo/unit_tests/memory_leak_listener.cpp:56: Failure
Value of: (int)MemoryManagement::failingAllocation
  Actual: -1
Expected: leak
Which is: 95
To locate call stack, change the value of captureCallStacks to true
[  FAILED  ] InternalsEventTest.givenBlockedKernelWithPrintfWhenSubmittedThenPrintOutput

=====================
==   ULTs FAILED   ==
=====================
Tests run:      11551
Tests passed:   11550
Tests failed:   1
Tests disabled: 19
Time elapsed:  342 ms
=====================
[  FAILED  ] InternalsEventTest.givenBlockedKernelWithPrintfWhenSubmittedThenPrintOutput

make[2]: *** [run_bxt_unit_tests] Error 1
make[1]: *** [unit_tests/CMakeFiles/run_bxt_unit_tests.dir/all] Error 2
make: *** [all] Error 2

I want to mention that if we build HEAD (following the instructions at https://github.com/intel/compute-runtime/blob/master/documentation/BUILD_Centos.md) everything is fine and we get intel-opencl-18.30-0.x86_64-igdrcl.rpm.

The question is: which version/folder of CL/ from KhronosGroup/OpenCL-Headers should we use?

Build failure of test_kernel_Gen9lp with Debian 9 and Skylake GPU

I'm on a Debian 9 system with an Intel Core i5-6200U CPU with a HD Graphics 520 (Skylake ULT GT2) GPU. I'm following the instructions for building on Ubuntu, except that I had to build Clang 4.0 myself from source.

I'm getting quite far with building the whole thing, but then I encounter the following issue (single-threaded make):

Scanning dependencies of target unit_tests
[ 91%] Built target unit_tests
[ 93%] Built target test_kernels_Gen9lp
[ 93%] Built target test_kernel_Gen9lp
[ 93%] Generating ../bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.bc, ../bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.bin, ../bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.gen, ../bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.dbg

error: undefined reference to `_Z12get_local_idj()'
undefined reference to `_Z12get_group_idj()'

error: backend compiler failed build.

Build failed with error code: -11
unit_tests/CMakeFiles/test_kernel_debug_enable_Gen9lp.dir/build.make:64: recipe for target 'bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.bc' failed
make[2]: *** [bin/Gen9lp/test_files/x64/-cl-kernel-debug-enable_Gen9lp.bc] Error 245
CMakeFiles/Makefile2:5745: recipe for target 'unit_tests/CMakeFiles/test_kernel_debug_enable_Gen9lp.dir/all' failed
make[1]: *** [unit_tests/CMakeFiles/test_kernel_debug_enable_Gen9lp.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

Have you seen this before? Did you ever build the software on Debian? Please let me know if there is anything I can try or any further information you need.

Support new SPIR-V 1.3 subgroup features in OpenCL also?

Hi,
I don't know if this is a dumb question, as it may need some Khronos effort, but could the new subgroup functionality available in Vulkan 1.1 and exposed in SPIR-V 1.3 also be exposed in OpenCL 2.x?
I don't know exactly how it compares to SPV_intel_subgroups, but it seems rich enough and it is a cross-vendor extension, so it would be nice to have support for it too.
It has basic, vote, shuffle, shuffle relative and quad ops, etc., and these are already supported by the Intel Anvil Vulkan driver in Mesa (you can see it here: https://twitter.com/oscarbg81/status/971748592372285440, if I posted the Twitter link correctly),
so it's not a matter of Intel GPU hardware support.
Thanks.

Having OpenCL CPU device support under only one platform (NEO platform) jointly with GPU is possible?

Hi,
I don't know if this is a lame question, but this is what we get with the Windows NEO drivers, after all. Right now the Linux NEO OpenCL driver provides one platform with one GPU device and no CPU device.
It seems the Intel SRB 5.0 package provided both devices, CPU and GPU, under one platform:
(The intel-opencl-r5.0 (SRB5.0) Linux driver package enables OpenCL 1.2 or 2.0 on the GPU/CPU for the following Intel® processors:)
I don't remember what Beignet provides, but it is probably GPU-only, similar to NEO.
Do you plan to provide an (OpenCL) CPU device under the NEO OpenCL platform, either by reusing the closed-source driver from the Intel SRB 5.0 package or by open sourcing the Intel OpenCL CPU driver?
In short: is there currently any solution for having both devices, CPU and GPU, under one platform?
I may try to hack something together with the SRB 5.0 OpenCL.so loader, as the GPU driver from the SRB package seems to be named similarly to NEO's (igdrcl or something like that).

In any case, Intel also open sourcing the OpenCL CPU part would be awesome, for example for providing OpenCL 2.1 with SPIR-V CPU support on macOS, which is stalled at OpenCL 1.2. If it were open source, it wouldn't need Intel's or Apple's approval: support could be added by the community, even if in a hacky way.
This is similar to how the Intel OpenSWR project, merged into Mesa, can be compiled on macOS to provide a fast OpenGL CPU driver that even uses the latest AVX extensions.
We would not obtain a macOS OpenCL.framework, but a dynamic library that "homebrew" Mac apps wanting OpenCL 2.1 or SPIR-V support could link against, as long as they only need CPU support.
Of course, I don't know if this is the right place to ask.
Feel free to point me to the right forum if needed;
perhaps I should open a separate issue for it.

Thanks.

Can you clarify which platforms will get OpenCL 2.2 support? Will SKL with Gen9 get it?

The title says it all.
On the 01.org site we see:
"with OpenCL 2.2 software technology support coming later in 2018"
I just want to know whether it will be supported on current platforms like Skylake desktop GPUs, only on the upcoming Cannon Lake (Gen10), or whether it will even need Ice Lake (Gen11). (I know these future products and graphics generations from public Mesa commits.)

Unit tests fail to build on gcc 7.2.1

In file included from /home/raun/workspace/neo/unit_tests/built_ins/built_in_tests.cpp:24:0:
/home/raun/workspace/neo/unit_tests/gen_common/test.h:87:62: error: could not convert ‘testing::internal::GetTypeId<BuiltInTests>()’ from ‘testing::internal::TypeId {aka const void*}’ to ‘testing::internal::CodeLocation’
                  ::testing::internal::GetTypeId<test_fixture>())
                                                              ^
/home/raun/workspace/neo/unit_tests/gen_common/test.h:77:14: note: in definition of macro ‘HWTEST_TEST_’
             (parent_id),                                                                       \
              ^~~~~~~~~

Build failure with version 18.27.11048: igfxfmid.h: No such file or directory

I'm getting the following compile error:

Scanning dependencies of target cloc
[  0%] Building CXX object offline_compiler/CMakeFiles/cloc.dir/offline_compiler.cpp.o
/storage/linux/abs/intel-compute-runtime-git/src/neo/offline_compiler/offline_compiler.cpp:31:10: fatal error: igfxfmid.h: No such file or directory
 #include "igfxfmid.h"
          ^~~~~~~~~~~~
compilation terminated.
make[2]: *** [offline_compiler/CMakeFiles/cloc.dir/build.make:63: offline_compiler/CMakeFiles/cloc.dir/offline_compiler.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:11826: offline_compiler/CMakeFiles/cloc.dir/all] Error 2
make: *** [Makefile:152: all] Error 2

When using make with multiple jobs (-j option), it shows multiple occurrences of this not found file igfxfmid.h, and also an error complaining about the not found file gtsysinfo.h.

  • Current git master gives me the same error.

  • Version 18.26.10987 builds fine for me.

  • A git bisect shows that this is being caused by commit 561cdb7.

Switching occurrences of #include "igfxfmid.h" to #include "common/igfxfmid.h" across the source code fixes the corresponding error for me (and likewise switching #include "gtsysinfo.h" to #include "common/gtsysinfo.h"). But there are a lot of files to change and I did not test all of them to check whether it really fixes all errors.
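
For anyone who wants to try the same workaround across the whole tree, it can be scripted. The following Python sketch is only an illustration of the manual edit described above, not a proper fix: the helper names and the list of file suffixes are assumptions, and it rewrites files in place, so it should only be run on a clean checkout.

```python
import re
from pathlib import Path

# The two headers named in this report.
HEADERS = ("igfxfmid.h", "gtsysinfo.h")

_INCLUDE_RE = re.compile(
    r'(#\s*include\s*")('
    + "|".join(re.escape(h) for h in HEADERS)
    + r')(")'
)


def fix_includes(source):
    """Prefix bare includes of the listed headers with common/."""
    # Includes that already use common/ do not match, so this is idempotent.
    return _INCLUDE_RE.sub(r"\1common/\2\3", source)


def fix_tree(root, suffixes=(".h", ".hpp", ".cpp", ".inl")):
    """Rewrite matching files under root in place; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="surrogateescape")
        fixed = fix_includes(text)
        if fixed != text:
            path.write_text(fixed, encoding="utf-8", errors="surrogateescape")
            changed.append(path)
    return changed
```

Adding the directory that contains these headers to the include path would avoid touching source files at all, if the build setup allows it.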

OS: Arch Linux x86_64

Cmake failure

I'm having trouble following the build instructions.

First. The compute-runtime and graphics-compiler instructions both call for identical workspaces and both say to create a 'build' directory for cmake. The two projects can't share the same build directory.

  • Am I expected to have 2 separate workspaces, 1 for the runtime and 1 for the compiler?
  • Should I have 2 build directories in a single workspace, 1 for the runtime and 1 for the compiler?

Second. GoogleMock was merged into GoogleTest several years ago. The CMakeLists.txt for the runtime still seems to assume gtest and gmock are separate projects.

  • Should I be using an old gmock and gtest from when they were separate projects?

Third. Related to the second, the runtime has several dependencies. Are there known good labels to use when building the workspace?

Publish rpm in Github releases

Hi guys,
Do you have any plans to publish .rpm packages for CentOS in GitHub releases?

You already have .deb packages here, but it would be nice to have .rpm too.

Compilation error on debian?

When compiling on Debian buster using clang 5.0.1-2, the make step gives me:

Scanning dependencies of target igfx_gmmumd_excite
[  1%] Building CXX object gmmlib/Source/GmmLib/CMakeFiles/igfx_gmmumd_excite.dir/__/Common/AssertTracer/AssertTracer.cpp.o
clang: error: unknown argument: '-fno-tree-pre'
clang: warning: -Wl,--no-undefined: 'linker' input unused [-Wunused-command-line-argument]

Are there some compile settings I should change?

i7 8700K not found by clinfo on Kubuntu 18.04; missing steps?

I'm using Kubuntu 18.04 with an i7 8700K and a GeForce GTX 1070 Ti. I've installed the nvidia-cuda-toolkit which adds support for OpenCL on the GPU. Then, I followed the build instructions of Intel Compute Runtime and installed intel-opencl-igdrcl as described. However, clinfo (from the official package lists) is not able to find the Intel OpenCL implementation.

hendrik@i7-8700K:~/Downloads/intel_compute_runtime/build$ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 9.1.84
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 1070 Ti
  Device Vendor                                   NVIDIA Corporation
......

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

However, there is an intel.icd file:

hendrik@i7-8700K:~/Downloads/intel_compute_runtime/build$ ls /etc/OpenCL/vendors/
intel.icd  nvidia.icd
hendrik@i7-8700K:~/Downloads/intel_compute_runtime/build$ cat /etc/OpenCL/vendors/intel.icd 
/opt/intel/opencl/libigdrcl.so

I've seen #20 (comment) and have added the Khronos ICD loader as described there, but the issue remains the same.

Could you please help me? What am I doing wrong? Thanks in advance for your time!
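
One way to narrow down a report like this is to check what each .icd file points at and whether that library can be loaded at all. The sketch below is a hypothetical diagnostic helper, not a tool shipped with the runtime; it assumes each .icd file holds a single library path or name on its first line, as intel.icd does above. Note that ctypes.CDLL really loads the library, so its initializers run.

```python
import ctypes
from pathlib import Path


def inspect_icd_dir(vendors_dir="/etc/OpenCL/vendors"):
    """Return one (icd_file, library, status) tuple per .icd file found."""
    results = []
    for icd in sorted(Path(vendors_dir).glob("*.icd")):
        lines = icd.read_text().splitlines()
        library = lines[0].strip() if lines else ""
        if not library:
            status = "empty .icd file"
        elif "/" in library and not Path(library).exists():
            status = "library path does not exist"
        else:
            try:
                # A failure here usually names the missing dependency.
                ctypes.CDLL(library)
                status = "loads"
            except OSError as err:
                status = f"dlopen failed: {err}"
        results.append((icd.name, library, status))
    return results
```

Printing the tuples returned by inspect_icd_dir() shows, per vendor file, whether the problem is a wrong path, a library that fails to load, or something later in platform enumeration.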

Supports GVT-g i.e. works under this kind of virtualization..

Hi,
I think this is the last issue I have about the NEO driver:
on GVT-g setup guide (https://github.com/intel/gvt-linux/wiki/GVTg_Setup_Guide) we see:

6 Features Supported
6.1 Highlights Features
Compute:
OpenCL2.0 for Windows/Linux guest
6.2 Validated Benchmarks
OpenCL:
LuxMark, Beignet

So it seems GVT-g supports the Beignet OpenCL 2.0 driver. Has the new Intel NEO OpenCL driver already been tested on a Linux guest virtual machine using GVT-g to see if it works correctly?
I hope there are no issues either with the new GVT-g QEMU dma_buf support announced this week: https://01.org/igvt-g/blogs/wangbo85/2018/sharing-guest-framebuffer-host

thanks..
Oscar.

Unable to get working on Broadwell i7 5600U, or missing steps?

I have built and installed without any error (following exactly the steps from https://github.com/intel/compute-runtime#building) on my laptop (Ubuntu 17.10, Broadwell i7-5600U), but clinfo reports 0 platforms found. Checking the clinfo calls with strace shows it is properly picking up /opt/intel/opencl/libigdrcl.so.

Is Broadwell supported right now? Readme states it is https://github.com/intel/compute-runtime#supported-platforms, but the "GenX" naming is unclear to me, and https://ark.intel.com/products/85215/Intel-Core-i7-5600U-Processor-4M-Cache-up-to-3_20-GHz reports "5th gen". And the GPU is Intel HD Graphics 5500, whose naming would be consistent with 5th gen, but it's not listed on https://www.intel.com/content/www/us/en/architecture-and-technology/visual-technology/graphics-overview.html.

So, am I trying to get it working on an unsupported platform, or did I miss something needed to get it working? Thanks!

OpenCL on Minnowboard Max

Hi everyone,

I'm using a Minnowboard Max with Intel Atom E3825 CPU and its GPU and just compiled and installed the ComputeRuntime (deb-file) on the platform.

But clinfo doesn't show me an OpenCL device such as the GPU or CPU. I installed all required libraries, such as ocl-icd, ocl-header and ocl-icd-libopencl1, and also the compute runtime.

Is there any library I have to link somewhere else or am I missing some dependencies?

Thanks

clGetPlatformIDs fails with the included Fedora 27 ICD

Using the ICD included in Fedora 27, I get the following error from NEO. The Beignet, POCL, and AMDGPU Mesa implementations all work fine with this ICD.

[raun@localhost test]$ gdb clinfo
GNU gdb (GDB) Fedora 8.0.1-35.fc27

(gdb) r
Starting program: /home/raun/test/clinfo
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26-24.fc27.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
terminate called after throwing an instance of 'cl::Error'
  what():  clGetPlatformIDs

Program received signal SIGABRT, Aborted.
0x00007ffff6c7266b in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.2.1-2.fc27.x86_64 libstdc++-7.2.1-2.fc27.x86_64 ocl-icd-2.2.11-4.fc27.x86_64
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7dd5d50  0x00007ffff7df4c50  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00007ffff7bcf1a0  0x00007ffff7bd26b6  Yes (*)     /lib64/librt.so.1
0x00007ffff787f880  0x00007ffff7928985  Yes (*)     /lib64/libm.so.6
0x00007ffff7674e50  0x00007ffff7675b7e  Yes (*)     /lib64/libdl.so.2
0x00007ffff7458940  0x00007ffff746a848  Yes (*)     /lib64/libOpenCL.so.1
0x00007ffff723fac0  0x00007ffff724fde5  Yes (*)     /lib64/libgcc_s.so.1
0x00007ffff7023b10  0x00007ffff70322d1  Yes (*)     /lib64/libpthread.so.0
0x00007ffff6c5b770  0x00007ffff6dc816c  Yes (*)     /lib64/libc.so.6
0x00007ffff67f7450  0x00007ffff693008a  Yes         /home/raun/workspace/build/bin/libigdrcl.so
0x00007ffff64ea100  0x00007ffff659a7f8  Yes (*)     /lib64/libstdc++.so.6
(*): Shared library is missing debugging information.
(gdb)

clGetPlatformIDs is returning -1001. You can see from the gdb log that libigdrcl.so was dynamically loaded by the ICD.

If I build my own ICD from the Khronos top of tree, clinfo executes just fine.
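
For readers decoding the number: -1001 is CL_PLATFORM_NOT_FOUND_KHR, the code defined by the cl_khr_icd extension for the case where the loader ends up with no usable platform. A minimal lookup sketch follows; the table is deliberately partial and the helper name is an assumption for illustration.

```python
# Partial table of OpenCL status codes; values are from the Khronos headers.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",  # defined by cl_khr_icd
}


def cl_error_name(code):
    """Map a numeric OpenCL status to its symbolic name, if it is in the table."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error {code}")


print(cl_error_name(-1001))  # prints CL_PLATFORM_NOT_FOUND_KHR
```

So the loader did open libigdrcl.so, as the gdb log shows, but did not end up registering a platform from it.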

Build fails without `opencl-headers` installed

Following the build instructions (in a Docker container from a blank Ubuntu 18.04 image), I get the error below. apt install opencl-headers fixes this. I find it odd, as it appears to me that we are building the headers too, so maybe they're built but not being recognized?

[ 86%] Built target opencl.pcm.target
[ 86%] Built target opencl.headers.target
[ 88%] Built target cl_headers
[ 88%] Building CXX object projects/cclang/CMakeFiles/opencl_clang.dir/common_clang.cpp.o
In file included from /dockerws/workspace/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.cpp:19:
/dockerws/workspace/build/igc/IGC/BiFModule/clang_build/src/src/cclang/projects/cclang/common_clang.h:20:10: fatal error: 'CL/cl.h' file not found
#include "CL/cl.h"
         ^~~~~~~~~
1 error generated.
projects/cclang/CMakeFiles/opencl_clang.dir/build.make:132: recipe for target 'projects/cclang/CMakeFiles/opencl_clang.dir/common_clang.cpp.o' failed
make[5]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/common_clang.cpp.o] Error 1
CMakeFiles/Makefile2:6408: recipe for target 'projects/cclang/CMakeFiles/opencl_clang.dir/all' failed
make[4]: *** [projects/cclang/CMakeFiles/opencl_clang.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make[3]: *** [all] Error 2
igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/build.make:223: recipe for target 'igc/IGC/BiFModule/clang_build/src/src/cclang-stamp/cclang-build' failed
make[2]: *** [igc/IGC/BiFModule/clang_build/src/src/cclang-stamp/cclang-build] Error 2
CMakeFiles/Makefile2:10770: recipe for target 'igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/all' failed
make[1]: *** [igc/IGC/BiFModule/clang_build/CMakeFiles/cclang.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make: *** [all] Error 2
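
A quick pre-build check can tell whether CL/cl.h is visible in the system include directories before starting a long build. This is a hypothetical helper sketched for illustration; the default directory list is an assumption and does not reflect the include paths the project's CMake configuration actually passes to the compiler.

```python
from pathlib import Path

# Assumed system include directories; a real build may add others via -I flags.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(relative_name="CL/cl.h", include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the include directories that contain the given header."""
    return [d for d in include_dirs if (Path(d) / relative_name).is_file()]
```

An empty list from find_header() on a blank image would be consistent with the error above, and a non-empty list after installing opencl-headers would be consistent with the fix.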

'IKBL_GT2_R_ULX_DEVICE_F0_ID' was not declared in this scope

Build breaking on i5-5250u. Seems to be from this commit. Cc @LukaszJobczyk

/dockerws/workspace/neo/runtime/dll/linux/devices/devices_base.m:82:9: error: 'IKBL_GT2_R_ULX_DEVICE_F0_ID' was not declared in this scope
 DEVICE( IKBL_GT2_R_ULX_DEVICE_F0_ID,         KBL_1x3x8,  GTTYPE_GT2 )
         ^
/dockerws/workspace/neo/runtime/dll/linux/drm_neo_create.cpp:38:36: note: in definition of macro 'DEVICE'
 #define DEVICE(devId, gt, gtType) {devId, &gt::hwInfo, &gt::setupGtSystemInfo, gtType},
                                    ^~~~~
/dockerws/workspace/neo/runtime/dll/linux/devices/devices_base.m:82:9: note: suggested alternative: 'IKBL_GT2_R_ULT_DEVICE_F0_ID'
 DEVICE( IKBL_GT2_R_ULX_DEVICE_F0_ID,         KBL_1x3x8,  GTTYPE_GT2 )
         ^
/dockerws/workspace/neo/runtime/dll/linux/drm_neo_create.cpp:38:36: note: in definition of macro 'DEVICE'
 #define DEVICE(devId, gt, gtType) {devId, &gt::hwInfo, &gt::setupGtSystemInfo, gtType},
                                    ^~~~~
igdrcl_lib_release/CMakeFiles/igdrcl_dll.dir/build.make:686: recipe for target 'igdrcl_lib_release/CMakeFiles/igdrcl_dll.dir/dll/linux/drm_neo_create.cpp.o' failed
make[2]: *** [igdrcl_lib_release/CMakeFiles/igdrcl_dll.dir/dll/linux/drm_neo_create.cpp.o] Error 1
CMakeFiles/Makefile2:12090: recipe for target 'igdrcl_lib_release/CMakeFiles/igdrcl_dll.dir/all' failed
make[1]: *** [igdrcl_lib_release/CMakeFiles/igdrcl_dll.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make: *** [all] Error 2

Failed to build debug version

I tried to build the debug version to do some debugging, but the build fails. There are basically two issues:

  1. EnableDxbcDump is not defined but is used in IGC/AdaptorCommon/customApi.cpp.
  2. An invalid product family is passed in to IGC::SetWorkaroundTable(), which triggers an assert there.

Please help to fix them.

Comparing Win Neo vs Linux Neo driver and some Linux benchmarks..

OK,
I compiled the driver today and ran an up-to-date clinfo tool (full report below).

The extension differences vs. the Windows driver are as follows.
The Linux driver has (and the Windows driver does not):
cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue

Could you give more info on the "cl_khr_create_command_queue" extension? I can't find it in the OpenCL registry.

The Windows driver has (and the Linux driver does not):
cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing

As you mention in the README, under the "delta" vs. the other driver:
OpenGL sharing with MESA driver..

This means there is currently no OpenGL interop on NEO with Mesa, right?
Are you working on it, or do you have plans to implement it?

thanks..
Report:

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  1.0
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               24
  Max clock frequency                             0MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              32
  Max sub-groups per work group                   32
  Sub-group sizes (Intel)                         8x16x32
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 1 / 1       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              3435970560 (3.2GiB)
  Error Correction support                        No
  Max memory allocation                           1717985280 (1.6GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             1717985280 (1.6GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        524288 (512KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            107374080 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 pixels
    Max 2D image size                             16384x16384 pixels
    Max planar YUV image size                     16384x16380 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
    Max number of read/write image args           128
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        1717985280 (1.6GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                131072 (128KiB)
    Max size                                      67108864 (64MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      83ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Sub-group independent forward progress        Yes
    IL version                                    SPIR-V_1.0 
    SPIR versions                                 1.2 
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Motion Estimation accelerator version (Intel)   2
    Device-side AVC Motion Estimation version     1
      Supports texture sampler use                Yes
      Supports preemption                         No
  Device Extensions                               cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 
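
The Linux vs. Windows extension comparison at the top of this report can be reproduced mechanically from two "Platform Extensions" strings. The sketch below uses abbreviated strings: the differing extension names are the ones listed in the report, while cl_khr_icd stands in for the shared part of the lists, which is an assumption because the full Windows list is not included here.

```python
def diff_extensions(first, second):
    """Return (only_in_first, only_in_second) for two space-separated extension lists."""
    a, b = set(first.split()), set(second.split())
    return sorted(a - b), sorted(b - a)


# Abbreviated, illustrative inputs; paste the full clinfo strings in practice.
linux = "cl_khr_icd cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue"
windows = "cl_khr_icd cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing"

only_linux, only_windows = diff_extensions(linux, windows)
print("Linux only:  ", " ".join(only_linux))
print("Windows only:", " ".join(only_windows))
```

Using sets makes the comparison independent of the order in which each driver reports its extensions.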

libreoffice crashes since installation of the new intel opencl driver stack

I successfully installed the new intel opencl driver stack (checked with clinfo and everything looks fine).

I can use darktable with activated opencl without problems, but whenever I open an existing document in libreoffice or save a new one it crashes with the message "Couldn't open /usr/share/kde4/config/kdebug.areas". I searched the internet and found some information on problems with opencl and libreoffice, but not connected to the new driver stack.

Deactivating opencl in libreoffice and deleting the user profile makes no difference. Uninstalling opencl unfortunately doesn't help either, libreoffice still crashes, even as root. My system hasn't changed recently apart from the installation of opencl.

system:
ubuntu 17.10, kernel 4.13.0-36, kaby lake i5-7300HQ, Intel HD Graphics 630
libreoffice 5.4.5.1

Has someone already tried the opencl stack with ubuntu and libreoffice and faced the same problem? What can I do to resolve the issue?

output of clinfo:

Number of platforms                               2
  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 LINUX
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 (Build 10)
  Driver Version                                  1.2.0.10
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             2500MHz
  Device Partition                                (core)
    Max number of sub-devices                     4
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Preferred work group size multiple              128                                                                                                                                                        
  Max sub-groups per work group                   1                                                                                                                                                          
  Preferred / native vector sizes                                                                                                                                                                            
    char                                                 1 / 32                                                                                                                                              
    short                                                1 / 16                                                                                                                                              
    int                                                  1 / 8                                                                                                                                               
    long                                                 1 / 4                                                                                                                                               
    half                                                 0 / 0        (n/a)                                                                                                                                  
    float                                                1 / 8                                                                                                                                               
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              33649938432 (31.34GiB)
  Error Correction support                        No
  Max memory allocation                           8412484608 (7.835GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   Yes
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            525780288 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    65535
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.0
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  1.0
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               23
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256

Number of platforms                               2
  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 LINUX
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 (Build 10)
  Driver Version                                  1.2.0.10
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             2500MHz
  Device Partition                                (core)
    Max number of sub-devices                     4
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Preferred work group size multiple              128                                                                                                                                                        
  Max sub-groups per work group                   1                                                                                                                                                          
  Preferred / native vector sizes                                                                                                                                                                            
    char                                                 1 / 32                                                                                                                                              
    short                                                1 / 16                                                                                                                                              
    int                                                  1 / 8                                                                                                                                               
    long                                                 1 / 4                                                                                                                                               
    half                                                 0 / 0        (n/a)                                                                                                                                  
    float                                                1 / 8                                                                                                                                               
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              33649938432 (31.34GiB)
  Error Correction support                        No
  Max memory allocation                           8412484608 (7.835GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   Yes
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            525780288 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    65535
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.0
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  1.0
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               23
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256

Neo does not support non-uniform work group sizes?

Hello.

clEnqueueNDRangeKernel returns "CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and number of work-items specified by global_work_size is not evenly divisible by size of work-group given by local_work_size"

I use an Intel Core i3-6100 processor.
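
A common way to avoid the error condition quoted above is to round each global size up to the next multiple of the local size before calling clEnqueueNDRangeKernel. The sketch below shows only that arithmetic. The helper name `padded_global_size` is hypothetical and is not part of NEO or OpenCL, and the approach assumes the kernel guards against the extra work-items itself (for example by returning early when its global ID is past the real problem size).

```python
def padded_global_size(global_size, local_size):
    """Round each global dimension up to a multiple of the local dimension.

    Hypothetical helper for illustration; not an OpenCL or NEO API.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global_size and local_size need the same number of dimensions")
    padded = []
    for g, l in zip(global_size, local_size):
        if g <= 0 or l <= 0:
            raise ValueError("sizes must be positive")
        # Ceiling division gives the number of work-groups needed;
        # multiplying back yields an evenly divisible global size.
        padded.append(((g + l - 1) // l) * l)
    return tuple(padded)


if __name__ == "__main__":
    # 1000 work-items do not divide evenly into work-groups of 256.
    print(padded_global_size((1000,), (256,)))  # (1024,)
```

With the padded size, the number of work-items is evenly divisible by the work-group size in every dimension, so the condition in the quoted error text no longer applies. The cost is a partially idle last work-group.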

CentOS 7.5 RPM Installation Fails

Hi,

I checked out all the versions you used in your latest release and built from scratch (your instructions are a bit incomplete, as the Google test libraries are still needed), and everything went through without any issue.

However, installing the RPM packages fails:
rpm -i intel-opencl-1.0-0.x86_64-igdrcl.rpm
file /opt from install of intel-opencl-igdrcl-1.0.0-1.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /etc/ld.so.conf.d from install of intel-opencl-igdrcl-1.0.0-1.x86_64 conflicts with file from package glibc-2.17-222.el7.x86_64

The package seems to touch the /opt and /etc/ld.so.conf.d directories, but they already exist.

I used --force during the rpm install and it seems to work; however, it might be good to fix that.

ls /etc/ld.so.conf.d
atlas-x86_64.conf dyninst-x86_64.conf kernel-3.10.0-862.el7.x86_64.conf libiscsi-x86_64.conf mariadb-x86_64.conf qt-x86_64.conf
rpm -i --force /nfs_home/aheineck/Misc/gen_stack/build/intel-opencl-1.0-0.x86_64-igdrcl.rpm
ls /etc/ld.so.conf.d
atlas-x86_64.conf dyninst-x86_64.conf kernel-3.10.0-862.el7.x86_64.conf libintelopencl.conf libiscsi-x86_64.conf mariadb-x86_64.conf qt-x86_64.conf

Thanks,
Alex

Docker builds?

Not sure if there's enough demand, but docker builds on some common OSes would be great:

  • users can use the image directly
  • users can inspect the dockerfile to see how it's compiled etc.
  • this could integrate with #46 in that each docker container could build the required package for each OS etc.

Does the Neo OpenCL driver already support SPV_INTEL_subgroups?

Hi,
I'm very interested in this extension, as it seems to bring support for the cl_intel_subgroups and cl_intel_subgroups_short extensions to the SPIR-V world.
I don't know whether the extension is official yet.
It's strange, because it can't be found in the SPIR-V extension registry:
https://www.khronos.org/registry/spir-v/
but it appears in:
https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.pdf
and in commits such as:
[Mesa-dev] [PATCH 03/10] spirv: Import the latest 1.0.12 header and JSON from Khronos

I just found by inspection that the Khronos SPIRV-LLVM compiler already supports it:
KhronosGroup/SPIRV-LLVM@4496e89
There is also support in Clang:
"CLANG support for cl_intel_subgroups and cl_intel_subgroups_short"
KhronosGroup/SPIR@7ad2092

I was lazy and still haven't compiled these two projects together with the Neo OpenCL driver to see whether SPIR-V kernels using the cl_intel_subgroups_* capabilities run correctly on Neo.
I thought I could avoid compiling the Khronos SPIR and SPIRV-LLVM projects by using the Intel OpenCL code builder from the Intel OpenCL SDK, which can also generate SPIR-V code from kernels through a nice GUI.
I tested with the latest version available (Intel® SDK for OpenCL™ Applications 2017 R2, version 7.0.0.2567) and it didn't work. (Of course, generating an OpenCL vendor-specific binary for Intel HD Graphics from a kernel that uses the cl_intel_subgroups_* extensions worked correctly.)

So I guess it will be in Intel OpenCL SDK 2018 whenever it comes out, right?

Of course, when I ask whether Neo supports this SPV extension, I also mean the OpenCL Neo driver that has started appearing in recent Windows drivers, for example 23.20.16.4933.

I look forward to your answers to all these questions.
