xdna-driver's People

Contributors

amd-xiaomren, atgutier, gyang1099, houlz0507, jgmelber, k2, keryell, mamin506, maxzhen, nishadsaraf, raphaelbamd, rbramand-xilinx, samuelcwils, vengutta18, xdavidz, xuwd1

xdna-driver's Issues

Broken Ubuntu Version Check for Debian

While building xrt, I encounter a broken Ubuntu version check. (I use Debian Unstable.)

Build Error:

CMake Error at CMake/cpackLin.cmake:94 (if):
  if given arguments:

    "(" "debian" "MATCHES" "^(ubuntu)" ")" "AND" "(" "STREQUAL" "23.10" ")"

  Unknown arguments specified
Call Stack (most recent call first):
  CMake/nativeLnx.cmake:199 (include)
  CMakeLists.txt:123 (include)

Commenting out the code is the quick fix for me, but raising the issue for proper handling:

--- cpackLin.cmake.orig	2024-06-04 21:40:18.696599826 -0700
+++ cpackLin.cmake	2024-06-04 21:40:34.634035125 -0700
@@ -91,7 +91,7 @@
       uuid-dev (>= 2.27.1)")
   endif()

-  if ((${LINUX_FLAVOR} MATCHES "^(ubuntu)") AND (${LINUX_VERSION} STREQUAL "23.10"))
+#  if ((${LINUX_FLAVOR} MATCHES "^(ubuntu)") AND (${LINUX_VERSION} STREQUAL "23.10"))
     # Workaround for the following class of cpack build failure on Ubuntu 23.10
     # CMake Error at /usr/share/cmake-3.27/Modules/Internal/CPack/CPackDeb.cmake:348 (message):
     #   CPackDeb: dpkg-shlibdeps: 'dpkg-shlibdeps: error: no dependency information
@@ -104,10 +104,10 @@
     # build/Release/_CPack_Packages/Linux/DEB/xrt_202410.2.17.0_23.10-amd64/xrt directory
     # Adding an empty DEBIAN directory somehow convinces dpkg-shlibdeps to behave sanely.

-    message("-- Enable Ubuntu 23.10 cpack dpkg-shlibdeps failure workaround")
-    file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" "Workaround for cpack bug on Ubuntu 23.10")
-    install(FILES "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" DESTINATION "${XRT_INSTALL_DIR}/DEBIAN")
-  endif()
+#    message("-- Enable Ubuntu 23.10 cpack dpkg-shlibdeps failure workaround")
+#    file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" "Workaround for cpack bug on Ubuntu 23.10")
+#    install(FILES "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" DESTINATION "${XRT_INSTALL_DIR}/DEBIAN")
+#  endif()

   if (DEFINED CROSS_COMPILE)
     if (${aarch} STREQUAL "aarch64")
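
The quoted arguments in the error show that `${LINUX_VERSION}` expanded to nothing, which leaves `STREQUAL` without a left operand. A minimal sketch for checking what the distribution reports, assuming (this is not confirmed by the build scripts quoted here) that the version comes from `VERSION_ID` in `/etc/os-release`. The helper name `os_release_value` is hypothetical:

```shell
# Print the value of KEY from an os-release style file, with double quotes removed.
# Prints nothing when the key is absent.
os_release_value() {
    local key="$1" file="${2:-/etc/os-release}"
    sed -n "s/^${key}=//p" "$file" | tr -d '"' | head -n 1
}

# Usage (not run automatically):
#   os_release_value ID
#   os_release_value VERSION_ID
```

If the second command prints nothing on the affected system, quoting both variables in the `if()` condition (`"${LINUX_FLAVOR}"`, `"${LINUX_VERSION}"`) would let CMake compare against an empty string instead of failing to parse the condition.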

Help for install xdna-driver "No such device with index '0'"

I'm having trouble running the example after building a kernel with the newest XRT and driver for an AMD Ryzen 7 8845HS.

Everything below runs inside a Proxmox VE (PVE) virtual machine.

Here is what I checked at each step.

Check hardware

To make sure the driver can be installed, I checked the requirements on both the host and the VM.

Host check
root@pve:~# dmesg | grep -i iommu
# boot options
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_iommu_type1.allow_unsafe_interrupts=1  video=efifb:off
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.063268] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off
[    0.647008] iommu: Default domain type: Passthrough (set via kernel command line)
# found this
[    0.683854] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.683910] pci 0000:00:01.0: Adding to iommu group 0
[    0.683927] pci 0000:00:01.2: Adding to iommu group 1
[    0.683944] pci 0000:00:01.4: Adding to iommu group 2
[    0.683973] pci 0000:00:02.0: Adding to iommu group 3
[    0.683990] pci 0000:00:02.1: Adding to iommu group 4
[    0.684006] pci 0000:00:02.2: Adding to iommu group 5
[    0.684022] pci 0000:00:02.3: Adding to iommu group 6
[    0.684045] pci 0000:00:03.0: Adding to iommu group 7
[    0.684063] pci 0000:00:03.1: Adding to iommu group 8
[    0.684087] pci 0000:00:04.0: Adding to iommu group 9
[    0.684105] pci 0000:00:04.1: Adding to iommu group 10
[    0.684133] pci 0000:00:08.0: Adding to iommu group 11
[    0.684149] pci 0000:00:08.1: Adding to iommu group 12
[    0.684165] pci 0000:00:08.2: Adding to iommu group 13
[    0.684181] pci 0000:00:08.3: Adding to iommu group 14
[    0.684209] pci 0000:00:14.0: Adding to iommu group 15
[    0.684223] pci 0000:00:14.3: Adding to iommu group 15
[    0.684292] pci 0000:00:18.0: Adding to iommu group 16
[    0.684306] pci 0000:00:18.1: Adding to iommu group 16
[    0.684321] pci 0000:00:18.2: Adding to iommu group 16
[    0.684336] pci 0000:00:18.3: Adding to iommu group 16
[    0.684351] pci 0000:00:18.4: Adding to iommu group 16
[    0.684366] pci 0000:00:18.5: Adding to iommu group 16
[    0.684383] pci 0000:00:18.6: Adding to iommu group 16
[    0.684397] pci 0000:00:18.7: Adding to iommu group 16
[    0.684418] pci 0000:01:00.0: Adding to iommu group 17
[    0.684434] pci 0000:02:00.0: Adding to iommu group 18
[    0.684450] pci 0000:03:00.0: Adding to iommu group 19
[    0.684466] pci 0000:04:00.0: Adding to iommu group 20
[    0.684483] pci 0000:05:00.0: Adding to iommu group 21
[    0.684513] pci 0000:c6:00.0: Adding to iommu group 22
[    0.684530] pci 0000:c6:00.1: Adding to iommu group 23
[    0.684547] pci 0000:c6:00.2: Adding to iommu group 24
[    0.684565] pci 0000:c6:00.3: Adding to iommu group 25
[    0.684582] pci 0000:c6:00.4: Adding to iommu group 26
[    0.684600] pci 0000:c6:00.6: Adding to iommu group 27
[    0.684619] pci 0000:c7:00.0: Adding to iommu group 28
[    0.684639] pci 0000:c7:00.1: Adding to iommu group 29
[    0.684657] pci 0000:c8:00.0: Adding to iommu group 30
[    0.684677] pci 0000:c8:00.3: Adding to iommu group 31
[    0.684696] pci 0000:c8:00.4: Adding to iommu group 32
[    0.684714] pci 0000:c8:00.5: Adding to iommu group 33
[    0.684733] pci 0000:c8:00.6: Adding to iommu group 34
# found this
[    0.687914] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
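
When a device is passed through to a VM, it is useful to know which IOMMU group it was placed in on the host. A minimal sketch that extracts the group number for one PCI address from `dmesg` text on stdin. The helper name `iommu_group_of` is hypothetical, and the address in the usage line is only an example taken from the log above:

```shell
# Read dmesg text on stdin and print the IOMMU group assigned to the given PCI address.
iommu_group_of() {
    local bdf
    # Escape dots so the address is matched literally, not as regex wildcards.
    bdf="$(printf '%s' "$1" | sed 's/\./\\./g')"
    sed -n "s/^.*pci ${bdf}: Adding to iommu group \([0-9][0-9]*\)\$/\1/p"
}

# Usage (not run automatically):
#   dmesg | iommu_group_of 0000:c6:00.0
```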

VM check

Check CPU

root@vm:~# lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                AuthenticAMD
# found this
  Model name:             AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
    CPU family:           25
    Model:                117
    Thread(s) per core:   1
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             2
    BogoMIPS:             7585.70
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
                           pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pd
                          pe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pn
                          i pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_
                          deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_l
                          egacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw per
                          fctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc
                          _adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed
                          adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx5
                          12vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr
                           wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasi
                          d pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi um
                          ip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512
                          _bitalg avx512_vpopcntdq rdpid fsrm flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         AMD-V
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    256 KiB (4 instances)
  L1i:                    256 KiB (4 instances)
  L2:                     2 MiB (4 instances)
  L3:                     64 MiB (4 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Vulnerable: Safe RET, no microcode
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitizat
                          ion
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP di
                          sabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

Check IOMMU and DRM

root@vm:~# cat /proc/cmdline
# found this
# IOMMU is not enabled on the kernel command line, but it is still detected as enabled
BOOT_IMAGE=/vmlinuz-6.8.0-38-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro

root@vm:~# sudo dmesg | grep -i iommu
# found this: the IOMMU query returned values
[    0.163549] iommu: Default domain type: Translated
[    0.163549] iommu: DMA domain TLB invalidation policy: lazy mode

root@vm:~#  sudo lsmod | grep drm
# found this: DRM modules are also listed
amddrm_ttm_helper      12288  1 amdgpu
amdttm                118784  2 amdgpu,amddrm_ttm_helper
amddrm_buddy           20480  1 amdgpu
drm_exec               16384  1 amdgpu
drm_suballoc_helper    16384  1 amdgpu
drm_display_helper    253952  1 amdgpu
cec                    98304  1 drm_display_helper
drm_vram_helper        24576  1 bochs
drm_ttm_helper         12288  2 bochs,drm_vram_helper
ttm                   114688  2 drm_vram_helper,drm_ttm_helper


root@vm:~#  sudo dmesg | grep drm
[    0.396334] ACPI: bus type drm_connector registered
[    0.431417] [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
[    0.432952] simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
[    0.731790] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
[    0.731861] [drm] Found bochs VGA, ID 0xb0c5.
[    0.731864] [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfeb96000.
[    0.732780] [drm] Found EDID data blob.
[    0.732952] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
[    0.733705] fbcon: bochs-drmdrmfb (fb0) is primary device
[    1.034047] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[    1.883668] [drm] amdgpu kernel modesetting enabled.
[    1.883681] [drm] amdgpu version: 6.7.0
[    1.883688] [drm] OS DRM version: 6.8.0
[    1.916430] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x1900 0x2014:0x8001 0xC5).
[    1.918362] [drm] register mmio base: 0xFEA00000
[    1.918370] [drm] register mmio size: 524288
[    1.921893] [drm] add ip block number 0 <soc21_common>
[    1.921904] [drm] add ip block number 1 <gmc_v11_0>
[    1.921911] [drm] add ip block number 2 <ih_v6_0>
[    1.921918] [drm] add ip block number 3 <psp>
[    1.921925] [drm] add ip block number 4 <smu>
[    1.922193] [drm] add ip block number 5 <dm>
[    1.922394] [drm] add ip block number 6 <gfx_v11_0>
[    1.922591] [drm] add ip block number 7 <sdma_v6_0>
[    1.922777] [drm] add ip block number 8 <vcn_v4_0>
[    1.922959] [drm] add ip block number 9 <jpeg_v4_0>
[    1.923137] [drm] add ip block number 10 <mes_v11_0>
[    1.932912] [drm] BIOS signature incorrect 5b 69
[    1.933147] [drm] BIOS header is broken
[    1.940333] [drm] BIOS signature incorrect 5b 69
[    2.090508] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[    2.390508] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[    2.689184] systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
[    2.999504] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[    5.211507] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND

Check PCI devices

root@vm:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
# found this
00:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
# found this
00:11.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
00:12.0 Ethernet controller: Red Hat, Inc. Virtio network device
# found this
00:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

The GPU, the IPU, and the audio device are all visible in the VM.

XRT and xdna-driver both built successfully and were installed:
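
Seeing the device in plain `lspci` output does not show whether a kernel driver is bound to it. A minimal sketch, assuming the NPU is listed as a "Signal processing controller" as in the output above, that reads `lspci -k` text and reports the bound driver. The helper name `npu_bound_driver` is hypothetical:

```shell
# Read `lspci -k` output on stdin; print "<slot> <driver>" for each
# "Signal processing controller" entry, using "none" when no driver is bound.
npu_bound_driver() {
    awk '
        /^[0-9a-f]/ {                        # device header lines start with a hex digit
            if (slot != "") print slot, drv  # flush the previous matching device
            slot = ""; drv = "none"
            if ($0 ~ /Signal processing controller/) slot = $1
        }
        /Kernel driver in use:/ && slot != "" { drv = $NF }
        END { if (slot != "") print slot, drv }
    '
}

# Usage (not run automatically):
#   lspci -k | npu_bound_driver
```

A result of `none` would suggest looking at the kernel module first, before looking at XRT.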

root@vm:~# ls /opt/xilinx/xrt/lib
libcontainer_mpd_plugin.so  libxrt_core.so                libxrt_hwemu_static.a
libxdp_core.so              libxrt_core.so.2              libxrt_noop.so
libxdp_core.so.2            libxrt_core.so.2.18.0         libxrt_noop.so.2
libxdp_core.so.2.18.0       libxrt_core_static.a          libxrt_noop.so.2.18.0
libxilinxopencl.so          libxrt_coreutil.so            libxrt++.so
libxilinxopencl.so.2        libxrt_coreutil.so.2          libxrt++.so.2
libxilinxopencl.so.2.18.0   libxrt_coreutil.so.2.18.0     libxrt++.so.2.18.0
libxilinxopencl_static.a    libxrt_coreutil_static.a      libxrt++_static.a
libxma2api.so               libxrt_driver_xdna.so         libxrt_swemu.so
libxma2api.so.2             libxrt_driver_xdna.so.2       libxrt_swemu.so.2
libxma2api.so.2.18.0        libxrt_driver_xdna.so.2.18.0  libxrt_swemu.so.2.18.0
libxma2plugin.so            libxrt_hwemu.so               libxrt_swemu_static.a
libxma2plugin.so.2          libxrt_hwemu.so.2             xrt
libxma2plugin.so.2.18.0     libxrt_hwemu.so.2.18.0
root@vm:~# ls /usr/lib/firmware/amdnpu/1502_00 -alh
total 316K
drwxr-xr-x 2 root root 4.0K Jul 19 18:58 .
drwxr-xr-x 6 root root 4.0K Jul 19 17:26 ..
-rw-r--r-- 1 root root 305K Jul 19 15:58 npu.sbin

Then I followed the checking steps from #50.

Checking with xrt-smi:

root@vm:~# ./xrt/build/Release/opt/xilinx/xrt/bin/xrt-smi examine
System Configuration
  OS Name              : Linux
  Release              : 6.8.0-38-generic
  Machine              : x86_64
  CPU Cores            : 4
  Memory               : 7941 MB
  Distribution         : Ubuntu 24.04 LTS
  GLIBC                : 2.39
  Model                : Standard PC (i440FX + PIIX, 1996)
  BIOS vendor          : SeaBIOS
  BIOS version         : rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org

XRT
  Version              : 2.18.0
  Branch               : HEAD
  Hash                 : 54b1a0335ef517415d17206d30365cf4a2c380d0
  Hash Date            : 2024-07-19 18:10:24
  xocl                 : unknown, unknown
  xclmgmt              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?

Devices present
# found this 
  0 devices found

Here is the problem: why are 0 devices found?

Then I tried to install the driver:

# cwd is ./xdna-driver/build/Release
root@vm:~#  ./opt/xilinx/xrt/amdxdna/dkms_driver.sh --install
XILINX_XRT is not set properly

Why is XILINX_XRT not set properly? I have installed XRT and xdna-driver, so it seems XRT was not installed successfully.

So I reinstalled XRT:

root@vm:~# sudo apt reinstall ./xrt_202420.2.18.0_24.04-amd64-xrt.deb  ./xrt_202420.2.18.0_24.04-amd64-xbflash.deb
[sudo] password for never:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'xrt' instead of './xrt_202420.2.18.0_24.04-amd64-xrt.deb'
Note, selecting 'xrt-xbflash' instead of './xrt_202420.2.18.0_24.04-amd64-xbflash.deb'
0 upgraded, 0 newly installed, 2 reinstalled, 0 to remove and 1 not upgraded.
Need to get 0 B/16.2 MB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 /home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xrt.deb xrt amd64 2.18.0 [16.2 MB]
Get:2 /home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xbflash.deb xrt-xbflash amd64 2.18.0 [59.5 kB]
(Reading database ... 138133 files and directories currently installed.)
Preparing to unpack .../xrt_202420.2.18.0_24.04-amd64-xrt.deb ...
Unpacking xrt (2.18.0) over (2.18.0) ...
Preparing to unpack .../xrt_202420.2.18.0_24.04-amd64-xbflash.deb ...
Unpacking xrt-xbflash (2.18.0) over (2.18.0) ...
Setting up xrt (2.18.0) ...
Setting up xrt-xbflash (2.18.0) ...
Scanning processes...
Scanning linux images...

Running kernel seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
N: Download is performed unsandboxed as root as file '/home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xrt.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)

Check the driver packages:

root@vm:~# dpkg -l | grep xdna
ii  xrt_plugin-amdxdna                     2.18.0                                  amd64        XDNA driver plugin for Xilinx RunTime

root@vm:~# dpkg -l | grep xrt
ii  xrt                                    2.18.0                                  amd64        Runtime stack for use with AMD platforms
ii  xrt-container                          2.18.0                                  amd64        Runtime stack for use with AMD platforms
ii  xrt-xbflash                            2.18.0                                  amd64        Runtime stack for use with AMD platforms
ii  xrt_plugin-amdxdna                     2.18.0                                  amd64        XDNA driver plugin for Xilinx RunTime

The "XILINX_XRT is not set properly" problem remains.

Switch to root and export XILINX_XRT=/opt/xilinx/xrt:

root@vm:~# export XILINX_XRT=/opt/xilinx/xrt
root@vm:~#  ./opt/xilinx/xrt/amdxdna/dkms_driver.sh --install
Installing xrt-amdxdna-2.18.0 from /opt/xilinx/xrt/amdxdna...
Module xrt-amdxdna-2.18.0 for kernel 6.8.0-38-generic (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.

amdxdna.ko.zst:
 - Uninstallation
   - Deleting from: /lib/modules/6.8.0-38-generic/updates/dkms/
 - Original module
   - No original module was found for this module on this kernel.
   - Use the dkms install command to reinstall any previous module version.
depmod...

amdxdna.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-38-generic/updates/dkms/
depmod...
Successfully intalled and enabled DKMS for xrt-amdxdna/2.18.0
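
The manual export above can be wrapped in a small guard that runs before the DKMS script. This is a sketch under the assumption that XRT lives under `/opt/xilinx/xrt`, as in this report. The helper name `ensure_xilinx_xrt` is hypothetical and is not part of XRT:

```shell
# Keep XILINX_XRT if it already points at a directory; otherwise set it to the
# given prefix (default /opt/xilinx/xrt). Return 1 if that prefix does not exist.
ensure_xilinx_xrt() {
    local prefix="${1:-/opt/xilinx/xrt}"
    if [ -n "${XILINX_XRT:-}" ] && [ -d "$XILINX_XRT" ]; then
        return 0
    fi
    if [ ! -d "$prefix" ]; then
        echo "error: XRT prefix '$prefix' does not exist" >&2
        return 1
    fi
    export XILINX_XRT="$prefix"
}

# Usage (not run automatically):
#   ensure_xilinx_xrt && ./opt/xilinx/xrt/amdxdna/dkms_driver.sh --install
```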

Check again with xrt-smi:

root@vm:~# /home/never/xdna-driver# ./xrt/build/Release/opt/xilinx/xrt/bin/unwrapped/xrt-smi examin
ERROR: Unknown command: 'examin'

DESCRIPTION: The Xilinx (R) Run Time - System Management Interface (xrt-smi) is a standalone
             command line utility that is included with the Xilinx Run Time (XRT) installation
             package. It includes multiple commands to identify and validate the installed card(s).

             This information can be used for both card administration and application debugging.

USAGE: xrt-smi[--help] [--version] [--verbose] [--batch] [--force] [command [commandArgs]]

AVAILABLE COMMANDS:
  configure  - Device and host configuration
  examine    - Status of the system and device
  program    - Download the acceleration program to a given device
  reset      - Resets the given device
  validate   - Validates the basic shell acceleration functionality

OPTIONS:
  --help             - Help to use this application
  --version          - Report the version of XRT and its drivers
  --verbose          - Turn on verbosity
  --batch            - Enable batch mode (disables escape characters)
  --force            - When possible, force an operation
root@tkai:/home/never/xdna-driver# ./xrt/build/Release/opt/xilinx/xrt/bin/unwrapped/xrt-smi examine
System Configuration
  OS Name              : Linux
  Release              : 6.8.0-38-generic
  Machine              : x86_64
  CPU Cores            : 4
  Memory               : 7941 MB
  Distribution         : Ubuntu 24.04 LTS
  GLIBC                : 2.39
  Model                : Standard PC (i440FX + PIIX, 1996)
  BIOS vendor          : SeaBIOS
  BIOS version         : rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org

XRT
  Version              : 2.18.0
  Branch               : HEAD
  Hash                 : 54b1a0335ef517415d17206d30365cf4a2c380d0
  Hash Date            : 2024-07-19 18:10:24
  xocl                 : unknown, unknown
  xclmgmt              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
  amdxdna              : 2.18.0_20240719, 33ce972deb1eaec4666671a0255870a28ec982ae

Devices present
  0 devices found
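
The two `xrt-smi examine` runs in this report differ in one line: the second lists an `amdxdna` version, while both report 0 devices. A minimal sketch that condenses saved `xrt-smi examine` output into those two facts. It assumes the output layout shown above, and the helper name `xrt_smi_summary` is hypothetical:

```shell
# Read `xrt-smi examine` output on stdin and print one summary line:
#   amdxdna=<loaded|missing> devices=<count|unknown>
xrt_smi_summary() {
    awk '
        /^ *amdxdna *:/ { drv = "loaded" }   # driver version line in the XRT section
        /devices found/ { n = $1 }           # e.g. "  0 devices found"
        END {
            printf "amdxdna=%s devices=%s\n", (drv ? drv : "missing"), (n == "" ? "unknown" : n)
        }
    '
}

# Usage (not run automatically):
#   xrt-smi examine | xrt_smi_summary
```

`amdxdna=loaded devices=0` corresponds to the state reached here: the kernel module is reported, yet no device is enumerated.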

It seems almost successful, but 0 devices are still found. I'd appreciate any assistance.

Failed to open KMQ device fd (err=22): Invalid argument TEST FAILED! when running the test example.

I tried to build the driver in two environments: ubuntu-22.04.04 + vitis-2022.2 and ubuntu-24.04 + vitis-2023.2. The kernel I cloned from AMD-SW/linux is 6.8.7+. The device I am using is a ThinkBook 14+ with an 8845HS. AMD VM support is enabled in the BIOS, but I cannot find any options related to IOMMU.

After following the instructions in the README, neither environment can run the test example. The output is:

$ ./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
ERROR: Caught exception: Failed to open KMQ device fd (err=22): Invalid argument
TEST FAILED!

The dmesg command reports:

[  255.039799] amdxdna 0000:65:00.1: set mpnpu_clock = 600 mhz
[  255.059899] amdxdna 0000:65:00.1: set npu_hclock = 1024 mhz
[  255.101933] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:65:00.1 on minor 0
[  278.109565] amdxdna 0000:65:00.1: amdxdna_drm_open: SVA bind device failed, ret -28
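
The numeric codes in these messages are standard Linux errno values: 22 is EINVAL and 28 is ENOSPC. A minimal sketch that pulls the failing amdxdna lines out of `dmesg` text and prefixes them with the errno name. The helper names `errno_name` and `amdxdna_failures` are hypothetical, and the table covers only the codes that appear in these reports:

```shell
# Map the errno values seen in these reports to their Linux names.
errno_name() {
    case "${1#-}" in
        22) echo "EINVAL" ;;
        28) echo "ENOSPC" ;;
        95) echo "EOPNOTSUPP" ;;
        *)  echo "UNKNOWN" ;;
    esac
}

# Read dmesg text on stdin; for each amdxdna line ending in "failed, ret -N",
# print "<ERRNO_NAME> <message>".
amdxdna_failures() {
    sed -n 's/^.*amdxdna [^ ]*: \(.*failed, ret \(-[0-9][0-9]*\)\)$/\2 \1/p' |
    while read -r code msg; do
        echo "$(errno_name "$code") $msg"
    done
}

# Usage (not run automatically):
#   dmesg | amdxdna_failures
```

This only names the error. Why the SVA bind returns ENOSPC on this machine is not established by the log.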

What is the possible cause of this problem?

Thanks for your attention in advance.

Upstream

Now that we have the IOMMU SVA support upstreamed, do you intend to upstream the xdna-driver?

How about integration with PyTorch?

I think PyTorch integration could persuade many consumers to buy these laptops just to start working with AI.
It would also make them a competitor to MacBooks, which already use their iGPU for PyTorch tasks, a very popular use case!

Also remember that Ryzen has a dedicated AI engine that is more power-efficient than the MacBook's iGPU, which is a big plus!

Supporting Linux 6.10

Independently of this driver, I still have a lot of issues getting basic power control to work correctly on my HP ZBook Power 15.6 inch G10 A, so I am trying all the latest Linux versions.
So, as I explained recently to @sonals, if I also want to use XDNA, it would be nice for XRT and XDNA to track the latest development.

`No such device with index '0'` on 8700G

I'm having trouble running the example after building a kernel with the newest XRT and driver for my AMD Ryzen 7 8700G w/ Radeon 780M Graphics on Arch Linux. This is what I got after installing XRT and xdna-driver:

ls /opt/xilinx/xrt/lib
libaws_mpd_plugin.so        libxilinxopencl.so.2       libxrt_coreutil.so            libxrt_noop.so.2.17.0
libazure_mpd_plugin.so      libxilinxopencl.so.2.17.0  libxrt_coreutil.so.2          libxrt++.so
libcontainer_mpd_plugin.so  libxilinxopencl_static.a   libxrt_coreutil.so.2.17.0     libxrt++.so.2
libsched_em.so              libxma2api.so              libxrt_coreutil_static.a      libxrt++.so.2.17.0
libsched_em.so.2            libxma2api.so.2            libxrt_driver_xdna.so         libxrt++_static.a
libsched_em.so.2.17.0       libxma2api.so.2.17.0       libxrt_driver_xdna.so.2       libxrt_swemu.so
libsched_em_v30.so          libxma2plugin.so           libxrt_driver_xdna.so.2.17.0  libxrt_swemu.so.2
libsched_em_v30.so.2        libxma2plugin.so.2         libxrt_hwemu.so               libxrt_swemu.so.2.17.0
libsched_em_v30.so.2.17.0   libxma2plugin.so.2.17.0    libxrt_hwemu.so.2             libxrt_swemu_static.a
libxdp_core.so              libxrt_core.so             libxrt_hwemu.so.2.17.0        xrt
libxdp_core.so.2            libxrt_core.so.2           libxrt_hwemu_static.a
libxdp_core.so.2.17.0       libxrt_core.so.2.17.0      libxrt_noop.so
libxilinxopencl.so          libxrt_core_static.a       libxrt_noop.so.2
ls /usr/lib/firmware/amdnpu/1502_00 -alh
total 292K
drwxr-xr-x 2 root root 4.0K Apr 23 10:58 .
drwxr-xr-x 3 root root 4.0K Apr 23 10:58 ..
-rw-r--r-- 1 root root 281K Apr 23 10:55 npu.sbin
./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
ERROR: Caught exception:  No such device with index '0'
TEST FAILED!
uname -a
Linux 16t 6.8.7-iommu-sva-part4-v7-gc97772a3ca59-dirty #1 SMP PREEMPT_DYNAMIC Mon Apr 22 10:02:12 CST 2024 x86_64 GNU/Linux

I'd appreciate any assistance you can offer.

Rebase the required Linux kernel on top of latest released version

The Linux kernel required to run this xdna-driver is based on an old release candidate of Linux 6.7 but Linux 6.7.2 has been released since with a lot of fixes (https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.7.2).
It would be nice to have the latest version on https://github.com/AMDESE/linux to minimize troubles.
Technically I should open this issue on https://github.com/AMDESE/linux but that GitHub repository does not allow opening issue. Strange vision of open-source... ;-)

Guidance please?

Can you please share the roadmap for this project? In which Linux kernel can this driver be found, without messing around with custom builds?

I have a Ryzen 7940hs system that I still cannot use on Linux fully, because it is not officially supported by AMD.

`./build.sh: line 91: jq: command not found`

If the system doesn't have jq installed, download_npufws fails silently (i.e., the script doesn't exit), no firmware is downloaded, and the plugins install, but mysteriously no devices are found. Recommend set -uo pipefail.
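
One way to make the failure loud is to check for required tools before any download starts. A minimal sketch of such a guard. The helper name `require_cmds` is hypothetical and is not taken from build.sh:

```shell
# Return 0 when every named command is on PATH; otherwise report each
# missing command on stderr and return 1.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: required command '$cmd' not found" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage near the top of a build script (not run automatically):
#   require_cmds jq || exit 1
```

Note that `pipefail` only changes a pipeline's exit status. The script still has to check that status, or run with `set -e`, for the build to stop.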

Is it possible to develop a custom AI Engine firmware?

I notice that Ryzen AI uses XLNX_VART_FIRMWARE and an .xclbin file, so I expect it uses the same technology as the Xilinx Versal AI Engine. Will AMD open the documentation and interfaces so that users can develop custom firmware (that is, develop a custom .xclbin file, as for Versal devices with the Vitis IDE 😄)? This would be great for non-AI applications that want to use the AI Engine inside the Ryzen CPU.

Files missing from debian package?

I was able to build everything, but the post-install step breaks:

Successfully intalled and enabled DKMS for xrt-amdxdna/2.17.0
Loading new amdxdna Linux kernel module
Creating xclbin firmware symbolic link
/opt/xilinx/xrt/amdxdna/setup_xclbin_firmware.sh: line 38: cd: /lib/firmware/amdipu/*/: No such file or directory
modprobe: ERROR: could not insert 'amdxdna': Unknown symbol in module, or unknown parameter (see dmesg)
$ sudo dkms status
xrt/2.17.0: added
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/xrt-amdaie/2.17.0/source/dkms.conf does not exist.
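
The setup script above looks under `/lib/firmware/amdipu/`, while other reports on this page show firmware under `amdnpu/`. A minimal sketch that lists which of the two directory layouts exists on a system. The helper name `list_fw_dirs` is hypothetical, and it checks only the two directory names that appear in these reports:

```shell
# Print every per-device firmware directory under <root>/amdnpu and <root>/amdipu.
# Return 1 if neither layout has any directory.
list_fw_dirs() {
    local root="${1:-/lib/firmware}" found=1 d
    for d in "$root"/amdnpu/*/ "$root"/amdipu/*/; do
        if [ -d "$d" ]; then   # an unmatched glob stays literal and fails this test
            echo "${d%/}"
            found=0
        fi
    done
    return "$found"
}

# Usage (not run automatically):
#   list_fw_dirs /lib/firmware
```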

Build error for xrt with gcc-14 and solution

I received an error building xrt with gcc-14:

In file included from /home/mike/Development/MISC/xdna-driver_main/xrt/src/runtime_src/tools/xclbinutil/R>
                 from /home/mike/Development/MISC/xdna-driver_main/xrt/src/runtime_src/tools/xclbinutil/R>
/usr/include/rapidjson/document.h: In member function 'rapidjson::GenericStringRef<CharType>& rapidjson::>
/usr/include/rapidjson/document.h:319:82: error: assignment of read-only member 'rapidjson::GenericString>
  319 |     GenericStringRef& operator=(const GenericStringRef& rhs) { s = rhs.s; length = rhs.length; }

This issue is also described in (several) bug reports as an issue with rapidjson 1.1.0 (the current release, although very old):
https://www.mail-archive.com/[email protected]/msg196747.html
https://bugs.gentoo.org/919374

The solution is to either patch 1.1.0 with a more recent commit:

https://github.com/Tencent/rapidjson/commit/3b2441b8.patch

... or pull rapidjson from git upstream (which is what I did to resolve after disabling the build tests):

git clone --depth=1 https://github.com/tencent/rapidjson.git rapidjson
cd rapidjson
mkdir tmpbuild
cd tmpbuild
CC=gcc-14 CXX=g++-14 cmake -DRAPIDJSON_BUILD_TESTS=OFF ..
make -j `nproc`
sudo make install

Now all of the XDNA components (linux kernel, xrt, xrt-xdna) successfully build with gcc-14.

Thanks @maxzhen and @sonals for your help.

Cheers,
Michael

DRM_IOCTL_AMDXDNA_GET_INFO IOCTL failed (err=95): Operation not supported

# xrt_test

====== 0: npu3 xrt vadd started =====
DRM_IOCTL_AMDXDNA_GET_INFO IOCTL failed (err=95): Operation not supported
====== 0: npu3 xrt vadd FAILED  =====

1       test(s) executed
1       test(s) FAILED!
# xbutil validate -d 0000:c7:00.1

XRT build version: 2.17.0
Build hash:
Build date: 2024-06-10 20:36:38
Git branch:
PID: 2214
UID: 0
[Thu Jun 13 10:05:29 2024 GMT]
HOST: m-kot
EXE: /usr/bin/unwrapped/xbutil2
[xbutil] ERROR: DRM_IOCTL_AMDXDNA_GET_INFO IOCTL failed (err=95): Operation not supported
# dmesg | grep -v input | grep -E "xdna|npu|xocl|xclmgmt|drm|gpu"
[    1.347323] xocl: loading out-of-tree module taints kernel.
[    1.356150] xclmgmt init()
[    2.554377] systemd[1]: Starting Load Kernel Module drm...
[    2.560314] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    2.560358] systemd[1]: Finished Load Kernel Module drm.
[    2.801077] Loading firmware: amdnpu/1502_00/npu.sbin
[    2.802016] amdxdna 0000:c7:00.1: enabling device (0000 -> 0002)
[    2.809259] amdxdna 0000:c7:00.1: (Develop) IOMMU mode is 0
[    2.840447] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[    2.860449] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[    2.894357] [drm] amdgpu kernel modesetting enabled.
[    2.897619] amdgpu: Virtual CRAT table created for CPU
[    2.897626] amdgpu: Topology: Add CPU node
[    2.897713] amdgpu 0000:c6:00.0: enabling device (0006 -> 0007)
[    2.897734] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x1900 0x1002:0x0124 0xC5).
[    2.897741] [drm] register mmio base: 0xDC500000
[    2.897741] [drm] register mmio size: 524288
[    2.900727] [drm] add ip block number 0 <soc21_common>
[    2.900730] [drm] add ip block number 1 <gmc_v11_0>
[    2.900732] [drm] add ip block number 2 <ih_v6_0>
[    2.900734] [drm] add ip block number 3 <psp>
[    2.900736] [drm] add ip block number 4 <smu>
[    2.900737] [drm] add ip block number 5 <dm>
[    2.900739] [drm] add ip block number 6 <gfx_v11_0>
[    2.900740] [drm] add ip block number 7 <sdma_v6_0>
[    2.900742] [drm] add ip block number 8 <vcn_v4_0>
[    2.900744] [drm] add ip block number 9 <jpeg_v4_0>
[    2.900745] [drm] add ip block number 10 <mes_v11_0>
[    2.900761] amdgpu 0000:c6:00.0: amdgpu: Fetched VBIOS from VFCT
[    2.900764] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[    2.900777] Loading firmware: amdgpu/psp_13_0_4_toc.bin
[    2.901513] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:c7:00.1 on minor 0
[    2.901597] Loading firmware: amdgpu/psp_13_0_4_ta.bin
[    2.902331] Loading firmware: amdgpu/dcn_3_1_4_dmcub.bin
[    2.903122] Loading firmware: amdgpu/gc_11_0_1_pfp.bin
[    2.903583] Loading firmware: amdgpu/gc_11_0_1_me.bin
[    2.904080] Loading firmware: amdgpu/gc_11_0_1_rlc.bin
[    2.904605] Loading firmware: amdgpu/gc_11_0_1_mec.bin
[    2.905238] Loading firmware: amdgpu/gc_11_0_1_imu.bin
[    2.905714] Loading firmware: amdgpu/sdma_6_0_1.bin
[    2.906029] [drm] VCN(0) encode/decode are enabled in VM mode
[    2.906031] Loading firmware: amdgpu/vcn_4_0_2.bin
[    2.906770] amdgpu 0000:c6:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    2.906882] Loading firmware: amdgpu/gc_11_0_1_mes_2.bin
[    2.907442] Loading firmware: amdgpu/gc_11_0_1_mes1.bin
[    2.908065] amdgpu 0000:c6:00.0: vgaarb: deactivate vga console
[    2.908068] amdgpu 0000:c6:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[    2.908092] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    2.908113] amdgpu 0000:c6:00.0: amdgpu: VRAM: 4096M 0x0000008000000000 - 0x00000080FFFFFFFF (4096M used)
[    2.908115] amdgpu 0000:c6:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    2.908134] [drm] Detected VRAM RAM=4096M, BAR=4096M
[    2.908135] [drm] RAM width 64bits DDR5
[    2.908208] [drm] amdgpu: 4096M of VRAM memory ready
[    2.908209] [drm] amdgpu: 13942M of GTT memory ready.
[    2.908219] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    2.908448] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[    2.908724] [drm] Loading DMUB firmware via PSP: version=0x08003A00
[    2.909005] [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 13
[    2.909008] amdgpu 0000:c6:00.0: amdgpu: Will use PSP to load VCN firmware
[    2.932862] [drm] reserve 0x4000000 from 0x80f8000000 for PSP TMR
[    3.475099] amdgpu 0000:c6:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    3.482664] amdgpu 0000:c6:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    3.482668] amdgpu 0000:c6:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    3.514961] amdgpu 0000:c6:00.0: amdgpu: SMU is initialized successfully!
[    3.514965] [drm] Seamless boot condition check passed
[    3.516033] [drm] Display Core v3.2.266 initialized on DCN 3.1.4
[    3.516037] [drm] DP-HDMI FRL PCON supported
[    3.518631] [drm] DMUB hardware initialized: version=0x08003A00
[    3.520335] snd_hda_intel 0000:c6:00.1: bound 0000:c6:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[    3.588227] [drm] kiq ring mec 3 pipe 1 q 0
[    3.590651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[    3.590677] amdgpu 0000:c6:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    3.659561] amdgpu: HMM registered 4096MB device memory
[    3.660114] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    3.660126] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[    3.660221] amdgpu: Virtual CRAT table created for GPU
[    3.660492] amdgpu: Topology: Add dGPU node [0x1900:0x1002]
[    3.660494] kfd kfd: amdgpu: added device 1002:1900
[    3.660503] amdgpu 0000:c6:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[    3.660507] amdgpu 0000:c6:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    3.660509] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    3.660510] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    3.660511] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    3.660512] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    3.660513] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    3.660514] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    3.660515] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    3.660516] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    3.660517] amdgpu 0000:c6:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    3.660518] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[    3.660519] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[    3.660521] amdgpu 0000:c6:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[    3.661404] [drm] ring gfx_32768.1.1 was added
[    3.661896] [drm] ring compute_32768.2.2 was added
[    3.662375] [drm] ring sdma_32768.3.3 was added
[    3.662420] [drm] ring gfx_32768.1.1 ib test pass
[    3.662451] [drm] ring compute_32768.2.2 ib test pass
[    3.662501] [drm] ring sdma_32768.3.3 ib test pass
[    3.664571] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:c6:00.0 on minor 0
[    3.667849] fbcon: amdgpudrmfb (fb0) is primary device
[    3.667975] [drm] DSC precompute is not needed.
[    4.097544] amdgpu 0000:c6:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc314_disable_crtc line:148
[    4.152546] amdgpu 0000:c6:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[   53.990860] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[   54.010771] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[   54.052663] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[   59.637423] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[   61.574206] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  122.652191] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  122.672252] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  122.714141] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  293.601336] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  293.621373] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  293.663138] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  300.966818] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  300.987233] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  301.030168] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.751547] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  314.771241] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  314.813866] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.814563] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.814690] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.814809] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.814936] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815040] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815167] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815267] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815369] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815464] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815567] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815655] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815748] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815837] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.815929] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816025] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816115] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816209] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816299] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816386] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816482] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816573] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816671] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816766] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816858] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.816951] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  314.817045] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  370.301712] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  370.321703] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  370.363371] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  891.645155] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  891.661209] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  891.702813] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  900.371245] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  900.391240] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[  900.433023] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  906.238892] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  906.258981] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
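
The log above can be summarized mechanically. The sketch below is only an illustration and is not part of XRT or the driver: it counts the `aie2_get_info` rejections per request parameter in a few lines copied from the dmesg output, and looks up the symbolic name for the `err=95` that userspace reports. The function name `count_unsupported_params` is hypothetical, and the errno lookup assumes a Linux host.

```python
import errno
import os
import re
from collections import Counter

# A few lines copied from the dmesg output above.
SAMPLE_LOG = """\
[   54.052663] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[   59.637423] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[  122.652191] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[  122.714141] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
"""

PATTERN = re.compile(r"aie2_get_info: Not supported request parameter (\d+)")


def count_unsupported_params(log_text):
    """Return a Counter mapping each rejected parameter number to its count."""
    return Counter(int(m.group(1)) for m in PATTERN.finditer(log_text))


counts = count_unsupported_params(SAMPLE_LOG)
for param, n in sorted(counts.items()):
    print(f"request parameter {param}: rejected {n} time(s)")

# Translate the numeric error from the xbutil/xrt_test message (Linux values).
print(errno.errorcode.get(95, "unknown"), "-", os.strerror(95))
```

Every rejection in this log is for request parameter 7. That would be consistent with the userspace side issuing an info query that the loaded kernel module does not implement, for example because the two were built from different revisions, but the log alone does not prove it.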

Provided examples work but there is a kernel error at the end of execution

Hello!
All three of the provided examples seem to work, for example:

./example_build/example_noop_test /lib/firmware/amdnpu/1502/validate.xclbin
...$ ./example_build/example_noop_test /lib/firmware/amdnpu/1502/validate.xclbin 
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 6962790
Host test average latency: 99 us/iter
TEST PASSED!

But when I look at dmesg or /var/log/kern.log, there is a scary warning:

2024-02-15T17:51:15.668654-08:00 rk-xsj kernel: [ 2909.731818] ------------[ cut here ]------------
2024-02-15T17:51:15.668663-08:00 rk-xsj kernel: [ 2909.731821] WARNING: CPU: 9 PID: 42463 at drivers/iommu/io-pgfault.c:249 iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668664-08:00 rk-xsj kernel: [ 2909.731827] Modules linked in: amdxdna(OE) drm_shmem_helper xocl(OE) xclmgmt(OE) hid_logitech_hidpp hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_ump rfcomm snd_seq_dummy snd_hrtimer xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler nvme_fabrics vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ccm overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 intel_rapl_msr joydev intel_rapl_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh ledtrig_audio snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_hda_intel snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_core edac_mce_amd snd_hda_codec mt7921e mt7921_common btusb snd_hda_core snd_compress uvcvideo btrtl mt792x_lib ac97_bus snd_hwdep
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731881]  kvm_amd btintel snd_pcm_dmaengine videobuf2_vmalloc mt76_connac_lib btbcm uvc mt76 snd_seq_midi snd_pci_ps btmtk videobuf2_memops snd_seq_midi_event snd_rpl_pci_acp6x videobuf2_v4l2 snd_rawmidi snd_acp_pci mac80211 kvm bluetooth snd_acp_legacy_common videodev snd_pci_acp6x snd_seq snd_pcm irqbypass videobuf2_common ecdh_generic snd_seq_device crct10dif_pclmul crc32_pclmul hid_multitouch ecc mc snd_pci_acp5x snd_timer polyval_clmulni polyval_generic cfg80211 snd_rn_pci_acp3x ghash_clmulni_intel snd_acp_config sha256_ssse3 ucsi_acpi hp_wmi snd sha1_ssse3 r8169 typec_ucsi snd_soc_acpi sparse_keymap rapl platform_profile wmi_bmof thunderbolt k10temp libarc4 realtek soundcore ccp snd_pci_acp3x i2c_piix4 typec nvidia_uvm(POE) i2c_hid_acpi wireless_hotkey i2c_hid amd_pmc msr parport_pc ppdev nfsd lp parport auth_rpcgss nfs_acl lockd grace efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) amdgpu amdxcp drm_exec gpu_sched
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731942]  drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core drm_kms_helper input_leds nvme video drm serio_raw xhci_pci nvme_core xhci_pci_renesas i2c_algo_bit wmi mac_hid aesni_intel crypto_simd cryptd
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731958] CPU: 9 PID: 42463 Comm: example_noop_te Tainted: P        W  OE      6.7.4+iommu-sva-v4+ #1
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731960] Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.03.00 09/11/2023
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731961] RIP: 0010:iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731964] Code: 48 8b 87 d0 02 00 00 48 8b 40 20 48 85 c0 74 1a 55 48 8b 40 40 48 8b 38 48 89 e5 e8 8b 79 61 ff 31 c0 5d 31 ff e9 6c 06 80 00 <0f> 0b b8 ed ff ff ff 31 ff e9 5e 06 80 00 0f 1f 00 90 90 90 90 90
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731965] RSP: 0018:ffffb19f0faffcc8 EFLAGS: 00010246
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731967] RAX: 0000000000000000 RBX: ffffa05d41aeb0c0 RCX: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731968] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa05d41aeb0c0
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731969] RBP: ffffb19f0faffd00 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R13: ffffa05d41895d80 R14: ffffa0617fc70978 R15: ffffa0617fc70810
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731971] FS:  00007fb2bd043c00(0000) GS:ffffa06c75a40000(0000) knlGS:0000000000000000
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731974] CR2: 00007fb2bc945400 CR3: 000000068f1ac000 CR4: 0000000000750ef0
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731975] PKRU: 55555554
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731976] Call Trace:
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731977]  <TASK>
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731981]  ? show_regs+0x6d/0x80
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731984]  ? __warn+0x89/0x160
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731987]  ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731989]  ? report_bug+0x17e/0x1b0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731993]  ? handle_bug+0x51/0xa0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731996]  ? exc_invalid_op+0x18/0x80
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.731998]  ? asm_exc_invalid_op+0x1b/0x20
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732003]  ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732005]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732006]  ? amd_iommu_remove_dev_pasid+0x7d/0x160
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732010]  iommu_detach_device_pasid+0x5a/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732013]  iommu_sva_unbind_device+0x3f/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732017]  amdxdna_drm_close+0xa5/0x130 [amdxdna]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732024]  drm_file_free+0x1e6/0x260 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732045]  drm_release+0xc7/0x150 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732059]  __fput+0x9e/0x2e0
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732063]  __fput_sync+0x1c/0x30
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732065]  __x64_sys_close+0x3e/0x90
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732068]  do_syscall_64+0x5d/0xf0
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732070]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732072]  ? ksys_write+0x73/0x100
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732073]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732075]  ? exit_to_user_mode_prepare+0x39/0x190
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732078]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732080]  ? syscall_exit_to_user_mode+0x37/0x60
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732082]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732083]  ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732085]  ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732087]  ? exc_page_fault+0x94/0x1b0
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732089]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732091] RIP: 0033:0x7fb2bc7157c4
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732093] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 85 0d 0f 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 13
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732094] RSP: 002b:00007ffd629c61e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732096] RAX: ffffffffffffffda RBX: 000055808393b6f0 RCX: 00007fb2bc7157c4
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732097] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000003
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] RBP: 000055808393b788 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] R10: 0000558083999a60 R11: 0000000000000202 R12: 00007fb2bc9ff100
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732099] R13: 000055808393b4f0 R14: 0000000100000001 R15: 00005580839336d0
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732102]  </TASK>
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732103] ---[ end trace 0000000000000000 ]---

Any ideas? Is this normal?
At least it does not crash my work laptop. ;-)

0 devices found after installation

After installing Linux kernel 6.7, XRT, and this driver on a fresh install of Ubuntu 22.04, I'm seeing 0 devices found when running "xbutil examine". The provided code sample also segfaults when attempting to load device(0).

This is running on a Minisforum PC with a 7940HS (UM790 Pro). Any debugging tips or solutions would be appreciated.

System Configuration
  OS Name              : Linux
  Release              : 6.7.0-rc8+
  Version              : #1 SMP PREEMPT_DYNAMIC Sun Feb 11 17:27:55 EST 2024
  Machine              : x86_64
  CPU Cores            : 16
  Memory               : 62085 MB
  Distribution         : Ubuntu 22.04.3 LTS
  GLIBC                : 2.35
  Model                : Venus series
  BIOS vendor          : American Megatrends International, LLC.
  BIOS version         : 1.09

XRT
  Version              : 2.17.0
  Branch               : master
  Hash                 : a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
  Hash Date            : 2024-02-12 12:04:51
  XOCL                 : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
  XCLMGMT              : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
  AMDXDNA              : 2.17.0_20240212, 317e0c67747cbf88e5b5a3a81ba4bdf7bf5b3fc3

Devices present
  0 devices found
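
When `xbutil examine` lists no devices, a first check is whether the kernel module is loaded and bound to a PCI device at all. The following is a minimal sketch, not an XRT tool: the helper names are hypothetical, and the sysfs path `/sys/bus/pci/drivers/amdxdna` is an assumption based on the driver name that appears in the kernel logs on this page.

```python
import os
import re

# PCI address in domain:bus:device.function form, e.g. 0000:c5:00.1
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of module names."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def bound_devices(driver_dir):
    """List the PCI addresses bound to a driver, given its sysfs directory."""
    if not os.path.isdir(driver_dir):
        return []
    return sorted(entry for entry in os.listdir(driver_dir) if BDF.match(entry))


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        modules = loaded_modules(f.read())
    print("amdxdna loaded:", "amdxdna" in modules)
    # Assumed sysfs location for the driver; adjust if it differs.
    print("bound devices:", bound_devices("/sys/bus/pci/drivers/amdxdna"))
```

If the module is loaded but no device is bound, `dmesg | grep amdxdna` is the next place to look for a probe or firmware-loading failure.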

`example_noop_test` buffer overflows

(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ ./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
*** stack smashing detected ***: terminated
Aborted (core dumped)
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ git rev-parse HEAD^
85e380b68c8b921b7efab18c1b3280644a982d40
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ /opt/xilinx/xrt/bin/xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 6.8.8
  Version              : #2 SMP PREEMPT_DYNAMIC Fri May  3 14:13:56 CDT 2024
  Machine              : x86_64
  CPU Cores            : 16
  Memory               : 94278 MB
  Distribution         : Ubuntu 22.04.3 LTS
  GLIBC                : 2.35
  Model                : F7BSC
  BIOS vendor          : American Megatrends International, LLC.
  BIOS version         : 1.04

XRT
  Version              : 2.18.0
  Branch               : HEAD
  Hash                 : c678a9469f9b20fcb9a04bbedb5c51f8473faec0
  Hash Date            : 2024-05-24 18:16:53
  XOCL                 : unknown, unknown
  XCLMGMT              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
  AMDXDNA              : 2.18.0_20240613, 099bf0332e3d5692f98fbbf309fa9177a95f10da
  Firmware Version     : N/A

Devices present
BDF             :  Name          
---------------------------------
[0000:c5:00.1]  :  RyzenAI-npu1  

(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ uname -r
6.8.8

What is the capability scope of this driver?

This is a question, not an issue.

I got the installation working on Linux, which is great, but I was wondering: what is the scope of what the driver allows you to do? Most of the software in the Ryzen SW repository seems to be strictly for Windows (hard dependencies on .dll files, presence of .bat files, etc.). Is there any plan to port some of those tutorials to work with the Linux driver, or any way to get them working at the moment? Ideally there would be a quick way to test running a model via ONNX Runtime from a Python script, similar to what is available for Windows at the moment. Any pointers to places where I can read more about the capabilities of this driver would be greatly appreciated too!

amdxdna_ctx.c build errors

I'm trying to refresh one of my machines that has Ryzen AI, and I get these build errors:

/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_sched_job_init’:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:171:62: error: passing argument 3 of ‘drm_sched_job_init’ makes pointer from integer without a cast [-Werror=int-conversion]
  171 |         ret = drm_sched_job_init(&job->base, &hwctx->entity, 1, hwctx);
      |                                                              ^
      |                                                              |
      |                                                              int
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:533:30: note: expected ‘void *’ but argument is of type ‘int’
  533 |                        void *owner);
      |                        ~~~~~~^~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:171:15: error: too many arguments to function ‘drm_sched_job_init’
  171 |         ret = drm_sched_job_init(&job->base, &hwctx->entity, 1, hwctx);
      |               ^~~~~~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:531:5: note: declared here
  531 | int drm_sched_job_init(struct drm_sched_job *job,
      |     ^~~~~~~~~~~~~~~~~~
In file included from ./include/uapi/linux/posix_types.h:5,
                 from ./include/uapi/linux/types.h:14,
                 from ./include/linux/types.h:6,
                 from ./include/linux/kasan-checks.h:5,
                 from ./include/asm-generic/rwonce.h:26,
                 from ./arch/x86/include/generated/asm/rwonce.h:1,
                 from ./include/linux/compiler.h:251,
                 from ./include/linux/export.h:5,
                 from ./include/linux/linkage.h:7,
                 from ./include/linux/preempt.h:10,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/kref.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:7:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_hwctx_create’:
./include/linux/stddef.h:8:14: error: passing argument 3 of ‘drm_sched_init’ makes integer from pointer without a cast [-Werror=int-conversion]
    8 | #define NULL ((void *)0)
      |              ^~~~~~~~~~~
      |              |
      |              void *
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:494:49: note: in expansion of macro ‘NULL’
  494 |         ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
      |                                                 ^~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:526:24: note: expected ‘u32’ {aka ‘unsigned int’} but argument is of type ‘void *’
  526 |                    u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
      |                    ~~~~^~~~~~~
In file included from ./include/linux/limits.h:7,
                 from ./include/linux/kernel.h:17,
                 from ./arch/x86/include/asm/percpu.h:27,
                 from ./arch/x86/include/asm/preempt.h:6,
                 from ./include/linux/preempt.h:79,
                 from ./include/linux/spinlock.h:56,
                 from ./include/linux/kref.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:7:
./include/vdso/limits.h:11:25: error: passing argument 7 of ‘drm_sched_init’ makes pointer from integer without a cast [-Werror=int-conversion]
   11 | #define LONG_MAX        ((long)(~0UL >> 1))
      |                         ^~~~~~~~~~~~~~~~~~~
      |                         |
      |                         long int
./include/linux/sched.h:296:41: note: in expansion of macro ‘LONG_MAX’
  296 | #define MAX_SCHEDULE_TIMEOUT            LONG_MAX
      |                                         ^~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:495:49: note: in expansion of macro ‘MAX_SCHEDULE_TIMEOUT’
  495 |                              HWCTX_MAX_CMDS, 0, MAX_SCHEDULE_TIMEOUT, NULL,
      |                                                 ^~~~~~~~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:527:59: note: expected ‘struct workqueue_struct *’ but argument is of type ‘long int’
  527 |                    long timeout, struct workqueue_struct *timeout_wq,
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:496:41: error: passing argument 10 of ‘drm_sched_init’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  496 |                              NULL, hwctx->name, &client->xdna->pdev->dev);
      |                                    ~~~~~^~~~~~
      |                                         |
      |                                         char *
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:528:70: note: expected ‘struct device *’ but argument is of type ‘char *’
  528 |                    atomic_t *score, const char *name, struct device *dev);
      |                                                       ~~~~~~~~~~~~~~~^~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:494:15: error: too many arguments to function ‘drm_sched_init’
  494 |         ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
      |               ^~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
                 from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:524:5: note: declared here
  524 | int drm_sched_init(struct drm_gpu_scheduler *sched,
      |     ^~~~~~~~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_hwctx_destroy_rcu’:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:767:9: error: implicit declaration of function ‘drm_sched_wqueue_stop’; did you mean ‘drm_sched_stop’? [-Werror=implicit-function-declaration]
  767 |         drm_sched_wqueue_stop(&hwctx->sched);
      |         ^~~~~~~~~~~~~~~~~~~~~
      |         drm_sched_stop
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:777:9: error: implicit declaration of function ‘drm_sched_wqueue_start’; did you mean ‘drm_sched_start’? [-Werror=implicit-function-declaration]
  777 |         drm_sched_wqueue_start(&hwctx->sched);
      |         ^~~~~~~~~~~~~~~~~~~~~~
      |         drm_sched_start
cc1: all warnings being treated as errors

Problems with suspend and hibernate

I already have many power-management issues on my laptop with the 6.5 and 6.7.4 kernels that are unrelated to this project, but it seems that the amdxdna driver sometimes prevents the system from suspending as well.
When it works:

2024-02-16T09:50:10.661313-08:00 rk-xsj kernel: [  139.176934] amdxdna 0000:66:00.1: firmware resuming...
2024-02-16T09:50:10.661314-08:00 rk-xsj kernel: [  139.177070] amdxdna 0000:66:00.1: hardware context resuming...

When it fails:

2024-02-16T10:00:26.051836-08:00 rk-xsj kernel: [  668.382497] amdxdna 0000:66:00.1: amdxdna_do_suspend: suspend NPU firmware failed
2024-02-16T10:00:26.051837-08:00 rk-xsj kernel: [  668.382501] amdxdna 0000:66:00.1: PM: pci_pm_suspend(): amdxdna_pmops_suspend+0x0/0x80 [amdxdna] returns -19
2024-02-16T10:00:26.051837-08:00 rk-xsj kernel: [  668.382514] amdxdna 0000:66:00.1: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -19
2024-02-16T10:00:26.051838-08:00 rk-xsj kernel: [  668.382519] amdxdna 0000:66:00.1: PM: failed to suspend async: error -19
2024-02-16T10:00:26.051838-08:00 rk-xsj kernel: [  668.478472] PM: Some devices failed to suspend, or early wake event detected

Is this a known problem?
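
For anyone triaging the failing log above: the -19 returned by amdxdna_pmops_suspend is a negated Linux errno value. A minimal sketch of decoding it follows; the helper name describe_kernel_error is hypothetical and only illustrates the lookup, it is not part of the driver or XRT.

```python
import errno
import os


def describe_kernel_error(ret: int) -> str:
    """Translate a negative kernel return code into its errno name and message."""
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


# Value copied from "pci_pm_suspend(): ... returns -19" in the log above.
print(describe_kernel_error(-19))
```

The code maps to ENODEV. That would be consistent with the driver not being able to reach the NPU firmware at suspend time, but the log alone does not confirm the cause.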

clinfo outputs '[XRT] ERROR: get_device_info: Operation not supported'

Should this work?

clinfo 
Number of platforms:				 2
  Platform Profile:				 EMBEDDED_PROFILE
  Platform Version:				 OpenCL 1.0
  Platform Name:				 Xilinx
  Platform Vendor:				 Xilinx
  Platform Extensions:				 cl_khr_icd
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3558.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 Xilinx
Number of devices:				 1
XRT build version: 2.17.0
Build hash: baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
Build date: 2024-04-17 13:03:42
Git branch: HEAD
PID: 61787
UID: 0
[Wed Apr 17 16:03:16 2024 GMT]
HOST: Shiva
EXE: /opt/rocm-5.5.0/bin/clinfo
[XRT] ERROR: get_device_info: Operation not supported
ERROR: clGetDeviceInfo(-6)

The system is an AMD Ryzen 7 7840HS:

xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 6.8.6-060806-generic
  Version              : #202404131135 SMP PREEMPT_DYNAMIC Wed Apr 17 04:32:21 EEST 2024
  Machine              : x86_64
  CPU Cores            : 16
  Memory               : 95772 MB
  Distribution         : Ubuntu 22.04.4 LTS
  GLIBC                : 2.35
  Model                : TUXEDO Sirius 16 Gen1
  BIOS vendor          : American Megatrends International, LLC.
  BIOS version         : V1.00A00_20240108

XRT
  Version              : 2.17.0
  Branch               : HEAD
  Hash                 : baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  Hash Date            : 2024-04-17 13:03:42
  XOCL                 : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  XCLMGMT              : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  AMDXDNA              : 2.17.0_20240417, 35351e4bbbc65568669c36255825425030be721f

Devices present
BDF             :  Name          
---------------------------------
[0000:6a:00.1]  :  RyzenAI-npu1  

Kernel 6.9 released

Hello,

Does kernel 6.9 include the necessary patches (IOMMU SVA), or is a custom kernel still required?

Thanks

kernel oops while running 'xbutil validate'

AMD Ryzen 7 7840HS, Ubuntu 22.04:

System Configuration
  OS Name              : Linux
  Release              : 6.8.7-060807-generic
  Version              : #202404170934 SMP PREEMPT_DYNAMIC Thu Apr 18 13:01:01 EEST 2024
  Machine              : x86_64
  CPU Cores            : 16
  Memory               : 95777 MB
  Distribution         : Ubuntu 22.04.4 LTS
  GLIBC                : 2.35
  Model                : TUXEDO Sirius 16 Gen1
  BIOS vendor          : American Megatrends International, LLC.
  BIOS version         : V1.00A00_20240108

XRT
  Version              : 2.17.0
  Branch               : HEAD
  Hash                 : baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  Hash Date            : 2024-04-17 13:03:42
  XOCL                 : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  XCLMGMT              : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
  AMDXDNA              : 2.17.0_20240417, 35351e4bbbc65568669c36255825425030be721f

Devices present
BDF             :  Name          
---------------------------------
[0000:6a:00.1]  :  RyzenAI-npu1  

'xbutil validate' freezes at:

------------------------------------------------------------
                        EARLY ACCESS                        
        This release of xbutil contains early access        
         experimental features which may have bugs.         
------------------------------------------------------------
Validate Device           : [0000:6a:00.1]
    Platform              : RyzenAI-npu1
-------------------------------------------------------------------------------
Test 1 [0000:6a:00.1]     : verify                                              
    Details               : Kernel name is 'DPU_PDI_0'
                            Total duration: '1.1's
                            Average throughput: '9490.3' ops/s
                            Average latency: '105.4' us
    Test Status           : [PASSED]
-------------------------------------------------------------------------------
[          <->    <->Running Test>    <->]: Running Test... < 59s >
terminate called after throwing an instance of 'boost::wrapexcept<boost::io::too_many_args>'
  what():  boost::too_many_args: format-string referred to fewer arguments than were passed

Oops:

[Fri Apr 19 07:03:53 2024] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP NOPTI
[Fri Apr 19 07:03:53 2024] CPU: 6 PID: 21072 Comm: xbutil2 Tainted: G        W  O       6.8.7-060807-generic #202404170934
[Fri Apr 19 07:03:53 2024] Hardware name: TUXEDO TUXEDO Sirius 16 Gen1/APX958, BIOS V1.00A00_20240108 01/08/2024
[Fri Apr 19 07:03:53 2024] RIP: 0010:amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024] Code: c8 00 00 00 48 8b 98 98 00 00 00 4c 8b 63 68 66 90 49 81 c4 28 06 00 00 4c 89 e7 e8 d1 fe 91 eb 48 8b 13 48 8b 43 08 48 89 df <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[Fri Apr 19 07:03:53 2024] RSP: 0018:ffff9fd28b9cfc40 EFLAGS: 00010246
[Fri Apr 19 07:03:53 2024] RAX: dead000000000122 RBX: ffff8ea8a814cfc0 RCX: ffff8ea7763e0800
[Fri Apr 19 07:03:53 2024] RDX: dead000000000100 RSI: ffff8ea7503f1b80 RDI: ffff8ea8a814cfc0
[Fri Apr 19 07:03:53 2024] RBP: ffff9fd28b9cfc50 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ea7cc212628
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8eb3b04cb340
[Fri Apr 19 07:03:53 2024] FS:  0000000000000000(0000) GS:ffff8ebc1e700000(0000) knlGS:0000000000000000
[Fri Apr 19 07:03:53 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr 19 07:03:53 2024] CR2: 00007dfa3c600000 CR3: 0000000f2623c000 CR4: 0000000000f50ef0
[Fri Apr 19 07:03:53 2024] PKRU: 55555554
[Fri Apr 19 07:03:53 2024] Call Trace:
[Fri Apr 19 07:03:53 2024]  <TASK>
[Fri Apr 19 07:03:53 2024]  ? show_regs+0x6d/0x80
[Fri Apr 19 07:03:53 2024]  ? die_addr+0x37/0xa0
[Fri Apr 19 07:03:53 2024]  ? exc_general_protection+0x1db/0x480
[Fri Apr 19 07:03:53 2024]  ? asm_exc_general_protection+0x27/0x30
[Fri Apr 19 07:03:53 2024]  ? amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024]  filp_flush+0x35/0x90
[Fri Apr 19 07:03:53 2024]  filp_close+0x14/0x30
[Fri Apr 19 07:03:53 2024]  put_files_struct+0x85/0xf0
[Fri Apr 19 07:03:53 2024]  exit_files+0x47/0x60
[Fri Apr 19 07:03:53 2024]  do_exit+0x295/0x530
[Fri Apr 19 07:03:53 2024]  do_group_exit+0x35/0x90
[Fri Apr 19 07:03:53 2024]  get_signal+0x954/0x990
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? hrtimer_nanosleep+0xbf/0x1a0
[Fri Apr 19 07:03:53 2024]  arch_do_signal_or_restart+0x39/0x120
[Fri Apr 19 07:03:53 2024]  syscall_exit_to_user_mode+0x209/0x260
[Fri Apr 19 07:03:53 2024]  do_syscall_64+0x8c/0x180
[Fri Apr 19 07:03:53 2024]  ? syscall_exit_to_user_mode+0x89/0x260
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? do_syscall_64+0x8c/0x180
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? irqentry_exit_to_user_mode+0x7e/0x260
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? irqentry_exit+0x43/0x50
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[Fri Apr 19 07:03:53 2024] RIP: 0033:0x79d1b94e57f8
[Fri Apr 19 07:03:53 2024] Code: Unable to access opcode bytes at 0x79d1b94e57ce.
[Fri Apr 19 07:03:53 2024] RSP: 002b:00007fff79b86300 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
[Fri Apr 19 07:03:53 2024] RAX: fffffffffffffdfc RBX: 00007fff79b86301 RCX: 000079d1b94e57f8
[Fri Apr 19 07:03:53 2024] RDX: 00007fff79b863a0 RSI: 0000000000000000 RDI: 0000000000000000
[Fri Apr 19 07:03:53 2024] RBP: 00007fff79b863d0 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 00007fff79b863a0 R11: 0000000000000293 R12: 00007fff79b863a0
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 00007fff79b863a0 R15: 00007fff79b86c50
[Fri Apr 19 07:03:53 2024]  </TASK>
[Fri Apr 19 07:03:53 2024] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter xfrm_user xfrm_algo rfcomm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink xclmgmt(O) xocl(O) snd_seq_dummy snd_hrtimer bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nvme_fabrics overlay zram binfmt_misc nls_iso8859_1 btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc snd_ctl_led ledtrig_audio snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp intel_rapl_common snd_sof snd_usb_audio uvcvideo edac_mce_amd snd_sof_utils videobuf2_vmalloc uvc snd_usbmidi_lib videobuf2_memops snd_soc_core snd_ump kvm_amd videobuf2_v4l2 ch341 snd_hda_codec_conexant snd_hda_codec_generic usbserial videodev snd_hda_codec_hdmi snd_rawmidi snd_compress kvm videobuf2_common ac97_bus mc snd_pcm_dmaengine irqbypass
[Fri Apr 19 07:03:53 2024]  snd_hda_intel iwlmvm rapl snd_pci_ps snd_intel_dspcfg input_leds mac80211 libarc4 snd_intel_sdw_acpi snd_hda_codec serio_raw snd_rpl_pci_acp6x hid_multitouch snd_acp_pci snd_hda_core snd_acp_legacy_common wmi_bmof snd_pci_acp6x snd_hwdep snd_pcm k10temp iwlwifi snd_pci_acp5x snd_rn_pci_acp3x amdxdna(O) snd_acp_config snd_seq snd_soc_acpi snd_pci_acp3x amd_pmf sp5100_tco snd_seq_device snd_timer cfg80211 snd soundcore amdtee soc_button_array mac_hid ccp amd_sfh tee amd_pmc platform_profile sch_fq_codel nfsd auth_rpcgss msr nfs_acl parport_pc lockd grace ppdev parport bfq efi_pstore sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 uas usbhid usb_storage amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper crct10dif_pclmul ttm crc32_pclmul drm_display_helper polyval_clmulni polyval_generic cec nvme hid_generic sha256_ssse3 sha1_ssse3 xhci_pci r8169 nvme_core thunderbolt video
[Fri Apr 19 07:03:53 2024]  rc_core xhci_pci_renesas realtek nvme_auth wmi hid aesni_intel crypto_simd cryptd [last unloaded: i2c_hid]
[Fri Apr 19 07:03:53 2024] ---[ end trace 0000000000000000 ]---
[Fri Apr 19 07:03:53 2024] RIP: 0010:amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024] Code: c8 00 00 00 48 8b 98 98 00 00 00 4c 8b 63 68 66 90 49 81 c4 28 06 00 00 4c 89 e7 e8 d1 fe 91 eb 48 8b 13 48 8b 43 08 48 89 df <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[Fri Apr 19 07:03:53 2024] RSP: 0018:ffff9fd28b9cfc40 EFLAGS: 00010246
[Fri Apr 19 07:03:53 2024] RAX: dead000000000122 RBX: ffff8ea8a814cfc0 RCX: ffff8ea7763e0800
[Fri Apr 19 07:03:53 2024] RDX: dead000000000100 RSI: ffff8ea7503f1b80 RDI: ffff8ea8a814cfc0
[Fri Apr 19 07:03:53 2024] RBP: ffff9fd28b9cfc50 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ea7cc212628
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8eb3b04cb340
[Fri Apr 19 07:03:53 2024] FS:  0000000000000000(0000) GS:ffff8ebc1e700000(0000) knlGS:0000000000000000
[Fri Apr 19 07:03:53 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr 19 07:03:53 2024] CR2: 00007dfa3c600000 CR3: 0000000ca84a0000 CR4: 0000000000f50ef0
[Fri Apr 19 07:03:53 2024] PKRU: 55555554
[Fri Apr 19 07:03:53 2024] Fixing recursive fault but reboot is needed!
[Fri Apr 19 07:03:53 2024] BUG: scheduling while atomic: xbutil2/21072/0x00000000
[Fri Apr 19 07:03:53 2024] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter xfrm_user xfrm_algo rfcomm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink xclmgmt(O) xocl(O) snd_seq_dummy snd_hrtimer bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nvme_fabrics overlay zram binfmt_misc nls_iso8859_1 btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc snd_ctl_led ledtrig_audio snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp intel_rapl_common snd_sof snd_usb_audio uvcvideo edac_mce_amd snd_sof_utils videobuf2_vmalloc uvc snd_usbmidi_lib videobuf2_memops snd_soc_core snd_ump kvm_amd videobuf2_v4l2 ch341 snd_hda_codec_conexant snd_hda_codec_generic usbserial videodev snd_hda_codec_hdmi snd_rawmidi snd_compress kvm videobuf2_common ac97_bus mc snd_pcm_dmaengine irqbypass
[Fri Apr 19 07:03:53 2024]  snd_hda_intel iwlmvm rapl snd_pci_ps snd_intel_dspcfg input_leds mac80211 libarc4 snd_intel_sdw_acpi snd_hda_codec serio_raw snd_rpl_pci_acp6x hid_multitouch snd_acp_pci snd_hda_core snd_acp_legacy_common wmi_bmof snd_pci_acp6x snd_hwdep snd_pcm k10temp iwlwifi snd_pci_acp5x snd_rn_pci_acp3x amdxdna(O) snd_acp_config snd_seq snd_soc_acpi snd_pci_acp3x amd_pmf sp5100_tco snd_seq_device snd_timer cfg80211 snd soundcore amdtee soc_button_array mac_hid ccp amd_sfh tee amd_pmc platform_profile sch_fq_codel nfsd auth_rpcgss msr nfs_acl parport_pc lockd grace ppdev parport bfq efi_pstore sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 uas usbhid usb_storage amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper crct10dif_pclmul ttm crc32_pclmul drm_display_helper polyval_clmulni polyval_generic cec nvme hid_generic sha256_ssse3 sha1_ssse3 xhci_pci r8169 nvme_core thunderbolt video
[Fri Apr 19 07:03:53 2024]  rc_core xhci_pci_renesas realtek nvme_auth wmi hid aesni_intel crypto_simd cryptd [last unloaded: i2c_hid]
[Fri Apr 19 07:03:53 2024] CPU: 6 PID: 21072 Comm: xbutil2 Tainted: G      D W  O       6.8.7-060807-generic #202404170934
[Fri Apr 19 07:03:53 2024] Hardware name: TUXEDO TUXEDO Sirius 16 Gen1/APX958, BIOS V1.00A00_20240108 01/08/2024
[Fri Apr 19 07:03:53 2024] Call Trace:
[Fri Apr 19 07:03:53 2024]  <TASK>
[Fri Apr 19 07:03:53 2024]  dump_stack_lvl+0x76/0xa0
[Fri Apr 19 07:03:53 2024]  dump_stack+0x10/0x20
[Fri Apr 19 07:03:53 2024]  __schedule_bug+0x64/0x80
[Fri Apr 19 07:03:53 2024]  schedule_debug.isra.0+0xdb/0x130
[Fri Apr 19 07:03:53 2024]  __schedule+0x69/0x6b0
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? vprintk+0x42/0x80
[Fri Apr 19 07:03:53 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024]  ? _printk+0x60/0x90
[Fri Apr 19 07:03:53 2024]  do_task_dead+0x44/0x50
[Fri Apr 19 07:03:53 2024]  make_task_dead+0x13e/0x140
[Fri Apr 19 07:03:53 2024]  rewind_stack_and_make_dead+0x17/0x20
[Fri Apr 19 07:03:53 2024] RIP: 0033:0x79d1b94e57f8
[Fri Apr 19 07:03:53 2024] Code: Unable to access opcode bytes at 0x79d1b94e57ce.
[Fri Apr 19 07:03:53 2024] RSP: 002b:00007fff79b86300 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
[Fri Apr 19 07:03:53 2024] RAX: fffffffffffffdfc RBX: 00007fff79b86301 RCX: 000079d1b94e57f8
[Fri Apr 19 07:03:53 2024] RDX: 00007fff79b863a0 RSI: 0000000000000000 RDI: 0000000000000000
[Fri Apr 19 07:03:53 2024] RBP: 00007fff79b863d0 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 00007fff79b863a0 R11: 0000000000000293 R12: 00007fff79b863a0
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 00007fff79b863a0 R15: 00007fff79b86c50
[Fri Apr 19 07:03:53 2024]  </TASK>
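
A note on reading the first oops: the fault address and the RAX/RDX values match the kernel's list poison constants (LIST_POISON1 = 0x100 and LIST_POISON2 = 0x122, offset by 0xdead000000000000 on x86_64). The faulting instruction bytes `<48> 89 42 08` store RAX to RDX+8. This looks like list_del() running on an entry that was already removed from its list inside amdxdna_flush. That is an interpretation of the register dump, not something the log states. The sketch below only classifies the values; it assumes the default x86_64 poison offset, and the function name classify is hypothetical.

```python
# Assumption: x86_64 with the default CONFIG_ILLEGAL_POINTER_VALUE.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100  # written to entry->next by list_del()
LIST_POISON2 = POISON_POINTER_DELTA + 0x122  # written to entry->prev by list_del()


def classify(value: int) -> str:
    """Name a register or fault address relative to the list poison values."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    offset = value - LIST_POISON1
    if 0 < offset < 0x22:
        return f"LIST_POISON1+{offset:#x}"
    return "not a list poison value"


# Values copied from the oops above.
for label, value in [
    ("fault address", 0xDEAD000000000108),
    ("RAX", 0xDEAD000000000122),
    ("RDX", 0xDEAD000000000100),
    ("RBX", 0xFFFF8EA8A814CFC0),
]:
    print(f"{label:13} {value:#018x} -> {classify(value)}")
```

Here RDX holds the poisoned next pointer and RAX the poisoned prev pointer, so the store to RDX+8 is the `next->prev = prev` step of a list removal hitting a non-canonical address.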

UBSAN: array-index-out-of-bounds in /var/lib/dkms/xrt-amdxdna/2.17.0/build/driver/amdxdna/npu1_message.c:488:57

With the current driver I can run:

-<%>- ./example_build/example_noop_test /lib/firmware/amdnpu/1502_00/validate.xclbin                 
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 6833094
Host test average latency: 97 us/iter
TEST PASSED!

But I get this kernel error message:

[Wed Apr  3 13:18:21 2024] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:66:00.1 on minor 0
[Wed Apr  3 13:19:20 2024] ------------[ cut here ]------------
[Wed Apr  3 13:19:20 2024] UBSAN: array-index-out-of-bounds in /var/lib/dkms/xrt-amdxdna/2.17.0/build/driver/amdxdna/npu1_message.c:488:57
[Wed Apr  3 13:19:20 2024] index 1 is out of range for type 'amdxdna_cu_config [*]'
[Wed Apr  3 13:19:20 2024] CPU: 3 PID: 27749 Comm: example_noop_te Tainted: P        W  OE      6.8.2+iommu-sva-part4-v7+pmc-10ms-delay+ #2
[Wed Apr  3 13:19:20 2024] Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.04.00 01/18/2024
[Wed Apr  3 13:19:20 2024] Call Trace:
[Wed Apr  3 13:19:20 2024]  <TASK>
[Wed Apr  3 13:19:20 2024]  dump_stack_lvl+0x48/0x70
[Wed Apr  3 13:19:20 2024]  dump_stack+0x10/0x20
[Wed Apr  3 13:19:20 2024]  __ubsan_handle_out_of_bounds+0xc6/0x110
[Wed Apr  3 13:19:20 2024]  npu1_config_cu+0x36a/0x3e0 [amdxdna]
[Wed Apr  3 13:19:20 2024]  ? __pfx_npu_msg_cb+0x10/0x10 [amdxdna]
[Wed Apr  3 13:19:20 2024]  npu1_hwctx_config+0xa4/0x200 [amdxdna]
[Wed Apr  3 13:19:20 2024]  amdxdna_drm_config_hwctx_ioctl+0xa4/0x140 [amdxdna]
[Wed Apr  3 13:19:20 2024]  ? __pfx_amdxdna_drm_config_hwctx_ioctl+0x10/0x10 [amdxdna]
[Wed Apr  3 13:19:20 2024]  drm_ioctl_kernel+0xb9/0x120
[Wed Apr  3 13:19:20 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Wed Apr  3 13:19:20 2024]  drm_ioctl+0x2d4/0x550
[Wed Apr  3 13:19:20 2024]  ? __pfx_amdxdna_drm_config_hwctx_ioctl+0x10/0x10 [amdxdna]
[Wed Apr  3 13:19:20 2024]  __x64_sys_ioctl+0xa0/0xf0
[Wed Apr  3 13:19:20 2024]  do_syscall_64+0x74/0x140
[Wed Apr  3 13:19:20 2024]  ? srso_alias_return_thunk+0x5/0xfbef5
[Wed Apr  3 13:19:20 2024]  ? sysvec_apic_timer_interrupt+0x4b/0xd0
[Wed Apr  3 13:19:20 2024]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[Wed Apr  3 13:19:20 2024] RIP: 0033:0x751c6a12396f
[Wed Apr  3 13:19:20 2024] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Wed Apr  3 13:19:20 2024] RSP: 002b:00007ffc7c102470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Wed Apr  3 13:19:20 2024] RAX: ffffffffffffffda RBX: 00000000c0186442 RCX: 0000751c6a12396f
[Wed Apr  3 13:19:20 2024] RDX: 00007ffc7c1026b0 RSI: 00000000c0186442 RDI: 0000000000000003
[Wed Apr  3 13:19:20 2024] RBP: 00007ffc7c1026b0 R08: 0000751c641e0000 R09: 0000000000000003
[Wed Apr  3 13:19:20 2024] R10: 0000598033302d40 R11: 0000000000000246 R12: 00005980332d69e0
[Wed Apr  3 13:19:20 2024] R13: 000059803331ba28 R14: 0000598033302c50 R15: 0000751c6aa72660
[Wed Apr  3 13:19:20 2024]  </TASK>
[Wed Apr  3 13:19:20 2024] ---[ end trace ]---
