amd / xdna-driver
License: Other
While building XRT on Debian Unstable, I ran into a broken Ubuntu version check.
Build Error:
CMake Error at CMake/cpackLin.cmake:94 (if):
if given arguments:
"(" "debian" "MATCHES" "^(ubuntu)" ")" "AND" "(" "STREQUAL" "23.10" ")"
Unknown arguments specified
Call Stack (most recent call first):
CMake/nativeLnx.cmake:199 (include)
CMakeLists.txt:123 (include)
Commenting out the block is the quick fix for me, but I am raising the issue so it can be handled properly:
--- cpackLin.cmake.orig 2024-06-04 21:40:18.696599826 -0700
+++ cpackLin.cmake 2024-06-04 21:40:34.634035125 -0700
@@ -91,7 +91,7 @@
uuid-dev (>= 2.27.1)")
endif()
- if ((${LINUX_FLAVOR} MATCHES "^(ubuntu)") AND (${LINUX_VERSION} STREQUAL "23.10"))
+# if ((${LINUX_FLAVOR} MATCHES "^(ubuntu)") AND (${LINUX_VERSION} STREQUAL "23.10"))
# Workaround for the following class of cpack build failure on Ubuntu 23.10
# CMake Error at /usr/share/cmake-3.27/Modules/Internal/CPack/CPackDeb.cmake:348 (message):
# CPackDeb: dpkg-shlibdeps: 'dpkg-shlibdeps: error: no dependency information
@@ -104,10 +104,10 @@
# build/Release/_CPack_Packages/Linux/DEB/xrt_202410.2.17.0_23.10-amd64/xrt directory
# Adding an empty DEBIAN directory somehow convinces dpkg-shlibdeps to behave sanely.
- message("-- Enable Ubuntu 23.10 cpack dpkg-shlibdeps failure workaround")
- file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" "Workaround for cpack bug on Ubuntu 23.10")
- install(FILES "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" DESTINATION "${XRT_INSTALL_DIR}/DEBIAN")
- endif()
+# message("-- Enable Ubuntu 23.10 cpack dpkg-shlibdeps failure workaround")
+# file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" "Workaround for cpack bug on Ubuntu 23.10")
+# install(FILES "${CMAKE_CURRENT_BINARY_DIR}/please-mantic.txt" DESTINATION "${XRT_INSTALL_DIR}/DEBIAN")
+# endif()
if (DEFINED CROSS_COMPILE)
if (${aarch} STREQUAL "aarch64")
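The argument list in the error shows STREQUAL with nothing on its left, so ${LINUX_VERSION} expanded to an empty string on this system. The sketch below reproduces that situation in shell. It assumes the version comes from VERSION_ID in /etc/os-release (the thread does not show that part of the CMake code); Debian Unstable typically ships no VERSION_ID. The function names are hypothetical. Quoting both operands keeps the comparison well-formed even when a value is empty, and the CMake analogue would be to quote "${LINUX_FLAVOR}" and "${LINUX_VERSION}" inside if().

```shell
# Print the value of KEY from an os-release style file, with quotes stripped.
# $1 = key, $2 = file (defaults to /etc/os-release).
os_release_field() {
    awk -F= -v k="$1" '$1 == k { gsub(/"/, "", $2); print $2 }' "${2:-/etc/os-release}"
}

# Succeed only on Ubuntu 23.10. Both operands are quoted, so an empty
# VERSION_ID (as assumed for Debian Unstable) is compared as "" instead
# of disappearing from the argument list.
is_ubuntu_2310() {
    flavor="$(os_release_field ID "${1:-/etc/os-release}")"
    version="$(os_release_field VERSION_ID "${1:-/etc/os-release}")"
    [ "$flavor" = "ubuntu" ] && [ "$version" = "23.10" ]
}
```

Running os_release_field VERSION_ID on the build machine shows quickly whether the version string is empty there.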
I'm having trouble running the example after building a kernel with the newest XRT and driver for an AMD Ryzen 7 8845HS.
Everything runs inside a Proxmox VE (PVE) virtual machine.
Here is what I checked at each step.
First, the host and VM requirements for installing the driver:
root@pve:~# dmesg | grep -i iommu
# boot options
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.063268] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.8-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off
[ 0.647008] iommu: Default domain type: Passthrough (set via kernel command line)
# found this
[ 0.683854] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.683910] pci 0000:00:01.0: Adding to iommu group 0
[ 0.683927] pci 0000:00:01.2: Adding to iommu group 1
[ 0.683944] pci 0000:00:01.4: Adding to iommu group 2
[ 0.683973] pci 0000:00:02.0: Adding to iommu group 3
[ 0.683990] pci 0000:00:02.1: Adding to iommu group 4
[ 0.684006] pci 0000:00:02.2: Adding to iommu group 5
[ 0.684022] pci 0000:00:02.3: Adding to iommu group 6
[ 0.684045] pci 0000:00:03.0: Adding to iommu group 7
[ 0.684063] pci 0000:00:03.1: Adding to iommu group 8
[ 0.684087] pci 0000:00:04.0: Adding to iommu group 9
[ 0.684105] pci 0000:00:04.1: Adding to iommu group 10
[ 0.684133] pci 0000:00:08.0: Adding to iommu group 11
[ 0.684149] pci 0000:00:08.1: Adding to iommu group 12
[ 0.684165] pci 0000:00:08.2: Adding to iommu group 13
[ 0.684181] pci 0000:00:08.3: Adding to iommu group 14
[ 0.684209] pci 0000:00:14.0: Adding to iommu group 15
[ 0.684223] pci 0000:00:14.3: Adding to iommu group 15
[ 0.684292] pci 0000:00:18.0: Adding to iommu group 16
[ 0.684306] pci 0000:00:18.1: Adding to iommu group 16
[ 0.684321] pci 0000:00:18.2: Adding to iommu group 16
[ 0.684336] pci 0000:00:18.3: Adding to iommu group 16
[ 0.684351] pci 0000:00:18.4: Adding to iommu group 16
[ 0.684366] pci 0000:00:18.5: Adding to iommu group 16
[ 0.684383] pci 0000:00:18.6: Adding to iommu group 16
[ 0.684397] pci 0000:00:18.7: Adding to iommu group 16
[ 0.684418] pci 0000:01:00.0: Adding to iommu group 17
[ 0.684434] pci 0000:02:00.0: Adding to iommu group 18
[ 0.684450] pci 0000:03:00.0: Adding to iommu group 19
[ 0.684466] pci 0000:04:00.0: Adding to iommu group 20
[ 0.684483] pci 0000:05:00.0: Adding to iommu group 21
[ 0.684513] pci 0000:c6:00.0: Adding to iommu group 22
[ 0.684530] pci 0000:c6:00.1: Adding to iommu group 23
[ 0.684547] pci 0000:c6:00.2: Adding to iommu group 24
[ 0.684565] pci 0000:c6:00.3: Adding to iommu group 25
[ 0.684582] pci 0000:c6:00.4: Adding to iommu group 26
[ 0.684600] pci 0000:c6:00.6: Adding to iommu group 27
[ 0.684619] pci 0000:c7:00.0: Adding to iommu group 28
[ 0.684639] pci 0000:c7:00.1: Adding to iommu group 29
[ 0.684657] pci 0000:c8:00.0: Adding to iommu group 30
[ 0.684677] pci 0000:c8:00.3: Adding to iommu group 31
[ 0.684696] pci 0000:c8:00.4: Adding to iommu group 32
[ 0.684714] pci 0000:c8:00.5: Adding to iommu group 33
[ 0.684733] pci 0000:c8:00.6: Adding to iommu group 34
# found this
[ 0.687914] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
Check the CPU:
root@vm:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: AuthenticAMD
# found this
Model name: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
CPU family: 25
Model: 117
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 2
BogoMIPS: 7585.70
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm flush_l1d arch_capabilities
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 256 KiB (4 instances)
L1i: 256 KiB (4 instances)
L2: 2 MiB (4 instances)
L3: 64 MiB (4 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Vulnerable: Safe RET, no microcode
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
Check IOMMU and DRM:
root@vm:~# cat /proc/cmdline
# found this
# IOMMU is not enabled on the guest command line, but it is still detected as enabled
BOOT_IMAGE=/vmlinuz-6.8.0-38-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro
root@vm:~# sudo dmesg | grep -i iommu
# found this, iommu returned values
[ 0.163549] iommu: Default domain type: Translated
[ 0.163549] iommu: DMA domain TLB invalidation policy: lazy mode
root@vm:~# sudo lsmod | grep drm
# found this, also returned values
amddrm_ttm_helper 12288 1 amdgpu
amdttm 118784 2 amdgpu,amddrm_ttm_helper
amddrm_buddy 20480 1 amdgpu
drm_exec 16384 1 amdgpu
drm_suballoc_helper 16384 1 amdgpu
drm_display_helper 253952 1 amdgpu
cec 98304 1 drm_display_helper
drm_vram_helper 24576 1 bochs
drm_ttm_helper 12288 2 bochs,drm_vram_helper
ttm 114688 2 drm_vram_helper,drm_ttm_helper
root@vm:~# sudo dmesg | grep drm
[ 0.396334] ACPI: bus type drm_connector registered
[ 0.431417] [drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
[ 0.432952] simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
[ 0.731790] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
[ 0.731861] [drm] Found bochs VGA, ID 0xb0c5.
[ 0.731864] [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfeb96000.
[ 0.732780] [drm] Found EDID data blob.
[ 0.732952] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
[ 0.733705] fbcon: bochs-drmdrmfb (fb0) is primary device
[ 1.034047] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[ 1.883668] [drm] amdgpu kernel modesetting enabled.
[ 1.883681] [drm] amdgpu version: 6.7.0
[ 1.883688] [drm] OS DRM version: 6.8.0
[ 1.916430] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x1900 0x2014:0x8001 0xC5).
[ 1.918362] [drm] register mmio base: 0xFEA00000
[ 1.918370] [drm] register mmio size: 524288
[ 1.921893] [drm] add ip block number 0 <soc21_common>
[ 1.921904] [drm] add ip block number 1 <gmc_v11_0>
[ 1.921911] [drm] add ip block number 2 <ih_v6_0>
[ 1.921918] [drm] add ip block number 3 <psp>
[ 1.921925] [drm] add ip block number 4 <smu>
[ 1.922193] [drm] add ip block number 5 <dm>
[ 1.922394] [drm] add ip block number 6 <gfx_v11_0>
[ 1.922591] [drm] add ip block number 7 <sdma_v6_0>
[ 1.922777] [drm] add ip block number 8 <vcn_v4_0>
[ 1.922959] [drm] add ip block number 9 <jpeg_v4_0>
[ 1.923137] [drm] add ip block number 10 <mes_v11_0>
[ 1.932912] [drm] BIOS signature incorrect 5b 69
[ 1.933147] [drm] BIOS header is broken
[ 1.940333] [drm] BIOS signature incorrect 5b 69
[ 2.090508] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[ 2.390508] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[ 2.689184] systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
[ 2.999504] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
[ 5.211507] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 32 times, consider switching to WQ_UNBOUND
root@vm:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
# found this
00:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix3 (rev c5)
# found this
00:11.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
00:12.0 Ethernet controller: Red Hat, Inc. Virtio network device
# found this
00:1b.0 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
The GPU, IPU, and audio devices are all visible in the VM.
I built XRT and xdna-driver successfully and installed them:
root@vm:~# ls /opt/xilinx/xrt/lib
libcontainer_mpd_plugin.so libxrt_core.so libxrt_hwemu_static.a
libxdp_core.so libxrt_core.so.2 libxrt_noop.so
libxdp_core.so.2 libxrt_core.so.2.18.0 libxrt_noop.so.2
libxdp_core.so.2.18.0 libxrt_core_static.a libxrt_noop.so.2.18.0
libxilinxopencl.so libxrt_coreutil.so libxrt++.so
libxilinxopencl.so.2 libxrt_coreutil.so.2 libxrt++.so.2
libxilinxopencl.so.2.18.0 libxrt_coreutil.so.2.18.0 libxrt++.so.2.18.0
libxilinxopencl_static.a libxrt_coreutil_static.a libxrt++_static.a
libxma2api.so libxrt_driver_xdna.so libxrt_swemu.so
libxma2api.so.2 libxrt_driver_xdna.so.2 libxrt_swemu.so.2
libxma2api.so.2.18.0 libxrt_driver_xdna.so.2.18.0 libxrt_swemu.so.2.18.0
libxma2plugin.so libxrt_hwemu.so libxrt_swemu_static.a
libxma2plugin.so.2 libxrt_hwemu.so.2 xrt
libxma2plugin.so.2.18.0 libxrt_hwemu.so.2.18.0
root@vm:~# ls /usr/lib/firmware/amdnpu/1502_00 -alh
total 316K
drwxr-xr-x 2 root root 4.0K Jul 19 18:58 .
drwxr-xr-x 6 root root 4.0K Jul 19 17:26 ..
-rw-r--r-- 1 root root 305K Jul 19 15:58 npu.sbin
When I follow the steps from #50:
root@vm:~# ./xrt/build/Release/opt/xilinx/xrt/bin/xrt-smi examine
System Configuration
OS Name : Linux
Release : 6.8.0-38-generic
Machine : x86_64
CPU Cores : 4
Memory : 7941 MB
Distribution : Ubuntu 24.04 LTS
GLIBC : 2.39
Model : Standard PC (i440FX + PIIX, 1996)
BIOS vendor : SeaBIOS
BIOS version : rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
XRT
Version : 2.18.0
Branch : HEAD
Hash : 54b1a0335ef517415d17206d30365cf4a2c380d0
Hash Date : 2024-07-19 18:10:24
xocl : unknown, unknown
xclmgmt : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
Devices present
# found this
0 devices found
Here is the problem: why are 0 devices found?
I then tried to install the driver:
# cwd is ./xdna-driver/build/Release
root@vm:~# ./opt/xilinx/xrt/amdxdna/dkms_driver.sh --install
XILINX_XRT is not set properly
Why is XILINX_XRT not set properly? I have installed XRT and xdna-driver, so it seems the XRT installation did not succeed.
I retried installing XRT:
root@vm:~# sudo apt reinstall ./xrt_202420.2.18.0_24.04-amd64-xrt.deb ./xrt_202420.2.18.0_24.04-amd64-xbflash.deb
[sudo] password for never:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'xrt' instead of './xrt_202420.2.18.0_24.04-amd64-xrt.deb'
Note, selecting 'xrt-xbflash' instead of './xrt_202420.2.18.0_24.04-amd64-xbflash.deb'
0 upgraded, 0 newly installed, 2 reinstalled, 0 to remove and 1 not upgraded.
Need to get 0 B/16.2 MB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 /home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xrt.deb xrt amd64 2.18.0 [16.2 MB]
Get:2 /home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xbflash.deb xrt-xbflash amd64 2.18.0 [59.5 kB]
(Reading database ... 138133 files and directories currently installed.)
Preparing to unpack .../xrt_202420.2.18.0_24.04-amd64-xrt.deb ...
Unpacking xrt (2.18.0) over (2.18.0) ...
Preparing to unpack .../xrt_202420.2.18.0_24.04-amd64-xbflash.deb ...
Unpacking xrt-xbflash (2.18.0) over (2.18.0) ...
Setting up xrt (2.18.0) ...
Setting up xrt-xbflash (2.18.0) ...
Scanning processes...
Scanning linux images...
Running kernel seems to be up-to-date.
No services need to be restarted.
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
N: Download is performed unsandboxed as root as file '/home/never/xdna-driver/xrt/build/Release/xrt_202420.2.18.0_24.04-amd64-xrt.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
root@vm:~# dpkg -l | grep xdna
ii xrt_plugin-amdxdna 2.18.0 amd64 XDNA driver plugin for Xilinx RunTime
root@vm:~# dpkg -l | grep xrt
ii xrt 2.18.0 amd64 Runtime stack for use with AMD platforms
ii xrt-container 2.18.0 amd64 Runtime stack for use with AMD platforms
ii xrt-xbflash 2.18.0 amd64 Runtime stack for use with AMD platforms
ii xrt_plugin-amdxdna 2.18.0 amd64 XDNA driver plugin for Xilinx RunTime
The problem persists:
XILINX_XRT is not set properly
So I switched to root and exported XILINX_XRT=/opt/xilinx/xrt:
root@vm:~# export XILINX_XRT=/opt/xilinx/xrt
root@vm:~# ./opt/xilinx/xrt/amdxdna/dkms_driver.sh --install
Installing xrt-amdxdna-2.18.0 from /opt/xilinx/xrt/amdxdna...
Module xrt-amdxdna-2.18.0 for kernel 6.8.0-38-generic (x86_64).
Before uninstall, this module version was ACTIVE on this kernel.
amdxdna.ko.zst:
- Uninstallation
- Deleting from: /lib/modules/6.8.0-38-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod...
amdxdna.ko.zst:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/6.8.0-38-generic/updates/dkms/
depmod...
Successfully intalled and enabled DKMS for xrt-amdxdna/2.18.0
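The "XILINX_XRT is not set properly" message went away once the variable was exported by hand. A minimal sketch of a pre-flight check for that is shown below. It assumes an XRT install root contains bin/ and lib/ subdirectories, as the listings in this thread suggest; the function names are hypothetical and not part of dkms_driver.sh. XRT installs also normally ship a setup.sh under the install root that exports this variable when sourced.

```shell
# Succeed if the given directory looks like an XRT install root.
# Assumption: a usable root has bin/ and lib/ subdirectories.
xrt_root_ok() {
    [ -n "${1:-}" ] && [ -d "$1/bin" ] && [ -d "$1/lib" ]
}

# Export XILINX_XRT, falling back to /opt/xilinx/xrt when it is unset.
# Prints the value that was chosen, or a diagnostic on stderr.
ensure_xilinx_xrt() {
    candidate="${XILINX_XRT:-/opt/xilinx/xrt}"
    if xrt_root_ok "$candidate"; then
        XILINX_XRT="$candidate"
        export XILINX_XRT
        echo "XILINX_XRT=$XILINX_XRT"
    else
        echo "no XRT install found at $candidate" >&2
        return 1
    fi
}
```

Calling ensure_xilinx_xrt in the same shell before running the install script keeps the variable in the environment the script inherits.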
Checking again with xrt-smi:
root@vm:~# /home/never/xdna-driver# ./xrt/build/Release/opt/xilinx/xrt/bin/unwrapped/xrt-smi examin
ERROR: Unknown command: 'examin'
DESCRIPTION: The Xilinx (R) Run Time - System Management Interface (xrt-smi) is a standalone
command line utility that is included with the Xilinx Run Time (XRT) installation
package. It includes multiple commands to identify and validate the installed card(s).
This information can be used for both card administration and application debugging.
USAGE: xrt-smi[--help] [--version] [--verbose] [--batch] [--force] [command [commandArgs]]
AVAILABLE COMMANDS:
configure - Device and host configuration
examine - Status of the system and device
program - Download the acceleration program to a given device
reset - Resets the given device
validate - Validates the basic shell acceleration functionality
OPTIONS:
--help - Help to use this application
--version - Report the version of XRT and its drivers
--verbose - Turn on verbosity
--batch - Enable batch mode (disables escape characters)
--force - When possible, force an operation
root@tkai:/home/never/xdna-driver# ./xrt/build/Release/opt/xilinx/xrt/bin/unwrapped/xrt-smi examine
System Configuration
OS Name : Linux
Release : 6.8.0-38-generic
Machine : x86_64
CPU Cores : 4
Memory : 7941 MB
Distribution : Ubuntu 24.04 LTS
GLIBC : 2.39
Model : Standard PC (i440FX + PIIX, 1996)
BIOS vendor : SeaBIOS
BIOS version : rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
XRT
Version : 2.18.0
Branch : HEAD
Hash : 54b1a0335ef517415d17206d30365cf4a2c380d0
Hash Date : 2024-07-19 18:10:24
xocl : unknown, unknown
xclmgmt : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
amdxdna : 2.18.0_20240719, 33ce972deb1eaec4666671a0255870a28ec982ae
Devices present
0 devices found
It seems almost successful, but 0 devices are still found. I'd appreciate any assistance.
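When xrt-smi reports 0 devices, two quick things to check are whether the amdxdna module is actually loaded and whether the kernel created an accelerator device node. The sketch below does both. The default node path /dev/accel/accel0 is an assumption based on amdxdna registering as an accel driver on minor 0 in the logs in this thread; the function names are hypothetical.

```shell
# Succeed if lsmod-style text on stdin lists the module named in $1
# in its first column.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Report whether a device node exists. $1 = path to check.
node_status() {
    if [ -e "$1" ]; then
        echo "$1: present"
    else
        echo "$1: missing"
    fi
}

# Print a two-line report for the NPU driver on this machine.
# $1 = device node (assumed default: /dev/accel/accel0).
npu_report() {
    if lsmod | module_listed amdxdna; then
        echo "amdxdna: loaded"
    else
        echo "amdxdna: not loaded"
    fi
    node_status "${1:-/dev/accel/accel0}"
}
```

If the module is loaded but the node is missing, the amdxdna lines in dmesg are the next place to look for a probe failure.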
I tried to build the driver in two environments: ubuntu-22.04.04 + vitis-2022.2 and ubuntu-24.04 + vitis-2023.2. The kernel I cloned from AMD-SW/linux is 6.8.7+. The device I am using is a ThinkBook 14+ with an 8845HS. AMD VM support is enabled in the BIOS, but I cannot find any options related to IOMMU.
After following the instructions in the README, neither environment can run the test example. The output is:
$ ./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
ERROR: Caught exception: Failed to open KMQ device fd (err=22): Invalid argument
TEST FAILED!
The dmesg command reports:
[ 255.039799] amdxdna 0000:65:00.1: set mpnpu_clock = 600 mhz
[ 255.059899] amdxdna 0000:65:00.1: set npu_hclock = 1024 mhz
[ 255.101933] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:65:00.1 on minor 0
[ 278.109565] amdxdna 0000:65:00.1: amdxdna_drm_open: SVA bind device failed, ret -28
What is the possible cause of this problem?
Thanks for your attention in advance.
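The numeric codes in these messages are standard Linux errno values: err=22 is EINVAL and ret -28 is ENOSPC. The sketch below pulls the return code out of an "SVA bind device failed" line and names it. It only decodes the number and does not diagnose the cause; the function names are hypothetical, and the table covers only the codes that appear in this thread.

```shell
# Extract the return code from an "SVA bind device failed, ret N" log line
# read on stdin. Prints nothing if no such line is present.
sva_bind_ret() {
    sed -n 's/.*SVA bind device failed, ret \(-*[0-9][0-9]*\).*/\1/p'
}

# Map a Linux errno number (with or without a leading minus) to its name.
errno_name() {
    n="${1#-}"
    case "$n" in
        22) echo "EINVAL (Invalid argument)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        95) echo "EOPNOTSUPP (Operation not supported)" ;;
        *)  echo "unknown errno $n" ;;
    esac
}
```

For example, piping dmesg through sva_bind_ret and passing the result to errno_name turns the raw number into a name that is easier to search for.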
Now that IOMMU SVA support has been upstreamed, do you intend to upstream the xdna-driver?
If I have 256 GB of memory on board, how much of it can the 780M address?
I think PyTorch integration could persuade many consumers to buy these laptops just to start working with AI.
It could also compete with MacBooks, which already use their iGPU for PyTorch tasks, a very popular use case.
Remember, too, that Ryzen has a dedicated AI engine that is more power-efficient than the MacBook's iGPU, which is a big plus.
The AMD 7845HS and 8845HS product pages say these chips are very powerful; the 8845HS lists "Total Processor Performance: 38 TOPS". Is there a way to train my model on the NPU and GPU at the same time to use the full power of the chip?
Independently of this driver, I still have a lot of issues getting basic power control to work correctly on my HP ZBook Power 15.6 inch G10 A, so I am trying all the latest Linux versions.
So, as I explained recently to @sonals, if I also want to use XDNA, it would be nice for XRT and XDNA to track the latest development.
I'm having trouble running the example after building a kernel with the newest XRT and driver for my AMD Ryzen 7 8700G w/ Radeon 780M Graphics on Arch Linux. This is what I got after installing XRT and xdna-driver:
ls /opt/xilinx/xrt/lib
libaws_mpd_plugin.so libxilinxopencl.so.2 libxrt_coreutil.so libxrt_noop.so.2.17.0
libazure_mpd_plugin.so libxilinxopencl.so.2.17.0 libxrt_coreutil.so.2 libxrt++.so
libcontainer_mpd_plugin.so libxilinxopencl_static.a libxrt_coreutil.so.2.17.0 libxrt++.so.2
libsched_em.so libxma2api.so libxrt_coreutil_static.a libxrt++.so.2.17.0
libsched_em.so.2 libxma2api.so.2 libxrt_driver_xdna.so libxrt++_static.a
libsched_em.so.2.17.0 libxma2api.so.2.17.0 libxrt_driver_xdna.so.2 libxrt_swemu.so
libsched_em_v30.so libxma2plugin.so libxrt_driver_xdna.so.2.17.0 libxrt_swemu.so.2
libsched_em_v30.so.2 libxma2plugin.so.2 libxrt_hwemu.so libxrt_swemu.so.2.17.0
libsched_em_v30.so.2.17.0 libxma2plugin.so.2.17.0 libxrt_hwemu.so.2 libxrt_swemu_static.a
libxdp_core.so libxrt_core.so libxrt_hwemu.so.2.17.0 xrt
libxdp_core.so.2 libxrt_core.so.2 libxrt_hwemu_static.a
libxdp_core.so.2.17.0 libxrt_core.so.2.17.0 libxrt_noop.so
libxilinxopencl.so libxrt_core_static.a libxrt_noop.so.2
ls /usr/lib/firmware/amdnpu/1502_00 -alh
total 292K
drwxr-xr-x 2 root root 4.0K Apr 23 10:58 .
drwxr-xr-x 3 root root 4.0K Apr 23 10:58 ..
-rw-r--r-- 1 root root 281K Apr 23 10:55 npu.sbin
./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
ERROR: Caught exception: No such device with index '0'
TEST FAILED!
uname -a
Linux 16t 6.8.7-iommu-sva-part4-v7-gc97772a3ca59-dirty #1 SMP PREEMPT_DYNAMIC Mon Apr 22 10:02:12 CST 2024 x86_64 GNU/Linux
I'd appreciate any assistance you can offer.
The XRT pinned in the git submodule is too old to compile against Linux 6.8, but this driver also does not compile with the current XRT.
The Linux kernel required to run this xdna-driver is based on an old release candidate of Linux 6.7, but Linux 6.7.2 has since been released with a lot of fixes (https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.7.2).
It would be nice to have the latest version on https://github.com/AMDESE/linux to minimize trouble.
Technically I should open this issue on https://github.com/AMDESE/linux, but that GitHub repository does not allow opening issues. Strange vision of open-source... ;-)
Can you please share the roadmap for this project? In which Linux kernel can this driver be found, without messing around with custom builds?
I have a Ryzen 7940hs system that I still cannot use on Linux fully, because it is not officially supported by AMD.
If the system doesn't have jq installed, download_npufws fails silently (i.e., the script doesn't exit): no firmware is downloaded, the plugins still install, and then mysteriously no devices are found. I recommend set -uo pipefail.
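One way to make a missing jq fail loudly is an explicit dependency check at the top of the script, alongside the strict-mode options suggested above. This is a minimal sketch and not the actual download_npufws code; require_cmd is a hypothetical helper.

```shell
# Treat unset variables as errors. pipefail is a bash option, so it is
# guarded here to keep the sketch runnable under plain sh as well.
set -u
if [ -n "${BASH_VERSION:-}" ]; then set -o pipefail; fi

# Return non-zero and print a message if any named command is missing.
require_cmd() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "error: required command '$cmd' not found" >&2
            return 1
        fi
    done
}
```

A script would then start with require_cmd jq || exit 1, so it stops before doing any work when the tool is absent.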
I notice that Ryzen AI uses XLNX_VART_FIRMWARE and .xclbin files, so I expect it is based on the same technology as the Xilinx Versal AI Engine. Will AMD open the documentation and interfaces so that users can develop custom firmware? (That is, develop custom .xclbin files, just as for Versal devices using the Vitis IDE 😄). This would be great for non-AI applications that want to use the AI Engine inside Ryzen CPUs.
I was able to build everything, but the post-install step breaks:
Successfully intalled and enabled DKMS for xrt-amdxdna/2.17.0
Loading new amdxdna Linux kernel module
Creating xclbin firmware symbolic link
/opt/xilinx/xrt/amdxdna/setup_xclbin_firmware.sh: line 38: cd: /lib/firmware/amdipu/*/: No such file or directory
modprobe: ERROR: could not insert 'amdxdna': Unknown symbol in module, or unknown parameter (see dmesg)
$ sudo dkms status
xrt/2.17.0: added
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/xrt-amdaie/2.17.0/source/dkms.conf does not exist.
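The cd error above is what an unmatched glob looks like: when nothing exists under /lib/firmware/amdipu/, the shell passes the pattern through literally. The sketch below guards such a loop. It is not the actual setup_xclbin_firmware.sh logic, and list_firmware_dirs is a hypothetical name. Other listings in this thread show firmware under a directory named amdnpu, not amdipu, so a naming mismatch between driver and script versions may be involved.

```shell
# Print each subdirectory of the firmware root given as $1.
# Returns non-zero with a message if none exist, instead of letting a
# later cd fail on the literal pattern.
list_firmware_dirs() {
    root="$1"
    count=0
    for dir in "$root"/*/; do
        # An unmatched pattern stays literal, so test before using it.
        [ -d "$dir" ] || continue
        echo "$dir"
        count=$((count + 1))
    done
    if [ "$count" -eq 0 ]; then
        echo "no firmware directories under $root" >&2
        return 1
    fi
}
```

Running it against both /lib/firmware/amdipu and /lib/firmware/amdnpu shows which of the two trees is actually populated.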
I received an error building xrt with gcc-14:
In file included from /home/mike/Development/MISC/xdna-driver_main/xrt/src/runtime_src/tools/xclbinutil/R>
from /home/mike/Development/MISC/xdna-driver_main/xrt/src/runtime_src/tools/xclbinutil/R>
/usr/include/rapidjson/document.h: In member function 'rapidjson::GenericStringRef<CharType>& rapidjson::>
/usr/include/rapidjson/document.h:319:82: error: assignment of read-only member 'rapidjson::GenericString>
319 | GenericStringRef& operator=(const GenericStringRef& rhs) { s = rhs.s; length = rhs.length; }
This issue is also described in (several) bug reports as an issue with rapidjson 1.1.0 (the current release, although very old):
https://www.mail-archive.com/[email protected]/msg196747.html
https://bugs.gentoo.org/919374
The solution is to either patch 1.1.0 with a more recent commit:
https://github.com/Tencent/rapidjson/commit/3b2441b8.patch
... or pull rapidjson from git upstream (which is what I did to resolve after disabling the build tests):
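# Fetch current rapidjson from upstream (shallow clone)
git clone --depth=1 https://github.com/tencent/rapidjson.git rapidjson
cd rapidjson
# Configure out of tree with gcc-14 and the build tests disabled
mkdir tmpbuild
cd tmpbuild
CC=gcc-14 CXX=g++-14 cmake -DRAPIDJSON_BUILD_TESTS=OFF ..
make -j "$(nproc)"
# Install system-wide
sudo make install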
Now all of the XDNA components (linux kernel, xrt, xrt-xdna) successfully build with gcc-14.
Thanks @maxzhen and @sonals for your help.
Cheers,
Michael
# xrt_test
====== 0: npu3 xrt vadd started =====
DRM_IOCTL_AMDXDNA_GET_INFO IOCTL failed (err=95): Operation not supported
====== 0: npu3 xrt vadd FAILED =====
1 test(s) executed
1 test(s) FAILED!
# xbutil validate -d 0000:c7:00.1
XRT build version: 2.17.0
Build hash:
Build date: 2024-06-10 20:36:38
Git branch:
PID: 2214
UID: 0
[Thu Jun 13 10:05:29 2024 GMT]
HOST: m-kot
EXE: /usr/bin/unwrapped/xbutil2
[xbutil] ERROR: DRM_IOCTL_AMDXDNA_GET_INFO IOCTL failed (err=95): Operation not supported
202410.2.17.319
https://github.com/Xilinx/XRT/releases/tag/202410.2.17.319
AMD Ryzen™ 7 8845HS
# dmesg | grep -v input | grep -E "xdna|npu|xocl|xclmgmt|drm|gpu"
[ 1.347323] xocl: loading out-of-tree module taints kernel.
[ 1.356150] xclmgmt init()
[ 2.554377] systemd[1]: Starting Load Kernel Module drm...
[ 2.560314] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 2.560358] systemd[1]: Finished Load Kernel Module drm.
[ 2.801077] Loading firmware: amdnpu/1502_00/npu.sbin
[ 2.802016] amdxdna 0000:c7:00.1: enabling device (0000 -> 0002)
[ 2.809259] amdxdna 0000:c7:00.1: (Develop) IOMMU mode is 0
[ 2.840447] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 2.860449] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 2.894357] [drm] amdgpu kernel modesetting enabled.
[ 2.897619] amdgpu: Virtual CRAT table created for CPU
[ 2.897626] amdgpu: Topology: Add CPU node
[ 2.897713] amdgpu 0000:c6:00.0: enabling device (0006 -> 0007)
[ 2.897734] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x1900 0x1002:0x0124 0xC5).
[ 2.897741] [drm] register mmio base: 0xDC500000
[ 2.897741] [drm] register mmio size: 524288
[ 2.900727] [drm] add ip block number 0 <soc21_common>
[ 2.900730] [drm] add ip block number 1 <gmc_v11_0>
[ 2.900732] [drm] add ip block number 2 <ih_v6_0>
[ 2.900734] [drm] add ip block number 3 <psp>
[ 2.900736] [drm] add ip block number 4 <smu>
[ 2.900737] [drm] add ip block number 5 <dm>
[ 2.900739] [drm] add ip block number 6 <gfx_v11_0>
[ 2.900740] [drm] add ip block number 7 <sdma_v6_0>
[ 2.900742] [drm] add ip block number 8 <vcn_v4_0>
[ 2.900744] [drm] add ip block number 9 <jpeg_v4_0>
[ 2.900745] [drm] add ip block number 10 <mes_v11_0>
[ 2.900761] amdgpu 0000:c6:00.0: amdgpu: Fetched VBIOS from VFCT
[ 2.900764] amdgpu: ATOM BIOS: 113-PHXGENERIC-001
[ 2.900777] Loading firmware: amdgpu/psp_13_0_4_toc.bin
[ 2.901513] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:c7:00.1 on minor 0
[ 2.901597] Loading firmware: amdgpu/psp_13_0_4_ta.bin
[ 2.902331] Loading firmware: amdgpu/dcn_3_1_4_dmcub.bin
[ 2.903122] Loading firmware: amdgpu/gc_11_0_1_pfp.bin
[ 2.903583] Loading firmware: amdgpu/gc_11_0_1_me.bin
[ 2.904080] Loading firmware: amdgpu/gc_11_0_1_rlc.bin
[ 2.904605] Loading firmware: amdgpu/gc_11_0_1_mec.bin
[ 2.905238] Loading firmware: amdgpu/gc_11_0_1_imu.bin
[ 2.905714] Loading firmware: amdgpu/sdma_6_0_1.bin
[ 2.906029] [drm] VCN(0) encode/decode are enabled in VM mode
[ 2.906031] Loading firmware: amdgpu/vcn_4_0_2.bin
[ 2.906770] amdgpu 0000:c6:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[ 2.906882] Loading firmware: amdgpu/gc_11_0_1_mes_2.bin
[ 2.907442] Loading firmware: amdgpu/gc_11_0_1_mes1.bin
[ 2.908065] amdgpu 0000:c6:00.0: vgaarb: deactivate vga console
[ 2.908068] amdgpu 0000:c6:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
[ 2.908092] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 2.908113] amdgpu 0000:c6:00.0: amdgpu: VRAM: 4096M 0x0000008000000000 - 0x00000080FFFFFFFF (4096M used)
[ 2.908115] amdgpu 0000:c6:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[ 2.908134] [drm] Detected VRAM RAM=4096M, BAR=4096M
[ 2.908135] [drm] RAM width 64bits DDR5
[ 2.908208] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.908209] [drm] amdgpu: 13942M of GTT memory ready.
[ 2.908219] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 2.908448] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[ 2.908724] [drm] Loading DMUB firmware via PSP: version=0x08003A00
[ 2.909005] [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 13
[ 2.909008] amdgpu 0000:c6:00.0: amdgpu: Will use PSP to load VCN firmware
[ 2.932862] [drm] reserve 0x4000000 from 0x80f8000000 for PSP TMR
[ 3.475099] amdgpu 0000:c6:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 3.482664] amdgpu 0000:c6:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 3.482668] amdgpu 0000:c6:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 3.514961] amdgpu 0000:c6:00.0: amdgpu: SMU is initialized successfully!
[ 3.514965] [drm] Seamless boot condition check passed
[ 3.516033] [drm] Display Core v3.2.266 initialized on DCN 3.1.4
[ 3.516037] [drm] DP-HDMI FRL PCON supported
[ 3.518631] [drm] DMUB hardware initialized: version=0x08003A00
[ 3.520335] snd_hda_intel 0000:c6:00.1: bound 0000:c6:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 3.588227] [drm] kiq ring mec 3 pipe 1 q 0
[ 3.590651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 3.590677] amdgpu 0000:c6:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 3.659561] amdgpu: HMM registered 4096MB device memory
[ 3.660114] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 3.660126] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 3.660221] amdgpu: Virtual CRAT table created for GPU
[ 3.660492] amdgpu: Topology: Add dGPU node [0x1900:0x1002]
[ 3.660494] kfd kfd: amdgpu: added device 1002:1900
[ 3.660503] amdgpu 0000:c6:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
[ 3.660507] amdgpu 0000:c6:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 3.660509] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3.660510] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3.660511] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 3.660512] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 3.660513] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 3.660514] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 3.660515] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 3.660516] amdgpu 0000:c6:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 3.660517] amdgpu 0000:c6:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 3.660518] amdgpu 0000:c6:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 3.660519] amdgpu 0000:c6:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 3.660521] amdgpu 0000:c6:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[ 3.661404] [drm] ring gfx_32768.1.1 was added
[ 3.661896] [drm] ring compute_32768.2.2 was added
[ 3.662375] [drm] ring sdma_32768.3.3 was added
[ 3.662420] [drm] ring gfx_32768.1.1 ib test pass
[ 3.662451] [drm] ring compute_32768.2.2 ib test pass
[ 3.662501] [drm] ring sdma_32768.3.3 ib test pass
[ 3.664571] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:c6:00.0 on minor 0
[ 3.667849] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.667975] [drm] DSC precompute is not needed.
[ 4.097544] amdgpu 0000:c6:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - optc314_disable_crtc line:148
[ 4.152546] amdgpu 0000:c6:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 53.990860] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 54.010771] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 54.052663] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 59.637423] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 61.574206] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 122.652191] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 122.672252] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 122.714141] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 293.601336] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 293.621373] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 293.663138] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 300.966818] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 300.987233] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 301.030168] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.751547] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 314.771241] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 314.813866] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.814563] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.814690] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.814809] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.814936] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815040] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815167] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815267] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815369] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815464] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815567] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815655] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815748] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815837] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.815929] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816025] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816115] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816209] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816299] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816386] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816482] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816573] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816671] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816766] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816858] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.816951] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 314.817045] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 370.301712] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 370.321703] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 370.363371] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 891.645155] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 891.661209] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 891.702813] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 900.371245] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 900.391240] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
[ 900.433023] amdxdna 0000:c7:00.1: aie2_get_info: Not supported request parameter 7
[ 906.238892] amdxdna 0000:c7:00.1: set mpnpu_clock = 600 mhz
[ 906.258981] amdxdna 0000:c7:00.1: set npu_hclock = 1024 mhz
This occurs at the build step ./build.sh -package
(the step "Create DEB package for existed release or debug build") when building the XDNA driver.
Hello!
All three of the provided examples seem to work, for example:
./example_build/example_noop_test /lib/firmware/amdnpu/1502/validate.xclbin
...$ ./example_build/example_noop_test /lib/firmware/amdnpu/1502/validate.xclbin
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 6962790
Host test average latency: 99 us/iter
TEST PASSED!
But when I look at dmesg or /var/log/kern.log, there is a scary warning:
2024-02-15T17:51:15.668654-08:00 rk-xsj kernel: [ 2909.731818] ------------[ cut here ]------------
2024-02-15T17:51:15.668663-08:00 rk-xsj kernel: [ 2909.731821] WARNING: CPU: 9 PID: 42463 at drivers/iommu/io-pgfault.c:249 iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668664-08:00 rk-xsj kernel: [ 2909.731827] Modules linked in: amdxdna(OE) drm_shmem_helper xocl(OE) xclmgmt(OE) hid_logitech_hidpp hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_ump rfcomm snd_seq_dummy snd_hrtimer xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler nvme_fabrics vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ccm overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 intel_rapl_msr joydev intel_rapl_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh ledtrig_audio snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_hda_intel snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_core edac_mce_amd snd_hda_codec mt7921e mt7921_common btusb snd_hda_core snd_compress uvcvideo btrtl mt792x_lib ac97_bus snd_hwdep
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731881] kvm_amd btintel snd_pcm_dmaengine videobuf2_vmalloc mt76_connac_lib btbcm uvc mt76 snd_seq_midi snd_pci_ps btmtk videobuf2_memops snd_seq_midi_event snd_rpl_pci_acp6x videobuf2_v4l2 snd_rawmidi snd_acp_pci mac80211 kvm bluetooth snd_acp_legacy_common videodev snd_pci_acp6x snd_seq snd_pcm irqbypass videobuf2_common ecdh_generic snd_seq_device crct10dif_pclmul crc32_pclmul hid_multitouch ecc mc snd_pci_acp5x snd_timer polyval_clmulni polyval_generic cfg80211 snd_rn_pci_acp3x ghash_clmulni_intel snd_acp_config sha256_ssse3 ucsi_acpi hp_wmi snd sha1_ssse3 r8169 typec_ucsi snd_soc_acpi sparse_keymap rapl platform_profile wmi_bmof thunderbolt k10temp libarc4 realtek soundcore ccp snd_pci_acp3x i2c_piix4 typec nvidia_uvm(POE) i2c_hid_acpi wireless_hotkey i2c_hid amd_pmc msr parport_pc ppdev nfsd lp parport auth_rpcgss nfs_acl lockd grace efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) amdgpu amdxcp drm_exec gpu_sched
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731942] drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core drm_kms_helper input_leds nvme video drm serio_raw xhci_pci nvme_core xhci_pci_renesas i2c_algo_bit wmi mac_hid aesni_intel crypto_simd cryptd
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731958] CPU: 9 PID: 42463 Comm: example_noop_te Tainted: P W OE 6.7.4+iommu-sva-v4+ #1
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731960] Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.03.00 09/11/2023
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731961] RIP: 0010:iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731964] Code: 48 8b 87 d0 02 00 00 48 8b 40 20 48 85 c0 74 1a 55 48 8b 40 40 48 8b 38 48 89 e5 e8 8b 79 61 ff 31 c0 5d 31 ff e9 6c 06 80 00 <0f> 0b b8 ed ff ff ff 31 ff e9 5e 06 80 00 0f 1f 00 90 90 90 90 90
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731965] RSP: 0018:ffffb19f0faffcc8 EFLAGS: 00010246
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731967] RAX: 0000000000000000 RBX: ffffa05d41aeb0c0 RCX: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731968] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa05d41aeb0c0
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731969] RBP: ffffb19f0faffd00 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R13: ffffa05d41895d80 R14: ffffa0617fc70978 R15: ffffa0617fc70810
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731971] FS: 00007fb2bd043c00(0000) GS:ffffa06c75a40000(0000) knlGS:0000000000000000
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731974] CR2: 00007fb2bc945400 CR3: 000000068f1ac000 CR4: 0000000000750ef0
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731975] PKRU: 55555554
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731976] Call Trace:
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731977] <TASK>
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731981] ? show_regs+0x6d/0x80
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731984] ? __warn+0x89/0x160
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731987] ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731989] ? report_bug+0x17e/0x1b0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731993] ? handle_bug+0x51/0xa0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731996] ? exc_invalid_op+0x18/0x80
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.731998] ? asm_exc_invalid_op+0x1b/0x20
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732003] ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732005] ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732006] ? amd_iommu_remove_dev_pasid+0x7d/0x160
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732010] iommu_detach_device_pasid+0x5a/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732013] iommu_sva_unbind_device+0x3f/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732017] amdxdna_drm_close+0xa5/0x130 [amdxdna]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732024] drm_file_free+0x1e6/0x260 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732045] drm_release+0xc7/0x150 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732059] __fput+0x9e/0x2e0
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732063] __fput_sync+0x1c/0x30
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732065] __x64_sys_close+0x3e/0x90
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732068] do_syscall_64+0x5d/0xf0
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732070] ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732072] ? ksys_write+0x73/0x100
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732073] ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732075] ? exit_to_user_mode_prepare+0x39/0x190
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732078] ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732080] ? syscall_exit_to_user_mode+0x37/0x60
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732082] ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732083] ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732085] ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732087] ? exc_page_fault+0x94/0x1b0
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732089] entry_SYSCALL_64_after_hwframe+0x6e/0x76
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732091] RIP: 0033:0x7fb2bc7157c4
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732093] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 85 0d 0f 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 13
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732094] RSP: 002b:00007ffd629c61e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732096] RAX: ffffffffffffffda RBX: 000055808393b6f0 RCX: 00007fb2bc7157c4
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732097] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000003
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] RBP: 000055808393b788 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] R10: 0000558083999a60 R11: 0000000000000202 R12: 00007fb2bc9ff100
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732099] R13: 000055808393b4f0 R14: 0000000100000001 R15: 00005580839336d0
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732102] </TASK>
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732103] ---[ end trace 0000000000000000 ]---
Any idea? Is it normal?
At least it does not crash my work laptop. ;-)
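For anyone triaging a report like this one, the interesting parts of the warning are the source location that fired and the reliable frames of the call trace. The following is a minimal sketch, not part of the driver or XRT: the helper name `summarize_warning` and the shortened sample are illustrative assumptions, with the sample lines taken from the log above. Frames prefixed with "?" are skipped because the kernel marks them as unverified stack entries.

```python
import re

# A few lines taken from the log above, with the syslog prefix removed.
SAMPLE = """\
[ 2909.731821] WARNING: CPU: 9 PID: 42463 at drivers/iommu/io-pgfault.c:249 iopf_queue_flush_dev+0x2f/0x40
[ 2909.731976] Call Trace:
[ 2909.731977] <TASK>
[ 2909.731981] ? show_regs+0x6d/0x80
[ 2909.732010] iommu_detach_device_pasid+0x5a/0xa0
[ 2909.732013] iommu_sva_unbind_device+0x3f/0xa0
[ 2909.732017] amdxdna_drm_close+0xa5/0x130 [amdxdna]
[ 2909.732024] drm_file_free+0x1e6/0x260 [drm]
[ 2909.732103] ---[ end trace 0000000000000000 ]---
"""

# "WARNING: ... at <file>:<line> <function>+<offset>"
WARN_RE = re.compile(r"WARNING: .* at (\S+):(\d+) (\w+)\+")
# "] [? ]<function>+0x<off>/0x<size> [<module>]"
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_warning(text):
    """Return ((file, line, function), [(function, module_or_None), ...])."""
    site = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        warn = WARN_RE.search(line)
        if warn:
            site = (warn.group(1), int(warn.group(2)), warn.group(3))
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "end trace" in line:
            in_trace = False
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            # Keep only frames without the "?" marker (verified by the unwinder).
            if frame and not frame.group(1):
                frames.append((frame.group(2), frame.group(3)))
    return site, frames


if __name__ == "__main__":
    site, frames = summarize_warning(SAMPLE)
    print("warning site:", site)
    for function, module in frames:
        print("  ", function, f"[{module}]" if module else "")
```

On the trace above, the first reliable frames form the chain amdxdna_drm_close, iommu_sva_unbind_device, iommu_detach_device_pasid, which shows the warning is raised while the driver releases its SVA binding on file close.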
After installing Linux kernel 6.7, XRT, and this driver on a fresh install of Ubuntu 22.04, I'm seeing "0 devices found" when running "xbutil examine". The provided code sample also segfaults when attempting to load device(0).
This is on a Minisforum PC with a 7940HS (UM790 Pro). Any pointers with further debugging tips or solutions would be appreciated.
System Configuration
OS Name : Linux
Release : 6.7.0-rc8+
Version : #1 SMP PREEMPT_DYNAMIC Sun Feb 11 17:27:55 EST 2024
Machine : x86_64
CPU Cores : 16
Memory : 62085 MB
Distribution : Ubuntu 22.04.3 LTS
GLIBC : 2.35
Model : Venus series
BIOS vendor : American Megatrends International, LLC.
BIOS version : 1.09
XRT
Version : 2.17.0
Branch : master
Hash : a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
Hash Date : 2024-02-12 12:04:51
XOCL : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
XCLMGMT : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
AMDXDNA : 2.17.0_20240212, 317e0c67747cbf88e5b5a3a81ba4bdf7bf5b3fc3
Devices present
0 devices found
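When comparing reports like this one, it can help to pull the driver versions and the device list out of the "xbutil examine" text so that "driver not reported" can be told apart from "driver reported but no device listed". The following is only a triage sketch under assumptions: `parse_examine` is a hypothetical helper, and it relies on the "key : value" and "[BDF] : Name" layout seen in the outputs on this page, not on any documented xbutil format.

```python
import re

# Excerpt of the report above (XRT section and device list only).
SAMPLE = """\
XRT
Version : 2.17.0
Branch : master
XOCL : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
XCLMGMT : 2.17.0, a395e702b2e79b3ec23c9cdc3ab4ad31a0d84eab
AMDXDNA : 2.17.0_20240212, 317e0c67747cbf88e5b5a3a81ba4bdf7bf5b3fc3
Devices present
0 devices found
"""

# A device row looks like "[0000:c5:00.1] : RyzenAI-npu1".
DEVICE_RE = re.compile(
    r"\s*\[([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]\s*:\s*(\S+)"
)


def parse_examine(text):
    """Return (fields, devices) parsed from 'xbutil examine' text output.

    fields maps each "key : value" label to its value; a label that appears
    twice (for example "Version") keeps the last value seen.
    devices is a list of (bdf, name) tuples.
    """
    fields = {}
    devices = []
    for line in text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            devices.append((dev.group(1), dev.group(2)))
            continue
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields, devices


if __name__ == "__main__":
    fields, devices = parse_examine(SAMPLE)
    print("AMDXDNA:", fields.get("AMDXDNA", "not reported"))
    print("devices:", devices if devices else "none listed")
```

For the report above, this shows an AMDXDNA version string alongside an empty device list, which narrows the question to why no device is enumerated rather than whether the driver version is reported at all.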
I want to use AMD chips to run model inference in a Docker container, like this image:
https://hub.docker.com/r/siutin/stable-diffusion-webui-docker
but I'm not sure how to make the AMD GPU work with it.
Could you give some advice, or point to a document, explaining how to extend a CPU-only Docker image or Linux system with Ryzen AI support?
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ ./example_build/example_noop_test ../tools/bins/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
*** stack smashing detected ***: terminated
Aborted (core dumped)
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ git rev-parse HEAD^
85e380b68c8b921b7efab18c1b3280644a982d40
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ /opt/xilinx/xrt/bin/xbutil examine
System Configuration
OS Name : Linux
Release : 6.8.8
Version : #2 SMP PREEMPT_DYNAMIC Fri May 3 14:13:56 CDT 2024
Machine : x86_64
CPU Cores : 16
Memory : 94278 MB
Distribution : Ubuntu 22.04.3 LTS
GLIBC : 2.35
Model : F7BSC
BIOS vendor : American Megatrends International, LLC.
BIOS version : 1.04
XRT
Version : 2.18.0
Branch : HEAD
Hash : c678a9469f9b20fcb9a04bbedb5c51f8473faec0
Hash Date : 2024-05-24 18:16:53
XOCL : unknown, unknown
XCLMGMT : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
AMDXDNA : 2.18.0_20240613, 099bf0332e3d5692f98fbbf309fa9177a95f10da
Firmware Version : N/A
Devices present
BDF : Name
---------------------------------
[0000:c5:00.1] : RyzenAI-npu1
(base) mlevental@mlevental-F7BSC:/tmp/xdna-driver/build$ uname -r
6.8.8
Question not an Issue
Got the installation working on Linux, which is great, but I was wondering: what is the scope of what the driver will allow you to do? It seems that most of the software located in the Ryzen SW repository is strictly for Windows (hard dependency on .dll, presence of .bat files, etc.). Is there any plan to port some of those tutorials to work with the Linux driver, or any way to get them working at the moment? Ideally it would be nice to have a quick way to test executing some model via the ONNX Runtime from a Python script, similar to what is available for Windows at the moment. Any pointers to places where I can read more about the capabilities of this driver would be greatly appreciated too!
I'm trying to refresh one of my machines that has Ryzen AI, and I get these build errors:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_sched_job_init’:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:171:62: error: passing argument 3 of ‘drm_sched_job_init’ makes pointer from integer without a cast [-Werror=int-conversion]
171 | ret = drm_sched_job_init(&job->base, &hwctx->entity, 1, hwctx);
| ^
| |
| int
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:533:30: note: expected ‘void *’ but argument is of type ‘int’
533 | void *owner);
| ~~~~~~^~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:171:15: error: too many arguments to function ‘drm_sched_job_init’
171 | ret = drm_sched_job_init(&job->base, &hwctx->entity, 1, hwctx);
| ^~~~~~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:531:5: note: declared here
531 | int drm_sched_job_init(struct drm_sched_job *job,
| ^~~~~~~~~~~~~~~~~~
In file included from ./include/uapi/linux/posix_types.h:5,
from ./include/uapi/linux/types.h:14,
from ./include/linux/types.h:6,
from ./include/linux/kasan-checks.h:5,
from ./include/asm-generic/rwonce.h:26,
from ./arch/x86/include/generated/asm/rwonce.h:1,
from ./include/linux/compiler.h:251,
from ./include/linux/export.h:5,
from ./include/linux/linkage.h:7,
from ./include/linux/preempt.h:10,
from ./include/linux/spinlock.h:56,
from ./include/linux/kref.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:7:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_hwctx_create’:
./include/linux/stddef.h:8:14: error: passing argument 3 of ‘drm_sched_init’ makes integer from pointer without a cast [-Werror=int-conversion]
8 | #define NULL ((void *)0)
| ^~~~~~~~~~~
| |
| void *
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:494:49: note: in expansion of macro ‘NULL’
494 | ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
| ^~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:526:24: note: expected ‘u32’ {aka ‘unsigned int’} but argument is of type ‘void *’
526 | u32 num_rqs, uint32_t hw_submission, unsigned int hang_limit,
| ~~~~^~~~~~~
In file included from ./include/linux/limits.h:7,
from ./include/linux/kernel.h:17,
from ./arch/x86/include/asm/percpu.h:27,
from ./arch/x86/include/asm/preempt.h:6,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/kref.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:7:
./include/vdso/limits.h:11:25: error: passing argument 7 of ‘drm_sched_init’ makes pointer from integer without a cast [-Werror=int-conversion]
11 | #define LONG_MAX ((long)(~0UL >> 1))
| ^~~~~~~~~~~~~~~~~~~
| |
| long int
./include/linux/sched.h:296:41: note: in expansion of macro ‘LONG_MAX’
296 | #define MAX_SCHEDULE_TIMEOUT LONG_MAX
| ^~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:495:49: note: in expansion of macro ‘MAX_SCHEDULE_TIMEOUT’
495 | HWCTX_MAX_CMDS, 0, MAX_SCHEDULE_TIMEOUT, NULL,
| ^~~~~~~~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:527:59: note: expected ‘struct workqueue_struct *’ but argument is of type ‘long int’
527 | long timeout, struct workqueue_struct *timeout_wq,
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:496:41: error: passing argument 10 of ‘drm_sched_init’ from incompatible pointer type [-Werror=incompatible-pointer-types]
496 | NULL, hwctx->name, &client->xdna->pdev->dev);
| ~~~~~^~~~~~
| |
| char *
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:528:70: note: expected ‘struct device *’ but argument is of type ‘char *’
528 | atomic_t *score, const char *name, struct device *dev);
| ~~~~~~~~~~~~~~~^~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:494:15: error: too many arguments to function ‘drm_sched_init’
494 | ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT,
| ^~~~~~~~~~~~~~
In file included from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.h:11,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_drv.h:16,
from /home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:12:
./include/drm/gpu_scheduler.h:524:5: note: declared here
524 | int drm_sched_init(struct drm_gpu_scheduler *sched,
| ^~~~~~~~~~~~~~
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c: In function ‘amdxdna_hwctx_destroy_rcu’:
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:767:9: error: implicit declaration of function ‘drm_sched_wqueue_stop’; did you mean ‘drm_sched_stop’? [-Werror=implicit-function-declaration]
767 | drm_sched_wqueue_stop(&hwctx->sched);
| ^~~~~~~~~~~~~~~~~~~~~
| drm_sched_stop
/home/eric/src/xdna-driver/src/driver/amdxdna/amdxdna_ctx.c:777:9: error: implicit declaration of function ‘drm_sched_wqueue_start’; did you mean ‘drm_sched_start’? [-Werror=implicit-function-declaration]
777 | drm_sched_wqueue_start(&hwctx->sched);
| ^~~~~~~~~~~~~~~~~~~~~~
| drm_sched_start
cc1: all warnings being treated as errors
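A long compiler log like this is easier to read once the errors are grouped by the function they mention. The sketch below is an illustration only: `functions_in_errors` is a hypothetical helper and the sample reproduces the eight error lines above with the paths shortened. It counts the first quoted identifier on each "error:" line.

```python
import re
from collections import Counter

# The eight "error:" lines from the build log above, paths shortened.
SAMPLE = """\
amdxdna_ctx.c:171:62: error: passing argument 3 of ‘drm_sched_job_init’ makes pointer from integer without a cast [-Werror=int-conversion]
amdxdna_ctx.c:171:15: error: too many arguments to function ‘drm_sched_job_init’
stddef.h:8:14: error: passing argument 3 of ‘drm_sched_init’ makes integer from pointer without a cast [-Werror=int-conversion]
limits.h:11:25: error: passing argument 7 of ‘drm_sched_init’ makes pointer from integer without a cast [-Werror=int-conversion]
amdxdna_ctx.c:496:41: error: passing argument 10 of ‘drm_sched_init’ from incompatible pointer type [-Werror=incompatible-pointer-types]
amdxdna_ctx.c:494:15: error: too many arguments to function ‘drm_sched_init’
amdxdna_ctx.c:767:9: error: implicit declaration of function ‘drm_sched_wqueue_stop’; did you mean ‘drm_sched_stop’? [-Werror=implicit-function-declaration]
amdxdna_ctx.c:777:9: error: implicit declaration of function ‘drm_sched_wqueue_start’; did you mean ‘drm_sched_start’? [-Werror=implicit-function-declaration]
"""

# First identifier quoted by GCC after "error:" (curly or plain quotes).
ERROR_RE = re.compile(r"error: .*?[‘']([A-Za-z_]\w*)[’']")


def functions_in_errors(log):
    """Count how many error lines mention each function first."""
    counts = Counter()
    for line in log.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for name, count in functions_in_errors(SAMPLE).most_common():
        print(f"{count:2d}  {name}")
```

Every error in this log names a drm_sched_* function declared in include/drm/gpu_scheduler.h. One plausible reading, offered as an assumption and not a confirmed diagnosis, is that the driver source expects a different DRM scheduler API than the kernel headers it is being built against.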
While I have a lot of issues with power control on my laptop with the 6.5 and 6.7.4 kernels independently of this project, it seems that the amdxdna driver sometimes prevents the system from suspending too.
When it works:
2024-02-16T09:50:10.661313-08:00 rk-xsj kernel: [ 139.176934] amdxdna 0000:66:00.1: firmware resuming...
2024-02-16T09:50:10.661314-08:00 rk-xsj kernel: [ 139.177070] amdxdna 0000:66:00.1: hardware context resuming...
When it fails:
2024-02-16T10:00:26.051836-08:00 rk-xsj kernel: [ 668.382497] amdxdna 0000:66:00.1: amdxdna_do_suspend: suspend NPU firmware failed
2024-02-16T10:00:26.051837-08:00 rk-xsj kernel: [ 668.382501] amdxdna 0000:66:00.1: PM: pci_pm_suspend(): amdxdna_pmops_suspend+0x0/0x80 [amdxdna] returns -19
2024-02-16T10:00:26.051837-08:00 rk-xsj kernel: [ 668.382514] amdxdna 0000:66:00.1: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -19
2024-02-16T10:00:26.051838-08:00 rk-xsj kernel: [ 668.382519] amdxdna 0000:66:00.1: PM: failed to suspend async: error -19
2024-02-16T10:00:26.051838-08:00 rk-xsj kernel: [ 668.478472] PM: Some devices failed to suspend, or early wake event detected
Is this a known problem?
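The "returns -19" and "error -19" values in the failing log are negated errno codes, so they can be decoded with the Python standard library. This is a small sketch that assumes it runs on Linux, where userspace errno numbers match the kernel's; `decode_errors` is a hypothetical helper name and the sample lines are copied from the log above without the syslog prefix.

```python
import errno
import os
import re

# Lines from the failing suspend above.
SAMPLE = """\
amdxdna 0000:66:00.1: amdxdna_do_suspend: suspend NPU firmware failed
amdxdna 0000:66:00.1: PM: pci_pm_suspend(): amdxdna_pmops_suspend+0x0/0x80 [amdxdna] returns -19
amdxdna 0000:66:00.1: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -19
amdxdna 0000:66:00.1: PM: failed to suspend async: error -19
"""

# Matches "returns -N" and "error -N".
RET_RE = re.compile(r"(?:returns|error) (-\d+)")


def decode_errors(text):
    """Map each negative return code in the log to (errno name, message)."""
    decoded = {}
    for code in RET_RE.findall(text):
        number = -int(code)
        name = errno.errorcode.get(number, "unknown")
        decoded[int(code)] = (name, os.strerror(number))
    return decoded


if __name__ == "__main__":
    for code, (name, message) in decode_errors(SAMPLE).items():
        print(code, name, message)
```

Here -19 decodes to ENODEV, which means the suspend callback in amdxdna reported that the device was not available when the firmware suspend was attempted.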
Should it work?
clinfo
Number of platforms: 2
Platform Profile: EMBEDDED_PROFILE
Platform Version: OpenCL 1.0
Platform Name: Xilinx
Platform Vendor: Xilinx
Platform Extensions: cl_khr_icd
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3558.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: Xilinx
Number of devices: 1
XRT build version: 2.17.0
Build hash: baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
Build date: 2024-04-17 13:03:42
Git branch: HEAD
PID: 61787
UID: 0
[Wed Apr 17 16:03:16 2024 GMT]
HOST: Shiva
EXE: /opt/rocm-5.5.0/bin/clinfo
[XRT] ERROR: get_device_info: Operation not supported
ERROR: clGetDeviceInfo(-6)
This is on an AMD Ryzen 7 7840HS:
xbutil examine
System Configuration
OS Name : Linux
Release : 6.8.6-060806-generic
Version : #202404131135 SMP PREEMPT_DYNAMIC Wed Apr 17 04:32:21 EEST 2024
Machine : x86_64
CPU Cores : 16
Memory : 95772 MB
Distribution : Ubuntu 22.04.4 LTS
GLIBC : 2.35
Model : TUXEDO Sirius 16 Gen1
BIOS vendor : American Megatrends International, LLC.
BIOS version : V1.00A00_20240108
XRT
Version : 2.17.0
Branch : HEAD
Hash : baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
Hash Date : 2024-04-17 13:03:42
XOCL : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
XCLMGMT : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
AMDXDNA : 2.17.0_20240417, 35351e4bbbc65568669c36255825425030be721f
Devices present
BDF : Name
---------------------------------
[0000:6a:00.1] : RyzenAI-npu1
There is some discussion in #168.
I think this would be a milestone for this driver, because many cloud service providers sell GPU computing inside their VMs, and when I set up a computing system I prefer a cloud-native or VM environment.
Will AMD test and support the XDNA driver in a Linux VM environment such as https://libvirt.org/index.html or PVE (which, as far as I know, is libvirt underneath)?
Hello,
Question: does kernel 6.9 include the necessary patches (IOMMU SVA), or does it still require a custom kernel?
Thanks
AMD Ryzen 7 7840HS, Ubuntu 22.04:
System Configuration
OS Name : Linux
Release : 6.8.7-060807-generic
Version : #202404170934 SMP PREEMPT_DYNAMIC Thu Apr 18 13:01:01 EEST 2024
Machine : x86_64
CPU Cores : 16
Memory : 95777 MB
Distribution : Ubuntu 22.04.4 LTS
GLIBC : 2.35
Model : TUXEDO Sirius 16 Gen1
BIOS vendor : American Megatrends International, LLC.
BIOS version : V1.00A00_20240108
XRT
Version : 2.17.0
Branch : HEAD
Hash : baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
Hash Date : 2024-04-17 13:03:42
XOCL : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
XCLMGMT : 2.17.0, baf88820fb3fc24dda4dc08c91ecbca2c76c7b0f
AMDXDNA : 2.17.0_20240417, 35351e4bbbc65568669c36255825425030be721f
Devices present
BDF : Name
---------------------------------
[0000:6a:00.1] : RyzenAI-npu1
'xbutil validate' freezes at:
------------------------------------------------------------
EARLY ACCESS
This release of xbutil contains early access
experimental features which may have bugs.
------------------------------------------------------------
Validate Device : [0000:6a:00.1]
Platform : RyzenAI-npu1
-------------------------------------------------------------------------------
Test 1 [0000:6a:00.1] : verify
Details : Kernel name is 'DPU_PDI_0'
Total duration: '1.1's
Average throughput: '9490.3' ops/s
Average latency: '105.4' us
Test Status : [PASSED]
-------------------------------------------------------------------------------
[ <-> <->Running Test> <->]: Running Test... < 59s >
terminate called after throwing an instance of 'boost::wrapexcept<boost::io::too_many_args>'
what(): boost::too_many_args: format-string referred to fewer arguments than were passed
Oops:
[Fri Apr 19 07:03:53 2024] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP NOPTI
[Fri Apr 19 07:03:53 2024] CPU: 6 PID: 21072 Comm: xbutil2 Tainted: G W O 6.8.7-060807-generic #202404170934
[Fri Apr 19 07:03:53 2024] Hardware name: TUXEDO TUXEDO Sirius 16 Gen1/APX958, BIOS V1.00A00_20240108 01/08/2024
[Fri Apr 19 07:03:53 2024] RIP: 0010:amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024] Code: c8 00 00 00 48 8b 98 98 00 00 00 4c 8b 63 68 66 90 49 81 c4 28 06 00 00 4c 89 e7 e8 d1 fe 91 eb 48 8b 13 48 8b 43 08 48 89 df <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[Fri Apr 19 07:03:53 2024] RSP: 0018:ffff9fd28b9cfc40 EFLAGS: 00010246
[Fri Apr 19 07:03:53 2024] RAX: dead000000000122 RBX: ffff8ea8a814cfc0 RCX: ffff8ea7763e0800
[Fri Apr 19 07:03:53 2024] RDX: dead000000000100 RSI: ffff8ea7503f1b80 RDI: ffff8ea8a814cfc0
[Fri Apr 19 07:03:53 2024] RBP: ffff9fd28b9cfc50 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ea7cc212628
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8eb3b04cb340
[Fri Apr 19 07:03:53 2024] FS: 0000000000000000(0000) GS:ffff8ebc1e700000(0000) knlGS:0000000000000000
[Fri Apr 19 07:03:53 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr 19 07:03:53 2024] CR2: 00007dfa3c600000 CR3: 0000000f2623c000 CR4: 0000000000f50ef0
[Fri Apr 19 07:03:53 2024] PKRU: 55555554
[Fri Apr 19 07:03:53 2024] Call Trace:
[Fri Apr 19 07:03:53 2024] <TASK>
[Fri Apr 19 07:03:53 2024] ? show_regs+0x6d/0x80
[Fri Apr 19 07:03:53 2024] ? die_addr+0x37/0xa0
[Fri Apr 19 07:03:53 2024] ? exc_general_protection+0x1db/0x480
[Fri Apr 19 07:03:53 2024] ? asm_exc_general_protection+0x27/0x30
[Fri Apr 19 07:03:53 2024] ? amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024] filp_flush+0x35/0x90
[Fri Apr 19 07:03:53 2024] filp_close+0x14/0x30
[Fri Apr 19 07:03:53 2024] put_files_struct+0x85/0xf0
[Fri Apr 19 07:03:53 2024] exit_files+0x47/0x60
[Fri Apr 19 07:03:53 2024] do_exit+0x295/0x530
[Fri Apr 19 07:03:53 2024] do_group_exit+0x35/0x90
[Fri Apr 19 07:03:53 2024] get_signal+0x954/0x990
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? hrtimer_nanosleep+0xbf/0x1a0
[Fri Apr 19 07:03:53 2024] arch_do_signal_or_restart+0x39/0x120
[Fri Apr 19 07:03:53 2024] syscall_exit_to_user_mode+0x209/0x260
[Fri Apr 19 07:03:53 2024] do_syscall_64+0x8c/0x180
[Fri Apr 19 07:03:53 2024] ? syscall_exit_to_user_mode+0x89/0x260
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? do_syscall_64+0x8c/0x180
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? irqentry_exit_to_user_mode+0x7e/0x260
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? irqentry_exit+0x43/0x50
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] entry_SYSCALL_64_after_hwframe+0x78/0x80
[Fri Apr 19 07:03:53 2024] RIP: 0033:0x79d1b94e57f8
[Fri Apr 19 07:03:53 2024] Code: Unable to access opcode bytes at 0x79d1b94e57ce.
[Fri Apr 19 07:03:53 2024] RSP: 002b:00007fff79b86300 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
[Fri Apr 19 07:03:53 2024] RAX: fffffffffffffdfc RBX: 00007fff79b86301 RCX: 000079d1b94e57f8
[Fri Apr 19 07:03:53 2024] RDX: 00007fff79b863a0 RSI: 0000000000000000 RDI: 0000000000000000
[Fri Apr 19 07:03:53 2024] RBP: 00007fff79b863d0 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 00007fff79b863a0 R11: 0000000000000293 R12: 00007fff79b863a0
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 00007fff79b863a0 R15: 00007fff79b86c50
[Fri Apr 19 07:03:53 2024] </TASK>
[Fri Apr 19 07:03:53 2024] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter xfrm_user xfrm_algo rfcomm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink xclmgmt(O) xocl(O) snd_seq_dummy snd_hrtimer bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nvme_fabrics overlay zram binfmt_misc nls_iso8859_1 btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc snd_ctl_led ledtrig_audio snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp intel_rapl_common snd_sof snd_usb_audio uvcvideo edac_mce_amd snd_sof_utils videobuf2_vmalloc uvc snd_usbmidi_lib videobuf2_memops snd_soc_core snd_ump kvm_amd videobuf2_v4l2 ch341 snd_hda_codec_conexant snd_hda_codec_generic usbserial videodev snd_hda_codec_hdmi snd_rawmidi snd_compress kvm videobuf2_common ac97_bus mc snd_pcm_dmaengine irqbypass
[Fri Apr 19 07:03:53 2024] snd_hda_intel iwlmvm rapl snd_pci_ps snd_intel_dspcfg input_leds mac80211 libarc4 snd_intel_sdw_acpi snd_hda_codec serio_raw snd_rpl_pci_acp6x hid_multitouch snd_acp_pci snd_hda_core snd_acp_legacy_common wmi_bmof snd_pci_acp6x snd_hwdep snd_pcm k10temp iwlwifi snd_pci_acp5x snd_rn_pci_acp3x amdxdna(O) snd_acp_config snd_seq snd_soc_acpi snd_pci_acp3x amd_pmf sp5100_tco snd_seq_device snd_timer cfg80211 snd soundcore amdtee soc_button_array mac_hid ccp amd_sfh tee amd_pmc platform_profile sch_fq_codel nfsd auth_rpcgss msr nfs_acl parport_pc lockd grace ppdev parport bfq efi_pstore sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 uas usbhid usb_storage amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper crct10dif_pclmul ttm crc32_pclmul drm_display_helper polyval_clmulni polyval_generic cec nvme hid_generic sha256_ssse3 sha1_ssse3 xhci_pci r8169 nvme_core thunderbolt video
[Fri Apr 19 07:03:53 2024] rc_core xhci_pci_renesas realtek nvme_auth wmi hid aesni_intel crypto_simd cryptd [last unloaded: i2c_hid]
[Fri Apr 19 07:03:53 2024] ---[ end trace 0000000000000000 ]---
[Fri Apr 19 07:03:53 2024] RIP: 0010:amdxdna_flush+0x39/0xa0 [amdxdna]
[Fri Apr 19 07:03:53 2024] Code: c8 00 00 00 48 8b 98 98 00 00 00 4c 8b 63 68 66 90 49 81 c4 28 06 00 00 4c 89 e7 e8 d1 fe 91 eb 48 8b 13 48 8b 43 08 48 89 df <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 03 48 83
[Fri Apr 19 07:03:53 2024] RSP: 0018:ffff9fd28b9cfc40 EFLAGS: 00010246
[Fri Apr 19 07:03:53 2024] RAX: dead000000000122 RBX: ffff8ea8a814cfc0 RCX: ffff8ea7763e0800
[Fri Apr 19 07:03:53 2024] RDX: dead000000000100 RSI: ffff8ea7503f1b80 RDI: ffff8ea8a814cfc0
[Fri Apr 19 07:03:53 2024] RBP: ffff9fd28b9cfc50 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ea7cc212628
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8eb3b04cb340
[Fri Apr 19 07:03:53 2024] FS: 0000000000000000(0000) GS:ffff8ebc1e700000(0000) knlGS:0000000000000000
[Fri Apr 19 07:03:53 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Apr 19 07:03:53 2024] CR2: 00007dfa3c600000 CR3: 0000000ca84a0000 CR4: 0000000000f50ef0
[Fri Apr 19 07:03:53 2024] PKRU: 55555554
[Fri Apr 19 07:03:53 2024] Fixing recursive fault but reboot is needed!
[Fri Apr 19 07:03:53 2024] BUG: scheduling while atomic: xbutil2/21072/0x00000000
[Fri Apr 19 07:03:53 2024] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter xfrm_user xfrm_algo rfcomm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink xclmgmt(O) xocl(O) snd_seq_dummy snd_hrtimer bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nvme_fabrics overlay zram binfmt_misc nls_iso8859_1 btusb btrtl btintel btbcm btmtk bluetooth ecdh_generic ecc snd_ctl_led ledtrig_audio snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp intel_rapl_common snd_sof snd_usb_audio uvcvideo edac_mce_amd snd_sof_utils videobuf2_vmalloc uvc snd_usbmidi_lib videobuf2_memops snd_soc_core snd_ump kvm_amd videobuf2_v4l2 ch341 snd_hda_codec_conexant snd_hda_codec_generic usbserial videodev snd_hda_codec_hdmi snd_rawmidi snd_compress kvm videobuf2_common ac97_bus mc snd_pcm_dmaengine irqbypass
[Fri Apr 19 07:03:53 2024] snd_hda_intel iwlmvm rapl snd_pci_ps snd_intel_dspcfg input_leds mac80211 libarc4 snd_intel_sdw_acpi snd_hda_codec serio_raw snd_rpl_pci_acp6x hid_multitouch snd_acp_pci snd_hda_core snd_acp_legacy_common wmi_bmof snd_pci_acp6x snd_hwdep snd_pcm k10temp iwlwifi snd_pci_acp5x snd_rn_pci_acp3x amdxdna(O) snd_acp_config snd_seq snd_soc_acpi snd_pci_acp3x amd_pmf sp5100_tco snd_seq_device snd_timer cfg80211 snd soundcore amdtee soc_button_array mac_hid ccp amd_sfh tee amd_pmc platform_profile sch_fq_codel nfsd auth_rpcgss msr nfs_acl parport_pc lockd grace ppdev parport bfq efi_pstore sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 uas usbhid usb_storage amdgpu amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper crct10dif_pclmul ttm crc32_pclmul drm_display_helper polyval_clmulni polyval_generic cec nvme hid_generic sha256_ssse3 sha1_ssse3 xhci_pci r8169 nvme_core thunderbolt video
[Fri Apr 19 07:03:53 2024] rc_core xhci_pci_renesas realtek nvme_auth wmi hid aesni_intel crypto_simd cryptd [last unloaded: i2c_hid]
[Fri Apr 19 07:03:53 2024] CPU: 6 PID: 21072 Comm: xbutil2 Tainted: G D W O 6.8.7-060807-generic #202404170934
[Fri Apr 19 07:03:53 2024] Hardware name: TUXEDO TUXEDO Sirius 16 Gen1/APX958, BIOS V1.00A00_20240108 01/08/2024
[Fri Apr 19 07:03:53 2024] Call Trace:
[Fri Apr 19 07:03:53 2024] <TASK>
[Fri Apr 19 07:03:53 2024] dump_stack_lvl+0x76/0xa0
[Fri Apr 19 07:03:53 2024] dump_stack+0x10/0x20
[Fri Apr 19 07:03:53 2024] __schedule_bug+0x64/0x80
[Fri Apr 19 07:03:53 2024] schedule_debug.isra.0+0xdb/0x130
[Fri Apr 19 07:03:53 2024] __schedule+0x69/0x6b0
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? vprintk+0x42/0x80
[Fri Apr 19 07:03:53 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Fri Apr 19 07:03:53 2024] ? _printk+0x60/0x90
[Fri Apr 19 07:03:53 2024] do_task_dead+0x44/0x50
[Fri Apr 19 07:03:53 2024] make_task_dead+0x13e/0x140
[Fri Apr 19 07:03:53 2024] rewind_stack_and_make_dead+0x17/0x20
[Fri Apr 19 07:03:53 2024] RIP: 0033:0x79d1b94e57f8
[Fri Apr 19 07:03:53 2024] Code: Unable to access opcode bytes at 0x79d1b94e57ce.
[Fri Apr 19 07:03:53 2024] RSP: 002b:00007fff79b86300 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
[Fri Apr 19 07:03:53 2024] RAX: fffffffffffffdfc RBX: 00007fff79b86301 RCX: 000079d1b94e57f8
[Fri Apr 19 07:03:53 2024] RDX: 00007fff79b863a0 RSI: 0000000000000000 RDI: 0000000000000000
[Fri Apr 19 07:03:53 2024] RBP: 00007fff79b863d0 R08: 0000000000000000 R09: 0000000000000000
[Fri Apr 19 07:03:53 2024] R10: 00007fff79b863a0 R11: 0000000000000293 R12: 00007fff79b863a0
[Fri Apr 19 07:03:53 2024] R13: 0000000000000000 R14: 00007fff79b863a0 R15: 00007fff79b86c50
[Fri Apr 19 07:03:53 2024] </TASK>
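One possible reading of the oops above, offered as a sketch and not a confirmed diagnosis: on x86_64 the kernel's `list_del()` overwrites a removed entry's `next` and `prev` pointers with the poison values `0xdead000000000100` and `0xdead000000000122`. RDX and RAX in the register dump hold exactly those values, and the faulting address is RDX plus 8, which is where `prev` sits in a `struct list_head` on a 64-bit kernel. That pattern would be consistent with `amdxdna_flush` unlinking a list entry that had already been removed. The helper name below is hypothetical, and the values are copied from the log.

```python
# Poison constants used by list_del() on x86_64
# (CONFIG_ILLEGAL_POINTER_VALUE = 0xdead000000000000).
LIST_POISON1 = 0xdead000000000100  # stored in entry->next after list_del()
LIST_POISON2 = 0xdead000000000122  # stored in entry->prev after list_del()

# Offset of `prev` inside struct list_head on a 64-bit kernel.
PREV_OFFSET = 8

# Values copied from the register dump in the oops above.
rdx = 0xdead000000000100         # loaded from entry->next
rax = 0xdead000000000122         # loaded from entry->prev
fault_addr = 0xdead000000000108  # "non-canonical address" in the GPF line


def looks_like_double_list_del(next_ptr, prev_ptr, fault):
    """Hypothetical check: do the registers match a list_del() on an
    entry that was already unlinked (and therefore poisoned)?"""
    return (
        next_ptr == LIST_POISON1
        and prev_ptr == LIST_POISON2
        and fault == next_ptr + PREV_OFFSET  # the write to next->prev faulted
    )


print(looks_like_double_list_del(rdx, rax, fault_addr))
```

If the check holds, the driver's flush path and any other path that removes the same entry would be the places to compare; the log alone does not show which one ran first.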
With the current driver I can run:
-<%>- ./example_build/example_noop_test /lib/firmware/amdnpu/1502_00/validate.xclbin
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 6833094
Host test average latency: 97 us/iter
TEST PASSED!
But I have this kernel error message:
[Wed Apr 3 13:18:21 2024] [drm] Initialized amdxdna_accel_driver 1.0.0 20240124 for 0000:66:00.1 on minor 0
[Wed Apr 3 13:19:20 2024] ------------[ cut here ]------------
[Wed Apr 3 13:19:20 2024] UBSAN: array-index-out-of-bounds in /var/lib/dkms/xrt-amdxdna/2.17.0/build/driver/amdxdna/npu1_message.c:488:57
[Wed Apr 3 13:19:20 2024] index 1 is out of range for type 'amdxdna_cu_config [*]'
[Wed Apr 3 13:19:20 2024] CPU: 3 PID: 27749 Comm: example_noop_te Tainted: P W OE 6.8.2+iommu-sva-part4-v7+pmc-10ms-delay+ #2
[Wed Apr 3 13:19:20 2024] Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.04.00 01/18/2024
[Wed Apr 3 13:19:20 2024] Call Trace:
[Wed Apr 3 13:19:20 2024] <TASK>
[Wed Apr 3 13:19:20 2024] dump_stack_lvl+0x48/0x70
[Wed Apr 3 13:19:20 2024] dump_stack+0x10/0x20
[Wed Apr 3 13:19:20 2024] __ubsan_handle_out_of_bounds+0xc6/0x110
[Wed Apr 3 13:19:20 2024] npu1_config_cu+0x36a/0x3e0 [amdxdna]
[Wed Apr 3 13:19:20 2024] ? __pfx_npu_msg_cb+0x10/0x10 [amdxdna]
[Wed Apr 3 13:19:20 2024] npu1_hwctx_config+0xa4/0x200 [amdxdna]
[Wed Apr 3 13:19:20 2024] amdxdna_drm_config_hwctx_ioctl+0xa4/0x140 [amdxdna]
[Wed Apr 3 13:19:20 2024] ? __pfx_amdxdna_drm_config_hwctx_ioctl+0x10/0x10 [amdxdna]
[Wed Apr 3 13:19:20 2024] drm_ioctl_kernel+0xb9/0x120
[Wed Apr 3 13:19:20 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Wed Apr 3 13:19:20 2024] drm_ioctl+0x2d4/0x550
[Wed Apr 3 13:19:20 2024] ? __pfx_amdxdna_drm_config_hwctx_ioctl+0x10/0x10 [amdxdna]
[Wed Apr 3 13:19:20 2024] __x64_sys_ioctl+0xa0/0xf0
[Wed Apr 3 13:19:20 2024] do_syscall_64+0x74/0x140
[Wed Apr 3 13:19:20 2024] ? srso_alias_return_thunk+0x5/0xfbef5
[Wed Apr 3 13:19:20 2024] ? sysvec_apic_timer_interrupt+0x4b/0xd0
[Wed Apr 3 13:19:20 2024] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[Wed Apr 3 13:19:20 2024] RIP: 0033:0x751c6a12396f
[Wed Apr 3 13:19:20 2024] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Wed Apr 3 13:19:20 2024] RSP: 002b:00007ffc7c102470 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Wed Apr 3 13:19:20 2024] RAX: ffffffffffffffda RBX: 00000000c0186442 RCX: 0000751c6a12396f
[Wed Apr 3 13:19:20 2024] RDX: 00007ffc7c1026b0 RSI: 00000000c0186442 RDI: 0000000000000003
[Wed Apr 3 13:19:20 2024] RBP: 00007ffc7c1026b0 R08: 0000751c641e0000 R09: 0000000000000003
[Wed Apr 3 13:19:20 2024] R10: 0000598033302d40 R11: 0000000000000246 R12: 00005980332d69e0
[Wed Apr 3 13:19:20 2024] R13: 000059803331ba28 R14: 0000598033302c50 R15: 0000751c6aa72660
[Wed Apr 3 13:19:20 2024] </TASK>
[Wed Apr 3 13:19:20 2024] ---[ end trace ]---
How much of the mainboard's memory can be addressed by the AMD Ryzen™ AI 9 365's GPU and NPU?
Can the NPU be used to train models?