xilinx / mlir-air
License: MIT License
From branch debugging_matrix_scalar_add, I am working on getting my example multi_core_dma working (this file). The single core version works in this branch, but when I increase the herd size from 1x1 to 2x2, the example no longer works.
To replicate, run:
cd programming_examples/matrix_scalar_add/multi_core_dma
make
When I inspect programming_examples/matrix_scalar_add/multi_core_dma/build/air_project/npu.air.mlir, I notice that all of the data appears to go to only one core instead of being distributed across all of the cores.
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 4 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 5 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
From branch debugging_matrix_scalar_add, I am working on getting my example multi_core_channel working (this file). The single core version works in this branch, but when I increase the herd size from 1x1 to 2x2, the example no longer works.
To replicate, run:
cd programming_examples/matrix_scalar_add/multi_core_channel
make
When I inspect programming_examples/matrix_scalar_add/multi_core_channel/build/air_project/npu.air.mlir, I notice that all of the data appears to go to only one core instead of being distributed across all of the cores.
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
Write some tests for the runtime python bindings.
https://github.com/Xilinx/mlir-air/blob/main/runtime_lib/python/LibAirHostModule.cpp
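As a starting point, the tests could follow a skip-if-unavailable pattern so they run in any build configuration. This is only a sketch; the module path `air._airRt` and its contents are assumptions to be replaced with whatever LibAirHostModule.cpp actually exports.

```python
# Sketch of a test layout for the runtime python bindings.
# "air._airRt" and the attributes probed below are ASSUMPTIONS; adjust
# to the names LibAirHostModule.cpp actually exposes.
import importlib
import importlib.util
import unittest


def bindings_available(name="air._airRt"):
    """Return True if the compiled binding module can be imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # The parent package itself is not installed.
        return False


class TestAirHostBindings(unittest.TestCase):
    def setUp(self):
        if not bindings_available():
            self.skipTest("air runtime bindings not built")
        self.rt = importlib.import_module("air._airRt")

    def test_module_exposes_symbols(self):
        # Smoke test: the extension module should export something.
        self.assertTrue(len(dir(self.rt)) > 0)
```

Run with `python -m unittest` (or pytest); on a compiler-only build the tests skip instead of erroring out.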
Hi, In the Pynq .spec file there is a Pynq package called aienginesv2 being built. Is that package public somewhere?
Thank you!
See contents of: https://github.com/Xilinx/mlir-air/blob/main/pynq/vck190_air/vck190_air.spec
AIR reports a missing header during the runtime library build for target aarch64. The test_library.h header is available under /runtime_lib/aarch64/test_lib/include, but for some reason the compiler only looks in /runtime_lib/x86_64/test_lib/include. I was able to work around this by setting CPLUS_INCLUDE_PATH, but there is probably a hard-coded path pointing to x86.
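For reference, the CPLUS_INCLUDE_PATH workaround mentioned above looks something like this (the checkout location is an example, not the reporter's actual path):

```shell
# Hypothetical checkout location; point the compiler at the aarch64 copy of
# test_library.h until the hard-coded x86_64 include path is fixed.
AIR_ROOT="$HOME/mlir-air"
export CPLUS_INCLUDE_PATH="$AIR_ROOT/runtime_lib/aarch64/test_lib/include${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"
echo "$CPLUS_INCLUDE_PATH"
```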
c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
d21ca563e0c0fd100a4bbd98d194e770ce33bd79
Compile AIR with runtime lib target aarch64:
cmake .. \
-GNinja \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_INSTALL_PREFIX="../${INSTALL_DIR}" \
-DArch=arm64 \
-DgccVer=10.2.0 \
-DCMAKE_USE_TOOLCHAIN=FALSE \
-DCMAKE_USE_TOOLCHAIN_AIRHOST=TRUE \
-DLLVM_DIR=${LLVM_DIR}/build/lib/cmake/llvm \
-DMLIR_DIR=${LLVM_DIR}/build/lib/cmake/mlir \
-DAIE_DIR=${MLIR_AIE_DIR}/build/lib/cmake/aie \
-Dpybind11_DIR=${PYTHON_ROOT}/pybind11/share/cmake/pybind11 \
-DAIR_RUNTIME_TARGETS:STRING="aarch64" \
-Daarch64_TOOLCHAIN_FILE=/home/niansong/mlir-air/cmake/modules/toolchain_aarch64.cmake \
-DBUILD_SHARED_LIBS=OFF \
-DLLVM_USE_LINKER=lld \
-DXILINX_XAIE_INCLUDE_DIR=/home/niansong/mlir-air/install/runtime_lib/aarch64/xaiengine/include \
-DXILINX_XAIE_LIBS=/home/niansong/mlir-air/install/runtime_lib/aarch64/xaiengine/lib \
-DCMAKE_MODULE_PATH=${CMAKEMODULES_DIR}/ \
|& tee cmake.log
During the build there is an error message:
1 error generated.
[21/30] Building CXX object airhost/CMakeFiles/airhost.dir/queue.cpp.o
FAILED: airhost/CMakeFiles/airhost.dir/queue.cpp.o
/usr/bin/clang++-10 --sysroot=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot -DLIBXAIENGINEV2 -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/include -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/utils/mlir-aie/include -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/utils/mlir-aie/build/include/../runtime_lib/x86_64/test_lib/include -I/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot/opt/xaienginev2/include --sysroot=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot --target=aarch64-linux-gnu --gcc-toolchain=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot/usr -fuse-ld=lld-10 -Wno-unused-command-line-argument -std=gnu++14 -fPIC -MD -MT airhost/CMakeFiles/airhost.dir/queue.cpp.o -MF airhost/CMakeFiles/airhost.dir/queue.cpp.o.d -o airhost/CMakeFiles/airhost.dir/queue.cpp.o -c /proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/queue.cpp
In file included from /proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/queue.cpp:20:
/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/include/air_host_impl.h:11:10: fatal error: 'test_library.h' file not found
#include "test_library.h"
^~~~~~~~~~~~~~~~
1 error generated.
aie.herds are intentionally isolated from above to allow parallel compilation.
Memory allocations to L1 memory should only be made within a herd; those allocations must not be yielded outside of a herd (that would break the execution model).
Memory allocations to L2 memory should only be made within a segment; those allocations must not be yielded outside of a segment (that would break the execution model).
Can we check that the verifiers enforce this behaviour, and that the documentation communicates it?
Some RyzenAI self-hosted runners fail on tests using large buffers with an allocation error:
[XRT] ERROR: Failed to allocate host memory buffer (mmap(len=16777216, prot=3, flags=8193, offset=4294967296) failed (err=11): Resource temporarily unavailable), make sure host bank is enabled (see xbutil configure --host-mem)
terminate called after throwing an instance of 'xrt_core::system_error'
what(): mmap(len=16777216, prot=3, flags=8193, offset=4294967296) failed (err=11): Resource temporarily unavailable
but the same error does not occur when running the tests manually, even on the same machine with the same binaries.
The problem is that the "max locked memory" limit is too low when running the tests under the GitHub runner. As an example, on one GitHub runner machine the limit seen by a user is:
$ ulimit -l
3694888
But the value reported in a workflow is 8192. Buffer allocation in the driver may fail as a result, with mmap returning errno 11, "Resource temporarily unavailable". The fix is to increase the limit in the workflow script. Because a normal user is not able to increase the limit, one workaround is to add the following to the workflow script:
sudo prlimit -lunlimited --pid $$
with a corresponding line in the sudoers file to allow the command, e.g.:
%github ALL=(ALL) NOPASSWD: /usr/bin/prlimit *
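A quick way to surface this condition before the XRT allocation fails is to check RLIMIT_MEMLOCK up front (a diagnostic sketch, not part of mlir-air):

```python
# Check the "max locked memory" limit (RLIMIT_MEMLOCK) from Python before
# launching tests, so the mmap failure above is caught early with a clear
# message instead of an XRT error.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)


def enough_locked_memory(needed_bytes, soft_limit=soft):
    """True if the soft limit covers the buffer (RLIM_INFINITY == unlimited)."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= needed_bytes


# The failing allocation in the log above mmap'd 16 MiB:
if not enough_locked_memory(16 * 1024 * 1024):
    print(f"RLIMIT_MEMLOCK soft limit is only {soft} bytes; "
          "raise it (e.g. via prlimit) before running the tests")
```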
Currently the utility scripts are the only documentation.
https://github.com/Xilinx/mlir-air/blob/main/docs/building.md
When I run make petalinux_build, I encounter this problem:
make[1]: Entering directory '/home/pynq/projects/mlir-air/platforms/xilinx_vck190_air/petalinux/build/tmp/work/versal_generic-xilinx-linux/linux-xlnx/5.10+gitAUTOINC+568989d441-r0/linux-versal_generic-standard-build'
GEN Makefile
HOSTCC scripts/basic/fixdep
/opt/petalinux/2021.2/components/yocto/buildtools_extended/sysroots/x86_64-petalinux-linux/usr/bin/../lib/gcc/x86_64-petalinux-linux/10.2.0/../../../../x86_64-petalinux-linux/bin/ld: cannot find -lgcc_s
I'm not sure how to solve it; which library should I use, aarch64 or x86? Thank you!
In the AIRDependency pass, a call somewhere of the form op->erase() is causing problems.
See #371 for more info (and an attempted fix).
Hello everyone,
When building mlir-air-pcie I get the following error when compiling mlir-air/runtime_lib/airhost/memory.cpp:
dispatch_packet_t': expected expression
uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:270:60: error: use of undeclared identifier 'completion_signal'
uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:327:41: error: unexpected type name 'dispatch_packet_t': expected expression
uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:327:60: error: use of undeclared identifier 'completion_signal'
uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
^
4 errors generated.
Am I missing any header?
Also, not a big problem, but I need to pass a fourth parameter to build-mlir-air-pcie.sh (the location of the libXAIE just built and installed) instead of three (as shown here), or cmake won't be able to find it.
I'm trying to make an example where I allocate memory in the segment (in L2 memory) and then use it as a target for the dma_memcpy_nd in the herd within the segment. I have written an example for that in a branch called alloc-check-example.
In the current form of the code, where I try to send the allocated memory to the herd through an operand/argument, I get the following error:
File "mlir-air/install-xrt/python/air/dialects/_air_ops_ext.py", line 105, in <listcomp>
operand_types = [s.type for s in sizes] * 2 + [o.type for o in operands]
AttributeError: 'AllocOp' object has no attribute 'type'
make: *** [Makefile:9: run] Error 1
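A guess at the first failure (unverified against the actual bindings): the list comprehension expects SSA Values, which carry a .type, while what is being passed is the AllocOp operation itself; in the MLIR Python bindings the Value an op produces is typically reached through .result. A toy model of that distinction, not the real classes:

```python
# Toy stand-ins for the Value/Operation split in the MLIR Python bindings;
# these are NOT the real air/mlir classes.
class Value:
    def __init__(self, type_):
        self.type = type_


class AllocOp:
    """The operation itself has no .type; the Value it produces does."""

    def __init__(self, type_):
        self.result = Value(type_)


alloc = AllocOp("memref<16x8xi32, 1 : i32>")
assert not hasattr(alloc, "type")  # reproduces the AttributeError shape
assert alloc.result.type == "memref<16x8xi32, 1 : i32>"
```

If that guess is right, unwrapping the op (e.g. passing alloc.result) before handing it to the herd builder may avoid the AttributeError.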
If I don't try to send it from the segment to the herd as a herd operand, I get this error instead:
air._mlir_libs._site_initialize.<locals>.MLIRError: Unable to parse module assembly:
error: "-":21:11: 'air.dma_memcpy_nd' op using value defined outside the region
note: "-":21:11: see current operation: "air.dma_memcpy_nd"(<<UNKNOWN SSA VALUE>>, %arg4, %2, %3, %4, %5, %6, %7) <{operandSegmentSizes = array<i32: 0, 1, 0, 0, 0, 1, 2, 2, 2>}> : (memref<16x8xi32, 1 : i32>, memref<32x16xi32>, index, index, index, index, index, index) -> ()
note: "-":11:9: required by region isolation constraints
make: *** [Makefile:9: run] Error 1
I believe one of these two things should work, but I'm not sure which one (or both).
I was working on #642 (which is not ready to merge) and I started to see some behavior I didn't understand. So I tried to make a minimal example and I likewise have some unexpected behavior.
The example I came up with uses somewhat ridiculous things like tiny 1x1 data tiles to make it really obvious to me what was going on in the output. I found that my example fails for a 32x16 image and even an 8x8 image, but it is successful for a 4x4 image.
My example is here. It's not supposed to pass the python test harness, I've just been looking at the output to see what it's doing.
As usual, you can run with:
make clean && make
When it fails, it seems like no output is received (the output stays at the original fill value of 0xFFFFFFFF in the test harness). When it succeeds, each value in the output image increases by 2, e.g. for the 4x4 image:
0000 0002 0004 0006
0008 000a 000c 000e
0010 0012 0014 0016
0018 001a 001c 001e
Because it only works for small images, I'm guessing I'm running into a limit on the number of channel operations/copies allowed at some point, but I have not yet confirmed this theory.
In submitting this issue I'm hoping to discover: is there a limit on the number of ChannelGet/ChannelPut ops? If so, what is the limit (and is there a way to catch it before a programmer runs into trouble)?
This issue happens when calling aircc.py to lower AIR IR generated from linalg.
c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
d21ca563e0c0fd100a4bbd98d194e770ce33bd79
Input: air.mlir
module {
func.func @forward(%arg0: memref<128x128xi32>, %arg1: memref<128x128xi32>, %arg2: memref<128x128xi32>) {
linalg.matmul ins(%arg0, %arg1 : memref<128x128xi32>, memref<128x128xi32>) outs(%arg2 : memref<128x128xi32>)
return
}
}
Compilation command:
air-opt air.mlir \
-o air.opt.mlir \
-buffer-results-to-out-params \
-air-linalg-codegen='l2-tile-size=64,64,64 l2-promote=true l1-tile-size=32,32,32 l1-promote=true' \
-air-par-to-herd \
-air-copy-to-dma \
-canonicalize -cse

aircc.py \
-row-offset=3 \
-col-offset=5 \
./air.opt.mlir \
-o air.mlir.a \
--host-target=aarch64-linux-gnu \
--sysroot=${SYSROOT}
Error message:
loc("-":28:11): error: block with no terminator, has
"scf.for"(%1, %2, %3) ({
^bb0(%arg7: index):
"scf.for"(%1, %3, %0) ({
^bb0(%arg8: index):
"scf.for"(%1, %3, %0) ({
^bb0(%arg9: index):
"scf.for"(%1, %3, %0) ({
^bb0(%arg10: index):
%4 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg8, %arg10) : (memref<32x32xi32, 2>, index, index) -> i32
%5 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg10, %arg9) : (memref<32x32xi32, 2>, index, index) -> i32
%6 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg8, %arg9) : (memref<32x32xi32, 2>, index, index) -> i32
%7 = "arith.muli"(%4, %5) : (i32, i32) -> i32
%8 = "arith.addi"(%6, %7) : (i32, i32) -> i32
"memref.store"(%8, <<UNKNOWN SSA VALUE>>, %arg8, %arg9) : (i32, memref<32x32xi32, 2>, index, index) -> ()
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
"memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
"memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
"memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
Traceback (most recent call last):
File "/home/niansong/mlir-air/install/bin/aircc.py", line 13, in <module>
main()
File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 316, in main
run(module)
File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 138, in run
run_passes('builtin.module('+pass_pipeline+')', air_to_aie_module, opts,
File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 77, in run_passes
PassManager.parse(pass_pipeline).run(mlir_module)
RuntimeError: Failure while executing pass pipeline.
There isn't any documentation for AIE2, XRT, or RyzenAI.
see https://github.com/Xilinx/mlir-aie/blob/main/docs/buildHostLin.md
The aircpu library is not built in "compiler only" mode, but it is needed for CPU execution of AIR programs without any device or runtime.
Hi, I am not sure if this is me or a real build issue. Can you give some advice on the error below?
This occurs toward the end of 'make pynq'; all stages up to this point seem to work correctly. Here is where it stops:
Platform created:
./platform_repo/xilinx_vck190_air/export/xilinx_vck190_air/xilinx_vck190_air.xpfm
make[1]: *** No rule to make target '../petalinux/images/linux/sdk.sh', needed by 'prep_sysroot'. Stop.
make[1]: Leaving directory '/enc/sandbox/mlir-air/platforms/xilinx_vck190_air/aie_platform'
make: *** [Makefile:42: platform] Error 2
I am using Ubuntu 20 LTS & v2021.2 tools as requested.
I am going through the Makefile to see if I can figure it out. Will update if I see why.
This is part of my effort to write examples using channels in a variety of ways (#648).
I've been having some issues with getting worker-to-worker (core-to-core within a herd) data movement with channels to work. It's quite possible I just have a bug in my own code that I have not yet found, or that my own assumptions aren't accurate. If someone could take a look to see if it's reasonable, and maybe look into a fix if it's not my bug, that would be great!
My example code is here. You can recreate the issue on the worker2worker branch by:
cd programming_examples/channel_examples/worker_to_worker
make
Before the patch #367, I could build mlir-air without any rocm/hsa. Now, my build fails with
/home/jamesn/labs/mlir-air/python/../runtime_lib/airhost/include/air_queue.h:12:10: fatal error: 'hsa/hsa.h' file not found
#include "hsa/hsa.h"
I don't want to build the runtime; I am only using the compiler (mlir subdirectory). Can we have an option to build just the compiler?
Current behaviour allows general parameters to be passed into the herd (herds are isolated from above), but we don't have a path to lowering these at the moment.
AIE RTPs are one way of lowering these.
The proposed AIE logical dialect would also support core.tasks with general parameters (which could be lowered from herd parameters).
We should add a verifier that indicates that the lowering is not currently complete (rejecting code with extra arguments).
We should add a lowering path to enable scalar parameters to be passed into the herd.
aircc.py assumes that the clang, opt, et al. in the user's PATH are the correct ones to use.
There is also a hard coded assumption for the path of mlir-aie: https://github.com/Xilinx/mlir-air/blob/main/python/air/compiler/aircc/main.py#L356
Hi,
I'm trying to build the AIR platform for VCK5000 with Vivado 2022.1. I'm hitting the following error when running make all under mlir-air/platforms/xilinx_vck5000_air:
ERROR: [IP_Flow 19-5481] Logical NOC instance '/axi_noc_0' has a total of '32G' DDR memory assigned, this exceeds the maximum '16G' DDR capacity for this IP. Please review your NOC DDR memory configuration.
I'm not sure what I'm missing. I'm using the VCK5000 v1.0 board files. Please find the attached log file for reference.
Thanks!
I'm working on an example where I have a 2D matrix of input data, which I break into four data tiles, and then I attempt to process one data tile per compute tile in a variety of ways using AIR constructs. I am sanity-checking my programs by making each compute core add a unique tile_num to each value in the data tile it modifies, so I can reassure myself that the compute tile I think is doing the work is actually the one doing it.
Anyway, I am trying to compose an example of this scenario that uses four launches, where the herd size is 1x1. My first attempt is here, where I have a while loop within the herd because I hear the kernel will be persistent across launches.
Even with that persistence, I'd like to somehow parameterize the herd with the launch indices so I can calculate a unique tile_num per launch. Is this something that is possible to do? If not, how do I reassure myself that one data tile is being processed per launch?
It seems an empty vector in the ::AIRDependency::createPartialMemref function caused this issue.
c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
d21ca563e0c0fd100a4bbd98d194e770ce33bd79
Input: mmult.triton.air.mlir
#map = affine_map<(d0, d1) -> (d0, d1)>
module {
func.func @matmul_kernel(%arg0: memref<*xi32>, %arg1: memref<*xi32>, %arg2: memref<*xi32>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32, %arg9: i32, %arg10: i32, %arg11: i32, %arg12: i32, %arg13: i32, %arg14: i32) {
%c0_i32 = arith.constant 0 : i32
%c128_i32 = arith.constant 128 : i32
%c128 = arith.constant 128 : index
%alloc = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
linalg.fill ins(%c0_i32 : i32) outs(%alloc : memref<128x128xi32>)
%c1_i32 = arith.constant 1 : i32
%c0_i32_0 = arith.constant 0 : i32
%c-1_i32 = arith.constant -1 : i32
%0 = arith.cmpi sgt, %c128_i32, %c0_i32_0 : i32
%1 = arith.select %0, %c-1_i32, %c1_i32 : i32
%2 = arith.addi %1, %arg4 : i32
%3 = arith.divsi %2, %c128_i32 : i32
%4 = arith.addi %c1_i32, %3 : i32
%5 = arith.subi %c0_i32_0, %arg4 : i32
%6 = arith.divsi %5, %c128_i32 : i32
%7 = arith.subi %c0_i32_0, %6 : i32
%8 = arith.cmpi slt, %arg4, %c0_i32_0 : i32
%9 = arith.cmpi sgt, %arg4, %c0_i32_0 : i32
%10 = arith.cmpi slt, %c128_i32, %c0_i32_0 : i32
%11 = arith.cmpi sgt, %c128_i32, %c0_i32_0 : i32
%12 = arith.andi %8, %10 : i1
%13 = arith.andi %9, %11 : i1
%14 = arith.ori %12, %13 : i1
%15 = arith.select %14, %4, %7 : i32
%c1_i32_1 = arith.constant 1 : i32
%c0_i32_2 = arith.constant 0 : i32
%c-1_i32_3 = arith.constant -1 : i32
%16 = arith.cmpi slt, %15, %c0_i32_2 : i32
%17 = arith.select %16, %c1_i32_1, %c-1_i32_3 : i32
%18 = arith.subi %17, %arg12 : i32
%19 = arith.divsi %18, %15 : i32
%20 = arith.subi %c-1_i32_3, %19 : i32
%21 = arith.divsi %arg12, %15 : i32
%22 = arith.cmpi slt, %arg12, %c0_i32_2 : i32
%23 = arith.cmpi sgt, %arg12, %c0_i32_2 : i32
%24 = arith.cmpi slt, %15, %c0_i32_2 : i32
%25 = arith.cmpi sgt, %15, %c0_i32_2 : i32
%26 = arith.andi %22, %25 : i1
%27 = arith.andi %23, %24 : i1
%28 = arith.ori %26, %27 : i1
%29 = arith.select %28, %20, %21 : i32
%30 = arith.remsi %arg12, %15 : i32
%31 = arith.muli %29, %c128_i32 : i32
%32 = arith.muli %30, %c128_i32 : i32
%33 = arith.index_cast %31 : i32 to index
%34 = arith.index_cast %arg6 : i32 to index
%35 = arith.muli %33, %34 : index
%36 = arith.index_cast %arg7 : i32 to index
%37 = arith.index_cast %arg8 : i32 to index
%38 = arith.index_cast %32 : i32 to index
%39 = arith.index_cast %arg9 : i32 to index
%40 = arith.muli %38, %39 : index
%reinterpret_cast = memref.reinterpret_cast %arg0 to offset: [%35], sizes: [128, 128], strides: [%34, %36] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
%reinterpret_cast_4 = memref.reinterpret_cast %arg1 to offset: [%40], sizes: [128, 128], strides: [%37, %39] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
%alloc_5 = memref.alloc() : memref<128x128xi32>
%41 = arith.index_cast %arg5 : i32 to index
%42 = arith.minsi %41, %c128 : index
%subview = memref.subview %reinterpret_cast[0, 0] [128, %42] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<128x?xi32, strided<[?, ?], offset: ?>>
%subview_6 = memref.subview %alloc_5[0, 0] [128, %42] [1, 1] : memref<128x128xi32> to memref<128x?xi32, strided<[128, 1]>>
%43 = arith.cmpi slt, %42, %c128 : index
scf.if %43 {
linalg.fill ins(%c0_i32 : i32) outs(%alloc_5 : memref<128x128xi32>)
}
linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview : memref<128x?xi32, strided<[?, ?], offset: ?>>) outs(%subview_6 : memref<128x?xi32, strided<[128, 1]>>)
%alloc_7 = memref.alloc() : memref<128x128xi32>
%44 = arith.index_cast %arg5 : i32 to index
%45 = arith.minsi %44, %c128 : index
%subview_8 = memref.subview %reinterpret_cast_4[0, 0] [%45, 128] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<?x128xi32, strided<[?, ?], offset: ?>>
%subview_9 = memref.subview %alloc_7[0, 0] [%45, 128] [1, 1] : memref<128x128xi32> to memref<?x128xi32, strided<[128, 1]>>
%46 = arith.cmpi slt, %45, %c128 : index
scf.if %46 {
linalg.fill ins(%c0_i32 : i32) outs(%alloc_7 : memref<128x128xi32>)
}
linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview_8 : memref<?x128xi32, strided<[?, ?], offset: ?>>) outs(%subview_9 : memref<?x128xi32, strided<[128, 1]>>)
%alloc_10 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
%alloc_11 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
memref.copy %alloc_10, %alloc_11 : memref<128x128xi32> to memref<128x128xi32>
memref.dealloc %alloc_10 : memref<128x128xi32>
linalg.matmul ins(%alloc_5, %alloc_7 : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_11 : memref<128x128xi32>)
memref.dealloc %alloc_7 : memref<128x128xi32>
memref.dealloc %alloc_5 : memref<128x128xi32>
%alloc_12 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%alloc_11, %alloc : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_12 : memref<128x128xi32>) {
^bb0(%in: i32, %in_17: i32, %out: i32):
%68 = arith.addi %in, %in_17 : i32
linalg.yield %68 : i32
}
memref.dealloc %alloc_11 : memref<128x128xi32>
%alloc_13 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%alloc, %alloc_12 : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_13 : memref<128x128xi32>) {
^bb0(%in: i32, %in_17: i32, %out: i32):
%68 = arith.addi %in, %in_17 : i32
linalg.yield %68 : i32
}
memref.dealloc %alloc_12 : memref<128x128xi32>
memref.dealloc %alloc : memref<128x128xi32>
%47 = arith.muli %29, %c128_i32 : i32
%48 = arith.muli %30, %c128_i32 : i32
%49 = arith.index_cast %arg10 : i32 to index
%50 = arith.index_cast %47 : i32 to index
%51 = arith.muli %50, %49 : index
%52 = arith.index_cast %arg11 : i32 to index
%53 = arith.index_cast %48 : i32 to index
%54 = arith.muli %53, %52 : index
%55 = arith.addi %51, %54 : index
%reinterpret_cast_14 = memref.reinterpret_cast %arg2 to offset: [%55], sizes: [128, 128], strides: [%49, %52] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
%56 = arith.index_cast %47 : i32 to index
%57 = arith.addi %56, %c128 : index
%58 = arith.index_cast %arg3 : i32 to index
%59 = arith.minsi %57, %58 : index
%60 = arith.subi %59, %56 : index
%61 = arith.index_cast %48 : i32 to index
%62 = arith.addi %61, %c128 : index
%63 = arith.index_cast %arg4 : i32 to index
%64 = arith.minsi %62, %63 : index
%65 = arith.subi %64, %61 : index
%66 = arith.minsi %60, %c128 : index
%67 = arith.minsi %65, %c128 : index
%subview_15 = memref.subview %alloc_13[0, 0] [%66, %67] [1, 1] : memref<128x128xi32> to memref<?x?xi32, strided<[128, 1]>>
%subview_16 = memref.subview %reinterpret_cast_14[0, 0] [%66, %67] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<?x?xi32, strided<[?, ?], offset: ?>>
%cast = memref.cast %subview_15 : memref<?x?xi32, strided<[128, 1]>> to memref<?x?xi32, strided<[?, ?], offset: ?>>
linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview_15 : memref<?x?xi32, strided<[128, 1]>>) outs(%subview_16 : memref<?x?xi32, strided<[?, ?], offset: ?>>)
memref.dealloc %alloc_13 : memref<128x128xi32>
return
}
func.func @kernel(%arg0: memref<128x128xi32>, %arg1: memref<128x128xi32>, %arg2: memref<128x128xi32>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32, %arg9: i32, %arg10: i32, %arg11: i32, %arg12: i32, %arg13: i32, %arg14: i32) {
%cast = memref.cast %arg0 : memref<128x128xi32> to memref<*xi32>
%cast_0 = memref.cast %arg1 : memref<128x128xi32> to memref<*xi32>
%cast_1 = memref.cast %arg2 : memref<128x128xi32> to memref<*xi32>
call @matmul_kernel(%cast, %cast_0, %cast_1, %arg3, %arg4, %arg5, %arg6, %arg7, %arg8, %arg9, %arg10, %arg11, %arg12, %arg13, %arg14) : (memref<*xi32>, memref<*xi32>, memref<*xi32>, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) -> ()
return
}
}
Compilation command:
air-opt mmult.triton.air.mlir \
-buffer-results-to-out-params \
-air-linalg-codegen \
-air-par-to-herd \
-air-copy-to-dma \
-air-dependency \
-canonicalize -cse \
-o mmult.air.mlir
Error message and stack trace:
air-opt: /home/niansong/mlir-air/llvm/llvm/include/llvm/ADT/SmallVector.h:294: reference llvm::SmallVectorTemplateCommon<mlir::Value>::operator[](size_type) [T = mlir::Value]: Assertion `idx < size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: air-opt mmult.triton.air.mlir -buffer-results-to-out-params -air-linalg-codegen -air-par-to-herd -air-copy-to-dma -air-dependency -canonicalize -cse -o mmult.air.mlir
#0 0x000055cbc9a8c007 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/niansong/mlir-air/install/bin/air-opt+0x2854007)
#1 0x000055cbc9a89e5e llvm::sys::RunSignalHandlers() (/home/niansong/mlir-air/install/bin/air-opt+0x2851e5e)
#2 0x000055cbc9a8c80f SignalHandler(int) Signals.cpp:0:0
#3 0x00007f54362a1420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007f5435d3400b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#5 0x00007f5435d13859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7
#6 0x00007f5435d13729 get_sysdep_segment_value /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:509:8
#7 0x00007f5435d13729 _nl_load_domain /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:970:34
#8 0x00007f5435d24fd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
#9 0x000055cbc8049ac9 llvm::SmallVectorTemplateCommon<mlir::Value, void>::operator[](unsigned long) AIRDependencyScheduleOpt.cpp:0:0
#10 0x000055cbc815728d (anonymous namespace)::AIRDependency::createPartialMemref(mlir::Value, unsigned int, llvm::SmallVector<mlir::Value, 2u>) AIRDependency.cpp:0:0
#11 0x000055cbc815778c void (anonymous namespace)::AIRDependency::traceDeps<xilinx::air::ExecuteOp>(llvm::SmallVector<(anonymous namespace)::AIRDependency::partialMemref, 1u>, xilinx::air::ExecuteOp, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>) AIRDependency.cpp:0:0
#12 0x000055cbc81569e3 (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)::operator()(mlir::Operation*) const AIRDependency.cpp:0:0
#13 0x000055cbc815470d void llvm::function_ref<void (mlir::Operation*)>::callback_fn<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>(long, mlir::Operation*) AIRDependency.cpp:0:0
#14 0x000055cbc84d5dce mlir::detail::walk(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) (/home/niansong/mlir-air/install/bin/air-opt+0x129ddce)
#15 0x000055cbc81546b2 std::enable_if<llvm::is_one_of<mlir::Operation*, mlir::Operation*, mlir::Region*, mlir::Block*>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), mlir::Operation*, void>(mlir::Operation*, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#16 0x000055cbc815465d std::enable_if<llvm::function_traits<std::decay<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>::type>::num_args == 1, void>::type mlir::Operation::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), void>((anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#17 0x000055cbc8148ba0 std::enable_if<llvm::function_traits<std::decay<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>::type>::num_args == 1, void>::type mlir::OpState::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), void>((anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#18 0x000055cbc81472de (anonymous namespace)::AIRDependency::runOnOperation() AIRDependency.cpp:0:0
#19 0x000055cbc8388c9f mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/home/niansong/mlir-air/install/bin/air-opt+0x1150c9f)
#20 0x000055cbc83892c9 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (/home/niansong/mlir-air/install/bin/air-opt+0x11512c9)
#21 0x000055cbc838b446 mlir::PassManager::run(mlir::Operation*) (/home/niansong/mlir-air/install/bin/air-opt+0x1153446)
#22 0x000055cbc8385b86 performActions(llvm::raw_ostream&, bool, bool, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, bool, bool) MlirOptMain.cpp:0:0
#23 0x000055cbc838585d mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, bool, bool, bool, bool, bool, bool, bool)::$_0>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirOptMain.cpp:0:0
#24 0x000055cbc840e4c8 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x11d64c8)
#25 0x000055cbc8383dfe mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, bool, bool, bool, bool, bool, bool, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x114bdfe)
#26 0x000055cbc838429f mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x114c29f)
#27 0x000055cbc7fc1a3a main (/home/niansong/mlir-air/install/bin/air-opt+0xd89a3a)
#28 0x00007f5435d15083 __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:342:3
#29 0x000055cbc7fc173e _start (/home/niansong/mlir-air/install/bin/air-opt+0xd8973e)
./compile.sh: line 9: 643162 Aborted air-opt mmult.triton.air.mlir -buffer-results-to-out-params -air-linalg-codegen -air-par-to-herd -air-copy-to-dma -air-dependency -canonicalize -cse -o mmult.air.mlir
If LibXAIE is not found, then AIR_BUILD_RUNTIME=OFF should be set.
On the MLIR-AIE side, everything seems clear as long as the target is the VCK190. I understand that to target the VCK5000, MLIR-AIR is needed. I installed it as detailed here with the most recent ROCm runtime.
Following the aircc documentation, there does not seem to be a clear path to getting MLIR code up and running on the VCK5000. I tried playing with the tests and a few examples, such as SPARTA, but I am not seeing that going anywhere.
Is there perhaps a guide or a tutorial, like the MLIR-AIE one here, showing how to take MLIR code written in the AIE/AIR dialects and get it running on the VCK5000?
Add `.PHONY: sysroot` to the petalinux directory Makefile; otherwise the `sysroot` target will always be considered up-to-date, because the target name is the same as the SYSROOT directory name. This target must execute before the further steps of the platform build occur.
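The behavior is easy to reproduce in isolation. Below is a minimal, self-contained sketch (directory and recipe names are invented for illustration, not taken from the platform build) showing that a target whose name matches an existing directory is skipped until it is declared `.PHONY`:

```shell
mkdir -p phony-demo/sysroot                    # directory shadowing the target name
printf 'sysroot:\n\techo building sysroot\n' > phony-demo/Makefile
make -C phony-demo sysroot                     # make reports the target is up to date
printf '.PHONY: sysroot\nsysroot:\n\techo building sysroot\n' > phony-demo/Makefile
make -C phony-demo sysroot                     # now the recipe actually runs
```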
Hi there,
I'm trying to build test 21 https://github.com/Xilinx/mlir-air/tree/main/test/airhost/21_air_nd_memcpy_2d, but I'm getting an undefined symbol error when linking:
ld.lld: error: undefined symbol: air::segments::segment_0::mlir_aie_write_buffer_scratch_0_0(aie_libxaie_ctx_t*, int, int)
>>> referenced by test.cpp:106 (/home/nx08/nx08/s2081362-2/mlir-air/test/airhost/21_air_nd_memcpy_2d/test.cpp:106)
Has anyone else encountered this issue?
Need update for Xilinx/mlir-aie#247
I'm trying to update the iree-amd-aie project with the latest version of mlir-aie. That seems to have removed or renamed the `DimTuple` attr, which used to be defined here https://github.com/Xilinx/mlir-aie/blob/c0341aa3d525827a21f25b7423b18f4359a34cf3/include/aie/Dialect/AIE/IR/AIEAttrs.td but is still used in this project in mlir-air/mlir/lib/Conversion/AIRToAIEPass.cpp (line 2288 at 3ae51ef).
With docker image containers.xilinx.com/acdc/build:2.0,
the current util build script, targeting x86, fails with:
CMake Error at utils/llvm/build/lib/cmake/llvm/HandleLLVMOptions.cmake:320 (message):
Host compiler does not support '-fuse-ld=lld'
Call Stack (most recent call first):
CMakeLists.txt:73 (include)
It seems that this issue can be solved by passing the toolchain file into CMake twice:
-DCMAKE_TOOLCHAIN_FILE=`pwd`/../cmake/modules/toolchain_x86_64.cmake \
-Dx86_64_TOOLCHAIN_FILE=`pwd`/../cmake/modules/toolchain_x86_64.cmake \
I've written a couple of multi-segment examples as part of the programming examples generally (and specifically for channels, #648) in this PR: #663.
Right now, all three examples using two segments fail during compilation with a segfault. I have not looked further into the issue, but I hope to do some debugging myself next week.
Edit: I'm putting work on this on hold for a bit, if anyone wants to pick this up.
I am working on writing a worker-to-worker data transfer example for channels (as part of the grouping of examples that exercise various features of channels, #648).
Draft PR is here: #653
I am basing it off the code in the channel_size example (PR waiting to be merged here: #642)
The channel_size example works well for me. As an intermediate step to adding worker-to-worker communication to that example, I tried to have each worker send data to itself over a channel. That is the version of the code that is pushed in the draft PR #653. The particular file of interest is this one.
When I run with this intermediate step, I get the following error:
Using aiecc.py from: /scratch/ehunhoff/mlir-air/mlir-aie/install/bin/..
Running: builtin.module(air-insert-launch-and-segment-around-herd,func.func(air-lower-herd-parallel),air-dma-to-channel,canonicalize,cse,air-specialize-channel-wrap-and-stride,func.func(air-renumber-dma),func.func(convert-linalg-to-loops),air-place-herds{num-rows=6 num-cols=4 row-anchor=2 col-anchor=0})
Running: builtin.module(air-to-aie{emit-while-loop=false row-offset=2 col-offset=0 device=npu1_4col})
python3: /scratch/ehunhoff/mlir-air/mlir/lib/Conversion/AIRToAIESchedulingUtils.cpp:956: void xilinx::air::simpleDMAChannelAllocation(std::vector<MemcpyBundleAsFlow> &, ShimDMAAllocator &, MemTileDMAAllocator &, TileDMAAllocator &): Assertion `core' failed.
Aborted (core dumped)
make: *** [Makefile:7: run] Error 134
My question is:
I get a build error compiling the firmware. See my comments here: fa07290#r87744561
As part of the channel examples, and an interesting discussion on allocation, I wanted to see if I could explicitly allocate L3 memory in a launch and L2 memory in a segment. To this end, I wrote the `programming_examples/channel_examples/hierarchical_alloc` example in this PR: #661
Currently, it fails to fully compile with an error like this:
Traceback (most recent call last):
File "mlir-air/programming_examples/channel_examples/hierarchical_alloc/run.py", line 86, in <module>
test_main(build_module, verbose=args.verbose)
File "mlir-air/programming_examples/channel_examples/hierarchical_alloc/run.py", line 45, in test_main
addone = backend.compile_and_load(mlir_module)
File "mlir-air/install-xrt/python/air/backend/xrt.py", line 222, in compile_and_load
c = self.compile(module)
File "mlir-air/install-xrt/python/air/backend/xrt.py", line 117, in compile
aircc.run(air_module, aircc_options)
File "mlir-air/install-xrt/python/air/compiler/aircc/main.py", line 449, in run
run_passes(air_to_npu_passes, air_to_npu_module, opts, air_to_npu_file)
File "mlir-air/install-xrt/python/air/compiler/aircc/main.py", line 113, in run_passes
PassManager.parse(pass_pipeline).run(mlir_module.operation)
air._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: "-":145:20: failed to legalize operation 'airrt.alloc' marked as erased
note: "-":145:20: see current operation: %16 = "airrt.alloc"() : () -> memref<1x4xi32, 1 : i32>
note: "-":161:16: found live user of result #0: %5 = memref.load %2[%c0, %c0] : memref<1x4xi32, 1 : i32>
make: *** [Makefile:9: run] Error 1
I'm not fully confident this example is a reasonable thing to ask the air compiler to handle, but I think it might be. If it is not, let me know, and I will either change or erase the example!
Recently, when I tried to do an L2 allocation from within a herd, I got a segfault. This makes sense, as I believe only the following allocations are legal:
It would be more user-friendly if any illegal allocations outside of the above were caught in a verifier.
This is low priority, but ideally I would like to run the Matrix Scalar Add examples with the `experimental_passes` option of `aircc.py` set. However, the experimental passes break both of the currently working examples, `single_core_dma` and `single_core_channel`. For `single_core_dma`, the output is wrong. For `single_core_channel`, there is a segfault.
To replicate, set `experimental_passes=True` in this file (on the `minimal-matrix-scalar-add` branch).
For `single_core_dma`:
cd programming_examples/matrix_scalar_add/single_core_dma
make clean
make
For `single_core_channel`:
cd programming_examples/matrix_scalar_add/single_core_channel
make clean
make
I did some investigating into which passes might be causing the problems; if I just comment out the first two of the experimental passes (defined here), both examples still work with the remaining passes:
#"air-dependency",
#"air-dependency-schedule-opt",
I'm working on my draft PR #621. I rebased my branch to master after #620 was merged in. However, after that rebase, the two examples that previously worked (`single_core_dma` and `single_core_channel`) no longer work.
You can replicate the working versions in branch `debugging-matrix-scalar-add` (which diverges from `main` at `bcbfed5c` instead of `HEAD` = `45592176`) with:
cd programming_examples/matrix_scalar_add/single_core_dma
make
and
cd programming_examples/matrix_scalar_add/single_core_channel
make
You can replicate the failing tests in branch `minimal-matrix-scalar-add` with the same commands.
For the `single_core_dma` example, the files `aie.air.mlir` and `placed.air.mlir` are identical between the passing and failing cases. The file `npu.air.mlir` has the following diff:
$ diff broken_single_core_dma_build/air_project/npu.air.mlir working_single_core_dma_build/air_project/npu.air.mlir
63,64c63,64
< aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 512][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
< aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 528][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
---
> aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
> aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
67,68c67,68
< aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 512][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
< aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 528][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
---
> aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
> aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
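One way to read this diff (my own interpretation of the numbers, not something the tooling reports): the working version addresses the source with a (row, column) pair, while the broken version appears to fold the row into a flattened element offset using the op's row stride of 32. A quick sanity check:

```python
# The ops read 8x16 tiles of a 32x16 memref with a row stride of 32
# (the trailing stride in [0, 0, 32]).
ROW_STRIDE = 32

def flatten(row: int, col: int) -> int:
    """Collapse a (row, col) offset into a single element offset."""
    return row * ROW_STRIDE + col

# The working offsets [16, 0] and [16, 16] correspond exactly to the
# broken flattened offsets [0, 512] and [0, 528]:
assert flatten(16, 0) == 512
assert flatten(16, 16) == 528
```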
The diff for the `single_core_channel` example is essentially the same.
Let me know if more information is needed!
Hi, I have this weird issue while building the repo following the instructions on this page:
FAILED: mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o
/net/media/scratch/fournier/llvm-install/llvm-15/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/net/media/scratch/fournier/llvm-for-mlir-aie/llvm/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/build-Debug/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/mlir/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/build-Debug/tools/mlir/include -I/net/media/scratch/fournier/mlir-aie/include -I/net/media/scratch/fournier/mlir-aie/build/include -I/net/media/scratch/fournier/mlir-air/mlir/include -I/net/media/scratch/fournier/mlir-air/build/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -std=gnu++17 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -fno-exceptions -fno-rtti -UNDEBUG -MD -MT mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o -MF mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o.d -o mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o -c /net/media/scratch/fournier/mlir-air/mlir/lib/CAPI/Runner.cpp
In file included from /net/media/scratch/fournier/mlir-air/mlir/lib/CAPI/Runner.cpp:11:
In file included from /net/media/scratch/fournier/mlir-air/mlir/include/air/Util/Runner.h:12:
In file included from /net/media/scratch/fournier/mlir-air/mlir/include/air/Util/Dependency.h:34:
In file included from /usr/include/boost/graph/graphviz.hpp:25:
/usr/include/boost/property_map/dynamic_property_map.hpp:150:28: error: use of typeid requires -frtti
if (in_value.type() == typeid(value_type)) {
^
/usr/include/boost/property_map/dynamic_property_map.hpp:191:56: error: use of typeid requires -frtti
virtual const std::type_info& key() const { return typeid(key_type); }
^
/usr/include/boost/property_map/dynamic_property_map.hpp:192:56: error: use of typeid requires -frtti
virtual const std::type_info& value() const { return typeid(value_type); }
^
/usr/include/boost/property_map/dynamic_property_map.hpp:286:29: error: use of typeid requires -frtti
if (i->second->key() == typeid(key)) {
^
/usr/include/boost/property_map/dynamic_property_map.hpp:308:29: error: use of typeid requires -frtti
if (i->second->key() == typeid(key))
^
/usr/include/boost/property_map/dynamic_property_map.hpp:321:29: error: use of typeid requires -frtti
if (i->second->key() == typeid(key))
^
/usr/include/boost/property_map/dynamic_property_map.hpp:334:29: error: use of typeid requires -frtti
if (i->second->key() == typeid(key))
As you can see, Boost requires RTTI, but the compiler command line contains `-fno-rtti`. Do you have an idea what could be causing this? When I grep for `rtti` in the repo, I only find hits in the cmake files in the sandbox, and they don't appear related... Thanks for any help.
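For context (my own suggestion, not from this thread): the `-fno-rtti` flag is typically inherited from LLVM's CMake configuration, which builds without RTTI by default, while Boost.Graph's `dynamic_property_map.hpp` uses `typeid` and therefore needs it. A common workaround is to configure the LLVM/MLIR build that mlir-air links against with RTTI enabled, using the real upstream option `LLVM_ENABLE_RTTI`; whether this is the right fix for this repo's build scripts is untested:

```shell
# Hedged workaround sketch: rebuild the LLVM used by mlir-air with RTTI on,
# so -frtti (rather than -fno-rtti) propagates to dependent targets.
cmake -S llvm -B build-Debug -DLLVM_ENABLE_RTTI=ON   # ...plus your existing options
```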
I have setup all the tools for the first time and can get pretty far. However, when I try to compile any of the examples it complains about unrecognized architecture 'aie'.
What did I miss in my setup or is this a new bug?
I just cloned this repo earlier today and have built up to commit 8ea962a, using Ubuntu 20.04 LTS and the v2021.2 tools. I have it set to use the sysroot from the remnants of the pynq pre-build.
These are the make results for the beefmaker example project:
clang ../../../install/runtime_lib/test_library.cpp --target=aarch64-linux-gnu --sysroot=../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ -g -I../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux//opt/xaienginev2/include -std=c++17 -I/sandbox-hdd/mlir-air/install/bin//../runtime_lib/airhost/include -I../../../install/runtime_lib -DAIR_LIBXAIE_ENABLE -DLIBXAIENGINEV2 -c -o test_library.o
xchesscc -p me -P /tools/Xilinx/Vitis/2021.2/aietools/data/cervino/lib -c chess/beefmaker_kernel.cc
aircc.py -o beefmaker.air.a --host-target=aarch64-linux-gnu -xbridge --sysroot=../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ air.mlir
Compiling partitions: ['partition_0']
Found Vitis at /enc/tools/Xilinx/Vitis/2021.2
MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- 0:00:00 0/1 1 Workeropt: unrecognized architecture 'aie' provided.
Error encountered while running: opt --opaque-pointers=0 --passes=default<O2> -inline-threshold=10 -S air_project/partition_0/input.ll -o air_project/partition_0/input.opt.ll
Error ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:-- 0:00:00 0/1 1 Worker
Error encountered while running: aiecc.py --sysroot ../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ --host-target aarch64-linux-gnu --tmpdir air_project/partition_0 --aie-generate-xaiev2 --xbridge --no-xchesscc air_project/aiecc.partition_0.mlir
make: *** [Makefile:18: beefmaker.air.a] Error 1
In order to support code generated from loop-peeling, we'd like to look at forming herds from multiple scf.forall statements (rather than targeting just one), so that L1 allocations can stay local within a herd definition.
Channels are a key abstraction of mlir-air, but there are few examples for how to use them. This issue is a place to discuss which examples are needed to show how channels work, and which of those examples are implemented.
The instructions in https://xilinx.github.io/mlir-air/building.html appear to have a typo:
git clone https://github.com/stephenneuendorffer/aie-rt
cd aie-rt
git checkout phoenix_v2023.2
cd driver/src
make -f Makefile.Linux CFLAGS="-D__AIEAMDAIR__"
sudo cp -r ../include /opt/aiengine/
sudo cp libxaiengine.so* /opt/xaiengine/lib/
export LD_LIBRARY_PATH=/opt/xaiengine/lib:${LD_LIBRARY_PATH}
`opt/aiengine` vs `opt/xaiengine`?
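If the inconsistency is what it looks like, the fix is presumably to point the first `cp` at `/opt/xaiengine` like the other two lines (my reading of the surrounding commands; not confirmed against the docs):

```shell
# Hypothetical correction of the include-directory destination; untested.
sudo cp -r ../include /opt/xaiengine/
```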
Hi,
I've encountered the error stated in the title of the issue when I try to compile the test 13_mb_add_one. If I check the symbol table of `libxaiengine.xo`, I see that `AmdAirBackend` is defined but `AmdairBackend` is undefined:
I built the `libxaiengine` library from https://github.com/stephenneuendorffer/aie-rt, branch phoenix_v2023.2, following the instructions in the mlir-air documentation: https://xilinx.github.io/mlir-air/building.html.
Edit: found the solution, there was a problem with the definition of a variable in the source code because of a name mismatch. Will post the fix as a pull request in the relevant repository soon.
Hello everyone!
In cmake/modules/toolchain_x86.cmake there is a hardcoded reference to LLVM 12:
# specify the compiler
set(CLANG_VER 12)
set(CMAKE_C_COMPILER clang-${CLANG_VER})
set(CMAKE_CXX_COMPILER clang++-${CLANG_VER})
set(CMAKE_ASM_COMPILER clang-${CLANG_VER})
set(CMAKE_STRIP llvm-strip-${CLANG_VER})
set(CLANG_LLD lld-${CLANG_VER} CACHE STRING "" FORCE)
However, the LLVM version downloaded might be different (LLVM 17 as I write), and the builder cannot find the compilers. One option is to update the clang version variable; however, clang++-17 doesn't exist (only clang++ and clang-17), so that doesn't work either. I worked around it temporarily by removing any reference to the clang version in the above variables. Do we need to check for a specific version of LLVM? Can we use the llvm-config of the LLVM we built?
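One possible direction is to make the suffix overridable instead of hardcoding it. This is only a sketch: the variable names mirror the existing file, but the cache-variable approach is my assumption, not an existing convention in this repo:

```cmake
# Sketch: make the clang version suffix overridable instead of hardcoding 12.
# Pass -DCLANG_VER=17, or leave it empty to use unsuffixed clang/clang++.
set(CLANG_VER "" CACHE STRING "Version suffix of the clang binaries")
if(CLANG_VER)
  set(_suffix "-${CLANG_VER}")
else()
  set(_suffix "")
endif()
set(CMAKE_C_COMPILER clang${_suffix})
set(CMAKE_CXX_COMPILER clang++${_suffix})
set(CMAKE_ASM_COMPILER clang${_suffix})
set(CMAKE_STRIP llvm-strip${_suffix})
set(CLANG_LLD lld${_suffix} CACHE STRING "" FORCE)
```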
Thanks,
Roberto
It seems that some ODS information needs to be updated because of recent LLVM changes. See also this related issue.
I have tried to compile the nod.ai shark runtime with the iree-amd-aie plugin enabled. The latter uses this project. I got the following error:
In file included from /home/fharwath/wd/shark/SRT/third_party/llvm-project/mlir/include/mlir/IR/Types.h:12:
/home/fharwath/wd/shark/SRT/third_party/llvm-project/mlir/include/mlir/IR/TypeSupport.h:54:28: error: no member named 'name' in 'xilinx::airrt::EventType'
54 | T::name);
| ~~~^
This is caused by LLVM commit 3dbac2c007c1, "[mlir] Expose type and attribute names in the MLIRContext and abstract type/attr classes".
air-to-aie on a transform test, run from mlir-air/mlir/test directory:
$ air-opt --air-to-aie Transform/AIRDependency/matmul_parallel.mlir
Maybe this is not a sensible pass to run on this input, but I'm reporting it just in case you don't think users should be reaching assertion failures like this. Here's the dump:
/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1045: std::vector::reference std::vector<int>::operator[](std::vector::size_type) [_Tp = int, _Alloc = std::allocator<int>]: Assertion '__n < this->size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: ../../build/bin/air-opt --air-to-aie Transform/AIRDependency/matmul_parallel.mlir
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 air-opt 0x00005563621ba8f7
Hi,
I get the error stated in the title when I run tests in this repository (in particular, I've tried https://github.com/Xilinx/mlir-air/tree/main/test/airhost/271_air_get_info and https://github.com/Xilinx/mlir-air/tree/main/test/airhost/13_mb_add_one). The errors happen at the libxaiengine level, for example: `[AIE ERROR] XAie_AmdAirIO_Read32():165: Error opening /sys/class/amdair/amdair/00/address`.
I was trying to use the `compile()` and `load()` methods of `XRTBackend` when I was doing some debugging recently. I realized the `load()` method takes a `module: air.ir.Module` as an argument which is then never used.
This is confusing.
The abstract base class (`AirBackend`) is flexible, because we can specify a unique `CompiledArtifact` for `XRTBackend`. I think this confusion would be fixed if the `CompiledArtifact` for the `XRTBackend` were something like (in pseudocode):
@dataclass
class XRTArtifact:
    xclbin: Path
    insts: Path
e.g., the compiled artifacts are a pair of files (or file paths) pointing to the xclbin and instruction file.
I'm happy to make a PR for this, if others think this is a reasonable change. If there is some history behind the current format that needs to be taken into account, I'm happy to hear it!
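A slightly fuller sketch of that idea (the class name and the two fields come from the suggestion above; the `dataclass`/`Path` choices and the `exists()` helper are my own assumptions, not existing API):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class XRTArtifact:
    """Compiled XRT output: an xclbin plus its instruction file."""
    xclbin: Path
    insts: Path

    def exists(self) -> bool:
        # Both files must be present for the artifact to be loadable.
        return self.xclbin.is_file() and self.insts.is_file()

# Usage: point at (hypothetical) build outputs.
art = XRTArtifact(Path("build/final.xclbin"), Path("build/insts.txt"))
print(art.exists())
```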
The AIRBIN script requires elfutils. When attempting to build it on one of the machines, we get the following error from configure regarding zstd:
./configure: line 7060: syntax error near unexpected token `ZSTD_COMPRESS,libzstd'
./configure: line 7060: ` PKG_CHECK_MODULES(ZSTD_COMPRESS,libzstd >= 1.4.0,'
The workaround was to copy a configuration file from a working machine. Noting this here so we can look into it later and remember the temporary fix.