
mlir-air's Issues

Multi Core DMA Matrix Scalar Add Example Fails

From branch debugging_matrix_scalar_add, I am working on getting my example multi_core_dma working (this file). The single core version works in this branch, but when I increase the herd size from 1x1 to 2x2, the example no longer works.

To be specific, to replicate, run:

cd programming_examples/matrix_scalar_add/multi_core_dma
make

When I inspect programming_examples/matrix_scalar_add/multi_core_dma/build/air_project/npu.air.mlir, it looks like all of the data is going to only one core instead of being distributed across all of the cores.

      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId9} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 4 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 5 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId10} : memref<32x16xi32>
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}

Multi Core Channel Matrix Scalar Add Example Fails

From branch debugging_matrix_scalar_add, I am working on getting my example multi_core_channel working (this file). The single core version works in this branch, but when I increase the herd size from 1x1 to 2x2, the example no longer works.

To be specific, to replicate, run:

cd programming_examples/matrix_scalar_add/multi_core_channel
make

When I inspect programming_examples/matrix_scalar_add/multi_core_channel/build/air_project/npu.air.mlir, it looks like all of the data is going to only one core instead of being distributed across all of the cores.

      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 16][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
      aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 0 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
      aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 1 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
      aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}

Missing `test_library.h` building runtime library for target aarch64

Description

AIR reports a missing header when building the runtime library for target aarch64. The test_library.h header is available under /runtime_lib/aarch64/test_lib/include, but for some reason the compiler only looks in /runtime_lib/x86_64/test_lib/include. I was able to work around this by setting CPLUS_INCLUDE_PATH, but there is probably a hard-coded path pointing to x86_64.

Tool commit points

  • MLIR-AIR @c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
  • MLIR-AIE @d21ca563e0c0fd100a4bbd98d194e770ce33bd79

Repeat this issue

Compile AIR with runtime lib target aarch64:

cmake .. \
    -GNinja \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_INSTALL_PREFIX="../${INSTALL_DIR}" \
    -DArch=arm64 \
    -DgccVer=10.2.0 \
    -DCMAKE_USE_TOOLCHAIN=FALSE \
    -DCMAKE_USE_TOOLCHAIN_AIRHOST=TRUE \
    -DLLVM_DIR=${LLVM_DIR}/build/lib/cmake/llvm \
    -DMLIR_DIR=${LLVM_DIR}/build/lib/cmake/mlir \
    -DAIE_DIR=${MLIR_AIE_DIR}/build/lib/cmake/aie \
    -Dpybind11_DIR=${PYTHON_ROOT}/pybind11/share/cmake/pybind11 \
    -DAIR_RUNTIME_TARGETS:STRING="aarch64" \
    -Daarch64_TOOLCHAIN_FILE=/home/niansong/mlir-air/cmake/modules/toolchain_aarch64.cmake \
    -DBUILD_SHARED_LIBS=OFF \
    -DLLVM_USE_LINKER=lld \
    -DXILINX_XAIE_INCLUDE_DIR=/home/niansong/mlir-air/install/runtime_lib/aarch64/xaiengine/include \
    -DXILINX_XAIE_LIBS=/home/niansong/mlir-air/install/runtime_lib/aarch64/xaiengine/lib \
    -DCMAKE_MODULE_PATH=${CMAKEMODULES_DIR}/ \
    |& tee cmake.log

During the build there's an error message:

1 error generated.
[21/30] Building CXX object airhost/CMakeFiles/airhost.dir/queue.cpp.o
FAILED: airhost/CMakeFiles/airhost.dir/queue.cpp.o
/usr/bin/clang++-10 --sysroot=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot -DLIBXAIENGINEV2 -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/include -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/utils/mlir-aie/include -I/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/utils/mlir-aie/build/include/../runtime_lib/x86_64/test_lib/include -I/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot/opt/xaienginev2/include --sysroot=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot --target=aarch64-linux-gnu --gcc-toolchain=/group/xrlabs/platforms/vck190-pynq-v2.7/sysroot/usr -fuse-ld=lld-10 -Wno-unused-command-line-argument -std=gnu++14 -fPIC -MD -MT airhost/CMakeFiles/airhost.dir/queue.cpp.o -MF airhost/CMakeFiles/airhost.dir/queue.cpp.o.d -o airhost/CMakeFiles/airhost.dir/queue.cpp.o -c /proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/queue.cpp
In file included from /proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/queue.cpp:20:
/proj/rdi/staff/erweiw/niansong_end_of_internship/mlir-air/runtime_lib/airhost/include/air_host_impl.h:11:10: fatal error: 'test_library.h' file not found
#include "test_library.h"
         ^~~~~~~~~~~~~~~~
1 error generated.

Confirm that verifiers are in place to verify allocations to L1 / L2 memory are only made inside Herds / Segments

air.herds are intentionally isolated from above to allow parallel compilation.

  • Memory allocations to L1 memory should only be made within a herd; those allocations must not be yielded outside of a herd (that would break the execution model).
  • Memory allocations to L2 memory should only be made within a segment; those allocations must not be yielded outside of a segment (that would break the execution model).

Can we check that verifiers enforce this behaviour, and that the documentation communicates it?

github runner sets max locked memory too low for some tests

Some RyzenAI self-hosted runners fail on tests using large buffers with an allocation error:

[XRT] ERROR: Failed to allocate host memory buffer (mmap(len=16777216, prot=3, flags=8193, offset=4294967296) failed (err=11): Resource temporarily unavailable), make sure host bank is enabled (see xbutil configure --host-mem)
terminate called after throwing an instance of 'xrt_core::system_error'
  what():  mmap(len=16777216, prot=3, flags=8193, offset=4294967296) failed (err=11): Resource temporarily unavailable

but the same error does not occur when running the tests manually, even on the same machine with the same binaries.

The problem is that the "max locked memory" limit is too low when running the tests under the github runner. As an example, on one github runner machine the limit seen by a user is:

$ ulimit -l
3694888

But the value reported in a workflow is 8192. Buffer allocation in the driver may fail as a result, with mmap returning errno 11, "Resource temporarily unavailable". The fix is to increase the limit in the workflow script. Because a normal user is not able to increase the limit, one workaround is to add the following to the workflow script:

sudo prlimit -lunlimited --pid $$

with a corresponding line in the sudoers file to allow the command, e.g.:

%github ALL=(ALL) NOPASSWD: /usr/bin/prlimit *
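
Because this failure mode is confusing, it may also be worth guarding against it at the start of the test suite. Below is a minimal Python sketch of such a check (the 16 MiB threshold is only an example, taken from the mmap error above):

# Minimal sketch: fail fast if the locked-memory limit is too low for the
# test buffers, instead of hitting an obscure mmap() error inside XRT.
import resource

REQUIRED_BYTES = 16 * 1024 * 1024  # example value, from mmap(len=16777216) above

soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)  # reported in bytes
if soft != resource.RLIM_INFINITY and soft < REQUIRED_BYTES:
    raise SystemExit(
        f"RLIMIT_MEMLOCK soft limit is {soft} bytes, need {REQUIRED_BYTES}; "
        "raise it in the workflow (note: `ulimit -l` reports KiB, not bytes)"
    )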

x86_64-petalinux-linux/bin/ld: cannot find -lgcc_s

When I run make petalinux_build, I encounter this problem:

make[1]: Entering directory '/home/pynq/projects/mlir-air/platforms/xilinx_vck190_air/petalinux/build/tmp/work/versal_generic-xilinx-linux/linux-xlnx/5.10+gitAUTOINC+568989d441-r0/linux-versal_generic-standard-build'
GEN Makefile
HOSTCC scripts/basic/fixdep
/opt/petalinux/2021.2/components/yocto/buildtools_extended/sysroots/x86_64-petalinux-linux/usr/bin/../lib/gcc/x86_64-petalinux-linux/10.2.0/../../../../x86_64-petalinux-linux/bin/ld: cannot find -lgcc_s

I'm not sure how to solve this. Which library should I use, aarch64 or x86_64? Thank you!

dispatch_packet_t': expected expression

Hello everyone,

When building mlir-air-pcie I get the following error while compiling mlir-air/runtime_lib/airhost/memory.cpp:

/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:270:41: error: unexpected type name 'dispatch_packet_t': expected expression
      uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
                                        ^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:270:60: error: use of undeclared identifier 'completion_signal'
      uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
                                                           ^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:327:41: error: unexpected type name 'dispatch_packet_t': expected expression
      uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
                                        ^
/people/gioi152/src/tools/xilinx/mlir-air/runtime_lib/airhost/memory.cpp:327:60: error: use of undeclared identifier 'completion_signal'
      uint64_t signal_offset = offsetof(dispatch_packet_t, completion_signal);
                                                           ^
4 errors generated.

Am I missing any header?

Also, not a big problem, but I need to pass a fourth parameter to build-mlir-air-pcie.sh (the location of the libXAIE just built and installed) instead of three (as shown here), or cmake won't be able to find it.

Access memory allocated in segment in the herd

I'm trying to make an example where I allocate a buffer in the segment (in L2 memory) and then use it as a target for the dma_memcpy_nd in the herd within that segment. I have written an example for that in a branch called alloc-check-example.

In the current form of the code, where I try to send the allocated memory to the herd through an operand/argument, I get the following error:

  File "mlir-air/install-xrt/python/air/dialects/_air_ops_ext.py", line 105, in <listcomp>
    operand_types = [s.type for s in sizes] * 2 + [o.type for o in operands]
AttributeError: 'AllocOp' object has no attribute 'type'
make: *** [Makefile:9: run] Error 1

If I don't try to send it from the segment to the herd as a herd operand, I get this error instead:

air._mlir_libs._site_initialize.<locals>.MLIRError: Unable to parse module assembly:
error: "-":21:11: 'air.dma_memcpy_nd' op using value defined outside the region
 note: "-":21:11: see current operation: "air.dma_memcpy_nd"(<<UNKNOWN SSA VALUE>>, %arg4, %2, %3, %4, %5, %6, %7) <{operandSegmentSizes = array<i32: 0, 1, 0, 0, 0, 1, 2, 2, 2>}> : (memref<16x8xi32, 1 : i32>, memref<32x16xi32>, index, index, index, index, index, index) -> ()
 note: "-":11:9: required by region isolation constraints
make: *** [Makefile:9: run] Error 1

I believe one of these two things should work, but I'm not sure which one (or both).
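
For the first error, one thing that may be worth trying (an untested sketch; it assumes the herd builder expects MLIR Values, which carry a `.type` attribute, while an `AllocOp` object does not) is to pass the alloc's result value, rather than the op object, as the herd operand:

# Hypothetical sketch: hand the herd the SSA value produced by the alloc,
# not the AllocOp object itself; a Value has the `.type` attribute that the
# list comprehension in _air_ops_ext.py appears to expect.
# `l2_memref_type` is a placeholder for the L2 memref type in the example.
from air.dialects.memref import AllocOp

buf = AllocOp(l2_memref_type, [], [])  # L2 buffer created in the segment
herd_operands = [buf.result]           # a Value, which has `.type`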

Limit on ChannelGet/ChannelPut operations?

I was working on #642 (which is not ready to merge) when I started to see some behavior I didn't understand, so I tried to make a minimal example, which likewise shows some unexpected behavior.

The example I came up with uses somewhat ridiculous things, like tiny 1x1 data tiles, to make it really obvious what is going on in the output. I found that my example fails for a 32x16 image and even an 8x8 image, but it succeeds for a 4x4 image.

My example is here. It's not supposed to pass the python test harness; I've just been looking at the output to see what it's doing.

As usual, you can run with:

make clean && make

When it fails, it seems like no output is received (the output keeps its original value of 0xFFFFFFFF in the test harness). When it succeeds, each value in the output image increases by 2, e.g. for the 4x4 image:

0000 0002 0004 0006 
0008 000a 000c 000e 
0010 0012 0014 0016 
0018 001a 001c 001e
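
For reference, that successful 4x4 output is just twice the row-major element index, which a test could check with numpy (a small sketch, assuming the result is read back as a 4x4 array named `output`):

# Small sketch: the passing 4x4 output above equals 2 * (row-major index).
import numpy as np

expected = 2 * np.arange(16, dtype=np.uint32).reshape(4, 4)
# assert np.array_equal(output, expected)  # `output` read back from the device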

Because it only works for small images, I'm guessing I'm running into a limit on the number of channel operations/copies allowed at some point, but I have not yet confirmed this theory.

In submitting this issue I'm hoping to discover:

  • Is this expected behavior when using many ChannelGet/ChannelPut ops? If so, what is the limit, and is there a way to catch it before a programmer runs into trouble?
  • If it's not expected behavior, maybe a bug fix?

AIR lowering pipeline failed for mmult with l2 tile size 64,64,64, l1 tile size 32,32,32

Description

This issue happens when calling aircc.py to lower AIR IR generated from linalg.

Tool commit points

  • MLIR-AIR @c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
  • MLIR-AIE @d21ca563e0c0fd100a4bbd98d194e770ce33bd79

Repeat the issue

Input: air.mlir

module {
  func.func @forward(%arg0: memref<128x128xi32>, %arg1: memref<128x128xi32>, %arg2: memref<128x128xi32>) {
    linalg.matmul ins(%arg0, %arg1 : memref<128x128xi32>, memref<128x128xi32>) outs(%arg2 : memref<128x128xi32>)
    return
  }
}

Compilation command:

air-opt air.mlir \
	-o air.opt.mlir \
	-buffer-results-to-out-params \
	-air-linalg-codegen='l2-tile-size=64,64,64 l2-promote=true l1-tile-size=32,32,32 l1-promote=true' \
	-air-par-to-herd \
	-air-copy-to-dma \
	-canonicalize -cse

aircc.py \
    -row-offset=3 \
    -col-offset=5 \
    ./air.opt.mlir \
    -o air.mlir.a \
    --host-target=aarch64-linux-gnu \
    --sysroot=${SYSROOT}

Error message:

loc("-":28:11): error: block with no terminator, has 
"scf.for"(%1, %2, %3) ({
^bb0(%arg7: index):
  "scf.for"(%1, %3, %0) ({
  ^bb0(%arg8: index):
    "scf.for"(%1, %3, %0) ({
    ^bb0(%arg9: index):
      "scf.for"(%1, %3, %0) ({
      ^bb0(%arg10: index):
        %4 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg8, %arg10) : (memref<32x32xi32, 2>, index, index) -> i32
        %5 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg10, %arg9) : (memref<32x32xi32, 2>, index, index) -> i32
        %6 = "memref.load"(<<UNKNOWN SSA VALUE>>, %arg8, %arg9) : (memref<32x32xi32, 2>, index, index) -> i32
        %7 = "arith.muli"(%4, %5) : (i32, i32) -> i32
        %8 = "arith.addi"(%6, %7) : (i32, i32) -> i32
        "memref.store"(%8, <<UNKNOWN SSA VALUE>>, %arg8, %arg9) : (i32, memref<32x32xi32, 2>, index, index) -> ()
        "scf.yield"() : () -> ()
      }) : (index, index, index) -> ()
      "scf.yield"() : () -> ()
    }) : (index, index, index) -> ()
    "scf.yield"() : () -> ()
  }) : (index, index, index) -> ()
  "memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
  "memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
  "memref.dealloc"(<<UNKNOWN SSA VALUE>>) : (memref<32x32xi32, 2>) -> ()
  "scf.yield"() : () -> ()
}) : (index, index, index) -> ()
Traceback (most recent call last):
  File "/home/niansong/mlir-air/install/bin/aircc.py", line 13, in <module>
    main()
  File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 316, in main
    run(module)
  File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 138, in run
    run_passes('builtin.module('+pass_pipeline+')', air_to_aie_module, opts,
  File "/home/niansong/mlir-air/install/python/air/compiler/aircc/main.py", line 77, in run_passes
    PassManager.parse(pass_pipeline).run(mlir_module)
RuntimeError: Failure while executing pass pipeline.

VCK190 pynq platform build failure

Hi, I am not sure if this is me or a real build issue. Can you give some advice on the error below?

This occurs toward the end of 'make pynq'; all stages up to this point seem to work correctly. Here is where it stops:

Platform created:
./platform_repo/xilinx_vck190_air/export/xilinx_vck190_air/xilinx_vck190_air.xpfm
make[1]: *** No rule to make target '../petalinux/images/linux/sdk.sh', needed by 'prep_sysroot'. Stop.
make[1]: Leaving directory '/enc/sandbox/mlir-air/platforms/xilinx_vck190_air/aie_platform'
make: *** [Makefile:42: platform] Error 2

I am using Ubuntu 20 LTS & v2021.2 tools as requested.

I am going through the makefile to see if I can figure it out. Will update if I see why.

Worker-to-Worker Channel Example

This is part of my effort to write examples using channels in a variety of ways (#648).

I've been having some issues with getting worker-to-worker (core-to-core within a herd) data movement with channels to work. It's quite possible I just have a bug in my own code that I have not yet found, or that my own assumptions aren't accurate. If someone could take a look to see if it's reasonable, and maybe look into a fix if it's not my bug, that would be great!

My example code is here. You can recreate the issue on the worker2worker branch by:

cd programming_examples/channel_examples/worker_to_worker
make

Failure to build since rocm/hsa addition

Before the patch #367, I could build mlir-air without any rocm/hsa. Now, my build fails with

/home/jamesn/labs/mlir-air/python/../runtime_lib/airhost/include/air_queue.h:12:10: fatal error: 'hsa/hsa.h' file not found
#include "hsa/hsa.h"

I don't want to build the runtime; I am only using the compiler (the mlir subdirectory). Can we have an option to build just the compiler?

Herd parameters allow general behaviour, but current lowering does not support this.

Current behaviour allows general parameters to be passed into the herd (herds are isolated from above), but we don't have a path to lowering these at the moment.

AIE RTPs are one way of lowering these.

The proposed AIE logical dialect would also support core.tasks with general parameters (which could be lowered from herd parameters).

  • We should add a verifier that indicates that the lowering is not currently complete (rejecting code with extra arguments).
  • We should add a lowering path to enable scalar parameters to be passed into the herd.

[xilinx_vck5000_air] make error: "exceeds the maximum 16G DDR capacity" of AXI NOC IP

Hi,

I'm trying to build the AIR platform for the VCK5000 with Vivado 2022.1. I'm hitting the following error when running make all under mlir-air/platforms/xilinx_vck5000_air:

ERROR: [IP_Flow 19-5481] Logical NOC instance '/axi_noc_0' has a total of '32G' DDR memory assigned, this exceeds the maximum '16G' DDR capacity for this IP. Please review your NOC DDR memory configuration.

I'm not sure what I'm missing. I'm using the VCK5000 v1.0 board files. Please find the attached log file for reference.

Thanks!

vivado.log

Multiple launches, herd with single core

I'm working on an example where I have a 2D matrix of input data, which I break into four data tiles, and then I attempt to process one data tile per compute tile in a variety of ways using AIR constructs. I am sanity-checking my programs by making each compute core add a unique tile_num to each value in the data tile it modifies, so I can reassure myself that the compute tile I think is doing the work is actually the compute tile doing the work.

Anyways, I am trying to compose an example of this scenario that uses four launches, where the herd size is 1x1. My first attempt is here, where I have a while loop within the herd, because I hear the kernel will be persistent across launches.

Anyways, even with that persistence, I'd like to somehow parameterize the herd with the launch indices so I can calculate a unique tile_num per launch (see the sketch below). Is this something that is possible to do? If not, how do I reassure myself that one data tile is being processed per launch?
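
Since the launch induction variables are ordinary index values, one approach might be to thread them down to the herd as operands. The sketch below is hypothetical and untested: the launch/segment/herd decorators are the ones used in the programming examples, arg0/arg1 stand for the enclosing function's memref arguments, and the body-callback argument order (indices, then sizes, then operands) is an assumption:

# Hypothetical sketch: pass the launch indices down to the herd as operands,
# so each 1x1 herd can compute a unique tile_num per launch.
@launch(sizes=[2, 2], operands=[arg0, arg1])
def launch_body(lx, ly, lsx, lsy, a, b):
    @segment(name="seg", operands=[lx, ly, a, b])
    def segment_body(seg_lx, seg_ly, c, d):
        @herd(name="addherd", sizes=[1, 1], operands=[seg_lx, seg_ly, c, d])
        def herd_body(tx, ty, sx, sy, launch_x, launch_y, e, f):
            ...  # tile_num could be derived from launch_x / launch_y here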

`air-dependency` pass failed on mmult code generated from Triton

Description

It seems like there's an empty vector in the ::AIRDependency::createPartialMemref function that caused this issue.

Tool commit points

  • MLIR-AIR @c3a9b505f06936a3e4c81c221ca9fac2a7d6dbad
  • MLIR-AIE @d21ca563e0c0fd100a4bbd98d194e770ce33bd79

Repeat this issue

Input: mmult.triton.air.mlir

#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @matmul_kernel(%arg0: memref<*xi32>, %arg1: memref<*xi32>, %arg2: memref<*xi32>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32, %arg9: i32, %arg10: i32, %arg11: i32, %arg12: i32, %arg13: i32, %arg14: i32) {
    %c0_i32 = arith.constant 0 : i32
    %c128_i32 = arith.constant 128 : i32
    %c128 = arith.constant 128 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
    linalg.fill ins(%c0_i32 : i32) outs(%alloc : memref<128x128xi32>)
    %c1_i32 = arith.constant 1 : i32
    %c0_i32_0 = arith.constant 0 : i32
    %c-1_i32 = arith.constant -1 : i32
    %0 = arith.cmpi sgt, %c128_i32, %c0_i32_0 : i32
    %1 = arith.select %0, %c-1_i32, %c1_i32 : i32
    %2 = arith.addi %1, %arg4 : i32
    %3 = arith.divsi %2, %c128_i32 : i32
    %4 = arith.addi %c1_i32, %3 : i32
    %5 = arith.subi %c0_i32_0, %arg4 : i32
    %6 = arith.divsi %5, %c128_i32 : i32
    %7 = arith.subi %c0_i32_0, %6 : i32
    %8 = arith.cmpi slt, %arg4, %c0_i32_0 : i32
    %9 = arith.cmpi sgt, %arg4, %c0_i32_0 : i32
    %10 = arith.cmpi slt, %c128_i32, %c0_i32_0 : i32
    %11 = arith.cmpi sgt, %c128_i32, %c0_i32_0 : i32
    %12 = arith.andi %8, %10 : i1
    %13 = arith.andi %9, %11 : i1
    %14 = arith.ori %12, %13 : i1
    %15 = arith.select %14, %4, %7 : i32
    %c1_i32_1 = arith.constant 1 : i32
    %c0_i32_2 = arith.constant 0 : i32
    %c-1_i32_3 = arith.constant -1 : i32
    %16 = arith.cmpi slt, %15, %c0_i32_2 : i32
    %17 = arith.select %16, %c1_i32_1, %c-1_i32_3 : i32
    %18 = arith.subi %17, %arg12 : i32
    %19 = arith.divsi %18, %15 : i32
    %20 = arith.subi %c-1_i32_3, %19 : i32
    %21 = arith.divsi %arg12, %15 : i32
    %22 = arith.cmpi slt, %arg12, %c0_i32_2 : i32
    %23 = arith.cmpi sgt, %arg12, %c0_i32_2 : i32
    %24 = arith.cmpi slt, %15, %c0_i32_2 : i32
    %25 = arith.cmpi sgt, %15, %c0_i32_2 : i32
    %26 = arith.andi %22, %25 : i1
    %27 = arith.andi %23, %24 : i1
    %28 = arith.ori %26, %27 : i1
    %29 = arith.select %28, %20, %21 : i32
    %30 = arith.remsi %arg12, %15 : i32
    %31 = arith.muli %29, %c128_i32 : i32
    %32 = arith.muli %30, %c128_i32 : i32
    %33 = arith.index_cast %31 : i32 to index
    %34 = arith.index_cast %arg6 : i32 to index
    %35 = arith.muli %33, %34 : index
    %36 = arith.index_cast %arg7 : i32 to index
    %37 = arith.index_cast %arg8 : i32 to index
    %38 = arith.index_cast %32 : i32 to index
    %39 = arith.index_cast %arg9 : i32 to index
    %40 = arith.muli %38, %39 : index
    %reinterpret_cast = memref.reinterpret_cast %arg0 to offset: [%35], sizes: [128, 128], strides: [%34, %36] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
    %reinterpret_cast_4 = memref.reinterpret_cast %arg1 to offset: [%40], sizes: [128, 128], strides: [%37, %39] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
    %alloc_5 = memref.alloc() : memref<128x128xi32>
    %41 = arith.index_cast %arg5 : i32 to index
    %42 = arith.minsi %41, %c128 : index
    %subview = memref.subview %reinterpret_cast[0, 0] [128, %42] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<128x?xi32, strided<[?, ?], offset: ?>>
    %subview_6 = memref.subview %alloc_5[0, 0] [128, %42] [1, 1] : memref<128x128xi32> to memref<128x?xi32, strided<[128, 1]>>
    %43 = arith.cmpi slt, %42, %c128 : index
    scf.if %43 {
      linalg.fill ins(%c0_i32 : i32) outs(%alloc_5 : memref<128x128xi32>)
    }
    linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview : memref<128x?xi32, strided<[?, ?], offset: ?>>) outs(%subview_6 : memref<128x?xi32, strided<[128, 1]>>)
    %alloc_7 = memref.alloc() : memref<128x128xi32>
    %44 = arith.index_cast %arg5 : i32 to index
    %45 = arith.minsi %44, %c128 : index
    %subview_8 = memref.subview %reinterpret_cast_4[0, 0] [%45, 128] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<?x128xi32, strided<[?, ?], offset: ?>>
    %subview_9 = memref.subview %alloc_7[0, 0] [%45, 128] [1, 1] : memref<128x128xi32> to memref<?x128xi32, strided<[128, 1]>>
    %46 = arith.cmpi slt, %45, %c128 : index
    scf.if %46 {
      linalg.fill ins(%c0_i32 : i32) outs(%alloc_7 : memref<128x128xi32>)
    }
    linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview_8 : memref<?x128xi32, strided<[?, ?], offset: ?>>) outs(%subview_9 : memref<?x128xi32, strided<[128, 1]>>)
    %alloc_10 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
    %alloc_11 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
    memref.copy %alloc_10, %alloc_11 : memref<128x128xi32> to memref<128x128xi32>
    memref.dealloc %alloc_10 : memref<128x128xi32>
    linalg.matmul ins(%alloc_5, %alloc_7 : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_11 : memref<128x128xi32>)
    memref.dealloc %alloc_7 : memref<128x128xi32>
    memref.dealloc %alloc_5 : memref<128x128xi32>
    %alloc_12 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
    linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%alloc_11, %alloc : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_12 : memref<128x128xi32>) {
    ^bb0(%in: i32, %in_17: i32, %out: i32):
      %68 = arith.addi %in, %in_17 : i32
      linalg.yield %68 : i32
    }
    memref.dealloc %alloc_11 : memref<128x128xi32>
    %alloc_13 = memref.alloc() {alignment = 64 : i64} : memref<128x128xi32>
    linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%alloc, %alloc_12 : memref<128x128xi32>, memref<128x128xi32>) outs(%alloc_13 : memref<128x128xi32>) {
    ^bb0(%in: i32, %in_17: i32, %out: i32):
      %68 = arith.addi %in, %in_17 : i32
      linalg.yield %68 : i32
    }
    memref.dealloc %alloc_12 : memref<128x128xi32>
    memref.dealloc %alloc : memref<128x128xi32>
    %47 = arith.muli %29, %c128_i32 : i32
    %48 = arith.muli %30, %c128_i32 : i32
    %49 = arith.index_cast %arg10 : i32 to index
    %50 = arith.index_cast %47 : i32 to index
    %51 = arith.muli %50, %49 : index
    %52 = arith.index_cast %arg11 : i32 to index
    %53 = arith.index_cast %48 : i32 to index
    %54 = arith.muli %53, %52 : index
    %55 = arith.addi %51, %54 : index
    %reinterpret_cast_14 = memref.reinterpret_cast %arg2 to offset: [%55], sizes: [128, 128], strides: [%49, %52] : memref<*xi32> to memref<128x128xi32, strided<[?, ?], offset: ?>>
    %56 = arith.index_cast %47 : i32 to index
    %57 = arith.addi %56, %c128 : index
    %58 = arith.index_cast %arg3 : i32 to index
    %59 = arith.minsi %57, %58 : index
    %60 = arith.subi %59, %56 : index
    %61 = arith.index_cast %48 : i32 to index
    %62 = arith.addi %61, %c128 : index
    %63 = arith.index_cast %arg4 : i32 to index
    %64 = arith.minsi %62, %63 : index
    %65 = arith.subi %64, %61 : index
    %66 = arith.minsi %60, %c128 : index
    %67 = arith.minsi %65, %c128 : index
    %subview_15 = memref.subview %alloc_13[0, 0] [%66, %67] [1, 1] : memref<128x128xi32> to memref<?x?xi32, strided<[128, 1]>>
    %subview_16 = memref.subview %reinterpret_cast_14[0, 0] [%66, %67] [1, 1] : memref<128x128xi32, strided<[?, ?], offset: ?>> to memref<?x?xi32, strided<[?, ?], offset: ?>>
    %cast = memref.cast %subview_15 : memref<?x?xi32, strided<[128, 1]>> to memref<?x?xi32, strided<[?, ?], offset: ?>>
    linalg.copy {cast = #linalg.type_fn<cast_signed>} ins(%subview_15 : memref<?x?xi32, strided<[128, 1]>>) outs(%subview_16 : memref<?x?xi32, strided<[?, ?], offset: ?>>)
    memref.dealloc %alloc_13 : memref<128x128xi32>
    return
  }
  func.func @kernel(%arg0: memref<128x128xi32>, %arg1: memref<128x128xi32>, %arg2: memref<128x128xi32>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32, %arg9: i32, %arg10: i32, %arg11: i32, %arg12: i32, %arg13: i32, %arg14: i32) {
    %cast = memref.cast %arg0 : memref<128x128xi32> to memref<*xi32>
    %cast_0 = memref.cast %arg1 : memref<128x128xi32> to memref<*xi32>
    %cast_1 = memref.cast %arg2 : memref<128x128xi32> to memref<*xi32>
    call @matmul_kernel(%cast, %cast_0, %cast_1, %arg3, %arg4, %arg5, %arg6, %arg7, %arg8, %arg9, %arg10, %arg11, %arg12, %arg13, %arg14) : (memref<*xi32>, memref<*xi32>, memref<*xi32>, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) -> ()
    return
  }
}

Compilation command:

air-opt mmult.triton.air.mlir \
    -buffer-results-to-out-params \
    -air-linalg-codegen \
    -air-par-to-herd \
    -air-copy-to-dma \
    -air-dependency \
    -canonicalize -cse \
    -o mmult.air.mlir

Error message and stack trace

air-opt: /home/niansong/mlir-air/llvm/llvm/include/llvm/ADT/SmallVector.h:294: reference llvm::SmallVectorTemplateCommon<mlir::Value>::operator[](size_type) [T = mlir::Value]: Assertion `idx < size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: air-opt mmult.triton.air.mlir -buffer-results-to-out-params -air-linalg-codegen -air-par-to-herd -air-copy-to-dma -air-dependency -canonicalize -cse -o mmult.air.mlir
 #0 0x000055cbc9a8c007 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/niansong/mlir-air/install/bin/air-opt+0x2854007)
 #1 0x000055cbc9a89e5e llvm::sys::RunSignalHandlers() (/home/niansong/mlir-air/install/bin/air-opt+0x2851e5e)
 #2 0x000055cbc9a8c80f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f54362a1420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007f5435d3400b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007f5435d13859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7
 #6 0x00007f5435d13729 get_sysdep_segment_value /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:509:8
 #7 0x00007f5435d13729 _nl_load_domain /build/glibc-SzIz7B/glibc-2.31/intl/loadmsgcat.c:970:34
 #8 0x00007f5435d24fd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
 #9 0x000055cbc8049ac9 llvm::SmallVectorTemplateCommon<mlir::Value, void>::operator[](unsigned long) AIRDependencyScheduleOpt.cpp:0:0
#10 0x000055cbc815728d (anonymous namespace)::AIRDependency::createPartialMemref(mlir::Value, unsigned int, llvm::SmallVector<mlir::Value, 2u>) AIRDependency.cpp:0:0
#11 0x000055cbc815778c void (anonymous namespace)::AIRDependency::traceDeps<xilinx::air::ExecuteOp>(llvm::SmallVector<(anonymous namespace)::AIRDependency::partialMemref, 1u>, xilinx::air::ExecuteOp, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>) AIRDependency.cpp:0:0
#12 0x000055cbc81569e3 (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)::operator()(mlir::Operation*) const AIRDependency.cpp:0:0
#13 0x000055cbc815470d void llvm::function_ref<void (mlir::Operation*)>::callback_fn<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>(long, mlir::Operation*) AIRDependency.cpp:0:0
#14 0x000055cbc84d5dce mlir::detail::walk(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) (/home/niansong/mlir-air/install/bin/air-opt+0x129ddce)
#15 0x000055cbc81546b2 std::enable_if<llvm::is_one_of<mlir::Operation*, mlir::Operation*, mlir::Region*, mlir::Block*>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), mlir::Operation*, void>(mlir::Operation*, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#16 0x000055cbc815465d std::enable_if<llvm::function_traits<std::decay<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>::type>::num_args == 1, void>::type mlir::Operation::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), void>((anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#17 0x000055cbc8148ba0 std::enable_if<llvm::function_traits<std::decay<(anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)>::type>::num_args == 1, void>::type mlir::OpState::walk<(mlir::WalkOrder)1, (anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*), void>((anonymous namespace)::AIRDependency::runOnOperation()::'lambda0'(mlir::Operation*)&&) AIRDependency.cpp:0:0
#18 0x000055cbc81472de (anonymous namespace)::AIRDependency::runOnOperation() AIRDependency.cpp:0:0
#19 0x000055cbc8388c9f mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (/home/niansong/mlir-air/install/bin/air-opt+0x1150c9f)
#20 0x000055cbc83892c9 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (/home/niansong/mlir-air/install/bin/air-opt+0x11512c9)
#21 0x000055cbc838b446 mlir::PassManager::run(mlir::Operation*) (/home/niansong/mlir-air/install/bin/air-opt+0x1153446)
#22 0x000055cbc8385b86 performActions(llvm::raw_ostream&, bool, bool, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, bool, bool) MlirOptMain.cpp:0:0
#23 0x000055cbc838585d mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, bool, bool, bool, bool, bool, bool, bool)::$_0>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirOptMain.cpp:0:0
#24 0x000055cbc840e4c8 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x11d64c8)
#25 0x000055cbc8383dfe mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, bool, bool, bool, bool, bool, bool, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x114bdfe)
#26 0x000055cbc838429f mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (/home/niansong/mlir-air/install/bin/air-opt+0x114c29f)
#27 0x000055cbc7fc1a3a main (/home/niansong/mlir-air/install/bin/air-opt+0xd89a3a)
#28 0x00007f5435d15083 __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:342:3
#29 0x000055cbc7fc173e _start (/home/niansong/mlir-air/install/bin/air-opt+0xd8973e)
./compile.sh: line 9: 643162 Aborted                 air-opt mmult.triton.air.mlir -buffer-results-to-out-params -air-linalg-codegen -air-par-to-herd -air-copy-to-dma -air-dependency -canonicalize -cse -o mmult.air.mlir

Launch MLIR code on VCK5000?

On the MLIR-AIE side, everything seems to be clear as long as it is run on a VCK190. I understood that to make it run on a VCK5000, MLIR-AIR is needed. I installed it as detailed here with the most recent ROCm runtime.

Following the aircc doc, there does not seem to be a clear path to getting MLIR code up and running on a VCK5000. I tried to play with the tests and a few examples, such as SPARTA, but I am not seeing that going anywhere.

Perhaps there is a guide or a tutorial, like the one for MLIR-AIE here, showing how to take MLIR code written in the AIE/AIR dialects and get it running on a VCK5000?

Need to replace DimTuple

I'm trying to update the iree-amd-aie project to the latest version of mlir-aie. That seems to have removed or renamed the DimTuple attr, which used to be defined here https://github.com/Xilinx/mlir-aie/blob/c0341aa3d525827a21f25b7423b18f4359a34cf3/include/aie/Dialect/AIE/IR/AIEAttrs.td but is still used in this project here:

std::vector<AIE::DimTupleAttr> dims =

std::vector<AIE::DimTupleAttr>

Toolchain file seems to need to be passed in twice in cmake?

With the docker image containers.xilinx.com/acdc/build:2.0, the current util build script, targeting x86, fails with:

CMake Error at utils/llvm/build/lib/cmake/llvm/HandleLLVMOptions.cmake:320 (message):
  Host compiler does not support '-fuse-ld=lld'
Call Stack (most recent call first):
  CMakeLists.txt:73 (include)

It seems that this issue can be solved by passing the toolchain file into cmake twice:

    -DCMAKE_TOOLCHAIN_FILE=`pwd`/../cmake/modules/toolchain_x86_64.cmake \
    -Dx86_64_TOOLCHAIN_FILE=`pwd`/../cmake/modules/toolchain_x86_64.cmake \

Python Multiple Segment Examples

I've written a couple of multi-segment examples as part of the programming examples generally (and specifically for channels #648) in this PR #663.

Right now, all 3 examples using 2 segments fail during compilation with a segfault. I have not looked further into the issue, but I hope to do some debugging myself next week.

Edit: I'm putting work on this on hold for a bit, if anyone wants to pick this up.

Worker-to-Self Channel Example

I am working on writing a worker-to-worker data transfer example for channels (as part of the grouping of examples that exercise various features of channels, #648).

Draft PR is here: #653

I am basing it off the code in the channel_size example (PR waiting to be merged here: #642)

The channel_size example works well for me. As an intermediate step to adding worker-to-worker communication to that example, I tried to have each worker send data to itself over a channel. That is the version of the code that is pushed in the draft PR #653. The particular file of interest is this one.

When I run with this intermediate step, I get the following error:

Using aiecc.py from:  /scratch/ehunhoff/mlir-air/mlir-aie/install/bin/..
Running: builtin.module(air-insert-launch-and-segment-around-herd,func.func(air-lower-herd-parallel),air-dma-to-channel,canonicalize,cse,air-specialize-channel-wrap-and-stride,func.func(air-renumber-dma),func.func(convert-linalg-to-loops),air-place-herds{num-rows=6 num-cols=4 row-anchor=2 col-anchor=0})
Running: builtin.module(air-to-aie{emit-while-loop=false row-offset=2 col-offset=0 device=npu1_4col})
python3: /scratch/ehunhoff/mlir-air/mlir/lib/Conversion/AIRToAIESchedulingUtils.cpp:956: void xilinx::air::simpleDMAChannelAllocation(std::vector<MemcpyBundleAsFlow> &, ShimDMAAllocator &, MemTileDMAAllocator &, TileDMAAllocator &): Assertion `core' failed.
Aborted (core dumped)
make: *** [Makefile:7: run] Error 134

My question is:

  • Is a worker allowed to put/get data to/from a channel to itself?
  • Or is this a bug (either in my example code or the air compiler)?

AllocOps and load/store ops in Launch and Segment

As part of the channel examples, and prompted by an interesting discussion on allocation, I wanted to see if I could explicitly allocate L3 memory in a launch and L2 memory in a segment. To this end, I wrote the programming_examples/channel_examples/hierarchical_alloc example in this PR: #661

Currently, it fails to fully compile with an error like this:

Traceback (most recent call last):
  File "mlir-air/programming_examples/channel_examples/hierarchical_alloc/run.py", line 86, in <module>
    test_main(build_module, verbose=args.verbose)
  File "mlir-air/programming_examples/channel_examples/hierarchical_alloc/run.py", line 45, in test_main
    addone = backend.compile_and_load(mlir_module)
  File "mlir-air/install-xrt/python/air/backend/xrt.py", line 222, in compile_and_load
    c = self.compile(module)
  File "mlir-air/install-xrt/python/air/backend/xrt.py", line 117, in compile
    aircc.run(air_module, aircc_options)
  File "mlir-air/install-xrt/python/air/compiler/aircc/main.py", line 449, in run
    run_passes(air_to_npu_passes, air_to_npu_module, opts, air_to_npu_file)
  File "mlir-air/install-xrt/python/air/compiler/aircc/main.py", line 113, in run_passes
    PassManager.parse(pass_pipeline).run(mlir_module.operation)
air._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: "-":145:20: failed to legalize operation 'airrt.alloc' marked as erased
 note: "-":145:20: see current operation: %16 = "airrt.alloc"() : () -> memref<1x4xi32, 1 : i32>
 note: "-":161:16: found live user of result #0: %5 = memref.load %2[%c0, %c0] : memref<1x4xi32, 1 : i32>
make: *** [Makefile:9: run] Error 1

I'm not fully confident this example is a reasonable thing to ask the air compiler to handle, but I think it might be. If it is not, let me know, and I will either change or erase the example!

Illegal Allocation Catch in Verifier

Recently, when I tried to do an L2 allocation from within a herd, I got a segfault. This makes sense, as I believe only the following allocations are legal:

  • L1 in herd
  • L2 in segment
  • L3 in launch??

It would be more user-friendly if any illegal allocations outside of the above were caught in a verifier.

Running Matrix Scalar Add Examples with `aircc --experimental-passes`

This is low priority, but ideally I would like to run the Matrix Scalar Add examples with the experimental_passes aircc.py option set. However, the experimental passes break both of the currently working examples, single_core_dma and single_core_channel. For single_core_dma, the output is wrong. For single_core_channel, there is a segfault.

To replicate, set experimental_passes=True in this file (on the minimal-matrix-scalar-add branch).

For single_core_dma:

cd programming_examples/matrix_scalar_add/single_core_dma
make clean
make

For single_core_channel:

cd programming_examples/matrix_scalar_add/single_core_channel
make clean
make

I investigated a bit which passes might be causing the problems; if I just comment out the first two of the experimental passes (defined here), both examples still work with the remaining passes:

    #"air-dependency",
    #"air-dependency-schedule-opt",

Single Core DMA/Channel Matrix Scalar Add Examples Broken

I'm working on my draft PR #621

I rebased my branch to master after #620 was merged in. However, after that rebase, the two examples that previously worked (single_core_dma and single_core_channel) no longer work.

You can replicate the working versions in branch debugging-matrix-scalar-add (which diverges from main at bcbfed5c instead of HEAD=45592176) with:

cd programming_examples/matrix_scalar_add/single_core_dma
make

and

cd programming_examples/matrix_scalar_add/single_core_channel
make

You can replicate the failing tests in branch minimal-matrix-scalar-add with the same commands.

For the single_core_dma example, the files aie.air.mlir and placed.air.mlir are identical between the passing/failing cases. The file npu.air.mlir has the following diff:

$ diff broken_single_core_dma_build/air_project/npu.air.mlir working_single_core_dma_build/air_project/npu.air.mlir 
63,64c63,64
<       aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 512][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
<       aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 528][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
---
>       aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 2 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
>       aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 3 : i64, metadata = @airMemcpyId3} : memref<32x16xi32>
67,68c67,68
<       aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 512][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
<       aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 528][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
---
>       aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 0][1, 1, 8, 16][0, 0, 32]) {id = 6 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>
>       aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 16, 16][1, 1, 8, 16][0, 0, 32]) {id = 7 : i64, metadata = @airMemcpyId4} : memref<32x16xi32>

The diff for the single_core_channel example is essentially the same.

Let me know if more information is needed!

Issue while building, -fno-rtti is passed although boost requires rtti

Hi, I have this weird issue while building the repo following the instructions on this page:

FAILED: mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o 
/net/media/scratch/fournier/llvm-install/llvm-15/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/net/media/scratch/fournier/llvm-for-mlir-aie/llvm/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/build-Debug/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/mlir/include -I/net/media/scratch/fournier/llvm-for-mlir-aie/build-Debug/tools/mlir/include -I/net/media/scratch/fournier/mlir-aie/include -I/net/media/scratch/fournier/mlir-aie/build/include -I/net/media/scratch/fournier/mlir-air/mlir/include -I/net/media/scratch/fournier/mlir-air/build/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -std=gnu++17   -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS  -fno-exceptions -fno-rtti -UNDEBUG -MD -MT mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o -MF mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o.d -o mlir/lib/CAPI/CMakeFiles/obj.AIRCAPI.dir/Runner.cpp.o -c /net/media/scratch/fournier/mlir-air/mlir/lib/CAPI/Runner.cpp
In file included from /net/media/scratch/fournier/mlir-air/mlir/lib/CAPI/Runner.cpp:11:
In file included from /net/media/scratch/fournier/mlir-air/mlir/include/air/Util/Runner.h:12:
In file included from /net/media/scratch/fournier/mlir-air/mlir/include/air/Util/Dependency.h:34:
In file included from /usr/include/boost/graph/graphviz.hpp:25:
/usr/include/boost/property_map/dynamic_property_map.hpp:150:28: error: use of typeid requires -frtti
    if (in_value.type() == typeid(value_type)) {
                           ^
/usr/include/boost/property_map/dynamic_property_map.hpp:191:56: error: use of typeid requires -frtti
  virtual const std::type_info& key()   const { return typeid(key_type); }
                                                       ^
/usr/include/boost/property_map/dynamic_property_map.hpp:192:56: error: use of typeid requires -frtti
  virtual const std::type_info& value() const { return typeid(value_type); }
                                                       ^
/usr/include/boost/property_map/dynamic_property_map.hpp:286:29: error: use of typeid requires -frtti
    if (i->second->key() == typeid(key)) {
                            ^
/usr/include/boost/property_map/dynamic_property_map.hpp:308:29: error: use of typeid requires -frtti
    if (i->second->key() == typeid(key))
                            ^
/usr/include/boost/property_map/dynamic_property_map.hpp:321:29: error: use of typeid requires -frtti
    if (i->second->key() == typeid(key))
                            ^
/usr/include/boost/property_map/dynamic_property_map.hpp:334:29: error: use of typeid requires -frtti
    if (i->second->key() == typeid(key))

As you can see, boost requires RTTI, but the compiler command line contains -fno-rtti. Do you have an idea what could be causing this? When I grep for rtti in the repo, I only find hits in the cmake files that are in sandbox, and they don't appear related... Thanks for any help.

Unrecognized architecture 'aie'

I have set up all the tools for the first time and can get pretty far. However, when I try to compile any of the examples, it complains about unrecognized architecture 'aie'.

What did I miss in my setup or is this a new bug?

I just cloned this repo earlier today; I have built up to commit 8ea962a. I am using Ubuntu 20.04 LTS & v2021.2 tools. I have it set to use the sysroot from the remnants of the pynq pre-build.

These are the make results for the beefmaker example project:

clang ../../../install/runtime_lib/test_library.cpp --target=aarch64-linux-gnu --sysroot=../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ -g  -I../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux//opt/xaienginev2/include -std=c++17 -I/sandbox-hdd/mlir-air/install/bin//../runtime_lib/airhost/include -I../../../install/runtime_lib -DAIR_LIBXAIE_ENABLE -DLIBXAIENGINEV2 -c -o test_library.o
xchesscc -p me -P /tools/Xilinx/Vitis/2021.2/aietools/data/cervino/lib -c chess/beefmaker_kernel.cc
aircc.py -o beefmaker.air.a --host-target=aarch64-linux-gnu -xbridge --sysroot=../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ air.mlir
Compiling partitions: ['partition_0']
Found Vitis at /enc/tools/Xilinx/Vitis/2021.2
 MLIR compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:00 0/1 1 Workeropt: unrecognized architecture 'aie' provided.
Error encountered while running: opt --opaque-pointers=0 --passes=default<O2> -inline-threshold=10 -S air_project/partition_0/input.ll -o air_project/partition_0/input.opt.ll
 Error ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- 0:00:00 0/1 1 Worker
Error encountered while running: aiecc.py --sysroot ../../../platforms/xilinx_vck190_air/petalinux/sysroot/sysroots/cortexa72-cortexa53-xilinx-linux/ --host-target aarch64-linux-gnu --tmpdir air_project/partition_0 --aie-generate-xaiev2 --xbridge --no-xchesscc air_project/aiecc.partition_0.mlir
make: *** [Makefile:18: beefmaker.air.a] Error 1

Form Herds from Multiple scf.forall nests.

In order to support code generated from loop peeling, we'd like to look at forming herds from multiple scf.forall statements (rather than targeting just one), so that L1 allocations can stay local within a herd definition.

Channel Examples

Channels are a key abstraction of mlir-air, but there are few examples of how to use them. This issue is a place to discuss which examples are needed to show how channels work, and which of those examples are implemented.

Examples to Create

  • Worker2Worker (core to core within a herd) using L1 -> L1 data
  • Herd2Herd (core in herd0 to core in herd1 using L1 data) #636
  • Segment2Segment (#665)
    • Create basic example with two segments (something even simpler that doesn't use channels, because this is largely untested) (#663)
    • L1-L1 (worker to worker, where the workers are in herds in the different segments) (#663)
    • L2-L2 (from segment to segment directly)
  • Channel bundling (using sizes/indices) #642
  • Broadcast
    • Broadcast to multiple workers in same herd
    • Broadcast to multiple herds
  • Hierarchical (launch -> segment -> herd) (#661)

Discussion Topics

Is Launch2Launch Desirable?

  • Some trouble creating multiple launches, so this may be a little early in terms of creating a launch2launch channel communication (see issue #627)
  • What information would you need to schedule launches in order to do this?

Synchronous vs Asynchronous

  • It might one day be good to have some examples where the user explicitly sets async tokens on channel operations, but this capability isn't implemented yet in the air dependency pass.

Placement

  • An example doing something specific with channels based on placement of resources?

Build instructions clarification in docs

The instructions in https://xilinx.github.io/mlir-air/building.html appear to have a typo:

git clone https://github.com/stephenneuendorffer/aie-rt
cd aie-rt
git checkout phoenix_v2023.2
cd driver/src
make -f Makefile.Linux CFLAGS="-D__AIEAMDAIR__"
sudo cp -r ../include /opt/aiengine/
sudo cp libxaiengine.so* /opt/xaiengine/lib/
export LD_LIBRARY_PATH=/opt/xaiengine/lib:${LD_LIBRARY_PATH}

/opt/aiengine vs. /opt/xaiengine?

error: undefined reference due to --no-allow-shlib-undefined: AmdairBackend

Hi,
I've encountered the error stated in the title of this issue when I try to compile the test 13_mb_add_one. If I check the symbol table of libxaiengine.so, I see that AmdAirBackend is defined but AmdairBackend is undefined.

I built the libxaiengine library from https://github.com/stephenneuendorffer/aie-rt, branch phoenix_v2023.2, following the instructions in the mlir-air documentation: https://xilinx.github.io/mlir-air/building.html.

Edit: found the solution, there was a problem with the definition of a variable in the source code because of a name mismatch. Will post the fix as a pull request in the relevant repository soon.

CMakefile hard codes reference to clang/clang++ 12

Hello everyone!

In cmake/modules/toolchain_x86.cmake there is a hardcoded reference to LLVM 12:

# specify the compiler
set(CLANG_VER 12)
set(CMAKE_C_COMPILER clang-${CLANG_VER})
set(CMAKE_CXX_COMPILER clang++-${CLANG_VER})
set(CMAKE_ASM_COMPILER clang-${CLANG_VER})
set(CMAKE_STRIP llvm-strip-${CLANG_VER})
set(CLANG_LLD lld-${CLANG_VER} CACHE STRING "" FORCE)

However, the LLVM version downloaded might be different (LLVM 17 as I write), and then the build cannot find the compilers. One option is to update the clang version variable; however, clang++-17 doesn't exist (only clang++ and clang-17), so that doesn't work either. I resolved this temporarily by removing any reference to the CLANG version in the above variables. Do we need to check for a specific version of LLVM? Can we use the llvm-config of the LLVM we built?

Thanks,

Roberto

Update ODS to add 'name' member to 'xilinx::airrt::EventType' and other types

It seems that some ODS information needs to be updated because of recent LLVM changes. See also this related issue.

I tried to compile the nod.ai SHARK runtime with the iree-amd-aie plugin enabled. The latter uses this project. I got the following error:

In file included from /home/fharwath/wd/shark/SRT/third_party/llvm-project/mlir/include/mlir/IR/Types.h:12:
/home/fharwath/wd/shark/SRT/third_party/llvm-project/mlir/include/mlir/IR/TypeSupport.h:54:28: error: no member named 'name' in 'xilinx::airrt::EventType'
   54 |                         T::name);
      |                         ~~~^

This is caused by LLVM commit 3dbac2c007c1, "[mlir] Expose type and attribute names in the MLIRContext and abstract type/attr classes".

Assertion failure running air-to-aie pass

Running air-to-aie on a transform test, from the mlir-air/mlir/test directory:

 $ air-opt --air-to-aie  Transform/AIRDependency/matmul_parallel.mlir

Maybe this is not a sensible pass to run on this input, but I'm reporting it just in case you don't think users should be reaching assertion failures like this. Here's the dump:

/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1045: std::vector::reference std::vector<int>::operator[](std::vector::size_type) [_Tp = int, _Alloc = std::allocator<int>]: Assertion '__n < this->size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ../../build/bin/air-opt --air-to-aie Transform/AIRDependency/matmul_parallel.mlir
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  air-opt   0x00005563621ba8f7

`XRTBackend` Implementation of `AirBackend` is Confusing

I was trying to use the compile() and load() methods of XRTBackend while doing some debugging recently. I realized the load() method takes a module: air.ir.Module as an argument, which is then never used.

This is confusing.

The abstract base class (AirBackend) is flexible, because we can specify a unique CompiledArtifact for XRTBackend. I think this confusion would be fixed if the CompiledArtifact for the XRTBackend were something like the following sketch:

from dataclasses import dataclass

@dataclass
class XRTArtifact:
    xclbin: str  # path to the generated xclbin
    insts: str   # path to the instruction file

e.g., the compiled artifacts are a pair of files (or file paths) pointing to the xclbin and instruction file.
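
To make the proposal concrete, here is a rough sketch of how the two methods might then fit together (hypothetical; the names come from the issue text, not from the current API):

# Hypothetical flow under the proposed change: compile() produces an
# XRTArtifact, and load() consumes that artifact instead of an unused module.
artifact = backend.compile(air_module)  # -> XRTArtifact(xclbin=..., insts=...)
invoker = backend.load(artifact)        # no air.ir.Module argument needed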

I'm happy to make a PR for this, if others think this is a reasonable change. If there is some history behind the current format that needs to be taken into account, I'm happy to hear it!

Error configuring elfutils

The AIRBIN script requires elfutils. When attempting to build it on one of the machines, we get the following error from configure regarding zstd:

./configure: line 7060: syntax error near unexpected token `ZSTD_COMPRESS,libzstd'
./configure: line 7060: `      PKG_CHECK_MODULES(ZSTD_COMPRESS,libzstd >= 1.4.0,'

The workaround was to copy a configuration file from a working machine. I want to note this here so we can look into it later and remember the temporary fix.
