Giter VIP home page Giter VIP logo

thu-dsp-lab / llvm-project Goto Github PK

View Code? Open in Web Editor NEW
22.0 2.0 14.0 2.06 GB

LLVM OpenCL C compiler suite for ventus GPGPU

Home Page: http://llvm.org

License: Other

C++ 39.41% Shell 0.03% CMake 0.32% Assembly 9.83% C 12.51% Python 1.07% LLVM 35.95% Dockerfile 0.01% HTML 0.20% CSS 0.01% JavaScript 0.02% Emacs Lisp 0.01% Batchfile 0.01% Objective-C 0.46% Objective-C++ 0.10% HLSL 0.01% Cuda 0.07% Pawn 0.01% SWIG 0.01% Rust 0.01%

llvm-project's Introduction

This is the Ventus GPGPU port of LLVM Compiler Infrastructure

Ventus GPGPU is based on RISCV RV32IMAZfinxZve32f ISA with fully redefined concept of V-extension.

The Ventus GPGPU OpenCL compiler based on LLVM is developed by Terapines Technology (Wuhan) Co., Ltd

承影GPGPU OpenCL编译器由Terapines(兆松科技)负责开发

For more architecture detail, please refer to Ventus GPGPU Arch

Getting Started

1: Programs related repositories

Download all the repositories firstly and place them in the same path.

ATTENTION: Remember to check branch for every repository, cause the project are under development, if you get any build errors, feel free to give an issue or just contact authors

2: Build all the programs

Our program is based on LLVM, so the need packages to build ventus are almost the same as what are needed to build LLVM, you can refer to official website for detailed llvm building guidance, we just list most important needed packages here.

  • ccache
  • cmake
  • ninja
  • clang

If you see any packages missing information, just install them

The following packages are needed for other repositories:

  • device-tree-compiler
  • bsdmainutils

Run ./build-ventus.sh to automatically build all the programs, but we need to run firstly

  • For developers who want to build Debug version for llvm, export BUILD_TYPE=Debug, since it's set default to be 'Release'
  • export POCL_DIR=<path-to-pocl-dir>, default folder path will be set to be <llvm-ventus-parentFolder>/pocl
  • export OCL_ICD_DIR=<path-to-ocl-icd-dir>, default folder path will be set to be <llvm-ventus-parentFolder>/ocl-icd

You can dive into build-ventus.sh file to see the detailed information about build process

3: Bridge icd loader

Run export VENTUS_INSTALL_PREFIX=<path_to_install> to set VENTUS_INSTALL_PREFIX environment variable(system environment variable recommended)

Run export LD_LIBRARY_PATH=${VENTUS_INSTALL_PREFIX}/lib to tell OpenCL application to use your own built libOpenCL.so, also to correctly locate LLVM shared libraries

Run export OCL_ICD_VENDORS=${VENTUS_INSTALL_PREFIX}/lib/libpocl.so to tell ocl icd loader where the icd driver is.

Finally, run export POCL_DEVICES="ventus" to tell pocl driver which device is available(should we set ventus as default device?).

You will see Ventus GPGPU device is found if your setup is correct.

NOTE: OpenCL host side program should be linked with icd loader -lOpenCL.

$ <pocl-install-dir>/bin/poclcc -l

LIST OF DEVICES:
0:
  Vendor:   THU
    Name:   Ventus GPGPU device
 Version:   2.2 HSTR: THU-ventus-gpgpu

Also, you can try to set POCL_DEBUG=all and run example under <pocl-build-dir> to see the full OpenCL software stack execution pipeline. For example(Work in progress).

4: Compiler using example

we can now use our built compiler to generate an ELF file, and using spike to complete the isa simulation.

Cause the address space requirement in spike, we use a customized linker script for our compiler

Take vecadd.cl below as an example :

__kernel void vectorAdd(__global float* A, __global float* B) {
  unsigned tid = get_global_id(0);
  A[tid] += B[tid];
}

4.1: Generate ELF file

4.1.1 Compile directly

Remember to build libclc too because we need the libclc library

Use command line under the root directory of llvm-ventus

./install/bin/clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu vecadd.cl  ./install/lib/crt0.o -L./install/lib -lworkitem -I./libclc/generic/include -nodefaultlibs ./libclc/riscv32/lib/workitem/get_global_id.cl -O1 -cl-std=CL2.0 -Wl,-T,utils/ldscripts/ventus/elf32lriscv.ld -o vecadd.riscv
4.1.2 Compile step-by-step
  1. Compile OpenCL code to LLVM IR assembly (.ll file):
./install/bin/clang -S -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu vecadd.cl -emit-llvm -o vecadd.ll
  1. Compile LLVM IR to RISC-V assembly or object file:
./install/bin/llc -mtriple=riscv32 -mcpu=ventus-gpgpu vecadd.ll -o vecadd.s
./install/bin/llc -mtriple=riscv32 -mcpu=ventus-gpgpu --filetype=obj vecadd.ll -o vecadd.o
  1. Link essential library: Linking crt0 and libclc All the libclc workitem functions' implementation is included in riscv32clc.o
./install/bin/ld.lld -o vecadd.riscv -T utils/ldscripts/ventus/elf32lriscv.ld vecadd.o ./install/lib/crt0.o ./install/lib/riscv32clc.o -L./install/lib -lworkitem --gc-sections
4.1.3 Compile assembly code to object file (.s to .o)

Take custome instructions custome.s as an example :

vftta.vv v0, v0, v1
vfexp v0, v1
vadd12.vi v0, v1, 8
./install/bin/clang -c -target riscv32 -mcpu=ventus-gpgpu custom.s -o custom.o

4.2: Dump file

./install/bin/llvm-objdump -d --mattr=+v,+zfinx vecadd.riscv >& vecadd.txt

you will see output like below, 0x80000000 is the space address required by spike for _start function, this is the reason why we use a customized linker script

vecadd.riscv:	file format elf32-littleriscv

Disassembly of section .text:

80000000 <_start>:
80000000: 97 21 00 00  	auipc	gp, 2
80000004: 93 81 01 80  	addi	gp, gp, -2048
80000008: 93 0e 00 02  	li	t4, 32
8000000c: d7 fe 0e 0d  	vsetvli	t4, t4, e32, m1, ta, ma
80000010: b7 2e 00 00  	lui	t4, 2
80000014: f3 ae 0e 30  	csrrs	t4, mstatus, t4
80000018: 93 0e 00 00  	li	t4, 0
8000001c: 73 21 60 80  	csrr	sp, 2054
80000020: 73 22 70 80  	csrr	tp, 2055

80000024 <.Lpcrel_hi1>:
80000024: 17 15 00 00  	auipc	a0, 1
80000028: 13 05 85 fe  	addi	a0, a0, -24

....
....
....

or you can check encoding of custom instructions

./install/bin/llvm-objdump -d --mattr=+v,+zfinx custom.o >& custom.txt
custom.o:       file format elf32-littleriscv

Disassembly of section .text:

00000000 <.text>:
       0: 0b c0 00 0e   vftta.vv        v0, v0, v1
       4: 0b 60 10 0a   vfexp   v0, v1
       8: 0b 80 80 00   vadd12.vi       v0, v1, 8

4.3: Running in spike

We need to run the isa simulator to verify our compiler

Use spike from THU and follow the README.md

4.4: Driver using example

Accordingly, after all the building process, you can change directory to <llvm-ventus-parentFolder>/pocl/build/examples/vecadd directory, then export variables as what Bridge icd loader does, finally just execute the file vecadd

5: Github actions

the workflow file is .github/workflows/ventus-build.yml, including below jobs

  • Build llvm
  • Build ocl-icd
  • Build libclc
  • Build isa-simulator
  • Build sumulator-driver
  • Build pocl
  • Isa simulation test
  • GPU-rodinia testsuite
  • Pocl testing

6: Docker image

If the user needs to build the toolchain of the Ventus project in an environment other than Ubuntu, such as the CentOS system, we provide the Dockerfile for building the CentOS image. The file is under '.github/workflows/containers/dockerfiles'.

Note: When using build-ventus.sh to build the instantiated centos container, the following modifications are required, which are different from the above "2: Build all the programs":

--- a/build-ventus.sh
+++ b/build-ventus.sh
@@ -119,6 +119,8 @@ build_llvm() {
     -DLLVM_CCACHE_BUILD=ON \
     -DLLVM_OPTIMIZED_TABLEGEN=ON \
     -DLLVM_PARALLEL_LINK_JOBS=12 \
+    -DCMAKE_C_COMPILER=clang \
+    -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
     -DLLVM_ENABLE_PROJECTS="clang;lld;libclc" \
     -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86;RISCV" \
@@ -232,7 +234,7 @@ export_elements() {
   export SPIKE_TARGET_DIR=${VENTUS_INSTALL_PREFIX}
   export VENTUS_INSTALL_PREFIX=${VENTUS_INSTALL_PREFIX}
   export POCL_DEVICES="ventus"
-  export OCL_ICD_VENDORS=${VENTUS_INSTALL_PREFIX}/lib/libpocl.so
+  export OCL_ICD_VENDORS=${VENTUS_INSTALL_PREFIX}/lib64/libpocl.so
 }

llvm-project's People

Contributors

lattner avatar topperc avatar rksimon avatar espindola avatar rotateright avatar tkremenek avatar ddunbar avatar arsenm avatar douggregor avatar d0k avatar rui314 avatar zygoloid avatar maskray avatar chandlerc avatar isanbard avatar echristo avatar nico avatar dwblaikie avatar rnk avatar chapuni avatar nikic avatar labath avatar akyrtzi avatar stoklund avatar jdevlieghere avatar klausler avatar eefriedman avatar tobiasgrosser avatar lhames avatar fhahn avatar

Stargazers

drew-drew avatar  avatar Junyi Mei avatar  avatar Payton Byrd avatar fatih avatar Xiao avatar Jing Zhang avatar  avatar Kika avatar XADE avatar  avatar  avatar mengou avatar  avatar  avatar MbjYjbpivj avatar Seungkwan Kang avatar AI_amateur avatar  avatar Kevin avatar Hu He avatar

Watchers

 avatar  avatar

llvm-project's Issues

Join指令插入的顺序

之前是默认插入到PostIDomBB的最前面,但是目前vmv的设计好像有点问题,见#16
1 : 当vmv指令出现在分支时。拷贝操作应该是当前线程mask为1时才进行
2 :join操作应该是在BB的最后一个vmv指令之后

访存指令匹配的错误

在tablegen中写访存指令的匹配,有一定的局限性,还需要在源代码中修改以解决访存指令错误的问题

[CTS] basic/if failed

  1. run test_basic if failed
  2. simple change case to test
    A. set 32 threads , run test_basic if 32
    B.
    WXWorkLocal_16944375885891
  3. run failed
    input data has 0x1 ,but output data is 0x7fffffff

WXWorkLocal_16944229622649

[gpu-rodiania] illegal fmv.w.x instruction

Using the ventus-llvm commit :b9fa6ff

run the gpu-rodinia project - gaussian test, you will get illegal instruction as below

d3 82 03 f0  	fmv.w.x	t0, t2

but as hardware colleges commented, this istruction is llegal, so when we need to bitcast f32(GPRF32) to i32(GPR), we can all use vmv.v.x instruction

多参数处理的问题

在ventus中,目前只有printf函数的实现是用了多参数。本身代码是用C语言写的,见printf分支代码,以下面的代码为例

int *Buffer = (int *)0xa0024000;
int pos = -1;
int printf(const __attribute__((address_space(1)))  char *format, ...) {
  __builtin_va_list arg_list;
  __builtin_va_start(arg_list, format);
  int pos = -1;
  // 往这个地址写东西: print_buffer_addr
  while (*format != '\0') {
    pos++;
    if (*format == '%') {
      format++;
      if (*format == 'd') {
        int val = __builtin_va_arg(arg_list, int);
        Buffer[pos] = val;
      }
      format++;
    } else {
      // printf("%c", *format);
      Buffer[pos] = *format;
      format++;
    }
  }
  __builtin_va_end(arg_list);
}

标准的riscv32代码如下:

printf:                                 # @printf
# %bb.0:
        addi    sp, sp, -64
        sw      ra, 28(sp)                      # 4-byte Folded Spill
        sw      s0, 24(sp)                      # 4-byte Folded Spill
        addi    s0, sp, 32
        sw      a7, 28(s0)
        sw      a6, 24(s0)
        sw      a5, 20(s0)
        sw      a4, 16(s0)
        sw      a3, 12(s0)
        sw      a2, 8(s0)
        sw      a1, 4(s0)
        sw      a0, -16(s0)
        addi    a0, s0, 4
        sw      a0, -20(s0)
        li      a0, -1
        sw      a0, -24(s0)
        j       .LBB0_1
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        lw      a0, -16(s0)
        lbu     a0, 0(a0)
        beqz    a0, .LBB0_8
        j       .LBB0_2
.LBB0_2:                                #   in Loop: Header=BB0_1 Depth=1
        lw      a0, -24(s0)
        addi    a0, a0, 1
        sw      a0, -24(s0)
        lw      a0, -16(s0)
        lbu     a0, 0(a0)
        li      a1, 37
        bne     a0, a1, .LBB0_6
        j       .LBB0_3
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1
        lw      a0, -16(s0)
        addi    a0, a0, 1
        sw      a0, -16(s0)
        lw      a0, -16(s0)
        lbu     a0, 0(a0)
        li      a1, 100
        bne     a0, a1, .LBB0_5
        j       .LBB0_4
.LBB0_4:                                #   in Loop: Header=BB0_1 Depth=1
        lw      a0, -20(s0)
        addi    a1, a0, 4
        sw      a1, -20(s0)
        lw      a0, 0(a0)
        sw      a0, -28(s0)
        lw      a0, -28(s0)
        lui     a1, %hi(Buffer)
        lw      a1, %lo(Buffer)(a1)
        lw      a2, -24(s0)
        slli    a2, a2, 2
        add     a1, a1, a2
        sw      a0, 0(a1)
        j       .LBB0_5
.LBB0_5:                                #   in Loop: Header=BB0_1 Depth=1
        lw      a0, -16(s0)
        addi    a0, a0, 1
        sw      a0, -16(s0)
        j       .LBB0_7
.LBB0_6:                                #   in Loop: Header=BB0_1 Depth=1
        lw      a0, -16(s0)
        lbu     a0, 0(a0)
        lui     a1, %hi(Buffer)
        lw      a1, %lo(Buffer)(a1)
        lw      a2, -24(s0)
        slli    a2, a2, 2
        add     a1, a1, a2
        sw      a0, 0(a1)
        lw      a0, -16(s0)
        addi    a0, a0, 1
        sw      a0, -16(s0)
        j       .LBB0_7
.LBB0_7:                                #   in Loop: Header=BB0_1 Depth=1
        j       .LBB0_1
.LBB0_8:
        lw      a0, -12(s0)
        lw      ra, 28(sp)                      # 4-byte Folded Reload
        lw      s0, 24(sp)                      # 4-byte Folded Reload
        addi    sp, sp, 64
        ret
                                        # -- End function
Buffer:
        .word   2684502016

pos:
        .word   4294967295                      # 0xffffffff

                                        # DW_AT_external
                                        # DW_AT_external
                                        # DW_AT_prototyped
                                        # DW_AT_external

这里的sw a0, -20(s0)是lower vastart操作,就是把可变参数的首地址存在栈上。而这个地址是跟栈帧(s0)相关的,在ventus的架构中,VLW/VSW的操作是由硬件来完成地址偏移计算的,所以按照标准的RISCV32标准来生成代码,如此类似的指令执行起来是有问题的,见这个PR: 649f4da

llvm libclc compiler-rt precision issue

目前ventus libclc里面的函数实现借鉴了llvm仓库的两个子文件夹

  • libclc(builtin以及workitem函数)
  • compiler-rt(因为目前承影架构不支持64位,所以采用了软浮点的方式支持了64位)

函数调用的时候generic_addressing_space自动推导的问题

OpenCL 2.0支持generic address space,所以在定义一个函数的时候可以不指定指针的地址空间,那么在调用该函数的时候可以传入任何地址空间的指针,所以函数内部不知道怎么翻译有关该指针的访存指令。可以看下面这个例子:

// RUN: clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu -S %s
int test2(private int *a) {
    return *a + 1;
}

int test3(int *a) {
    return *a + 1;
}

int test1() {
    private int a[2];
    return test2(a) + test3(a);
}

这里test3和test2翻成了不同的访存指令。下面是汇编:

test2:
	addi	sp, sp, 4
	sw	ra, -4(sp)
	vlw.v	v0, 0(v0)
	vadd.vi	v0, v0, 1
	lw	ra, -4(sp)
	addi	sp, sp, -4
	ret
.Lfunc_end0:
	.size	test2, .Lfunc_end0-test2

	.globl	test3
	.p2align	2
	.type	test3,@function
test3:
	addi	sp, sp, 4
	sw	ra, -4(sp)
	vlw12.v	v0, 0(v0)
	vadd.vi	v0, v0, 1
	lw	ra, -4(sp)
	addi	sp, sp, -4
	ret
.Lfunc_end1:
	.size	test3, .Lfunc_end1-test3

	.globl	test1
	.p2align	2
	.type	test1,@function
test1:
	addi	sp, sp, 4
	sw	ra, -4(sp)
	lw	ra, -4(sp)
	addi	sp, sp, -4
	ret

栈的空间调整的问题

在ventus的架构中,其实是有两个栈空间的,目前我们没有完全分开这两种栈空间的计算,或者说有些计算错误目前也未知

Discussion about test

目前的测试其实包括了很多模块,但是主要还是针对pocl和spike的功能测试,当前的测试比较分散,没有加入CTS,只有pocl和rodinia的部分测试,个人觉得不是很完善,应该把CTS通过的部分也加入进去,以下想法:

1 : 创建一个子仓库,里面存所有待测试的可执行文件,文件层级按目前的仓库来划分。类似

pocl
	vecadd
	matadd
gpurodia
	nn
	gaussian
CTS
	testconformance
		basic
				if
				kernel_call_kernel
		compiler
		...

每次有通过的新用例可以往这个子仓库push,然后写一个脚本来跑测试,以后每次PR之前,都做一个这个测试,保证更改之后之前的测试都能过,这是一个大致的指导思路


当然总体思路是要保证能测试到目前所有通过的用例,然后后面每次修改都要保证通过前面所有的测试,而且前面每一次测试的dump和log文件最好能够保留,这样后面PR用例失败的时候,有新旧log比对

The most common problem when running test

1 : load/store instructions

*  wrong function return address 
*  wrong load/store instructions
	* load from private stack, but use vlw12 ins
	* load from lds, but use vlw ins

* Wrong function return address	
	* 这种情况最有可能出现在llvm commit之后导致libclc库的函数实现出现问题
	*  crt0.S文件里面也有一小部分关于ra寄存器的spill/restore

fails to compile vector_dot_product testcase

my environment is ubuntu 20.04, and I compile example vecadd.cl successfully, but for this testcase it fails.
command is:
/opt/llvm-project/install/bin/clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu -O1 -S example1.cl -o example1.s
source file example1.cl is:

#define USE_VECTOR_DATATYPES

__kernel void dot_product (__global const float4 *a,
             __global const float4 *b, __global float *c)
{
  int gid = get_global_id(0);

#ifndef USE_VECTOR_DATATYPES
  /* This version is to smoke test the autovectorization.
     Tries to create parallel regions with nice memory
     access pattern etc. so it gets autovectorizer. */
  /* This parallel region does not vectorize with the
     loop vectorizer because it accesses vector datatypes.
     Perhaps with SLP/BB vectorizer.*/

  float ax = a[gid].x;
  float ay = a[gid].y;
  float az = a[gid].z;
  float aw = a[gid].w;

  float bx = b[gid].x,
      by = b[gid].y,
      bz = b[gid].z,
      bw = b[gid].w;

  barrier(CLK_LOCAL_MEM_FENCE);

  /* This parallel region should vectorize. */
  c[gid] = ax * bx;
  c[gid] += ay * by;
  c[gid] += az * bz;
  c[gid] += aw * bw;

#else
  float4 prod = a[gid] * b[gid];
  c[gid] = prod.x + prod.y + prod.z + prod.w;
#endif

}

the report is:

clang: /opt/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:879: bool llvm::RISCVDAGToDAGISel::SelectAddrRegReg(llvm::SDValue, llvm::SDValue&, llvm::SDValue&): Assertion `0 && "TODO"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /opt/llvm-project/install/bin/clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu -O1 -S example1.cl -o example1.s
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'example1.cl'.
4.      Running pass 'RISCV DAG->DAG Pattern Instruction Selection' on function '@dot_product'
 #0 0x00007f81ee6e3298 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /opt/llvm-project/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x00007f81ee6e3353 PrintStackTraceSignalHandler(void*) /opt/llvm-project/llvm/lib/Support/Unix/Signals.inc:641:1
 #2 0x00007f81ee6e0f5e llvm::sys::RunSignalHandlers() /opt/llvm-project/llvm/lib/Support/Signals.cpp:104:20
 #3 0x00007f81ee6e2ab9 llvm::sys::CleanupOnSignal(unsigned long) /opt/llvm-project/llvm/lib/Support/Unix/Signals.inc:366:31
 #4 0x00007f81ee52a87b (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) /opt/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:77:5
 #5 0x00007f81ee52adf6 CrashRecoverySignalHandler(int) /opt/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:398:1
 #6 0x00007f81edea3090 (/lib/x86_64-linux-gnu/libc.so.6+0x43090)
 #7 0x00007f81edea300b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b)
 #8 0x00007f81ede82859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22859)
 #9 0x00007f81ede82729 (/lib/x86_64-linux-gnu/libc.so.6+0x22729)
#10 0x00007f81ede93fd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
#11 0x00007f81f9313dfa llvm::RISCVDAGToDAGISel::SelectAddrRegReg(llvm::SDValue, llvm::SDValue&, llvm::SDValue&) /opt/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:886:21
#12 0x00007f81f931e2c3 llvm::RISCVDAGToDAGISel::CheckComplexPattern(llvm::SDNode*, llvm::SDNode*, llvm::SDValue, unsigned int, llvm::SmallVectorImpl<std::pair<llvm::SDValue, llvm::SDNode*>>&) /opt/llvm-project/build/lib/Target/RISCV/RISCVGenDAGISel.inc:17100:78
#13 0x00007f81eca68299 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3119:31
#14 0x00007f81f931974e llvm::RISCVDAGToDAGISel::SelectCode(llvm::SDNode*) /opt/llvm-project/build/lib/Target/RISCV/RISCVGenDAGISel.inc:16112:1
#15 0x00007f81f9312556 llvm::RISCVDAGToDAGISel::Select(llvm::SDNode*) /opt/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp:614:13
#16 0x00007f81eca5dac9 llvm::SelectionDAGISel::DoInstructionSelection() /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1105:32
#17 0x00007f81eca5cd34 llvm::SelectionDAGISel::CodeGenAndEmitDAG() /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:935:61
#18 0x00007f81eca5b3f7 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, bool&) /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:689:1
#19 0x00007f81eca6031a llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1604:33
#20 0x00007f81eca59e9b llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:469:7
#21 0x00007f81f93195a0 llvm::RISCVDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) /opt/llvm-project/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h:40:3
#22 0x00007f81f650ed0a llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /opt/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:33
#23 0x00007f81ef98eb40 llvm::FPPassManager::runOnFunction(llvm::Function&) /opt/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1430:20
#24 0x00007f81ef98ee09 llvm::FPPassManager::runOnModule(llvm::Module&) /opt/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1476:13
#25 0x00007f81ef98f27b (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /opt/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1545:20
#26 0x00007f81ef98a19c llvm::legacy::PassManagerImpl::run(llvm::Module&) /opt/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:535:13
#27 0x00007f81ef98fb51 llvm::legacy::PassManager::run(llvm::Module&) /opt/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1673:1
#28 0x00007f81f7674458 (anonymous namespace)::EmitAssemblyHelper::RunCodegenPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) /opt/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1093:51
#29 0x00007f81f7674660 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) /opt/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1118:17
#30 0x00007f81f76756e6 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) /opt/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1274:25
#31 0x00007f81f7e5116d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /opt/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:381:24
#32 0x00007f81e77dd7d7 clang::ParseAST(clang::Sema&, bool, bool) /opt/llvm-project/clang/lib/Parse/ParseAST.cpp:203:14
#33 0x00007f81f4376a6f clang::ASTFrontendAction::ExecuteAction() /opt/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1162:11
#34 0x00007f81f7e4ae58 clang::CodeGenAction::ExecuteAction() /opt/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:1170:5
#35 0x00007f81f437632e clang::FrontendAction::Execute() /opt/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1059:38
#36 0x00007f81f4288109 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /opt/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1045:42
#37 0x00007f81f9164966 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /opt/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:264:38
#38 0x0000561f27804aa1 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /opt/llvm-project/clang/tools/driver/cc1_main.cpp:251:40
#39 0x0000561f277f2bca ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) /opt/llvm-project/clang/tools/driver/driver.cpp:319:20
#40 0x00007f81f3bb5f69 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()::operator()() const /opt/llvm-project/clang/lib/Driver/Job.cpp:428:32
#41 0x00007f81f3bb6575 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) /opt/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:40
#42 0x00007f81ee516c18 llvm::function_ref<void ()>::operator()() const /opt/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:62
#43 0x00007f81ee52b008 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /opt/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:434:10
#44 0x00007f81f3bb6187 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const /opt/llvm-project/clang/lib/Driver/Job.cpp:428:7
#45 0x00007f81f3b39bf8 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const /opt/llvm-project/clang/lib/Driver/Compilation.cpp:200:22
#46 0x00007f81f3b39f86 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const /opt/llvm-project/clang/lib/Driver/Compilation.cpp:254:62
#47 0x00007f81f3b52acb clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) /opt/llvm-project/clang/lib/Driver/Driver.cpp:1816:28
#48 0x0000561f277f4196 clang_main(int, char**) /opt/llvm-project/clang/tools/driver/driver.cpp:520:39
#49 0x0000561f27826c5e main /opt/llvm-project/build/tools/clang/tools/driver/clang-driver.cpp:11:63
#50 0x00007f81ede84083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#51 0x0000561f277f125e _start (/opt/llvm-project/install/bin/clang+0x4f25e)
clang-16: error: clang frontend command failed with exit code 134 (use -v to see invocation)
clang version 16.0.0 (https://github.com/THU-DSP-LAB/llvm-project.git 60a504f481826a3e1fa84aa120786110e8714944)
Target: riscv32
Thread model: posix
InstalledDir: /opt/llvm-project/install/bin
clang-16: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-16: note: diagnostic msg: /tmp/example1-4fb1fd.cl
clang-16: note: diagnostic msg: /tmp/example1-4fb1fd.sh
clang-16: note: diagnostic msg:

********************

[CTS] Basic下的kernel_limit_constants测试

__kernel void test( __global int *intOut, __global float *floatOut ) 
{
  intOut[0] = isinf( 1.0 ) ? 1 : 0;
  intOut[1] = isnormal( MAXFLOAT ) ? 1 : 0;
  intOut[2] = isnan( MAXFLOAT ) ? 1 : 0;
  intOut[3] = sizeof( MAXFLOAT );
  intOut[4] = ( MAXFLOAT == FLT_MAX ) ? 1 : 0;
  intOut[6] = ( MAXFLOAT == MAXFLOAT ) ? 1 : 0;
  intOut[7] = ( MAXFLOAT == 0x1.fffffep127f ) ? 1 : 0;
  floatOut[0] = MAXFLOAT;
}

目前isinf(MAX_FLOAT)这个函数取得的结果有点问题

atomic函数的支持

目前atomic函数的支持并不完整,需要支持更多的atomic函数


可以参考pocl里面的实现,支持更多的atomic函数

[CTS] basic大测试下的constant小测试

800000a8 <loop_constant_kernel>:
800000a8: 13 01 01 01  	addi	sp, sp, 16
800000ac: 23 20 11 00  	sw	ra, 0(sp)
800000b0: 83 22 85 00  	lw	t0, 8(a0)
800000b4: 23 2c 51 fe  	sw	t0, -8(sp)
800000b8: 23 2e a1 fe  	sw	a0, -4(sp)
800000bc: 83 22 05 00  	lw	t0, 0(a0)
800000c0: 23 2a 51 fe  	sw	t0, -12(sp)
800000c4: 57 40 00 5e  	vmv.v.x	v0, zero
800000c8: ef 00 c0 05  	jal	0x80000124 <_Z13get_global_idj>
800000cc: 83 24 81 ff  	lw	s1, -8(sp)
800000d0: b7 32 00 80  	lui	t0, 524291
800000d4: 63 56 90 02  	blez	s1, 0x80000100 <.LBB0_3>
800000d8: 03 23 c1 ff  	lw	t1, -4(sp)
800000dc: 03 23 43 00  	lw	t1, 4(t1)
800000e0: 83 a2 82 00  	lw	t0, 8(t0)
800000e4: 93 03 10 00  	li	t2, 1

800000e8 <.LBB0_2>:
800000e8: 03 24 03 00  	lw	s0, 0(t1)
800000ec: d3 f2 82 00  	fadd.s	t0, t0, s0
800000f0: b3 84 74 40  	sub	s1, s1, t2
800000f4: 13 03 c3 00  	addi	t1, t1, 12
800000f8: e3 98 04 fe  	bnez	s1, 0x800000e8 <.LBB0_2>
800000fc: 6f 00 80 00  	j	0x80000104 <.LBB0_4>

80000100 <.LBB0_3>:
80000100: 83 a2 82 00  	lw	t0, 8(t0)

80000104 <.LBB0_4>:
80000104: 57 30 01 96  	vsll.vi	v0, v0, 2
80000108: 03 23 41 ff  	lw	t1, -12(sp)
8000010c: 57 40 03 02  	vadd.vx	v0, v0, t1
80000110: d7 c0 02 5e  	vmv.v.x	v1, t0
80000114: 7b 60 10 00  	vsw12.v	v1, 0(v0)
80000118: 83 20 01 00  	lw	ra, 0(sp)
8000011c: 13 01 01 ff  	addi	sp, sp, -16
80000120: 67 80 00 00  	ret

这个是测试失败的编译器生成的汇编信息

Epilog 信息没有恢复正确

因为目前在恢复TP栈帧的时候,只恢复了TP,其实我们在访存指令的时候用到的是v32寄存器,所以还需要有恢复v32寄存器值这一步

[rodinia] b+tree findK生成指令错误

复现步骤:简化case,将command.txt修改为k 1
错误描述:

  1. 参数*KnodesD返回给pocl的是4 byte,直接指令生成时按8byte取,造成越界访问。
  2. 参数*ansD参数为40(a0), 指令中没有对该参数的读取及访问,造成结果未写出。
  3. 在vmsleu.vv指令的逻辑错误可能存在问题,该指令后并未有对offsetD[bid]赋值的指令。造成数据逻辑错误。
    WXWorkLocal_16989742161479
    WXWorkLocal_16992546869962
    WXWorkLocal_1699255316492

支持regexti指令的生成

目前承影架构中所有带立即数的向量指令,都只有5bit,regexti指令是扩展十一位立即数
regexti的立即数字段和下一条指令的立即数字段拼起来形成一个11位立即数,然后对这个数进行有符号扩展

O1参数优化下branch指令生成错误

对于非kernel函数,在优化参数O1下会将局部变量直接存到标量寄存器使用,而不是先存到tp栈上再取出使用,这会导致生成的分支指令为标量指令,导致有些情况下的结果错误,可以看下面这个例子:
源码:

int loop(int x) {
    for (int i = 0; i != 64; i=i+2) {
        if (x - i) return i;
    }
    return 0;
}

-O0:

loop:
        addi    sp, sp, 4
        addi    tp, tp, 12
        regext  zero, zero, 1
        vmv.v.x v32, tp
        sw      ra, -4(sp)
        regext  zero, zero, 8
        vsw.v   v0, -8(v32)
        li      t0, 0
        vmv.v.x v0, t0
        regext  zero, zero, 8
        vsw.v   v0, -12(v32)
        j       .LBB0_1
.LBB0_1:
        regext  zero, zero, 8
        vlw.v   v0, -12(v32)
        li      t0, 64
        vmv.v.x v1, t0
.Lpcrel_hi0:
        auipc   t1, %pcrel_hi(.LBB0_7)
        setrpc  zero, t1, %pcrel_lo(.Lpcrel_hi0)
        vbeq    v0, v1, .LBB0_6
        j       .LBB0_2
.LBB0_2:
        regext  zero, zero, 8
        vlw.v   v0, -8(v32)
        regext  zero, zero, 8
        vlw.v   v1, -12(v32)
        vsub.vv v0, v0, v1
        li      t0, 0
        vmv.v.x v1, t0
.Lpcrel_hi1:
        auipc   t1, %pcrel_hi(.LBB0_7)
        setrpc  zero, t1, %pcrel_lo(.Lpcrel_hi1)
        vbeq    v0, v1, .LBB0_4
        j       .LBB0_3
.LBB0_3:
        regext  zero, zero, 8
        vlw.v   v0, -12(v32)
        regext  zero, zero, 8
        vsw.v   v0, -4(v32)
        j       .LBB0_7
.LBB0_4:
        j       .LBB0_5
.LBB0_5:
        regext  zero, zero, 8
        vlw.v   v0, -12(v32)
        vadd.vi v0, v0, 2
        regext  zero, zero, 8
        vsw.v   v0, -12(v32)
        j       .LBB0_1
.LBB0_6:
        li      t0, 0
        vmv.v.x v0, t0
        regext  zero, zero, 8
        vsw.v   v0, -4(v32)
        j       .LBB0_7
.LBB0_7:
        join    zero, zero, 0
        regext  zero, zero, 8
        vlw.v   v0, -4(v32)
        lw      ra, -4(sp)
        addi    sp, sp, -4
        addi    tp, tp, -12
        regext  zero, zero, 1
        vmv.v.x v32, tp
        ret

-O1:

loop:
        li      t0, 0
        li      t2, 64
.LBB0_1:
        vmv.v.x v1, t0
.Lpcrel_hi0:
        auipc   t1, %pcrel_hi(.LBB0_5)
        setrpc  zero, t1, %pcrel_lo(.Lpcrel_hi0)
        vbne    v0, v1, .LBB0_4
        addi    t0, t0, 2
        bne     t0, t2, .LBB0_1
        vmv.v.x v0, zero
        j       .LBB0_5
.LBB0_4:
        vmv.v.x v0, t0
        j       .LBB0_5
.LBB0_5:
        join    zero, zero, 0
        ret

可以发现在O1优化下返回值有问题。

函数返回值过大的问题

当前函数返回值最多支持用16个寄存器(V0-V15)来接收,所以当返回值大小超过4x16个字节时(比如double16类型)就会出现问题。
参考下面这个用例:

double16 func(double16 x, double16 y)
{
    return x + y;
}

[libclc] ctz函数实现的缺失

跑integer_ops测试下的integer_clz测试发现函数缺失的问题。同时也发现了regexti指令插入位置的问题???

[CTS] Invalid Object Idx

  • 子用例:

progvar_prog_scope_init
progvar_prog_scope_uninit
progvar_prog_scope_misc

  • 报错信息:
    image

[CTS] Basic测试下的parameter_types测试

kernel void test_kernel(
char c, uchar uc, short s, ushort us, int i, uint ui, float f,
global float *result)
{

result[0] =  (c);
result[1] =  (uc);
result[2] =  (s);
result[3] =  (us);
result[4] =  (i);
result[5] =  (ui);
result[6] = f;
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.