hsafoundation / hlc-hsail-development-llvm Goto Github PK
View Code? Open in Web Editor NEWHSAIL LLVM Tree - Development has stopped on this branch This was a development branch
License: Other
HSAIL LLVM Tree - Development has stopped on this branch This was a development branch
License: Other
I'm using the latest HLC compiler from the branch hsail-review-v2.
Consider the following OpenCL program:
struct Foo
{
int a;
int b;
};
volatile int *DbgAddr = (int *)4;
__attribute__((noinline)) void Func(const struct Foo *f)
{
DbgAddr[0] = f->a;
DbgAddr[1] = f->b;
}
__kernel void Bug()
{
struct Foo f = {1, 2};
Func(&f);
}
This OpenCL program gets compiled into the following HSAIL code:
module &__llvm_hsail_module:1:0:$full:$large:$near;
decl function &Func()(arg_u64 %f);
function &Func()(arg_u64 %f)
{
// BB#0:
ld_arg_align(8)_u64 $d0, [%f];
ld_align(4)_u32 $s0, [$d0];
st_align(4)_u32 $s0, [4];
ld_align(4)_u32 $s0, [$d0+4];
st_align(8)_u32 $s0, [8];
ret;
};
prog kernel &__OpenCL_Bug_kernel(
kernarg_u64 %__global_offset_0,
kernarg_u64 %__global_offset_1,
kernarg_u64 %__global_offset_2,
kernarg_u64 %__printf_buffer,
kernarg_u64 %__vqueue_pointer,
kernarg_u64 %__aqlwrap_pointer)
{
align(8) private_u8 %__privateStack[8];
// BB#0:
st_private_align(4)_u32 2, [%__privateStack][4];
st_private_align(8)_u32 1, [%__privateStack];
lda_private_u32 $s0, [%__privateStack];
cvt_u64_u32 $d0, $s0;
{
arg_u64 %f;
st_arg_align(8)_u64 $d0, [%f];
call &Func () (%f);
}
ret;
};
In the kernel __OpenCL_Bug_kernel
, the address of of f
is obtained by using the lda_private_u32
instruction, which computes an address relative to the private segment. However, in the function Func
, a flat ld
instruction is used to read from that address.
Shouldn't either be the lda
instruction flat, or the ld
instruction private, so that the segments they refer to match?
I'm using the latest llc
from the branch hsail-stable-3.7
and the latest CLOC.sh
from the master branch of the CLOC
repository.
Consider the following OpenCL program:
__kernel void Bug2(__global unsigned short *a, __global long *b)
{
*b = *a;
}
Compiling this program with the CLOC.sh
gives the following HSAIL code:
module &__llvm_hsail_module:1:0:$full:$large:$near;
prog kernel &__OpenCL_Bug2_kernel(
kernarg_u64 %__global_offset_0,
kernarg_u64 %__global_offset_1,
kernarg_u64 %__global_offset_2,
kernarg_u64 %__printf_buffer,
kernarg_u64 %__vqueue_pointer,
kernarg_u64 %__aqlwrap_pointer,
kernarg_u64 %a,
kernarg_u64 %b)
{
// BB#0:
ld_kernarg_align(8)_width(all)_u64 $d0, [%a];
ld_global_align(2)_u8 $s0, [$d0];
cvt_u64_u32 $d0, $s0;
ld_kernarg_align(8)_width(all)_u64 $d1, [%b];
st_global_align(8)_u64 $d0, [$d1];
ret;
};
The second load instruction in the kernel (ld_global_align(2)_u8
) loads only a single byte instead of the whole unsigned short
from memory. Interestingly enough, when changing the type of b
to e.g. int
, the correct 16-bit load instruction is generated.
This code in HSAILTargetLowering::LowerArgument results in an inconsistency in how types that need to be promoted are handled.
Type *sType = type->getScalarType();
EVT argVT = Ins ? (_Ins)[ArgNo].VT : (_Outs)[ArgNo].VT;
if (sType->isIntegerTy(8))
argVT = MVT::i8;
else if (sType->isIntegerTy(16))
argVT = MVT::i16;
The default ABI behavior for < i32 return values is to promote to i32 (see getTypeForExtArgOrReturn).
This is what happens for i1, but because of this special casing of i8 and i16, the return store is lowered as the small type despite the parameter being the full i32 type. The argument / return lowering should stop looking at the IR type, and use the computed set of VTs the argument is broken up into. There should also be a calling convention file to handle more of this kind of type logic.
Currently these are only usable from intrinsics (which should be removed since they are easy to match)
These should be treated as an 8-bit byte load. If the ABI requires these are promoted to 32-bit and occupy that much space, they should be marked with zeroext parameter attributes
These are currently selected to firstbit / lastbit, but these have different behavior on zero input. For the defined version, the LLVM intrinsic is supposed to return the size of the type in bits on zero input. firstbit / last bit return 0xffffffff, so the non-undef version needs to combine this with a mask for the size of the type.
%r2 = fcmp olt double %r0, %r1
%r3 = select i1 %r2, double %r0, double %r1
This sequence is incorrectly emitted as min_f64. This does not have the same NaN behavior as the IEEE minNum/maxNum behavior the instruction implements.
This test case (https://gist.github.com/0dd7835fb7e9087851a5.git) causes an error in HLC:
llc -O2 -march=hsail -filetype=asm -o dump.hsail dump.opt.ll -debug
********** COMPUTING STACKMAP LIVENESS: ZZN4Sort15SortIntegerKeysERN11Concurrency5arrayIjLi1EEES3_S3_iEN3_EC__219__cxxamp_trampolineEPjiiS5_iiS5_iii **********
terminate called after throwing an instance of 'SyntaxError'
#0 0x1ccc6db llvm::sys::PrintStackTrace(_IO_FILE*) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccc6db)
#1 0x1ccc976 PrintStackTraceSignalHandler(void*) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccc976)
#2 0x1ccb3cf SignalHandler(int) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccb3cf)
#3 0x7fee225f0340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#4 0x7fee21806cc9 gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#5 0x7fee2180a0d8 abort /build/buildd/eglibc-2.19/stdlib/abort.c:91:0
#6 0x7fee22118535 __gnu_cxx::__verbose_terminate_handler() (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x60535)
#7 0x7fee221166d6 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e6d6)
#8 0x7fee22116703 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e703)
#9 0x7fee22116922 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e922)
#10 0x1cf6254 HSAIL_ASM::Brigantine::handleError(SyntaxError const&) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf6254)
#11 0x1cf6372 HSAIL_ASM::Brigantine::brigWriteError(char const*, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf6372)
#12 0x1cf8b59 HSAIL_ASM::Brigantine::createRef(HSAIL_ASM::SRef const&, HSAIL_ASM::OperandRegister, long, bool, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf8b59)
#13 0xf97086 HSAIL_ASM::Brigantine::createRef(HSAIL_ASM::SRef const&, HSAIL_ASM::SRef&, long, bool, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0xf97086)
#14 0xfa57f5 llvm::BRIGAsmPrinter::BrigEmitOperandLdStAddress(llvm::MachineInstr const*, unsigned int, unsigned int) (/opt/amd/hlc.hsail-1.0f/llc+0xfa57f5)
#15 0xfa7cf8 llvm::BRIGAsmPrinter::BrigEmitInstMem(llvm::MachineInstr const&, unsigned int) (/opt/amd/hlc.hsail-1.0f/llc+0xfa7cf8)
#16 0xfa190d llvm::BRIGAsmPrinter::EmitInstructionImpl(llvm::MachineInstr const*) (/opt/amd/hlc.hsail-1.0f/llc+0xfa190d)
#17 0xfa1686 llvm::BRIGAsmPrinter::EmitInstruction(llvm::MachineInstr const*) (/opt/amd/hlc.hsail-1.0f/llc+0xfa1686)
#18 0x14e03c1 llvm::AsmPrinter::EmitFunctionBody() (/opt/amd/hlc.hsail-1.0f/llc+0x14e03c1)
#19 0xfa0c0e llvm::BRIGAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (/opt/amd/hlc.hsail-1.0f/llc+0xfa0c0e)
#20 0x164bad5 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/opt/amd/hlc.hsail-1.0f/llc+0x164bad5)
#21 0x191a9cd llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/amd/hlc.hsail-1.0f/llc+0x191a9cd)
#22 0x191ab6e llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191ab6e)
#23 0x191ae8d (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191ae8d)
#24 0x191b592 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191b592)
#25 0x191b7e1 llvm::legacy::PassManager::run(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191b7e1)
#26 0xba3950 compileModule(char**, llvm::LLVMContext&) (/opt/amd/hlc.hsail-1.0f/llc+0xba3950)
#27 0xba2afe main (/opt/amd/hlc.hsail-1.0f/llc+0xba2afe)
#28 0x7fee217f1ec5 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:321:0
#29 0xba00a9 _start (/opt/amd/hlc.hsail-1.0f/llc+0xba00a9)
Stack dump:
0. Program arguments: /opt/amd/hlc.hsail-1.0f/llc -O2 -march=hsail -filetype=asm -o dump.hsail dump.opt.ll -debug
1. Running pass 'Function Pass Manager' on module 'dump.opt.ll'.
2. Running pass 'BRIG Container Filler' on function '@ZZN4Sort15SortIntegerKeysERN11Concurrency5arrayIjLi1EEES3_S3_iEN3_EC__219__cxxamp_trampolineEPjiiS5_iiS5_iii'
Aborted
Note: this is from the Sort C++AMP sample from the APP SDK
I'm on the latest HLC compiler from the branch hsail-stable-3.7
.
OpenCL prohibits the use of non-void
return types in __kernel
functions. Therefore, I would expect the compiler to reject the following OpenCL kernel and terminate with a nice error message. Unfortunately, that's not the case.
__kernel bool Main()
{
return true;
}
It seems like the HLC compiler eats the kernel, but eventually CLOC
fails assembling the resulting HSAIL code. The following error message is emitted:
> st_arg_u32 $s0, [%__OpenCL_Main_kernel];
> ^
input(14,18): Symbol not found: %__OpenCL_Main_kernel
ERROR: The following command failed with return code 1.
HSAILasm -o /tmp/hsa_finalizer-RuZXe8/temp.hsail /tmp/cloc26565/temp.hsail
I'm on the latest HLC compiler from the branch hsail-stable-3.7
. I compiled with -O2
.
The following kernel fails to compile.
__attribute__((noinline)) static uint Op(uint in)
{
return 2*in;
}
__kernel void Bug1(uint in, __global uint *out)
{
*out = Op(in);
}
The HLC compiler crashes with the following error message:
ERROR: The following command failed with return code 134.
./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc13948/temp.hsail /tmp/cloc13948/temp.opt.bc
llc: /home/dgeier/Documents/swarm64/code/hsa/HLC-HSAIL-Development-LLVM/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3346: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, const llvm::SDNodeFlags*): Assertion `N1.getValueType() == N2.getValueType() && N1.getValueType() == VT && "Binary operator types must match!"' failed.
0 llc 0x00000000022348d2 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 59
1 llc 0x0000000002234c26
2 llc 0x00000000022336b1
3 libpthread.so.0 0x00007fe6272b1d10
4 libc.so.6 0x00007fe626451267 gsignal + 55
5 libc.so.6 0x00007fe626452eca abort + 362
6 libc.so.6 0x00007fe62644a03d
7 libc.so.6 0x00007fe62644a0f2
8 llc 0x0000000002091357 llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, llvm::SDNodeFlags const*) + 2285
9 llc 0x00000000017e2ba1 llvm::HSAILTargetLowering::getArgStore(llvm::SelectionDAG&, llvm::SDLoc, llvm::EVT, llvm::Type*, unsigned int, llvm::SDValue, llvm::SDValue, llvm::SDValue, unsigned int, llvm::SDValue, llvm::AAMDNodes const&, unsigned long) const + 741
10 llc 0x00000000017e360d llvm::HSAILTargetLowering::LowerArgument(llvm::SDValue, llvm::SDValue, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const*, llvm::SmallVectorImpl<llvm::ISD::OutputArg> const*, llvm::SDLoc, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>*, unsigned int&, llvm::Type*, unsigned int, char const*, llvm::SDValue, llvm::SmallVectorImpl<llvm::SDValue> const*, bool, llvm::AAMDNodes const&, unsigned long) const + 2131
11 llc 0x00000000017e4ca5 llvm::HSAILTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 4339
12 llc 0x00000000020e5da8 llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const + 4450
13 llc 0x00000000020dc4e2 llvm::SelectionDAGBuilder::lowerInvokable(llvm::TargetLowering::CallLoweringInfo&, llvm::MachineBasicBlock*) + 444
14 llc 0x00000000020dca61 llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::MachineBasicBlock*) + 951
15 llc 0x00000000020df148 llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 2294
16 llc 0x00000000020be4d6 llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&) + 1256
17 llc 0x00000000020bdf49 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 155
18 llc 0x000000000210c037 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 83
19 llc 0x000000000210f3d0 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 2866
20 llc 0x000000000210b261 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1049
21 llc 0x00000000017d2fd9
22 llc 0x0000000001a8061d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 95
23 llc 0x0000000001e0cc9d llvm::FPPassManager::runOnFunction(llvm::Function&) + 305
24 llc 0x0000000001e0ce3c llvm::FPPassManager::runOnModule(llvm::Module&) + 112
25 llc 0x0000000001e0d1b4
26 llc 0x0000000001e0d894 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 252
27 llc 0x0000000001e0dacd llvm::legacy::PassManager::run(llvm::Module&) + 39
28 llc 0x0000000000cb0376
29 llc 0x0000000000caf30a main + 257
30 libc.so.6 0x00007fe62643ca40 __libc_start_main + 240
31 llc 0x0000000000cae049 _start + 41
Stack dump:
0. Program arguments: ./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc13948/temp.hsail /tmp/cloc13948/temp.opt.bc
1. Running pass 'Function Pass Manager' on module '/tmp/cloc13948/temp.opt.bc'.
2. Running pass 'HSAIL DAG->DAG Instruction Selection' on function '@__OpenCL_Bug1_kernel'
/opt/amd/bin/cloc.sh: line 333: 13966 Aborted (core dumped) $HSA_LLVM_PATH/$CMD_LLC -o $TMPDIR/$FNAME.hsail $TMPDIR/$FNAME.opt.bc
Weirdly, after changing the name of the in
argument of Op()
to in2
, the error disappears. The following kernel compilers successfully.
__attribute__((noinline)) static uint Op(uint in2)
{
return 2*in2;
}
__kernel void Bug1(uint in, __global uint *out)
{
*out = Op(in);
}
While testing simple OpenCL image kernels through the HSAIL LLVM I noticed that three nested loops cause the outer loop to branch incorrectly. More specifically, the loop variable is incremented every other iteration of the outer loop. Attached is a simple OpenCL kernel that reproduces the issue. The issue persists up to -O2, but is not present when using -O3.
For this kernel, the HSAIL LLVM create the following outer loop structure (complete HSAIL attached):
…
@BB0_2:
mov_b32 $s5, 0;
not_b1 $c1, $c1;
cbr_b1 $c1, @BB0_3;
br @BB0_7;
…
@BB0_7:
add_u32 $s4, $s4, 1;
cmp_ne_b1_s32 $c2, $s4, $s0;
cbr_b1 $c2, @BB0_2;
The boolean c1
is negated on each iteration, but only every other iteration increments the outer loop variable $s4
. It seems that the not clause is related to a check that asserts that the outer loop is executed at all (in this case, that x_size > -1
).
Pull request #29 fixes this simple case, but more thorough control flow analysis is required to detect all variable uses inside loops.
This IR crashes the machine verifier:
target datalayout = "e-p:32:32-p1:64:64:64-p2:64:64:64-p3:32:32:32-p4:64:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n32"
target triple = "hsail64-pc-unknown-amdopencl"
; Function Attrs: nounwind
define spir_kernel void @hsaPy_hsapy_devfn__5F__5F_main_5F__5F__2E_TestScan_2E_test_5F_single_5F_block_2E__3C_locals_3E__2E_scan_5F_block_24_4_2E_array_28_int64_2C__20_1d_2C__20_C_29__2E_array_28_int64_2C__20_1d_2C__20_C_29_(i8 addrspace(1)* nocapture readnone %dot.2, i64 %dot.3, i64 %dot.4, i64 addrspace(1)* nocapture %dot.5, i64 %dot.6, i64 %dot.7, i8 addrspace(1)* nocapture readnone %dot.8, i64 %dot.9, i64 %dot.10, i64 addrspace(1)* nocapture %dot.11, i64 %dot.12, i64 %dot.13) #0 {
.14:
ret void
}
; Function Attrs: nounwind
define linkonce_odr spir_func i32 @hsapy_devfn__5F__5F_main_5F__5F__2E_device_5F_scan_5F_generic_24_1_2E_int64_2E_array_28_int64_2C__20_1d_2C__20_C_29_(i64* nocapture %dot.ret, i64 %arg.tid, i8* %arg.data.0, i64 %arg.data.1, i64 %arg.data.2, i64 addrspace(4)* %arg.data.3, i64 %arg.data.4.0, i64 %arg.data.5.0) #0 {
entry:
br i1 undef, label %B40.lr.ph, label %B154
B40.lr.ph: ; preds = %entry
%dot.802 = or i64 undef, 1
br label %B40
B40: ; preds = %B130, %B40.lr.ph
br i1 undef, label %B65, label %B130
B65: ; preds = %B40
%dot.134 = add i64 undef, undef
br label %B130
B130: ; preds = %B65, %B40
%dot.198 = sdiv i64 undef, 2
br i1 undef, label %B40, label %B154.loopexit
B154.loopexit: ; preds = %B130
br label %B154
B154: ; preds = %B154.loopexit, %entry
%dot.261 = icmp eq i64 %arg.tid, 0
br i1 %dot.261, label %B206, label %B238.preheader
B206: ; preds = %B154
br label %B238.preheader
B238.preheader: ; preds = %B206, %B154
br i1 undef, label %B250.lr.ph, label %B384
B250.lr.ph: ; preds = %B238.preheader
%dot.377 = add i64 undef, 2
br label %B250
B250: ; preds = %B370, %B250.lr.ph
br i1 undef, label %B285, label %B370
B285: ; preds = %B250
br label %B370
B370: ; preds = %B285, %B250
br i1 undef, label %B250, label %B384.loopexit
B384.loopexit: ; preds = %B370
br label %B384
B384: ; preds = %B384.loopexit, %B238.preheader
ret i32 0
}
attributes #0 = { nounwind }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind readnone "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
If a function has a return type, and the body is only an unreachable instruction, it fails brig validation.
define float @empty_func_with_return_type() {
unreachable
}
produces:
Incompatible identifier declarations
LLVM ERROR:
Brig container validation has failed in BRIGAsmPrinter.cpp
The current assortment of bitselect patterns do not work for simple permutations of the commutable operators.
Additionally, many of these patterns do not work with the pattern they recognize after instcombine.
For example:
define i32 @bitselect_pat4_rrr(i32 %src0, i32 %src1, i32 %src2) #0 {
%tmp0 = and i32 %src0, %src2
%tmp1 = xor i32 %src2, %tmp0
%tmp2 = and i32 %src0, %src1
%tmp3 = xor i32 %tmp1, %tmp2
ret i32 %tmp3
}
This forms a bitselect, but the the instcombined version does not:
define i32 @bitselect_pat4_rrr(i32 %src0, i32 %src1, i32 %src2) #0 {
%1 = xor i32 %src0, -1
%tmp1 = and i32 %1, %src2
%tmp2 = and i32 %src0, %src1
%tmp3 = xor i32 %tmp1, %tmp2
ret i32 %tmp3
}
cmp instructions can write to non-condition registers instead, so extensions of setcc should be promoted to directly write to the wider type register.
e.g.
(i32 sext (i1 setcc)) -> i32 setcc
This test case (https://gist.github.com/0bf0e257663a02890f36.git) will cause opt (hsail-stable-3.7) to crash.
It looks like to be an issue in SROA. For now, this problem could be worked around by disabling SROA with -use-new-sroa=false
opt -O3 -disable-simplify-libcalls -debug -verify ./dump.linked.bc -o ./dump.opt.bc
CGSCCPASSMGR: SCC Refresh didn't change call graph.
Inliner visiting SCC: opencl_frexp_global: 1 call sites.
Inlining: cost=always, Call: %4 = call spir_func double @_Z5frexpdPU3AS4i(double %x, i32 addrspace(4)* %3)
CGSCCPASSMGR: Refreshing SCC with 1 nodes:
Call graph node for function: 'opencl_frexp_global'<<0x4e1d920>> #uses=1
CGSCCPASSMGR: SCC Refresh didn't change call graph.
SROA function: opencl_frexp_global
SROA alloca: %1 = alloca i32 addrspace(1)*, align 8
Rewriting FCA loads and stores...
Slices of alloca: %1 = alloca i32 addrspace(1)*, align 8
[0,8) slice #0
used by: store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
[0,8) slice #1
used by: %3 = load i32 addrspace(4)*, i32 addrspace(4)** %2, align 8
Pre-splitting loads and stores
Searching for candidate loads and stores
Rewriting alloca partition [0,8) to: %1 = alloca i32 addrspace(1)*, align 8
rewriting [0,8) slice #0
original: store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
to: store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
rewriting [0,8) slice #1
original: %3 = load i32 addrspace(4)*, i32 addrspace(4)** %2, align 8
opt: /home/xkerox/docker_volume/HSA_1.0f/HLC-HSAIL-Development-LLVM.hsail-stable-3.7/HLC-HSAIL-Development-LLVM/lib/IR/Instructions.cpp:2257: static llvm::CastInst* llvm::CastInst::Create(llvm::Instructi
#0 0x23ea033 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/opt/amd/bin/opt+0x23ea033)
#1 0x23ea348 PrintStackTraceSignalHandler(void*) (/opt/amd/bin/opt+0x23ea348)
#2 0x23e8c80 SignalHandler(int) (/opt/amd/bin/opt+0x23e8c80)
#3 0x7f5785300340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#4 0x7f5784516cc9 gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#5 0x7f578451a0d8 abort /build/buildd/eglibc-2.19/stdlib/abort.c:91:0
#6 0x7f578450fb86 __assert_fail_base /build/buildd/eglibc-2.19/assert/assert.c:92:0
#7 0x7f578450fc32 (/lib/x86_64-linux-gnu/libc.so.6+0x2fc32)
#8 0x1f2349b llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*) (/opt/amd/bin/opt+0x1f2349b)
#9 0x22f3185 llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >::CreateCast(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&) (/
#10 0x22ed966 llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >::CreateBitCast(llvm::Value*, llvm::Type*, llvm::Twine const&) (/opt/amd/bin/opt+0x22ed96
#11 0x22dd4bb convertValue(llvm::DataLayout const&, llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >&, llvm::Value*, llvm::Type*) (/opt/amd/bin/opt+0x2
#12 0x22e11b6 (anonymous namespace)::AllocaSliceRewriter::visitLoadInst(llvm::LoadInst&) (/opt/amd/bin/opt+0x22e11b6)
#13 0x22f8a9f llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visitLoad(llvm::LoadInst&) (/opt/amd/bin/opt+0x22f8a9f)
#14 0x22f3bc5 llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visit(llvm::Instruction&) (/opt/amd/bin/opt+0x22f3bc5)
#15 0x22ee687 llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visit(llvm::Instruction*) (/opt/amd/bin/opt+0x22ee687)
#16 0x22e0490 (anonymous namespace)::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*) (/opt/amd/bin/opt+0x22e0490)
#17 0x22e905f (anonymous namespace)::SROA::rewritePartition(llvm::AllocaInst&, (anonymous namespace)::AllocaSlices&, (anonymous namespace)::AllocaSlices::Partition&) (/opt/amd/bin/opt+0x22e905f)
#18 0x22e98c8 (anonymous namespace)::SROA::splitAlloca(llvm::AllocaInst&, (anonymous namespace)::AllocaSlices&) (/opt/amd/bin/opt+0x22e98c8)
#19 0x22ea3d5 (anonymous namespace)::SROA::runOnAlloca(llvm::AllocaInst&) (/opt/amd/bin/opt+0x22ea3d5)
#20 0x22eb1ed (anonymous namespace)::SROA::runOnFunction(llvm::Function&) (/opt/amd/bin/opt+0x22eb1ed)
#21 0x1f49bcd llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/amd/bin/opt+0x1f49bcd)
#22 0x1fc5323 (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) (/opt/amd/bin/opt+0x1fc5323)
#23 0x1fc64bf (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) (/opt/amd/bin/opt+0x1fc64bf)
#24 0x1fc682a (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (/opt/amd/bin/opt+0x1fc682a)
#25 0x1f4a0f6 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) (/opt/amd/bin/opt+0x1f4a0f6)
#26 0x1f4a873 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/amd/bin/opt+0x1f4a873)
#27 0x1f4aadd llvm::legacy::PassManager::run(llvm::Module&) (/opt/amd/bin/opt+0x1f4aadd)
#28 0xeb955e main (/opt/amd/bin/opt+0xeb955e)
#29 0x7f5784501ec5 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:321:0
#30 0xe96b69 _start (/opt/amd/bin/opt+0xe96b69)
Stack dump:
0. Program arguments: /opt/amd/bin/opt -O3 -disable-simplify-libcalls -debug -verify ./dump.linked.bc -o ./dump.opt.bc
1. Running pass 'CallGraph Pass Manager' on module './dump.linked.bc'.
. Running pass 'SROA' on function '@opencl_frexp_global'
Before opt, notice the barrier is after label 17 which is a post-dom node
%13 = icmp eq i32 %10, 0
br i1 %13, label %14, label %17
; <label>:14 ; preds = %8
%15 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 0
store i32 123, i32 addrspace(3)* %15, align 4, !tbaa !8
%16 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 1
store i32 321, i32 addrspace(3)* %16, align 4, !tbaa !8
br label %17
; <label>:17 ; preds = %14, %8
tail call spir_func void @amp_barrier(i32 3) #17
After opt, the control flow has been transformed from a triangle into a diamond. The original barrier has been replicated and hosted into the branches, which is illegal:
%12 = icmp eq i32 %9, 0
br i1 %12, label %.thread, label %13
.thread: ; preds = %8
store i32 123, i32 addrspace(3)* getelementptr inbounds ([2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 0), align 4, !tbaa !8
store i32 321, i32 addrspace(3)* getelementptr inbounds ([2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 1), align 4, !tbaa !8
tail call spir_func void @__hsail_barrier() #4
br label %"_ZZ4mainENK3$_0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEE.exit"
; <label>:13 ; preds = %8
tail call spir_func void @__hsail_barrier() #4
%14 = icmp eq i32 %9, 1
br i1 %14, label %15, label %"_ZZ4mainENK3$_0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEE.exit"
The test files are here: https://gist.github.com/scchan/b69e6fd4225815741415
Tested on hsa-1.0f, hsail-stable-3.7 and affects both branches
I'm on the latest HLC compiler from the branch hsail-stable-3.7
. The generated HSAIL code for the following two kernels is identical, even though the first kernel takes args
by reference (pointer) and the second kernel takes args
by value. I compiled both kernels with -O2
.
First kernel:
struct KernArgs
{
uint Arg32;
ulong Arg64;
};
__kernel void Main(__global ulong *res, const __global struct KernArgs *args)
{
*res = (ulong)args->Arg32|args->Arg64;
}
Second kernel:
struct KernArgs
{
uint Arg32;
ulong Arg64;
};
__kernel void Main(__global ulong *res, const struct KernArgs args)
{
*res = (ulong)args.Arg32|args.Arg64;
}
Compiling both kernels yields identical HSAIL code:
module &__llvm_hsail_module:1:0:$full:$large:$near;
prog kernel &__OpenCL_Main_kernel(
kernarg_u64 %__global_offset_0,
kernarg_u64 %__global_offset_1,
kernarg_u64 %__global_offset_2,
kernarg_u64 %__printf_buffer,
kernarg_u64 %__vqueue_pointer,
kernarg_u64 %__aqlwrap_pointer,
kernarg_u64 %res,
kernarg_u64 %args)
{
// BB#0:
ld_kernarg_align(8)_width(all)_u64 $d0, [%args];
ld_global_align(8)_u64 $d1, [$d0+8];
ld_global_align(4)_u32 $s0, [$d0];
cvt_u64_u32 $d0, $s0;
or_b64 $d0, $d0, $d1;
ld_kernarg_align(8)_width(all)_u64 $d1, [%res];
st_global_align(8)_u64 $d0, [$d1];
ret;
};
The LLVM IR for the two kernels is:
First kernel:
; ModuleID = '/tmp/cloc23414/bug.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
target triple = "spir64-unknown-unknown"
%struct.KernArgs = type { i32, i64 }
; Function Attrs: nounwind
define spir_kernel void @__OpenCL_Main_kernel(i64 addrspace(1)* %res, %struct.KernArgs addrspace(1)* %args) #0 {
%1 = alloca i64 addrspace(1)*, align 8
%2 = alloca %struct.KernArgs addrspace(1)*, align 8
store i64 addrspace(1)* %res, i64 addrspace(1)** %1, align 8
store %struct.KernArgs addrspace(1)* %args, %struct.KernArgs addrspace(1)** %2, align 8
%3 = load %struct.KernArgs addrspace(1)*, %struct.KernArgs addrspace(1)** %2, align 8
%4 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs addrspace(1)* %3, i32 0, i32 0
%5 = load i32, i32 addrspace(1)* %4, align 4
%6 = zext i32 %5 to i64
%7 = load %struct.KernArgs addrspace(1)*, %struct.KernArgs addrspace(1)** %2, align 8
%8 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs addrspace(1)* %7, i32 0, i32 1
%9 = load i64, i64 addrspace(1)* %8, align 8
%10 = or i64 %6, %9
%11 = load i64 addrspace(1)*, i64 addrspace(1)** %1, align 8
store i64 %10, i64 addrspace(1)* %11, align 8
ret void
}
attributes #0 = { nounwind }
!opencl.kernels = !{!0}
!opencl.enable.FP_CONTRACT = !{}
!opencl.ocl.version = !{!6}
!opencl.spir.version = !{!6}
!0 = !{void (i64 addrspace(1)*, %struct.KernArgs addrspace(1)*)* @__OpenCL_Main_kernel, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"ulong*", !"struct KernArgs*"}
!4 = !{!"kernel_arg_type_qual", !"", !"const"}
!5 = !{!"kernel_arg_base_type", !"ulong*", !"struct KernArgs*"}
!6 = !{i32 2, i32 0}
Second kernel:
; ModuleID = '/tmp/cloc23566/bug.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
target triple = "spir64-unknown-unknown"
%struct.KernArgs = type { i32, i64 }
; Function Attrs: nounwind
define spir_kernel void @__OpenCL_Main_kernel(i64 addrspace(1)* %res, %struct.KernArgs* byval %args) #0 {
%1 = alloca i64 addrspace(1)*, align 8
store i64 addrspace(1)* %res, i64 addrspace(1)** %1, align 8
%2 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs* %args, i32 0, i32 0
%3 = load i32, i32* %2, align 4
%4 = zext i32 %3 to i64
%5 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs* %args, i32 0, i32 1
%6 = load i64, i64* %5, align 8
%7 = or i64 %4, %6
%8 = load i64 addrspace(1)*, i64 addrspace(1)** %1, align 8
store i64 %7, i64 addrspace(1)* %8, align 8
ret void
}
attributes #0 = { nounwind }
!opencl.kernels = !{!0}
!opencl.enable.FP_CONTRACT = !{}
!opencl.ocl.version = !{!6}
!opencl.spir.version = !{!6}
!0 = !{void (i64 addrspace(1)*, %struct.KernArgs*)* @__OpenCL_Main_kernel, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 0}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"ulong*", !"struct KernArgs"}
!4 = !{!"kernel_arg_type_qual", !"", !"const"}
!5 = !{!"kernel_arg_base_type", !"ulong*", !"struct KernArgs"}
!6 = !{i32 2, i32 0}
I'm on the latest HLC compiler from the branch hsail-stable-3.7
.
Consider the following OpenCL program:
__attribute__((__noinline__)) bool IsEmpty()
{
return true;
}
__kernel void Main()
{
IsEmpty();
}
Compiling the program above with -O0
gives the following error:
The error is:
> call &IsEmpty (%IsEmpty) ();
> ^
input(27,17): Incompatible types of formal and actual arguments
ERROR: The following command failed with return code 1.
HSAILasm -o /tmp/hsa_finalizer-KenL4R/temp.hsail /tmp/cloc20050/temp.hsail
The corresponding intermediate assembly looks like:
module &__llvm_hsail_module:1:0:$full:$large:$near;
decl function &IsEmpty(arg_u32 %ret)();
function &IsEmpty(arg_u32 %IsEmpty)()
{
// BB#0:
mov_b32 $s0, 1;
st_arg_u32 $s0, [%IsEmpty];
ret;
};
prog kernel &__OpenCL_Main_kernel(
kernarg_u64 %__global_offset_0,
kernarg_u64 %__global_offset_1,
kernarg_u64 %__global_offset_2,
kernarg_u64 %__printf_buffer,
kernarg_u64 %__vqueue_pointer,
kernarg_u64 %__aqlwrap_pointer)
{
align(4) spill_u8 %__spillStack[4];
// BB#0:
{
arg_u8 %IsEmpty;
call &IsEmpty (%IsEmpty) ();
ld_arg_u8 $s0, [%IsEmpty];
}
st_spill_align(4)_u32 $s0, [%__spillStack]; // 4-byte Folded Spill
ret;
};
Apparently, the HSAIL code generator outputs an arg_u32
return value type for the IsEmpty()
function, but the corresponding argument on the call-site is of type arg_u8
.
This patch adds detection of kernels from the opencl.kernels metadata: https://github.com/pocl/pocl/blob/master/tools/patches/llvm-3.7-hsail-branch.patch
The existing pattern was incorrect and required rewriting to make progress on instruction refactoring.
The old TernaryFusedShlAdd did (add (shl $src0, imm:$src1), $src2) -> mad $src1, $src2, $src3 where src2 is not 1 << $src2 like it needs to be.
The test case is simple:
unsigned get_global_size(unsigned dimindx)
{
switch(dimindx)
{
case 0: return __builtin_hsail_gridsize(0);
case 1: return __builtin_hsail_gridsize(1);
case 2: return __builtin_hsail_gridsize(2);
default: return 0;
}
}
Trying to compile this with:
/me/LLVM_370_HSAIL_rwdi_NA_rtti/bin/clang-3.7 -cc1 -triple hsail64 -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -main-file-name b.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -no-integrated-as -mconstructor-aliases -dwarf-column-info -coverage-file /tmp/b.bc -resource-dir /home/LLVM_370_HSAIL_rwdi_NA_rtti/bin/../lib/clang/3.7.0 -fno-dwarf-directory-asm -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 160 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -ffake-address-space-map -o /tmp/b.bc -x c /tmp/b.c
.. results in a crash:
clang-3.7: /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/include/llvm/ADT/ArrayRef.h:187: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.
Program received signal SIGABRT, Aborted.
#4 0x00007ffff3893745 in llvm::ArrayRef<llvm::Type*>::operator[] (this=0x7fffffff6cf0, Index=0)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/include/llvm/ADT/ArrayRef.h:187
__PRETTY_FUNCTION__ = "const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]"
#5 0x00007ffff383bc6f in DecodeFixedType (Infos=..., Tys=..., Context=...) at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:775
D = {Kind = llvm::Intrinsic::IITDescriptor::Argument, {Integer_Width = 1, Float_Width = 1, Vector_Width = 1, Pointer_AddressSpace = 1,
Struct_NumElements = 1, Argument_Info = 1}}
__PRETTY_FUNCTION__ = "llvm::Type* DecodeFixedType(llvm::ArrayRef<llvm::Intrinsic::IITDescriptor>&, llvm::ArrayRef<llvm::Type*>, llvm::LLVMContext&)"
#6 0x00007ffff383bff1 in llvm::Intrinsic::getType (Context=..., id=llvm::Intrinsic::hsail_gridsize, Tys=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:828
Table = {<llvm::SmallVectorImpl<llvm::Intrinsic::IITDescriptor>> = {<llvm::SmallVectorTemplateBase<llvm::Intrinsic::IITDescriptor, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Intrinsic::IITDescriptor, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffff6e78, EndX = 0x7fffffff6e88,
CapacityX = 0x7fffffff6eb8}, FirstEl = {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\v\000\000\000\001\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
{<llvm::AlignedCharArray<4ul, 8ul>> = {buffer = "\a\000\000\000 \000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\001\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}}}}
TableRef = {Data = 0x7fffffff6e80, Length = 1}
ResultTy = 0x7fffffff68a0
ArgTys = {<llvm::SmallVectorImpl<llvm::Type*>> = {<llvm::SmallVectorTemplateBase<llvm::Type*, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Type*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0xffffffff, EndX = 0x7fffffff6e30, CapacityX = 0x7ffff35ac4e0}, FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\220\265\334\363\377\177\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
{<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\001\000\000\000\000\000\000"}, <No data fields>}}}}
#7 0x00007ffff383cf1e in llvm::Intrinsic::getDeclaration (M=0x6bf900, id=llvm::Intrinsic::hsail_gridsize, Tys=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:870
No locals.
#8 0x00007fffefd2dce8 in clang::CodeGen::CodeGenModule::getIntrinsic (this=0x6d53b0, IID=1335, Tys=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:2564
No locals.
#9 0x00007fffefb1c08e in clang::CodeGen::CodeGenFunction::EmitBuiltinExpr (this=0x7fffffffac80, FD=0x711438, BuiltinID=1012, E=0x711620, ReturnValue=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGBuiltin.cpp:1803
FTy = 0x7ffff357e4d8
BuiltinRetType = {Value = {Value = 7071232}}
Args = {<llvm::SmallVectorImpl<llvm::Value*>> = {<llvm::SmallVectorTemplateBase<llvm::Value*, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Value*, v---Type <return> to continue, or q <return> to quit---
oid>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffff6fe8, EndX = 0x7fffffff6fe8, CapacityX = 0x7fffffff7068},
FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\220\020i\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
{<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = " p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "0p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\276\f.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "@p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "Pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\276\f.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "Pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\200p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\026\004.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\200p\377\377\377\177\000"}, <No data fields>}}}}
ICEArguments = 1
Error = clang::ASTContext::GE_None
F = 0x7ffff35b7430
V = 0x3f400000000
RetTy = 0x7ffff7deffb0 <_dl_runtime_resolve+80>
Result = {<clang::Expr::EvalStatus> = {HasSideEffects = false, Diag = 0x0}, Val = {Kind = clang::APValue::Uninitialized, static DataSize = 48,
Data = {<llvm::AlignedCharArray<8ul, 48ul>> = {
buffer = "`w\377\377\377\177\000\000\200w\377\377\377\177\000\000\360\273-\353\377\177\000\000\220w\377\377\377\177\000\000\000\000\000\000\000\000\000\000\240w\377\377\377\177\000"}, <No data fields>}}}
__PRETTY_FUNCTION__ = "clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitBuiltinExpr(const clang::FunctionDecl*, unsigned int, const clang::CallExpr*, clang::CodeGen::ReturnValueSlot)"
Name = 0x7ffff06479b5 "__builtin_hsail_gridsize"
IntrinsicID = llvm::Intrinsic::hsail_gridsize
#10 0x00007fffefbff7f2 in clang::CodeGen::CodeGenFunction::EmitCallExpr (this=0x7fffffffac80, E=0x711620, ReturnValue=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExpr.cpp:3122
builtinID = 1012
FD = 0x711438
TargetDecl = 0x711438
Callee = 0x20
#11 0x00007fffefc3c7c8 in (anonymous namespace)::ScalarExprEmitter::VisitCallExpr (this=0x7fffffffa130, E=0x711620)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:327
V = 0x711620
#12 0x00007fffefc4df53 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit (this=0x7fffffffa130,
S=0x711620) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:299
No locals.
#13 0x00007fffefc3be14 in (anonymous namespace)::ScalarExprEmitter::Visit (this=0x7fffffffa130, E=0x711620)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:201
DL = {OriginalLocation = {Loc = {Ref = {MD = 0x0}}}, CGF = @0x7fffffffac80}
#14 0x00007fffefc42ac0 in (anonymous namespace)::ScalarExprEmitter::VisitCastExpr (this=0x7fffffffa130, CE=0x711650)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:1579
E = 0x711620
DestTy = {Value = {Value = 7073152}}
Kind = clang::CK_IntegralCast
__PRETTY_FUNCTION__ = "llvm::Value* {anonymous}::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*)"
#15 0x00007fffefc4efb7 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitImplicitCastExpr (
this=0x7fffffffa130, S=0x711650) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:393
No locals.
#16 0x00007fffefc4e073 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit (this=0x7fffffffa130,
S=0x711650) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:393
No locals.
#17 0x00007fffefc3be14 in (anonymous namespace)::ScalarExprEmitter::Visit (this=0x7fffffffa130, E=0x711650)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:201
DL = {OriginalLocation = {Loc = {Ref = {MD = 0x0}}}, CGF = @0x7fffffffac80}
#18 0x00007fffefc4cd8c in clang::CodeGen::CodeGenFunction::EmitScalarExpr (this=0x7fffffffac80, E=0x711650, IgnoreResultAssign=false)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:3475
__PRETTY_FUNCTION__ = "llvm::Value* clang::CodeGen::CodeGenFunction::EmitScalarExpr(const clang::Expr*, bool)"
#19 0x00007fffefcd84ca in clang::CodeGen::CodeGenFunction::EmitReturnStmt (this=0x7fffffffac80, S=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1075
RV = 0x711650
cleanupScope = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0, OldDidCallStackSave = false, PerformCleanup = true,
CGF = @0x7fffffffac80}
#20 0x00007fffefcd4b68 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711668)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:136
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"
#21 0x00007fffefcd953c in clang::CodeGen::CodeGenFunction::EmitCaseStmt (this=0x7fffffffac80, S=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1297
CaseVal = 0x7067b0
CaseDest = 0x6dc3d0
CurCase = 0x7113c8
NextCase = 0x0
#22 0x00007fffefcd5287 in clang::CodeGen::CodeGenFunction::EmitSimpleStmt (this=0x7fffffffac80, S=0x7113c8)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:265
No locals.
#23 0x00007fffefcd4871 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x7113c8)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:50
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"
#24 0x00007fffefcd5469 in clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:293
I = 0x711938
E = 0x711958
RetAlloca = 0x0
#25 0x00007fffefcd538d in clang::CodeGen::CodeGenFunction::EmitCompoundStmt (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:283
CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7ffff08ec960 <vtable for clang::PrettyStackTraceLoc+16>,
NextEntry = 0x7fffffffb670}, SM = @0x6905e0, Loc = {ID = 1256}, Message = 0x7fffeff87228 "LLVM IR generation of compound statement ('{}')"}
Scope = {<clang::CodeGen::CodeGenFunction::RunCleanupsScope> = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0,
OldDidCallStackSave = false, PerformCleanup = true, CGF = @0x7fffffffac80}, Range = {B = {ID = 1256}, E = {ID = 1437}},
Labels = {<llvm::SmallVectorImpl<clang::LabelDecl const*>> = {<llvm::SmallVectorTemplateBase<clang::LabelDecl const*, true>> = {<llvm::SmallVectorTemplateCommon<clang::LabelDecl const*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffffa5c0, EndX = 0x7fffffffa5c0, CapacityX = 0x7fffffffa5e0},
FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
{<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = " \031q\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\360\245\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\022M\306\357\377\177\000"}, <No data fields>}}}}, ParentScope = 0x0}
#26 0x00007fffefcd5175 in clang::CodeGen::CodeGenFunction::EmitSimpleStmt (this=0x7fffffffac80, S=0x711920)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:256
No locals.
#27 0x00007fffefcd4871 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711920)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:50
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"
#28 0x00007fffefcda057 in clang::CodeGen::CodeGenFunction::EmitSwitchStmt (this=0x7fffffffac80, S=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1599
SavedSwitchInsn = 0x0
SavedSwitchWeights = 0x0
SavedCRBlock = 0x0
ConstantCondValue = {<llvm::APInt> = {BitWidth = 1, {VAL = 0, pVal = 0x0}}, IsUnsigned = false}
SwitchExit = {Block = 0x706610, ScopeDepth = {Size = 0}, Index = 2}
ConditionScope = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0, OldDidCallStackSave = false, PerformCleanup = true,
CGF = @0x7fffffffac80}
CondV = 0x7066a8
DefaultBlock = 0x6dc2f0
OuterContinue = {Block = 0x0, ScopeDepth = {Size = -1}, Index = 0}
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitSwitchStmt(const clang::SwitchStmt&)"
#29 0x00007fffefcd4b8b in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711380)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:138
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"
#30 0x00007fffefcd5469 in clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:293
I = 0x7119b8
E = 0x7119c0
RetAlloca = 0x0
#31 0x00007fffefd1922a in clang::CodeGen::CodeGenFunction::EmitFunctionBody (this=0x7fffffffac80, Args=..., Body=0x7119a0)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenFunction.cpp:794
S = 0x7119a0
#32 0x00007fffefd19b0c in clang::CodeGen::CodeGenFunction::GenerateCode (this=0x7fffffffac80, GD=..., Fn=0x6d49b8, FnInfo=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenFunction.cpp:915
Body = 0x7119a0
FD = 0x711258
Args = {<llvm::SmallVector<clang::VarDecl const*, 16u>> = {<llvm::SmallVectorImpl<clang::VarDecl const*>> = {<llvm::SmallVectorTemplateBase<clang::VarDecl const*, true>> = {<llvm::SmallVectorTemplateCommon<clang::VarDecl const*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffffaac8,
EndX = 0x7fffffffaad0, CapacityX = 0x7fffffffab48}, FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "(\tq\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
{<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\220\020i\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\260^m\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\253\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\275<\360\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\220\020i\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = " \336\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\001\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "@\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "p_\233\357\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
buffer = "\340\005i\000\000\000\000"}, <No data fields>}}}}, <No data fields>}
ResTy = {Value = {Value = 7073152}}
MD = 0x0
BodyRange = {B = {ID = 1232}, E = {ID = 1439}}
Loc = {ID = 1194}
#33 0x00007fffefd2d777 in clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition (this=0x6d53b0, GD=..., GV=0x6d49b8)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:2476
D = 0x711258
FI = @0x6dc870: {<llvm::FoldingSetImpl::Node> = {NextInFoldingSetBucket = 0x6d6121}, CallingConvention = 0, EffectiveCallingConvention = 0,
ASTCallingConvention = 0, InstanceMethod = 0, ChainCall = 0, NoReturn = 0, ReturnsRetained = 0, HasRegParm = 0, RegParm = 0, Required = {
NumRequired = 4294967295}, ArgStruct = 0x0, NumArgs = 1}
Ty = 0x6d6ea0
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*)"
Fn = 0x6d49b8
#34 0x00007fffefd2a48d in clang::CodeGen::CodeGenModule::EmitGlobalDefinition (this=0x6d53b0, GD=..., GV=0x0)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:1531
D = 0x711258
CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7fffebbea520 <vtable for clang::PrettyStackTraceDecl+16>,
NextEntry = 0x7fffffffc210}, TheDecl = 0x711258, Loc = {ID = 1194}, SM = @0x6905e0, Message = 0x7fffeffc60f0 "Generating code for declaration"}
#35 0x00007fffefd29c9e in clang::CodeGen::CodeGenModule::EmitGlobal (this=0x6d53b0, GD=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:1387
Global = 0x711258
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitGlobal(clang::GlobalDecl)"
MangledName = {static npos = 18446744073709551615, Data = 0x7fffffffb740 "`\267\377\377\377\177", Length = 140737214714484}
#36 0x00007fffefd31290 in clang::CodeGen::CodeGenModule::EmitTopLevelDecl (this=0x6d53b0, D=0x711258)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:3242
__PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*)"
#37 0x00007fffefe43075 in (anonymous namespace)::CodeGeneratorImpl::HandleTopLevelDecl (this=0x6bf6f0, DG=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/ModuleBuilder.cpp:130
I = 0x7fffffffc1b0
E = 0x7fffffffc1b8
HandlingDecl = {Self = @0x6bf6f0}
#38 0x00007fffefd112ec in clang::BackendConsumer::HandleTopLevelDecl (this=0x6bf570, D=...)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenAction.cpp:111
CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7fffebbea520 <vtable for clang::PrettyStackTraceDecl+16>,
NextEntry = 0x7fffffffc2a0}, TheDecl = 0x711258, Loc = {ID = 0}, SM = @0x6905e0, Message = 0x7fffeffb42c0 "LLVM IR generation of declaration"}
#39 0x00007fffe9e7e069 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=false)
at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/Parse/ParseAST.cpp:142
OldCollectStats = false
Consumer = 0x6bf570
ParseOP = std::unique_ptr<clang::Parser> containing 0x6d0a10
P = @0x6d0a10: {<clang::CodeCompletionHandler> = {_vptr.CodeCompletionHandler = 0x7fffea19fb18 <vtable for clang::Parser+16>}, PP = @0x691090, Tok = {
Loc = 1440, UintData = 0, PtrData = 0x0, Kind = clang::tok::eof, Flags = 0}, PrevTokLocation = {ID = 1439}, ParenCount = 0, BracketCount = 0,
BraceCount = 0, Actions = @0x6cdd60, Diags = @0x68f260, NumCachedScopes = 3, ScopeCache = {0x6bf260, 0x6f9d70, 0x6f9f60, 0x0 <repeats 13 times>},
Ident__exception_code = 0x0, Ident___exception_code = 0x0, Ident_GetExceptionCode = 0x0, Ident__exception_info = 0x0, Ident___exception_info = 0x0,
Ident_GetExceptionInfo = 0x0, Ident__abnormal_termination = 0x0, Ident___abnormal_termination = 0x0, Ident_AbnormalTermination = 0x0,
Ident__except = 0x0, Ident_sealed = 0x0, Ident_super = 0x6cdaf8, Ident_vector = 0x0, Ident_bool = 0x0, Ident_pixel = 0x0, Ident_instancetype = 0x0,
Ident_introduced = 0x0, Ident_deprecated = 0x0, Ident_obsoleted = 0x0, Ident_unavailable = 0x0, Ident_message = 0x0, Ident_final = 0x0,
Ident_override = 0x0,
This code has a few problems, so I've had to remove it. These were renamed and the parameters changed from a previous version of LLVM, so these have been dead for an known amount of time. It's also not clear if these are used for the intended purpose.
analyzeCompare:
optimizeCompareInstr's comment says it
// Simplify constructions like this:
// cmp_gt $c0, $s2, $s1;
// cmov $s1, $c0, -1, 0;
// cmp_eq $c0, $s1, 0;
//
// Instead we should get:
// not cmp_gt $c0, $s2, $s1;
This should be handled by the DAG combiner already, but for some reason is not. My guess is this is not handled now because of one of the many places that assume ZeroOrOneBooleanContent in the setcc optimizations. This also did not handle fixing this situation if the cmov's immediate operands were swapped to false, true.
I'm on the latest HLC compiler from the branch hsail-stable-3.7
. I compiled with -O2
.
The following kernel doesn't assemble after compilation, because the alignment info generated by the code generator can exceed HSAIL's limit of BRIG_ALIGNMENT_256
:
__kernel void Bug(size_t index)
{
__global uint *ptr = (__global uint *)(index*512);
*ptr = 0xdeadc0de;
}
The emitted error is:
> st_global_align(512)_u32 3735929054, [$d0];
> ^
input(16,18): Invalid alignment
ERROR: The following command failed with return code 1.
HSAILasm -o /tmp/hsa_finalizer-qVC9u8/temp.hsail /tmp/cloc31620/temp.hsail
I think all that needs to be done is to limit the alignment info produced by the code generator to BRIG_ALIGNMENT_256
.
fcopysign is allowed to have different operand types, and the sign will still be copied. e.g. the sign from a float value can be copied to a double value. This does not currently work with HSAIL.
Could you add support for the following intrinsics? hsail-stable-3.7 and newer would be fine, thanks
call &__hsail_sqrt_fc_f32 (%__hsail_sqrt_fc_f32) (%__param_p0, %__param_p1);
call &__hsail_div_fc_f32 (%__hsail_div_fc_f32) (%__param_p0, %__param_p1, %__param_p2);
call &__hsail_mul_f64 (%__hsail_mul_f64) (%__param_p0, %__param_p1);
Testcase in test/CodeGen/HSAIL/global-variable-alias.ll
@0 = addrspace(2) global [4 x i32] [ i32 5, i32 4, i32 432, i32 3 ]
@alias = alias [4 x i32] addrspace(2)* @0
define i32 @use_alias_gv() nounwind {
%gep = getelementptr [4 x i32] addrspace(2)* @alias, i32 0, i32 1
%load = load i32 addrspace(2)* %gep
ret i32 %load
}
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.