Giter VIP home page Giter VIP logo

hsafoundation / hlc-hsail-development-llvm Goto Github PK

View Code? Open in Web Editor NEW
15.0 15.0 6.0 239.83 MB

HSAIL LLVM Tree - Development has stopped on this branch This was a development branch

License: Other

Makefile 0.24% Shell 0.11% C 0.31% OCaml 0.29% Python 0.45% C++ 46.56% Objective-C 0.40% Assembly 10.64% Perl 0.03% Emacs Lisp 0.01% Vim Script 0.02% CMake 0.26% M4 0.10% Go 0.13% SourcePawn 0.01% LLVM 40.24% Mirah 0.20% Groff 0.01%

hlc-hsail-development-llvm's People

Contributors

ahatanak avatar arsenm avatar asl avatar atrick avatar bcardosolopes avatar bigcheese avatar bob-wilson avatar chandlerc avatar chapuni avatar cunningbaldrick avatar d0k avatar ddunbar avatar dexonsmith avatar dwblaikie avatar echristo avatar eefriedman avatar espindola avatar isanbard avatar lattner avatar lhames avatar majnemer avatar mbrukman avatar nadavrot avatar nlewycky avatar resistor avatar sampo3k avatar stoklund avatar tnorthover avatar topperc avatar tstellaramd avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

hlc-hsail-development-llvm's Issues

Wrong LDA instruction generated for addresses passed into functions

I'm using the latest HLC compiler from the branch hsail-review-v2.
Consider the following OpenCL program:

struct Foo
{
    int a;
    int b;
};

volatile int *DbgAddr = (int *)4;

__attribute__((noinline)) void Func(const struct Foo *f)
{
    DbgAddr[0] = f->a;
    DbgAddr[1] = f->b;
}

__kernel void Bug()
{
    struct Foo f = {1, 2};
    Func(&f);
}

This OpenCL program gets compiled into the following HSAIL code:

module &__llvm_hsail_module:1:0:$full:$large:$near;

decl function &Func()(arg_u64 %f);

function &Func()(arg_u64 %f)
{
    // BB#0:
    ld_arg_align(8)_u64 $d0, [%f];
    ld_align(4)_u32 $s0, [$d0];
    st_align(4)_u32 $s0, [4];
    ld_align(4)_u32 $s0, [$d0+4];
    st_align(8)_u32 $s0, [8];
    ret;
};

prog kernel &__OpenCL_Bug_kernel(
    kernarg_u64 %__global_offset_0,
    kernarg_u64 %__global_offset_1,
    kernarg_u64 %__global_offset_2,
    kernarg_u64 %__printf_buffer,
    kernarg_u64 %__vqueue_pointer,
    kernarg_u64 %__aqlwrap_pointer)
{
    align(8) private_u8 %__privateStack[8];
    // BB#0:
    st_private_align(4)_u32 2, [%__privateStack][4];
    st_private_align(8)_u32 1, [%__privateStack];
    lda_private_u32 $s0, [%__privateStack];
    cvt_u64_u32 $d0, $s0;
    {
        arg_u64 %f;
        st_arg_align(8)_u64 $d0, [%f];
        call    &Func () (%f);
    }
    ret;
};

In the kernel __OpenCL_Bug_kernel, the address of of f is obtained by using the lda_private_u32 instruction, which computes an address relative to the private segment. However, in the function Func, a flat ld instruction is used to read from that address.

Shouldn't either be the lda instruction flat, or the ld instruction private, so that the segments they refer to match?

Compiler generates 8-bit instead of 16-bit LD instruction

I'm using the latest llc from the branch hsail-stable-3.7 and the latest CLOC.sh from the master branch of the CLOC repository.

Consider the following OpenCL program:

__kernel void Bug2(__global unsigned short *a, __global long *b)
{
    *b = *a;
}

Compiling this program with the CLOC.sh gives the following HSAIL code:

module &__llvm_hsail_module:1:0:$full:$large:$near;

prog kernel &__OpenCL_Bug2_kernel(
    kernarg_u64 %__global_offset_0,
    kernarg_u64 %__global_offset_1,
    kernarg_u64 %__global_offset_2,
    kernarg_u64 %__printf_buffer,
    kernarg_u64 %__vqueue_pointer,
    kernarg_u64 %__aqlwrap_pointer,
    kernarg_u64 %a,
    kernarg_u64 %b)
{
    // BB#0:
    ld_kernarg_align(8)_width(all)_u64  $d0, [%a];
    ld_global_align(2)_u8   $s0, [$d0];
    cvt_u64_u32 $d0, $s0;
    ld_kernarg_align(8)_width(all)_u64  $d1, [%b];
    st_global_align(8)_u64  $d0, [$d1];
    ret;
};

The second load instruction in the kernel (ld_global_align(2)_u8) loads only a single byte instead of the whole unsigned short from memory. Interestingly enough, when changing the type of b to e.g. int, the correct 16-bit load instruction is generated.

i1 / i8 / i16 return types / parameters inconsistently handled

This code in HSAILTargetLowering::LowerArgument results in an inconsistency in how types that need to be promoted are handled.

Type *sType = type->getScalarType();

EVT argVT = Ins ? (_Ins)[ArgNo].VT : (_Outs)[ArgNo].VT;

if (sType->isIntegerTy(8))
argVT = MVT::i8;
else if (sType->isIntegerTy(16))
argVT = MVT::i16;

The default ABI behavior for < i32 return values is to promote to i32 (see getTypeForExtArgOrReturn).

This is what happens for i1, but because of this special casing of i8 and i16, the return store is lowered as the small type despite the parameter being the full i32 type. The argument / return lowering should stop looking at the IR type, and use the computed set of VTs the argument is broken up into. There should also be a calling convention file to handle more of this kind of type logic.

i1 kernel arguments loaded as 4 bytes

These should be treated as an 8-bit byte load. If the ABI requires these are promoted to 32-bit and occupy that much space, they should be marked with zeroext parameter attributes

Probably wrong behavior for cttz / ctlz on zero input

These are currently selected to firstbit / lastbit, but these have different behavior on zero input. For the defined version, the LLVM intrinsic is supposed to return the size of the type in bits on zero input. firstbit / last bit return 0xffffffff, so the non-undef version needs to combine this with a mask for the size of the type.

fcmp + select incorrectly matched to min/max

%r2 = fcmp olt double %r0, %r1
%r3 = select i1 %r2, double %r0, double %r1

This sequence is incorrectly emitted as min_f64. This does not have the same NaN behavior as the IEEE minNum/maxNum behavior the instruction implements.

Error in HSAIL_ASM::Brigantine::createRef

This test case (https://gist.github.com/0dd7835fb7e9087851a5.git) causes an error in HLC:

llc -O2 -march=hsail -filetype=asm -o dump.hsail dump.opt.ll -debug

********** COMPUTING STACKMAP LIVENESS: ZZN4Sort15SortIntegerKeysERN11Concurrency5arrayIjLi1EEES3_S3_iEN3_EC__219__cxxamp_trampolineEPjiiS5_iiS5_iii **********
terminate called after throwing an instance of 'SyntaxError'
#0 0x1ccc6db llvm::sys::PrintStackTrace(_IO_FILE*) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccc6db)
#1 0x1ccc976 PrintStackTraceSignalHandler(void*) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccc976)
#2 0x1ccb3cf SignalHandler(int) (/opt/amd/hlc.hsail-1.0f/llc+0x1ccb3cf)
#3 0x7fee225f0340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#4 0x7fee21806cc9 gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#5 0x7fee2180a0d8 abort /build/buildd/eglibc-2.19/stdlib/abort.c:91:0
#6 0x7fee22118535 __gnu_cxx::__verbose_terminate_handler() (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x60535)
#7 0x7fee221166d6 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e6d6)
#8 0x7fee22116703 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e703)
#9 0x7fee22116922 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x5e922)
#10 0x1cf6254 HSAIL_ASM::Brigantine::handleError(SyntaxError const&) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf6254)
#11 0x1cf6372 HSAIL_ASM::Brigantine::brigWriteError(char const*, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf6372)
#12 0x1cf8b59 HSAIL_ASM::Brigantine::createRef(HSAIL_ASM::SRef const&, HSAIL_ASM::OperandRegister, long, bool, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0x1cf8b59)
#13 0xf97086 HSAIL_ASM::Brigantine::createRef(HSAIL_ASM::SRef const&, HSAIL_ASM::SRef&, long, bool, HSAIL_ASM::SourceInfo const*) (/opt/amd/hlc.hsail-1.0f/llc+0xf97086)
#14 0xfa57f5 llvm::BRIGAsmPrinter::BrigEmitOperandLdStAddress(llvm::MachineInstr const*, unsigned int, unsigned int) (/opt/amd/hlc.hsail-1.0f/llc+0xfa57f5)
#15 0xfa7cf8 llvm::BRIGAsmPrinter::BrigEmitInstMem(llvm::MachineInstr const&, unsigned int) (/opt/amd/hlc.hsail-1.0f/llc+0xfa7cf8)
#16 0xfa190d llvm::BRIGAsmPrinter::EmitInstructionImpl(llvm::MachineInstr const*) (/opt/amd/hlc.hsail-1.0f/llc+0xfa190d)
#17 0xfa1686 llvm::BRIGAsmPrinter::EmitInstruction(llvm::MachineInstr const*) (/opt/amd/hlc.hsail-1.0f/llc+0xfa1686)
#18 0x14e03c1 llvm::AsmPrinter::EmitFunctionBody() (/opt/amd/hlc.hsail-1.0f/llc+0x14e03c1)
#19 0xfa0c0e llvm::BRIGAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (/opt/amd/hlc.hsail-1.0f/llc+0xfa0c0e)
#20 0x164bad5 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/opt/amd/hlc.hsail-1.0f/llc+0x164bad5)
#21 0x191a9cd llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/amd/hlc.hsail-1.0f/llc+0x191a9cd)
#22 0x191ab6e llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191ab6e)
#23 0x191ae8d (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191ae8d)
#24 0x191b592 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191b592)
#25 0x191b7e1 llvm::legacy::PassManager::run(llvm::Module&) (/opt/amd/hlc.hsail-1.0f/llc+0x191b7e1)
#26 0xba3950 compileModule(char**, llvm::LLVMContext&) (/opt/amd/hlc.hsail-1.0f/llc+0xba3950)
#27 0xba2afe main (/opt/amd/hlc.hsail-1.0f/llc+0xba2afe)
#28 0x7fee217f1ec5 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:321:0
#29 0xba00a9 _start (/opt/amd/hlc.hsail-1.0f/llc+0xba00a9)
Stack dump:
0.      Program arguments: /opt/amd/hlc.hsail-1.0f/llc -O2 -march=hsail -filetype=asm -o dump.hsail dump.opt.ll -debug 
1.      Running pass 'Function Pass Manager' on module 'dump.opt.ll'.
2.      Running pass 'BRIG Container Filler' on function '@ZZN4Sort15SortIntegerKeysERN11Concurrency5arrayIjLi1EEES3_S3_iEN3_EC__219__cxxamp_trampolineEPjiiS5_iiS5_iii'
Aborted

Note: this is from the Sort C++AMP sample from the APP SDK

OpenCL kernel with non-void return type seems to compile but fail assembling

I'm on the latest HLC compiler from the branch hsail-stable-3.7.

OpenCL prohibits the use of non-void return types in __kernel functions. Therefore, I would expect the compiler to reject the following OpenCL kernel and terminate with a nice error message. Unfortunately, that's not the case.

__kernel bool Main()
{
    return true;
}

It seems like the HLC compiler eats the kernel, but eventually CLOC fails assembling the resulting HSAIL code. The following error message is emitted:

>   st_arg_u32  $s0, [%__OpenCL_Main_kernel];
>                    ^
input(14,18): Symbol not found: %__OpenCL_Main_kernel

ERROR:  The following command failed with return code 1.
        HSAILasm -o /tmp/hsa_finalizer-RuZXe8/temp.hsail /tmp/cloc26565/temp.hsail

Compiler crashes when using the same name for two different arguments in two different scopes

I'm on the latest HLC compiler from the branch hsail-stable-3.7. I compiled with -O2.

The following kernel fails to compile.

__attribute__((noinline)) static uint Op(uint in)
{
    return 2*in;
}

__kernel void Bug1(uint in, __global uint *out)
{
    *out = Op(in);
}

The HLC compiler crashes with the following error message:

ERROR:  The following command failed with return code 134.
        ./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc13948/temp.hsail /tmp/cloc13948/temp.opt.bc

llc: /home/dgeier/Documents/swarm64/code/hsa/HLC-HSAIL-Development-LLVM/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3346: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, const llvm::SDNodeFlags*): Assertion `N1.getValueType() == N2.getValueType() && N1.getValueType() == VT && "Binary operator types must match!"' failed.
0  llc             0x00000000022348d2 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 59
1  llc             0x0000000002234c26
2  llc             0x00000000022336b1
3  libpthread.so.0 0x00007fe6272b1d10
4  libc.so.6       0x00007fe626451267 gsignal + 55
5  libc.so.6       0x00007fe626452eca abort + 362
6  libc.so.6       0x00007fe62644a03d
7  libc.so.6       0x00007fe62644a0f2
8  llc             0x0000000002091357 llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue, llvm::SDValue, llvm::SDNodeFlags const*) + 2285
9  llc             0x00000000017e2ba1 llvm::HSAILTargetLowering::getArgStore(llvm::SelectionDAG&, llvm::SDLoc, llvm::EVT, llvm::Type*, unsigned int, llvm::SDValue, llvm::SDValue, llvm::SDValue, unsigned int, llvm::SDValue, llvm::AAMDNodes const&, unsigned long) const + 741
10 llc             0x00000000017e360d llvm::HSAILTargetLowering::LowerArgument(llvm::SDValue, llvm::SDValue, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const*, llvm::SmallVectorImpl<llvm::ISD::OutputArg> const*, llvm::SDLoc, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>*, unsigned int&, llvm::Type*, unsigned int, char const*, llvm::SDValue, llvm::SmallVectorImpl<llvm::SDValue> const*, bool, llvm::AAMDNodes const&, unsigned long) const + 2131
11 llc             0x00000000017e4ca5 llvm::HSAILTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 4339
12 llc             0x00000000020e5da8 llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const + 4450
13 llc             0x00000000020dc4e2 llvm::SelectionDAGBuilder::lowerInvokable(llvm::TargetLowering::CallLoweringInfo&, llvm::MachineBasicBlock*) + 444
14 llc             0x00000000020dca61 llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::MachineBasicBlock*) + 951
15 llc             0x00000000020df148 llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 2294
16 llc             0x00000000020be4d6 llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&) + 1256
17 llc             0x00000000020bdf49 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 155
18 llc             0x000000000210c037 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 83
19 llc             0x000000000210f3d0 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 2866
20 llc             0x000000000210b261 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1049
21 llc             0x00000000017d2fd9
22 llc             0x0000000001a8061d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 95
23 llc             0x0000000001e0cc9d llvm::FPPassManager::runOnFunction(llvm::Function&) + 305
24 llc             0x0000000001e0ce3c llvm::FPPassManager::runOnModule(llvm::Module&) + 112
25 llc             0x0000000001e0d1b4
26 llc             0x0000000001e0d894 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 252
27 llc             0x0000000001e0dacd llvm::legacy::PassManager::run(llvm::Module&) + 39
28 llc             0x0000000000cb0376
29 llc             0x0000000000caf30a main + 257
30 libc.so.6       0x00007fe62643ca40 __libc_start_main + 240
31 llc             0x0000000000cae049 _start + 41
Stack dump:
0.  Program arguments: ./llc -O2 -march=hsail64 -filetype=asm -o /tmp/cloc13948/temp.hsail /tmp/cloc13948/temp.opt.bc 
1.  Running pass 'Function Pass Manager' on module '/tmp/cloc13948/temp.opt.bc'.
2.  Running pass 'HSAIL DAG->DAG Instruction Selection' on function '@__OpenCL_Bug1_kernel'
/opt/amd/bin/cloc.sh: line 333: 13966 Aborted                 (core dumped) $HSA_LLVM_PATH/$CMD_LLC -o $TMPDIR/$FNAME.hsail $TMPDIR/$FNAME.opt.bc

Weirdly, after changing the name of the in argument of Op() to in2, the error disappears. The following kernel compilers successfully.

__attribute__((noinline)) static uint Op(uint in2)
{
    return 2*in2;
}

__kernel void Bug1(uint in, __global uint *out)
{
    *out = Op(in);
}

Three nested loops result in erroneous control flow

While testing simple OpenCL image kernels through the HSAIL LLVM I noticed that three nested loops cause the outer loop to branch incorrectly. More specifically, the loop variable is incremented every other iteration of the outer loop. Attached is a simple OpenCL kernel that reproduces the issue. The issue persists up to -O2, but is not present when using -O3.

For this kernel, the HSAIL LLVM create the following outer loop structure (complete HSAIL attached):

…
@BB0_2:
    mov_b32 $s5, 0;
    not_b1  $c1, $c1;
    cbr_b1  $c1, @BB0_3;
    br  @BB0_7;
…
@BB0_7:
    add_u32 $s4, $s4, 1;
    cmp_ne_b1_s32   $c2, $s4, $s0;
    cbr_b1  $c2, @BB0_2;

The boolean c1 is negated on each iteration, but only every other iteration increments the outer loop variable $s4. It seems that the not clause is related to a check that asserts that the outer loop is executed at all (in this case, that x_size > -1).

Pull request #29 fixes this simple case, but more thorough control flow analysis is required to detect all variable uses inside loops.

kernel.zip

Crash in machine verifier

This IR crashes the machine verifier:

target datalayout = "e-p:32:32-p1:64:64:64-p2:64:64:64-p3:32:32:32-p4:64:64:64-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n32"
target triple = "hsail64-pc-unknown-amdopencl"

; Function Attrs: nounwind
define spir_kernel void @hsaPy_hsapy_devfn__5F__5F_main_5F__5F__2E_TestScan_2E_test_5F_single_5F_block_2E__3C_locals_3E__2E_scan_5F_block_24_4_2E_array_28_int64_2C__20_1d_2C__20_C_29__2E_array_28_int64_2C__20_1d_2C__20_C_29_(i8 addrspace(1)* nocapture readnone %dot.2, i64 %dot.3, i64 %dot.4, i64 addrspace(1)* nocapture %dot.5, i64 %dot.6, i64 %dot.7, i8 addrspace(1)* nocapture readnone %dot.8, i64 %dot.9, i64 %dot.10, i64 addrspace(1)* nocapture %dot.11, i64 %dot.12, i64 %dot.13) #0 {
.14:
ret void
}

; Function Attrs: nounwind
define linkonce_odr spir_func i32 @hsapy_devfn__5F__5F_main_5F__5F__2E_device_5F_scan_5F_generic_24_1_2E_int64_2E_array_28_int64_2C__20_1d_2C__20_C_29_(i64* nocapture %dot.ret, i64 %arg.tid, i8* %arg.data.0, i64 %arg.data.1, i64 %arg.data.2, i64 addrspace(4)* %arg.data.3, i64 %arg.data.4.0, i64 %arg.data.5.0) #0 {
entry:
br i1 undef, label %B40.lr.ph, label %B154

B40.lr.ph: ; preds = %entry
%dot.802 = or i64 undef, 1
br label %B40

B40: ; preds = %B130, %B40.lr.ph
br i1 undef, label %B65, label %B130

B65: ; preds = %B40
%dot.134 = add i64 undef, undef
br label %B130

B130: ; preds = %B65, %B40
%dot.198 = sdiv i64 undef, 2
br i1 undef, label %B40, label %B154.loopexit

B154.loopexit: ; preds = %B130
br label %B154

B154: ; preds = %B154.loopexit, %entry
%dot.261 = icmp eq i64 %arg.tid, 0
br i1 %dot.261, label %B206, label %B238.preheader

B206: ; preds = %B154
br label %B238.preheader

B238.preheader: ; preds = %B206, %B154
br i1 undef, label %B250.lr.ph, label %B384

B250.lr.ph: ; preds = %B238.preheader
%dot.377 = add i64 undef, 2
br label %B250

B250: ; preds = %B370, %B250.lr.ph
br i1 undef, label %B285, label %B370

B285: ; preds = %B250
br label %B370

B370: ; preds = %B285, %B250
br i1 undef, label %B250, label %B384.loopexit

B384.loopexit: ; preds = %B370
br label %B384

B384: ; preds = %B384.loopexit, %B238.preheader
ret i32 0
}

attributes #0 = { nounwind }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind readnone "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }

Empty functions with return types fail validation

If a function has a return type, and the body is only an unreachable instruction, it fails brig validation.

define float @empty_func_with_return_type() {
unreachable
}

produces:

Incompatible identifier declarations
LLVM ERROR:
Brig container validation has failed in BRIGAsmPrinter.cpp

Many bitselect patterns are ineffective

The current assortment of bitselect patterns do not work for simple permutations of the commutable operators.

Additionally, many of these patterns do not work with the pattern they recognize after instcombine.

For example:

define i32 @bitselect_pat4_rrr(i32 %src0, i32 %src1, i32 %src2) #0 {
  %tmp0 = and i32 %src0, %src2
  %tmp1 = xor i32 %src2, %tmp0
  %tmp2 = and i32 %src0, %src1
  %tmp3 = xor i32 %tmp1, %tmp2
  ret i32 %tmp3
}

This forms a bitselect, but the the instcombined version does not:

define i32 @bitselect_pat4_rrr(i32 %src0, i32 %src1, i32 %src2) #0 {
  %1 = xor i32 %src0, -1
  %tmp1 = and i32 %1, %src2
  %tmp2 = and i32 %src0, %src1
  %tmp3 = xor i32 %tmp1, %tmp2
  ret i32 %tmp3
}

opt crashes in SROA

This test case (https://gist.github.com/0bf0e257663a02890f36.git) will cause opt (hsail-stable-3.7) to crash.
It looks like to be an issue in SROA. For now, this problem could be worked around by disabling SROA with -use-new-sroa=false

opt -O3 -disable-simplify-libcalls -debug  -verify ./dump.linked.bc -o ./dump.opt.bc
CGSCCPASSMGR: SCC Refresh didn't change call graph. 
Inliner visiting SCC: opencl_frexp_global: 1 call sites. 
    Inlining: cost=always, Call:   %4 = call spir_func double @_Z5frexpdPU3AS4i(double %x, i32 addrspace(4)* %3)
CGSCCPASSMGR: Refreshing SCC with 1 nodes: 
Call graph node for function: 'opencl_frexp_global'<<0x4e1d920>>  #uses=1

CGSCCPASSMGR: SCC Refresh didn't change call graph. 
SROA function: opencl_frexp_global
SROA alloca:   %1 = alloca i32 addrspace(1)*, align 8
  Rewriting FCA loads and stores...
Slices of alloca:   %1 = alloca i32 addrspace(1)*, align 8
  [0,8) slice #0
    used by:   store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
  [0,8) slice #1
    used by:   %3 = load i32 addrspace(4)*, i32 addrspace(4)** %2, align 8
Pre-splitting loads and stores 
  Searching for candidate loads and stores 
Rewriting alloca partition [0,8) to:   %1 = alloca i32 addrspace(1)*, align 8
  rewriting [0,8) slice #0
    original:   store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
          to:   store i32 addrspace(1)* %exp, i32 addrspace(1)** %1, align 8
  rewriting [0,8) slice #1
    original:   %3 = load i32 addrspace(4)*, i32 addrspace(4)** %2, align 8
opt: /home/xkerox/docker_volume/HSA_1.0f/HLC-HSAIL-Development-LLVM.hsail-stable-3.7/HLC-HSAIL-Development-LLVM/lib/IR/Instructions.cpp:2257: static llvm::CastInst* llvm::CastInst::Create(llvm::Instructi
#0 0x23ea033 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/opt/amd/bin/opt+0x23ea033)
#1 0x23ea348 PrintStackTraceSignalHandler(void*) (/opt/amd/bin/opt+0x23ea348)
#2 0x23e8c80 SignalHandler(int) (/opt/amd/bin/opt+0x23e8c80)
#3 0x7f5785300340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#4 0x7f5784516cc9 gsignal /build/buildd/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#5 0x7f578451a0d8 abort /build/buildd/eglibc-2.19/stdlib/abort.c:91:0
#6 0x7f578450fb86 __assert_fail_base /build/buildd/eglibc-2.19/assert/assert.c:92:0
#7 0x7f578450fc32 (/lib/x86_64-linux-gnu/libc.so.6+0x2fc32)
#8 0x1f2349b llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::Instruction*) (/opt/amd/bin/opt+0x1f2349b)
#9 0x22f3185 llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >::CreateCast(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&) (/
#10 0x22ed966 llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >::CreateBitCast(llvm::Value*, llvm::Type*, llvm::Twine const&) (/opt/amd/bin/opt+0x22ed96
#11 0x22dd4bb convertValue(llvm::DataLayout const&, llvm::IRBuilder<true, llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter<true> >&, llvm::Value*, llvm::Type*) (/opt/amd/bin/opt+0x2
#12 0x22e11b6 (anonymous namespace)::AllocaSliceRewriter::visitLoadInst(llvm::LoadInst&) (/opt/amd/bin/opt+0x22e11b6)
#13 0x22f8a9f llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visitLoad(llvm::LoadInst&) (/opt/amd/bin/opt+0x22f8a9f)
#14 0x22f3bc5 llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visit(llvm::Instruction&) (/opt/amd/bin/opt+0x22f3bc5)
#15 0x22ee687 llvm::InstVisitor<(anonymous namespace)::AllocaSliceRewriter, bool>::visit(llvm::Instruction*) (/opt/amd/bin/opt+0x22ee687)
#16 0x22e0490 (anonymous namespace)::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*) (/opt/amd/bin/opt+0x22e0490)
#17 0x22e905f (anonymous namespace)::SROA::rewritePartition(llvm::AllocaInst&, (anonymous namespace)::AllocaSlices&, (anonymous namespace)::AllocaSlices::Partition&) (/opt/amd/bin/opt+0x22e905f)
#18 0x22e98c8 (anonymous namespace)::SROA::splitAlloca(llvm::AllocaInst&, (anonymous namespace)::AllocaSlices&) (/opt/amd/bin/opt+0x22e98c8)
#19 0x22ea3d5 (anonymous namespace)::SROA::runOnAlloca(llvm::AllocaInst&) (/opt/amd/bin/opt+0x22ea3d5)
#20 0x22eb1ed (anonymous namespace)::SROA::runOnFunction(llvm::Function&) (/opt/amd/bin/opt+0x22eb1ed)
#21 0x1f49bcd llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/amd/bin/opt+0x1f49bcd)
#22 0x1fc5323 (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) (/opt/amd/bin/opt+0x1fc5323)
#23 0x1fc64bf (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) (/opt/amd/bin/opt+0x1fc64bf)
#24 0x1fc682a (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (/opt/amd/bin/opt+0x1fc682a)
#25 0x1f4a0f6 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) (/opt/amd/bin/opt+0x1f4a0f6)
#26 0x1f4a873 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/amd/bin/opt+0x1f4a873)
#27 0x1f4aadd llvm::legacy::PassManager::run(llvm::Module&) (/opt/amd/bin/opt+0x1f4aadd)
#28 0xeb955e main (/opt/amd/bin/opt+0xeb955e)
#29 0x7f5784501ec5 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:321:0
#30 0xe96b69 _start (/opt/amd/bin/opt+0xe96b69)
Stack dump:
0.  Program arguments: /opt/amd/bin/opt -O3 -disable-simplify-libcalls -debug -verify ./dump.linked.bc -o ./dump.opt.bc 
1.  Running pass 'CallGraph Pass Manager' on module './dump.linked.bc'.
. Running pass 'SROA' on function '@opencl_frexp_global'

opt incorrectly hoists and replicates barrier

Before opt, notice the barrier is after label 17 which is a post-dom node

  %13 = icmp eq i32 %10, 0
  br i1 %13, label %14, label %17

; <label>:14                                      ; preds = %8
  %15 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 0
  store i32 123, i32 addrspace(3)* %15, align 4, !tbaa !8
  %16 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 1
  store i32 321, i32 addrspace(3)* %16, align 4, !tbaa !8
  br label %17

; <label>:17                                      ; preds = %14, %8
  tail call spir_func void @amp_barrier(i32 3) #17

After opt, the control flow has been transformed from a triangle into a diamond. The original barrier has been replicated and hosted into the branches, which is illegal:

  %12 = icmp eq i32 %9, 0
  br i1 %12, label %.thread, label %13

.thread:                                          ; preds = %8
  store i32 123, i32 addrspace(3)* getelementptr inbounds ([2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 0), align 4, !tbaa !8
  store i32 321, i32 addrspace(3)* getelementptr inbounds ([2 x i32], [2 x i32] addrspace(3)* @ZZ4mainEN3_EC__019__cxxamp_trampolineEPiiiiiiii.ZZZ4mainENK3_EC__0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEEE1t, i64 0, i64 1), align 4, !tbaa !8
  tail call spir_func void @__hsail_barrier() #4
  br label %"_ZZ4mainENK3$_0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEE.exit"

; <label>:13                                      ; preds = %8
  tail call spir_func void @__hsail_barrier() #4
  %14 = icmp eq i32 %9, 1
  br i1 %14, label %15, label %"_ZZ4mainENK3$_0clEN11Concurrency11tiled_indexILi2ELi0ELi0EEE.exit"

The test files are here: https://gist.github.com/scchan/b69e6fd4225815741415

Tested on hsa-1.0f, hsail-stable-3.7 and affects both branches

Kernel arguments passed by reference and passed by value are compiled into identical HSAIL code

I'm on the latest HLC compiler from the branch hsail-stable-3.7. The generated HSAIL code for the following two kernels is identical, even though the first kernel takes args by reference (pointer) and the second kernel takes args by value. I compiled both kernels with -O2.

First kernel:

struct KernArgs
{
    uint  Arg32;
    ulong Arg64;
};

__kernel void Main(__global ulong *res, const __global struct KernArgs *args)
{
    *res = (ulong)args->Arg32|args->Arg64;
}

Second kernel:

struct KernArgs
{
    uint  Arg32;
    ulong Arg64;
};

__kernel void Main(__global ulong *res, const struct KernArgs args)
{
    *res = (ulong)args.Arg32|args.Arg64;
}

Compiling both kernels yields identical HSAIL code:

module &__llvm_hsail_module:1:0:$full:$large:$near;

prog kernel &__OpenCL_Main_kernel(
    kernarg_u64 %__global_offset_0,
    kernarg_u64 %__global_offset_1,
    kernarg_u64 %__global_offset_2,
    kernarg_u64 %__printf_buffer,
    kernarg_u64 %__vqueue_pointer,
    kernarg_u64 %__aqlwrap_pointer,
    kernarg_u64 %res,
    kernarg_u64 %args)
{
    // BB#0:
    ld_kernarg_align(8)_width(all)_u64  $d0, [%args];
    ld_global_align(8)_u64  $d1, [$d0+8];
    ld_global_align(4)_u32  $s0, [$d0];
    cvt_u64_u32 $d0, $s0;
    or_b64  $d0, $d0, $d1;
    ld_kernarg_align(8)_width(all)_u64  $d1, [%res];
    st_global_align(8)_u64  $d0, [$d1];
    ret;
};

The LLVM IR for the two kernels is:

First kernel:

; ModuleID = '/tmp/cloc23414/bug.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
target triple = "spir64-unknown-unknown"

%struct.KernArgs = type { i32, i64 }

; Function Attrs: nounwind
define spir_kernel void @__OpenCL_Main_kernel(i64 addrspace(1)* %res, %struct.KernArgs addrspace(1)* %args) #0 {
  %1 = alloca i64 addrspace(1)*, align 8
  %2 = alloca %struct.KernArgs addrspace(1)*, align 8
  store i64 addrspace(1)* %res, i64 addrspace(1)** %1, align 8
  store %struct.KernArgs addrspace(1)* %args, %struct.KernArgs addrspace(1)** %2, align 8
  %3 = load %struct.KernArgs addrspace(1)*, %struct.KernArgs addrspace(1)** %2, align 8
  %4 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs addrspace(1)* %3, i32 0, i32 0
  %5 = load i32, i32 addrspace(1)* %4, align 4
  %6 = zext i32 %5 to i64
  %7 = load %struct.KernArgs addrspace(1)*, %struct.KernArgs addrspace(1)** %2, align 8
  %8 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs addrspace(1)* %7, i32 0, i32 1
  %9 = load i64, i64 addrspace(1)* %8, align 8
  %10 = or i64 %6, %9
  %11 = load i64 addrspace(1)*, i64 addrspace(1)** %1, align 8
  store i64 %10, i64 addrspace(1)* %11, align 8
  ret void
}

attributes #0 = { nounwind }

!opencl.kernels = !{!0}
!opencl.enable.FP_CONTRACT = !{}
!opencl.ocl.version = !{!6}
!opencl.spir.version = !{!6}

!0 = !{void (i64 addrspace(1)*, %struct.KernArgs addrspace(1)*)* @__OpenCL_Main_kernel, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 1}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"ulong*", !"struct KernArgs*"}
!4 = !{!"kernel_arg_type_qual", !"", !"const"}
!5 = !{!"kernel_arg_base_type", !"ulong*", !"struct KernArgs*"}
!6 = !{i32 2, i32 0}

Second kernel:

; ModuleID = '/tmp/cloc23566/bug.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
target triple = "spir64-unknown-unknown"

%struct.KernArgs = type { i32, i64 }

; Function Attrs: nounwind
define spir_kernel void @__OpenCL_Main_kernel(i64 addrspace(1)* %res, %struct.KernArgs* byval %args) #0 {
  %1 = alloca i64 addrspace(1)*, align 8
  store i64 addrspace(1)* %res, i64 addrspace(1)** %1, align 8
  %2 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs* %args, i32 0, i32 0
  %3 = load i32, i32* %2, align 4
  %4 = zext i32 %3 to i64
  %5 = getelementptr inbounds %struct.KernArgs, %struct.KernArgs* %args, i32 0, i32 1
  %6 = load i64, i64* %5, align 8
  %7 = or i64 %4, %6
  %8 = load i64 addrspace(1)*, i64 addrspace(1)** %1, align 8
  store i64 %7, i64 addrspace(1)* %8, align 8
  ret void
}

attributes #0 = { nounwind }

!opencl.kernels = !{!0}
!opencl.enable.FP_CONTRACT = !{}
!opencl.ocl.version = !{!6}
!opencl.spir.version = !{!6}

!0 = !{void (i64 addrspace(1)*, %struct.KernArgs*)* @__OpenCL_Main_kernel, !1, !2, !3, !4, !5}
!1 = !{!"kernel_arg_addr_space", i32 1, i32 0}
!2 = !{!"kernel_arg_access_qual", !"none", !"none"}
!3 = !{!"kernel_arg_type", !"ulong*", !"struct KernArgs"}
!4 = !{!"kernel_arg_type_qual", !"", !"const"}
!5 = !{!"kernel_arg_base_type", !"ulong*", !"struct KernArgs"}
!6 = !{i32 2, i32 0}

Mismatch of return value data type between function declaration and call-site

I'm on the latest HLC compiler from the branch hsail-stable-3.7.
Consider the following OpenCL program:

__attribute__((__noinline__)) bool IsEmpty()
{
    return true;
}

__kernel void Main()
{
    IsEmpty();
}

Compiling the program above with -O0 gives the following error:

The error is:

>         call    &IsEmpty (%IsEmpty) ();
>                          ^
input(27,17): Incompatible types of formal and actual arguments

ERROR:  The following command failed with return code 1.
        HSAILasm -o /tmp/hsa_finalizer-KenL4R/temp.hsail /tmp/cloc20050/temp.hsail

The corresponding intermediate assembly looks like:

module &__llvm_hsail_module:1:0:$full:$large:$near;

decl function &IsEmpty(arg_u32 %ret)();

function &IsEmpty(arg_u32 %IsEmpty)()
{

// BB#0:
    mov_b32    $s0, 1;
    st_arg_u32    $s0, [%IsEmpty];
    ret;
};

prog kernel &__OpenCL_Main_kernel(
    kernarg_u64 %__global_offset_0,
    kernarg_u64 %__global_offset_1,
    kernarg_u64 %__global_offset_2,
    kernarg_u64 %__printf_buffer,
    kernarg_u64 %__vqueue_pointer,
    kernarg_u64 %__aqlwrap_pointer)
{

    align(4) spill_u8 %__spillStack[4];
// BB#0:
    {
        arg_u8 %IsEmpty;
        call    &IsEmpty (%IsEmpty) ();
        ld_arg_u8    $s0, [%IsEmpty];
    }
    st_spill_align(4)_u32    $s0, [%__spillStack]; // 4-byte Folded Spill
    ret;
};

Apparently, the HSAIL code generator outputs an arg_u32 return value type for the IsEmpty() function, but the corresponding argument on the call-site is of type arg_u8.

Need to implement integer mad matching

The existing pattern was incorrect and required rewriting to make progress on instruction refactoring.

The old TernaryFusedShlAdd did (add (shl $src0, imm:$src1), $src2) -> mad $src1, $src2, $src3 where src2 is not 1 << $src2 like it needs to be.

Crash when trying to use __builtin_hsail_gridsize()

The test case is simple:

unsigned  get_global_size(unsigned dimindx)
{
  switch(dimindx)
    {
      case 0: return __builtin_hsail_gridsize(0);
      case 1: return __builtin_hsail_gridsize(1);
      case 2: return __builtin_hsail_gridsize(2);
      default: return 0;
    }
}

Trying to compile this with:

/me/LLVM_370_HSAIL_rwdi_NA_rtti/bin/clang-3.7 -cc1 -triple hsail64 -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -main-file-name b.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -no-integrated-as -mconstructor-aliases -dwarf-column-info -coverage-file /tmp/b.bc -resource-dir /home/LLVM_370_HSAIL_rwdi_NA_rtti/bin/../lib/clang/3.7.0 -fno-dwarf-directory-asm -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 160 -mstackrealign -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -ffake-address-space-map -o /tmp/b.bc -x c /tmp/b.c

.. results in a crash:

clang-3.7: /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/include/llvm/ADT/ArrayRef.h:187: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.

Program received signal SIGABRT, Aborted.

#4  0x00007ffff3893745 in llvm::ArrayRef<llvm::Type*>::operator[] (this=0x7fffffff6cf0, Index=0)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/include/llvm/ADT/ArrayRef.h:187
        __PRETTY_FUNCTION__ = "const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]"


#5  0x00007ffff383bc6f in DecodeFixedType (Infos=..., Tys=..., Context=...) at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:775
        D = {Kind = llvm::Intrinsic::IITDescriptor::Argument, {Integer_Width = 1, Float_Width = 1, Vector_Width = 1, Pointer_AddressSpace = 1, 
            Struct_NumElements = 1, Argument_Info = 1}}
        __PRETTY_FUNCTION__ = "llvm::Type* DecodeFixedType(llvm::ArrayRef<llvm::Intrinsic::IITDescriptor>&, llvm::ArrayRef<llvm::Type*>, llvm::LLVMContext&)"


#6  0x00007ffff383bff1 in llvm::Intrinsic::getType (Context=..., id=llvm::Intrinsic::hsail_gridsize, Tys=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:828
        Table = {<llvm::SmallVectorImpl<llvm::Intrinsic::IITDescriptor>> = {<llvm::SmallVectorTemplateBase<llvm::Intrinsic::IITDescriptor, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Intrinsic::IITDescriptor, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffff6e78, EndX = 0x7fffffff6e88, 
                  CapacityX = 0x7fffffff6eb8}, FirstEl = {<llvm::AlignedCharArray<4ul, 8ul>> = {
                    buffer = "\v\000\000\000\001\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
              {<llvm::AlignedCharArray<4ul, 8ul>> = {buffer = "\a\000\000\000 \000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "\001\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<4ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}}}}
        TableRef = {Data = 0x7fffffff6e80, Length = 1}
        ResultTy = 0x7fffffff68a0
        ArgTys = {<llvm::SmallVectorImpl<llvm::Type*>> = {<llvm::SmallVectorTemplateBase<llvm::Type*, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Type*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0xffffffff, EndX = 0x7fffffff6e30, CapacityX = 0x7ffff35ac4e0}, FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\220\265\334\363\377\177\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
              {<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\001\000\000\000\000\000\000"}, <No data fields>}}}}


#7  0x00007ffff383cf1e in llvm::Intrinsic::getDeclaration (M=0x6bf900, id=llvm::Intrinsic::hsail_gridsize, Tys=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/lib/IR/Function.cpp:870
No locals.


#8  0x00007fffefd2dce8 in clang::CodeGen::CodeGenModule::getIntrinsic (this=0x6d53b0, IID=1335, Tys=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:2564
No locals.


#9  0x00007fffefb1c08e in clang::CodeGen::CodeGenFunction::EmitBuiltinExpr (this=0x7fffffffac80, FD=0x711438, BuiltinID=1012, E=0x711620, ReturnValue=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGBuiltin.cpp:1803
        FTy = 0x7ffff357e4d8
        BuiltinRetType = {Value = {Value = 7071232}}
        Args = {<llvm::SmallVectorImpl<llvm::Value*>> = {<llvm::SmallVectorTemplateBase<llvm::Value*, true>> = {<llvm::SmallVectorTemplateCommon<llvm::Value*, v---Type <return> to continue, or q <return> to quit---
oid>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffff6fe8, EndX = 0x7fffffff6fe8, CapacityX = 0x7fffffff7068}, 
                FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\220\020i\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
              {<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = " p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "0p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\276\f.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "@p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "Pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\276\f.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "Pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\000\346k\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "pp\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\200p\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\026\004.\353\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                  buffer = "\200p\377\377\377\177\000"}, <No data fields>}}}}
        ICEArguments = 1
        Error = clang::ASTContext::GE_None
        F = 0x7ffff35b7430
        V = 0x3f400000000
        RetTy = 0x7ffff7deffb0 <_dl_runtime_resolve+80>
        Result = {<clang::Expr::EvalStatus> = {HasSideEffects = false, Diag = 0x0}, Val = {Kind = clang::APValue::Uninitialized, static DataSize = 48, 
            Data = {<llvm::AlignedCharArray<8ul, 48ul>> = {
                buffer = "`w\377\377\377\177\000\000\200w\377\377\377\177\000\000\360\273-\353\377\177\000\000\220w\377\377\377\177\000\000\000\000\000\000\000\000\000\000\240w\377\377\377\177\000"}, <No data fields>}}}
        __PRETTY_FUNCTION__ = "clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitBuiltinExpr(const clang::FunctionDecl*, unsigned int, const clang::CallExpr*, clang::CodeGen::ReturnValueSlot)"
        Name = 0x7ffff06479b5 "__builtin_hsail_gridsize"
        IntrinsicID = llvm::Intrinsic::hsail_gridsize

#10 0x00007fffefbff7f2 in clang::CodeGen::CodeGenFunction::EmitCallExpr (this=0x7fffffffac80, E=0x711620, ReturnValue=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExpr.cpp:3122
        builtinID = 1012
        FD = 0x711438
        TargetDecl = 0x711438
        Callee = 0x20

#11 0x00007fffefc3c7c8 in (anonymous namespace)::ScalarExprEmitter::VisitCallExpr (this=0x7fffffffa130, E=0x711620)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:327
        V = 0x711620

#12 0x00007fffefc4df53 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit (this=0x7fffffffa130, 
    S=0x711620) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:299
No locals.

#13 0x00007fffefc3be14 in (anonymous namespace)::ScalarExprEmitter::Visit (this=0x7fffffffa130, E=0x711620)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:201
        DL = {OriginalLocation = {Loc = {Ref = {MD = 0x0}}}, CGF = @0x7fffffffac80}

#14 0x00007fffefc42ac0 in (anonymous namespace)::ScalarExprEmitter::VisitCastExpr (this=0x7fffffffa130, CE=0x711650)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:1579
        E = 0x711620
        DestTy = {Value = {Value = 7073152}}
        Kind = clang::CK_IntegralCast
        __PRETTY_FUNCTION__ = "llvm::Value* {anonymous}::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*)"

#15 0x00007fffefc4efb7 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::VisitImplicitCastExpr (
    this=0x7fffffffa130, S=0x711650) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:393
No locals.

#16 0x00007fffefc4e073 in clang::StmtVisitorBase<clang::make_ptr, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit (this=0x7fffffffa130, 
    S=0x711650) at /home/b/tools/clang/include/clang/AST/StmtNodes.inc:393
No locals.

#17 0x00007fffefc3be14 in (anonymous namespace)::ScalarExprEmitter::Visit (this=0x7fffffffa130, E=0x711650)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:201
        DL = {OriginalLocation = {Loc = {Ref = {MD = 0x0}}}, CGF = @0x7fffffffac80}

#18 0x00007fffefc4cd8c in clang::CodeGen::CodeGenFunction::EmitScalarExpr (this=0x7fffffffac80, E=0x711650, IgnoreResultAssign=false)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGExprScalar.cpp:3475
        __PRETTY_FUNCTION__ = "llvm::Value* clang::CodeGen::CodeGenFunction::EmitScalarExpr(const clang::Expr*, bool)"

#19 0x00007fffefcd84ca in clang::CodeGen::CodeGenFunction::EmitReturnStmt (this=0x7fffffffac80, S=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1075
        RV = 0x711650
        cleanupScope = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0, OldDidCallStackSave = false, PerformCleanup = true, 
          CGF = @0x7fffffffac80}

#20 0x00007fffefcd4b68 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711668)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:136
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"

#21 0x00007fffefcd953c in clang::CodeGen::CodeGenFunction::EmitCaseStmt (this=0x7fffffffac80, S=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1297
        CaseVal = 0x7067b0
        CaseDest = 0x6dc3d0
        CurCase = 0x7113c8
        NextCase = 0x0

#22 0x00007fffefcd5287 in clang::CodeGen::CodeGenFunction::EmitSimpleStmt (this=0x7fffffffac80, S=0x7113c8)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:265
No locals.

#23 0x00007fffefcd4871 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x7113c8)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:50
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"

#24 0x00007fffefcd5469 in clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:293
        I = 0x711938
        E = 0x711958
        RetAlloca = 0x0

#25 0x00007fffefcd538d in clang::CodeGen::CodeGenFunction::EmitCompoundStmt (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:283
        CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7ffff08ec960 <vtable for clang::PrettyStackTraceLoc+16>, 
            NextEntry = 0x7fffffffb670}, SM = @0x6905e0, Loc = {ID = 1256}, Message = 0x7fffeff87228 "LLVM IR generation of compound statement ('{}')"}
        Scope = {<clang::CodeGen::CodeGenFunction::RunCleanupsScope> = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0, 
            OldDidCallStackSave = false, PerformCleanup = true, CGF = @0x7fffffffac80}, Range = {B = {ID = 1256}, E = {ID = 1437}}, 
          Labels = {<llvm::SmallVectorImpl<clang::LabelDecl const*>> = {<llvm::SmallVectorTemplateBase<clang::LabelDecl const*, true>> = {<llvm::SmallVectorTemplateCommon<clang::LabelDecl const*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffffa5c0, EndX = 0x7fffffffa5c0, CapacityX = 0x7fffffffa5e0}, 
                  FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
                      buffer = "\000\000\000\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
                {<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = " \031q\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\360\245\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\022M\306\357\377\177\000"}, <No data fields>}}}}, ParentScope = 0x0}

#26 0x00007fffefcd5175 in clang::CodeGen::CodeGenFunction::EmitSimpleStmt (this=0x7fffffffac80, S=0x711920)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:256
No locals.

#27 0x00007fffefcd4871 in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711920)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:50
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"

#28 0x00007fffefcda057 in clang::CodeGen::CodeGenFunction::EmitSwitchStmt (this=0x7fffffffac80, S=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:1599
        SavedSwitchInsn = 0x0
        SavedSwitchWeights = 0x0
        SavedCRBlock = 0x0
        ConstantCondValue = {<llvm::APInt> = {BitWidth = 1, {VAL = 0, pVal = 0x0}}, IsUnsigned = false}
        SwitchExit = {Block = 0x706610, ScopeDepth = {Size = 0}, Index = 2}
        ConditionScope = {CleanupStackDepth = {Size = 0}, LifetimeExtendedCleanupStackSize = 0, OldDidCallStackSave = false, PerformCleanup = true, 
          CGF = @0x7fffffffac80}
        CondV = 0x7066a8
        DefaultBlock = 0x6dc2f0
        OuterContinue = {Block = 0x0, ScopeDepth = {Size = -1}, Index = 0}
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitSwitchStmt(const clang::SwitchStmt&)"

#29 0x00007fffefcd4b8b in clang::CodeGen::CodeGenFunction::EmitStmt (this=0x7fffffffac80, S=0x711380)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:138
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenFunction::EmitStmt(const clang::Stmt*)"

#30 0x00007fffefcd5469 in clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope (this=0x7fffffffac80, S=..., GetLast=false, AggSlot=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CGStmt.cpp:293
        I = 0x7119b8
        E = 0x7119c0
        RetAlloca = 0x0

#31 0x00007fffefd1922a in clang::CodeGen::CodeGenFunction::EmitFunctionBody (this=0x7fffffffac80, Args=..., Body=0x7119a0)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenFunction.cpp:794
        S = 0x7119a0

#32 0x00007fffefd19b0c in clang::CodeGen::CodeGenFunction::GenerateCode (this=0x7fffffffac80, GD=..., Fn=0x6d49b8, FnInfo=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenFunction.cpp:915
        Body = 0x7119a0
        FD = 0x711258
        Args = {<llvm::SmallVector<clang::VarDecl const*, 16u>> = {<llvm::SmallVectorImpl<clang::VarDecl const*>> = {<llvm::SmallVectorTemplateBase<clang::VarDecl const*, true>> = {<llvm::SmallVectorTemplateCommon<clang::VarDecl const*, void>> = {<llvm::SmallVectorBase> = {BeginX = 0x7fffffffaac8, 
                    EndX = 0x7fffffffaad0, CapacityX = 0x7fffffffab48}, FirstEl = {<llvm::AlignedCharArray<8ul, 8ul>> = {
                      buffer = "(\tq\000\000\000\000"}, <No data fields>}}, <No data fields>}, <No data fields>}, Storage = {InlineElts = {
                {<llvm::AlignedCharArray<8ul, 8ul>> = {buffer = "\220\020i\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\260^m\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\253\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\275<\360\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\220\020i\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = " \336\377\377\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "M\226\336\367\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\001\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\000\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "@\000\000\000\000\000\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "p_\233\357\377\177\000"}, <No data fields>}, {<llvm::AlignedCharArray<8ul, 8ul>> = {
                    buffer = "\340\005i\000\000\000\000"}, <No data fields>}}}}, <No data fields>}
        ResTy = {Value = {Value = 7073152}}
        MD = 0x0
        BodyRange = {B = {ID = 1232}, E = {ID = 1439}}
        Loc = {ID = 1194}

#33 0x00007fffefd2d777 in clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition (this=0x6d53b0, GD=..., GV=0x6d49b8)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:2476
        D = 0x711258
        FI = @0x6dc870: {<llvm::FoldingSetImpl::Node> = {NextInFoldingSetBucket = 0x6d6121}, CallingConvention = 0, EffectiveCallingConvention = 0, 
          ASTCallingConvention = 0, InstanceMethod = 0, ChainCall = 0, NoReturn = 0, ReturnsRetained = 0, HasRegParm = 0, RegParm = 0, Required = {
            NumRequired = 4294967295}, ArgStruct = 0x0, NumArgs = 1}
        Ty = 0x6d6ea0
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*)"
        Fn = 0x6d49b8

#34 0x00007fffefd2a48d in clang::CodeGen::CodeGenModule::EmitGlobalDefinition (this=0x6d53b0, GD=..., GV=0x0)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:1531
        D = 0x711258
        CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7fffebbea520 <vtable for clang::PrettyStackTraceDecl+16>, 
            NextEntry = 0x7fffffffc210}, TheDecl = 0x711258, Loc = {ID = 1194}, SM = @0x6905e0, Message = 0x7fffeffc60f0 "Generating code for declaration"}

#35 0x00007fffefd29c9e in clang::CodeGen::CodeGenModule::EmitGlobal (this=0x6d53b0, GD=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:1387
        Global = 0x711258
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitGlobal(clang::GlobalDecl)"
        MangledName = {static npos = 18446744073709551615, Data = 0x7fffffffb740 "`\267\377\377\377\177", Length = 140737214714484}

#36 0x00007fffefd31290 in clang::CodeGen::CodeGenModule::EmitTopLevelDecl (this=0x6d53b0, D=0x711258)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenModule.cpp:3242
        __PRETTY_FUNCTION__ = "void clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*)"

#37 0x00007fffefe43075 in (anonymous namespace)::CodeGeneratorImpl::HandleTopLevelDecl (this=0x6bf6f0, DG=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/ModuleBuilder.cpp:130
        I = 0x7fffffffc1b0
        E = 0x7fffffffc1b8
        HandlingDecl = {Self = @0x6bf6f0}

#38 0x00007fffefd112ec in clang::BackendConsumer::HandleTopLevelDecl (this=0x6bf570, D=...)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/CodeGen/CodeGenAction.cpp:111
        CrashInfo = {<llvm::PrettyStackTraceEntry> = {_vptr.PrettyStackTraceEntry = 0x7fffebbea520 <vtable for clang::PrettyStackTraceDecl+16>, 
            NextEntry = 0x7fffffffc2a0}, TheDecl = 0x711258, Loc = {ID = 0}, SM = @0x6905e0, Message = 0x7fffeffb42c0 "LLVM IR generation of declaration"}

#39 0x00007fffe9e7e069 in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=false)
    at /home/devel/a/hsaf/HLC-HSAIL-Development-LLVM/tools/clang/lib/Parse/ParseAST.cpp:142
        OldCollectStats = false
        Consumer = 0x6bf570
        ParseOP = std::unique_ptr<clang::Parser> containing 0x6d0a10
        P = @0x6d0a10: {<clang::CodeCompletionHandler> = {_vptr.CodeCompletionHandler = 0x7fffea19fb18 <vtable for clang::Parser+16>}, PP = @0x691090, Tok = {
            Loc = 1440, UintData = 0, PtrData = 0x0, Kind = clang::tok::eof, Flags = 0}, PrevTokLocation = {ID = 1439}, ParenCount = 0, BracketCount = 0, 
          BraceCount = 0, Actions = @0x6cdd60, Diags = @0x68f260, NumCachedScopes = 3, ScopeCache = {0x6bf260, 0x6f9d70, 0x6f9f60, 0x0 <repeats 13 times>}, 
          Ident__exception_code = 0x0, Ident___exception_code = 0x0, Ident_GetExceptionCode = 0x0, Ident__exception_info = 0x0, Ident___exception_info = 0x0, 
          Ident_GetExceptionInfo = 0x0, Ident__abnormal_termination = 0x0, Ident___abnormal_termination = 0x0, Ident_AbnormalTermination = 0x0, 
          Ident__except = 0x0, Ident_sealed = 0x0, Ident_super = 0x6cdaf8, Ident_vector = 0x0, Ident_bool = 0x0, Ident_pixel = 0x0, Ident_instancetype = 0x0, 
          Ident_introduced = 0x0, Ident_deprecated = 0x0, Ident_obsoleted = 0x0, Ident_unavailable = 0x0, Ident_message = 0x0, Ident_final = 0x0, 
          Ident_override = 0x0, 

Re-implement analyzeCompare and optimizeCompareInstr

This code has a few problems, so I've had to remove it. These were renamed and the parameters changed from a previous version of LLVM, so these have been dead for an known amount of time. It's also not clear if these are used for the intended purpose.

analyzeCompare:

  • was always returning true, regardless of whether it set the required values.
  • Only set SrcReg, never the mask or value
    • Was checking the wrong operand for the source operand registers, so this probably never worked correctly.
    • Was ignoring the second operand

optimizeCompareInstr's comment says it

// Simplify constructions like this:
// cmp_gt $c0, $s2, $s1;
// cmov $s1, $c0, -1, 0;
// cmp_eq $c0, $s1, 0;
//
// Instead we should get:
// not cmp_gt $c0, $s2, $s1;

This should be handled by the DAG combiner already, but for some reason is not. My guess is this is not handled now because of one of the many places that assume ZeroOrOneBooleanContent in the setcc optimizations. This also did not handle fixing this situation if the cmov's immediate operands were swapped to false, true.

Alignment for LD/ST instructions may exceed HSAIL limit (BRIG_ALIGNMENT_256)

I'm on the latest HLC compiler from the branch hsail-stable-3.7. I compiled with -O2.

The following kernel doesn't assemble after compilation, because the alignment info generated by the code generator can exceed HSAIL's limit of BRIG_ALIGNMENT_256:

__kernel void Bug(size_t index)
{
    __global uint *ptr = (__global uint *)(index*512);
    *ptr = 0xdeadc0de;
}

The emitted error is:

>   st_global_align(512)_u32    3735929054, [$d0];
>                   ^
input(16,18): Invalid alignment

ERROR:  The following command failed with return code 1.
        HSAILasm -o /tmp/hsa_finalizer-qVC9u8/temp.hsail /tmp/cloc31620/temp.hsail

I think all that needs to be done is to limit the alignment info produced by the code generator to BRIG_ALIGNMENT_256.

Mixed type operands for fcopysign do not work

fcopysign is allowed to have different operand types, and the sign will still be copied. e.g. the sign from a float value can be copied to a double value. This does not currently work with HSAIL.

support for hsail intrinsics

Could you add support for the following intrinsics? hsail-stable-3.7 and newer would be fine, thanks

call  &__hsail_sqrt_fc_f32 (%__hsail_sqrt_fc_f32) (%__param_p0, %__param_p1);
call  &__hsail_div_fc_f32 (%__hsail_div_fc_f32) (%__param_p0, %__param_p1, %__param_p2);
call  &__hsail_mul_f64 (%__hsail_mul_f64) (%__param_p0, %__param_p1);

Use of global variable aliases does not work

Testcase in test/CodeGen/HSAIL/global-variable-alias.ll

@0 = addrspace(2) global [4 x i32] [ i32 5, i32 4, i32 432, i32 3 ]

@alias = alias [4 x i32] addrspace(2)* @0

define i32 @use_alias_gv() nounwind {
%gep = getelementptr [4 x i32] addrspace(2)* @alias, i32 0, i32 1
%load = load i32 addrspace(2)* %gep
ret i32 %load
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.