dependablesystemslab / lltfi

LLTFI is an extension of LLFI that lets users run fault injection experiments on C/C++, TensorFlow and PyTorch applications at the LLVM IR level. NOTE: If you publish a paper using LLTFI, please cite the LLTFI paper.

Home Page: https://blogs.ubc.ca/dependablesystemslab/2022/07/29/lltfi-framework-agnostic-fault-injection-for-machine-learning-applications/

License: Apache License 2.0

CMake 0.16% Shell 1.61% Python 15.90% Java 21.40% C++ 41.18% C 11.38% Makefile 0.07% HTML 0.08% Batchfile 0.01% GAP 0.37% TeX 0.18% JavaScript 2.49% CSS 0.28% Dockerfile 0.13% PureBasic 0.01% Jupyter Notebook 4.75%

lltfi's People

hjiang13 anushreebannadabhavi 440791217 abrahamchan navin-mohan yaoyunzhou manisadati abishe8 ranbir-sharma zebron22

lltfi's Issues

Fix hard-coded paths in build scripts of ML models

LLTFI/sample_programs/ml_sample_programs/nlp_models/bert/compile.sh

Line 20 in 88458b9

 clang++ -o model.exe model.o -L/home/LLTFI/LLTFI/build/bin/../runtime_lib -lllfi-rt -lpthread -L /Debug/lib -Wl,-rpath /home/LLTFI/LLTFI/build/bin/../runtime_lib -I$ONNX_MLIR_SRC/include -O0 -lonnx_proto -lprotobuf -lcruntime -ljson-c 

Unexpected behavior with AVX-512

I observed that compiling certain applications with both the AVX-512 extensions and LLTFI leads to unexpected behavior. In the current setup we select only the fadd, fmul, fsub and fdiv instructions, and we run LLTFI's faultinjectionpass last in the LLVM optimization pipeline. The issue first appeared with QMCPack v3.12.0, compiled in full double precision and with AVX-512. With this configuration, I observed odd behavior (e.g., random crashes, memory leaks) even when not injecting any fault, suggesting some form of register corruption caused by LLTFI's injectFault IR functions.

I then managed to reproduce the issue by creating a small test program as follows:

#include "Sample_vector_util.h"

int main(int argc, char** argv)
{
    int i=0, n=32;
    FLOAT *a, *b, *c;
    a = (FLOAT*)malloc(sizeof(FLOAT)*n);
    b = (FLOAT*)malloc(sizeof(FLOAT)*n);
    c = (FLOAT*)malloc(sizeof(FLOAT)*n);

    for(i=0; i<n; i++)
    {
        a[i] = 1;
        b[i] = i;
        c[i] = 0;
     }

    multiplyVec(a, b, c, n);
    printVec(c, n);

    free(a);
    free(b);
    free(c);
    return 0;
}

The printVec and multiplyVec functions are declared in the Sample_vector_util.h header:

#include<stdio.h>
#include<stdlib.h>

#define FLOAT double

void multiplyVec(FLOAT* a, FLOAT* b, FLOAT* c, int n);
void printVec(FLOAT* c, int n);

And defined in Sample_vector_util.c as follows:

#include "Sample_vector_util.h"

void multiplyVec(FLOAT* a, FLOAT* b, FLOAT* c, int n)
{
    int i=0;
#pragma clang loop vectorize_width(8)
    for(i=0;i<n;i++)
    {
        c[i] = a[i] * b[i];
    }
}

void printVec(FLOAT* c, int n)
{
    int i=0;
    for(i=0; i<n; i++)
    {
        printf("c[%d] = %f\n", i, c[i]);
    }
}

We are vectorizing the for loop within the multiplyVec function, and setting vectorize_width explicitly to 8 to enforce the full AVX-512 vector width. We then compile with -march=native. The generated LLVM IR does not suggest anything odd - here is a small excerpt of the loop body:

  %wide.load = load <8 x double>, ptr %5, align 8, !dbg !42, !tbaa !44, !alias.scope !48, !llfi_index !51  
[...]
  %9 = getelementptr inbounds double, ptr %b, i64 %index, !dbg !58, !llfi_index !59
  %wide.load20 = load <8 x double>, ptr %9, align 8, !dbg !58, !tbaa !44, !alias.scope !60, !llfi_index !62
[...]  
  %13 = fmul <8 x double> %wide.load, %wide.load20, !dbg !69, !llfi_index !70
  %fi3 = call <8 x double> @Sample_vector_util_temp.ll_injectFault1(i64 45, <8 x double> %13, i32 18, i32 0, i32 1, i32 0, ptr @fmul_namestr), !llfi_injectfault !71

And here is the definition of the injectFault function itself:

define <8 x double> @Sample_vector_util_temp.ll_injectFault1(i64 %0, <8 x double> %1, i32 %2, i32 %3, i32 %4, i32 %5, ptr %6) {
entry:
  %tmploc = alloca <8 x double>, align 64
  store <8 x double> %1, ptr %tmploc, align 64
  %pre_cond = call i1 @preFunc(i64 %0, i32 %2, i32 %3, i32 %4)
  br i1 %pre_cond, label %inject, label %exit

inject:                                           ; preds = %entry
  %tmploc_cast = bitcast ptr %tmploc to ptr
  call void @injectFunc(i64 %0, i32 512, ptr %tmploc_cast, i32 %3, i32 %5, ptr %6)
  br label %exit

exit:                                             ; preds = %inject, %entry
  %updateval = load <8 x double>, ptr %tmploc, align 64
  ret <8 x double> %updateval
}

However, when running the compiled program, the output (which should show consecutive numbers from 0 to 31) is incorrect:

c[0] = 0.000000
c[1] = 1.000000
c[2] = 2.000000
c[3] = 3.000000
c[4] = 4.000000
c[5] = 5.000000
c[6] = 0.000000
c[7] = 0.000000
c[8] = 8.000000
c[9] = 9.000000
c[10] = 10.000000
c[11] = 11.000000
c[12] = 12.000000
c[13] = 13.000000
c[14] = 0.000000
c[15] = 0.000000
c[16] = 16.000000
c[17] = 17.000000
c[18] = 18.000000
c[19] = 19.000000
c[20] = 20.000000
c[21] = 21.000000
c[22] = 0.000000
c[23] = 0.000000
c[24] = 24.000000
c[25] = 25.000000
c[26] = 26.000000
c[27] = 27.000000
c[28] = 28.000000
c[29] = 29.000000
c[30] = 0.000000
c[31] = 0.000000

Essentially, the last 2 elements in each 8-element-wide fmul are corrupted. When using a vectorization width of 4 (i.e., standard AVX), the output is correct.

GenllfiIndex for parallel compilation

Add FI in ML programs using GUI

free():invalid pointer

When I run the instrument script on /lltfi/sample_programs/factorial.ll, the following error occurs:

`free(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/hailong/llvm-project/build/bin/opt -load /home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so -genllfiindexpass -enable-new-pm=0 -o /home/hailong/LLTFI/BUILD/lltfi/sample_programs/sum/llfi/sum-llfi_index.ll /home/hailong/LLTFI/BUILD/lltfi/sample_programs/sum/sum.ll -S
#0 0x00007f7d04c10374 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#1 0x00007f7d04c0dc14 SignalHandler(int) Signals.cpp:0:0
#2 0x00007f7d0467f090 (/lib/x86_64-linux-gnu/libc.so.6+0x43090)
#3 0x00007f7d0467f00b raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
#4 0x00007f7d0465e859 abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81:7
#5 0x00007f7d046c926e __libc_message /build/glibc-SzIz7B/glibc-2.31/libio/../sysdeps/posix/libc_fatal.c:155:5
#6 0x00007f7d046d12fc /build/glibc-SzIz7B/glibc-2.31/malloc/malloc.c:5348:3
#7 0x00007f7d046d2b2c _int_free /build/glibc-SzIz7B/glibc-2.31/malloc/malloc.c:4173:5
#8 0x00007f7d049712af std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_assign(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (/lib/x86_64-linux-gnu/libstdc++.so.6+0x1432af)
#9 0x00007f7d035c53ae void llvm::cl::initializer<char [38]>::apply<llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >(llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >&) const (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb83ae)
#10 0x00007f7d035c51df void llvm::cl::applicator<llvm::cl::initializer<char [38]> >::opt<llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >(llvm::cl::initializer<char [38]> const&, llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >&) (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb81df)
#11 0x00007f7d035c4f3a void llvm::cl::apply<llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, llvm::cl::initializer<char [38]> >(llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, llvm::cl::initializer<char [38]> const&) (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb7f3a)
#12 0x00007f7d035c4ab0 void llvm::cl::apply<llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, llvm::cl::desc, llvm::cl::initializer<char [38]> >(llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, llvm::cl::desc const&, llvm::cl::initializer<char [38]> const&) (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb7ab0)
#13 0x00007f7d035c43bb void llvm::cl::apply<llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, char [28], llvm::cl::desc, llvm::cl::initializer<char [38]> >(llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, char const (&) [28], llvm::cl::desc const&, llvm::cl::initializer<char [38]> const&) (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb73bb)
#14 0x00007f7d035c3b84 llvm::cl::opt<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::opt<char [28], llvm::cl::desc, llvm::cl::initializer<char [38]> >(char const (&) [28], llvm::cl::desc const&, llvm::cl::initializer<char [38]> const&) (/home/hailong/LLTFI/BUILD/lltfi/bin/../llvm_passes/llfi-passes.so+0xb6b84)
#15 0x00007f7d035c2492 __static_initialization_and_destruction_0(int, int) SoftwareFailureAutoScanPass.cpp:0:0
#16 0x00007f7d035c2555 _GLOBAL__sub_I_SoftwareFailureAutoScanPass.cpp SoftwareFailureAutoScanPass.cpp:0:0
#17 0x00007f7d07750b9a call_init /build/glibc-SzIz7B/glibc-2.31/elf/dl-init.c:71:19
#18 0x00007f7d07750ca1 _dl_init /build/glibc-SzIz7B/glibc-2.31/elf/dl-init.c:118:9
#19 0x00007f7d0479c985 _dl_catch_exception /build/glibc-SzIz7B/glibc-2.31/elf/dl-error-skeleton.c:184:18
#20 0x00007f7d0775543d dl_open_worker /build/glibc-SzIz7B/glibc-2.31/elf/dl-open.c:763:5
#21 0x00007f7d0479c928 _dl_catch_exception /build/glibc-SzIz7B/glibc-2.31/elf/dl-error-skeleton.c:209:18
#22 0x00007f7d0775460a _dl_open /build/glibc-SzIz7B/glibc-2.31/elf/dl-open.c:837:17
#23 0x00007f7d03a1934c dlopen_doit /build/glibc-SzIz7B/glibc-2.31/dlfcn/dlopen.c:66:13
#24 0x00007f7d0479c928 _dl_catch_exception /build/glibc-SzIz7B/glibc-2.31/elf/dl-error-skeleton.c:209:18
#25 0x00007f7d0479c9f3 _dl_catch_error /build/glibc-SzIz7B/glibc-2.31/elf/dl-error-skeleton.c:228:12
#26 0x00007f7d03a19b59 _dlerror_run /build/glibc-SzIz7B/glibc-2.31/dlfcn/dlerror.c:174:40
#27 0x00007f7d03a193da dlopen /build/glibc-SzIz7B/glibc-2.31/dlfcn/dlopen.c:87:51
#28 0x00007f7d04bf0cb8 llvm::sys::DynamicLibrary::getPermanentLibrary(char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) (/home/hailong/llvm-project/build/bin/../lib/libLLVMSupport.so.15git+0x1cacb8)
#29 0x00007f7d04b4c2c3 llvm::PluginLoader::operator=(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) (/home/hailong/llvm-project/build/bin/../lib/libLLVMSupport.so.15git+0x1262c3)
#30 0x000055a3c6de01f1 llvm::cl::opt<llvm::PluginLoader, false, llvm::cl::parser<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >::handleOccurrence(unsigned int, llvm::StringRef, llvm::StringRef) (/home/hailong/llvm-project/build/bin/opt+0x341f1)
#31 0x00007f7d04ad4777 ProvideOption(llvm::cl::Option, llvm::StringRef, llvm::StringRef, int, char const* const*, int&) CommandLine.cpp:0:0
#32 0x00007f7d04ae4292 llvm::cl::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*, char const*, bool) (/home/hailong/llvm-project/build/bin/../lib/libLLVMSupport.so.15git+0xbe292)
#33 0x000055a3c6dc9853 main (/home/hailong/llvm-project/build/bin/opt+0x1d853)
#34 0x00007f7d04660083 __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:342:3
#35 0x000055a3c6dcc6ae _start (/home/hailong/llvm-project/build/bin/opt+0x206ae)

ERROR: there was an error during running the instrumentation pass, please follow the provided instructions for instrument.`

llfi.index.map.txt maps N/A to LLVM IR instructions during tracing

When tracing is enabled, LLTFI generates a file named llfi.index.map.txt. This file should map each LLVM IR instruction ID to its line number in the original source code. However, every LLVM IR instruction maps to line_N/A instead.
Additionally, neither the README nor the Wiki explains what this file does.

Example file contents for llfi.index.map.txt

llfiID_34 line_N/A
llfiID_33 line_N/A
llfiID_32 line_N/A
llfiID_31 line_N/A
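
To gauge how widespread the problem is in a given run, the file can be scanned for unmapped entries. The following is a minimal sketch that assumes every line has the two-column form shown above; the helper name `unmapped_ids` and the `line_12` entry in the usage example are hypothetical, since the issue only shows unmapped lines.

```python
def unmapped_ids(lines):
    """Return the LLFI IDs whose source line is reported as N/A."""
    unmapped = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        llfi_id, location = parts
        if location == "line_N/A":
            unmapped.append(llfi_id)
    return unmapped


if __name__ == "__main__":
    sample = ["llfiID_34 line_N/A", "llfiID_33 line_N/A", "llfiID_32 line_12"]
    print(unmapped_ids(sample))
```

In practice the input would come from reading llfi.index.map.txt line by line.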

onnx-mlir failed with rnn-mnist benchmark

Hi,

With the rnn-mnist benchmark (https://github.com/abrahamchan/LLFI-TF/blob/udit_benchmarks/sample_programs/rnn-mnist/rnn-mnist.py), onnx-mlir failed with the following error message:

$LLFI-TF/sample_programs/rnn-mnist$ onnx-mlir --EmitLLVMIR model.onnx
Loop op doesn't support dynamic dimensions for scan output.
UNREACHABLE executed at /home/uditg/UBC/onnx-mlir/src/Conversion/ONNXToKrnl/ControlFlow/Loop.cpp:254!
Aborted (core dumped)

Cleanup the Wiki

Add VectorNet Benchmark

@AnushreeBannadabhavi We should also try LLTFI on VectorNet, as it is used in Apollo Baidu ADS.
Here's the PyTorch implementation of it: https://github.com/Liang-ZX/VectorNet

Migrate, document, and test the code for ML-level error propagation tracing

Problem during installation

Hi Dr. Karthik,
Thank you for considering my request. I run into errors when I try to install LLTFI. I followed the Auto-Installer instructions step by step and met all the dependencies. I would appreciate your guidance in solving this problem. I have attached some screenshots taken during the installation process to this issue.

My device properties:

CPU: Intel Core i3
Hard Disk: SSD, 256 GB
Memory: 12 GB
OS: Ubuntu 20.04.6 LTS

Dependencies:

64 Bit Machine (preferably with GPU for faster training of ML programs)
64 bit Linux (Ubuntu 20.04) or OS X
CMake (minimum v3.15)
Python 3 and above
Ninja >= 1.10.2
Internet Connection

Usage:

Copy the InstallLLTFI.py script to the directory where you want to build LLTFI. Run "python3 InstallLLTFI.py -h" to see all running options/guidelines
Run "python3 InstallLLTFI.py"

Most tests in test_suite fail

Hello author,
After building the project, I ran the "Running tests" step in the build directory and found that most of the tests failed. Could you please point out where the problem is?

check_injection.py:67: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
suite = yaml.load(f)
check_injection.py:16: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config_dict = yaml.load(inputyaml)
=============== Result ===============
./SoftwareFaults/BufferOverflow_API FAIL: No ./llfi folder found!
./SoftwareFaults/BufferOverflowMalloc_Data FAIL: No ./llfi folder found!
./SoftwareFaults/BufferOverflowMemmove_Data FAIL: No ./llfi folder found!
./SoftwareFaults/BufferUnderflow_API FAIL: No ./llfi folder found!
./SoftwareFaults/CPUHog_Res FAIL: No ./llfi folder found!
./SoftwareFaults/DataCorruption_Data FAIL: No ./llfi folder found!
./SoftwareFaults/Deadlock_Res FAIL: No ./llfi folder found!
./SoftwareFaults/HighFrequentEvent_Timing FAIL: No ./llfi folder found!
./SoftwareFaults/InappropriateClose_API FAIL: No ./llfi folder found!
./SoftwareFaults/IncorrectOutput_API FAIL: No ./llfi folder found!
./SoftwareFaults/IncorrectOutput_Data FAIL: No ./llfi folder found!
./SoftwareFaults/InvalidMessage_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/InvalidPointer_Res FAIL: No ./llfi folder found!
./SoftwareFaults/InvalidSender_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/LowMemory_Res FAIL: No ./llfi folder found!
./SoftwareFaults/MemoryExhaustion_Res FAIL: No ./llfi folder found!
./SoftwareFaults/MemoryLeak_Res FAIL: No ./llfi folder found!
./SoftwareFaults/NoAck_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/NoClose_API FAIL: No ./llfi folder found!
./SoftwareFaults/NoDrain_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/NoMessage_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/NoOpen_API FAIL: No ./llfi folder found!
./SoftwareFaults/NoOutput_API FAIL: No ./llfi folder found!
./SoftwareFaults/NoOutput_Data FAIL: No ./llfi folder found!
./SoftwareFaults/PacketStorm_MPI FAIL: No ./llfi folder found!
./SoftwareFaults/RaceCondition_Timing FAIL: No ./llfi folder found!
./SoftwareFaults/StalePointer_Res FAIL: No ./llfi folder found!
./SoftwareFaults/ThreadKiller_Res FAIL: No ./llfi folder found!
./SoftwareFaults/UnderAccumulator_Res FAIL: No ./llfi folder found!
./SoftwareFaults/WrongAPI_API FAIL: No ./llfi folder found!
./SoftwareFaults/WrongDestination_Data FAIL: No ./llfi folder found!
./SoftwareFaults/WrongMode_API FAIL: No ./llfi folder found!
./SoftwareFaults/WrongPointer_Data FAIL: No ./llfi folder found!
./SoftwareFaults/WrongRetrievedAddress_IO FAIL: No ./llfi folder found!
./SoftwareFaults/WrongRetrievedFormat_IO FAIL: No ./llfi folder found!
./SoftwareFaults/WrongSavedAddress_IO FAIL: No ./llfi folder found!
./SoftwareFaults/WrongSavedFormat_IO FAIL: No ./llfi folder found!
./SoftwareFaults/WrongSource_Data FAIL: No ./llfi folder found!
./HardwareFaults/funcname PASS
./HardwareFaults/insttype PASS
./HardwareFaults/llfiindex PASS
./HardwareFaults/random PASS
./HardwareFaults/tracing PASS
./HardwareFaults/multiplebits PASS
./BatchMode/NoOpen_API_WrongMode_API_BufferUnderflow_API FAIL: No ./llfi folder found!
./BatchMode/SoftwareFailureAutoScan Subdirectories for failure modes not found!

Register LLTFI passes at last in the optimization pipeline

registerOptimizerLastEPCallback()

Add GPT-bs benchmark

Add this NLP benchmark:
https://github.com/onnx/models/tree/main/text/machine_comprehension/gpt2-bs

Improve fault injection time of LLTFI through static linking

Currently, LLTFI takes a very long time to do fault injection in NLP models (Bert: 890 s, GPT: 60 s). For GPT, there are 1.4B LLFI cycles, i.e., 1.4B calls into LLFI's shared library. How about we statically link the fault injection library in these cases? This should significantly reduce the fault injection time at the expense of increased binary size.

Remove JAVA GUI + remove it from the README + Build scripts

std::logic_error thrown during LLFI's instrumentation pass

Hi,

I got the following error while instrumenting mantevo-hpccg benchmark:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/uditg/UBC/llvm-project/build/bin/opt -load /home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_pas
ses/llfi-passes.so -profilingpass -enable-new-pm=0 -insttype -includeinst=all -excludeinst=ret -regloc -dstreg -includef
orwardtrace -includebackwardtrace -o /home/uditg/UBC/LLFI-TF/Benchmarks/mantevo-hpccg/build/llfi/hpccg-profiling.ll /hom
e/uditg/UBC/LLFI-TF/Benchmarks/mantevo-hpccg/build/llfi/hpccg-llfi_index.ll -S
1.      Running pass 'Profiling pass' on module '/home/uditg/UBC/LLFI-TF/Benchmarks/mantevo-hpccg/build/llfi/hpccg-llfi_
index.ll'.
 #0 0x00005613d78f18c0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/uditg/UBC/llvm-project/llvm/lib/Support
/Unix/Signals.inc:565:0
 #1 0x00005613d78f1977 PrintStackTraceSignalHandler(void*) /home/uditg/UBC/llvm-project/llvm/lib/Support/Unix/Signals.in
c:632:0
 #2 0x00005613d78ef62b llvm::sys::RunSignalHandlers() /home/uditg/UBC/llvm-project/llvm/lib/Support/Signals.cpp:76:0
 #3 0x00005613d78f1241 SignalHandler(int) /home/uditg/UBC/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:0
 #4 0x00007fae660a1980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007fae64d52fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007fae64d54921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007fae65747957 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x8c957)
 #8 0x00007fae6574dae6 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x92ae6)
 #9 0x00007fae6574db21 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x92b21)
#10 0x00007fae6574dd54 (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x92d54)
#11 0x00007fae6574979f (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x8e79f)
#12 0x00005613d5a0d057 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construc
t<char const*>(char const*, char const*, std::forward_iterator_tag) /usr/include/c++/7/bits/basic_string.tcc:215:0
#13 0x00007fae64ac4f9d llfi::demangleFuncName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<ch
ar> >) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/llfi-passes.so+0x96f9d)
#14 0x00007fae64acc2ac llfi::Controller::getModuleFuncs(llvm::Module&) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/
llfi-passes.so+0x9e2ac)
#15 0x00007fae64acc348 llfi::Controller::init(llvm::Module&) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/llfi-passe
s.so+0x9e348)
#16 0x00007fae64acd9db llfi::Controller::Controller(llvm::Module&) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/llfi
-passes.so+0x9f9db)
#17 0x00007fae64acc672 llfi::Controller::getInstance(llvm::Module&) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/llf
i-passes.so+0x9e672)
#18 0x00007fae64ae2b38 llfi::ProfilingPass::runOnModule(llvm::Module&) (/home/uditg/UBC/LLFI-TF/LLFI/bin/../llvm_passes/
llfi-passes.so+0xb4b38)
#19 0x00005613d6d5f181 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/uditg/UBC/llvm-project/llv
m/lib/IR/LegacyPassManager.cpp:1554:0
#20 0x00005613d6d5a24d llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/uditg/UBC/llvm-project/llvm/lib/IR/Legacy
PassManager.cpp:542:0
#21 0x00005613d6d5fa0b llvm::legacy::PassManager::run(llvm::Module&) /home/uditg/UBC/llvm-project/llvm/lib/IR/LegacyPass
Manager.cpp:1682:0
#22 0x00005613d5a29480 main /home/uditg/UBC/llvm-project/llvm/tools/opt/opt.cpp:1076:0
#23 0x00007fae64d35bf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#24 0x00005613d59efeca _start (/home/uditg/UBC/llvm-project/build/bin/opt+0x18dfeca)

ERROR: there was an error during running the instrumentation pass, please follow the provided instructions for instrument.

Uninitialized pointer variable in GenLLFIIndexPass.cpp may lead to segfaults

In the runOnModuleMain() function in llvm_passes/core/GenLLFIIndexPass.cpp, an Instruction pointer variable is declared as follows:

LLTFI/llvm_passes/core/GenLLFIIndexPass.cpp

Line 12 in 88458b9

Instruction *currinst;

In some cases, the for loop that comes after may not be entered at all, and since currinst is not explicitly initialized, the code attempts to dereference whatever random address it contains. This can be easily fixed by initializing currinst as follows:

Instruction *currinst = NULL;

This issue was encountered when attempting to run the genllfiindex pass from the clang 15.0 frontend, while compiling the QMCPack v3.12.0 application.

Use multiple threads to concurrently run multiple FI experiments

Currently, LLTFI runs FI experiments sequentially, which makes fault injection campaigns slow.
Moving forward, we can parallelize LLTFI's FI engine to run FI experiments concurrently on multiple cores.

This change will require an alternate mechanism for passing the FI parameters from the profiling phase to the runtime fault injection part. Currently, these parameters are passed using a fixed file (llfi.stat.[something]).
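
One possible direction, sketched below under stated assumptions, is to give every experiment its own working directory so that per-run parameter files cannot collide. This is not LLTFI's current mechanism: the file name `fi_params.txt`, the helper names, and the stand-in "binary" are all hypothetical and only illustrate the isolation idea.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

PARAM_FILE = "fi_params.txt"  # hypothetical name, not LLTFI's actual file


def run_experiment(run_id, command, params):
    """Run one experiment in a private directory holding its own parameter file."""
    with tempfile.TemporaryDirectory(prefix=f"fi_run_{run_id}_") as workdir:
        Path(workdir, PARAM_FILE).write_text(params)
        result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
        return run_id, result.returncode, result.stdout.strip()


def run_all(command, param_sets, max_workers=4):
    """Run all parameter sets concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_experiment, i, command, p)
                   for i, p in enumerate(param_sets)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Stand-in for an instrumented binary: it only echoes its parameter file.
    fake_binary = [sys.executable, "-c", f"print(open('{PARAM_FILE}').read())"]
    for run_id, code, out in run_all(fake_binary, ["bit=3", "bit=17", "bit=42"]):
        print(run_id, code, out)
```

Threads are sufficient here because the actual work happens in child processes; the pool only waits for them to finish.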

Implementation of FakeQuantization in LLTFI

Implementation milestones for this week (7 August - 14 August 2024)

Figure out the reason behind the outliers in cnn-fmnist model ✅
Iterate over the output tensor to dequantize the inputs ✅
Double check the Fmul Instruction to be correct for collecting W and X inputs of conv and matmul layers ✅
Implement a solution for Basis Vector in both conv and matmul layers ✅

Old Updates are below -

For the week of 31 July - 7 August 2024 :

Implement a working model that gets the scaling factors right by stripping outliers with the percentile approach (as directed by the Research Paper) ✅
Add the support for other layers inside the model ✅
Work with Profiling and Fault Injection Stage to divide the work and call different custom build API calls to calibrate the model and use the data for calibration within the ML layers ✅
Look for the solution for Basis vector ✅
Maximum int number support must be 32 bit, thus, need to ensure this is practised well in the Quantization ✅

For the week of 24-31 July 2024 :

Prepare the Presentation Slides to present the work for 25 July meeting ✅

Implementation milestones for this week

Implementation of the runtime library to remember the FSM regarding the weights and scaling factors ✅
Implement the Fault Injection Stage to import the calibration data from the previous steps ✅
Implement the Quantization Formula as described below within LLTFI ✅
Furthermore, simulate the injection and Profiling stage to execute the custom LLVM IR function pass (helps in dividing work to different API calls) ✅
Research how the bias vector can be handled within the conv layers of ML programs, since it deviates from the concept of quantization

For the week of 17-24 July 2024 :

Read Research Paper

Implementation milestones for this week

Gather feature matrix and kernel matrix within the calibration phase for Convolution Layer and Matrix Multiplication Layer ✅
- Gather the Matrix elements from the runtime library (getWandX) - First Milestone ✅
- In the next phase, compute the int matrix multiplication according to the output matrix shape ✅
- Furthermore, replace this resultant matrix with the initial fmul instruction ✅
Implement the DeQuantization within the InjectFault Layer
- After calibration, get the resultant tensor matrix from the LLTFInjectFault functional call ✅
- Overwrite the resultant matrix with DeQuantized Matrix back to the ml program

As a part of the first iteration of Fake Quantization, we are looking to have a native conversion: in the Quantization phase we convert float -> int, and in the DeQuantization phase we convert int -> float ✅

New Update after reading the Research Papers-

Quantization

Q(r) = Int(r / S)

* Where r is a floating point number (real value), 
* S is the scaling factor, 
* Int is a rounding function that converts the real number to the nearest int value 
* and Q(r) is the Quantization function

Scaling Factor

S = 2 x max( | r_min |, | r_max| ) / (2^(b - 1) - 1)

* Where r_min is the minimum value found in the input array,
* r_max is the maximum value found in the input array,
* b is the bit width of the quantized outputs, i.e., the range of quantized outputs

Dequantization

r = S x Q(r)

* Where Q(r) is the Quantized input produced by the first function above,
* S is the Scaling Factor 
* r is the real number output
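
The three formulas above can be tried out with a small sketch. It follows the formulas exactly as written and is not the LLTFI implementation; the function names are hypothetical, and Python's built-in `round` stands in for the Int rounding function.

```python
def scaling_factor(values, bit_width):
    """S = 2 * max(|r_min|, |r_max|) / (2^(b - 1) - 1), as written above."""
    r_min, r_max = min(values), max(values)
    scale = 2 * max(abs(r_min), abs(r_max)) / (2 ** (bit_width - 1) - 1)
    if scale == 0:
        raise ValueError("all inputs are zero, so the scaling factor is undefined")
    return scale


def quantize(r, scale):
    """Q(r) = Int(r / S), rounding to the nearest integer."""
    return int(round(r / scale))


def dequantize(q, scale):
    """r = S * Q(r)."""
    return scale * q


def fake_quantize(values, bit_width=8):
    """Quantize then dequantize, returning the scale and both intermediate lists."""
    scale = scaling_factor(values, bit_width)
    quantized = [quantize(r, scale) for r in values]
    restored = [dequantize(q, scale) for q in quantized]
    return scale, quantized, restored


if __name__ == "__main__":
    scale, quantized, restored = fake_quantize([-1.0, 0.25, 0.5, 2.0], bit_width=8)
    print(scale, quantized, restored)
```

Each restored value differs from its input by at most S / 2, the rounding error of a single quantization step. Note that with the leading factor of 2 in S, the largest input magnitude maps to about half of 2^(b - 1) - 1.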

2022 Wishlist for LLTFI: Feature Requests and Bug Fixes

We are continuously improving LLTFI to make it more versatile and robust. This issue tracks all the feature requests and bug fixes that we plan to add to LLTFI in 2022.
Feel free to request a new feature or bug fix by opening a new Github issue for it and then linking it back to this post.

Bug Fixes

~~Improve the tool's documentation.~~
~~Check if the GUI and the webapp still work. Update them to use the recent versions of Java and NodeJS. (#9 )~~

Feature Requests

~~Add docker image for LLTFI (#5 )~~
Use multiple threads to concurrently run multiple FI experiments (#6 )
~~Add support for RNNs (#2 )~~
Add fault model to support FI into DNN's weights and biases (memory-based FI) (#7 )
~~Check if LLTFI works correctly with the Caffe2 framework. Add Caffe2 benchmarks. (#8 )~~
Add Apollo Baidu as a benchmark (#8 )
~~Upgrade the ONNX-MLIR version (#10)~~
Add ML-level fault propagation tracing (#29)
Add documentation for using fault propagation tracing (#29)

Support for injecting into both src and dest registers

Currently, LLTFI supports injecting faults into destination registers (dstreg) or all the source registers (allsrcreg) of an instruction. However, there's no option to inject faults randomly into both source and destination registers of an instruction (without writing code to do so). It'd be good to have such an option, say 'allregs' in the RegLoc register selector that does the above.

Support Big endian architecture

Currently, LLTFI injects faults incorrectly on big-endian architectures. We should either add support for big-endian architectures or at least throw an error.
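
To illustrate why byte order matters, here is a minimal sketch that assumes the injector selects a bit by its byte offset in memory. That is an assumption made for illustration; the issue does not describe how LLTFI addresses bits, and the helper name is hypothetical.

```python
import struct


def flip_bit_in_memory(value, bit_index, byte_order):
    """Flip one bit of a 32-bit unsigned value, addressed by memory byte offset.

    byte_order is "<" for a little-endian layout or ">" for a big-endian layout.
    """
    fmt = byte_order + "I"
    buf = bytearray(struct.pack(fmt, value))
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return struct.unpack(fmt, bytes(buf))[0]


if __name__ == "__main__":
    # The same "bit 0" request hits a different bit of the value in each layout.
    print(flip_bit_in_memory(1, 0, "<"))
    print(flip_bit_in_memory(1, 0, ">"))
```

In the little-endian layout the first byte in memory holds the least significant bits, so the flip clears the value; in the big-endian layout the same memory offset holds the most significant byte.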

No rule to make target /LLTFI/test_suite/BatchMode

The current version of LLTFI is failing to build due to a CMake error:

make[2]: *** No rule to make target '/home/user/LLTFI/test_suite/BatchMode', needed by 'test_suite/BatchMode' . Stop.
CMakeFiles/Makefile2:622: recipe for target 'test_suite/CMakeFiles/test_suite.dir/all' failed

This error occurs when running the setup script. This affects both the manual installation and the Docker build. I'm attempting to build manually with the command ./setup -LLFI_BUILD_ROOT $(pwd)/build -LLVM_SRC_ROOT $(pwd)/../llvm-project -LLVM_DST_ROOT $(pwd)/../llvm-project/build

Remove onnx-mlir patch with the DependableSystemLab repo

CustomTensorOperatorPass pass has to be updated accordingly.

Add docker image for LLTFI

Installing LLTFI is a cumbersome process: there are many dependencies including LLVM, ONNX, onnx-mlir, protobuf, etc.
It would be better to provide a docker image of LLTFI with everything preinstalled to make the tool more usable.

Check the functionality of webapp and the GUI

It's been a long time (~5-6 years) since we used or updated LLTFI's GUI and webapp. We should check whether they still work and, if needed, update them to use the latest Java and NodeJS packages.

Does anyone still use them? Should we just remove them from the main repository? Or we can perhaps push the GUI and Webapp into a separate branch.

Automatic installation and manual installation fail on Ubuntu 22

I'm trying to build LLTFI using both the automatic and the manual install. For the latter, I followed the instructions in the README.

Both fail at the step where LLVM is built using Ninja. The build reports that bin/clang-15 is not found, though I have the file and I've independently installed clang-15 as per the instructions. Can this please be looked into? I'm able to build LLVM using make without Ninja by following the instructions on their website.

On another note, there's a small discrepancy in the manual install process. The README says to build "tools", but that project is no longer defined in LLVM. The auto-installer script doesn't do this, so I'm guessing the manual instructions are incorrect.

Unmanaged NULL pointer return from itaniumDemangle() function

The demangleFuncName() function in llvm_passes/core/Utils.cpp uses the LLVM built-in itaniumDemangle() function as follows:

LLTFI/llvm_passes/core/Utils.cpp

Line 12 in 88458b9

char *test = itaniumDemangle(func.c_str(), NULL, NULL, &stat);

It appears that, in some edge cases, itaniumDemangle() will fail internally, set the status flag accordingly and then return a NULL pointer. This case is currently not handled in the LLTFI code, leading to crashes for some compilation jobs. One simple fix would be to check for the value of test when declaring the demangled string as follows:

std::string demangled = test!=nullptr ? test : func;

However, I'm not sure about the implications of this and more investigation may be needed. It may also be a good idea to check for the length of the func string before running the func[0] == '_' && func[1] == 'Z' if check.
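The two checks proposed above (a length guard before indexing, and a fallback when the demangler returns NULL) can be sketched as follows. This is a minimal sketch, not LLTFI's actual code: safeDemangle is a hypothetical name, and abi::__cxa_demangle from the GCC/Clang runtime stands in for LLVM's itaniumDemangle() so that the example compiles without the LLVM headers. Both take the same four arguments and return a malloc-allocated buffer, or NULL on failure.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (stand-in for llvm::itaniumDemangle)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled name, or the input unchanged
// when it is not an Itanium-mangled symbol or when demangling fails.
std::string safeDemangle(const std::string &func) {
  // Check the length before indexing, so empty or one-character names are safe.
  if (func.size() < 2 || func[0] != '_' || func[1] != 'Z')
    return func;

  int status = 0;
  char *raw = abi::__cxa_demangle(func.c_str(), nullptr, nullptr, &status);

  // On failure the demangler returns a null pointer and a non-zero status.
  if (raw == nullptr || status != 0) {
    std::free(raw);  // free(nullptr) is a no-op
    return func;
  }

  std::string demangled(raw);
  std::free(raw);  // the buffer is malloc-allocated and owned by the caller
  return demangled;
}
```

With this fallback, a name such as "main" or a malformed symbol such as "_Z" is returned unchanged instead of dereferencing a null pointer. Whether falling back to the mangled name is the right behavior for the later stages of the pass is the open question raised above.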

This issue was encountered when attempting to run the genllfiindex and faultinjection passes from the clang 15.0 frontend, while compiling the QMCPack v3.12.0 application.

Auto-Installer needs to check for pip

I was installing LLTFI using the auto-installer, and I didn't have pip installed. The script failed silently and didn't throw an exception. The auto-installer uses pip to install TensorFlow and other packages, so it should check that pip is installed on the system, along with CMake and Ninja.

Update the onnx-mlir version

It looks like ONNX-MLIR is now a lot more mature and supports 116 of the 128 models in the ONNX Zoo, including large models like BERT and GPT. Ref: onnx/onnx-mlir#128.

We should seriously consider upgrading the onnx-mlir version being used with LLTFI.

Move License to Apache license

LLTFI currently uses the Illinois Open Source license, as LLVM historically used this license. However, starting from version 9, LLVM has moved to the "Apache License 2.0 with LLVM exceptions" (https://en.wikipedia.org/wiki/LLVM), and most components have been relicensed. This is similar to the Illinois license in spirit, but simplifies many of the issues related to patents etc.

We should move LLTFI to the Apache 2.0 license as well, to ensure we continue to remain compatible with LLVM in the future. The only change would be to update the LICENSE file in the Git repo to match that of the LLVM license (below).

https://llvm.org/LICENSE.txt

Injection into src registers fails to compile

I'm trying to inject faults into the "source registers" (allsrcreg, srcreg1, srcreg2, etc.) in the factorial program. They all fail to compile in the instrumentation pass, which terminates with the assertion failure and stack dump shown below. Injection into dstreg works correctly with the same YAML file, however. I've included a sample YAML file that triggers the failure. Thanks.

llvm::FunctionType::FunctionType(llvm::Type*, llvm::ArrayRef<llvm::Type*>, bool): Assertion `isValidReturnType(Result) && "invalid return type for function"' failed.

#10 0x0000557ce7ec6157 llvm::FunctionType::FunctionType(llvm::Type*, llvm::ArrayRef<llvm::Type*>, bool) (/home/karthik/Programs/llvm-project/build/bin/opt+0x2e90157)
#11 0x0000557ce7eca72b llvm::FunctionType::get(llvm::Type*, llvm::ArrayRef<llvm::Type*>, bool) (/home/karthik/Programs/llvm-project/build/bin/opt+0x2e9472b)
#12 0x00007f0c62b9a91b llfi::FaultInjectionPass::insertInjectionFuncCall(std::map<llvm::Instruction*, std::__cxx11::list<int, std::allocator>, std::lessllvm::Instruction*, std::allocator<std::pair<llvm::Instruction const, std::__cxx11::list<int, std::allocator>>>>, llvm::Module&) (/home/karthik/Programs/LLTFI/build/bin/../llvm_passes/llfi-passes.so+0xd691b)
#13 0x00007f0c62b9b93e llfi::FaultInjectionPass::runOnModule(llvm::Module&) (/home/karthik/Programs/LLTFI/build/bin/../llvm_passes/llfi-passes.so+0xd793e)
#14 0x00007f0c62bce289 llfi::NewFaultInjectionPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/karthik/Programs/LLTFI/build/bin/../llvm_passes/llfi-passes.so+0x10a289)
#15 0x00007f0c62bd0be5 llvm::detail::PassModel<llvm::Module, llfi::NewFaultInjectionPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/karthik/Programs/LLTFI/build/bin/../llvm_passes/llfi-passes.so+0x10cbe5)
#16 0x0000557ce7ea9068 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/karthik/Programs/llvm-project/build/bin/opt+0x2e73068)
#17 0x0000557ce5ba8db7 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool) (/home/karthik/Programs/llvm-project/build/bin/opt+0xb72db7)
#18 0x0000557ce5af0175 main (/home/karthik/Programs/llvm-project/build/bin/opt+0xaba175)
#19 0x00007f0c62c3fd90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#20 0x00007f0c62c3fe40 call_init ./csu/../csu/libc-start.c:128:20
#21 0x00007f0c62c3fe40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#22 0x0000557ce5b9b525 _start (/home/karthik/Programs/llvm-project/build/bin/opt+0xb65525)

YAML file (replacing allsrcreg with srcreg1, srcreg2 etc. triggers the same failure):

compileOption:
    instSelMethod:
      - insttype:
          include:
            - all
          exclude:
            - ret

    regSelMethod: regloc
    regloc: allsrcreg

    tracingPropagation: False # trace dynamic instruction values.

    tracingPropagationOption:
        maxTrace: 250 # max number of instructions to trace during fault injection run
        debugTrace: False
        generateCDFG: True

runOption:
    - run:
        numOfRuns: 5
        fi_type: bitflip