tracer-x / tracerx Goto Github PK

TracerX Symbolic Virtual Machine

License: Other

Shell 0.49% Makefile 2.23% C++ 40.10% Lex 0.17% Yacc 0.37% C 52.84% Python 1.82% LLVM 1.19% SMT 0.01% Emacs Lisp 0.06% M4 0.62% Dockerfile 0.11%

symbolic-execution klee

tracerx's People

Contributors

Stargazers

Watchers

Forkers

nguyenquocbinh rasoolmaghareh youfriend20012001 domainexpert yxliang01 rshariffdeen fuzzamos temporalengineer fjgao ninesd thehillbright

tracerx's Issues

Tracer-X has issue on system call

@domainexpert
specifically happened when executed for Minix tr: tracer-x/TracerX-examples@e5e1d75

notes: current makefile is not suitable to run minix tr since it uses additional arguments. In order to run it, please run using the configuration in the comment section. I am still learning to modify the makefile to adapt with this case.

Analysis of successful subsumption check in coreutils

There's a need to further analyze the successful subsumption check in coreutils. I am thinking to let tracer-x produce separate file that list the successful subsumption check that contains the program location (if possible), current constraint, and interpolant query.

Please let me know if there's any other ideas.

Tracer-X slower than KLEE due to disabled pre-solving

It is mainly caused by disabling the equality substitution (a pre-solving simplification). When equality substitution is disabled, then, simplifyExpr(expr) will not be called. And therefore constraint solving will be slower. The reason for disabling the expression simplification is because it mangles the path condition, and we can therefore no longer identify the core. One of the example where it shows Tracer-X is slower than KLEE can be seen in here: tracer-x/TracerX-examples@15ee216

Calling directComputeValidity May not Be Fastest

In SubsumptionTableEntry::subsumed, calling z3solver->directComputeValidity instead of solver->evaluate results in subsumption check not enjoying the optimizations provided by KLEE wrapper solvers in case the query formula contains quantifications. It is probably worth trying to replace the former with the latter and test the performance gain/loss. The issue with this is that various solver wrappers do not handle quantified formulas.

Merging with klee/klee

The PR klee/klee#328 may be impacting us a lot, so merge may not be trivial this time round.

State Merging Is Not Handled

State merging in KLEE causes problems with missing path condition constraints when marking constraints on the path condition on subsumption check (in SubsumptionTableEntry::subsumed). This is because the constraints in the unsatisfiability core are from the ExecutionState constraints, which, when merging had been done, would contain more constraints than those that can be marked in PathCondition. The problem is currently ignored by keeping silent if a constraint in the core does not exist on the path condition.

Missing interpolant constraint

This is one of the query from addition_safe8.c, where there is no shadow_y interpolant in the existential sub expression and thus the expression cannot be further simplified.

(query [(Sle N0:(ReadLSB w32 0 x)
              0)
         (Sle 0
              N1:(ReadLSB w32 0 y))
         (Sle N1 35)
         (Slt 8
              (ReadLSB w32 0 p1))
         (Slt 8
              (ReadLSB w32 0 p2))
         (Slt 8
              (ReadLSB w32 0 p3))]
(Exists ((w8 x 4 __shadow__y)
                (w8 x 4 __shadow__x))
                (And (Sle N2:(ReadLSB w32 0 __shadow__x)
                          0)
                     (Eq (Add w32 3
                                  (Add w32 N2
                                           (ReadLSB w32 0 __shadow__y)))
                         (Add w32 2 (Add w32 N0 N1))))))

Cubic time problem

This issue is assigned to also assigned to @domainexpert

Code Formatting Issues

Let's follow LLVM coding standard http://llvm.org/docs/CodingStandards.html
In particular,

/// should be used only for doxygen-style comments and // for explanations within code.
Names should be in camel casing, e.g., unsatCore instead of unsat_core. (Strictly speaking, the coding standard requires UnsatCore, but let's follow KLEE convention instead.)
Do not use using namespace std. Instead, prefix std components with std::. As we are implementing code for klee namespace, we should not use using namespace llvm as well.
Use preincrement whenever possible.
Don't evaluate end() everytime through a loop.

Regression Tests Failures

Need to ensure that all regression test suites succeeds. This should be done before resolving #20 so as merging can be done carefully commit by commit with proper regression testing done at each step.

Add Comments to Interpolation Support Code

From @domainexpert on December 12, 2015 15:30

We need to clarify the purpose of every new component (methods/classes) introduced that are not part of original KLEE.

Copied from original issue: domainexpert/klee#4

Ensure That Subsumption Checks Are at the Correct Points

As search tree branching could be the result of non-control-flow branching, the storage of interpolant and subsumption check should be based on instruction id instead of basic block id.

Z3 Not Used Incrementally

When exploring depth-first, Z3 could have been used incrementally to partially solve path conditions. This may result in performance gain.

Thrashing When Experimenting with Coreutils Cat

There seems to be thrashing when running make cat.klee in klee-examples/coreutils.

Submitting subsumption check once to the solver

In the current implementation, whenever the subsumption table contains n entries to be checked (itp_1, ... , itp_n), the solver is called at most n times. Instead, we could try to bunch the entries into a single disjunction itp_1 \/ ... \/ itp_n and call the solver once. This may / may not improve the performance. Idea by Chu Duc Hiep and Joxan Jaffar.

Negation As Failure Does Not Work for Constraints

Given the following example:

#include <klee/klee.h>
#include <assert.h>

int main() {
  int x;
  int cond;

  klee_make_symbolic(&cond, sizeof(cond), "cond");

  if (cond) {
    x = 4;
  } else {
    x = 2;
  }

  klee_join("dummy1", x);

  return x;
}

When run with branch tracer-x-clpr, the run should result in 0 subsumption as even though the assignment x = 2 is visited before x = 4 is, x = 4 does not satisfy dummy1(x) :- x <= 3, however, there is 1 subsumption due to the success of the second proof query with x = 4. Precisely, the CLP(R) query is:

not(=((0),__clp__cond)),=(__clpr__arg0__,(4)),not(dummy1(__heap__,__clpr__arg0__)).

CLP(R) returns *** No for this constraint, as the equality of the leftmost constraint is satisfiable, and hence the evaluation of the not(...) fails. Instead, it should have returned *** Yes, by treating the first term as a disequality constraint that is satisfiable.

Replace Field-Insensitivity with Array Constraints-Based Address Resolution

Current field-insensitive flow dependency tracking to compute unsatisfiability core's dependent-upon allocations loses precision due to field-insensitivity of load and store for composite allocations. Change the dependency computation mechanism to use load/store address resolution based on array constraints to obtain the precise address.

Move Repository to A New Organization

At some point we need to create a GitHub organization for this project and move our KLEE repository to be under that organization.

Index Allocation By Address in the Subsumption Table

Currently, allocations in the subsumption table is indexed by base allocation (an LLVM value) first, and then its actual address. The keys of the table are stored in SubsumptionTableEntry::concreteAddressStoreKeys and SubsumptionTableEntry::symbolicAddressStoreKeys, which are vectors of llvm::Value pointers. However, it is possible that the same (concrete) address is actually referenced via different bases, making subsumption checks less reliable when indexed using the base, as we may fail to retrieve the correct memoed program state (the interpolant). Instead, the program state should be indexed using the addresses themselves to be correct.

Create Command-Line Options to Select Solver and to Use Interpolation

From @domainexpert on December 11, 2015 12:51

Allow KLEE to be compiled with both Z3 and STP, and add an option to select between them (or to select metaSMT). Also, when Z3 is enabled, add an option to enable interpolation. When these are done, merge master and felicia branches.

Copied from original issue: domainexpert/klee#1

Check Tree for Missing Subsumption Opportunities

There is a need to check that the subsumptions in this tree are correct. This tree was produced on klee-examples/scalability/Regexp.c of 85baff8. If they were correct, the current revision of tracer-x/klee should be evaluated against it.

Solver not deciding although quantification has been removed

There seems to be cases that subsubmption check reports inability of solver to decide (in-)validity although there is no quantified variable in the formula.

Update Comment of Dependency Class

The comment of Dependency class describing the forward dependency flow is outdated. The updating, however, probably should be done after resolving #85 .

Alternative Behavior of klee_abstract

@feliciahalim In the design currently being worked on in #86 klee_abstract(condition(x1, ..., xN)) removes all constraints containing any of x1, ..., xN, and adds the abstraction condition condition(x1, ..., xN) to the path condition. For example, if the state was x==1 && y==x and the condition was x>=1, this will be abstracted to x>=1. An alternative behavior would be to replace x (variable in the abstraction condition), with its ghost version, say ghost_x, and then adding the path condition, e.g., ghost_x==1 && y==ghost_x && x>=1, essentially existentially-quantifying the original x. The abstraction is equivalent to y==1 && x>=1, which retains the information on y. Credit goes to Joxan Jaffar for the idea.

Simplification to Dependency Computation

It is possible to simplify flow dependency information implemented in the Dependency class into directly dependency between allocations. Currently the dependencies stored are between (LLVM) values in Dependency::flowsToList, which can be larger in number (unconfirmed), and the allocations dependencies are only computed when needed using AllocationGraph graph constructed in an ad-hoc manner, hence introducing complication as conceptually there are value dependency graph computed top-down and allocation dependency graph (what is actually needed) computed from the value dependency graph bottom-up.

Project Subtitle Needs Updating

Project subtitle of feliciahalim/klee is currently the same as KLEE. In order to be more informative, I suggest changing it to "KLEE Symbolic Virtual Machine with Tracer-X Extension" with website http://paella.ddns.comp.nus.edu.sg/tracerx/.

Suggesting Change from InterpolationOption::interpolation to m_* Field of InterpreterHandler

The m_* fields are in class KleeHandler of tools/klee/main.cpp. This seems to be KLEE's standard to globally access command-line option value.

EnvironmentAllocation May Not Be Necessary

I think with a31d238 EnvironmentAllocation class is no longer necesssary, as any load instruction / loaded value that is forcibly used as a pointer is now handled in a general way in 17d2ff7. echo.c and echo_small.c in klee-examples/scalability can probably be used for testing the removal of EnvironmentAllocation as they call getenv.

Ensure That Subsumption Checks Are Not More than Necessary

For efficiency, subsumption check should only be done just before executing the first instruction of a tree node, and not before executing every instruction in the node.

Extra Constraints in Path Condition

Extra constraints such as disjunctions sometimes appear in the path condition (constraints field) of state object (of type ExecutionState passed in to SubsumptionTableEntry::subsumed. These constraints are not recorded in PathCondition causing segmentation faults when marking the constraints for interpolation. The problem is currently ignored by keeping silent if a constraint in the core does not exist on the path condition.

Handling decimal value in fourier-motzkin implementation

If we have a constraint like this:
2*shadow_y <= y, where shadow_y is on focus existential variable.
In the normalization step, both sides will be divided by shadow_y coefficient resulting the coefficient y into a fraction (1/2). However, we useint_64 data type when storing the coefficient, therefore y coefficient would be stored as 0 and therefore invalidate the overall inequality.

The alternative solution could be changing int64_t datatype intolong double.

Need for WLP to Compute Interpolant

I just added a new example in here https://github.com/tracer-x/klee-examples/blob/master/basic/addition_safe10.c , this example is actually the modification of https://github.com/tracer-x/klee-examples/blob/master/basic/addition_safe5.c by re-ordering its addition operation. Addition_safe5 has 4 subsumptions, while addition_safe10 has zero subsumption. This is the snapshot of addition_safe10.c :

 klee_assume(x <= 0);

  if (p1 > 8)
    x = x + 2;
  else
    x = x + 0;

  if (p2 > 8)
    x = x + 2;
  else
    x = x + 1;

  if (p3 > 8)
    x = x + 2;
  else
    x = x + 1;

  assert(x <= 6);

@domainexpert I think there's opportunity to make subsumption for this case by adding the constraint from the assertion. For example, after we executed the last IF condition (p3) for both its false and true, we know that at this point to be in safe condition x should be x <= 4 (can be generated from its assertion x <=6 and addition operation in p3). Therefore, as long as the x<=4 still holds before that point, it is still considered as safe and can be subsumed. The idea still rough, it may need further exploration.

Renaming of -max-fail-subsume option to -max-subsumption-failure

I think -max-subsumption-failure would be easier to read and understand.

Memoing Simplified Subformula in Subsumption Check

At present, formula simplification is done when checking each subsumption table entry, even with the same control point index. Memoing some simplified subformula across when performing subsumption check across all entries checked for a particular control point may improve performance. Credit goes to Joxan Jaffar for this issue.

Location information is missing

When the bug is reported, the location information is missing. It supposed indicated the line of number that cause the error.

Lower coverage produced by Tracer-x compared to KLEE

We currently look at the coverage as an indicator for comparing the Tracer-x and KLEE performance. However, Tracer-x seems produce lower line coverage (also branch and instruction coverage). It might be caused by paths are subsumed and therefore it is not executed.

Therefore, we might need other way to comparing Tracer-X and KLEE performance such as only subsumed the non-error paths, so we could expect number of error-paths generated by Tracer-X and KLEE is similar (that shows we didn't miss the error)

Change Recursion to Iteration

Parts of the code for interpolation, e.g., Dependency::recursivelyBuildAllocationGraph is a recursive method. It may be worth replacing these with iterations, which would result in a more efficient execution.

Missing error report

KLEE reported two error reports in different locations. However, Tracer-x has reported only one report. It is caused by the error paths is subsumed by an empty interpolant.

This is the Heartbleed.c code:

#include <stdlib.h>
#include <klee/klee.h>

char *SSL_get_shared_ciphers(const char *cp, char *buf, int len)
{
    char *p = buf;

    for (; *cp; )
    {
        if (len-- == 0)
        {
            *p='\0';
            return(buf);
        }
        else
            *(p++)= *(cp++);
    }
    *(p++)=':';

    p[-1]='\0';
    return(buf);
}

/* test function */

#define BUF_SIZE 10
#define MAX_S 20

int main()
{
    char *buf = malloc(BUF_SIZE);
    klee_make_symbolic(buf, BUF_SIZE, "buf");

    // Set up string variable 'name' with arbitrary content
    // and length at most MAX_S (including terminator).

    int s = MAX_S;
    char *name = malloc(s);
    name[s-1] = '\0';
    klee_make_symbolic(name, s, "name");


    // Check SSL_get_shared_ciphers for all strings 'name'
    // of size up to MAX_S.

    SSL_get_shared_ciphers(name, buf, BUF_SIZE);

    return 0;
}

Missing Error Report

Regarding basic/loop_unsafe1.c in tracer-x/TracerX-examples#20: When run with -select-solver=stp, KLEE manages to report the error in the example, however, interpolation run does not report the error.

Combination of configure Options May Not Work

Compilation may break without --with-stp option but with --with-z3. This is definitely an issue in tracer-x-clpr branch.

Implement Simple Substitution for Subsumption Check

For the version of PR #38, when running KLEE, often there are expressions of the form `x == expr` in subsumption check, where `x` is existential. Here, it is easy to find all occurrences of `x` in the formula and replace that occurrence with `expr`. This should result in some simplifications of the subsumption check formulas when running `addition_safe8.c` of tracer-x/TracerX-examples@`3e6b18a`.

Designing klee_join

@feliciahalim To do abstraction at subsumption checkpoint. Here we add a new API, perhaps called klee_subsume_abstract(name, var1, ..., varN) specifying a CLP predicate named name, with arguments var1, ..., varN, which are program variables. The assertion does not change the symbolic exploration, however, at the earliest subsumption checkpoint(s) after the call, at the point of checking for subsumption, we check that the CLP predicate name(var1, ..., varN) holds instead of checking that the interpolant holds. Credit goes to Joxan Jaffar for the idea.

Run time error found when testing

I have picked 5 Coreutils program to be tested on KLEE including base64, mkdir, echo, cat, and chmod. However, the experiments produce run time error and it happened by the same cause: Dependency.cpp:1058: klee::Dependency::execute(llvm::Instruction*, klee::refklee::Expr): Assertion `arg != 0 && "operand not found"' failed.

Compute Variables Influencing Unsatisfiability Core

From @domainexpert on December 12, 2015 15:16

To guarantee the soundness of the analysis, we use forward points-to rules to compute variables influencing unsatisfiability core. The information can be used to determine at a particular control point, part(s) of the state that are relevant to the unsatisfiability core in order to compute the Craig interpolant at that point. The implementation could possibly use the shadow memory points-to implementation in interpolant branch.

Copied from original issue: domainexpert/klee#2

Incremental subsumption check

Assuming we want to check that n interpolant holds. They are int_1, ..., int_n. Now, at the moment, for each j with 1 <= j <= n, we check that path condition => int_j. Instead of this, it may be a better idea
to insert a backtracking point after submitting the path condition but before submitting the (negation) of the interpolant, so that the solver state can be returned to the state before the (negation) of the interpolant is inserted. The performance may / may not improve in this way. Credits to Chu Duc Hiep and Joxan Jaffar for the idea.

Valgrind Memcheck Error Reports

As at 273f96f Valgrind reports issues when Tracer-X KLEE is run using make cat.klee in coreutils subdirectory tracer-x/TracerX-examples@748c31b, with time replaced with valgrind in coreutils/Makefile.

Regression Failures

From @domainexpert on December 12, 2015 15:24

Regression test using make check fails on Dogfood/ImmutableTest.cpp and Feature/DanglingConcreteReadExpr.c at domainexpert/fp-analysis-klee@45f967d.

Copied from original issue: domainexpert/klee#3

Make tree.dot Generation Available Even When Interpolaton Is Disabled

It would be good to still be able to generate tree.dot when interpolation is disabled, for comparing with when interpolation is enabled.

Solving of Unary Constraints

To speedup subsumption check, it may be worthwhile to quickly detect contradictory unary constraints in subsumption checks. Credit to Joxan Jaffar for this issue.

Fourier-Motzkin for Quantifier Elimination

Subsumption checks involve quantified formulas. In case the formulas involve linear arithmetic constraints, we make an assumption that the variables are in real number domain and use Fourier-Motzkin elimination to remove quantification. As I see it, there are three possible approaches:

Look into Z3 quantifier elimination for real linear arithmetic. Importantly, we need a simplification API to eliminate existentially-quantified variables. If such exists, the best is just to use it, as Z3 is already integrated.
Interface with CLP(R). The disadvantages are:
a. the interfacing work will take time, and
b. this makes the code harder to maintain, as it will include another third-party.
Write our own. Fourier-Motzkin is not hard to write, and it would probably only take the same amount of time as interfacing with CLP(R).