Giter VIP home page Giter VIP logo

tracerx's People

Contributors

251 avatar ahorn avatar andreamattavelli avatar antiagainst avatar arpitadutta avatar ccadar avatar cunningbaldrick avatar ddcc avatar ddunbar avatar delcypher avatar domainexpert avatar feliciahalim avatar holycrap872 avatar hpalikareva avatar kaomoneus avatar kren1 avatar martinnowack avatar mdimjasevic avatar msoos avatar mvillmow avatar paulmar avatar pcc avatar rasoolmaghareh avatar ret2libc avatar rsas avatar sanghu1790 avatar willemp avatar xuanlinhha avatar yotann avatar yxliang01 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

tracerx's Issues

Analysis of successful subsumption check in coreutils

There's a need to further analyze the successful subsumption check in coreutils. I am thinking to let tracer-x produce separate file that list the successful subsumption check that contains the program location (if possible), current constraint, and interpolant query.

Please let me know if there's any other ideas.

Tracer-X slower than KLEE due to disabled pre-solving

It is mainly caused by disabling the equality substitution (a pre-solving simplification). When equality substitution is disabled, then, simplifyExpr(expr) will not be called. And therefore constraint solving will be slower. The reason for disabling the expression simplification is because it mangles the path condition, and we can therefore no longer identify the core. One of the example where it shows Tracer-X is slower than KLEE can be seen in here: tracer-x/TracerX-examples@15ee216

Calling directComputeValidity May not Be Fastest

In SubsumptionTableEntry::subsumed, calling z3solver->directComputeValidity instead of solver->evaluate results in subsumption check not enjoying the optimizations provided by KLEE wrapper solvers in case the query formula contains quantifications. It is probably worth trying to replace the former with the latter and test the performance gain/loss. The issue with this is that various solver wrappers do not handle quantified formulas.

State Merging Is Not Handled

State merging in KLEE causes problems with missing path condition constraints when marking constraints on the path condition on subsumption check (in SubsumptionTableEntry::subsumed). This is because the constraints in the unsatisfiability core are from the ExecutionState constraints, which, when merging had been done, would contain more constraints than those that can be marked in PathCondition. The problem is currently ignored by keeping silent if a constraint in the core does not exist on the path condition.

Missing interpolant constraint

This is one of the query from addition_safe8.c, where there is no shadow_y interpolant in the existential sub expression and thus the expression cannot be further simplified.

(query [(Sle N0:(ReadLSB w32 0 x)
              0)
         (Sle 0
              N1:(ReadLSB w32 0 y))
         (Sle N1 35)
         (Slt 8
              (ReadLSB w32 0 p1))
         (Slt 8
              (ReadLSB w32 0 p2))
         (Slt 8
              (ReadLSB w32 0 p3))]
(Exists ((w8 x 4 __shadow__y)
                (w8 x 4 __shadow__x))
                (And (Sle N2:(ReadLSB w32 0 __shadow__x)
                          0)
                     (Eq (Add w32 3
                                  (Add w32 N2
                                           (ReadLSB w32 0 __shadow__y)))
                         (Add w32 2 (Add w32 N0 N1))))))

Code Formatting Issues

Let's follow LLVM coding standard http://llvm.org/docs/CodingStandards.html
In particular,

  1. /// should be used only for doxygen-style comments and // for explanations within code.
  2. Names should be in camel casing, e.g., unsatCore instead of unsat_core. (Strictly speaking, the coding standard requires UnsatCore, but let's follow KLEE convention instead.)
  3. Do not use using namespace std. Instead, prefix std components with std::. As we are implementing code for klee namespace, we should not use using namespace llvm as well.
  4. Use preincrement whenever possible.
  5. Don't evaluate end() everytime through a loop.

Regression Tests Failures

Need to ensure that all regression test suites succeeds. This should be done before resolving #20 so as merging can be done carefully commit by commit with proper regression testing done at each step.

Z3 Not Used Incrementally

When exploring depth-first, Z3 could have been used incrementally to partially solve path conditions. This may result in performance gain.

Submitting subsumption check once to the solver

In the current implementation, whenever the subsumption table contains n entries to be checked (itp_1, ... , itp_n), the solver is called at most n times. Instead, we could try to bunch the entries into a single disjunction itp_1 \/ ... \/ itp_n and call the solver once. This may / may not improve the performance. Idea by Chu Duc Hiep and Joxan Jaffar.

Negation As Failure Does Not Work for Constraints

Given the following example:

#include <klee/klee.h>
#include <assert.h>

int main() {
  int x;
  int cond;

  klee_make_symbolic(&cond, sizeof(cond), "cond");

  if (cond) {
    x = 4;
  } else {
    x = 2;
  }

  klee_join("dummy1", x);

  return x;
} 

When run with branch tracer-x-clpr, the run should result in 0 subsumption as even though the assignment x = 2 is visited before x = 4 is, x = 4 does not satisfy dummy1(x) :- x <= 3, however, there is 1 subsumption due to the success of the second proof query with x = 4. Precisely, the CLP(R) query is:

not(=((0),__clp__cond)),=(__clpr__arg0__,(4)),not(dummy1(__heap__,__clpr__arg0__)).

CLP(R) returns *** No for this constraint, as the equality of the leftmost constraint is satisfiable, and hence the evaluation of the not(...) fails. Instead, it should have returned *** Yes, by treating the first term as a disequality constraint that is satisfiable.

Replace Field-Insensitivity with Array Constraints-Based Address Resolution

Current field-insensitive flow dependency tracking to compute unsatisfiability core's dependent-upon allocations loses precision due to field-insensitivity of load and store for composite allocations. Change the dependency computation mechanism to use load/store address resolution based on array constraints to obtain the precise address.

Index Allocation By Address in the Subsumption Table

Currently, allocations in the subsumption table is indexed by base allocation (an LLVM value) first, and then its actual address. The keys of the table are stored in SubsumptionTableEntry::concreteAddressStoreKeys and SubsumptionTableEntry::symbolicAddressStoreKeys, which are vectors of llvm::Value pointers. However, it is possible that the same (concrete) address is actually referenced via different bases, making subsumption checks less reliable when indexed using the base, as we may fail to retrieve the correct memoed program state (the interpolant). Instead, the program state should be indexed using the addresses themselves to be correct.

Update Comment of Dependency Class

The comment of Dependency class describing the forward dependency flow is outdated. The updating, however, probably should be done after resolving #85 .

Alternative Behavior of klee_abstract

@feliciahalim In the design currently being worked on in #86 klee_abstract(condition(x1, ..., xN)) removes all constraints containing any of x1, ..., xN, and adds the abstraction condition condition(x1, ..., xN) to the path condition. For example, if the state was x==1 && y==x and the condition was x>=1, this will be abstracted to x>=1. An alternative behavior would be to replace x (variable in the abstraction condition), with its ghost version, say ghost_x, and then adding the path condition, e.g., ghost_x==1 && y==ghost_x && x>=1, essentially existentially-quantifying the original x. The abstraction is equivalent to y==1 && x>=1, which retains the information on y. Credit goes to Joxan Jaffar for the idea.

Simplification to Dependency Computation

It is possible to simplify flow dependency information implemented in the Dependency class into directly dependency between allocations. Currently the dependencies stored are between (LLVM) values in Dependency::flowsToList, which can be larger in number (unconfirmed), and the allocations dependencies are only computed when needed using AllocationGraph graph constructed in an ad-hoc manner, hence introducing complication as conceptually there are value dependency graph computed top-down and allocation dependency graph (what is actually needed) computed from the value dependency graph bottom-up.

EnvironmentAllocation May Not Be Necessary

I think with a31d238 EnvironmentAllocation class is no longer necesssary, as any load instruction / loaded value that is forcibly used as a pointer is now handled in a general way in 17d2ff7. echo.c and echo_small.c in klee-examples/scalability can probably be used for testing the removal of EnvironmentAllocation as they call getenv.

Extra Constraints in Path Condition

Extra constraints such as disjunctions sometimes appear in the path condition (constraints field) of state object (of type ExecutionState passed in to SubsumptionTableEntry::subsumed. These constraints are not recorded in PathCondition causing segmentation faults when marking the constraints for interpolation. The problem is currently ignored by keeping silent if a constraint in the core does not exist on the path condition.

Handling decimal value in fourier-motzkin implementation

If we have a constraint like this:
2*shadow_y <= y, where shadow_y is on focus existential variable.
In the normalization step, both sides will be divided by shadow_y coefficient resulting the coefficient y into a fraction (1/2). However, we useint_64 data type when storing the coefficient, therefore y coefficient would be stored as 0 and therefore invalidate the overall inequality.

The alternative solution could be changing int64_t datatype intolong double.

Need for WLP to Compute Interpolant

I just added a new example in here https://github.com/tracer-x/klee-examples/blob/master/basic/addition_safe10.c , this example is actually the modification of https://github.com/tracer-x/klee-examples/blob/master/basic/addition_safe5.c by re-ordering its addition operation. Addition_safe5 has 4 subsumptions, while addition_safe10 has zero subsumption. This is the snapshot of addition_safe10.c :

 klee_assume(x <= 0);

  if (p1 > 8)
    x = x + 2;
  else
    x = x + 0;

  if (p2 > 8)
    x = x + 2;
  else
    x = x + 1;

  if (p3 > 8)
    x = x + 2;
  else
    x = x + 1;

  assert(x <= 6);

@domainexpert I think there's opportunity to make subsumption for this case by adding the constraint from the assertion. For example, after we executed the last IF condition (p3) for both its false and true, we know that at this point to be in safe condition x should be x <= 4 (can be generated from its assertion x <=6 and addition operation in p3). Therefore, as long as the x<=4 still holds before that point, it is still considered as safe and can be subsumed. The idea still rough, it may need further exploration.

Memoing Simplified Subformula in Subsumption Check

At present, formula simplification is done when checking each subsumption table entry, even with the same control point index. Memoing some simplified subformula across when performing subsumption check across all entries checked for a particular control point may improve performance. Credit goes to Joxan Jaffar for this issue.

Location information is missing

When the bug is reported, the location information is missing. It supposed indicated the line of number that cause the error.

Lower coverage produced by Tracer-x compared to KLEE

We currently look at the coverage as an indicator for comparing the Tracer-x and KLEE performance. However, Tracer-x seems produce lower line coverage (also branch and instruction coverage). It might be caused by paths are subsumed and therefore it is not executed.

Therefore, we might need other way to comparing Tracer-X and KLEE performance such as only subsumed the non-error paths, so we could expect number of error-paths generated by Tracer-X and KLEE is similar (that shows we didn't miss the error)

Change Recursion to Iteration

Parts of the code for interpolation, e.g., Dependency::recursivelyBuildAllocationGraph is a recursive method. It may be worth replacing these with iterations, which would result in a more efficient execution.

Missing error report

KLEE reported two error reports in different locations. However, Tracer-x has reported only one report. It is caused by the error paths is subsumed by an empty interpolant.

This is the Heartbleed.c code:

#include <stdlib.h>
#include <klee/klee.h>

char *SSL_get_shared_ciphers(const char *cp, char *buf, int len)
{
    char *p = buf;

    for (; *cp; )
    {
        if (len-- == 0)
        {
            *p='\0';
            return(buf);
        }
        else
            *(p++)= *(cp++);
    }
    *(p++)=':';

    p[-1]='\0';
    return(buf);
}

/* test function */

#define BUF_SIZE 10
#define MAX_S 20

int main()
{
    char *buf = malloc(BUF_SIZE);
    klee_make_symbolic(buf, BUF_SIZE, "buf");

    // Set up string variable 'name' with arbitrary content
    // and length at most MAX_S (including terminator).

    int s = MAX_S;
    char *name = malloc(s);
    name[s-1] = '\0';
    klee_make_symbolic(name, s, "name");


    // Check SSL_get_shared_ciphers for all strings 'name'
    // of size up to MAX_S.

    SSL_get_shared_ciphers(name, buf, BUF_SIZE);

    return 0;
}

Implement Simple Substitution for Subsumption Check

For the version of PR #38, when running KLEE, often there are expressions of the form x == expr in subsumption check, where x is existential. Here, it is easy to find all occurrences of x in the formula and replace that occurrence with expr. This should result in some simplifications of the subsumption check formulas when running addition_safe8.c of tracer-x/TracerX-examples@3e6b18a.

Designing klee_join

@feliciahalim To do abstraction at subsumption checkpoint. Here we add a new API, perhaps called klee_subsume_abstract(name, var1, ..., varN) specifying a CLP predicate named name, with arguments var1, ..., varN, which are program variables. The assertion does not change the symbolic exploration, however, at the earliest subsumption checkpoint(s) after the call, at the point of checking for subsumption, we check that the CLP predicate name(var1, ..., varN) holds instead of checking that the interpolant holds. Credit goes to Joxan Jaffar for the idea.

Run time error found when testing

I have picked 5 Coreutils program to be tested on KLEE including base64, mkdir, echo, cat, and chmod. However, the experiments produce run time error and it happened by the same cause: Dependency.cpp:1058: klee::Dependency::execute(llvm::Instruction*, klee::refklee::Expr): Assertion `arg != 0 && "operand not found"' failed.

Compute Variables Influencing Unsatisfiability Core

From @domainexpert on December 12, 2015 15:16

To guarantee the soundness of the analysis, we use forward points-to rules to compute variables influencing unsatisfiability core. The information can be used to determine at a particular control point, part(s) of the state that are relevant to the unsatisfiability core in order to compute the Craig interpolant at that point. The implementation could possibly use the shadow memory points-to implementation in interpolant branch.

Copied from original issue: domainexpert/klee#2

Incremental subsumption check

Assuming we want to check that n interpolant holds. They are int_1, ..., int_n. Now, at the moment, for each j with 1 <= j <= n, we check that path condition => int_j. Instead of this, it may be a better idea
to insert a backtracking point after submitting the path condition but before submitting the (negation) of the interpolant, so that the solver state can be returned to the state before the (negation) of the interpolant is inserted. The performance may / may not improve in this way. Credits to Chu Duc Hiep and Joxan Jaffar for the idea.

Solving of Unary Constraints

To speedup subsumption check, it may be worthwhile to quickly detect contradictory unary constraints in subsumption checks. Credit to Joxan Jaffar for this issue.

Fourier-Motzkin for Quantifier Elimination

Subsumption checks involve quantified formulas. In case the formulas involve linear arithmetic constraints, we make an assumption that the variables are in real number domain and use Fourier-Motzkin elimination to remove quantification. As I see it, there are three possible approaches:

  1. Look into Z3 quantifier elimination for real linear arithmetic. Importantly, we need a simplification API to eliminate existentially-quantified variables. If such exists, the best is just to use it, as Z3 is already integrated.
  2. Interface with CLP(R). The disadvantages are:
    a. the interfacing work will take time, and
    b. this makes the code harder to maintain, as it will include another third-party.
  3. Write our own. Fourier-Motzkin is not hard to write, and it would probably only take the same amount of time as interfacing with CLP(R).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.