Giter VIP home page Giter VIP logo

Comments (5)

ericbodden avatar ericbodden commented on June 6, 2024

Hello. I think that there actually could be a semantic problem with using non-zero seeds. Remember that jump functions are really functions, of the form "in any context in which x holds initially, y holds at the target statement". I think if you use a non-zero seed then you are saying that the initial value only holds for certain contexts, namely those in which your non-zero seed applies. Could you maybe explain a bit more about your client analysis so that I can understand what it does and how it behaves?

from heros.

jtoman avatar jtoman commented on June 6, 2024

Sure. For certain functions, I want to track all possible heap allocations that may reach that function call. (For reasons specific to the analysis there is a reason I'm not simply using Soot's built into PTA and then extracting the AllocNodes...) For example, given a function f(x, 2) I want to find all heap allocations that reach x. There may be multiple invocations of f; I want to find not simply which allocations reach an invocation of f, but which allocations reach which invocations of f. For that reason, I assign a unique identifier to each invocation; the values computed along edges are taken from the powerset of all identifiers.

The initial seeds for this backwards-may analysis are the arguments of interest at the each invocation of f. So for example, if there are two invocations f(x,3) and f(y,4) then my initial seeds are (f(x,3), x) and (f(y,4), y) respectively.

Upon further digging it appears that initial seeds (n,d) that are not of the form (sP, zero) are simply seeded into the analysis as jump functions from (sP, zero) -> (n,d) with the corresponding path function being the identity function. Thus, a jump function never originates from (n,d), however the Phase II(i) computation appears to assume that the initial seeds are the source of jump functions; I believe that this mismatch of assumptions causes the error.

from heros.

ericbodden avatar ericbodden commented on June 6, 2024

Hi @jtoman
Interestingly, your analysis sounds very similar in spirit to DART, an on-demand points-to analysis that we have implemented on top of heros a few months ago. We had submitted it to a conference but did not make it, are re-submitting it now, with additional experiments. Let me know in case this could be useful to you. It might be an opportunity to collaborate... My PhD student @johspaeth is working on this.

@johspaeth or @johanneslerch should also be able to best comment on your assessment. @johanneslerch can you comment on this? I think you are currently looking in this code anyway...

from heros.

johanneslerch avatar johanneslerch commented on June 6, 2024

I agree, this looks like a bug to me as well. The initial seed edges are propagated using zero->customValue, but note that to jumpFn an edge zero->zero is added.

It is questionable if the initial edge should be zero->customValue or customValue->customValue. For the former case the line
jumpFn.addFunction(zeroValue, startPoint, zeroValue, EdgeIdentity.<V>v());
in submitInitialSeeds should be changed to
jumpFn.addFunction(zeroValue, startPoint, val, EdgeIdentity.<V>v());

If choosing the former case, then in addition computeValues has to be changed.
Instead of starting with the given value by the seed specification as in the current code:

Pair<N, D> superGraphNode = new Pair<N,D>(startPoint, val); 
scheduleValueProcessing(new ValuePropagationTask(superGraphNode));

the computation should start for the zero value:

Pair<N, D> superGraphNode = new Pair<N,D>(startPoint, zeroValue); 
scheduleValueProcessing(new ValuePropagationTask(superGraphNode));

@ericbodden which version do you prefer? An initial edge from zero to the custom specified seed value or a self-loop edge using only the specified seed (i am not sure if the latter case may cause other assumptions to fail)

@jtoman can you try the sketched bug fix and check if it solves your problem?

I have some general remarks addressing your analysis design. I am not sure if scalability is an issue in your case, but if you encounter problems here you might want to rethink if it is a good choice to have different facts D for each initial seed. Different facts will never allow you to reuse computed summaries. Basically, your approach is as good as running the analysis for each initial seed in isolation.

I immediately see two alternatives allowing you to reuse summaries.

  1. Model the problem as an IDE problem. Use IFDS facts D that do not depend on the initial seed. Use at the initial seeds EdgeFunctions mapping Bottom to your invocation site, e.g., f(x,2). Note that the x value is represented by the IFDS fact, therefore, is not included in the EdgeFunction values.

  2. Use the PathTrackingIFDSSolver or JoinHandlingNodesIFDSSolver and create your facts D such that they have a reference to their predecessor fact from which they were generated. Note that facts still have to be equals independent of their predecessor. The Solver implementations require you to implement addNeighbor, handleJoin respectively, which will take care of facts being merged without loosing the ability to reconstruct were they came from. In the end, you are able to traverse the chain of predecessors and neighbors to find all origins, i.e, in your case all invocation sites of f. You can find more details on this approach in [1].

[1] http://www.bodden.de/pubs/lhbm14flowtwist.pdf

from heros.

jtoman avatar jtoman commented on June 6, 2024

Hi @johanneslerch, as you suggested, my analysis is actually an IDE problem, the IDE values L are what track invocation sites and the facts D are access paths that are invocation site agnostic.

I will try the sketched bug-fix later today.

from heros.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.