It seems Phosphor produces wrong taint results for relational (e.g. >) and conditio

Thanks for your patience and help, Jon. I use <code class="notransla

Wrong taint results for relational and conditional operators,about gmu-swe/phosphor

Comments (14)

jon-bell commented on August 15, 2024

Hi Bihuan,
The behavior that you are looking for is for Phosphor to track taint tags through conditional operators (e.g. greater than sign). Phosphor does this only with the "controlTrack" flag set during instrumentation (must be set when instrumenting both the test code and the JRE). Based on your output, I am guessing that you are just using the multiTaint option.

If you run this same exact code with the controlTrack flag set, you should see the output:
e = 0 -> Taint [lbl=null deps = [x ]]
f = 1 -> Taint [lbl=null deps = [y ]]
b1 = false -> Taint [lbl=null deps = [x y ]]
b2 = true -> Taint [lbl=null deps = [x y ]]
b3 = true -> Taint [lbl=null deps = [x y b ]]

In this case, there are branches in System.out.println that depend on the value of what we are printing (e.g. for deciding how to print numbers). So the line that prints e will cause e to be relevant to the control flow state, and the line that prints f will cause it to be relevant as well. If you DON'T print the values, then you will see the expected behavior.

I've demonstrated this behavior in a test case here - Programming-Systems-Lab@f05efe6

from phosphor.

chenbihuan commented on August 15, 2024

Thanks, Jon.

I did set the "controlTrack" flag, but unfortunately I used the old phosphor.jar in the master branch.
I can now get the expected behavior after building the project and using the newly generated Phosphor-0.0.2-SNAPSHOT.jar.

from phosphor.

chenbihuan commented on August 15, 2024

Hi Jon,

I encountered another problem. See the example below.

public class Test {

    public static void test(int x, int y, boolean b) {  
        Taint tg = null;
        if (b) {
            int g = x + y;
            //System.out.println(g);
            tg = MultiTainter.getTaint(g);
        }

        int h = 1;
        Taint th = MultiTainter.getTaint(h);

        System.out.println(tg);
        System.out.println(th);
    }

    public static void main(String[] args) {
        int x = -1;
        int y = -1;
        boolean b = true;

        int xt = MultiTainter.taintedInt(x, "x");
        int yt = MultiTainter.taintedInt(y, "y");
        boolean bt = MultiTainter.taintedBoolean(b, "b");

        test(xt, yt, bt);
    }

}

If I comment out System.out.println(g);, the output is:
Taint [lbl=null deps = [x y b ]]
null

However, if I uncomment System.out.println(g);, the output becomes:
Taint [lbl=null deps = [x y b ]]
Taint [lbl=null deps = [b x y ]]

Is this behavior expected? IMHO, the taint result for h in the second case should be null.
Please help to clarify the problem. Thanks!

from phosphor.

jon-bell commented on August 15, 2024

Hi Bihuan,

Let me clarify the semantics for how tags are applied in control flow.

We apply the taint tag of the branch condition to all assignments that appear (through a simple post-dominator analysis) to be influenced by that branch. This analysis only considers the current method.

So for example, in this case:

if(b){
    int g = 4; //g gets b's tag
}
else
{
    int r = 2; //r gets b's tag
}
int x = 3; //x does not get b's tag

It gets more complex when you call methods - because the state is carried over (persisting between methods).

if(b){
    someMethod();
}
...
public void someMethod()
{
    this.var = 5; //when someMethod() is called by the code above, in the branch, this.var gets b's tag!
}

Because we only remove a taint tag from the control flow marker when we are sure that it is no longer influencing the control flow, we can end up with marks accumulating that don't matter, for reasons like this:

if(b){
   //control flow now marked with b
   return;//b still marks control flow at call site - and won't be removed 
}

This is what is happening in println - there is a branch on the value being printed, and no clear resolution to that branch. One way to see it is that we are being conservative here - if a function has multiple return statements, then the choice of which return is used may change the control flow in other methods.

Tracking implicit flow is incredibly difficult to do both precisely and soundly. We are using the same semantics as DyTan. If you have any suggestions for improvements please let us know :)

from phosphor.

chenbihuan commented on August 15, 2024

Thanks for the clarification, Jon.

In my problem, I want to know the set of inputs that each branch depends on. My solution here is to (1) enable the data-flow taint analysis only, (2) identify the set of variables that each branch uses, and (3) instrument the calls to getTtaint(X) for the identified set of variables for each branch to get its taint result.

from phosphor.

jon-bell commented on August 15, 2024

I'm not sure how you are doing (2-3) above.
I'd suggest that you do it like this:
Make a new maven project
Add Phosphor as a dependency (and make sure it gets included in the resulting jar)
Create a new class that extends DataAndControlFlowTagFactory. Override only the jumpOp method. Take a look at what's there now - in the data tracking case, it is basically just popping the taint tags from the stack - you can change this behavior to record it somewhere, but not to do the implicit tracking that we do.
Add a file called "phosphor-mv" to the top of your src directory. Add the line "taintTagFactory=mytainttagfactoryclass"
If you want to use it as a javaagent, you'll also need to create a premain in your new project too, which should set Configuration.taintTagFactory then call the phosphor premain.

from phosphor.

chenbihuan commented on August 15, 2024

Thanks for the advice, Jon.

I have followed your suggested steps. I implemented BranchTaintTagFactory, which just stored the originally-popped taint tags via invoking the BranchTaintRecorder.add(X) method.

import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.TaintUtils;
import edu.columbia.cs.psl.phosphor.instrumenter.DataAndControlFlowTagFactory;
import edu.columbia.cs.psl.phosphor.instrumenter.LocalVariableManager;
import edu.columbia.cs.psl.phosphor.instrumenter.TaintPassingMV;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Label;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.MethodVisitor;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Opcodes;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Type;

public class BranchTaintTagFactory extends DataAndControlFlowTagFactory {

    @Override
    public void jumpOp(int opcode, int branchStarting, Label label, MethodVisitor mv, LocalVariableManager lvs, TaintPassingMV ta) {
        switch (opcode) {
            case Opcodes.IFEQ:
            case Opcodes.IFNE:
            case Opcodes.IFLT:
            case Opcodes.IFGE:
            case Opcodes.IFGT:
            case Opcodes.IFLE:
                //top is val, taint
                mv.visitInsn(SWAP);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
                mv.visitJumpInsn(opcode, label);
                break;
            case Opcodes.IF_ICMPEQ:
            case Opcodes.IF_ICMPNE:
            case Opcodes.IF_ICMPLT:
            case Opcodes.IF_ICMPGE:
            case Opcodes.IF_ICMPGT:
            case Opcodes.IF_ICMPLE:
                //top is val, taint, val, taint
                mv.visitInsn(SWAP);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
                //top is val, val, taint
                mv.visitInsn(DUP2_X1);
                mv.visitInsn(POP2);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
                mv.visitJumpInsn(opcode, label);
                break;
            case Opcodes.IF_ACMPEQ:
            case Opcodes.IF_ACMPNE:
                // O1 (T1?) O2 (T2?)
                Type typeOnStack = ta.getTopOfStackType();
                if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
                    // O1 T1 O2 (T2?)
                    mv.visitInsn(SWAP);
                    //mv.visitInsn(POP);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
                }
                //O1 O2 (T2?)
                Type secondOnStack = ta.getStackTypeAtOffset(1);
                if (secondOnStack.getSort() == Type.ARRAY && secondOnStack.getElementType().getSort() != Type.OBJECT) {
                    //O1 O2 T2
                    mv.visitInsn(DUP2_X1);
                    mv.visitInsn(POP2);
                    //mv.visitInsn(POP);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
                }
                if(Configuration.WITH_UNBOX_ACMPEQ && (opcode == Opcodes.IF_ACMPEQ || opcode == Opcodes.IF_ACMPNE))
                {
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
                    mv.visitInsn(SWAP);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
                    mv.visitInsn(SWAP);
                }
                mv.visitJumpInsn(opcode, label);
                break;
            case Opcodes.IFNULL:
            case Opcodes.IFNONNULL:
                // O1 (T1?)
                typeOnStack = ta.getTopOfStackType();
                if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
                    //O1 T1
                    mv.visitInsn(SWAP);
                    //mv.visitInsn(POP);
                    mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
                }
                mv.visitJumpInsn(opcode, label);
                break;              
            case Opcodes.GOTO:
                mv.visitJumpInsn(opcode, label);
                break;
            default:
                throw new IllegalArgumentException("Unimplemented: " + opcode);
        }
    }

}

import java.util.HashSet;

import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.runtime.Taint;
import edu.columbia.cs.psl.phosphor.struct.Tainted;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithIntTag;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithObjTag;

public class BranchTaintRecorder {

    private static HashSet<Integer> setInt;
    private static HashSet<Taint> setObj;

    public static void add(int tag) {
        if (setInt == null) {
            setInt = new HashSet<Integer>();
        }
        setInt.add(tag);
    }

    public static void add(Taint tag) {
        if (setObj == null) {
            setObj = new HashSet<Taint>();
        }
        setObj.add(tag);
    }

    public static void add(Object obj) {
        if (obj instanceof Tainted) {
            if (obj instanceof TaintedWithObjTag) {
                add((Taint)(((TaintedWithObjTag) obj).getPHOSPHOR_TAG()));
            } else if (obj instanceof TaintedWithIntTag) {
                add(((TaintedWithIntTag) obj).getPHOSPHOR_TAG());
            }
        }
    }

    public static HashSet<?> getBranchTaint() {
        if (Configuration.MULTI_TAINTING) {
            return setObj;
        } else {
            return setInt;
        }
    }

}

However, when testing on TestVariables, I got the StackOverflowError. Am I missing something here when extending DataAndControlFlowTagFactory? If possible, can you give some hints to solve the problem? Thank you.

java.lang.StackOverflowError
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:17)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
........

import edu.columbia.cs.psl.phosphor.runtime.Tainter;

public class TestVariables {

    public static void test(Pair p) {
        int tag1 = Tainter.getTaint(p);
        int tag2 = Tainter.getTaint(p.getH());
        int tag3 = Tainter.getTaint(p.getW());

        System.out.println((tag1 & 1) + " " + (tag1 & 2) + " " + (tag1 & 4));
        System.out.println((tag2 & 1) + " " + (tag2 & 2) + " " + (tag2 & 4));
        System.out.println((tag3 & 1) + " " + (tag3 & 2) + " " + (tag3 & 4));
    }

    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);
        int y = Tainter.taintedInt(1, 2);

        Pair p = new Pair(x, y);
        Tainter.taintedObject(p, 4);

        test(p);
    }

}

class Pair {

    private int h;
    private int w;

    public Pair(int h, int w) {
        this.h = h;
        this.w = w;
    }

    public int getH() {
        return h;
    }

    public void setH(int h) {
        this.h = h;
    }

    public int getW() {
        return w;
    }

    public void setW(int w) {
        this.w = w;
    }

}

from phosphor.

jon-bell commented on August 15, 2024

When you are doing any kind of instrumentation like this, it's vital that your runtime code (in this case, BranchTaintRecorder.add) does not call back into instrumented code, or else you'll see stack overflows like this. Remember that Phosphor is instrumenting the entire JRE API etc. - including java.lang.Integer (remember that in your code above the compiler changes setInt.add(int tag) to setInt.add(Integer.valueOf(int tag))), and HashSet.

Do you need it to be a hashset? Would a linked list suffice? If so, you can use edu.columbia.cs.psl.phosphor.struct.LinkedList<T> - and for storing the int tags, create your own wrapper class (i.e. a class that has just 1 primitive field, which stores the int as below)

class WrappedInt{
    public int v;
}

Or, if you really only care about the 31 distinct markers (and not what combination they were encountered in) then for this case, it's probably fastest and easiest just to use a single bit string to store which tags were encountered, like this:

private static int setInts;
public void add(int tag)
{
    setInts |= tag;
}

from phosphor.

chenbihuan commented on August 15, 2024

Thanks for your patience and help, Jon.

I use edu.columbia.cs.psl.phosphor.struct.LinkedList<T> and create a wrapper class for int. Now the stack overflow exception is gone. After running the following test, it seems that the originally-popped taint tags are not actually stored by BranchTaintRecorder.add. Do you have any clues? My extended taint analysis is attached (taint.zip) including the source code.

public class Test {

    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);

        if (x > 0) {
            int tag = Tainter.getTaint(x);
            assert((tag & 1) == 1);
        }

        assert(BranchTaintRecorder.getBranchTaint() == null);
    }

}

Meanwhile, now it needs much more time than just popping those tags. Is this normal?

BTW, is it possible to get the corresponding source line number of those jump bytecodes?

from phosphor.

jon-bell commented on August 15, 2024

Heh... There are some optimizations that will avoid loading taint tags onto the stack when we know that they are going to be popped right away (e.g. for if(x>0)). We disable this optimization when doing implicit flow tracking, but it doesn't get disabled for you.

You can look at the generated code by using javap -private -verbose instrumented/Test.class:

        13: invokestatic  #83                 // Method edu/columbia/cs/psl/phosphor/runtime/Tainter.taintedInt$$PHOSPHORTAGGED:(IIIILedu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;)Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;
        16: dup           
        17: getfield      #84                 // Field edu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag.taint:I
        20: swap          
        21: getfield      #86                 // Field edu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag.val:I
        24: istore_2      
        25: istore_3      
        26: iload_2       
        27: ifle          99

But it is applied at many other branches. I just added the instrument-time option disableJumpOptimizations to turn this optimization off. That should solve your problem.

This slowdown is probably what is to be expected: you are replacing EVERY SINGLE branch operation in the entire execution of the JVM with this big dynamic call. Are you interested in just targeting specific branches? Even if you are targeting all of them, you can also probably make it significantly faster like this:

    public static void add(int tag) {
        if(tag==0) return;
                /* 99.99999% of calls will probably have 0 tag, 
                *  so why are we allocating a new wrappedint and putting it on a list?
                */
        if (setInt == null) {
            setInt = new LinkedList<WrappedInt>();
        }
        setInt.add(new WrappedInt(tag));
    }

I've added a callback for line number visitation.

from phosphor.

chenbihuan commented on August 15, 2024

Thanks for adding the option, Jon.

I added the disableJumpOptimizations for instrumenting jre and the following test case. I still failed to get the stored tags by invoking BranchTaintRecorder.getBranchTaint().

public class Test {

    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);

        if (x > 0) {
            int tag = Tainter.getTaint(x);
            assert((tag & 1) == 1);
        }

        assert(BranchTaintRecorder.getBranchTaint() == null);
        //assert(BranchTaintRecorder.getAllInts() == 1);
    }

}

Then I used a single bit string to store the tags like this:

private static int allInts = 0;
public static int getAllInts() {
    return allInts;
}

However, I got the following exception:

Exception in thread "main" java.lang.NoSuchMethodError: edu.ntu.taint.BranchTaintRecorder.getAllInts$$PHOSPHORTAGGED(Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;)Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;
    at phosphor.test.Test.main(Test.java:17)

So I added the required method:

public static TaintedIntWithIntTag getAllInts$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
    ret.taint = allInts;
    ret.val = allInts;
    return ret;
}

Then I passed the second assertion in the above test case, which indicated that the tags were correctly stored. I don't understand why BranchTaintRecorder.getBranchTaint() failed to get the stored tags. Would you please take a look? The source code are attached (taint.zip).

For the source line number issue, I want to get the line number in the source file where the encountered jump instruction corresponds to, and push this number onto the stack in jumpOp so that I can distinguish the set of inputs that each branch depends on. For example, the source line number of the instruction 32: ifle 104 is 6 (i.e., the line if (x > 0) {). The callback you added seems to do the reverse thing, which needs to know to line number first? Any suggestions would be helpful. Thank you.

from phosphor.

chenbihuan commented on August 15, 2024

Hi, Jon. Both problems are now solved.

For the first problem, I can get the stored tags like this:

private static LinkedList<BranchTaint> branchTaints;
public static LinkedList<BranchTaint> getBranchTaints() {
    return branchTaints;
}

But I still don't know why this version works while the previous one does not. Besides, if the runtime code needs to get int value, a corresponding get method with the suffix $$PHOSPHORTAGGED has to be defined?

private int tag;
public int getTag() {
    return tag;
}   
public TaintedIntWithIntTag getTag$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
    ret.val = tag;
    ret.taint = tag;
    return ret;
}

For the second problem, it seems I totally misunderstand the mechanism of ASM... The parameter line of the callback method lineNumberVisited is just what I want.

Thanks for your great help.

from phosphor.

jon-bell commented on August 15, 2024

Glad that you got it working. You need to provide the $$PHOSPHORTAGGED version because that's how the taint tags are passed back and forth for primitives (if you have any passed as arguments or returned). You'll see that we do the same thing in Tainter.java. Note that you probably want to say ret.taint = 0 though, or else you might end up accidentally accumulating extra taints on extra branches (e.g. when you are handling the result of getTag in your client code).

I am not sure exactly what the issue was with getBranchTaints - it seems like there were several issues happening at once and it's not clear to me exactly what was causing that. FYI, you might find it helpful to run your test programs in debug mode and attach a debugger (like eclipse) to your running app, setting a breakpoint at a branch, and stepping through to see what happens. While the source code won't match exactly (none of the generated instructions will show in your source view), you can still do step-into and see what functions get called (like your branch recorder).

Am I correct in understanding that this is all working now, though?

from phosphor.

chenbihuan commented on August 15, 2024

Thanks for the suggestions, Jon. All things are working now.

from phosphor.

Wrong taint results for relational and conditional operators about phosphor HOT 14 CLOSED

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent