Hi Bihuan,
The behavior that you are looking for is for Phosphor to track taint tags through conditional operators (e.g. greater than sign). Phosphor does this only with the "controlTrack" flag set during instrumentation (must be set when instrumenting both the test code and the JRE). Based on your output, I am guessing that you are just using the multiTaint option.
If you run this same exact code with the controlTrack flag set, you should see the output:
e = 0 -> Taint [lbl=null deps = [x ]]
f = 1 -> Taint [lbl=null deps = [y ]]
b1 = false -> Taint [lbl=null deps = [x y ]]
b2 = true -> Taint [lbl=null deps = [x y ]]
b3 = true -> Taint [lbl=null deps = [x y b ]]
In this case, there are branches in System.out.println that depend on the value of what we are printing (e.g. for deciding how to print numbers). So the line that prints e will cause e to be relevant to the control flow state, and the line that prints f will cause it to be relevant as well. If you DON'T print the values, then you will see the expected behavior.
I've demonstrated this behavior in a test case here - Programming-Systems-Lab@f05efe6
from phosphor.
Thanks, Jon.
I did set the "controlTrack" flag, but unfortunately I used the old phosphor.jar in the master branch.
I can now get the expected behavior after building the project and using the newly generated Phosphor-0.0.2-SNAPSHOT.jar.
from phosphor.
Hi Jon,
I encountered another problem. See the example below.
public class Test {
    public static void test(int x, int y, boolean b) {
        Taint tg = null;
        if (b) {
            int g = x + y;
            //System.out.println(g);
            tg = MultiTainter.getTaint(g);
        }
        int h = 1;
        Taint th = MultiTainter.getTaint(h);
        System.out.println(tg);
        System.out.println(th);
    }

    public static void main(String[] args) {
        int x = -1;
        int y = -1;
        boolean b = true;
        int xt = MultiTainter.taintedInt(x, "x");
        int yt = MultiTainter.taintedInt(y, "y");
        boolean bt = MultiTainter.taintedBoolean(b, "b");
        test(xt, yt, bt);
    }
}
If I comment out System.out.println(g);, the output is:
Taint [lbl=null deps = [x y b ]]
null
However, if I uncomment System.out.println(g);, the output becomes:
Taint [lbl=null deps = [x y b ]]
Taint [lbl=null deps = [b x y ]]
Is this behavior expected? IMHO, the taint result for h in the second case should be null.
Please help to clarify the problem. Thanks!
from phosphor.
Hi Bihuan,
Let me clarify the semantics for how tags are applied in control flow.
We apply the taint tag of the branch condition to all assignments that appear (through a simple post-dominator analysis) to be influenced by that branch. This analysis only considers the current method.
So for example, in this case:
if (b) {
    int g = 4; // g gets b's tag
} else {
    int r = 2; // r gets b's tag
}
int x = 3; // x does not get b's tag
It gets more complex when you call methods - because the state is carried over (persisting between methods).
if (b) {
    someMethod();
}
...
public void someMethod() {
    this.var = 5; // when someMethod() is called by the code above, in the branch, this.var gets b's tag!
}
Because we only remove a taint tag from the control flow marker when we are sure that it is no longer influencing the control flow, we can end up with marks accumulating that don't matter, for reasons like this:
if (b) {
    // control flow now marked with b
    return; // b still marks control flow at call site - and won't be removed
}
This is what is happening in println - there is a branch on the value being printed, and no clear resolution to that branch. One way to see it is that we are being conservative here - if a function has multiple return statements, then the choice of which return is used may change the control flow in other methods.
Tracking implicit flow is incredibly difficult to do both precisely and soundly. We are using the same semantics as DyTan. If you have any suggestions for improvements please let us know :)
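The semantics described above can be illustrated with a toy model. This is only a sketch in plain Java, not Phosphor's implementation: the class ControlFlowSketch, its method names, and the use of int bit flags for tags are all assumptions made for illustration. It shows why an assignment after a cleanly closed branch stays untagged, while an assignment after a call that returns from inside a branch picks up the leftover mark.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of control-flow tagging. Not Phosphor's implementation. */
public class ControlFlowSketch {
    // Tags (bit flags) of every branch condition still considered "in effect".
    private static final Deque<Integer> activeBranchTags = new ArrayDeque<>();

    static void enterBranch(int conditionTag) {
        activeBranchTags.push(conditionTag);
    }

    // Called only where the branch provably ends inside the same method.
    static void exitBranch() {
        activeBranchTags.pop();
    }

    // An assigned value gets its own data tag plus the tags of all active branches.
    static int tagForAssignment(int dataTag) {
        int tag = dataTag;
        for (int branchTag : activeBranchTags) {
            tag |= branchTag;
        }
        return tag;
    }

    public static void reset() {
        activeBranchTags.clear();
    }

    // Models: if (b) { g = ...; }  h = 1;  The branch is closed before h is assigned.
    public static int tagOfHAfterClosedBranch(int bTag) {
        enterBranch(bTag);
        tagForAssignment(0); // g, assigned inside the branch
        exitBranch();
        return tagForAssignment(0); // h
    }

    // Stand-in for a callee (such as println) that branches on its argument and
    // returns from inside that branch, so the mark is never removed.
    static void calleeReturningInsideBranch(int argumentTag) {
        enterBranch(argumentTag);
        // the callee returns here; there is no exitBranch() on this path
    }

    public static int tagOfHAfterLeakyCall(int argumentTag) {
        calleeReturningInsideBranch(argumentTag);
        return tagForAssignment(0); // h picks up the leftover mark
    }

    public static void main(String[] args) {
        final int X = 1, Y = 2, B = 4; // hypothetical tags for x, y and b
        reset();
        System.out.println(tagOfHAfterClosedBranch(B));      // prints 0
        reset();
        System.out.println(tagOfHAfterLeakyCall(X | Y | B)); // prints 7
    }
}
```

The second call mirrors the earlier Test example: printing g (tagged with x, y and b) leaves all three tags on the control-flow state, so h ends up with them as well.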
from phosphor.
Thanks for the clarification, Jon.
In my problem, I want to know the set of inputs that each branch depends on. My solution here is to (1) enable the data-flow taint analysis only, (2) identify the set of variables that each branch uses, and (3) instrument calls to getTaint(X) for the identified set of variables for each branch to get its taint result.
from phosphor.
I'm not sure how you are doing (2-3) above.
I'd suggest that you do it like this:
1. Make a new Maven project.
2. Add Phosphor as a dependency (and make sure it gets included in the resulting jar).
3. Create a new class that extends DataAndControlFlowTagFactory. Override only the jumpOp method. Take a look at what's there now - in the data tracking case, it is basically just popping the taint tags from the stack - you can change this behavior to record it somewhere, but not to do the implicit tracking that we do.
4. Add a file called "phosphor-mv" to the top of your src directory. Add the line "taintTagFactory=mytainttagfactoryclass".
5. If you want to use it as a javaagent, you'll also need to create a premain in your new project too, which should set Configuration.taintTagFactory and then call the Phosphor premain.
from phosphor.
Thanks for the advice, Jon.
I have followed your suggested steps. I implemented BranchTaintTagFactory, which just stored the originally-popped taint tags via invoking the BranchTaintRecorder.add(X) method.
import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.TaintUtils;
import edu.columbia.cs.psl.phosphor.instrumenter.DataAndControlFlowTagFactory;
import edu.columbia.cs.psl.phosphor.instrumenter.LocalVariableManager;
import edu.columbia.cs.psl.phosphor.instrumenter.TaintPassingMV;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Label;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.MethodVisitor;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Opcodes;
import edu.columbia.cs.psl.phosphor.org.objectweb.asm.Type;

public class BranchTaintTagFactory extends DataAndControlFlowTagFactory {
    @Override
    public void jumpOp(int opcode, int branchStarting, Label label, MethodVisitor mv, LocalVariableManager lvs, TaintPassingMV ta) {
        switch (opcode) {
        case Opcodes.IFEQ:
        case Opcodes.IFNE:
        case Opcodes.IFLT:
        case Opcodes.IFGE:
        case Opcodes.IFGT:
        case Opcodes.IFLE:
            //top is val, taint
            mv.visitInsn(SWAP);
            //mv.visitInsn(POP);
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
            mv.visitJumpInsn(opcode, label);
            break;
        case Opcodes.IF_ICMPEQ:
        case Opcodes.IF_ICMPNE:
        case Opcodes.IF_ICMPLT:
        case Opcodes.IF_ICMPGE:
        case Opcodes.IF_ICMPGT:
        case Opcodes.IF_ICMPLE:
            //top is val, taint, val, taint
            mv.visitInsn(SWAP);
            //mv.visitInsn(POP);
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
            //top is val, val, taint
            mv.visitInsn(DUP2_X1);
            mv.visitInsn(POP2);
            //mv.visitInsn(POP);
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(" + Configuration.TAINT_TAG_DESC + ")V", false);
            mv.visitJumpInsn(opcode, label);
            break;
        case Opcodes.IF_ACMPEQ:
        case Opcodes.IF_ACMPNE:
            // O1 (T1?) O2 (T2?)
            Type typeOnStack = ta.getTopOfStackType();
            if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
                // O1 T1 O2 (T2?)
                mv.visitInsn(SWAP);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
            }
            //O1 O2 (T2?)
            Type secondOnStack = ta.getStackTypeAtOffset(1);
            if (secondOnStack.getSort() == Type.ARRAY && secondOnStack.getElementType().getSort() != Type.OBJECT) {
                //O1 O2 T2
                mv.visitInsn(DUP2_X1);
                mv.visitInsn(POP2);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
            }
            if (Configuration.WITH_UNBOX_ACMPEQ && (opcode == Opcodes.IF_ACMPEQ || opcode == Opcodes.IF_ACMPNE)) {
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
                mv.visitInsn(SWAP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(TaintUtils.class), "ensureUnboxed", "(Ljava/lang/Object;)Ljava/lang/Object;", false);
                mv.visitInsn(SWAP);
            }
            mv.visitJumpInsn(opcode, label);
            break;
        case Opcodes.IFNULL:
        case Opcodes.IFNONNULL:
            // O1 (T1?)
            typeOnStack = ta.getTopOfStackType();
            if (typeOnStack.getSort() == Type.ARRAY && typeOnStack.getElementType().getSort() != Type.OBJECT) {
                //O1 T1
                mv.visitInsn(SWAP);
                //mv.visitInsn(POP);
                mv.visitMethodInsn(Opcodes.INVOKESTATIC, Type.getInternalName(BranchTaintRecorder.class), "add", "(Ljava/lang/Object;)V", false);
            }
            mv.visitJumpInsn(opcode, label);
            break;
        case Opcodes.GOTO:
            mv.visitJumpInsn(opcode, label);
            break;
        default:
            throw new IllegalArgumentException("Unimplemented: " + opcode);
        }
    }
}
import java.util.HashSet;
import edu.columbia.cs.psl.phosphor.Configuration;
import edu.columbia.cs.psl.phosphor.runtime.Taint;
import edu.columbia.cs.psl.phosphor.struct.Tainted;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithIntTag;
import edu.columbia.cs.psl.phosphor.struct.TaintedWithObjTag;

public class BranchTaintRecorder {
    private static HashSet<Integer> setInt;
    private static HashSet<Taint> setObj;

    public static void add(int tag) {
        if (setInt == null) {
            setInt = new HashSet<Integer>();
        }
        setInt.add(tag);
    }

    public static void add(Taint tag) {
        if (setObj == null) {
            setObj = new HashSet<Taint>();
        }
        setObj.add(tag);
    }

    public static void add(Object obj) {
        if (obj instanceof Tainted) {
            if (obj instanceof TaintedWithObjTag) {
                add((Taint)(((TaintedWithObjTag) obj).getPHOSPHOR_TAG()));
            } else if (obj instanceof TaintedWithIntTag) {
                add(((TaintedWithIntTag) obj).getPHOSPHOR_TAG());
            }
        }
    }

    public static HashSet<?> getBranchTaint() {
        if (Configuration.MULTI_TAINTING) {
            return setObj;
        } else {
            return setInt;
        }
    }
}
However, when testing on TestVariables, I got a StackOverflowError. Am I missing something when extending DataAndControlFlowTagFactory? If possible, can you give some hints to solve the problem? Thank you.
java.lang.StackOverflowError
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:17)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
at java.lang.Integer.valueOf$$PHOSPHORTAGGED(Integer.java:830)
at java.lang.Integer.valueOf(Integer.java)
at edu.ntu.taint.BranchTaintRecorder.add(BranchTaintRecorder.java:20)
........
import edu.columbia.cs.psl.phosphor.runtime.Tainter;

public class TestVariables {
    public static void test(Pair p) {
        int tag1 = Tainter.getTaint(p);
        int tag2 = Tainter.getTaint(p.getH());
        int tag3 = Tainter.getTaint(p.getW());
        System.out.println((tag1 & 1) + " " + (tag1 & 2) + " " + (tag1 & 4));
        System.out.println((tag2 & 1) + " " + (tag2 & 2) + " " + (tag2 & 4));
        System.out.println((tag3 & 1) + " " + (tag3 & 2) + " " + (tag3 & 4));
    }

    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);
        int y = Tainter.taintedInt(1, 2);
        Pair p = new Pair(x, y);
        Tainter.taintedObject(p, 4);
        test(p);
    }
}

class Pair {
    private int h;
    private int w;

    public Pair(int h, int w) {
        this.h = h;
        this.w = w;
    }

    public int getH() {
        return h;
    }

    public void setH(int h) {
        this.h = h;
    }

    public int getW() {
        return w;
    }

    public void setW(int w) {
        this.w = w;
    }
}
from phosphor.
When you are doing any kind of instrumentation like this, it's vital that your runtime code (in this case, BranchTaintRecorder.add) does not call back into instrumented code, or else you'll see stack overflows like this. Remember that Phosphor is instrumenting the entire JRE API etc., including java.lang.Integer (remember that in your code above the compiler changes setInt.add(int tag) to setInt.add(Integer.valueOf(int tag))) and HashSet.
Do you need it to be a hashset? Would a linked list suffice? If so, you can use edu.columbia.cs.psl.phosphor.struct.LinkedList<T> - and for storing the int tags, create your own wrapper class (i.e. a class that has just 1 primitive field, which stores the int, as below):
class WrappedInt {
    public int v;
}
Or, if you really only care about the 31 distinct markers (and not what combination they were encountered in) then for this case, it's probably fastest and easiest just to use a single bit string to store which tags were encountered, like this:
private static int setInts;

// static, so that it can be invoked with INVOKESTATIC like the other add methods
public static void add(int tag) {
    setInts |= tag;
}
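To make the bit-string idea concrete, here is a minimal, self-contained sketch. It assumes int tags used as bit flags, as in the TestVariables example; the class name BitmaskRecorderSketch and the helpers sawTag, getSeenTags and reset are hypothetical and not part of Phosphor. The recorder touches only a primitive field, so it does no boxing and calls no collection classes.

```java
/** Records which int tags reached any branch, using one primitive field. */
public class BitmaskRecorderSketch {
    // Bit i is set once a tag containing (1 << i) has been recorded.
    private static int seenTags;

    // No boxing and no library calls: just a bitwise OR on a primitive.
    public static void add(int tag) {
        seenTags |= tag;
    }

    public static boolean sawTag(int tag) {
        return (seenTags & tag) != 0;
    }

    public static int getSeenTags() {
        return seenTags;
    }

    public static void reset() {
        seenTags = 0;
    }

    public static void main(String[] args) {
        add(1); // e.g. the tag given to x
        add(4); // e.g. the tag given to the Pair object
        add(0); // untainted value: leaves the mask unchanged
        System.out.println(getSeenTags()); // prints 5
        System.out.println(sawTag(2));     // prints false
    }
}
```

The trade-off is the one stated above: the mask tells you which tags were seen, but not which combination reached which branch.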
from phosphor.
Thanks for your patience and help, Jon.
I now use edu.columbia.cs.psl.phosphor.struct.LinkedList<T> and have created a wrapper class for int. The stack overflow exception is gone. However, after running the following test, it seems that the originally-popped taint tags are not actually stored by BranchTaintRecorder.add. Do you have any clues? My extended taint analysis is attached (taint.zip) including the source code.
public class Test {
    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);
        if (x > 0) {
            int tag = Tainter.getTaint(x);
            assert((tag & 1) == 1);
        }
        assert(BranchTaintRecorder.getBranchTaint() == null);
    }
}
Meanwhile, now it needs much more time than just popping those tags. Is this normal?
BTW, is it possible to get the corresponding source line number of those jump bytecodes?
from phosphor.
Heh... There are some optimizations that will avoid loading taint tags onto the stack when we know that they are going to be popped right away (e.g. for if(x>0)). We disable this optimization when doing implicit flow tracking, but it doesn't get disabled for you.
You can look at the generated code by using javap -private -verbose instrumented/Test.class:
13: invokestatic #83 // Method edu/columbia/cs/psl/phosphor/runtime/Tainter.taintedInt$$PHOSPHORTAGGED:(IIIILedu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;)Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;
16: dup
17: getfield #84 // Field edu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag.taint:I
20: swap
21: getfield #86 // Field edu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag.val:I
24: istore_2
25: istore_3
26: iload_2
27: ifle 99
But it is applied at many other branches. I just added the instrument-time option disableJumpOptimizations to turn this optimization off. That should solve your problem.
This slowdown is probably what is to be expected: you are replacing EVERY SINGLE branch operation in the entire execution of the JVM with this big dynamic call. Are you interested in just targeting specific branches? Even if you are targeting all of them, you can also probably make it significantly faster like this:
public static void add(int tag) {
    if (tag == 0) return;
    /* 99.99999% of calls will probably have 0 tag,
     * so why are we allocating a new wrappedint and putting it on a list?
     */
    if (setInt == null) {
        setInt = new LinkedList<WrappedInt>();
    }
    setInt.add(new WrappedInt(tag));
}
I've added a callback for line number visitation.
from phosphor.
Thanks for adding the option, Jon.
I added the disableJumpOptimizations option when instrumenting the JRE and the following test case. I still failed to get the stored tags by invoking BranchTaintRecorder.getBranchTaint().
public class Test {
    public static void main(String[] args) {
        int x = Tainter.taintedInt(1, 1);
        if (x > 0) {
            int tag = Tainter.getTaint(x);
            assert((tag & 1) == 1);
        }
        assert(BranchTaintRecorder.getBranchTaint() == null);
        //assert(BranchTaintRecorder.getAllInts() == 1);
    }
}
Then I used a single bit string to store the tags like this:
private static int allInts = 0;

public static int getAllInts() {
    return allInts;
}
However, I got the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: edu.ntu.taint.BranchTaintRecorder.getAllInts$$PHOSPHORTAGGED(Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;)Ledu/columbia/cs/psl/phosphor/struct/TaintedIntWithIntTag;
at phosphor.test.Test.main(Test.java:17)
So I added the required method:
public static TaintedIntWithIntTag getAllInts$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
    ret.taint = allInts;
    ret.val = allInts;
    return ret;
}
Then I passed the second assertion in the above test case, which indicated that the tags were correctly stored. I don't understand why BranchTaintRecorder.getBranchTaint() failed to get the stored tags. Would you please take a look? The source code is attached (taint.zip).
For the source line number issue, I want to get the line number in the source file that the encountered jump instruction corresponds to, and push this number onto the stack in jumpOp so that I can distinguish the set of inputs that each branch depends on. For example, the source line number of the instruction 32: ifle 104 is 6 (i.e., the line if (x > 0) {). The callback you added seems to do the reverse, which requires knowing the line number first? Any suggestions would be helpful. Thank you.
from phosphor.
Hi, Jon. Both problems are now solved.
For the first problem, I can get the stored tags like this:
private static LinkedList<BranchTaint> branchTaints;

public static LinkedList<BranchTaint> getBranchTaints() {
    return branchTaints;
}
But I still don't know why this version works while the previous one does not. Besides, if the runtime code needs to return an int value, does a corresponding get method with the suffix $$PHOSPHORTAGGED have to be defined?
private int tag;

public int getTag() {
    return tag;
}

public TaintedIntWithIntTag getTag$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
    ret.val = tag;
    ret.taint = tag;
    return ret;
}
For the second problem, it seems I totally misunderstood the mechanism of ASM... The parameter line of the callback method lineNumberVisited is just what I want.
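Once the line number is available, one way to keep the tags apart per branch is to hand it to the recorder together with the tag. The following is only a sketch under assumptions: int tags used as bit flags, a hypothetical class LineTagRecorderSketch with a two-argument add, and no claim about how the line number is loaded onto the stack before the call. It deliberately sticks to primitives and plain arrays.

```java
/** Accumulates, per source line, the OR of all int tags seen at branches on that line. */
public class LineTagRecorderSketch {
    // tagsByLine[line] holds the combined tags for that line; grown on demand.
    private static int[] tagsByLine = new int[64];

    public static void add(int line, int tag) {
        if (tag == 0 || line < 0) {
            return; // nothing to record for untainted values
        }
        if (line >= tagsByLine.length) {
            int doubled = tagsByLine.length * 2;
            int newLength = (line + 1 > doubled) ? line + 1 : doubled;
            int[] bigger = new int[newLength];
            // Manual copy keeps the recorder free of library calls.
            for (int i = 0; i < tagsByLine.length; i++) {
                bigger[i] = tagsByLine[i];
            }
            tagsByLine = bigger;
        }
        tagsByLine[line] |= tag;
    }

    public static int tagsAt(int line) {
        return (line >= 0 && line < tagsByLine.length) ? tagsByLine[line] : 0;
    }

    public static void reset() {
        tagsByLine = new int[64];
    }

    public static void main(String[] args) {
        add(6, 1);   // branch on line 6 depends on tag 1
        add(6, 2);   // the same branch also depends on tag 2
        add(200, 4); // a line beyond the initial capacity
        System.out.println(tagsAt(6));   // prints 3
        System.out.println(tagsAt(200)); // prints 4
        System.out.println(tagsAt(7));   // prints 0
    }
}
```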
Thanks for your great help.
from phosphor.
Glad that you got it working. You need to provide the $$PHOSPHORTAGGED version because that's how the taint tags are passed back and forth for primitives (if you have any passed as arguments or returned). You'll see that we do the same thing in Tainter.java. Note that you probably want to say ret.taint = 0 though, or else you might end up accidentally accumulating extra taints on extra branches (e.g. when you are handling the result of getTag in your client code).
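A sketch of the getter pair with that change applied might look like the following. The nested TaintedIntWithIntTag here is a stand-in with only the two fields used in this thread, so that the example compiles without Phosphor; TaggedGetterSketch and this BranchTaint are hypothetical illustration classes.

```java
public class TaggedGetterSketch {
    /** Stand-in for Phosphor's TaintedIntWithIntTag; only the fields used above. */
    public static class TaintedIntWithIntTag {
        public int val;
        public int taint;
    }

    public static class BranchTaint {
        private final int tag;

        public BranchTaint(int tag) {
            this.tag = tag;
        }

        // Plain version, used by code that is not instrumented.
        public int getTag() {
            return tag;
        }

        // Companion looked up by instrumented callers ('$' is legal in Java identifiers).
        public TaintedIntWithIntTag getTag$$PHOSPHORTAGGED(TaintedIntWithIntTag ret) {
            ret.val = tag;  // the recorded tag is returned as ordinary data
            ret.taint = 0;  // the result itself is not tainted
            return ret;
        }
    }

    public static void main(String[] args) {
        BranchTaint recorded = new BranchTaint(5);
        TaintedIntWithIntTag ret = recorded.getTag$$PHOSPHORTAGGED(new TaintedIntWithIntTag());
        System.out.println(ret.val + " " + ret.taint); // prints "5 0"
    }
}
```

With ret.taint left at 0, client code that later branches on the returned value does not feed the recorded tag back into the recorder.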
I am not sure exactly what the issue was with getBranchTaints - it seems like there were several issues happening at once and it's not clear to me exactly what was causing that. FYI, you might find it helpful to run your test programs in debug mode and attach a debugger (like Eclipse) to your running app, setting a breakpoint at a branch, and stepping through to see what happens. While the source code won't match exactly (none of the generated instructions will show in your source view), you can still do step-into and see what functions get called (like your branch recorder).
Am I correct in understanding that this is all working now, though?
from phosphor.
Thanks for the suggestions, Jon. All things are working now.
from phosphor.