-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Replace field -> object taint rule with ArrayContent dataflow
#4230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
UnknownType with ArrayContent dataflowUnknownType with ArrayContent dataflow
|
CPP-differences for 78b24b7: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1428/ The failing tests are from the internal repo, which I'll create a PR for when we're happy with this PR. |
… the flows from dataflow already now.
UnknownType with ArrayContent dataflowfield -> object taint rule with ArrayContent dataflow
jbj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm very excited about this PR. We'll have to figure out how it fits into the collection taint modelling we're doing these days.
| // `x[i] = taint()` | ||
| store.getDestinationAddress() instanceof PointerAddInstruction | ||
| or | ||
| // `*p = taint()` | ||
| store.getDestinationAddress().(CopyValueInstruction).getUnary() instanceof LoadInstruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These rules seem overly syntactic, but we can always tweak them later if there's a need. The important thing is they're guaranteed not to overlap with getWrittenField.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. The PointerAdd case I think we can live with, but requiring the presence of a CopyValue instruction as a direct successor to the Load instruction is quite fragile. For instance, for this store:
int source();
void f(char* s) {
*(int*)s = source();
}there's a Convert instruction in between the Load and the CopyValue, which means that this is not recognized as a store step by arrayStoreStepChi.
| private predicate arrayReadStep(Node node1, ArrayContent a, Node node2) { | ||
| a = TArrayContent() and | ||
| exists(LoadInstruction load | | ||
| node1.asInstruction() = load.getSourceValueOperand().getAnyDef() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition seems overly broad. Can we restrict the address operand to particular syntactic forms like in arrayStoreStepChi and ensure there is no overlap with fieldReadStep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was also my main worry when I was CPP-difference, but luckily didn't appear to be a problem. For explicit array and pointer reads (i.e., x[i]) we could restrict it by requiring that the source address instruction is a PointerAdd instruction. But for an example like:
void f(int* s) {
*s = source();
}
void test() {
int x;
f(&x);
sink(x);
}(where the read step is currently the load of x in sink(x)) I don't see how to syntactically restrict the predicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can you add something like and not fieldReadStep(node1, _, _) to ensure there is no overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I was hoping your solution would Just Work, but the following shows that we need to do something else, I think:
void f(int* px) {
*px = source();
}
void test_outer_with_ptr() {
Point p;
f(&p.x);
sink(p.x);
}Here, the load of p.x in sink(p.x) actually looks like a fieldReadStep, but (as *px = source() matches arrayStoreStepChi), we actually need an arrayReadStep in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this and decided it doesn't need to block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Buffers of unknown size | ||
| t instanceof UnknownType | ||
| ) | ||
| or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the same rule from IR TaintTrackingUtil.qll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've removed it in 0b97a4a.
… from DefaultTaintTracking.qll
|
I didn't notice until now that there's actually a result change in our tests for // BAD: i5 value comes from argv
char i5[5012];
i5[0] = argv[1][0];
printf(i5);
printWrapper(i5);
// BAD: i5 value comes from argv
printf(i5 + 1);
printWrapper(i5 + 1);Which are legitimate uses of To keep this PR uncontroversial, I now think that we should:
How does this sound? |
How so? I don't see any fields there, so I'd classify it as element -> array.
SGTM. |
Right. I meant |
|
This PR should be good to go now! The remaining test failures are accepted in the internal repo (which is green). |
…ontent C++: Add test demonstrating field conflation after merging #4230
C++: Fix field conflation after #4230
This PR uses field flow to implement precise modeling of array writes and reads for IR dataflow.
With these changes to the C++ dataflow library, we can now remove one of the taint rules that causes
field -> objecttaint flow (which we want to avoid before we embark on https://github.com/github/codeql-c-analysis-team/issues/103 to avoid field-to-unrelated-field flow).