Skip to content

Conversation

@MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Sep 8, 2020

This PR uses field flow to implement precise modeling of array writes and reads for IR dataflow.

With these changes to the C++ dataflow library, we can now remove one of the taint rules that causes field -> object taint flow (which we want to avoid before we embark on https://github.com/github/codeql-c-analysis-team/issues/103 to avoid field-to-unrelated-field flow).

@MathiasVP MathiasVP added C++ WIP This is a work-in-progress, do not merge yet! labels Sep 8, 2020
@MathiasVP MathiasVP changed the title WIP: Replace taint through UnknownType with ArrayContent dataflow C++: (WIP) Replace taint through UnknownType with ArrayContent dataflow Sep 8, 2020
@MathiasVP
Copy link
Contributor Author

MathiasVP commented Sep 12, 2020

CPP-differences for 78b24b7: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1428/

The failing tests are from the internal repo, which I'll create a PR for when we're happy with this PR.

@MathiasVP MathiasVP removed the WIP This is a work-in-progress, do not merge yet! label Sep 14, 2020
@MathiasVP MathiasVP changed the title C++: (WIP) Replace taint through UnknownType with ArrayContent dataflow C++: Replace field -> object taint rule with ArrayContent dataflow Sep 14, 2020
@MathiasVP MathiasVP marked this pull request as ready for review September 14, 2020 20:27
@MathiasVP MathiasVP requested a review from a team as a code owner September 14, 2020 20:27
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm very excited about this PR. We'll have to figure out how it fits into the collection taint modelling we're doing these days.

Comment on lines 246 to 250
// `x[i] = taint()`
store.getDestinationAddress() instanceof PointerAddInstruction
or
// `*p = taint()`
store.getDestinationAddress().(CopyValueInstruction).getUnary() instanceof LoadInstruction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rules seem overly syntactic, but we can always tweak them later if there's a need. The important thing is they're guaranteed not to overlap with getWrittenField.

Copy link
Contributor Author

@MathiasVP MathiasVP Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The PointerAdd case I think we can live with, but requiring the presence of a CopyValue instruction as a direct successor to the Load instruction is quite fragile. For instance, for this store:

int source();
void f(char* s) {
  *(int*)s = source();
}

there's a Convert instruction in between the Load and the CopyValue, which means that this is not recognized as a store step by arrayStoreStepChi.

private predicate arrayReadStep(Node node1, ArrayContent a, Node node2) {
a = TArrayContent() and
exists(LoadInstruction load |
node1.asInstruction() = load.getSourceValueOperand().getAnyDef() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition seems overly broad. Can we restrict the address operand to particular syntactic forms like in arrayStoreStepChi and ensure there is no overlap with fieldReadStep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also my main worry when I was CPP-difference, but luckily didn't appear to be a problem. For explicit array and pointer reads (i.e., x[i]) we could restrict it by requiring that the source address instruction is a PointerAdd instruction. But for an example like:

void f(int* s) {
  *s = source();
}

void test() {
  int x;
  f(&x);
  sink(x);
}

(where the read step is currently the load of x in sink(x)) I don't see how to syntactically restrict the predicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can you add something like and not fieldReadStep(node1, _, _) to ensure there is no overlap?

Copy link
Contributor Author

@MathiasVP MathiasVP Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I was hoping your solution would Just Work, but the following shows that we need to do something else, I think:

void f(int* px) {
  *px = source();
}

void test_outer_with_ptr() {
  Point p;
  f(&p.x);
  sink(p.x);
}

Here, the load of p.x in sink(p.x) actually looks like a fieldReadStep, but (as *px = source() matches arrayStoreStepChi), we actually need an arrayReadStep in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this and decided it doesn't need to block this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As reported in #4298 this is a source of field conflation (which I would've discovered if I had tried to replace p.x with p.y in the example above 😢). This is fixed by #4302.

// Buffers of unknown size
t instanceof UnknownType
)
or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the same rule from IR TaintTrackingUtil.qll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've removed it in 0b97a4a.

@MathiasVP MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Sep 16, 2020
@MathiasVP
Copy link
Contributor Author

MathiasVP commented Sep 16, 2020

I didn't notice until now that there's actually a result change in our tests for UncontrolledFormatStringl.ql in argvLocal.c, where we no longer report these two results after 78b24b7 (which removed array element -> array taint):

// BAD: i5 value comes from argv
char i5[5012];
i5[0] = argv[1][0];
printf(i5);
printWrapper(i5);
	
// BAD: i5 value comes from argv
printf(i5 + 1);
printWrapper(i5 + 1);

Which are legitimate uses of field -> object flow.

To keep this PR uncontroversial, I now think that we should:

  • Undo that particular commit, and only remove the t instanceof UnknownType case from DefaultTaintTracking
  • Undo 0b97a4a which removed the same rule from TaintTrackingUtil that was removed in )
    The good thing is that now those two rules will be completely identical. Once we have the field-to-object-flow-at-sink flow as mentioned in https://github.com/github/codeql-c-analysis-team/issues/103 we can remove the rule completely.

How does this sound?

…e we get the flows from dataflow already now."

This reverts commit 78b24b7.
…removed from DefaultTaintTracking.qll"

This reverts commit 0b97a4a.
@jbj
Copy link
Contributor

jbj commented Sep 16, 2020

Which are legitimate uses of field -> object flow.

How so? I don't see any fields there, so I'd classify it as element -> array.

How does this sound?

SGTM.

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Sep 16, 2020

How so? I don't see any fields there, so I'd classify it as element -> array.

Right. I meant element -> array. Good catch!

@MathiasVP
Copy link
Contributor Author

This PR should be good to go now! The remaining test failures are accepted in the internal repo (which is green).

@jbj jbj merged commit c67605f into github:main Sep 18, 2020
MathiasVP added a commit to MathiasVP/ql that referenced this pull request Sep 18, 2020
MathiasVP added a commit that referenced this pull request Sep 18, 2020
…ontent

C++: Add test demonstrating field conflation after merging #4230
jbj added a commit that referenced this pull request Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants