Skip to content

Conversation

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Jan 18, 2022

Fix branch related FPs in cpp/improper-null-termination, for example:

if (cond) {ptr = not_null_terminated_string()} else {ptr = null_terminated_string()}
...
if (cond) {add_null_termination(ptr)}
...
use_null_terminated_string(ptr);

Previously this was a false positive result because, although the string is always null terminated before the point of use, this is only true because certain paths through the control flow are impossible (namely taking cond == true on the first line and cond == false on the third line).

The solution I've used to is narrow the query - considering the presence of something that null terminates the string on any path to be enough, rather than requiring a barrier on all (supposed) paths.


LGTM diff query: https://lgtm.com/query/4160689657278568975/ (there are still other types of FPs in the results, but they seem like an improvement)
Performance: about the same (locally)

@geoffw0 geoffw0 added the C++ label Jan 18, 2022
@geoffw0 geoffw0 requested a review from a team as a code owner January 18, 2022 11:16
MathiasVP
MathiasVP previously approved these changes Jan 18, 2022
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than a tiny QLDoc request.

}

/**
* Flow from a buffer that is not not null terminated to a sink that requires
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we highlight in the QLDoc that this is control flow (and not dataflow, which might be the expected kind of flow when reading this sentence)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

(I'm not entirely sure why this isn't data flow, but it seems to be working well enough)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be valuable to compare the current set of results to a version that uses semmle.code.cpp.dataflow.DataFlow, semmle.code.cpp.ir.dataflow.DataFlow, or event semmle.code.cpp.ir.dataflow.TaintTracking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a try at both. AST dataflow produces a lot of false positives, as best I can tell because it doesn't really know when a pointed buffer might be overwritten. e.g.

char buffer1[1024];
char buffer2[1024];

strcpy(buffer2, "content");
strcpy(buffer1, buffer2); // <-- I get a false positive on buffer2 here as though the line above didn't exist.

The IR DataFlow might do better at that but fails at the source - in this and most other cases an uninitialized buffer. UninitializedNode is deprecated in the IR (and does not work) so I don't get 90% of the sources.

Unless we can fix one of those, the quality of results using data flow is much worse than the current StackVariableReachability version.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I don't think we should change any control-flow to dataflow in this PR, so let's just merge these changes now.

@MathiasVP MathiasVP merged commit dfbde23 into github:main Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants