-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Add dataflow consistency query #8457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
AHA, a few inconsistencies uncovered 🕵️ @yoff maybe we can work together on fixing these? |
30681b2 to
66cf8cb
Compare
|
Interesting. There are a few instances of |
129bc13 to
16659f8
Compare
|
It seems we have very few failure modes:
I wonder what is going on here? We should not have semantic changes, should we? Is this to do with the missing file? |
Ah, so updating should fix it. |
16659f8 to
80d67d0
Compare
80d67d0 to
2b438cf
Compare
|
woops, git merge did things to C++ which was certainly not intended |
2b438cf to
2ec1822
Compare
| } | ||
|
|
||
| predicate multipleArgumentCallExclude(ArgumentNode arg, DataFlowCall call) { | ||
| isArgumentNode(arg, call, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be strengthened, to make it clear where you expect multiple arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done in f9d7bec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, that made it too strict. For the one case I have debugged so far:
In the code super(Base, self).foo() we use self as an argument in both the super() call (pos 1) and in the .foo() call (pos self). Will look into fixing that tomorrow
| predicate argHasPostUpdateExclude(ArgumentNode n) { | ||
| exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_)) | ||
| or | ||
| exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a similar exclusion in Ruby for implicit hash-splats, but this is only because we haven't yet implemented proper support. Is it the same for Python? I.e., in Ruby we do not currently handle
def foo(**args)
args[:p].setField taint
end
foo(p: x)
sink(x.getField)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the assertion in the following code holds. This was not really something I've thought too much about though, since it's not something I've ever seen in real code -- but I agree that it could happen 👍
def set_foo(**args):
args["p"].foo = 42
class MyClass: pass
c = MyClass()
set_foo(p=c)
assert c.foo == 42There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same goes for *args parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have documented the missing flow for Ruby here: #14859.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, done for Python here: #14936
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great comments spelling out the different situations. I have added some suggestions around aligning the code a bit more with the comments.
| exists(DataFlowCall getAttrCall, DataFlowCall methodCall, AttrRead attr | | ||
| call in [getAttrCall, methodCall] | ||
| | | ||
| arg = getAttrCall.getArgument(any(ArgumentPosition p | p.isPositional(0))) and | ||
| arg = methodCall.getArgument(any(ArgumentPosition p | p.isSelf())) and | ||
| attr.getObject() = arg and | ||
| attr.(CfgNode).getNode() = getAttrCall.getNode() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use GetAttrCallNode here for getAttrCall? Then we would be a bit more sure that this is the method being called, and arg = getAttrCall.getObject() would also cover the object= case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's private, so we can't without making it public:
| private class GetAttrCallNode extends BuiltinAttrCallNode { |
I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?
It probably is, but the argument is by casing on the implemented AttrReads and their shape. it might be nicer to ensure directly that getAttrCall is a call to the Python function getattr.
I will not push hard for this, though, the tests will probably make noise if it becomes a problem..
| // In the code `super(Base, self).foo()` we use `self` as an argument in both the | ||
| // super() call (pos 1) and in the .foo() call (pos self). | ||
| exists(DataFlowCall superCall, DataFlowCall methodCall | call in [superCall, methodCall] | | ||
| exists(superCallTwoArgumentTracker(_, arg)) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all our evidence that superCall is a call to super? Could we have something like superCall.getNode() = superCallTwoArgumentTracker(_, arg) instead or does that not eliminate all these inconsistencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. if you look at the implementation of superCallTwoArgumentTracker, it's actually that arg is the "object" argument in the super() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and the result is something that the call flows to. That is why I suggested as I did. At the moment, we just know that arg is an argument to a call to super and that it is also arg 1 to superCall (and arg self to methodCall). Presumably it could also be an argument to many other things (and the call to super could be different from superCall), that is what we are testing after all..
| // in the code `def func(self): super().foo(); super.bar()` we use `self` as the | ||
| // (pos self) argument in both .foo() and .bar() calls. | ||
| exists(Function f | | ||
| exprNode(f.getArg(0)) = arg and | ||
| call.getNode().getScope() = f and | ||
| arg = call.getArgument(any(ArgumentPosition p | p.isSelf())) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think you want an other in this logic as well? Otherwise you may be hiding other problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, done in 02f2031
| private import Public | ||
|
|
||
| predicate argHasPostUpdateExclude(ArgumentNode n) { | ||
| exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a comments saying "TODO: make this unnecessary"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| predicate argHasPostUpdateExclude(ArgumentNode n) { | ||
| exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_)) | ||
| or | ||
| exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a comments saying "TODO: make this unnecessary"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that we could be a bit more robust, but I am not sure it is terribly important, and getting this in will be great, so I am approving now.
We forgot to delete that file in github#8457
As a real consistency query, so it will be run as part of ALL tests. (which might make CI take longer, but the value is nice I think)
I've made a dummy consistency query in #8458 to convince reviewers that these consistency queries are actually run 😊