Skip to content

Conversation

@RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Mar 16, 2022

As a real consistency query, so it will be run as part of ALL tests. (which might make CI take longer, but the value is nice I think)

I've made a dummy consistency query in #8458 to convince reviewers that these consistency queries are actually run 😊

@RasmusWL RasmusWL requested a review from a team as a code owner March 16, 2022 08:41
@RasmusWL
Copy link
Member Author

AHA, a few inconsistencies uncovered 🕵️ @yoff maybe we can work together on fixing these?

@RasmusWL RasmusWL marked this pull request as draft June 20, 2022 09:17
@RasmusWL RasmusWL force-pushed the add-dataflow-consistency-query branch from 30681b2 to 66cf8cb Compare September 23, 2022 09:02
@yoff
Copy link
Contributor

yoff commented Sep 23, 2022

Interesting. There are a few instances of Node has multiple PostUpdateNodes. The rest are missing toStrings.

@RasmusWL RasmusWL force-pushed the add-dataflow-consistency-query branch 2 times, most recently from 129bc13 to 16659f8 Compare August 22, 2023 11:16
@yoff
Copy link
Contributor

yoff commented Aug 24, 2023

It seems we have very few failure modes:

  • Call should have one enclosing callable but has 0. (Lots)
  • Node steps to itself (Lots)
  • Store step does not preserve enclosing callable. (Just a few)

I wonder what is going on here? We should not have semantic changes, should we? Is this to do with the missing file?

@RasmusWL
Copy link
Member Author

I wonder what is going on here? We should not have semantic changes, should we? Is this to do with the missing file?

same problem as #14037

@yoff
Copy link
Contributor

yoff commented Aug 24, 2023

same problem as #1403

Ah, so updating should fix it.

@RasmusWL RasmusWL force-pushed the add-dataflow-consistency-query branch from 16659f8 to 80d67d0 Compare November 21, 2023 10:47
@github-actions github-actions bot added the C++ label Nov 21, 2023
@RasmusWL RasmusWL force-pushed the add-dataflow-consistency-query branch from 80d67d0 to 2b438cf Compare November 21, 2023 10:51
@github-actions github-actions bot removed the C++ label Nov 21, 2023
@RasmusWL
Copy link
Member Author

woops, git merge did things to C++ which was certainly not intended

@RasmusWL RasmusWL force-pushed the add-dataflow-consistency-query branch from 2b438cf to 2ec1822 Compare November 21, 2023 11:44
@RasmusWL RasmusWL marked this pull request as ready for review November 21, 2023 12:05
}

predicate multipleArgumentCallExclude(ArgumentNode arg, DataFlowCall call) {
isArgumentNode(arg, call, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be strengthened, to make it clear where you expect multiple arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done in f9d7bec

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, that made it too strict. For the one case I have debugged so far:

In the code super(Base, self).foo() we use self as an argument in both the super() call (pos 1) and in the .foo() call (pos self). Will look into fixing that tomorrow

predicate argHasPostUpdateExclude(ArgumentNode n) {
exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))
or
exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar exclusion in Ruby for implicit hash-splats, but this is only because we haven't yet implemented proper support. Is it the same for Python? I.e., in Ruby we do not currently handle

def foo(**args)
  args[:p].setField taint
end

foo(p: x)
sink(x.getField)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the assertion in the following code holds. This was not really something I've thought too much about though, since it's not something I've ever seen in real code -- but I agree that it could happen 👍

def set_foo(**args):
    args["p"].foo = 42

class MyClass: pass
c = MyClass()
set_foo(p=c)
assert c.foo == 42

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same goes for *args parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have documented the missing flow for Ruby here: #14859.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done for Python here: #14936

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great comments spelling out the different situations. I have added some suggestions around aligning the code a bit more with the comments.

Comment on lines +95 to +102
exists(DataFlowCall getAttrCall, DataFlowCall methodCall, AttrRead attr |
call in [getAttrCall, methodCall]
|
arg = getAttrCall.getArgument(any(ArgumentPosition p | p.isPositional(0))) and
arg = methodCall.getArgument(any(ArgumentPosition p | p.isSelf())) and
attr.getObject() = arg and
attr.(CfgNode).getNode() = getAttrCall.getNode()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use GetAttrCallNode here for getAttrCall? Then we would be a bit more sure that this is the method being called, and arg = getAttrCall.getObject() would also cover the object= case.

Copy link
Member Author

@RasmusWL RasmusWL Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's private, so we can't without making it public:

private class GetAttrCallNode extends BuiltinAttrCallNode {

I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?

It probably is, but the argument is by casing on the implemented AttrReads and their shape. it might be nicer to ensure directly that getAttrCall is a call to the Python function getattr.

I will not push hard for this, though, the tests will probably make noise if it becomes a problem..

// In the code `super(Base, self).foo()` we use `self` as an argument in both the
// super() call (pos 1) and in the .foo() call (pos self).
exists(DataFlowCall superCall, DataFlowCall methodCall | call in [superCall, methodCall] |
exists(superCallTwoArgumentTracker(_, arg)) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this all our evidence that superCall is a call to super? Could we have something like superCall.getNode() = superCallTwoArgumentTracker(_, arg) instead or does that not eliminate all these inconsistencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. if you look at the implementation of superCallTwoArgumentTracker, it's actually that arg is the "object" argument in the super() call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and the result is something that the call flows to. That is why I suggested as I did. At the moment, we just know that arg is an argument to a call to super and that it is also arg 1 to superCall (and arg self to methodCall). Presumably it could also be an argument to many other things (and the call to super could be different from superCall), that is what we are testing after all..

Comment on lines 112 to 118
// in the code `def func(self): super().foo(); super.bar()` we use `self` as the
// (pos self) argument in both .foo() and .bar() calls.
exists(Function f |
exprNode(f.getArg(0)) = arg and
call.getNode().getScope() = f and
arg = call.getArgument(any(ArgumentPosition p | p.isSelf()))
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think you want an other in this logic as well? Otherwise you may be hiding other problems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, done in 02f2031

private import Public

predicate argHasPostUpdateExclude(ArgumentNode n) {
exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a comments saying "TODO: make this unnecessary"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

predicate argHasPostUpdateExclude(ArgumentNode n) {
exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))
or
exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a comments saying "TODO: make this unnecessary"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@RasmusWL RasmusWL requested a review from yoff November 28, 2023 13:09
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that we could be a bit more robust, but I am not sure it is terribly important, and getting this in will be great, so I am approving now.

@RasmusWL RasmusWL merged commit 2fed0ad into github:main Dec 4, 2023
@RasmusWL RasmusWL deleted the add-dataflow-consistency-query branch December 4, 2023 11:51
RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants