Python: Add dataflow consistency query #8457

RasmusWL · 2022-03-16T08:41:39Z

This adds the dataflow consistency checks as a real consistency query, so they will be run as part of ALL tests (which might make CI take longer, but I think the value is worth it).

I've made a dummy consistency query in #8458 to convince reviewers that these consistency queries are actually run 😊

RasmusWL · 2022-03-16T10:19:45Z

AHA, a few inconsistencies uncovered 🕵️ @yoff maybe we can work together on fixing these?

yoff · 2022-09-23T13:40:43Z

Interesting. There are a few instances of `Node has multiple PostUpdateNodes`. The rest are missing `toString`s.

yoff · 2023-08-24T07:57:10Z

It seems we have very few failure modes:

Call should have one enclosing callable but has 0. (Lots)
Node steps to itself (Lots)
Store step does not preserve enclosing callable. (Just a few)
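To make the three failure modes above concrete, here is a toy sketch of what such invariants check. This is not how the CodeQL consistency query is implemented; the `Node` class, the `check_consistency` function, and the input shapes are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    enclosing: str  # name of the enclosing callable (toy stand-in)


def check_consistency(steps, store_steps, call_scopes):
    """Return one message per violated invariant.

    steps:       iterable of (Node, Node) local flow steps
    store_steps: iterable of (Node, Node) store steps
    call_scopes: dict mapping a call name to a list of its enclosing callables
    """
    problems = []
    # Every call should have exactly one enclosing callable.
    for call, scopes in call_scopes.items():
        if len(scopes) != 1:
            problems.append(
                f"Call should have one enclosing callable but has {len(scopes)}: {call}"
            )
    # A flow step from a node to itself is suspicious.
    for src, dst in steps:
        if src == dst:
            problems.append(f"Node steps to itself: {src.name}")
    # A store step should stay within one callable.
    for src, dst in store_steps:
        if src.enclosing != dst.enclosing:
            problems.append(
                f"Store step does not preserve enclosing callable: {src.name} -> {dst.name}"
            )
    return problems


a = Node("a", "f")
b = Node("b", "g")
messages = check_consistency([(a, a), (a, b)], [(a, b)], {"c1": [], "c2": ["f"]})
```

In this toy input each of the three invariants is violated exactly once, so `messages` holds one entry per failure mode.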

I wonder what is going on here? We should not have semantic changes, should we? Is this to do with the missing file?

RasmusWL · 2023-08-24T08:00:19Z

> I wonder what is going on here? We should not have semantic changes, should we? Is this to do with the missing file?

same problem as #14037

yoff · 2023-08-24T09:41:07Z

> same problem as #14037

Ah, so updating should fix it.

RasmusWL · 2023-11-21T10:51:40Z

Whoops, the git merge touched C++ files, which was certainly not intended.

hvitved · 2023-11-21T12:14:48Z

python/ql/consistency-queries/DataFlowConsistency.ql

+  }
+
+  predicate multipleArgumentCallExclude(ArgumentNode arg, DataFlowCall call) {
+    isArgumentNode(arg, call, _)


Could this be strengthened, to make it clear where you expect multiple arguments?

Yes, done in f9d7bec

Argh, that made it too strict. For the one case I have debugged so far:

In the code `super(Base, self).foo()` we use `self` as an argument in both the `super()` call (pos 1) and the `.foo()` call (pos self). I will look into fixing that tomorrow.
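A minimal runnable sketch of that situation, assuming a hypothetical `Parent` class so that `super(Base, self)` has something to resolve to:

```python
class Parent:
    def foo(self):
        # Return the receiver so the caller can inspect it.
        return self


class Base(Parent):
    def foo(self):
        # `self` is the explicit argument at position 1 of super(...),
        # and it is also the implicit self argument of the .foo() call.
        return super(Base, self).foo()


obj = Base()
receiver = obj.foo()
```

Here `receiver` is the same object as `obj`, which is why a single argument node ends up belonging to two calls.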

hvitved · 2023-11-21T12:18:41Z

python/ql/consistency-queries/DataFlowConsistency.ql

+  predicate argHasPostUpdateExclude(ArgumentNode n) {
+    exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))
+    or
+    exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat())


We have a similar exclusion in Ruby for implicit hash-splats, but this is only because we haven't yet implemented proper support. Is it the same for Python? I.e., in Ruby we do not currently handle

    def foo(**args)
      args[:p].setField taint
    end

    foo(p: x)
    sink(x.getField)

Yes, the assertion in the following code holds. This was not really something I've thought too much about though, since it's not something I've ever seen in real code -- but I agree that it could happen 👍

    def set_foo(**args):
        args["p"].foo = 42

    class MyClass:
        pass

    c = MyClass()
    set_foo(p=c)
    assert c.foo == 42

The same goes for a `*args` parameter.
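For completeness, a sketch of the `*args` variant, mirroring the `**args` example above. The function name `set_foo_positional` is made up for illustration.

```python
def set_foo_positional(*args):
    # Mutating an object received through *args is visible to the caller.
    args[0].foo = 42


class MyClass:
    pass


c = MyClass()
set_foo_positional(c)
assert c.foo == 42
```

Because the caller observes the mutation, the argument would ideally get a post-update node here as well.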

I have documented the missing flow for Ruby here: #14859.

Nice, done for Python here: #14936

yoff

Thanks for the great comments spelling out the different situations. I have added some suggestions around aligning the code a bit more with the comments.

yoff · 2023-11-28T11:03:34Z

python/ql/consistency-queries/DataFlowConsistency.ql

+    exists(DataFlowCall getAttrCall, DataFlowCall methodCall, AttrRead attr |
+      call in [getAttrCall, methodCall]
+    |
+      arg = getAttrCall.getArgument(any(ArgumentPosition p | p.isPositional(0))) and
+      arg = methodCall.getArgument(any(ArgumentPosition p | p.isSelf())) and
+      attr.getObject() = arg and
+      attr.(CfgNode).getNode() = getAttrCall.getNode()
+    )


Could we use GetAttrCallNode here for getAttrCall? Then we would be a bit more sure that this is the method being called, and arg = getAttrCall.getObject() would also cover the object= case.

it's private, so we can't without making it public:

codeql/python/ql/lib/semmle/python/dataflow/new/internal/Attributes.qll

Line 163 in 663096f

private class GetAttrCallNode extends BuiltinAttrCallNode {

I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?

> I also don't see how this would improve things. attr.(CfgNode).getNode() = getAttrCall.getNode() makes sure that the AttrRead and the DataFlowCall shares the underlying CFG node... isn't that enough?

It probably is, but that argument relies on a case analysis of the implemented AttrReads and their shape. It might be nicer to ensure directly that getAttrCall is a call to the Python function getattr.

I will not push hard for this, though; the tests will probably make noise if it becomes a problem.
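As I read the thread, the Python pattern behind this exclusion looks roughly like the sketch below: the same object is positional argument 0 of a `getattr` call and the self argument of the method call made on its result. The `Greeter` class is hypothetical.

```python
class Greeter:
    def greet(self):
        # Return the receiver so the caller can inspect it.
        return self


obj = Greeter()

# `obj` is positional argument 0 of getattr(...)
bound = getattr(obj, "greet")

# ... and it is also the self argument of the call on the looked-up method.
result = getattr(obj, "greet")()
```

Both `bound.__self__` and `result` are `obj`, so one argument node is shared by the `getattr` call and the method call.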

yoff · 2023-11-28T11:12:06Z

python/ql/consistency-queries/DataFlowConsistency.ql

+    // In the code `super(Base, self).foo()` we use `self` as an argument in both the
+    // super() call (pos 1) and in the .foo() call (pos self).
+    exists(DataFlowCall superCall, DataFlowCall methodCall | call in [superCall, methodCall] |
+      exists(superCallTwoArgumentTracker(_, arg)) and


Is this all our evidence that superCall is a call to super? Could we have something like superCall.getNode() = superCallTwoArgumentTracker(_, arg) instead or does that not eliminate all these inconsistencies?

Yes. If you look at the implementation of superCallTwoArgumentTracker, arg is actually the "object" argument in the super() call.

Indeed, and the result is something that the call flows to. That is why I made the suggestion I did. At the moment, we just know that arg is an argument to a call to super and that it is also arg 1 to superCall (and arg self to methodCall). Presumably it could also be an argument to many other things (and the call to super could be different from superCall); that is what we are testing, after all.

yoff · 2023-11-28T11:16:00Z

python/ql/consistency-queries/DataFlowConsistency.ql

+    // in the code `def func(self): super().foo(); super.bar()` we use `self` as the
+    // (pos self) argument in both .foo() and .bar() calls.
+    exists(Function f |
+      exprNode(f.getArg(0)) = arg and
+      call.getNode().getScope() = f and
+      arg = call.getArgument(any(ArgumentPosition p | p.isSelf()))
+    )


I would think you want an `other` call in this logic as well? Otherwise you may be hiding other problems.

good point, done in 02f2031
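A small runnable sketch of the case described in the code comment, assuming `super.bar()` there is shorthand for `super().bar()`. The `Parent` and `Child` classes are hypothetical.

```python
class Parent:
    def foo(self):
        return self

    def bar(self):
        return self


class Child(Parent):
    def func(self):
        # `self` is the implicit self argument of both calls below,
        # so one argument node is shared by two distinct calls.
        first = super().foo()
        second = super().bar()
        return first, second


c = Child()
first, second = c.func()
```

Requiring a second, distinct call in the exclusion keeps it from also hiding a single call that reports the same argument twice.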

yoff · 2023-11-28T11:27:26Z

python/ql/consistency-queries/DataFlowConsistency.ql

+  private import Public
+
+  predicate argHasPostUpdateExclude(ArgumentNode n) {
+    exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))


Should we have a comment saying "TODO: make this unnecessary"?

yoff · 2023-11-28T11:27:36Z

python/ql/consistency-queries/DataFlowConsistency.ql

+  predicate argHasPostUpdateExclude(ArgumentNode n) {
+    exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isStarArgs(_))
+    or
+    exists(ArgumentPosition apos | n.argumentOf(_, apos) and apos.isDictSplat())


Should we have a comment saying "TODO: make this unnecessary"?

yoff

I still think that we could be a bit more robust, but I am not sure it is terribly important, and getting this in will be great, so I am approving now.

RasmusWL requested a review from a team as a code owner March 16, 2022 08:41

github-actions bot added the Python label Mar 16, 2022

RasmusWL marked this pull request as draft June 20, 2022 09:17

RasmusWL force-pushed the add-dataflow-consistency-query branch from 30681b2 to 66cf8cb Compare September 23, 2022 09:02

RasmusWL force-pushed the add-dataflow-consistency-query branch 2 times, most recently from 129bc13 to 16659f8 Compare August 22, 2023 11:16

Python: Add dataflow consistency query

b6df6b7

RasmusWL force-pushed the add-dataflow-consistency-query branch from 16659f8 to 80d67d0 Compare November 21, 2023 10:47

github-actions bot added the C++ label Nov 21, 2023

Python: Remove old manual consistency query tests

df9fb14

RasmusWL force-pushed the add-dataflow-consistency-query branch from 80d67d0 to 2b438cf Compare November 21, 2023 10:51

github-actions bot removed the C++ label Nov 21, 2023

Python: Accept consistency-errors in django-orm

2ec1822

RasmusWL force-pushed the add-dataflow-consistency-query branch from 2b438cf to 2ec1822 Compare November 21, 2023 11:44

RasmusWL marked this pull request as ready for review November 21, 2023 12:05

hvitved reviewed Nov 21, 2023

RasmusWL added 3 commits November 21, 2023 15:57

Python: Make multipleArgumentCallExclude more specific

f9d7bec

Python: Highlight even more cases for multipleArgumentCallExclude

67b1414

Python: Fix consistency for bound-methods used in list-comp

4a98ed9

yoff reviewed Nov 28, 2023

RasmusWL added 2 commits November 28, 2023 14:04

Python: Ensure other call for super().foo

02f2031

Python: Highlight we actually want post-update nodes for *args and **kwargs arguments

2c10160

RasmusWL requested a review from yoff November 28, 2023 13:09

yoff approved these changes Nov 28, 2023

RasmusWL merged commit 2fed0ad into github:main Dec 4, 2023

RasmusWL deleted the add-dataflow-consistency-query branch December 4, 2023 11:51

RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 14, 2023

Python: Delete old copy of DataFlowImplConsistency.qll

2a98a7e

We forgot to delete that file in github#8457

RasmusWL mentioned this pull request Dec 14, 2023

Python: Delete old copy of DataFlowImplConsistency.qll #15109

Merged
