Skip to content

Conversation

@yoff
Copy link
Contributor

@yoff yoff commented Jan 18, 2022

Cleaned up version of #7356

QL part of https://github.com/github/semmle-code/pull/41454

  • AST nodes for patterns
    • sub expressions, sub statements, and sub patterns
    • add sub patterns to the new case statement
    • scopes for patterns computed as the scope of the enclosing case (could be added to py_scopes instead)
  • SSA for variables created inside match statements
    • PatternCaptureDefinition and PatternAliasDefinition
  • data flow through patterns
    • subject of match flows to each top-level pattern (could be more like use-use-flow and only flow directly to the first one)
    • shape based flow inside patterns
      • MatchAsPattern
      • MatchOrPattern
      • MatchLiteralPattern (no flow)
      • MatchCapturePattern
      • MatchWildcardPattern (no flow)
      • MatchValuePattern (no flow)
      • MatchSequencePattern
        • when tuple
        • when list
      • MatchStarPattern
      • MatchMappingPattern
      • MatchDoubleStarPattern
      • MatchKeyValuePattern
      • MatchClassPattern
        • for keyword arguments
        • for positional arguments
      • MatchKeywordPattern
  • create upgrade script
  • create downgrade script

@yoff yoff requested a review from a team as a code owner January 18, 2022 20:13
@yoff yoff marked this pull request as draft January 19, 2022 12:30
@yoff yoff force-pushed the python/support-match branch from c98f6af to cb7d358 Compare January 19, 2022 12:36
yoff added 4 commits January 19, 2022 14:29
- new syntactic category `Pattern` (in `Patterns.qll`)
- subpatterns available on statments
- new statements `MatchStmt` and `Case`
  (`Match` would conflict with the shared ReDoS library)
- new expression `Guard`
- support for pattern lists
- new SSA definition `PatternCaptureDefinition`
- new SSA definition `PatternAliasDefinition`
- implement `hasDefiningNode`
- also update `validTest.py`, but commented out for now
  otherwise CI will fail until we force it to run with Python 3.10
- added debug utility for dataflow (`dataflowTestPaths.ql`)
@yoff yoff force-pushed the python/support-match branch from cb7d358 to 74108e9 Compare January 19, 2022 13:30
@yoff yoff force-pushed the python/support-match branch from 74108e9 to db253e8 Compare January 19, 2022 14:23
@yoff
Copy link
Contributor Author

yoff commented Jan 20, 2022

The language tests pass in the Semmle-code PR.
Here, I see errors of the form

[VALUE_NOT_IN_TYPE] predicate py_bools(@py_bool_parent parent, int idx): Value 4333 of field parent is not in type @py_bool_parent. The value is however in the following types: @py_Try. Appears in tuple (4333,4)

Does that indicate a problem with the upgrade script?

@yoff yoff marked this pull request as ready for review January 20, 2022 12:22
@yoff
Copy link
Contributor Author

yoff commented Jan 24, 2022

The language tests pass in the Semmle-code PR. Here, I see errors of the form

[VALUE_NOT_IN_TYPE] predicate py_bools(@py_bool_parent parent, int idx): Value 4333 of field parent is not in type @py_bool_parent. The value is however in the following types: @py_Try. Appears in tuple (4333,4)

Does that indicate a problem with the upgrade script?

Ah, I guess that is expected since this repository contains the new dbscheme, but not the updated extractor.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of comments, but overall this is a wonderful piece of work!
I especially like the thorough comments and the clear division regarding setting/removing Content in flow steps. Made it very easy to review. 🙂


/** INTERNAL: See the class `MatchAsPattern` for further information. */
library class MatchAsPattern_ extends @py_MatchAsPattern, Pattern {
/** Gets the pattern of this matchaspattern pattern. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchaspattern is not the most elegant thing, though I guess it'll be a pain to get it to autogenerate it better.

I guess we can just ignore it for now. We can always override the QLDoc on the wrapper class, if need be.

Comment on lines 1593 to 1594
* `command` is the subject of the as-pattern, while the second component of `command` is the subject
* of the first capture pattern. As such, 'subject' refers to the pattern under evaluation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused by this description. I see two as-patterns, and a second component only appears in the second of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that description is wrong, I think it has not been updated. Thanks for catching that.

library class ExprParent extends ExprParent_ { }

/** Internal implementation class */
library class PatternListParent extends PatternListParent_ { }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave off the library bit. We don't need that any more.

def test_value_pattern():
match SOURCE:
case Unsafe.VALUE as x:
SINK(x) #$ flow="SOURCE, l:-2 -> x" MISSING: flow="SOURCE, l:-5 -> x"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This missing flow is curious. Do you know what causes it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I thought we would need the guard aspect to be implemented, but now that you mention it, there is already flow from the value pattern Unsafe.VALUE to the alias x. So all we should need is flow from the actual value to the value pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we are also missing flow from the SSA variable on line 58 to the attribute at line 62.

def test_sequence_pattern_list():
match [NONSOURCE, SOURCE]:
case [x, y]:
SINK_F(x) #$ SPURIOUS: flow="SOURCE, l:-2 -> x"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note the spurious flow here, which I assume is caused by the list content mentioned elsewhere. I feel like we should be able to tighten this up a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the incoming subject [NONSOURCE, SOURCE] has list content, so we really do not know where in the list the taint resides. In more realistic code, it would be written as

match l:
  case [x, y]:
    ... use x ...

If l has list content, we have to read it into x.

yoff and others added 4 commits January 27, 2022 10:31
The comment about `py_scopes` was simply removed
Particularly in value and literal patterns.
This is getting a little bit into the guards aspect of matching.
We could similarly add reverse flow in terms of
sub-patterns storing to a sequence pattern,
a flow step from alternatives to an-or-pattern, etc..
It does not seem too likely that sources are embedded in patterns
to begin with, but for secrets perhaps?

It is illustrated by the literal test. The value test still fails.
I believe we miss flow in general from the static attribute.
@yoff yoff requested a review from tausbn January 27, 2022 16:39
tausbn
tausbn previously approved these changes Jan 28, 2022
Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Ship it! :shipit:

@tausbn tausbn merged commit 47a57e0 into main Jan 28, 2022
@tausbn tausbn deleted the python/support-match branch January 28, 2022 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants