-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python/support match #7635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python/support match #7635
Conversation
c98f6af to
cb7d358
Compare
- new syntactic category `Pattern` (in `Patterns.qll`) - subpatterns available on statments - new statements `MatchStmt` and `Case` (`Match` would conflict with the shared ReDoS library) - new expression `Guard` - support for pattern lists
- new SSA definition `PatternCaptureDefinition` - new SSA definition `PatternAliasDefinition` - implement `hasDefiningNode`
- also update `validTest.py`, but commented out for now otherwise CI will fail until we force it to run with Python 3.10 - added debug utility for dataflow (`dataflowTestPaths.ql`)
adapted from [the ruby instructions](https://github.com/github/codeql/blob/main/ruby/doc/prepare-db-upgrade.md)
cb7d358 to
74108e9
Compare
74108e9 to
db253e8
Compare
- should still update stats for tables
|
The language tests pass in the Semmle-code PR. Does that indicate a problem with the upgrade script? |
Ah, I guess that is expected since this repository contains the new dbscheme, but not the updated extractor. |
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of comments, but overall this is a wonderful piece of work!
I especially like the thorough comments and the clear division regarding setting/removing Content in flow steps. Made it very easy to review. 🙂
|
|
||
| /** INTERNAL: See the class `MatchAsPattern` for further information. */ | ||
| library class MatchAsPattern_ extends @py_MatchAsPattern, Pattern { | ||
| /** Gets the pattern of this matchaspattern pattern. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matchaspattern is not the most elegant thing, though I guess it'll be a pain to get it to autogenerate it better.
I guess we can just ignore it for now. We can always override the QLDoc on the wrapper class, if need be.
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
| * `command` is the subject of the as-pattern, while the second component of `command` is the subject | ||
| * of the first capture pattern. As such, 'subject' refers to the pattern under evaluation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused by this description. I see two as-patterns, and a second component only appears in the second of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that description is wrong, I think it has not been updated. Thanks for catching that.
| library class ExprParent extends ExprParent_ { } | ||
|
|
||
| /** Internal implementation class */ | ||
| library class PatternListParent extends PatternListParent_ { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can leave off the library bit. We don't need that any more.
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll
Outdated
Show resolved
Hide resolved
| def test_value_pattern(): | ||
| match SOURCE: | ||
| case Unsafe.VALUE as x: | ||
| SINK(x) #$ flow="SOURCE, l:-2 -> x" MISSING: flow="SOURCE, l:-5 -> x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This missing flow is curious. Do you know what causes it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I thought we would need the guard aspect to be implemented, but now that you mention it, there is already flow from the value pattern Unsafe.VALUE to the alias x. So all we should need is flow from the actual value to the value pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we are also missing flow from the SSA variable on line 58 to the attribute at line 62.
| def test_sequence_pattern_list(): | ||
| match [NONSOURCE, SOURCE]: | ||
| case [x, y]: | ||
| SINK_F(x) #$ SPURIOUS: flow="SOURCE, l:-2 -> x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I note the spurious flow here, which I assume is caused by the list content mentioned elsewhere. I feel like we should be able to tighten this up a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the incoming subject [NONSOURCE, SOURCE] has list content, so we really do not know where in the list the taint resides. In more realistic code, it would be written as
match l:
case [x, y]:
... use x ...
If l has list content, we have to read it into x.
Co-authored-by: Taus <[email protected]>
The comment about `py_scopes` was simply removed
Particularly in value and literal patterns. This is getting a little bit into the guards aspect of matching. We could similarly add reverse flow in terms of sub-patterns storing to a sequence pattern, a flow step from alternatives to an-or-pattern, etc.. It does not seem too likely that sources are embedded in patterns to begin with, but for secrets perhaps? It is illustrated by the literal test. The value test still fails. I believe we miss flow in general from the static attribute.
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Ship it! ![]()
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll
Outdated
Show resolved
Hide resolved
…ate.qll Co-authored-by: Taus <[email protected]>
Cleaned up version of #7356
QL part of https://github.com/github/semmle-code/pull/41454
casestatementpy_scopesinstead)PatternCaptureDefinitionandPatternAliasDefinition