New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared: Switch to dot-separated access paths in summary specs #7878
Conversation
067404d
to
560394f
|
The evaluations aren't done yet, but in the interest of time, it might be worth starting reviews before that. I'd like to nominate some reviewers, but others are obviously welcome to join in:
|
JS
For the rest of you: I've added a backref for how the x of y -> y.x rewrite was made.
|
Evaluations:
|
LGTM, thanks for making this change Asger. Could I please ask you to also update getComponentStackCsv in FlowSummaryImpl.qll(and consequently the expected test output in ql/test/library-tests/dataflow/library and ql/test/library-tests/frameworks/EntityFramework)?
javascript/ql/lib/semmle/javascript/frameworks/data/internal/AccessPathSyntax.qll
Outdated
Show resolved
Hide resolved
javascript/ql/lib/semmle/javascript/frameworks/data/internal/AccessPathSyntax.qll
Outdated
Show resolved
Hide resolved
| private string getRawToken(AccessPath path, int n) { | ||
| // Avoid splitting by '.' since tokens may contain dots, e.g. `Field[foo.Bar.x]`. | ||
| // Instead use regexpFind to match valid tokens, and supplement with a final length | ||
| // check to ensure all characters were included in a token. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // check to ensure all characters were included in a token. | |
| // check (in `AccessPath.hasSyntaxError`) to ensure all characters were included in a token. |
| } | ||
|
|
||
| /** Gets the `n`th-last token, with 0 being the last token. */ | ||
| AccessPathToken getLastToken(int n) { result = getToken(getNumToken() - 1 - n) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting ql-for-ql warnings re: implicit this
| string getArgumentList() { result = this.getPart(2) } | ||
|
|
||
| /** Gets the `n`th argument to this token, such as `x` or `y` from `Member[x,y]`. */ | ||
| string getArgument(int n) { result = this.getArgumentList().splitAt(",", n) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be flexible on whitespace while we have the chance? Should Token[arg1, arg2] mean the same as Token[arg1,arg2]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence here.
On one hand I don't think the spaces actually improve readability, and I also wouldn't want to need an auto-formatter to resolve arguments about CSV rows should be formatted.
On the other hand, allowing spaces is probably the least surprising behaviour; could save someone a lot of time if they simply assumed spaces would work. However, this can also be handled by emitting a CSV validation error when an argument has trailing spaces.
Ultimately I'm leaning towards not allowing spaces, but I'm open to changing it if someone feels strongly about changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean towards allow because when someone is writing new rules it'll take them a wasted hour or so fiddling with other things before they think to run the CSV validator. If the validator was run early enough that e.g. you got a warning from VSCode that'd help, but I would normally only expect something like that to be run at CI time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for surrounding spaces
| exists(ParameterPosition pos | | ||
| parseArg(token, pos) and result = SummaryComponent::argument(pos) | ||
| ) | ||
| or | ||
| exists(ArgumentPosition pos | | ||
| parseParam(token, pos) and result = SummaryComponent::parameter(pos) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ParameterPosition / ArgumentPosition uses seem backwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supposed to be backwards (see the diff). An Argument[n] token results in a synthesized parameter, and Parameter[n] results in a synthesized argument (to a callback).
| idx = spec.getNumToken() - 1 and | ||
| stack = SummaryComponentStack::singleton(interpretComponent(spec.getLastToken(idx))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNumToken - 1 and a use of getLastToken? Do these cancel out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to preserve the original indexing scheme, but yeah it ended up looking weird.
I've pushed a commit that replaces the whole thing with indexes that start at the beginning and use getToken instead of getLastToken. Could you take another look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good! AFAICT getLastToken is now dead code, and I'd like to see whitespace significance avoided wherever possible; otherwise lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed getLastToken
|
Also would be interested to hear arguments for a regexpFind (which might bind and find matches other than at the start of an access path specifier) plus a cross-check based on match lengths, vs. a stateful parse starting at the LHS, with a predicate argument tracking the offset where the previous access path element parse finished, and so where the next one should begin. I imagine the combination of greed and lookaheads in the regexp are supposed to produce exactly the same behaviour, but my regex-fu isn't sufficient to know that for sure. |
The If you try to parse one token at a time, you can't use regexps efficiently, because you can't tell the engine where to start matching (it will start at the beginning and tell you where each match began, that's different). So to parse the Also, the |
Argh, I overlooked this request first time around. Should be fixed in caf5931 |
This reverts commit 9bf522b.
|
Force-pushed to resolve conflicts in the Ruby libraries. Only the commit |
This changes the syntax of input/output specifications to the dot-separated
A.Bstyle instead ofB of A.It briefly retains support for both syntaxes so the tests pass at each commit, but support for the
of-style is removed entirely at the end.AccessPathSyntax.qllis shared between JS and the shared data-flow libraries. Since parsing is no longer entirely trivial, it made sense to share it. I also think we should also share other aspects of parsing (such as then1..n2argument ranges), but I'd like to separate that from the mega-PR.One change from the previous parsing is that an empty access path now has zero tokens; previously, the empty path consisted of a single empty token. This mattered for how
Annotatedsummaries were implemented in Java, where the empty input/output spec must be interpreted in a certain way. This was preserved by special-casing the empty access path ininterpretInputandinterpretOutput. This doesn't feel great, but it seems like this special case was already there, it's just more explicit now.I've started DCA evaluations for all four affected languages (against Differences code-scanning.qls).
The text was updated successfully, but these errors were encountered: