Skip to content

Ruby: Improvements to RegExpTracking #13077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 15, 2023

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented May 9, 2023

When #11879 switched from using global data flow to using type tracking, for identifying regular expressions, we lost flow through built-in string methods. This is because the conjunct

TaintTracking::localTaintStep(nodeFrom, nodeTo) and
nodeFrom.(DataFlowPrivate::SummaryNode).getSummarizedCallable() instanceof String::SummarizedCallable

doesn't work in type tracking, where we don't have flow-through. Instead, we should be using FlowSummaryImpl::Private::Steps::summaryThroughStepTaint, which this PR does.

Moreover, this PR improves on performance by strengthening the initial forwards/backwards pruning, which also revealed a bug in the existing TypeBackTracker::getACompatibleTypeTracker predicate.

DCA shows a nice speedup on artichoke__artichoke.

@github-actions github-actions bot added the Ruby label May 9, 2023
@hvitved hvitved force-pushed the ruby/track-regexp-improvements branch 2 times, most recently from 59de393 to 1d723a4 Compare May 9, 2023 13:16
@github-actions github-actions bot added the Python label May 9, 2023
@hvitved hvitved force-pushed the ruby/track-regexp-improvements branch 2 times, most recently from b440db9 to ddebf59 Compare May 9, 2023 13:46
@hvitved hvitved added the no-change-note-required This PR does not need a change note label May 9, 2023
@hvitved hvitved marked this pull request as ready for review May 9, 2023 18:19
@hvitved hvitved requested review from a team as code owners May 9, 2023 18:19
@hvitved hvitved requested a review from erik-krogh May 9, 2023 18:19
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation looks good.
I'm used to "content" just being string-constants that directly correspond to property names, but I see that it's actually more complicated here.

You are using that string-constants gets converted to regular-expressions, but not the other way around, which makes the implementation conceptually simpler.
I should probably have done that from the start.


As said, the implementation is fine, but I don't think the readability is.
E.g. what is n in regFromString?
I know what the code did previously, which helped in reading it.
But I think the rest of the Ruby/Python team will find it harder.
If you can look through the code again and give some more descriptive names (and maybe some selective docstrings), that would help.

@hvitved hvitved force-pushed the ruby/track-regexp-improvements branch from ddebf59 to 51087d0 Compare May 10, 2023 07:42
erik-krogh
erik-krogh previously approved these changes May 10, 2023
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

But I think you also need a review from the language teams.
@github/codeql-python / @github/codeql-ruby: your turn.

yoff
yoff previously approved these changes May 10, 2023
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python change is good, thanks :-)
As I now read the other parts also, I left some questions, but they are mostly for my own curiosity..

@hvitved hvitved dismissed stale reviews from yoff and erik-krogh via 425ebba May 10, 2023 12:04
@calumgrant calumgrant requested a review from aibaars May 15, 2023 08:34
@yoff yoff self-requested a review May 15, 2023 11:04
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice refactor, I enjoyed reading that :-)

@hvitved hvitved merged commit 9dede31 into github:main May 15, 2023
@hvitved hvitved deleted the ruby/track-regexp-improvements branch May 15, 2023 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants