-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Cache more predicates to improve performance. #7339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
af54bcc to
dbfe0c5
Compare
ade85c4 to
8893268
Compare
080bdb0 to
2c4af29
Compare
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
All of the changes look sensible to me (modulo one comment). 👍
Now I guess I need to rejigger my intuitions for what happens in which stages...
| /** Gets the function for this statement */ | ||
| Function getDefinedFunction() { | ||
| exists(FunctionExpr func | this.containsInScope(func) and result = func.getInnerScope()) | ||
| result = f.getInnerScope() // XXX: This behaves very differently. But from inspecting the results of the previous version, that had every function in the same scope as the result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no. 😲
This looks more correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I'll delete the comment.
Edit: It seems I had already deleted it in a later commit.
| * Where var may be redefined in call to `foo` if `var` escapes (is global or non-local). | ||
| */ | ||
| pragma[noinline] | ||
| pragma[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was surprising to me, but I assume the performance tests show that it's a net improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while, so I don't really remember.
But I suppose it was an improvement.
| private predicate isBeforeCode(Comment c, File f) { | ||
| f = c.getLocation().getFile() and | ||
| minStmtLine(f) < c.getLocation().getStartLine() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that you're just following the naming scheme set out in the file already, but isn't this predicate actually expressing that the comment c appears after code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very confused. I'm sure I saw a commit that fixed this, but it appears to be unfixed. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You definitely saw that, I'm not sure why that disappeared in the rebase.
(I did the rebase to try to get QL-for-QL to stop complaining).
I reintroduced the fix, and also updated the expected output for the location test (that has an extra result due to the Django models getting imported).
This reverts commit 84bc904. It caused ql/abstract-class-import alerts
|
Code-scanning was complaining about an (I did introduce it, but that was in the def-node PR). It's fixed now. |
tausbn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Let's get it in before it starts conflicting with other stuff. ![]()
For this PR I have chosen not to rebase and squash, the commit history are the iterations I went through in making this PR.
(And the commit-messages are accordingly).
I recommend not reviewing each commit as much gets reverted along the way.
But the commits give a good idea about the process that I followed.
nightly/lgtm-full evaluation (~11% average speedup).
nightly/lgtm-full evaluation (single-threaded) (~15% average speedup).
The speedup gets larger the more queries are added in an evaluation, so the above represents a best-case performance gain.
Single-query performance will be slower if you start with an empty cache.
Here is the DB size increase.
Here is the process I followed:
CachedStages.qllfile from JavaScript into Python.main, to figure out which queries are slow.Results in:.These sections tell you which predicates are the cached predicates in a given stage.
Use
CachedStages.qllto group some of these stages in a way that makes sense.5.1. Look at the second to last
Clause timing report(the report for the last stage) of a slow query, addcacheto expensive non-query specific predicates (and add them toCachedStages.qll).5.2. Fix any bad join orders that are introduced.
5.3. Look at
Results in:again, and check if there are anycachedpredicates that should be grouped.5.4. If you've accidentally grouped too many stages into one big group, then revert whatever you did.
5.5. Revert some caching if the DB size increases too much.
5.6. Revert some imports if I caused
ql/abstract-class-importalerts.(
Start an experiment->Use saved experiment->From previous experimentis helpful).codeql-actionoptions.