Skip to content

Conversation

@MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Jan 5, 2022

This PR removes a bunch of bad self joins from the cpp/toctou-race-condition that I observed on php. For instance this one:

(94s) Tuple counts for #select#cpe#135#fff#shared/2@50efafh0 after 4.5s:
20213850 ~0%     {2} r1 = JOIN SSA::SsaDefinition::getAUse_dispred#fff WITH SSA::SsaDefinition::getAUse_dispred#fff ON FIRST 2 OUTPUT Rhs.2 'arg1', Lhs.2 'arg1'
                 return r1

The changes textual changes look quite scary, but it's really just simple refactorings following this "algorithm":

  1. Pull stuff into individual predicates
  2. Manually add some context into those predicates

I recommend enabling "Hide whitespace" when reviewing the PR. That makes the changes really easy to review.

@MathiasVP MathiasVP added the C++ label Jan 5, 2022
@MathiasVP MathiasVP requested a review from a team as a code owner January 5, 2022 15:36
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jan 5, 2022
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes otherwise LGTM, I'd like to see DCA results or similar there is one, I'll take a look.

)
}

predicate checkUse(Expr check, Expr checkPath, FunctionCall use, Expr usePath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate is odd, there seems to be no association between check / checkPath and use / usePath. Should it be two predicates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an association: If check / checkPath is a check on a filename, then use / usePath is an operation on a filename. And if check / checkPath is another filename operation, then use / usePath looks like a sensitive operation on a filename. So I don't think we can split them up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks like a cartesian product to me (or two smaller ones). All we care about is that the other half exists in the correct form, we don't care which one it is. So we could rewrite to something like:

predicate checkUse(Expr check, Expr checkPath) {
  // `check` looks like a check on a filename
  checksPath(check, checkPath) and
  exists(FunctionCall use, Expr usePath |
    use = filenameOperation(usePath)
  )
  or
  // another filename operation (null pointers can indicate errors)
  check = filenameOperation(checkPath) and
  exists(FunctionCall use, Expr usePath |
    use = sensitiveFilenameOperation(usePath)
  )
}

plus the corresponding predicate for use / usePath. I guess maybe the optimizer is doing this already?

It also seems like a very strange thing to want. I'd be surprised if this is behaving exactly as we want it to.

Copy link
Contributor Author

@MathiasVP MathiasVP Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean wrt. the cartesian product. To satisfy my curiosity, I'll check what the compiler is actually doing here. I also agree that the logic itself is weird. I don't want to change the logic in this PR, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the predicates in 173cefd to include the missing piece of information (which was still in the select clause) to associates check and use.

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 6, 2022

DCA shows a significant speedup for cpp/toctou-race-condition and no change in results. 👍

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Jan 6, 2022

173cefd fixes the review comments and makes the entire select clause a super crisp giant join now:

Tuple counts for #select#cpe#135#fff/3@3ee072o8 after 0ms:
  15 ~0%     {3} r1 = JOIN TOCTOUFilesystemRace::isCheckedPath#ffbfbf WITH TOCTOUFilesystemRace::isUsedPath#ffffff ON FIRST 6 OUTPUT Lhs.3 'use', Lhs.4 'usePath', Lhs.0 'check'
              return r1

There's been a bunch of changes since I did the first DCA run. So I'll run another one to make sure I didn't destroy any performance on another project.

@MathiasVP
Copy link
Contributor Author

DCA looks good.

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 7, 2022

Except I don't see a cpp/toctou-race-condition improvement on the "Query run timings, per query" this time. Is there a more detailed list somewhere?

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Jan 7, 2022

Except I don't see a cpp/toctou-race-condition improvement on the "Query run timings, per query" this time. Is there a more detailed list somewhere?

Oh, yeah. That's a good point. Here's how I concluded that we still get the performance benefits in the new DCA run:

In the first DCA run the query is mentioned in the Query run timings, per query table in satisfied.md because we were lucky enough to get a 5% performance improvement on the query (which is a magic threshold we've set that makes it appear in the table in satisfied.md). However, in the new DCA run we're only seeing a 3.5% performance improvement. You can see that by looking in the giant any.md file which contains all of the query comparisons. That one has the following row:

query a targets b targets a_m b_m diff relative conclusion
...
cpp/toctou-race-condition 1 1 539.7 520.8 -18.91 -0.0350
...

I think the missing 1.5% performance can be attributed to noise, so I think it's fair to conclude that we didn't lose any performance (especially considering the fact that we removed two cartesian products from the query).

(In fact, the conclusion column being for that row means "this row isn't going to be in satisfied.md.)

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 7, 2022

Thanks, I didn't know about this information in any.md and I agree it looks fine.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this, and I think all comments have been addressed.

@MathiasVP MathiasVP merged commit 4ee6533 into github:main Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants