Skip to content

Conversation

@hmac
Copy link
Contributor

@hmac hmac commented Dec 7, 2021

When calculating StringlikeLiteral.getValueText, include results from
interpolations where we can determine their string value. For example:

b = "b" # local variable
D = "d" # constant

"a#{b}c"     # getValueText() = "abc"
"a#{b}c{D}"  # getValueText() = "abcd"
/#a#{b}c{D}/ # getValueText() = "abcd"

Because this also applies to regexes, it increases the sensitivity of our ReDoS queries. With this change we find a true positive in GHSA-jxhc-q857-3j6g.

There's one shortcoming to this change: ReDoS alerts may not include the correct location of the problematic (sub)string if it is included via an interpolation. The alert itself, and the suggested problematic input, will still be correct.

@github-actions github-actions bot added the Ruby label Dec 7, 2021
@hmac hmac force-pushed the hmac/regexp-interpolations branch from 0f03e42 to 06f09e4 Compare December 8, 2021 01:39
@hmac hmac marked this pull request as ready for review December 8, 2021 21:51
@hmac hmac requested a review from a team as a code owner December 8, 2021 21:51
// If last statement in the interpolation is a constant or local variable read,
// we attempt to look up its string value.
// If there's a result, we return that as the string value of the interpolation.
final override string getValueText() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the getValueText further down) should be implemented in CfgNodes.qll instead (i.e., on CFG nodes instead of AST nodes). In general, if getValueText is defined recursively, it should be done in CfgNodes for potential higher precision (in context of CFG splitting). Example

if (b)
  x = "true"
else
  x = "false"

if (b)
  puts x

If (when) Ruby has Boolean CFG splitting, the access to x in the b = true case will have constant value "true", while it will have constant value "false" in the b = false case. However, the AST node x in puts x will not have a constant value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I think I understand what you're saying here. I've updated this PR to push more calculation into the CFG. We still need a few changes at the AST level, I think, because regular string constants aren't CFG nodes. Let me know if I've got the right idea or if I'm still off.

@hmac hmac force-pushed the hmac/regexp-interpolations branch from 06f09e4 to a4a3922 Compare December 13, 2021 02:34
@hmac hmac requested a review from hvitved December 15, 2021 23:34
@hvitved
Copy link
Contributor

hvitved commented Dec 17, 2021

I have added some suggested changes to this PR at #7440.

@hmac hmac force-pushed the hmac/regexp-interpolations branch from acb45ff to 1db95eb Compare December 22, 2021 01:07
hmac and others added 9 commits January 6, 2022 12:25
The superclass definition uses SSA, which doesn't track constants.
This change is split over several commits so it is easier to see.
This change adds some extra lines, which will be populated in the next
commit.
When calculating `StringlikeLiteral.getValueText`, include results from
interpolations where we can determine their string value. For example:

    b = "b" # local variable
    D = "d" # constant

    "a#{b}c"     # getValueText() = "abc"
    "a#{b}c{D}"  # getValueText() = "abcd"
    /#a#{b}c{D}/ # getValueText() = "abcd"
This exercises the support for resolving string interpolations, and is
based on a real vulnerability:

GHSA-jxhc-q857-3j6g)
@hmac hmac force-pushed the hmac/regexp-interpolations branch from 1db95eb to ac9cac7 Compare January 5, 2022 23:27
@hmac hmac force-pushed the hmac/regexp-interpolations branch from 047dd38 to c81b4ce Compare January 18, 2022 02:27
Create a set of classes for components of regex literals,
separate from those of string literals. This allows us to special-case
components of free-spacing regexes (ones with the /x flag) to not have a
`getValueText()`.

This in turn is useful because our regex parser can't handle free-spacing
regexes, so excluding them ensures that we don't generate erroneous
ReDoS alerts.
@hmac hmac force-pushed the hmac/regexp-interpolations branch from 47a0b47 to 4f7f924 Compare January 18, 2022 22:24
@hmac
Copy link
Contributor Author

hmac commented Jan 19, 2022

Performance is improved from the last DCA run. Still slower than main, but not catastrophically.

head~1:

source a targets b targets a_m a_std b_m b_std diff relative
ruby__ruby 1 1 619 0 847 0 228 0.368
opal__opal 1 1 987 0 1778 0 791 0.801

head:

source a targets b targets a_m a_std b_m b_std diff relative
opal__opal 3 3 862.3 13.58 1060 27.87 197.7 0.229
ruby__ruby 3 3 590.7 95.34 809.7 136.5 219 0.371

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge if the latest DCA run looks good.

@hvitved hvitved merged commit cb098df into main Jan 19, 2022
@hvitved hvitved deleted the hmac/regexp-interpolations branch January 19, 2022 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants