Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-42967: only use '&' as a query string separator #24297

Merged
merged 13 commits into from Feb 14, 2021

Conversation

@AdamGold
Copy link
Contributor

@AdamGold AdamGold commented Jan 22, 2021

bpo-42967: [security] urllib.parse.parse_qsl(): Web cache poisoning -
; as a query args separator

https://bugs.python.org/issue42967

Co-authored-by: Ken Jin [email protected]

@the-knights-who-say-ni
Copy link

@the-knights-who-say-ni the-knights-who-say-ni commented Jan 22, 2021

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@AdamGold

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

bpo-42967: [security] urllib.parse.parse_qsl(): Web cache poisoning -
`;` as a query args separator
@AdamGold AdamGold force-pushed the AdamGold:fix-issue-42967 branch from 0d8e76e to 93730b7 Jan 22, 2021
@AdamGold AdamGold requested a review from ethanfurman as a code owner Jan 22, 2021
@Fidget-Spinner
Copy link
Contributor

@Fidget-Spinner Fidget-Spinner commented Jan 23, 2021

@AdamGold I went ahead and re-ran the CLA bot and it seems you signed the CLA here :).

BTW, what are your thoughts about combining our PRs into one? I already added you as a Co-author to PR-24271, meaning the final commit will show us both as commit authors. I also wanted to use some really nice ideas from this PR if you're okay with it.

@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Jan 23, 2021

@Fidget-Spinner Sounds like a good idea, do you want me to checkout to your branch and push a commit?

@Fidget-Spinner
Copy link
Contributor

@Fidget-Spinner Fidget-Spinner commented Jan 24, 2021

@AdamGold sorry I changed my mind, I decided to close my PR in favor of this one, because I think 2 open PRs might split core devs' attention. I'll just review this one instead.

Copy link
Contributor

@Fidget-Spinner Fidget-Spinner left a comment

Some things:

  • cgi.py should expose these new arguments IMO so that users can switch. Most of cgi.py's arguments are already passed directly into urllib.parse anyways (the docs also mention that behavior here). You can see the affected functions in my PR here https://github.com/python/cpython/pull/24271/files#
  • Following that,FieldStorage should also accept the separator switch IMO, and you may have to pass the separator in a few more places (you can refer to my PR for that too, should be mostly painless :) ).

Other than those nits, and waiting for decision on the bpo about whether to boolean switch or allow user selection, I think this PR is quite well done :).

Lib/test/test_urlparse.py Show resolved Hide resolved
@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Jan 24, 2021

@Fidget-Spinner Thanks! much appreciated.
Regarding CGI and FieldStorage, isn't that used for POST requests?

Lib/test/test_urlparse.py Outdated Show resolved Hide resolved
@Fidget-Spinner
Copy link
Contributor

@Fidget-Spinner Fidget-Spinner commented Jan 25, 2021

@Fidget-Spinner Thanks! much appreciated.
Regarding CGI and FieldStorage, isn't that used for POST requests?

Yup, however it uses parse_qsl internally somewhere, so I'm guessing we probably still have to pass it in (most of its __init__ arguments are already directly passed to parse_qsl anyways).

@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Jan 25, 2021

@Fidget-Spinner Thanks! much appreciated.
Regarding CGI and FieldStorage, isn't that used for POST requests?

Yup, however it uses parse_qsl internally somewhere, so I'm guessing we probably still have to pass it in (most of its __init__ arguments are already directly passed to parse_qsl anyways).

From the docs:

The script’s input is connected to the client too, and sometimes the form data is read this way; at other times the form data is passed via the “query string” part of the URL

I honestly don't know enough about CGI to tell - how is the query string being used there?

Copy link
Contributor

@Fidget-Spinner Fidget-Spinner left a comment

I honestly don't know enough about CGI to tell - how is the query string being used there?

So I didn't know this either, but a POST form can contain a query string too... 😮; so FieldStorage has to parse query strings: https://stackoverflow.com/questions/14710061/is-it-valid-to-combine-a-form-post-with-a-query-string/14710450

Lib/cgi.py Outdated Show resolved Hide resolved
Lib/cgi.py Outdated Show resolved Hide resolved
Lib/cgi.py Outdated Show resolved Hide resolved
Lib/cgi.py Show resolved Hide resolved
@AdamGold AdamGold force-pushed the AdamGold:fix-issue-42967 branch from 1f9353a to 08c7c39 Jan 25, 2021
Copy link
Member

@merwok merwok left a comment

I prefered the other version that always split on & and had a boolean param to also parse on ;. Waiting on Senthil’s and others’ opinion on the bug report.

@AdamGold AdamGold force-pushed the AdamGold:fix-issue-42967 branch from 08c7c39 to 5e1cfd1 Jan 25, 2021
@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Jan 25, 2021

So I didn't know this either, but a POST form can contain a query string too..

@Fidget-Spinner implemented.

@merwok
Copy link
Member

@merwok merwok commented Jan 25, 2021

Just a process note: github’s user experience is not great for reviewers when a PR has its history rewritten.
We prefer that people add new commits, and the final merge is always a squash merge. Thanks!
See https://devguide.python.org/pullrequest/#submitting

@AdamGold AdamGold force-pushed the AdamGold:fix-issue-42967 branch from 5e1cfd1 to 8e1e361 Jan 25, 2021
@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Jan 25, 2021

@merwok Just saw this comment - noted, will be following this from now on.

@orsenthil
Copy link
Member

@orsenthil orsenthil commented Feb 14, 2021

I prefered the other version that always split on & and had a boolean param to also parse on ;.

@merwok - This is a good thinking, and I originally thought along the lines of stick with '&' or just '&' and ';' in some form.

Taking werkzeug package example (used in Flask and other projects) made it think that it's very reasonable to with a separator argument, with default as & pallets/werkzeug@6503519#diff-2d06ddf3f490f8cbcda42ae788a0586f08d5c96e67d89784a785c8295759e60dR30 (changed since 2009)

  1. The W3C recommendation -

Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

is taken care by this patch.

  1. Using a bool to accommodate an older unrecommended separator will confuse people. But to accommodate the old un-changed practice, like you showed the example of debian bugs, providing the option to override separator made more sense, since it was explicit.

The breaking change is disallowing both & and ; as separator together, and due to attack-vector of cache-poisoning exposed as a vulnerability, the primary breaking change is about disallowing both these characters together. I think, seems reasonable thing to do, even as it potential break some software that were relying on it.

Copy link
Member

@orsenthil orsenthil left a comment

I think, using the separator with '&' as default seems much better than not providing any option change and strictly use '&' as this moment.

We will only need a documentation that highlights this behavior and change in behavior from past and this patch can be merged.

@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Feb 14, 2021

@orsenthil Where should I write this documentation?

@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Feb 14, 2021

Sorry, @AdamGold and @orsenthil, I could not cleanly backport this to 3.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker fcbe0cb04d35189401c0c880ebfb4311e952d776 3.7

@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Feb 14, 2021

Sorry @AdamGold and @orsenthil, I had trouble checking out the 3.8 backport branch.
Please backport using cherry_picker on command line.
cherry_picker fcbe0cb04d35189401c0c880ebfb4311e952d776 3.8

@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Feb 14, 2021

Sorry, @AdamGold and @orsenthil, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker fcbe0cb04d35189401c0c880ebfb4311e952d776 3.9

@miss-islington
Copy link
Contributor

@miss-islington miss-islington commented Feb 14, 2021

Sorry @AdamGold and @orsenthil, I had trouble checking out the 3.6 backport branch.
Please backport using cherry_picker on command line.
cherry_picker fcbe0cb04d35189401c0c880ebfb4311e952d776 3.6

orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>

(cherry picked from commit fcbe0cb)
orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
…4297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>.
(cherry picked from commit fcbe0cb)

Co-authored-by: Adam Goldschmidt <[email protected]>
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Feb 15, 2021

GH-24528 is a backport of this pull request to the 3.9 branch.

orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>
(cherry picked from commit fcbe0cb)
orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
…4297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>.
(cherry picked from commit fcbe0cb)

Co-authored-by: Adam Goldschmidt <[email protected]>
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Feb 15, 2021

GH-24529 is a backport of this pull request to the 3.8 branch.

orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>
(cherry picked from commit fcbe0cb)
orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
…4297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>.
(cherry picked from commit fcbe0cb)

Co-authored-by: Adam Goldschmidt <[email protected]>
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Feb 15, 2021

GH-24531 is a backport of this pull request to the 3.7 branch.

orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>
(cherry picked from commit fcbe0cb)
orsenthil added a commit to orsenthil/cpython that referenced this pull request Feb 15, 2021
…4297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>.
(cherry picked from commit fcbe0cb)

Co-authored-by: Adam Goldschmidt <[email protected]>
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Feb 15, 2021

GH-24532 is a backport of this pull request to the 3.6 branch.

@AdamGold
Copy link
Contributor Author

@AdamGold AdamGold commented Feb 15, 2021

CVE-2021-23336 has been assigned.

orsenthil added a commit that referenced this pull request Feb 15, 2021
…24528)

(cherry picked from commit fcbe0cb)

* [3.9] bpo-42967: only use '&' as a query string separator (GH-24297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Adam Goldschmidt <[email protected]>
ambv pushed a commit that referenced this pull request Feb 15, 2021
…24529)

* bpo-42967: only use '&' as a query string separator (#24297)

bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>
(cherry picked from commit fcbe0cb)

* [3.8] bpo-42967: only use '&' as a query string separator (GH-24297)

bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Éric Araujo <[email protected]>.
(cherry picked from commit fcbe0cb)

Co-authored-by: Adam Goldschmidt <[email protected]>

* Update correct version information.

* fix docs and make logic clearer

Co-authored-by: Adam Goldschmidt <[email protected]>
Co-authored-by: Fidget-Spinner <[email protected]>
ned-deily pushed a commit that referenced this pull request Feb 15, 2021
…H-24531)

bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Adam Goldschmidt <[email protected]>
(cherry picked from commit fcbe0cb)
ned-deily pushed a commit that referenced this pull request Feb 15, 2021
…H-24532)

bpo-42967: [security] Address a web cache-poisoning issue reported in
urllib.parse.parse_qsl().

urllib.parse will only us "&" as query string separator by default
instead of both ";" and "&" as allowed in earlier versions. An optional
argument seperator with default value "&" is added to specify the
separator.

Co-authored-by: Éric Araujo <[email protected]>
Co-authored-by: Ken Jin <[email protected]>
Co-authored-by: Adam Goldschmidt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants