bpo-37969: Correct urllib.parse functions dropping the delimiters of empty URI components #15642

maggyero · Sep 2, 2019

This PR will make the following changes:

update the urlsplit and urlunsplit functions of the urllib.parse module to keep the ? delimiter in a URI with an empty query component and keep the # delimiter in a URI with an empty fragment component (currently the delimiters are dropped):
```
  >>> from urllib.parse import urlsplit, urlunsplit
  >>> urlunsplit(urlsplit('http://example.com/?'))
  'http://example.com/?'  # currently: 'http://example.com/'
  >>> urlunsplit(urlsplit('http://example.com/#'))
  'http://example.com/#'  # currently: 'http://example.com/'
```
This is required by RFC 3986, section 6.2.3:

> Normalization should not remove delimiters when their associated component is empty unless licensed to do so by the scheme specification. For example, the URI "http://example.com/?" cannot be assumed to be equivalent to any of the examples above. Likewise, the presence or absence of delimiters within a userinfo subcomponent is usually significant to its interpretation. The fragment component is not subject to any scheme-based normalization; thus, two URIs that differ only by the suffix "#" are considered different regardless of the scheme.

To do so:
- the urlsplit function now decodes an absent query component (no ? delimiter) as None and an absent fragment component (no # delimiter) as None (e.g., urlsplit('http://example.com/') → ('http', 'example.com', '/', None, None)), and still decodes an empty query component (a bare ?) as '' and an empty fragment component (a bare #) as '' (e.g., urlsplit('http://example.com/?#') → ('http', 'example.com', '/', '', ''));
- the urlunsplit function now encodes a None query component as an absent query component and a None fragment component as an absent fragment component (e.g., urlunsplit(('http', 'example.com', '/', None, None)) → 'http://example.com/'), and now encodes a '' query component as a bare ? and a '' fragment component as a bare # (e.g., urlunsplit(('http', 'example.com', '/', '', '')) → 'http://example.com/?#');
add and update the corresponding unit tests in the test.test_urlparse module;
update a unit test in the test.test_urllib2 module;
update the urllib.parse documentation accordingly.
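
The None versus '' distinction described above can be sketched outside the standard library. The following is a minimal illustration, not the code in this PR: `Parts`, `split_uri` and `unsplit_uri` are hypothetical names, the regular expression is the one given in RFC 3986, Appendix B, and (unlike the PR, which only changes query and fragment) the sketch also reports an absent scheme or authority as None.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. A group that does not
# take part in the match is reported as None, while a delimiter followed
# by nothing yields the empty string.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class Parts(NamedTuple):
    """Hypothetical result type; None means 'component absent'."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_uri(uri: str) -> Parts:
    # Every group in the pattern is optional, so the match always succeeds.
    m = _URI_RE.match(uri)
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def unsplit_uri(parts: Parts) -> str:
    # Emit a delimiter whenever the component is present, even if empty.
    out = ""
    if parts.scheme is not None:
        out += parts.scheme + ":"
    if parts.authority is not None:
        out += "//" + parts.authority
    out += parts.path
    if parts.query is not None:
        out += "?" + parts.query
    if parts.fragment is not None:
        out += "#" + parts.fragment
    return out
```

With this sketch, `unsplit_uri(split_uri(u))` returns `u` unchanged for inputs such as 'http://example.com/?' and 'http://example.com/#', because the presence of each delimiter is carried by whether the field is None.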

https://bugs.python.org/issue37969



nicktimko · Sep 3, 2019

It may be a bit surprising for some of the tuple fields to sometimes be None (typing.Tuple[typing.Optional[str]] instead of typing.Tuple[str]), but I can't think of a more obvious solution.

The other alternative I considered was to explicitly include the delimiter when the component is empty (e.g. 'http://example.com/?#' → 'http', 'example.com', '/', '?', '#'), but that's probably more surprising, it makes rebuilding the URL more complex, and it's unclear what should happen with a URL like http://example.com/??.

I think you also need to describe the breaking change very clearly (I haven't done it before, but I think that's what bedevere/news is for), and leave hints in the actual documentation about the change ("changed in 3.9").

Housekeeping: I'd squash all the commits.

maggyero · Sep 3, 2019

Thank you for reviewing this, @nicktimko! Yes, the None solution for an absent query/fragment seemed the most straightforward and natural to me.

I have updated the PR description to detail the exact changes. Nice suggestion: I will add the news entry and the documentation version note, and squash the commits. But first I would like to fix an issue: the documentation tests in Travis CI failed for an obscure reason (see below). Do you have any idea why?

nicktimko · Sep 3, 2019

I don't know, but the docs build looks like it's installing blurb, which might be related to the news, so maybe adding a news item would fix it? Does it run locally? Just guessing though.



maggyero · Sep 11, 2019

Thanks @nicktimko, I have added a news entry, but the documentation tests still fail in Travis CI.

orsenthil · Sep 11, 2019

I don't think the documentation failure is related to the code in this PR. Perhaps this PR needs to be rebased?

orsenthil reviewed · Sep 11, 2019

This is going to be a breaking change and will affect plenty of downstream libraries and frameworks that have been relying on the previous behavior.

I don't have any code comments, and the code changes look good to me.
I find the rationale OK.
I will request reviews from more active core developers, as I want to hear their opinion on this change too.
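
To illustrate the kind of downstream breakage at stake, here is a minimal sketch. `FakeSplitResult` is a hypothetical stand-in for the proposed result shape (it is not the real `urllib.parse.SplitResult`), and both `count_params_*` helpers are invented examples of caller code, one assuming `query` is always a string and one tolerating None.

```python
from typing import NamedTuple, Optional


class FakeSplitResult(NamedTuple):
    """Hypothetical stand-in for a split result with optional fields."""
    scheme: str
    netloc: str
    path: str
    query: Optional[str]
    fragment: Optional[str]


def count_params_old(parts: FakeSplitResult) -> int:
    # Written against the previous behavior: assumes query is a str.
    # Raises AttributeError if query is None.
    return len([p for p in parts.query.split("&") if p])


def count_params_safe(parts: FakeSplitResult) -> int:
    # Treats an absent query (None) like an empty one.
    query = parts.query or ""
    return len([p for p in query.split("&") if p])
```

Code in the style of `count_params_old` would need a guard like the one in `count_params_safe` once an absent query is reported as None.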

Update parse.py (838e934)

the-knights-who-say-ni added the CLA signed label Sep 2, 2019

bedevere-bot added the awaiting review label Sep 2, 2019

maggyero added 6 commits Sep 2, 2019

- Update parse.py (8bffcf3)
- Update urllib.parse.rst (e7e1f7a)
- Update parse.py (050de62)
- Update parse.py (4df550d)
- Update test_urllib2.py (2dfecf2)
- Update test_urlparse.py (98d25e7)

maggyero changed the title ~~bpo-37969: Update parse.py~~ bpo-37969: Correct urllib.parse functions reporting false equivalent URIs Sep 2, 2019

maggyero added 7 commits Sep 2, 2019

- Update test_urlparse.py (a56112a)
- Update test_urlparse.py (7dc5a12)
- Update parse.py (ad3994a)
- Update test_urlparse.py (6d70b49)
- Update test_urlparse.py (58947a5)
- Update test_urlparse.py (db39e96)
- Update test_urlparse.py (67465d2)

maggyero marked this pull request as ready for review Sep 2, 2019

ned-deily requested a review from orsenthil Sep 4, 2019

Create 2019-09-10-17-01-35.bpo-37969.5Dz8e7.rst (d6bd27f)

maggyero changed the title ~~bpo-37969: Correct urllib.parse functions reporting false equivalent URIs~~ bpo-37969: Correct urllib.parse functions dropping the delimiters of empty URI components Sep 11, 2019

orsenthil requested review from vstinner and serhiy-storchaka Sep 11, 2019
