Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upbpo-37969: Correct urllib.parse functions dropping the delimiters of empty URI components #15642
Conversation
the-knights-who-say-ni
added
the
CLA signed
label
Sep 2, 2019
bedevere-bot
added
the
awaiting review
label
Sep 2, 2019
maggyero
changed the title
bpo-37969: Update parse.py
bpo-37969: Correct urllib.parse functions reporting false equivalent URIs
Sep 2, 2019
maggyero
marked this pull request as ready for review
Sep 2, 2019
This comment has been minimized.
This comment has been minimized.
nicktimko
commented
Sep 3, 2019
•
|
It's maybe a bit surprising to have some of the tuple fields sometimes be The other alternative I thought about was to just explicitly dump in the delimiter if it's empty (e.g. I think you need to also describe the breaking change very clearly (haven't done it before, but I think that's what bedevere/news is for, i.e. these things), and leave hints in the actual documentation about the change ("changed in 3.9") Housekeeping: I'd squash all the commits. |
This comment has been minimized.
This comment has been minimized.
|
Thank you for reviewing this @nicktimko! Yes the I have updated the PR description to detail the exact changes. Nice suggestion, I will make the news entry, documentation version note and commit squash. But before I would like to fix an issue: the documentation tests in Travis CI failed for an obscure reason (see below). Do you have any idea why? |
This comment has been minimized.
This comment has been minimized.
nicktimko
commented
Sep 3, 2019
|
I don't know, but the docs build looks like it's installing |
ned-deily
requested a review
from orsenthil
Sep 4, 2019
This comment has been minimized.
This comment has been minimized.
|
Thanks @nicktimko, I have added a news entry, but documentation tests still fail in Travis-CI. |
maggyero
changed the title
bpo-37969: Correct urllib.parse functions reporting false equivalent URIs
bpo-37969: Correct urllib.parse functions dropping the delimiters of empty URI components
Sep 11, 2019
This comment has been minimized.
This comment has been minimized.
|
I don't think the documentation failure is related to the code in this PR. Perhaps this PR needs to be rebased? |
|
This is going to be a breaking change and will affect a plenty of downstream libraries and frameworks that had been relying upon the previous behavior.
|
maggyero commentedSep 2, 2019
•
edited
This PR will make the following changes:
update the
urlsplitandurlunsplitfunctions of theurllib.parsemodule to keep the?delimiter in a URI with an empty query component and keep the#delimiter in a URI with an empty fragment component (currently the delimiters are dropped):This is required by RFC 3986:
To do so:
urlsplitfunction now decodes an absent''query component asNoneand an absent''fragment component asNone(e.g.,urlsplit('http://example.com/')→('http', 'example.com', '/', None, None)), and still decodes an empty'?'query component as''and an empty'#'fragment component as''(e.g.,urlsplit('http://example.com/?#')→('http', 'example.com', '/', '', ''));urlunsplitfunction now encodes aNonequery component as an absent''query component and aNonefragment component as an absent''fragment component (e.g.,urlunsplit(('http', 'example.com', '/', None, None))→'http://example.com/'), and now encodes a''query component as an empty'?'query component and a''fragment component as an empty'#'fragment component (e.g.,urlunsplit(('http', 'example.com', '/', '', ''))→'http://example.com/?#');add and update the corresponding unit tests in the
test.test_urlparsemodule;update a unit test in the
test.test_urllib2module;update the
urllib.parsedocumentation accordingly.https://bugs.python.org/issue37969