Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C implementation of os.path.splitroot() #102511

Closed
barneygale opened this issue Mar 7, 2023 · 5 comments
Closed

Add C implementation of os.path.splitroot() #102511

barneygale opened this issue Mar 7, 2023 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement

Comments

@barneygale
Copy link
Contributor

barneygale commented Mar 7, 2023

Feature or enhancement

Speed up os.path.splitroot() by implementing it in C.

Pitch

Per @eryksun:

I think splitroot() warrants a C implementation since it's a required step in our basic path handling on Windows -- join(), split(), relpath(), and commonpath(). Speeding it up gives a little boost across the board. Also, it would be less confusing if nt._path_splitroot() actually implemented ntpath.splitroot().

If implemented, using _Py_splitroot() in _Py_normpath() would help to ensure consistency. Currently, for example, ntpath.normpath('//?/UNC/server/share/../..') is correct on POSIX but wrong on Windows because _Py_normpath() incorrectly handles "//?/UNC/" as the root instead of "//?/UNC/server/share/".

Previous discussion

Linked PRs

@barneygale barneygale added the type-feature A feature request or enhancement label Mar 7, 2023
@barneygale barneygale added the performance Performance or resource usage label Mar 7, 2023
@zooba
Copy link
Member

zooba commented Mar 10, 2023

I'd recommend making this a skiproot API rather than split, and just have it return the first index (or a pointer) after the root. This is very easy to convert into a split, but it also composes better with other APIs (e.g. _Py_normpath needs the index and not the text).

@eryksun
Copy link
Contributor

eryksun commented Mar 10, 2023

Sounds good. We can use _Py_skiproot() in os__path_splitroot_impl() in "Modules/posixmodule.c".

Note that existing use of nt._path_splitroot() in importlib will have to be updated to handle the 3-way split (drive, root, rest) instead of (anchor, rest).

@barneygale
Copy link
Contributor Author

I'm going to attempt to implement this, though my C is rather basic so it will take me some time.

Rough plan:

  1. Make nt._path_splitroot() work like ntpath.splitroot(), i.e. return a 3-tuple. Adjust importlib. Use it in ntpath where available.
  2. Add posix._path_splitroot(). Use it in posixpath where available.
  3. Add _Py_skiproot(). Call it from _Py_normpath() and _path_splitroot()

Let me know if that doesn't sound right!

@eryksun
Copy link
Contributor

eryksun commented Mar 13, 2023

Here's an overview of what I did to experiment with this idea. I started by adding a new _Py_skiproot() internal API function in "Python/fileutils.c":

PyAPI_FUNC(wchar_t *) _Py_skiproot(wchar_t *path, Py_ssize_t size,
                                   wchar_t **root);

I moved the existing root-skipping implementation from _Py_normpath() into _Py_skiproot() and modified it to implement returning rest and root. I also added support for extended "UNC" paths. I updated _Py_normpath() to use _Py_skiproot(). Once test_ntpath and test_posixpath were successful, I moved on to implement os._path_splitroot_ex in "Modules/posixmodule.c", which I used to implement ntpath.splitroot() and posixpath.splitroot(). I again verified that test_ntpath and test_posixpath were successful.

I retained the old os._path_splitroot implementation that's used by importlib. I plan to modify importlib to use the new implementation, but only if benchmarking demonstrates that making splitroot() a builtin function is worthwhile in general. If so, I'll remove the old implementation and rename the new one to os._path_splitroot.

My initial comparison shows that, when splitting "//server/share/spam/eggs", the new builtin _path_splitroot_ex() is about 4 times faster than the fallback implementation on Windows, and it's about 1.7 times faster than the fallback implementation on Linux.

@barneygale
Copy link
Contributor Author

My C isn't good enough for this, so someone else please feel free to take a stab at it!

@iritkatriel iritkatriel added extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed extension-modules C modules in the Modules dir labels Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants