Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve import time of various stdlib modules #109653

Open
AlexWaygood opened this issue Sep 21, 2023 · 18 comments
Open

Improve import time of various stdlib modules #109653

AlexWaygood opened this issue Sep 21, 2023 · 18 comments
Labels
3.13 new features, bugs and security fixes performance Performance or resource usage topic-typing type-feature A feature request or enhancement

Comments

@AlexWaygood
Copy link
Member

AlexWaygood commented Sep 21, 2023

Feature or enhancement

Proposal:

As noted in https://discuss.python.org/t/deferred-computation-evalution-for-toplevels-imports-and-dataclasses/34173, typing isn't the slowest stdlib module in terms of import time, but neither is it one of the quickest. We should speed it up, if possible.

Links to previous discussion of this feature:

https://discuss.python.org/t/deferred-computation-evalution-for-toplevels-imports-and-dataclasses/34173

Linked PRs

@AlexWaygood AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage topic-typing 3.13 new features, bugs and security fixes labels Sep 21, 2023
AlexWaygood added a commit that referenced this issue Sep 23, 2023
…ed members on demand (#109651)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
@AlexWaygood AlexWaygood changed the title Improve import time of typing.py Improve import time of various stdlib modules Sep 23, 2023
@AlexWaygood
Copy link
Member Author

Retitling this issue to reflect a change of scope: part of the reason for the slow import time of typing is the slow import time of some of the modules it imports. There may be low-hanging fruit in some other stdlib modules as well.

@hugovk
Copy link
Member

hugovk commented Sep 23, 2023

An idea: review the output of a linter for "unused" imports.

Some may be false positives due to needing an import's side effects, but I expect there will be some that are no longer needed.

@itamaro
Copy link
Contributor

itamaro commented Sep 23, 2023

Some may be false positives due to needing an import's side effects ...

How do you feel about making an intentional effort to get rid of such reliance on side effects in the stdlib?
Explicit is better than implicit :)

@AlexWaygood
Copy link
Member Author

An idea: review the output of a linter for "unused" imports.

FWIW, I can only find one in the stdlib that I'm 100% confident with, using ruff/pycln:

import warnings

I think this is because Victor went through them all fairly recently, in #105411. There are a fair few in Lib/test/, though. I'll file a PR in a bit.

@sobolevn
Copy link
Member

This is also interesting:

  • dataclasses copies recursive_repr to avoid a dependency:

    cpython/Lib/dataclasses.py

    Lines 248 to 266 in 51863b7

    # This function's logic is copied from "recursive_repr" function in
    # reprlib module to avoid dependency.
    def _recursive_repr(user_function):
    # Decorator to make a repr function return "..." for a recursive
    # call.
    repr_running = set()
    @functools.wraps(user_function)
    def wrapper(self):
    key = id(self), _thread.get_ident()
    if key in repr_running:
    return '...'
    repr_running.add(key)
    try:
    result = user_function(self)
    finally:
    repr_running.discard(key)
    return result
    return wrapper
  • But, it imports functools:
    import functools
  • And it imports reprlib.recursive_repr:
    from reprlib import recursive_repr

@sobolevn
Copy link
Member

One more interesting case:

@AlexWaygood
Copy link
Member Author

@sobolevn, the situation with dataclasses and reprlib you point out in #109653 (comment) is indeed interesting. From my local experiments, it does seem like the import time of dataclasses is unaffected if dataclasses is altered so that it just imports recursive_repr from reprlib rather than reimplementing that logic. The import time for functools also doesn't really change if it vendors the reprlib code instead of importing it; it seems like that import is really cheap.

So, should we just go for cleaner code in dataclasses, and import recursive_repr from reprlib? Maybe, but there are other reasons than just import time to avoid unnecessary dependencies between different stdlib modules. I think it's not the biggest issue in the world; we can probably leave the code as it is. I'll defer to the dataclasses experts on whether or not they feel like changing this.

@hauntsaninja

This comment was marked as resolved.

@hugovk

This comment was marked as resolved.

@AlexWaygood

This comment was marked as resolved.

@ewdurbin

This comment was marked as resolved.

@hugovk

This comment was marked as resolved.

@AA-Turner
Copy link
Member

Looking at importlib.metadata there's a fairly easy ~15% gain by defering the import of _adapters (email.message). I tried removing all of the annotations & imports of typing, but this is made difficult as Traversable is a core part of the API, and it's a runtime checkable typing protocol rather than a more 'normal' (for the stdlib) ABC. The improvement over the first optimisation is therefore only 7% (20% total).

A

csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
…precated members on demand (python#109651)

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
csm10495 pushed a commit to csm10495/cpython that referenced this issue Sep 28, 2023
@DavidCEllis
Copy link
Contributor

@sobolevn, the situation with dataclasses and reprlib you point out in #109653 (comment) is indeed interesting. From my local experiments, it does seem like the import time of dataclasses is unaffected if dataclasses is altered so that it just imports recursive_repr from reprlib rather than reimplementing that logic. The import time for functools also doesn't really change if it vendors the reprlib code instead of importing it; it seems like that import is really cheap.

So, should we just go for cleaner code in dataclasses, and import recursive_repr from reprlib? Maybe, but there are other reasons than just import time to avoid unnecessary dependencies between different stdlib modules. I think it's not the biggest issue in the world; we can probably leave the code as it is. I'll defer to the dataclasses experts on whether or not they feel like changing this.

One thing that does mildly bug me about the _recursive_repr definition in dataclasses is that it is modified to use functools.wraps instead of manually fixing the attributes like the reprlib implementation. The definition of functools.wraps uses partial which in turn uses recursive_repr from reprlib for its own __repr__1. So it's kind of using recursive_repr indirectly to avoid depending on recursive_repr directly.


dataclasses import time is unaffected by the change here as most of the significant direct imports end up being imported in-turn by inspect. (eg: re could be delayed until needed to identify a string annotation, but that doesn't matter while inspect is being used as the core method to gather annotations as inspect will eagerly import re).

I wrote about removing or delaying some of the imports dataclasses uses late last year but didn't end up publishing or pursuing it as #97800 made inspect.get_annotations the mechanism by which dataclasses obtains the annotations and as such much more of a core part of the functionality. I figured it would be too much churn at that point and unlikely to get in. I've made this public as a gist if it is of any interest but it's a bit tangential and far too long to include here.

To summarise, taken from the -Ximporttime results I had at the time:

import time: self [us] | cumulative | imported package
import time:      1070 |      22907 | dataclasses (as in 3.11.1)
import time:      1296 |       2840 | dataclasses (with inspect/re/functools/copy imports removed/delayed)

Footnotes

  1. In the python implementation of partial at least.

@DavidCEllis
Copy link
Contributor

DavidCEllis commented Oct 2, 2023

I had a look at dataclasses again as it is now and this could still be done if the relevant parts of inspect.get_annotations are copied into the module as recursive_repr currently is.

Almost any improvements also require deferring import of inspect until it's needed to generate a __doc__ for a missing docstring1. It also requires preventing dataclasses from making the docstring in '-OO' mode2 or there's a chance that it would make -OO slower due to the inspect import always being triggered.

This does mean that classes with no docstring/with string annotations/with the future annotations import will still end up with the inspect/re imports, which could be a surprising difference in performance. I'm also not sure if it would be too disruptive.

The changes would look something like this:

Code changes to dataclasses
diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 84f8d68ce0..5ce90192f5 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -1,13 +1,10 @@
-import re
 import sys
 import copy
 import types
-import inspect
 import keyword
-import functools
 import itertools
 import abc
-import _thread
+from reprlib import recursive_repr as _recursive_repr
 from types import FunctionType, GenericAlias
 
 
@@ -220,7 +217,21 @@ def __repr__(self):
 # String regex that string annotations for ClassVar or InitVar must match.
 # Allows "identifier.identifier[" or "identifier[".
 # https://bugs.python.org/issue33453 for details.
-_MODULE_IDENTIFIER_RE = re.compile(r'^(?:\s*(\w+)\s*\.)?\s*(\w+)')
+class _DelayedRegexMatcher:
+    def __init__(self, match_str):
+        self.match_str = match_str
+        self._compiled_regex = None
+
+    def __repr__(self):
+        return f"{self.__class__.__name__}(match_str={self.match_str!r})"
+
+    def match(self, ann):
+        if self._compiled_regex is None:
+            import re
+            self._compiled_regex = re.compile(self.match_str)
+        return self._compiled_regex.match(ann)
+
+_MODULE_IDENTIFIER_RE = _DelayedRegexMatcher(r'^(?:\s*(\w+)\s*\.)?\s*(\w+)')
 
 # Atomic immutable types which don't require any recursive handling and for which deepcopy
 # returns the same object. We can provide a fast-path for these types in asdict and astuple.
@@ -245,25 +256,31 @@ def __repr__(self):
     property,
 })
 
-# This function's logic is copied from "recursive_repr" function in
-# reprlib module to avoid dependency.
-def _recursive_repr(user_function):
-    # Decorator to make a repr function return "..." for a recursive
-    # call.
-    repr_running = set()
-
-    @functools.wraps(user_function)
-    def wrapper(self):
-        key = id(self), _thread.get_ident()
-        if key in repr_running:
-            return '...'
-        repr_running.add(key)
-        try:
-            result = user_function(self)
-        finally:
-            repr_running.discard(key)
-        return result
-    return wrapper
+def _get_annotations(obj):
+    """
+    Compute the annotations dict for a dataclass.
+
+    Copied from inspect.get_annotations with unused code removed.
+    """
+    if isinstance(obj, type):
+        obj_dict = getattr(obj, "__dict__", None)
+        if obj_dict and hasattr(obj_dict, "get"):
+            ann = obj_dict.get("__annotations__", None)
+            if isinstance(ann, types.GetSetDescriptorType):
+                ann = None
+        else:
+            ann = None
+    else:
+        raise TypeError(f"{obj!r} is not a class.")
+
+    if not isinstance(ann, types.NoneType | dict):
+        raise ValueError(f"{obj!r}.__annotations__ is neither a dict nor None")
+
+    if not ann:
+        return {}
+
+    return dict(ann)
+
 
 class InitVar:
     __slots__ = ('type', )
@@ -322,7 +339,7 @@ def __init__(self, default, default_factory, init, repr, hash, compare,
         self.kw_only = kw_only
         self._field_type = None
 
-    @_recursive_repr
+    @_recursive_repr()
     def __repr__(self):
         return ('Field('
                 f'name={self.name!r},'
@@ -632,7 +649,7 @@ def _repr_fn(fields, globals):
                                 for f in fields]) +
                      ')"'],
                      globals=globals)
-    return _recursive_repr(fn)
+    return _recursive_repr()(fn)
 
 
 def _frozen_get_del_attr(cls, fields, globals):
@@ -967,7 +984,7 @@ def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen,
     # actual default value.  Pseudo-fields ClassVars and InitVars are
     # included, despite the fact that they're not real fields.  That's
     # dealt with later.
-    cls_annotations = inspect.get_annotations(cls)
+    cls_annotations = _get_annotations(cls)
 
     # Now find fields in our class.  While doing so, validate some
     # things, and set the default values (as class attributes) where
@@ -1131,15 +1148,17 @@ def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen,
         # we're here the overwriting is unconditional.
         cls.__hash__ = hash_action(cls, field_list, globals)
 
-    if not getattr(cls, '__doc__'):
-        # Create a class doc-string.
-        try:
-            # In some cases fetching a signature is not possible.
-            # But, we surely should not fail in this case.
-            text_sig = str(inspect.signature(cls)).replace(' -> None', '')
-        except (TypeError, ValueError):
-            text_sig = ''
-        cls.__doc__ = (cls.__name__ + text_sig)
+    if sys.flags.optimize < 2:  # Don't create a docstring in -OO mode
+        if not getattr(cls, '__doc__'):
+            # Create a class doc-string.
+            try:
+                # In some cases fetching a signature is not possible.
+                # But, we surely should not fail in this case.
+                import inspect
+                text_sig = str(inspect.signature(cls)).replace(' -> None', '')
+            except (TypeError, ValueError):
+                text_sig = ''
+            cls.__doc__ = (cls.__name__ + text_sig)
 
     if match_args:
         # I could probably compute this once

With -Ximporttime going from ~20ms for dataclasses to ~4ms.

./python -Ximporttime -c "import dataclasses"

Before:

import time: self [us] | cumulative | imported package
import time:       442 |        442 |   _io
import time:        85 |         85 |   marshal
import time:       601 |        601 |   posix
import time:      1674 |       2800 | _frozen_importlib_external
import time:       305 |        305 |   time
import time:       602 |        907 | zipimport
import time:       115 |        115 |     _codecs
import time:      1006 |       1121 |   codecs
import time:       840 |        840 |   encodings.aliases
import time:      1526 |       3486 | encodings
import time:       346 |        346 | encodings.utf_8
import time:       231 |        231 | _signal
import time:        73 |         73 |     _abc
import time:       296 |        368 |   abc
import time:       332 |        700 | io
import time:       438 |        438 |       _stat
import time:       209 |        647 |     stat
import time:      1847 |       1847 |     _collections_abc
import time:       107 |        107 |       genericpath
import time:       245 |        352 |     posixpath
import time:       883 |       3727 |   os
import time:       168 |        168 |   _sitebuiltins
import time:       223 |        223 |   sitecustomize
import time:        67 |         67 |   usercustomize
import time:       585 |       4768 | site
import time:       393 |        393 |       types
import time:      1871 |       2263 |     enum
import time:        82 |         82 |       _sre
import time:       399 |        399 |         re._constants
import time:       419 |        817 |       re._parser
import time:       149 |        149 |       re._casefix
import time:       400 |       1447 |     re._compiler
import time:       197 |        197 |         itertools
import time:       269 |        269 |         keyword
import time:       125 |        125 |           _operator
import time:       457 |        582 |         operator
import time:       337 |        337 |         reprlib
import time:       113 |        113 |         _collections
import time:      1378 |       2874 |       collections
import time:        99 |         99 |       _functools
import time:       805 |       3776 |     functools
import time:       300 |        300 |     copyreg
import time:       742 |       8527 |   re
import time:       369 |        369 |       _weakrefset
import time:       482 |        851 |     weakref
import time:       287 |       1138 |   copy
import time:      1072 |       1072 |       _ast
import time:       595 |        595 |       contextlib
import time:      2148 |       3815 |     ast
import time:       157 |        157 |         _opcode
import time:       202 |        202 |         _opcode_metadata
import time:       294 |        652 |       opcode
import time:       767 |       1418 |     dis
import time:       165 |        165 |     collections.abc
import time:       151 |        151 |       importlib
import time:        76 |        226 |     importlib.machinery
import time:       211 |        211 |         token
import time:        38 |         38 |         _tokenize
import time:       942 |       1190 |       tokenize
import time:       175 |       1365 |     linecache
import time:      2031 |       9017 |   inspect
import time:      1045 |      19725 | dataclasses

After:

import time: self [us] | cumulative | imported package
import time:       517 |        517 |   _io
import time:        87 |         87 |   marshal
import time:       632 |        632 |   posix
import time:      1597 |       2831 | _frozen_importlib_external
import time:       198 |        198 |   time
import time:       466 |        664 | zipimport
import time:        88 |         88 |     _codecs
import time:       758 |        845 |   codecs
import time:       690 |        690 |   encodings.aliases
import time:      1007 |       2542 | encodings
import time:       325 |        325 | encodings.utf_8
import time:       204 |        204 | _signal
import time:        67 |         67 |     _abc
import time:       305 |        371 |   abc
import time:       381 |        752 | io
import time:       252 |        252 |       _stat
import time:       177 |        429 |     stat
import time:      1565 |       1565 |     _collections_abc
import time:       180 |        180 |       genericpath
import time:       286 |        465 |     posixpath
import time:       905 |       3362 |   os
import time:       253 |        253 |   _sitebuiltins
import time:       295 |        295 |   sitecustomize
import time:       148 |        148 |   usercustomize
import time:       693 |       4749 | site
import time:       760 |        760 |     types
import time:       279 |        279 |       _weakrefset
import time:       140 |        140 |       itertools
import time:       726 |       1143 |     weakref
import time:       211 |        211 |     copyreg
import time:       497 |       2609 |   copy
import time:       152 |        152 |   keyword
import time:       230 |        230 |   reprlib
import time:      1193 |       4183 | dataclasses

copy could also be deferred to the inside of asdict and astuple when needed to save another chunk. I don't think the inline import will make much of a difference in performance but I'd like to test that again before proposing it.

If it seems worth it I could make a PR and investigate the effect of deferring copy too.


Edit: I realised that you can use a non-data descriptor to delay the inspect import in this case until __doc__ is accessed rather than on creation. A little more complicated but it makes the performance behaviour more sensible.3

Additional changes for non-data descriptor
diff --git a/Lib/dataclasses.py b/Lib/dataclasses.py
index 5ce90192f5..36fc17a2b5 100644
--- a/Lib/dataclasses.py
+++ b/Lib/dataclasses.py
@@ -233,6 +233,27 @@ def match(self, ann):
 
 _MODULE_IDENTIFIER_RE = _DelayedRegexMatcher(r'^(?:\s*(\w+)\s*\.)?\s*(\w+)')
 
+# Descriptor used to defer `inspect` import until __doc__ is accessed
+class _DocstringMaker:
+    def __get__(self, obj, cls=None):
+        """
+        Create and return a class docstring, replacing this descriptor with the docstring.
+        """
+        import inspect
+
+        if cls is None:
+            cls = type(obj)
+        try:
+            # In some cases fetching a signature is not possible.
+            # But, we surely should not fail in this case.
+            text_sig = str(inspect.signature(cls)).replace(' -> None', '')
+        except (TypeError, ValueError):
+            text_sig = ''
+
+        new_docstring = cls.__name__ + text_sig
+        setattr(cls, "__doc__", new_docstring)
+        return new_docstring
+
 # Atomic immutable types which don't require any recursive handling and for which deepcopy
 # returns the same object. We can provide a fast-path for these types in asdict and astuple.
 _ATOMIC_TYPES = frozenset({
@@ -1150,15 +1171,8 @@ def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen,
 
     if sys.flags.optimize < 2:  # Don't create a docstring in -OO mode
         if not getattr(cls, '__doc__'):
-            # Create a class doc-string.
-            try:
-                # In some cases fetching a signature is not possible.
-                # But, we surely should not fail in this case.
-                import inspect
-                text_sig = str(inspect.signature(cls)).replace(' -> None', '')
-            except (TypeError, ValueError):
-                text_sig = ''
-            cls.__doc__ = (cls.__name__ + text_sig)
+            # Put a docstring maker in place
+            cls.__doc__ = _DocstringMaker()
 
     if match_args:
         # I could probably compute this once

Footnotes

  1. I'd like to be able to remove that but the signature code is more involved than the get_annotations code and I don't know of a nice way to replace it without possibly changing the output.

  2. Perhaps it should already be doing this?

  3. I realised almost immediately after posting this that you could probably just use the same _DocstringMaker instance.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 2, 2023
…110221)

(cherry picked from commit 21a6263)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
AlexWaygood added a commit that referenced this issue Oct 2, 2023
… (#110247)

gh-109653: Fix regression in the import time of `random` in Python 3.12 (GH-110221)
(cherry picked from commit 21a6263)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@danielhollas
Copy link

danielhollas commented Nov 24, 2023

I recently spent a sizeable amount of time trying to speedup startup time of our CLI app, and was delighted to find this issue, thanks so much @AlexWaygood for working on this! 👏

Another chunky stdlib module is logging. It looks like a low-hanging fruit there would be to defer importing traceback, this seems to give around 5ms improvement. Here's how the patch would look on current master:

diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py
index eb7e020d1e..7f7d44e5f5 100644
--- a/Lib/logging/__init__.py
+++ b/Lib/logging/__init__.py
@@ -23,7 +23,7 @@
 To use, simply 'import logging' and log away!
 """
 
-import sys, os, time, io, re, traceback, warnings, weakref, collections.abc
+import sys, os, time, io, re, warnings, weakref, collections.abc
 
 from types import GenericAlias
 from string import Template
@@ -653,6 +653,7 @@ def formatException(self, ei):
         This default implementation just uses
         traceback.print_exception()
         """
+        import traceback
         sio = io.StringIO()
         tb = ei[2]
         # See issues #9427, #1553375. Commented out for now.
@@ -1061,6 +1062,7 @@ def handleError(self, record):
         The record which was being processed is passed in to this method.
         """
         if raiseExceptions and sys.stderr:  # see issue 13807
+            import traceback
             exc = sys.exception()
             try:
                 sys.stderr.write('--- Logging error ---\n')
@@ -1601,6 +1603,7 @@ def findCaller(self, stack_info=False, stacklevel=1):
         co = f.f_code
         sinfo = None
         if stack_info:
+            import traceback
             with io.StringIO() as sio:
                 sio.write("Stack (most recent call last):\n")
                 traceback.print_stack(f, file=sio)

@ofek
Copy link
Sponsor Contributor

ofek commented Dec 9, 2023

I would like to echo the elation of the previous commenter when I encountered this ongoing effort today. As a maintainer of various CLIs and also two personal apps that run on AWS Lambda/Google Cloud Functions, this will help me tremendously!

Thank you very, very much 🙂

@AlexWaygood
Copy link
Member Author

Another chunky stdlib module is logging. It looks like a low-hanging fruit there would be to defer importing traceback, this seems to give around 5ms improvement. Here's how the patch would look on current master:

Sorry for the slow response @danielhollas! That indeed looks like a pretty good idea to me. Feel free to make a PR, and ping me on the PR for review :)

I would like to echo the elation of the previous commenter when I encountered this ongoing effort today

You're very welcome :)

danielhollas added a commit to danielhollas/cpython that referenced this issue Dec 10, 2023
Delayed import of traceback results in 2-5ms speedup.
Issue python#109653
danielhollas added a commit to danielhollas/cpython that referenced this issue Dec 10, 2023
Delayed import of traceback results in ~16% speedup.
Issue python#109653
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 new features, bugs and security fixes performance Performance or resource usage topic-typing type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

12 participants