[Python-Dev] Impact of Namedtuple on startup time
Serhiy Storchaka
storchaka at gmail.com
Tue Jul 18 01:20:27 EDT 2017
17.07.17 15:43, Antoine Pitrou wrote:
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time. Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them). An issue was created for this:
> https://bugs.python.org/issue28638
>
> Raymond decided to close the issue because:
>
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is). Raymond claims the
> "_source" attribute is an essential feature of namedtuples.
>
> 2) optimizing startup cost is supposedly not worth the effort.
Implementations of namedtuple that don't use compilation were written
multiple times by different developers (including me) before
issue28638. I provided my patch in issue28638 as an example, but I
understand Raymond's arguments, and they look weighty to me. I don't
know how much the _source attribute is used, but it is part of the
public API.
The drawback of these implementations is slower __new__ and __repr__
methods. This can be avoided by using compilation to create __new__
only, but that makes creating a namedtuple class slower (though still
faster than compiling the full namedtuple class). The drawback of
generating _source without using it to create the namedtuple class is
that it complicates the code, and the two implementations could
quickly get out of sync in the future.
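
To make the trade-off concrete, here is a minimal sketch of a
namedtuple-like factory that does not use compilation. It is not the
patch from issue28638; the name simple_namedtuple and all details are
hypothetical. The generic *args/**kwargs handling in __new__ is the
kind of code that makes instance creation slower than a compiled
function with an explicit signature.

```python
from operator import itemgetter


def simple_namedtuple(typename, field_names):
    """Hypothetical sketch: build a tuple subclass with named fields
    without calling exec() on generated source."""
    fields = tuple(field_names.split())
    num_fields = len(fields)

    def __new__(cls, *args, **kwargs):
        # Generic argument handling done at call time. A compiled
        # __new__ with an explicit signature avoids this work.
        if len(args) > num_fields:
            raise TypeError(
                f"expected at most {num_fields} arguments, got {len(args)}"
            )
        values = list(args)
        for name in fields[len(args):]:
            try:
                values.append(kwargs.pop(name))
            except KeyError:
                raise TypeError(f"missing required argument: {name!r}") from None
        if kwargs:
            raise TypeError(f"unexpected arguments: {sorted(kwargs)}")
        return tuple.__new__(cls, values)

    def __repr__(self):
        # Built by looping over the fields instead of a precomputed
        # format string, which is also slower.
        body = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{typename}({body})"

    namespace = {
        "__slots__": (),
        "_fields": fields,
        "__new__": __new__,
        "__repr__": __repr__,
    }
    for index, name in enumerate(fields):
        # Each field is a read-only property indexing into the tuple.
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)


Point = simple_namedtuple("Point", "x y")
p = Point(1, y=2)
```

Creating the class this way involves no call to exec(), so there is no
compilation cost, but there is also no source text that could serve as
_source.
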
I think the right solution to this issue is to generalize the import
machinery so that it can cache not just files, but arbitrary chunks
of code. We already use precompiled bytecode files for exactly the
same goal: speeding up startup by avoiding compilation. This solution
could be used for caching other generated code, not just namedtuples.
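
As a rough illustration of the idea (not an existing import-machinery
API), the sketch below caches the code object compiled from a chunk of
generated source on disk, keyed by a hash of the source, in the same
spirit as precompiled bytecode files. The function name cached_compile,
the cache directory layout, and the file naming are all assumptions
made for this example.

```python
import hashlib
import marshal
import os
import sys


def cached_compile(source, cache_dir, name="<generated>"):
    """Hypothetical sketch: return a code object for `source`,
    reusing a marshalled copy from `cache_dir` when one exists."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # Marshalled code is specific to the interpreter version, so the
    # cache tag (e.g. "cpython-36") is made part of the file name.
    tag = sys.implementation.cache_tag or "unknown"
    path = os.path.join(cache_dir, f"{key}.{tag}.bin")
    try:
        with open(path, "rb") as f:
            return marshal.loads(f.read())
    except (OSError, EOFError, ValueError, TypeError):
        pass  # No usable cache entry; fall back to compiling.
    code = compile(source, name, "exec")
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(marshal.dumps(code))
    return code
```

The second and later calls with the same source skip compile() and
only pay for reading and unmarshalling the cached bytes. The returned
code object is then run with exec() in a namespace, just as the
generated namedtuple source is today.
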