[Python-Dev] Impact of Namedtuple on startup time

Tue Jul 18 01:20:27 EDT 2017

17.07.17 15:43, Antoine Pitrou wrote:
> Cost of creating a namedtuple has been identified as a contributor to
> Python startup time.  Not only Python core and the stdlib, but any
> third-party library creating namedtuple classes (there are many of
> them).  An issue was created for this:
> https://bugs.python.org/issue28638
> 
> Raymond decided to close the issue because:
> 
> 1) the proposed resolution makes the "_source" attribute empty (or, at
> least, something else than it currently is).  Raymond claims the
> "_source" attribute is an essential feature of namedtuples.
> 
> 2) optimizing startup cost is supposedly not worth the effort.

Implementations of namedtuple that don't use compilation were written
multiple times by different developers (including me) before
issue28638. I provided my patch in issue28638 as an example, but I
understand Raymond's arguments, and they look weighty to me. I don't
know how much the _source attribute is used, but it is part of the
public API.

The drawback of these implementations is slower __new__ and __repr__
methods. This can be avoided by using compilation to create __new__,
but that makes creating a namedtuple class slower (though still
faster than compiling the full namedtuple class). The drawback of
generating _source without using it to create the namedtuple class is
that it complicates the code, and the two implementations could
quickly get out of sync in the future.
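
A minimal sketch of the middle option described above, for illustration
only. It assumes a hypothetical factory called make_record; it is neither
the stdlib implementation nor the patch from issue28638. Only a tiny
__new__ is compiled with exec(); the class, the field properties and
__repr__ are built without compilation, and no _source is produced.

```python
from operator import itemgetter


def make_record(typename, field_names):
    """Hypothetical namedtuple-like factory (illustration only).

    Assumes at least one field and does not validate the field names.
    """
    fields = tuple(field_names)
    arg_list = ", ".join(fields)

    # Compile only __new__, so that instance creation stays fast.
    namespace = {"_tuple_new": tuple.__new__}
    exec(
        f"def __new__(_cls, {arg_list}):\n"
        f"    return _tuple_new(_cls, ({arg_list},))\n",
        namespace,
    )

    # Generic __repr__ written once, without compilation. It loops over
    # the fields on every call instead of using a precomputed format.
    def __repr__(self):
        body = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{typename}({body})"

    class_dict = {
        "__slots__": (),
        "_fields": fields,
        "__new__": namespace["__new__"],
        "__repr__": __repr__,
    }
    # One read-only property per field, indexing into the tuple.
    for index, name in enumerate(fields):
        class_dict[name] = property(itemgetter(index))

    return type(typename, (tuple,), class_dict)


Point = make_record("Point", ["x", "y"])
p = Point(1, y=2)
print(p.x, p.y, p)  # 1 2 Point(x=1, y=2)
```

The exec() call here compiles two lines rather than a whole class
template, which is where the saving in class creation time would come
from under these assumptions.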

I think the right solution to this issue is to generalize the import
machinery and allow it to cache not just files, but arbitrary chunks
of code. We already use precompiled bytecode files for exactly the same
goal -- to speed up startup by avoiding compilation. This solution could
be used for caching other generated code, not just namedtuples.
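
A rough sketch of what caching an arbitrary chunk of generated code
could look like, only to make the idea concrete. The helper name
cached_compile and the cache location are hypothetical, and this is not
an existing import machinery API. It stores the marshalled code object
in a file keyed by a hash of the source, much as .pyc files do for
modules.

```python
import hashlib
import importlib.util
import marshal
import os
import tempfile

# Hypothetical cache location, chosen only for this sketch.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "generated_code_cache")


def cached_compile(source, name="<generated>"):
    """Return a code object for `source`, reusing a cached copy if present."""
    # The key covers the source, the embedded file name, and the bytecode
    # magic number, since marshalled code is specific to the Python version.
    magic = importlib.util.MAGIC_NUMBER.hex()
    key = hashlib.sha256(f"{magic}\0{name}\0{source}".encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".bin")

    try:
        with open(path, "rb") as f:
            return marshal.loads(f.read())
    except (OSError, ValueError, EOFError, TypeError):
        pass  # Missing or unreadable cache entry: fall back to compiling.

    code = compile(source, name, "exec")
    try:
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(marshal.dumps(code))
    except OSError:
        pass  # The cache is best effort; a failed write is not an error.
    return code


# The first call compiles and stores; later calls load the stored code.
namespace = {}
exec(cached_compile("def double(x):\n    return 2 * x\n"), namespace)
print(namespace["double"](21))  # 42
```

A real implementation would also need atomic writes and some
invalidation policy; those are left out here.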