[Python-Dev] pymalloc and overallocation
(unicodeobject.c,2.139,2.140 checkin)
Tim Peters
tim.one@comcast.net
Fri, 26 Apr 2002 16:59:23 -0400
[Tim]
>> But Marc-Andre uses realloc at the end to return the excess. The
>> excess bytes will get reused (and some returned yet again) by the
>> next overallocation, and so on.
[Martin]
> Right. I confused this with the fact that PyMem_Realloc won't return
> the excess memory,
PyMem_Realloc does whatever the system realloc does -- PyMem_Realloc doesn't
go thru pymalloc today (except in a PYMALLOC_DEBUG build). Doesn't matter,
though, since strings use the PyObject_{Malloc, Free, Realloc} family today,
and that does use pymalloc. OTOH, there's no reason PyObject_Realloc *has*
to hang on to all small-block memory on a shrinking realloc, and there's no
reason pymalloc couldn't grow another realloc entry point specifying what
the caller wants a shrinking realloc to do. These things are all easy to
change, but I don't know what's truly desirable.
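To make the "what should a shrinking realloc do" question concrete, here is a minimal sketch of one possible policy. It is not what pymalloc does; the names `size_class` and `keep_block_on_shrink`, the 8-byte size-class granularity, and the 3/4 cutoff are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size-class granularity, for illustration only. */
#define ALIGNMENT 8

/* Hypothetical helper: round a request up to its size class. */
size_t
size_class(size_t nbytes)
{
    assert(nbytes > 0);
    return (nbytes + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Hypothetical policy for a shrinking realloc of a small block.
   Returns 1 to keep the block in place (the caller keeps the slack),
   0 to copy into a smaller block and free the old one. */
int
keep_block_on_shrink(size_t old_nbytes, size_t new_nbytes)
{
    assert(new_nbytes <= old_nbytes);
    /* Keep the block only if the new size still uses more than 3/4
       of the old size class; the 3/4 cutoff is an arbitrary choice. */
    return 4 * new_nbytes > 3 * size_class(old_nbytes);
}
```

Under these assumptions a 160-byte block shrunk to 150 bytes stays put, while one shrunk to 40 bytes is moved. A separate realloc entry point, as suggested above, would let the caller choose between this kind of policy and "always keep".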
Note another subtlety: I expect you brought up PyMem_Realloc because
unicodeobject.c uses the PyMem_XYZ family for managing the
PyUnicodeObject.str member today. That means it normally never uses
pymalloc at all, except to allocate fixed-size PyUnicodeObject structs
(which use the PyObject_XYZ memory family). I don't know whether that's the
best idea, but that's how it is today.
pymalloc gets into this because PyUnicode_EncodeUTF8 returns a plain string
object, and the latter uses pymalloc today.
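The overallocate-then-shrink pattern under discussion can be sketched in plain C. This is not the code in unicodeobject.c: `encode_utf8_overalloc` is a hypothetical helper, it uses the system `malloc`/`realloc` rather than the PyObject_XYZ family, and it assumes 16-bit code units with no surrogate-pair handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not CPython code: encode n 16-bit code units as
   UTF-8.  Surrogate pairs are not combined; each unit is encoded as a
   BMP code point, so at most 3 bytes are written per unit.
   Overflow of 4 * n is not checked in this sketch. */
char *
encode_utf8_overalloc(const uint16_t *s, size_t n, size_t *out_len)
{
    /* Overallocate by the factor of 4 discussed in the thread. */
    size_t cap = n ? 4 * n : 1;
    unsigned char *buf = malloc(cap);
    unsigned char *p = buf;
    unsigned char *shrunk;
    size_t i;

    if (buf == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        uint16_t ch = s[i];
        if (ch < 0x80) {
            *p++ = (unsigned char)ch;
        }
        else if (ch < 0x800) {
            *p++ = (unsigned char)(0xC0 | (ch >> 6));
            *p++ = (unsigned char)(0x80 | (ch & 0x3F));
        }
        else {
            *p++ = (unsigned char)(0xE0 | (ch >> 12));
            *p++ = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    *out_len = (size_t)(p - buf);
    assert(*out_len <= cap);

    /* Shrinking realloc: offer the excess back to the allocator.
       Whether the excess becomes reusable depends on the allocator. */
    shrunk = realloc(buf, *out_len ? *out_len : 1);
    if (shrunk != NULL)
        buf = shrunk;
    return (char *)buf;
}
```

The caller owns the returned buffer and releases it with `free`. The final `realloc` is the step whose behavior is at issue: an allocator that keeps small blocks in place on a shrink leaves the slack attached to the result for its whole lifetime.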
> so the extra bytes in a small string will be wasted for the lifetime
> of the string object - that still could cause significant memory
> wastage.
It could. Python generally aims to optimize the expected case, not jump
thru hoops to avoid worst cases (else we wouldn't use dicts at all <wink>).
But I don't know what the expected case is here, and given how often I use
Unicode in my own work it could be I'll never have a clue. Note that the
expected uses of Unicode strings make no difference to
PyUnicode_EncodeUTF8: what counts there is the expected lifetimes and sizes
of the "plain" utf8-encoded PyStringObjects it computes. Indeed, pymalloc
has almost no implications for Unicode beyond the encode-as-a-plain-string
functions (unless unicodeobject.c is changed to manage the
PyUnicodeObject.str member using pymalloc too, as plain strings do today).
>> MAL, you should keep in mind that pymalloc is also managing the
>> small chunks in your scheme: when you're fiddling with a 40-character
>> Unicode string, an overallocation "by a factor of 4" only amounts to
>> an 80-character UTF8 string.
> [I guess this is a terminology, not a math problem:
Nope! Turns out it was a hallucination problem <wink>.
> a 40 character Unicode string has already 80 bytes; the UTF-8 of
> it can have up to 160 bytes].
You're right, of course. The conclusion doesn't change, though: that's
still in the range of blocks pymalloc handles (and will remain so unless I
reduce pymalloc's small-object threshold below what's needed for pymalloc to
handle small dicts on its own -- which I'm unlikely to do).
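The arithmetic can be checked with a small sketch. The 256-byte threshold below is an assumption made for illustration (the real value is set in Objects/obmalloc.c and may differ), the helper names are hypothetical, and string-object header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration only; the real small-object threshold
   lives in Objects/obmalloc.c and may differ. */
#define ASSUMED_SMALL_THRESHOLD 256

/* Hypothetical helper: worst-case UTF-8 buffer size for n_chars
   characters when overallocating 4 bytes per character. */
size_t
worst_case_utf8_bytes(size_t n_chars)
{
    assert(n_chars <= (size_t)-1 / 4);  /* guard against overflow */
    return 4 * n_chars;
}

/* Hypothetical helper: nonzero if a request of nbytes falls in the
   assumed small-block range. */
int
fits_small_block(size_t nbytes)
{
    return nbytes > 0 && nbytes <= ASSUMED_SMALL_THRESHOLD;
}
```

With these assumptions, `worst_case_utf8_bytes(40)` is 160, which `fits_small_block` accepts; the overallocated buffer stops fitting once the string exceeds 64 characters.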