classification
Title: Batch-mode input() limited to 4095 characters on *NIX
Type: behavior Stage: test needed
Components: Interpreter Core Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Romuald, eryksun, gstarck
Priority: normal Keywords:

Created on 2021-10-18 12:32 by Romuald, last changed 2021-10-25 11:45 by eryksun.

Messages (5)
msg404179 - Author: Romuald Brunet (Romuald) * Date: 2021-10-18 12:32
When run in non-interactive mode with a TTY stdin, input() will not read more than 4095 characters.

Simple example:

>>> foo = input()  # paste a 5000 character pasteboard (one line)
>>> print(len(foo))
4095

Note that this does **not** happen in an interactive shell (running the same script with python -i does not hit the 4095 limit).

I've traced the issue (I think) to the fgets function called from my_fgets in myreadline.c, but I have no clue as to how one might fix this.
msg404910 - Author: Grégory Starck (gstarck) * Date: 2021-10-23 20:56
Reproduced, and I also see it in my_fgets.

But it's strange: it's fgets that seems to return/insert a \n after 4096 chars read from stdin :O I don't quite get it. At all, haha.

I just straced it, and we see 4 reads of 1024 bytes/chars.

With strace -s 1025 I can see the last one:

read(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n", 1024) = 1024

So is it a problem in the OS's copy/paste or something?
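For what it's worth, the numbers reported so far are consistent with each other. A minimal sketch, assuming (as the strace output suggests) that the first three reads are full 1024-byte chunks with no newline and only the fourth ends in "\n"; the byte strings are stand-ins, not captured data:

```python
# Stand-in for the four read(0, ..., 1024) calls seen under strace.
# Assumption: only the last chunk ends with a newline.
CHUNK = 1024
reads = [b"A" * CHUNK] * 3 + [b"A" * (CHUNK - 1) + b"\n"]

raw = b"".join(reads)       # everything the process received on stdin
line = raw.rstrip(b"\n")    # input() drops the trailing newline

print(len(raw))             # total bytes delivered by the four reads
print(len(line))            # what len(foo) would report
```

So 4 x 1024 = 4096 bytes reach the process, and one of them is the newline, which matches the 4095 characters seen in the original report.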
msg404956 - Author: Romuald Brunet (Romuald) * Date: 2021-10-25 09:39
This does not seem to be a copy/paste issue.

I've re-tested using xdotool to "manually" type 5000 characters into an X terminal (gnome-terminal and xterm, to be sure) and got the same result.

I also see 4 read(0, "...") calls, with the last one ending with a "\n", which is very strange behavior.


I tried to test the same thing on macOS, but the input() / terminal would not let me insert more than 1024 characters.
msg404958 - Author: Grégory Starck (gstarck) * Date: 2021-10-25 10:04
> This does not seem to be a copy/paste issue.

Well, IMO it's not a problem in my_fgets()/fgets either. What the process reads on its stdin is already corrupted/broken.

But I'm interested in knowing more about the issue/original cause.
msg404960 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-10-25 11:45
> But I'm interested in knowing more about the issue/original cause.

When the readline module is imported in interactive mode, the PyOS_ReadlineFunctionPointer function pointer is set to call_readline(), which uses GNU Readline. Otherwise PyOS_Readline() calls PyOS_StdioReadline(), which calls my_fgets() and thus C standard I/O fgets().

In Linux, when the terminal is in canonical mode, a low-level read() is buffered with a limit of 4096 bytes. If the limit is exceeded, the call returns just the first 4095 bytes plus a trailing newline. You can verify this with os.read(0, 5000). GNU Readline disables canonical mode and works with the terminal at a lower level. I am far from an expert with Unix terminals, but here are the basics of something that allows input() to read more than 4096 characters without having to import the readline module.

    import sys
    import termios

    LFLAG = 3  # index of the local-mode flags in the tcgetattr() list
    fd = sys.stdin.fileno()

    # Take the terminal out of canonical mode so a line is not capped at 4096 bytes.
    settings = termios.tcgetattr(fd)
    settings[LFLAG] &= ~termios.ICANON
    termios.tcsetattr(fd, termios.TCSANOW, settings)

    try:
        s = input()
    finally:
        # Put the terminal back into canonical mode.
        settings[LFLAG] |= termios.ICANON
        termios.tcsetattr(fd, termios.TCSANOW, settings)

    print(len(s))
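
The same idea can be packaged so that the terminal is restored to whatever state it was in before, rather than unconditionally re-enabling ICANON. This is only a sketch under that assumption; the names without_icanon, noncanonical and read_long_line are made up for illustration and are not part of CPython:

```python
import contextlib
import copy
import sys
import termios

LFLAG = 3  # index of the local-mode flags in the tcgetattr() list


def without_icanon(attrs):
    """Return a copy of a tcgetattr()-style list with ICANON cleared."""
    new_attrs = copy.deepcopy(attrs)
    new_attrs[LFLAG] &= ~termios.ICANON
    return new_attrs


@contextlib.contextmanager
def noncanonical(fd):
    """Temporarily take the terminal on fd out of canonical mode."""
    saved = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, without_icanon(saved))
    try:
        yield
    finally:
        # Restore exactly the settings that were in effect before.
        termios.tcsetattr(fd, termios.TCSANOW, saved)


def read_long_line(prompt=""):
    """Hypothetical helper: input() with canonical mode switched off."""
    with noncanonical(sys.stdin.fileno()):
        return input(prompt)
```

Calling read_long_line() only makes sense when stdin is a terminal (tcgetattr() raises termios.error otherwise). Keep in mind that with canonical mode off the terminal driver no longer does its own line editing, so keys such as backspace are not processed the way they are in canonical mode.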
History
Date User Action Args
2021-10-25 11:45:49 eryksun set nosy: + eryksun
messages: + msg404960
2021-10-25 10:04:59 gstarck set messages: + msg404958
2021-10-25 09:39:53 Romuald set messages: + msg404956
2021-10-23 20:56:32 gstarck set nosy: + gstarck
messages: + msg404910
versions: + Python 3.8, Python 3.9, Python 3.10
2021-10-23 05:13:00 terry.reedy set title: input() method limited to 4095 characters on *NIX -> Batch-mode input() limited to 4095 characters on *NIX
stage: test needed
versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8, Python 3.9, Python 3.10
2021-10-18 12:32:04 Romuald create