bpo-39287: Doc: Add UTF-8 mode section in using/windows. #17935
.. versionadded:: 3.7

Windows doesn't use UTF-8 for the system encoding (the ANSI Code Page).
eryksun (Contributor), Jan 10, 2020
Windows 10 supports setting the system locale's ANSI and OEM codepages to UTF-8 (65001), but it's not enabled by default.
There are still problems to be resolved with using UTF-8 at the system level. In particular, the console host (conhost.exe) doesn't support using UTF-8 as the input codepage for use with ReadFile and ReadConsoleA. It encodes the UTF-16 input buffer with an internal WideCharToMultiByte call that assumes one byte per encoded character (at least in a Western locale, for which a single-byte encoding is assumed). This fails for non-ASCII characters, which in turn end up as null bytes in the result of a ReadFile or ReadConsoleA call. Python is immune to this problem for the most part. The I/O stack detects a console file and uses wide-character ReadConsoleW instead, via io._WindowsConsoleIO. The problem affects low-level os.write and os.read, however, because they're not integrated with _WindowsConsoleIO.
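To see which of these settings are in effect on a given machine, something like the following minimal sketch may help. The helper name `describe_encodings` is hypothetical and not part of the PR. On Windows it queries `GetACP`, `GetConsoleCP`, and `GetConsoleOutputCP` from kernel32 through `ctypes`; on other platforms it reports only the Python-level encodings.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python and (on Windows) the system are using.

    Hypothetical helper for illustration only.
    """
    info = {
        # True when UTF-8 mode is on (python -X utf8 or PYTHONUTF8=1), 3.7+.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Default encoding for open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # sys.stdin can be None (for example under pythonw.exe).
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
    }
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        # 65001 here means the system ANSI code page was switched to UTF-8.
        info["ansi_codepage"] = kernel32.GetACP()
        # Both console calls return 0 when no console is attached.
        info["console_input_codepage"] = kernel32.GetConsoleCP()
        info["console_output_codepage"] = kernel32.GetConsoleOutputCP()
    return info


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

The code pages reported here are the ones that byte-oriented console reads such as `os.read` go through; text read through `sys.stdin` on a console takes the wide-character path described above.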
methane commented Jan 10, 2020 (edited by bedevere-bot)
https://bugs.python.org/issue39287