classification
Title: time.tzname on Python 3.3.0 for Windows is decoded by wrong encoding
Type: behavior Stage:
Components: Extension Modules, Windows Versions: Python 3.5, Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, amaury.forgeotdarc, belopolsky, haypo, jcea, msmhrt, ocean-city
Priority: normal Keywords: 3.3regression

Created on 2012-10-25 11:56 by msmhrt, last changed 2014-07-30 16:50 by BreamoreBoy.

Messages (13)
msg173755 - (view) Author: Masami HIRATA (msmhrt) Date: 2012-10-25 11:56
OS: Windows 7 Starter Edition SP1 (32-bit) Japanese version
Python: 3.3.0 for Windows x86 (python-3.3.0.msi)

time.tzname on Python 3.3.0 for Windows is decoded with the wrong encoding.

C:\Python33>python.exe
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.tzname[0]
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
>>> time.tzname[0].encode('iso-8859-1').decode('mbcs')
'東京 (標準時)'
>>>

'東京 (標準時)' means 'Tokyo (Standard Time)' in Japanese.
time.tzname on Python 3.2.3 for Windows works correctly.

C:\Python32>python.exe
Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.tzname[0]
'東京 (標準時)'
>>>
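
The garbled value above is consistent with the cp932 bytes of the time zone name having been decoded one byte per character, as if they were ISO-8859-1. The following is a minimal sketch of that round trip, not the interpreter's actual code path. It assumes that 'mbcs' maps to 'cp932' on the reporter's machine (as stated later in this thread) and uses 'cp932' directly so that it runs on any platform.

```python
# Sketch: reproduce the mojibake from the report and undo it.
# Assumption: the ANSI code page ("mbcs") on the reporter's system is cp932.
EXPECTED = "東京 (標準時)"

# Bytes as the C runtime would hold them in the ANSI code page.
raw = EXPECTED.encode("cp932")

# Decoding those bytes as ISO-8859-1 maps every byte to one code point,
# which produces the garbled string shown in the 3.3.0 session.
mojibake = raw.decode("iso-8859-1")
print(ascii(mojibake))

# The reporter's workaround: go back to bytes, then decode correctly.
repaired = mojibake.encode("iso-8859-1").decode("cp932")
print(repaired == EXPECTED)
```

The ISO-8859-1 step is lossless in both directions, which is why the workaround in the report recovers the original text.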
msg173758 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-25 13:47
I see in 3.3 PyUnicode_DecodeFSDefaultAndSize() was replaced by PyUnicode_DecodeLocale().

What do sys.getdefaultencoding(), sys.getfilesystemencoding(), and locale.getpreferredencoding() show?
msg173772 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-10-25 17:31
Looking at the CRT source code, tznames should be decoded with mbcs.
See also http://mail.python.org/pipermail/python-3000/2007-August/009290.html
msg173784 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-25 18:06
As I understand it, the OP has a UTF-8 locale.
msg173798 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-25 20:30
>I see in 3.3 PyUnicode_DecodeFSDefaultAndSize() was replaced
> by PyUnicode_DecodeLocale().

Related changes:

 - 8620e6901e58 for the issue #5905
 - 279b0aee0cfb for the issue #13560

I wrote 8620e6901e58 for Linux, when the wcsftime() function is missing.

The problem is changeset 279b0aee0cfb: it introduces a regression on Windows. It looks like PyUnicode_DecodeLocale() and PyUnicode_DecodeFSDefault() use different encodings on Windows.

I suppose that we need to add an #ifdef MS_WINDOWS to use PyUnicode_DecodeFSDefault() on Windows, and PyUnicode_DecodeLocale() on Linux.

See also the issue #10653: time.strftime() uses strftime() (bytes) instead of wcsftime() (unicode) on Windows, because wcsftime() and tzname format the timezone differently.
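
The platform split proposed in this message would be made in C, but the idea can be illustrated in Python. This is only a rough analogue under stated assumptions: the helper names `tzname_encoding` and `decode_tzname` are hypothetical, 'mbcs' stands in for the encoding used by PyUnicode_DecodeFSDefault() on Windows, and `locale.getpreferredencoding(False)` approximates the locale encoding used by PyUnicode_DecodeLocale() elsewhere.

```python
import locale
import sys


def tzname_encoding():
    """Pick the encoding used to decode the C tzname strings (hypothetical helper)."""
    if sys.platform == "win32":
        # Analogue of the "#ifdef MS_WINDOWS" branch: use the ANSI code page.
        return "mbcs"
    # Analogue of the other branch: use the current locale encoding.
    return locale.getpreferredencoding(False)


def decode_tzname(raw):
    """Decode a bytes time zone name with the platform-specific encoding."""
    # errors="replace" keeps the sketch from raising on unexpected bytes.
    return raw.decode(tzname_encoding(), errors="replace")


print(decode_tzname(b"UTC"))
```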
msg173806 - (view) Author: Masami HIRATA (msmhrt) Date: 2012-10-25 22:55
> What do sys.getdefaultencoding(), sys.getfilesystemencoding(), and locale.getpreferredencoding() show?

C:\Python33>python.exe
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'mbcs'
>>> import locale
>>> locale.getpreferredencoding()
'cp932'
>>>

'cp932' is the same as 'mbcs' in the Japanese environment.
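
For anyone reproducing this on another system, the values requested in this thread can be collected in one step. A small sketch; the function name `encoding_report` is made up for illustration, and it also includes the `locale.getpreferredencoding(False)` variant, which skips the temporary setlocale() call.

```python
import locale
import sys


def encoding_report():
    """Return the encodings that are relevant to this issue."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


for name, value in encoding_report().items():
    print(name, "->", value)
```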
msg173824 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-26 07:07
> >>> sys.getfilesystemencoding()
> 'mbcs'
> >>> import locale
> >>> locale.getpreferredencoding()
> 'cp932'
> >>>
>
> 'cp932' is the same as 'mbcs' in the Japanese environment.

And what is the value of locale.getpreferredencoding(False)?
msg173827 - (view) Author: Masami HIRATA (msmhrt) Date: 2012-10-26 09:08
> And what is the value of locale.getpreferredencoding(False)?

>>> import locale
>>> locale.getpreferredencoding(False)
'cp932'
>>>
msg174161 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-29 23:19
See also the issue #836035.
msg174164 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-29 23:59
According to CRT source code:
 - tzset() uses WideCharToMultiByte(lc_cp, 0, tzinfo.StandardName, -1, tzname[0], _TZ_STRINGS_SIZE - 1, NULL, &defused) with lc_cp = ___lc_codepage_func().
 - wcsftime("%z") and wcsftime("%Z") use _mbstowcs_s_l() to decode the time zone name

I tried to call ___lc_codepage_func(): it returns 0. I suppose that it means that mbstowcs() and wcstombs() use the ANSI code page.

Instead of trying to guess what the correct encoding is, it would be simpler (and safer) to read the Unicode version of the tzname array: StandardName and DaylightName of GetTimeZoneInformation().

If anything is changed, time.strftime(), time.strptime(), datetime.datetime.strftime() and time.tzname must be checked (with "%Z" format).
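
Reading the Unicode names directly can be tried from Python with ctypes, without touching the interpreter. This is a sketch rather than the proposed patch: the structure layouts follow the documented Win32 SYSTEMTIME and TIME_ZONE_INFORMATION definitions, the function name `read_windows_tznames` is hypothetical, and on non-Windows platforms it simply returns None.

```python
import ctypes
import sys


class SYSTEMTIME(ctypes.Structure):
    # Eight 16-bit WORD fields.
    _fields_ = [(name, ctypes.c_ushort) for name in (
        "wYear", "wMonth", "wDayOfWeek", "wDay",
        "wHour", "wMinute", "wSecond", "wMilliseconds")]


class TIME_ZONE_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("Bias", ctypes.c_long),
        ("StandardName", ctypes.c_wchar * 32),
        ("StandardDate", SYSTEMTIME),
        ("StandardBias", ctypes.c_long),
        ("DaylightName", ctypes.c_wchar * 32),
        ("DaylightDate", SYSTEMTIME),
        ("DaylightBias", ctypes.c_long),
    ]


TIME_ZONE_ID_INVALID = 0xFFFFFFFF


def read_windows_tznames():
    """Return (StandardName, DaylightName) as str, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_tzinfo = kernel32.GetTimeZoneInformation
    get_tzinfo.argtypes = [ctypes.POINTER(TIME_ZONE_INFORMATION)]
    get_tzinfo.restype = ctypes.c_ulong  # DWORD
    info = TIME_ZONE_INFORMATION()
    if get_tzinfo(ctypes.byref(info)) == TIME_ZONE_ID_INVALID:
        raise ctypes.WinError(ctypes.get_last_error())
    # The WCHAR arrays are already Unicode, so no code page is involved.
    return (info.StandardName, info.DaylightName)


print(read_windows_tznames())
```

Because the names never pass through a byte string, the question of which code page to decode with does not arise on this path.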
msg174165 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-30 00:28
"Instead of trying to guess what the correct encoding is, it would be simpler (and safer) to read the Unicode version of the tzname array: StandardName and DaylightName of GetTimeZoneInformation()."

GetTimeZoneInformation() formats timezone names correctly, but it reintroduces issue #10653: time.strftime("%Z") formats the timezone name differently.

See also issue #13029 which is a duplicate of #10653, but contains useful information.

--

Example on Windows 7 with a French setup configured to Tokyo's timezone.

Using GetTimeZoneInformation(), time.tzname is ("Tokyo", "Tokyo (heure d\u2019\xe9t\xe9)"). U+2019 is the "RIGHT SINGLE QUOTATION MARK". This character is usually replaced with U+0027 (APOSTROPHE) in ASCII.

time.strftime("%Z") gives "Tokyo (heure d'\x81\x66ete)" (if it is implemented using strftime() or wcsftime()).

--

If I understood correctly, Python 3.3 has two issues on Windows:

 * time.tzname is decoded from the wrong encoding
 * time.strftime("%Z") gives an invalid output

The real blocker is a bug in strftime() and wcsftime() in the Windows CRT. A solution is to replace "%Z" with the timezone name before calling strftime() or wcsftime(), that is, to work around the Windows CRT bug.
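
The "%Z" substitution described above can be sketched at the Python level. This only illustrates the idea and is not the fix applied in the C code: `expand_tz` and `strftime_with_tzname` are hypothetical names, and the time zone name is taken from time.tzname unless one is passed in.

```python
import re
import time


def expand_tz(fmt, tzname):
    """Replace each "%Z" directive in fmt with tzname, leaving "%%Z" alone."""
    def repl(match):
        if match.group(0) == "%Z":
            # Escape "%" so strftime() treats the name as literal text.
            return tzname.replace("%", "%%")
        return match.group(0)
    # Matching "%" plus one character consumes "%%" as a unit,
    # so the "Z" in "%%Z" is never mistaken for a directive.
    return re.sub(r"%.", repl, fmt, flags=re.DOTALL)


def strftime_with_tzname(fmt, t=None, tzname=None):
    """Format t, substituting the time zone name before calling strftime()."""
    if t is None:
        t = time.localtime()
    if tzname is None:
        tzname = time.tzname[1 if t.tm_isdst > 0 else 0]
    return time.strftime(expand_tz(fmt, tzname), t)


print(strftime_with_tzname("%Y-%m-%d %Z", time.gmtime(0), tzname="Tokyo"))
```

With this approach the C runtime never sees "%Z", so its handling of the time zone name no longer affects the output.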
msg176408 - (view) Author: Masami HIRATA (msmhrt) Date: 2012-11-26 12:14
Is there any progress on this issue?
msg224325 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-30 16:50
Could somebody respond to the originator please.
History
Date User Action Args
2014-07-30 16:50:45 BreamoreBoy set nosy: + BreamoreBoy
messages: + msg224325
versions: + Python 3.5, - Python 3.3
2012-11-26 12:14:57 msmhrt set messages: + msg176408
2012-10-30 10:03:40 serhiy.storchaka set nosy: - serhiy.storchaka
2012-10-30 00:28:26 haypo set nosy: + ocean-city
messages: + msg174165
2012-10-29 23:59:23 haypo set messages: + msg174164
2012-10-29 23:19:11 haypo set messages: + msg174161
2012-10-28 03:55:38 jcea set nosy: + jcea
2012-10-26 09:08:31 msmhrt set messages: + msg173827
2012-10-26 07:07:48 haypo set messages: + msg173824
2012-10-25 22:55:46 msmhrt set messages: + msg173806
2012-10-25 20:30:20 haypo set messages: + msg173798
2012-10-25 19:12:46 r.david.murray set nosy: + belopolsky, haypo
2012-10-25 18:06:07 serhiy.storchaka set messages: + msg173784
2012-10-25 17:31:57 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
messages: + msg173772
2012-10-25 13:47:12 serhiy.storchaka set versions: + Python 3.4
nosy: + serhiy.storchaka
messages: + msg173758
components: + Extension Modules
keywords: + 3.3regression
2012-10-25 11:56:50 msmhrt create