[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Fri Sep 4 14:05:13 GMT 2020

On 2020-09-04 06:44, Corinna Vinschen wrote:
> Hi Takashi,
> On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>> On Thu, 3 Sep 2020 19:59:12 +0200
>> Corinna Vinschen wrote:
>>> The only idea I had so far was, changing the way __set_charset_from_locale
>>> works from within _setlocale_r:
>>> We could add a Cygwin-specific function only fetching the codepage and
>>> call it unconditionally from _setlocale_r.  __set_charset_from_locale is
>>> called with a new parameter "codepage", so it doesn't have to fetch the
>>> CP by itself, but it's still only called from _setlocale_r if necessary.
>>> Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
>>> could be done at the point the codepage is actually required.
>> I think I have found the answer to your request.
>> Patch attached. What do you think of this patch?
>> Calling initial_setlocale() is necessary because
>> nl_langinfo() always returns "ANSI_X3.4-1968"
>> regardless locale setting if this is not called.
> Well, this is correct.  Per POSIX, a standard-conformant application is
> running in the "C" locale unless it calls setlocale() explicitely.
> That's one reason Cygwin defaults to UTF-8 internally.  Everything,
> including the terminal, is supposed to default to UTF-8.  That's the
> most sane default, even if an application is not locale-aware.
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the apps default locale to "C".  That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions.  What it *doesn't* do yet at the moment is
> to store the charset name itself or, better, the equivalent codepage.
> If we change that, setup_locale can simply go away.  Below is a patch
> doing just that.  Can you please check if that works in your test
> scenarios?
> However, there's something which worries me.  Why do we need or set the
> pseudo terminal codepage at all?  I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes.  Question is this: Why not just converting the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console?  That would simplify
> this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> the console, we simply don't need to know or care for the console CP.

IIRC his locale was ja_JP.UTF-8 but he got English messages on CP 932!

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]

More information about the Cygwin-patches mailing list