[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Brian Inglis Brian.Inglis@SystematicSw.ab.ca
Tue Sep 8 04:52:59 GMT 2020

On 2020-09-07 15:08, Johannes Schindelin wrote:
> On Mon, 7 Sep 2020, Takashi Yano via Cygwin-patches wrote:
>> On Mon, 7 Sep 2020 10:26:33 +0200
>> Corinna Vinschen wrote:
>>> Hi Takashi,
>>> On Sep  5 17:43, Takashi Yano via Cygwin-patches wrote:
>>>> On Fri, 4 Sep 2020 21:22:35 +0200
>>>> Corinna Vinschen wrote:
>>>>> Btw., the main loop in
>>>>> fhandler_pty_master::pty_master_fwd_thread() calls
>>>>>   char *buf = convert_mb_str (cygheap->locale.term_code_page,
>>>>>                               &nlen, CP_UTF8, ptr, wlen);
>>>>>                                      ^^^^^^^
>>>>>   [...]
>>>>>   WriteFile (to_master_cyg, ...
>>>>> But then, after the code breaks from that loop, it calls
>>>>>   char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
>>>>>                               GetConsoleOutputCP (), ptr, wlen);
>>>>>                               ^^^^^^^^^^^^^^^^^^^^^
>>>>>   [...]
>>>>>   process_opost_output (to_master_cyg, ...
>>>>> process_opost_output then calls WriteFile on that to_master_cyg handle,
>>>>> just like the WriteFile call above.
>>>>> Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
>>>>> well?
>>>> That is correct. The first conversion is for the case where the pseudo
>>>> console is enabled, and the second one is for the case where the pseudo
>>>> console is disabled.
>>>> The pseudo console converts the charset from the console code page to
>>>> UTF-8. Therefore, data read from from_slave is always UTF-8 when the
>>>> pseudo console is enabled. Moreover, OPOST processing is done in the
>>>> pseudo console, so simply writing the data with WriteFile() is enough.
>>>> If the pseudo console is disabled, cmd.exe and similar programs use the
>>>> console code page, so the code page of data read from from_slave is
>>>> GetConsoleOutputCP(). In this case, OPOST processing is necessary.
>>> This is really confusing me.  We never set the console codepage in the
>>> old pty code before, it was just pipes transmitting bytes.  Why do we
>>> suddenly have to handle native apps running in a console in this case?!?
>> This is actually not related to the pseudo console. In a Japanese
>> environment, cmd.exe outputs CP932 strings by default. This caused garbled
>> output in older Cygwin versions such as 3.0.7. The code for the case where
>> the pseudo console is disabled is there to fix this.
> It is related to Pseudo Console insofar as it was slipped in as part of
> the Pseudo Console patches.
> And what Takashi reports as a bug fix is the underlying reason for the
> tickets in MSYS2 (and elsewhere) that I mentioned.
> In fact, I even suggested in
> https://github.com/msys2/MSYS2-packages/issues/1974#issuecomment-685475967
> to revert that change.
> What Takashi describes as "correct behavior" unfortunately seems not to be
> very common in practice, which is why I contend that from the users' point
> of view, it could not matter less whether the console applications are
> "correct" or not. From the point of view of users who have their `LANG`
> set to something like `en_US.UTF-8`, the encoding was correct before, and
> now it is no longer correct. And _that_ is the correctness users actually
> care about.

But also for users running locales and localization using non-Latin scripts, it
is important that messages be generated in languages they understand and output
in characters they can read.
For some years now (at least since the EU was formed in 1993), supporting only
en_US.ASCII has been inadequate and erroneous.
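
For anyone trying to follow the two convert_mb_str() calls quoted above, here is
a rough sketch of the distinction Takashi describes. It is not the Cygwin code:
ForwardPlan, plan_forwarding() and kCpUtf8 are hypothetical names invented for
illustration, and the only fact taken from Windows is that CP_UTF8 has the
value 65001. The console output code page is passed in as a parameter as a
stand-in for GetConsoleOutputCP().

```cpp
// Hypothetical illustration only; none of these names exist in Cygwin.
constexpr unsigned kCpUtf8 = 65001;  // numeric value of CP_UTF8 on Windows

struct ForwardPlan {
  unsigned source_code_page;  // encoding assumed for bytes read from from_slave
  bool needs_opost;           // whether the pty master must still apply OPOST
};

// Decide how the forwarding thread should treat output from a native app.
// console_output_cp stands in for the result of GetConsoleOutputCP().
ForwardPlan plan_forwarding(bool pseudo_console_enabled,
                            unsigned console_output_cp) {
  if (pseudo_console_enabled) {
    // Per the explanation above: the pseudo console has already converted
    // the data to UTF-8 and done OPOST processing, so a plain write suffices.
    return {kCpUtf8, false};
  }
  // Without the pseudo console, the app writes in the console code page
  // (e.g. 932 in a Japanese environment) and OPOST is still pending.
  return {console_output_cp, true};
}
```

With these assumptions, plan_forwarding(false, 932) yields code page 932 with
OPOST pending, while plan_forwarding(true, 932) yields 65001 with no further
OPOST step. In both cases the data is then converted to the terminal code page
before being written to to_master_cyg.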

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
