[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Thu Sep 10 12:34:03 GMT 2020


On Thu, 10 Sep 2020 09:15:00 +0900
Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> Hi Corinna,
> 
> On Wed, 9 Sep 2020 09:21:23 +0200
> Corinna Vinschen wrote:
> > On Sep  8 17:16, Takashi Yano via Cygwin-patches wrote:
> > > On Mon, 7 Sep 2020 23:17:36 +0200 (CEST)
> > > Johannes Schindelin wrote:
> > > > Hi Takashi,
> > > > 
> > > > On Sat, 5 Sep 2020, Takashi Yano wrote:
> > > > 
> > > > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST)
> > > > > Johannes Schindelin wrote:
> > > > > >
> > > > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote:
> > > > > >
> > > > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
> > > > > > > Johannes Schindelin wrote:
> > > > > > > [...]
> > > > `LANG=en_US.UTF-8` (meaning your patches force their console applications'
> > > > output to be interpreted with code page 437) and therefore for those
> > > > users, things looked fine before, and now they don't.
> > > > 
> > > > Note that I am not talking about developers who develop said console
> > > > applications. I am talking about users who use those console applications.
> > > > In other words, I am talking about a vastly larger group of affected
> > > > people.
> > > > 
> > > > All of those people (or at least a substantial majority) will now have to
> > > > be told to please disable Pseudo Console support in v3.2.0 because they
> > > > would have to patch and rebuild those console applications that don't call
> > > > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of
> > > > the majority of users. Not even the `cmd /c chcp 65001` work-around (that
> > > > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is
> > > > enabled.
> > > 
> > > In the case where pseudo console is disabled, the patch I proposed in
> > > https://cygwin.com/pipermail/cygwin-patches/2020q3/010548.html
> > > will solve the issue. I mean apps which work correctly in cygwin 3.0.7
> > > work in cygwin 3.2.0 as well with that patch.
> > > 
> > > OTOH, in the case where pseudo console is enabled, non-cygwin apps which
> > > work correctly in command prompt, work in cygwin 3.2.0 as well.
> > > 
> > > It is enough for all, isn't it?
> > > 
> > > You may think that all non-cygwin apps which work in cygwin 3.0.7 must
> > > work in cygwin 3.2.0 even when pseudo console is enabled, but it is
> > > against the purpose of the pseudo console support. The goal of pseudo
> > > console support is to make non-cygwin apps work as if it is executed in
> > > command prompt.
> > > 
> > > By the way, you complained regarding garbled output of the program which
> > > outputs UTF-8 string regardless of locale.
> > > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
> > > However, many Japanese programmers, for example, make programs which
> > > output SJIS (CP932) string regardless of locale.
> > > 
> > > Why do you think the former must work while the latter deos not have to?
> > > Is there any reasonable reason other than backward compatibility?
> > 
> > Is that still a concern with the latest from master?  There's
> > a snapshot for testing, Johannes.
> 
> We are still discussing about that.
> https://github.com/msys2/MSYS2-packages/issues/1974
> 
> > Takashi, does the patch from
> > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
> > still apply to the latest from master?  Question is, shouldn't the
> > Windows calls setting the codepage be only called if started from
> > child_info_spawn::worker for non-Cygwin executables?
> 
> I'd propose the patch:
> 
> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> index 37d033bbe..95b28c3da 100644
> --- a/winsup/cygwin/fhandler_tty.cc
> +++ b/winsup/cygwin/fhandler_tty.cc
> @@ -1830,7 +1830,11 @@ fhandler_pty_slave::setup_locale (void)
>    extern UINT __eval_codepage_from_internal_charset ();
> 
>    if (!get_ttyp ()->term_code_page)
> -    get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset ();
> +    {
> +      get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset ();
> +      SetConsoleCP (get_ttyp ()->term_code_page);
> +      SetConsoleOutputCP (get_ttyp ()->term_code_page);
> +    }
>  }
> 
>  void
> 
> However, Johannes insists setting codepage for non-cygwin apps
> even when pseudo console is enabled, which I cannot agree.
> 
> Actually, I hesitate even the patch above, however, it seems to
> be necessary for msys apps in terms of backward compatibility.

I found that output of Oracle java.exe and javac.exe are garbled
if the patch above is applied. This is because java.exe and javac.exe
output SJIS code unconditionally in my environment.

OTOH, rust-based program such as cargo and ripgrep output UTF-8
unconditionally. node.js also seems to output UTF-8 string by
default.

I think there is no way for both apps to work properly if pseudo
console is disabled. As far as I tested, both of them works when
pseudo console is enabled. IMHO, the best way to achieve maximum
compatibility, is enabling pseudo console, which is disabled in
MSYS2 by default.

As for the case with pseudo console disabled:

If backward compatibility is important, we should apply the patch
above. If compatibility with the behaviour in command prompt is
important, we should leave the codepage to the system default.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


More information about the Cygwin-patches mailing list