[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Corinna Vinschen corinna-cygwin@cygwin.com
Fri Sep 4 19:22:35 GMT 2020


Hi Takashi,

On Sep  4 23:50, Takashi Yano via Cygwin-patches wrote:
> Hi Corinna,
> 
> On Fri, 4 Sep 2020 14:44:00 +0200
> Corinna Vinschen wrote:
> > On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > > I think I have found the answer to your request.
> > > Patch attached. What do you think of this patch?
> > > 
> > > Calling initial_setlocale() is necessary because
> > > nl_langinfo() always returns "ANSI_X3.4-1968"
> > > regardless locale setting if this is not called.
> > [...]
> > However, the initial_setlocale() call in dll_crt0_1 calls
> > internal_setlocale(), and *that* function sets the conversion functions
> > for the internal conversions.  What it *doesn't* do yet at the moment is
> > to store the charset name itself or, better, the equivalent codepage.
> > 
> > If we change that, setup_locale can simply go away.  Below is a patch
> > doing just that.  Can you please check if that works in your test
> > scenarios?
> 
> I tried your patch, but unfortunately it does not work.
> cygheap->locale.term_code_page is 0 in pty master.
> 
> If the following lines are moved in internal_setlocale(),
> 
>   const char *charset = __locale_charset (__get_global_locale ());
>   debug_printf ("Global charset set to %s", charset);
>   /* Store codepage to be utilized by pseudo console code. */
>   cygheap->locale.term_code_page =
>             __eval_codepage_from_internal_charset (charset);
> 
> in internal_setlocale() before
> 
>   /* Don't do anything if the charset hasn't actually changed. */
>   if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
>     return;

Uh, that makes sense.

> cygheap->locale.term_code_page is always 65001 even if mintty is
> startted by
> mintty -o locale=ja_JP -o charset=CP932
> or
> mintty -o locale=ja_JP -o charset=EUCJP
> 
> Perhaps, this is because LANG is not set properly yet when mintty
> is started.

Yeah, that's the reason.  The above settings of locale and charset on
the CLI should only take over when mintty calls setlocale() with a
matching string.  The fact that it sets the matching value in the
environment, too, should only affect child processes, not mintty itself.

But it's incorrect to call initial_setlocale() from setup_locale()
without resetting it to its original value.

Unfortunately that doesn't solve any problem with the pseudo console
codepage.  Drat.  It sounds like you need the terminal's charset,
rather than the one set in the environment.

So this boils down to the fact that term_code_page must be set
after the application is already running and as soo as it creates
the pty, me thinks.  What if __eval_codepage_from_internal_charset()
is called at pty creation?  Or even on reading from /writing to
the pty the first time?  That should always be late enough to fetch
the correct codepage.

Patch attached.  Does that work as expected?

> > However, there's something which worries me.  Why do we need or set the
> > pseudo terminal codepage at all?  I see that you convert from MB charset
> > to MB charset and then use the result in WriteFile to the connecting
> > pipes.  Question is this: Why not just converting the strings via
> > sys_mbstowcs to a UTF-16 string and then send that over the line, using
> > WriteConsoleW for the final output to the console?  That would simplify
> > this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> > the console, we simply don't need to know or care for the console CP.
> 
> WriteConsoleW() cannot be used because the handle to_master_cyg is
> not a console handle.

Sigh, of course.  It's all just pipes.  I forgot that, sorry.

Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread()
calls 

  char *buf = convert_mb_str (cygheap->locale.term_code_page,
                              &nlen, CP_UTF8, ptr, wlen);
                                     ^^^^^^^
  [...]
  WriteFile (to_master_cyg, ...

But then, after the code breaks from that loop, it calls

  char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
                              GetConsoleOutputCP (), ptr, wlen);
                              ^^^^^^^^^^^^^^^^^^^^^
  [...]
  process_opost_output (to_master_cyg, ...

process_opost_output then calls WriteFile on that to_master_cyg handle,
just like the WriteFile call above.

Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
well?


Corinna
-------------- next part --------------
>From 89acd62e88871a89b9bcb2964924f8960c7673ec Mon Sep 17 00:00:00 2001
From: Corinna Vinschen <corinna@vinschen.de>
Date: Fri, 4 Sep 2020 21:20:35 +0200
Subject: [PATCH] Cygwin: try calling __eval_codepage_from_internal_charset in
 pty code

the idea being, that we need the tty's locale and charset, not the
environment locale settings when starting the tty.
---
 winsup/cygwin/fhandler_tty.cc | 17 +++++++++++++++++
 winsup/cygwin/nlsfuncs.cc     | 12 +++++-------
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index f207a0b27892..feb014175123 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -39,6 +39,8 @@ extern "C" {
   VOID WINAPI ClosePseudoConsole (HPCON);
 }
 
+extern UINT __eval_codepage_from_internal_charset ();
+
 #define close_maybe(h) \
   do { \
     if (h && h != INVALID_HANDLE_VALUE) \
@@ -633,6 +635,11 @@ fhandler_pty_slave::open (int flags, mode_t)
 
   fhandler_console::need_invisible ();
 
+#if 1 /* Let's try this first */
+  if (!cygheap->locale.term_code_page)
+    cygheap->locale.term_code_page = __eval_codepage_from_internal_charset ();
+#endif
+
   set_open_status ();
   return 1;
 
@@ -1607,6 +1614,11 @@ fhandler_pty_master::write (const void *ptr, size_t len)
   if (bg <= bg_eof)
     return (ssize_t) bg;
 
+#if 0 /* Let's try this if setting codepage at pty open time is not enough */
+  if (!cygheap->locale.term_code_page)
+    cygheap->locale.term_code_page = __eval_codepage_from_internal_charset ();
+#endif
+
   push_process_state process_state (PID_TTYOU);
 
   /* Write terminal input to to_slave pipe instead of output_handle
@@ -1969,6 +1981,11 @@ fhandler_pty_master::pty_master_fwd_thread ()
   DWORD rlen;
   char outbuf[65536];
 
+#if 0 /* Let's try this if setting codepage at pty open time is not enough */
+  if (!cygheap->locale.term_code_page)
+    cygheap->locale.term_code_page = __eval_codepage_from_internal_charset ();
+#endif
+
   termios_printf ("Started.");
   for (;;)
     {
diff --git a/winsup/cygwin/nlsfuncs.cc b/winsup/cygwin/nlsfuncs.cc
index 10a3a0705142..6bffc77c91eb 100644
--- a/winsup/cygwin/nlsfuncs.cc
+++ b/winsup/cygwin/nlsfuncs.cc
@@ -1452,9 +1452,10 @@ __set_charset_from_locale (const char *locale, char *charset)
    internal charset setting.  This is *not* necessarily the Windows
    codepage connected to a locale by default, so we have to set this
    up explicitely. */
-static UINT
-__eval_codepage_from_internal_charset (const char *charset)
+UINT
+__eval_codepage_from_internal_charset ()
 {
+  const char *charset = __locale_charset (__get_global_locale ());
   UINT codepage = CP_UTF8; /* Default UTF8 */
 
   /* The internal charset names are well defined, so we can use shortcuts. */
@@ -1582,11 +1583,8 @@ internal_setlocale ()
   if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
     return;
 
-  const char *charset = __locale_charset (__get_global_locale ());
-  debug_printf ("Global charset set to %s", charset);
-  /* Store codepage to be utilized by pseudo console code. */
-  cygheap->locale.term_code_page =
-			__eval_codepage_from_internal_charset (charset);
+  debug_printf ("Global charset set to %s",
+		__locale_charset (__get_global_locale ()));
   /* Fetch PATH and CWD and convert to wchar_t in previous charset. */
   path = getenv ("PATH");
   if (path && *path)	/* $PATH can be potentially unset. */
-- 
2.26.2



More information about the Cygwin-patches mailing list