[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Sat Sep 5 08:43:01 GMT 2020


Hi Corinna,

On Fri, 4 Sep 2020 21:22:35 +0200
Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  4 23:50, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> > 
> > On Fri, 4 Sep 2020 14:44:00 +0200
> > Corinna Vinschen wrote:
> > > On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > > > I think I have found the answer to your request.
> > > > Patch attached. What do you think of this patch?
> > > > 
> > > > Calling initial_setlocale() is necessary because
> > > > nl_langinfo() always returns "ANSI_X3.4-1968"
> > > > regardless locale setting if this is not called.
> > > [...]
> > > However, the initial_setlocale() call in dll_crt0_1 calls
> > > internal_setlocale(), and *that* function sets the conversion functions
> > > for the internal conversions.  What it *doesn't* do yet at the moment is
> > > to store the charset name itself or, better, the equivalent codepage.
> > > 
> > > If we change that, setup_locale can simply go away.  Below is a patch
> > > doing just that.  Can you please check if that works in your test
> > > scenarios?
> > 
> > I tried your patch, but unfortunately it does not work.
> > cygheap->locale.term_code_page is 0 in pty master.
> > 
> > If the following lines are moved in internal_setlocale(),
> > 
> >   const char *charset = __locale_charset (__get_global_locale ());
> >   debug_printf ("Global charset set to %s", charset);
> >   /* Store codepage to be utilized by pseudo console code. */
> >   cygheap->locale.term_code_page =
> >             __eval_codepage_from_internal_charset (charset);
> > 
> > in internal_setlocale() before
> > 
> >   /* Don't do anything if the charset hasn't actually changed. */
> >   if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
> >     return;
> 
> Uh, that makes sense.
> 
> > cygheap->locale.term_code_page is always 65001 even if mintty is
> > startted by
> > mintty -o locale=ja_JP -o charset=CP932
> > or
> > mintty -o locale=ja_JP -o charset=EUCJP
> > 
> > Perhaps, this is because LANG is not set properly yet when mintty
> > is started.
> 
> Yeah, that's the reason.  The above settings of locale and charset on
> the CLI should only take over when mintty calls setlocale() with a
> matching string.  The fact that it sets the matching value in the
> environment, too, should only affect child processes, not mintty itself.
> 
> But it's incorrect to call initial_setlocale() from setup_locale()
> without resetting it to its original value.
> 
> Unfortunately that doesn't solve any problem with the pseudo console
> codepage.  Drat.  It sounds like you need the terminal's charset,
> rather than the one set in the environment.
> 
> So this boils down to the fact that term_code_page must be set
> after the application is already running and as soo as it creates
> the pty, me thinks.  What if __eval_codepage_from_internal_charset()
> is called at pty creation?  Or even on reading from /writing to
> the pty the first time?  That should always be late enough to fetch
> the correct codepage.
> 
> Patch attached.  Does that work as expected?

Thank you very much for the patch.

Your new additional patch works well except the test case such as:

  int pm = getpt();
  if (fork()) {
    [do the master operations]
  } else {
    int ps = open(ptsname(pm), O_RDWR|O_NOCTTY);
    close(pm);
    setsid();
    ioctl(ps, TIOCSCTTY, 1);
    dup2(ps, 0);
    dup2(ps, 1);
    dup2(ps, 2);
    close(ps);
    [exec non-cygwin process]
  }

If this test case is run in cygwin console (command prompt),
it causes garbled output due to term_code_page == 0.

The second additional patch attached fixes the isseu.

> Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread()
> calls 
> 
>   char *buf = convert_mb_str (cygheap->locale.term_code_page,
>                               &nlen, CP_UTF8, ptr, wlen);
>                                      ^^^^^^^
>   [...]
>   WriteFile (to_master_cyg, ...
> 
> But then, after the code breaks from that loop, it calls
> 
>   char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
>                               GetConsoleOutputCP (), ptr, wlen);
>                               ^^^^^^^^^^^^^^^^^^^^^
>   [...]
>   process_opost_output (to_master_cyg, ...
> 
> process_opost_output then calls WriteFile on that to_master_cyg handle,
> just like the WriteFile call above.
> 
> Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
> well?

That is correct. The first conversion is for the case that pseudo
console is enabled, and the second one is for the case that pseudo
console is disabled.

Pseudo console converts charset from console code page to UTF-8.
Therefore, data read from from_slave is always UTF-8 when pseudo
console is enabled. Moreover, OPOST processing is done in pseudo
console, so write data simply by WriteFile() is enough.

If pseudo console is disabled, cmd.exe and so on uses console
code page, so the code page of data read from from_slave is
GetConsoleOutputCP(). In this case, OPOST processing is necessary.

If the second conversion and process_opost_output() are in the
"else {}" block for "if (get_ttyp ()->h_pseudo_console) {}",
the intent whould be clearer.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: set-term-code-page-in-_ti.diff
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20200905/107c2103/attachment.ksh>


More information about the Cygwin-patches mailing list