[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Sep 7 08:26:33 GMT 2020

Hi Takashi,

On Sep  5 17:43, Takashi Yano via Cygwin-patches wrote:
> On Fri, 4 Sep 2020 21:22:35 +0200
> Corinna Vinschen wrote:
> > So this boils down to the fact that term_code_page must be set
> > after the application is already running and as soo as it creates
> > the pty, me thinks.  What if __eval_codepage_from_internal_charset()
> > is called at pty creation?  Or even on reading from /writing to
> > the pty the first time?  That should always be late enough to fetch
> > the correct codepage.
> > 
> > Patch attached.  Does that work as expected?
> Thank you very much for the patch.
> Your new additional patch works well except the test case such as:
>   int pm = getpt();
>   if (fork()) {
>     [do the master operations]
>   } else {
>     int ps = open(ptsname(pm), O_RDWR|O_NOCTTY);
>     close(pm);
>     setsid();
>     ioctl(ps, TIOCSCTTY, 1);
>     dup2(ps, 0);
>     dup2(ps, 1);
>     dup2(ps, 2);
>     close(ps);
>     [exec non-cygwin process]
>   }
> If this test case is run in cygwin console (command prompt),
> it causes garbled output due to term_code_page == 0.

term_code_page is set on fhandler_pty_slave::open, which is what
you call above.  How can term_code_page be 0 after that?  Are you
talking about the forking parent process being master?  If so, either
switching `#if 1/#if 0' blocks in the patch might fix this (by setting
term_code_page on first read/write), or alternatively adding an
__eval_codepage_from_internal_charset() call to the master creation
as well.  Did you try enabling the #if 0'd blocks?

Either way, the call to __eval_codepage_from_internal_charset() should
take place as soon as the first pty gets created or on first pty

__eval_codepage_from_internal_charset() could also simply be called
on *every* invocation of pty read/write, given how low profile it is.

Apart from that, I don't see anything wrong with the above scenario.
If the application creating the pty does *not* switch locale, we're
in "C" locale territory, and we should default to UTF-8.  That's
what a call to __eval_codepage_from_internal_charset() would do,
because it defaults to UTF-8 and never returns the ASCII codepage.

> > Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread()
> > calls 
> > 
> >   char *buf = convert_mb_str (cygheap->locale.term_code_page,
> >                               &nlen, CP_UTF8, ptr, wlen);
> >                                      ^^^^^^^
> >   [...]
> >   WriteFile (to_master_cyg, ...
> > 
> > But then, after the code breaks from that loop, it calls
> > 
> >   char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
> >                               GetConsoleOutputCP (), ptr, wlen);
> >                               ^^^^^^^^^^^^^^^^^^^^^
> >   [...]
> >   process_opost_output (to_master_cyg, ...
> > 
> > process_opost_output then calls WriteFile on that to_master_cyg handle,
> > just like the WriteFile call above.
> > 
> > Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
> > well?
> That is correct. The first conversion is for the case that pseudo
> console is enabled, and the second one is for the case that pseudo
> console is disabled.
> Pseudo console converts charset from console code page to UTF-8.
> Therefore, data read from from_slave is always UTF-8 when pseudo
> console is enabled. Moreover, OPOST processing is done in pseudo
> console, so write data simply by WriteFile() is enough.
> If pseudo console is disabled, cmd.exe and so on uses console
> code page, so the code page of data read from from_slave is
> GetConsoleOutputCP(). In this case, OPOST processing is necessary.

This is really confusing me.  We never set the console codepage in the
old pty code before, it was just pipes transmitting bytes.  Why do we
suddenly have to handle native apps running in a console in this case?!?

> diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h
> index 2b84f4252..8877cc358 100644
> --- a/winsup/cygwin/cygheap.h
> +++ b/winsup/cygwin/cygheap.h
> @@ -341,7 +341,6 @@ struct cygheap_debug
>  struct cygheap_locale
>  {
>    mbtowc_p mbtowc;
> -  UINT term_code_page;

No, wait.  Just reverting this change without checking the alternatives
doesn't make sense.

Why would we want to store the codepage in every single tty, given
term_code_page is set to the same value in every one of them?  AFAICS
it's sufficient to have a single term_code_page shared with the child
processes via cygheap.  The idea is to get rid of the complex
setup_locale code in every execve call and just set it once in a process
tree starting at the process creating the ptys.


More information about the Cygwin-patches mailing list