[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Mon Sep 7 04:45:58 GMT 2020


On Mon, 7 Sep 2020 01:04:13 +0900
Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> On Sun, 6 Sep 2020 19:15:30 +0900
> Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> > On Sun, 6 Sep 2020 17:57:03 +0900
> > Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> > > On Sat, 5 Sep 2020 23:15:16 +0900
> > > Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> > > > Hi Corinna,
> > > > 
> > > > On Sat, 5 Sep 2020 20:15:06 +0900
> > > > Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> > > > > On Sat, 5 Sep 2020 17:43:01 +0900
> > > > > Takashi Yano via Cygwin-patches <cygwin-patches@cygwin.com> wrote:
> > > > > > Hi Corinna,
> > > > > > 
> > > > > > On Fri, 4 Sep 2020 21:22:35 +0200
> > > > > > Corinna Vinschen wrote:
> > > > > > > Hi Takashi,
> > > > > > > 
> > > > > > > On Sep  4 23:50, Takashi Yano via Cygwin-patches wrote:
> > > > > > > > Hi Corinna,
> > > > > > > > 
> > > > > > > > On Fri, 4 Sep 2020 14:44:00 +0200
> > > > > > > > Corinna Vinschen wrote:
> > > > > > > > > On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > > > > > > > > > I think I have found the answer to your request.
> > > > > > > > > > Patch attached. What do you think of this patch?
> > > > > > > > > > 
> > > > > > > > > > Calling initial_setlocale() is necessary because
> > > > > > > > > > nl_langinfo() always returns "ANSI_X3.4-1968"
> > > > > > > > > > regardless locale setting if this is not called.
> > > > > > > > > [...]
> > > > > > > > > However, the initial_setlocale() call in dll_crt0_1 calls
> > > > > > > > > internal_setlocale(), and *that* function sets the conversion functions
> > > > > > > > > for the internal conversions.  What it *doesn't* do yet at the moment is
> > > > > > > > > to store the charset name itself or, better, the equivalent codepage.
> > > > > > > > > 
> > > > > > > > > If we change that, setup_locale can simply go away.  Below is a patch
> > > > > > > > > doing just that.  Can you please check if that works in your test
> > > > > > > > > scenarios?
> > > > > > > > 
> > > > > > > > I tried your patch, but unfortunately it does not work.
> > > > > > > > cygheap->locale.term_code_page is 0 in pty master.
> > > > > > > > 
> > > > > > > > If the following lines are moved in internal_setlocale(),
> > > > > > > > 
> > > > > > > >   const char *charset = __locale_charset (__get_global_locale ());
> > > > > > > >   debug_printf ("Global charset set to %s", charset);
> > > > > > > >   /* Store codepage to be utilized by pseudo console code. */
> > > > > > > >   cygheap->locale.term_code_page =
> > > > > > > >             __eval_codepage_from_internal_charset (charset);
> > > > > > > > 
> > > > > > > > in internal_setlocale() before
> > > > > > > > 
> > > > > > > >   /* Don't do anything if the charset hasn't actually changed. */
> > > > > > > >   if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
> > > > > > > >     return;
> > > > > > > 
> > > > > > > Uh, that makes sense.
> > > > > > > 
> > > > > > > > cygheap->locale.term_code_page is always 65001 even if mintty is
> > > > > > > > startted by
> > > > > > > > mintty -o locale=ja_JP -o charset=CP932
> > > > > > > > or
> > > > > > > > mintty -o locale=ja_JP -o charset=EUCJP
> > > > > > > > 
> > > > > > > > Perhaps, this is because LANG is not set properly yet when mintty
> > > > > > > > is started.
> > > > > > > 
> > > > > > > Yeah, that's the reason.  The above settings of locale and charset on
> > > > > > > the CLI should only take over when mintty calls setlocale() with a
> > > > > > > matching string.  The fact that it sets the matching value in the
> > > > > > > environment, too, should only affect child processes, not mintty itself.
> > > > > > > 
> > > > > > > But it's incorrect to call initial_setlocale() from setup_locale()
> > > > > > > without resetting it to its original value.
> > > > > > > 
> > > > > > > Unfortunately that doesn't solve any problem with the pseudo console
> > > > > > > codepage.  Drat.  It sounds like you need the terminal's charset,
> > > > > > > rather than the one set in the environment.
> > > > > > > 
> > > > > > > So this boils down to the fact that term_code_page must be set
> > > > > > > after the application is already running and as soo as it creates
> > > > > > > the pty, me thinks.  What if __eval_codepage_from_internal_charset()
> > > > > > > is called at pty creation?  Or even on reading from /writing to
> > > > > > > the pty the first time?  That should always be late enough to fetch
> > > > > > > the correct codepage.
> > > > > > > 
> > > > > > > Patch attached.  Does that work as expected?
> > > > > > 
> > > > > > Thank you very much for the patch.
> > > > > > 
> > > > > > Your new additional patch works well except the test case such as:
> > > > > > 
> > > > > >   int pm = getpt();
> > > > > >   if (fork()) {
> > > > > >     [do the master operations]
> > > > > >   } else {
> > > > > >     int ps = open(ptsname(pm), O_RDWR|O_NOCTTY);
> > > > > >     close(pm);
> > > > > >     setsid();
> > > > > >     ioctl(ps, TIOCSCTTY, 1);
> > > > > >     dup2(ps, 0);
> > > > > >     dup2(ps, 1);
> > > > > >     dup2(ps, 2);
> > > > > >     close(ps);
> > > > > >     [exec non-cygwin process]
> > > > > >   }
> > > > > > 
> > > > > > If this test case is run in cygwin console (command prompt),
> > > > > > it causes garbled output due to term_code_page == 0.
> > > > > > 
> > > > > > The second additional patch attached fixes the isseu.
> > > > > 
> > > > > No. This does not fix enough.
> > > > > 
> > > > > In the test case above, if it does not call setlocale(),
> > > > > __eval_codepage_from_internal_charset() always returns "ASCII"
> > > > > regardless of locale setting. Therefore, output is garbled if
> > > > > the terminal charset is not UTF-8.
> > > > 
> > > > New additional patch against your patches is attached.
> > > > In the end, essentially what was being done in the original
> > > > code has been restored.
> > > 
> > > The patch revised a little.
> > 
> > Chages:
> > - If global locale is set, it takes precedence.
> 
> Changes:
> - Use __get_current_locale() instead of __get_global_locale().
> - Fix a bug for ISO-8859-* charset.

Changes:
- Use envblock if it is passed to CreateProcess in spawn.cc.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: set-term-code-page.diff
URL: <https://cygwin.com/pipermail/cygwin-patches/attachments/20200907/aadac73f/attachment-0001.ksh>


More information about the Cygwin-patches mailing list