[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Wed Sep 2 16:25:00 GMT 2020


Hi Corinna,

On Wed, 2 Sep 2020 17:24:50 +0200
Corinna Vinschen  wrote:
> On Sep  2 19:54, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> > 
> > On Wed, 2 Sep 2020 10:38:18 +0200
> > Corinna Vinschen wrote:
> > > On Sep  2 10:30, Corinna Vinschen wrote:
> > > > Ok guys, I'm not opposed to this change in terms of its result,
> > > > but I'm starting to wonder why all this locale code in fhandler_tty
> > > > is necessary at all.
> > > > 
> > > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff
> > > > on the charsets which looks like duplicates of the initial_setlocale()
> > > > call performed at DLL startup.
> > > > 
> > > > If there's anything missing in the initial_setlocale() call which would
> > > > be required by the pseudo tty code?  What exactly is it?  The codepage?
> > > > And why can't we just add the info to cygheap->locale at initial_setlocale()
> > > > time so it's available at exec time without going through all this hassle
> > > > every time?
> > > > 
> > > > Apart from that, all this locale/charset/lcid stuff should be concentrated
> > > > in nlsfunc.cc ideally.
> > > 
> > > get_locale_from_env() and get_langinfo() should go away.  If we just
> > > need a codepage for get_ttyp ()->term_code_page, we should really find a
> > > way to do this from within internal_setlocale().
> > 
> > I looked into internal_setlocale() code, but I could not found
> > the code which handles thecode page. I found the code handling
> > the code page in __set_charset_from_locale() function in nlsfuncs.cc,
> > but it does not return code page itself. Could you please explain
> > more detail of your idea?
> 
> I had none yet :)  I was just musing, without actually thinking about a
> solution.  But I think this isn't very complicated.  Given this is
> inside Cygwin, nothing keeps the function to have a well-defined
> side-effect, as in setting a (not yet existing) member "term_code_page"
> of cygheap->locale.
> 
> Kind of like this:
> 
> diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h
> index 8877cc358c39..2b84f4252071 100644
> --- a/winsup/cygwin/cygheap.h
> +++ b/winsup/cygwin/cygheap.h
> @@ -341,6 +341,7 @@ struct cygheap_debug
>  struct cygheap_locale
>  {
>    mbtowc_p mbtowc;
> +  UINT term_code_page;
>  };
>  
>  struct user_heap_info
> diff --git a/winsup/cygwin/nlsfuncs.cc b/winsup/cygwin/nlsfuncs.cc
> index 668d7eb9e778..752f4239d911 100644
> --- a/winsup/cygwin/nlsfuncs.cc
> +++ b/winsup/cygwin/nlsfuncs.cc
> @@ -1298,6 +1298,9 @@ __set_charset_from_locale (const char *locale, char *charset)
>  			    LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
>  			    (PWCHAR) &cp, sizeof cp))
>      cp = 0;
> +  /* Store codepage in cygheap->locale so fhandler_tty can switch the
> +     pseudo console to the correct codepage. */
> +  cygheap->locale.term_code_page = cp ?: CP_UTF8;
>    /* Translate codepage and lcid to a charset closely aligned with the default
>       charsets defined in Glibc. */
>    const char *cs;
> 
> Make sense?

I have tried your code, however, it does not work as expected.
It seems that __set_charset_from_locale() is not called.
cygheap->locale.term_code_page is always 0.

I have added following lines into setup_locale() to make sure
to call __set_charset_from_locale() for a test,

  setlocale (LC_ALL, "");
  __set_charset_from_locale (__get_global_locale()->categories[LC_CTYPE], charset);
  get_ttyp ()->term_code_page = cygheap->locale.term_code_page;

however, term_code_page is set to 932 if locale is ja_JP.UTF-8.
In this case term_code_page should be CP_UTF8 (65001).

The code page retrieved in __set_charset_from_locale() is not
based on "UTF-8" but "ja_JP".

Let me consider a while.

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


More information about the Cygwin-patches mailing list