[PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

Takashi Yano takashi.yano@nifty.ne.jp
Mon Sep 7 09:54:45 GMT 2020


On Mon, 7 Sep 2020 11:08:23 +0200
Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  7 13:45, Takashi Yano via Cygwin-patches wrote:
> >  #if 0 /* Let's try this if setting codepage at pty open time is not enough */
> > -  if (!cygheap->locale.term_code_page)
> > -    cygheap->locale.term_code_page = __eval_codepage_from_internal_charset ();
> > +  if (!get_ttyp ()->term_code_page)
> > +    get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset (NULL);
> >  #endif
> 
> *If* we revert back to using setup_locale, these #if blocks would
> go away.
> 
> > -__eval_codepage_from_internal_charset ()
> > +__eval_codepage_from_internal_charset (const WCHAR *envblock)
> >  {
> > -  const char *charset = __locale_charset (__get_global_locale ());
> > +  const char *charset;
> > +  __locale_t *loc = NULL;
> > +  if (__get_current_locale ()->lc_cat[LC_CTYPE].buf)
> > +    charset = __locale_charset (__get_current_locale ());
> > +  else
> > +    {
> > +      char locale[ENCODING_LEN + 1] = {0, };
> > +      if (envblock)
> > +	{
> > +	  const WCHAR *lc_all = NULL, *lc_ctype = NULL, *lang = NULL;
> > +	  for (const WCHAR *p = envblock; *p != L'\0'; p += wcslen (p) + 1)
> > +	    if (wcsncmp (p, L"LC_ALL=", 7) == 0)
> > +	      lc_all = p + 7;
> > +	    else if (wcsncmp (p, L"LC_CTYPE=", 9) == 0)
> > +	      lc_ctype = p + 9;
> > +	    else if (wcsncmp (p, L"LANG=", 5) == 0)
> > +	      lang = p + 5;
> > +	  if (lc_all && *lc_all)
> > +	    snprintf (locale, ENCODING_LEN + 1, "%ls", lc_all);
> 	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 	    sys_wcstombs (locale, ENCODING_LEN + 1, lc_all);
> 
> OTOH, if you read these environment vars right from our current POSIX
> env, you don't have to convert from mbs to wcs at all.  Just call
> getenv("LC_ALL"), etc.  After all, envblock is just the wide char
> copy of our current POSIX env.

IIUC, envblock is not a copy of the current environment if exec*e() is
used. In that case, getenv() cannot retrieve the new environment values
passed to exec*e(). This is needed by the test case below.

  int pm = getpt();
  if (fork()) {
    [do the master operations]
  } else {
    char *env[] = {"LANG=ja_JP.SJIS", ...., NULL};
    setsid();
    int ps = open(ptsname(pm), O_RDWR);
    close(pm);
    dup2(ps, 0);
    dup2(ps, 1);
    dup2(ps, 2);
    close(ps);
    execle("/bin/tcsh", "/bin/tcsh", "-l", NULL, env);
  }
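
To illustrate the point, here is a minimal sketch (not Cygwin code) of
reading the locale from the environment array handed to exec*e() rather
than from getenv(). The helper names envp_lookup() and
envp_ctype_locale() are hypothetical; only the LC_ALL, LC_CTYPE, LANG
precedence and the skipping of empty values follow the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Cygwin): find NAME in an envp-style
   array such as the one passed to execle().  Returns a pointer to the
   value, or NULL if NAME is not present. */
static const char *
envp_lookup (char *const envp[], const char *name)
{
  assert (envp != NULL && name != NULL);
  size_t len = strlen (name);
  for (; *envp; envp++)
    if (strncmp (*envp, name, len) == 0 && (*envp)[len] == '=')
      return *envp + len + 1;
  return NULL;
}

/* Pick the locale string with the same precedence as the patch:
   LC_ALL, then LC_CTYPE, then LANG.  Empty values are skipped.
   Returns NULL if none of them is set to a non-empty value. */
static const char *
envp_ctype_locale (char *const envp[])
{
  static const char *const names[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *v = envp_lookup (envp, names[i]);
      if (v && *v)
        return v;
    }
  return NULL;
}
```

With env[] = {"LANG=ja_JP.SJIS", NULL}, envp_ctype_locale (env) yields
"ja_JP.SJIS" regardless of what getenv("LANG") returns in the calling
process, which is the situation the test case above creates.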

> > +	  else if (lc_ctype && *lc_ctype)
> > +	    snprintf (locale, ENCODING_LEN + 1, "%ls", lc_ctype);
> > +	  else if (lang && *lang)
> > +	    snprintf (locale, ENCODING_LEN + 1, "%ls", lang);
> > +	}
> > +      if (!*locale)
> > +	{
> > +	  const char *env = __get_locale_env (_REENT, LC_CTYPE);
> > +	  strncpy (locale, env, ENCODING_LEN);
> > +	  locale[ENCODING_LEN] = '\0';
> > +	}
> > +      loc = duplocale (__get_current_locale ());
> > +      __loadlocale (loc, LC_CTYPE, locale);
> > +      charset = __locale_charset (loc);
> > +    }
> 
> Oh, boy, this is really a lot.  I have some doubts this complexity is
> really necessary.  It's a bit weird to go to such great lengths for
> native applications.  Still, why not just do this once in the process
> creating the pty rather than trying on every execve?

This is executed just once per pty, because
__eval_codepage_from_internal_charset() is called only when
get_ttyp ()->term_code_page is not set yet.
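
The guard can be sketched in isolation as follows. This is only an
illustration under assumptions: struct pty_state, eval_codepage_stub()
and pty_code_page() are hypothetical stand-ins, and the value 932 (the
Windows code page for Shift_JIS) is just an example result. Only the
field name term_code_page comes from the patch.

```c
/* Hypothetical stand-in for the per-pty shared state. */
struct pty_state
{
  unsigned int term_code_page;	/* 0 means "not evaluated yet" */
};

/* Counts how often the expensive evaluation actually runs. */
static int eval_calls;

/* Placeholder for __eval_codepage_from_internal_charset(); the real
   function derives the code page from the locale charset. */
static unsigned int
eval_codepage_stub (void)
{
  eval_calls++;
  return 932;
}

/* Evaluate the code page only while it is still unset, so repeated
   calls for the same pty reuse the stored value. */
static unsigned int
pty_code_page (struct pty_state *pty)
{
  if (!pty->term_code_page)
    pty->term_code_page = eval_codepage_stub ();
  return pty->term_code_page;
}
```

Calling pty_code_page() twice on the same zero-initialized struct runs
the stub once; the second call just returns the stored value.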

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>

