CYGWIN=codepage? Or LC_CTYPE=foo?

Corinna Vinschen corinna-cygwin@cygwin.com
Sun Apr 6 19:26:00 GMT 2008


On Apr  7 01:04, Kazuhiro Fujieda wrote:
> >>> On Sun, 06 Apr 2008 16:39:43 +0200
> >>> Corinna Vinschen <corinna-cygwin@cygwin.com> said:
> 
> > Shouldn't the (default) setting of LANG, LC_CTYPE and friends be based
> > on what the underlying OS is set to?  Microsoft maintains a table which
> > defines the relationship between the locale identifier used internally
> > (LCID), the "Culture name" (what's used by POSIX) and the attached
> > codepage.  The list is here:
> >
> > http://www.microsoft.com/globaldev/nlsweb/default.mspx
> 
> There are several culture names not conforming to the convention
> of locale names, for example, "gsw-FR", "az-Cyrl-AZ", "zh-Hant",
> and so on. I wrote the table between locale names and LCIDs for
> my implementation of setlocale.

I checked the return values from GetLocaleInfo.  Apparently the strings
returned by GetLocaleInfo(LOCALE_SISO639LANGNAME) and
GetLocaleInfo(LOCALE_SISO3166CTRYNAME) match what you would expect.

For instance, az-Cyrl-AZ and az-Latn-AZ both return the ISO639 code az
and the ISO3166 code AZ.  They differ only in the returned ANSI codepage,
1251 for az-Cyrl-AZ and 1254 for az-Latn-AZ.

zh-CN and zh-Hans both return ISO639 zh and ISO3166 CN, zh-TW and
zh-Hant both return zh and TW.  In both cases the difference is just the
returned ANSI codepage.
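
To illustrate, here is a minimal sketch of how the two strings could be
combined into a POSIX-style name.  The helper names
make_posix_locale_name and lcid_to_posix_locale_name are made up for
this example and are not existing Cygwin functions.  The Windows part is
only compiled on Windows and is untested here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: join an ISO 639 language code and an ISO 3166
   country code into "ll_CC".  Returns 0 on success, -1 if the result
   does not fit into the output buffer. */
static int make_posix_locale_name(const char *lang, const char *ctry,
                                  char *out, size_t outlen)
{
    assert(lang != NULL && ctry != NULL && out != NULL);
    int n = snprintf(out, outlen, "%s_%s", lang, ctry);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

#ifdef _WIN32
/* Hypothetical helper: ask Windows for the ISO codes of an LCID.
   LOCALE_SISO3166CTRYNAME is the LCTYPE for the country code. */
static int lcid_to_posix_locale_name(LCID lcid, char *out, size_t outlen)
{
    char lang[16], ctry[16];

    if (!GetLocaleInfoA(lcid, LOCALE_SISO639LANGNAME, lang, (int)sizeof lang))
        return -1;
    if (!GetLocaleInfoA(lcid, LOCALE_SISO3166CTRYNAME, ctry, (int)sizeof ctry))
        return -1;
    return make_posix_locale_name(lang, ctry, out, outlen);
}
#endif
```

With this, az-Cyrl-AZ and az-Latn-AZ would both end up as "az_AZ", so
the codepage would still have to be handled separately.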

I don't know if gsw-FR is correct though.  gsw is apparently an ISO 639-2
code, and it looks like there is no two-letter ISO 639-1 code covering
that language.  What would POSIX do?

> It isn't preferable that such a large table resides in the DLL. I
> will simplify the implementation. I will make setlocale check the
> codeset part only and accept the default locale or the C locale.

Fine with me, but given my above mentioned results, doesn't it make
sense to use the OS for that, too?

Oh, btw., from what I read in SUSv3, the "C" locale is actually just
the old name for what's today called the "POSIX" locale.  They are
equivalent, but POSIX requires that a POSIX-conformant system
understands both names.  I guess that's no big problem.
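
A simplified setlocale could treat both names alike with a check along
these lines.  This is only a sketch; is_posix_locale_name is a
hypothetical helper, not something that exists in the DLL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if NAME is one of the two
   equivalent names of the POSIX locale, "C" or "POSIX". */
static int is_posix_locale_name(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}
```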

> > Or, we check if LANG/LC_CTYPE is set and only set the codepage according
> > to the setting of these variables.  Otherwise we just use the default
> > ANSI codepage.
> 
> This approach is preferable. I think setting $LANG is overkill.

You're right, of course.
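
Roughly, I imagine the lookup like this.  It's a sketch under
assumptions: locale_codeset and ctype_locale_from_env are hypothetical
names, and the mapping from a codeset string to a Windows codepage is
left out entirely.  If no variable is set or no codeset part is found,
the caller would fall back to the default ANSI codepage, for instance
via GetACP().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale string of the
   form "lang_CTRY.codeset@modifier" into OUT.  Returns 1 if a codeset
   was found and fits, 0 otherwise (OUT is then the empty string). */
static int locale_codeset(const char *locale, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    if (locale == NULL)
        return 0;

    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return 0;

    const char *start = dot + 1;
    size_t len = strcspn(start, "@");   /* stop before any modifier */
    if (len == 0 || len >= outlen)
        return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: pick the variable that governs LC_CTYPE using
   the POSIX precedence LC_ALL, LC_CTYPE, LANG.  Empty values count as
   unset.  Returns NULL if none of them is set. */
static const char *ctype_locale_from_env(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0')
            return v;
    }
    return NULL;
}
```

So locale_codeset(ctype_locale_from_env(), buf, sizeof buf) either
yields the requested codeset or tells the caller to use the default.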


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


