More about charsets

Corinna Vinschen
Sat Mar 27 20:18:00 GMT 2010

On Mar 27 18:52, Andy Koppe wrote:
> On 27 March 2010 17:53, Corinna Vinschen:
> > I also intend to make GB2312 the default name, rather than GBK since
> > that's the default for these languages in Linux.
> You mean have nl_langinfo(CODESET) return GB2312 when something like
> "zh_CN.GBK" is selected? Not sure about that, because it might cause

No.  What I mean is, if somebody chooses a language_TERRITORY code whose
default codepage is 936, then set the codeset to "GB2312".  If somebody
explicitly chooses "GBK", stick to it.  If somebody chooses "EUC-CN",
map it to GB2312.  That reflects what Linux does.  So that's what

  setlocale (LC_CTYPE, "zh_CN");
  printf ("%s\n", nl_langinfo (CODESET));

  ==>  "GB2312"

  setlocale (LC_CTYPE, "zh_CN.gbk");
  printf ("%s\n", nl_langinfo (CODESET));

  ==>  "GBK"

  setlocale (LC_CTYPE, "zh_CN.eucCN");
  printf ("%s\n", nl_langinfo (CODESET));

  ==>  "GB2312"
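
As a rough sketch of the mapping shown in the three cases above, the
selection logic could look like the following.  This is not the actual
Cygwin code: the names pick_codeset_name and charset_name_eq are made up
for illustration, only the codepage 936 cases are covered, and locale
modifiers such as "@euro" are ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare two charset names, ignoring case and '-' characters, so that
   "eucCN", "EUC-CN" and "euccn" all compare equal.  Returns 1 on a
   match, 0 otherwise. */
static int
charset_name_eq (const char *a, const char *b)
{
  for (;;)
    {
      while (*a == '-')
        ++a;
      while (*b == '-')
        ++b;
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      if (*a == '\0')
        return 1;
      ++a;
      ++b;
    }
}

/* Hypothetical helper: choose the codeset name for a locale string such
   as "zh_CN" or "zh_CN.gbk".  default_codepage is assumed to be the
   Windows codepage that the language_TERRITORY part selects by itself.
   Returns NULL for every case this sketch does not cover. */
const char *
pick_codeset_name (const char *locale, unsigned int default_codepage)
{
  const char *dot = strchr (locale, '.');

  if (dot == NULL)
    /* No explicit charset: fall back on the default codepage. */
    return default_codepage == 936 ? "GB2312" : NULL;

  /* An explicitly requested GBK is kept as it is. */
  if (charset_name_eq (dot + 1, "GBK"))
    return "GBK";

  /* EUC-CN is mapped to GB2312; GB2312 itself stays GB2312. */
  if (charset_name_eq (dot + 1, "EUC-CN")
      || charset_name_eq (dot + 1, "GB2312"))
    return "GB2312";

  return NULL;
}
```

With these assumptions, pick_codeset_name ("zh_CN", 936) and
pick_codeset_name ("zh_CN.eucCN", 936) both yield "GB2312", while
pick_codeset_name ("zh_CN.gbk", 936) yields "GBK".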

> > Btw., apart from EUC-TW, what's missing as well is BIG5-HKSCS.  I read
> > and the Windows specific section,
> > but I'm still puzzled how this is supposed to work.  Does Vista's
> > codepage 950 contain the HKSCS elements or not?!?
> Nope, it doesn't. For XP there's an installable package that turns
> codepage 950 into BIG5-HKSCS. As far as I understand it, in Vista MS
> gave up on the idea of extending BIG5, and instead interpreted the
> HKSCS spec as a requirement for fonts and programs to support the
> Unicode codepoints needed for Cantonese. Here's Michael Kaplan
> sounding off on codepage "951":

Too bad.  I hope it's not overly critical.  Right now standard Big5 is
our default for zh_HK as well as for zh_TW.


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
