More about charsets

Andy Koppe andy.koppe@gmail.com
Sat Mar 27 20:44:00 GMT 2010


Corinna Vinschen:
> No.  What I mean is, if somebody chooses a language_TERRITORY code whose
> default codepage is 936, then set the codeset to "GB2312".  If somebody
> explicitly chooses "GBK", stick to it.  If somebody chooses "EUC-CN",
> map it to GB2312.  That reflects what Linux does.  So this is what
> happens:
>
>  setlocale (LC_CTYPE, "zh_CN");
>  printf ("%s\n", nl_langinfo (CODESET));
>
>  ==>  "GB2312"
>
>  setlocale (LC_CTYPE, "zh_CN.gbk");
>  printf ("%s\n", nl_langinfo (CODESET));
>
>  ==>  "GBK"
>
>  setlocale (LC_CTYPE, "zh_CN.eucCN");
>  printf ("%s\n", nl_langinfo (CODESET));
>
>  ==>  "GB2312"

Looks good to me.
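
For anyone who wants to check the three cases above on their own system,
here is a minimal sketch using only setlocale() and nl_langinfo().  The
helper names codeset_for() and report_codesets() are made up for this
example, and what gets printed depends on the platform and on which
locales are actually available, so treat it as a quick probe rather than
a statement of what any particular system returns.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the codeset name reported for `locale_name`
   into `buf`.  Returns 0 on success, -1 if the buffer is empty or the
   locale cannot be set. */
int codeset_for(const char *locale_name, char *buf, size_t len)
{
    if (len == 0 || setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* The string returned by nl_langinfo() may be overwritten by later
       calls, so keep a private copy. */
    strncpy(buf, nl_langinfo(CODESET), len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* Hypothetical helper: print the codeset for each locale name quoted
   above.  Note that this changes the process-wide LC_CTYPE setting. */
void report_codesets(void)
{
    static const char *const names[] = { "zh_CN", "zh_CN.gbk", "zh_CN.eucCN" };
    char codeset[64];

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (codeset_for(names[i], codeset, sizeof codeset) == 0)
            printf("%-12s ==> \"%s\"\n", names[i], codeset);
        else
            printf("%-12s ==> (locale not available)\n", names[i]);
    }
}
```

Calling report_codesets() from main() prints one line per locale name.
Copying the nl_langinfo() result before the next setlocale() call is
deliberate, since the returned pointer is not guaranteed to stay valid.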


>> > Btw., apart from EUC-TW, what's missing as well is BIG5-HKSCS.  I read
>> > http://en.wikipedia.org/wiki/HKSCS and the Windows specific section,
>> > but I'm still puzzled how this is supposed to work.  Does Vista's
>> > codepage 950 contain the HKSCS elements or not?!?
>>
>> Nope, it doesn't. For XP there's an installable package that turns
>> codepage 950 into BIG5-HKSCS. As far as I understand it, in Vista MS
>> gave up on the idea of extending BIG5, and instead interpreted the
>> HKSCS spec as a requirement for fonts and programs to support the
>> Unicode codepoints needed for Cantonese. Here's Michael Kaplan
>> sounding off on codepage "951":
>> http://blogs.msdn.com/michkap/archive/2007/05/12/2561904.aspx
>
> Too bad.  I hope it's not overly critical.  Right now standard Big5 is
> our default for zh_HK as well as for zh_TW.

I guess it's a case of "use zh_HK.UTF-8 instead", as that appears to have
been the intention with HKSCS-2004. Wikipedia has this to say on
HKSCS on Linux: "HKSCS support was added to glibc in 2000, but it has
not been updated since then. HKSCS-2004 support is handled as Unicode
4.1 and later."

Andy



More information about the Cygwin-developers mailing list