More about charsets
Sat Mar 27 14:54:00 GMT 2010
While looking into the GB18030 issue once again, I found that we still
may have two holes which might be important to fill.
- GB2312 aka EUC-CN
We already support GBK, codepage 936. GB2312/EUC-CN is a subset
of GBK and apparently GBK is often used while still labeled as
GB2312. See the discussion here:
So the question is, should we just allow GB2312 and EUC-CN as
codeset names, but use the GBK conversion functions for them?
Otherwise, there's also a codepage 51936, which is called EUC-CN
in the list at
I didn't test it, but it appears to be the real GB2312. I don't
know if it really makes sense to make that distinction, though.
- EUC-TW
There's a codepage 51950, which appears to be something like EUC-TW.
I just found this, though:
Andy, is that a general rule? Or did you test on XP and the codepage
was just not installed, by any chance?
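
To make the first option for GB2312/EUC-CN concrete, here is a minimal sketch of what "allow the names, but use the GBK conversion" could look like. The table and the function name codeset_to_codepage are made up for illustration and are not the actual Cygwin code. Only the codepage numbers (936, 51936) come from the discussion above.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table, for illustration only.  GB2312 and EUC-CN
   are accepted as codeset names but routed to codepage 936 (GBK).
   Switching them to 51936 would select the "real" GB2312 instead.  */
struct codeset_alias
{
  const char *name;
  unsigned int codepage;
};

static const struct codeset_alias codeset_aliases[] =
{
  { "GBK",    936 },
  { "GB2312", 936 },   /* assumption: treated as GBK */
  { "EUC-CN", 936 },   /* assumption: treated as GBK */
};

/* Return the codepage for a codeset name (case-insensitive),
   or 0 if the name is not in the table.  */
unsigned int
codeset_to_codepage (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcasecmp (name, codeset_aliases[i].name) == 0)
      return codeset_aliases[i].codepage;
  return 0;
}
```

With a table like this, text that is labeled GB2312 but actually contains GBK-only characters would still convert, at the price of not rejecting input that is invalid in strict GB2312.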
We certainly have other holes as well, but for OS usage I don't see
any other codeset which would be that important.
Anything I'm missing?
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com