More about charsets

Corinna Vinschen
Sat Mar 27 14:54:00 GMT 2010

Hi guys,

While looking into the GB18030 issue once again, I found that we may
still have two holes which might be important to fill.

- GB2312 aka EUC-CN

  We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
  of GBK and apparently GBK is often used while still labeled as
  GB2312.  See the discussion here:

  So the question is, should we just allow GB2312 and EUC-CN as
  codeset names, but use the GBK conversion functions for them?

  Otherwise, there's also a codepage 51936, which is called EUC-CN
  in the list at
  I didn't test it, but it appears to be the real GB2312.  I don't
  know if it really makes sense to make that distinction, though.
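
  To make the first option concrete, here is a rough sketch of what
  accepting GB2312 and EUC-CN as plain aliases for GBK could look like.
  The table, the struct and the function name codeset_to_codepage are
  made up for illustration and are not existing Cygwin code, and the
  case-insensitive match is my assumption as well.  Switching to the
  strict variant would only mean changing 936 to 51936 in the two
  alias rows.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Hypothetical alias table, for illustration only. */
struct codeset_alias
{
  const char *name;
  unsigned int codepage;
};

static const struct codeset_alias aliases[] =
{
  { "GBK",    936 },
  { "GB2312", 936 },	/* assumption: handled by the GBK conversion */
  { "EUC-CN", 936 },	/* assumption: likewise; 51936 would be strict */
};

/* Return the codepage for a codeset name, or 0 if the name is unknown. */
static unsigned int
codeset_to_codepage (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (name, aliases[i].name) == 0)
      return aliases[i].codepage;
  return 0;
}
```

  With a table like this, all three names end up in the same
  conversion functions, and a name that isn't listed falls through
  to 0 so the caller can reject it.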


- EUC-TW

  There's a codepage 51950 which appears to be something like EUC-TW.
  I just found this, though:

  Andy, is that a general rule?  Or did you test on XP and the codepage
  was just not installed, by any chance?

We certainly have other holes as well, but for OS usage I don't see
any other codeset which would be that important.

Anything I'm missing?


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
