More about charsets

Corinna Vinschen corinna-cygwin@cygwin.com
Sat Mar 27 14:54:00 GMT 2010


Hi guys,


while looking into the GB18030 issue once again, I found that we may
still have two gaps in our codeset support which might be important
to fill.

- GB2312 aka EUC-CN

  We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
  of GBK and apparently GBK is often used while still labeled as
  GB2312.  See the discussion here:
  http://www.mail-archive.com/unicode@unicode.org/msg03516.html

  So the question is, should we just allow GB2312 and EUC-CN as
  codeset names, but use the GBK conversion functions for them?

  Otherwise, there's also a codepage 51936, which is called EUC-CN
  in the list at
  http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
  I didn't test it, but it appears to be the real GB2312.  I don't
  know if it really makes sense to make that distinction, though.

- EUC-TW

  There's a codepage 51950 which appears to be something like EUC-TW.
  I just found this, though:
  http://code.google.com/p/mintty/source/detail?r=738

  Andy, is that a general rule?  Or did you test on XP and the codepage
  was just not installed, by any chance?
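
To make the GB2312 question concrete, here is a minimal sketch of the
aliasing option, written in Python purely for illustration.  The table
and the function names are hypothetical and are not taken from any
existing Cygwin or newlib code; only the codepage number 936 comes from
the text above.  It relies on Python's built-in "gb2312" and "gbk"
codecs to show the subset relationship: strict GB2312 bytes decode
unchanged through GBK, while some characters exist in GBK only.

```python
# Hypothetical alias table: codeset name -> Windows codepage.
# Mapping GB2312 and EUC-CN onto 936 (GBK) is the option being asked
# about, not a settled decision; 51936 would be the alternative.
CODESET_ALIASES = {
    "GBK": 936,
    "GB2312": 936,
    "EUC-CN": 936,
}


def codepage_for(name):
    """Return the codepage for a codeset name, or None if unknown."""
    return CODESET_ALIASES.get(name.upper())


def gbk_reads_gb2312(text):
    """True if text encoded as strict GB2312 decodes unchanged via GBK."""
    return text.encode("gb2312").decode("gbk") == text


def only_in_gbk(ch):
    """True if ch can be encoded in GBK but not in strict GB2312."""
    try:
        ch.encode("gb2312")
        return False          # encodable in GB2312, so not GBK-only
    except UnicodeEncodeError:
        pass
    try:
        ch.encode("gbk")
        return True
    except UnicodeEncodeError:
        return False          # not encodable in either codeset
```

With this sketch, codepage_for("gb2312") and codepage_for("EUC-CN")
both give 936, and only_in_gbk("\u9555") is true for a character that
GBK has but GB2312 lacks.  Text labeled GB2312 that actually contains
such characters is the case where using the GBK conversion functions
would help and a strict EUC-CN codepage would not.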

We certainly have other holes as well, but for OS usage I don't see
any other codeset which would be that important.

Anything I'm missing?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
