More about charsets
Corinna Vinschen
corinna-cygwin@cygwin.com
Sat Mar 27 17:53:00 GMT 2010
On Mar 27 18:24, Corinna Vinschen wrote:
> On Mar 27 16:11, Andy Koppe wrote:
> > Corinna Vinschen:
> > > while looking into the GB18030 issue once again, I found that we still
> > > may have two holes which might be important to support.
> > >
> > > - GB2312 aka EUC-CN
> > >
> > >   We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
> > >   of GBK and apparently GBK is often used while still labeled as
> > >   GB2312.  See the discussion here:
> > >   http://www.mail-archive.com/unicode@unicode.org/msg03516.html
> > >
> > >   So the question is, should we just allow GB2312 and EUC-CN as
> > >   codeset names, but use the GBK conversion functions for them?
> >
> > Might as well. As you saw, mintty already does that. Thomas Wolff's
> > mined goes even further and handles both GB2312 and GBK with its
> > GB18030 codec, because GBK is a subset of GB18030.
>
> I think I'll opt for GBK for now, given that GB18030 doesn't exist yet.

I also intend to make GB2312 the default name, rather than GBK, since
that's the default for these languages in Linux.
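
To make the aliasing idea concrete, here is a rough sketch of how the
three names could resolve to the same conversion functions.  This is not
the actual Cygwin or newlib code: the table, the struct and the function
name codeset_to_codepage are all hypothetical, and the only facts taken
from the discussion above are that GBK is codepage 936 and that
GB2312/EUC-CN is a subset of it.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table: codeset names accepted in a locale string,
   each mapped to the Windows codepage whose conversion is used. */
struct codeset_alias
{
  const char *name;
  unsigned codepage;
};

static const struct codeset_alias aliases[] =
{
  { "GB2312", 936 },  /* subset of GBK, so the GBK converter handles it */
  { "EUC-CN", 936 },  /* another name for the same subset */
  { "GBK",    936 },
};

/* Return the codepage for a codeset name (case-insensitive match),
   or 0 if the name is not in the table. */
unsigned
codeset_to_codepage (const char *name)
{
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (name, aliases[i].name) == 0)
      return aliases[i].codepage;
  return 0;
}
```

With a table like this, a locale such as zh_CN.GB2312 and one such as
zh_CN.GBK would end up on the same converter, and only the reported
codeset name would differ.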

Btw., apart from EUC-TW, what's missing as well is BIG5-HKSCS.  I read
http://en.wikipedia.org/wiki/HKSCS and the Windows specific section,
but I'm still puzzled how this is supposed to work.  Does Vista's
codepage 950 contain the HKSCS elements or not?!?

Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat