GB18030 (was: Re: charset changes)

Andy Koppe
Sun Mar 28 07:00:00 GMT 2010

Corinna Vinschen:
>> [MultiByteToWideChar] only fails if the MB_ERR_INVALID_CHARS flag is set.
> That's what Cygwin is doing.  I don't see how any other setting would
> make sense.

In mintty I use the non-failing mode to try to tell invalid sequences
from incomplete ones. If MultiByteToWideChar returns only the
UnicodeDefaultChar, the input could be either. But if it returns the
default char followed by a second char, there was an invalid sequence
followed by something else.

(That code is used primarily on Cygwin 1.5. On 1.7, Cygwin's charset
support is used whenever a valid locale is set.)
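
For illustration, the check described above could be sketched roughly
as follows. This is not mintty's actual code: classify_conversion and
the seq_status names are made up for this example, and the default
char is passed in as a parameter because its value depends on the
platform and codepage. The function only inspects the output of a
conversion that was done without MB_ERR_INVALID_CHARS; it does not
call MultiByteToWideChar itself.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical result codes for inspecting a non-failing conversion. */
enum seq_status {
    SEQ_OK,                 /* output does not start with the default char */
    SEQ_BAD_OR_INCOMPLETE,  /* lone default char: invalid or incomplete */
    SEQ_INVALID_THEN_MORE   /* default char followed by another char */
};

/*
 * out, n:        wide chars produced by the lossy conversion
 * default_char:  the substitute the conversion emits for undecodable
 *                input (assumption: supplied by the caller)
 */
enum seq_status
classify_conversion(const wchar_t *out, size_t n, wchar_t default_char)
{
    if (n == 0 || out[0] != default_char)
        return SEQ_OK;
    if (n == 1)
        return SEQ_BAD_OR_INCOMPLETE;   /* cannot tell which one it is */
    return SEQ_INVALID_THEN_MORE;       /* something followed the bad bytes */
}
```

One limitation of this sketch: input that legitimately encodes the
default char itself is indistinguishable from a substitution.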

> Looks like there's still no chance to persuade you to sign the copyright
> assignment form.

'fraid so. Sorry.

So is that a 'no' regarding the GB18030 scheme?

>> ps: Btw, speaking of performance issues, the 8-bit charsets are rather
>> inefficient because for every single non-ASCII character they parse
>> the charset name to obtain a charset table index. Storing that index
>> alongside the name might make quite a big difference.
> That's right.  The problem is that it's necessary to be able to call the
> function the same way as any other __FOO_wctomb or __FOO_mbtowc
> function.  Right now all these functions get the charset name as
> parameter.  This is necessary because the functions could be called with
> a charset name which is different from the globally stored charset.
> For instance, if Cygwin is using another charset for the console window
> than the application is requesting in setlocale.
> Anyway, feel free to send a patch to change the charset name parameter
> to an array index parameter.

Right, I'll have a go at the newlib side of that. As this will impact
code that's ifdeffed out for Cygwin, can you recommend a
multibyte-enabled platform to compile (and test?) that on?
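
To make the index idea concrete, here is a minimal sketch, assuming a
simple table of charset names. None of these identifiers are newlib's:
charset_names, charset_index_from_name and struct charset_ref are
hypothetical, and the three table entries are just placeholders. The
point is only that the name is parsed once and the resulting index is
stored next to it, so the per-character functions could take the index
instead of the name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of 8-bit charsets; not newlib's real table. */
static const char *const charset_names[] = {
    "ISO-8859-1", "ISO-8859-2", "CP1252"
};
#define NUM_CHARSETS (sizeof charset_names / sizeof charset_names[0])

/* Parse the name once, e.g. when the locale is set. -1 if unknown. */
int charset_index_from_name(const char *name)
{
    for (size_t i = 0; i < NUM_CHARSETS; i++)
        if (strcmp(name, charset_names[i]) == 0)
            return (int)i;
    return -1;
}

/* Name and cached table index kept side by side. */
struct charset_ref {
    const char *name;
    int index;
};

struct charset_ref charset_ref_make(const char *name)
{
    struct charset_ref ref = { name, charset_index_from_name(name) };
    return ref;
}
```

A caller that uses a different charset for the console than for the
application would simply hold two such charset_ref values and pass
the appropriate index each time.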

