Console codepage setting via chcp?
Sat Sep 26 20:17:00 GMT 2009
On Sep 26 21:34, Corinna Vinschen wrote:
> Do you propose to change __utf8_mbtowc/__utf8_wctomb to allow UCS-2
> encoding as well?
> This is no problem for __utf8_mbtowc, but in __utf8_wctomb it's not
> possible to convert surrogate pairs to correct UTF-8 *and* lone
> surrogate first halves to UCS-2, at least not without a lot of
> additional effort. The reason is that the first byte returned when the
> first half is read is >= 0xf0. When the function is called for the
> second half and it turns out there is no second half, then the already
> returned 0xf0 byte is suddenly wrong. And the wctomb functions have no
> read-ahead.
> For that reason, I invented the aforementioned \016\377\x sequence
> to represent lone surrogate second halves.
> The only other alternative would be to revert all the surrogate pair
> handling changes and to allow only UCS-2 again, thus giving up
> support for Unicode values >= U+10000.
No, there's a third alternative, of course.
The __utf8_wctomb function could just create the corresponding
UCS-2 values if no first half has been encountered before. The
__utf8_mbtowc function could simply allow these UCS-2 values again.
That works (I just tested it) and is a small change, but is it really
desirable to allow UCS-2 values in UTF-8 strings?
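
To make the idea concrete, here is a minimal sketch in C. It is not the
actual newlib code: sketch_state and sketch_wctomb are hypothetical
names, and the sketch assumes 16-bit wide characters and a
caller-supplied state that holds the pending first half. A first half
emits the 0xf0 lead byte right away, a matching second half completes
the four-byte sequence, and a second half without a preceding first half
falls through to the ordinary three-byte encoding of its UCS-2 value:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: the pending first (high) surrogate
   half, or 0 if none is pending. */
typedef struct { uint16_t pending_hi; } sketch_state;

/* Convert one 16-bit wide character to bytes in out (at most 3 per call).
   Returns the number of bytes written, or (size_t)-1 on error. */
size_t sketch_wctomb(unsigned char *out, uint16_t wc, sketch_state *st)
{
    if (st->pending_hi != 0) {
        uint16_t hi = st->pending_hi;
        st->pending_hi = 0;
        if (wc >= 0xDC00 && wc <= 0xDFFF) {
            /* Second half arrived: emit the remaining three bytes. */
            uint32_t cp = 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                          + ((uint32_t)wc - 0xDC00u);
            out[0] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        /* No second half: the lead byte is already out and cannot be
           taken back, so all that is left is to report an error. */
        return (size_t)-1;
    }
    if (wc >= 0xD800 && wc <= 0xDBFF) {
        /* First half: the lead byte depends only on the first half. */
        uint32_t cp = 0x10000u + (((uint32_t)wc - 0xD800u) << 10);
        st->pending_hi = wc;
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        return 1;
    }
    if (wc < 0x80) {
        out[0] = (unsigned char)wc;
        return 1;
    }
    if (wc < 0x800) {
        out[0] = (unsigned char)(0xC0 | (wc >> 6));
        out[1] = (unsigned char)(0x80 | (wc & 0x3F));
        return 2;
    }
    /* Everything else, including a second half with no first half
       before it, is written as a three-byte UCS-2 value. */
    out[0] = (unsigned char)(0xE0 | (wc >> 12));
    out[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (wc & 0x3F));
    return 3;
}
```

With this sketch, the pair 0xd83d 0xde00 yields f0 9f 98 80, while a
lone 0xdc00 yields ed b0 80. A first half that is not followed by a
second half is not solved here: the lead byte has already been
returned, so the sketch can only signal an error.
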
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com