This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] Fix UTF-16 surrogate handling in __utf8_mbtowc


2009/7/28 Corinna Vinschen:
> here's a fix for the UTF-16 surrogate pair handling in __utf8_mbtowc,
> as mentioned in http://sourceware.org/ml/newlib/2009/msg00778.html.
> The original code only worked in the context of application calls to
> mbs[nr]towcs.  The new code below should also work in most cases where
> the application calls mbrtowc by itself.

Thank you very much for implementing that so quickly.


> The downside of this implementation is that an application could be
> happy with the result after only having read the first three bytes
> of the four byte sequence from the input string and just stop.  This
> results in an incomplete surrogate pair.  However, as far as I can see
> it's rather unlikely, and it's still better than not handling Unicode
> values outside the base plane at all.

I think that's perfectly correct behaviour. There's nothing more that
can be done given the constraint of a 16-bit wchar_t type. That just
can't be hidden here, so applications have to be adapted where
necessary.


> +         *pwc = 0xdc00 | ((tmp - 0x10000) & 0x3ff);

Nitpicking: The '- 0x10000' isn't necessary here; '(tmp & 0x3ff)' should do.
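
To spell out why: 0x10000 is a multiple of 0x400, so subtracting it cannot
change the low ten bits that the mask keeps. A minimal sketch of both
variants follows. The helper names are made up for illustration and are not
taken from the patch, and the high surrogate is computed with the standard
UTF-16 formula, not copied from the patch.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of newlib. */

/* High (leading) surrogate for a code point in 0x10000..0x10FFFF,
   using the standard UTF-16 formula. */
static uint16_t high_surrogate(uint32_t cp)
{
    return (uint16_t)(0xd800 | (((cp - 0x10000) >> 10) & 0x3ff));
}

/* Low (trailing) surrogate as written in the patch line quoted above. */
static uint16_t low_surrogate_patch(uint32_t cp)
{
    return (uint16_t)(0xdc00 | ((cp - 0x10000) & 0x3ff));
}

/* Low surrogate without the subtraction: 0x10000 has its low ten bits
   clear, so the masked result is the same. */
static uint16_t low_surrogate_simplified(uint32_t cp)
{
    return (uint16_t)(0xdc00 | (cp & 0x3ff));
}
```

Note that the subtraction is still needed for the high surrogate, because
there the bits above the low ten are what matter.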

Andy

