bug in mbrtowc?
Andy Koppe
andy.koppe@gmail.com
Tue Jul 28 08:22:00 GMT 2009
I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
Here's an example:
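#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
  wchar_t wc = 0;
  size_t ret;
  mbstate_t s = { 0 };  /* initial conversion state */
  const char *loc = setlocale(LC_CTYPE, "en_GB.UTF-8");

  /* setlocale returns NULL if the locale is unavailable */
  puts(loc ? loc : "(setlocale failed)");

  /* Feed the sequence E2 94 84 one byte at a time, keeping state in s.
     (size_t)-2 prints as -2 (incomplete), (size_t)-1 as -1 (error). */
  ret = mbrtowc(&wc, "\xe2", 1, &s);
  printf("%i\n", (int)ret);
  ret = mbrtowc(&wc, "\x94", 1, &s);
  printf("%i\n", (int)ret);
  ret = mbrtowc(&wc, "\x84", 1, &s);
  printf("%i\n", (int)ret);
  printf("%x\n", (unsigned)wc);
  return 0;
}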
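The sequence E2 94 84 should translate to U+2504, so the three calls
ought to return -2, -2 and 1 (two incomplete sequences, then one byte
that completes the character). Instead, the second and third calls to
mbrtowc report encoding errors, i.e. they return -1. It does work
correctly if the three bytes are passed to mbrtowc() in one go: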
printf("%i\n", (int)mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
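For reference, the expected code point can be worked out by hand, independently of the locale and of mbrtowc. The following is only a minimal sketch, assuming a well-formed three-byte sequence; decode_utf8_3 is a hypothetical helper named for illustration, not part of any library:

```c
#include <stdint.h>

/* Hypothetical helper (not part of any library): decode one three-byte
   UTF-8 sequence by hand. Returns the code point, or 0xFFFFFFFF if the
   bytes do not have the form 1110xxxx 10xxxxxx 10xxxxxx. Overlong forms
   and surrogates are not rejected, to keep the sketch short. */
uint32_t decode_utf8_3(unsigned char b0, unsigned char b1, unsigned char b2) {
  if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
    return 0xFFFFFFFFu;
  return ((uint32_t)(b0 & 0x0F) << 12)   /* 4 payload bits from the lead byte */
       | ((uint32_t)(b1 & 0x3F) << 6)    /* 6 bits from the first continuation */
       |  (uint32_t)(b2 & 0x3F);         /* 6 bits from the second continuation */
}
```

Calling decode_utf8_3(0xe2, 0x94, 0x84) yields 0x2504, which is the value mbrtowc should store in wc once the last byte of the sequence has been supplied.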
Andy