This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: UTF-8: Invalid multibyte sequence
- From: Jonathan Nieder <jrnieder at gmail dot com>
- To: Felix Natter <felix dot natter at smail dot inf dot fh-brs dot de>
- Cc: libc-help at sourceware dot org
- Date: Sun, 19 Jun 2011 16:29:43 -0500
- Subject: Re: UTF-8: Invalid multibyte sequence
- References: <87k4chyj1k.fsf@bitburger.home.felix>
Hi Felix,
Felix Natter wrote:
> setlocale(LC_ALL,"en_US.UTF-8");
[...]
> printf("buffer='%s' strlen(buffer)=%d, numChars=%d\n",
> buffer,
> strlen(buffer),
> mbstowcs(NULL, buffer, 0));
>
> return 0;
> }
> ----------
>
> outputs:
> ----------
> buffer='aÃaÃ' strlen(buffer)=6, numChars=-1
> ----------
Odd. I tried this with Debian eglibc 2.13-7 (using %zu in place of %d
to avoid 32-bit vs 64-bit portability problems) and received the
output
    buf='aÃaÃ' strlen(buf)=6, numChars=4
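Perhaps you don't have the en_US.UTF-8 locale installed. In that case
setlocale() fails, the program stays in the default "C" locale, and
mbstowcs() can reject the non-ASCII bytes, which would explain the -1.
You can check for that by running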
    LC_ALL=en_US.UTF-8 perl
and seeing if it complains.
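The same check can be made from inside the program by testing the return
value of setlocale(), which is NULL when the requested locale cannot be
selected. The following is only a minimal sketch of that idea, not the
original test program: count_chars() and its return codes are
hypothetical names chosen for illustration. It also prints the size_t
results with %zu rather than %d.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of glibc): count the characters of the
 * multibyte string s under the named locale.
 *
 * Returns -2 if the locale cannot be selected, -1 if s is not a valid
 * multibyte sequence in that locale, and the character count otherwise.
 */
static long count_chars(const char *locale_name, const char *s)
{
    /* setlocale() returns NULL when the locale is unavailable. */
    if (setlocale(LC_ALL, locale_name) == NULL) {
        fprintf(stderr, "setlocale(\"%s\") failed; is the locale installed?\n",
                locale_name);
        return -2;
    }

    /* With a NULL destination, mbstowcs() only counts the characters. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1) {
        fprintf(stderr, "invalid multibyte sequence in locale \"%s\"\n",
                locale_name);
        return -1;
    }

    /* strlen() and mbstowcs() both yield size_t, so use %zu. */
    printf("strlen=%zu, numChars=%zu\n", strlen(s), n);
    return (long)n;
}
```

Calling count_chars("en_US.UTF-8", buffer) separates the two failure
modes: -2 points at a missing locale, while -1 means the locale was
selected but the bytes in buffer are not valid in its encoding.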
> Next, I tried to generate a widechar-sequence using L"..." and use
> wcsrtombs() to convert it to a multibyte sequence:
For this one, I get:
    result=6, errno=0
    buf='aÃaÃ' strlen(buf)=6, numChars=4
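For reference, the wide-to-multibyte direction can be sketched as
follows. This is an illustration under stated assumptions, not the
program from the quoted message (which is elided here):
wide_to_multibyte() and its return codes are hypothetical, and the
locale is again checked before converting.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of glibc): convert the wide string ws
 * to a multibyte string in buf under the named locale.
 *
 * Returns the number of bytes written (excluding the terminating NUL),
 * -1 if a wide character cannot be represented in the locale's
 * encoding, -2 if the locale cannot be selected, and -3 if buf is too
 * small to hold the whole result.
 */
static long wide_to_multibyte(const char *locale_name, const wchar_t *ws,
                              char *buf, size_t bufsize)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -2;

    /* Start from the initial conversion state. */
    mbstate_t state;
    memset(&state, 0, sizeof state);

    const wchar_t *src = ws;
    size_t n = wcsrtombs(buf, &src, bufsize, &state);
    if (n == (size_t)-1)
        return -1;          /* errno is set to EILSEQ */

    /* wcsrtombs() sets src to NULL only after converting the terminator. */
    if (src != NULL)
        return -3;

    return (long)n;
}
```

Under a UTF-8 locale, a four-character wide string with two non-ASCII
characters such as L"a\u00e4a\u00e4" (a stand-in chosen here, not
necessarily the original string) should convert to 6 bytes, matching
the result=6 reported above.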
Hope that helps,
Jonathan