This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: UTF-8: Invalid multibyte sequence


Hi Felix,

Felix Natter wrote:

>   setlocale(LC_ALL,"en_US.UTF-8");
[...]
>   printf("buffer='%s' strlen(buffer)=%d, numChars=%d\n",
>          buffer,
>          strlen(buffer),
>          mbstowcs(NULL, buffer, 0));
> 
>   return 0;
> }
> ----------
> 
> outputs:
> ----------
> buffer='aÃaÃ' strlen(buffer)=6, numChars=-1
> ----------

Odd.  I tried this with Debian eglibc 2.13-7 (using %zu in place of %d,
since strlen() and mbstowcs() return size_t and %d is not portable
between 32-bit and 64-bit systems) and received the output:

	buf='aÃaÃ' strlen(buf)=6, numChars=4

Perhaps you don't have the en_US.UTF-8 locale installed.  You can
check for that by running

	LC_ALL=en_US.UTF-8 perl

and seeing if it complains.

> Next, I tried to generate a widechar-sequence using L"..." and use
> wcsrtombs() to convert it to a multibyte sequence:

For this one, I get:

	result=6, errno=0
	buf='aÃaÃ' strlen(buf)=6, numChars=4

Hope that helps,
Jonathan

