This is the mail archive of the cygwin mailing list for the Cygwin project.
2010/1/4 Joseph Quinsey:

My assumption has been that *printf should be byte-transparent except where it uses explicit wide character arguments.
In Cygwin 1.7.1, sprintf() with a format string containing a byte that has the 8th bit set appears to be broken. Sample code (where I've indicated the backslashes in the comments, in case they are stripped out by the mailer):
    #include <stdio.h>

    int main (void)
    {
      unsigned char foo[30] = "";
      unsigned char bar[30] = "";
      unsigned char xxx[30] = "";

      sprintf (foo, "\100%s", "ABCD");  /* this is backslash one zero zero */
      sprintf (bar, "\300%s", "ABCD");  /* this is backslash three zero zero */
      sprintf (xxx, "\300ABCD");        /* this is backslash three zero zero */

      printf ("%d %d %d %d %d\n", foo[0],foo[1],foo[2],foo[3],foo[4]);
      printf ("%d %d %d %d %d\n", bar[0],bar[1],bar[2],bar[3],bar[4]);
      printf ("%d %d %d %d %d\n", xxx[0],xxx[1],xxx[2],xxx[3],xxx[4]);
      return 0;
    }
gives:
    64 65 66 67 68
    0 0 0 0 0
    192 65 66 67 68
The second line of the output should be the same as the third.
The issue here is that the character set of the "C" locale in Cygwin 1.7 is UTF-8 and that the \300 on its own is an invalid UTF-8 byte.
[EILSEQ] A wide-character code that does not correspond to a valid character has been detected.
In that thread, someone had originally confused char * with wchar [] - the issue resolves cleanly if these are properly distinguished.

To get well-defined behaviour, you need to invoke setlocale(LC_CTYPE, ...) with the appropriate locale.
See the thread at http://cygwin.com/ml/cygwin/2009-12/msg00980.html
for more on this.
> It's talking about "characters" rather than "bytes" there, which I
> think does leave the behaviour for invalid bytes undefined,

No, it's talking about "wide character codes" and "valid characters", to be picky.
I claim it's absolutely not well-defined and I strongly disagree here.

It's actually well-defined - non-characters in the format string MUST make printf fail.
I don't think there is such a thing as an invalid multibyte character in a char [] unless it is being interpreted with a multi-byte function; that's what e.g. the mb* functions are for.

The issue wasn't with wide characters, but invalid multibyte chars. But anyway, we're agreed that printf is right to bail out.