This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: gcc4[1.7] printf treats differently a string constant and a character array


2009/12/28 Andy Koppe:
> 2009/12/28 Rodrigo Medina:
>> Hi,
>> I am moving from Cygwin 1.5 and gcc 3.4 to Cygwin 1.7 and gcc 4.
>> Some simple programs of mine fail.
>>
>> I am using LC_ALL=es_VE.ISO-8859-15.
>>
>> I have reduced the problem to this example:
>>
>> --------------
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>   static char* line1 =
>>     " This letter has an accent -->á, this one has no accent -->a\n\n";
>>   static char* line2 = " ***** another line ******\n\n";
>>   static char* line3 =
>>     " These letters have an accent -->Ã¡, these ones have no accent -->A!\n\n";
>>   static char* line4 =
>>     " This letter has an accent -->Ã, this one has no accent -->A\n\n";
>>   printf(" This letter has an accent -->á, this one has no accent -->a\n\n");
>>   printf(line2);
>>   printf("%d %d %d\n\n",line1[29],line1[30],line1[31]);
>>   printf(line1);
>>   printf(line2);
>>   printf(" These letters have an accent -->Ã¡, these ones have no accent -->A!\n\n");
>>   printf(line2);
>>   printf("%d %d %d %d\n\n",line3[32],line3[33],line3[34],line3[35]);
>>   printf(line3);
>>   printf(line2);
>>   printf(" This letter has an accent -->Ã, this one has no accent -->A\n\n");
>>   printf(line2);
>>   printf("%d %d %d\n\n",line4[29],line4[30],line4[31]);
>>   printf(line4);
>>   printf(line2);
>>   printf(" ----- END ------");
>>   return 0;
>> }
>> ----------------
>>
>> My output is:
>>
>>  This letter has an accent -->á, this one has no accent -->a
>>
>>  ***** another line ******
>>
>> 62 -31 44
>>
>>  This letter has an accent --> ***** another line ******
>>
>>  These letters have an accent -->Ã¡, these ones have no accent -->A!
>>
>>  ***** another line ******
>>
>> 62 -61 -95 44
>>
>>  These letters have an accent -->Ã¡, these ones have no accent -->A!
>>
>>  ***** another line ******
>>
>>  This letter has an accent -->Ã, this one has no accent -->A
>>
>>  ***** another line ******
>>
>> 62 -61 44
>>
>>  This letter has an accent --> ***** another line ******
>>
>>  ----- END ------
>>
>> As you can see, the output of printf(string_constant) is what
>> I expected. The output of printf(char_array) is truncated at the non-ASCII
>> character.
>
> Reproduced. Looking at the compiler's assembly output, some of the
> printf() calls are replaced by calls to puts(), and those do work
> correctly, whereas the remaining printf() calls with accented
> characters misbehave. So printf()'s handling of non-ASCII characters
> needs a closer look.

Ah, the problem is actually that your program is missing a call to
setlocale(LC_CTYPE, "") (declared in <locale.h>), which switches to the
locale and character set specified in the environment. In fact, since
your program contains hard-coded ISO-8859-15 strings, you should
probably call setlocale(LC_CTYPE, "<whatever>.ISO-8859-15") instead.

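Without a setlocale call, programs use the "C" locale, and on Cygwin
1.7 that implies the UTF-8 character set. In UTF-8, a byte such as
0xE1 (á in ISO-8859-15) or 0xC3 (Ã) starts a multibyte sequence and
must be followed by continuation bytes in the range 0x80 to 0xBF.
Here each one is followed by a comma, so the sequence is invalid and
printf halts there. The accented character pair "Ã¡" (0xC3 0xA1),
meanwhile, happens to be valid UTF-8 (it encodes U+00E1, á), so it
gets through.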

I couldn't find specific text about invalid bytes in the POSIX printf
spec, but it does say the following: "The format is a character
string, beginning and ending in its initial shift state, if any. The
format is composed of zero or more directives: ordinary characters,
which are simply copied to the output stream, and conversion
specifications, each of which shall result in the fetching of zero or
more arguments."

It's talking about "characters" rather than "bytes" there, which I
think leaves the behaviour for invalid bytes undefined, so newlib's
printf implementation is within its rights to simply stop processing
the format string at the first invalid byte.

Andy

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

