Newlib's implementation of isalnum() is causing compiler warnings
Corinna Vinschen
corinna-cygwin@cygwin.com
Mon Nov 14 11:41:00 GMT 2011
On Nov 14 11:16, Rafal Zwierz wrote:
> Hi,
>
> First of all apologies if it is not the right place to submit patches
> for newlib/libc used by cygwin. If it's not then I would appreciate if
> you could point me to the right place for submitting such patches.
The right place for newlib patches is the newlib mailing list
newlib AT sourceware DOT org. However...
> If it is the right place then please read on.
>
> main.c (attached) is a simple app which, when compiled under Cygwin with
>
> gcc -Wall -Werror main.c
>
> shows the following problem:
>
> cc1: warnings being treated as errors
> main.c: In function ‘main’:
> main.c:6:4: error: array subscript has type ‘char’
>
> The fix is quite simple and is contained in patch.txt.
...this is not the right thing to do. Actually the problem is in your
application. The ctype warnings in newlib have been added exactly for
the benefit of application developers to warn them about using the ctype
macros in an incorrect way.
See the POSIX man page of isalpha (but this is valid for all isFOO ctype
macros):
http://pubs.opengroup.org/onlinepubs/9699919799/functions/isalpha.html
Note especially:
  The c argument is an int, the value of which the application shall
  ensure is a character representable as an unsigned char or equal to
                                            ^^^^^^^^^^^^^
  the value of the macro EOF. If the argument has any other value, the
  behavior is undefined.
It's a common mistake in applications to use a signed char value as
argument to the ctype macros. While this was no problem way back when
everything was basically ASCII-only, it's a problem if you take other
codesets into account. Here's why:
The common definition of EOF is:
  #define EOF (-1)
Now consider this code:
  setlocale (LC_ALL, "en_US.iso88591");
  char s[2] = { 0xff, 0 };  // 0xff is the character 'ÿ' in ISO-8859-1,
                            // aka LATIN SMALL LETTER Y WITH DIAERESIS
  if (isalpha (s[0]))
    printf ("isalpha is true\n");
The text will not be printed, because s[0] is sign-extended to int when
plain char is signed, thus ((char) 0xff) will become -1 in the call to
isalpha. Since -1 is EOF, the character 'ÿ' will be handled incorrectly.
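To make the sign extension concrete, here is a minimal sketch. It uses
signed char explicitly, because whether plain char is signed is
implementation-defined. The helper names widen_signed and widen_unsigned
are made up for illustration and are not part of newlib.

```c
/* Hypothetical helper: the implicit conversion that happens when a
   signed char value is passed where an int is expected, e.g. as the
   argument of isalpha().  */
static int
widen_signed (signed char c)
{
  return c;
}

/* Hypothetical helper: the conversion the ctype functions expect.
   The value is first reinterpreted as unsigned char, then widened.  */
static int
widen_unsigned (signed char c)
{
  return (unsigned char) c;
}
```

With 8-bit chars, a byte with all bits set stored in a signed char has
the value -1. widen_signed returns -1 for it, which collides with the
usual EOF, while widen_unsigned returns 255.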
The right thing to do is to call
  if (isalpha ((unsigned char) s[0]))
or, to create portable, multibyte-aware code:
  wchar_t wc;
  mbtowc (&wc, s, strlen (s));
  if (iswalpha ((wint_t) wc))
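Both variants can be sketched as small helpers that walk a whole string.
This is only an illustration under assumptions: the function names
count_alpha_bytes and count_alpha_mb are hypothetical, and the multibyte
version relies on the caller having selected a suitable locale with
setlocale beforehand.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count alphabetic bytes in a string that uses a
   single-byte codeset.  The cast keeps every argument in the range of
   unsigned char, as the ctype functions require.  */
static size_t
count_alpha_bytes (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; ++s)
    if (isalpha ((unsigned char) *s))
      ++n;
  return n;
}

/* Hypothetical helper: count alphabetic characters in a multibyte
   string, decoded according to the current LC_CTYPE locale.
   Returns (size_t) -1 if an invalid sequence is found.  */
static size_t
count_alpha_mb (const char *s)
{
  size_t n = 0;
  size_t left = strlen (s);

  mbtowc (NULL, NULL, 0);       /* reset the internal shift state */
  while (left > 0)
    {
      wchar_t wc;
      int len = mbtowc (&wc, s, left);

      if (len < 0)              /* invalid or incomplete sequence */
        return (size_t) -1;
      if (len == 0)             /* NUL character, nothing left to scan */
        break;
      if (iswalpha ((wint_t) wc))
        ++n;
      s += len;
      left -= (size_t) len;
    }
  return n;
}
```

Unlike the three-line snippet above, the multibyte helper checks the
return value of mbtowc, since the result in wc is only meaningful when
the conversion succeeded.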
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat