Newlib's implementation of isalnum() is causing compiler warnings

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Nov 14 11:41:00 GMT 2011


On Nov 14 11:16, Rafal Zwierz wrote:
> Hi,
> 
> First of all apologies if it is not the right place to submit patches
> for newlib/libc used by cygwin. If it's not then I would appreciate if
> you could point me to the right place for submitting such patches.

The right place for newlib patches is the newlib mailing list
newlib AT sourceware DOT org.  However...

> If it is the right place then please read on.
> 
> main.c (attached) is a simple app which, when compiled under Cygwin with
> 
> gcc -Wall -Werror main.c
> 
> shows the following problem:
> 
> cc1: warnings being treated as errors
> main.c: In function ‘main’:
> main.c:6:4: error: array subscript has type ‘char’
> 
> The fix is quite simple and is contained in patch.txt.

...this is not the right thing to do.  Actually the problem is in your
application.  The ctype warnings in newlib have been added exactly for
the benefit of application developers to warn them about using the ctype
macros in an incorrect way.

See the POSIX man page of isalpha (but this is valid for all isFOO ctype
macros):
http://pubs.opengroup.org/onlinepubs/9699919799/functions/isalpha.html

Note especially:

  The c argument is an int, the value of which the application shall
  ensure is a character representable as an unsigned char or equal to
                                            ^^^^^^^^^^^^^
  the value of the macro EOF. If the argument has any other value, the
  behavior is undefined.

It's a common mistake in applications to use a signed char value as
argument to the ctype macros.  While this was no problem way back when
everything was basically ASCII-only, it's a problem if you take other
codesets into account.  Here's why:

The common definition of EOF is:

  #define EOF (-1)

Now consider this code:

  setlocale (LC_ALL, "en_US.iso88591");
  char s[2] = { 0xff, 0 };  // 0xff is the character 'ÿ' in ISO-8859-1,
                            // aka LATIN SMALL LETTER Y WITH DIAERESIS
  if (isalpha (s[0]))
    printf ("isalpha is true\n");

The text will not be printed where char is a signed type, because s[0] is
sign-extended to int, so ((char) 0xff) becomes -1 in the call to isalpha.
Since -1 is EOF, the character 'ÿ' will be handled incorrectly.
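
To make the conversion concrete, here is a minimal sketch of what value
actually reaches a ctype macro.  All four function names are hypothetical
helpers made up for illustration; none of them is part of newlib or POSIX.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* The int a ctype macro receives when a plain char is passed as-is.
   Where char is signed, bytes >= 0x80 come out negative.  */
int ctype_arg_without_cast (char c)
{
  return c;
}

/* The int it receives after a cast to unsigned char: always in the
   range 0..UCHAR_MAX, so it can never collide with EOF.  */
int ctype_arg_with_cast (char c)
{
  return (unsigned char) c;
}

/* The domain POSIX allows for the argument: EOF, or a value
   representable as unsigned char.  */
int is_valid_ctype_arg (int v)
{
  return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* Hypothetical debug wrapper: asserts the precondition, then
   classifies.  Returns 1 for alphabetic, 0 otherwise.  */
int checked_isalpha (int v)
{
  assert (is_valid_ctype_arg (v));
  return isalpha (v) != 0;
}
```

On a typical platform where char is a signed 8-bit type,
ctype_arg_without_cast (s[0]) yields -1 for the string above, while
ctype_arg_with_cast (s[0]) yields 255.  Bytes such as 0x80, which become
values like -128, fall outside the allowed domain altogether.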

The right thing to do is to call

  if (isalpha ((unsigned char) s[0]))

or, to create portable, multibyte-aware code:

  wchar_t wc;
  mbtowc (&wc, s, strlen (s));
  if (iswalpha ((wint_t) wc))
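
A slightly fuller sketch of the multibyte variant, walking a whole string
and checking the return value of mbtowc.  The function name
count_alpha_mb is hypothetical, and the code assumes the caller has
already selected a suitable locale with setlocale; in the default "C"
locale only ASCII letters are likely to count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the alphabetic characters in the
   multibyte string s under the current LC_CTYPE locale.
   Returns -1 if s contains an invalid multibyte sequence.  */
int count_alpha_mb (const char *s)
{
  size_t remaining;
  int count = 0;

  assert (s != NULL);
  remaining = strlen (s);

  (void) mbtowc (NULL, NULL, 0);      /* reset any shift state */
  while (remaining > 0)
    {
      wchar_t wc;
      int len = mbtowc (&wc, s, remaining);

      if (len < 0)
        return -1;                    /* invalid or incomplete sequence */
      if (len == 0)
        break;                        /* reached a NUL character */
      if (iswalpha ((wint_t) wc))
        count++;
      s += len;
      remaining -= (size_t) len;
    }
  return count;
}
```

Because classification happens on the decoded wchar_t, the signed char
problem does not arise here, whatever the width of a character in the
locale's encoding.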


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


