This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Invalid tm_zone from localtime() when TZ is not set


On 25.05.2016 at 10:44, Corinna Vinschen wrote:
On May 25 11:28, KOBAYASHI Shinji wrote:

Any other comments on this topic? Let me explain my proposal again.

The intention of the following code in tzsetwall() should be to pick
up UPPERCASE letters "in ASCII range":

Are you sure you're not mixing ASCII with '8-bit character' range there?

if (isupper(*src)) *dst++ = *src;

NOTE: src is wchar_t *, dst is char *.

As Csaba Raduly pointed out, isw*() functions should be the first
choice if they achieve the desired behavior (select uppercase AND
ASCII).

But it doesn't, so it's not.

However, iswupper() does not fit this purpose, as it
returns 1 for L'\uff21', for example. And I could not find isw*()

In that case, wouldn't it make sense to fix iswupper in the first place?

I don't believe it's been shown to be broken, so there's no need to fix it.

Apart from that, we can work around all problems in tzsetwall by just
checking for

  if (*src >= L'A' && *src <= L'Z')

While that may be possible if it really is ASCII you're looking for, it's perverting the whole reason <ctype.h> and <wctype.h> exist: to make tests like this as independent of the actual character encoding as possible.
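If the goal really is "one of the 26 basic uppercase letters", one way to avoid assuming that 'A'..'Z' are contiguous in the execution character set is to search an explicit list. This is only a sketch; is_basic_upper() is a hypothetical helper, not something from the Cygwin sources:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if wc is one of the 26 basic uppercase
 * letters. It relies only on the wide literal below, not on the letters
 * having consecutive code values. */
int is_basic_upper(wchar_t wc)
{
    static const wchar_t letters[] = L"ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* wcschr() would also match the terminating null, so exclude it. */
    return wc != L'\0' && wcschr(letters, wc) != NULL;
}
```

Unlike iswupper(), this does not depend on the locale, so a fullwidth letter such as L'\uff21' is rejected.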

Here's what I wrote last week, but apparently only to Csaba Raduly:

On 20.05.2016 at 09:09, Csaba Raduly wrote:

> If the type of those members is WCHAR[] then using isascii() /
> isupper() on them is just plain wrong.

Absolutely. The argument type of isupper() and friends is 'int', not 'unsigned char'. But the _only_ allowed argument values are those in the range of unsigned char, plus EOF. For typical systems, that means the allowed argument range of is*() is -1 ... 255 inclusive. Calling these Standard Library functions with any other argument causes undefined behaviour.

That leaves three sensible ways of calling isupper() in portable code:

*) isupper(foo)  # where type of foo is unsigned char
*) isupper((unsigned char)bar) # where bar is signed char, or plain char
*) isupper(baz) # where baz was got from fgetc() or similar

All other call patterns are simply wrong, or at least non-portable. In particular, passing a wchar_t to any of the <ctype.h> functions is wrong every time.
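To illustrate the second pattern on an ordinary char string, here is a minimal sketch. count_upper() is a hypothetical helper made up for this example:

```c
#include <ctype.h>

/* Hypothetical helper: counts the uppercase characters in a plain
 * char string, according to the current locale. */
int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; ++s) {
        /* Plain char may be signed; converting through unsigned char
         * keeps the argument inside the range isupper() accepts. */
        if (isupper((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Without the cast, a char holding a negative value other than EOF would be passed straight to isupper(), which is exactly the undefined case described above.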

> The correct function to use would be iswupper().

Actually, is*upper() isn't even the real problem here. The whole idea of copying a wchar_t string into a char one, element by element, is most likely nonsensical. A wchar_t cannot be assumed to just fit into a char, regardless of whether iswupper() returned true on it or not. E.g. what do we expect this to do with an upper-case Greek or Cyrillic letter?

A proper solution may have to be more like this:

    int mapped = wctob(*src);
    /* this call is safe now because of how wctob() works: */
    if (isupper(mapped)) {
       *dst++ = (unsigned char)mapped;
    }
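Wrapped into a complete function, the fragment above might look roughly like this. It is a sketch, not the actual tzsetwall() code: the function name, the buffer-size parameter and the truncation behaviour are all assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: copies into dst those characters of src that
 * have a single-byte representation in the current locale and are
 * uppercase there. Always null-terminates when dstsize > 0 and
 * returns the number of characters stored (excluding the null). */
size_t copy_upper_bytes(char *dst, size_t dstsize, const wchar_t *src)
{
    size_t n = 0;

    if (dstsize == 0)
        return 0;

    for (; *src != L'\0' && n + 1 < dstsize; ++src) {
        /* wctob() yields either EOF or a value in unsigned char range,
         * both of which are valid arguments for isupper(). */
        int mapped = wctob((wint_t)*src);

        if (isupper(mapped))
            dst[n++] = (char)(unsigned char)mapped;
    }
    dst[n] = '\0';
    return n;
}
```

In the default "C" locale, feeding it L"W. Europe Daylight Time" should leave "WEDT" in the buffer. Wide characters without a single-byte form make wctob() return EOF, for which isupper() returns 0, so they are skipped.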

>> So, I propose to call isascii() to assure the wchar_t fits in the
>> range of ASCII before calling isupper().

Calling isascii() would be wrong for the same reasons calling isupper() is.





