This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.

RE: [PATCH/RFA] Internationalize ctype functionality

From: "Howland Craig D (Craig)" <howland at LGSInnovations dot com>
To: <newlib at sourceware dot org>
Date: Thu, 26 Mar 2009 21:55:17 -0400
Subject: RE: [PATCH/RFA] Internationalize ctype functionality
References: <20090326210123.GS12738@calimero.vinschen.de>

1)  Wouldn't it be cleaner, especially in files in which it happens more
than once, to replace things like:
 
#ifdef __CYGWIN__
char __declspec(dllexport) *__ctype_ptr__ = _ctype_b + 127;
#else
char *__ctype_ptr__ = _ctype_b + 127;
#endif
 
with:
 
#ifdef __CYGWIN__
#define DLLEXPORT	__declspec(dllexport)
#else
#define DLLEXPORT
#endif
char DLLEXPORT *__ctype_ptr__ = _ctype_b + 127;

(given that the only difference between the two lines is the DLL attribute)?
This would make ctype_.c not only more readable but also more maintainable.
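A minimal sketch of the suggested macro pattern in a self-contained file. The names `example_ctype_b` and `example_ctype_ptr` are hypothetical stand-ins for `_ctype_b` and `__ctype_ptr__`, and the table size is an assumption chosen only so that the `+ 127` offset stays in bounds.

```c
/* Expand to the Cygwin export attribute only where it is needed;
   everywhere else DLLEXPORT expands to nothing. */
#ifdef __CYGWIN__
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif

/* Hypothetical stand-in for _ctype_b; the size is an assumption for
   this sketch, large enough for the +127 offset below. */
static char example_ctype_b[128 + 256];

/* One definition serves both builds: off Cygwin this is plain "char *". */
char DLLEXPORT *example_ctype_ptr = example_ctype_b + 127;
```

Every further exported object in the same file then needs only `DLLEXPORT` in its declaration instead of its own `#ifdef` block.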
 
2)  I don't entirely understand the following, possibly due to my lack
of knowledge on the topic:
>- The toupper and tolower functions are now charset independent.  If the
>  character is > 0x7f, it will be converted to wide char and then
>  towupper/towlower is called on it.
>  This is only a temporary solution.  It works, but it's a bit sedated
>  for native characters.  In the long run we should rather add
>  upper/lower-case transformation tables, similar to the new ctype
>  character class tables.
toupper and tolower operate on regular characters, which have a defined
range of unsigned-char-allowed-values and EOF.  How can it work to
change it to a wide character except in the degenerate case when wide
characters are the same width as regular characters?  That is, should
it be gated by a check that MB_CUR_MAX == 1?  It seems dangerous to
try otherwise.  What if lowercase ran from 0xE6-0xFF and uppercase
were 0x100-0x119?  Or even worse, if lc was 0xE5-0xFE and uc was
0xFF-0x118?  So you could convert 'a' but not 'b' through 'z'?  (Where
it is unlikely that a and z are actually the first and last letters,
but I have to use something for sake of the example.)  Or is there an
a priori knowledge of all the character sets being applied that says
this would be OK?
Does it make sense to try at all unless MB_LEN_MAX == 1?  (The user
should be using wide characters, not plain chars, if MB_CUR_MAX can
exceed 1, shouldn't they?)  In that case, "#ifdef _MB_CAPABLE" becomes
"#if defined(_MB_CAPABLE) && MB_LEN_MAX == 1".
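The longer-term alternative in the quoted patch description, per-charset case-mapping tables, would sidestep the out-of-range question for single-byte charsets: each charset gets a 256-entry byte-to-byte table, so a character whose counterpart lies outside the charset simply maps to itself. A minimal sketch, assuming ISO-8859-1 as the charset; `latin1_lower`, `init_latin1_lower` and `table_tolower` are hypothetical names, not newlib code.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical ISO-8859-1 lowercase table, indexed by c + 1 so that
   EOF (-1) has its own slot at index 0. */
static short latin1_lower[1 + 256];
static int latin1_lower_ready = 0;

static void init_latin1_lower(void)
{
    int c;

    latin1_lower[0] = EOF;
    for (c = 0; c < 256; c++) {
        int lc = c;

        if (c >= 'A' && c <= 'Z')
            lc = c + ('a' - 'A');
        /* Latin-1 capitals 0xC0-0xDE, except 0xD7 (multiplication
           sign), map to 0xE0-0xFE. */
        else if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
            lc = c + 0x20;
        latin1_lower[c + 1] = (short) lc;
    }
    latin1_lower_ready = 1;
}

/* Precondition, as for tolower: c is EOF or an unsigned char value.
   The lazy initialisation is for the sketch only and is not
   thread-safe; a real implementation would fill the table when the
   locale is set. */
int table_tolower(int c)
{
    if (!latin1_lower_ready)
        init_latin1_lower();
    return latin1_lower[c + 1];
}
```

The lookup itself costs one array access, which would also address the "a bit sedated" remark about the wide-character detour.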

If this feature is kept, I suggest that
	char s[8] = { c, '\0' };
be changed to:
	char s[MB_LEN_MAX+1] = { c, '\0' };
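A sketch of the gate proposed above, assuming a run-time check is acceptable. `MB_LEN_MAX` is a compile-time upper bound over every locale, while `MB_CUR_MAX` is the maximum for the current locale, so testing `MB_CUR_MAX == 1` is the narrower of the two checks. `gated_toupper` is a hypothetical name, not newlib's toupper. The buffer uses the `MB_LEN_MAX+1` size suggested above, and the `wctomb(...) == 1` test leaves a byte unchanged when its case counterpart does not fit in one byte.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* MB_CUR_MAX, mbtowc, wctomb, wchar_t */
#include <wctype.h>   /* towupper, wint_t */

/* Hypothetical helper: take the wide-char detour only when the
   current locale is single-byte. */
int gated_toupper(int c)
{
    char s[MB_LEN_MAX + 1];   /* wctomb may write up to MB_CUR_MAX bytes */
    wchar_t wc;

    if (c == EOF)
        return c;
    c = (unsigned char) c;
    if (c <= 0x7f)
        return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    if (MB_CUR_MAX != 1)
        return c;             /* multibyte locale: skip the detour */

    s[0] = (char) c;
    s[1] = '\0';
    /* Convert only if the byte is a valid character and its uppercase
       form maps back to exactly one byte. */
    if (mbtowc (&wc, s, 1) == 1
        && wctomb (s, (wchar_t) towupper ((wint_t) wc)) == 1)
        c = (unsigned char) s[0];
    return c;
}
```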
 
3)  (both toupper.c and tolower.c do this)
+#ifdef _MB_CAPABLE
+  if ((unsigned char) c <= 0x7f) 
+    return isupper (c) ? c - 'A' + 'a' : c;
+
+  char s[8] = { c, '\0' };
+  wchar_t wc;
+  if (mbtowc (&wc, s, 1) >= 0
+      && wctomb (s, (wchar_t) towlower ((wint_t) wc)) == 1)
+    c = s[0];
+  return c;
+#else
+  return isupper(c) ? (c) - 'A' + 'a' : c;
+#endif
 
The char s[8] and wchar_t lines will not work, coming in the middle
of a block, unless the compiler is C99 compliant.  Does Newlib assume
(require) C99 compilers?  (I hope so, but don't think so.)

(Interestingly enough, I tried this with gcc 3.4.4 in Cygwin with
-std=c89, and it actually allowed it.  But I know that it will fail
with some gcc flavors even without -std=c89, as I just had it happen
yesterday on a cross compiler.)
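For point 3, a C89-friendly form of the quoted tolower fragment could hoist the declarations to the top of the function. Note that `char s[8] = { c, '\0' };` is also not C89 even at the top of a block, because C89 requires constant expressions in an aggregate initializer list; the sketch fills `s` by assignment instead. As for the gcc 3.4.4 observation, GCC accepts mixed declarations and code as an extension under `-std=c89` and diagnoses them only with `-pedantic` or `-Wdeclaration-after-statement`, which may explain why the test compile passed. Assumptions: `c89_tolower` is a hypothetical name, the `#ifdef _MB_CAPABLE` wrapper is dropped so only the multibyte branch is shown, and this sketch adds an explicit EOF check and an `unsigned char` cast on the result, which the quoted code lacks.

```c
#include <ctype.h>    /* isupper */
#include <limits.h>   /* MB_LEN_MAX */
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* mbtowc, wctomb, wchar_t */
#include <wctype.h>   /* towlower, wint_t */

int c89_tolower(int c)
{
    /* All declarations first, as C89 requires. */
    char s[MB_LEN_MAX + 1];   /* sized as suggested in point 2 */
    wchar_t wc;

    if (c == EOF)
        return c;
    if ((unsigned char) c <= 0x7f)
        return isupper (c) ? c - 'A' + 'a' : c;

    /* No non-constant aggregate initializer in C89: assign instead. */
    s[0] = (char) c;
    s[1] = '\0';
    if (mbtowc (&wc, s, 1) >= 0
        && wctomb (s, (wchar_t) towlower ((wint_t) wc)) == 1)
        c = (unsigned char) s[0];   /* keep the result non-negative */
    return c;
}
```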
 
Craig

Follow-Ups:
- Re: [PATCH/RFA] Internationalize ctype functionality
  - From: Corinna Vinschen

References:
- [PATCH/RFA] Internationalize ctype functionality
  - From: Corinna Vinschen