This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.


Re: [PATCH/RFA] Fix ctype table and isblank


Corinna Vinschen wrote:
On Apr 8 09:13, Wizards' Guild wrote:
Yes, we really should have an "alpha" flag AND a "blank" flag with the
new semantics.

An "alpha" and "printable" flag. We already have "blank", so I don't understand what you're trying to accomplish with a new one.

The obvious and common approach would be to widen the
table entries. I'm not a big fan of this because it bloats the
small-footprint systems. This is maybe why it hasn't been done
already?

Keep in mind that until a couple of days ago newlib didn't even *have* extended charset support. The problems we're discussing are a direct result of having more than just ASCII tables.

The problem is smaller targets.  Maybe it would be a feasible approach
to stick to one single 8-bit table (for ASCII) in case of small targets
and to provide everything as 16-bit tables for targets which don't care
about the few extra KB.

And widening the tables introduces a new problem for Cygwin, which has
to keep some of the "old" stuff to maintain backward compatibility with
applications built under an older version.  It would have to maintain
the old ASCII ctype table, and copy over the data
from the new tables in a more tricky way; we would have to keep the
meaning of the current bit values and only extend flags to the upper
bytes so that Cygwin can deal with that for older apps.  New apps would
immediately profit from the new tables, of course.
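
A minimal sketch of that layout, assuming the low byte of each 16-bit entry keeps the existing 8-bit flag meanings and new flags live only in the upper byte. All SK_ names, the two upper-byte flags, and the helper functions are hypothetical illustrations, not newlib or Cygwin identifiers:

```c
#include <stdint.h>

/* Low byte: assumed to keep the meaning of the current 8-bit flags. */
#define SK_U 0x0001u /* upper case  */
#define SK_L 0x0002u /* lower case  */
#define SK_N 0x0004u /* digit       */
#define SK_S 0x0008u /* whitespace  */
#define SK_P 0x0010u /* punctuation */
#define SK_C 0x0020u /* control     */
#define SK_X 0x0040u /* hex digit   */
#define SK_B 0x0080u /* blank       */

/* Upper byte: hypothetical new flags, invisible to old binaries. */
#define SK_ALPHA 0x0100u
#define SK_PRINT 0x0200u

/* The entry an application built against the old 8-bit table would see. */
uint8_t sk_legacy_entry(uint16_t wide)
{
    return (uint8_t)(wide & 0x00FFu);
}

/* Copy the low bytes of a wide table into an old-style 8-bit table. */
void sk_fill_legacy(uint8_t *dst, const uint16_t *src, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        dst[i] = sk_legacy_entry(src[i]);
}
```

Because the old bits never change position, the compatibility copy is a plain mask, and old applications see the same flag values they were compiled against.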

However, *if* we really do that, now would be the time.  Cygwin is on
the verge of a new major release and since the extended ctype support is
only available starting with this new release, we could use the
opportunity to widen the character class tables now.

Another approach would be to keep the "C" locale table and macros, but
if extended charsets are supported just convert everything to UNICODE
and handle it there.

Oh, please no...


Some functions, such as tolower, are already
using this approach.

Only for chars > 0x80. But that was meant as a temporary solution.


I don't care for the "half hardcoded" variation of
isblank; seems like an accident waiting to happen.

It's exactly what we need for an *immediate* fix. _B covers all types of spaces (SPC, NBSP), and the TAB is handled separately so that it is not caught by isprint().
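
To illustrate the point, here is a rough sketch with a tiny stand-in table. It assumes isprint() tests the blank flag along with the other printable classes, which is why TAB cannot simply be given that flag. The SK_ names, the table contents, and both functions are made up for the example and are not the actual patch:

```c
#define SK_P 0x10u /* punctuation (stand-in for the other printable classes) */
#define SK_B 0x80u /* blank */

/* Tiny stand-in table: only three entries carry flags. */
static const unsigned char sk_flags[256] = {
    [' ']  = SK_B,
    ['!']  = SK_P,
    [0xA0] = SK_B, /* NBSP in an ISO-8859-1 style table */
};

/* Blank flag from the table, plus the hard-coded TAB. */
int sk_isblank(int c)
{
    unsigned char u = (unsigned char)c;
    return (sk_flags[u] & SK_B) != 0 || u == '\t';
}

/* Printable test that includes the blank flag, so TAB must not carry it. */
int sk_isprint(int c)
{
    return (sk_flags[(unsigned char)c] & (SK_P | SK_B)) != 0;
}
```

With this split, TAB is blank but not printable, while SPC and NBSP are both.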

Both _N and _X are locale-invariant, making them good candidates for
removal from the ctype table. If we wanted to recover ONE flag, I'd
take _N rather than _X. In this case add _X to the digits and use _X
in isprint, isgraph, and isalnum. It is possible to implement isdigit
as a standard macro with single evaluation:

#define isdigit(c) ((unsigned)((c)-'0')<=9)
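
A short sketch of why this works: a value below '0' gives a negative difference, which the cast turns into a very large unsigned number, so a single comparison covers both ends of the range. The macro is renamed SK_ISDIGIT here (a hypothetical name) only to avoid clashing with <ctype.h>, and sk_count_digits is an illustrative helper:

```c
/* Same expression as above under a hypothetical name. */
#define SK_ISDIGIT(c) ((unsigned)((c) - '0') <= 9)

/* Count decimal digits in a string. The argument has a side effect
   (s++), which is safe because the macro evaluates it only once. */
int sk_count_digits(const char *s)
{
    int n = 0;
    while (*s != '\0') {
        if (SK_ISDIGIT(*s++))
            n++;
    }
    return n;
}
```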

Hard choices...

I don't think we have much of a choice. Either we stick to the current approach and my isblank() fix, or we widen the tables.

I think the following makes sense in light of the fact that isalpha isn't properly supported in the
present scheme.


1. Widen the tables (16 bits) and create a new ctype ptr with a new name.
2. Keep the old ctype ptr pointing to the old ASCII table.
3. Support either the new or the old mechanism in ctype.h based on a sys/config.h flag (e.g. _ASCII_ONLY)
4. Support isblank in the old mechanism as proposed, but add an intermediate variable to
avoid evaluating the argument more than once, or call an internal function (int d = (c), __ctype_ptr[d+1] ...)
5. Cygwin will set itself up to use the new mechanism
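
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the SK_ names, the table contents, the sk_ctype_ptr16 pointer name, and the SK_ASCII_ONLY flag are placeholders standing in for whatever names would actually be chosen.

```c
#include <stdint.h>

#define SK_B     0x0080u /* blank, same bit in both tables   */
#define SK_ALPHA 0x0100u /* hypothetical new upper-byte flag */

/* Slot 0 is reserved for EOF (-1), so lookups index with [c + 1]. */
static const unsigned char sk_old_table[1 + 256] = {
    [1 + ' '] = SK_B,
};
static const uint16_t sk_new_table[1 + 256] = {
    [1 + ' ']  = SK_B,
    [1 + 'a']  = SK_ALPHA,
    [1 + 0xA0] = SK_B, /* NBSP, only known to the wide table */
};

const unsigned char *sk_ctype_ptr   = sk_old_table; /* step 2: old pointer, old table  */
const uint16_t      *sk_ctype_ptr16 = sk_new_table; /* step 1: new pointer, wide table */

/* Step 3: pick the mechanism with a configuration flag. */
#ifdef SK_ASCII_ONLY
#define SK_LOOKUP(d) (sk_ctype_ptr[(d) + 1])
#else
#define SK_LOOKUP(d) (sk_ctype_ptr16[(d) + 1])
#endif

/* Step 4: copy the argument once before using it twice. */
int sk_isblank(int c)
{
    int d = c;
    return (SK_LOOKUP(d) & SK_B) != 0 || d == '\t';
}
```

A platform that leaves SK_ASCII_ONLY undefined gets the wide table; defining it keeps the 8-bit table and the old footprint.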


Existing code that doesn't care about additional charsets won't expand in size and will continue
to work as before. Old code wasn't using isblank before so I am not concerned that the macro is a
little different than the other isxxxx macros. Old code that doesn't recompile will work as before (ASCII) and will continue to link (no change in size). Code wishing to get the new locale support will have the platform
set the new flag and recompile which is reasonable (major release point or new platform altogether).


Comments?

-- Jeff J.
Corinna


