This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Question about several characters
- To: "'libc-alpha at sources dot redhat dot com'" <libc-alpha at sources dot redhat dot com>
- Subject: Question about several characters
- From: "Strassburger, Martin" <martin dot strassburger at sap dot com>
- Date: Thu, 28 Sep 2000 14:03:47 +0200
Hi glibc-experts,
I'm not a locale specialist, but I was asked some questions about
certain characters that I'm not sure about. Hopefully somebody can answer
them (I'm using glibc 2.1.92 of RH7.0):
The no-break space <U00A0> is _not_ defined as a space and printable
character in several locales. Should it be?
In locales using ISO-8859-2, should the characters <U02D8> BREVE,
<U02DB> OGONEK, <U02C7> CARON and <U02DD> DOUBLE ACUTE ACCENT be defined
as printable and punctuation?
In ISO-8859-8, should the characters <U200E> LEFT-TO-RIGHT MARK and
<U200F> RIGHT-TO-LEFT MARK not be defined as graphical?
In ko_KR.KSC5601, should the characters <U0003> END OF TEXT,
<U0004> END OF TRANSMISSION, <U0017> END OF TRANSMISSION BLOCK and
<U0019> END OF MEDIUM be defined as control characters?
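The class questions above can be checked empirically on a given system.
The following is only a minimal sketch: the CLS_* flags and the two
helper functions are hypothetical names made up for this illustration
(they are not part of glibc); only setlocale() and the isw*() functions
are standard. It reports which classes the selected LC_CTYPE locale
assigns to one wide character.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical flag names for this sketch; not part of glibc. */
enum {
    CLS_SPACE = 1 << 0,
    CLS_PRINT = 1 << 1,
    CLS_GRAPH = 1 << 2,
    CLS_PUNCT = 1 << 3,
    CLS_CNTRL = 1 << 4
};

/* Return a bitmask of the classes that the current LC_CTYPE locale
   assigns to the wide character wc. */
unsigned classify_wchar(wint_t wc)
{
    unsigned mask = 0;

    if (iswspace(wc)) mask |= CLS_SPACE;
    if (iswprint(wc)) mask |= CLS_PRINT;
    if (iswgraph(wc)) mask |= CLS_GRAPH;
    if (iswpunct(wc)) mask |= CLS_PUNCT;
    if (iswcntrl(wc)) mask |= CLS_CNTRL;
    return mask;
}

/* Switch LC_CTYPE to locale_name and classify wc there.
   Returns 1 on success, 0 if the locale is not available. */
int classify_in_locale(const char *locale_name, wint_t wc, unsigned *mask_out)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return 0;
    *mask_out = classify_wchar(wc);
    return 1;
}
```

For example, calling classify_in_locale() with 0x00A0 and the name of an
installed ISO-8859-2 locale shows whether CLS_SPACE and CLS_PRINT are set
for the no-break space there. Which locale names exist depends on the
installation, so the return value should be checked before the mask is used.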
In BIG5 there are unmapped characters
(cut from BIG5_
% Unmapped Big5 characters:
% /xa2/xcc, /xa2/xce, /xf9/xe9, /xf9/xea, /xf9/xeb,
% /xf9/xf9, /xf9/xfa, /xf9/xfb, /xf9/xfc, /xf9/xfd
%
)
but they should still be reported as double-byte characters (e.g. by mblen),
shouldn't they? I looked at the Unicode CD 3.0 and found in
Mappings/EASTASIA/OTHER/BIG5:
...
# We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
# It is also possible to map these characters to their duplicates, or to
# the user zone.
#
...
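What mblen actually reports for such a byte pair can be tested directly.
This is a minimal sketch, not a definitive test: the helper name
mb_length_in_locale and the LOCALE_UNAVAILABLE sentinel are hypothetical,
and the name of a Big5 locale (something like zh_TW.Big5) is an
assumption that depends on what is installed.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical sentinel for this sketch: the requested locale
   could not be selected. */
#define LOCALE_UNAVAILABLE (-2)

/* Select locale_name for LC_CTYPE and return what mblen() reports for
   the first multibyte character in bytes[0..n-1]: the number of bytes
   it occupies, 0 for a null character, or -1 if the bytes do not form
   a valid character in that locale. */
int mb_length_in_locale(const char *locale_name, const char *bytes, size_t n)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return LOCALE_UNAVAILABLE;

    (void) mblen(NULL, 0);   /* reset any conversion state */
    return mblen(bytes, n);
}
```

With a Big5 locale selected, passing the two bytes "\xa2\xcc" with n = 2
shows whether the unmapped pair is reported as a double-byte character
(result 2) or rejected (result -1).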
I hope somebody can help me. Best regards,
Martin