This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Question about several characters


Hi glibc-experts,

I'm not a locale specialist, but I was asked some questions about
certain characters that I'm not sure about. Hopefully somebody can answer
them (I'm using glibc 2.1.92 from RH 7.0):

The no-break space <U00A0> is _not_ defined as a space or as a printable
character in several locales. Should it be?

In locales using ISO-8859-2, should the characters <U02D8> BREVE,
<U02DB> OGONEK, <U02C7> CARON and <U02DD> DOUBLE ACUTE ACCENT be defined
as printable and punctuation?

In ISO-8859-8, is it correct that the characters <U200E> LEFT-TO-RIGHT MARK
and <U200F> RIGHT-TO-LEFT MARK should not be defined as graphical?

In ko_KR.KSC5601, should the characters <U0003> END OF TEXT,
<U0004> END OF TRANSMISSION, <U0017> END OF TRANSMISSION BLOCK and
<U0019> END OF MEDIUM be defined as control characters?
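
The class questions above can be checked directly against an installed
locale. The following is only a minimal sketch: the helper names
classify_wchar and report_class are made up for illustration, and it
assumes that wchar_t values are Unicode code points, which depends on the
C library and should be verified for the version in use.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Bit flags for the LC_CTYPE classes asked about above. */
enum {
    CLS_SPACE = 1 << 0,
    CLS_PRINT = 1 << 1,
    CLS_GRAPH = 1 << 2,
    CLS_PUNCT = 1 << 3,
    CLS_CNTRL = 1 << 4
};

/* Hypothetical helper: return the set of classes that the currently
   selected LC_CTYPE locale assigns to wc. */
static unsigned classify_wchar(wint_t wc)
{
    unsigned flags = 0;
    if (iswspace(wc)) flags |= CLS_SPACE;
    if (iswprint(wc)) flags |= CLS_PRINT;
    if (iswgraph(wc)) flags |= CLS_GRAPH;
    if (iswpunct(wc)) flags |= CLS_PUNCT;
    if (iswcntrl(wc)) flags |= CLS_CNTRL;
    return flags;
}

/* Hypothetical helper: select a locale by name and print the classes of
   one code point. Returns -1 if the locale cannot be selected. */
static int report_class(const char *locale_name, wint_t wc)
{
    unsigned f;

    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    f = classify_wchar(wc);
    printf("%s U+%04lX:%s%s%s%s%s\n", locale_name, (unsigned long)wc,
           (f & CLS_SPACE) ? " space" : "",
           (f & CLS_PRINT) ? " print" : "",
           (f & CLS_GRAPH) ? " graph" : "",
           (f & CLS_PUNCT) ? " punct" : "",
           (f & CLS_CNTRL) ? " cntrl" : "");
    return 0;
}
```

Calling report_class with a locale name and 0x00A0, 0x02D8, 0x200E or
0x0003 would then list the classes for that character. The locale name to
pass depends on what is installed on the system.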

In BIG5 there are unmapped characters 
(cut from BIG5_
% Unmapped Big5 characters:
%     /xa2/xcc, /xa2/xce, /xf9/xe9, /xf9/xea, /xf9/xeb,
%     /xf9/xf9, /xf9/xfa, /xf9/xfb, /xf9/xfc, /xf9/xfd
%
)
but they should still be reported as double-byte characters (for
instance by mblen), shouldn't they? I looked at the Unicode 3.0 CD and
found in Mappings/EASTASIA/OTHER/BIG5:
...
#       We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
#               It is also possible to map these characters to their duplicates, or to
#               the user zone.
#
...
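
The mblen question above can be reproduced with a few lines of C. This is
only a sketch: the helper name mblen_in_locale is made up for
illustration, and the name of the Big5 locale (for example "zh_TW.Big5")
is an assumption that depends on the installation.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: return what mblen() reports for a byte sequence
   in the named locale. The result is the character length in bytes,
   0 for a NUL byte, -1 for an invalid sequence, or -2 if the locale
   cannot be selected. */
static int mblen_in_locale(const char *locale_name,
                           const char *bytes, size_t n)
{
    assert(bytes != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;

    (void)mblen(NULL, 0);   /* reset any conversion state first */
    return mblen(bytes, n);
}
```

A call such as mblen_in_locale("zh_TW.Big5", "\xa2\xcc", 2) would show
how one of the unmapped sequences is treated: a result of 2 would mean it
is accepted as a double-byte character, and -1 would mean it is rejected.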

I hope somebody can help me. Best regards,
Martin
