This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Collation Confusion


Greetings,

I've been looking into a collation problem with the locales that use
Ethiopic script after discovering some special cases where they do not
sort as expected.  I've found a brute-force solution that I don't
much like, in large part because I don't fully understand the problem.

I found that the same case comes up with Latin script as well; it is
just amplified with Ethiopic.  Here's the most basic example that I can
come up with.  Under the en_US.UTF-8 locale the following list of
strings sorts in this order:

a
A
aa
Aa
ab
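
For anyone who wants to reproduce this, here is a minimal sketch in
Python (my choice for illustration only; the helper name
sort_with_locale is made up).  It assumes the en_US.UTF-8 locale is
installed and simply hands the comparison to the C library's strcoll():

```python
import functools
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the C library's strcoll() under the given locale."""
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        # Raises locale.Error if the requested locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore whatever collation locale was active before the call.
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    try:
        for word in sort_with_locale(["ab", "Aa", "aa", "A", "a"], "en_US.UTF-8"):
            print(word)
    except locale.Error:
        print("en_US.UTF-8 is not available on this system")
```

The result depends on the locale data installed, so other systems or
glibc versions may well order the list differently.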

What I don't understand is why "Aa" precedes "ab".  My expectation
was that sorting is based first on string length, and then proceeds
left to right, comparing the characters at each successive position
in turn.

In this case it appears that the "a" following "A" in "Aa" took
precedence over the "a" before the "b" in "ab".  But, shouldn't
sorting have stopped at the "A" and gone no further?

The sorting appears to be case insensitive when "Aa" is compared with
"ab".  Looking into the iso14651_t1 file I find:

--------------------------8<---------------------------------

<a>
<b>
 :

<MIN>
<CAP>
  :

order_start <LATIN>;forward;backward;forward;forward,position

<U0041> <a>;<BAS>;<CAP>;IGNORE # 319 A
<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a
<U0062> <b>;<BAS>;<MIN>;IGNORE # 210 b
   :

--------------------------8<---------------------------------

which does distinguish case (<CAP> versus <MIN>).  So the way I read
it I still expect a sort order as per:

a
A
aa
ab
Aa
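
To make that expectation concrete, here is a small Python sketch of
the comparison as I imagined it.  It illustrates my reading only, not
what glibc actually does; the names WEIGHTS, BASE_ORDER, CASE_ORDER
and expected_key are invented, and the table covers just the three
characters from the excerpt above:

```python
# Weights copied from the excerpt: character -> (base symbol, case symbol).
WEIGHTS = {
    "A": ("a", "CAP"),
    "a": ("a", "MIN"),
    "b": ("b", "MIN"),
}

# Declaration order of the symbols in the excerpt: <a> before <b>,
# <MIN> before <CAP>.
BASE_ORDER = ["a", "b"]
CASE_ORDER = ["MIN", "CAP"]


def expected_key(s):
    """Sort key for my expectation: length first, then each character
    position left to right, with base letter and case decided together."""
    per_char = []
    for ch in s:
        base, case = WEIGHTS[ch]
        per_char.append((BASE_ORDER.index(base), CASE_ORDER.index(case)))
    return (len(s), per_char)


if __name__ == "__main__":
    for word in sorted(["ab", "Aa", "aa", "A", "a"], key=expected_key):
        print(word)
```

Under these assumptions "A" already sorts after "a" at the first
character position, so "Aa" ends up after "ab".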

What am I missing?  If this is indeed the way collation should
occur, how can the collation statements above be modified to
sort with "Aa" following "ab"?

thanks!

/Daniel

