This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug libc/2648] localedata/locales/es_ES has incorrect LC_COLLATE <space> handling


------- Additional Comments From mfabian at suse dot de  2006-05-09 16:02 -------
Comment from Markus Kuhn from the Novell Bugzilla:

Comment #4 From Markus Kuhn 2006-03-21 11:24 MST

Glibc implements a 4-pass sorting algorithm, something like the Unicode
Collation Algorithm defined at

  http://www.unicode.org/reports/tr10/

or, equivalently, the International Standard Ordering defined in ISO 14651.
SPACE is not ignored; it does affect the sorting order, but only with lower
priority than

  - the base characters
  - accents
  - whether base characters are uppercase or lower case

At level 4, space is treated like punctuation.
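
To illustrate the priority order described above, here is a minimal Python
sketch of a four-level sort key. It is a toy model, not glibc's
implementation: the function name toy_sort_key, the use of Unicode
decomposition to separate accents, and the choice to rank lowercase before
uppercase are all assumptions made for this example.

```python
import unicodedata

def toy_sort_key(s):
    """Toy four-level key: base letters, accents, case, then space/punctuation."""
    base, accents, case, special = [], [], [], []
    for ch in s:
        decomposed = unicodedata.normalize("NFD", ch)
        letter = decomposed[0]
        if letter.isalnum():
            base.append(letter.lower())             # level 1: base character
            accents.append(decomposed[1:])          # level 2: combining marks, "" if none
            case.append(1 if letter.isupper() else 0)  # level 3: case (assumed lower first)
        else:
            # Space and punctuation carry no weight at levels 1-3;
            # they are only recorded for the level-4 comparison.
            special.append((len(base), letter))
    # Tuples compare element by element, so level 4 only matters
    # when levels 1-3 are identical.
    return (base, accents, case, special)

words = ["de luca", "deluca", "de lucb", "Deluca", "delucá"]
ordered = sorted(words, key=toy_sort_key)
print(ordered)
```

In this model "de lucb" sorts after every spelling of "deluca", because the
space is only consulted once base letters, accents and case have all compared
equal.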

The Unicode sorting algorithm has lots of options. If you look at

  http://www.unicode.org/reports/tr10/#Variable_Weighting

you will see that variable weighting options are available for characters such
as SPACE. Perhaps the UTF-8 locales were configured to use something equivalent
to the "blanked" option, whereas what the user expects here is the
"non-ignorable" option?

It is up to the locale designer to choose these options, and I suspect the
necessary discussion on which options are best here has never taken place.
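
The practical difference between the two treatments of SPACE can be sketched
at the primary level alone. The following Python example is an illustration
only: the function primary_key and its space_non_ignorable flag are
hypothetical names, and code points stand in for real collation weights.

```python
def primary_key(s, space_non_ignorable):
    """Primary-level key; SPACE either gets a low weight or is dropped."""
    key = []
    for ch in s.lower():
        if ch == " ":
            if space_non_ignorable:
                key.append(0)      # assumed weight: lower than any letter
            # otherwise SPACE contributes nothing at this level
        else:
            key.append(ord(ch))    # stand-in for a real primary weight
    return key

names = ["san juan", "sanchez", "san pedro", "santa fe"]

non_ignorable = sorted(names, key=lambda s: primary_key(s, True))
ignored = sorted(names, key=lambda s: primary_key(s, False))

print(non_ignorable)
print(ignored)
```

With SPACE non-ignorable, the two "san ..." entries group together ahead of
"sanchez"; with SPACE ignored at the primary level, "sanchez" sorts first
because the comparison sees "sanc" against "sanj".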

The culprit is probably in the file

  /usr/share/i18n/locales/iso14651_t1

specifically the line

  <U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>

which says that SPACE is sorted at level 4 only, i.e. with lowest priority. I
don't think this is a particularly good choice.
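
As a rough aid for reading such entries, the Python sketch below splits a
weight line into its symbol and per-level weights and reports the first level
that is not IGNORE. It is a simplified reader written for this one line
shape, not a parser for the full locale file format; the function names
parse_weight_line and first_significant_level are made up for this example.

```python
def parse_weight_line(line):
    """Split '<SYM> w1;w2;w3;w4 # comment' into (symbol, [w1, w2, w3, w4])."""
    body = line.split("#", 1)[0].strip()     # drop the trailing comment
    symbol, weights = body.split(None, 1)    # symbol, then the weight list
    levels = [w.strip() for w in weights.split(";")]
    return symbol, levels

def first_significant_level(levels):
    """Return the 1-based index of the first weight that is not IGNORE."""
    for number, weight in enumerate(levels, start=1):
        if weight != "IGNORE":
            return number
    return None                              # ignored at every level

line = "<U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>"
symbol, levels = parse_weight_line(line)
level = first_significant_level(levels)
print(symbol, levels, level)
```

For the line quoted above this reports level 4, matching the reading that
SPACE only takes part in the last comparison pass.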

File format spec:
http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14652.pdf

People like Ulrich Drepper, Alain LaBonté, and Keld J. Simonsen would know more
about the origins of this.


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2648


