This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/2648] localedata/locales/es_ES has incorrect LC_COLLATE <space> handling
- From: "mfabian at suse dot de" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 9 May 2006 16:02:40 -0000
- References: <20060509154921.2648.mfabian@suse.de>
- Reply-to: sourceware-bugzilla at sourceware dot org
------- Additional Comments From mfabian at suse dot de 2006-05-09 16:02 -------
Comment from Markus Kuhn from the Novell Bugzilla:
Comment #4 From Markus Kuhn 2006-03-21 11:24 MST
Glibc implements a 4-pass sorting algorithm, something like the Unicode
Collation Algorithm defined at
http://www.unicode.org/reports/tr10/
or equivalently the International String Ordering defined in ISO 14651. SPACE
is not ignored; it affects the sorting order, but only with lower priority
than
- the base characters
- accents
- whether base characters are uppercase or lowercase
At level 4, space is treated like punctuation.
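To make the effect of the levels concrete, here is a minimal Python sketch of
a four-level sort key. It is a toy model, not glibc's implementation: the
function name sort_key and all weights are made up for illustration, and it
assumes precomposed input characters. Levels are compared in order, so SPACE
only breaks ties that survive the first three levels.

```python
import unicodedata

def sort_key(s):
    """Toy 4-level key: base letters, accents, case, then space/punctuation."""
    level1, level2, level3, level4 = [], [], [], []
    for position, ch in enumerate(s):
        if not ch.isalnum():
            # Space and punctuation: ignored at levels 1-3, weighted at level 4.
            level4.append((position, ord(ch)))
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        level1.append(base.lower())    # level 1: base character
        level2.append(marks)           # level 2: accents (combining marks)
        level3.append(base.isupper())  # level 3: lowercase before uppercase
    # Tuples compare element by element, i.e. level by level.
    return (level1, level2, level3, level4)

words = ["deluxe", "de luxe", "dem", "de la"]

# Plain code point order: SPACE (0x20) sorts before every letter.
print(sorted(words))                # ['de la', 'de luxe', 'deluxe', 'dem']

# Multi-level order: SPACE matters only once levels 1-3 are equal.
print(sorted(words, key=sort_key))  # ['de la', 'deluxe', 'de luxe', 'dem']
```

In the second ordering "deluxe" and "de luxe" are adjacent because they are
identical at levels 1 to 3; only the level 4 entry for the space separates
them.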
The Unicode sorting algorithm has lots of options. If you look at
http://www.unicode.org/reports/tr10/#Variable_Weighting
you will see that variable weighting options are available for characters such
as SPACE. Perhaps the UTF-8 locales were configured to use something equivalent
to the "blanked" option, whereas what the user expects here is the
"non-ignorable" option?
It is up to the locale designer to choose these options, and I suspect the
necessary discussion on which options are best here has never taken place.
The culprit is probably in the file
/usr/share/i18n/locales/iso14651_t1
specifically the line
<U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>
which says that SPACE is sorted at level 4 only, i.e. with lowest priority. I
don't think this is a particularly good choice.
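For anyone who wants to check how other characters are weighted, here is a
small Python sketch that splits such a line into its per-level weights. The
helper names are hypothetical, and it assumes the simple layout of the line
quoted above: a symbol, semicolon-separated weights, and "#" starting a
trailing comment. It does not handle quoted multi-symbol weights.

```python
def parse_collation_line(line, comment_char="#"):
    """Split '<sym> w1;w2;w3;w4 # comment' into (symbol, [weights])."""
    body = line.split(comment_char, 1)[0].strip()
    symbol, weights = body.split(None, 1)
    return symbol, [w.strip() for w in weights.split(";")]

def first_significant_level(weights):
    """Return the first level (1-based) whose weight is not IGNORE, or None."""
    for level, weight in enumerate(weights, start=1):
        if weight != "IGNORE":
            return level
    return None

line = "<U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>"
symbol, weights = parse_collation_line(line)
print(symbol, weights, first_significant_level(weights))
```

For the SPACE line this reports level 4 as the first level with a weight.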
File format spec:
http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14652.pdf
People like Ulrich Drepper, Alain LaBonté, and Keld J. Simonsen would know
more about the origins of this.
--
http://sourceware.org/bugzilla/show_bug.cgi?id=2648