This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug regex/23393] Handle [a-z] and [A-Z] in consistent portable fashion regardless of locale.
- From: "fweimer at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Thu, 19 Jul 2018 14:06:20 +0000
https://sourceware.org/bugzilla/show_bug.cgi?id=23393
--- Comment #24 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Carlos O'Donell from comment #22)
> (In reply to Florian Weimer from comment #20)
> > The point Rich and I are making is that there is no requirement in POSIX to
> > have ranges following collation sorting. Our current implementations do
> > this, but it's not required by POSIX. We can change the code (and not the
> > data).
>
> This is not my interpretation.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> ~~~
> 7. In the POSIX locale, a range expression represents the set of collating
> elements that fall between two elements in the collation sequence, inclusive.
> ~~~
>
> We would not meet that rule if we used code points?
For ASCII-based implementations, the collation order and the code point order
are the same. From “LC_COLLATE Category in the POSIX Locale”:
# This is the minimum input for the POSIX locale definition for the
# LC_COLLATE category. Characters in this list are in the same order
# as in the ASCII codeset.
And a cursory glance at the definition suggests that the comment is accurate.
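The claim can also be checked empirically. The following is only a minimal sketch, not anything from the glibc sources: the helper names range_matches and posix_range_agrees_with_code_points are made up for illustration, and it assumes an ASCII-based implementation. It compares what regcomp/regexec report for [a-z] in the "C" locale against a plain code point comparison for every 7-bit character.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if TEXT matches the POSIX basic regular
   expression PATTERN, 0 if it does not, and -1 if PATTERN does not compile.  */
static int
range_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Hypothetical helper: in the "C" (POSIX) locale, check that [a-z] accepts
   exactly the characters whose code points lie between 'a' and 'z'.
   Assumes an ASCII-based implementation.  Returns 1 if the regex result
   and the code point comparison agree for every character 1..127.  */
static int
posix_range_agrees_with_code_points (void)
{
  int c;

  if (setlocale (LC_ALL, "C") == NULL)
    return 0;
  for (c = 1; c < 128; c++)
    {
      char text[2];
      int by_code_point = (c >= 'a' && c <= 'z');

      text[0] = (char) c;
      text[1] = '\0';
      if (range_matches ("[a-z]", text) != by_code_point)
        return 0;
    }
  return 1;
}
```

Calling posix_range_agrees_with_code_points () from a test program should return 1 on an ASCII-based system. The check says nothing about other locales such as en_US, where the result of a range expression is unspecified by POSIX.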
> You argue that the "unspecified behaviour" (not undefined), would be changed?
Yes, it would be changed (or in some cases left unchanged) for the en_US locale
and many common range expressions.