Bug in collation functions?
Corinna Vinschen
corinna-cygwin@cygwin.com
Thu Oct 29 16:14:00 GMT 2015
On Oct 29 08:59, Ken Brown wrote:
> On 10/29/2015 4:30 AM, Corinna Vinschen wrote:
> >On Oct 29 08:50, Corinna Vinschen wrote:
> >>On Oct 28 21:58, Eric Blake wrote:
> >>>On 10/28/2015 04:14 PM, Ken Brown wrote:
> >>>>It's my understanding that collation is supposed to take whitespace and
> >>>>punctuation into account in the POSIX locale but not in other locales.
> >>>
> >>>Not quite right. It is up to the locale definition whether whitespace
> >>>affects collation. But you are correct that in the POSIX locale,
> >>>whitespace must not be ignored in collation.
> >>>
> >>>>This doesn't seem to be the case on Cygwin. Here's a test case using
> >>>>wcscoll, but the same problem occurs with strcoll.
> >>>
> >>>That's because the locale definitions are different in cygwin than they
> >>>are in glibc. But it is not a bug in Cygwin; POSIX allows for different
> >>>systems to have different locale definitions while still using the same
> >>>locale name like en_US.UTF-8.
> >>
> >>Btw, strcoll and wcscoll in Cygwin are implemented using the Windows
> >>function CompareStringW with the LCID set to the locale matching the
> >>POSIX locale setting. I'm rather glad I didn't have to implement this
> >>by myself... :}
> >
> >OTOH, CompareString has a couple of flags to control its behaviour, see
> >https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx
> >
> >Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
> >are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS. I'm open to a
> >discussion how to change the settings to more closely resemble the rules
> >on Linux.
> >
> >E.g. wcscoll simply calls wcscmp rather than CompareStringW for the
> >C/POSIX locale anyway. So, would it makes sense to set the flags to
> >NORM_IGNORESYMBOLS in other locales?
>
> I think so. That's what the native Windows build of emacs does in this
> situation.
Is that all it's doing? I'm asking because using NORM_IGNORESYMBOLS
does not exaclty resemble the behaviour on Linux on my W10 box:
"11" > "1.1" in POSIX locale
!!! "11" > "1.1" in en_US.UTF-8 locale
"11" > "1 2" in POSIX locale
"11" < "1 2" in en_US.UTF-8 locale
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin/attachments/20151029/b000181a/attachment.sig>
More information about the Cygwin
mailing list