This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
- To: xsl-list at mulberrytech dot com
- Subject: Re: Bug in 'xsl:sort'. ( XT vs SAXON. )
- From: Paul Tchistopolskii <paul at qub dot com>
- Date: Sat, 05 Aug 2000 16:25:35 -0700
- Organization: The Qub Group
- References: <3.0.1.32.20000805192624.009f1340@localhost>
- Reply-To: xsl-list at mulberrytech dot com
----- Original Message -----
From: Jeni Tennison
> If you go a little further on in the XSLT Recommendation, it says:
>
> "NOTE: It is possible for two conforming XSLT processors not to sort
> exactly the same. Some XSLT processors may not support some languages.
> Furthermore, there may be variations possible in the sorting of any
> particular language that are not specified by the attributes on xsl:sort,
> for example, whether Hiragana or Katakana is sorted first in Japanese.
This is not the case here, right? (Actually, I don't understand
why anything other than UTF-* should be supported
by W3C standards, but that's another story.)
> Future versions of XSLT may provide additional attributes to provide
> control over these variations. Implementations may also use
> implementation-specific namespaced attributes on xsl:sort for this.
This is also not the case here, right?
> NOTE: It is recommended that implementers consult [UNICODE TR10] for
> information on internationalized sorting."
>
> The values should be sorted "lexicographically in the culturally correct
> manner for the language specified by lang" but I guess the question arises
> in English (as it does in other languages) about whether '-' is
> lexicographically before '0' or not.
Right. But I'm not sure the question is about 'English'. I think the
question really is 'in UTF8'?
> If you follow up the UNICODE reference, there is a file that gives the
> order for sorting just about every character you can think of
> [http://www.unicode.org/unicode/reports/tr10/basekeys.txt]. In this file,
> various sorts of hyphens:
>
> 00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN
<cut/>
> come before (i.e. should be sorted before) various forms of 0:
> 0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO
<cut/>
> This would imply that '-1' should be before '0' because '-' sorts before
> '0'. However, on
> [http://www.unicode.org/unicode/reports/tr10/index.html#Alternate
> Weighting] there is some extra stuff about options involving the weighting
> of hyphens (& various other characters) that might contradict this but that
> I can't get my head around right now.
Looks like this is correct.
String minus_one = "-1";
String zero = "0";
System.out.println( zero.compareTo( minus_one ) );
prints 3
( this means zero is greater than minus_one ).
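For anyone who wants to try this, here is a minimal runnable sketch of the same check. The class and method names are made up for illustration. Note that String.compareTo compares UTF-16 code units with no language rules at all, while java.text.Collator is the locale-aware comparison. What the Collator call returns depends on the collation rules shipped with the Java runtime, so no result is claimed for it here.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of XT or SAXON.
public class HyphenVsZero {
    // Plain String.compareTo: compares UTF-16 code units, no language rules.
    static int codeUnitOrder(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-aware comparison; the sign of the result depends on the
    // collation rules of the Java runtime in use.
    static int collatorOrder(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        // '0' is U+0030 and '-' is U+002D, so this prints 3.
        System.out.println(codeUnitOrder("0", "-1"));
        // Runtime-dependent: may or may not agree with the line above.
        System.out.println(collatorOrder("0", "-1", Locale.ENGLISH));
    }
}
```

If the two lines disagree in sign on some runtime, that alone would show how two processors can end up with different sort orders for the same input.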
This is really interesting, huh? How many documents should you read
to understand what comes first, '-' or '0'?
> I don't think that either SAXON or XT is 'right'. They employ different
> sort orders,
Why? There are no special encodings or special sorting attributes.
Both engines receive the same 'lang' environment (or don't they?),
so why do they employ different sort orders?
> but from what I can gather, it's fine for them to do so and
> still both be compliant.
I still think something is strange here. They are both sorting UTF8 (?)
without any of the special cases mentioned in the W3C paper, and the
question is: "in UTF8 (?), what comes first, '-' or '0'?" Right?
Is it legal for them to give different answers to the same question?
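To make the question concrete, here is a rough sketch of two orderings that give different answers for the same three strings. This is an assumption-laden illustration only: it is not how XT or SAXON actually sort, and the second comparator is a crude stand-in for the TR10 idea of treating the hyphen as ignorable on the first pass, not an implementation of the Unicode Collation Algorithm. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; neither comparator is taken from XT or SAXON.
public class TwoSortOrders {
    // Order 1: the hyphen is an ordinary character (plain code unit order).
    static final Comparator<String> HYPHEN_COUNTS = Comparator.naturalOrder();

    // Order 2: the hyphen is ignored on the first pass and only breaks ties.
    static final Comparator<String> HYPHEN_IGNORED =
        Comparator.comparing((String s) -> s.replace("-", ""))
                  .thenComparing(Comparator.naturalOrder());

    // Returns a sorted copy so the input list is left untouched.
    static List<String> sorted(List<String> in, Comparator<String> order) {
        List<String> out = new ArrayList<>(in);
        out.sort(order);
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("1", "0", "-1");
        System.out.println(sorted(values, HYPHEN_COUNTS));  // [-1, 0, 1]
        System.out.println(sorted(values, HYPHEN_IGNORED)); // [0, -1, 1]
    }
}
```

Both orderings are deterministic and defensible, yet they disagree on whether '-1' comes before '0'.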
> Eventually the differences between them should be
> diminished through the specification of additional attributes.
Pardon, what attributes do you mean?
I now think maybe this is a bug in XT?
Rgds.Paul.
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list