This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: Bug in 'xsl:sort'. ( XT vs SAXON. )



----- Original Message ----- 
From: Jeni Tennison 

> If you go a little further on in the XSLT Recommendation, it says:
> 
> "NOTE: It is possible for two conforming XSLT processors not to sort
> exactly the same. Some XSLT processors may not support some languages.
> Furthermore, there may be variations possible in the sorting of any
> particular language that are not specified by the attributes on xsl:sort,
> for example, whether Hiragana or Katakana is sorted first in Japanese.

This is not the case here, right? (Actually, I don't understand
why anything other than UTF-* should be supported
by W3C standards, but that's another story.)

> Future versions of XSLT may provide additional attributes to provide
> control over these variations. Implementations may also use
> implementation-specific namespaced attributes on xsl:sort for this.

This is also not the case, right?

> NOTE: It is recommended that implementers consult [UNICODE TR10] for
> information on internationalized sorting."
> 
> The values should be sorted "lexicographically in the culturally correct
> manner for the language specified by lang" but I guess the question arises
> in English (as it does in other languages) about whether '-' is
> lexicographically before '0' or not.

Right. But I'm not sure the question is about 'English'. I think the
question really is 'in UTF-8'?
 
> If you follow up the UNICODE reference, there is a file that gives the
> order for sorting just about every character you can think of
> [http://www.unicode.org/unicode/reports/tr10/basekeys.txt].  In this file,
> various sorts of hyphens:
> 
> 00AD ; [*020B.0020.0002.00AD] # SOFT HYPHEN
<cut/>

> come before (i.e. should be sorted before) various forms of 0:

> 0030 ; [.06B9.0020.0002.0030] # DIGIT ZERO

<cut/>
 
> This would imply that '-1' should be before '0' because '-' sorts before
> '0'.  However, on
> [http://www.unicode.org/unicode/reports/tr10/index.html#Alternate
> Weighting] there is some extra stuff about options involving the weighting
> of hyphens (& various other characters) that might contradict this but that
> I can't get my head around right now.

Looks like this is correct.

String minus_one = "-1";
String zero = "0";
System.out.println( zero.compareTo( minus_one ) );

prints 3
( this means zero is greater than minus_one ).
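
The same check as a self-contained class, in case anyone wants to run it. As far as the JDK documentation goes, String.compareTo compares UTF-16 code units, so the 3 is simply '0' (U+0030) minus '-' (U+002D); it says nothing about any language. The class name CodeUnitOrder and the sample values are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of XT or SAXON.
public class CodeUnitOrder {
    // Returns what String.compareTo reports: the difference between the
    // first differing UTF-16 code units (or the length difference).
    static int compare(String a, String b) {
        return a.compareTo(b);
    }

    // Sorts a copy of the input using String's natural (code unit) order.
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // '0' is 0x30 and '-' is 0x2D, so this prints 3.
        System.out.println(compare("0", "-1"));
        // Prints [-1, 0, 10]: '-' sorts first, and "0" is a prefix of "10".
        System.out.println(sorted(Arrays.asList("0", "-1", "10")));
    }
}
```

Note that this is a plain character-code ordering, which is not necessarily what 'lang' in xsl:sort asks for.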

This is really interesting, huh? How many documents should you have to read
to understand which comes first, '-' or '0'?
 
> I don't think that either SAXON or XT is 'right'.  They employ different
> sort orders, 

Why? There are no special encodings or special sorting attributes.
Both engines receive the same 'lang' environment (or don't they???),
so why do they employ different sort orders?

> but from what I can gather, it's fine for them to do so and
> still both be compliant.  

I still think something is strange here. They are both sorting UTF-8 (?)
without any of the special cases mentioned in the W3C paper, and the
question is: "in UTF-8 (?), which comes first, '-' or '0'?" Right?
Is it legal for them to give different answers to the same question?
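
One way to see why two answers are possible: UTF-8 is only an encoding, whereas 'lang' asks for a language-specific collation, and those are two separate comparisons. The sketch below puts them side by side in Java, using java.text.Collator for the second one. It is an assumption-laden illustration only: the class name LangOrder is made up, and nothing here is claimed to be what XT or SAXON actually do internally. The collator's answer depends on the JDK's rules for the locale, so no expected output is given for it.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; not taken from any XSLT processor.
public class LangOrder {
    // Compares two strings using the JDK's collation rules for the locale.
    static int compare(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Ordering by raw UTF-16 code units, as String.compareTo does.
        int byCodeUnit = "-1".compareTo("0");
        // Ordering by the English collation rules shipped with this JDK.
        int byCollator = compare("-1", "0", Locale.ENGLISH);

        System.out.println("code unit order: " + Integer.signum(byCodeUnit));
        System.out.println("collator order:  " + Integer.signum(byCollator));
    }
}
```

If the two lines print different signs on a given JDK, that alone would show how two processors could disagree without either of them misreading the input encoding.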

> Eventually the differences between them should be
> diminished through the specification of additional attributes.

Pardon, what attributes do you mean???
I now think maybe this is a bug in XT?

Rgds.Paul.




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
