This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Finding unique nodes in a non-sibling nodeset
- From: Mike Berrow <mberrow at pacbell dot net>
- To: XSL-List at lists dot mulberrytech dot com
- Date: Sat, 29 Jun 2002 10:04:38 -0700
- Subject: [xsl] Finding unique nodes in a non-sibling nodeset
- Reply-to: xsl-list at lists dot mulberrytech dot com
In a code generation transform that I am working on, I frequently encounter
situations where I need to eliminate duplicate expressions or event calls.
The nodes with the commonality to be detected are often scattered around
different parts of a large (preprocessed) reference document that is loaded
with a document() call.
Previously, I had eliminated duplicates with something of the form
$list[not(@key1=preceding-sibling::*/@key1)]
or
$list[not(@key1=preceding::*/@key1)]
(the second form if I wanted to look back through the whole document).
In this situation, however, the nodes to be duplicate-trimmed are
[A] Selected out of the reference document in very specific contextual
ways (e.g. deep inside xsl:template / xsl:for-each usages)
[B] Not all siblings of one another
[C] The preceding axis can't be used since it looks at the whole
preceding area of the document, not just my carefully selected nodes.
[D] The definition of duplication requires use of multiple node
attributes, i.e. it needs a composite key.
Even if [D] were not true, the "preceding-sibling" axis approach would not
work because of [B] and the "preceding" axis approach would not work
because of [C].
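Point [D] hides a subtlety worth illustrating: a composite key built as concat(@key1,@key2), with no separator, can map two different attribute pairs onto the same string. The stand-alone stylesheet below is only a sketch (it ignores its input document); it should print true for the unseparated comparison and false once a '/' separator is added, assuming '/' never occurs inside the key values themselves.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Without a separator, ('AB','C') and ('A','BC') both become 'ABC': prints true -->
    <xsl:value-of select="concat('AB','C') = concat('A','BC')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- With a separator absent from the data ('AB/C' vs 'A/BC'): prints false -->
    <xsl:value-of select="concat('AB','/','C') = concat('A','/','BC')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample data in this message no collision occurs, but a separator costs nothing and removes the risk.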
I eventually hit on a way to solve this (since I use Saxon) using
saxon:tokenize. But I always wondered if there was a non-extension
way to do it.
What I did was build an aggregate string with delimiters from the nodes
in the set in question (in a variable called "$list"), like so ...
<xsl:variable name="aggregate">
<xsl:for-each select="$list">
<xsl:value-of select="concat(@key1,'/',@key2)" />
<xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
</xsl:for-each>
</xsl:variable>
Then I used saxon:tokenize to get a node-set ...
<xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>
and eliminated the duplicates the standard (?) way with
<xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>
I'm then able to process the node subset I was after, since the keys are
embedded in the strings of the resulting node-set.
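For comparison, the aggregate-string idea can be carried through without saxon:tokenize by walking the string with a recursive named template. This is a sketch in plain XSLT 1.0, assuming the '#' delimiter never appears in a key; the template name emit-distinct and its parameters rest and seen are hypothetical, and it emits text rather than building a node-set, so it only fits cases where the distinct keys just need to be written out.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="document">
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>
    <!-- Same '#'-delimited aggregate as above, but every token is followed by '#' -->
    <xsl:variable name="aggregate">
      <xsl:for-each select="$list">
        <xsl:value-of select="concat(@key1,'/',@key2,'#')"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:call-template name="emit-distinct">
      <xsl:with-param name="rest" select="string($aggregate)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: consumes one token per call; $seen holds '#'-wrapped tokens already met -->
  <xsl:template name="emit-distinct">
    <xsl:param name="rest"/>
    <xsl:param name="seen" select="'#'"/>
    <xsl:if test="$rest != ''">
      <xsl:variable name="token" select="substring-before($rest, '#')"/>
      <!-- Emit the token only the first time it is met -->
      <xsl:if test="not(contains($seen, concat('#', $token, '#')))">
        <xsl:value-of select="$token"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
      <xsl:call-template name="emit-distinct">
        <xsl:with-param name="rest" select="substring-after($rest, '#')"/>
        <xsl:with-param name="seen" select="concat($seen, $token, '#')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Against input.xml this should print XX/CC, XX/BB and YY/BB, one per line. Each token costs one level of recursion, so a very long $list may exhaust the stack on processors that do not optimise tail calls.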
All was well until my colleague decided to try out Saxon 7.1, which (it turns out)
changes the behavior of tokenize(). In that version, the result comes back in
such a way that you can't use the preceding-sibling axis on it.
There are features in Saxon 7.1 that we are very interested in, so I needed
to try to find a different technique.
It turns out that the following has exactly the desired effect (in one line!!)
<xsl:variable name="listNoDups"
select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))"/>
and I could have done that all along.
However, I still wondered if there was a way of doing this without extensions.
So I put the problem to my good friend Chris Maden (yes, *the* Chris Maden)
... but not in as much detail as I have given here.
Chris said "Muenchian Keys!!"
I hadn't yet used that technique anywhere (but had heard it mentioned a lot),
so I decided to give it a whirl.
Well, it does solve the problem, but with a restriction that makes it
unusable for me.
I set up my key like so:
<xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>
Then used:
<xsl:variable name="uniqueKey1Key2forFlavour"
select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>
Which does the trick, but I can't use it: xsl:key is a top-level element whose
match and use attributes can't refer to variables, so it can't capture the
context-specific selections of situation [A].
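One possible workaround, sketched here under the assumption that the key can be made generic: index every fact, not only the sour ones, and apply the context-specific part afterwards by keeping only those key matches that are also members of $list. The membership test count(. | $list) = count($list) is standard XPath 1.0; the key name factByKeys is hypothetical. Because the key no longer encodes the selection, $list could be built in any context, such as deep inside an xsl:for-each as in [A].

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Hypothetical generic key: indexes ALL facts, with a '/' separator in the composite value -->
  <xsl:key name="factByKeys" match="fact" use="concat(@key1,'/',@key2)"/>

  <xsl:template match="document">
    <!-- Any context-specific selection could be made here -->
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

    <!-- Keep a node only if it is the first (in document order) of its key matches
         that also belongs to $list; count(. | X) = 1 tests node identity -->
    <xsl:variable name="listNoDups"
      select="$list[count(. | (key('factByKeys', concat(@key1,'/',@key2))
                                 [count(. | $list) = count($list)])[1]) = 1]"/>

    <xsl:for-each select="$listNoDups">
      <xsl:value-of select="concat(@key1,'/',@key2)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against input.xml this should yield XX/CC, XX/BB and YY/BB. The price is that every candidate node is checked against all of $list, so on large sets it is slower than the classic Muenchian test.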
So, my questions are ...
[1] Is there a non-extension, non-xsl:key way of doing this?
[2] If not, is there a better way than saxon:distinct approach?
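For question [1], here is a sketch of a plain XSLT 1.0 approach that uses neither xsl:key nor extensions: iterate over $list and, for each node, look back along the preceding axis, but only at nodes that are also in $list, comparing both attributes separately (so no concatenated key is needed). It relies on current(), which inside xsl:for-each refers to the node being tested, and it assumes all of $list comes from one document, since the preceding axis only sees earlier nodes of the same tree.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="document">
    <xsl:variable name="list" select="item[@flavour='sour']/fact"/>

    <xsl:for-each select="$list">
      <!-- Earlier members of $list with the same key1 AND key2.
           The first predicate restricts 'preceding' to $list (point [C]);
           current() is the fact being tested by xsl:for-each. -->
      <xsl:variable name="earlierTwins"
        select="preceding::*[count(. | $list) = count($list)]
                            [@key1 = current()/@key1 and @key2 = current()/@key2]"/>
      <xsl:if test="not($earlierTwins)">
        <xsl:value-of select="concat(@key1,'/',@key2)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against input.xml this should again print XX/CC, XX/BB and YY/BB. Two caveats: the check is quadratic in the size of $list, and because the test lives in xsl:if, the surviving nodes are processed inside the loop rather than captured in a single node-set variable as with saxon:distinct.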
Thanks for bearing with me :-)
I have attached my current test data, test transform and output since
it may help to clarify what I'm trying to do.
-- Mike Berrow
========== input.xml ==============
<document>
<item flavour="sweet" >
<fact key1="AA" key2="BB" val="11"/>
<fact key1="XX" key2="CC" val="22"/>
<fact key1="AA" key2="BB" val="33"/>
</item>
<item flavour="sour" >
<fact key1="XX" key2="CC" val="11"/>
<fact key1="XX" key2="BB" val="33"/>
<fact key1="YY" key2="BB" val="22"/>
</item>
<item flavour="sweet" >
<fact key1="XX" key2="CC" val="33"/>
<fact key1="XX" key2="BB" val="22"/>
<fact key1="AA" key2="BB" val="11"/>
</item>
<item flavour="sour" >
<fact key1="YY" key2="BB" val="33"/>
<fact key1="XX" key2="CC" val="11"/>
<fact key1="YY" key2="BB" val="22"/>
</item>
</document>
========== dupElim.xsl ==============
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon"
version="1.0">
<!-- Finding unique nodes in a non-sibling nodeset... by Mike Berrow -->
<xsl:output method="xml"/>
<xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>
<xsl:template match="document">
<!-- Select nodes of interest -->
<xsl:variable name="list" select="item[@flavour='sour']/fact"/>
<!-- Single value, attempt 1 -->
<xsl:comment>For $list[not(@key1=preceding-sibling::*/@key1)]</xsl:comment>
<xsl:text>
	</xsl:text><xsl:comment>We get ...</xsl:comment>
<xsl:variable name="list1NoDups" select="$list[not(@key1=preceding-sibling::*/@key1)]"/>
<xsl:for-each select="$list1NoDups">
<xsl:text>
	</xsl:text>
<xsl:value-of select="concat(@key1,'/',@key2)" />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Not desired: 'preceding-sibling' can't see 'preceding cousin'</xsl:comment><xsl:text>

</xsl:text>
<!-- Single value, attempt 2 -->
<xsl:comment>For $list[not(@key1=preceding::*/@key1)]</xsl:comment>
<xsl:text>
	</xsl:text><xsl:comment>We get ...</xsl:comment>
<xsl:variable name="list2NoDups" select="$list[not(@key1=preceding::*/@key1)]"/>
<xsl:for-each select="$list2NoDups">
<xsl:text>
	</xsl:text>
<xsl:value-of select="concat(@key1,'/',@key2)" />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Not desired: 'preceding' looks at the whole doc</xsl:comment><xsl:text>

</xsl:text>
<!-- Try Multi-value -->
<xsl:comment>For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]</xsl:comment>
<xsl:text>
	</xsl:text><xsl:comment>We get ...</xsl:comment>
<xsl:variable name="list3NoDups" select="$list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]"/>
<xsl:for-each select="$list3NoDups">
<xsl:text>
	</xsl:text>
<xsl:value-of select="concat(@key1,'/',@key2)" />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Not desired: result of a naive composite key attempt</xsl:comment><xsl:text>

</xsl:text>
<!-- Multi-value using saxon:tokenize -->
<xsl:comment>Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'</xsl:comment>
<xsl:variable name="aggregate">
<xsl:for-each select="$list">
<xsl:value-of select="concat(@key1,'/',@key2)" />
<xsl:if test="not(position()=last())"><xsl:text>#</xsl:text></xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="list4" select="saxon:tokenize($aggregate,'#')"/>
<xsl:variable name="list4NoDups" select="$list4[not(.=preceding-sibling::*)]"/>
<xsl:for-each select="$list4NoDups">
<xsl:text>
	</xsl:text>
<xsl:value-of select="." />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Which is the desired result</xsl:comment><xsl:text>

</xsl:text>
<!-- Multi-value using saxon:distinct -->
<xsl:comment>saxon:distinct($list, saxon:expression('concat(@key1,@key2)')</xsl:comment>
<xsl:for-each select="saxon:distinct($list, saxon:expression('concat(@key1,@key2)'))">
<xsl:text>
	</xsl:text>
<xsl:value-of select="concat(@key1,'/',@key2)" />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Which is tighter code than using tokenize</xsl:comment><xsl:text>

</xsl:text>
<!-- Multi-value using Muenchian -->
<xsl:comment>Using <xsl:text>&lt;xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/&gt;</xsl:text>
and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"</xsl:comment>
<xsl:variable name="uniqueKey1Key2forFlavour"
select="$list[generate-id()=generate-id(key('Key1Key2',concat(@key1,@key2)))]"/>
<xsl:for-each select="$uniqueKey1Key2forFlavour">
<xsl:text>
	</xsl:text>
<xsl:value-of select="concat(@key1,'/',@key2)" />
</xsl:for-each>
<xsl:text>
	</xsl:text>
<xsl:comment>Which is the Muenchian approach, but since xsl:key is a top level element, this
will not help when nodesets need to be calculated in specific, non-whole-document
contexts</xsl:comment><xsl:text>

</xsl:text>
</xsl:template>
</xsl:stylesheet>
========== minSet.xml ==============
<?xml version="1.0" encoding="utf-8"?>
<!--For $list[not(@key1=preceding-sibling::*/@key1)]-->
<!--We get ...-->
XX/CC
YY/BB
YY/BB
XX/CC
<!--Not desired: 'preceding-sibling' can't see 'preceding cousin'-->
<!--For $list[not(@key1=preceding::*/@key1)]-->
<!--We get ...-->
YY/BB
<!--Not desired: 'preceding' looks at the whole doc-->
<!--For $list[not(concat(@key1,@key2)=concat(preceding::*/@key1,preceding::*/@key2))]-->
<!--We get ...-->
XX/CC
XX/BB
YY/BB
YY/BB
XX/CC
YY/BB
<!--Not desired: result of a naive composite key attempt-->
<!--Using aggregation, saxon:tokenize then 'not(.=preceding-sibling::*)'-->
XX/CC
XX/BB
YY/BB
<!--Which is the desired result-->
<!--saxon:distinct($list, saxon:expression('concat(@key1,@key2)')-->
XX/CC
XX/BB
YY/BB
<!--Which is tighter code than using tokenize-->
<!--Using <xsl:key name="Key1Key2" match="item[@flavour='sour']/fact" use="concat(@key1,@key2)"/>
and select="$list[generate-id(.)=generate-id(key('Key1Key2',concat(@key1,@key2)))]"-->
XX/CC
XX/BB
YY/BB
<!--Which is the Muenchian approach, but since xsl:key is a top level element, this
will not help when nodesets need to be calculated in specific, non-whole-document contexts-->
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list