This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Selective escaping of special characters
- From: "Kyrre Wathne" <kyrre at idium dot no>
- To: <XSL-List at lists dot mulberrytech dot com>
- Date: Tue, 12 Mar 2002 14:21:53 +0100
- Subject: [xsl] Selective escaping of special characters
- Reply-to: xsl-list at lists dot mulberrytech dot com
My apologies if this question has been asked before; I haven't found any posts
that address this exact issue.
My problem is that I want to transform junk HTML generated by Microsoft
Word. This contains markup, of course, so my first instinct was to use
disable-output-escaping. However, this also disables escaping of other
special characters, such as the en dash (–). These are then written out in a
form my browser (Internet Explorer) doesn't understand (I use "ISO-8859-1" as
the output encoding).
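For illustration, the setup described above might look roughly like the following minimal sketch. The `content` element and its match pattern are hypothetical names chosen for the example, not taken from the actual stylesheet; only the output encoding and the use of disable-output-escaping come from the description.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Output encoding as described in the post -->
  <xsl:output method="html" encoding="ISO-8859-1"/>

  <!-- "content" is a hypothetical element holding the Word HTML as text -->
  <xsl:template match="content">
    <!-- Markup passes through unescaped, but so does every other character -->
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```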
I did work out a fix (pasted below) using a recursive named template, but
this is proving too slow for all but the smallest documents. (I use Saxon
6.5.1.)
My question, then: is there a fast way to disable escaping only for "<",
">" and "&"? Alternatively, can the named template below be optimized
significantly?
Thanks for any help.
Kyrre Wathne
<!-- Named template to output markup while escaping other special characters -->
<xsl:template name="DUMP_TAG_STRING">
  <xsl:param name="str"/>
  <xsl:choose>
    <xsl:when test="not($str)">
      <!-- Empty string: nothing to output -->
    </xsl:when>
    <xsl:when test="not(contains($str, '&lt;')) and not(contains($str, '&gt;')) and not(contains($str, '&amp;'))">
      <!-- No markup characters left: output the rest with normal escaping -->
      <xsl:value-of select="$str"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Temporarily convert all XML markup characters to the backspace
           marker character, which must not occur in the input -->
      <xsl:variable name="escaped" select="translate($str, '&lt;&gt;&amp;', '␈␈␈')"/>
      <!-- Position of the first markup character -->
      <xsl:variable name="cutPos" select="1 + string-length(substring-before($escaped, '␈'))"/>
      <!-- Text before the first markup character -->
      <xsl:variable name="before" select="substring($str, 1, $cutPos - 1)"/>
      <!-- The markup character itself -->
      <xsl:variable name="replace" select="substring($str, $cutPos, 1)"/>
      <!-- Text after the markup character -->
      <xsl:variable name="after" select="substring($str, $cutPos + 1)"/>
      <!-- Output the part before the match, escaped as usual -->
      <xsl:value-of select="$before"/>
      <!-- Output <, > or & as is, unescaped -->
      <xsl:value-of select="$replace" disable-output-escaping="yes"/>
      <xsl:if test="$after">
        <!-- Recurse with the remainder -->
        <xsl:call-template name="DUMP_TAG_STRING">
          <xsl:with-param name="str" select="$after"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
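
For completeness, a sketch of how the named template above might be invoked from within the same stylesheet. The `content` element is a hypothetical name for whatever node holds the Word HTML as text; only the template name and its `str` parameter come from the code above.

```xml
<!-- "content" is a hypothetical element holding the Word HTML as text -->
<xsl:template match="content">
  <xsl:call-template name="DUMP_TAG_STRING">
    <!-- Pass the string value of the current node as the text to dump -->
    <xsl:with-param name="str" select="string(.)"/>
  </xsl:call-template>
</xsl:template>
```

Each markup character costs one recursive call, so the recursion depth grows with the number of "<", ">" and "&" characters in the input.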
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list