This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: whitespace problem when formating to plain-text


Hi John,

> file.xml:
> <info>
>   <link url="231243342">SpazeIRC</link> blablablabla bla . bldse
>   dfseld s3242. 43 <link url="2342342">har</link>.
> </info>
>
> text.xsl:
> <xsl:template match="text()">
>         <xsl:value-of select="normalize-space(.)"/>
> </xsl:template>
>
> The output I get:
> SpazeIRCblablablabla bla . bldse dfseld s3242. 43har.
>
> I want it to look like this:
> SpazeIRC blablablabla bla . bldse dfseld s3242. 43 har.
>         ^                                         ^

When file.xml gets parsed, the XSLT processor builds a node tree that
looks like the following:

  root
   +- info element
       +- text: "\n  "
       +- link element
       |   |  \- url: 231243342
       |   +- text: "SpazeIRC"
       +- text: " blablablabla bla . bldse dfseld s3242. 43 "
       +- link element
       |   |  \- url: 2342342
       |   +- text: "har"
       +- text: ".\n"

(Where \n is a new line character.)
       
The XSLT processor works through the nodes one by one trying to apply
templates to them. The built-in templates process the elements by just
moving on to their children, so in effect the only nodes that get
anything done to them are the text nodes:

  text: "\n  "
  text: "SpazeIRC"
  text: " blablablabla bla . bldse dfseld s3242. 43 "
  text: "har"
  text: ".\n"
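
If you want to check that list for yourself outside an XSLT processor,
here is a minimal Python sketch using the standard library's
xml.dom.minidom. The names FILE_XML and text_nodes are my own, made up
for the illustration, and a DOM tree is only a stand-in for the tree an
XSLT processor builds. Note that the third text node, as actually
parsed, still contains the line break and indentation from file.xml,
which I've shown as a single space above to keep things readable.

```python
from xml.dom import minidom

# file.xml from the question, copied as a string (assumed to be the whole document).
FILE_XML = """<info>
  <link url="231243342">SpazeIRC</link> blablablabla bla . bldse
  dfseld s3242. 43 <link url="2342342">har</link>.
</info>"""


def text_nodes(node):
    """Yield the data of every text node under `node`, in document order."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child.data
        else:
            # Elements are not output themselves; we just move on to their children.
            yield from text_nodes(child)


doc = minidom.parseString(FILE_XML)
nodes = list(text_nodes(doc.documentElement))
for data in nodes:
    print(repr(data))
```

The url attributes never show up here, because attributes are not
children of their element.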

One feature of normalize-space() is that it strips leading and
trailing whitespace from a value. Looking at the third text node, you
can see that it starts and ends with a space; those spaces get
stripped by normalize-space(), which is why you lose those spaces.
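
To see the effect concretely, here is a rough Python sketch. The
normalize_space function is a hypothetical helper that only
approximates the XPath function (collapse runs of space, tab, CR and
LF to one space, then trim the ends), and the list of values is simply
the five text nodes listed above.

```python
import re


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


# The five text nodes, in document order, as listed above.
text_node_values = [
    "\n  ",
    "SpazeIRC",
    " blablablabla bla . bldse dfseld s3242. 43 ",
    "har",
    ".\n",
]

# Normalizing each node separately and then joining the results is,
# in effect, what the text() template does.
per_node = "".join(normalize_space(t) for t in text_node_values)
print(per_node)  # SpazeIRCblablablabla bla . bldse dfseld s3242. 43har.
```

The spaces on either side of the third node are gone before the pieces
are put back together, which matches the output you are getting.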

The real problem is that you're looking at the individual text nodes
when what you're interested in is the complete string value of the
info element. The string value of the info element is the
concatenation of all its descendant text nodes, which works out as:

  "\n  SpazeIRC blablablabla bla . bldse dfseld s3242. 43 har.\n"

When you normalize that, you get the string that you are after. So
rather than having a template that matches individual text nodes, have
a template that matches the info element and gives you its normalized
string value:

<xsl:template match="info">
  <xsl:value-of select="normalize-space()" />
</xsl:template>
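
The same idea can be sketched in Python, purely as an illustration of
the order of operations rather than of how any XSLT processor is
implemented. Here "".join(root.itertext()) stands in for the string
value of the info element, and normalize_space is again a hypothetical
helper approximating the XPath function; FILE_XML is file.xml from the
question.

```python
import re
import xml.etree.ElementTree as ET

# file.xml from the question, copied as a string (assumed to be the whole document).
FILE_XML = """<info>
  <link url="231243342">SpazeIRC</link> blablablabla bla . bldse
  dfseld s3242. 43 <link url="2342342">har</link>.
</info>"""


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


root = ET.fromstring(FILE_XML)

# Concatenate all descendant text first (the string value of info) ...
string_value = "".join(root.itertext())

# ... and only then normalize, so the spaces between the pieces survive.
result = normalize_space(string_value)
print(result)  # SpazeIRC blablablabla bla . bldse dfseld s3242. 43 har.
```

Because the whitespace is collapsed after the text has been joined, the
spaces around the link elements are interior to the string and are kept
as single spaces.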
  
> I must write code to check for '> ' in the beginning and ' <' in the
> end of the string in the "text()" template so I can add an
> whitespace in the beginning or the end if they exist.

No, you don't need to, because you're dealing with XSLT, which views
XML as a node tree as above, not with a language that treats XML as a
string of characters. XSLT doesn't see the markup that you use; it
only sees elements, attributes and text.

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

