This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: Translating character entities for plain text output


> I've found a lot of discussion in the archives about solving 
> character entity problems for HTML output, but not much on plain text:
> 
> Generating plain text from docbook via XSLT, I need to output

What do you mean by "plain text"? Specifically, what character encoding?
If you live in the US or Western Europe, chances are you want
iso-8859-1: so specify encoding="iso-8859-1" in the xsl:output
declaration.
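
As a minimal sketch of that declaration (assuming the text output method, since the goal is plain text), the top-level element in the stylesheet would look something like this:

```xml
<!-- Top-level child of xsl:stylesheet; the xsl prefix is assumed to be
     bound to http://www.w3.org/1999/XSL/Transform. -->
<!-- Serialize the result as plain text, encoded as iso-8859-1. -->
<xsl:output method="text" encoding="iso-8859-1"/>
```

Whatever encoding is chosen here, the file has to be viewed with software that assumes the same encoding.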
 
> a space for &nbsp; and -- for &mdash;.  I can get some funny 
> glyphs (like Â for &nbsp;) and various literal codes, but not 
> the result I want. I could postprocess the output, but I'd 
> love to fix the style sheet.

Yes, you can do these conversions either in the stylesheet or by
postprocessing. Or if you want to be clever, you could do it at input
time: change the entity definitions so that &nbsp; means " " and &mdash;
means "--".
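
A rough sketch of the input-time approach, assuming the source is a DocBook XML document whose root element is book and which uses the 4.1.2 DTD (both are guesses about your setup). Declarations in the internal subset are read before the external DTD, and the first declaration of an entity wins, so these override the DTD's own definitions:

```xml
<?xml version="1.0"?>
<!-- Assumption: root element and DTD identifiers are illustrative only. -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
  <!-- Redefine the two entities as their plain-text stand-ins. -->
  <!ENTITY nbsp " ">
  <!ENTITY mdash "--">
]>
<book>
  <title>Example</title>
  <chapter>
    <title>Example chapter</title>
    <para>Plain&nbsp;text&mdash;no funny glyphs.</para>
  </chapter>
</book>
```

The drawback is that every source document (or a shared driver file) needs these declarations, so this only suits cases where you control the input.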

XSLT outputs bytes, not glyphs. The Â glyph for &nbsp; was created by
the software you used to view the bytes. In this case the XSLT processor
was outputting a UTF-8 encoding of the character, and you were viewing
it using software that thought it was looking at iso-8859-1.
> 
> In the stylesheet, I've tried defining the entity in a local 
> subset

How the stylesheet defines the entity is irrelevant: the XML parser
looks for the entity definitions in the source document.

> , also html and text methods and various encodings in 
> the xsl:output. The following almost works:
> 
> <xsl:template match="text()">
>      <xsl:if test="contains(.,'&#160;')">
>          <xsl:value-of select="translate(., '&#160;', ' ')"/>
>      </xsl:if>
> </xsl:template>
> 
> Unfortunately, this seems to suppress another essential 
> translation on the same context:
> 
>      <xsl:value-of select="translate(., '&#xA;&#xD;', ' ')"/> 
> 
> I can do either, but not both.

You can do both by writing translate(., '&#160;&#xA;&#xD;', '  ')

But actually, there probably won't be any &#xD; characters in your
source: the XML parser normalizes line endings to &#xA; before the
stylesheet sees them. Your translation works by accident, I suspect,
because it converts an &#xA; to a space and an &#xD; to nothing.
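
Putting the pieces together, here is a sketch of a complete stylesheet, not a tested drop-in. It drops the xsl:if from your template, because that test discards every text node that contains no no-break space. Since translate() can only map one character to one character, the two-character "--" needs a recursive named template; the name replace-mdash and its text parameter are made up for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="iso-8859-1"/>

  <!-- Every text node: map no-break space and line feed to a space.
       &#xD; has no counterpart in the third argument, so any that
       survive parsing are deleted. -->
  <xsl:template match="text()">
    <xsl:call-template name="replace-mdash">
      <xsl:with-param name="text"
          select="translate(., '&#160;&#xA;&#xD;', '  ')"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: replace each em dash (&#8212;) with "--". -->
  <xsl:template name="replace-mdash">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&#8212;')">
        <xsl:value-of select="substring-before($text, '&#8212;')"/>
        <xsl:text>--</xsl:text>
        <!-- Recurse on whatever follows the first em dash. -->
        <xsl:call-template name="replace-mdash">
          <xsl:with-param name="text"
              select="substring-after($text, '&#8212;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The built-in template rules still walk the element tree, so this is enough to reach every text node unless other templates in your stylesheet use xsl:value-of on elements directly.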

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

