This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: [xsl] Handling of special characters like © etc


Yogesh Dare wrote:
> <?xml version="1.0"?>

Encoding is, roughly, the mapping of a repertoire of abstract characters
(units in a script for written language) to one or more code units (usually
bytes). Your XML file necessarily has some encoding, because it is, after
all, just a sequence of bytes.

The encoding declaration in an XML document (the encoding="foo" part of
the <?xml ...?> line at the top) is the document's way of stating what
encoding it has. When you omit the encoding declaration, the parser assumes
UTF-8, or UTF-16 if the file begins with a byte order mark.
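
As a rough illustration of that default, here is a small Python sketch that guesses the encoding of an XML file from its first bytes. The function name guess_xml_encoding is made up for this example, and the logic is a simplification of what a real parser does (it ignores UTF-16 files that lack a byte order mark, among other cases), so treat it as a sketch rather than a substitute for a parser.

```python
import codecs
import re

# Matches the encoding="..." part of an XML declaration at the start of the data.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def guess_xml_encoding(data: bytes) -> str:
    """Simplified guess at an XML document's encoding (hypothetical helper)."""
    # A UTF-16 byte order mark wins over everything else.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Skip a UTF-8 byte order mark, if present, before looking for <?xml.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No byte order mark and no declaration: UTF-8 is assumed.
    return "utf-8"

print(guess_xml_encoding(b'<?xml version="1.0"?><a/>'))
print(guess_xml_encoding(b'<?xml version="1.0" encoding="iso-8859-1"?><a/>'))
```

The first call has no declaration and falls back to UTF-8; the second reports the declared encoding.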

>       © 2000 site.com

The copyright symbol is allowed in XML, but since omitting the encoding
declaration implies that your document is UTF-8 encoded, that symbol must
be stored as the pair of bytes 0xC2 0xA9.
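
To see the bytes for yourself, a few lines of Python will do (any language with explicit encoding support would work equally well; Python is just my choice for the illustration):

```python
# The same abstract character, U+00A9 COPYRIGHT SIGN, under three encodings.
copyright_sign = "\u00a9"

utf8_bytes = copyright_sign.encode("utf-8")
latin1_bytes = copyright_sign.encode("iso-8859-1")
cp1252_bytes = copyright_sign.encode("windows-1252")

print(utf8_bytes.hex(" "))    # c2 a9
print(latin1_bytes.hex(" "))  # a9
print(cp1252_bytes.hex(" "))  # a9
```

One character, but two bytes in UTF-8 and one byte in the other two encodings.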

If this is giving you problems, then your file is not really UTF-8
encoded, and this is an error. Chances are, the symbol is stored as just
the single byte 0xA9, because your file was produced with iso-8859-1 or
windows-1252 encoding. You should get a text editor that can save in
encodings other than your platform/OS default, and that has a hex mode so
you can see the actual bytes in the file. I use TextPad, from
http://www.textpad.com/
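
If you have no hex editor handy, a short script can do the same check. This is only a sketch: the function name first_invalid_utf8_offset is invented for the example, and the two byte strings stand in for the contents of your file.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

good = b"\xc2\xa9 2000 site.com"  # copyright sign stored as UTF-8
bad = b"\xa9 2000 site.com"       # copyright sign stored as iso-8859-1

print(first_invalid_utf8_offset(good))  # None
print(first_invalid_utf8_offset(bad))   # 0
```

For a real file, you would read it in binary mode (open(path, "rb").read()) and pass the result to the function.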

If you don't want to put the correct bytes in your file, you can either
correctly declare the encoding as iso-8859-1 or windows-1252, or you can
use &#169; or &#xA9; in your XML and XSLT documents, rather than the raw
characters.
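
Both options can be checked with any conforming parser. Here is a sketch using Python's xml.etree.ElementTree; the <footer> element is made up for the example, and the byte strings stand in for files on disk.

```python
import xml.etree.ElementTree as ET

# Option 1: raw byte 0xA9, with an encoding declaration that matches it.
declared = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
            b'<footer>\xa9 2000 site.com</footer>')

# Option 2: a numeric character reference; the file is pure ASCII.
referenced = b'<?xml version="1.0"?><footer>&#169; 2000 site.com</footer>'

# The problem case: raw byte 0xA9 with no declaration, so UTF-8 is assumed.
undeclared = b'<?xml version="1.0"?><footer>\xa9 2000 site.com</footer>'

declared_text = ET.fromstring(declared).text
referenced_text = ET.fromstring(referenced).text
print(declared_text == referenced_text == "\u00a9 2000 site.com")  # True

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("rejected:", err)
```

The first two documents yield the same text; the third is rejected as not well-formed.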

> Now after parsing, the parser output is given to XSLTProcessor to apply xsl
> on it.But there again I face problem for characters like &,<,> etc.
> Well I can actually replace these known characters by there equivalents like
> for & i can put &amp; and so on.
> But I want some generic way to handle this.

& and < (and >, for balance) are XML markup characters. If you are using
them as character data, you must either escape them, or put them in a 
CDATA section, if one is allowed there. This is a requirement of all XML 
documents, including your source XML and the stylesheet.
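
As for a generic way: if the XML is generated by a program, the usual approach is to let a library do the escaping instead of replacing characters by hand. A sketch in Python, assuming the standard xml.sax.saxutils and xml.etree.ElementTree modules; the <note> element and the sample text are made up for the example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw_text = "AT&T says 1 < 2 > 0"

# Escape the markup characters before pasting text into XML by hand.
escaped = escape(raw_text)
print(escaped)  # AT&amp;T says 1 &lt; 2 &gt; 0

# Or build the document through an API, which escapes on serialization.
elem = ET.Element("note")
elem.text = raw_text
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)  # <note>AT&amp;T says 1 &lt; 2 &gt; 0</note>

# A CDATA section is the other choice; the parser returns the text unchanged.
# (This only works if the text does not itself contain "]]>".)
cdata_doc = "<note><![CDATA[" + raw_text + "]]></note>"
cdata_text = ET.fromstring(cdata_doc).text
print(cdata_text == raw_text)  # True
```

Either way, the parser hands the original characters back to the XSLT processor, so nothing downstream needs to know which form was used.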

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

