This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: [xsl] Handling of special characters like © etc
- To: xsl-list at lists dot mulberrytech dot com
- Subject: [xsl] Re: [xsl] Handling of special characters like © etc
- From: Mike Brown <mike at skew dot org>
- Date: Thu, 3 May 2001 02:24:45 -0600 (MDT)
- Reply-To: xsl-list at lists dot mulberrytech dot com
Yogesh Dare wrote:
> <?xml version="1.0"?>
Encoding is, roughly, the mapping of a repertoire of abstract characters
(units in a script for written language) to 1 or more code units (bytes,
usually). Your XML file exists with some kind of encoding, because it is,
after all, just a bunch of bits & bytes.
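To make the mapping concrete, here is a minimal sketch in Python (my choice for illustration, with arbitrarily chosen sample characters) that counts how many bytes each character occupies in two encodings.

```python
# Map a few abstract characters to their code units (bytes) in two encodings.
samples = ["A", "\u00a9", "\u20ac"]  # LATIN CAPITAL LETTER A, COPYRIGHT SIGN, EURO SIGN

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(f"U+{ord(ch):04X}: {n8} byte(s) in UTF-8, {n16} byte(s) in UTF-16")
```

The same character can take a different number of bytes depending on the encoding, which is why the parser has to know which one was used.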
The encoding declaration in an XML document (the encoding="foo" part of
the <?xml ...?> line at the top) is an XML document's way of stating what
encoding it has. When you omit the encoding declaration, either UTF-8 or
UTF-16 is assumed. UTF-16 requires a byte order mark, so without one the
parser assumes UTF-8.
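As a rough illustration of how the declaration drives decoding, the sketch below feeds two byte strings to Python's xml.etree.ElementTree. The parser choice and the variable names are mine, not something from your setup; any conforming parser should behave the same way.

```python
import xml.etree.ElementTree as ET

# No encoding declaration: the parser assumes UTF-8, so the copyright sign is C2 A9.
utf8_doc = b'<?xml version="1.0"?><p>\xc2\xa9 2000 site.com</p>'

# Declared as iso-8859-1: here the single byte A9 is the correct encoding.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><p>\xa9 2000 site.com</p>'

text_utf8 = ET.fromstring(utf8_doc).text
text_latin1 = ET.fromstring(latin1_doc).text

# Different bytes on disk, same characters after parsing.
print(text_utf8 == text_latin1)
```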
> © 2000 site.com
The copyright symbol is allowed in XML, but since you omitted the encoding
declaration, implying that your document is UTF-8 encoded, that symbol must
be encoded as the pair of bytes 0xC2 0xA9.
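You can check those byte values yourself. This is a small Python sketch (the variable names are only for illustration) that encodes the copyright sign, U+00A9, under the three encodings discussed here.

```python
# Encode the copyright sign (U+00A9) and look at the resulting bytes in hex.
copyright_sign = "\u00a9"

utf8_bytes = copyright_sign.encode("utf-8")
latin1_bytes = copyright_sign.encode("iso-8859-1")
cp1252_bytes = copyright_sign.encode("windows-1252")

print("utf-8:       ", utf8_bytes.hex())
print("iso-8859-1:  ", latin1_bytes.hex())
print("windows-1252:", cp1252_bytes.hex())
```

UTF-8 yields the two bytes c2 a9, while both single-byte encodings yield just a9.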
If this is giving you problems, then your file is not really UTF-8
encoded, and this is an error. Chances are, it is encoded as just the byte
0xA9, because your file was produced with iso-8859-1 or windows-1252
encoding. You should get a text editor that saves in different encodings,
rather than just your platform/OS default, and that has a hex mode so you
can see the actual bytes in the file. I use TextPad, from
http://www.textpad.com/
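If you have no hex editor at hand, a few lines of Python can find the offending byte. This is only a sketch: the helper name first_invalid_utf8_offset and the sample bytes are hypothetical, standing in for the contents of your file.

```python
import xml.etree.ElementTree as ET


def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical file contents: no encoding declaration, copyright sign saved as lone A9.
bad_doc = b'<?xml version="1.0"?><p>\xa9 2000 site.com</p>'

offset = first_invalid_utf8_offset(bad_doc)
offending_byte = bad_doc[offset:offset + 1].hex()
print(f"invalid byte 0x{offending_byte} at offset {offset}")

# A conforming parser rejects the same bytes, since they are not valid UTF-8.
try:
    ET.fromstring(bad_doc)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
```

For a real file, read it in binary mode (open(path, "rb")) and pass the bytes to the helper.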
If you don't want to put the correct bytes in your file, you can either
correctly declare the encoding as iso-8859-1 or windows-1252, or you can
use &#169; or &#xA9; in your XML and XSLT documents, rather than the raw
characters.
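The character-reference route can be sketched as follows, again assuming Python's xml.etree.ElementTree purely for illustration. The document is pure ASCII, so the file's encoding no longer matters for this character.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to U+00A9 in an ASCII-only document.
ascii_doc = b'<?xml version="1.0"?><p>&#169; and &#xA9;</p>'
text = ET.fromstring(ascii_doc).text  # both references resolve to the same character

# Going the other way: serializing to US-ASCII makes ElementTree write
# non-ASCII characters as numeric character references.
elem = ET.Element("p")
elem.text = "\u00a9 2000 site.com"
serialized = ET.tostring(elem, encoding="us-ascii")
print(serialized)
```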
> Now after parsing, the parser output is given to XSLTProcessor to apply xsl
> on it. But there again I face problem for characters like &,<,> etc.
> Well I can actually replace these known characters by their equivalents like
> for & i can put &amp; and so on.
> But I want some generic way to handle this.
& and < (and >, for balance) are XML markup characters. If you are using
them as character data, you must either escape them (as &amp;, &lt;, and
&gt;), or put them in a CDATA section, if one is allowed there. This is a
requirement of all XML documents, including your source XML and the
stylesheet.
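As for a generic way to do this, most XML toolkits ship an escaping helper, so you need not replace characters by hand. Below is a minimal sketch using Python's xml.sax.saxutils.escape. The sample string is made up, and your own environment will have its own equivalent function.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T: if a < b then b > a"

# Option 1: escape the markup characters before placing the text in a document.
escaped = escape(raw)
doc_escaped = "<p>" + escaped + "</p>"

# Option 2: wrap the text in a CDATA section (safe only if raw contains no "]]>").
doc_cdata = "<p><![CDATA[" + raw + "]]></p>"

# Either way, the parser hands back the original character data.
text_from_escaped = ET.fromstring(doc_escaped).text
text_from_cdata = ET.fromstring(doc_cdata).text
print(escaped)
```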
- Mike
_____________________________________________________________________________
mike j. brown, software engineer at | xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA | personal: http://hyperreal.org/~mike/