This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: Encoding issue
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] Encoding issue
- From: Mike Brown <mike at skew dot org>
- Date: Mon, 13 Aug 2001 22:29:10 -0600 (MDT)
- Reply-To: xsl-list at lists dot mulberrytech dot com
Jason Macki wrote:
> In Notepad, a line shows up like this:
> <value><![CDATA[ sustainable consumption by Gábor
> Náray-Szabó]]></value>
>
> However, in Visual Interdev, the accented characters are displayed as
> gibberish:
> <value><![CDATA[ sustainable consumption by GÃ¡bor NÃ¡ray-SzabÃ³
> ]]></value>
1. It has nothing to do with CDATA sections, in case you were wondering.
There is no need for them here, unless the <value> element might
contain unescaped "&" or "<" characters.
2. The document you are viewing in Notepad and Visual Interdev contains
the UTF-8 bytes for each character. That's one byte for each ASCII
character and two bytes for each of those particular accented non-ASCII
characters (á and ó).
3. The version of Notepad that you are using knows how to interpret
UTF-8 and is showing you the correct glyphs on your screen.
4. The version of Visual Interdev you are using is misinterpreting the
document as if it were ISO-8859-1 or Windows-1252 encoded. It thinks
the two-byte characters are two separate characters, and is showing
you the glyphs accordingly.
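
Points 2 through 4 can be checked with a few lines of Python. This is only
an illustrative sketch, not something from the original exchange: the name
comes from the quoted message, the variable names are arbitrary, and
Python's "cp1252" codec stands in for whatever Visual Interdev does
internally.

```python
# The name from the quoted message, written with escapes so the string is
# unambiguous: \u00e1 is "á" and \u00f3 is "ó".
name = "G\u00e1bor N\u00e1ray-Szab\u00f3"

utf8_bytes = name.encode("utf-8")

# One byte per ASCII character, two bytes for each "á" and "ó",
# so the byte count exceeds the character count by three.
print(len(name), len(utf8_bytes))
print("\u00e1".encode("utf-8"), "\u00f3".encode("utf-8"))

# What a viewer shows if it treats those UTF-8 bytes as Windows-1252:
# every two-byte sequence turns into two separate characters.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

Decoding the same bytes as ISO-8859-1 gives the same visible result for
these particular characters, which is why the two encodings cannot be told
apart from the symptom alone.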
> When I use another application to transform this document, an error
> occurs because the line in question contains invalid characters, and the
> "parseerror.srcText" method displays "sustainable consumption by G?bor
> N?ray-Szab?".
The typical cause of this kind of error is that the document contains
ISO-8859-1 or Windows-1252 bytes but has no encoding declaration, so the
parser interprets it as UTF-8. Certain byte sequences are
illegal in UTF-8, and almost any document that is not UTF-8 and not pure
ASCII will set off this alarm.
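
The same failure can be reproduced outside MSXML. The sketch below uses
Python's standard xml.etree.ElementTree parser purely as a stand-in; the
sample element and the variable names are made up for illustration, and
MSXML's error text will differ.

```python
import xml.etree.ElementTree as ET

# A small document whose text contains "á" and "ó", stored as
# ISO-8859-1 bytes (one byte per accented character).
text = "<value>G\u00e1bor N\u00e1ray-Szab\u00f3</value>"
latin1_bytes = text.encode("iso-8859-1")

# No encoding declaration: the parser assumes UTF-8, and the lone
# byte 0xE1 followed by "b" is not a legal UTF-8 sequence.
try:
    ET.fromstring(latin1_bytes)
    parsed_without_declaration = True
except ET.ParseError as exc:
    parsed_without_declaration = False
    print("parse error:", exc)

# Same bytes, but with a declaration that names the real encoding.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
value = ET.fromstring(declared).text
print(value)
```

If the second parse succeeds where the first fails, the bytes themselves
are fine and only the declared (or assumed) encoding is wrong.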
The DOMDocument save method knows how to save properly (from the SDK docs:
"Character encoding is based on the encoding attribute in the XML
declaration, such as <?xml version="1.0" encoding="windows-1252"?>. When
no encoding attribute is specified, the default setting is UTF-8.")
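
For comparison, here is the same idea (the bytes written out follow the
encoding you ask for, and the declaration says so) sketched with Python's
xml.etree.ElementTree. This is an analogy only: it is not MSXML and not
the DOMDocument save method, and the element and variable names are
invented for the example.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("value")
elem.text = "G\u00e1bor N\u00e1ray-Szab\u00f3"

# Serializing as windows-1252 writes a matching XML declaration and
# uses one byte (0xE1) for "á".
as_1252 = ET.tostring(elem, encoding="windows-1252")
print(as_1252)

# Serializing as UTF-8 uses the two-byte sequence 0xC3 0xA1 instead.
as_utf8 = ET.tostring(elem, encoding="utf-8")
print(as_utf8)
```

Either output is a correct document as long as the declaration and the
bytes agree; trouble starts when something rewrites one without the other.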
I am guessing that something is amiss in how you are loading this document
for transformation. I'd like to see the actual error you are getting,
though, and what methods you are calling, because sometimes when MSXML is
involved, UTF-16 becomes an issue.
- Mike
____________________________________________________________________________
mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | personal: http://hyperreal.org/~mike/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list