This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Re: encoding
- To: "'xsl-list at lists dot mulberrytech dot com'" <xsl-list at lists dot mulberrytech dot com>
- Subject: RE: [xsl] Re: encoding
- From: "Morris, Chris" <ChrisM at SNELLINGCORP dot COM>
- Date: Wed, 17 Jan 2001 14:11:30 -0600
- Reply-To: xsl-list at lists dot mulberrytech dot com
> Otherwise, if you have specified an encoding in your xml declaration
> and need that preserved in the generated output, then with MSXML3 this
> is rather complicated:
>
> 1. Use transformNodeToObject() instead of transformNode()
>
> 2. On the resulting XMLDOMDocument object issue the save() method.
I couldn't get this to work. Can you? I saw this recommendation online in
many places, but it doesn't seem to do the trick.
I did get the following off Deja from an MS employee, and he doesn't mention
the above method:
http://x58.deja.com/[ST_rn=ps]/threadmsg_ct.xp?AN=694903389.1&mhitnum=1
1. The output of transformNode is always UTF-16 (regardless of the encoding
attribute in xsl:output). This is because transformNode outputs a BSTR,
which is by definition a string of UTF-16 characters.
2. To output using the encoding specified in xsl:output, you must pass an
ISequentialStream, an IStream, an ASP Response stream, or an IPersistStream
pointer to transformNodeToObject or XSLProcessor.output. The correctly
encoded bytes will be streamed out to these interfaces, at which point the
implementation decides what to do with the bytes. An example in an ASP
page:
xml_dom.transformNodeToObject xsl_dom, Response
This will efficiently stream the results of the transformation to an ASP
response stream using UTF-8, if that is what you specified in xsl:output.
3. The XML spec has an "Autodetection of Character Encodings" appendix. One
of the byte order marks is "EF BB BF", which specifies that the document is
encoded as UTF-8. I assume this is the mark that Notepad is writing. Some
form of autodetection is absolutely necessary so that the xml-decl can be
correctly interpreted. For example, UCS-4 (big-endian) will encode the
leading "<" as "00 00 00 3C" while UCS-4 (little-endian) will encode it as
"3C 00 00 00". Before the encoding attribute can be parsed, at least the
encoding "family" must be determined. The byte order mark assists in this
autodetection process, and it is perfectly legal to begin XML documents
with these special characters.
~Andy Kimball
MSXSL Dev
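To make point 3 of the quoted message concrete, here is a rough sketch of how the encoding "family" might be guessed from the first bytes of a document. It is written in Python rather than against MSXML, the function name detect_encoding_family is made up for illustration, and it covers only a simplified subset of the cases in the spec's autodetection appendix, so treat it as an illustration of the idea and not as how any particular parser does it.

```python
def detect_encoding_family(data: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML document.

    Hypothetical helper: a simplified subset of the autodetection cases
    in the XML spec appendix, not a complete implementation.
    """
    head = data[:4]

    # Byte order marks. The 4-byte marks are checked before the 2-byte
    # ones because FF FE 00 00 also begins with the UTF-16 LE mark FF FE.
    if head == b"\x00\x00\xfe\xff":
        return "UCS-4 big-endian"
    if head == b"\xff\xfe\x00\x00":
        return "UCS-4 little-endian"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian"

    # No byte order mark: look at how the leading "<" (and "?") of the
    # xml-decl was encoded.
    if head == b"\x00\x00\x00\x3c":
        return "UCS-4 big-endian"
    if head == b"\x3c\x00\x00\x00":
        return "UCS-4 little-endian"
    if head == b"\x00\x3c\x00\x3f":
        return "UTF-16 big-endian"
    if head == b"\x3c\x00\x3f\x00":
        return "UTF-16 little-endian"
    if head == b"\x3c\x3f\x78\x6d":  # "<?xm" in an ASCII-compatible encoding
        return "UTF-8 or other ASCII-compatible"
    return "unknown"


if __name__ == "__main__":
    decl = '<?xml version="1.0"?>'
    # The same declaration, encoded four different ways.
    for codec in ("utf-8-sig", "utf-16-le", "utf-32-be", "utf-32-le"):
        encoded = decl.encode(codec)
        print(codec, encoded[:4].hex(" "), "->", detect_encoding_family(encoded))
```

Once the family is known, the parser can read the xml-decl far enough to find the encoding attribute itself. Note that Python's "utf-8-sig" codec writes the EF BB BF mark on encode, which appears to match what the message assumes Notepad is doing.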
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list