This is the mail archive of the xsl-list@mulberrytech.com mailing list.


RE: Re: encoding

To: "'xsl-list at lists dot mulberrytech dot com'" <xsl-list at lists dot mulberrytech dot com>
Subject: RE: [xsl] Re: encoding
From: "Morris, Chris" <ChrisM at SNELLINGCORP dot COM>
Date: Wed, 17 Jan 2001 14:11:30 -0600
Reply-To: xsl-list at lists dot mulberrytech dot com

> Otherwise, if you have specified an encoding in your xml declaration
> and need that preserved in the generated output, then with MSXML3 this
> is rather complicated:
> 
> 1. Use transformNodeToObject() instead of transformNode()
> 
> 2. On the resulting XMLDOMDocument object issue the save() method.

I couldn't get this to work. Can you? I've seen this recommendation online in
many places, but it doesn't seem to do the trick.

I did find this post on Deja from an MS employee, and he doesn't mention the
method above.

  http://x58.deja.com/[ST_rn=ps]/threadmsg_ct.xp?AN=694903389.1&mhitnum=1

  1. The output of transformNode is always UTF-16 (regardless of the encoding
  attribute in xsl:output).  This is because transformNode outputs a BSTR, which
  is by definition a string of UTF-16 characters.

  2. To output using the encoding specified in xsl:output, you must pass an
  ISequentialStream, an IStream, an ASP Response stream, or an IPersistStream
  pointer to transformNodeToObject or XSLProcessor.output.  The correctly
  encoded bytes will be streamed out to these interfaces, at which point the
  implementation decides what to do with the bytes.  An example in an ASP page:

  xml_dom.transformNodeToObject xsl_dom, Response

  This will efficiently stream the results of the transformation to an ASP 
  response stream using UTF-8, if that is what you specified in xsl:output. 

  3. The XML spec has an "Autodetection of Character Encodings" appendix.  One
  of the byte order marks is "EF BB BF", which specifies that the document is
  encoded as UTF-8.  I assume this is the mark that Notepad is writing.  Some
  form of autodetection is absolutely necessary so that the xml-decl can be
  correctly interpreted.  For example, UCS-4 (big-endian) will encode the
  leading "<" as "00 00 00 3C" while UCS-4 (little-endian) will encode it as
  "3C 00 00 00".  Before the encoding attribute can be parsed, at least the
  encoding "family" must be determined.  The byte order mark assists in this
  autodetection process, and it is perfectly legal to begin XML documents with
  these special characters.

  ~Andy Kimball
  MSXSL Dev
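The difference between points 1 and 2 above (a string result versus bytes
written to a stream) is not unique to MSXML. As a rough analogy only, here is a
minimal sketch using Python's standard xml.etree.ElementTree instead of MSXML;
the "greeting" element and the ISO-8859-1 choice are made-up illustration
values. tostring(..., encoding="unicode") hands back a str of characters with
no output encoding applied, while ElementTree.write() to a binary stream
encodes the bytes and writes a matching XML declaration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document ("greeting" is only an illustrative name).
root = ET.Element("greeting")
root.text = "café"
tree = ET.ElementTree(root)

# String output: a Python str (characters, not bytes), so no output encoding
# has been applied yet. Loosely analogous to transformNode returning a BSTR.
as_text = ET.tostring(root, encoding="unicode")

# Stream output: the serializer encodes to bytes as it writes to a binary
# stream, honouring the requested encoding and declaring it. Loosely
# analogous to handing transformNodeToObject a stream.
buf = io.BytesIO()
tree.write(buf, encoding="ISO-8859-1", xml_declaration=True)
as_bytes = buf.getvalue()

print(as_bytes)
```

In the stream output, "é" should come out as the single byte 0xE9 rather than
the two-byte UTF-8 sequence, and the declaration names ISO-8859-1. The encoding
only takes effect at the moment characters become bytes, which is the same
reason point 1 gives for a string-returning call ignoring xsl:output.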
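Point 3 above describes how a parser settles on an encoding "family" before it
can even read the encoding attribute. Below is a minimal sketch of that check
in Python, assuming only the byte patterns listed in the XML 1.0
"Autodetection of Character Encodings" appendix. It is deliberately partial
(it skips EBCDIC, for instance), and detect_encoding_family is a hypothetical
helper name, not part of any XML library.

```python
def detect_encoding_family(head: bytes) -> str:
    """Guess an XML entity's encoding family from its first four bytes.

    Partial sketch of the XML 1.0 autodetection appendix; a real parser would
    go on to read encoding= from the declaration once the family is known.
    """
    b = head[:4]

    # 1. Byte order marks. The 4-byte UCS-4 marks are tested before the
    #    2-byte UTF-16 marks, because FF FE 00 00 also starts with FF FE.
    if b == b"\x00\x00\xfe\xff":
        return "UCS-4 big-endian (BOM)"
    if b == b"\xff\xfe\x00\x00":
        return "UCS-4 little-endian (BOM)"
    if b in (b"\x00\x00\xff\xfe", b"\xfe\xff\x00\x00"):
        return "UCS-4 unusual octet order (BOM)"
    if b[:3] == b"\xef\xbb\xbf":
        return "UTF-8 (BOM)"
    if b[:2] == b"\xfe\xff":
        return "UTF-16 big-endian (BOM)"
    if b[:2] == b"\xff\xfe":
        return "UTF-16 little-endian (BOM)"

    # 2. No BOM: look at how the leading "<" or "<?" itself is encoded.
    if b == b"\x00\x00\x00\x3c":
        return "UCS-4 big-endian"
    if b == b"\x3c\x00\x00\x00":
        return "UCS-4 little-endian"
    if b == b"\x00\x3c\x00\x3f":
        return "UTF-16 big-endian"
    if b == b"\x3c\x00\x3f\x00":
        return "UTF-16 little-endian"
    if b == b"\x3c\x3f\x78\x6d":  # "<?xm" in ASCII
        return "ASCII-compatible (UTF-8, ISO-8859-x, ...)"

    # Anything else: treat as UTF-8 without a declaration (or broken input).
    return "UTF-8 (assumed)"


for sample in (
    "<?xml".encode("utf-8-sig"),  # EF BB BF prefix, presumably what Notepad writes
    "<?xml".encode("utf-16-be"),
    "<?xml".encode("utf-32-le"),
):
    print(sample[:4].hex(), "->", detect_encoding_family(sample))
```

The "utf-8-sig" codec is used here simply because it prepends the same
EF BB BF mark discussed in point 3; the UCS-4 samples show why the first four
bytes alone are enough to tell big-endian from little-endian.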

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
