This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Special entity characters in Shift-JIS XSL.
- To: xsl-list at mulberrytech dot com
- Subject: RE: Special entity characters in Shift-JIS XSL.
- From: kbagepalli at informatica dot com
- Date: Thu, 16 Mar 2000 14:32:15 -0800
- Reply-To: xsl-list at mulberrytech dot com
Is there a way in which I can specify UTF-8 encoding and still output a pure
ASCII sequence? I should be able to view the file in any text editor, so can
I code all the characters as &#nnnn; references?
Kiran
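A minimal sketch of the idea in the question, in Python rather than XSLT: keep
the text as Unicode internally, but serialize it as 7-bit ASCII by replacing
every non-ASCII character with a decimal &#nnnn; reference. The function name
to_ascii_xml is hypothetical and used only for illustration. In XSLT itself,
declaring an ASCII output encoding on xsl:output is the usual way to ask for
the same thing, but whether a given encoding name is supported depends on the
processor.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text: str) -> str:
    """Return `text` as ASCII-only XML character data (hypothetical helper).

    Markup characters (&, <, >) are escaped first; every character outside
    ASCII is then replaced by a decimal numeric character reference.
    """
    escaped = escape(text)
    # 'xmlcharrefreplace' turns each unencodable character into &#nnnn;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_xml("caf\u00e9 & \u65e5\u672c"))
```

Any XML parser reading the result should report the same characters as it
would for the directly encoded text, so the file stays readable in a plain
ASCII editor without changing its meaning as XML.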
-----Original Message-----
From: David Carlisle [mailto:davidc@nag.co.uk]
Sent: Friday, December 17, 1999 2:05 AM
To: xsl-list@mulberrytech.com
Subject: Re: Special entity characters in Shift-JIS XSL.
> I think the OPPOSITE of flaky is the word I would use to describe an entity
> identification paradigm that allows the entity to remain in its encoded
> form, yet still be identified as an entity. I think solid is more the word.
You could build a solid system on that basis, but it wouldn't be XML.
> how can it then be passed to any more parsers expecting 7-bit ASCII
> characters?
The XML character set is _always_ Unicode. If the encoding isn't the default
UTF-8 or UTF-16, not all of the character set may be directly accessible as
character data, but you can always use the &#nnnn; syntax to access any
Unicode character. An XML parser _has_ to treat `&#65;' and `A' in an
identical manner and report `character number 65' to the application,
whichever version was in the input file. If your application _needs_
to see `&#65;' and not `A' then it isn't an XML application (it could be
an SGML one).
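A small check of this point, using Python's standard xml.etree.ElementTree
parser as one example of a conforming parser. The helper name char_data is
hypothetical. The literal letter, the decimal reference for character 65, and
the hexadecimal reference for the same character should all reach the
application as the same single character.

```python
import xml.etree.ElementTree as ET


def char_data(xml_source: str) -> str:
    """Parse a one-element document and return its character data."""
    return ET.fromstring(xml_source).text


literal = char_data("<c>A</c>")          # character written directly
decimal_ref = char_data("<c>&#65;</c>")  # decimal character reference
hex_ref = char_data("<c>&#x41;</c>")     # hexadecimal character reference

if __name__ == "__main__":
    # The application cannot tell which form was in the input file.
    print(literal == decimal_ref == hex_ref, ord(decimal_ref))
```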
> What if each of those parsers followed the spec, the first
> transforming the character into a 2-byte unicode character, leaving the
> others to see the two bytes as simply two different characters in the
> stream?
This can't happen, as in a well-formed XML document you _always_ know
if a multi-byte encoding is being used. Either the <?xml declaration
specifies a single-byte encoding such as Latin-1, or a multi-byte
encoding is being used (UTF-8 unless the first two bytes of the file are
the BOM, in which case it's UTF-16).
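The rule described above can be sketched roughly as follows. This is a
deliberately simplified illustration, not a full implementation of the
detection procedure in the XML specification: it assumes the input is either
UTF-16 with a BOM or an ASCII-compatible encoding, and it ignores cases such
as UTF-16 without a BOM or EBCDIC. The function name guess_xml_encoding is
hypothetical.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    # A UTF-16 byte order mark (either byte order) settles the question.
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"
    # Skip an optional UTF-8 BOM before looking for the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _DECL.match(data)
    if match:
        # The declaration names the encoding explicitly, e.g. ISO-8859-1.
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: the default is UTF-8.
    return "utf-8"


if __name__ == "__main__":
    print(guess_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
    print(guess_xml_encoding(b'<?xml version="1.0"?><a/>'))
```

Because the decision is made once, up front, a later stage never has to guess
whether two adjacent bytes are one character or two.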
David
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list