This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: How to read the encoding of an XML document
At 01:33 PM 10/25/2001 -0700, Christopher R. Maden wrote:
>At 12:59 25-10-2001, James Garriss wrote:
>>Ok. If you recall, I started this discussion by mentioning that I am
>>receiving XML documents from several European countries. So the
>>pertinent question for me is "if UTF-8 and/or UTF-16 will be the output
>>encoding set I must use, will they handle characters from the languages I
>>care about?"
>>
>>So it seems to me that I should be safe outputting my data to
>>UTF-16. That make sense?
>
>Yes. UTF-8 and UTF-16 both cover the entire Unicode repertoire. The
>difference is that UTF-8 uses a different number of bytes for
>different characters, while UTF-16 uses 2 bytes for most characters. For
>European content, UTF-8 is usually a win; for Asian content, UTF-16 is
>generally better. But either can represent the entire Unicode repertoire.
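The byte-count difference described above can be checked with a minimal sketch, assuming Python 3. The helper name byte_lengths and the sample characters are illustrative choices, not anything from the thread. UTF-16 is encoded as "utf-16-be" so that no byte order mark is counted.

```python
def byte_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-be" is used instead of "utf-16" so no BOM is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

# Illustrative sample characters (assumed, not from the original post).
samples = {
    "ASCII letter e": "e",
    "Latin e-acute": "\u00e9",
    "Greek alpha": "\u03b1",
    "CJK ideograph": "\u4e2d",
    "Emoji outside the BMP": "\U0001F600",
}

for name, ch in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Characters outside the Basic Multilingual Plane take 4 bytes in UTF-16, which is why the claim is "2 bytes for most characters" rather than all of them.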
I've been looking at a lot of European web pages, viewing source to see
what charset they define in the HTML META tag. The majority use
iso-8859-1, but a few don't. Most notably Turkey and Greece have character
sets that are quite different. How do I determine if UTF-16 (or UTF-8)
will work for those languages?
--James
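
One way to test the question above empirically is to try encoding sample text in each candidate encoding. This is a minimal sketch, assuming Python 3 and its standard codec names. The helper can_encode and the sample strings are hypothetical, chosen only to include letters that iso-8859-1 lacks.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Sample strings written with escapes so the source file's own encoding
# does not matter. Both are illustrative assumptions.
greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"   # Greek word for "Greek"
turkish = "\u0130\u015f\u011f\u0131"                         # dotted I, s-cedilla, g-breve, dotless i

for label, text in (("Greek", greek), ("Turkish", turkish)):
    for enc in ("utf-8", "utf-16", "iso-8859-1", "iso-8859-7", "iso-8859-9"):
        print(f"{label} in {enc}: {can_encode(text, enc)}")
```

Under these assumptions, both samples encode in UTF-8 and UTF-16 but fail in iso-8859-1. The Greek sample also fits iso-8859-7 and the Turkish sample fits iso-8859-9, which is consistent with pages from those countries declaring a different charset.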
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list