This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: How to read the encoding of an XML document


At 01:33 PM 10/25/2001 -0700, Christopher R. Maden wrote:
>At 12:59 25-10-2001, James Garriss wrote:
>>Ok.  If you recall, I started this discussion by mentioning that I am 
>>receiving XML documents from several European countries.  So the 
>>pertinent question for me is "if UTF-8 and/or UTF-16 will be the output 
>>encoding set I must use, will they handle characters from the languages I 
>>care about?"
>>
>>So it seems to me that I should be safe outputting my data to 
>>UTF-16.  Does that make sense?
>
>Yes.  UTF-8 and UTF-16 both cover the entire Unicode repertoire.  The 
>difference is that UTF-8 uses one to four bytes per character, depending 
>on the character, while UTF-16 uses 2 bytes for most characters and 4 for 
>the rest.  For European content, UTF-8 is usually a win; for Asian content, 
>UTF-16 is generally better.  But either can represent the entire Unicode 
>repertoire.
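
To make the size difference concrete, here is a minimal Python sketch that 
counts the encoded bytes for a few sample characters.  The helper name 
encoded_sizes and the choice of samples are assumptions made for 
illustration; they are not part of the original thread.

```python
# Compare how many bytes one character needs in UTF-8 and in UTF-16.
# encoded_sizes is a hypothetical helper written for this illustration.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    # "utf-16-be" is used so the count excludes the 2-byte BOM that the
    # plain "utf-16" codec would prepend.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

samples = {
    "ASCII letter A": "\u0041",
    "Latin letter e-acute": "\u00e9",
    "Greek letter alpha": "\u03b1",
    "Turkish letter g-breve": "\u011f",
    "CJK ideograph": "\u4e2d",
}

for label, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

For mostly-ASCII markup with occasional accented letters, the 1-byte ASCII 
case dominates, which is why UTF-8 tends to come out smaller for European 
documents.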

I've been looking at a lot of European web pages, viewing source to see 
what charset they declare in the HTML META tag.  The majority use 
iso-8859-1, but a few don't.  Most notably, pages from Turkey and Greece 
use character sets that are quite different.  How do I determine if UTF-16 
(or UTF-8) will work for those languages?

--James
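
One rough way to check this empirically is to decode some sample text from 
the legacy charset and confirm that it survives a round trip through UTF-8 
and UTF-16.  The sketch below is in Python and assumes the Greek and 
Turkish pages use iso-8859-7 and iso-8859-9 respectively; the function name 
survives_unicode and the sample strings are hypothetical, chosen only for 
illustration.

```python
# Check that text in a legacy charset can be carried losslessly in
# UTF-8 and UTF-16.  survives_unicode is a hypothetical helper.

def survives_unicode(raw, legacy_charset):
    """Decode legacy bytes, then round-trip the text through both UTFs."""
    text = raw.decode(legacy_charset)
    for target in ("utf-8", "utf-16"):
        # Encoding raises UnicodeEncodeError if a character cannot be
        # represented; the comparison catches any silent alteration.
        if text.encode(target).decode(target) != text:
            return False
    return True

# Sample bytes as they might arrive from a Greek or a Turkish page
# (assumed charsets, see the note above).
greek_bytes = "Ελληνικά".encode("iso-8859-7")
turkish_bytes = "Türkçe ğış İ".encode("iso-8859-9")

print(survives_unicode(greek_bytes, "iso-8859-7"))
print(survives_unicode(turkish_bytes, "iso-8859-9"))
```

This only tests the samples it is given, so it is a sanity check rather 
than a proof, but it fits with the point quoted above that both encodings 
cover the entire Unicode repertoire.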


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

