This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: AW: Problem parsing cp1252 with msxsl > UTF-8 ?


"Braumüller, Hans" wrote:
> what i don't understand regarding UTF-8 is that although it has a bigger
> charset, you cannot use special german characters like ü, ö, ä, so
> we must continue for german with encoding="iso-8859-1".
> 
> What am i missing?

Any time you save a text file or transmit it over a network, you have a series
of bytes representing the characters in the document. The encoding is how
those bytes map to characters.

UTF-8 maps each of the roughly 1.1 million Unicode code points to a sequence
of 1 to 4 bytes. You certainly do have those German characters available in
UTF-8, but instead of being mapped to 1 byte each, as they would be in
iso-8859-1, they are mapped to 2 bytes each.
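
To see the difference in bytes, here is a small Python sketch (Python is just
my choice for illustration; the names text and byte_lengths are made up for
this example). It encodes the same three characters both ways and counts the
bytes each character takes.

```python
# The three characters from the question, written as escapes so this
# source file itself contains only ASCII: u-umlaut, o-umlaut, a-umlaut.
text = "\u00fc\u00f6\u00e4"

# Same characters, two different byte representations.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")


def byte_lengths(s, encoding):
    """Return the number of bytes each character of s takes in encoding."""
    return [len(ch.encode(encoding)) for ch in s]


print(latin1_bytes)                      # one byte per character
print(utf8_bytes)                        # two bytes per character
print(byte_lengths(text, "iso-8859-1"))
print(byte_lengths(text, "utf-8"))
```

The characters are identical in both cases; only the bytes that represent
them differ.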

A text editor that doesn't tell you what encoding it is using when you save
the document is probably relying on the underlying OS to make
encoding/decoding decisions, and it probably isn't using Unicode internally at
all; rather it just manages buffers of bytes fed to it by the OS. Solution: 
get a smarter text editor that lets you choose the encoding to save files 
with.

The encoding declaration in an XML document must reflect the actual
encoding used *throughout* the file. You must not save a file with all the
characters encoded as iso-8859-1 bytes while having encoding="utf-8" in the
file, for example. You must also avoid mixing encodings in the same file (some
characters using one encoding, some using another).
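
A rough Python illustration of what goes wrong when the declaration and the
bytes disagree (the document string and the helper decodes_as are invented
for this example, and plain byte decoding stands in for what a real XML
parser does with the declared encoding):

```python
# A tiny document that declares utf-8 and contains one u-umlaut (U+00FC).
doc = '<?xml version="1.0" encoding="utf-8"?><name>M\u00fcller</name>'

# Saved correctly: the bytes really are UTF-8, as declared.
saved_as_utf8 = doc.encode("utf-8")

# Saved incorrectly: the bytes are iso-8859-1, but the declaration
# inside the file still claims utf-8.
saved_as_latin1 = doc.encode("iso-8859-1")


def decodes_as(data, encoding):
    """Return True if data is a valid byte sequence in encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(saved_as_utf8, "utf-8"))    # declaration matches the bytes
print(decodes_as(saved_as_latin1, "utf-8"))  # lone 0xFC is not valid UTF-8

# The opposite mismatch fails silently: UTF-8 bytes read as iso-8859-1
# decode without error, but the umlaut turns into two wrong characters.
garbled = saved_as_utf8.decode("iso-8859-1")
print(garbled)
```

The second case is the nastier one in practice, since nothing reports an
error and the garbage only shows up in the output.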

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

