This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: encoding woes: ISO-8859-1 vs. UTF-8


>   I am confused by the recent behavior described
> below regarding encoding.  I have a string "oLogo"
> in CSV, with those two weird characters actually being
> “ and ”, characters in General Punctuation
> II.
>   Here are the steps I am going through, consistently
> using ISO-8859-1 for encoding:

You can't be using ISO-8859-1 to encode the characters “ and ”.

ISO-8859-1 can only encode the characters in the range 0-255, and these
two characters are 8220 and 8221.

Perhaps you were using some proprietary Microsoft 8-bit encoding that
includes these two characters?
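
A minimal Python sketch of the point above. It assumes the Microsoft
encoding in question is Windows-1252 (Python codec name "cp1252"), which is
a guess and not something established in this thread; the helper name
can_encode and the sample string are likewise only for illustration.

```python
# Code points of the two curly quotes discussed above.
LEFT_QUOTE = "\u201c"   # decimal 8220
RIGHT_QUOTE = "\u201d"  # decimal 8221
SAMPLE = LEFT_QUOTE + "Logo" + RIGHT_QUOTE


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(ord(LEFT_QUOTE), ord(RIGHT_QUOTE))   # 8220 8221
print(can_encode(SAMPLE, "iso-8859-1"))    # False
# Assumption: cp1252 stands in for the "proprietary Microsoft 8-bit encoding".
print(can_encode(SAMPLE, "cp1252"))        # True
print(SAMPLE.encode("cp1252").hex(" "))    # 93 4c 6f 67 6f 94
```

If the CSV file really is in Windows-1252, the two quotes would appear as
the single bytes 93 and 94 around the word.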

Rather than showing us what the CSV file looks like on your screen
(which depends entirely on the software used to display it), it might
help to show us what it looks like in hex.
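
One way to produce such a hex view is a small Python helper like the sketch
below. The function name hex_dump, the sample bytes, and the file name
mentioned in the comment are all hypothetical; any hex viewer would do the
same job.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


# Sample bytes for illustration only. For a real file, read it in binary
# mode, e.g. open("data.csv", "rb").read() (hypothetical file name).
sample_bytes = b"\x93Logo\x94,42\r\n"
print(hex_dump(sample_bytes))
```

In such a dump, single bytes 93 and 94 around the word would be consistent
with Windows-1252, whereas the sequences e2 80 9c and e2 80 9d would
indicate that the file is actually UTF-8.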

> A. Import CSV
>   1. convert CSV to generic XML: the string did not
> change, stayed "oLogo".
>   2. saxon convert generic XML to proprietary XML:
> string got converted to "&#8220;Logo&#8221;";
>   3. import successful

This looks as if everything is OK so far, although the original CSV file
can't have been in iso-8859-1 as you claim.

> B. Export into CSV
>   1. pull from MSSQL7 to proprietary XML: "oLogo"
>   2. saxon convert proprietary XML to CSV: exception
> org.xml.sax.SAXException: Output character not
> available in this encoding (decimal 8220)
>   Why does it work going one way and not the other?

When you use 

<xsl:output method="text" encoding="iso-8859-1"/>

you can only output the characters available in iso-8859-1, namely the
XML characters in the range 0-255.
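
The asymmetry can be sketched in Python as an analogy. This is not Saxon's
code; it only mimics the two behaviors seen in steps A.2 and B.2, and the
function names are hypothetical. It relies on the fact that XML output can
fall back to numeric character references, while plain text output has no
such escape mechanism. The last two calls show two possible ways out that
were not proposed in this thread: replacing the curly quotes with ASCII
quotes, or choosing an encoding such as UTF-8 that contains them.

```python
sample = "\u201cLogo\u201d"  # curly quotes 8220 and 8221 around "Logo"


def encode_strict(text: str, encoding: str = "iso-8859-1") -> bytes:
    """Like text output: fail if a character is missing from the encoding."""
    return text.encode(encoding)


def encode_with_char_refs(text: str, encoding: str = "iso-8859-1") -> bytes:
    """Like XML output: write missing characters as &#NNNN; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def to_ascii_quotes(text: str) -> str:
    """Replace the two curly quotes with plain ASCII double quotes."""
    return text.translate({0x201C: '"', 0x201D: '"'})


try:
    encode_strict(sample)
except UnicodeEncodeError as exc:
    # Reports the first offending character, decimal 8220 here.
    print("not available in this encoding:", ord(sample[exc.start]))

print(encode_with_char_refs(sample))            # b'&#8220;Logo&#8221;'
print(encode_strict(to_ascii_quotes(sample)))   # b'"Logo"'
print(encode_strict(sample, encoding="utf-8"))  # succeeds in UTF-8
```

Which of the two options fits depends on what the consumer of the CSV file
expects to read.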

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

