This is the mail archive of the kawa@sourceware.org mailing list for the Kawa project.
Re: Encoding and unescaped-data
- From: Luis Casillas <casillas at mercedsystems dot com>
- To: Daniel Terhorst <daniel dot s dot terhorst at gmail dot com>, <kawa at sourceware dot org>
- Date: Wed, 22 Aug 2007 13:36:05 -0700
- Subject: Re: Encoding and unescaped-data
Are you specifying anywhere that you want the output in UTF-8? If no
output encoding is specified, Java picks one based on your locale; if
that isn't UTF-8, characters it cannot encode are output as "?". If you
are on a Unix variant, what is the value of your $LANG environment variable?

(I once ran into a problem where Kawa would not compile source files
containing UTF-8 characters correctly when the locale wasn't set to UTF-8;
this is probably related.)
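
To check whether the locale default is the culprit, a small Java sketch like the one below may help. It is not Kawa code and makes no claim about how Kawa writes its output; the class name EncodingCheck and the helper roundTrip are made up for illustration. It prints the JVM's default charset and shows that a charset which cannot represent "λ" replaces it with "?" when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingCheck {
    // Encode text with the given charset, then decode the bytes back.
    // Characters the charset cannot represent come back as its
    // replacement character, which is "?" for US-ASCII.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Written as a Unicode escape so the source file's own encoding
        // cannot affect the result.
        String lambda = "\u03bb";

        // This is what Java falls back to when no encoding is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        // UTF-8 can represent the character; it encodes to two bytes.
        byte[] utf8 = lambda.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + (utf8[0] & 0xff) + " " + (utf8[1] & 0xff));

        // US-ASCII cannot, so the character is lost.
        System.out.println("US-ASCII round trip: "
                + roundTrip(lambda, StandardCharsets.US_ASCII));
    }
}
```

If the first line reports something other than UTF-8, starting the JVM with -Dfile.encoding=UTF-8 or setting $LANG to a UTF-8 locale (for example en_US.UTF-8) would be the first things I would try.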
On 8/22/07 1:05 PM, "Daniel Terhorst" <daniel.s.terhorst@gmail.com> wrote:
> I'm running into a bit of unexpected behavior. I've been able to solve
> most of my encoding problems with help from the list archives, but I'm
> not sure yet how to get around this one.
>
> My application loads a UTF-8 HTML file and simply sends it to the
> server as unescaped-data. If I send it without unescaping the data,
> all special characters are handled correctly, although the HTML
> remains escaped, of course. But all special characters seem to be
> sent as question marks if I use unescaped-data.
>
> I've attempted to distill the essence of the problem below. I've
> simply used a string with the correct bytes, which exhibits the same
> behavior.
>
> --------
> ;; -*- scheme -*-
> (define (bytes->string/utf-8 bytes)
>   (<string> (<java.lang.String> bytes 0 bytes:length "UTF-8")))
>
> (let* ((data (string-append "<b>"
>                             (bytes->string/utf-8 (<byte[]> 206 187))
>                             "</b>")))
>   (values-append data ", " (unescaped-data data)))
> #\newline
> --------
>
> The page source output is:
>
>   <b>λ</b>, <b>?</b>
>
> But I expected:
>
>   <b>λ</b>, <b>λ</b>
>
> or maybe:
>
>   <b>λ</b>, <b>λ</b>
>
> Is there a way to get the desired output?
>
> Thanks,
> -- Daniel Terhorst