This is the mail archive of the kawa@sources.redhat.com mailing list for the Kawa project.



kawa-java string compatibility proposal


Hi folks,

My experience with Kawa thus far has, of course, been wonderful in
nearly all respects.  There have been few stumbling blocks, but
one of the more notable ones has been the incompatibility between
Kawa and Java strings.

This issue has been more frustrating than you might expect, because
the two types are at least partly compatible.  Some operations work
and others don't, and the incompatibilities aren't documented anywhere
that I could find.  So it's quite easy to be confused, and even after
you've got it figured out, having to think about it as you code is still
(I feel) rather inconvenient.

I think the best solution is to try to make Java and Kawa strings
seamlessly interoperable, so I'd like to offer a proposal for your
consideration.  The proposal addresses several areas of concern:

  - passing Scheme strings to Java methods that expect Java strings
  - invoking java.lang.String methods on Scheme strings
  - applying non-destructive Scheme string procedures to Java strings
  - applying destructive Scheme string procedures to Java strings

I've done my best to outline the problems I've encountered, and I've 
made suggestions aimed at alleviating each of the problems.

1) Passing Scheme strings to Java methods

Overall, this works pretty well.  When a Java method expects a
String argument and you pass a Kawa string, the call appears to work
most of the time:

(invoke-static <java.lang.Integer> 'parseInt "3")
=> 3

(invoke (make <java.lang.StringBuffer>) 'append "foo")
=> foo

There's no particular reason this should work, but it does, so I
assume Kawa is doing some sort of automatic conversion from
gnu.lists.FString to java.lang.String.

The only glitch in this otherwise very convenient behavior occurs
when a Java method expects a CharSequence, a superinterface of
java.lang.String that was added in Java 1.4.  In that case, Kawa
doesn't perform the conversion:

(let ((pattern (java.util.regex.Pattern:compile "(f+)(o+)")))
  (invoke pattern 'matcher "foo"))
Argument #1 ((f+)(o+)) to
'java.util.regex.Pattern.matcher(java.lang.CharSequence)' has wrong type
(java.util.regex.Pattern) (expected: java.lang.CharSequence)

The error message above seems a bit confused: it reports the receiver
(the Pattern) as argument #1, whereas the offending argument is the
gnu.lists.FString "foo".  In any case, Kawa doesn't do the expected
conversion.  (When I say "expected", it's not a criticism of Kawa;
it's just what a naive user like me expects, because in Java a String
can be passed anywhere a CharSequence is expected.)

A workaround for the glitch is to convert the argument to a Java
string manually, e.g. by invoking 'toString on it, or by using
(make <java.lang.String> ...):

(let ((pattern (java.util.regex.Pattern:compile "(f+)(o+)")))
  (invoke pattern 'matcher (invoke "foo" 'toString)))

=> java.util.regex.Matcher[pattern=(f+)(o+) region=0,3 lastmatch=]

But this workaround is inconvenient, because most of the time
it's unnecessary.

By happy coincidence, gnu.lists.FString implements Serializable
and Comparable, which are the only other two superinterfaces of
java.lang.String.  So the *only* time Kawa strings aren't converted
seamlessly in Java method invocations is when a CharSequence is expected.

We might be able to make the behavior fully seamless by having
gnu.lists.FString implement CharSequence.  There are only four
methods on CharSequence, and FString already implements three of
them.  The fourth is the subSequence() method, which could simply
call substring() and return the result.
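To make the idea concrete, here is a rough Java sketch.  MutableString
is a hypothetical, heavily simplified stand-in for gnu.lists.FString (a
fixed-length char array), not FString's actual code; the point is only
that once length(), charAt() and toString() exist, subSequence() can
just delegate to substring(int, int), and the class then covers all
three of java.lang.String's superinterfaces.

```java
import java.io.Serializable;

/**
 * Hypothetical, simplified stand-in for gnu.lists.FString: a fixed-length,
 * mutable string backed by a char array (not FString's real implementation).
 */
class MutableString implements CharSequence, Comparable<MutableString>, Serializable {
    private final char[] data;

    MutableString(String s) { data = s.toCharArray(); }

    // Methods an FString-like class already has.
    public int length() { return data.length; }
    public char charAt(int index) { return data[index]; }
    public void setCharAt(int index, char c) { data[index] = c; }
    public MutableString substring(int start, int end) {
        return new MutableString(new String(data, start, end - start));
    }
    @Override public String toString() { return new String(data); }

    // The one missing CharSequence method simply delegates to substring().
    @Override public CharSequence subSequence(int start, int end) {
        return substring(start, end);
    }

    // Lexicographic comparison, same ordering as String.compareTo.
    @Override public int compareTo(MutableString other) {
        return toString().compareTo(other.toString());
    }
}
```

With that in place, an instance can be handed straight to a
CharSequence parameter, e.g.
Pattern.compile("(f+)(o+)").matcher(new MutableString("foo")),
with no toString() call.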

2) Invoking java.lang.String methods on Kawa strings

A Kawa string can act as the receiver for methods of class
java.lang.Object:

(invoke "foo" 'hashCode)
=> 101574

...but not, alas, for most methods of class java.lang.String:

(invoke "foo" 'toUpperCase)
java.lang.RuntimeException: invoke: no method named `toUpperCase' in
class gnu.lists.FString

You typically have to convert the Scheme string to a Java string first:

(invoke (invoke "foo" 'toString) 'toUpperCase)
=> FOO

However, sometimes it works!  In particular, it works whenever
gnu.lists.FString just happens to implement the method you're invoking:

(invoke "foobar" 'charAt 3)
=> #\b

gnu.lists.FString implements several methods that are identical to their
java.lang.String counterparts, including charAt, compareTo, getChars,
length, substring(int, int), toCharArray, and of course all the
java.lang.Object methods.

This partial compatibility is a bit misleading:  Kawa appears to be
whimsically honoring some valid-looking invocations, but not others.
This is a gross misconception, of course; you're actually at the mercy
of random chance, not malicious whimsy.  But the net effect is much
the same.

Equally confusing are methods like indexOf() and lastIndexOf(),
which happen to have the same semantics as the java.lang.String
versions -- but only for some argument types:

(invoke "foobar" 'indexOf #\b)
=> 3

(invoke "foobar" 'indexOf "bar")
=> -1

substring() is also a bit confusing, because the 2-arg version works:

(invoke "foobar" 'substring 1 3)
=> oo

But the 1-arg version does not:

(invoke "foobar" 'substring 3)
gnu.mapping.WrongArguments: call to 'gnu.lists.FString.substring(int,int)' 
has too few arguments (2; must be 3)

You can work around this inconsistency (or random accidental 
consistency, if you prefer) by being very careful about how you
handle Java and Kawa strings:  as careful, say, as you might be
in handling pointers in a C program.  But unlike in C, Java and
Scheme both have very robust string support, and most programmers
aren't accustomed to needing that kind of care in their string
manipulation.

It also bulks up your code, even if you create functions that do 
the conversion for you:

(define (kstring->jstring arg)
  (invoke arg 'toString))

(invoke (kstring->jstring "foobar") 'indexOf "bar")
=> 3

This extra code is a distraction that one would like to avoid.
Ideally, the conversion from Kawa to Java strings should happen
automatically as necessary, to minimize confusion and inconvenience.

How do you solve it?  Well, all Java string operations are
supportable by gnu.lists.FString, so the trick is to find the
best way to supply them.

I don't know what the best implementation is.  One direct approach
is to implement all of the remaining java.lang.String methods 
on gnu.lists.FString.  The only downside is that when Sun adds new
methods to java.lang.String in subsequent JDK releases, FString must
be updated to include those methods.  

It may also be possible to detect that the receiver is an FString
and the method is a String method, and to generate code that converts
the receiver to a java.lang.String first.  This would be slower, but
more robust, since new String methods would be picked up automatically.

The first approach seems preferable, since methods aren't often 
added to class java.lang.String.
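For comparison, the convert-first approach can be approximated at run
time with reflection.  This is only a sketch of the idea, not how
Kawa's invoke works (Kawa would presumably make the decision at
compile time), and StringFallbackInvoker with its deliberately naive
overload matching (exact int/char parameters only) is hypothetical.
If the receiver's class has no matching method but the receiver is a
CharSequence other than a String, it is converted with toString() and
the lookup is retried on java.lang.String.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: invoke a method by name, falling back to java.lang.String. */
final class StringFallbackInvoker {
    static Object invoke(Object receiver, String name, Object... args) {
        String originalClass = receiver.getClass().getName();
        Method m = findMethod(receiver.getClass(), name, args);
        if (m == null && receiver instanceof CharSequence && !(receiver instanceof String)) {
            // e.g. an FString: convert to a java.lang.String and retry the lookup there
            receiver = receiver.toString();
            m = findMethod(String.class, name, args);
        }
        if (m == null) {
            throw new IllegalArgumentException("no method named " + name + " in " + originalClass);
        }
        try {
            return m.invoke(receiver, args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Naive overload resolution: first public method with the right name and arity
    // whose parameter types accept the given arguments.
    private static Method findMethod(Class<?> c, String name, Object[] args) {
        for (Method m : c.getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != args.length) continue;
            Class<?>[] params = m.getParameterTypes();
            boolean ok = true;
            for (int i = 0; i < params.length && ok; i++) ok = accepts(params[i], args[i]);
            if (ok) return m;
        }
        return null;
    }

    private static boolean accepts(Class<?> param, Object arg) {
        if (arg == null) return !param.isPrimitive();
        if (param == int.class) return arg instanceof Integer;
        if (param == char.class) return arg instanceof Character;
        if (param.isPrimitive()) return false; // other primitive types not handled in this sketch
        return param.isInstance(arg);
    }
}
```

Because the fallback looks methods up on java.lang.String itself, a
method Sun adds in a later JDK would be found without touching FString;
the price is a reflective lookup plus a string copy on every such call.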

3) Applying non-destructive Scheme string procedures to Java strings

At the moment, none of the Scheme string procedures accept Java
strings as arguments.  E.g.:

(string-length (make <java.lang.String> "foo"))
Argument #1 (foo) to 'string-length' has wrong type (java.lang.String)

For string procedures that don't modify their arguments, it should be
straightforward to change their definitions to work on Java strings as
well:

(string-length (make <java.lang.String> "foo"))
=> 3

Presumably Scheme string procedures that *return* strings would always
return Scheme strings, e.g.:

(string-copy (make <java.lang.String> "foobar"))
=> "foobar"

The return value of string-copy would be a Scheme string.
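A minimal Java sketch of that rule, assuming the CharSequence change
from section 1 so that both kinds of string are CharSequences: each
procedure takes a CharSequence, and any procedure that returns a string
returns a fresh mutable one.  SchemeStrings and its method names are
hypothetical, and StringBuilder stands in for gnu.lists.FString.

```java
/** Hypothetical Scheme string procedures that accept any CharSequence. */
final class SchemeStrings {
    // string-length: works for java.lang.String and for mutable strings alike.
    static int stringLength(CharSequence s) { return s.length(); }

    // string-ref
    static char stringRef(CharSequence s, int k) { return s.charAt(k); }

    // string-copy: always returns a fresh mutable string, whatever it was given.
    // StringBuilder stands in for gnu.lists.FString in this sketch.
    static StringBuilder stringCopy(CharSequence s) { return new StringBuilder(s); }

    // string-append: inputs may be any mix of string kinds; the result is mutable.
    static StringBuilder stringAppend(CharSequence... parts) {
        StringBuilder result = new StringBuilder();
        for (CharSequence p : parts) result.append(p);
        return result;
    }
}
```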

For fully seamless behavior, the string? type predicate should return
true for Java strings:

(string? (make <java.lang.String> "foo"))
=> #t

It might be nice (albeit optional) to offer two new predicates for
distinguishing the two types:

(java-string? (make <java.lang.String> "foo"))
=> #t
(java-string? "foo")
=> #f
(kawa-string? "foo")
=> #t
(kawa-string? (make <java.lang.String> "foo"))
=> #f
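The predicates themselves would be cheap.  In this hypothetical sketch
(StringBuilder again standing in for gnu.lists.FString), string? is
defined as the union of the two narrower tests rather than as
"instanceof CharSequence", so that other CharSequence implementations
such as java.nio.CharBuffer are not silently treated as Scheme strings.

```java
/** Hypothetical implementations of string?, java-string? and kawa-string?. */
final class StringPredicates {
    // string?: true for both kinds of string.
    static boolean isString(Object o) { return isJavaString(o) || isKawaString(o); }

    // java-string?
    static boolean isJavaString(Object o) { return o instanceof String; }

    // kawa-string?: StringBuilder stands in for gnu.lists.FString in this sketch.
    static boolean isKawaString(Object o) { return o instanceof StringBuilder; }
}
```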

Another approach would be to have string? return #f for Java strings.
However, I believe this would make life inconvenient for developers.
After we've gone through the exercise of making Kawa and Java strings
seamlessly interoperable, most code that dispatches on type will have
to check for both cases:

(if (or (string? arg) (instance? <java.lang.String> arg)) ...)

Thus I feel it would be better to have string? return true for Java
strings.  This is something of a philosophical position:  one that
views strings as a "fundamental" data type that should be treated as
consistently as possible across the Kawa/Java boundary, in much the
same fashion as numbers, booleans and characters:

;; characters are converted seamlessly, going from kawa to java:
(invoke-static <java.lang.Character> 'isDigit #\9)
=> #t

;; ...and from java to kawa:
(char-upcase (invoke (make <java.lang.String> "foo") 'charAt 0))
=> #\F

;; so are booleans:
(if (static-field <java.lang.Boolean> 'FALSE) "yes" "no")
=> "no"

;; in both directions:
(java.lang.String:valueOf #t)
=> true

;; and so (mostly) are numbers:
(+ (invoke-static <java.lang.Integer> 'parseInt "2")
   (invoke-static <java.lang.Integer> 'parseInt "3"))
=> 5

I'll be the first to admit I haven't done the multibyte analysis here,
but my gut tells me that my proposal won't make multibyte any harder
than it already is.

4) Applying destructive Scheme string procedures to Java strings

This is, of course, the tricky use case, as Java strings are
immutable and Scheme strings are mutable:

(let ((str "foobar"))
  (string-set! str 5 #\z)
  str)
=> "foobaz"

I don't have a strong preference for the semantics of destructive
operations on Java strings.

One simple approach is to throw an exception, and require the user to
convert to a Kawa string (e.g. via string-copy) before any destructive
operations will work.

Another approach is to convert the Java string to an FString any time
one of these operations is invoked.  I don't know how difficult this
is to implement -- there could be many references to the string, and
they'd somehow need to be updated to refer to the new object.

One can imagine more elaborate approaches -- for instance, wrapping
all Java strings internally by a StringBuffer or some other wrapper
type, so external references to a string don't need to be updated if
you modify it.  I personally doubt it's worth the effort.

I'm open to suggestions here.  My inclination is to throw an
exception, indicating that modifications of Java strings aren't
supported, and perhaps suggesting that string-copy should be used 
to convert to a Scheme string first.  This approach has clean (if
non-ideal) semantics, and would allow us to get most of the
other compatibility cases resolved quickly.
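Here is a sketch of the throw-an-exception option for string-set!.
StringMutators and stringSet are hypothetical names, StringBuilder
stands in for gnu.lists.FString, and the exception type and message are
just one possible choice.

```java
/** Hypothetical string-set! that refuses to modify immutable Java strings. */
final class StringMutators {
    static void stringSet(CharSequence s, int k, char c) {
        if (s instanceof StringBuilder) {
            // StringBuilder stands in for gnu.lists.FString, which is mutable.
            ((StringBuilder) s).setCharAt(k, c);
        } else if (s instanceof String) {
            throw new UnsupportedOperationException(
                "string-set!: cannot modify a Java string; use string-copy to get a Scheme string first");
        } else {
            throw new IllegalArgumentException("string-set!: not a string: " + s.getClass().getName());
        }
    }
}
```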

I'd appreciate your feedback on all this.

-steve

