This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Re: mbstrings


Jens-Ulrik Holger Petersen <petersen@kurims.kyoto-u.ac.jp> writes:
> I am in favour of the idea of using Unicode, but just for the sake of
> completeness I would like to mention that the XEmacs-20 implementation
> of Mule does use characters (unlike Emacs-20 integer implementation).

Well - not really.

XEmacs does have a 19-bit character encoding based on Mule.
The problem is that this encoding is really two-dimensional: it
combines a code that specifies the "character encoding" with a
"position in that encoding".  Because many characters are common to
many character encodings, the conceptually same character can have
several distinct codes, and comparing characters for equality
becomes a philosophical problem:

1) Do you just compare the character codes, ignoring that the
conceptually same character may be encoded many ways?

2) Do you canonicalize the characters so that equivalent characters
are equal?

2a) Do you do the canonicalization when you do the comparison, or

2b) when the character is created (e.g. read or stored in a string)?

Of course - once you do canonicalization, you might as well use
Unicode.

In other words:  The Mule "characters" are problematical as
characters in the Scheme sense.
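
To make the choices above concrete, here is a minimal sketch in Python.
It is not XEmacs's actual 19-bit representation: TaggedChar, raw_equal,
canonical_equal and make_canonical are hypothetical names, and Python's
own ISO 8859-1 and ISO 8859-2 codecs stand in for the "character
encoding" part of the pair.  It only illustrates why option 1 and
option 2 give different answers, and that 2b amounts to storing a
Unicode code point.

```python
from typing import NamedTuple


class TaggedChar(NamedTuple):
    """Hypothetical two-part character: (character encoding, position)."""
    charset: str   # a Python codec name, standing in for the charset code
    position: int  # byte value of the character within that charset


def to_unicode(c: TaggedChar) -> int:
    """Canonicalize a tagged character to a Unicode code point."""
    return ord(bytes([c.position]).decode(c.charset))


def raw_equal(a: TaggedChar, b: TaggedChar) -> bool:
    """Option 1: compare the codes as they are, charset tag included."""
    return a == b


def canonical_equal(a: TaggedChar, b: TaggedChar) -> bool:
    """Option 2a: canonicalize at comparison time."""
    return to_unicode(a) == to_unicode(b)


def make_canonical(charset: str, position: int) -> int:
    """Option 2b: canonicalize at creation time; only the code point is kept."""
    return to_unicode(TaggedChar(charset, position))


# e-acute sits at position 0xE9 in both ISO 8859-1 and ISO 8859-2.
e_latin1 = TaggedChar("iso8859_1", 0xE9)
e_latin2 = TaggedChar("iso8859_2", 0xE9)

print(raw_equal(e_latin1, e_latin2))        # the tags differ
print(canonical_equal(e_latin1, e_latin2))  # same conceptual character
print(make_canonical("iso8859_1", 0xE9) == make_canonical("iso8859_2", 0xE9))
```

With 2b the charset tag is gone as soon as the character exists, so an
ordinary integer comparison is enough afterwards, which is the sense
in which one might as well use Unicode.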

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner