This is the mail archive of the mailing list for the Guile project.

Re: Fwd: [[Gnome-bindings] Strings and bindings]

(Feel free to forward this appropriately.)

Owen Taylor writes:
> The Unicode standard is currently using only 16-bit characters,
> all common characters for living languages are planned to be
> included in the 16-bit space, and many systems do use 16-bit
> characters. (Windows, Java, Python)
> However, there will soon be some character sets defined
> outside of the 16-bit "Basic Multilingual Plane", and allowing
> 32-bit characters is, IMO, nicer than confining oneself to
> an almost-full character space. 

Using 16 bits should not be a problem.  Unicode has support for
"surrogates".  This is an extension mechanism that lets a further
20 bits of character space be encoded as a pair of 16-bit Unicode
values.  That 20-bit space is *far* from full - as far as I know, it
is still officially empty (though proposals have been made for rare
scripts and symbols).
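
To make the surrogate mechanism concrete, here is a minimal sketch of
the arithmetic.  The function names to_surrogates and from_surrogates
are made up for illustration and are not part of any library; the
ranges D800-DBFF (high) and DC00-DFFF (low) are the surrogate ranges
defined by Unicode.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encoded range")
    v = cp - 0x10000              # the 20-bit value to be encoded
    high = 0xD800 | (v >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 | (v & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))


if __name__ == "__main__":
    hi, lo = to_surrogates(0x1D11E)
    print(hex(hi), hex(lo))       # 0xd834 0xdd1e
```

Each surrogate carries 10 bits, which is where the 20-bit figure comes
from.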

>  - Create an STL-string-like wrapper for a utf8 string. The
>    problem here is that you don't get O(1) random access, which
>    will no doubt disturb some of the people reading this.

But there is almost nothing useful you can do with strings that
requires O(1) random access using a character index, at least once
you're already dealing with non-trivial character sets.  What you
sometimes need is efficient access to a position in the string, but
that can be a "magic cookie" represented using a byte offset.
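
As a rough sketch of the "magic cookie" idea, assuming the string is
held as UTF-8 bytes: a position is just a byte offset that always lands
on the first byte of a character, and stepping forward means skipping
continuation bytes (those of the form 10xxxxxx).  The helper names
next_pos and char_at are hypothetical, chosen only for this example.

```python
def next_pos(buf, pos):
    """Return the byte offset of the character after the one starting at pos."""
    pos += 1
    # Continuation bytes in UTF-8 have the bit pattern 10xxxxxx.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def char_at(buf, pos):
    """Decode the single character that starts at byte offset pos."""
    return buf[pos:next_pos(buf, pos)].decode("utf-8")


if __name__ == "__main__":
    text = "naïve café".encode("utf-8")
    # A search hands back a byte offset; keep it as an opaque cookie.
    cookie = text.find("café".encode("utf-8"))
    print(cookie, char_at(text, cookie))   # 7 c
    # Returning to the saved position later is O(1): just slice at it.
    print(text[cookie:].decode("utf-8"))   # café
```

The cookie is 7 rather than 6 because "ï" occupies two bytes; the
caller never needs to know that, as long as it treats the offset as
opaque and only advances it with next_pos.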

So using UTF8 is perfectly reasonable.  Using 16-bit Unicode
with surrogates is also perfectly reasonable.  Using arrays
of 32-bit wide characters does not make sense to me (though
I know that glibc maintainer Ulrich Drepper feels strongly
	--Per Bothner
