This is the mail archive of the guile@cygnus.com mailing list for the guile project.

Re: mbstrings



>By analogy with BigNums (which are not distinguishable from TinyNums at
>the Scheme level), I propose:
>
>Characters in any particular string are of uniform length.  In the
>header of the string, right next to the length, we store the length of
>each character (in one bit).  If the stuff in quotes is all Latin-1,
>then the string is a TinyChar string, otherwise a BigChar string.
>String-set! checks the character size of the string against that of
>the assigned character, and if necessary widens either the character
>or the whole string.  Both string-ref and string-set! still run in
>constant time.  (Amortized: string-set! can take O(n) but only once.)
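The quoted proposal is easier to follow as a concrete layout. Below is a minimal C sketch of one string carrying a single width flag next to its length. It assumes that BigChars are 16 bits wide (the proposal does not fix a width), and every name in it (`mbstring`, `mbstring_ref`, `mbstring_set`, `mbstring_widen`, `mbstring_from_latin1`) is hypothetical, not part of Guile. It shows why both accessors stay O(1) and why widening costs O(n) at most once per string.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one width flag per string, stored next to the length. */
typedef struct {
    size_t length;      /* number of characters */
    unsigned wide : 1;  /* 0 = TinyChar (8-bit Latin-1), 1 = BigChar (assumed 16-bit) */
    void *data;         /* uint8_t[length] or uint16_t[length] */
} mbstring;

/* Build a TinyChar string from Latin-1 bytes. */
static int mbstring_from_latin1(mbstring *s, const char *src) {
    size_t n = strlen(src);
    uint8_t *buf = malloc(n ? n : 1);
    if (buf == NULL) return -1;
    memcpy(buf, src, n);
    s->length = n;
    s->wide = 0;
    s->data = buf;
    return 0;
}

/* Constant time: index directly into the array of the current width. */
static uint16_t mbstring_ref(const mbstring *s, size_t i) {
    return s->wide ? ((const uint16_t *)s->data)[i]
                   : ((const uint8_t *)s->data)[i];
}

/* Copy an 8-bit string into a 16-bit buffer: O(n), but a string is widened
   at most once because nothing here ever narrows it again. */
static int mbstring_widen(mbstring *s) {
    uint16_t *wide = malloc((s->length ? s->length : 1) * sizeof *wide);
    if (wide == NULL) return -1;
    const uint8_t *narrow = s->data;
    for (size_t i = 0; i < s->length; i++) wide[i] = narrow[i];
    free(s->data);
    s->data = wide;
    s->wide = 1;
    return 0;
}

/* Widen the whole string only if a non-Latin-1 character arrives;
   a narrow character stored into a wide string is simply widened itself. */
static int mbstring_set(mbstring *s, size_t i, uint16_t c) {
    if (!s->wide && c > 0xFF && mbstring_widen(s) != 0) return -1;
    if (s->wide)
        ((uint16_t *)s->data)[i] = c;
    else
        ((uint8_t *)s->data)[i] = (uint8_t)c;
    return 0;
}
```

Storing `0xE9` (é) into a TinyChar string leaves it narrow. Storing `0x03BB` (λ) triggers the one-time copy, after which every `mbstring_ref` and `mbstring_set` is again a single array access.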

This is better than the current approach.  But again, you have
multiple string representations internally, and that distinction is
still visible to third-party C code.  The only way to get this to work
(that I see) would be to use 16-bit characters everywhere; that way, C
developers wouldn't even have the option of not supporting wide
characters, or at least of not thinking about the issue.
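To illustrate the point about third-party C code: under the dual-width scheme quoted above, an extension that walks a string's characters directly must check the width flag first, and code written only against 8-bit strings silently breaks on wide ones. The sketch below assumes the 16-bit-everywhere alternative. The type `wstring16` and the function `wstring16_count` are hypothetical names for illustration, not a Guile API. With only one representation, a single loop handles every string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: every string stores 16-bit characters, so there is
   exactly one representation for C extensions to handle. */
typedef struct {
    size_t length;     /* number of characters */
    uint16_t *chars;   /* always 16 bits per character */
} wstring16;

/* A third-party routine needs no width check: it always sees uint16_t. */
static size_t wstring16_count(const wstring16 *s, uint16_t c) {
    size_t n = 0;
    for (size_t i = 0; i < s->length; i++)
        if (s->chars[i] == c)
            n++;
    return n;
}
```

The trade-off is memory: pure Latin-1 text takes twice the space it would in a TinyChar string, in exchange for C code that cannot forget about wide characters.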