This is the mail archive of the guile@cygnus.com mailing list for the guile project.
- Use 16-bit characters in strings throughout.

Good idea...

- Prescribe the use of Unicode throughout.

Good move. Internally, we use a program I developed that makes everything go through Unicode. Unicode has the advantage that there is a big fat book that has (nearly) everything listed and already worked out.

It also has another great advantage you did not mention. I don't speak Gurmukhi ("a North Indian script derived from Lahnda...", to paraphrase the book), but:

a) I have a book that lets me look up that funny-looking small F that sits to the left or right of a letter and find out what it is. It happens to be U+0A3F and U+0A40, GURMUKHI VOWEL SIGN I and II.

b) The next guy has access to that book too, or can get access to it.

c) A hell of a lot of work went into the Unicode system. I don't think anybody in the Guile group has the time, money, or resources to duplicate that body of work. Adopting something that is already done makes a lot of sense.

- Provide functions to convert between Unicode character strings and all other widely-used formats: UTF-8, UTF-7, Latin-1, and the JIS variants, as well as anything else people would like to contribute.

Win #1: This would also make it easy to create "localization" tools, basically centered around a 256-entry table that maps the localized 8-bit charset into Unicode. Example: I'm an English speaker; my input is primarily English and my output is English, so Latin-1 suffices for me. A Polish, Czech, or Hungarian user may need Latin-2. Or say you are somehow using Guile on a PC: you can map Unicode to and from any IBM PC code page, i.e. your localized charset. See:

    ftp://ftp.unicode.org/MAPPINGS/VENDORS/MICSOFT/PC
    (also WINDOWS, EBCDIC (run quick!), and APPLE/)

[Note: the ftp.unicode.org site is screwed up. I cannot access it via FTP in Netscape, but if I do it all from the command line on my SPARC, it works.]

Win #2: In some languages, accents are ignored for sort purposes, and in others they sort differently;
and depending upon the book you grab, the order can even differ within the same language. (Example: in Swedish-English dictionaries I have seen {A-ring} sorted after Z, and I have also seen it sorted at the end of the letter A.)

For some languages, you can easily create a localization table that is also used for sorting. For example, almost all of us know the English sort order; in Spanish, however, there is the problem of the letters "ch" and "ll". The second situation is much like the problem of sorting dates written like this (GNU sort has some support for this):

    dec 23 1997
    jun 1 1997
    apr 3 1997

Using Unicode also lets anybody create a generalized sorting function that can manage these funky sort-order problems.

- Provide a separate "byte array" type, for applications which genuinely want this.

No comment.

jim> What I'm most interested in is your advice regarding character sets
jim> and (externally visible) text representations. How would you
jim> recommend we go about supporting wide character sets? What do you
jim> think of Unicode?

You really *must* include some automatic mapping that converts an 8-bit stream in, say, Windows code page 1252 to Unicode on input, and back again on output.

I saw an earlier discussion about setting up Guile to act as a daemon so you could telnet to the port and talk to it. It would be very helpful {understand, I have not used Guile yet; I've just been listening in} if I could set up a Unicode port that I could talk to, or if I could hook up a translator on the stream so that I could input and output in, say, {shudder} EBCDIC instead of ASCII. Maybe set up an input encoding and an output encoding, which may well not be the same.

-Duane Ellis
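The conversion functions in the proposal can be illustrated with UTF-8, the simplest of the listed formats. Below is a minimal hand-rolled encoder in Python, used here only as a stand-in (it is not Guile's API, and `utf8_encode` / `utf8_encode_string` are hypothetical names), applied to the Gurmukhi vowel sign U+0A3F mentioned above. The 4-byte branch covers code points above U+FFFF, beyond the 16-bit range the proposal assumes.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hand-rolled, for illustration)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                      # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode_string(s: str) -> bytes:
    """Encode a whole string by concatenating per-code-point encodings."""
    return b"".join(utf8_encode(ord(ch)) for ch in s)


# GURMUKHI VOWEL SIGN I, the character looked up in the book above
print(utf8_encode(0x0A3F).hex())   # e0a8bf
```

In practice Python's own `str.encode("utf-8")` does this; the sketch only makes the bit layout visible.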
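The "win #1" idea, a 256-entry table mapping each byte of a localized 8-bit charset to a Unicode code point, might look like the following Python sketch. The table contents are an assumption for illustration: it fills in only eight of Windows code page 1252's assignments in the 0x80-0x9F range and leaves the rest unmapped, whereas a real table would be generated from the vendor mapping files on ftp.unicode.org. `build_table`, `decode`, and `encode` are hypothetical names.

```python
# Hypothetical partial table: only a few of CP1252's 0x80-0x9F assignments.
CP1252_HIGH_SUBSET = {
    0x80: 0x20AC,  # EURO SIGN
    0x85: 0x2026,  # HORIZONTAL ELLIPSIS
    0x91: 0x2018,  # LEFT SINGLE QUOTATION MARK
    0x92: 0x2019,  # RIGHT SINGLE QUOTATION MARK
    0x93: 0x201C,  # LEFT DOUBLE QUOTATION MARK
    0x94: 0x201D,  # RIGHT DOUBLE QUOTATION MARK
    0x96: 0x2013,  # EN DASH
    0x97: 0x2014,  # EM DASH
}


def build_table(high_overrides):
    """Return a 256-entry list mapping each byte to a code point, or None."""
    table = [None] * 256
    for b in range(256):
        if b < 0x80 or b >= 0xA0:
            table[b] = b  # ASCII and 0xA0-0xFF coincide with Latin-1 in CP1252
    for b, cp in high_overrides.items():
        table[b] = cp
    return table


def decode(data: bytes, table) -> str:
    """8-bit bytes -> Unicode string, one table lookup per byte."""
    out = []
    for b in data:
        cp = table[b]
        if cp is None:
            raise ValueError(f"byte 0x{b:02X} is not mapped by this table")
        out.append(chr(cp))
    return "".join(out)


def encode(text: str, table) -> bytes:
    """Unicode string -> 8-bit bytes, using the inverse of the same table."""
    reverse = {cp: b for b, cp in enumerate(table) if cp is not None}
    try:
        return bytes(reverse[ord(ch)] for ch in text)
    except KeyError as e:
        raise ValueError(f"U+{e.args[0]:04X} has no byte in this table") from None


CP1252 = build_table(CP1252_HIGH_SUBSET)
print(decode(b"\x93caf\xe9\x94 \x80", CP1252))
```

Output runs the same table in reverse, so one 256-entry array per charset serves both directions; a Latin-2 user would simply load a different table.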
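The Swedish and Spanish sort-order examples come down to two things a per-language table has to express: the order of the letters, and which multi-character sequences (like "ch" and "ll" in traditional Spanish collation) count as a single letter. A minimal Python sketch follows, assuming precomposed (NFC) input, case-insensitive comparison, and hypothetical alphabet tables; `make_collation_key` is an illustrative name, not a Guile function, and it does not attempt the accent-ignoring rules also mentioned above.

```python
def make_collation_key(alphabet):
    """Build a sort key from an ordered list of collation elements.

    Elements may be longer than one character (e.g. "ch"); the longest
    match at each position wins. Comparison is case-insensitive.
    """
    rank = {elem: i for i, elem in enumerate(alphabet)}
    lengths = sorted({len(e) for e in alphabet}, reverse=True)

    def key(word):
        word = word.lower()
        out = []
        i = 0
        while i < len(word):
            for n in lengths:
                piece = word[i:i + n]
                if len(piece) == n and piece in rank:
                    out.append(rank[piece])
                    i += n
                    break
            else:
                # Not in the table: sort after every listed letter, by code point.
                out.append(len(alphabet) + ord(word[i]))
                i += 1
        return out

    return key


# Hypothetical tables, lowercase only.
SPANISH_TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
                       + list("mn") + ["\u00f1"] + list("opqrstuvwxyz"))
SWEDISH = list("abcdefghijklmnopqrstuvwxyz") + ["\u00e5", "\u00e4", "\u00f6"]

print(sorted(["llama", "luz", "chico", "cosa", "dedo"],
             key=make_collation_key(SPANISH_TRADITIONAL)))
# ['cosa', 'chico', 'dedo', 'luz', 'llama']
print(sorted(["\u00e4ra", "zebra", "\u00e5ska", "apa"],
             key=make_collation_key(SWEDISH)))
# ['apa', 'zebra', 'åska', 'ära']
```

The same function with a different table gives the dictionary that files {A-ring} at the end of A: just place "\u00e5" right after "a". Making accents ignorable would need an extra step, for example decomposing with `unicodedata.normalize("NFD", ...)` and dropping combining marks before looking up elements.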
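The date comparison is the same idea: the sort key is not the raw text. A short Python sketch for dates in the "dec 23 1997" form, assuming three-letter English month abbreviations; `date_key` is a hypothetical helper. (GNU sort's `-M` / `--month-sort` option is presumably the support referred to above.)

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def date_key(text: str):
    """Sort key for dates written like 'dec 23 1997': (year, month, day)."""
    mon, day, year = text.split()
    return (int(year), MONTHS.index(mon.lower()) + 1, int(day))


dates = ["dec 23 1997", "jun 1 1997", "apr 3 1997"]
print(sorted(dates, key=date_key))
# ['apr 3 1997', 'jun 1 1997', 'dec 23 1997']
```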
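The translator-on-the-stream idea (decode one encoding on the way in, encode a possibly different one on the way out) can be sketched with Python's `io.TextIOWrapper` standing in for a Guile port; Guile's actual port API is not shown here. `translate_stream` is a hypothetical name, and `cp037` is Python's codec for one EBCDIC variant (US/Canada).

```python
import io


def translate_stream(raw_in, raw_out, in_encoding, out_encoding, chunk_size=4096):
    """Copy text from a byte stream in one encoding to a byte stream in another."""
    # newline="" disables newline translation, so line endings pass through unchanged.
    reader = io.TextIOWrapper(raw_in, encoding=in_encoding, newline="")
    writer = io.TextIOWrapper(raw_out, encoding=out_encoding, newline="")
    try:
        while True:
            chunk = reader.read(chunk_size)  # bytes decoded to a str
            if not chunk:
                break
            writer.write(chunk)              # str re-encoded on the way out
        writer.flush()
    finally:
        # Detach so the wrappers do not close the caller's streams.
        reader.detach()
        writer.detach()


# EBCDIC in, UTF-8 out: the input and output encodings need not match.
src = io.BytesIO("HELLO, GUILE\n".encode("cp037"))
dst = io.BytesIO()
translate_stream(src, dst, "cp037", "utf-8")
print(dst.getvalue())
```

A daemon-style port would wrap its socket the same way, choosing the two encodings per connection.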