This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Re: i18n; wide characters; Guile

To: guile@cygnus.com
Subject: Re: i18n; wide characters; Guile
From: Duane Ellis <duane_ellis@franklin.com>
Date: Mon, 20 Oct 97 08:49:17 EDT
Reply-To: duane@franklin.com


- Use 16-bit characters in strings throughout.
	
	Good idea...

- Prescribe the use of Unicode throughout.

	Good move. Internally we use a program I developed that makes
	everything go through Unicode. It has the advantage that there
	is a big fat book that has (nearly) everything listed and
	already done.

	It also has another great advantage you did not mention.

	I don't speak Gurmukhi (to paraphrase from the book: "a north Indian
	script derived from Lahnda...", etc., etc.), but:

	a) I have a book that lets me look up that funny-looking small F
	   that sits to the left or right of a letter and find out what it is.
	   It happens to be U+0A3F and U+0A40, GURMUKHI VOWEL SIGN I and II.

	b) The next guy has access to that book too, or can get access
	   to it.

	c) There is a hell of a lot of work that went into the Unicode system.
	   I don't think anybody in the Guile group has the time, money, or
	   resources to duplicate that body of work. Adopting something
	   that is already done makes a lot of sense.
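
	The "look it up in the book" step can also be done by program. A
	minimal sketch in Python (not Guile), assuming the Unicode character
	database that ships with Python's standard unicodedata module; the
	helper name describe is made up for illustration:

```python
import unicodedata

def describe(code_point):
    """Return a 'U+XXXX NAME' line for one code point, as the book would list it."""
    ch = chr(code_point)
    # unicodedata.name raises ValueError for unnamed code points,
    # so supply a default instead.
    name = unicodedata.name(ch, "<no name in this database>")
    return f"U+{code_point:04X} {name}"

# The two Gurmukhi vowel signs mentioned in point a).
for cp in (0x0A3F, 0x0A40):
    print(describe(cp))
```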

- Provide functions to convert between Unicode character strings and
  all other widely-used formats: UTF-8, UTF-7, Latin-1, and the JIS
  variants, as well as anything else people would like to contribute.

    win #1
	This would also facilitate creating "localization" tools, basically
	centered around a 256-entry table that maps the localized
	8-bit charset into Unicode.

	Example: I'm an English speaker; my input is primarily English and
	my output is English, so Latin-1 suffices for me.

	A Polish, Czech, or Hungarian user may need Latin-2. Or say you are
	somehow using Guile on a PC: you can map Unicode into whichever
	IBM PC code page is your localized charset.

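	see ftp://ftp.unicode.org/MAPPINGS/VENDORS/MICSOFT/PC
	    ftp://ftp.unicode.org/MAPPINGS/VENDORS/MICSOFT/WINDOWS
	    ftp://ftp.unicode.org/MAPPINGS/VENDORS/MICSOFT/EBCDIC (run quick!)
	    ftp://ftp.unicode.org/MAPPINGS/VENDORS/APPLE/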

	[Note: the ftp.unicode.org site is screwed up. I cannot access it via
	FTP in Netscape, but if I do it all from the command line on my SPARC,
	it works.]
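
	The 256-entry mapping idea from win #1 can be sketched roughly as
	follows. This is Python rather than Guile, and it is only an
	illustration: the function names are invented here, and the tables
	are filled from Python's built-in iso8859_2 and cp437 codecs as a
	stand-in for the vendor mapping files.

```python
def build_table(codec_name):
    """Return a 256-entry list: index is the byte value, entry is the
    Unicode code point. Assumes the codec defines all 256 byte values
    (true for iso8859_2 and cp437 in Python, not for every codec)."""
    return [ord(bytes([b]).decode(codec_name)) for b in range(256)]

def to_unicode(data, table):
    """Map a localized 8-bit byte string into a Unicode string."""
    return "".join(chr(table[b]) for b in data)

def from_unicode(text, table, fallback=ord("?")):
    """Map a Unicode string back into the 8-bit charset.
    Characters the charset lacks become the fallback byte."""
    reverse = {code_point: b for b, code_point in enumerate(table)}
    return bytes(reverse.get(ord(ch), fallback) for ch in text)

LATIN2 = build_table("iso8859_2")   # Polish, Czech, Hungarian, ...
CP437 = build_table("cp437")        # original IBM PC code page

# Input arrives in Latin-2, output goes to a PC code page.
text = to_unicode("łódź".encode("iso8859_2"), LATIN2)
pc_bytes = from_unicode(text, CP437)  # letters cp437 lacks turn into '?'
```

	The same table type serves every 8-bit charset, so adding a new
	locale means adding one table, not new code.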

   win #2
	In some languages accents are ignored for sorting purposes, and in
	others accented letters sort differently. Depending upon the book
	you grab, it is even different within the same language. (Example:
	in Swedish-English dictionaries I have seen the {A-ring} sorted
	after Z, and I have seen it sorted at the end of the letter A.)

	For some languages, you can easily create a localization table
	that is also used for sorting purposes. For example, almost all of
	us know the English sort order. In Spanish, however, there
	are the problems of the letters "ch" and "ll".

	The second situation is much like the problem of sorting dates
	written like this (GNU sort has some support for this):

		dec 23 1997
		jun  1 1997
		apr  3 1997

	Using Unicode also lets anybody create a generalized sorting
	function that can manage these funky sort order problems.
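
	A table-driven sort of the kind described under win #2 might look
	like the sketch below. It is Python, not Guile, and makes some
	simplifying assumptions: the alphabets are hand-written lists
	(traditional Spanish with "ch" and "ll" as letters, Swedish with
	the A-ring after Z), the name make_sort_key is invented for the
	example, and characters missing from a table simply sort after
	everything in it.

```python
def make_sort_key(alphabet):
    """Build a sort-key function from an ordered list of collation
    elements. Elements may be longer than one character ("ch", "ll")."""
    rank = {element: i for i, element in enumerate(alphabet)}
    longest = max(len(element) for element in alphabet)

    def key(word):
        word = word.lower()
        out = []
        i = 0
        while i < len(word):
            # Try the longest element first so "ch" wins over "c".
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    out.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                # Not in the table: sort after every known element.
                out.append(len(alphabet) + ord(word[i]))
                i += 1
        return out

    return key

SPANISH = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
           + list("mn") + ["ñ"] + list("opqrstuvwxyz"))
SWEDISH = list("abcdefghijklmnopqrstuvwxyz") + ["å", "ä", "ö"]

spanish_key = make_sort_key(SPANISH)
swedish_key = make_sort_key(SWEDISH)

print(sorted(["cuna", "chico", "casa", "dedo"], key=spanish_key))
print(sorted(["älg", "zebra", "ål", "apa"], key=swedish_key))
```

	Swapping in a different alphabet list changes the sort order
	without touching the sorting code, which is the point of keeping
	the order in a table.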

- Provide a separate "byte array" type, for applications which
  genuinely want this.

	-- no comment.

jim>	What I'm most interested in is your advice regarding character sets
	and (externally visible) text representations.  How would you
	recommend we go about supporting wide character sets?  What do you
	think of Unicode?

You really *must* include some automatic mapping
that turns an 8-bit stream in, say, Windows code page 1252
into Unicode, on both input and output.

I saw a discussion before about setting up Guile to act as a daemon
so you could telnet to the port and talk to it.

It would be very helpful {understand, I have not used Guile
yet, I've just been listening in} if I could set up a Unicode
port that I could talk to, or if I could hook up a translator
on the stream so that I could output and input in, say, {shudder}
EBCDIC instead of ASCII. Maybe set up an input encoding and
an output encoding that may well not be the same.
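
A translator hooked onto a stream, with separate input and output
encodings, could be sketched like this. It is Python, not Guile; the
function name transcode is made up, and cp1252 and cp037 (one EBCDIC
variant) are just the encodings Python happens to ship, chosen as
examples.

```python
import io

def transcode(src, dst, in_encoding, out_encoding, chunk_size=4096):
    """Copy bytes from src to dst, decoding with in_encoding and
    re-encoding with out_encoding. Text is Unicode in between."""
    reader = io.TextIOWrapper(src, encoding=in_encoding, newline="")
    writer = io.TextIOWrapper(dst, encoding=out_encoding, newline="")
    while True:
        text = reader.read(chunk_size)
        if not text:
            break
        writer.write(text)
    writer.flush()
    # Detach so the underlying byte streams stay open for the caller.
    reader.detach()
    writer.detach()

# 8-bit code page 1252 in, EBCDIC out.
source = io.BytesIO("café".encode("cp1252"))
sink = io.BytesIO()
transcode(source, sink, "cp1252", "cp037")
```

Because the two encodings are independent parameters, the same
function covers the case where input and output use the same charset
and the case where they differ.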

-Duane Ellis

References:
- i18n; wide characters; Guile
  - From: Jim Blandy <jimb@red-bean.com>
