"C" UTF-8 trouble

Eric Blake ebb9@byu.net
Wed Oct 7 12:04:00 GMT 2009


According to Corinna Vinschen on 10/7/2009 3:03 AM:
>> Unfortunately that's not the case for emacs.
> 
> As for Emacs, I'm wondering if it shouldn't be changed to set its locale
> according to setlocale(LC_CTYPE,NULL) instead, given what POSIX says.

Yes, we should raise this as an upstream bug in emacs.

> Urgh.  So we have to change nl_langinfo in newlib as well.  Do we have
> to return "US-ASCII" if charset is "ASCII", or is it sufficient to
> return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?

Gettext ships (or rather used to ship; it was recently disabled for cygwin
1.5 because of the lack of locale support) a charset.alias file, which maps
arbitrary nl_langinfo(CODESET) values to canonical forms.  I think you
are free to return whichever string is easiest, as long as it is documented
as one of the accepted aliases in that file.  As for a canonical name,
gettext prefers "ASCII", not "US-ASCII".
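
To make the alias idea concrete, here is a minimal sketch of that kind of
lookup.  The table entries and the name canonical_charset are hypothetical,
chosen only for illustration; they are not copied from gettext's
charset.alias file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: the entries are illustrative assumptions,
   not the contents of gettext's charset.alias. */
struct charset_alias {
    const char *alias;      /* a name nl_langinfo(CODESET) might return */
    const char *canonical;  /* the name callers want to see */
};

static const struct charset_alias alias_table[] = {
    { "US-ASCII",       "ASCII" },
    { "ANSI_X3.4-1968", "ASCII" },
    { "utf8",           "UTF-8" },
};

/* Return the canonical name for codeset, or codeset itself when no
   alias matches (so names that are already canonical pass through). */
const char *canonical_charset(const char *codeset)
{
    size_t i;

    for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++) {
        if (strcmp(codeset, alias_table[i].alias) == 0)
            return alias_table[i].canonical;
    }
    return codeset;
}
```

With a table like this, it would not matter whether newlib returned "ASCII"
or "US-ASCII", since both end up as "ASCII" for the caller.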

gnulib also has a function, locale_charset, called by a number of packages
(coreutils, tar, findutils, ...), which uses nl_langinfo(CODESET), so
those packages all depend on getting "UTF-8" back from it if we are in the
default locale.
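
For reference, the query those packages end up relying on looks roughly
like the sketch below.  The helper name codeset_in_locale and the "ASCII"
fallback for an empty answer are assumptions of this sketch, not gnulib's
actual code; setlocale and nl_langinfo are the standard POSIX calls.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale and report
   the codeset the C library advertises for it.  Returns NULL if the
   locale cannot be selected. */
const char *codeset_in_locale(const char *locale_name)
{
    const char *codeset;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;

    codeset = nl_langinfo(CODESET);

    /* Assumption of this sketch: treat an empty answer as "ASCII". */
    if (codeset == NULL || codeset[0] == '\0')
        return "ASCII";
    return codeset;
}
```

Calling codeset_in_locale("C") shows what the default locale reports, which
is exactly the string under discussion here; the answer differs between C
libraries.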

> For a start, here's a first untested cut at newlib's locale.c, which
> allows us to add any desired mechanism to switch the default locale.
> 
> If you agree to this, I'll propose it on the newlib list.

POSIX does say that the default is implementation-defined, so we have at
least a chance of convincing newlib that we need a hook that lets us supply
our own implementation-defined default (whether by file or otherwise).
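
Purely as an illustration of what such a hook could look like (this is not
the proposed locale.c patch, and every name below is hypothetical), a
function-pointer hook with an "ASCII" fallback might be sketched as:

```c
#include <stddef.h>

/* Hypothetical hook type: returns the charset name to use for the
   default locale, or NULL to keep the built-in default. */
typedef const char *(*default_charset_hook_t)(void);

static default_charset_hook_t default_charset_hook = NULL;

/* Hypothetical registration function a platform such as Cygwin
   could call during startup. */
void set_default_charset_hook(default_charset_hook_t hook)
{
    default_charset_hook = hook;
}

/* Charset of the default locale: ask the hook first, then fall back
   to "ASCII" (the fallback is an assumption of this sketch). */
const char *default_locale_charset(void)
{
    if (default_charset_hook != NULL) {
        const char *charset = default_charset_hook();
        if (charset != NULL && charset[0] != '\0')
            return charset;
    }
    return "ASCII";
}

/* Example hook that makes the default locale report UTF-8. */
const char *example_utf8_default(void)
{
    return "UTF-8";
}
```

The library keeps its generic default, and only a platform that installs a
hook gets different behaviour; a hook reading a file would fit the same
shape.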

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@byu.net


