"C" UTF-8 trouble

Corinna Vinschen corinna-cygwin@cygwin.com
Wed Oct 7 16:17:00 GMT 2009


On Oct  7 16:06, Thomas Wolff wrote:
> Corinna Vinschen wrote:
>> ...
>>
>> $ ./nll
>> ANSI_X3.4-1968
>>
>> $ LANG=C.UTF-8 ./nll
>> ANSI_X3.4-1968
>>
>> $ LANG=ja_JP ./nll
>> EUC-JP
>>
>> $ LANG=ru_RU ./nll
>> ISO-8859-5
>>
>> $ LANG=ru_UA ./nll
>> KOI8-U
>>
>> $ LANG=zh_CN ./nll
>> GB2312
>>
>> $ LANG=zh_TW ./nll
>> BIG5
>>
>> Sigh.  Do we really need a translation table?
>>   
> Yes (sigh). And yes, that's what I had suggested before. Actually, "locale 
> charmap" (on a system with a locale command) gives you the same information 
> as "nll".
> If you want a table, a fairly complete one is included in my package mined, 
> file src/locales.t (generated from src/locales.cfg).
> (Complete in the sense that all locales without explicit suffix not listed 
> here map to ISO-8859-1; maybe I should also include them to distinguish 
> unknown locales ...)
> And, as becomes clear here, the syntax of charmap/codeset names is 
> different between locale names and nl_langinfo,
> e.g. eucJP vs. EUC-JP.
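
For reference: the source of nll is not shown in this thread, but judging
from its output it presumably does little more than print
nl_langinfo (CODESET) for the locale taken from the environment.  A minimal
sketch under that assumption (codeset_for and print_codeset are made-up
names for illustration, not part of nll, newlib or Cygwin):

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: select LOCALE_NAME for LC_CTYPE and return the
   codeset name the C library reports for it.  Returns NULL if the
   locale cannot be set.  An empty string means "use the environment"
   (LC_ALL, LC_CTYPE, LANG). */
const char *
codeset_for (const char *locale_name)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return NULL;
  return nl_langinfo (CODESET);
}

/* Hypothetical helper: print the codeset of the environment's locale,
   which is what a tool like nll is assumed to do.  Returns 0 on
   success, 1 if the locale is not supported. */
int
print_codeset (void)
{
  const char *codeset = codeset_for ("");

  if (codeset == NULL)
    {
      fputs ("unsupported locale\n", stderr);
      return 1;
    }
  puts (codeset);
  return 0;
}
```

Calling print_codeset () from main gives a program that can be run like
the "LANG=ja_JP ./nll" examples quoted above.  The exact names it prints
depend on the C library.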

I agree with the general picture.  However, as I mentioned in the mail
you're partially quoting, we just have to draw the line at some point,
even if the solution might be a bit bumpy for the time being.
Therefore, I think we should go for the value returned by
__locale_charset () *for now*.

If you want to contribute your table and the necessary code to make it
work within Cygwin, please feel free.  I'm obviously very glad of any
helpful code which eases the internationalization pain.  As for
contributing, newlib's not a problem, while for Cygwin... <insert
obligatory reference to cygwin copyright assignment here>(*).
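
To make the idea of a translation table concrete, here is a rough sketch.
It is not the table from mined and not what Cygwin does; the struct, the
table names and lookup_codeset are hypothetical.  The default entries are
only the five locale/codeset pairs from the nll output quoted above, and
the single suffix entry is the eucJP vs. EUC-JP example.  Modifiers such
as "@euro" are ignored here for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a key and the codeset name in
   nl_langinfo spelling. */
struct codeset_entry
{
  const char *key;
  const char *codeset;
};

/* Default codesets for locales given without an explicit suffix.
   Entries taken from the nll output quoted in this thread. */
static const struct codeset_entry default_codesets[] = {
  { "ja_JP", "EUC-JP" },
  { "ru_RU", "ISO-8859-5" },
  { "ru_UA", "KOI8-U" },
  { "zh_CN", "GB2312" },
  { "zh_TW", "BIG5" },
};

/* Locale-name suffix spelling -> nl_langinfo spelling. */
static const struct codeset_entry suffix_names[] = {
  { "eucJP", "EUC-JP" },
};

/* Hypothetical lookup: return the codeset for LOCALE, or NULL if the
   locale has no suffix and is not in the table, so that unknown
   locales can be told apart from known ones. */
const char *
lookup_codeset (const char *locale)
{
  const char *dot = strchr (locale, '.');
  size_t i;

  if (dot != NULL)
    {
      /* Explicit suffix: translate known spellings, pass others through. */
      for (i = 0; i < sizeof suffix_names / sizeof suffix_names[0]; i++)
        if (strcmp (dot + 1, suffix_names[i].key) == 0)
          return suffix_names[i].codeset;
      return dot + 1;
    }

  for (i = 0; i < sizeof default_codesets / sizeof default_codesets[0]; i++)
    if (strcmp (locale, default_codesets[i].key) == 0)
      return default_codesets[i].codeset;
  return NULL;
}
```

With this, lookup_codeset ("ja_JP") and lookup_codeset ("ja_JP.eucJP")
both yield "EUC-JP", while an unlisted locale without suffix yields NULL
and leaves the choice of fallback to the caller.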


Corinna


(*) http://cygwin.com/assign.txt


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat



More information about the Cygwin-developers mailing list