"C" UTF-8 trouble

Thomas Wolff thomas.wolff@nsn.com
Wed Oct 7 14:05:00 GMT 2009


Corinna Vinschen wrote:
> On Oct  7 11:08, Andy Koppe wrote:
>   
>> 2009/10/7 Corinna Vinschen:
>>     
>>> Urgh.  So we have to change nl_langinfo in newlib as well.  Do we have
>>> to return "US-ASCII" if charset is "ASCII", or is it sufficient to
>>> return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
>>>       
>> I'd assume so, but WWLD?
>>     
>
> ===
> #include <stdio.h>
> #include <locale.h>
> #include <langinfo.h>
>
> int main ()
> {
>   char *l;
>
>   setlocale (LC_ALL, "");
>   l = nl_langinfo (CODESET);
>   if (l)
>     printf ("%s\n", l);
>   return 0;
> }
> ===
>
> $ ./nll
> ANSI_X3.4-1968
>
> $ LANG=C.UTF-8 ./nll
> ANSI_X3.4-1968
>
> $ LANG=ja_JP ./nll
> EUC-JP
>
> $ LANG=ru_RU ./nll
> ISO-8859-5
>
> $ LANG=ru_UA ./nll
> KOI8-U
>
> $ LANG=zh_CN ./nll
> GB2312
>
> $ LANG=zh_TW ./nll
> BIG5
>
> Sigh.  Do we really need a translation table?
>   
Yes (sigh), and that's what I had suggested before. Actually,
"locale charmap" (on a system with a locale command) gives you the same
information as "nll".
If you want a table, a fairly complete one is included in my package
mined, in the file src/locales.t (generated from src/locales.cfg).
(Complete in the sense that any locale without an explicit charset
suffix that is not listed there maps to ISO-8859-1; maybe I should list
those too, so that unknown locales can be distinguished ...)
And, as becomes clear here, the syntax of charmap/codeset names differs
between locale names and nl_langinfo, e.g. eucJP vs. EUC-JP.
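
To make the table idea concrete, here is a minimal sketch of such a
lookup. It is not the table from mined and not newlib's code; the names
codeset_for_locale, suffix_table and default_table are hypothetical.
The entries are taken only from the "nll" output quoted above and from
the eucJP vs. EUC-JP example. Passing an unknown suffix through
unchanged and ignoring "@modifier" parts are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table row: a key and the codeset name to report for it.  */
struct codeset_entry
{
  const char *key;
  const char *codeset;
};

/* Charset suffix as spelled in a locale name -> name in nl_langinfo
   style.  Only the spellings seen in this thread are listed.  */
static const struct codeset_entry suffix_table[] = {
  { "ASCII", "ANSI_X3.4-1968" },
  { "eucJP", "EUC-JP" },
};

/* Locales without an explicit suffix -> default codeset, as printed by
   the "nll" runs quoted above.  */
static const struct codeset_entry default_table[] = {
  { "C",     "ANSI_X3.4-1968" },
  { "ja_JP", "EUC-JP" },
  { "ru_RU", "ISO-8859-5" },
  { "ru_UA", "KOI8-U" },
  { "zh_CN", "GB2312" },
  { "zh_TW", "BIG5" },
};

/* Hypothetical helper: map a locale name to a codeset name.
   "@modifier" parts are not handled in this sketch.  */
const char *
codeset_for_locale (const char *locale)
{
  const char *dot;
  size_t i;

  assert (locale != NULL);

  dot = strchr (locale, '.');
  if (dot != NULL)
    {
      /* Explicit suffix: translate the spelling if we know it.  */
      const char *suffix = dot + 1;

      for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
        if (strcmp (suffix, suffix_table[i].key) == 0)
          return suffix_table[i].codeset;
      /* Assumption: an unknown suffix is returned unchanged; the
         result then points into the caller's string.  */
      return suffix;
    }

  /* No suffix: look up the locale's default codeset.  */
  for (i = 0; i < sizeof default_table / sizeof default_table[0]; i++)
    if (strcmp (locale, default_table[i].key) == 0)
      return default_table[i].codeset;

  /* Unlisted locales without a suffix map to ISO-8859-1.  */
  return "ISO-8859-1";
}
```

With these tables, codeset_for_locale ("ja_JP") and
codeset_for_locale ("ja_JP.eucJP") both yield "EUC-JP", while an
unlisted locale such as "de_DE" yields "ISO-8859-1". A real table would
need many more rows in both arrays.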

Thomas



More information about the Cygwin-developers mailing list