This is the mail archive of the cygwin mailing list for the Cygwin project.



Default locale for Russian/Russia should be ru_RU.CP1251


Hi,

I'm running Cygwin 2.2.0 on an English Windows 8.1 box:

> CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin

Windows regional settings are set to Russian/Russia.

In the absence of any settings in bashrc/bash_profile, the `locale` command
outputs the following:

> LANG=ru_RU
> LC_CTYPE="ru_RU"
> LC_NUMERIC="ru_RU"
> LC_TIME="ru_RU"
> LC_COLLATE="ru_RU"
> LC_MONETARY="ru_RU"
> LC_MESSAGES="ru_RU"
> LC_ALL=

This is perfectly fine, except that a locale name with no charset
means the ISO charset, which for Russian/Russia is ISO-8859-5, a charset
that has never been used in practice (historically, DOS used CP866,
Windows used the CP1251 ANSI codepage, and various Unices stuck to KOI8-R
before the rise of Unicode).

This is consistent with the output of `locale charmap`, which is again
ISO-8859-5.


A short C example also confirms that ISO-8859-5 is used:

> #include <stdio.h>
> 
> #include <locale.h>
> #include <langinfo.h>
> 
> int main() {
>     const char *locale = setlocale(LC_ALL, "");
>     const char *codeset = nl_langinfo(CODESET);
>     printf("locale: %s\n", locale);
>     printf("codeset: %s\n", codeset);
> 
>     return 0;
> }

outputs

> locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
> codeset: ISO-8859-5


The Cygwin docs state that

> Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.

which is not true in my case (the Windows ANSI codepage for Cyrillic is
CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
Belorussian, an East Slavic language very close to Russian) "be_BY"
locale the default charset is indeed CP1251, which is in accordance with
both the documentation and common sense.


Additionally, in the `strace locale -u` output, I see multiple
> __get_lcid_from_locale: LCID=0x0419 
lines.

"0x0419" corresponds to Russian/Russia (see
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396>).

Despite that, $(locale -u) returns "en_GB", even though all regional
settings are set to Russian/Russia. I believe this is not correct
either, and needs to be fixed.


Regards,
Andrey.


