locale initialization issue

Andy Koppe andy.koppe@gmail.com
Wed May 4 06:04:00 GMT 2011


Hi,

I stumbled across an issue with locale initialization when the "C"
locale is specified in the environment.

$ cat test.c
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void) {
  char cs[8];
  puts(nl_langinfo(CODESET));
  printf("%i\n", wctomb(cs, 0x80));
  return 0;
}

The program doesn't call setlocale, so it should be using the "C"
locale with its ASCII charset, which means the wctomb() call with a
codepoint outside the ASCII range should fail. And that's exactly what
happens as long as the locale set in the environment is something
other than "C", e.g.:

$ LC_ALL=C.UTF-8 ./test
ANSI_X3.4-1968
-1

$ LC_ALL=en_GB.ISO-8859-15 ./test
ANSI_X3.4-1968
-1

However, if the environment locale is "C", the charset is still
reported as ASCII (aka ANSI_X3.4-1968), but the wctomb call suddenly
succeeds:

$ LC_ALL=C ./test
ANSI_X3.4-1968
2

That's due to a combination of three things: Cygwin newlib starts with
the __wctomb and __mbtowc function pointers set to the UTF-8 variants
(for conversions during early Cygwin initialization), yet the initial
LC_CTYPE locale is recorded as "C", and setlocale() does nothing if the
requested locale is the same as the current one.

Hence, with the locale set to "C" in the environment, both the
setlocale() call from initial_setlocale(), which asks for the
environment locale for filename conversion, and the setlocale() call
just before main() that sets the "C" locale end up doing nothing. Thus
the conversion functions remain set to the UTF-8 variants instead of
being set to the ASCII ones as intended for the "C" locale.

The attached small patch addresses this by starting with the LC_CTYPE
locale set to "C.UTF-8" and lc_ctype_charset set accordingly. This
means that setting the "C" locale is recognised as a change, so the
conversion function pointers are updated. It also has the happy side
effect that the setlocale() call from initial_setlocale() will be
short-circuited if the default "C.UTF-8" locale has not been
overridden in the environment.
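
To make the mechanism above concrete, here is a minimal standalone
model of it. This is only a sketch, not newlib code: struct model,
model_init(), model_setlocale(), startup() and the two toy converters
are hypothetical names invented for illustration, and picking the
converter by looking for "UTF-8" in the locale name is a deliberate
simplification. It replays the two startup setlocale() calls described
above and returns what wctomb(cs, 0x80) would then yield.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy ASCII converter: rejects anything outside 0..0x7f. */
static int ascii_wctomb(char *s, unsigned wc) {
    if (wc > 0x7f) return -1;
    s[0] = (char)wc;
    return 1;
}

/* Toy UTF-8 converter: one- and two-byte sequences only. */
static int utf8_wctomb(char *s, unsigned wc) {
    if (wc < 0x80) { s[0] = (char)wc; return 1; }
    if (wc < 0x800) {
        s[0] = (char)(0xc0 | (wc >> 6));
        s[1] = (char)(0x80 | (wc & 0x3f));
        return 2;
    }
    return -1; /* longer sequences left out of this sketch */
}

/* Hypothetical model of the state involved: the recorded LC_CTYPE
   name and the active conversion function pointer. */
struct model {
    char ctype[32];
    int (*wctomb_fn)(char *, unsigned);
};

/* Startup state: the pointer is always the UTF-8 variant; the
   recorded locale name is what the patch changes. */
static void model_init(struct model *m, const char *initial_ctype) {
    assert(strlen(initial_ctype) < sizeof m->ctype);
    strcpy(m->ctype, initial_ctype);
    m->wctomb_fn = utf8_wctomb;
}

/* Does nothing if the requested name equals the recorded one;
   otherwise records the name and switches the converter. */
static void model_setlocale(struct model *m, const char *name) {
    if (strcmp(m->ctype, name) == 0)
        return; /* short-circuit: pointer is left untouched */
    snprintf(m->ctype, sizeof m->ctype, "%s", name);
    /* Simplification: anything that is not UTF-8 is treated as ASCII. */
    m->wctomb_fn = strstr(name, "UTF-8") ? utf8_wctomb : ascii_wctomb;
}

/* Replays startup and returns the result of converting 0x80. */
static int startup(const char *initial_ctype, const char *env_locale) {
    struct model m;
    char cs[8];
    model_init(&m, initial_ctype);
    model_setlocale(&m, env_locale); /* initial_setlocale() */
    model_setlocale(&m, "C");        /* reset just before main() */
    return m.wctomb_fn(cs, 0x80);
}
```

In this model, startup("C", "C") returns 2, which is the misbehaviour
shown above, while startup("C.UTF-8", "C") returns -1, which is the
result intended for the "C" locale. startup("C", "C.UTF-8") also
returns -1, matching the LC_ALL=C.UTF-8 run.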

Additionally, I think it's time to drop the "temporarily" #if 0'd code
for making UTF-8 the charset for the "C" locale.

It's a newlib patch, but it's entirely Cygwin-specific, so it seemed
more appropriate to send it here.

	* libc/locale/locale.c [__CYGWIN__]
	(current_categories, lc_ctype_charset): Start with the LC_CTYPE locale
	set to "C.UTF-8", to match initial __wctomb and __mbtowc settings.
	(lc_message_charset, loadlocale): Settle on ASCII as the "C" charset.

Andy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lc_ctype.patch
Type: application/octet-stream
Size: 1564 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20110504/187b7fd8/attachment.obj>

