CYGWIN=codepage? Or LC_CTYPE=foo?

Kazuhiro Fujieda fujieda@jaist.ac.jp
Sun Apr 6 07:00:00 GMT 2008


>>> On Thu, 03 Apr 2008 17:54:48 +0200
>>> Corinna Vinschen said:

> That means, in theory there's no reason anymore to keep the
> CYGWIN=codepage setting in the environment.  We could use the LC_CTYPE
> setting, just as on other systems.  Right now, we need the LC_CTYPE
> set to "C-UTF-8" anyway when using the codepage:utf8 setting, otherwise
> the wcstombs and mbstowcs conversions in newlib will be broken.
>
> But there's a problem.  The newlib conversion functions don't know
> anything about Windows codepages, and the Windows conversion functions
> used in the Cygwin functions sys_wcstombs and sys_mbstowcs don't know
> anything about LC_CTYPE. 

According to the specification, LC_CTYPE controls the character
handling of C library functions, not of system calls. I believe
the Cygwin DLL should use sys_wcstombs and sys_mbstowcs as
directed by CYGWIN=codepage, and should not depend on userland
functions.

The Cygwin DLL, however, contains both system calls and userland
functions, so controlling both of them with LC_CTYPE at the same
time is not a bad idea.

To achieve this, the functions related to character handling
must know the mapping between locale names and Windows
codepages. For example, if LC_CTYPE is set to de_DE@ISO-8859-15,
they should know that it designates codepage 28605.
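A minimal sketch of such a mapping in C follows. It assumes the
charset is taken from the part of the locale name after '.' (or
after '@', as in the de_DE@ISO-8859-15 example) and looked up in a
small table. The function locale_to_codepage and the table contents
are hypothetical illustrations, not existing Cygwin or newlib code.
The codepage numbers are the standard Windows identifiers (28605 is
ISO-8859-15, 65001 is UTF-8, 932 is Shift_JIS).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: charset names as they might appear in LC_CTYPE,
   mapped to Windows codepage identifiers. Not an existing Cygwin table. */
struct cs_map {
    const char *charset;
    unsigned int codepage;
};

static const struct cs_map cs_table[] = {
    { "ISO-8859-1",  28591 },
    { "ISO-8859-2",  28592 },
    { "ISO-8859-15", 28605 },
    { "KOI8-R",      20866 },
    { "SJIS",          932 },
    { "UTF-8",       65001 },
};

/* ASCII case-insensitive comparison of a[0..n) with NUL-terminated b. */
static int charset_eq(const char *a, size_t n, const char *b)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (b[i] == '\0' ||
            tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return b[n] == '\0';
}

/* Hypothetical helper: return the Windows codepage for a locale name
   such as "de_DE.UTF-8" or "de_DE@ISO-8859-15", or 0 if unknown. */
unsigned int locale_to_codepage(const char *locale)
{
    const char *cs, *end;
    size_t i, len;

    if (locale == NULL)
        return 0;
    if ((cs = strchr(locale, '.')) != NULL) {
        cs++;
        end = strchr(cs, '@');    /* stop at a modifier, if any */
    } else if ((cs = strchr(locale, '@')) != NULL) {
        cs++;                     /* form used in the example above */
        end = NULL;
    } else {
        return 0;                 /* no charset part, e.g. "C" */
    }
    len = end ? (size_t)(end - cs) : strlen(cs);
    for (i = 0; i < sizeof cs_table / sizeof cs_table[0]; i++)
        if (charset_eq(cs, len, cs_table[i].charset))
            return cs_table[i].codepage;
    return 0;
}
```

A caller would pass the result to MultiByteToWideChar or
WideCharToMultiByte and fall back to some default codepage when 0
is returned. Which default is appropriate is left open here.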

The current implementations of mbstowcs and wcstombs do not work
at all in this scenario. We must replace them with
implementations based on MultiByteToWideChar and
WideCharToMultiByte. This emulation will add a small overhead.
The Cygwin DLL itself should also use sys_wcstombs and
sys_mbstowcs in this scenario.
____
  | AIST      Kazuhiro Fujieda <fujieda@jaist.ac.jp>
  | HOKURIKU  School of Information Science
o_/ 1990      Japan Advanced Institute of Science and Technology




More information about the Cygwin-developers mailing list