"C" UTF-8 trouble

Andy Koppe andy.koppe@gmail.com
Mon Oct 5 16:40:00 GMT 2009


Vim and emacs both appear to have a hardcoded assumption that the
default "C" locale is 8-bit only. Since the "C" locale now defaults to
UTF-8, this means that non-ASCII characters don't work out-of-the-box
after all. :(

Strictly speaking, vim and emacs are wrong to do this, because they
should be leaving the charset up to setlocale and the multibyte
conversion functions. But if these two treat "C" specially, we
probably have to assume that others do the same and consider this a
de facto standard.
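
For illustration, here is a minimal sketch of what leaving the charset
up to setlocale could look like in an application. The helper name
environment_codeset is made up for this example and is not taken from
vim or emacs; it only uses the standard setlocale and nl_langinfo
calls, and falls back to the program's startup "C" locale if the
environment names a locale that isn't available.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper (not from vim or emacs): ask the C library which
 * character set the user's locale settings select, instead of guessing
 * from the locale name.
 */
const char *environment_codeset(void)
{
    /*
     * "" means "take the locale from LC_ALL, LC_CTYPE or LANG".
     * If that fails, the program simply stays in its startup "C" locale.
     */
    (void) setlocale(LC_CTYPE, "");

    /* Codeset name of the current LC_CTYPE locale, e.g. "UTF-8". */
    return nl_langinfo(CODESET);
}
```

An app written this way never needs to special-case "C": it compares
the returned name against the charsets it supports and uses the
multibyte conversion functions for everything else.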

They're both fine, however, if the locale is set to "C.UTF-8" or any
other explicit UTF-8 locale. Therefore, here's one way to address this
issue that avoids patching such apps:

When the Windows environment is translated at DLL startup, set LANG to
"C.UTF-8" if it is not already set. This has the same semantics as
plain "C", and LC_ALL as well as the specific LC_* variables would
still override it if set. Yet apps such as emacs and vim wouldn't
make any undue assumptions.
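
As a rough sketch of the idea, and not the actual DLL startup code,
the default could be applied with setenv and its overwrite flag set to
0. Both function names below are hypothetical. The second one only
illustrates the POSIX lookup order for LC_CTYPE (LC_ALL first, then
LC_CTYPE, then LANG); returning "C" when none of them is set is an
assumption of this example, since POSIX leaves that case up to the
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Hypothetical helper: give LANG a UTF-8 default only if it is not
 * already present in the environment. Returns 0 on success, -1 on error.
 */
int default_lang_to_utf8(void)
{
    /* overwrite == 0 leaves an existing LANG value untouched. */
    return setenv("LANG", "C.UTF-8", 0);
}

/*
 * Hypothetical helper: name of the locale that governs LC_CTYPE.
 * Lookup order per POSIX: LC_ALL, then LC_CTYPE, then LANG.
 * Empty values are treated as unset.
 */
const char *effective_ctype_locale(void)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C"; /* assumption: fall back to "C" when nothing is set */
}
```

With nothing set, effective_ctype_locale() reports "C.UTF-8" after
default_lang_to_utf8() has run, while a user-supplied LANG, LC_CTYPE or
LC_ALL still wins.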

Andy



More information about the Cygwin-developers mailing list