This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



UTF-8 as default charset?


Perhaps this one's better discussed on -developers as well.

2009/9/23 Corinna Vinschen:
> However, if we default to UTF-8 for a subset of languages anyway, it
> gets even more interesting to ask, why not for all languages?

Hmm, why indeed? As far as I know, there's no technical reason not to
do it. POSIX only requires 7-bit ASCII anyway, and the DC?? scheme
ensures that filenames are fully 8-bit clean even with invalid UTF-8.
So for a completely new system, there'd be no question whatsoever:
UTF-8 is the right way to go.
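
As a rough illustration of the "8-bit clean" idea, here is a small Python
sketch. It assumes the DC?? scheme means mapping undecodable bytes into the
U+DCxx range; it uses Python's own "surrogateescape" error handler as an
analogy and is not Cygwin code. The filename is made up for the example.

```python
# Illustration only: Python's "surrogateescape" handler, not Cygwin's code.
# Hypothetical filename in Latin-1; byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding keeps the bad byte by mapping it to a lone surrogate (U+DC00 + byte).
name = raw.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", errors="surrogateescape")
```

The point of the round trip is that no byte sequence is lost, so a filename
that is not valid UTF-8 can still be opened, listed, and renamed.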


> Isn't it better in the long run to have the same default for all Cygwin
> installations?

Agreed. And I'm getting round to the viewpoint that the scheme I
proposed would only delay the inevitable, and trade reduced pain now
for increased complexity and extra pain down the line.


> I'm really wondering if we shouldn't simply default to UTF-8 as charset
> throughout, in the application, the console, and for the filename
> conversion.

That would certainly make plenty of sense.

I assume that default would apply both to "C" and the likes of "en" and "en_US"?


> Yes, not all applications will work OOTB with chars > 0x7f,
> but it was always a bug to make any assumptions for non-ASCII chars
> in the C locale.  Applications can be fixed, right?

Quite. In particular since Linux has been through this already, so at
least the popular apps should already have build options to enable
full locale support. Of course, testing and packaging are still a
significant effort, but as an encouraging example here's a mintty user
who managed to build ncurses and mutt with UTF-8 support:

http://code.google.com/p/mintty/issues/detail?id=124#c28

The other significant compatibility issue is with users' existing ANSI
codepage files. The charset support provides the tools to deal with
that though, so it'd be all about fielding the resulting questions and
complaints ...
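
For what it's worth, converting such a file is a one-off re-encoding step.
A minimal Python sketch, assuming the file is in Windows codepage 1252 (the
function name and the default charset are just assumptions for the example;
a user would have to supply their actual ANSI codepage):

```python
def convert_to_utf8(src_path, dst_path, src_charset="cp1252"):
    """Re-encode a text file from src_charset (assumed cp1252) to UTF-8."""
    # newline="" leaves line endings (e.g. CRLF) exactly as they were.
    with open(src_path, "r", encoding=src_charset, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Bytes that are undefined in the source codepage raise UnicodeDecodeError
rather than being converted silently, which is probably the safer default
for user data.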

Andy

