This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: UCS data encoding in localedata
- From: Petr Baudis <pasky at ucw dot cz>
- To: Carlos O'Donell <carlos at systemhalted dot org>,Roumen Petrov <bugtrack at roumenpetrov dot info>
- Cc: libc-alpha at sourceware dot org, libc-locale at sourceware dot org
- Date: Fri, 13 Apr 2012 19:25:28 +0200
- Subject: Re: UCS data encoding in localedata
Hi!
On Fri, Apr 06, 2012 at 05:37:22AM -0400, Carlos O'Donell wrote:
> On Fri, Apr 6, 2012 at 4:18 AM, Petr Baudis <pasky@ucw.cz> wrote:
> > Does anyone know the technical reason for using the explicit <U0000>
> > UCS encoding in localedata instead of some sane approach like UTF8
> > encoded data? I can think of only historical reasons due to the lack
> > of support in tools (OS, editors, VCS, ...) in the past, however I
> > believe that by now, using UTF8 should be fairly safe.
>
> Yes, you are probably correct.
>
> The only other problem I can think of is that you'd be adding a
> circular dependency between a tool that uses this data and at the same
> time edits the data.
Hmm, I don't quite follow. You should be able to edit UTF8-encoded
data even in a non-locale aware editor, I think?
I can imagine that working with these patches may be painful; I'm not
sure how bugzilla and MUAs would deal with all this. A good compromise
would be to write at least ASCII data in plain text; I think that would
not break anything at all? We could start the transition gradually, with
new entries using this convention.
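To make the two options concrete, here is a rough Python sketch (not part of glibc; the function name, the KEEP_ENCODED set and the sample line are made up for illustration) that rewrites <UXXXX> tokens as literal characters, either all of them or only the ASCII ones. It assumes that control characters and a few characters with syntactic meaning in the locale source should stay encoded; a real conversion would have to check this against what localedef actually accepts.

```python
import re

# Matches <UXXXX> or <UXXXXXXXX> tokens (4 to 8 hex digits).
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4,8})>")

# Assumption: these characters would clash with the quoting and token
# syntax of the locale source, so they are left in <UXXXX> form.
KEEP_ENCODED = '"<>'


def decode_ucs_tokens(text, ascii_only=False):
    """Replace <UXXXX> tokens in text with the characters they name.

    With ascii_only=True, only printable ASCII characters are written
    in plain; everything else keeps its <UXXXX> form.
    """
    def repl(match):
        code_point = int(match.group(1), 16)
        # Leave out-of-range values and surrogates untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        char = chr(code_point)
        # Leave control characters and syntax characters untouched.
        if not char.isprintable() or char in KEEP_ENCODED:
            return match.group(0)
        if ascii_only and code_point > 0x7E:
            return match.group(0)
        return char

    return UCS_TOKEN.sub(repl, text)


# Hypothetical sample line in the style of a localedata "day" entry.
sample = 'day "<U0050><U00E1><U0074><U0065><U006B>"'
print(decode_ucs_tokens(sample))                   # day "Pátek"
print(decode_ucs_tokens(sample, ascii_only=True))  # day "P<U00E1>tek"
```

The ascii_only=True output is the gradual compromise: the file stays pure ASCII, so patches should survive bugzilla and MUAs, while most Latin-script entries become readable.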
On Sat, Apr 07, 2012 at 12:29:37AM +0300, Roumen Petrov wrote:
> Switching the locale data to UTF-* could be fine, but this will not
> resolve the issue with the function nl_langinfo.
Could you elaborate, please?
--
Petr "Pasky" Baudis
Smart data structures and dumb code works a lot better
than the other way around. -- Eric S. Raymond