This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.
Re: locale encodings
- From: Steven Abner <pheonix at zoomtown dot com>
- To: Keld Simonsen <keld at keldix dot com>
- Cc: Troy Korjuslommi <tjk at tksoft dot com>, libc-locales at sourceware dot org
- Date: Tue, 12 Nov 2013 09:34:45 -0500
- Subject: Re: locale encodings
- References: <31AACAB8-A716-47CC-B755-F33DD77BA51E at zoomtown dot com> <1384174607 dot 4028 dot 8 dot camel at uno11 dot loco> <20131112012257 dot GA31828 at rap dot rap dot dk> <5281BEB1 dot 2010909 at redhat dot com> <20131112133642 dot GA22738 at rap dot rap dot dk>
On 12 Nov 2013, at 8:36 AM, Keld Simonsen wrote:
> On Tue, Nov 12, 2013 at 12:37:53AM -0500, Carlos O'Donell wrote:
>> On 11/11/2013 08:22 PM, Keld Simonsen wrote:
>>> Well, the encoding of the source code of all locales should be 7-bit ASCII, for
>>> maximum portability. Then the target encoding should be recorded via the
>>> % charset specification, which gives a list of possible charsets, comma separated.
>>> UTF-8 should always be included there, but other encodings should also be available.
>>
>> So one of the points that we've been trying to gather consensus on is:
>> Is it really important to have 7-bit ASCII? Why not use UTF-8 for
>> the locale source? It's readily readable by all editors and allows
>> language-specific comments in the source files for maximum maintainability.
>
> I think having UTF-8 is a bad idea, e.g. for embedded systems, and for systems that are
> not maintained in UTF-8. It also can give trouble when communicating the source.
FWIW, all the data that is important, save one item, is already in POSIX's 7-bit ASCII. The locale
sources I've examined and patched appear to be almost identical copies of Section 7 of The Open Group
Base Specifications. Some have minor data problems, but what I was trying to access was the default
character set, and that happens to be in the "comments" section for some reason.
Steve
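
For anyone who wants to check both points on a given locale source, here is a minimal Python sketch. It is not part of glibc or localedef. It assumes the comment character is "%" and that the charset is recorded in a header comment of the form "% Charset: NAME[, NAME...]"; the function names non_ascii_lines and declared_charsets are hypothetical.

```python
import re

# Assumption: the charset list lives in a header comment such as
# "% Charset: UTF-8, ISO-8859-1". Real files may use another comment_char.
CHARSET_RE = re.compile(r"^%\s*charset\s*:\s*(.+)$", re.IGNORECASE)


def non_ascii_lines(data: bytes) -> list:
    """Return 1-based numbers of lines that contain a byte above 0x7F."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte > 0x7F for byte in line)
    ]


def declared_charsets(data: bytes) -> list:
    """Return the charset names from the first '% Charset:' comment, if any."""
    # Decode leniently so a stray 8-bit byte does not abort the scan.
    text = data.decode("ascii", errors="replace")
    for line in text.splitlines():
        match = CHARSET_RE.match(line.strip())
        if match:
            return [name.strip() for name in match.group(1).split(",") if name.strip()]
    return []
```

Reading a file with open(path, "rb").read() and passing the bytes to both functions would show whether the source is pure 7-bit ASCII and which charsets, if any, its comment header declares.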