This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Making an embedded UTF-8 C locale


Hi folks,

I've just subscribed to the list.  Just to introduce myself, I'm
a Debian developer with several interests, one of which is
distribution-wide support of UCS and UTF-8.  This mail is about
UTF-8 support in the glibc C locale.


What I'd like to propose is twofold:

1) In addition to the "C" locale hardcoded into glibc, I'd like
   to provide a "C.UTF-8" locale.  This would be identical to the
   standard C locale and would remain POSIX compliant, except that
   the locale codeset would be UTF-8 instead of ASCII.

2) At some future point, I'd like to make the "C.UTF-8" locale
   the default "C" locale, but that's not the goal right now.
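
To make point 1 concrete, here is a minimal sketch of how a program
could check which codeset a named locale reports.  It is not taken
from the glibc sources; the function name locale_is_utf8 is made up
for illustration, and it assumes the POSIX.1-2008 newlocale() and
nl_langinfo_l() interfaces are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the locale `name` reports UTF-8 as
   its codeset, 0 if it reports something else, and -1 if the locale
   cannot be loaded.  The process-wide locale is left untouched. */
int locale_is_utf8(const char *name)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;                      /* locale not available */

    /* Query the codeset of the locale object, not the global locale. */
    int is_utf8 = strcmp(nl_langinfo_l(CODESET, loc), "UTF-8") == 0;

    freelocale(loc);
    return is_utf8;
}
```

With the proposal in place, locale_is_utf8("C.UTF-8") would be
expected to return 1 on every system, while locale_is_utf8("C")
would continue to return 0.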


This came out of the discussion in this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776

and this thread on debian-devel:
http://lists.debian.org/debian-devel/2009/08/msg00311.html
http://lists.debian.org/debian-devel/2009/08/msg00413.html

To summarise, there is a need for a standard default UTF-8 locale
that can be relied upon to be present at all times.  In the cases
above, one was needed at package build time, where a particular
package required a UTF-8 locale for its build, and by a system
service at system startup (before /usr gets mounted).  Having the
locale embedded directly into glibc would allow it to be used right
from the start of init, and it could be relied upon to always be
present.  Right now, you need to know the name of a UTF-8 locale in
advance, but no particular one can be relied upon to be present on
all systems, and none is available before /usr is mounted.
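
As an illustration of the current situation, a program that needs
UTF-8 has to guess at locale names along these lines.  This is only
a sketch: the candidate list and the function name select_utf8_ctype
are my own assumptions, and none of the names is guaranteed to exist
on a given system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate names; which of them exist varies by system. */
static const char *const utf8_candidates[] = {
    "C.UTF-8", "en_US.UTF-8", "en_GB.UTF-8"
};

/* Tries each candidate for LC_CTYPE in turn and returns the first one
   that can be selected and reports a UTF-8 codeset.  Returns NULL and
   falls back to the "C" locale if none of them works. */
const char *select_utf8_ctype(void)
{
    size_t n = sizeof utf8_candidates / sizeof utf8_candidates[0];

    for (size_t i = 0; i < n; i++) {
        if (setlocale(LC_CTYPE, utf8_candidates[i]) != NULL
            && strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
            return utf8_candidates[i];
    }

    setlocale(LC_CTYPE, "C");           /* nothing usable was found */
    return NULL;
}
```

With a built-in "C.UTF-8" the loop would be unnecessary, since the
first candidate could be relied upon everywhere.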

I've spent some time looking through the glibc sources with a view
to making a patch for this, but I'm afraid I'm insufficiently
familiar with the sources and internal locale data structures to
take a good stab at it.  Could anyone point me at any documentation
of these, if available, or provide any pointers on where to get
started?

One thing I was unsure of is whether the C locale source files were
created by hand or were generated by a tool at some point in the
past from the locale data files and, if the latter, whether the same
process could be used to generate UTF-8 equivalents.


Many thanks,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

