This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



Re: Character classifications and language-dependence


Hi,

Keld Jørn Simonsen <keld@dkuug.dk> writes:

> On Fri, Sep 15, 2006 at 09:51:52AM +0200, Ludovic Courtès wrote:
>> This is a good point.  More generally, readers of variants of the Latin
>> alphabet will recognize accented Latin letters as letters.
>> 
>> OTOH, "i18n" also includes letters from other alphabets, like Greek and
>> Cyrillic, and it is unclear whether all those alphabets (and variants
>> thereof) can be considered "mutually recognizable" by their readers.
>> 
>> "Recognizability" of a letter is probably very subjective.  For
>> instance, accented letters found in Castellano, Italian, and French,
>> certainly look familiar to each other.  However, accented Latin letters
>> found in Central and Eastern European languages (e.g., `e' with ogonek,
>> as in Polish -- more generally, Latin letters not part of Latin-1)
>> certainly look very "unusual" to readers of French, Castellano, Italian,
>> etc...
>
> My first observation is that when these strange characters occur, it is
> for a reason. There is an intended audience that will understand what is
> written, and for those, as they would know how to read it, then it
> should follow the rules for the characters and scripts in question. 
>
> My other observation is that in the EU, where both you and I live, all
> citizens are required by law to be treated equally, in every member
> state of the EU. [...]

Sorry if that wasn't clear from my previous email, but I fully agree
with you as far as respect of cultures and languages is concerned.  IMO,
that obviously is not limited to the EU.  Also, respecting languages
implies that phrases such as "these strange characters" should be
considered inappropriate.

Anyway, this is my personal opinion and this is not what I wanted to
talk about in the first place.

>> Initially, I was just wondering whether this broad and (to some extent)
>> language-independent character classification is glibc-specific, or
>> whether it is following some standard or recommendation.
>
> AFAIK glibc follows ISO 14652 recommendations, which essentially is the
> same as what Unicode advocates: that all the letters of the different
> scripts and also the ideographs are considered belonging to class
> alpha.

So perhaps the ISO 14652 paragraph about the "i18n" FDCC-set that I
quoted in my first message should be interpreted as a recommendation to
include "i18n" in all locales?  Is that what you meant?

If this is the case, the language-independent character classification
found in glibc is not glibc-specific but standard-conforming.
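
To make the Unicode side of this concrete, here is a minimal sketch in
Python.  It does not exercise glibc or any locale definition; it only
asks Python's `unicodedata' module for the Unicode general category of a
few sample characters.  The helper name `is_letter' and the sample set
are made up for illustration, and the "L*" general categories are only a
rough proxy for class alpha, since the Unicode Alphabetic property is
somewhat broader.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """Return True if the Unicode general category of ch is a letter (L*).

    Hypothetical helper for illustration; this approximates, but is not
    identical to, membership in the "alpha" class of a locale.
    """
    return unicodedata.category(ch).startswith("L")


# Sample characters from several scripts, plus one non-letter for contrast.
samples = {
    "Latin e with acute": "\u00e9",
    "Latin e with ogonek": "\u0119",
    "Greek alpha": "\u03b1",
    "Cyrillic ya": "\u044f",
    "CJK ideograph": "\u4e2d",
    "ASCII digit one": "1",
}

for name, ch in samples.items():
    print(f"{name:20} U+{ord(ch):04X} letter={is_letter(ch)}")
```

The classification here depends only on the character, not on the
language of the reader, which is the language-independent behaviour
discussed above.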

Thanks,
Ludovic.

