This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug localedata/4024] collation in pinyin for zh_CN locale


------- Additional Comments From ed dot trager at gmail dot com  2007-02-17 15:24 -------
Subject: Re:  collation in pinyin for zh_CN locale

Pinyin collation for zh_CN as the default will be great -- In fact, I
am surprised to learn it isn't done that way already!

I do have a question though:  What is the order of characters nested
within a given pinyin+tone category?  For example, is this going to
follow the standard order of one of the big dictionaries?  Or
something else?

My copy of "The Pinyin Chinese-English Dictionary" (Wu Jingrong
ed., Beijing Foreign Languages Institute 1979) seems to order
characters by number of strokes within a pinyin category, e.g. "jin1":
巾今斤金津矜 ... etc.  This is one logical way to do it.

But my copy of 现代汉语词典 (Xiandai Hanyu Cidian; Institute of Linguistics,
Chinese Academy of Social Sciences; Commercial Press, Beijing 1986)
orders "jin1" completely differently: 津禁襟巾今衿矜 ... etc.  I'm not sure
what the logic here is ...
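As a rough illustration of the first dictionary's scheme (toneless pinyin syllable, then tone, then stroke count), here is a minimal Python sketch. The `Entry` fields and the sample data are hypothetical; a real locale generator would have to pull them from some pinyin and stroke database, and glibc itself expresses the result as an ordered LC_COLLATE list rather than as a sort function:

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One character plus the (hypothetical) data a generator would need."""
    char: str
    syllable: str  # toneless pinyin syllable, e.g. "jin"
    tone: int      # tone number 1-4 (5 could stand for the neutral tone)
    strokes: int   # total stroke count


def stroke_key(e: Entry) -> tuple:
    # Group by syllable, then by tone, then order by stroke count;
    # the code point is only a deterministic tiebreaker for equal counts.
    return (e.syllable, e.tone, e.strokes, ord(e.char))


def collate(entries: Iterable[Entry]) -> str:
    """Return the characters concatenated in collation order."""
    return "".join(e.char for e in sorted(entries, key=stroke_key))


# Illustrative sample only (stroke counts of the simplified forms).
ENTRIES = [
    Entry("津", "jin", 1, 9),
    Entry("近", "jin", 4, 7),
    Entry("金", "jin", 1, 8),
    Entry("矜", "jin", 1, 9),
    Entry("巾", "jin", 1, 3),
    Entry("斤", "jin", 1, 4),
    Entry("今", "jin", 1, 4),
]
```

With this sample, `collate(ENTRIES)` returns "巾今斤金津矜近": the six jin1 characters by stroke count, then 近 (jin4) last even though it has fewer strokes than 金, because tone is compared before strokes.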

Another logical way to do it would be to order by how frequently the
character is used.  If I remember correctly from an earlier post, the
Perl script for generating the locale was pulling data from SCIM
tables.  So does this mean you are going to order based on character
usage frequency within pinyin+tone category?
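A frequency-based variant would only change the tiebreak inside each pinyin+tone group. The sketch below assumes a hypothetical mapping from character to usage count; it does not read SCIM's table format, and the numbers are placeholders, not real corpus data:

```python
from typing import Dict, List, Tuple

# (character, toneless syllable, tone number); hypothetical input format.
Reading = Tuple[str, str, int]


def frequency_order(readings: List[Reading], counts: Dict[str, int]) -> str:
    """Order by syllable, then tone, then descending usage count.

    Characters absent from `counts` are treated as count 0; ties fall back
    to code point order so the result is deterministic.
    """
    def key(r: Reading) -> tuple:
        char, syllable, tone = r
        return (syllable, tone, -counts.get(char, 0), ord(char))

    return "".join(r[0] for r in sorted(readings, key=key))


readings = [("巾", "jin", 1), ("今", "jin", 1), ("斤", "jin", 1),
            ("金", "jin", 1), ("津", "jin", 1)]
# Placeholder counts for illustration only, NOT measured frequencies.
counts = {"今": 50, "金": 40, "津": 10, "斤": 5}
```

Here `frequency_order(readings, counts)` returns "今金津斤巾"; 巾 has no count, so it sorts last within jin1.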

Best - Ed

On 17 Feb 2007 07:44:43 -0000, fundawang at gmail dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> ------- Additional Comments From fundawang at gmail dot com  2007-02-17 07:44 -------
> > So, what exactly is the proposal?  Create a new locale zh_CN@pinyin
> > or use the new collation data for zh_CN?  The former sounds much
> > safer to me.
> There will be several collations for Chinese, such as pronunciation (pinyin)
> and strokes. The most widely used collation is actually pinyin. The iso14651
> collation is of no use for Chinese.
>
> So, the proposal is to replace the current collation for zh_CN (iso14651) with pinyin.
> As for the strokes, we'll likely propose zh_CN@strokes in the future.
>
> --
>
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=4024
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
>



