This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCH] [BZ 14094 13064] Update locale data to Unicode 7.0.0


Hi,

  I have been working on updating the UTF-8 file to Unicode 7.0.
  The patch is around 1.2MB, and it looks like libc-alpha does not allow
attachments of that size, so I have attached the patch [1] to bug [2].

  1. The present patch covers only CHARMAP; a patch updating WIDTH will be
available soon.
  2. utf8-gen.py: new script to generate the UTF-8 file.
  3. The patch was created ignoring whitespace changes (-w).
  4. Handling of character ranges.
   ''' UnicodeData.txt gives some characters only as a range, with a
    First and a Last entry.
    Example:
    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
    4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

    The UTF-8 file lists such a range in blocks of 0x40 code points: each
    line ends 0x3F after its first character, until the Last character
    is reached.
    Example:
    <U3400>..<U343F>     /xe3/x90/x80         <CJK Ideograph Extension A>
    .
    .
    <U4D80>..<U4DB5>     /xe4/xb6/x80         <CJK Ideograph Extension A>

    Note: No idea why the Hangul syllables (AC00..D7A3) were not expanded
    in the Unicode 5.0 UTF-8 file. For consistency, we are expanding
    Hangul as well.
    '''
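
A minimal sketch of the range expansion described in item 4, for
illustration only. It is not taken from utf8-gen.py: the function names
(utf8_escape, ucs_symbol, expand_range) and the column padding are
assumptions. It splits a First..Last range into lines that each end at
the next 0x3F boundary and prints the UTF-8 bytes of the first character
of each line.

```python
def utf8_escape(cp):
    """Return the UTF-8 bytes of a code point in charmap notation, e.g. /xe3/x90/x80."""
    # Note: surrogate code points (D800..DFFF) cannot be encoded this way.
    return "".join("/x{:02x}".format(b) for b in chr(cp).encode("utf-8"))


def ucs_symbol(cp):
    """Format a code point as <Uxxxx> (assumption: 8 hex digits above U+FFFF)."""
    return "<U{:04X}>".format(cp) if cp <= 0xFFFF else "<U{:08X}>".format(cp)


def expand_range(first, last, name):
    """Split first..last into lines that each cover at most 0x40 code points."""
    lines = []
    start = first
    while start <= last:
        # Each line ends at the next ...3F/7F/BF/FF boundary, or at `last`.
        end = min(start | 0x3F, last)
        lines.append("{}..{}     {:<20} {}".format(
            ucs_symbol(start), ucs_symbol(end), utf8_escape(start), name))
        start = end + 1
    return lines


lines = expand_range(0x3400, 0x4DB5, "<CJK Ideograph Extension A>")
print(lines[0])   # first block, starting at <U3400>
print(lines[-1])  # last, shorter block, ending at <U4DB5>
```

The first and last lines printed correspond to the two example lines
quoted in item 4.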

    5. In some cases the names are changed relative to UnicodeData.txt.
    ''' Some characters have <control> as their name, so the "Unicode 1.0
     Name" is used instead.
     Characters U+0080, U+0081, U+0084 and U+0099 have "<control>" as
     their name and no "Unicode 1.0 Name" (10th field) in UnicodeData.txt
     either.
     We can write code to take their alternate names from NameAliases.txt. '''
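
A rough sketch of the name fallback described in item 5, again only as an
illustration. The helper names (load_aliases, pick_name) are hypothetical,
the inline sample lines merely follow the layout of UnicodeData.txt and
NameAliases.txt, and taking the first alias listed for a code point is an
assumption rather than something the patch does.

```python
def load_aliases(lines):
    """Map code point -> first alias found in NameAliases.txt-style lines."""
    aliases = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, _kind = line.split(";")
        # Assumption: keep only the first alias listed for each code point.
        aliases.setdefault(int(code, 16), alias)
    return aliases


def pick_name(unicode_data_line, aliases):
    """Choose a name: regular name, else Unicode 1.0 name, else an alias."""
    fields = unicode_data_line.strip().split(";")
    code_point = int(fields[0], 16)
    name = fields[1]
    if name != "<control>":
        return name
    if fields[10]:  # field 10 (counting from 0) is the Unicode 1.0 name
        return fields[10]
    return aliases.get(code_point, name)


# Sample input in the layout of the two data files (illustrative only).
sample_aliases = load_aliases([
    "# comment line",
    "0080;PADDING CHARACTER;figment",
    "0080;PAD;abbreviation",
])
print(pick_name("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;", sample_aliases))
print(pick_name("0000;<control>;Cc;0;BN;;;;;N;NULL;;;;", sample_aliases))
print(pick_name("0080;<control>;Cc;0;BN;;;;;N;;;;;", sample_aliases))
```

If no alias is available either, the sketch simply leaves "<control>" in
place.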

    Let me know if you have any issues or doubts, or see possible improvements.

Best Regards,
Pravin Satpute

1. https://sourceware.org/bugzilla/attachment.cgi?id=7679
2. https://sourceware.org/bugzilla/show_bug.cgi?id=14094

