This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Japanese/Chinese language question


On Thu, Jan 21, 2010 at 8:40 AM, Corinna Vinschen wrote:
> would somebody with Japanese and/or Chinese language background be so
> kind to answer the below two questions?

I have some (outdated) background in I18N and Japanese L10N, though
I'm not a native speaker of either Japanese or any Chinese language.
So I can't offer native intuition, but I can relay some technical info
that might be helpful:

> When comparing strings linguistically (strcoll/wcscoll),
> - are Hiragana and Katakana forms of the same character to be
>   treated as equal or as different?

(Nit: they are not "the same character" in either the technical or
traditional sense of "character"; they're the same syllable, but
represented by different characters.)

From the Unicode point of view, they are distinct; there is no defined
equivalence, either canonical or compatibility, between corresponding
Katakana and Hiragana syllables.  The collation algorithm (which does
take linguistic context into account) doesn't seem to say anything
about such comparisons, though it's possible I missed something.

But as a precedent which might be helpful, I note that with
linguistic sensitivity active, Oracle 10g does compare Hiragana and
Katakana forms of the same syllable as equal.
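
The Unicode side of this is easy to check.  Here is a minimal sketch
in Python, using only the standard unicodedata module.  Note that
fold_kana is a hypothetical helper written for this illustration (it
is not part of any library); it shows one way an application could
choose to treat the two scripts as equal, by shifting Hiragana code
points up by 0x60 onto their Katakana counterparts.

```python
import unicodedata

hiragana_ka = "\u304b"  # HIRAGANA LETTER KA
katakana_ka = "\u30ab"  # KATAKANA LETTER KA


def equivalent(a, b, form):
    """True if a and b are identical after the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def fold_kana(text):
    """Hypothetical helper: map Hiragana U+3041..U+3096 onto Katakana.

    This is an application-level choice, not a Unicode equivalence.
    """
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


# No normalization form, canonical or compatibility, unifies the two.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, equivalent(hiragana_ka, katakana_ka, form))  # False each time

# After explicit folding, the two spellings of the syllable compare equal.
print(fold_kana(hiragana_ka) == katakana_ka)  # True
```

Whether such folding is appropriate is a policy decision for the
collation, which is exactly the question being asked here.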

> - are half-width and full-width forms of the same CJK character
>   treated as equal or as different?

Under Unicode compatibility normalization (NFKC/NFKD), half-width and
full-width forms normalize to the same character, so they should be
treated as equivalent.  Canonical normalization (NFC/NFD) leaves them
distinct.  From the point of view of Unicode, there is no semantic
difference, and the width property is informative, not normative.
It's primarily encoded in Unicode to preserve round-trip compatibility
with other standards, though it's also helpful as a hint to rendering
algorithms.
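
As a quick illustration of the width behaviour, here is a small Python
sketch using the standard unicodedata module.  The sample characters
are my own choice for the example; the point is only that the
compatibility forms (NFKC here) fold width variants together while the
canonical forms do not.

```python
import unicodedata

halfwidth_ka = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
regular_ka = "\u30ab"    # KATAKANA LETTER KA
fullwidth_a = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A


def nfc(s):
    return unicodedata.normalize("NFC", s)


def nfkc(s):
    return unicodedata.normalize("NFKC", s)


# Canonical normalization keeps the width variants distinct.
print(nfc(halfwidth_ka) == regular_ka)   # False
print(nfc(fullwidth_a) == "A")           # False

# Compatibility normalization folds them onto the ordinary characters.
print(nfkc(halfwidth_ka) == regular_ka)  # True
print(nfkc(fullwidth_a) == "A")          # True

# The width itself is exposed as a separate character property.
print(unicodedata.east_asian_width(halfwidth_ka))  # H
print(unicodedata.east_asian_width(fullwidth_a))   # F
```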

-- 
Mark J. Reed <markjreed@gmail.com>


