This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] Add "@cjknarrow" modifier (was Re: [Fwd: [1.7] wcwidth failing configure tests])


I have no problem with your change.

-- Jeff J.

Corinna Vinschen wrote:
Jeff,

do you have any opinion about this change?  I would like to get it (or
some variation of it) into Cygwin 1.7.


Thanks, Corinna

On Jun 15 10:44, Corinna Vinschen wrote:
On Jun 14 22:18, IWAMURO Motonori wrote:
2009/6/13 Corinna Vinschen
The problem appears to be that there is no standard for the handling
of ambiguous characters.
Yes, but the guideline exists.
http://cygwin.com/ml/cygwin/2009-05/msg00444.html
A single mail in a single mailing list of a single project.  That's rather
a suggestion than a guideline...

Ambiguous characters behave like wide or narrow characters depending
on the context (language tag, script identification, associated
font, source of data, or explicit markup; all can provide the
context). If the context cannot be established reliably, they should
be treated as narrow characters by default.
Define the default for ja, ko, and zh to use width = 2, with a
@cjknarrow (or whatever) modifier to use width = 1.
I think it is a good idea.
If everybody agrees to this suggestion, here's the patch.  Tested
with various combinations like

  LANG=ja_JP.UTF-8@cjknarrow
  LANG=ja_JP@cjknarrow
  LANG=ja.UTF-8@cjknarrow
  LANG=ja@cjknarrow
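
The rule these locale strings exercise can be sketched in isolation. This is a minimal illustration, not the patch itself: cjk_ambiguous_is_wide is a hypothetical helper name, and the sketch assumes only what is described above, namely that "ja", "ko", and "zh" default to wide ambiguous characters unless the "@cjknarrow" modifier is present. Like the patch, it matches the language by its first two characters only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of newlib): returns 1 if characters of
   ambiguous width should occupy two columns for the given locale string,
   0 if they should occupy one. */
int
cjk_ambiguous_is_wide (const char *locale)
{
  const char *mod;
  int cjk_lang;

  assert (locale != NULL);

  /* Same prefix test as the patch: any locale starting with "ja",
     "ko", or "zh" counts as a CJK language. */
  cjk_lang = strncmp (locale, "ja", 2) == 0
             || strncmp (locale, "ko", 2) == 0
             || strncmp (locale, "zh", 2) == 0;

  /* The modifier, if any, follows the first '@' in the string. */
  mod = strchr (locale, '@');
  if (mod != NULL && strcmp (mod + 1, "cjknarrow") == 0)
    return 0;

  return cjk_lang;
}
```

Under these assumptions, all four strings listed above yield 0, while the same strings without the modifier (for example "ja_JP.UTF-8") yield 1.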


Corinna



* libc/locale/locale.c (loadlocale): Add handling of "@cjknarrow" modifier on _MB_CAPABLE targets. Add comment to explain.


Index: libc/locale/locale.c
===================================================================
RCS file: /cvs/src/src/newlib/libc/locale/locale.c,v
retrieving revision 1.20
diff -u -p -r1.20 locale.c
--- libc/locale/locale.c 3 Jun 2009 19:28:22 -0000 1.20
+++ libc/locale/locale.c 15 Jun 2009 08:40:46 -0000
@@ -397,6 +397,9 @@ loadlocale(struct _reent *p, int categor
int (*l_wctomb) (struct _reent *, char *, wchar_t, const char *, mbstate_t *);
int (*l_mbtowc) (struct _reent *, wchar_t *, const char *, size_t,
const char *, mbstate_t *);
+#ifdef _MB_CAPABLE
+ int cjknarrow = 0;
+#endif
/* "POSIX" is translated to "C", as on Linux. */
if (!strcmp (locale, "POSIX"))
@@ -427,10 +430,14 @@ loadlocale(struct _reent *p, int categor
if (c[0] == '.')
{
/* Charset */
- strcpy (charset, c + 1);
- if ((c = strchr (charset, '@')))
+ char *chp;
+
+ ++c;
+ strcpy (charset, c);
+ if ((chp = strchr (charset, '@')))
/* Strip off modifier */
- *c = '\0';
+ *chp = '\0';
+ c += strlen (charset);
}
else if (c[0] == '\0' || c[0] == '@')
/* End of string or just a modifier */
@@ -442,6 +449,17 @@ loadlocale(struct _reent *p, int categor
else
/* Invalid string */
return NULL;
+#ifdef _MB_CAPABLE
+ if (c[0] == '@')
+ {
+ /* Modifier */
+ /* Only one modifier is recognized right now. "cjknarrow" is used
+ to modify the behaviour of wcwidth() for East Asian languages.
+ For details see the comment at the end of this function. */
+ if (!strcmp (c + 1, "cjknarrow"))
+ cjknarrow = 1;
+ }
+#endif
}
/* We only support this subset of charsets. */
switch (charset[0])
@@ -604,13 +622,15 @@ loadlocale(struct _reent *p, int categor
__mbtowc = l_mbtowc;
__set_ctype (charset);
/* Check for the language part of the locale specifier. In case
- of "ja", "ko", or "zh", assume the use of CJK fonts. This is
- stored in lc_ctype_cjk_lang and tested in wcwidth() to figure
- out the width to return (1 or 2) for the "CJK Ambiguous Width"
- category of characters. */
- lc_ctype_cjk_lang = (strncmp (locale, "ja", 2) == 0
- || strncmp (locale, "ko", 2) == 0
- || strncmp (locale, "zh", 2) == 0);
+ of "ja", "ko", or "zh", assume the use of CJK fonts, unless the
+ "@cjknarrow" modifier has been specified.
+ The result is stored in lc_ctype_cjk_lang and tested in wcwidth()
+ to figure out the width to return (1 or 2) for the "CJK Ambiguous
+ Width" category of characters. */
+ lc_ctype_cjk_lang = !cjknarrow
+ && ((strncmp (locale, "ja", 2) == 0
+ || strncmp (locale, "ko", 2) == 0
+ || strncmp (locale, "zh", 2) == 0));
#endif
}
else if (category == LC_MESSAGES)
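
The comment in the last hunk says the flag is tested in wcwidth() to choose between width 1 and 2 for ambiguous characters. A rough sketch of what such a test could look like follows. It is not newlib's wcwidth(): sketch_width and is_ambiguous are hypothetical names, the flag is passed as a parameter rather than read from lc_ctype_cjk_lang, and the ambiguous set is a three-entry sample standing in for the full East Asian Ambiguous table.

```c
#include <assert.h>
#include <wchar.h>

/* Tiny illustrative subset of the East Asian Ambiguous category.
   A real implementation would consult a full range table. */
static int
is_ambiguous (wchar_t wc)
{
  return wc == 0x00A7      /* SECTION SIGN */
         || wc == 0x00B1   /* PLUS-MINUS SIGN */
         || wc == 0x00D7;  /* MULTIPLICATION SIGN */
}

/* Hypothetical width lookup: cjk_lang plays the role of the flag the
   patch stores in lc_ctype_cjk_lang (1 = CJK locale without the
   "@cjknarrow" modifier, 0 otherwise). */
int
sketch_width (wchar_t wc, int cjk_lang)
{
  assert (cjk_lang == 0 || cjk_lang == 1);

  if (is_ambiguous (wc))
    return cjk_lang ? 2 : 1;

  /* Simplification: every other character is treated as narrow. */
  return 1;
}
```

With this sketch, U+00B1 is reported as width 2 when the flag is set and width 1 when it is clear, which is the distinction the "@cjknarrow" modifier is meant to control.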



--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat


