This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.
On 16 November 2010 17:58, Corinna Vinschen wrote:
> On Nov  9 22:06, Andy Koppe wrote:
>> The attached small patch affects character widths as reported by
>> wcwidth(). It addresses an obscure issue.
>>
>> The CJK ambiguous width category contains characters that are one
>> character cell wide in some contexts and two cells in others. That
>> category doesn't actually contain CJK characters as such, but things
>> like the Greek and Cyrillic alphabets, accented Latin characters, and
>> also line drawing characters. These are usually one cell wide, but in
>> CJK legacy encodings such as SJIS or GBK, they were encoded as two
>> bytes, and the usual practice was to have the display width correspond
>> to the number of bytes. Accordingly, CJK terminal fonts usually have
>> double-width glyphs for the affected characters. See also
>> http://unicode.org/reports/tr11/#Ambiguous.
>>
>> Newlib currently decides which width to use based on the selected
>> LC_CTYPE locale, i.e. it will use double width for "zh", "jp", and
>> "ko" locales, and single width for everything else, independent of the
>> selected character set. The attached patch changes this so that single
>> width will always be used for single-byte encodings such as the
>> ISO-8859 ones, and that double width will always be used for the CJK
>> legacy encodings. For UTF-8, the decision will still be made based on
>> the locale. The @cjknarrow modifier can still be used to force single
>> width, independent of locale and encoding.
>>
>> The point of this is to fit in with the historical use of those legacy
>> encodings, since the ambiguity only arose once the different charsets
>> were combined into Unicode. I doubt anyone is using nonsensical
>> locale/encoding combinations such as de_DE.GBK or ja_JP.ISO-8859-1, so
>> this is primarily about the likes of C.GBK and C.SJIS. Those are
>> currently ambiguous-narrow, but vim for example treats them as
>> ambiguous-wide, which makes for "interesting" effects when editing
>> files containing affected characters. The patch here fixes that.
>>
>> Tested in Cygwin. I assume this will need to wait for Corinna's return.
>>
>>        * libc/locale/locale.c: Fix ambiguous width to one for singlebyte
>>        charsets and two for non-Unicode multibyte charsets.
>
> This appears to make a lot of sense.  Would you mind to enhance your
> patch slightly to fix also the description in the locale.c
> documentation?  There's a related paragraph starting with "This
> implementation also supports a single modifier, <<"cjknarrow">>..."

Sorry, I hadn't seen that. Amended patch attached.

        * libc/locale/locale.c (loadlocale): Fix width of CJK ambiguous
        characters to 1 for singlebyte charsets and 2 for non-Unicode
        multibyte charsets. Change documentation accordingly.

Andy
Attachment:
ambiwidth2.patch
Description: Binary data
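
Since the attachment is not shown inline, the decision rule described in the message can be sketched roughly as follows. This is only an illustration of the rule as stated in the text, not the code from ambiwidth2.patch or from newlib's locale.c. The enum `charset_kind` and the functions `is_cjk_language` and `ambiguous_width` are hypothetical names made up for this sketch, and the caller is assumed to have already classified the charset. The message writes "jp" for Japanese; the sketch assumes the ISO 639 language code "ja" was meant.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical charset classes for this sketch; newlib does not
   necessarily classify charsets this way. */
enum charset_kind {
    CHARSET_SINGLE_BYTE,  /* e.g. the ISO-8859 family */
    CHARSET_CJK_LEGACY,   /* non-Unicode multibyte, e.g. SJIS or GBK */
    CHARSET_UTF8
};

/* True if the locale name starts with a CJK language code.
   Assumption: "ja" is used for Japanese. */
bool is_cjk_language(const char *locale)
{
    return strncmp(locale, "zh", 2) == 0
        || strncmp(locale, "ja", 2) == 0
        || strncmp(locale, "ko", 2) == 0;
}

/* Cell width (1 or 2) to report for CJK ambiguous-width characters,
   following the rule described in the message. */
int ambiguous_width(const char *locale, enum charset_kind kind)
{
    /* The @cjknarrow modifier forces single width regardless of
       language and encoding. */
    if (strstr(locale, "@cjknarrow") != NULL)
        return 1;

    switch (kind) {
    case CHARSET_SINGLE_BYTE:
        return 1;                 /* always narrow */
    case CHARSET_CJK_LEGACY:
        return 2;                 /* always wide */
    case CHARSET_UTF8:
        return is_cjk_language(locale) ? 2 : 1;  /* language decides */
    }
    return 1;  /* unknown kind: fall back to narrow */
}
```

Under these assumptions, `ambiguous_width("C.GBK", CHARSET_CJK_LEGACY)` gives 2, matching the ambiguous-wide behaviour the message attributes to vim, while `ambiguous_width("en_US.UTF-8", CHARSET_UTF8)` gives 1.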