This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: non-BMP character width


Corinna Vinschen wrote:
> Can you please create a simple self-contained testcase?  I'm not exactly
> sure how this is supposed to work and if a solution exists.  Is that a
> problem for the non-UTF-8 case, too, or for UTF-8 only?

Sorry for the late response; I see you have reproduced the case in the meantime.
Anyway, here is a test case, to be used with gcc or just with cat:

#include <stdio.h>

/* print U+20000 𠀀 */
int main (void) {
  printf ("<U+20000> is <𠀀>\n");
  return 0;
}

where you can enter the character in the mined editor with Control-V #20000 Enter :)

About non-UTF-8: I tried a test in Big5, using character 0x8750, which is U+242BF,
and the result suggests it's OK (in the Cygwin console, mintty, and rxvt-unicode).
However, that may not be significant: although its Unicode code point is non-BMP,
the Big5 character is only 16 bits, and Windows, having supported CJK before
Unicode, probably doesn't handle it via Unicode.
I also tried to test eucJP, but that doesn't seem to work at all, and mintty crashes...

See my other comment below, please.


On Sep 22 06:57, Lapo Luchini wrote:
> ...
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box)...
The problem used to be in mintty as well, until I pointed it out and 
Andy was ambitious enough to find a workaround. Maybe he could supply a 
code snippet that would fix this in the cygwin console too, even though 
the bug originates in the Windows API...

> and on rxvt it just shows as four ISO-8859-1 chars:
> (as expected, as native rxvt doesn't support Unicode)
You would have to test this with rxvt-unicode (urxvt in cygwin), 
where the test case passes (one box). (This may not be very relevant 
anyway, if the reports that rxvt is no longer maintained are true.)

Corinna wrote:
> > ...
> Uh, I see.  That occurs in the normal Windows console.  This is not
> Cygwin's fault.  Cygwin's console code converts the multibyte string to
> the WCHAR representation and prints it to the console using the
> WriteConsoleW function.  That function prints two blocks/question marks
> for a surrogate pair.  Look at the file in a cmd shell, it will also
> print two blocks/question marks for the surrogate pair.
I was assuming that, as with mintty, the fault was not in the cygwin domain. 
However, since a workaround exists, I thought it would be nice to have it in 
the cygwin console as well.

Kind regards,
Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

