This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: non-BMP character width


On Sep 22 06:57, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > Sure.  I was specifically asking for a testcase, preferably in
> > plain C, which allows reproducing this under a debugger.
> 
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box) and on rxvt it just shows as four ISO-8859-1 chars:
> (as expected, as native rxvt doesn't support Unicode)
> 
> mintty% echo "-\xF0\x90\x80\x81-"
> -???-
> rxvt% echo "-\xF0\x90\x80\x81-"
> -ð?????-
> 
> Also ok on `ls`:
> 
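> % cat s.c
> #include <stdio.h>
>
> int main(void) {
>     /* "\xF0\x90\x80\x81" is U+10001 encoded as UTF-8 */
>     FILE *f = fopen("a-\xF0\x90\x80\x81", "w");
>     if (f) fclose(f);
>     return f ? 0 : 1;
> }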
> % ./s
> % ls -l|fgrep a-
> -rw-r--r-- 1 lapo None     0 22 Sep 06:50 a-???

Uh, I see.  That occurs in the normal Windows console.  This is not
Cygwin's fault.  Cygwin's console code converts the multibyte string to
its WCHAR (UTF-16) representation and prints it to the console using the
WriteConsoleW function.  That function prints two blocks/question marks
for a surrogate pair.  If you look at the file in a cmd shell, it will
also show two blocks/question marks for the surrogate pair.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
