This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: With bad UTF-8, cygwin can create files it can't read


Hi Stuart,

On Mar 30 13:04, Corinna Vinschen wrote:
> On Mar 25 14:34, Kyzer wrote:
> > Hello,
> > 
> > I've found that if you use cygwin to create a file with badly-encoded
> > UTF-8, readdir() gives out an entry with a name that cygwin won't
> > subsequently accept.
> > 
> > * create a file using filename with hex bytes F4 8F BF BF
> > * readdir() reports the filename as hex bytes E2 8E B3 ED BF BF
> > * attempting to open or unlink the filename E2 8E B3 ED BF BF fails
> > * attempting to open or unlink the filename F4 8F BF BF succeeds
> 
> Thanks for the testcase.  I'll have a look later this week (I hope).

Wow.  Just wow.  You found a long-standing bug in the wctomb conversion
from UTF-16 to UTF-8.

As you probably know, Unicode values beyond the base plane (that is,
everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
are represented as so-called surrogate pairs in UTF-16, two UTF-16
values in the 0xd800 - 0xdfff range.
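
To make the mapping concrete, here is a rough sketch of how a code point
beyond the base plane is split into a surrogate pair.  The function name
is made up for illustration; this is not the actual Cygwin/newlib code.

```c
#include <assert.h>
#include <stdint.h>

/* Split a code point in the range 0x10000..0x10ffff into a UTF-16
   surrogate pair.  Hypothetical helper, for illustration only. */
static void
to_surrogates (uint32_t cp, uint16_t *high, uint16_t *low)
{
  assert (cp >= 0x10000 && cp <= 0x10ffff);
  cp -= 0x10000;                               /* 20 bits remain */
  *high = (uint16_t) (0xd800 | (cp >> 10));    /* top 10 bits */
  *low  = (uint16_t) (0xdc00 | (cp & 0x3ff));  /* bottom 10 bits */
}
```

For U+10FFFF (f4 8f bf bf in UTF-8) this yields dbff dfff, so the low
surrogate sits at the very top of its range.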

While the conversion from UTF-8 f4 8f bf bf to UTF-16 dbff dfff
worked fine, the conversion back to UTF-8 has a subtle bug.  There's
a test for a lone high surrogate in the underlying conversion
function.  It checks the next UTF-16 value like this:

  if (wchar < 0xdc00 || wchar >= 0xdfff)
    /* Handle lone high surrogate */

Notice the >= 0xdfff?  That should have been > 0xdfff.  Duh.  This
bug is only a bit over 5 years old...
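
Side by side, the difference looks like this.  This is only a sketch:
the two function names are invented for illustration, and the real
conversion function does a lot more around this test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the value following a high surrogate is NOT a valid
   low surrogate, i.e. the high surrogate is treated as lone.
   Both helpers are hypothetical names, not the actual source. */

/* The test as quoted above: wrongly rejects 0xdfff. */
static bool
lone_high_buggy (uint16_t wchar)
{
  return wchar < 0xdc00 || wchar >= 0xdfff;
}

/* Corrected test: the whole range 0xdc00..0xdfff is accepted. */
static bool
lone_high_fixed (uint16_t wchar)
{
  return wchar < 0xdc00 || wchar > 0xdfff;
}
```

Only a low surrogate of exactly 0xdfff trips the old test, which is why
dbff dfff from the testcase hit it while most other pairs never did.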

Fixed in the git repo.  I'll regenerate today's fool..., erm,
today's developer snapshot on https://cygwin.com/snapshots/ later today.


Thanks, especially for the simple testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


