Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Sep 28 12:46:00 GMT 2009

On Sep 28 13:39, Andy Koppe wrote:
> 2009/9/28 Corinna Vinschen:
> >> Oh, and I thought of one more thing that won't roundtrip correctly
> >> from Unix to Windows and back: a high surrogate directly followed by a
> >> low surrogate, because they'll combine into a non-BMP codepoint
> >> represented by a 4-byte sequence. That's nearly impossible to happen
> >> by chance, though.
> >
> > There is no chance to do that right.  But I'm willing to stick to
> > this trade-off since, as you wrote, it's near-impossible that somebody
> > created that filename by chance.
> Hmm. But what if Java or Oracle or some other CESU-8 degenerate did
> that on purpose?
> Just in case you're not yet completely sick of this, here's how I
> think it could be done:

Nooooo!  I *am* completely sick of this.  I'm willing to let this slip
until the first complaint about this very issue comes along.
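A minimal Python sketch of the trade-off discussed above. It assumes, for illustration only, that a lone surrogate in a Unix filename is stored as its 3-byte, CESU-8-style sequence. The helper names `unix_to_windows`, `windows_to_unix` and `roundtrip` are hypothetical, and this is not Cygwin's actual conversion code. A lone surrogate survives the trip to UTF-16 and back. Two separately encoded surrogates that happen to form a high/low pair do not: once they are in UTF-16 they are indistinguishable from a real surrogate pair, so they come back as one 4-byte UTF-8 sequence.

```python
# Hypothetical model of the Unix (UTF-8 bytes) <-> Windows (UTF-16LE) name
# conversion; an illustration of the problem, not Cygwin's code.

def unix_to_windows(name: bytes) -> bytes:
    # "surrogatepass" accepts 3-byte encoded surrogates (ED A0 80..ED BF BF)
    # and keeps each one as a separate code point.
    text = name.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass")

def windows_to_unix(name: bytes) -> bytes:
    # A valid high+low pair in UTF-16 is combined into one non-BMP code
    # point here; lone surrogates pass through unchanged.
    text = name.decode("utf-16-le", "surrogatepass")
    return text.encode("utf-8", "surrogatepass")

def roundtrip(name: bytes) -> bytes:
    return windows_to_unix(unix_to_windows(name))

lone_high = b"\xed\xa0\xbd"                  # U+D83D on its own
pair = b"\xed\xa0\xbd\xed\xb8\x80"           # U+D83D then U+DE00, 6 bytes

print(roundtrip(lone_high) == lone_high)     # True: lone surrogate survives
print(roundtrip(pair))                       # b'\xf0\x9f\x98\x80' (U+1F600)
```

The 6-byte name comes back as a different 4-byte name, which is exactly the case that cannot be done right without some extra escaping scheme.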


Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

More information about the Cygwin-developers mailing list