Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)

Corinna Vinschen corinna-cygwin@cygwin.com
Mon Sep 28 12:46:00 GMT 2009


On Sep 28 13:39, Andy Koppe wrote:
> 2009/9/28 Corinna Vinschen:
> >> Oh, and I thought of one more thing that won't roundtrip correctly
> >> from Unix to Windows and back: a high surrogate directly followed by a
> >> low surrogate, because they'll combine into a non-BMP codepoint
> >> represented by a 4-byte sequence. That's near-impossible to happen by
> >> chance though.
> >
> > > There is no way to get that right.  But I'm willing to stick to
> > > this trade-off since, as you wrote, it's near-impossible that somebody
> > > created that filename by chance.
> 
> Hmm. But what if Java or Oracle or some other CESU-8 degenerate did
> that on purpose?
> 
> Just in case you're not yet completely sick of this, here's how I
> think it could be done:

Nooooo!  I *am* completely sick of this.  I'm willing to let this slip
until the first complaint about this very issue comes along.
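
For the record, the round-trip problem quoted above can be reproduced with a small sketch.  This is not Cygwin's conversion code; it only assumes that each surrogate is written to UTF-8 as its own 3-byte sequence (CESU-8 style) and that the Windows side holds plain UTF-16 code units.  The helper names unix_to_windows and windows_to_unix are made up for illustration, and Python's "surrogatepass" error handler stands in for the lenient conversion.

```python
# Illustrative sketch only: the function names are hypothetical and this is
# not how Cygwin implements its filename conversion.

def unix_to_windows(raw: bytes) -> bytes:
    """Turn (possibly surrogate-carrying) UTF-8 bytes into UTF-16LE code units."""
    # "surrogatepass" lets 3-byte encodings of surrogates through
    # instead of rejecting them as invalid UTF-8.
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


def windows_to_unix(units: bytes) -> bytes:
    """Turn UTF-16LE code units back into UTF-8 bytes."""
    # A high surrogate directly followed by a low surrogate is a valid
    # UTF-16 pair, so the decoder merges the two into one non-BMP code point.
    text = units.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


# High surrogate U+D83D and low surrogate U+DE00, each as its own
# 3-byte sequence: six bytes in total.
cesu_pair = b"\xed\xa0\xbd" + b"\xed\xb8\x80"

round_tripped = windows_to_unix(unix_to_windows(cesu_pair))

# The pair comes back as the 4-byte UTF-8 encoding of U+1F600,
# so the original six bytes are not recovered.
print(len(cesu_pair), len(round_tripped))
print(round_tripped == cesu_pair)
```

The two surrogates are indistinguishable from a genuine surrogate pair once they sit next to each other in UTF-16, which is why the original six bytes cannot be recovered without some extra marker.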


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
