This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)
>> The __utf8_wctomb function could just create the corresponding
>> UCS-2 values if no first half has been encountered before.  The
>> __utf8_mbtowc function could simply allow these UCS-2 values again.
>>
>> That works (I just tested it) and is a small change, but is it really
>> desirable to allow UCS-2 values in UTF-8 strings?
>
> I don't know.
Improved answer: Debian allows them!
$ cat test.c
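#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(int argc, char *argv[]) {
  /* Use the locale from the environment; print its name, or "fail". */
  puts(setlocale(LC_CTYPE, "") ?: "fail");  /* ?: is a GNU C extension */
  unsigned int arg = 0;
  char s[8];
  wchar_t wc = 0;
  /* Read the value to test as a hex argument, e.g. d800. */
  if (argc > 1)
    sscanf(argv[1], "%x", &arg);
  /* Encode to multibyte and print the byte count (-1 on failure). */
  int l = wctomb(s, (wchar_t)arg);
  printf("%i\n", l);
  if (l < 0)
    return 1;
  /* Decode again and print the byte count and the resulting value. */
  l = mbtowc(&wc, s, l);
  printf("%i\n", l);
  printf("%x\n", (unsigned int)wc);
  return 0;
}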
$ LC_CTYPE=en_GB.UTF-8 ./a.out d800
en_GB.UTF-8
3
3
d800
$ LC_CTYPE=en_GB.UTF-8 ./a.out dc00
en_GB.UTF-8
3
3
dc00
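
For reference, the length of 3 reported above is what you get when the ordinary three-byte UTF-8 bit pattern is applied to a surrogate value without any special-casing. The following is only a minimal sketch of that lenient behaviour, not the newlib or glibc code; the helper names encode3_lenient and decode3_lenient are made up for illustration, and the sketch assumes the value fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT newlib's __utf8_wctomb: encode a 16-bit value in
   the range 0x0800..0xFFFF as three UTF-8 bytes. It deliberately does not
   reject the surrogate range 0xD800..0xDFFF. Returns 3, or -1 if the value
   would need a shorter form (those are out of scope for this sketch). */
static int encode3_lenient(uint16_t u, unsigned char out[3])
{
    if (u < 0x0800)
        return -1;
    out[0] = (unsigned char)(0xE0 | (u >> 12));         /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F)); /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (u & 0x3F));        /* 10xxxxxx */
    return 3;
}

/* Hypothetical inverse, NOT newlib's __utf8_mbtowc: decode a three-byte
   sequence, again without rejecting surrogates. Returns 3 on success, or
   -1 for a malformed or overlong sequence. */
static int decode3_lenient(const unsigned char in[3], uint16_t *u)
{
    if ((in[0] & 0xF0) != 0xE0 ||
        (in[1] & 0xC0) != 0x80 ||
        (in[2] & 0xC0) != 0x80)
        return -1;
    uint16_t v = (uint16_t)(((in[0] & 0x0F) << 12) |
                            ((in[1] & 0x3F) << 6) |
                            (in[2] & 0x3F));
    if (v < 0x0800)
        return -1;  /* overlong: would fit in one or two bytes */
    *u = v;
    return 3;
}
```

With these helpers, 0xd800 encodes to the bytes ed a0 80 and 0xdc00 to ed b0 80, and both decode back to the original value, which matches the 3 / 3 / d800 and 3 / 3 / dc00 pattern in the Debian output.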