This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)


>> The __utf8_wctomb function could just create the corresponding
>> UCS-2 values if no first half has been encountered before.  The
>> __utf8_mbtowc function could simply allow these UCS-2 values again.
>>
>> That works (I just tested it) and is a small change, but is it really
>> desirable to allow UCS-2 values in UTF-8 strings?
>
> I don't know.

Improved answer: Debian allows them!

$ cat test.c
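#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(int argc, char *argv[]) {
  /* Pick up the locale from the environment (LC_CTYPE here). */
  const char *loc = setlocale(LC_CTYPE, "");
  puts(loc ? loc : "fail");
  unsigned int arg = 0;  /* code point to test, given in hex */
  char s[8];
  wchar_t wc = 0;
  if (argc > 1)
    sscanf(argv[1], "%x", &arg);
  /* Wide character -> multibyte; print the byte count (-1 on failure). */
  int l = wctomb(s, (wchar_t)arg);
  printf("%i\n", l);
  if (l < 0)
    return 1;
  /* Multibyte -> wide character again; print the count and the value. */
  l = mbtowc(&wc, s, (size_t)l);
  printf("%i\n", l);
  printf("%x\n", (unsigned int)wc);
  return 0;
}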

$ LC_CTYPE=en_GB.UTF-8 ./a.out d800
en_GB.UTF-8
3
3
d800

$ LC_CTYPE=en_GB.UTF-8 ./a.out dc00
en_GB.UTF-8
3
3
dc00
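
For reference, the "3" in both runs is what you would expect if the surrogate
is simply pushed through the ordinary three-byte UTF-8 pattern with no range
check. The following is only a sketch of that idea, not the glibc or newlib
code; the names lenient_utf8_encode and lenient_utf8_decode3 are made up for
illustration, and the decoder deliberately skips the overlong-form check a
real implementation would need.

```c
#include <stddef.h>

/* Hypothetical helper (not newlib's __utf8_wctomb): encode a code point up
   to U+FFFF as UTF-8 WITHOUT rejecting the surrogate range U+D800..U+DFFF.
   Returns the number of bytes written (1 to 3), or 0 if cp > 0xFFFF. */
size_t lenient_utf8_encode(unsigned int cp, unsigned char out[3])
{
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp <= 0xFFFF) {
    /* Surrogates land here and are encoded like any other BMP value. */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  return 0;
}

/* Hypothetical helper: decode one three-byte sequence, again accepting
   surrogates. Returns 1 and stores the value in *cp if the bytes have the
   shape 1110xxxx 10xxxxxx 10xxxxxx, otherwise returns 0. Overlong forms
   are not rejected in this sketch. */
int lenient_utf8_decode3(const unsigned char in[3], unsigned int *cp)
{
  if ((in[0] & 0xF0) != 0xE0 ||
      (in[1] & 0xC0) != 0x80 ||
      (in[2] & 0xC0) != 0x80)
    return 0;
  *cp = ((in[0] & 0x0Fu) << 12) | ((in[1] & 0x3Fu) << 6) | (in[2] & 0x3Fu);
  return 1;
}
```

With these, 0xd800 encodes to the bytes ED A0 80 and 0xdc00 to ED B0 80, and
both decode back to the original value, which matches the 3/3/d800 and
3/3/dc00 output above.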

