This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] Fix UTF-16 surrogate handling in __utf8_mbtowc


On Jul 28 17:32, Jeff Johnston wrote:
> Corinna Vinschen wrote:
>> The question is, shouldn't the code be changed to disallow values beyond
>> 0x10ffff on all systems, rather than just checking it in the UTF-16
>> case?
>>
>>   
> If the code allows those invalid sequences to generate and doesn't catch
> them at an earlier stage, then it should be fixed, so go ahead, assuming
> you have tested the patch.

Yes, I tested it.  The highest valid UTF-8 sequence is \xf4\x8f\xbf\xbf,
which represents U+10ffff.  So the changed tests only allow lead bytes
<= \xf4 and reject any sequence >= \xf4\x90.  I also added a comment
clarifying the test that rejects 3-byte UTF-8 sequences representing
UTF-16 surrogate values (U+d800 to U+dfff).  These are valid in CESU-8,
but not in UTF-8.
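For illustration, here is a minimal standalone sketch in C of the byte-level rules described above: lead bytes above \xf4 are rejected, a \xf4 lead followed by \x90 or higher (a value above U+10ffff) is rejected, and \xed\xa0 through \xed\xbf (UTF-16 surrogates, legal only in CESU-8) is rejected. The function name `utf8_seq_len` is hypothetical; this is not newlib's actual `__utf8_mbtowc` code. The overlong-encoding checks are standard UTF-8 rules added only to keep the sketch correct.

```c
#include <stddef.h>

/* Hypothetical helper (not newlib's __utf8_mbtowc): returns the length of
   the valid UTF-8 sequence at s, given n available bytes, or -1 if the
   sequence is invalid or incomplete. */
static int utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0)
        return -1;
    unsigned char c = s[0];

    if (c < 0x80)                       /* ASCII */
        return 1;
    if (c < 0xc2)                       /* stray continuation or overlong 2-byte lead */
        return -1;
    if (c < 0xe0) {                     /* 2-byte sequence */
        if (n < 2 || (s[1] & 0xc0) != 0x80)
            return -1;
        return 2;
    }
    if (c < 0xf0) {                     /* 3-byte sequence */
        if (n < 3 || (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
            return -1;
        if (c == 0xe0 && s[1] < 0xa0)   /* overlong encoding */
            return -1;
        /* \xed\xa0..\xed\xbf would encode U+D800..U+DFFF (UTF-16
           surrogates): valid in CESU-8, invalid in UTF-8. */
        if (c == 0xed && s[1] >= 0xa0)
            return -1;
        return 3;
    }
    if (c > 0xf4)                       /* lead byte can only encode > U+10FFFF */
        return -1;
    if (n < 4 || (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80
        || (s[3] & 0xc0) != 0x80)
        return -1;
    if (c == 0xf0 && s[1] < 0x90)       /* overlong encoding */
        return -1;
    if (c == 0xf4 && s[1] >= 0x90)      /* \xf4\x90 and up is > U+10FFFF */
        return -1;
    return 4;
}
```

With this sketch, \xf4\x8f\xbf\xbf (U+10ffff) yields 4, while \xf4\x90\x80\x80, any sequence led by \xf5, and \xed\xa0\x80 (U+d800) all yield -1.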

Patch applied.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

