This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: musl - and benchmarking libc


On Wed, 5 Sep 2012, Rich Felker wrote:

> > "UTF-8 multibyte",
> 
> The conformance issue is accepting 5- and 6-byte sequences which are
> not in the modern standardized definition of UTF-8. They were part of
> a deprecated definition; that definition also allowed "non-shortest"
> sequences which were a security nightmare. Unicode defines UTF's in
> general as one-to-one mappings between sequences of code units and
> Unicode Scalar values (integers in the range 0 to 0x10ffff excluding
> 0xd800-0xdfff).

ISO C refers to ISO/IEC 10646, not Unicode.  The 2011 and 2012 editions of 
ISO/IEC 10646 do now have the more restricted definition of UTF-8.  The 
2003 edition had the definition allowing 5- and 6-byte sequences, and I 
don't see changes to UTF-8 in a quick glance at the first five amendments 
to it up to 2008, which are the amendments I have to hand 
(<http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html> 
removes old editions when new ones are put up).  I'm not very familiar 
with WG2 documents and can't readily find where the change in question 
occurred.  Anyway, since as regards known characters we support the 2011 
edition of ISO/IEC 10646, I suppose the UTF-8 handling should be changed 
to match the current ISO/IEC 10646 as well, and a bug should be filed for 
this.
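
To make the restricted definition concrete, here is a minimal sketch of a 
decoder that accepts only what the current definition allows: sequences of 
1 to 4 bytes, shortest form only, no surrogates, nothing above 0x10ffff.  
The name utf8_decode_strict and its interface are hypothetical and chosen 
for illustration; this is not glibc's conversion code, and it does not 
distinguish a truncated sequence from an invalid one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc interface.  Decodes one UTF-8 sequence
   from s (at most n bytes available).  Returns the number of bytes
   consumed (1 to 4) and stores the scalar value in *out, or returns 0 if
   the input is ill-formed or truncated.  */
static size_t
utf8_decode_strict (const unsigned char *s, size_t n, uint32_t *out)
{
  uint32_t c, min;
  size_t len, i;

  if (n == 0)
    return 0;

  if (s[0] < 0x80)
    {
      *out = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { len = 2; min = 0x80; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; min = 0x800; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; min = 0x10000; c = s[0] & 0x07; }
  else
    /* Stray continuation byte, a lead byte of the old 5- and 6-byte
       forms (0xF8 to 0xFD), or 0xFE/0xFF.  */
    return 0;

  if (n < len)
    return 0;

  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                       /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  if (c < min)
    return 0;                           /* non-shortest form */
  if (c > 0x10FFFF)
    return 0;                           /* beyond the code space */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;                           /* surrogate, not a scalar value */

  *out = c;
  return len;
}
```

With this sketch, the two bytes C3 A9 decode to 0xE9, while C0 80 
(non-shortest), ED A0 80 (a surrogate) and any sequence starting with F8 
(old 5-byte form) are all rejected.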

> > "Attention to corner cases".
> 
> Overall, there are a lot of cases where glibc allocates memory with
> alloca or malloc where it doesn't have any fundamental need to,
> leading to interfaces that fail spuriously on OOM or even just on
> large inputs. fnmatch is a good example. If you'd like to treat these
> all as bugs, I'd be happy to start reporting them when I run across
> them, but I saw them more as an overall design issue that would
> require some consensus on policy to start changing/treating as bugs.

If it's an ISO C interface with no way to report errors, then it should be 
considered a bug (for example, if memmove or qsort were to have spurious 
error conditions).  For POSIX and other interfaces (where POSIX allows 
error conditions beyond those listed in POSIX), reports are still 
worthwhile as quality-of-implementation issues - but the two cases should 
be distinguished.
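
As an illustration of the first case: qsort returns void, so it has no 
channel through which to report an allocation failure.  The sketch below 
is a heapsort with qsort's signature that works entirely in place, using 
neither malloc nor alloca, so it cannot fail spuriously on OOM or on large 
inputs.  The names sort_in_place and cmp_int are hypothetical; this is only 
meant to show that the interface does not fundamentally need extra memory, 
and it is not a description of how any particular libc implements qsort.

```c
#include <stddef.h>

/* Swap two elements of SIZE bytes each, byte by byte, so that no
   temporary buffer of element size is needed.  */
static void
swap_bytes (unsigned char *a, unsigned char *b, size_t size)
{
  while (size--)
    {
      unsigned char t = *a;
      *a++ = *b;
      *b++ = t;
    }
}

/* Restore the max-heap property for the subtree rooted at ROOT in a heap
   of N elements.  */
static void
sift_down (unsigned char *base, size_t root, size_t n, size_t size,
           int (*cmp) (const void *, const void *))
{
  for (;;)
    {
      size_t child = 2 * root + 1;
      if (child >= n)
        return;
      /* Pick the larger of the two children.  */
      if (child + 1 < n
          && cmp (base + child * size, base + (child + 1) * size) < 0)
        child++;
      if (cmp (base + root * size, base + child * size) >= 0)
        return;
      swap_bytes (base + root * size, base + child * size, size);
      root = child;
    }
}

/* Hypothetical qsort-compatible sort: O(n log n) time, O(1) extra space,
   no allocation and therefore no failure path.  */
static void
sort_in_place (void *base, size_t nmemb, size_t size,
               int (*cmp) (const void *, const void *))
{
  unsigned char *b = base;
  size_t i;

  if (nmemb < 2)
    return;
  for (i = nmemb / 2; i-- > 0; )        /* build the heap */
    sift_down (b, i, nmemb, size, cmp);
  for (i = nmemb - 1; i > 0; i--)       /* move the maximum to the end */
    {
      swap_bytes (b, b + i * size, size);
      sift_down (b, 0, i, size, cmp);
    }
}

/* Example comparator for int elements.  */
static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}
```

Called as sort_in_place (v, n, sizeof v[0], cmp_int), it sorts an int 
array in ascending order.  The trade-off is that heapsort is not stable 
and is often slower in practice than a merge sort with a scratch buffer, 
which is presumably why an implementation might want the buffer in the 
first place.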

> 2. Add a static library full of the function names without the 64
> suffixes (i.e. fopen, fseeko, lseek, etc.) as hidden symbols, which do
> nothing but tail-call the __foo64 version.

> Step 2 is necessary to support conforming applications which declare
> LFS64-affected functions themselves rather than including the
> associated header (which, per ISO C and POSIX, is completely valid to
> do).

The LFS functions are not part of POSIX.  Also, ISO C and POSIX only allow 
declaring a function yourself if its prototype purely involves built-in C 
types rather than typedefs from standard headers, which excludes most but 
not all of these functions.
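
A small sketch of the distinction, using ISO C functions whose prototypes 
involve only built-in types.  The helper parse_magnitude is hypothetical, 
and the remark about open is my reading of its usual prototype rather than 
something checked against every affected function.

```c
/* Deliberately no standard header is included here.  */

/* Permitted: these prototypes use only built-in types, so a program may
   declare the functions itself.  */
int atoi (const char *nptr);
int abs (int j);

/* Not expressible without the header, because the prototypes need
   typedefs that only the headers provide:

     FILE *fopen (const char *, const char *);   needs FILE
     int fseeko (FILE *, off_t, int);            needs FILE and off_t
     off_t lseek (int, off_t, int);              needs off_t

   By contrast, int open (const char *, int, ...) appears to be one of
   the few in this group that involves only built-in types.  */

/* Hypothetical helper that uses only the self-declared functions.  */
static int
parse_magnitude (const char *s)
{
  return abs (atoi (s));
}
```

So parse_magnitude ("-42") yields 42 without any #include, whereas 
nothing comparable can be written for fseeko or lseek.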

-- 
Joseph S. Myers
joseph@codesourcery.com

