This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: musl - and benchmarking libc
On Wed, 5 Sep 2012, Rich Felker wrote:
> > "UTF-8 multibyte",
>
> The conformance issue is accepting 5- and 6-byte sequences which are
> not in the modern standardized definition of UTF-8. They were part of
> a deprecated definition; that definition also allowed "non-shortest"
> sequences which were a security nightmare. Unicode defines UTF's in
> general as one-to-one mappings between sequences of code units and
> Unicode Scalar values (integers in the range 0 to 0x10ffff excluding
> 0xd800-0xdfff).
ISO C refers to ISO/IEC 10646, not Unicode. I see that the 2011 and 2012
editions of ISO/IEC 10646 do now have the more restricted definition of
UTF-8. The 2003 edition had the definition allowing 5- and 6-byte
sequences, and I don't see changes to UTF-8 in a quick glance at the first
five amendments to it up to 2008, which are the amendments I have to hand
(<http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html>
removes old editions when new ones are put up). I'm not very familiar with
WG2 documents and can't readily find where the change in question
occurred. Anyway, since we support the 2011 edition of ISO/IEC 10646 as
regards known characters, I suppose the UTF-8 handling should be changed
to match the current ISO/IEC 10646 as well, and a bug should be filed
for this.
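
To make the difference concrete, here is a minimal sketch of a decoder for
the restricted definition. The name utf8_decode_strict is hypothetical and
this is not glibc's code; it only illustrates the checks discussed above:
no 5- or 6-byte sequences, no non-shortest forms, no surrogates, nothing
above 0x10ffff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a glibc interface.  Decodes one UTF-8 sequence
   from s[0..n-1] under the restricted definition.  Returns the number of
   bytes consumed (1 to 4) and stores the code point in *out, or returns 0
   if the input is truncated or not valid. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *out = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07;
    } else {
        /* Stray continuation byte, or a lead byte 0xF8..0xFF, which
           includes the old 5- and 6-byte forms. */
        return 0;
    }

    if (n < len)
        return 0;                       /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_for_len[len])
        return 0;                       /* non-shortest form */
    if (c > 0x10FFFF)
        return 0;                       /* beyond the code space */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                       /* surrogate, not a scalar value */

    *out = c;
    return len;
}
```

A decoder following the older definition would differ mainly in accepting
the 0xF8..0xFD lead bytes and in omitting the upper-bound check.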
> > "Attention to corner cases".
>
> Overall, there are a lot of cases where glibc allocates memory with
> alloca or malloc where it doesn't have any fundamental need to,
> leading to interfaces that fail spuriously on OOM or even just on
> large inputs. fnmatch is a good example. If you'd like to treat these
> all as bugs, I'd be happy to start reporting them when I run across
> them, but I saw them more as an overall design issue that would
> require some consensus on policy to start changing/treating as bugs.
If it's an ISO C interface with no way to report errors, then it should be
considered a bug (for example, if memmove or qsort were to have spurious
error conditions). For POSIX and other interfaces (where POSIX allows
error conditions beyond those listed in POSIX), reports are still
worthwhile as quality-of-implementation issues - but the two cases should
be distinguished.
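
As an illustration of why spurious failure in something like qsort would
count as a bug: the interface returns void, and the job can be done with
constant extra space. The sketch below is an in-place heapsort with
qsort's signature. The names sort_noalloc and cmp_int are hypothetical,
and this is not how glibc's qsort is written; it only shows that no
malloc or alloca is fundamentally required.

```c
#include <stddef.h>

/* Exchange two elements byte by byte, so no temporary buffer is needed. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
    while (size--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Restore the max-heap property for the subtree rooted at index root,
   considering only the first n elements. */
static void sift_down(unsigned char *base, size_t root, size_t n, size_t size,
                      int (*cmp)(const void *, const void *))
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n)
            return;
        if (child + 1 < n
            && cmp(base + child * size, base + (child + 1) * size) < 0)
            child++;                    /* pick the larger child */
        if (cmp(base + root * size, base + child * size) >= 0)
            return;
        swap_bytes(base + root * size, base + child * size, size);
        root = child;
    }
}

/* Hypothetical qsort-compatible sort that never allocates, and so has no
   error condition to report. */
void sort_noalloc(void *base, size_t n, size_t size,
                  int (*cmp)(const void *, const void *))
{
    unsigned char *b = base;
    size_t i;

    if (n < 2 || size == 0)
        return;
    for (i = n / 2; i-- > 0; )          /* build the heap */
        sift_down(b, i, n, size, cmp);
    for (i = n - 1; i > 0; i--) {       /* move the maximum to the end */
        swap_bytes(b, b + i * size, size);
        sift_down(b, 0, i, size, cmp);
    }
}

/* Example comparator for int elements. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

The trade-off is speed and stability rather than correctness: a
merge-based sort with a temporary buffer may be faster, but it then needs
a fallback path that cannot fail.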
> 2. Add a static library full of the function names without the 64
> suffixes (i.e. fopen, fseeko, lseek, etc.) as hidden symbols, which do
> nothing but tail-call the __foo64 version.
> Step 2 is necessary to support conforming applications which declare
> LFS64-affected functions themselves rather than including the
> associated header (which, per ISO C and POSIX, is completely valid to
> do).
The LFS functions are not part of POSIX, and ISO C and POSIX only allow
declaring the functions yourself if the function prototype purely involves
built-in C types rather than typedefs from standard headers, which also
excludes most but not all of these functions.
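
A minimal sketch of the distinction, using ISO C functions only. The
wrapper name magnitude_from_text is hypothetical; abs and atoi can be
declared by hand because their prototypes involve nothing but built-in
types.

```c
/* No #include here on purpose: these declarations are written by hand.
   That is permitted because neither prototype refers to a type defined
   in a header. */
int abs(int);
int atoi(const char *);

/* By contrast, something like
       off_t lseek(int, off_t, int);
   cannot be declared this way, since off_t is a typedef that only a
   header can supply. */

/* Hypothetical wrapper that uses only the hand-written declarations. */
int magnitude_from_text(const char *text)
{
    return abs(atoi(text));
}
```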
--
Joseph S. Myers
joseph@codesourcery.com