This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: check-abi failure with sys_errlist


On Sat, Apr 7, 2012 at 10:24 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> There really is nothing magic here, just a simple fencepost error.
> Instead of defining the maximum error number it defined the number +1
> which resulted in extra inflation.
>
> Forget about sparc, it is completly unaffected, since it doesn't have an
> inflated errlist-compat number.

I agree with Joseph and Dave. We need your help Andreas, some
mentoring from you now will mean that in the future we'll have more
people able to review the code. Terse descriptions of code changes are
difficult to review.

The following is my current understanding of how we got into this
problem, and it's more complicated than I originally thought.

Some background on errlist.c:

(a) We compile sysdeps/gnu/errlist.c with -DEMIT_ERR_MAX to get a
count of the number of errno entries.

(b) At build time using the result of (a) and Versions we create
errlist-compat.c and errlist-compat.h to help with symbol versioning
and size compatibility.

(c) We compile errlist.c, which includes errlist-compat.c and
errlist-compat.h, and if ERR_MAX is defined by errlist-compat.h, which
only happens when we are adding padding for future use, then we use
ERR_MAX + 1 as the true size of the errno table.

In 2004-10-20 we add this hunk, amongst other additions, to
sysdeps/gnu/errlist-compat.awk
~~~
@@ -79,6 +79,12 @@ END {
   print "/* This file was generated by errlist-compat.awk; DO NOT EDIT!  */\n";
   print "#include <shlib-compat.h>\n";

+  if (highest > count) {
+    printf "*** errlist.c count %d inflated to %s count %d (old errno.h?)\n", \
+      count, highest_version, highest > "/dev/stderr";
+    printf "#define ERR_MAX %d\n\n", highest;
+  }
+
~~~

This code has a bug. It prints the size of the array for ERR_MAX when
it should print one less. The value of ERR_MAX is supposed to be the
last *index* in the array, not the full size. This code is triggered
only if you are padding the errno table for future uses to avoid
needing a new symbol every time you add a new errno value.

What are the implications of this bug? This means that over the next 5
years the compat code in errlist-compat.c/errlist-compat.h, created
from errlist-compat.awk, has an ERR_MAX which is 1 larger IFF you
added more entries than you needed (to trigger the `highest > count'
case).

A 2005 change adds 6 new errno entries, ending in the 131st errno (132
entries), along with 0 padding entries, and the bug is not triggered
for the @2.4 symbol on x86 or x86_64.

d4d138a4 (Ulrich Drepper    2005-12-24 21:12:00 +0000 123)
#errlist-compat  132
d4d138a4 (Ulrich Drepper    2005-12-24 21:12:00 +0000 124)
_sys_errlist; sys_errlist; _sys_nerr; sys_nerr;

A 2009 change adds ERFKILL, the 132nd errno (133 entries), along with
1 extra entry, and the bug is triggered for the @2.12 symbol on x86
and x86_64.

0079dd23 (Ulrich Drepper    2009-11-14 10:20:25 -0800 144)   GLIBC_2.12 {
0079dd23 (Ulrich Drepper    2009-11-14 10:20:25 -0800 145)
#errlist-compat  134

For example on x86_64 there are only 133 entries or 0x428 bytes of
real data, we pre-allocate up to 134 entries or 0x430 bytes to ensure
we don't have to add another symbol the next time we need a new slot.
The bug in errlist-compat.awk actually creates a compat array of 135
entries or 0x438 bytes, as is shown by Andreas' test of a shipped
glibc 2.12 (and can be shown with a simple program that prints
sys_nerr).

Therefore because of the bug, after adding EHWPOISON, there is still 1
empty slot left in the table. To compensate for fixing the bug in
errlist-compat.awk you need to correctly account for the extra slot
and bump #errlist-compat to 135.

As Andreas' suggests we need to:

* Undo the bug in errlist-compat.awk.

* Bump 2.12 #errlist-compat in sysdeps/unix/sysv/linux/Versions for
all generic users since this is shown to be correct.

What do the machine maintainers need to do:

* Bump #errlist-compat by 1 for the highest version that is <= 2.15,
but only if you had padding.

* In theory there is the possibility of an ABI breakage with this bug.
If you had a release with the bug, then in a subsequent release filled
the entire table exactly, then you have no extra slots and the extra
slot granted by the bug would be missing, and this would cause your
table to shrink by one. A shrinking array might cause a problem if the
extra slot is on the next page and that page us unmapped. Recompiled
programs already expect it to be smaller. So it's hard to know if
fixing it would be better or worse. I suggest leaving it as is and
moving forward.

It's past my bedtime so I figure I made some mistakes.

Comments?

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]