This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: TLS redux [2.19]


I'm just going to discuss the immediate issue for 2.19 now.  I think
there is consensus on the overall direction for 2.20 and we can
discuss the details more after 2.19 has sailed.

In short, I still disagree with the conclusions people have come to
here.  I consider the rationales applied to reflect a deep failure to
be appropriately conservative.

Rich Felker's characterization of either my position or the historical
glibc position as a "DOS/Windows approach" ("bug compatibility") is
fundamentally inaccurate.  I won't go into more discussion about an
abstract topic now, since it's a distraction.

The fact that LeakSanitizer was shown to be broken by the changes is
*not* what makes my hesitance correct!  It's a specific, concrete
example of why I am correct in general.  It is fundamentally
wrong-headed to conclude from this that if we find a way to change
LeakSanitizer so it works then the problem is solved.

1. People using old versions of LeakSanitizer with glibc-2.19 should
   not be broken.  Breaking them is an ABI regression.

2. There are probably other things that are broken too.  Conservatism
   means presuming that there might be and accepting that they matter
   even if we are not aware of them now and even if we are not aware
   of them a year from now.

   If you find yourself saying, "Oh, we found the one application that
   our ABI-breaking change actually broke, and we changed that
   application, so it doesn't count as an ABI-breaking change any
   more," then you are Just Plain Wrong.

3. In response to Joseph's question, yes, you can replace malloc and
   have the dynamic linker call your malloc.  It uses normal PLT calls
   for malloc, calloc, realloc, free, and __libc_memalign.  So it will
   use an application-supplied allocator just the same way libc does.

   It's true that the early allocations done at startup time use an
   allocator private to the dynamic linker (whose allocations can
   never be freed).  These are disjoint from allocations made after
   startup.  The dynamic linker should never attempt to free or
   realloc these allocations; if it does so, that's a bug, but there
   are no known or reported bugs of this nature (at least in recent
   years).

4. In response to Paul's point, yes, replacing malloc and getting
   everything right is hard.  That's really neither here nor there.
   Existing things are doing it right already, and the rules being
   arcane but staying the same for years is a very different thing
   from the rules shifting under your feet.

5. In response to one of Rich's several trolling mischaracterizations,
   there is no example of an "undocumented internal interface any
   application developer might ever have discovered and (ab)used"
   here.  There is no internal interface involved at all.  There is
   indeed an undocumented subtlety, but many things that are in actual
   fact stably well-specified are unfortunately not formally
   documented.  Certainly we should reduce subtlety and increase
   documentation in the future, but that does not relieve us of our
   obligations to maintain ABI stability today.

6. The supposed urgency of this issue comes entirely from Google for
   Google's uses on production servers.  Google does not produce any
   glibc binaries distributed outside the company(*).  Google does not
   distribute any glibc-using binaries that are believed to be
   affected by this issue.  Google already uses a bespoke modified
   glibc on production servers, so having changes upstream in a
   particular release is not actually a practical constraint on what
   Google can roll out on its servers.

   I thus conclude that there is in fact no urgency whatsoever for
   this issue.  We have rough consensus on a new approach for 2.20
   that will address the immediate issue without introducing any
   compatibility risks.  IMHO that is sufficient for the medium and
   long terms, and there is no need for anything at all in the short
   term.  The status quo ante (2.18) is better than the proposed new
   incompatibility for 2.19.  The yet-newer scheme proposed for 2.20
   (with various details to be ironed out) is better than either, so
   getting that done in 2.20 should be enough.

   The mere fact that we are still discussing fundamental questions
   well into the release freeze period means that these changes are
   not sufficiently baked.  Since there is in fact no true urgency of
   any kind, we should not delay this release further.  We should
   simply make a release that is safely backward compatible, and
   address the whole set of TLS issues for the next release.

   (*) Except for ChromeOS, where nothing is believed to be affected
   by the issue, and for Native Client, whose binaries I maintain and
   can speak for authoritatively.  Native Client does not support
   signals, so the issue is moot there.

7. On further reflection I am not so convinced that any "middle road"
   is actually worth pursuing, although I won't object to one that
   meets the criteria for careful backward compatibility.

   If any change to the status quo ante (2.18) is warranted, it must
   be an "opt in" change.  That is, existing binaries and existing
   programs recompiled unchanged will get the existing behavior.
   Programs need to do something explicit at compile time, link time,
   or run time to opt in to using the signal-safe allocator.

   Since we don't think the signal-safe allocation approach is what we
   really want in the long run, it's hard to imagine any new opt-in
   method we'd want to add to the public ABI.  Of course we could add
   something that becomes a no-op later, but it doesn't seem
   worthwhile to add that bloat.

   Google's production servers are using Google-private binaries built
   against a Google-private modified glibc.  So for them it would be
   adequate to have some unofficial ABI, such as GLIBC_PRIVATE symbols
   or the like.  But given that Google is modifying glibc anyway, I
   don't see any actual rationale for putting anything like that into
   glibc proper.  Google can just as easily use a small patch in its
   glibc builds, either one that provides the opt-in interface or one
   that just changes the behavior.  There is nothing wrong with making
   sure such a patch is trivial by leaving the code we've added in as
   dead code, or at the very least leaving in the changes that make
   all the lazy TLS allocations go through a special set of entry
   points so they're easy to catch.

   Finally, another option is to allow opt-in when building libc.
   That is, a configure switch to enable using the new
   allocator--which must default to off, preserving compatible
   semantics.  I think this is a somewhat bad idea, but it is
   straightforward and very little work to implement.  Any distro that
   uses --enable-breaking-abi-compatibility-for-arcane-new-tls-feature
   is doing a disservice to its users.  But that's their decision to
   make, and if they're going to make it, there's no reason we should
   force them to use a trivial patch instead of a trivial
   configuration change.  Given how committed everyone else here has
   been to being wrong about the subject, this seems like the path
   most likely to achieve consensus.  I don't think anybody's case
   against just leaving 2.19 unconditionally behaving as the feature
   has always behaved will be even slightly convincing, but the
   configure-switch cop-out seems the most likely to be acceptable to
   all maintainers holding strong opposing views.
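
The opt-in rule in point 7 can be sketched in C roughly as follows.
This is only an illustration under stated assumptions: the
environment variable EXAMPLE_TLS_SIGNAL_SAFE, the enum, and both
function names are hypothetical and are not glibc interfaces.  What it
shows is that the absence of an explicit request always selects the
existing (2.18) behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; none of this is a glibc interface.  */
enum tls_alloc_mode
{
  TLS_ALLOC_LEGACY,       /* Existing behavior: go through malloc.  */
  TLS_ALLOC_SIGNAL_SAFE   /* New behavior: only on explicit opt-in.  */
};

/* Decide the mode from an opt-in string.  Anything other than the
   exact string "1" (including NULL or "") keeps the legacy behavior,
   so unchanged programs see no difference.  */
enum tls_alloc_mode
tls_alloc_mode_from_string (const char *opt_in)
{
  if (opt_in != NULL && strcmp (opt_in, "1") == 0)
    return TLS_ALLOC_SIGNAL_SAFE;
  return TLS_ALLOC_LEGACY;
}

/* Run-time opt-in via a hypothetical environment variable.  */
enum tls_alloc_mode
tls_alloc_mode_from_env (void)
{
  return tls_alloc_mode_from_string (getenv ("EXAMPLE_TLS_SIGNAL_SAFE"));
}
```

A configure switch that defaults to off follows the same shape, with
the decision made when libc is built rather than at run time.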

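The distinction drawn in point 3, between early allocations that can
never be freed and ordinary allocations made after startup, can be
illustrated with a small bump allocator.  This is a sketch of the
general idea only, not the dynamic linker's actual code; the arena
size and the names early_alloc and early_owns are made up for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration; not glibc's dynamic-linker code.  */
#define EARLY_ARENA_SIZE 4096

static _Alignas (max_align_t) unsigned char early_arena[EARLY_ARENA_SIZE];
static size_t early_used;

/* Hand out memory from the fixed arena.  There is no matching free:
   these allocations live for the life of the process.  Returns NULL
   when the arena cannot satisfy the request.  */
void *
early_alloc (size_t size)
{
  const size_t align = _Alignof (max_align_t);
  if (size > EARLY_ARENA_SIZE)  /* Also avoids overflow when rounding.  */
    return NULL;
  size_t rounded = (size + align - 1) / align * align;
  if (rounded > EARLY_ARENA_SIZE - early_used)
    return NULL;
  void *p = early_arena + early_used;
  early_used += rounded;
  return p;
}

/* Nonzero if P points into the early arena, in which case it must
   never be passed to free or realloc.  */
int
early_owns (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  uintptr_t base = (uintptr_t) early_arena;
  return a >= base && a < base + EARLY_ARENA_SIZE;
}
```

Because the two sets of allocations are disjoint, a replacement malloc
never sees a pointer from the early arena as long as nothing tries to
free or realloc one.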

Thanks,
Roland

