This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Async signal safe TLS accesses
- From: Andrew Hunter <ahh at google dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: Paul Pluzhnikov <ppluzhnikov at google dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Date: Thu, 5 Dec 2013 12:01:38 -0800
- Subject: Re: [PATCH] Async signal safe TLS accesses
- Authentication-results: sourceware.org; auth=none
- References: <1380830518-16721-1-git-send-email-ahh at google dot com> <1382477766-15770-1-git-send-email-ahh at google dot com> <Pine dot LNX dot 4 dot 64 dot 1310222145490 dot 13258 at digraph dot polyomino dot org dot uk> <CALoOobMsO6X86JFD4J7F-EL-J+xOTEOVbzH=6mwrvfCnFvw57Q at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1311052233090 dot 30260 at digraph dot polyomino dot org dot uk> <CALoOobM70-mix+=1zuDnSoK7SRqQChbL=03xBkUcFf1fyS1Mjw at mail dot gmail dot com> <CALoOobP7kdpZCZA0a7MZWCtONu81fW4H_qAWOEfpfvzxJgG_=Q at mail dot gmail dot com> <CALoOobP6rTDosadvLKhHY+deDsU-FtvyO8QX_Y4dZy716e2ATQ at mail dot gmail dot com> <CALoOobOCT-inwMZkzEr+JYT4c8qwpN-EGMPFu_kHQTpc2icj0g at mail dot gmail dot com> <CALoOobPHo7+jG0nfiZp9afC2rArLUMRYZEag21W+78UBTZF=CQ at mail dot gmail dot com> <orvbzckoif dot fsf at livre dot home>
On Thu, Nov 28, 2013 at 5:03 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> I'm a bit disappointed at how wasteful the allocation strategy is,
> wasting a full page per thread even for a one-word TLS dynamic block. I
> think using a TLS-only malloc arena, only locked with signals disabled,
> would bring us much better memory efficiency, although there might be a
> significant penalty for assigning dynamic segments of different threads
> to the same page, should they happen to be accessed very often. This
> page sharing by different threads is something we don't necessarily
> avoid with the current AS-Unsafe allocation scheme, and avoiding that is
> an improvement that the current strategy brings, but I wonder if we
> could find some trade off that wouldn't waste so much memory but that
> would keep segments used by different threads in separate pages... This
> first comment is not meant as a blocker against the page, but as a
> suggestion for future improvement.
>
As pointed out various points downthread, this allocator's
inefficiency isn't particularly important, for this patch at least, so
I've left it as-is.
>
> Since __signal_safe_malloc is guaranteed to return just-mmapped memory,
> and that is always zeroed out by the kernel, there's no real need for
> __signal_safe_calloc to dirty the pages scribbling over them with
> userland zeros; they're no different from kernel zeros ;-)
>
>
> I was under the impression that one of the main problems with making TLS
> AS-Safe was the fact that in some cases of TLS dynamic address
> resolution we'd take the dl lock, and if we take a signal while we hold
> the lock, and the signal handler attempts to resolve the same address
> and take the lock, we'd deadlock if the lock was non-recursive, or risk
> finding inconsistent state otherwise. The lock is recursive, but I
> don't see anything in the patch besides the CAS loops that avoids
> inconsistent state, particularly in the case of lazy TLS relocation
> (introduced with GNU2 TLS). Was that not taken into account?
>
glibc recursive locks aren't async-signal-safe. We avoid problems
with reentrancy via dl_mask_all_signals.
>
> I'm somewhat concerned about priority inversion in the atomic_delay()
> and CAS loops. Considering that the locks the patch drops are all
> recursive, they won't pose an AS-Safety problem; if we keep them in
> place, but take the measures that have apparently been taken to avoid
> the problems of using inconsistent state when a handler interrupts a
> lock-holding operation, and of using outdated state after a signal
> handler modifies it, it should all work, and then it would be clear that
> the CAS loops are only in place for the case of functions being
> interrupted by signal handlers, not to deal with inter-thread
> concurrency. With the locks in place, we could at least get rid of the
> atomic_delay() loop.
>
The CAS loops have been removed (they were guaranteed to make forward
progress anyway, but that's moot). I admit a prio inversion is
theoretically possible but I very much doubt it's worth fixing right
now.