This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Consensus: Tuning runtime behaviour with environment variables.
- From: Rich Felker <dalias at aerifal dot cx>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Sun, 2 Jun 2013 11:41:50 -0400
- Subject: Re: Consensus: Tuning runtime behaviour with environment variables.
- References: <51A58A92 dot 4050508 at redhat dot com> <20130529055518 dot GA23030 at domone dot kolej dot mff dot cuni dot cz> <ormwraq3rx dot fsf at livre dot home> <20130601031151 dot GK20323 at brightrain dot aerifal dot cx> <ora9n9i3jc dot fsf at livre dot home>
On Sun, Jun 02, 2013 at 12:40:07AM -0300, Alexandre Oliva wrote:
> > This does not make access more efficient.
>
> It does when using the optimized TLS relocations I introduced. If
> there's room in the static TLS segment, the dynamic loader resolves TLS
> references to code equivalent to initial exec; if there isn't, it has to
> fall back to the much slower dynamic access modes, even though there are
> optimized fast paths there as well, compared with the TLS ABI used by
> default on x86 and x86_64.
>
> Note that these optimizations (still) aren't the default TLS mode on
> these architectures, even though it was adopted by a few other
> architectures as the TLS ABI. On x86 and x86_64 (and ARM IIRC) you have
> to compile with -mtls-dialect=gnu2 (and -fPIC) to get these optimizable
> relocations.
>
> Please read http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt
Do you have any performance figures to justify this? It strikes me as
a hideous hack for tiny, possibly imaginary gains. The only real place
I see a possible benefit is also the ugliest part: the alternate
calling convention that allows the caller to avoid spilling registers.
In particular, __tls_get_addr should otherwise be nearly as fast as
the do-nothing function, at least the variant that adds the TP. If
not, this is an issue in glibc's __tls_get_addr that should be fixed.
In musl, we have:
void *__tls_get_addr(size_t *v)
{
    pthread_t self = __pthread_self();
    if (self->dtv && v[0]<=(size_t)self->dtv[0] && self->dtv[v[0]])
        return (char *)self->dtv[v[0]]+v[1];
    [...]
I just noticed that allowing self->dtv to be NULL is a design flaw
with respect to performance, but even with that check, all the
branches are predictable and the hot path is just a few memory
accesses. If glibc's is considerably slower than this, I think fixing
that issue would make more sense than doing fancy hacks that only
apply to libraries whose TLS gets allocated in the static TLS
segment...
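To make the hot path above concrete, here is a minimal self-contained
model of it. This is only a sketch, not musl's or glibc's actual code:
struct thread_model, tls_slow_path, model_tls_get_addr and model_demo
are hypothetical names, and the assumption that dtv[0] holds the
highest valid module index is inferred from the bounds check in the
snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the thread descriptor (not musl's real struct).
 * Assumed layout: dtv[0] stores the highest valid module index as an
 * integer; dtv[i] points to the TLS block of module i, or is NULL if
 * that block has not been set up yet. */
struct thread_model {
    void **dtv;
};

/* Placeholder for the elided slow path. A real libc would install or
 * allocate the block here; this model just reports failure. */
static void *tls_slow_path(struct thread_model *self, const size_t *v)
{
    (void)self;
    (void)v;
    return NULL;
}

/* v[0] is the module index, v[1] the offset inside that module's block.
 * The fast path is three predictable checks plus a few loads. */
void *model_tls_get_addr(struct thread_model *self, const size_t *v)
{
    if (self->dtv && v[0] <= (size_t)self->dtv[0] && self->dtv[v[0]])
        return (char *)self->dtv[v[0]] + v[1];
    return tls_slow_path(self, v);
}

/* Builds a two-module DTV and exercises both paths. Returns 1 on success. */
int model_demo(void)
{
    static char block1[16], block2[32];
    void *dtv[3] = { (void *)(size_t)2, block1, block2 };
    struct thread_model self = { dtv };
    size_t hit[2]  = { 2, 8 };  /* module 2, offset 8: fast path */
    size_t miss[2] = { 3, 0 };  /* module 3 is out of range: slow path */

    assert(model_tls_get_addr(&self, hit) == (void *)(block2 + 8));
    assert(model_tls_get_addr(&self, miss) == NULL);
    return 1;
}
```

Dropping the self->dtv NULL check, as suggested above, would remove
one of the three conditions from the fast path in this model.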
Rich