This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: libc-alpha at sourceware dot org, Andreas Krebbel <krebbel at linux dot vnet dot ibm dot com>
- Date: Tue, 26 Nov 2013 10:29:39 -0500
- Subject: Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10
- Authentication-results: sourceware.org; auth=none
- References: <20131125055009 dot GK19834 at spoyarek dot pnq dot redhat dot com> <52937ABB dot 20009 at redhat dot com> <20131126095750 dot GU19834 at spoyarek dot pnq dot redhat dot com>
On 11/26/2013 04:57 AM, Siddhesh Poyarekar wrote:
> On Mon, Nov 25, 2013 at 11:28:43AM -0500, Carlos O'Donell wrote:
>> On 11/25/2013 12:50 AM, Siddhesh Poyarekar wrote:
>>> Hi,
>>>
>>> -march=z10 eliminates GOT pointer loads for static variables whenever
>>> it can, which exposes a bug in the TLS implementation in s390x. The
>>> TLS implementation assumes that the compiler will load the GOT pointer
>>> (%r12) and implements __tls_get_offset with that assumption. This
>>> however is true only in cases where the compiler sets up TLS for a
>>> __thread. For cases where we call TLS_GET_ADDR directly (like in
>>> dl-sym.c and in some test cases), we have to explicitly ensure that
>>> %r12 is set up.
>>
>> How are you ensuring r12 is "setup?"
>>
>> The bug here seems to be that r12 != GOT, not that the compiler has
>> failed to do anything special. This might have been a binutils
>> issue or a compiler issue that moved r12 away from GOT to optimize
>> loads. I can see that in tls-macros.h you need to artificially setup
>> r12, but that's not a bug fix, that's simply fixing the test cases ;-)
>
> The bug here is that the value that is added in __tls_get_offset
> before it jumps to __tls_get_addr is not the same value that was
> subtracted earlier. In dl-sym.c, the value subtracted was the GOT
> pointer and the value added was 0 (since r12 was not set to the GOT
> pointer). Likewise in the test case. So we're fixing both, a valid
> use case *and* a test case.
Is r12 actually uninitialized or does the compiler and runtime ensure
it's exactly zero? Could an undefined value cause an overflow or
underflow that might set a comparison flag? I don't think so, as long
as we use `unsigned' everywhere I think we're fine.
>>>
>>> +# ifdef __s390x__
>>> +# define LOAD_R12(r12) asm ("lgr %0,%%r12" : "=r" (r12) ::)
>>> +# elif defined __s390__
>>> +# define LOAD_R12(r12) asm ("lr %0,%%r12" : "=r" (r12) ::)
>>> +# endif
>>
>> You should add an `else ... error ..' case here in the even the compiler
>> ever breaks these defines. If you don't want to do that I'd just put
>> the __s390__ case in the else as is done in string.h for s390.
>>
>> I know there is at least one other case of this kind of idef/elif/endif
>> without an error case in dl-tls.h for s390, but that should also be made
>> more robust or just accept that !defined __s390x__ == defined __s390__.
>
> I'll do that for both cases in this file as a separate patch if it is
> OK. But then your suggestion to use __asm__("%r12") should eliminate
> this?
Yes, that's right, but if you try out using register please review the
generated code to make sure it's sensible.
>>> +
>>> # define GET_ADDR_OFFSET \
>>> (ti->ti_offset - (unsigned long) __builtin_thread_pointer ())
>>>
>>> +/* Subtract contents of r12 from __TI. We don't really care if this is the
>>> + GOT pointer. */
>>
>> This comment is a bit obscure and oblique. Why don't you care that r12 is the
>> GOT pointer? Is it just because the ABI says to use r12 and it wouldn't
>> really matter what it pointed at as long as you had a register to use for an
>> offset?
>>
>> I suggest rewriting this comment to explain why we use r12.
>>
>> e.g.
>>
>> /* The s390/s390x ABIs use r12 as a pointer to access the GOT and other data
>> via register+offset addressing. Their TLS ABI also uses r12 to access the
>> tls_index structure via r12+offset in __tls_get_offset. Thus knowing the
>> final address of the tls_index variable in __TLS_GET_ADDR we are forced
>> to subtract r12 here to get an offset which is what __tls_get_offset expects.
>> We don't use _GLOBAL_OFFSET_TABLE_ because that's wrong, r12 is not the GOT,
>> it is simply a register near the GOT to be used for register+offset
>> addressing. */
>>
>> Did I understand this correctly?
>
> No. in terms of the strict definition (which is relevant for
> interoperability between gcc-generated code for __thread access and
> glibc) %r12 must be set to the GOT pointer before calling
> __tls_get_offset - it does not have to be something else. The
> compiler will generate the code to load the GOT pointer in %r12 and
> generate the offset to pass to __tls_get_offset.
OK, in that case please write up some more detailed comment.
> In our case in __TLS_GET_ADDR (ignoring TLS_IE for a moment since it
> is just a test case fix), the compiler is not aware of a TLS access
> and is hence under no obligation to load the GOT pointer in %r12, nor
> is it responsible to generate the offset to the tls_index. We have to
> explicitly do that by subtracting the GOT pointer from the tls_index
> structure passed to the __TLS_GET_ADDR macro and then setting up %r12
> to the GOT pointer so that __tls_get_offset adds it back to the
> offset.
Right.
> However, since we have full control over these operations, we can
> avoid loading the GOT pointer in %r12 and just get away with
> subtracting whatever is there in %r12 so that adding it back in
> __tls_get_offset gives us back the tls_index address.
Exactly. This should be in the comment :-)
>>> # define __TLS_GET_ADDR(__ti) \
>>> - ({ extern char _GLOBAL_OFFSET_TABLE_[] attribute_hidden; \
>>> - (void *) __tls_get_offset ((char *) (__ti) - _GLOBAL_OFFSET_TABLE_) \
>>> - + (unsigned long) __builtin_thread_pointer (); })
>>> +({ \
>>> + unsigned long int r12; \
>>> + LOAD_R12 (r12); \
>>
>> register unsigned long int r12 __asm__ ("%r12"); ?
>
> That looks nicer too, will try it, thanks.
>
>>> + ((void *) __tls_get_offset ((unsigned long int) (__ti) - r12) \
>>> + + (unsigned long) __builtin_thread_pointer ()); \
>>> +})
Suggest using `unsigned long' or `unsigned long int' consistently.
>>> #endif
>>>
>>> diff --git a/sysdeps/s390/s390-32/tls-macros.h b/sysdeps/s390/s390-32/tls-macros.h
>>> index 8a0ad58..0a11998 100644
>>> --- a/sysdeps/s390/s390-32/tls-macros.h
>>> +++ b/sysdeps/s390/s390-32/tls-macros.h
>>> @@ -8,12 +8,17 @@
>>>
>>> #ifdef PIC
>>> # define TLS_IE(x) \
>>> - ({ unsigned long __offset; \
>>> + ({ unsigned long __offset, __save; \
>>> asm ("bras %0,1f\n" \
>>> - "0:\t.long " #x "@gotntpoff\n" \
>>> - "1:\tl %0,0(%0)\n\t" \
>>> - "l %0,0(%0,%%r12):tls_load:" #x \
>>
>> This old code looks like it expects r12 to be setup to a specific
>> value, but it's probably not given the optimizations you point
>> out initially.
>
> It expects GOT pointer to be loaded in %r12.
Understood.
>>> - : "=&a" (__offset) : : "cc" ); \
>>> + "0:\t.long _GLOBAL_OFFSET_TABLE_-0b\n\t" \
>>> + ".long " #x "@gotntpoff\n" \
>>
>> Compute the offset from the GOT to this literal pool.
>>
>>> + "1:\tlr %1,%%r12\n\t" \
>>
>> Save r12.
>>
>>> + "l %%r12,0(%0)\n\t" \
>>> + "la %%r12,0(%%r12,%0)\n\t" \
>>> + "l %0,4(%0)\n\t" \
>>> + "l %0,0(%0,%%r12):tls_load:" #x "\n\t" \
>>
>> Set r12 to the GOT, set %0 to the offset of the GOT entry that holds the
>> TLS variable offset and then do the load to get the value?
>>
>
> Set r12 to the GOT and set %0 to x@gotntpoff. So basr loads the
> address of the word after itself (which is the location of .long
> _GLOBAL_OFFSET_TABLE_-0b) into %0. Dereference that and add that
> value to its own address to get the GOT address. The word after the
> address in %0 is x@gotntpoff, i.e. the offset to compute the location
> of the TLS variable. Finally, add %r12 and the offset and dereference
> to get the value.
Makes sense.
>> I reviewed the other macros and noticed they setup r12, so they are OK.
>> However, they are artificially making r12 == GOT, but it need not be,
>> it's just for the test.
>
> No, they need to be == GOT, because the compiler will generate offsets
> relative to GOT.
Understood. Either way they look correct.
OK to checkin IMO as long as:
* Use register for r12 *or* fixup the defines to be robust.
and
* Enhance the comment around __TLS_GET_ADDR to explain the situation
in more detail.
I think you're good to go now unless you object to any of that.
Cheers,
Carlos.