This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #16214] Fix TLS access on S390 with -march=z10


On 11/26/2013 04:57 AM, Siddhesh Poyarekar wrote:
> On Mon, Nov 25, 2013 at 11:28:43AM -0500, Carlos O'Donell wrote:
>> On 11/25/2013 12:50 AM, Siddhesh Poyarekar wrote:
>>> Hi,
>>>
>>> -march=z10 eliminates GOT pointer loads for static variables whenever
>>> it can, which exposes a bug in the TLS implementation in s390x.  The
>>> TLS implementation assumes that the compiler will load the GOT pointer
>>> (%r12) and implements __tls_get_offset with that assumption.  This
>>> however is true only in cases where the compiler sets up TLS for a
>>> __thread.  For cases where we call TLS_GET_ADDR directly (like in
>>> dl-sym.c and in some test cases), we have to explicitly ensure that
>>> %r12 is set up.
>>
>> How are you ensuring r12 is "setup?"
>>
>> The bug here seems to be that r12 != GOT, not that the compiler has
>> failed to do anything special. This might have been a binutils
>> issue or a compiler issue that moved r12 away from GOT to optimize
>> loads. I can see that in tls-macros.h you need to artificially setup
>> r12, but that's not a bug fix, that's simply fixing the test cases ;-)
> 
> The bug here is that the value that is added in __tls_get_offset
> before it jumps to __tls_get_addr is not the same value that was
> subtracted earlier.  In dl-sym.c, the value subtracted was the GOT
> pointer and the value added was 0 (since r12 was not set to the GOT
> pointer).  Likewise in the test case.  So we're fixing both, a valid
> use case *and* a test case.

Is r12 actually uninitialized or does the compiler and runtime ensure 
it's exactly zero? Could an undefined value cause an overflow or
underflow that might set a comparison flag? I don't think so, as long
as we use `unsigned' everywhere I think we're fine.

>>>  
>>> +# ifdef __s390x__
>>> +#  define LOAD_R12(r12) asm ("lgr %0,%%r12" : "=r" (r12) ::)
>>> +# elif defined __s390__
>>> +#  define LOAD_R12(r12) asm ("lr %0,%%r12" : "=r" (r12) ::)
>>> +# endif
>>
>> You should add an `else ... error ..' case here in the even the compiler
>> ever breaks these defines. If you don't want to do that I'd just put
>> the __s390__ case in the else as is done in string.h for s390.
>>
>> I know there is at least one other case of this kind of idef/elif/endif
>> without an error case in dl-tls.h for s390, but that should also be made
>> more robust or just accept that !defined __s390x__ == defined __s390__.
> 
> I'll do that for both cases in this file as a separate patch if it is
> OK.  But then your suggestion to use __asm__("%r12") should eliminate
> this?

Yes, that's right, but if you try out using register please review the
generated code to make sure it's sensible.

>>> +
>>>  # define GET_ADDR_OFFSET \
>>>    (ti->ti_offset - (unsigned long) __builtin_thread_pointer ())
>>>  
>>> +/* Subtract contents of r12 from __TI.  We don't really care if this is the
>>> +   GOT pointer.  */
>>
>> This comment is a bit obscure and oblique. Why don't you care that r12 is the
>> GOT pointer? Is it just because the ABI says to use r12 and it wouldn't
>> really matter what it pointed at as long as you had a register to use for an
>> offset? 
>>
>> I suggest rewriting this comment to explain why we use r12.
>>
>> e.g.
>>
>> /* The s390/s390x ABIs use r12 as a pointer to access the GOT and other data
>>    via register+offset addressing. Their TLS ABI also uses r12 to access the
>>    tls_index structure via r12+offset in __tls_get_offset. Thus knowing the
>>    final address of the tls_index variable in __TLS_GET_ADDR we are forced
>>    to subtract r12 here to get an offset which is what __tls_get_offset expects.
>>    We don't use _GLOBAL_OFFSET_TABLE_ because that's wrong, r12 is not the GOT,
>>    it is simply a register near the GOT to be used for register+offset
>>    addressing.  */
>>
>> Did I understand this correctly?
> 
> No.  in terms of the strict definition (which is relevant for
> interoperability between gcc-generated code for __thread access and
> glibc) %r12 must be set to the GOT pointer before calling
> __tls_get_offset - it does not have to be something else.  The
> compiler will generate the code to load the GOT pointer in %r12 and
> generate the offset to pass to __tls_get_offset.

OK, in that case please write up some more detailed comment.

> In our case in __TLS_GET_ADDR (ignoring TLS_IE for a moment since it
> is just a test case fix), the compiler is not aware of a TLS access
> and is hence under no obligation to load the GOT pointer in %r12, nor
> is it responsible to generate the offset to the tls_index.  We have to
> explicitly do that by subtracting the GOT pointer from the tls_index
> structure passed to the __TLS_GET_ADDR macro and then setting up %r12
> to the GOT pointer so that __tls_get_offset adds it back to the
> offset.

Right.

> However, since we have full control over these operations, we can
> avoid loading the GOT pointer in %r12 and just get away with
> subtracting whatever is there in %r12 so that adding it back in
> __tls_get_offset gives us back the tls_index address.

Exactly. This should be in the comment :-)

>>>  # define __TLS_GET_ADDR(__ti) \
>>> -  ({ extern char _GLOBAL_OFFSET_TABLE_[] attribute_hidden;		  \
>>> -     (void *) __tls_get_offset ((char *) (__ti) - _GLOBAL_OFFSET_TABLE_)  \
>>> -     + (unsigned long) __builtin_thread_pointer (); })
>>> +({									      \
>>> +  unsigned long int r12;						      \
>>> +  LOAD_R12 (r12);							      \
>>
>> register unsigned long int r12 __asm__ ("%r12"); ?
> 
> That looks nicer too, will try it, thanks.
> 
>>> +  ((void *) __tls_get_offset ((unsigned long int) (__ti) - r12)		      \
>>> +   + (unsigned long) __builtin_thread_pointer ());			      \
>>> +})

Suggest using `unsigned long' or `unsigned long int' consistently.

>>>  #endif
>>>  
>>> diff --git a/sysdeps/s390/s390-32/tls-macros.h b/sysdeps/s390/s390-32/tls-macros.h
>>> index 8a0ad58..0a11998 100644
>>> --- a/sysdeps/s390/s390-32/tls-macros.h
>>> +++ b/sysdeps/s390/s390-32/tls-macros.h
>>> @@ -8,12 +8,17 @@
>>>  
>>>  #ifdef PIC
>>>  # define TLS_IE(x) \
>>> -  ({ unsigned long __offset;						      \
>>> +  ({ unsigned long __offset, __save;					      \
>>>       asm ("bras %0,1f\n"						      \
>>> -	  "0:\t.long " #x "@gotntpoff\n"				      \
>>> -	  "1:\tl %0,0(%0)\n\t"						      \
>>> -	  "l %0,0(%0,%%r12):tls_load:" #x				      \
>>
>> This old code looks like it expects r12 to be setup to a specific
>> value, but it's probably not given the optimizations you point
>> out initially.
> 
> It expects GOT pointer to be loaded in %r12.

Understood.

>>> -	  : "=&a" (__offset) : : "cc" );				      \
>>> +	  "0:\t.long _GLOBAL_OFFSET_TABLE_-0b\n\t"			      \
>>> +	  ".long " #x "@gotntpoff\n"					      \
>>
>> Compute the offset from the GOT to this literal pool.
>>
>>> +	  "1:\tlr %1,%%r12\n\t"						      \
>>
>> Save r12.
>>
>>> +	  "l %%r12,0(%0)\n\t"                                                 \
>>> +	  "la %%r12,0(%%r12,%0)\n\t"					      \
>>> +	  "l %0,4(%0)\n\t"						      \
>>> +	  "l %0,0(%0,%%r12):tls_load:" #x "\n\t"			      \
>>
>> Set r12 to the GOT, set %0 to the offset of the GOT entry that holds the
>> TLS variable offset and then do the load to get the value?
>>
> 
> Set r12 to the GOT and set %0 to x@gotntpoff.  So basr loads the
> address of the word after itself (which is the location of .long
> _GLOBAL_OFFSET_TABLE_-0b) into %0.  Dereference that and add that
> value to its own address to get the GOT address.  The word after the
> address in %0 is x@gotntpoff, i.e. the offset to compute the location
> of the TLS variable.  Finally, add %r12 and the offset and dereference
> to get the value.

Makes sense.

>> I reviewed the other macros and noticed they setup r12, so they are OK.
>> However, they are artificially making r12 == GOT, but it need not be,
>> it's just for the test.
> 
> No, they need to be == GOT, because the compiler will generate offsets
> relative to GOT.

Understood. Either way they look correct.

OK to checkin IMO as long as:

* Use register for r12 *or* fixup the defines to be robust.

and

* Enhance the comment around __TLS_GET_ADDR to explain the situation
  in more detail.

I think you're good to go now unless you object to any of that.

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]