This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



Re: [Mips] Using DT tags for handling local ifuncs


"Maciej W. Rozycki" <macro@codesourcery.com> writes:
> On Thu, 12 Dec 2013, Richard Sandiford wrote:
>> I think you're suggesting that we allow the ABI-defined GOT to start at
>> something other than $gp - 0x7ff0, so that explicitly-relocated data
>> could go first.  I think that would be more disruptive in some ways,
>> since the 0x7ff0 offset is hard-coded into glibc.  The resolver for
>> lazy-binding stubs subtracts 0x7ff0 from the incoming $gp to get the
>> start of the ABI-defined GOT and then gets the link map from entry 1
>> (assuming that the GNU extension is in use).
>> 
>> I suppose it'd be possible to adjust $gp in the stub so that $gp - 0x7ff0
>> is right on entry to the resolver.  But that would be difficult to do
>> cleanly on n32 and n64, where $gp is call-saved.  The resolver would
>> probably have to return to the stub, which in turn would mean that the
>> stub would need call-frame information.
>
>  Hmm, thanks for reminding me of that; it rules out the space before the 
> ABI GOT.  We still have space afterwards for things like this (or e.g. for 
> a small-data area if we ever implement it) though.

Yeah, putting it afterwards is what we already do for TLS relocs.
The problem with that is that the ABI global GOT has to include all
symbols that have a relocation against them, even if there's no need
for a $gp-relative GOT access.  And there can be quite a lot of symbols
like that, especially for things like vtables in C++ code.

So if we put the relocations after the ABI GOT we would end up forcing
the use of multigots even though the number of "real" GOT entries
(those that need to be accessed $gp-relative) is small enough for
a single GOT.  The idea of the tag is to avoid that.
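
To make the size pressure concrete, here is a rough sketch of the arithmetic, assuming $gp-relative accesses use an ordinary load with a signed 16-bit offset (that is, no -mxgot).  The 0x7ff0 offset is the one mentioned earlier in the thread; the function names are made up for illustration and are not taken from glibc or BFD.

```c
#include <assert.h>
#include <stdint.h>

/* Distance from the start of the ABI-defined GOT to $gp (from the thread). */
#define MIPS_GP_OFFSET 0x7ff0u

/* Start of the ABI-defined GOT, derived from $gp the way the
   lazy-binding resolver is described as doing it.  Hypothetical helper. */
static uint64_t abi_got_start(uint64_t gp)
{
    return gp - MIPS_GP_OFFSET;
}

/* Number of GOT entries of entry_size bytes (4 for o32/n32, 8 for n64)
   that a single load can reach from $gp, assuming the load offset is a
   signed 16-bit value in the range -0x8000 .. 0x7fff.  Hypothetical helper. */
static unsigned gp_reachable_entries(unsigned entry_size)
{
    assert(entry_size == 4 || entry_size == 8);

    /* Highest entry-aligned offset from $gp that still fits in 16 bits. */
    unsigned last_gp_off = 0x7fffu - (0x7fffu % entry_size);

    /* The same position expressed as an offset from the GOT start. */
    unsigned last_got_off = MIPS_GP_OFFSET + last_gp_off;

    return last_got_off / entry_size + 1;
}
```

Under those assumptions gp_reachable_entries(4) gives 16380 and gp_reachable_entries(8) gives 8190.  Every symbol that is in the ABI global GOT only because of a relocation takes one of those slots, which is how a large C++ binary can run out of them without many genuine $gp-relative accesses.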

>> >  BTW, for loading 64-bit addresses I suggest using two temporaries (we've 
>> > got plenty of them) for a sequence that is faster on superscalar 
>> > processors, i.e. rather than:
>> >
>> > static const bfd_vma mips64_exec_iplt_entry[] =
>> > {
>> >   0x3c0f0000,	/* lui $15, %highest(.got.iplt entry)        */
>> >   0x65ef0000,	/* daddiu $15, $15, %higher(.got.iplt entry) */
>> >   0x000f7c38,	/* dsll $15, $15, 16                         */
>> >   0x65ef0000,	/* daddiu $15, $15, %hi(.got.iplt entry)     */
>> >   0x000f7c38,	/* dsll $15, $15, 16                         */
>> >   0x01f90000,	/* l[wd] $25, %lo(.got.iplt entry)($15)      */
>> >   0x03200008,	/* jr $25                                    */
>> >   0x00000000,	/* nop                                       */
>> > };
>> >
>> > use:
>> >
>> > static const bfd_vma mips64_exec_iplt_entry[] =
>> > {
>> >   0x3c0f0000,	/* lui $15, %highest(.got.iplt entry)        */
>> >   0x3c0e0000,	/* lui $14, %hi(.got.iplt entry)             */
>> >   0x25ef0000,	/* addiu $15, $15, %higher(.got.iplt entry)  */
>> >   0x000f783c,	/* dsll32 $15, $15, 0x0                      */
>> >   0x01ee782d,	/* daddu $15, $15, $14                       */
>> >   0xddf90000,	/* ld $25, %lo(.got.iplt entry)($15)         */
>> >   0x03200008,	/* jr $25                                    */
>> >   0x00000000,	/* nop                                       */
>> > };
>> >
>> > (this also avoids a DADDIU erratum early R4000/R4400 chips had).
>> 
>> Yeah, I wondered about this when I first saw it too, but Jack optimized
>> the sequence based on the address, so that we would only have the full
>> thing if %highest really was needed.  Since the usual base address is
>> 0x120000000, I think the full sequence will in effect never be used.
>> 
>> I'm not opposed to having two n64 sequences, one for when %highest
>> is needed and one for when it isn't.  It just doesn't seem like a
>> priority.
>
>  Fair enough, but then, after a bit of thinking, do we need 
> %highest/%higher stuff in the first place?  For n64 non-PIC PLT is only 
> supported for msym32 binaries anyway and it doesn't look to me like it is 
> ever going to change, so the high 33 address bits will always be zero and the 
> 32-bit version (with LD rather than LW) will do, and for SVR4 PIC binaries 
> you need to figure out the GOT pointer from $t9 instead (is there any 
> point in making a difference between ET_EXEC and ET_DYN binaries here?); 
> note that this would exclude ifunc calls from being tail calls (breaking 
> the standard calling convention) so it looks to me we'll have to make an 
> extra stub to load $gp beforehand.
>
>  Have I missed anything?

One of the main reasons for requiring -msym32 for PLTs is that absolute
code is only really interesting if you can use %hi and %lo to access symbols.
If you need the full %highest/... sequence then it's often better to use
a GOT instead.
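
For reference, a small sketch of how the four relocation operators split a 64-bit address, together with a model of the two load sequences quoted above.  The operator definitions are the usual rounding ones.  The function names are invented for this example, and the needs_highest() test is only a guess at the kind of check an address-based optimization would make, not Jack's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 or 32 bits of v to 64 bits (two's complement). */
static uint64_t sext16(uint64_t v) { return ((v & 0xffffULL) ^ 0x8000ULL) - 0x8000ULL; }
static uint64_t sext32(uint64_t v) { return ((v & 0xffffffffULL) ^ 0x80000000ULL) - 0x80000000ULL; }

/* The 16-bit field each operator selects; the added constants compensate
   for the sign extension of the lower fields. */
static uint64_t rel_lo(uint64_t a)      { return a & 0xffff; }
static uint64_t rel_hi(uint64_t a)      { return ((a + 0x8000ULL) >> 16) & 0xffff; }
static uint64_t rel_higher(uint64_t a)  { return ((a + 0x80008000ULL) >> 32) & 0xffff; }
static uint64_t rel_highest(uint64_t a) { return ((a + 0x800080008000ULL) >> 48) & 0xffff; }

/* Model of the single-temporary sequence (lui/daddiu/dsll/daddiu/dsll/ld). */
static uint64_t serial_address(uint64_t a)
{
    uint64_t r15 = sext32(rel_highest(a) << 16);  /* lui $15, %highest   */
    r15 += sext16(rel_higher(a));                 /* daddiu ..., %higher */
    r15 <<= 16;                                   /* dsll ..., 16        */
    r15 += sext16(rel_hi(a));                     /* daddiu ..., %hi     */
    r15 <<= 16;                                   /* dsll ..., 16        */
    return r15 + sext16(rel_lo(a));               /* ld $25, %lo($15)    */
}

/* Model of the two-temporary sequence (lui/lui/addiu/dsll32/daddu/ld). */
static uint64_t two_temp_address(uint64_t a)
{
    uint64_t r15 = sext32(rel_highest(a) << 16);  /* lui $15, %highest   */
    uint64_t r14 = sext32(rel_hi(a) << 16);       /* lui $14, %hi        */
    r15 = sext32(r15 + sext16(rel_higher(a)));    /* addiu: 32-bit add   */
    r15 <<= 32;                                   /* dsll32 ..., 0       */
    r15 += r14;                                   /* daddu $15, $15, $14 */
    return r15 + sext16(rel_lo(a));               /* ld $25, %lo($15)    */
}

/* Hypothetical check: is the %highest part needed for this address? */
static int needs_highest(uint64_t a)
{
    return rel_highest(a) != 0;
}

/* Both models should reproduce the address they were built from. */
static void check_sequences(uint64_t a)
{
    assert(serial_address(a) == a);
    assert(two_temp_address(a) == a);
}
```

For the usual base address 0x120000000, %highest is 0 and %higher is 1, so needs_highest() is false there, and check_sequences() passes for that address as well as for sign-extended values such as 0xffffffff80001234.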

But despite the name, IPLTs are different.  They are used regardless of
whether the input is new-style absolute code or SVR4 PIC.  ifuncs in
themselves shouldn't force -msym32.

Thanks,
Richard

