This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] gold: Add linker relaxation of tail calls on sparc.


This gets GOLD in line with the relaxations performed by the
BFD linker on sparc.  Basically we can turn a tail call into
a branch if certain conditions are satisfied, and in particular
the final call displacement is small enough.

The code is heavily and thoroughly commented.

I've validated that the relaxation triggers during the testsuite on
both 32-bit and 64-bit sparc, and that the transformations performed
are correct.

I looked at the relaxation support that exists for the kind of stuff
ARM does with stubs where section sizes and offsets end up changing,
and that didn't seem appropriate nor necessary for what I'm doing
here.  So I implemented this similarly to how the TLS code
optimizations are done.

This is relative to "[PATCH] gold: Maintain sparc ELF header bits
properly" for which I'm still waiting approval.

After this I only have 3 items on my agenda for GOLD/Sparc:

1) Implement proper R_SPARC_REGISTER support, this has been bugging
   me for years but last time I tried this it was hard to frob out
   dummy relocations and/or symbols and have GOLD just leave them
   completely alone and let them propagate unimpeded to the target
   relocation function.

   My frustration matched that which Roland recently endured with
   those special ARM PLT symbols.  :-) Hopefully this kind of thing is
   easier now.

2) Fix all of the bugs wrt. large PLTs on sparc64.  As I mentioned
   in my IFUNC patch posting, in order to get this right we'll need
   to pass PLT indexes around rather than PLT offsets.

   The core issue is that PLTs are variably sized on 64-bit sparc, the
   first 32767 PLT slots are one size, and then the format and the
   size of the PLT entries change starting at index 32768.  So if you
   have an IFUNC plt slot, you won't know how big it's going to be
   until you know it's final index.

3) Implement full incremental linking support just like x86-64 does.
   Any known gotchas or implementation issues would be appreciated.

Thanks.

gold/

	* sparc.cc (Target_sparc::Relocate::relax_call): New function.
	(Target_sparc::Relocate::relocate): Call it for R_SPARC_WDISP30
	and R_SPARC_WPLT30.

diff --git a/gold/sparc.cc b/gold/sparc.cc
index 004a265..a9fe9de 100644
--- a/gold/sparc.cc
+++ b/gold/sparc.cc
@@ -333,6 +333,11 @@ class Target_sparc : public Sized_target<size, big_endian>
 		 typename elfcpp::Elf_types<size>::Elf_Addr,
 		 section_size_type);
 
+    inline void
+    relax_call(Target_sparc<size, big_endian>* target,
+	       unsigned char* view,
+	       const elfcpp::Rela<size, big_endian>& rela);
+
     // Ignore the next relocation which should be R_SPARC_TLS_GD_ADD
     bool ignore_gd_add_;
 
@@ -3304,6 +3309,8 @@ Target_sparc<size, big_endian>::Relocate::relocate(
     case elfcpp::R_SPARC_WDISP30:
     case elfcpp::R_SPARC_WPLT30:
       Reloc::wdisp30(view, object, psymval, addend, address);
+      if (target->may_relax())
+	relax_call(target, view, rela);
       break;
 
     case elfcpp::R_SPARC_WDISP22:
@@ -3954,6 +3961,145 @@ Target_sparc<size, big_endian>::Relocate::relocate_tls(
     }
 }
 
+// Relax a call instruction.
+
+template<int size, bool big_endian>
+inline void
+Target_sparc<size, big_endian>::Relocate::relax_call(
+    Target_sparc<size, big_endian>* target,
+    unsigned char* view,
+    const elfcpp::Rela<size, big_endian>& rela)
+{
+  typedef typename elfcpp::Swap<32, true>::Valtype Insntype;
+  Insntype *wv = reinterpret_cast<Insntype*>(view);
+  Insntype call_insn, delay_insn, set_insn;
+  uint32_t op3, reg, off;
+
+  // This code tries to relax call instructions that meet
+  // certain criteria.
+  //
+  // The first criteria is that the call must be such that the return
+  // address which the call writes into %o7 is unused.  Two sequences
+  // meet this criteria, and are used to implement tail calls.
+  //
+  // Leaf function tail call:
+  //
+  // or %o7, %g0, %ANY_REG
+  // call FUNC
+  //  or %ANY_REG, %g0, %o7
+  //
+  // Non-leaf function tail call:
+  //
+  // call FUNC
+  //  restore
+  //
+  // The second criteria is that the call destination is close.  If
+  // the displacement can fit in a signed 22-bit immediate field of a
+  // pre-V9 branch, we can do it.  If we are generating a 64-bit
+  // object or a 32-bit object with ELF machine type EF_SPARC32PLUS,
+  // and the displacement fits in a signed 19-bit immediate field,
+  // then we can use a V9 branch.
+
+  call_insn = elfcpp::Swap<32, true>::readval(wv);
+  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);
+
+  // Make sure it is really a call instruction.
+  if (((call_insn >> 30) & 0x3) != 1)
+    return;
+
+  if (((delay_insn >> 30) & 0x3) != 2)
+    return;
+
+  // Accept only a restore or an integer arithmetic operation whose
+  // sole side effect is to write the %o7 register (and perhaps set
+  // the condition codes, which are considered clobbered across
+  // function calls).
+  //
+  // For example, we don't want to match a tagged addition or
+  // subtraction.  We also don't want to match something like a
+  // divide.
+  //
+  // Specifically we accept add{,cc}, and{,cc}, or{,cc},
+  // xor{,cc}, sub{,cc}, andn{,cc}, orn{,cc}, and xnor{,cc}.
+
+  op3 = (delay_insn >> 19) & 0x3f;
+  reg = (delay_insn >> 25) & 0x1f;
+  if (op3 != 0x3d
+      && ((op3 & 0x28) != 0 || reg != 15))
+    return;
+
+  // For non-restore instructions, make sure %o7 isn't
+  // an input.
+  if (op3 != 0x3d)
+    {
+      // First check RS1
+      reg = (delay_insn >> 14) & 0x15;
+      if (reg == 15)
+	return;
+
+      // And if non-immediate, check RS2
+      if (((delay_insn >> 13) & 1) == 0)
+	{
+	  reg = (delay_insn & 0x1f);
+	  if (reg == 15)
+	    return;
+	}
+    }
+
+  // Now check the branch distance.  We are called after the
+  // call has been relocated, so we just have to peek at the
+  // offset contained in the instruction.
+  off = call_insn & 0x3fffffff;
+  if ((off & 0x3fe00000)
+      && (off & 0x3fe00000) != 0x3fe00000)
+    return;
+
+  if ((size == 64 || target->elf_machine_ == elfcpp::EM_SPARC32PLUS)
+      && ((off & 0x3c0000) == 0
+	  || (off & 0x3c0000) == 0x3c0000))
+    {
+      // ba,pt %xcc, FUNC
+      call_insn = 0x10680000 | (off & 0x07ffff);
+    }
+  else
+    {
+      // ba FUNC
+      call_insn = 0x10800000 | (off & 0x3fffff);
+    }
+  elfcpp::Swap<32, true>::writeval(wv, call_insn);
+
+  // See if we can NOP out the delay slot instruction.  We peek
+  // at the instruction before the call to make sure we're dealing
+  // with exactly the:
+  //
+  // or %o7, %g0, %ANY_REG
+  // call
+  //  or %ANY_REG, %g0, %o7
+  //
+  // case.  Otherwise this might be a tricky piece of hand written
+  // assembler calculating %o7 in some non-trivial way, and therefore
+  // we can't be sure that NOP'ing out the delay slot is safe.
+  if (op3 == 0x02
+      && rela.get_r_offset() >= 4)
+    {
+      if ((delay_insn & ~(0x1f << 14)) != 0x9e100000)
+	return;
+
+      set_insn = elfcpp::Swap<32, true>::readval(wv - 1);
+      if ((set_insn & ~(0x1f << 25)) != 0x8013c000)
+	return;
+
+      reg = (set_insn >> 25) & 0x1f;
+      if (reg == 0 || reg == 15)
+	return;
+      if (reg != ((delay_insn >> 14) & 0x1f))
+	return;
+
+      // All tests pass, nop it out.
+      elfcpp::Swap<32, true>::writeval(wv + 1, sparc_nop);
+    }
+}
+
 // Relocate section data.
 
 template<int size, bool big_endian>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]