This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.

Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes

From: "Maciej W. Rozycki" <macro at codesourcery dot com>
To: Richard Sandiford <rdsandiford at googlemail dot com>
Cc: Chao-ying Fu <fu at mips dot com>, Ilie Garbacea <ilie at mips dot com>, <binutils at sourceware dot org>, Tristan Gingold <gingold at adacore dot com>
Date: Wed, 16 Nov 2011 13:43:54 +0000
Subject: Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
References: <alpine.DEB.1.10.1110272117590.28657@tp.orcam.me.uk> <87ty68b6al.fsf@firetop.home> <alpine.DEB.1.10.1111151456400.4191@tp.orcam.me.uk> <87lirhwcag.fsf@firetop.home>

Hi Richard,

 I have now cc-ed the original authors of this code in case they have 
anything to add.

On Tue, 15 Nov 2011, Richard Sandiford wrote:

> >  So I have actually given it some more thought and my understanding of the 
> > ABI remains that while orphaned R_MIPS_LO16 relocations are indeed 
> > permitted, they still must be preceded by a corresponding R_MIPS_HI16, 
> > although that is not required to be adjacent.  I believe this is only 
> > permitted to allow cases like you quoted to avoid unnecessary extra code 
> > to add missing R_MIPS_HI16 relocations.
> 
> There are still potential problems though.  We deliberately allow things like:
> 
>         lui     $4,%hi(foo)
>         lw      $6,%lo(foo)($4)
>         lw      $7,%lo(foo+4)($4)
>         ...
>         .align  8
> foo:
>         .word   X, Y
> 
> and foo is allowed to be in a text section.  Does your patch ensure that
> foo remains 8-byte aligned, even if we relax code earlier in the section?

 Sigh, you're right -- I wish we had realised this earlier on.  No, the 
alignment of foo will of course get broken, just as the alignment of 
standard MIPS code would, as noted with the original submission of this 
update.  Of course if you run this under Linux, then you won't notice 
unless you observe a bad performance drop.
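
 To make the hazard concrete, here is a small sketch (in Python rather 
than anything taken from BFD; the helper names hi16, lo16, effective and 
shared_lui_ok are made up for illustration) of the usual %hi/%lo split 
and of what a single shared LUI produces for foo+4.  It assumes 32-bit 
addresses and the standard rounding rule for %hi.  With foo 8-byte 
aligned, foo and foo+4 always share the same %hi value; once foo is 
shifted by 2 bytes that no longer holds for every address.

```python
MASK32 = 0xFFFFFFFF

def hi16(addr):
    # %hi: upper half, rounded so that adding the sign-extended %lo is exact.
    return ((addr + 0x8000) >> 16) & 0xFFFF

def lo16(addr):
    # %lo: lower half, sign-extended as the load/store offset is.
    low = addr & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low

def effective(sym, offset):
    # Address formed by "lui %hi(sym)" plus a load using "%lo(sym+offset)".
    return ((hi16(sym) << 16) + lo16(sym + offset)) & MASK32

def shared_lui_ok(sym, offset=4):
    # True when the shared LUI still reaches sym+offset.
    return effective(sym, offset) == ((sym + offset) & MASK32)

# 8-byte aligned: foo and foo+4 never straddle a 0x8000 boundary.
print(shared_lui_ok(0x00407FF8))
# Same symbol moved 2 bytes up, as relaxing earlier code might do.
print(shared_lui_ok(0x00407FFE))
```

 The second call fails because foo+4 then crosses the point where %hi 
changes, while the LUI was still computed from foo alone.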

> >  Do you have a better idea?
> 
> TBH, my inclination is to remove it from trunk too.  I imagine
> GCC's LTO will catch many of the interesting cases (because then
> we assemble the output object's text section at once).

 OK, so let's see where we are.  We've got three kinds of relaxation 
actions we make:

1. I think with the changes I made to branch relaxation in GAS we are 
   mostly covered.  There's one corner case remaining I reckon (I'd have 
   to go back to the code and/or my earlier notes to track it down), where 
   we fail to convert to a short or compact branch.  And branches between 
   separate modules are extremely rare, so I wouldn't bother about them.  
   So all the branch relaxation code here should by now be mostly 
   redundant.  I'll still have a look into that corner case -- I may not 
   be able to do that immediately though.

2. Short delay slot relaxation, i.e. JAL->JALS conversion.  We actually 
   should be handling JALR->JALRS and BGEZAL/BLTZAL->BGEZALS/BLTZALS as 
   well, but we don't.  These can and actually should be done in GAS.  
   There are two cases to handle:

   * Instructions swapped into a delay slot.  I reckon this is a bit 
     tricky, but I think still doable.  The instruction to be swapped is 
     already of the right size, it's just not swapped if it's of the wrong 
     size for the delay slot.  We should enable that swapping and flip the 
     delay slot size bit in the respective branch/jump opcode.

   * Instructions manually scheduled in a delay slot ("noreorder" mode).  
     Currently the mnemonic used for the branch/jump determines the size 
     of this instruction.  I think we should always treat the long delay 
     slot mnemonics as macros; they will often come from assembly written 
     for the standard MIPS mode, the conversion of which to the respective 
     short delay slot mnemonics is IMO infeasible.  Not to mention 
     that if operands are substituted in any way (e.g. by macro 
     expansion), then the size of the instruction may vary between 
     assembly passes.

     Again, this may be a bit tricky as it would seem to require looking 
     forwards.  But perhaps we can handle this with relaxation, or maybe 
     simpler yet -- by tweaking the previous instruction emitted through 
     the history of instructions we maintain.

     I think we should have a way to disable this branch/jump conversion, 
     perhaps in the "nomacro" mode or with a new setting (up to debate).

   * While at it we might want to think about instruction swapping around 
     JALX -- as noted above we don't do that if the instruction does not 
     satisfy the delay slot size requirement and there's no JALXS 
     instruction.  We could convert the instruction to the 32-bit size.  
     But then it may be really tough unless we relax all the 16-bit 
     instructions which, conversely, seems an overkill to me.  So I 
     wouldn't put too much effort into it, but still I think it's worth 
     double-checking.

3. HI0_LO16 and ADDIUPC relaxation.  There's nothing that can be done for 
   the former any earlier than by the linker, period.  But do we care?  I 
   think the architecture makes this optimisation unlikely to matter.  
   It's really unusual for TLB systems to map these low/high pages.  Are 
   they used in BAT systems?  I don't know -- can anyone comment?  The 
   addresses from 0 up are typically useful in the error exception 
   handlers (where CP0.Status.ERL switches to the identity mapping of the 
   virtual address space), but are they really such a common case as to 
   dedicate a linker optimisation for?  I doubt it.  So I think we can 
   safely drop this feature and nobody will notice.

   Now as to the ADDIUPC relaxation -- this I think is really worth the 
   trouble as I have seen significant text size reduction as a result of 
   this optimisation.  I'll dig out the exact figures I've got with an 
   example app.  The problem is again you cannot really make this 
   optimisation any earlier than in the linker.  The compiler or assembler 
   do not know what the size of the final executable will be and therefore 
   which references are going to fit in the ADDIUPC's range or not.

   Hmm, I wonder if there's anything we could do about this.  One thought 
   I've got is to refrain from making this optimisation if there are data 
   symbols in code being processed.  But it seems unlikely to me to work 
   reasonably, because (please correct me if I am wrong) at the point 
   relaxation is made all the text sections from all the modules have 
   already been merged into the respective output sections, and we cannot 
   skip just the fragments that correspond to modules that had data 
   symbols in text while also preserving their alignment if any of the 
   preceding fragments shrinks.  At least not without turning half of the 
   BFD linker code upside down, the mere idea of which makes me feel chilly.  
   Correct?
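
The range argument for ADDIUPC can be illustrated with a rough sketch. 
This is Python, not linker code, and the name addiupc_reachable is 
hypothetical.  It assumes the microMIPS ADDIUPC encoding carries a 
23-bit signed immediate scaled by 4 and added to the PC with its low two 
bits cleared; please check the architecture manual before relying on 
those numbers.  The point is only that the answer depends on the final 
addresses, which neither the compiler nor the assembler has.

```python
ADDIUPC_IMM_BITS = 23  # assumed width of the signed, word-scaled immediate

def addiupc_reachable(pc, target):
    # Base is the instruction address with the low two bits cleared.
    base = pc & ~3
    delta = target - base
    # The immediate is scaled by 4, so the displacement must be word aligned.
    if delta % 4 != 0:
        return False
    # Signed range of an N-bit immediate shifted left by 2.
    limit = 1 << (ADDIUPC_IMM_BITS + 1)
    return -limit <= delta < limit

# The same reference may or may not fit depending on the final layout.
print(addiupc_reachable(0x00400002, 0x00400100))
print(addiupc_reachable(0x00400000, 0x00400000 + (1 << 24)))
```

Under these assumptions the first reference fits and the second is one 
word past the top of the range, so only a tool that sees the final 
layout can pick between ADDIUPC and the LUI/ADDIU pair.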

Any other thoughts?  What do the others do -- or are we the only target 
doing this kind of linker relaxation?  What's LTO BTW?

  Maciej

Follow-Ups:
- RE: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
  - From: Fu, Chao-Ying
- Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
  - From: Richard Sandiford

References:
- Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
  - From: Richard Sandiford
- Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
  - From: Maciej W. Rozycki
- Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
  - From: Richard Sandiford
