This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
- From: "Maciej W. Rozycki" <macro at codesourcery dot com>
- To: Richard Sandiford <rdsandiford at googlemail dot com>
- Cc: Chao-ying Fu <fu at mips dot com>, Ilie Garbacea <ilie at mips dot com>, <binutils at sourceware dot org>, Tristan Gingold <gingold at adacore dot com>
- Date: Wed, 16 Nov 2011 13:43:54 +0000
- Subject: Re: [PATCH] MIPS/binutils: microMIPS linker relaxation fixes
- References: <alpine.DEB.1.10.1110272117590.28657@tp.orcam.me.uk> <87ty68b6al.fsf@firetop.home> <alpine.DEB.1.10.1111151456400.4191@tp.orcam.me.uk> <87lirhwcag.fsf@firetop.home>
Hi Richard,
I have now cc-ed the original authors of this code in case they have
anything to add.
On Tue, 15 Nov 2011, Richard Sandiford wrote:
> > So I have actually given it some more thought and my understanding of the
> > ABI remains that while orphaned R_MIPS_LO16 relocations are indeed
> > permitted, they still must be preceded by a corresponding R_MIPS_HI16,
> > although that is not required to be adjacent. I believe this is only
> > permitted to allow cases like you quoted to avoid unnecessary extra code
> > to add missing R_MIPS_HI16 relocations.
>
> There are still potential problems though. We deliberately allow things like:
>
> lui $4,%hi(foo)
> lw $6,%lo(foo)($4)
> lw $7,%lo(foo+4)($4)
> ...
> .align 8
> foo:
> .word X, Y
>
> and foo is allowed to be in a text section. Does your patch ensure that
> foo remains 8-byte aligned, even if we relax code earlier in the section?
Sigh, you're right -- I wish we had realised this earlier on. No, the
alignment of foo will get broken of course, just as the alignment of
standard MIPS code would, as noted with the original submission of this
update. Of course if you run this under Linux, then you won't notice
unless you observe the performance drop badly.
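To make the alignment problem concrete, here is a minimal sketch in Python. It is not taken from the patch: the offsets, the helper names `is_aligned` and `address_after_relaxation`, and the size of the deletion are all made up for illustration. It only models how deleting bytes earlier in a section shifts a later label.

```python
def is_aligned(address, alignment):
    """Return True if address is a multiple of alignment (a power of two)."""
    return address & (alignment - 1) == 0


def address_after_relaxation(address, shrinks):
    """Return where address ends up once relaxation has deleted bytes.

    shrinks is a list of (offset, bytes_removed) pairs. Only deletions
    located before address move it; later ones leave it alone.
    """
    removed = sum(size for offset, size in shrinks if offset < address)
    return address - removed


# Hypothetical layout: foo sits at section offset 0x108, which is
# 8-byte aligned. One 32-bit instruction at offset 0x40 is relaxed to
# its 16-bit form, so 2 bytes are deleted ahead of foo.
foo_before = 0x108
foo_after = address_after_relaxation(foo_before, [(0x40, 2)])

print(hex(foo_before), is_aligned(foo_before, 8))
print(hex(foo_after), is_aligned(foo_after, 8))
```

In this model a single 2-byte deletion ahead of foo is enough to leave it on a 2-byte boundary only, unless the relaxation pass re-pads before the label.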
> > Do you have a better idea?
>
> TBH, my inclination is to remove it from trunk too. I imagine
> GCC's LTO will catch many of the interesting cases (because then
> we assemble the output object's text section at once).
OK, so let's see where we are. We've got three kinds of relaxation
actions we make:
1. I think with the changes I made to branch relaxation in GAS we are
mostly covered. There's one corner case remaining I reckon (I'd have
to go back to the code and/or my earlier notes to track it down), where
we fail to convert to a short or compact branch. And branches between
separate modules are extremely rare, so I wouldn't bother about them.
So all the branch relaxation code here should by now be mostly
redundant. I'll have a look into that corner case yet -- I may not be
able to do that immediately though.
2. Short delay slot relaxation, i.e. JAL->JALS conversion. We actually
should be handling JALR->JALRS and BGEZAL/BLTZAL->BGEZALS/BLTZALS as
well, but we don't. These can and actually should be done in GAS.
There are two cases to handle:
* Instructions swapped into a delay slot. I reckon this is a bit
tricky, but I think still doable. The instruction to be swapped is
already of the right size, it's just not swapped if it's of the wrong
size for the delay slot. We should enable that swapping and flip the
delay slot size bit in the respective branch/jump opcode.
* Instructions manually scheduled in a delay slot ("noreorder" mode).
Currently the mnemonic used for the branch/jump determines the size
of this instruction. I think we should always treat the long delay
slot mnemonics as macros; they will often come from assembly written
for the standard MIPS mode, the conversion of which to the respective
short delay slot mnemonics is IMO infeasible. Not to mention
that if operands are substituted in any way (e.g. by macro
expansion), then the size of the instruction may vary between
assembly passes.
Again, this may be a bit tricky as it requires looking forwards, it
would seem. But perhaps we can handle this with relaxation, or maybe
simpler yet -- by tweaking the previous instruction emitted through
the history of instructions we maintain.
I think we should have a way to disable this branch/jump conversion,
perhaps in the "nomacro" mode or with a new setting (up to debate).
* While at it we might want to think about instruction swapping around
JALX -- as noted above we don't do that if the instruction does not
satisfy the delay slot size requirement and there's no JALXS
instruction. We could convert the instruction to the 32-bit size.
But then it may be really tough unless we relax all the 16-bit
instructions, which, conversely, seems overkill to me. So I
wouldn't put too much effort into it, but still I think it's worth
double-checking.
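The mnemonic selection described in item 2 can be sketched as follows. This is a Python model only, not GAS code. It assumes, as the text implies, that the long forms take a 32-bit delay slot instruction and the "S" forms a 16-bit one. The table and the function name `pick_mnemonic` are hypothetical.

```python
# Long delay slot mnemonics and their short delay slot counterparts,
# as named in the discussion above. JALX is deliberately absent:
# there is no JALXS instruction.
SHORT_DELAY_SLOT_FORM = {
    "jal": "jals",
    "jalr": "jalrs",
    "bgezal": "bgezals",
    "bltzal": "bltzals",
}


def pick_mnemonic(mnemonic, delay_slot_size):
    """Pick the branch/jump form that matches the delay slot instruction.

    mnemonic is a long delay slot mnemonic; delay_slot_size is the size
    in bytes (2 or 4) of the instruction placed in the delay slot.
    """
    if delay_slot_size not in (2, 4):
        raise ValueError("delay slot instruction must be 2 or 4 bytes")
    if delay_slot_size == 4:
        return mnemonic  # the long form already fits
    short = SHORT_DELAY_SLOT_FORM.get(mnemonic)
    if short is None:
        # E.g. JALX: the delay slot instruction would have to be
        # widened to 32 bits instead.
        raise ValueError(f"{mnemonic} has no short delay slot form")
    return short


print(pick_mnemonic("jal", 2))     # 16-bit instruction in the slot
print(pick_mnemonic("bgezal", 4))  # 32-bit instruction in the slot
```

Treating the long mnemonics as macros, as proposed above, amounts to running such a selection once the size of the delay slot instruction is finally known.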
3. HI0_LO16 and ADDIUPC relaxation. There's nothing that can be done for
the former any earlier than by the linker, period. But do we care? I
think the architecture makes this optimisation unlikely to matter.
It's really unusual for TLB systems to map these low/high pages. Are
they used in BAT systems? I don't know -- can anyone comment? The
addresses from 0 up are typically useful in the error exception
handlers (where CP0.Status.ERL switches to the identity mapping of the
virtual address space), but are they really such a common case as to
dedicate a linker optimisation for? I doubt it. So I think we can
safely drop this feature and nobody will notice.
Now as to the ADDIUPC relaxation -- this I think is really worth the
trouble as I have seen significant text size reduction as a result of
this optimisation. I'll dig out the exact figures I've got with an
example app. The problem is again you cannot really make this
optimisation any earlier than in the linker. The compiler or assembler
do not know what the size of the final executable will be and therefore
which references are going to fit in the ADDIUPC's range or not.
Hmm, I wonder if there's anything we could do about this. One thought
I've got is to refrain from making this optimisation if there are data
symbols in the code being processed. But it seems unlikely to me to work
reasonably, because (please correct me if I am wrong) at the point
relaxation is made all the text sections from all the modules have
already been merged into the respective output sections, and we cannot
omit only the fragments that correspond to modules that had data
symbols in text while also preserving their alignment if any of the
preceding fragments shrinks. At least not without turning half of the BFD
linker code upside down, the mere idea of which makes me feel chilly.
Correct?
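The ADDIUPC range constraint mentioned above can be modelled with a short Python sketch. It assumes a 23-bit signed immediate scaled by 4 and added to the PC with its low two bits cleared. Those parameters are stated here as assumptions rather than quoted from the patch, and `fits_addiupc` is a hypothetical helper, not a BFD function.

```python
ADDIUPC_IMM_BITS = 23  # assumed width of the signed immediate field
ADDIUPC_SHIFT = 2      # assumed scaling: the immediate counts words


def fits_addiupc(pc, target):
    """Return True if target is reachable from an ADDIUPC at pc.

    The offset is taken from pc with its low two bits cleared. It must
    be a multiple of 4 and fit the signed immediate once scaled down.
    """
    base = pc & ~3
    offset = target - base
    if offset % (1 << ADDIUPC_SHIFT) != 0:
        return False  # not word-aligned relative to the base
    scaled = offset >> ADDIUPC_SHIFT
    limit = 1 << (ADDIUPC_IMM_BITS - 1)
    return -limit <= scaled < limit


# Only final addresses can answer the question, which is why the
# check cannot be made before the link. Addresses here are made up.
print(fits_addiupc(0x400000, 0x400000 + 0x10000))
print(fits_addiupc(0x400000, 0x400000 + (1 << 24)))
```

Under these assumptions the reach is roughly 16 MiB either side of the instruction, so whether a given reference qualifies depends on the final size and layout of the executable.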
Any other thoughts? What do the others do -- or are we the only target
doing this kind of linker relaxation? What's LTO BTW?
Maciej