Problems with relocations for a custom ISA

Tue Aug 8 15:35:24 GMT 2023

Hello,

On Tue, 8 Aug 2023, MegaIng wrote:

> > > Most of the basics I already managed to implement, i.e. I can generate
> > > simple
> > > workable ELF files. However, I am running into problems with relocations
> > > for
> > > "load immediate" instructions. Without extensions, we want to potentially
> > > emit
> > > long chains of instruction (3 to 8 instructions is realistic), but with
> > > proper
> > > extensions in can get down to only 1 instruction of 3 or 4 bytes. I am
> > > unsure
> > > how to best represent such variable length relocations in BFD and ELF.
> > The normal way would be to not do that.  It seems the assembler will
> > already see either a long chain of small insns, or a single large insn,
> > right?
> 
> Our idea was that the user can use a simple pseudo instruction to 
> represent the entire process of loading a symbol (or any immediate for 
> that matter).

Pseudo instruction makes sense.  But then it would still be the assembler 
that expands it to either a couple base insns or a single extended insn.
The linker would see only one or the other, and hence also only the base 
or the extended relocs.

Or did you really want to reserve some specific byte encoding for this 
pseudo instruction to transfer it from assembler via object file to linker 
and let only the linker replace that by one or the other variant?  That 
seems an unnecessarily complicated scheme.  It depends on if the assembler 
does or doesn't know if it can target the extended insns, or only the base 
ones.  I would definitely suggest that the assembler at latest should know 
this.

> > (obviously details will differ, your 16bit insns won't be able to quite
> > set all 16 bits :) ).
> > If you really want to optimize these sequences also at link time (but
> > why?) then all of this becomes more complicated, but remains essentially
> > the same.  The secret will then be in linking from one of the small relocs
> > (say, the high16 one) to the other, for the linker to easily recognize the
> > whole insn pair and appropriately do something about those byte sequences.
> > In that scheme you need to differ between relocations applied to relaxable
> > code and relocation applied to random non-relaxable data.  E.g. you
> > probably need two variants of the RELOC_LOW16 relocation.
> 
> Not sure if you took a look at our instruction set: The way you would load an
> arbitrary 16bit word is via a sequence of `slo` (shift left 5 and or)
> instructions which use a 5bit immediate (the largest we have in base). So
> breaking it up into two RELOC_LOW_16 or similar wouldn't quite work.

Sure, as I said above: "obviously details will differ".

> It would have to be 3-4 RELOC_BITS_0_4, RELOC_BITS_5_9 RELOC_BITS_10_15 
> or something like that. And you couldn't exactly remove one of those 
> without changing the others.

Yes, this is the usual way to express that.  There are many architectures 
which have similar ISA restrictions and they all do it essentially the 
same way: "select X bits from value, put them into Y bits of field", for 
potentially many combinations of (not necessarily consecutive) X and Y.

> But ofcourse, we don't always need all 4 
> instructions, sometimes we can get away with only two or three, for 
> example if it's only an 8bit value, we only need 2 instructions. We 
> would like to optimize these cases somewhere.

I see.  Yeah, that will ultimately need some linker relaxation as only 
that one will know for sure which values symbols have, and hence if they 
do or do not fit certain constraints.

> After a bit more 
> discussion we came to the idea of having many relocations that 
> potentially cover multiple instructions so that the entire 
> load-immediate sequence can be covered by one relocation,

As you have only such a short immediate field in the base ISA this seems 
like a sensible idea, as otherwise, as you say, you need 7 relocations 
(and insns) for a full 32bit load.

> but this is quite a large amount of relocations.

Hmm?  I don't understand this remark.  If you cover a range of 
instructions by one relocation you necessarily need fewer relocs than if 
you use one reloc per insn?

> > I wouldn't go that way if I were you: it seems the assembler/compiler
> > needs to know if targeting the extended ISA or not anyway, so generating
> > the right instructions and relocations from the start in the assembler
> > seems the right choice, and then doesn't need any relax complications at
> > link time.
> 
> As long as the range (or even the exact value) of the symbol is known at
> assembly time, this is ofcourse true, but what about situations where nothing
> about the range of the value is known?

The compiler/assembler would always emit the full sequence (e.g. assumes 
that the symbol in question happens to be full 32bit).  If you want to 
optimize this use in case the symbol happens to need fewer bits, then yes, 
you do need linker relaxation.  As said, you then need a way in the linker 
to recognize an insn sequence that "belongs" together, so that you can 
appropriately optimize this, either by referring from one to the next 
reloc in such a chain, or by simply assuming that such sequences are 
always done in a certain order (i.e. a simple pattern match; unrecognized 
patterns would remain unrelaxed/unoptimized).

The basic form of relocations doesn't depend on that, though.  You still 
need to differ between the lowest N bits of the requested value, the next 
N bits, the next N bits, and so on, so you do need roundup(32/N) reloc 
types either way.

By restricting certain insn sequences and flexibility you can get away 
with fewer relocations than this.  E.g. with your idea of covering 
multiple insns with one reloc.  Say, if you require that the low 10 bits 
of a value are always set in this way (and given your ISA that makes 
sense):

   shiftset5 %r1, bit04(sym)
   shiftset5 %r1, bit59(sym)

and never with another insn in between, and never in a difference order, 
then of course you can get away with a relocation (say) RELOC_SHIFTSET10, 
that takes the low 10 bits of 'sym' and appropriate distributes those 10 
bits into the right 5 bit field of the instruction.  It would implicitely 
cover both instructions, i.e. a 32bit place in the code section.

If you extend this idea to cover seven instructions of the base ISA you 
can get away with a single reloc that is able to set the whole 32bit of a 
value (at the expense of not being able to place unrelated instructions 
between those seven).

> It seems like other assembler targets truncate the values in those 
> cases? If we went for the minimal representation we would basically 
> limit external symbols to 5bit, which isn't exactly ideal. And from what 
> I can tell, growing a relocation also isn't really something bfd is 
> designed to deal with, right?

I'm not super fluent in the actual implementation of bfd linker 
relaxation.  But I don't see why it can't also grow sections.  It's true 
that the usual relaxation shrinks sizes, and it's probably better to 
follow that as well, but in principle enlarging is no proble either (if 
you enlarge _and_ shrink in your relaxation you can run into 
endless oscillation between the two, so that needs to be watched for).

But one thing about terminology: relocations themself don't grow or 
shrink.  A relocation in principle applies to a certain address without 
range.  The semantics of a specific relocation type will usually say that 
these-and-those bits in a field will be changed by it, and you can say 
that that's the size of a relocation.  But not all relocations are like 
that, and nothing really prevents you from either changing the relocation 
type when you want something else (in linker relaxation), or even defining 
a funny type that applies to either (say) a byte or a word, as needed.  
You need to implement special functions for such relocs then, and can't 
use the generic simple BFD reloc howto model, but still.

Just to expand on this: in principle one could invent a relocation type 
that says "when the symbol has value '1' change the byte 45 bytes 
from here to 42, when it has another value then encode that one into the 
word 7 bytes from here".  That's obviously a crazy semantics for a 
relocation, but nothing inherently prevents you from that.  (Of course, 
making sure that there actually _is_ something 45 bytes from the relocs 
place is a problem :) )  The "size" of such relocation wouldn't be 
well-defined anymore (or be 46), but what I'm saying is, that this is 
okayish.

What does grow or shrink is the section content, and hence distance 
between labels might change during relaxation, which requires delaying 
resolving jumps until relaxation time as well.  This can get quite slow at 
link time (riscv is plagued by this).  Just to make you aware :)

One remark: you _really_ should think long and hard about your immediate 
size in the base ISA.  5 bits is terribly small.  Maybe you can snatch 
away some bits here and there in your 16bit insns to make this 8 bits 
(something that divides 32 would be ideal), but even 6 would bring the 
full-32-bit sequence from 7 to 6 instructions.

Ciao,
Michael.