This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] MIPS/GAS: Correct microMIPS branch swapping assertion


Hi,

 A bug in GAS causes the assembly of some files to fail where the branch 
swapping optimisation is enabled (which is the default) and branch delay 
slots have not been explicitly scheduled, either by the compiler or 
manually in handcoded assembly.  If that happens, then a message like:

foo.s:1234: Error: internal error: fixup not contained within frag

is produced.

 The cause is the internal structure of the assembler, generated machine 
code is stored in dynamically allocated memory chunks called frags.  These 
chunks have a fixed size, when enough machine code has been generated to 
fill a frag completely, then another one is allocated.

 Frequently these chunks are not completely filled, because some assembly 
code sequences result in two variants of machine code produced, to handle 
different scenarios that the assembler cannot resolve until all of the 
input file has been consumed.  For example such variants are produced in 
some code models where a different sequence is needed to access data 
depending on whether it is associated with a local or an external symbol 
and the symbol's definition has not been seen yet.  Such variant code has 
to be put last in a frag and therefore a new frag always follows 
immediately even though the previous frag has only been partially filled.

 The particular scenario that triggers this bug is where a branch 
instruction is first in a frag and GAS decides it can fill its delay slot 
by the preceding instruction (effectively swapping the order of the two 
instructions), that comes from the previous frag.  

 This is performed differently depending on whether the branch is relaxed 
-- that is it can be transformed to a different sequence of code later on 
-- or not.  In the former case the last instruction is deleted from the 
previous frag, shrinking the frag by the length of that instruction, and 
moved into the current one just after the branch.  In the latter case the 
branch is simply exchanged with the last instruction from the previous 
frag and nothing else is changed.

 It is the latter case that triggers the bug.  It happens in the microMIPS 
mode only, where the branch is a 32-bit instruction and the other 
instruction is a 16-bit instruction.  As a result after the swap the 
branch ends up sticking out half-way beyond the earlier frag.  This causes 
the assertion noted at the beginning to trigger as code later on notices a 
relocation (fixup) applied to the branch reaching beyond the frag.  I 
haven't tried a scenario where a 16-bit branch is swapped with a 32-bit 
instruction, but I'd expect other interesting observations.

 As variant code cannot be placed in a branch delay slot this bug only 
triggers where the previous frag has been completely filled which makes it 
happen every once in a while only.  Also this bug does not happen for 
MIPS16 code (where a 32-bit branch can be swapped with a 16-bit other 
instruction, but not the other way round) that is handled by a different 
execution path.

 Here's a one-line fix that enables code already used for relaxed branches 
for use with fixed branches as well.  I have verified it to work correctly 
with the piece of code that originally triggered it, but I wanted to make 
sure this is covered by our test suite as well.

 This turned out to be a bit of a challenge as the test suite cannot 
predict exactly where the end of the first frag is going to be.  There are 
two reasons for that.

 First, frags are placed in memory allocated dynamically with 
obstack_alloc.  This is a GNU extension and is implemented in (lib)iberty 
and glibc; the former is bundled with binutils and is used where not 
available in the system C library, otherwise the system-supplied 
(presumably glibc) version is used.

 Obstack memory is allocated in chunks and a frag can be freely grown up 
to the amount of memory available in the obstack it has been put in.  
Once the frag cannot be grown any further it has to be closed and a new 
frag created, that will be placed in a newly allocated obstack.

 The size of the chunk and therefore any obstack allocated, by default, 
depends on the programming model used on the host.  GAS starts by using 
the default size.  This size is calculated as 4096 bytes minus a 
programming-model specific amount calculated based on sizeof (union 
fooround) defined internally, that I believe can be for *nix systems, that 
we may be potentially concerned about, one of 4, 8, 12 or 16, and the 
corresponding amount comes out as a value between 16 and 32, so the 
remaining space is between 4064 and 4080.

 Second, by the time the first instruction is assembled, some space in the 
obstack has already been consumed.  In practice for the configuration I 
used the value came out as 3988.  So I have figured out this can only be 
tested with the aid of some heuristics that can adapt for system 
specifics, and ended up with the test case included below that sweeps a 
range of possible offsets up to 4096 in hope the end of the first frag 
will be somewhere within. 

 I have figured out it makes no sense to start from zero as that would be 
an unnecessary waste of time and disk space and decided it will be enough 
if the last 256 bytes are scanned.  Also there is no need to scan every 
byte boundary as any instruction that does not entirely fit in the 
remaining space in a frag is automatically carried over to a newly-created 
frag and the preceding frag is closed.

 As I have written above, GAS starts by using the default chunk size, but 
interestingly enough, this size is applied to the first obstack only.  As 
soon as another obstack is needed GAS switches to using about the minimum 
size possible and starts requesting further obstacks in amounts that are 
double the instruction size only, that for the MIPS target means either 4 
or 8 bytes.  How miserable!  Obstack allocation code uses some arbitrary 
look-ahead heuristics and actually provides some more, but that is is 
still 84 bytes only, so beyond the first obstack GAS starts thrashing with 
requests for new obstacks.  This is I believe an unnecessary memory and 
performance waste.

 By how the offending code looks I gather the intent was to provide for 
large obstacks where an excessively big chunk of data needs to be placed 
in a single frag (the associated comment quotes 2GB!), but inadvertently 
the code does not check for the allocation size getting smaller than the 
reasonable default.  There is normally no penalty from using the default 
even for the smallest frags possible, because as many individual frags can 
be placed in a single obstack as will fit there.  Therefore I believe this 
is a missed optimisation and I will be posting a proposed fix separately.

 So, to recap, here is the final change and the test results for the test 
case are as follows:

Progressions and removed failures:

FAIL -> PASS: gas.sum:MIPS branch swapping (1018)
FAIL -> PASS: gas.sum:MIPS branch swapping (996)

The "1018" failure goes away as expected even without the fix below when 
the obstack allocation pessimisation is eliminated.

 I have regression-tested it with my usual set of targets including but 
not limited to 23 MIPS configurations and spotted no problems.  OK to 
apply?

2012-08-02  Maciej W. Rozycki  <macro@codesourcery.com>

	gas/
	* config/tc-mips.c (append_insn): Also handle moving delay-slot
	instruction across frags for fixed branches.

	gas/testsuite/
	* gas/mips/branch-swap-2.l: New list test.
	* gas/mips/branch-swap-2.s: New test source.
	* gas/mips/mips.exp: Run the new test.

  Maciej

binutils-gas-mips-branch-swap.diff
Index: binutils-fsf-trunk-quilt/gas/config/tc-mips.c
===================================================================
--- binutils-fsf-trunk-quilt.orig/gas/config/tc-mips.c	2012-08-02 15:17:48.000000000 +0100
+++ binutils-fsf-trunk-quilt/gas/config/tc-mips.c	2012-08-02 18:55:31.610534816 +0100
@@ -4488,7 +4488,7 @@ append_insn (struct mips_cl_insn *ip, ex
 	    move_insn (ip, delay.frag, delay.where);
 	    move_insn (&delay, ip->frag, ip->where + insn_length (ip));
 	  }
-	else if (relaxed_branch)
+	else if (relaxed_branch || delay.frag != ip->frag)
 	  {
 	    /* Add the delay slot instruction to the end of the
 	       current frag and shrink the fixed part of the
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.l
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.l	2012-08-02 18:55:31.620517233 +0100
@@ -0,0 +1 @@
+# No warnings or errors expected!
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.s
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/branch-swap-2.s	2012-08-02 18:55:31.630534624 +0100
@@ -0,0 +1,8 @@
+	.set	micromips
+	.text
+foo:
+	.rept	count
+	ori	$2, $3, (. - foo) >> 2
+	.endr
+	addu	$2, $3, $4
+	j	ext
Index: binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/mips.exp
===================================================================
--- binutils-fsf-trunk-quilt.orig/gas/testsuite/gas/mips/mips.exp	2012-08-02 15:42:25.000000000 +0100
+++ binutils-fsf-trunk-quilt/gas/testsuite/gas/mips/mips.exp	2012-08-02 19:21:46.340718916 +0100
@@ -505,6 +505,20 @@ if { [istarget mips*-*-vxworks*] } {
     run_dump_test_arches "branch-misc-2pic-64" [mips_arch_list_matching mips3]
     run_dump_test "branch-misc-3"
     run_dump_test "branch-swap"
+
+    if $elf {
+	# Sweep a range of branch offsets so that it hits a position where
+	# it is at the beginning of a frag and then swapped with a 16-bit
+	# instruction from the preceding frag.  The offset will be somewhere
+	# close below 4096 as this is the default obstack size limit that
+	# we use and some space will have been already consumed.  The exact
+	# amount depends on the host's programming model.
+	for { set count 960 } { $count <= 1024 } { incr count } {
+	    run_list_test "branch-swap-2" "--defsym count=$count" \
+		"MIPS branch swapping ($count)"
+	}
+    }
+
     run_dump_test "div"
 
     if { !$addr32 } {


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]