This is the mail archive of the crossgcc@sources.redhat.com mailing list for the crossgcc project.

See the CrossGCC FAQ for lots more information.



Re: size of code gcc creates (was "m68k-coff-gcc and NOPs")


Christopher Bahns wrote:
> 
> Hello Chris (and Scott),
> 
> Regarding a CrossGCC post you made back in January, I too have to deal
> with size increases going from MRI to GNU. In my case the MRI build is
> about 94k (of 96k available) and the GNU build is about 107k, which is
> over the limit on my flash. If I can just get it close then I can look
> into other methods to reduce it further, but at the moment that would be
> hopeless.

 One place to start reducing the code size is the handling of
volatile. Currently GCC generates RISC-like load-modify-store
sequences (sometimes with an extra reload) for everything declared
volatile.

 For example, the following code (probably not quite sane for m68k):
 
----------------------------- clip ------------------------------------
#include <stdio.h>

#define ISR_FUNC  __attribute__((interrupt))

#define PBDR     (*(volatile char *) (0xffffd6))
#define ITU_TSR0 (*(volatile char *) (0xffff67))
#define UART     (*(volatile int *)  (0xffff80))

volatile int count;
extern volatile int flag;

/*
#define PBDR     (*(char *) (0xffffd6))
#define ITU_TSR0 (*(char *) (0xffff67))
#define UART     (*(int *)  (0xffff80))

int count;
extern int flag;
*/

void ISR_FUNC handle_intr(void)
{
	count++;
	if (count == 400)
	  {
	    PBDR ^= 0x01;
	    count = 0;
	    flag = 0;
	  }

	UART = 0;
	UART = 1;
	ITU_TSR0 &= 0xFE;
}
----------------------------- clip ------------------------------------

generates the following assembly output with optimization '-O' :

----------------------------- clip ------------------------------------
	.file	"isr_demo1.c"
gcc2_compiled.:
.text
.globl handle_intr
	.type	 handle_intr,@function
handle_intr:
	move.l %a0,-(%sp)       <---- a0 and d0 will be used
	move.l %d0,-(%sp)       <---- so push them into stack
	move.l count,%d0        <---- load count
	addq.l #1,%d0           <---- increment
	move.l %d0,count        <---- store it back
	move.l count,%d0        <---- reload count
	cmp.l #400,%d0
	jbne .L3
	move.l #16777174,%a0	<---- set PBDR address
	move.b (%a0),%d0        <---- load from it
	eor.b #1,%d0            <---- toggle the bit
	move.b %d0,(%a0)        <---- store it back
	clr.l count		<---- why no "load-store"
	clr.l flag              <---- for these then ?
.L3:
	move.l #16777088,%a0
	clr.l (%a0)             <---- "UART = 0"
	moveq.l #1,%d0
	move.l %d0,(%a0)        <---- "UART = 1"
	lea (-25,%a0),%a0       <---- set ITU_TSR0 address
	move.b (%a0),%d0        <---- load
	and.b #-2,%d0           <---- change
	move.b %d0,(%a0)        <---- store it back
	move.l (%sp)+,%d0
	move.l (%sp)+,%a0
	rte
.Lfe1:
	.size	 handle_intr,.Lfe1-handle_intr
	.comm	count,4,2
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"
----------------------------- clip ------------------------------------

whereas dropping the volatile (using the commented-out declarations
instead) gives, with the same optimization:

----------------------------- clip ------------------------------------
	.file	"isr_demo2.c"
gcc2_compiled.:
.text
.globl handle_intr
	.type	 handle_intr,@function
handle_intr:
	move.l %d0,-(%sp)     <--- only d0 will be used
	addq.l #1,count       <--- increment count
	cmp.l #400,count
	jbne .L3
	eor.b #1,16777174     <--- toggle PBDR bit
	clr.l count
	clr.l flag
.L3:
	moveq.l #1,%d0        <--- "UART=1" !!!!!
	move.l %d0,16777088
	and.b #254,16777063   <--- clear the ITU_TSR0 bit
	move.l (%sp)+,%d0     <--- restore d0
	rte
.Lfe1:
	.size	 handle_intr,.Lfe1-handle_intr
	.comm	count,4,2
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"
----------------------------- clip ------------------------------------

 As we can see, the code now looks more sane, BUT the "UART = 0"
store was 'optimized' away!!! Preventing exactly this is what volatile
is for, and leaving it out is the reason for the 'lost code'...

 In an embedded app that does a lot of I/O, declaring the ports
volatile is therefore quite common, and it ensures that no access
gets lost...

 But as far as I understand, all those direct operations on memory
locations could still be used even when the locations are declared
volatile.

 So, why is it necessary to use:

	move.l count,%d0        <---- load count
	addq.l #1,%d0           <---- increment
	move.l %d0,count        <---- store it back
	move.l count,%d0        <---- reload count

for a volatile instead of:

	addq.l #1,count         <--- increment count

????  Haven't seen any sane explanation for this 'feature'...
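
 A possible source-level workaround, sketched here as an assumption
rather than something from the measurements above: read the volatile
counter into a local once, work on the local, and store it back once.
That asks for exactly one volatile load and one volatile store per
tick, so the reload before the compare has no reason to appear. The
function name tick() and the variable pbdr_stand_in are hypothetical;
ordinary variables stand in for the memory-mapped registers so that
the sketch can also run on a host. Note that it stores count only
once per tick (the original stores the incremented value and then
zero on the 400th tick). Whether a given GCC version really emits
shorter code for this has to be checked in its assembly output.

```c
/* Host-runnable sketch: plain volatile variables stand in for the
   memory-mapped registers, so the addresses from the post are NOT
   used here. */
volatile int count;           /* tick counter, as in the post        */
volatile int flag;            /* defined here; extern in the post    */
volatile char pbdr_stand_in;  /* hypothetical stand-in for PBDR      */

/* Hypothetical helper, not from the post: the counter handling with
   one volatile read and one volatile write of count per call. */
void tick(void)
{
    int c = count + 1;        /* single volatile load, then increment */

    if (c == 400) {
        pbdr_stand_in ^= 0x01;  /* toggle the (simulated) port bit */
        c = 0;
        flag = 0;
    }
    count = c;                /* single volatile store */
}
```

 This only suits variables like count, where fewer accesses are
harmless. Real device registers such as UART must keep every access.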

 This 'feature' has been discussed now and then since the last
millennium, but there always seem to be 'more important' things to
fix in GCC. Embedded users are a minority, and only they need to
use volatile.

 Unfortunately this bug seems to be in the GCC core, and fixing
it will not be easy for a novice, nor even for a more experienced
person (I know some quite qualified people who have tried to look
at this). However, I haven't checked whether the bug is still there
in the current GCC/egcs snapshots... Has anyone done this?

 So we can only keep reminding the GCC experts about this 'bug'
now and then...

> Scott,
> I also noticed that the run time library was much bigger.

 The newlib function sizes have also been a problem. Targets like
H8/300, AVR, M68HC11, MN10200 etc. could benefit from a simpler
C library, especially from a 'minimal' printf().

 Some possibilities already exist in newlib, like using the
integer-only 'iprintf()' everywhere:

----------------------------- clip ------------------------------------
iprintf--write formatted output (integer only)

Synopsis 
  #include <stdio.h>

  int iprintf(const char *format, ...);


Description
  iprintf is a restricted version of printf: it has the same
  arguments and behavior, save that it cannot perform any
  floating-point formatting: the f, g, G, e, and F type
  specifiers are not recognized.
----------------------------- clip ------------------------------------

where possible, instead of the bigger 'printf()' (with vfprintf() etc.).
Replacing printf() with iprintf() can be done simply by putting a:

  #define printf   iprintf

in some suitable place...

 For example, the famous "Hello world" shrinks quite a lot when
'iprintf()' is used instead of 'printf()':
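
 One 'suitable place' could be a small project header, sketched
below. The macro USE_INTEGER_PRINTF and the helper print_count() are
hypothetical names, not part of newlib. The redirection is guarded so
that the same source still builds against a C library without
iprintf() when the macro is left undefined.

```c
#include <stdio.h>

/* USE_INTEGER_PRINTF is a hypothetical project macro, not a newlib
   one. Define it (e.g. with -DUSE_INTEGER_PRINTF) only when building
   against newlib and when no floating-point conversions are needed. */
#ifdef USE_INTEGER_PRINTF
#define printf iprintf
#endif

/* Hypothetical helper: prints an integer and returns the number of
   characters written, as printf() and iprintf() both do. */
int print_count(int value)
{
    return printf("count=%d\n", value);
}
```

 Every source file that includes such a header picks up the
redirection. Files that do need floating-point output must then not
include it.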

----------------------------- clip ------------------------------------
E:\usr\local\samples>\usr\local\m68k-coff\bin\size *hello.x
   text    data     bss     dec     hex filename
  23320    1948      12   25280    62c0 hello.x
   8632    1920      12   10564    2944 ihello.x
----------------------------- clip ------------------------------------

 So the text segment is 14688 bytes smaller (14716 bytes in total),
provided no 'printf()' derivatives (like sprintf()) are needed for
floats anywhere...

 There are probably quite a lot more possibilities in newlib to reduce
the code size, but the printf case is the best known.

Cheers, Kai


------
Want more information?  See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com

