This is the mail archive of the crossgcc@sources.redhat.com mailing list for the crossgcc project.

See the CrossGCC FAQ for lots more information.



Re: Optimization question


Richard Earnshaw wrote:
> 
> >
> > I'm using an arm-elf cross compiler built from 2.95.2 for a device with
> > a StrongARM CPU core.  I've noticed that the compiler tries to avoid
> > multiply instructions by transforming a 32 bit multiply by a constant
> > into a sequence of adds and subtracts with shifts.  This is probably
> > desirable if the processor has a slow multiply instruction, but the
> > StrongARM core I'm using has a fast multiply (1 clock issue, 1-3 clock
> > result delay depending on early termination).  So I'd really prefer for
> > the compiler to use the multiply instruction.  A quick glance through
> > arm.c in the GCC sources indicates that when -mcpu=strongarm is used,
> > then a flag (arm_fast_multiply) gets set.  Should this cause the use of
> > the multiply instructions (or at least make them more favorable)?  Any
> > hints on how to get the compiler to cooperate?
> >
> 
> Well, when multiplying by a constant, it is nearly always faster to build
> the operation up from shift instructions, even on a StrongARM.  Remember
> that to use the multiply instruction a constant first has to be loaded
> into a register; that takes at least one cycle and may take many more if
> the value has to be synthesised or fetched from an area of memory that
> might be outside the cache (though that can sometimes be moved outside of
> a loop at the expense of increasing register pressure).  It then takes at
> least two cycles to perform the multiply itself, so we have an absolute
> minimum of 3 cycles before it could be possible to save time by using the
> multiply instruction.  A very large number of constant multiplications in
> normal code can be synthesised in 3 or fewer shift+add insns (each taking
> one cycle), so there are only a small number of cases where it would be
> better to use the multiply instruction even on a StrongARM.
> 
> The costings in gcc are set up to take the above into account, so I'm not
> surprised that you are not seeing the use of the multiply insn.  Do you
> have a specific example where the compiler is definitely generating slower
> code?  If so, I'd be interested in taking a look at it.
> 
> Richard

Thanks Richard,

Your arguments are compelling.
I'll just let the compiler do its "thing".

Art
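
For illustration only, the shift-and-add synthesis Richard describes can be
sketched in C.  This is a hand-written sketch, not output from GCC 2.95.2 and
not code from either poster; the function names and the constants 10 and 7
are assumptions chosen for the example.  The cycle figures in the comments
simply restate the numbers quoted in the thread (at least one cycle to load
the constant plus at least two for the multiply, versus one cycle per
shift+add insn).

```c
#include <stdint.h>

/* Reference version: leaves the choice of instructions to the compiler. */
uint32_t mul10_plain(uint32_t x)
{
    return x * 10u;
}

/* x * 10 == (x + (x << 2)) << 1
 * Two shift/add steps.  On ARM, where the shift can be folded into the
 * add's second operand, this could plausibly become two one-cycle insns,
 * against the 3-cycle minimum quoted above for load-constant + multiply. */
uint32_t mul10_shift_add(uint32_t x)
{
    uint32_t x5 = x + (x << 2);   /* x * 5  */
    return x5 << 1;               /* x * 10 */
}

/* x * 7 == (x << 3) - x
 * One shift-and-subtract step. */
uint32_t mul7_shift_sub(uint32_t x)
{
    return (x << 3) - x;
}

/* Returns 1 if the synthesised forms agree with plain multiplication for x.
 * Unsigned arithmetic wraps modulo 2^32, so the identities hold for every
 * 32-bit input, including values that overflow. */
int synthesis_matches_multiply(uint32_t x)
{
    return mul10_shift_add(x) == mul10_plain(x)
        && mul7_shift_sub(x) == x * 7u;
}
```

Constants whose binary form needs more than about three such steps are the
"small number of cases" where the multiply instruction might come out ahead.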

------
Want more information?  See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com

