This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] more problems with newlib/libc/machine/m68k/memcpy.S


On Tue, Feb 09, 2010 at 03:15:11PM +0100, Josef Wolf wrote:
> 3. -mcpu32 seems to imply -mc68020. So the check for alignment capabilities
>    gives a wrong result for cpu32. BTW: alignment capabilities depend not
>    only on the CPU. They also depend on bus width and how the memory is
>    connected.

it's been this way for years and is arguably incorrect, as neither
instruction set (cpu32 or 68020) is a superset of the other.  (this is a
big can of worms with the way 68k is currently set up under gcc.  I'd love
to fix it, but I'm not holding my breath for the necessary funding.)

even if 8-bit memory is connected to cpu32, I believe the SIM (System
Integration Module) can handle 16- and 32-bit transfers automatically, and
with lower overhead, since it can do the transfers back-to-back without
intervening instruction fetches.

> IMHO, the correct algorithm would be like this:
> 
> 1. Align dest in any case, no matter what CPU we have. This will do no
>    harm to any CPU, since all CPUs can write quickly to long-word-aligned
>    addresses.
> 2. After dest is aligned, check whether src is aligned too. If it is,
>    we can use the optimized algorithm. If not, fall back to a bytewise copy.
>    This should have been the response to the error reported in the thread
>    mentioned above.
> 3. Some hardware (like cpu32 with a 16-bit bus) can do long-word accesses at
>    word-aligned addresses without a speed penalty. With such hardware, having
>    src on an even address is enough to use the optimized algorithm.
>    BTW: I think this depends not only on the CPU core, but also on how memory
>         is connected. I have included 16-bit alignment in the patch anyway.
>         We can drop it if it turns out that making this depend on the CPU
>         is the wrong thing to do here.
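The three steps proposed above can be sketched in portable C roughly as follows. This is a host-side illustration under stated assumptions, not the newlib code or the actual patch: the name `sketch_memcpy`, the `allow_word_src` flag (standing in for step 3's "even src is enough" case) and the `copy_stats` counters (there only so the chosen path can be observed) are all hypothetical, and the fixed-size `memcpy(d, s, 4)` merely stands in for one `move.l`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters (not part of newlib) so a test can see which
 * path the copy took. */
struct copy_stats {
    size_t byte_moves;  /* single-byte moves */
    size_t long_moves;  /* 4-byte moves */
};

/* Sketch of the proposed strategy:
 *   1. byte-copy until dest is long-word (4-byte) aligned;
 *   2. if src is now aligned as well, copy in long words;
 *   3. if allow_word_src is set (hardware assumed to tolerate long
 *      accesses at even addresses), an even src is good enough;
 *   otherwise the rest is copied bytewise. */
static void *sketch_memcpy(void *dst, const void *src, size_t n,
                           int allow_word_src, struct copy_stats *st)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t src_mask = allow_word_src ? 1u : 3u;

    assert(st != NULL);

    /* Step 1: align dest regardless of CPU. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
        st->byte_moves++;
    }

    /* Steps 2 and 3: long-word moves only if src is suitably aligned. */
    if (((uintptr_t)s & src_mask) == 0) {
        while (n >= 4) {
            memcpy(d, s, 4);  /* stands in for one move.l */
            d += 4;
            s += 4;
            n -= 4;
            st->long_moves++;
        }
    }

    /* Tail, or the whole remainder when src was misaligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
        st->byte_moves++;
    }
    return dst;
}
```

Keeping the word-alignment relaxation behind a single flag makes it easy to drop if selecting it by CPU turns out to be the wrong criterion.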

if you're going to optimize for cpu32, see if you can restructure the copy
loops as a single one-word instruction followed by a DBcc (dbxx)
instruction, so that the CPU32 loop mode kicks in.  this avoids
instruction fetches during the loop and increases bus throughput
substantially.
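For reference, the loop shape described above is a one-word `move.l (a0)+,(a1)+` followed by a `dbra` that branches back to it (displacement -4). The assembly appears only in the comment; the C below is a host-side model, assuming a long-word copy, of the part that is easy to get wrong: DBRA counts down only the low 16 bits of the counter to -1, so one pass handles at most 65536 iterations and longer copies need an outer loop. The function name `copy_longs_dbra_style` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the CPU32 loop
 *
 *   loop:  move.l  (a0)+,(a1)+    | one-word instruction
 *          dbra    d0,loop        | displacement -4
 *
 * DBRA decrements only the low 16 bits of d0 and falls through once they
 * wrap to -1 (0xFFFF), so one pass runs (d0.w + 1) times, at most 65536.
 * Larger copies need an outer loop; the return value is the number of
 * outer passes so the chunking can be checked. */
static size_t copy_longs_dbra_style(uint32_t *dst, const uint32_t *src,
                                    size_t nlongs)
{
    size_t passes = 0;

    assert(nlongs == 0 || (dst != NULL && src != NULL));

    while (nlongs > 0) {
        size_t chunk = nlongs > 65536 ? 65536 : nlongs;
        uint16_t d0 = (uint16_t)(chunk - 1);  /* dbra counts down to -1 */
        do {
            *dst++ = *src++;                  /* move.l (a0)+,(a1)+ */
            d0 = (uint16_t)(d0 - 1);          /* dbra: decrement d0.w ... */
        } while (d0 != 0xFFFF);               /* ... loop unless it hit -1 */
        nlongs -= chunk;
        passes++;
    }
    return passes;
}
```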

I have also noticed that there is a point of diminishing returns for
jumping through alignment hoops.  depending on the CPU speed, it may be
faster to do a zero-overhead byte copy for small transfers rather than
going through the alignment setup.
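A minimal sketch of that cut-over, assuming a hypothetical `SMALL_COPY_THRESHOLD` of 16 bytes. The real break-even point depends on CPU clock, bus width and wait states and would have to be measured on the target; the function name `copy_with_cutover` is likewise illustrative, and its return value only reports which path was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-over point; must be tuned per target. */
#define SMALL_COPY_THRESHOLD 16

/* Returns 0 if the plain byte loop was used, 1 if the alignment path was. */
static int copy_with_cutover(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if (n < SMALL_COPY_THRESHOLD) {
        /* Small copy: skip all alignment setup. */
        while (n--)
            *d++ = *s++;
        return 0;
    }

    /* Large copy: align dest (at most 3 bytes, n >= 16 so no underflow). */
    while (((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Long-word moves only if src ended up aligned as well. */
    if (((uintptr_t)s & 3u) == 0) {
        for (; n >= 4; n -= 4, d += 4, s += 4)
            memcpy(d, s, 4);  /* stands in for one move.l */
    }
    /* Tail, or everything left when src was misaligned. */
    while (n--)
        *d++ = *s++;
    return 1;
}
```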

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agrier@poofygoof.com

