memcpy performance (fwd)
Joel Sherrill
joel@OARcorp.com
Tue Dec 9 14:20:00 GMT 1997
Watch out for those performance minded RTEMS users. You will hear about a
wasted cycle for sure. :)
Here is Eric's feedback on what toolset/arguments he was using.
FYI he ported the KA9Q and Linux TCP/IP stacks to RTEMS, the FP
trap code required for the 68040, implemented the termios console
support, and wrote the 68360 BSP. He is pretty swift. :)
--joel
---------- Forwarded message ----------
Date: Tue, 9 Dec 97 16:09:55 -0600
From: Eric Norum <eric@skatter.usask.ca>
To: Joel Sherrill <joel@OARcorp.com>
Subject: Re: memcpy performance
You wrote:
> What args did you give to gcc for the case you reported on the
> list? One of the new Cygnus newlib maintainers wants to know. And
> before they ask, what version of gcc are you using?
>
> I am getting pretty good responses this week from the Cygnus side of
> the world.
m68k-rtems-gcc --version
egcs-2.90.04 970901 (gcc2-970802 experimental)
Here's how memcpy.c gets compiled.
/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/gcc/xgcc
-B/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/gcc/
-idirafter
/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/m68k-rtems/newlib/targ-include
-idirafter
/shareNeXT/OS4.2/RTEMS/src/tools-970904/src/newlib/libc/include
-nostdinc -O2 -g -pipe -m68332 -O2 -DHAVE_GETTIMEOFDAY
-DMALLOC_PROVIDED -DEXIT_PROVIDED -DMISSING_SYSCALL_NAMES
-DSIGNAL_PROVIDED -DREENTRANT_SYSCALLS_PROVIDED -fno-builtin
-I/shareNeXT/OS4.2/RTEMS/src/tools-970904/build-m68k-tools/m68k-rtems/newlib/./targ-include
-I/shareNeXT/OS4.2/RTEMS/src/tools-970904/src/newlib/./libc/include
-c ../../../../../../src/newlib/libc/string/memcpy.c
This produces the 5-instruction/byte copy:
0xe2ea <memcpy+22>: moveb %a1@+,%a0@+
0xe2ec <memcpy+24>: movel %d1,%d0
0xe2ee <memcpy+26>: subql #1,%d1
0xe2f0 <memcpy+28>: tstl %d0
0xe2f2 <memcpy+30>: bnes 0xe2ea <memcpy+22>
Changing the memcpy source to:
    if (len) {
        do {
            *ap++ = *bp++;
        } while (--len);
    }
improves the loop to:
.L9:
move.b (%a0)+,(%a1)+
subq.l #1,%d0
jbne .L9
No loop mode, but certainly a lot faster!
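For anyone who wants to compare the two loop shapes side by side, here is a minimal sketch. The first function is only a guess at the original byte loop: the post-decrement-and-test form `while (len--)` is an assumption inferred from the movel/subql/tstl sequence in the disassembly, not a quote from the newlib source. The second function wraps the rewritten loop shown above. The names copy_while and copy_do_while are hypothetical, and the code any given compiler version emits for them may differ from the listings in this message.

```c
#include <stddef.h>

/* Assumed shape of the original loop: the old value of len is tested
 * after the decrement, so the compiler may keep a copy of it around. */
void *copy_while(void *dst, const void *src, size_t len)
{
    char *ap = dst;
    const char *bp = src;

    while (len--) {
        *ap++ = *bp++;
    }
    return dst;
}

/* Rewritten loop from this message: the zero-length case is checked
 * once up front, then the decremented value itself is tested. */
void *copy_do_while(void *dst, const void *src, size_t len)
{
    char *ap = dst;
    const char *bp = src;

    if (len) {
        do {
            *ap++ = *bp++;
        } while (--len);
    }
    return dst;
}
```

Both functions copy the same bytes for any len, including zero, so the change affects only the generated loop and not the behaviour.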
The `memcpy turns into bcopy which calls memmove' problem is
caused by the way the compiler was built. The
-DTARGET_MEM_FUNCTIONS=1 flag should be used (or set up when the
compiler is configured). Perhaps this change could make it into the
next tools distribution.
---
Eric Norum eric@skatter.usask.ca
Saskatchewan Accelerator Laboratory Phone: (306) 966-6308
University of Saskatchewan FAX: (306) 966-6058
Saskatoon, Canada.