This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>, Ondřej Bílka <neleai at seznam dot cz>, Siddhesh Poyarekar <siddhesh at redhat dot com>
- Date: Fri, 30 Aug 2013 15:26:40 -0400
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <520894D5 dot 7060207 at linaro dot org> <CANu=DmiBHoymFKTvaW_VsdhWZEYwkfViz1tTeRgj7H80f0FntA at mail dot gmail dot com> <5220D30B dot 9080306 at redhat dot com> <CANu=DmiXLL9v1Z1KS0sBOs-pL8csEUGc9YE829_-tidKd-GruQ at mail dot gmail dot com>
On 08/30/2013 02:48 PM, Will Newton wrote:
> On 30 August 2013 18:14, Carlos O'Donell <carlos@redhat.com> wrote:
>
> Hi Carlos,
>
>>>> A small change to the entry to the aligned copy loop improves
>>>> performance slightly on A9 and A15 cores for certain copies.
>>>>
>>>> ports/ChangeLog.arm:
>>>>
>>>> 2013-08-07 Will Newton <will.newton@linaro.org>
>>>>
>>>> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
>>>> on entry to aligned copy loop for improved performance.
>>>> ---
>>>> ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> Ping?
>>
>> How did you test the performance?
>>
>> glibc has a performance microbenchmark, did you use that?
>
> No, I used the cortex-strings package developed by Linaro for
> benchmarking various string functions against one another[1].
>
> I haven't checked the glibc benchmarks but I'll look into that. It's
> quite a specific case that shows the problem so it may not be obvious
> which one is better however.
If it's not obvious, how is someone supposed to review this patch? :-)
> [1] https://launchpad.net/cortex-strings
There are two benchmarks. One appears to be Dhrystone 2.1, which isn't a string
test in and of itself and should not be used for benchmarking or changing
string functions. The other is called "multi" and appears to run some functions
in a loop and time them.
e.g.
http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/benchmarks/multi/harness.c
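For illustration only, a loop-and-time harness of the kind described above can
be sketched roughly as follows. This is a minimal sketch in C, not the actual
cortex-strings `multi' code or the glibc benchtests; the helper name
time_memcpy and its parameters are assumptions made up for this example.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 199309L   /* For clock_gettime.  */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of cortex-strings or glibc): time ITERS
   calls of memcpy for one (size, source offset, destination offset)
   combination.  Returns the mean time per call in nanoseconds, or -1.0
   if ITERS is zero or allocation fails.  */
double
time_memcpy (size_t size, size_t src_off, size_t dst_off, unsigned long iters)
{
  /* Pad both buffers so that the offsets stay in bounds.  */
  unsigned char *src = malloc (size + src_off + 1);
  unsigned char *dst = malloc (size + dst_off + 1);
  if (src == NULL || dst == NULL || iters == 0)
    {
      free (src);
      free (dst);
      return -1.0;
    }
  memset (src, 0x5a, size + src_off + 1);
  memset (dst, 0, size + dst_off + 1);

  /* Call through a volatile pointer so the compiler cannot inline or
     drop the repeated copies.  */
  void *(*volatile copy) (void *, const void *, size_t) = memcpy;

  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (unsigned long i = 0; i < iters; i++)
    copy (dst + dst_off, src + src_off, size);
  clock_gettime (CLOCK_MONOTONIC, &end);

  /* Sanity check: the destination must now match the source.  */
  assert (memcmp (dst + dst_off, src + src_off, size) == 0);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
	      + (end.tv_nsec - start.tv_nsec);
  free (src);
  free (dst);
  return ns / iters;
}
```

Calling something like time_memcpy (1024, 1, 3, 100000) over a range of sizes
and offsets, once per memcpy build, would give per-case numbers to compare. A
loop this simple mostly measures the hot-cache case, which is one reason such
a harness is not exhaustive.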
I would not call `multi' exhaustive. The glibc performance benchmark tests
are not exhaustive either, but they have received review from the glibc
community and are our preferred way of demonstrating performance gains when
posting performance patches.
I would really like to see you post the results of running your new
implementation with the glibc benchmark, with numbers that show it is
faster. Is that possible?
Cheers,
Carlos.