This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.


On 30 August 2013 00:58, Joseph S. Myers <joseph@codesourcery.com> wrote:

Hi Joseph,

>> A small change to the entry to the aligned copy loop improves
>> performance slightly on A9 and A15 cores for certain copies.
>
> Could you clarify what you mean by "certain copies"?

Large copies (> 16 kB) where the buffers are 4-byte aligned but not
8-byte aligned. I'll respin the patch with an improved description.
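
For anyone wanting to reproduce that case, here is a rough sketch in C of
how such buffers could be set up. It is not taken from the patch or from
any glibc benchmark. It reads "4-byte aligned but not 8-byte aligned" as
both addresses being congruent to 4 modulo 8, which is an assumption, and
the 64 kB size and all function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative size, chosen only to be above the 16 kB mentioned above. */
#define COPY_SIZE ((size_t)64 * 1024)

/* Return 1 if p is 4-byte aligned but not 8-byte aligned
   (address congruent to 4 modulo 8), otherwise 0. */
int is_4_not_8_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 4u;
}

/* Return a pointer inside base whose address is congruent to 4 modulo 8.
   The result is at most 11 bytes past base, so the caller must allocate
   at least 11 spare bytes. */
unsigned char *offset_to_4_mod_8(unsigned char *base)
{
    uintptr_t addr = (uintptr_t)base;
    uintptr_t next8 = (addr + 7u) & ~(uintptr_t)7u; /* round up to 8 */
    unsigned char *p = base + (next8 - addr) + 4;
    assert(is_4_not_8_aligned(p));
    return p;
}

/* Copy COPY_SIZE bytes between two such buffers with the system memcpy.
   Returns 0 if the copy matches, 1 on mismatch, -1 if allocation fails. */
int run_case(void)
{
    unsigned char *src_block = malloc(COPY_SIZE + 16);
    unsigned char *dst_block = malloc(COPY_SIZE + 16);
    if (src_block == NULL || dst_block == NULL) {
        free(src_block);
        free(dst_block);
        return -1;
    }

    unsigned char *src = offset_to_4_mod_8(src_block);
    unsigned char *dst = offset_to_4_mod_8(dst_block);

    /* Fill the source with a simple non-constant pattern. */
    for (size_t i = 0; i < COPY_SIZE; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(dst, 0, COPY_SIZE);

    memcpy(dst, src, COPY_SIZE);
    int result = (memcmp(dst, src, COPY_SIZE) == 0) ? 0 : 1;

    free(src_block);
    free(dst_block);
    return result;
}
```

This only checks that the copy is correct for that alignment. To compare
before and after the patch, the memcpy call would need to be wrapped in a
timing loop on the target core.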

> In particular, have you verified that for all three choices in this code
> (NEON, VFP or neither), the code for unaligned copies is at least as fast
> in this case (common 32-bit alignment, but not common 64-bit alignment) as
> the code that would previously have been used in those cases?

Yes. Performance is slightly better in the NEON case and approximately
unchanged in the other two.

> There are various comments regarding alignment, whether stating "LDRD/STRD
> support unaligned word accesses" or referring to the mutual alignment that
> applies for particular code.  Does this patch make any of them out of
> date?  (If code can now only be reached with common 64-bit alignment, but
> in fact requires only 32-bit alignment, the comment should probably state
> both those things explicitly.)

I've reviewed the comments and, as far as I can tell, they all still look OK.

-- 
Will Newton
Toolchain Working Group, Linaro

