This is the mail archive of the libc-ports@sources.redhat.com mailing list for the libc-ports project.


Re: [PATCH] Optimize MIPS memcpy


On 5/09/2012, at 3:09 AM, Steve Ellcey wrote:

> On Mon, 2012-09-03 at 02:12 -0700, Andrew T Pinski wrote:
>> Forgot to CC libc-ports@ .
>> On Sat, 2012-09-01 at 18:15 +1200, Maxim Kuvyrkov wrote:
>>> This patch improves the MIPS assembly implementations of memcpy.  Two optimizations are added:
>>> prefetching of data for subsequent iterations of the memcpy loop and pipelined expansion of
>>> unaligned memcpy.  These optimizations speed up MIPS memcpy by about 10%.
>>> 
>>> The prefetching part is straightforward: it adds prefetching of a cache line (32 bytes) for the +1
>>> iteration in the unaligned case and the +2 iteration in the aligned case.  The rationale here is
>>> that a prefetch takes about as long to acquire data as 1 iteration of the unaligned loop or
>>> 2 iterations of the aligned loop.  Values for these parameters were tuned on a modern MIPS processor.
>>> 
>> 
>> This might hurt Octeon as the cache line size there is 128 bytes.  Can
>> you say which modern MIPS processor this has been tuned on?  And is
>> there a way to avoid hard-coding 32 in the assembly and use a macro
>> instead?
>> 
>> Thanks,
>> Andrew Pinski
> 
> I've been looking at the MIPS memcpy and was planning on submitting a
> new version based on the one that MIPS submitted to Android.  It has
> prefetching like Maxim's though I found that using the load and 'prepare
> for store' hints instead of 'load streaming' and 'store streaming' hints
> gave me better results on the 74k and 24k that I did performance testing
> on.

I didn't experiment with various prefetching hints, so this very well may be the case.
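
To make the knobs under discussion concrete, here is a minimal C sketch (not taken from either patch) of a copy loop with the cache line size and prefetch distance pulled out into macros.  The names CACHE_LINE_SIZE, PREFETCH_LINES_AHEAD and sketch_memcpy are hypothetical, and __builtin_prefetch is the GCC/Clang builtin; how its rw/locality arguments map onto the MIPS 'pref' hints is up to the compiler, so treat the hint choice here as an assumption rather than a recommendation.

```c
#include <stddef.h>

/* Hypothetical tunables, not from either patch.  32 matches the value
   discussed in this thread; a 128-byte-line core would override it,
   e.g. with -DCACHE_LINE_SIZE=128. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 32
#endif

/* How many cache lines ahead of the current one to prefetch
   (assumed value, mirroring the "+2 iteration" aligned case). */
#ifndef PREFETCH_LINES_AHEAD
#define PREFETCH_LINES_AHEAD 2
#endif

/* Illustrative only: copies one cache line per iteration, byte by byte,
   and prefetches a line that is still inside both buffers. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t ahead = (size_t)CACHE_LINE_SIZE * PREFETCH_LINES_AHEAD;

    while (n >= CACHE_LINE_SIZE) {
        /* Only prefetch when the whole target line lies within the
           remaining n bytes, so we never touch memory past the end. */
        if (n >= ahead + CACHE_LINE_SIZE) {
            __builtin_prefetch(s + ahead, 0, 0);  /* read, low temporal locality */
            __builtin_prefetch(d + ahead, 1, 0);  /* write, low temporal locality */
        }
        for (size_t i = 0; i < CACHE_LINE_SIZE; i++)
            d[i] = s[i];
        d += CACHE_LINE_SIZE;
        s += CACHE_LINE_SIZE;
        n -= CACHE_LINE_SIZE;
    }

    /* Tail: fewer than CACHE_LINE_SIZE bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Changing the last argument of __builtin_prefetch is the portable analogue of trying different prefetch hints; the real assembly would pick the 'pref' hint directly and would copy words rather than bytes.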

> 
> This version has more unrolling too and between that and the hints
> difference I got a small performance improvement over Maxim's version
> when doing small memcpy's and a fairly substantial improvement on large
> memcpy's.
> 
> I also merged the 32 and 64 bit versions together so we would only have
> one copy to maintain.  I haven't tried building it as part of glibc yet,
> I have been testing it standalone first and was going to try and
> integrate it into glibc and submit it this week or next.  I'll attach it
> to this email so folks can look at it and I will see if I can
> parameterize the cache line size.  This one also assumes a 32 byte cache
> prefetch.
> 

Your version looks quite good.  If you could wrap it up into a glibc patch I would test it on our setup to confirm that it indeed provides better performance.

Thanks,

--
Maxim Kuvyrkov
Mentor Graphics

