This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction


On Fri, Jun 07, 2013 at 09:37:22PM +0800, Ling Ma wrote:
> Hi Ondra,
> If we prefer backward copy, it will cause false memory dependences
> and hurt our performance, as we mentioned above.
> 
> Today we introduce libmicro-0.4.2 (https://java.net/projects/libmicro/)
> and it can help us to measure performance more precisely.
> 
> Based on the results we changed the code and got better performance, as
> compare.html shows (memcpy-avx2.S execution time is on the right, that
> of memcpy_new.s is on the left). Anyone who has a Haswell machine can
> test as below:
> 1) tar xjvf libmicro-memcpy.tar.bz2
> 2) cd libmicro-memcpy
> 3) make clean; make
> 4) ./memcpy-test-avx2-bench &>memcpy-avx2-output (result from memcpy-avx2.S)
> 5) ./memcpy-test-new-bench &>memcpy-new-output (result from memcpy_new.s)
> 6) ./multiview memcpy-new-output memcpy-avx2-output >compare.html
> (memcpy_new.s result is on the left, memcpy-avx2.S result is on the
> right)
> The compare.html shows the comparison result.
> Tomorrow we will try to use VTune, then send out the comparison results
> if time is available.
>
That benchmark is wrong in several ways.

First, it does not randomize sizes in any way. With a fixed size every
branch is predicted correctly, and as branch misprediction can account
for 20% of the time, the results you get can be 20% off.
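
A minimal sketch of what randomized sizes could look like. The helper
names (xorshift32, fill_random_sizes) are made up for illustration and
are not part of libmicro or glibc, and the uniform distribution is only
an assumption; a real workload is likely more skewed toward small sizes.
The sizes are generated before the timed loop so that the generator
itself is not measured:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift generator; the state must never be zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill sizes[0..n-1] with pseudo-random copy lengths in [1, max_size].
   The same seed gives the same sequence, so runs stay reproducible. */
void fill_random_sizes(size_t *sizes, size_t n, size_t max_size,
                       uint32_t seed)
{
    assert(max_size > 0 && seed != 0);
    for (size_t i = 0; i < n; i++)
        sizes[i] = 1 + xorshift32(&seed) % max_size;
}
```

The timed loop would then call memcpy with sizes[i] on each iteration,
so the length-dependent branches cannot be learned by the predictor.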

The same applies to alignment: it needs to be randomized, otherwise you
lose part of the performance profile. Setting the alignment with a config
variable is pointless, as it only distinguishes aligned from unaligned.
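
Randomizing alignment could be sketched as below. ALIGN_SPAN,
random_offset and copy_random_alignment are hypothetical names, and the
64-byte span is an assumption (one cache line); the caller is assumed to
allocate ALIGN_SPAN spare bytes in both buffers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_SPAN 64  /* assumption: vary offsets within one cache line */

/* Small xorshift generator; the state must never be zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Pseudo-random byte offset in [0, ALIGN_SPAN). */
size_t random_offset(uint32_t *rng)
{
    assert(*rng != 0);
    return xorshift32(rng) % ALIGN_SPAN;
}

/* Copy n bytes with independently chosen source and destination
   offsets. Both buffers must hold at least n + ALIGN_SPAN bytes. */
void copy_random_alignment(unsigned char *dst_base,
                           const unsigned char *src_base,
                           size_t n, uint32_t *rng)
{
    size_t dst_off = random_offset(rng);
    size_t src_off = random_offset(rng);
    memcpy(dst_base + dst_off, src_base + src_off, n);
}
```

Choosing the two offsets independently also covers the cases where
source and destination are misaligned relative to each other, which a
single aligned/unaligned switch cannot express.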

Then we move to the aggregation of results.
It tests a single implementation at a time, which is wrong. The runtime
of a process depends on many variables, and you introduce bias by doing
this.

For example, when you ran
./memcpy-test-avx2-bench
the CPU frequency could be 800MHz,
then in
./memcpy-test-new-bench
a governor can decide to switch to 2.5GHz, making the results of the
first run look three times worse than they are.

Any other action that you do on the computer can similarly affect these.

The proper way is to test both of them at once and randomize which one
gets selected.
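
A rough sketch of such an interleaved run, under some assumptions:
bench_interleaved, struct bench_result and now_nsec are hypothetical
names, timing uses POSIX clock_gettime with CLOCK_MONOTONIC, and each
call is timed individually for simplicity. For small sizes the timer
overhead is comparable to the copy itself, so a real harness would
probably time batches or use a cheaper counter:

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

struct bench_result {
    uint64_t calls[2];  /* how often each candidate was selected */
    uint64_t nsec[2];   /* accumulated wall-clock time per candidate */
};

/* Small xorshift generator; the state must never be zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

uint64_t now_nsec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run both candidates in one process and pick one at random on every
   iteration, so frequency changes and other noise hit both equally. */
struct bench_result bench_interleaved(copy_fn impl[2], void *dst,
                                      const void *src, size_t n,
                                      size_t iterations, uint32_t seed)
{
    struct bench_result r = {{0, 0}, {0, 0}};
    assert(seed != 0);
    for (size_t i = 0; i < iterations; i++) {
        unsigned pick = (xorshift32(&seed) >> 16) & 1u;
        uint64_t start = now_nsec();
        impl[pick](dst, src, n);
        r.nsec[pick] += now_nsec() - start;
        r.calls[pick]++;
    }
    return r;
}
```

The per-candidate averages are then nsec[k] / calls[k]. In a full
harness the size and alignment would also be drawn at random on each
iteration rather than fixed as they are here.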

Please post comparison with all those issues fixed.

