This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Ling Ma <ling dot ma dot program at gmail dot com>
- Cc: Nix <nix at esperi dot org dot uk>, libc-alpha at sourceware dot org, hongjiu dot lu at intel dot com
- Date: Fri, 7 Jun 2013 20:45:50 +0200
- Subject: Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- References: <CAOGi=dMiD=_Qf1EJ=F3hfyQDtQubDEC5pjpXKDCHrUQwhr=vzg at mail dot gmail dot com> <20130605161954 dot GA26401 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPWPaX5prcL-uAaqS6=_ehzKeBmAFMdwV6aU34jZ0eHtQ at mail dot gmail dot com> <20130606125511 dot GA28565 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPs9geCtrWhU1L_0DEfOWOknpzFSLmYs4gbYzGX8Zn5Hg at mail dot gmail dot com> <20130607104613 dot GA6343 at domone dot kolej dot mff dot cuni dot cz> <8761xqru5w dot fsf at spindle dot srvr dot nix> <CAOGi=dMV5jaS2597cksd0mW84UDd06SovsBkL5=WPez-jZWg4g at mail dot gmail dot com> <20130607160749 dot GA28961 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dP2s4k2rg8TdKwj6V9-VzbOORGzeBmh-G=Fr1eM_OyDoA at mail dot gmail dot com>
On Sat, Jun 08, 2013 at 12:12:56AM +0800, Ling Ma wrote:
> > First it does not randomize size in any way. This will cause branches to
> > be predicted and as branch prediction can account to 20% of time results
> > you get will be 20% off.
> Ling: Because "A widely held rule of thumb is that a program spends
> 90% of its execution time in only 10% of the code", so hardware
> implemented branch prediction mechanism, stable pattern history
> provide benchmark(SPEC 2000) with average 95% correct prediction,
> fully random code will make it useless.
>
And are you sure that it is relevant for memcpy? Compile and run the
simple program below:
gcc -fPIC -shared memcpy.c -o memcpy.so
LD_PRELOAD=./memcpy.so bash 2> memcpy_input
It records the alignment and size of every memcpy call made in that
shell, so you can see how random they are.
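To look at the recorded data, the log can be tallied into power-of-two
size buckets. The following is only a sketch: the names size_bucket,
parse_line and tally are made up for this illustration, and it assumes
the log lines have the form "memcpy:<dest> dest <src> src <size> size"
as printed by the program at the end of this mail. Lines that do not
match (the shell's own stderr output lands in the same file) are
skipped.

```c
#include <stdio.h>
#include <stddef.h>

#define NBUCKETS 32

/* Bucket index = number of significant bits in N:
   0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, and so on.  */
static int size_bucket(size_t n)
{
  int k = 0;
  while (n != 0 && k < NBUCKETS - 1) {
    n >>= 1;
    k++;
  }
  return k;
}

/* Parse one log line.  Returns 1 and fills the outputs on success,
   0 if the line is not a memcpy record.  */
static int parse_line(const char *line, int *dest, int *src, size_t *size)
{
  return sscanf(line, "memcpy:%d dest %d src %zu size",
                dest, src, size) == 3;
}

/* Read the log from IN and count calls per size bucket.
   Returns the number of memcpy records found.  */
static long tally(FILE *in, long counts[NBUCKETS])
{
  char line[256];
  int dest, src;
  size_t size;
  long parsed = 0;

  while (fgets(line, sizeof line, in) != NULL) {
    if (!parse_line(line, &dest, &src, &size))
      continue;               /* unrelated stderr output */
    counts[size_bucket(size)]++;
    parsed++;
  }
  return parsed;
}
```

Calling tally on the opened memcpy_input file with a zeroed counts
array and printing the array gives a rough picture of how the sizes
are spread; the same approach works for the dest and src alignments.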
> > For example as you ran
> > ./memcpy-test-avx2-bench
> > cpu frequency could be 800MHz
> > then in
> > ./memcpy-test-new-bench
> > a governor can decide to switch to 2.5GHz making results above three
> > times worse than they are.
> Ling: I can confirm it is not issue in my compare.html, but like to
> send out double-check result.
>
> Ondra, if we can test real benchmark, that will more approximate our
> real world usage. So some people know good memcpy benchmarks which
> represent the real world applications, and could you please tell us ?
>
The one I posted.
> Thanks & Best Regards
> Ling
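#include <stdio.h>
#include <stdint.h>
#undef memcpy
/* Byte-by-byte memcpy that logs the alignment (mod 64) of dest and src
   and the size of every call to stderr.  Build without optimization so
   the compiler does not turn the loop back into a memcpy call.  */
void *memcpy(void *_x, const void *_y, size_t n){
  char *x = _x;
  const char *y = _y;
  size_t i;
  for (i = 0; i < n; i++){
    x[i] = y[i];
  }
  fprintf(stderr, "memcpy:%i dest %i src %zu size\n",
          (int)((uintptr_t)_x % 64), (int)((uintptr_t)_y % 64), n);
  return x;
}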
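
On the benchmark side, the point about randomizing sizes could look
roughly like the sketch below. It is not taken from any posted
benchmark: struct request, make_requests and run_requests are
hypothetical names, and drawing sizes uniformly from [0, max_size] is
an assumption made for brevity. Replaying sizes and alignments from a
recorded log would be closer to real usage.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One memcpy call: offsets into the buffers and the number of bytes.  */
struct request { size_t dst_off, src_off, size; };

/* Fill REQ with COUNT requests.  Offsets are uniform in [0, 63] and
   sizes uniform in [0, max_size]; the uniform distribution is an
   assumption, not a model of any real workload.  */
static void make_requests(struct request *req, size_t count,
                          size_t max_size, unsigned seed)
{
  srand(seed);
  for (size_t i = 0; i < count; i++) {
    req[i].dst_off = (size_t)rand() % 64;
    req[i].src_off = (size_t)rand() % 64;
    req[i].size = (size_t)rand() % (max_size + 1);
  }
}

/* Perform every request once and return the elapsed CPU time in
   seconds.  DST and SRC must each hold at least max_size + 64 bytes
   and must not overlap.  */
static double run_requests(char *dst, const char *src,
                           const struct request *req, size_t count)
{
  clock_t start = clock();
  for (size_t i = 0; i < count; i++)
    memcpy(dst + req[i].dst_off, src + req[i].src_off, req[i].size);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Generating the requests before the timed loop keeps the cost of rand()
out of the measurement, while the varying sizes keep the branches in
memcpy from being trivially predicted.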