This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- From: Ling Ma <ling dot ma dot program at gmail dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org, Ling <ling dot ml at alibaba-inc dot com>, hongjiu dot lu at intel dot com
- Date: Thu, 6 Jun 2013 20:11:15 +0800
- Subject: Re: [PATCH 2/2] Improve 64bit memcpy/memmove for Corei7 with avx2 instruction
- References: <1370424188-4259-1-git-send-email-ling dot ml at alibaba-inc dot com> <20130605121816 dot GA11269 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dMiD=_Qf1EJ=F3hfyQDtQubDEC5pjpXKDCHrUQwhr=vzg at mail dot gmail dot com> <20130605161954 dot GA26401 at domone dot kolej dot mff dot cuni dot cz> <CAOGi=dPWPaX5prcL-uAaqS6=_ehzKeBmAFMdwV6aU34jZ0eHtQ at mail dot gmail dot com> <20130606125511 dot GA28565 at domone dot kolej dot mff dot cuni dot cz>
(Resending from this email address to keep the mail thread consistent.)
Hi Ondra,
Thanks for your correction!
Until today I had always used test-memcpy.c from glibc to check and compare
performance; based on it we found the best result and sent
out our patch. Should we now discard it?
I will soon test those functions with your profiler and against other released versions.
If I was wrong, please correct me.
Thanks
Ling
2013/6/6, Ondřej Bílka <neleai@seznam.cz>:
> On Thu, Jun 06, 2013 at 06:07:51PM +0800, Ling Ma wrote:
>> Hi Ondra
>> I attached results as below:
>> 1) gcc-test-memcpy-output: it compares memcpy files including your
>> memcpy_new, memcpy_sse2_unaligned, memcpy_ssse3_back, memcpy_ssse3,
>> memcpy_vzeroupper_avx2(I added vzeroupper instruction to memcpy_avx2),
>> and memcpy_avx2. the format is from gcc test-memcpy.c
>>
>> 2) results-no-vzeroupper.tar.bz2,: it outputs comparison results
>> including memcpy_avx2 without vzeroupper as you suggested.
>>
>> 3) results-vzeroupper.tar.bz2,: it outputs comparison results
>> including memcpy_vzeroupper_avx2 as you suggested.
>>
>> Any questions please let me know.
>>
> These results show that your patch is 35% slower on the gcc workload; see the
> following line.
>
> Time ratio to fastest:
> memcpy_glibc: 134.517062%  memcpy_new_small: 100.000000%
> memcpy_new: 101.120206%  __memcpy_avx2: 136.926079%
>
> It is about the same as the old glibc version, because its header has a big
> overhead due to the computed gotos there.
>
> I forgot to mention that in the result.html file you can switch between two
> modes:
> byte - show results by number of bytes
> block - show results by number of aligned 16-byte blocks that need to be
> written.
>
> In results_rand/result.html, switched to block mode, I see that your code
> starts being faster from 400 blocks (6400 bytes) onward. For sizes
> that large you soon exhaust the L1 cache, and the results in
> results_rand_L2/result.html
> are much closer.
>
> It looks like the best way is to use my unaligned header and add an AVX2
> loop there.
>
> I generated my file from variant/memcpy_new.c in the benchmark with the command
> gcc-4.7 -g -O3 -fPIC -Ivariant variant/memcpy_new.c -S
> followed by some manual optimization.
>
> You could change the loop in the C file and then retest.
>
> Ondra
>