This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.


On Fri, Nov 01, 2013 at 01:44:17PM -0700, Paul Eggert wrote:
> On 11/01/2013 10:58 AM, Ondřej Bílka wrote:
> 
> > I got similar slowdown on core2, nehalem and fx10 machines.
> 
> Conversely, I saw a 2x speedup on my platform, an AMD
> Deneb (Phenom II X4 910e):
> 
>    $ gcc -O2 assembly.c && time ./a.out
> 
>    real    0m2.096s
>    user    0m2.095s
>    sys    0m0.001s
>    $ gcc -O2 branchfree.c && time ./a.out
> 
>    real    0m1.057s
>    user    0m1.054s
>    sys    0m0.002s
>
Weird, as I cannot reproduce these numbers on an Athlon X2 or a Phenom X6. One
iteration takes 2.096 * 2600 / 340 = 16 cycles, so the slowdown is about 8 cycles
per iteration, which is hard to explain.
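
The arithmetic above can be sketched as a small helper. This assumes that 2600
is the clock rate in MHz and 340 is the iteration count in millions; neither is
stated explicitly, and the function name is hypothetical.

```c
#include <assert.h>

/* Cycles per iteration = seconds * clock (MHz) / iterations (millions).
   The units are an assumption; the mail only gives the bare numbers. */
static double cycles_per_iteration(double seconds, double clock_mhz,
                                   double iterations_millions)
{
    assert(iterations_millions > 0.0);
    return seconds * clock_mhz / iterations_millions;
}
```

With the quoted timings, cycles_per_iteration(2.096, 2600, 340) is about 16 and
cycles_per_iteration(1.057, 2600, 340) is about 8, so the gap is about 8 cycles.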

I attached binaries which were used to test (gcc version 4.4.5 (Debian 4.4.5-8))

Of the possible causes, all seem unlikely. The binary runs too long for
frequency switching to matter. A branch is slower when it jumps close to the
end of a 16-byte boundary, so I tried moving the loop byte by byte, but
performance stayed roughly the same. What is left is a branch cache conflict.
> 
> > As code size is concerned my assembly has 8 extra bytes
> > (jump 2, xor 3, neg 3).  When I use sbb trick from article
> > I could decrease that to 5.
> 

> 5 bytes more than what we're doing now,
> or 5 bytes more than the branchfree version?
> I'm worried about code bloat compared to
> what we're doing now.

No, the overhead is relative to the version with no checking.
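
For readers without the attachments, here is a rough C sketch of the two shapes
being compared. It is not the code from assembly.c or branchfree.c, which is not
shown in this message. It assumes the checked operation is an unsigned size
multiplication that saturates to SIZE_MAX, and it uses __builtin_mul_overflow,
which needs GCC 5 or later (or Clang) and so is not available in the gcc 4.4.5
used for the attached binaries. The function names are hypothetical, and whether
the compiler emits a jump or an sbb-style mask for either one is up to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching shape: test the overflow flag, jump to the saturated result. */
static size_t mul_sat_branch(size_t a, size_t b)
{
    size_t r;
    if (__builtin_mul_overflow(a, b, &r))
        return SIZE_MAX;
    return r;
}

/* Mask shape: turn the overflow flag into 0 or all-ones and OR it in. */
static size_t mul_sat_mask(size_t a, size_t b)
{
    size_t r;
    size_t overflowed = __builtin_mul_overflow(a, b, &r);
    return r | ((size_t)0 - overflowed);
}

/* Sanity check that both shapes agree on a normal and an overflowing case. */
static void mul_sat_self_check(void)
{
    assert(mul_sat_branch(6, 7) == 42);
    assert(mul_sat_mask(6, 7) == 42);
    assert(mul_sat_branch(SIZE_MAX, 2) == SIZE_MAX);
    assert(mul_sat_mask(SIZE_MAX, 2) == SIZE_MAX);
}
```

A saturated result of SIZE_MAX can then be passed straight to an allocator,
which is expected to fail for a request of that size.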

Attachment: assembly
Description: Binary data

Attachment: branchfree
Description: Binary data

