This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Paul Eggert <eggert at cs dot ucla dot edu>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 1 Nov 2013 18:58:02 +0100
- Subject: Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
- Authentication-results: sourceware.org; auth=none
- References: <20131030174502 dot GA18107 at domone dot podge> <Pine dot LNX dot 4 dot 64 dot 1310301749400 dot 22878 at digraph dot polyomino dot org dot uk> <20131030183318 dot GA18706 at domone dot podge> <20131101133126 dot GA2546 at domone dot podge> <5273E29D dot 90000 at cs dot ucla dot edu>
On Fri, Nov 01, 2013 at 10:19:25AM -0700, Paul Eggert wrote:
> Thanks for looking into this.
>
> I agree with earlier comments that we care about overall
> performance not just the individual ops, that optimizing
> for code size is probably best unless we get significant numbers
> suggesting otherwise, and that it may be time to ask the GCC folks
> for help with fast saturated arithmetic ops. Some other suggestions:
>
I already asked, but adding them will take time, and the benefit of a
builtin is small. When
> Stick with inline functions not macros, and use lower-case names since they're
> functions.
>
> If you like tuning this stuff you might want to look at
> <http://locklessinc.com/articles/sat_arithmetic/>, which
> shows how to do saturated arithmetic without jumps, both portably
> and on x86-64; I don't know whether this will save code space, though.
On Sandy Bridge my implementation runs in:
real 0m0.548s
user 0m0.548s
sys 0m0.000s
When I replace the multiplication with the one from the article, it is slower:
real 0m0.599s
user 0m0.599s
sys 0m0.000s
I got a similar slowdown on Core 2, Nehalem and fx10 machines.
The article is an example of misapplying the rule of eliminating branches
whenever possible. That rule holds only when the branch is mispredicted at
least 5% of the time. See http://yarchive.net/comp/linux/cmov.html
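To make the comparison concrete, here is a rough C sketch of the two styles
under discussion. It is only an illustration: it is neither the code from the
patch nor the code from the article, the function names are made up for this
example, and it uses 32-bit operands so the product fits in a 64-bit integer.
The compiler is also free to turn either form into the other, so inspect the
generated assembly before drawing conclusions.

```c
#include <stdint.h>

/* Hypothetical names; a sketch, not the patch under review.  */

/* Branchy add: overflow is expected to be rare, so the branch
   is almost always predicted correctly.  */
static inline uint32_t
sat_add_branch (uint32_t a, uint32_t b)
{
  uint32_t r = a + b;
  if (r < a)                    /* Wrapped around, so it overflowed.  */
    return UINT32_MAX;
  return r;
}

/* Branch-free add: turn the overflow flag into an all-ones mask
   and OR it in (this is the shape that maps to sbb/or on x86).  */
static inline uint32_t
sat_add_nobranch (uint32_t a, uint32_t b)
{
  uint32_t r = a + b;
  return r | -(uint32_t) (r < a);
}

/* Branchy multiply using a double-width product.  */
static inline uint32_t
sat_mul_branch (uint32_t a, uint32_t b)
{
  uint64_t p = (uint64_t) a * b;
  if (p > UINT32_MAX)
    return UINT32_MAX;
  return (uint32_t) p;
}

/* Branch-free multiply: a nonzero high half means overflow.  */
static inline uint32_t
sat_mul_nobranch (uint32_t a, uint32_t b)
{
  uint64_t p = (uint64_t) a * b;
  uint32_t hi = (uint32_t) (p >> 32);
  uint32_t lo = (uint32_t) p;
  return lo | -(uint32_t) (hi != 0);
}
```

The branch-free versions always pay for the extra mask computation, while
the branchy versions pay only when the branch is mispredicted, which should
be rare if overflow is rare.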
As far as code size is concerned, my assembly has 8 extra bytes (jump 2, xor 3, neg 3).
If I use the sbb trick from the article I could decrease that to 5.
The article's code is 6 bytes per instruction (sbb 3, or 3).
A comparison is 10 bytes (mov constant,reg 7, cmp 3).