This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.

From: Ondřej Bílka <neleai at seznam dot cz>
To: Paul Eggert <eggert at cs dot ucla dot edu>
Cc: libc-alpha at sourceware dot org
Date: Fri, 1 Nov 2013 18:58:02 +0100
Subject: Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
Authentication-results: sourceware.org; auth=none
References: <20131030174502 dot GA18107 at domone dot podge> <Pine dot LNX dot 4 dot 64 dot 1310301749400 dot 22878 at digraph dot polyomino dot org dot uk> <20131030183318 dot GA18706 at domone dot podge> <20131101133126 dot GA2546 at domone dot podge> <5273E29D dot 90000 at cs dot ucla dot edu>

On Fri, Nov 01, 2013 at 10:19:25AM -0700, Paul Eggert wrote:
> Thanks for looking into this.
> 
> I agree with earlier comments that we care about overall
> performance not just the individual ops, that optimizing
> for code size is probably best unless we get significant numbers
> suggesting otherwise, and that it may be time to ask the GCC folks
> for help with fast saturated arithmetic ops.  Some other suggestions:
> 
I already asked, but adding them will take time, and the benefits from
a builtin are small. When 

> Stick with inline functions not macros, and use lower-case names since they're
> functions.
>

> If you like tuning this stuff you might want to look at
> <http://locklessinc.com/articles/sat_arithmetic/>, which
> shows how to do saturated arithmetic without jumps, both portably
> and on x86-64; I don't know whether this will save code space, though.

On Sandy Bridge my implementation runs in:

real	0m0.548s
user	0m0.548s
sys	0m0.000s

When I replace the multiplication with the one from the article, it is slower:

real	0m0.599s
user	0m0.599s
sys	0m0.000s

I got a similar slowdown on Core 2, Nehalem and fx10 machines.

This article is an example of misapplying the rule of eliminating branches
whenever possible. That rule holds only when the branch is mispredicted at
least 5% of the time. See http://yarchive.net/comp/linux/cmov.html
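
To make the comparison concrete, here is a minimal sketch of the two styles
being discussed. It is neither the code from the patch nor the exact code from
the article; the function names and the choice of 32-bit operands are
assumptions made for illustration. Both functions return the product, clamped
to UINT32_MAX on overflow. A compiler is free to turn either form into the
other, so the source form only hints at the generated code.

```c
#include <stdint.h>

/* Branching style: the overflow check is a jump that is almost never
   taken, so the branch predictor handles it well.  */
static inline uint32_t
sat_mul_branch (uint32_t x, uint32_t y)
{
  uint64_t wide = (uint64_t) x * y;
  if (wide > UINT32_MAX)
    return UINT32_MAX;
  return (uint32_t) wide;
}

/* Branchless style: build an all-ones mask when the high half is
   nonzero and OR it into the low half.  */
static inline uint32_t
sat_mul_branchless (uint32_t x, uint32_t y)
{
  uint64_t wide = (uint64_t) x * y;
  uint32_t hi = (uint32_t) (wide >> 32);
  uint32_t lo = (uint32_t) wide;
  /* (hi != 0) is 0 or 1; negating it as unsigned gives 0 or 0xFFFFFFFF.  */
  return lo | -(uint32_t) (hi != 0);
}
```

The branchless form always pays for the mask computation, while the branching
form pays only when the rarely taken overflow path is mispredicted.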

As far as code size is concerned, my assembly has 8 extra bytes (jump 2, xor 3, neg 3).
If I use the sbb trick from the article, I could decrease that to 5.

The article's code is 6 bytes per operation (sbb 3, or 3).

A comparison is 10 bytes (mov constant,reg 7, cmp 3).

Follow-Ups:
- Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
  - From: Ondřej Bílka
- Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
  - From: Paul Eggert

References:
- [PATCH v2.0] Use saturated arithmetic for overflow detection.
  - From: Ondřej Bílka
- Re: [PATCH v2.0] Use saturated arithmetic for overflow detection.
  - From: Paul Eggert
