This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] Saturating using arithmetic in address calculation.
- From: OndÅej BÃlka <neleai at seznam dot cz>
- To: Paul Eggert <eggert at cs dot ucla dot edu>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, libc-alpha at sourceware dot org
- Date: Sat, 26 Oct 2013 17:07:55 +0200
- Subject: [RFC] Saturating using arithmetic in address calculation.
- Authentication-results: sourceware.org; auth=none
- References: <20131025084008 dot GA17443 at domone> <526A319B dot 80909 at suse dot com> <Pine dot LNX dot 4 dot 64 dot 1310251530510 dot 12877 at digraph dot polyomino dot org dot uk> <526A92D7 dot 6040501 at cs dot ucla dot edu>
On Fri, Oct 25, 2013 at 08:48:39AM -0700, Paul Eggert wrote:
> Joseph S. Myers wrote:
> > Note that, for efficiency, in such an (inline / macro) wrapper you want to
> > distinguish the argument counting the number of elements (not constant)
> > and the argument counting the size of an element (almost always constant,
> > so SIZE_MAX / size gets folded to a constant) - and make sure each call
> > passes the arguments in the right order, which may not always be the order
> > the existing multiplication is written in.
>
> An example of exactly that is the xnmalloc function in gnulib's lib/xalloc.h,
> which uses the macro xalloc_oversized(N, S) to detect arithmetic overflow
> efficiently when allocating an array of N objects, each of size S.
> Please feel free to adapt or copy that code into glibc.
>
> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc.h
> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc-oversized.h
After pondering that a bit I come to that what is really needed is
computing size in saturated arithmetic. Overhead is only a jump on
overflow so we save at least one comparison instruction.
Also this would handle malloc (x * y + z)
Also there would be less branching needed as in overflow case we call
malloc (-1) which fails.
For that performance a assembly version/gcc builtin would be needed. A
portable implementation would have comparable performance with manual
checks.
Example how would look like is this:
diff --git a/string/strcoll_l.c b/string/strcoll_l.c
index 4ee101a..76d0d9f 100644
--- a/string/strcoll_l.c
+++ b/string/strcoll_l.c
@@ -524,17 +524,11 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
memset (&seq1, 0, sizeof (seq1));
seq2 = seq1;
- size_t size_max = SIZE_MAX / (sizeof (int32_t) + 1);
+ size_t size_needed = mul_s (s1len + s2len, sizeof (int32_t) + 1);
- if (MIN (s1len, s2len) > size_max
- || MAX (s1len, s2len) > size_max - MIN (s1len, s2len))
+ if (!__libc_use_alloca (size_needed))
{
- /* If the strings are long enough to cause overflow in the size request,
- then skip the allocation and proceed with the non-cached routines. */
- }
- else if (! __libc_use_alloca ((s1len + s2len) * (sizeof (int32_t) + 1)))
- {
- seq1.idxarr = (int32_t *) malloc ((s1len + s2len) * (sizeof (int32_t) + 1));
+ seq1.idxarr = (int32_t *) malloc (size_needed);
/* If we failed to allocate memory, we leave everything as NULL so that
we use the nocache version of traversal and comparison functions. */