This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[RFC] Saturating using arithmetic in address calculation.


On Fri, Oct 25, 2013 at 08:48:39AM -0700, Paul Eggert wrote:
> Joseph S. Myers wrote:
> > Note that, for efficiency, in such an (inline / macro) wrapper you want to 
> > distinguish the argument counting the number of elements (not constant) 
> > and the argument counting the size of an element (almost always constant, 
> > so SIZE_MAX / size gets folded to a constant) - and make sure each call 
> > passes the arguments in the right order, which may not always be the order 
> > the existing multiplication is written in.
> 
> An example of exactly that is the xnmalloc function in gnulib's lib/xalloc.h,
> which uses the macro xalloc_oversized(N, S) to detect arithmetic overflow
> efficiently when allocating an array of N objects, each of size S.
> Please feel free to adapt or copy that code into glibc.
> 
> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc.h
> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/xalloc-oversized.h

After pondering that a bit I come to that what is really needed is
computing size in saturated arithmetic. Overhead is only a jump on
overflow so we save at least one comparison instruction.

Also this would handle malloc (x * y + z) 

Also there would be less branching needed as in overflow case we call
malloc (-1) which fails.

For that performance a assembly version/gcc builtin would be needed. A
portable implementation would have comparable performance with manual
checks.

Example how would look like is this:

diff --git a/string/strcoll_l.c b/string/strcoll_l.c
index 4ee101a..76d0d9f 100644
--- a/string/strcoll_l.c
+++ b/string/strcoll_l.c
@@ -524,17 +524,11 @@ STRCOLL (const STRING_TYPE *s1, const STRING_TYPE *s2, __locale_t l)
   memset (&seq1, 0, sizeof (seq1));
   seq2 = seq1;
 
-  size_t size_max = SIZE_MAX / (sizeof (int32_t) + 1);
+  size_t size_needed = mul_s (s1len + s2len, sizeof (int32_t) + 1);
 
-  if (MIN (s1len, s2len) > size_max
-      || MAX (s1len, s2len) > size_max - MIN (s1len, s2len))
+  if (!__libc_use_alloca (size_needed))
     {
-      /* If the strings are long enough to cause overflow in the size request,
-         then skip the allocation and proceed with the non-cached routines.  */
-    }
-  else if (! __libc_use_alloca ((s1len + s2len) * (sizeof (int32_t) + 1)))
-    {
-      seq1.idxarr = (int32_t *) malloc ((s1len + s2len) * (sizeof (int32_t) + 1));
+      seq1.idxarr = (int32_t *) malloc (size_needed);
 
       /* If we failed to allocate memory, we leave everything as NULL so that
 	 we use the nocache version of traversal and comparison functions.  */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]