This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Split mantissa calculation loop and add branchprediction to mp multiplication


On Mon, 2012-12-31 at 14:58 +0530, Siddhesh Poyarekar wrote:
> Hi,
> 
> The loop inside __mul in mpa.c is slow because of the branch
> evaluation within the loop.  Attached patch splits the loop to make
> the loop simpler to evaluate and also adds some branch prediction to
> improve performance.  I have verified that this does not break the
> testsuite.
> 
> I checked for performance improvement with this patch using the test
> program here:
> 
> http://entropymine.com/imageworsener/slowpow/
> 
> with the following commandline:
> 
> time ./powtest 1000000 1.0000000000000020 1.5000000000500000
> 
> Taking into account the best times with and without the patch (with
> more than 10 runs each), the improvement is about a quarter of a
> second.
> 
> OK to commit?
> 
> Siddhesh
> 
> ChangeLog:
> 
> 	* sysdeps/ieee754/dbl-64/mpa.c (__mul): Split mantissa
> 	calculation loop and add branch prediction.
> 
> diff --git a/sysdeps/ieee754/dbl-64/mpa.c b/sysdeps/ieee754/dbl-64/mpa.c
> index 78afd99..278fdb6 100644
> --- a/sysdeps/ieee754/dbl-64/mpa.c
> +++ b/sysdeps/ieee754/dbl-64/mpa.c
> @@ -447,33 +447,51 @@ void
>  SECTION
>  __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
> 
> -  int i, i1, i2, j, k, k2;
> +  int i, j, k, k2;
>    double u;
> 
> -		      /* Is z=0? */
> -  if (X[0]*Y[0]==ZERO)
> -     { Z[0]=ZERO;  return; }
> -
> -		       /* Multiply, add and carry */
> -  k2 = (p<3) ? p+p : p+3;
> -  Z[k2]=ZERO;
> -  for (k=k2; k>1; ) {
> -    if (k > p)  {i1=k-p; i2=p+1; }
> -    else        {i1=1;   i2=k;   }
> -    for (i=i1,j=i2-1; i<i2; i++,j--)  Z[k] += X[i]*Y[j];
> -
> -    u = (Z[k] + CUTTER)-CUTTER;
> -    if  (u > Z[k])  u -= RADIX;
> -    Z[k]  -= u;
> -    Z[--k] = u*RADIXI;
> -  }
> +  /* Is z=0?  */
> +  if (__glibc_unlikely (X[0] * Y[0] == ZERO))
> +    {
> +      Z[0]=ZERO;
> +      return;
> +    }
> 
> -		 /* Is there a carry beyond the most significant digit? */
> -  if (Z[1] == ZERO) {
> -    for (i=1; i<=p; i++)  Z[i]=Z[i+1];
> -    EZ = EX + EY - 1; }
> -  else
> -    EZ = EX + EY;
> +  /* Multiply, add and carry */
> +  k2 = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
> +  Z[k2] = ZERO;
> +
> +  for (k = k2; k > p; )
> +    {
> +      for (i = k - p, j = p; i < p + 1; i++, j--)
> +	Z[k] += X[i] * Y[j];
> +
> +      u = (Z[k] + CUTTER)-CUTTER;
> +      if  (u > Z[k])  u -= RADIX;
> +      Z[k]  -= u;
> +      Z[--k] = u * RADIXI;
> +    }
> +
> +  for (; k > 1; )
> +    {
> +      for (i = 1,j = k - 1; i < k; i++, j--)
> +	Z[k] += X[i] * Y[j];
> +
> +      u = (Z[k] + CUTTER)-CUTTER;
> +      if  (u > Z[k])
> +	u -= RADIX;
> +      Z[k]  -= u;
> +      Z[--k] = u*RADIXI;
> +    }
> +
I do not understand what you are doing here. If the intent is to replace
the X[], Y[], Z[] doubles with int's you will get overflows in Z[] if
you are changing X[], y[]. Z[] with uint64_t then you avoid the
overflows but (Z[k] + CUTTER)-CUTTER has no effect and you have not
saved any space. Also u is still a double, so you are adding some
expensive int->float->int converts to the inter loop. 

Have you really thought this through?

> +  EZ = EX + EY;
> +  /* Is there a carry beyond the most significant digit? */
> +  if (__glibc_unlikely (Z[1] == ZERO))
> +    {
> +      for (i = 1; i <= p; i++)
> +	Z[i] = Z[i+1];
> +      EZ--;
> +    }
> 
>    Z[0] = X[0] * Y[0];
>  }
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]