This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

glibc merge sort


Hello. 
I am rewriting again about a small improvement to glibc mergesort. I am
posting to this mailing list hoping that some people are interested in
testing idea.
The idea is very simple. Currently in glibc msort, sorting an array of
size n requires allocation of an array of n elements and a final copy of
n elements. It is possible with a very simple modification to get to
allocation of only n/2 elements and final copy of only n/2 elements. 
The algorithms works as follows : allocate a temporary of size n/2. Then
recursively do the following : 
1- divide array into two halves: lower half h1 and upper half h2. h1 is
of size n/2 and h2 is of size n-n/2. 
2- Sort recursively h1 and h2. 
3- Merge elements from h1 and h2 into temporary array until total n/2
elements have been merged(copied) into temporary. 
4- Continue merging of remaining elements of h1 and h2 into h2. 
5- Copy back temporary array into h1. 

Implementing the idea for uint32_t gives(note that temporary is required
to be only of size n/2):
    case 0:
// counter i permits to copy exactly n1 elements into temporary
      i = n1;
      while (i-- > 0)
        {
          if ((*cmp) (b1, b2) <= 0)
            {
              *(uint32_t *) tmp = *(uint32_t *) b1;
              b1 += sizeof (uint32_t);
            }
          else
            {
              *(uint32_t *) tmp = *(uint32_t *) b2;
              b2 += sizeof (uint32_t);
            }
          tmp += sizeof (uint32_t);
        }
      tmp = b + n1 * s;
// check if we hit the end of first or second half      
// The loop fills upper half with remaining elements of two halves
      while (b2 != tmp && b2 != b + n * s)
        {
          if ((*cmp) (b1, b2) <= 0)
            {
              *(uint32_t *) tmp = *(uint32_t *) b1;
              b1 += sizeof (uint32_t);
            }
          else
            {
              *(uint32_t *) tmp = *(uint32_t *) b2;
              b2 += sizeof (uint32_t);
            }
          tmp += sizeof (uint32_t);
        }
      break;

and then final copy would be replaced by :

  if (b2 != tmp)
    memcpy (tmp, b1, (b2 - tmp));
  memcpy (b, t, n1 * s);

I remember that i got a speedup of approx 10% for large arrays of 32
bits elements. The only drawback is small increase in code size as
sorting now uses two loops instead of single one. Idea could also be
applied to other types than uint32_t. Perhaps it could be implemented
only for types most commonly used or for which it gives best speedup.
Best regards.  



	

	
		
___________________________________________________________________________ 
Yahoo! Mail réinvente le mail ! Découvrez le nouveau Yahoo! Mail et son interface révolutionnaire.
http://fr.mail.yahoo.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]