Optimising std::find on x86 and PPC
Chris Jefferson
caj@cs.york.ac.uk
Tue Dec 14 14:39:00 GMT 2004
Hello,
I recently tried changing the main loop of the std::find random_access
overload from:
  difference_type __trip_count = (__last - __first) >> 2;
  for(; __trip_count > 0 ; --__trip_count)
    { if(*__first == __val) return __first; ++__first; (4 times) }
to:
  Iterator __newlast = __last - (__last - __first) % 4;
  for( ; __first < __newlast;)
    { if(*__first == __val) return __first; ++__first; (4 times) }
This knocked about 30% off the time taken on x86 (note that in a final
version I'd change the % 4 into some kind of masking and/or shifting).
Unfortunately, a quick test on Mac OS X by Andrew Pinski (thank you!)
found that this slightly worsened performance on PPC in terms of both
space and time, as this new version will no longer use the
specialised "count" operator.
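For concreteness, here is a minimal sketch of the two loop shapes written out in full as standalone C++. The names find_trip_count, find_iter_bound and variants_agree are made up for illustration and are not the libstdc++ internals. The sketch also assumes the leftover 0 to 3 elements are handled by a plain loop, and it uses & 3 in place of % 4, which gives the same result for a non-negative distance.

```cpp
#include <vector>

// Variant A: counted loop, one trip per block of 4 elements.
template <typename RandomIt, typename T>
RandomIt find_trip_count(RandomIt first, RandomIt last, const T& val)
{
    auto trip_count = (last - first) >> 2;
    for (; trip_count > 0; --trip_count) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // At most 3 elements remain.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}

// Variant B: the loop is bounded by an iterator comparison instead.
template <typename RandomIt, typename T>
RandomIt find_iter_bound(RandomIt first, RandomIt last, const T& val)
{
    // For a non-negative distance, (d & 3) == (d % 4).
    RandomIt new_last = last - ((last - first) & 3);
    while (first < new_last) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }
    // At most 3 elements remain.
    for (; first != last; ++first)
        if (*first == val) return first;
    return last;
}

// Hypothetical helper: both variants must report the same position.
inline bool variants_agree(const std::vector<int>& v, int val)
{
    return find_trip_count(v.begin(), v.end(), val)
        == find_iter_bound(v.begin(), v.end(), val);
}
```

Both functions should return the same iterator for any input; the only difference is whether the loop is driven by a decrementing counter or by comparing against a precomputed end iterator.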
Seeing as so many functions use find internally, it seems silly not to
include some kind of improvement. At the same time, of course, we
don't want to damage the OS X performance. I suspect the reason the loop
optimisers can't deal with this by themselves is that we unroll this
loop 4 iterations manually, which I suspect confuses them.
If anyone with more knowledge than me knows either how I could tweak
this code so it optimises well on both x86 and PPC, or how hard it might
be to poke the optimiser so one version of this code can be efficient on
both processors, I'd be glad to hear about it.
The last option, of course, is to start having different code for
different processors. I'd imagine doing so would only be a route of
last resort, however.
Chris