This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT


On 05/30/2013 08:49 AM, Siddhesh Poyarekar wrote:
> On Thu, May 30, 2013 at 02:29:17PM +0530, Siddhesh Poyarekar wrote:
>> On Wed, May 29, 2013 at 07:33:19PM +0530, Siddhesh Poyarekar wrote:
>>> I'm trying to understand the code that sets and restores the floating
>>> point state to try and eek out some performance from it.  I haven't
>>> been able to figure out the need for SET_RESTORE_ROUND_53BIT vs
>>> SET_RESTORE_ROUND_53 for x86_64 - I can see that it's needed for x87:
>>>
>>> SET_RESTORE_ROUND (used in exp, log) only sets and restores the
>>> rounding mode and does not touch the exception masks.  This was
>>> holdexcept_setround, i.e. the same implementation as
>>> SET_RESTORE_ROUND_53BIT before, but I assume the clearing of exception
>>> mask, etc. was removed because it could result in delay in raising
>>> floating point exceptions resulting from some operations within these
>>> functions if exceptions are masked.
>>>
>>> SET_RESTORE_ROUND_53BIT (used in sin/cos) sets the rounding mode,
>>> exception mask and clears all exception flags.  Why can't we simply
>>> get away with setting and restoring just the rounding mode like in
>>> exp, log, etc?  That opens up a good opportunity for optimization
>>> where I could avoid most of the set/restore code in the default case.
>>> sin and cos run almost twice as fast in some cases.
>>
>> So to put all this in code, the below patch is what I intend to do.
>> Instead of modifying exception flags, SET_RESTORE_ROUND_53BIT only
>> sets rounding mode and floating point mode (in x87) and restores it on
>> exit from the lexical block.  I tested x86_64 as well as i386 with
>> --disable-sse to verify that this does not cause any regressions in
>> the testsuite.  Would this change be correct?
> 
> Sorry, I keep replying to myself, but I forgot to test this for
> possible benchmark improvements.  Here are the results for sin, cos
> and tan, which was the functions affected by this change:
> 
> Before:
> 
> cos(): ITERS:3.1054e+08: TOTAL:28904.2Mcy, MAX:3916.53cy, MIN:69.603cy, 10743.8 calls/Mcy
> sin(): ITERS:2.86419e+08: TOTAL:28911.1Mcy, MAX:333.326cy, MIN:46.76cy, 9906.88 calls/Mcy
> tan(): ITERS:6.7984e+07: TOTAL:28908.9Mcy, MAX:3693.01cy, MIN:420.518cy, 2351.66 calls/Mcy
> 
> After:
> 
> cos(): ITERS:4.8486e+08: TOTAL:28890.5Mcy, MAX:188.859cy, MIN:57.562cy, 16782.7 calls/Mcy
> sin(): ITERS:4.06791e+08: TOTAL:28904.6Mcy, MAX:210.524cy, MIN:47.592cy, 14073.6 calls/Mcy
> tan(): ITERS:7.3346e+07: TOTAL:28907.1Mcy, MAX:800.115cy, MIN:388.597cy, 2537.3 calls/Mcy
> 
> So that amounts to ~56% improvmeent for cos, 42% for sin and 7.8% for
> tan.
> 
> So there is a definite performance advantage for just this patch as well.

Yay! This is exactly what the benchmarks are for! :-)

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]