This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] SPU use a non-functional errno


Hi Jeff -

On Tue, Apr 01, 2008 at 04:10:24PM -0400, Jeff Johnston wrote:
> Patrick Mansfield wrote:
>> Hi Jeff -
>>
>> Not sure of the best way to handle this, please comment or apply, thanks!
>>
>
> If you want to override sys/errno.h, that's fine, but if you want impure.c 
> changes, you'll have to override that file in libc/machine/spu as well.

OK.

>> Modify SPU to directly use errno, since it does not need reentrant code.
>> With this patch, any code using errno is 8 bytes smaller, plus there is no
>> function call.
>>
>> More importantly, testing of the SPU optimized math function asin showed a
>> decrease in time of 16%, a simple test took 16.6 seconds, where without
>> the change it took 19.7 seconds. Similar gains are likely for other math
>> domain checks in the SPU optimized math code (they are already set up for
>> branchless compare and setting of errno, but the code has to always read
>> and set errno).

>
> You should only be touching errno on an error.  Normally, you have a local 
> variable to store results and then check for failure.  If failure occurs, 
> you set errno appropriately.  Thus, you only slow down on failure which is 
> more than reasonable and if you get your local variable into a register(s), 
> you can do very efficient checking/branching for the non-error case.

Adding an "if" can actually slow down the code on SPU, since it has
no branch prediction and a fairly long pipeline.

I ran some test cases with acosh, which has a domain check of x < 1.
There are also abs(x) < 1 checks, but I did not run comparison tests
for any of those cases.

Is the following enough data for you to accept an updated patch?

We have four cases, with/without errno as a function, and with/without
branchless code for the domain check (branchless SPU code like we have in
newlib/libm/machine/spu/headers/dom_chkd_less_than.h vs normal C "if (x <
1)" code). 
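
To make the four cases concrete, here is a minimal sketch in portable C. It is
not the code from dom_chkd_less_than.h (which is SPU specific) and not the
actual test program. All names (sketch_errno, sketch_errno_ptr, the dom_chk_*
functions) are hypothetical, and a compiler is free to turn either style into
branching or branchless machine code.

```c
#include <errno.h>

/* Hypothetical stand-in for a "non-function" errno: a plain global. */
int sketch_errno;

/* Hypothetical stand-in for errno as a function: callers must make a
 * call (unless it is inlined) and go through the returned pointer. */
int *sketch_errno_ptr(void)
{
    return &sketch_errno;
}

/* "if" + non-function errno: errno is only written on a domain error. */
void dom_chk_if_direct(double x)
{
    if (x < 1.0)
        sketch_errno = EDOM;
}

/* "if" + errno function. */
void dom_chk_if_func(double x)
{
    if (x < 1.0)
        *sketch_errno_ptr() = EDOM;
}

/* Branchless + non-function errno: errno is always read and written,
 * and a mask selects between the old value and EDOM. */
void dom_chk_branchless_direct(double x)
{
    int mask = -(x < 1.0);  /* 0 when in domain, all ones when not */
    sketch_errno = (EDOM & mask) | (sketch_errno & ~mask);
}

/* Branchless + errno function: the accessor call happens on every
 * invocation, even when the input is in the domain. */
void dom_chk_branchless_func(double x)
{
    int *e = sketch_errno_ptr();
    int mask = -(x < 1.0);
    *e = (EDOM & mask) | (*e & ~mask);
}
```

In the branchless variants the cost of reaching errno is paid on every call,
which is why the way errno is accessed matters more there than in the "if"
variants.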

The compiler is generating a branch for the "if" code (sometimes it can
actually generate branchless code), and nicely adds a branch hint assuming
the branch will not be taken.
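
For reference, the same "not taken" expectation can be stated explicitly in
the source with GCC's __builtin_expect. This is only a sketch: the macro and
function names are hypothetical, the test code did not need it (the compiler
added the hint by itself), and whether a given compiler turns the expectation
into a hint instruction is up to that compiler.

```c
#include <errno.h>

/* Hypothetical helper macro around GCC's __builtin_expect: the
 * condition is expected to be false most of the time. */
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical plain-variable errno used only by this sketch. */
int sketch_dom_errno;

/* Returns 0 when x is in the domain (x >= 1), -1 otherwise.
 * errno is only touched on the (expected to be rare) error path. */
int sketch_check_domain(double x)
{
    if (SKETCH_UNLIKELY(x < 1.0)) {
        sketch_dom_errno = EDOM;
        return -1;
    }
    return 0;
}
```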

I'm using spu-gcc from IBM's CELL SDK 3.0.

For the acosh(2) test case, with timings normalized so that the "if" with
errno as a function is 1.00, we have:

              errno function          non-function errno
with if           1.00                      0.99
branchless        1.10                      0.96

So branchless with errno function is the worst, branchless with
non-function errno the best, but it is not that much faster than the "if"
with non-function errno.

If the domain is bad, we'll be quite a bit slower for the "if"
cases. For acosh(.5) I see:

              errno function          non-function errno
with if           1.00                      0.93
branchless        0.99                      0.87

"if" with errno function is now the worst, but branchless with
non-function errno is quite a bit faster.
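
As a small sketch of the arithmetic behind normalized figures like these:
each entry is a raw time divided by the baseline time. The function names
below are hypothetical. The raw times for the acosh tables are not listed
here, so the usage note uses the asin times quoted earlier in this thread
(19.7 seconds without the change, 16.6 seconds with it).

```c
/* Normalize a raw timing against a baseline, so the baseline is 1.0. */
double sketch_normalize(double t, double baseline)
{
    return t / baseline;
}

/* Percentage decrease in time going from `before` to `after`. */
double sketch_percent_decrease(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the asin numbers, sketch_normalize(16.6, 19.7) is about 0.84 and
sketch_percent_decrease(19.7, 16.6) is about 15.7, which matches the quoted
16% figure after rounding.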

And code sizes (in bytes) are:

              errno function          non-function errno
with if          35052                    34988 
branchless       35020                    34988

So non-function errno saves 64 bytes in the "if" case and 32 bytes in the
branchless case.

The test program also includes fprintf() calls, and it references errno in
the assist call; I'm not sure how much smaller the non-function errno makes
fprintf().

-- Patrick Mansfield

