arch-specific template code

Ulrich Drepper drepper@gmail.com
Tue Aug 28 11:51:00 GMT 2012


On Mon, Aug 27, 2012 at 6:26 PM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> My personal opinion is that a concrete example, small, but meaningful and
> rather self contained, would help. To be honest, at this stage, isn't clear
> to me which kind of arch-specific optimizations you are thinking about.

Here is a first example.  Note that for now I just added the code in
the middle of random.tcc.  This is an implementation for the
normal_distribution<double>::__generate<> function using SSE3.  The
resulting code runs about 25% faster.  There is really no way to use
the function for any other architecture because it heavily depends on
the x86 intrinsics and hence the x86 instructions.  But there is no
reason why there couldn't be functions with the same interface but
completely different implementations for other archs.  PPC has AltiVec,
ARM has NEON.
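
To make the "same interface, different implementation" idea concrete
without reproducing the attached patch, here is a minimal sketch.  The
helper rescale() is hypothetical and is not part of libstdc++; it only
stands in for a function like __generate<>.  The generic template works
everywhere, and an explicit specialization for double is compiled in
only when the compiler defines __SSE2__ (assumed here as the feature
test; the patch itself uses SSE3).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics, x86 only
#endif

// Hypothetical helper (not a libstdc++ function): maps n values from
// [0, 1) to [lo, hi).  Generic version, usable on any architecture.
template<typename T>
  void
  rescale(const T* in, T* out, std::size_t n, T lo, T hi)
  {
    assert(in != nullptr && out != nullptr);
    const T range = hi - lo;
    for (std::size_t i = 0; i < n; ++i)
      out[i] = in[i] * range + lo;
  }

#if defined(__SSE2__)
// x86-only specialization with the same interface.  It handles two
// doubles per iteration and finishes any odd element with scalar code.
template<>
  inline void
  rescale<double>(const double* in, double* out, std::size_t n,
                  double lo, double hi)
  {
    assert(in != nullptr && out != nullptr);
    const double range = hi - lo;
    const __m128d vrange = _mm_set1_pd(range);
    const __m128d vlo = _mm_set1_pd(lo);
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2)
      {
        const __m128d v = _mm_loadu_pd(in + i);   // unaligned load
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_mul_pd(v, vrange), vlo));
      }
    for (; i < n; ++i)                            // scalar tail
      out[i] = in[i] * range + lo;
  }
#endif
```

Callers write rescale(in, out, n, lo, hi) in either case and never see
which version was selected.  Another architecture could supply its own
specialization under its own feature macro without touching the generic
code.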

There will be more opportunities in <random>.  I also have changes for
other headers like <valarray>.  Therefore some precedent for
integrating these types of changes is needed.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: d-random-normal64sse
Type: application/octet-stream
Size: 4492 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20120828/7b5fc0fc/attachment.obj>