This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Optimized with SSE2 sinf and cosf for x86_32
- From: Dmitrieva Liubov <liubov dot dmitrieva at gmail dot com>
- To: libc-alpha at sourceware dot org
- Date: Thu, 9 Aug 2012 12:13:20 +0400
- Subject: Re: Optimized with SSE2 sinf and cosf for x86_32
- References: <CAHjhQ91iSZzUSaDs_MRO3AzzS-KjH7nBp=4ze_NoSTyomsbcHg@mail.gmail.com>
This is a repost of the patch with optimized sinf and cosf for x86_32.
We look forward to it being accepted and released.
http://sourceware.org/ml/libc-alpha/2012-06/msg00624.html
--
Liubov Dmitrieva
Intel Corporation
2012/6/22 Dmitrieva Liubov <liubov.dmitrieva@gmail.com>:
> This is a patch proposing manually optimized and high-performance sinf
> and cosf versions with excellent precision.
>
> Performance on the main path, [-10000; 10000], is more than 26X better.
>
> Speedups on other important intervals are listed here (ratio of cycles).
>
> (random)          Ist. Bulld.   Atom   Neh.    AVX
> cosf |x|<0.78      1.9   2.72   1.65   1.89   1.79  times
> cosf |x|<1.57     1.55   1.84   1.75   1.70   1.55  times
> cosf |x|<2.35     1.64   2.08   1.78   1.75   1.66  times
> cosf |x|<3.14     1.97   2.86   1.97   1.87   2.12  times
> cosf |x|<3.92     2.15   3.50   2.08   2.01   2.33  times
> cosf |x|<4.71     2.29   3.89   2.15   2.07   2.43  times
> cosf |x|<5.49     2.39   4.70   2.21   2.06   2.52  times
> cosf |x|<6.28     2.47   4.62   2.25   2.14   2.58  times
> cosf |x|<7.06     2.54   4.63   2.28   2.16   2.64  times
> cosf |x|<7.85     2.43   4.48   2.27   2.10   2.63  times
> cosf |x|<8.63     2.30   4.47   2.23   2.04   2.56  times
> cosf |x|<9.42     2.21   4.18   2.20   1.99   2.51  times
> cosf |x|<100      2.53   5.43   2.28   2.34   2.01  times
> cosf |x|<1000    19.82  20.50  19.88  17.96  18.37  times
> cosf |x|<10000   25.98  29.78  24.95  23.63  23.52  times
> cosf |x|<1e10    18.92  28.74  20.97  16.16  18.78  times
>
>
> sinf |x|<0.78     1.39   1.75   1.31   1.30   1.28  times
> sinf |x|<1.57     1.47   1.78   1.65   1.62   1.67  times
> sinf |x|<2.35     1.64   2.10   1.77   1.79   1.77  times
> sinf |x|<3.14     1.94   2.85   1.95   1.88   2.09  times
> sinf |x|<3.92     2.12   3.38   2.04   1.91   2.30  times
> sinf |x|<4.71     2.31   3.95   2.14   1.96   2.42  times
> sinf |x|<5.49     2.66   4.57   2.21   2.15   2.51  times
> sinf |x|<6.28     2.53   4.67   2.24   2.17   2.56  times
> sinf |x|<7.06     2.52   4.54   2.23   2.11   2.62  times
> sinf |x|<7.85     2.43   4.54   2.24   2.08   2.59  times
> sinf |x|<8.63     2.33   4.62   2.21   2.09   2.53  times
> sinf |x|<9.42     2.27   4.28   2.17   1.96   2.51  times
> sinf |x|<100      2.52   5.32   2.26   2.34   2.01  times
> sinf |x|<1000    20.12  20.42  19.89  18.24  18.48  times
> sinf |x|<10000   26.26  26.73  25.00  23.11  23.79  times
> sinf |x|<1e10    18.76  28.73  20.90  16.09  18.49  times
>
>
>
> The new sinf/cosf passed testing with our proprietary test system, which
> tests many intervals with different steps and checks special values
> (from ISO C) and corner cases. GLIBC's “make check” also passed.
>
> With the current GLIBC, our test system observed errors of more than
> 1e4 ulp for |x|>1e4. The new asm versions provided here have a maximum
> error of 0.500121 ulp for sinf and 0.500573 ulp for cosf.
>
>
> ChangeLog:
>
> 2012-06-22 Liubov Dmitrieva <liubov.dmitrieva@gmail.com>
>
> * sysdeps/i386/i686/fpu/multiarch/Makefile: Update
> (sysdep_routines): Add s_sinf-sse2, s_cosf-sse2
>
> * sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S: New file
> * sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S: New file
> * sysdeps/i386/i686/fpu/multiarch/s_sinf.c: New file
> * sysdeps/i386/i686/fpu/multiarch/s_cosf.c: New file
> * sysdeps/ieee754/flt-32/s_sinf.c: Update
> (SINF): Add macro for using routine as __sinf_ia32
> * sysdeps/ieee754/flt-32/s_cosf.c: Update
> (COSF): Add macro for using routine as __cosf_ia32
>
> * sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: Fix Copyright
> * sysdeps/i386/i686/fpu/multiarch/e_expf.c: Fix Copyright
>
>
> --
> Liubov Dmitrieva
>
> Software Engineer
> Intel Corporation