This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Don't use SSE4_2 instructions on Intel Silvermont Micro Architecture.


Moreover, SSSE3 is not good for Silvermont, and at the moment there are
no SSE2 unaligned versions of strcmp and memcmp to switch to. I think we
need unaligned versions for Core i7 as well.

This is another area for optimization.

I will add a new flag, bit_Slow_SSE4_2, and switch some functions as a
short-term solution.
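
A minimal sketch of how such a flag might steer selection, based only on
the behaviour described in this thread. The names select_impl, struct cpu
and struct variants are hypothetical and are not glibc's actual
init-arch or ifunc code. The assumption is that a slow SSE4_2 unit falls
back to SSE2 only where an optimized SSE2 version exists, and otherwise
keeps the SSE4_2 version (the strspn/strcspn/strpbrk case).

```c
#include <stdbool.h>

/* Hypothetical labels for the available implementations of one function. */
enum impl { IMPL_GENERIC, IMPL_SSE2, IMPL_SSE42 };

/* Hypothetical CPU description; slow_sse4_2 mirrors the proposed flag. */
struct cpu {
    bool has_sse4_2;
    bool slow_sse4_2;
};

/* Which optimized versions exist for a given string function. */
struct variants {
    bool has_sse2_version;
    bool has_sse42_version;
};

static enum impl select_impl(struct cpu cpu, struct variants v)
{
    bool can_sse42 = cpu.has_sse4_2 && v.has_sse42_version;

    /* Fast SSE4_2: use it whenever a version exists. */
    if (can_sse42 && !cpu.slow_sse4_2)
        return IMPL_SSE42;
    /* Slow or missing SSE4_2: prefer an optimized SSE2 version. */
    if (v.has_sse2_version)
        return IMPL_SSE2;
    /* Slow SSE4_2 but no SSE2 alternative: still better than generic
       (an assumption drawn from the strspn discussion below). */
    if (can_sse42)
        return IMPL_SSE42;
    return IMPL_GENERIC;
}
```

Under these assumptions a Silvermont-like CPU would pick SSE2 for strcmp
but keep SSE4_2 for strspn, where no optimized SSE2 version exists.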

--
Liubov Dmitrieva

On Tue, Jun 18, 2013 at 10:49 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Mon, Jun 17, 2013 at 11:07:33AM -0700, H.J. Lu wrote:
>> On Mon, Jun 17, 2013 at 10:56 AM, Dmitrieva Liubov
>> <liubov.dmitrieva@gmail.com> wrote:
>> > I checked those functions.
>> > In the case of strspn/strcspn/strpbrk, switching SSE4_2 off is bad
>> > because there are no optimized SSE2 versions to call instead.
>> > The default versions there do not use SSE.
>> >
>> > So it seems we need to create a new flag for Silvermont, such as
>> > "slowPcmpistri", and fix the switches in functions where optimized
>> > SSE2 versions exist.
>> >
>> > Or implement optimized SSE2 strspn/strcspn/strpbrk and switch SSE4_2 off completely.
>> >
> I asked because these are about the only cases where I cannot get
> comparable results with SSE2. The closest I could get was to split the
> input into up to four character intervals and check these in parallel.
> This has somewhat expensive preprocessing, so I am still looking at how
> to do it better.
>>
>> We can add bit_Prefer_SSE2_for_stringop.  When it is set, we
>> will use the SSE2 version if it is available.  Otherwise, we use
>> the SSE4_2 version if it is available.
>>
>>
> As a short-term solution I would prefer bit_Slow_SSE4_2.
>
> As a long-term solution, I have optimized implementations of other
> functions that do not use SSE4_2 and are faster.
>
>
>
> When I ran `git grep "cmp[ie]str[ie]"` I got
>
> sysdeps/i386/i686/multiarch/strcmp-sse4.S
> sysdeps/x86_64/multiarch/strcmp-sse42.S
>
> I have several ideas but have not gotten to them yet. It has low
> priority, as the hot case is when strings differ in the first 16
> characters (for example, when you are sorting).
>
>
> sysdeps/x86_64/multiarch/rawmemchr.S
>
> Not our case, as it needs bit_SSE4_2 and not bit_Prefer_PMINUB_for_stringop.
>
> This is false on all Intel processors. Most AMD processors are
> misclassified because we do not set anything at all. They have slower
> SSE4_2, which causes a performance regression.
>
>
> sysdeps/x86_64/multiarch/strchr.S
> sysdeps/x86_64/multiarch/strrchr.S
>
> I have an implementation with faster asymptotic time, but I have not
> tuned it for small cases.
>
>
> sysdeps/x86_64/multiarch/strend-sse4.S
>
> It is a bit weird that we have this. You could definitely improve
> performance by calling strlen and adjusting the return value.
>
> sysdeps/x86_64/multiarch/strstr.c
>
> I have a better implementation; I decided to wait for 2.19.
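
The suggestion quoted above for sysdeps/x86_64/multiarch/strend-sse4.S
(take strlen and adjust the return value) can be sketched in C as
follows. This assumes the routine's job is to return a pointer to the
terminating null byte; strend_via_strlen is a hypothetical name used
only for illustration, not a glibc symbol.

```c
#include <string.h>

/* Hypothetical helper: find the end of a string by reusing strlen,
   so that any optimized strlen is used instead of a separate
   SSE4_2 routine.  Returns a pointer to the terminating '\0'. */
static const char *strend_via_strlen(const char *s)
{
    return s + strlen(s);
}
```

For example, for s = "hello" the returned pointer is s + 5 and points at
the null terminator.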

