This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] BZ #14059 - HAS_FMA4 check needs to also check for AVX
On 11/05/12 18:22, Andreas Jaeger wrote:
> On 05/11/2012 09:10 AM, Allan McRae wrote:
>> On 11/05/12 16:48, Andreas Jaeger wrote:
>>> On 05/11/2012 08:34 AM, Andreas Jaeger wrote:
>>>> On 05/11/2012 12:18 AM, Carlos O'Donell wrote:
>>>>> On Thu, May 10, 2012 at 6:08 PM, Carlos O'Donell
>>>>> <carlos@systemhalted.org> wrote:
>>>>>> On Tue, May 8, 2012 at 4:23 AM, Andreas Jaeger<aj@suse.de> wrote:
>>>>>>>
>>>>>>> There are two things broken in checking for HAS_FMA4/AVX on
>>>>>>> x86-64 in
>>>>>>> the multiarch code:
>>>>>>>
>>>>>>> 1. We should disable FAM4 if AVX is not available (due to disabled
>>>>>>> OSXSAVE)
>>>>>>>
>>>>>>> 2. commit 56f6f6a2403cfa7267cad722597113be35ecf70d reverted some
>>>>>>> changes
>>>>>>> from commit 08cf777f9e7f6d826658a99c7d77a359f73a45bf but
>>>>>>> forgot to
>>>>>>> revert the change for AVX. We really have to disable AVX if
>>>>>>> it's
>>>>>>>
>>>>>>> The issue is easy reproduceable under Xen for the reporter.
>>>>>>>
>>>>>>> Also, the two commits introduced YMM_Usable and removed its usage
>>>>>>> but
>>>>>>> did not remove the code from the headers, I'm cleaning this up as
>>>>>>> well.
>>>>>>>
>>>>>>> I'm appending a patch that I tested on Linux/x86-64 (without Xen).
>>>>>>>
>>>>>>> Ok to commit?
>>>>>>
>>>>>> No.
>>>>>>
>>>>>> I actually think the code Ulrich wrote originally was almost correct,
>>>>>> but missing this:
>>>>>
>>>>> If I had to fix it it would look like this:
>>>>
>>>> Thanks, this looks great - but misses the check for FMA4.
>>>>
>>>> Here's a combined patch that compiled fine for me on
>>>> Linux/x86-64. Jeff, could you test it under Xen, please?
>>>>
>>>> Andreas
>>>>
>>>> 2012-05-11 Andreas Jaeger<aj@suse.de>
>>>> Carlos O'Donell<carlos_odonell@mentor.com>
>>>>
>>>> [BZ #14059]
>>>> * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
>>>> Fix check for AVX, enable FMA4 only if it exists and if AVX is
>>>> usable.
>>>>
>>>> * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Usable): Renamed
>>>> from bit_YMM_usable.
>>>> (bit_FMA4_Usable): New macro.
>>>> (bit_XMM_state): New macro.
>>>> (bit_YMM_state): New macro.
>>>> (index_AVX_Usable): Renamed from index_YMM_Usable.
>>>> (index_FMA4_Usable): New macro.
>>>> (HAS_YMM_USABLE): Delete.
>>>> (HAS_AVX): Use HAS_ARCH_FEATURE.
>>>> (HAS_FMA4): Likewise.
>>>
>>> That version above contained a typo. Here's an update
>>>
>>> Allan, does this work now?
>>>
>>
>> No. I still get an illegal instruction in __strcasecmp_l_avx with this
>> version.
>
> Allan, thanks for testing!
>
> Indeed, it fails:
>
> The code tests the bits in cpuid and does not use the HAS_AVX flag:
> # ifdef HAVE_AVX_SUPPORT
> leaq __strcasecmp_avx(%rip), %rax
> testl $bit_AVX, __cpu_features+CPUID_OFFSET+index_AVX(%rip)
> jnz 2f
> # endif
>
> Please test the following.
>
> I'm now offline until Monday, so won't be able to continue this, if
> anybody likes to finish and commit the patch, I would appreciate it,
>
> Andreas
>
> 2012-05-11 Andreas Jaeger <aj@suse.de>
>
> * sysdeps/x86_64/multiarch/strcmp.S: Use bit_AVX_Usable.
>
>
> diff --git a/sysdeps/x86_64/multiarch/strcmp.S
> b/sysdeps/x86_64/multiarch/strcmp.S
> index 2b9870b..eaf852b 100644
> --- a/sysdeps/x86_64/multiarch/strcmp.S
> +++ b/sysdeps/x86_64/multiarch/strcmp.S
> @@ -1,5 +1,5 @@
> /* strcmp with SSE4.2
> - Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
> + Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
> Contributed by Intel Corporation.
> This file is part of the GNU C Library.
>
> @@ -106,7 +106,7 @@ ENTRY(__strcasecmp)
> 1:
> # ifdef HAVE_AVX_SUPPORT
> leaq __strcasecmp_avx(%rip), %rax
> - testl $bit_AVX, __cpu_features+CPUID_OFFSET+index_AVX(%rip)
> + testl $bit_AVX_Usable,
> __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip)
> jnz 2f
> # endif
> leaq __strcasecmp_sse42(%rip), %rax
> @@ -129,7 +129,7 @@ ENTRY(__strncasecmp)
> 1:
> # ifdef HAVE_AVX_SUPPORT
> leaq __strncasecmp_avx(%rip), %rax
> - testl $bit_AVX, __cpu_features+CPUID_OFFSET+index_AVX(%rip)
> + testl $bit_AVX_Usable,
> __cpu_features+FEATURE_OFFSET+index_AVX_Usable(%rip)
> jnz 2f
> # endif
> leaq __strncasecmp_sse42(%rip), %rax
>
This patch (along with the one to fix the main AVX check), fixes all
issues I have seen.
Allan