This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Potential issue with strstr on x86 with sse4.2 in glibc-2.18
- From: Rich Felker <dalias at aerifal dot cx>
- To: Allan McRae <allan at archlinux dot org>
- Cc: Alexander Monakov <amonakov at ispras dot ru>, libc-alpha at sourceware dot org
- Date: Tue, 20 Aug 2013 00:39:56 -0400
- Subject: Re: Potential issue with strstr on x86 with sse4.2 in glibc-2.18
- References: <520E181D dot 2040308 at archlinux dot org> <alpine dot LNX dot 2 dot 00 dot 1308191628370 dot 2626 at monopod dot intra dot ispras dot ru> <20130819144648 dot GF20515 at brightrain dot aerifal dot cx> <alpine dot LNX dot 2 dot 00 dot 1308191924490 dot 2626 at monopod dot intra dot ispras dot ru> <5212A278 dot 3090909 at archlinux dot org> <20130819230644 dot GM20515 at brightrain dot aerifal dot cx> <5212E278 dot 4030703 at archlinux dot org> <20130820033430 dot GN20515 at brightrain dot aerifal dot cx>
On Mon, Aug 19, 2013 at 11:34:30PM -0400, Rich Felker wrote:
> > I would have assumed that it is gcc's responsibility to ensure alignment
> > if it decides to use SSE and our responsibility if our functions
> > explicitly use SSE. Is that being too naive?
>
> If by "explicitly use SSE" you mean using the intrinsics, alignment
> _should_ be GCC's responsibility just as if GCC had chosen to use SSE
> itself. However I don't know if the reality is like this. The only way
> I can see that GCC would not be expected to take care of alignment is
> when the SSE code resides in inline assembly.
>
> Actually, it's not really the use of SSE, but the use of automatic
> objects with 16-byte-alignment requirements that should cause GCC to
> align the stack. For example, if you have a char array declared with
> __attribute__((aligned(16))) with the intent to pass it to an external
> function that uses SSE, GCC needs to ensure its alignment.
>
> I'm unclear on what GCC's capabilities are in this area; that's why I
> asked.
I just did some tests, and it seems that with the above options, GCC
generates a prologue to realign the stack in all non-leaf functions.
This is definitely unacceptable overhead for global usage.
What may be viable is globally using -mpreferred-stack-boundary=2
along with the force_align_arg_pointer attribute on individual
functions that need to make callbacks to application code. From my
experiments, it seems that when -mpreferred-stack-boundary=2 is in
use, GCC will generate code to align the stack only in functions whose
automatic objects need alignment greater than 4 bytes.
Unfortunately, force_align_arg_pointer seems to be a no-op with
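To make the case concrete, here is a minimal sketch of a function whose automatic object needs more than 4-byte alignment. It is not the code used in the tests mentioned above; the names fill_aligned_buffer and is_aligned16 are made up for illustration. The expectation, per the experiments described here, is that GCC emits realignment code for a function like this even when the preferred stack boundary is only 4 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is 16-byte aligned. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical example function. The automatic array requests 16-byte
   alignment, so the compiler has to provide it no matter how the stack
   pointer happened to be aligned on entry. */
int fill_aligned_buffer(unsigned char value)
{
    char buf[64] __attribute__((aligned(16)));

    memset(buf, value, sizeof buf);
    /* Would fail if the requested alignment were not honored. */
    assert(is_aligned16(buf));
    return is_aligned16(buf) && buf[0] == (char)value;
}
```

Comparing the assembly generated for this function with that of a function holding only plain int locals (for example with gcc -m32 -S -mpreferred-stack-boundary=2) is one way to see which functions pay for realignment.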
-mpreferred-stack-boundary=2, so I think to make a method like this
work, it would need to be combined with the attributes to override
optimization/misc options for a single function.
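For reference, this is roughly how the attribute would be applied to a function that calls back into application code. It is only a sketch: for_each, visit_fn, sum_cb and the REALIGN_ON_ENTRY macro are hypothetical names, and as noted above the attribute did not appear to have any effect together with -mpreferred-stack-boundary=2, so this shows the intended usage rather than a working fix.

```c
#include <assert.h>
#include <stddef.h>

/* force_align_arg_pointer is an x86-specific GCC attribute; expand to
   nothing elsewhere so the sketch still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY
#endif

/* Hypothetical callback type supplied by application code. */
typedef int (*visit_fn)(int value, void *ctx);

/* Hypothetical library function that calls back into the application.
   The attribute asks GCC to realign the stack on entry, so the callback
   is reached with the alignment its own code was compiled to expect. */
REALIGN_ON_ENTRY
int for_each(const int *items, size_t n, visit_fn fn, void *ctx)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++) {
        int r = fn(items[i], ctx);
        if (r != 0)
            return r;   /* stop early if the callback asks to */
    }
    return 0;
}

/* Example callback: adds each value into the int that ctx points to. */
int sum_cb(int value, void *ctx)
{
    *(int *)ctx += value;
    return 0;
}
```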
This all looks like a big mess, and it's all GCC's fault. With such a
nasty incompatible ABI change, they should have added a minimally
invasive way to build code that interoperates: not assuming the stack
pointer is aligned on entry, but preserving the alignment on calls
(i.e. keeping it the same mod 16 as it was on entry) so that both of
these cases work:
1. Caller is using old 4-byte alignment.
2. Caller is using 16-byte alignment and needs its callbacks to be
called with 16-byte alignment.
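The "same mod 16" idea is just arithmetic on the stack pointer, and can be modeled as below. This is a sketch of the arithmetic only, not of anything GCC actually emits; padded_frame and callee_entry_sp are hypothetical names, and "needed" stands for all bytes consumed between a function's entry and its callee's entry, including the return address pushed by the call.

```c
#include <assert.h>
#include <stdint.h>

/* Round the bytes consumed per call up to a multiple of 16. */
uint32_t padded_frame(uint32_t needed)
{
    return (needed + 15u) & ~15u;
}

/* Stack pointer the callee would see on entry, given our own entry
   stack pointer. Because the distance is a multiple of 16, the callee
   sees the same value mod 16 as we did, whatever that value was. */
uint32_t callee_entry_sp(uint32_t entry_sp, uint32_t needed)
{
    uint32_t sp = entry_sp - padded_frame(needed);

    assert(((entry_sp - sp) & 15u) == 0);
    return sp;
}
```

With an entry stack pointer of 0xbffff004 (a caller that only keeps 4-byte alignment) the callee also sees a value that is 4 mod 16; with 0xbffff00c the callee sees 12 mod 16 again. In neither case does the function need to know which kind of caller it has.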
At present, the only way to get GCC to support both of these usages
seems to be imposing LARGE prologue overhead on every single function.
Rich