g++ 3.4.0 cygwin, codegen SSE & alignment issues

Dave Korn dk@artimi.com
Wed Apr 28 17:17:00 GMT 2004


> -----Original Message-----
> From: Tim Prince 
> Sent: 28 April 2004 17:19

> Because of the different division of responsibilities, if a function built
> by gcc is called by a function built by a commercial compiler (or by gcc
> -Os), the stack has a 75% probability of being mis-aligned.  It may be
> possible to overcome this by having a wrapper function between, which is
> built by gcc with alignment specified, but does not use SSE.
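
  The 75% figure in the quoted text can be checked with a small sketch.  It
assumes (this is not stated in the quote) that the caller only guarantees
4-byte stack alignment while the SSE-using callee wants 16 bytes; the names
below are made up for illustration.

```cpp
#include <cstdint>

// Assumptions, not taken from the original post: the caller keeps the
// stack 4-byte aligned, and the callee needs 16-byte alignment for SSE.
const std::uintptr_t kCallerAlign = 4;
const std::uintptr_t kSseAlign = 16;

// Number of distinct stack offsets the caller may legally hand over
// within one 16-byte window.
int total_slots() {
    return static_cast<int>(kSseAlign / kCallerAlign);
}

// Of those offsets, count the ones that are NOT 16-byte aligned.
int misaligned_slots() {
    int bad = 0;
    for (std::uintptr_t off = 0; off < kSseAlign; off += kCallerAlign) {
        if (off % kSseAlign != 0) {
            ++bad;
        }
    }
    return bad;
}
```

  Under those assumptions three of the four possible offsets are misaligned,
which is where a 75% probability would come from if every offset is equally
likely.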

  I once wrote a patch for gcc (for the ppc backend, but the principles
should be applicable if not the actual code) to add a new -m option, the
effect of which was to modify prolog generation code so that instead of just
subtracting a constant from the sp to allocate the new frame, it also
dynamically calculated how much extra to subtract to get the correct
alignment for the resulting new sp value.  It was pretty simple, involving
just a few extra assembler instructions in each prolog.
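
  As a rough model of the arithmetic such a prolog performs (not the actual
patch, which worked at the RTL/assembler level), the calculation might look
like the sketch below.  The function names are hypothetical, the stack is
assumed to grow downwards, and the alignment is assumed to be a power of two.

```cpp
#include <cstdint>

// Hypothetical model of the modified prolog: subtract the fixed frame
// size, then round the result DOWN to the requested alignment.
// `align` must be a power of two; the stack grows towards lower addresses.
std::uintptr_t aligned_new_sp(std::uintptr_t sp,
                              std::uintptr_t frame_size,
                              std::uintptr_t align) {
    std::uintptr_t raw = sp - frame_size;   // what a normal prolog would do
    return raw & ~(align - 1);              // clear the low bits to align
}

// How many extra bytes the dynamic alignment costs for this particular
// incoming sp (zero when the caller's stack was already suitably aligned).
std::uintptr_t extra_bytes(std::uintptr_t sp,
                           std::uintptr_t frame_size,
                           std::uintptr_t align) {
    return (sp - frame_size) - aligned_new_sp(sp, frame_size, align);
}
```

  Because the amount subtracted is no longer a compile-time constant, the
epilog cannot simply add the frame size back; the old sp has to be
recoverable from somewhere, such as a saved copy in the frame.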

[  In fact, it may not be as simple as that (...any more).  With the ppc
eabi, the effect of allocating more space on the stack than you've actually
defined in the stack frame is that a gap opens up between the outgoing args
area, which grows up from the bottom of the frame, and the local vars and
saved regs area, which grow down from the top of the frame.  This didn't do
any harm in 2.95.x, but it might well go wrong in gcc-3.x.x, where the
handling of eliminable regs and starting frame offset is different.  I'm
also unsure about how badly this sort of malarkey might break gdb's
understanding of what is going on in a function's frame, but I would imagine
it would do so quite badly.  ]

  It's a total waste of bytes in a situation where you know that the OS or
CRT gets it right for you, but it would be useful in a mixed
objects/abis/compilers situation.  Looks like there might be call for the
same sort of thing for the i.86 backend?

> Presumably, there is a performance advantage to gcc of assuming that the
> caller passes an aligned stack, but not enough to persuade commercial
> compilers to adopt a compatible scheme.

  Well, it's quicker to allocate a constant size stack frame than to
dynamically calculate the alignment requirements, but only by two or three
fairly trivial instructions.  And although aligning the stack just once at
startup and keeping it aligned by always allocating aligned-size stack
frames avoids even that cost, in some situations stack memory is a limited
resource.  Since not all code uses vector registers, there's a lot of
stack memory to be saved by not making all the stack frames bigger
just for the sake of the very few frames for functions that actually use the
vector regs.  So I'd say it's probably one of those trade-offs for which
there's no one 'right' answer.
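
  To put a number on the stack cost of the "always allocate aligned-size
frames" approach, here is a small sketch.  The helper names and the frame
sizes in the usage note are invented for illustration, not measurements
from any real program.

```cpp
#include <cstddef>
#include <vector>

// Round a frame size UP to the next multiple of `align`
// (`align` must be a power of two).
std::size_t round_up(std::size_t size, std::size_t align) {
    return (size + align - 1) & ~(align - 1);
}

// Total padding added across a chain of nested calls when every frame
// is rounded up, whether or not the function uses vector registers.
std::size_t padding_overhead(const std::vector<std::size_t>& frame_sizes,
                             std::size_t align) {
    std::size_t wasted = 0;
    for (std::size_t i = 0; i < frame_sizes.size(); ++i) {
        wasted += round_up(frame_sizes[i], align) - frame_sizes[i];
    }
    return wasted;
}
```

  For a hypothetical call chain with frames of 20, 36, 8 and 48 bytes and a
16-byte alignment, the padding comes to 12 + 12 + 8 + 0 = 32 bytes, paid on
every such chain even if none of the functions touches a vector register.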

    cheers,
       DaveK
-- 
Can't think of a witty .sigline today....


