This is the mail archive of the crossgcc@sourceware.cygnus.com mailing list for the crossgcc project.

See the CrossGCC FAQ for lots more information.



Re: arm-coff-gcc function prologue w/ unsigned short parameter


> > By declaring x as "unsigned short", you are saying that only the bottom 16 
> > bits contain meaningful data; but you then try to use all 32 (assuming 
> > that the top 16 are zero).
> 
> Perhaps I was wrong, but I was under the impression that the ARM core
> cannot deal with just 16 bits of a register unless loading or storing it
> to memory.  All ALU manipulations _must_ operate on all 32 bits.  Is
> this the case?  If so, then would it not be a requirement to zero the
> top 16 bits of a register containing an unsigned short value before
> operating on it?  And if not, then why does gcc religiously do so, even
> under -O2 optimization (short of this one bug I'm investigating)?

Ok, let me rephrase my statement slightly more carefully.

By declaring x as "unsigned short", you are saying that when it is 
transferred into a 32-bit register, only the bottom 16 bits are 
meaningful.  If you subsequently want to do an operation on that register 
that relies on the top 16-bits being well defined, then you/the compiler 
must first convert it into a 32-bit quantity (by zero- or sign-extending 
it).  If you do not do this, then the top 16 bits may contain garbage (the 
compiler is not required to keep those top bits correct at all times).

So, for example, the code

  void foo(unsigned short *x)
  {
    *x += 16;
  }

can be compiled to 

  ldrh	r1, [r0]	// r0 contains x
  add	r1, r1, #16	// 32-bit add
  strh	r1, [r0]	// Store bottom 16 bits

In this case there is no need to zero-extend r1, either on the load or on 
the store, since the setting of those bits can never affect the behaviour 
of the code.  Indeed, for the above example

  ldrsh r1, [r0]
  add	r1, r1, #16
  strh  r1, [r0]

would have given exactly the same results, provided the value in r1 is not 
needed after this.  And further, on an ARM that doesn't support the 
ldrh/strh instructions, the code (little-endian) could be

  ldr	r1, [r0]
  add	r1, r1, #16
  strb  r1, [r0]
  mov	r1, r1, lsr #8
  strb	r1, [r0, #1]

(remember that on ARM, ldr will rotate the addressed halfword to the bottom 
of the register, even if it is only 16-bit aligned).
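
As a rough illustration of why the upper half does not matter here, the 
following C sketch models the register as a 32-bit value and the strh as a 
truncation to 16 bits.  The function names are made up for this example and 
are not anything gcc emits or uses.

```c
#include <stdint.h>

/* Model of a register filled by ldrh: the top 16 bits are zero. */
uint32_t load_zero_extended(uint16_t v)
{
    return v;
}

/* Model of a register filled by ldrsh: the top 16 bits copy bit 15. */
uint32_t load_sign_extended(uint16_t v)
{
    return (v & 0x8000u) ? (0xFFFF0000u | v) : v;
}

/* Model of "add r1, r1, #16" followed by "strh r1, [r0]": the add is done
   on all 32 bits, but only the low halfword reaches memory. */
uint16_t add16_then_store(uint32_t reg)
{
    reg += 16u;              /* 32-bit add, wraps modulo 2^32 */
    return (uint16_t)reg;    /* keep bits 0..15 only */
}
```

Whatever the top half of the register holds before the add, the stored 
halfword is the same, because carries only propagate upwards.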

On the other hand, the code

  int foo (unsigned short *x)
  {
    return (unsigned short)(*x + 16) < 5;
  }

must be coded as

  ldrh  r1, [r0]
  add   r1, r1, #16
  mov   r1, r1, asl #16
  mov   r1, r1, lsr #16
  cmp   r1, #5
  movlo r0, #1
  movhs r0, #0

The zero-extension is required because we now need to examine the top 16 
bits.  In this latter case, the compiler will sometimes optimize the 
above to save the second shift:

  ldrh  r1, [r0]
  add   r1, r1, #16
  mov   r1, r1, asl #16
  cmp   r1, #327680	// (5 << 16)
  movlo r0, #1
  movhs r0, #0
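
The two sequences above can be modelled in C to check that they agree.  
This is only a sketch: it assumes the intent is that the sum wraps to 16 
bits before the comparison (written in the reference function with an 
explicit cast), and the function names are invented for the illustration.

```c
#include <stdint.h>

/* asl #16, lsr #16, then cmp #5: zero-extend, then compare. */
int lt5_two_shifts(uint32_t reg)
{
    uint32_t t = (reg << 16) >> 16;
    return t < 5u;
}

/* asl #16, then cmp #(5 << 16): compare in the top halfword instead. */
int lt5_one_shift(uint32_t reg)
{
    return (reg << 16) < (5u << 16);
}

/* Reference: the sum truncated to 16 bits, then compared. */
int lt5_reference(uint16_t x)
{
    return (uint16_t)(x + 16) < 5;
}

/* Returns 1 if all three forms agree for every possible halfword. */
int shifts_agree_for_all_halfwords(void)
{
    uint32_t x;
    for (x = 0; x <= 0xFFFFu; x++) {
        uint32_t reg = x + 16u;    /* ldrh followed by the 32-bit add */
        int ref = lt5_reference((uint16_t)x);
        if (lt5_two_shifts(reg) != ref || lt5_one_shift(reg) != ref)
            return 0;
    }
    return 1;
}
```

The single-shift form works because shifting both sides of an unsigned 
comparison left by 16 preserves the ordering once the bits above bit 15 
have been shifted out.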




> 
> > What you really need to write for swabw is
> >
> > static unsigned short
> > swabw(unsigned short x)
> > {
> >     unsigned y = x;
> >     __asm__("orr %0, %0, %0, lsl #16 ; mov %0, %0, lsr #8" : "+r" (y) );
> >     return y & 0xffff;
> > }
> >
> > This will then ensure that the top bits of 'y' are all zero.
> 
> Thanks for the tip, Richard.  I think I'll put this in our official
> version of swabw, as under -O2, it doesn't generate any extra instructions.
> In fact, it comes out exactly the same as my version, except that the
> erroneous asr #16 modifier is replaced with the correct lsr #16.

To be honest, the only thing I thought the compiler was doing oddly with 
your original code was that it was doing any manipulation at all on the 
incoming argument... (if you pass a 16-bit quantity into an asm, then you 
must either not care about the high-order bits, or you must explicitly 
clear them yourself).
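
For reference, the orr/mov pair in the swabw quoted above can be modelled 
in portable C as below.  This is a sketch for checking the arithmetic, not 
a replacement for the inline asm; swabw_model is a name made up here.

```c
#include <stdint.h>

uint16_t swabw_model(uint16_t x)
{
    uint32_t y = x;                   /* explicit zero-extension to 32 bits */
    y |= y << 16;                     /* orr %0, %0, %0, lsl #16 */
    y >>= 8;                          /* mov %0, %0, lsr #8 */
    return (uint16_t)(y & 0xffffu);   /* keep the swapped halfword */
}
```

With x = 0x1234, y becomes 0x12341234 after the orr and 0x00123412 after 
the shift, so the masked result is 0x3412.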

> 
> However, as I said before, I see this happening in other places that
> do not have any inline assembly.  I just haven't been able to come up
> with a very short example, suitable for posting here, that exhibits the
> bug using pure ANSI C.  So though your fix addresses this one instance
> nicely, the problem, in general, remains.

We really are going to need a further example if we are going to get any 
further with this.

> 
> To elaborate on my assumption/question above, how, in theory, should gcc
> be dealing with 16-bit values on the arm, which natively wants only to
> deal with 32-bit values?

I hope the examples above have made the issue clearer.

Richard.




------
Want more information?  See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com

