This is the mail archive of the crossgcc@sourceware.cygnus.com mailing list for the crossgcc project.
See the CrossGCC FAQ for lots more information.
> > By declaring x as "unsigned short", you are saying that only the bottom 16
> > bits contain meaningful data; but you then try to use all 32 (assuming
> > that the top 16 are zero).
>
> Perhaps I was wrong, but I was under the impression that the ARM core
> cannot deal with just 16 bits of a register unless loading or storing it
> to memory.  All ALU manipulations _must_ operate on all 32 bits.  Is
> this the case?  If so, then would it not be a requirement to zero the
> top 16 bits of a register containing an unsigned short value before
> operating on it?  And if not, then why does gcc religiously do so, even
> under -O2 optimization (short of this one bug I'm investigating)?

Ok, let me rephrase my statement slightly more carefully.

By declaring x as "unsigned short", you are saying that when it is
transferred into a 32-bit register, only the bottom 16 bits are meaningful.
If you subsequently want to do an operation on that register that relies on
the top 16 bits being well defined, then you/the compiler must first convert
it into a 32-bit quantity (by zero- or sign-extending it).  If you do not do
this, then the top 16 bits may contain garbage (the compiler is not required
to keep those top bits correct at all times).

So, for example, the code

    void foo(unsigned short *x)
    {
      *x += 16;
    }

can be compiled to

    ldrh    r1, [r0]        // r0 contains x
    add     r1, r1, #16     // 32-bit add
    strh    r1, [r0]        // Store bottom 16 bits

In this case there is no need to zero-extend r1, either on the load or on
the store, since the setting of those bits can never affect the behaviour of
the compiled code.  Indeed, for the above example

    ldrsh   r1, [r0]
    add     r1, r1, #16
    strh    r1, [r0]

would have given exactly the same results, provided the value in r1 is not
needed after this.
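The point about the top bits can be checked on any host with a small C
model.  This is only a sketch, not anything gcc itself does: the helper
names (add16_store_low, high_bits_irrelevant, check_all_halfwords) are
made up for illustration, and a uint32_t stands in for the ARM register.
It performs the 32-bit add and the halfword store for a zero-extended, a
sign-extended and a deliberately garbage upper half, and checks that the
stored halfword is the same in all three cases.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "add r1, r1, #16" followed by "strh r1, [r0]":
   a full 32-bit add, after which only bits 0..15 reach memory. */
static uint16_t add16_store_low(uint32_t reg)
{
    reg += 16u;              /* 32-bit add, wraps modulo 2^32 */
    return (uint16_t)reg;    /* keep the bottom halfword only */
}

/* Returns 1 if the stored halfword does not depend on how the upper
   16 bits of the register image were filled in. */
static int high_bits_irrelevant(uint16_t x)
{
    uint32_t zext = x;                                      /* as ldrh  */
    uint32_t sext = (x & 0x8000u) ? (0xFFFF0000u | x) : x;  /* as ldrsh */
    uint32_t junk = 0xDEAD0000u | x;         /* arbitrary garbage on top */
    uint16_t stored = add16_store_low(zext);

    return stored == add16_store_low(sext)
        && stored == add16_store_low(junk);
}

/* Exhaustive check over every possible halfword value. */
static void check_all_halfwords(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(high_bits_irrelevant((uint16_t)v));
}
```

Calling check_all_halfwords() walks all 65536 inputs.  The reason it holds
is that a carry in an add only ever propagates upwards, so bits 16..31 of
the input cannot influence bits 0..15 of the sum.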
And further, on an ARM that doesn't support the ldrh/strh instructions, the
code (little-endian) could be

    ldr     r1, [r0]
    add     r1, r1, #16
    strb    r1, [r0]
    mov     r1, r1, lsr #8
    strb    r1, [r0, #1]

(remember that on ARM, ldr will rotate the addressed halfword to the bottom
of the register, even if it is only 16-bit aligned).

On the other hand, the code

    int foo (unsigned short *x)
    {
      return (unsigned short) (*x + 16) < 5;
    }

must be coded as

    ldrh    r1, [r0]
    add     r1, r1, #16
    mov     r1, r1, asl #16
    mov     r1, r1, lsr #16
    cmp     r1, #5
    movlo   r0, #1
    movhs   r0, #0

The zero-extension is required because we now need to examine the top 16
bits.  (In this latter case, the compiler will sometimes make an
optimization to the above, saving the second shift):

    ldrh    r1, [r0]
    add     r1, r1, #16
    mov     r1, r1, asl #16
    cmp     r1, #327680     // (5 << 16)
    movlo   r0, #1
    movhs   r0, #0

> > What you really need to write for swabw is
> >
> > static unsigned short
> > swabw(unsigned short x)
> > {
> >   unsigned y = x;
> >   __asm__("orr %0, %0, %0, lsl #16 ; mov %0, %0, lsr #8" : "+r" (y) );
> >   return y & 0xffff;
> > }
> >
> > This will then ensure that the top bits of 'y' are all zero.
>
> Thanks for the tip, Richard.  I think I'll put this in our official
> version of swabw, as under -O2, it doesn't generate any extra instructions.
> In fact, it comes out exactly the same as my version, except that the
> erroneous asr #16 modifier is replaced with the correct lsr #16.

To be honest, the only thing I thought the compiler was doing oddly with
your original code was that it was doing any manipulation at all on the
incoming argument... (if you pass a 16-bit quantity into an asm, then you
must either not care about the high-order bits, or you must explicitly clear
them yourself).

> However, as I said before, I see this happening in other places that
> do not have any inline assembly.  I just haven't been able to come up
> with a very short example, suitable for posting here, that exhibits the
> bug using pure ANSI C.
> So though your fix addresses this one instance
> nicely, the problem, in general, remains.

We really are going to need a further example if we are going to get any
further with this.

> To elaborate on my assumption/question above, how, in theory, should gcc
> be dealing with 16-bit values on the arm, which natively wants only to
> deal with 32-bit values?

I hope the examples above have made the issue clearer.

Richard.

------
Want more information?  See the CrossGCC FAQ, http://www.objsw.com/CrossGCC/
Want to unsubscribe? Send a note to crossgcc-unsubscribe@sourceware.cygnus.com