This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On 18-09-2013 17:40, Richard Henderson wrote:
> Hmm. That's register clobbering there. Gcc 4.7.2 generated
>
>     10000654:   7d 08 00 74     cntlzd  r8,r8
>     10000658:   79 08 e8 c2     rldicl  r8,r8,61,3
>     1000065c:   38 e8 ff f9     addi    r7,r8,-7
>     10000660:   7c ca 38 2a     ldx     r6,r10,r7
>     10000664:   7c de 39 2a     stdx    r6,r30,r7
>
> Ah, wrong constraints on my asm, that just so happened to work here. Change
> all "=r" to "=&r" so that ralt et al does not overlap rsrc.
>
>
>
> r~

Thanks for the review. I have checked your suggestion with the
modification below, applied on top of my patch. We still need the
load/compare/store sequence to avoid an unaligned access to the first
doubleword.

diff --git a/sysdeps/powerpc/powerpc64/power7/stpcpy.S b/sysdeps/powerpc/powerpc64/power7/stpcpy.S
index 65ff6a0..116e8ee 100644
--- a/sysdeps/powerpc/powerpc64/power7/stpcpy.S
+++ b/sysdeps/powerpc/powerpc64/power7/stpcpy.S
@@ -41,14 +41,18 @@ EALIGN (__stpcpy, 4, 0)
        li      rMASK, 0
        addi    rRTN, rRTN, -8
        ld      rWORD, 0(rSRC)
-       b       L(g2)
+       cmpb    rTMP, rWORD, rMASK
+       cmpdi   rTMP, 0
+       beq     L(g0)
+       mr      rALT, rWORD
+       b       L(g1)

        .align 4
 L(g0): ldu     rALT, 8(rSRC)
        stdu    rWORD, 8(rRTN)
        cmpb    rTMP, rALT, rMASK
        cmpdi   rTMP, 0
-       bne     L(g1)
+       bne     L(test)
        ldu     rWORD, 8(rSRC)
        stdu    rALT, 8(rRTN)
 L(g2): cmpb    rTMP, rWORD, rMASK
@@ -56,6 +60,16 @@ L(g2): cmpb rTMP, rWORD, rMASK
        beq     L(g0)
        mr      rALT, rWORD
+L(test):
+       addi    rRTN, rRTN, 8
+       cntlzd  rMASK, rTMP       /* Extract bit offset of null byte. */
+       srdi    rMASK, rMASK, 3   /* Convert bit offset to byte offset. */
+       addi    rALT, rMASK, -7   /* Include the previous 7 bytes + nul. */
+       ldx     rTMP, rSRC, rALT  /* Perform one last unaligned copy. */
+       stdx    rTMP, rRTN, rALT
+       add     rRTN, rRTN, rMASK /* Adjust the return value. */
+       blr
+

The results are in the attached file (I used the stpcpy benchtest). As
you can see, my initial patch still shows slightly better latency.
Attachment:
bench-stpcpy-patch.out
Description: Text document
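
For readers who do not have the Power ISA at hand, the L(test) tail in the diff above can be modelled in portable C. This is only a rough sketch, not the glibc implementation: the helper names (zero_byte_mask, count_leading_zeros64, load_be64, first_nul_offset, copy_tail) are hypothetical, the loops stand in for the single cmpb and cntlzd instructions, and the doubleword is read in big-endian order to match the layout the assembly assumes. It also assumes that at least 7 bytes before src and dst belong to the same buffers and that the 8 bytes at src are readable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models "cmpb rTMP, rWORD, rMASK" with rMASK == 0: each result byte is
   0xff where the matching byte of w is zero, and 0x00 elsewhere. */
static uint64_t zero_byte_mask(uint64_t w)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        if (((w >> (8 * i)) & 0xff) == 0)
            m |= (uint64_t)0xff << (8 * i);
    }
    return m;
}

/* Models "cntlzd": number of leading zero bits, 64 for an input of 0. */
static unsigned count_leading_zeros64(uint64_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 64;
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Reads 8 bytes as a big-endian doubleword, so the byte at the lowest
   address ends up in the most significant position. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | p[i];
    return w;
}

/* Byte offset (0..7) of the first nul in the 8 bytes at p.
   The caller must guarantee that one of those bytes is zero. */
static unsigned first_nul_offset(const unsigned char *p)
{
    uint64_t mask = zero_byte_mask(load_be64(p));
    assert(mask != 0);
    return count_leading_zeros64(mask) >> 3;  /* bit offset -> byte offset */
}

/* Models the tail: one overlapping 8-byte copy that ends exactly on the
   nul, then returns a pointer to the copied nul (the stpcpy result).
   Assumption: 7 bytes before src and dst are valid to read and write. */
static char *copy_tail(char *dst, const char *src)
{
    unsigned off = first_nul_offset((const unsigned char *)src);
    memcpy(dst + off - 7, src + off - 7, 8);
    return dst + off;
}
```

With the string "0123456789abc" and the first doubleword already copied, calling copy_tail on the second doubleword finds the nul at byte offset 5 and copies bytes 6 through 13 in one go, re-writing two bytes that were already stored. That overlap is why the tail needs no byte-by-byte loop.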