[PATCH v2 1/4] Cygwin: console: Add workaround for broken IL/DL in xterm mode.

Takashi Yano takashi.yano@nifty.ne.jp
Mon Mar 2 00:45:00 GMT 2020


On Sun, 1 Mar 2020 14:56:31 +0100
Hans-Bernhard Bröker wrote:
> On 01.03.2020 at 07:33, Takashi Yano wrote:
> 
> > However, from the viewpoint of performance, plain inline
> > static functions are better.
> 
> I don't see how that could be the case.  Inline methods of a static C++ 
> object should not suffer any performance penalty compared to inline 
> functions operating on static variables.
> 
> > Attached code measures the
> > performance of access speed for wpbuf.
> > I compiled it by g++ 7.4.0 with -O2 option.
> > 
> > The result is as follows.
> > 
> > Total1: 2.315627 second
> > Total2: 1.588511 second
> > Total3: 1.571572 second
> 
> Strange.  The result here (with GCC 9.2) is rather different:
> 
> $ g++ -O2 -o tt wpbuf-bench.cc && ./tt
> Total1: 0.753815 second
> Total2: 0.757444 second
> Total3: 1.217352 second
> 
> And on inspection, all three bench*() functions do appear to have 
> exactly the same machine code, too.  They may be inlined and mixed into 
> main() somewhat differently, though.  That might explain the difference 
> more readily than any actual difference in speed between the three 
> implementations.

I looked into the code generated by g++ 7.4.0 with -O2. The generated
code differs between the benchmarks.

With the 32-bit compiler:

bench1():
L3:
    cmpl    $255, %edx
    jg      L2
    movb    $65, _wpbuf(%edx)
    movl    $1, %ecx
    addl    $1, %edx
L2:
    subl    $1, %eax
    [...]

bench2(), bench3():
L22:
    cmpl    $255, %edx
    jg      L21
    movb    $65, _wpbuf2(%edx)
    addl    $1, %edx
L21:
    subl    $1, %eax
    [...]

With the 64-bit compiler:

bench1():
.L3:
    cmpl    $255, %edx
    jg      .L2
    movslq  %edx, %rcx
    addl    $1, %edx
    movb    $65, (%r8,%rcx)
    movl    $1, %ecx
.L2:
    subl    $1, %eax
    [...]

bench2(), bench3():
.L15:
    cmpl    $255, %edx
    jg      .L14
    movslq  %edx, %rcx
    addl    $1, %edx
    movb    $65, (%r8,%rcx)
.L14:
    subl    $1, %eax
    [...]

In both cases, the loop body for bench2() and bench3() is one
instruction shorter than for bench1(), which contains an additional
"movl $1, %ecx".

However, with g++ 9.2.0 and -O2,

bench1(), bench2(), bench3():
L3:
    cmpl    $255, %edx
    jg      L2
    movb    $65, _wpbuf(%edx)
    addl    $1, %edx
L2:
    subl    $1, %eax
    [...]

the code is exactly the same for all three, as you mentioned.

So, if we assume g++ 9.2.0, please forget the previous remarks
about speed.
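
For readers without the attachment, here is a minimal sketch of the kind
of comparison discussed in this thread: a static object with inline
methods versus inline static functions operating on static variables.
It is not the attached wpbuf-bench.cc. The class name wpbuf_t, the
helpers wpbuf2_put()/wpbuf2_empty(), the index wpixput2 and the driver
run_comparison() are all hypothetical, and the 256-byte length is only
inferred from the "cmpl $255" in the listings. An optimizer may simplify
loops like these, so any timings should be read with care.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Buffer length chosen to match the "cmpl $255" bound in the listings.
constexpr int WPBUF_LEN = 256;

// Variant A: a static object with inline methods (hypothetical class).
class wpbuf_t {
  char buf[WPBUF_LEN];
  int ix = 0;
public:
  void put(char c) { if (ix < WPBUF_LEN) buf[ix++] = c; }
  void empty() { ix = 0; }
  int len() const { return ix; }
  char at(int i) const { return buf[i]; }
};
static wpbuf_t wpbuf_obj;

// Variant B: inline static functions on static variables (hypothetical names).
static char wpbuf2[WPBUF_LEN];
static int wpixput2 = 0;
static inline void wpbuf2_put(char c) {
  if (wpixput2 < WPBUF_LEN) wpbuf2[wpixput2++] = c;
}
static inline void wpbuf2_empty() { wpixput2 = 0; }

// Each round empties the buffer, then offers twice as many bytes as fit,
// so both the store path and the "buffer full" path are exercised.
int bench_object(long rounds) {
  for (long r = 0; r < rounds; r++) {
    wpbuf_obj.empty();
    for (int i = 0; i < 2 * WPBUF_LEN; i++) wpbuf_obj.put('A');
  }
  return wpbuf_obj.len();
}

int bench_functions(long rounds) {
  for (long r = 0; r < rounds; r++) {
    wpbuf2_empty();
    for (int i = 0; i < 2 * WPBUF_LEN; i++) wpbuf2_put('A');
  }
  return wpixput2;
}

// Wall-clock time, in seconds, taken by one call of f.
template <typename F>
double seconds(F f) {
  auto t0 = std::chrono::steady_clock::now();
  f();
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}

// Times both variants and checks that they left the same buffer state.
void run_comparison(long rounds) {
  double t1 = seconds([&] { bench_object(rounds); });
  double t2 = seconds([&] { bench_functions(rounds); });
  assert(wpbuf_obj.len() == wpixput2);
  assert(wpbuf_obj.at(0) == wpbuf2[0]);
  std::printf("object:    %f second\n", t1);
  std::printf("functions: %f second\n", t2);
}
```

Calling run_comparison() from main() with a large round count prints one
timing per variant. Compiling the file with "g++ -O2 -S" instead shows
the generated loops directly, which is the more reliable comparison.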

-- 
Takashi Yano <takashi.yano@nifty.ne.jp>


