This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Re: Benchmarking (was Re: [patch 2/2] Assert leftover cleanups in TRY_CATCH)
- From: Doug Evans <dje at google dot com>
- To: Stan Shebs <stanshebs at earthlink dot net>
- Cc: gdb-patches <gdb-patches at sourceware dot org>
- Date: Wed, 15 May 2013 10:00:52 -0700
- Subject: Re: Benchmarking (was Re: [patch 2/2] Assert leftover cleanups in TRY_CATCH)
- References: <20130507140020 dot GA10070 at host2 dot jankratochvil dot net> <5192D33A dot 3060702 at earthlink dot net>
On Tue, May 14, 2013 at 5:13 PM, Stan Shebs <stanshebs@earthlink.net> wrote:
> On 5/7/13 7:00 AM, Jan Kratochvil wrote:
>
>>
>> target-side condition evaluation is a good idea:
>>
>> time gdb ./loop -ex 'b 4 if i==360000' -ex r -q -ex 'set confirm no' -ex q
>> real 1m11.586s
>>
>> gdbserver :1234 ./loop
>> time gdb ./loop -ex 'target remote localhost:1234' -ex 'b 4 if i==360000' -ex c -q -ex 'set confirm no' -ex q
>> real 0m21.862s
>>
>> "set breakpoint condition-evaluation target" really helps a lot.
>
> This reminds me of something that has been on my mind recently -
> detecting performance regressions with the testsuite.
It's on my todo list.
Got a time machine?
IIRC Redhat had the seeds of something, but it needed more work.
> I added a test for fast tracepoints a while back (tspeed.exp) that also
> went to some trouble to get numbers for fast tracepoint performance,
> although it just reports them; they are not used for pass/fail.
>
> However, if target-side conditionals get worse due to some random
> change, or GDB startup time gets excessive, these are things that we
> know real users care about. On the other hand, this is hard to test
> automatically, and no one wants to hack dejagnu that much.  Maybe an excuse
> to dabble in a more-modern testing framework? Are there good options?
re: dejagnu hacking: depends on what's needed.
Sometimes regressions aren't really noticed unless one is debugging a
really large app.
A one-second slowdown might be attributable to many things, but in a
bigger app it could become minutes, and now we're talking real money.
It's trivial enough to write a program to generate apps (of whatever
size and along whatever axis is useful) for the task at hand.
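Such a generator really is a short script.  A minimal sketch in Python, where the shape of the emitted C code (a chain of trivial functions plus a counting loop like the one in the quoted benchmark) is just one possible axis, and every name here is made up for illustration:

```python
# Sketch of a test-app generator: emits a C program whose size is
# controlled by n_funcs.  The names and structure are illustrative,
# not an existing GDB tool.

def generate_app(n_funcs):
    """Return C source for an app with n_funcs trivial functions."""
    parts = []
    for i in range(n_funcs):
        parts.append("int f%d(int x) { return x + %d; }" % (i, i))
    parts.append("int main(void) {")
    parts.append("  int i, acc = 0;")
    # A loop suitable for hanging a conditional breakpoint on.
    parts.append("  for (i = 0; i < 360000; i++)")
    parts.append("    acc = f%d(acc);" % (n_funcs - 1))
    parts.append("  return acc != 0 ? 0 : 1;")
    parts.append("}")
    return "\n".join(parts) + "\n"

if __name__ == "__main__":
    import sys
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    sys.stdout.write(generate_app(n))
```

Varying n_funcs (or adding more dimensions: compilation units, types, shared libraries) gives a family of apps along whatever axis matters for the benchmark.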
And it's trivial enough to come up with a set of benchmarks (I have an
incomplete set I use for my standard large app benchmark), and a
harness to run them.
IWBN if running the tests didn't take a lot of time but alas some
things only show up at scale.
Plus one needs to run them a sufficient number of times to make the data usable.
Running a benchmark with different sized tests and comparing the
relative times can help.
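A harness along those lines can also be sketched quickly.  The gdb invocation below mirrors the quoted commands, but the binary name, breakpoint, and repeat count are placeholders:

```python
# Sketch of a benchmark harness: time a gdb command line several times
# and summarize, so runs at different app sizes can be compared.
# The specific gdb arguments and binary name are illustrative only.
import statistics
import subprocess
import time

def time_once(argv):
    """Wall-clock one run of argv; returns seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True)
    return time.monotonic() - start

def summarize(times):
    """Reduce repeated timings to (min, median); median resists outliers."""
    return min(times), statistics.median(times)

def bench(argv, repeats=5):
    """Run argv `repeats` times; enough runs to make the data usable."""
    return summarize([time_once(argv) for _ in range(repeats)])

if __name__ == "__main__":
    gdb_cmd = ["gdb", "./loop", "-batch",
               "-ex", "b 4 if i==360000", "-ex", "r"]
    lo, med = bench(gdb_cmd)
    print("min %.2fs  median %.2fs" % (lo, med))
```

Running the same harness over apps of several sizes and comparing the ratios, rather than absolute times, helps separate real scaling problems from machine noise.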
[One thing one could do is, e.g., run gdb under valgrind and use
instruction counts as a proxy for performance.  It has the property of
being deterministic, and with a set of testcases for each benchmark it
could reveal problems.  One can't rely on this alone because it doesn't
measure, e.g., disk/network latency, which can be critical, though I'm
sure one could write a tool to approximate the times.  Going this route
is slower, of course.]
[One also needs to measure memory usage.]
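For the valgrind idea, the instruction count can be pulled from the `summary:` line of callgrind's plain-text output file.  A sketch, where the invocation and output file name are illustrative:

```python
# Sketch: use callgrind's deterministic instruction count (the Ir
# event) as a performance proxy.  callgrind writes a plain-text output
# file containing an "events:" header and a "summary:" count line.
import subprocess

def parse_callgrind_summary(text):
    """Return the first event count (Ir) from callgrind output text."""
    for line in text.splitlines():
        if line.startswith("summary:"):
            return int(line.split()[1])
    raise ValueError("no summary: line found")

def count_instructions(argv, outfile="callgrind.out.bench"):
    """Run argv under callgrind and return its instruction count."""
    subprocess.run(["valgrind", "--tool=callgrind",
                    "--callgrind-out-file=" + outfile] + argv,
                   check=True)
    with open(outfile) as f:
        return parse_callgrind_summary(f.read())
```

Because the count is deterministic, even a small drift between two gdb builds on the same testcase is a meaningful signal, unlike a fraction of a second of wall-clock time.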
Ultimately I'd expect this to be separate from "make check" (or at
least something one has to ask for explicitly).
But we *do* need something and the sooner the better.