This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: testsuite and hardcoded timeouts


William Cohen wrote:

Quentin Barnes wrote:

I mentioned this issue as an aside and asked about it a month ago,
but I don't think it got a response at the time.

In porting the Systemtap testsuite to an embedded ARM platform
(~350 MHz CPU with 64 MB of RAM, running with an NFS root and swap), I
found that many of the existing hardcoded timeout parameters are far too
short.  Several of them I had to at least triple, and some I had to
increase by a larger factor, 6x-15x, to get the tests to pass.

How do we want to deal with this portability problem of hardcoded
timeouts?


There are a few ways I can think of to address this:


1) Let the hardcoded numbers stay, but up them large enough to
  handle the slowest platform we might ever run on.  If that's
  still not slow enough someday, up them some more when the time
  comes.

This has the advantage of simplicity, but can greatly slow down
suite runs on faster processors when tests do get stuck.

2) Ban all standalone hardcoded timeouts, replacing them with an
   expression built from a per-test base constant and a global
   multiplier.

This is not the cleanest approach, because some tests are slow due
to I/O bandwidth or paging while others are slow due to CPU
limitations.  But it does have the advantage that if someone is
having timeout issues, they can raise the multiplier value and rerun
to see if the problem goes away, without having to edit all sorts of
wrapper scripts and tests.
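
In generic Tcl/Expect terms, a rough sketch of what that could look
like in a test file is below.  The variable name timeout_factor and
the STAP_TIMEOUT_FACTOR environment knob are invented for
illustration, not something the testsuite has today:

    # Rough sketch (invented names): a per-test base timeout scaled
    # by a global factor that can be overridden from the environment.
    set timeout_factor 1
    if {[info exists env(STAP_TIMEOUT_FACTOR)]} {
        set timeout_factor $env(STAP_TIMEOUT_FACTOR)
    }

    # instead of:   set timeout 30
    set timeout [expr {30 * $timeout_factor}]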

If we go with a multiplier, the multiplier could be set
automatically by reading /proc/cpuinfo and taking a stab at it based
on the machine's BogoMIPS or MHz.  We'd still need a way for a user
to straightforwardly tweak it manually beyond that.
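
For example, the automatic guess could be a small Tcl helper that
parses BogoMIPS out of /proc/cpuinfo.  The proc name
guess_timeout_factor and the 1000-BogoMIPS reference point below are
arbitrary assumptions, and a manual override like the environment
variable above would still take precedence:

    # Rough sketch: guess a default multiplier from BogoMIPS in
    # /proc/cpuinfo (the 1000-BogoMIPS reference point is arbitrary).
    proc guess_timeout_factor {} {
        set factor 1
        if {![catch {open /proc/cpuinfo r} fd]} {
            while {[gets $fd line] >= 0} {
                if {[regexp -nocase {bogomips\s*:\s*([0-9.]+)} $line -> bmips]} {
                    # Slower machines (lower BogoMIPS) get a larger factor.
                    set factor [expr {int(ceil(1000.0 / $bmips))}]
                    break
                }
            }
            close $fd
        }
        return [expr {$factor < 1 ? 1 : $factor}]
    }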

Unfortunately, I don't understand the Systemtap testsuite framework
yet well enough to make specific suggestions.

Thoughts?

Quentin


Hi Quentin,

I have some machines regularly downloading CVS snapshots of systemtap and running the tests. I have encountered the same problem, particularly on the slow Pentium III machine, and I have increased some of the timeouts as a result. However, the problem is that we don't know how long some of the tests take to run. In addition to the processor speed, the kernel/debuginfo could affect the time required to build/install the tests.

I don't have good solutions to this problem. However, it might be good to start listing the tests that are "too slow." People running a probe might be okay with a script taking a little time to get started, but they might not be so patient when it takes minutes for the script to translate and start running. Running the tests by hand with "-v" to get information about which phases the time is being spent in would be helpful.

-Will

I ran into this issue on s390. When a timeout occurs, the test could simply produce a warning message and restart the timer, allowing the timeout to be restarted, say, 4 or 5 times before finally reporting a failure. If something genuinely breaks, the test will still report a failure, while on a slower system it would still pass. And if a system/test that normally passes with one or two restarts of the timer suddenly starts taking 3 or 4 restarts, we will know that investigation is needed.
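
As a rough illustration of that idea (not existing testsuite code),
an Expect/DejaGnu snippet could look like the following.  The
restart limit, the "all done" pattern, the messages, and the $test
variable are assumptions; exp_continue restarts the expect timeout
by default:

    set timeout 30
    set max_restarts 5
    set restarts 0
    expect {
        -re "all done" { pass $test }
        timeout {
            if {[incr restarts] <= $max_restarts} {
                # Warn and keep waiting; exp_continue resets the timer.
                verbose -log "$test: timeout #$restarts, restarting timer"
                exp_continue
            } else {
                fail "$test (timed out after $max_restarts restarts)"
            }
        }
    }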


--
David Wilder
IBM Linux Technology Center
Beaverton, Oregon, USA dwilder@us.ibm.com
(503)578-3789


