This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



automated way to find functions that we might want to blacklist


Hi,

http://lindi.iki.fi/lindi/systemtap/torture/systemtap-torture.py

is a quick'n'dirty tool that I wrote to figure out why

stap -e 'probe kernel.function("*") {}'

crashes the system. The tool starts from the complete set of functions
and repeatedly divides it into smaller and smaller partitions, depending
on whether probing a given partition crashes the system or not.
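
The partitioning idea can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the code in
systemtap-torture.py: the name narrow_down and the crashes() callback
are hypothetical, and the callback stands in for "probe these functions
and report whether the machine went down". A real run also has to save
its state to disk before every trial, because a crash kills the script
itself.

```python
from typing import Callable, List, Sequence


def narrow_down(
    functions: Sequence[str],
    crashes: Callable[[Sequence[str]], bool],
) -> List[List[str]]:
    """Return the smallest crashing ranges that halving can isolate.

    `crashes` is a hypothetical callback: it must return True when
    probing exactly the given functions brings the system down.
    """
    if not functions or not crashes(functions):
        return []
    if len(functions) == 1:
        return [list(functions)]

    mid = len(functions) // 2
    found = (narrow_down(functions[:mid], crashes)
             + narrow_down(functions[mid:], crashes))

    # If the range crashes but neither half does on its own, the crash
    # needs functions from both halves (or is non-deterministic), so
    # report the whole range instead of silently dropping it.
    return found or [list(functions)]
```

With a simulated callback such as
lambda fs: "c" in fs or ("f" in fs and "g" in fs), the list a..h
narrows down to ["c"] plus the range e..h, since f and g only crash
together and sit in different halves of that range.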

I ran the script with no arguments on an amd64 Xen domU running Debian
wheezy.  The resulting logfile

http://lindi.iki.fi/lindi/systemtap/torture/linux-image-3.0.0-1-amd64_3.0.0-3/torture.log

shows which functions we should consider for the blacklist:

$ grep "([1234] funs) CRASHED" torture.log 
Thu Dec 22 11:48:24 2011 HYPERVISOR_physdev_op (39) .. HYPERVISOR_set_debugreg (41) (3 funs) CRASHED
Thu Dec 22 11:51:09 2011 HYPERVISOR_sched_op (40) .. HYPERVISOR_set_debugreg (41) (2 funs) CRASHED
Thu Dec 22 12:08:41 2011 hash_64 (10907) .. hash_futex (10909) (3 funs) CRASHED
Thu Dec 22 12:09:49 2011 hash_64 (10907) .. hash_64 (10907) (1 funs) CRASHED
Thu Dec 22 12:12:18 2011 hash_ptr (10910) .. hash_walk_next (10912) (3 funs) CRASHED
Thu Dec 22 12:13:28 2011 hash_ptr (10910) .. hash_ptr (10910) (1 funs) CRASHED
Thu Dec 22 13:14:18 2011 native_set_pmd_at (15204) .. native_setup_msi_irqs (15207) (4 funs) CRASHED
Thu Dec 22 13:38:13 2011 native_set_pmd_at (15204) .. native_set_pte (15205) (2 funs) CRASHED
Thu Dec 22 13:40:22 2011 native_set_pte (15205) .. native_set_pte (15205) (1 funs) CRASHED
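
For further processing, the same selection can be done in Python. The
sketch below assumes that every relevant log line has exactly the shape
shown above (first function, its index, "..", last function, its index,
function count, CRASHED); the names CrashRange and crashed_ranges are
made up for this example.

```python
import re
from typing import Iterable, List, NamedTuple


class CrashRange(NamedTuple):
    first: str        # first function in the probed range
    first_index: int  # its index in the full function list
    last: str         # last function in the probed range
    last_index: int
    count: int        # number of functions in the range


# Matches e.g. "hash_64 (10907) .. hash_futex (10909) (3 funs) CRASHED"
LINE_RE = re.compile(
    r"(?P<first>\S+) \((?P<fi>\d+)\) \.\. "
    r"(?P<last>\S+) \((?P<li>\d+)\) "
    r"\((?P<count>\d+) funs\) CRASHED"
)


def crashed_ranges(lines: Iterable[str], max_funs: int = 4) -> List[CrashRange]:
    """Collect CRASHED ranges that contain at most `max_funs` functions."""
    result = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group("count")) <= max_funs:
            result.append(CrashRange(
                match.group("first"), int(match.group("fi")),
                match.group("last"), int(match.group("li")),
                int(match.group("count")),
            ))
    return result
```

Calling crashed_ranges(open("torture.log"), max_funs=1) keeps only the
ranges that were narrowed down to a single function.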

Machine-readable trace that shows full function names (and not just
"..") is also available:

http://lindi.iki.fi/lindi/systemtap/torture/linux-image-3.0.0-1-amd64_3.0.0-3/state.json.bz2

This version does not cope well with non-determinism. If a set of
function probes crashes the system only some of the time, you may need
to run the torture script several times to catch it. If you want to try
the script yourself, here's how I ran it:

1) Install the watchdog package in the domU
2) Set

on_crash = 'restart'

in the Xen domain configuration

3) Run

while true; do
    if ! hping3 --numeric --count 35 --icmp lindi2.lan; then
        xm destroy lindi2
        sleep 3
        xm create /local/xen/lindi2/config
        sleep 120
    fi
done

on dom0 as root.

4) Add

@reboot sleep 55 && /home/lindi/proj/systemtap-torture/systemtap-torture.py

to the crontab of user lindi so that the test continues after each crash.

5) Start "socat UDP-RECV:2346 -" on the computer that should receive
the logs (its address is specified using --report-host).

6) Run

/home/lindi/proj/systemtap-torture/systemtap-torture.py

manually and wait for the system to enter an intense stress test :-)
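
If socat is not installed on the machine that collects the logs in
step 5, a small Python receiver could take its place. This is a sketch
that assumes the reports arrive as plain-text UDP datagrams on port
2346; the function names open_log_socket and read_reports are invented
for the example.

```python
import socket
from typing import Iterator, Optional


def open_log_socket(port: int = 2346, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on the port used with socat in step 5."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_reports(sock: socket.socket,
                 count: Optional[int] = None) -> Iterator[str]:
    """Yield each received datagram as text; stop after `count` if given."""
    received = 0
    while count is None or received < count:
        data, _sender = sock.recvfrom(65535)
        yield data.decode("utf-8", errors="replace")
        received += 1


# Example use (runs until interrupted):
#     for report in read_reports(open_log_socket()):
#         print(report, end="")
```

Keeping the socket setup separate from the read loop makes it possible
to bind first and only then start the test run, so no early reports are
lost.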

-Timo

