This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
automated way to find functions that we might want to blacklist
- From: Timo Juhani Lindfors <timo dot lindfors at iki dot fi>
- To: systemtap at sourceware dot org
- Date: Thu, 22 Dec 2011 14:19:54 +0200
- Subject: automated way to find functions that we might want to blacklist
Hi,
http://lindi.iki.fi/lindi/systemtap/torture/systemtap-torture.py
is a quick'n'dirty tool that I wrote to figure out why
stap -e 'probe kernel.function("*") {}'
crashes the system. The tool starts from the complete set of functions
and divides it into smaller and smaller partitions, depending on whether
each partition crashes the system or not.
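The partitioning idea can be sketched roughly as follows. This is only an
illustration and not the code from systemtap-torture.py: the name
find_crashing and the crashes() callback are hypothetical, and a real test
would have to run stap on the partition and survive a reboot, which the
sketch replaces with a plain predicate. It assumes each crash is caused by a
single function, so a group that only crashes in combination is not reported.

```python
from typing import Callable, List, Sequence


def find_crashing(funcs: Sequence[str],
                  crashes: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the functions that still crash when probed on their own.

    `crashes` is a hypothetical callback that reports whether probing the
    given partition brings the system down.
    """
    if not funcs or not crashes(funcs):
        return []                 # the whole partition is safe, stop here
    if len(funcs) == 1:
        return [funcs[0]]         # a single function that crashes by itself
    mid = len(funcs) // 2
    # Both halves are examined, since more than one function may be bad.
    return (find_crashing(funcs[:mid], crashes)
            + find_crashing(funcs[mid:], crashes))


if __name__ == "__main__":
    # Simulated run: two made-up function names stand in for bad probes.
    all_funcs = ["f%d" % i for i in range(16)]
    bad = {"f3", "f10"}
    print(find_crashing(all_funcs, lambda part: any(f in bad for f in part)))
```

Safe partitions are dropped after a single test, so the number of test runs
stays far below the number of functions when only a few of them are bad.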
I ran the script with no arguments on an amd64 Xen domU running Debian
wheezy. The resulting log file
http://lindi.iki.fi/lindi/systemtap/torture/linux-image-3.0.0-1-amd64_3.0.0-3/torture.log
shows which functions we should consider for the blacklist:
$ grep "([1234] funs) CRASHED" torture.log
Thu Dec 22 11:48:24 2011 HYPERVISOR_physdev_op (39) .. HYPERVISOR_set_debugreg (41) (3 funs) CRASHED
Thu Dec 22 11:51:09 2011 HYPERVISOR_sched_op (40) .. HYPERVISOR_set_debugreg (41) (2 funs) CRASHED
Thu Dec 22 12:08:41 2011 hash_64 (10907) .. hash_futex (10909) (3 funs) CRASHED
Thu Dec 22 12:09:49 2011 hash_64 (10907) .. hash_64 (10907) (1 funs) CRASHED
Thu Dec 22 12:12:18 2011 hash_ptr (10910) .. hash_walk_next (10912) (3 funs) CRASHED
Thu Dec 22 12:13:28 2011 hash_ptr (10910) .. hash_ptr (10910) (1 funs) CRASHED
Thu Dec 22 13:14:18 2011 native_set_pmd_at (15204) .. native_setup_msi_irqs (15207) (4 funs) CRASHED
Thu Dec 22 13:38:13 2011 native_set_pmd_at (15204) .. native_set_pte (15205) (2 funs) CRASHED
Thu Dec 22 13:40:22 2011 native_set_pte (15205) .. native_set_pte (15205) (1 funs) CRASHED
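If you prefer to post-process the log in a script instead of grep, something
like the following could pull out the functions that crash on their own. It
is a sketch that assumes every relevant log line has exactly the layout of
the lines quoted above; the names LINE_RE and single_function_crashes are
made up for this example.

```python
import re

# Layout assumed from the quoted log lines:
#   <date> <first> (<idx>) .. <last> (<idx>) (<n> funs) <STATUS>
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"(?P<first>\S+) \((?P<first_idx>\d+)\) \.\. "
    r"(?P<last>\S+) \((?P<last_idx>\d+)\) "
    r"\((?P<count>\d+) funs\) (?P<status>\w+)$"
)


def single_function_crashes(lines):
    """Return function names from one-function partitions marked CRASHED."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("status") == "CRASHED" and int(m.group("count")) == 1:
            found.append(m.group("first"))
    return found


if __name__ == "__main__":
    sample = [
        "Thu Dec 22 12:08:41 2011 hash_64 (10907) .. hash_futex (10909) (3 funs) CRASHED",
        "Thu Dec 22 12:09:49 2011 hash_64 (10907) .. hash_64 (10907) (1 funs) CRASHED",
        "Thu Dec 22 13:40:22 2011 native_set_pte (15205) .. native_set_pte (15205) (1 funs) CRASHED",
    ]
    print(single_function_crashes(sample))
```

Lines that do not match the assumed layout are skipped silently.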
A machine-readable trace that shows full function names (and not just
"..") is also available:
http://lindi.iki.fi/lindi/systemtap/torture/linux-image-3.0.0-1-amd64_3.0.0-3/state.json.bz2
This version does not cope with non-determinism very well. If a set of
function probes crashes the system only sometimes, you may need to run
the torture script multiple times to catch it. If you want to try the
script yourself, here's how I ran it:
1) Install the watchdog package in the domU
2) Use
on_crash = 'restart'
in the Xen domain configuration
3) Run
while true; do
    if ! hping3 --numeric --count 35 --icmp lindi2.lan; then
        xm destroy lindi2
        sleep 3
        xm create /local/xen/lindi2/config
        sleep 120
    fi
done
on dom0 as root.
4) Add
@reboot sleep 55 && /home/lindi/proj/systemtap-torture/systemtap-torture.py
to the crontab of user lindi so that the test continues after each crash.
5) Start "socat UDP-RECV:2346 -" on the computer that should receive
the logs (specified using --report-host).
6) Run
/home/lindi/proj/systemtap-torture/systemtap-torture.py
manually and wait for the system to enter an intense stress test :-)
-Timo