This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
WIP: subsystem transition monitoring script
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- To: systemtap at sources dot redhat dot com
- Cc: Eric Sandeen <esandeen at redhat dot com>
- Date: Wed, 12 May 2010 17:11:39 -0400
- Subject: WIP: subsystem transition monitoring script
Hi -
Here is another midsize script I've been hacking on. This one is a
statistical subsystem callgraph. Here, function probe patterns may be
designated as defining a subsystem, where we only want to know the
inter-subsystem (vs. intra-subsystem) call transitions. This script
prints a trace record when an inter-subsystem call occurs, and at its
conclusion shows some coarse little transition matrix data.
This may be useful to diagnose performance problems where the
distribution of latencies between subsystems is interesting.
# stap -vt crossings.stp \
'process("cat").function("*")' \
'process("/lib/libc.so.6").function("*@libio/*.c")' \
'module("ext4").function("*")' \
-x 2222
The above designates four subsystems, two in user space and two in
kernel space, and focuses the probes on process 2222 and its
descendants. This is a varargs script, so you can specify up to 9
probe point patterns for up to 9 subsystems.
As user space makes library system calls, I/O eventually hits the disk
device, and we get transitions such as these: (please widen your xterm
to enjoy):
[...]
cat[5411] 294352 c. later: subsystem 1 -> 2 (process("/bin/cat").function("safe_read@/usr/src/debug/coreutils-8.4/lib/safe-read.c:56").return -> process("/lib/libc-2.11.1.so").function("_IO_fflush@/usr/src/debug/glibc-2.11.1/libio/iofflush.c:33").call)
cat[5411] 61730528 c. later: subsystem 2 -> 3 (process("/lib/libc-2.11.1.so").function("_IO_new_file_fopen@/usr/src/debug/glibc-2.11.1/libio/fileops.c:252").call -> module("ext4").function("ext4_check_acl@fs/ext4/acl.c:240").call)
cat[5411] 206488 c. later: subsystem 3 -> 2 (module("ext4").function("ext4_file_open@fs/ext4/file.c:101").return -> process("/lib/libc-2.11.1.so").function("_IO_link_in@/usr/src/debug/glibc-2.11.1/libio/genops.c:102").call)
cat[5411] 1874320 c. later: subsystem 2 -> 3 (process("/lib/libc-2.11.1.so").function("_IO_file_stat@/usr/src/debug/glibc-2.11.1/libio/fileops.c:1222").call -> module("ext4").function("ext4_getattr@fs/ext4/inode.c:5381").call)
cat[5411] 36040 c. later: subsystem 3 -> 2 (module("ext4").function("ext4_getattr@fs/ext4/inode.c:5381").return -> process("/lib/libc-2.11.1.so").function("_IO_file_stat@/usr/src/debug/glibc-2.11.1/libio/fileops.c:1222").return)
[...]
A summary report like this is printed when the script is interrupted:
subsystems:
1 process("cat").function("*")
2 process("/lib/libc.so.6").function("*@libio/*.c")
3 module("ext4").function("*")
subsystem transition counts / times:
3 -> 1, 21 times, after 132232/305870/1723584 min/avg/max c.
1 -> 3, 19 times, after 2249656/57828440/213353080 min/avg/max c.
2 -> 3, 11 times, after 629152/774533492/8438247296 min/avg/max c.
3 -> 2, 9 times, after 35440/10703590/87460552 min/avg/max c.
1 -> 2, 5 times, after 57064/225859/358672 min/avg/max c.
2 -> 1, 3 times, after 730984/1100613/1366288 min/avg/max c.
0 -> 2, 2 times, after 0/0/0 min/avg/max c.
- FChE
------------------------------------------------------------------------
#! /usr/bin/stap
global residence # [thread-id] = subsystem-number
global door # [thread-id] = pp-string
global arrival # [thread-id] = time
global transitions[100] # [subsys-from,subsys-to] <<< residency-time
function time() { return get_cycles() /* or gettimeofday_us() */ }
function timename() { return "c." }
function entered (subsys) {
// skip
t = tid()
if (target() && ! target_set_pid (t)) return 0;
now = time()
pf = pp() // probefunc() or symname(caller_addr())
try {
if (! ([t] in residence))
arrival[t] = now
/* Did we cross subsystem boundaries? */
if (residence[t] != subsys) {
printf("%s[%d] %d %s later: subsystem %d -> %d (%s -> %s)\n", execname(), t,
now-arrival[t], timename(), residence[t], subsys, door[t], pf)
transitions[residence[t],subsys] <<< now-arrival[t]
residence[t] = subsys
arrival[t] = now
}
/* Always update the door[]; at next report, it'll show up as the
"exit" door, where one subsystem called into another one. */
door[t] = pf
} catch {
num_errors_caught ++
// don't delete transitions - it's small anyway
delete residence
delete arrival
delete door
}
}
global num_errors_caught
probe end,error {
log("subsystems:")
for (i=1;i<argc;i++)
printf("%d %s\n", i, argv[i])
log("subsystem transition counts / times:")
foreach ([a,b] in transitions-) {
printf("%d -> %d, %d times, after %d/%d/%d min/avg/max %s\n", a, b,
@count(transitions[a,b]),
@min(transitions[a,b]),
@avg(transitions[a,b]),
@max(transitions[a,b]),
timename())
}
if (num_errors_caught)
warn(sprint("Number of errors / log resets: %d\n", num_errors_caught))
log("")
}
%( $# == 0 %? probe begin {
error("supply some FOO.function("BAR") probe point patterns to define subsystems")
/* dummy */ entered (0)}
%)
%( $# >= 1 %? probe $1.call, $1.return { entered (1) } %)
%( $# >= 2 %? probe $2.call, $2.return { entered (2) } %)
%( $# >= 3 %? probe $3.call, $3.return { entered (3) } %)
%( $# >= 4 %? probe $4.call, $4.return { entered (4) } %)
%( $# >= 5 %? probe $5.call, $5.return { entered (5) } %)
%( $# >= 6 %? probe $6.call, $6.return { entered (6) } %)
%( $# >= 7 %? probe $7.call, $7.return { entered (7) } %)
%( $# >= 8 %? probe $8.call, $8.return { entered (8) } %)
%( $# >= 9 %? probe $9.call, $9.return { entered (9) } %)