This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

WIP: subsystem transition monitoring script


Hi -

Here is another midsize script I've been hacking on.  This one is a
statistical subsystem callgraph.  Here, function probe patterns may be
designated as defining a subsystem, where we only want to know the
inter-subsystem (vs.  intra-subsystem) call transitions.  This script
prints a trace record when an inter-subsystem call occurs, and at its
conclusion shows some coarse little transition matrix data.

This may be useful to diagnose performance problems where the
distribution of latencies between subsystems is interesting.


# stap -vt crossings.stp \
 'process("cat").function("*")' \
 'process("/lib/libc.so.6").function("*@libio/*.c")'  \ 
 'module("ext4").function("*")' \
 -x 2222

The above designates four subsystems, two in user space and two in
kernel space, and focuses the probes on process 2222 and its
descendants.  This is a varargs script, so you can specify up to 9
probe point patterns for up to 9 subsystems.

As user space makes library system calls, I/O eventually hits the disk
device, and we get transitions such as these: (please widen your xterm
to enjoy):

[...]
cat[5411] 294352 c. later: subsystem 1 -> 2 (process("/bin/cat").function("safe_read@/usr/src/debug/coreutils-8.4/lib/safe-read.c:56").return -> process("/lib/libc-2.11.1.so").function("_IO_fflush@/usr/src/debug/glibc-2.11.1/libio/iofflush.c:33").call)
cat[5411] 61730528 c. later: subsystem 2 -> 3 (process("/lib/libc-2.11.1.so").function("_IO_new_file_fopen@/usr/src/debug/glibc-2.11.1/libio/fileops.c:252").call -> module("ext4").function("ext4_check_acl@fs/ext4/acl.c:240").call)
cat[5411] 206488 c. later: subsystem 3 -> 2 (module("ext4").function("ext4_file_open@fs/ext4/file.c:101").return -> process("/lib/libc-2.11.1.so").function("_IO_link_in@/usr/src/debug/glibc-2.11.1/libio/genops.c:102").call)
cat[5411] 1874320 c. later: subsystem 2 -> 3 (process("/lib/libc-2.11.1.so").function("_IO_file_stat@/usr/src/debug/glibc-2.11.1/libio/fileops.c:1222").call -> module("ext4").function("ext4_getattr@fs/ext4/inode.c:5381").call)
cat[5411] 36040 c. later: subsystem 3 -> 2 (module("ext4").function("ext4_getattr@fs/ext4/inode.c:5381").return -> process("/lib/libc-2.11.1.so").function("_IO_file_stat@/usr/src/debug/glibc-2.11.1/libio/fileops.c:1222").return)
[...]

A summary report like this is printed when the script is interrupted:

subsystems:
1 process("cat").function("*")
2 process("/lib/libc.so.6").function("*@libio/*.c")
3 module("ext4").function("*")
subsystem transition counts / times:
3 -> 1, 21 times, after 132232/305870/1723584 min/avg/max c.
1 -> 3, 19 times, after 2249656/57828440/213353080 min/avg/max c.
2 -> 3, 11 times, after 629152/774533492/8438247296 min/avg/max c.
3 -> 2, 9 times, after 35440/10703590/87460552 min/avg/max c.
1 -> 2, 5 times, after 57064/225859/358672 min/avg/max c.
2 -> 1, 3 times, after 730984/1100613/1366288 min/avg/max c.
0 -> 2, 2 times, after 0/0/0 min/avg/max c.


- FChE


------------------------------------------------------------------------
#! /usr/bin/stap

global residence # [thread-id] = subsystem-number
global door # [thread-id] = pp-string
global arrival # [thread-id] = time

global transitions[100] # [subsys-from,subsys-to] <<< residency-time


function time() { return get_cycles() /* or gettimeofday_us() */ }
function timename() { return "c." }


function entered (subsys) {
   // skip 
   t = tid()
   if (target() && ! target_set_pid (t)) return 0;

   now = time()
   pf = pp() // probefunc() or symname(caller_addr())
   
   try {
      if (! ([t] in residence))
         arrival[t] = now

      /* Did we cross subsystem boundaries? */
      if (residence[t] != subsys) {
         printf("%s[%d] %d %s later: subsystem %d -> %d (%s -> %s)\n", execname(), t,
                now-arrival[t], timename(), residence[t], subsys, door[t], pf)
         transitions[residence[t],subsys] <<< now-arrival[t]
         residence[t] = subsys
         arrival[t] = now      
      }

      /* Always update the door[]; at next report, it'll show up as the
      "exit" door, where one subsystem called into another one. */
      door[t] = pf

   } catch {
      num_errors_caught ++
      // don't delete transitions - it's small anyway
      delete residence
      delete arrival
      delete door
   }
}

global num_errors_caught

probe end,error {
   log("subsystems:")
   for (i=1;i<argc;i++)
      printf("%d %s\n", i, argv[i])
   log("subsystem transition counts / times:")
   foreach ([a,b] in transitions-) {
      printf("%d -> %d, %d times, after %d/%d/%d min/avg/max %s\n", a, b,
                      @count(transitions[a,b]), 
                      @min(transitions[a,b]), 
                      @avg(transitions[a,b]), 
                      @max(transitions[a,b]),
                      timename())
   }
   if (num_errors_caught)
      warn(sprint("Number of errors / log resets: %d\n", num_errors_caught))
   log("")
}


%( $# == 0 %? probe begin {
   error("supply some FOO.function("BAR") probe point patterns to define subsystems") 
   /* dummy */ entered (0)} 
   %)
%( $# >= 1 %? probe $1.call, $1.return { entered (1) } %)
%( $# >= 2 %? probe $2.call, $2.return { entered (2) } %)
%( $# >= 3 %? probe $3.call, $3.return { entered (3) } %)
%( $# >= 4 %? probe $4.call, $4.return { entered (4) } %)
%( $# >= 5 %? probe $5.call, $5.return { entered (5) } %)
%( $# >= 6 %? probe $6.call, $6.return { entered (6) } %)
%( $# >= 7 %? probe $7.call, $7.return { entered (7) } %)
%( $# >= 8 %? probe $8.call, $8.return { entered (8) } %)
%( $# >= 9 %? probe $9.call, $9.return { entered (9) } %)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]