This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



static instrumentation for kernel


Hi -

Here is one set of ideas about inserting static instrumentation points
into the kernel.  It predates but is related to the discussion this
summer: <http://sources.redhat.com/ml/systemtap/2005-q3/msg00122.html>
It has what is perhaps an interesting combination of features.  It is
simple, architecture-neutral, does not require nonlocal artifacts like
per-probe declarations, and hopefully is not that slow.  There are
certainly some shortcomings and oversights - please be critical.


The code to be inserted into kernel sources would be a plain macro
call such as:

   SYSTEMTAP_PROBE(name) 
   SYSTEMTAP_PROBE_N(name,arg1) // arg1 castable to int64_t numeric
   SYSTEMTAP_PROBE_NS(name,arg1,arg2) // arg2 castable to char* string

The name should be unique within the function.  As you see, arguments
can be passed, encoding the type/arity into the macro name.  Possibly
some super clever typeof() conditionals can make that implicit.
What these macros would expand to is the following.  We'd generate a
menu of these for reasonable arities/type combinations and shove them
into a kernel header.

#define SYSTEMTAP_PROBE(name) \
   do { \
       static void (*__systemtap_probe_##name)(void); \
       if (unlikely(__systemtap_probe_##name)) \
           (__systemtap_probe_##name) ();  \
   } while (0)
#define SYSTEMTAP_PROBE_NS(name,arg1,arg2) \
   do { \
       static void (*__systemtap_probe_ns_##name)(int64_t, const char*); \
       if (unlikely(__systemtap_probe_ns_##name)) \
           (__systemtap_probe_ns_##name) ((int64_t)(arg1), \
                                          (const char *)(arg2));  \
   } while (0)

As you see, the gist of it is a conditional call through a function
pointer, where the pointer is in a static variable.  Its name is
stylized: it encodes the probe name, and its parameter arity/type
signature.  (It might need some annotations to make sure the compiler
doesn't elide it, so that it has a convenient alignment, etc.)
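
As a rough illustration of what such annotations might look like, here
is a userspace-compilable sketch, not a tested kernel patch.  The
choice of annotations is an assumption: "volatile" forces a real load
at every probe hit (a compiler that sees no store to a function-local
static could otherwise fold the test to false), and the GCC/Clang
"used" and "aligned" attributes keep the slot emitted and word-aligned.
The unlikely() definition stands in for the kernel's, the
SYSTEMTAP_PROBE_N body is extrapolated from the two macros above, and
checked_add is a hypothetical instrumented function.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's unlikely() branch hint. */
#ifndef unlikely
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/*
 * Assumed annotations (GCC/Clang extensions), not part of the proposal:
 *   volatile - reload the slot from memory at every probe hit
 *   used     - emit the variable even though nothing appears to write it
 *   aligned  - keep the slot pointer-aligned so one store can update it
 * The slot is read once into a local, so the NULL test and the call
 * always see the same value.
 */
#define SYSTEMTAP_PROBE_N(name, arg1) \
    do { \
        static void (*volatile __systemtap_probe_n_##name)(int64_t) \
            __attribute__((used, aligned(sizeof(void *)))); \
        void (*__stp_fn)(int64_t) = __systemtap_probe_n_##name; \
        if (unlikely(__stp_fn)) \
            __stp_fn((int64_t)(arg1)); \
    } while (0)

/* Hypothetical instrumented function.  The probe is dormant here, so it
 * costs one load, one test and one not-taken branch. */
int64_t checked_add(int64_t a, int64_t b)
{
    int64_t sum = a + b;
    SYSTEMTAP_PROBE_N(sum_ready, sum);
    return sum;
}
```

With the slot left NULL, checked_add behaves exactly as it would
without the probe.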

Normally, the variable is NULL, so a dormant probe costs a NULL check
of a memory word, plus a likely conditional jump over the function
call.  When a probe is activated, systemtap arranges to overwrite the
NULL with the entry address of a probe handler function.  (This would
mean no sharing of a static probe point between systemtap sessions for
now.)  When the kernel trips across an activated probe point, it just
does the obvious: calls into a function in the systemtap module.  The
indirect call would be somewhat slower but much simpler than a
djprobe, and much faster than a kprobe.  It would be great if
someone volunteered to microbenchmark this macro family.
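
The activate/fire/deactivate cycle described above can be mimicked in
userspace roughly as follows.  This is only a sketch under stated
assumptions: the slot is hoisted to file scope so the demo can reach it
by name (in the proposal it is function-local and located through the
symbol table), and demo_handler, probed_function, demo_activate,
demo_deactivate, demo_run and demo_last_arg are all hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*probe_n_fn)(int64_t);

/* Hoisted to file scope only so this demo can reach the slot by name. */
static probe_n_fn volatile __systemtap_probe_n_demo;

static int64_t hits;      /* number of handler invocations */
static int64_t last_arg;  /* argument seen by the most recent invocation */

/* Hypothetical handler, standing in for a function in the systemtap module. */
static void demo_handler(int64_t arg)
{
    hits++;
    last_arg = arg;
}

/* Probe site: conditional indirect call through the slot. */
static void probed_function(int64_t value)
{
    probe_n_fn fn = __systemtap_probe_n_demo;   /* single load */
    if (__builtin_expect(fn != NULL, 0))
        fn(value);
}

/* Session startup and shutdown: overwrite NULL, then restore it. */
void demo_activate(void)   { __systemtap_probe_n_demo = demo_handler; }
void demo_deactivate(void) { __systemtap_probe_n_demo = NULL; }

/* Hits the probe while dormant, active, and dormant again;
 * returns the number of times the handler ran. */
int64_t demo_run(void)
{
    hits = 0;
    probed_function(1);    /* dormant: nothing happens */
    demo_activate();
    probed_function(42);   /* active: handler runs once */
    demo_deactivate();
    probed_function(7);    /* dormant again */
    return hits;
}

int64_t demo_last_arg(void) { return last_arg; }
```

Only the middle call reaches the handler.  A real implementation would
also have to make sure no CPU is still inside the handler when the
module is unloaded, which this sketch does not attempt.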

OK, now the script side.  Systemtap would support a new family of
probe points:

   probe kernel.probe("name") { print ($arg1) }
   probe module("foo").probe("name") { ... same ... }

with the "name" portion being optionally annotated with enough
function / compilation-unit identifying suffixes to make it unique.
Incoming arguments from the macro calls would be mapped to script
variables named $arg1 etc.  The probe handler is otherwise completely
normal, and would operate under the same sorts of constraints that a
kprobe handler does (atomicity, limited runtime, etc.).

Finally, the translator side.  When it encounters these ".probe()"
probe points, systemtap would look through the symbol table for the
referenced kernel/module, searching for those static variables with
the stylized names.  It stashes away the addresses of those variables
for setting/clearing during session startup/shutdown.  Because the
arity/type information is encoded in the stylized names, it can
generate interface functions with exactly matching signatures for the
static function pointer.  It can emit code to safely copy the incoming
parameters to statically typed pseudo-target variables.  Automagically
type-safe.
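
To make the name decoding concrete, here is one way the translator
might split a stylized symbol into its signature and probe name.  It is
a sketch only: stp_decode_symbol is a hypothetical helper, the
signature alphabet is limited to the "n" and "s" letters used above,
and stripping a trailing ".NNNN" assumes the compiler decorates
function-local statics that way (as GCC commonly does).  Note that the
naming scheme as written is ambiguous: an argument-less probe named
"n_x" and a one-number probe named "x" would both produce
__systemtap_probe_n_x, so a real encoding would probably want a
distinct separator.

```c
#include <stddef.h>
#include <string.h>

#define STP_PREFIX "__systemtap_probe_"

/*
 * Decode a stylized symbol into its signature ("", "n", "ns", ...) and
 * probe name.  Returns 0 on success, -1 if the symbol is not a probe
 * slot or an output buffer is too small.
 */
int stp_decode_symbol(const char *sym, char *sig, size_t sig_len,
                      char *name, size_t name_len)
{
    size_t plen = strlen(STP_PREFIX);
    if (strncmp(sym, STP_PREFIX, plen) != 0)
        return -1;
    const char *rest = sym + plen;

    /* A leading run of 'n'/'s' followed by '_' is taken as the signature;
     * anything else is treated as an argument-less probe. */
    size_t nsig = strspn(rest, "ns");
    if (nsig > 0 && rest[nsig] == '_' && rest[nsig + 1] != '\0') {
        if (nsig >= sig_len)
            return -1;
        memcpy(sig, rest, nsig);
        sig[nsig] = '\0';
        rest += nsig + 1;
    } else {
        if (sig_len == 0)
            return -1;
        sig[0] = '\0';
    }

    /* Assumption: drop a compiler-added suffix such as ".1234". */
    size_t nlen = strcspn(rest, ".");
    if (nlen == 0 || nlen >= name_len)
        return -1;
    memcpy(name, rest, nlen);
    name[nlen] = '\0';
    return 0;
}
```

From the decoded signature the translator could then pick the matching
handler prototype, for example (int64_t, const char *) for "ns".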


That's it.  Please let me know if the above is unclear or faulty.


- FChE

