This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



User-space probes: Plan B+


Here's where we stand on user-space probes (uprobes).  The intent of
uprobes is to enable application developers to create low-overhead,
dynamic instrumentation for their apps, with uprobes-based
instrumentation interoperating usefully, as needed, with kprobes-based
instrumentation.  Comments are welcome.

Recent History
--------------
Last spring, Prasanna Panchamukhi offered up a kernel-only approach,
where instrumentation would be coded as a kernel module, a la kprobes.
This performed well (e.g. 1 usec per probepoint hit on my Pentium M),
but we got bad reviews on such things as the kernel-only approach
and the per-executable tracing (e.g., hooking read_page(s)).

I tried an approach based on ptrace, with no kernel enhancements, but
it lacked certain necessary features (e.g., requirements 2-5 below),
probe overhead was 12-15x worse than with Prasanna's approach, and I
couldn't get it to work when probing multiple processes.  (Frank
Eigler independently suggested this approach and termed it "Plan B
from outer space.")

While I was stumped trying to make Plan B work, Roland McGrath made
utrace available to us.  We looked this over as we found the time,
and it looked promising.

There has been much debate within the kprobes teams about the proper
programming model to support.  Discussions at OLS didn't yield many
new ideas, let alone consensus.

The Current Approach: Overview
------------------------------
The approach we are now coding can be summarized as follows.  (Okay,
it's not much like Plan B, but B+ sounds better than C.)

a. A system-call API that is an alternative to ptrace, provides
better support for probepoints and return probes, and exploits all
the process-lifetime events made accessible by utrace.

b. The "tracer" process detects events (e.g., probe hits) by polling
rather than catching SIGCHLD signals.

c. Hooks to allow kernel-mode instrumentation to cooperate with
user-mode "tracer" processes.

Here are the requirements we will satisfy with this approach.

0. Per-process (not per-executable) tracing.

1. Instrumentation can be coded entirely as a user-space app...

2. ... but in situations where performance is critical, uprobes can
run a named kernel handler without waking up the tracer process.

3. A user-mode tracer can invoke a previously registered kernel-mode
handler, so we have simple and efficient communication between user-
and kernel-mode instrumentation.

4. Multiple tracer processes can trace the same tracee.

5. As needed, we can "pre-define" a set of useful kernel handlers.

6. Uprobes can be easily extended (exploiting utrace) to support
notifying the tracer of non-probepoint events in the probee,
such as signals and system calls.

7. The user API should be easier to use than the ptrace API.

8. Handlers run in process context -- the tracee's context (see
requirement 2) or the tracer's context while the tracee is stopped
(see requirement 3).

A typical tracer app would do the following:

- Call uprobe_register() to establish a probepoint and be notified
(or run a kernel handler) when the probepoint is hit.

- Call uprobe_poll() repeatedly to poll for, and handle, events.
(A tracing app would have to spawn multiple threads to trace
multiple processes.)

- Whenever appropriate, call uprobe_run_khandler() to interoperate
with kernel-side instrumentation.

- Call uprobe_unregister() to cancel uprobes.
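
To make the sequence above concrete, here is a rough sketch of the
tracer loop in C.  It is only an illustration: the four uprobe_*()
functions are user-space mocks written for this example, and their
argument lists, the uprobe_event layout, the "count_hit" handler name,
and the pid and address are all assumptions, not the interface
described in uapi.txt.  The real uprobe_poll() would presumably wait
for events; the mock simply reports three hits and then returns 0 so
that the loop terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical event record; the real layout is not given in this note. */
struct uprobe_event {
    int probe_id;
    unsigned long vaddr;
};

/* Mock state standing in for the kernel and the tracee. */
static unsigned long mock_probe_vaddr;  /* 0 means no probe registered */
static int mock_pending_hits = 3;       /* the tracee "hits" the probe 3 times */
static int mock_khandler_runs;          /* how often a kernel handler was run */

/* Mock: establish a probepoint; returns a probe id, or -1 on failure. */
static int uprobe_register(int pid, unsigned long vaddr)
{
    (void)pid;
    if (vaddr == 0 || mock_probe_vaddr != 0)
        return -1;
    mock_probe_vaddr = vaddr;
    return 0;
}

/* Mock: returns 1 and fills *ev if an event is pending, else 0. */
static int uprobe_poll(int pid, struct uprobe_event *ev)
{
    (void)pid;
    if (mock_probe_vaddr == 0 || mock_pending_hits == 0)
        return 0;
    mock_pending_hits--;
    ev->probe_id = 0;
    ev->vaddr = mock_probe_vaddr;
    return 1;
}

/* Mock: invoke a named, previously registered kernel handler. */
static int uprobe_run_khandler(const char *name, void *data)
{
    (void)data;
    if (name == NULL)
        return -1;
    mock_khandler_runs++;
    return 0;
}

/* Mock: cancel the probe. */
static int uprobe_unregister(int probe_id)
{
    if (probe_id != 0 || mock_probe_vaddr == 0)
        return -1;
    mock_probe_vaddr = 0;
    return 0;
}

/* Register, poll and handle events, then unregister.
 * Returns the number of probe hits handled, or -1 on failure. */
int trace_process(int pid, unsigned long vaddr)
{
    struct uprobe_event ev;
    int hits = 0;
    int id = uprobe_register(pid, vaddr);

    if (id < 0)
        return -1;
    while (uprobe_poll(pid, &ev) > 0) {
        assert(ev.probe_id == id);
        printf("pid %d: probe %d hit at %#lx\n", pid, ev.probe_id, ev.vaddr);
        hits++;
        /* Hand the event to kernel-side instrumentation. */
        if (uprobe_run_khandler("count_hit", &ev) != 0)
            break;
    }
    uprobe_unregister(id);
    return hits;
}
```

Calling trace_process(1234, 0x400500UL) from a main() drives the mock
through all four steps.  A tracer watching several processes would run
one such loop per thread, as noted above.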

Apart from implementing kernel-side support for uprobes, the only
addition to the kernel API is a register_khandler() function that takes
a name, handler, and access-permission info.  (The handler takes,
as optional args, pointers to a uprobe object and an arbitrary,
user-defined data area.)
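
As a rough illustration of that kernel-side piece, the sketch below
mocks a named-handler table in ordinary user-space C.  Only the name
register_khandler() and the general shape of its arguments (name,
handler, access-permission info) come from the description above.  The
exact types, the fixed-size table, the run_khandler() lookup helper,
the count_hit handler, and the use of file-mode-style permission bits
are assumptions made for the example, and no permission check is
actually performed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct uprobe;  /* opaque stand-in for the kernel's uprobe object */

/* Handler signature assumed here: a uprobe pointer and a data area,
 * either of which may be NULL. */
typedef void (*khandler_fn)(struct uprobe *u, void *data);

struct khandler {
    const char *name;
    khandler_fn fn;
    unsigned int mode;  /* hypothetical access-permission bits */
};

#define MAX_KHANDLERS 16
static struct khandler khandlers[MAX_KHANDLERS];
static int n_khandlers;

/* Register a handler under a unique name.  Returns 0, or -1 on error. */
int register_khandler(const char *name, khandler_fn fn, unsigned int mode)
{
    if (name == NULL || fn == NULL || n_khandlers == MAX_KHANDLERS)
        return -1;
    for (int i = 0; i < n_khandlers; i++)
        if (strcmp(khandlers[i].name, name) == 0)
            return -1;  /* name already taken */
    khandlers[n_khandlers].name = name;
    khandlers[n_khandlers].fn = fn;
    khandlers[n_khandlers].mode = mode;
    n_khandlers++;
    return 0;
}

/* Hypothetical helper: look a handler up by name and run it, roughly
 * what the kernel might do on behalf of uprobe_run_khandler(). */
int run_khandler(const char *name, struct uprobe *u, void *data)
{
    if (name == NULL)
        return -1;
    for (int i = 0; i < n_khandlers; i++) {
        if (strcmp(khandlers[i].name, name) == 0) {
            khandlers[i].fn(u, data);
            return 0;
        }
    }
    return -1;  /* no such handler */
}

/* Example handler: bump a counter passed in through the data area. */
static void count_hit(struct uprobe *u, void *data)
{
    (void)u;
    if (data != NULL)
        ++*(unsigned long *)data;
}

/* Register count_hit, run it twice by name, and return the count. */
int khandler_demo(void)
{
    unsigned long hits = 0;

    if (register_khandler("count_hit", count_hit, 0600) != 0)
        return -1;
    assert(run_khandler("no_such_handler", NULL, NULL) == -1);
    run_khandler("count_hit", NULL, &hits);
    run_khandler("count_hit", NULL, &hits);
    return (int)hits;
}
```

Looking handlers up by name is what would let a user-mode tracer refer
to kernel-side code it did not load itself, including any "pre-defined"
handlers (requirement 5).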

User API
--------
A summary of the user-side API is attached.

Jim


Attachment: uapi.txt
Description: Text document

