This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: user-space probes -- plan B from outer space


On Tue, 2006-06-06 at 12:07, Frank Ch. Eigler wrote:
> Hi -
> 
> Here is an outline of how systemtap might support user-space probes,
> even in the absence of kernel-based user kprobes.  This is a "plan B"
> only, a desperate stopgap until lkml sees the light.  Maybe "plan Z"
> is more appropriate, considering the limitations I'm about to outline.

I'm just now prototyping something very much like what you've
described.  See below for more info.

> 
> The idea is to support limited systemtap scripts that refer only to
> user-space probe targets such as existing processes.  These scripts
> would be translated to a user-space probe program instead of a kernel
> probe module.

I was thinking a user-mode (instrumentation) program + a kernel module
that defines handlers that could be invoked from the instrumentation
program.  The latter (which requires kernel enhancements) is necessary
only for convenient & efficient coordination of user-space and
kernel-space instrumentation.  (But that's what we're after, right?)

> 
> Probes would be specified with a probe point syntax such as:
> 
>    user.process(233).statement(0xfeedface)
>    user("fche").process("/bin/vi").function("*init*")
> 
> Instead of kprobes of a probe module, this probe program would use
> ptrace to insert breakpoints into any target processes,

Got that running, although the API needs to be generalized.

> perhaps using
> code from RDA or GDB.  Given the process-id or process name, systemtap
> should be able to locate the necessary debugging information at
> translation time.  When probes are hit, the probe process would run
> the compiled probe handlers in much the same way as now.  Access to
> $target vars should be possible.  The runtime code would have to have
> a new variant to use some user-level facility (plain pipes?)  to
> communicate with the front-end.

I haven't tackled communication with the front end yet.

> 
> 
> Q: Wouldn't this be slow?
> A: Oh yes, quite.  Several ptrace context-switch round-trips per
>    probe hit.  Lots more if we want to pull out target-side
>    state like $variables or stack backtraces.

Yes, pretty slow.  In my prototype, my user-mode handler just increments
a counter.  On my Pentium M, overhead per probepoint hit is ~14.2 usec,
compared with 1.03 usec for the uprobes version last posted to LKML.

For comparison, using "gdb -batch" to do the same thing cost 111 usec
per hit, and tracing one syscall with strace cost ~10 usec per hit.  (Of
course, strace can be more efficient than ad hoc probing because ptrace
has special support for syscall tracing; and a C-code handler can do all
sorts of things that a gdb command-script can't.)

> 
> Q: What about concurrency?
> A: You mean like probes concurrently hit in several target processes,
>    like SMP kprobes?  If there was any indication that this was
>    worthwhile, then we could make the systemtap-generated probe
>    process be multi-threaded (one probe thread per target thread).

Yes.  I haven't taken that on.

> 
> Q: Any other limitations?
> A: Because of ptrace, any process can be supervised by only one
>    process at a time.  So if you run systemtap on a user process,
>    you won't be able to run gdb or another systemtap session on it.

Yes.

> 
> Q: What about probing the kernel and user space together?
> A: Maybe this scheme would work if kernel-space systemtap probes
>    run concurrently, and arrange to share systemtap globals with
>    userspace somehow (mmap?).  Shared variables like this would
>    likely cause many more locking timeouts (=> skipped probes)
>    than now.  There are also additional security concerns.

My proposed approach to user/kernel data sharing is a new system call or
ptrace request that just passes a pid, a handler ID, and a pointer to an
area in user space that the handler (installed via a kernel module) can
read and/or write.  Again, there are security concerns.  But a Bad Guy
would need the help of somebody who has permission to install the
module.

If all kernel/user comm were initiated by the instrumentation program,
the kernel handler could sleep as needed.

> 
> Q: What about probing shared libraries?
> A: Because of the way ptrace works, we'd have to turn these into
>    process-level probes, including probes that just sit around
>    monitoring the threads and all their children to dlopen/mmap
>    the named libraries.

Yes.

> 
> Q: Is it worth it to try?  Is there a better way?
> A: You tell me.

There are certainly ways that perform better and lack some of these
limitations.  Selling them on LKML is another matter.

An "incremental approach" might enhance ptrace to reduce probepoint
overhead -- e.g., let the kernel handle single-stepping and continuing
all in one ptrace call.

> 
> 
> - FChE

Jim

