This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

[RFC]-Approaches to user space probes

From: Prasanna S Panchamukhi <prasanna at in dot ibm dot com>
To: suparna at in dot ibm dot com, ak at suse dot de, roland at redhat dot com, anil dot s dot keshavamurthy at intel dot com, varap at us dot ibm dot com, systemtap at sources dot redhat dot com
Date: Fri, 27 Jan 2006 17:48:47 +0530
Subject: [RFC]-Approaches to user space probes
Reply-to: prasanna at in dot ibm dot com

Hi,

As per yesterday's conference call discussion, I have listed a few
approaches for dynamic instrumentation of applications/libraries. Please
provide your suggestions about the listed approaches and any other
approaches you know of.

Thanks
Prasanna

	1. Attaching or loading the application into the tool.
	2. Using a jump instruction to a trampoline and trampoline
	   executing the instrumented code.
	3. Using a breakpoint instruction and changing the instruction
	   pointer to the instrumentation code, which is part of the user
	   address space.
	4. Using a breakpoint instruction and executing the
	   instrumentation code within the breakpoint handler.

1. Attaching or loading the application into the tool.

	In this method the user application must be loaded into the
tool, or the tool must attach to an already running application. Before
instrumenting an application, the user must decide what the
instrumentation will consist of. Dynaprof uses such a mechanism. There
are currently two probes shipped with Dynaprof, the PAPI Probe and the
Wallclock Probe. PAPI uses the processor's hardware performance counters
to measure specific hardware events such as cache misses, branch
mispredictions and floating point instructions. The Wallclock probe
measures elapsed real time, which is sometimes referred to as wallclock
time.
Dynaprof inserts instrumentation directly into the application's
address space. This is accomplished through a run-time code generation
and patching mechanism based upon either Dyninst or DPCL, IBM's
derivative effort. Whenever a function is instrumented, all its
children are instrumented as well. This enables the probe to
generate both inclusive and exclusive metrics.
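
The relationship between the two kinds of metric can be sketched as
below. This is only an illustration, not Dynaprof code: the struct and
function names are hypothetical, and it assumes each instrumented
function has a recorded inclusive value (for example elapsed wallclock
time) and a list of its direct children.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function record; not a Dynaprof data structure. */
struct fn_metric {
    uint64_t inclusive;               /* cost of the function plus everything it calls */
    const struct fn_metric *children; /* direct callees, each with its own inclusive cost */
    size_t nchildren;
};

/* Exclusive cost = inclusive cost minus the inclusive cost of each direct child. */
static uint64_t exclusive_metric(const struct fn_metric *fn)
{
    uint64_t in_children = 0;

    for (size_t i = 0; i < fn->nchildren; i++)
        in_children += fn->children[i].inclusive;

    assert(in_children <= fn->inclusive); /* children cannot cost more than the parent */
    return fn->inclusive - in_children;
}
```

The exclusive value is only computable because the children are
instrumented too; without their inclusive values the subtraction has
nothing to work with.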

2. Using a jump instruction to a trampoline and trampoline executing
the instrumented code.

	In this method the instrumentation code must be loaded into the
user address space dynamically. The major challenges are generating
instrumentation code at run time and allocating space for the
dynamically generated code. To insert this code, the application
process is stopped, and the code and data are installed into the
application address space using operating system facilities such as
ptrace and the /proc file system. These small code fragments are called
trampolines. Associated with each active probe is a base trampoline,
and each block of instrumentation code is placed in its own
mini-trampoline. The base trampoline contains the relocated original
instructions from the probe point in the application program,
instructions to save and restore registers, slots where jumps to
mini-trampolines are inserted, and a jump to return to the application
code. When the probe fires, the base trampoline executes: it saves the
register state and then executes the individual mini-trampolines. After
they return, the base trampoline restores the register state and normal
execution continues.
Eg: Paradyn tool.
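
As a rough sketch of the jump that is written at the probe point, the
following encodes an x86 near jump (opcode 0xE9 followed by a 32-bit
displacement) from a probe address to a trampoline address. The function
name and the use of plain integers for addresses are my own assumptions
for illustration; it only fills a buffer and does not show how a tool
such as Paradyn patches the stopped process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5  /* x86 near jump: opcode 0xE9 + 32-bit displacement */

/*
 * Encode "jmp rel32" as it would sit at probe_addr so that it lands on
 * tramp_addr. Returns 0 on success, -1 if the trampoline is out of
 * rel32 range. Only buf is written; patching the process is not shown.
 */
static int encode_jmp_rel32(uint8_t buf[JMP_REL32_LEN],
                            uint64_t probe_addr, uint64_t tramp_addr)
{
    assert(buf != NULL);

    /* The displacement is relative to the instruction after the jump. */
    int64_t disp = (int64_t)(tramp_addr - (probe_addr + JMP_REL32_LEN));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t u = (uint32_t)(int32_t)disp;

    buf[0] = 0xE9;
    buf[1] = (uint8_t)(u & 0xff);          /* displacement, little-endian */
    buf[2] = (uint8_t)((u >> 8) & 0xff);
    buf[3] = (uint8_t)((u >> 16) & 0xff);
    buf[4] = (uint8_t)((u >> 24) & 0xff);
    return 0;
}
```

The range check matters in practice: a trampoline allocated more than
2GB away from the probe point cannot be reached with this 5-byte form.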

Issues with methods 1 and 2 are:

	* They can trigger Intel erratum E49, where the other processors
	  see stale data while one processor writes the jump instruction.

	* The instruction can only be replaced atomically if the size of
	  the jump instruction is greater than or equal to the size of
	  the original instruction.

	* Other processors need to be stopped if the jump instruction
	  is smaller than the original instruction.

3. Using a breakpoint instruction and changing the instruction pointer

In this method a breakpoint instruction is inserted at the probe point
and the original instruction is copied into the user address space.
When the probe fires, the breakpoint handler changes the instruction
pointer to point to a trampoline that is part of the user address space.
After the trampoline executes the instrumentation code, it restores the
registers and process stack and jumps back to the original routine.

The issue with this approach is allocating separate space in the
user address space for the instrumentation code and the original
instruction.
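
The handler side of this method can be sketched as a small simulation.
The structs and the handler name below are hypothetical stand-ins, not
kernel interfaces; the sketch assumes x86, where the breakpoint is the
one-byte int3 and the saved instruction pointer points just past it
when the trap is delivered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BP_INSN_LEN 1  /* x86 int3 is one byte (0xCC) */

/* Hypothetical stand-ins for the saved user registers and a probe record. */
struct user_regs {
    uint64_t ip;
};

struct uprobe {
    uint64_t probe_addr;  /* where the breakpoint was written */
    uint64_t tramp_addr;  /* trampoline in the same user address space */
};

/*
 * Called with the registers saved at the trap. The saved ip points just
 * past the int3, so the probe address is ip - 1. Returns 1 if the trap
 * belonged to a registered probe and ip was redirected, 0 otherwise.
 */
static int bp_handler(struct user_regs *regs,
                      const struct uprobe *probes, size_t nprobes)
{
    assert(regs != NULL);

    uint64_t hit = regs->ip - BP_INSN_LEN;

    for (size_t i = 0; i < nprobes; i++) {
        if (probes[i].probe_addr == hit) {
            regs->ip = probes[i].tramp_addr; /* resume in the trampoline */
            return 1;
        }
    }
    return 0; /* not one of ours: leave ip untouched */
}
```

On return to user mode the process continues at the trampoline, which is
why the trampoline itself has to restore state and jump back.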

4. Using a breakpoint instruction and executing the instrumentation
code within the breakpoint handler

	In this method a breakpoint instruction is inserted at the probe
point and the instrumentation code is executed from within the
breakpoint handler, in interrupt context.

The issue with this approach is single stepping the original
instruction out-of-line.

In kernel space probes, single stepping out-of-line is achieved by
copying the instruction to some location within the kernel address space
and then single stepping from that location. But for user space probes,
an instruction copied into the kernel address space cannot be single
stepped, hence the instruction must be copied to the user address space.
The solution is to find free space in the current process address space,
copy the original instruction there and single step it.

User processes use stack space to store local variables, arguments and
return values. The free stack space normally lies either below or above
the stack pointer. If the stack grows downwards, the space below the
stack pointer is unused, and if the stack grows upwards, the space
above the stack pointer is unused.

The instruction to be single stepped can itself modify the stack,
hence sufficient stack space must be left free for it before the unused
stack space is used. The instruction is copied to the bottom of the page
and a check is made that the copied instruction does not cross the
page boundary. The copied instruction is then single stepped.
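
A minimal sketch of the slot selection follows. It rests on several
assumptions of mine that the prototype may not share: a downward-growing
stack, a 4096-byte page, "bottom of the page" meaning the lowest address
of the page holding the stack pointer, and a caller-chosen reserve left
free directly below the stack pointer. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size */
#define MAX_INSN_BYTES  15u    /* longest x86 instruction */

/* Nonzero if [addr, addr + len) spans two pages. */
static int crosses_page(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return (addr / PAGE_SIZE_BYTES) != ((addr + len - 1) / PAGE_SIZE_BYTES);
}

/*
 * For a downward-growing stack, use the lowest address of the page that
 * holds sp as the slot, provided `reserve` bytes stay free directly
 * below sp for the stepped instruction's own stack use. Returns 0 if
 * the unused space in that page is too small.
 */
static uint64_t pick_stack_slot(uint64_t sp, uint64_t insn_len, uint64_t reserve)
{
    assert(insn_len > 0 && insn_len <= MAX_INSN_BYTES);

    uint64_t page_base = sp & ~(uint64_t)(PAGE_SIZE_BYTES - 1);
    uint64_t free_bytes = sp - page_base;  /* unused bytes below sp in this page */

    if (free_bytes < reserve || free_bytes - reserve < insn_len)
        return 0;
    if (crosses_page(page_base, insn_len)) /* never true at a page base; mirrors the check in the text */
        return 0;
    return page_base;
}
```

A return value of 0 is the case where the unused stack space in the
current page is not enough and another location has to be found.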

Several architectures do not allow instructions to be executed
from the stack, since the no-exec bit is set for the stack pages.
On those architectures, the page table entry corresponding to the
stack page is identified and the no-exec bit is cleared, allowing the
instruction on that stack page to be executed.
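
The bit operation itself is simple; a sketch on a raw 64-bit entry value
is shown here, using the x86-64 layout where no-execute is bit 63. The
helper names are hypothetical. Real kernel code would go through the
architecture's own page table helpers and flush the TLB afterwards,
neither of which is shown.

```c
#include <assert.h>
#include <stdint.h>

/* On x86-64 (and PAE) the no-execute flag is bit 63 of the page table entry. */
#define PTE_NX_BIT (UINT64_C(1) << 63)

/* Nonzero if the entry forbids instruction fetch from the page. */
static int pte_is_noexec(uint64_t pte)
{
    return (pte & PTE_NX_BIT) != 0;
}

/* Return the entry with no-exec cleared; all other bits are preserved. */
static uint64_t pte_mk_exec(uint64_t pte)
{
    uint64_t out = pte & ~PTE_NX_BIT;

    assert(!pte_is_noexec(out));
    return out;
}
```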

There are situations where even the unused stack space is not
enough for the user instruction to be copied and single stepped. In
such situations, the virtual memory area (vma) can be expanded beyond
the current stack vma. This expanded stack can be used to copy the
original instruction and single step it out-of-line.

If even the vma cannot be extended, then the instruction must be
executed inline, by replacing the breakpoint instruction with the
original instruction.
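
The order of fallbacks described above can be summarised in a small
decision function. The enum, the function name and the parameters are
hypothetical; in particular, whether the vma can grow is passed in as a
flag here, whereas a real implementation would have to attempt the
expansion.

```c
#include <assert.h>
#include <stddef.h>

enum step_mode {
    STEP_OOL_STACK,    /* copy into unused stack space, step out-of-line */
    STEP_OOL_EXPANDED, /* grow the stack vma first, then step out-of-line */
    STEP_INLINE        /* put the original instruction back and step in place */
};

/* Pick the first option that is possible, in the order given in the text. */
static enum step_mode choose_step_mode(size_t free_stack, size_t needed,
                                       int vma_can_grow)
{
    assert(needed > 0);

    if (free_stack >= needed)
        return STEP_OOL_STACK;
    if (vma_can_grow)
        return STEP_OOL_EXPANDED;
    return STEP_INLINE;
}
```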

Eg: Dprobes implemented this approach, but did not provide single
stepping out-of-line.

Methods 3 and 4 require a similar breakpoint insertion/removal
mechanism, both for pages that are present in memory and for pages
that are not present in memory when the probes are inserted. The URLs
of the initial patches are:

http://sourceware.org/ml/systemtap/2006-q1/msg00212.html
http://sourceware.org/ml/systemtap/2006-q1/msg00210.html

Method 4 requires a mechanism for single stepping the original
instruction out-of-line. The URL of the prototype implementation is:

http://sourceware.org/ml/systemtap/2006-q1/msg00211.html
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-51776329

Follow-Ups:
- Re: [RFC]-Approaches to user space probes
  - From: Frank Ch. Eigler
- Re: [RFC]-Approaches to user space probes
  - From: Satoshi Oshima
