This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: proposed instruction trace support in SystemTap
- From: Dave Nomura <dcnltc at us dot ibm dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: systemtap at sourceware dot org, Maynard Johnson <mpjohn at us dot ibm dot com>, James Keniston <kenistoj at us dot ibm dot com>
- Date: Tue, 10 Jul 2007 11:12:57 -0700
- Subject: Re: proposed instruction trace support in SystemTap
- Organization: LTC Power Linux Toolchain
- References: <4689826A.9040902@us.ibm.com> <y0mabuar6ap.fsf@ton.toronto.redhat.com>
Frank Ch. Eigler wrote:
Dave Nomura <dcnltc@us.ibm.com> writes:
SINGLE_STEP/BRANCH TRAP HANDLER
[...]
probe branch_step label("branch handler 1")
{
<do whatever you want for each branch instruction>
itrace_output(); // write itrace binary output
}
where "label" is a language extension to attach a name [...]
Particularly, to turn the probe on and off by explicit function calls.
This is an area we discussed at the face-to-face meeting in Ottawa
last week, in relation to user-space probes. The same concept could
apply to other probe types.
Regarding semantics, this is tricky business. Turning off active
probes is relatively simple, because even if the underlying probe API
doesn't support instantaneous (atomic) disarming, we can simulate it
until the API catches up (by adding an "am I supposed to be disarmed?"
conditional to the handler). Turning them *on* is different - we
can't help but possibly miss a couple of events as the API catches up.
Maybe this is acceptable, maybe not. Some syntax may help tell us the
judgement of the script programmer.
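
The "am I supposed to be disarmed?" conditional described above can be sketched as a toy model. This is plain Python, not SystemTap runtime code, and the names SoftProbe, disable and fire are hypothetical. It only illustrates why turning a probe off can be simulated in the handler even while the underlying probe stays armed.

```python
# Toy model (assumption: not real SystemTap runtime code).
# The underlying probe API may be slow to disarm, so the handler itself
# checks a flag and returns early once the script has asked to disarm.

class SoftProbe:
    def __init__(self, body):
        self.body = body        # user-supplied handler body
        self.enabled = True     # the "am I supposed to be disarmed?" flag
        self.hits = 0           # events actually delivered to the body

    def disable(self):
        # Takes effect immediately, even if the real probe is still armed.
        self.enabled = False

    def fire(self, event):
        # Called by the (possibly still armed) underlying probe per event.
        if not self.enabled:
            return
        self.hits += 1
        self.body(event)


seen = []
probe = SoftProbe(seen.append)
probe.fire("insn 1")
probe.disable()
probe.fire("insn 2")    # the underlying probe still fires; the body is skipped
```

The same trick does not work for turning a probe on: if the underlying probe is not yet armed, fire() is never called, so there is nothing for the handler to check.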
The single_step and branch_step syntax identifies the handler code, and
needs to be set up before the user probe code that turns instruction
tracing on/off. I am assuming that, since you have access to the
pid() function in your *.stp script, the user program is invoked
after all of the probes have been processed, so I'm not sure I
understand how the actual instruction tracing events would get lost. I
suppose it could be a big deal if you were trying to trace some very
sensitive code, where it might be important that ALL instructions in the
specified range are traced, although on PPC there are some reservation(?)
instructions that cannot be traced using the single instruction trap.
The strategy used by Performance Inspector's ITRACE is to turn off
tracing for some number of instructions and try to re-enable tracing by
setting up a kprobe at specific places in the kernel like switch_to, and
return from interrupt, etc.
Regarding syntax, we have more options than an opaque string and
explicit function calls to turn things on and off. We could have a
guard expression like dtrace's /.../ - though we would probably just
spell it thusly:
probe PROBEPOINT if (expr) { }
where expr could be something as simple as (probe_1_enabled_p), which
better be a global variable.
The compiler would analyze expr for dataflow, arrange to evaluate this
condition whenever appropriate (after another probe writes any of its
inputs), and arrange to promptly activate or deactivate the
appropriate probes. Since "promptly" may take some time, script
programmers plopping a conditional like this in are implying consent
to a few events being missed.
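
A rough sketch of the guard idea, as a toy Python model rather than translator output: each probe records which globals its condition reads, and the condition is re-evaluated only when one of those globals is written. The class and method names (GuardedProbe, Runtime, write_global) are hypothetical, and the delay between the write and the actual arming is not modelled.

```python
# Toy model (assumption: not what the translator would actually generate).

class GuardedProbe:
    def __init__(self, guard, inputs):
        self.guard = guard           # callable taking the globals dict
        self.inputs = set(inputs)    # names of globals the guard reads
        self.armed = False


class Runtime:
    def __init__(self):
        self.globals = {}
        self.probes = []

    def add_probe(self, probe):
        self.probes.append(probe)
        probe.armed = bool(probe.guard(self.globals))

    def write_global(self, name, value):
        self.globals[name] = value
        # Re-evaluate only the guards whose inputs were just written.
        for probe in self.probes:
            if name in probe.inputs:
                probe.armed = bool(probe.guard(self.globals))


rt = Runtime()
p = GuardedProbe(lambda g: g.get("probe_1_enabled_p", 0),
                 ["probe_1_enabled_p"])
rt.add_probe(p)
armed_at_start = p.armed

rt.write_global("probe_1_enabled_p", 1)   # another probe enables it
armed_after_enable = p.armed

rt.write_global("unrelated", 5)           # does not touch the guard
rt.write_global("probe_1_enabled_p", 0)   # and disables it again
```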
The itrace_output() is a function that produces the raw trace data
that could then be post processed for consumption by various
performance analysis tools but the user could do something as simple
as printing out the PC value.
Is the "raw trace data" a well-defined thing? Why would this sort of
hard-coded data set be desirable, as opposed to letting the programmer
write something explicit like:
printf("%2b%8b%4b", cpu(), get_ticks(), pc())
(Of course this can be hidden in a function of his own, or in an
inspectable tapset.)
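
For illustration, here is how a post-processing tool might decode records written by the printf above. It assumes each record is exactly 2 + 8 + 4 = 14 bytes with no padding and in the byte order of the machine that wrote it; both points are assumptions, and the sample values are made up.

```python
import struct

# "=" selects native byte order with no padding (an assumption about the
# trace format): 2-byte cpu, 8-byte ticks, 4-byte pc.
RECORD = struct.Struct("=HQI")


def decode(buf):
    """Return a list of (cpu, ticks, pc) tuples; a trailing partial
    record, if any, is ignored."""
    last_start = len(buf) - RECORD.size
    return [RECORD.unpack_from(buf, off)
            for off in range(0, last_start + 1, RECORD.size)]


# Two made-up records standing in for real trace output.
sample = (RECORD.pack(1, 123456789, 0x10000ABC) +
          RECORD.pack(0, 123456800, 0x10000AC0))
records = decode(sample)
```

Note that a 4-byte pc field cannot hold a full 64-bit address; a wider field would be needed there.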
The user could do the simple printf that you suggest. The proposed
callout to itrace_output() would only be used if you wanted more
detailed information (like a timestamp), as required by a tool like
qtrace (a sophisticated pipeline analysis tool). Since the instruction
tracing will trace into the kernel, you need some indication of when this
switch happens, and perhaps of things like switches to different threads.
Since PI has other tools than ITRACE (tprof, for example), I'm not sure
whether the complexity of the raw data that it generates is strictly
needed by qtrace; I'll have to ask the PI pros. We would design
itrace_output() to generate the raw information needed by analysis tools
like qtrace and let a post-processing tool do the formatting for
consumption by the analysis tools.
It might be nice if there was some way to name the relay streams so
that they aren't intermingled. Maybe something analogous to the
stream parameter to fprintf.
Something similar was mentioned as desirable in the OLS2007 talk by
Bligh / Desnoyers on Google's ktrace & LTTng. There, the context was
an occasional need to have separate buffers for high-volume and
low-volume messages, so that buffer overflows did not penalize the
smaller messages too much. Let's think about this some more.
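
To make the buffer-separation point concrete, here is a toy sketch with one bounded buffer per named stream. The stream names and sizes are made up, and the buffers drop their oldest entries when full, which is only one possible overflow policy. It is not a model of the relay interface.

```python
from collections import deque


class Streams:
    """One bounded buffer per named stream (hypothetical helper)."""

    def __init__(self, sizes):
        # deque(maxlen=n) silently discards the oldest entry when full.
        self.buffers = {name: deque(maxlen=n) for name, n in sizes.items()}

    def write(self, stream, msg):
        self.buffers[stream].append(msg)


streams = Streams({"itrace": 4, "log": 4})
streams.write("log", "tracing started")
for pc in range(100):              # high-volume stream overflows its buffer
    streams.write("itrace", pc)
```

With a single shared buffer of the same total size, the "tracing started" message would have been overwritten by the instruction records; with separate streams it survives.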
The SystemTap translator would generate calls to target-dependent
code to implement single instruction or branch trapping. This is
done in a variety of ways on different architectures, but generally
involves setting a bit in a system register to enable single
instruction/branch trapping.
Is this sort of thing done/doable in kernel space also, or just on
user-space threads? Is there an existing kernel API for management of
these registers/fields?
Instruction tracing in the kernel is not something that PI ITRACE
supports, but I don't know of any reason why we would have that
restriction. Maybe the single_step/branch_step would have some sort of
syntax to allow trap handler code for kernel routines. There is
basically one single instruction trap handler that the stap translator
will generate with logic to figure out what handler code to run, so I
can't think of any reason why we wouldn't allow this.
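
The "one trap handler with logic to figure out what handler code to run" can be pictured as a dispatch table. The following is a toy sketch only: keying on (pid, kind) is an assumption made for illustration, and none of these names come from the proposal.

```python
# Toy model of a single trap handler dispatching to script handlers.
# Assumption: handlers are keyed by (pid, kind), where kind is
# "single_step" or "branch_step".

handlers = {}


def register(pid, kind, body):
    handlers.setdefault((pid, kind), []).append(body)


def trap_handler(pid, kind, pc):
    """Single entry point: look up and run the matching handler bodies.
    Returns how many bodies ran."""
    bodies = handlers.get((pid, kind), ())
    for body in bodies:
        body(pc)
    return len(bodies)


trace = []
register(1234, "branch_step", lambda pc: trace.append(("branch", pc)))
register(1234, "single_step", lambda pc: trace.append(("step", pc)))

ran_a = trap_handler(1234, "branch_step", 0x1000)
ran_b = trap_handler(1234, "single_step", 0x1004)
ran_c = trap_handler(9999, "single_step", 0x2000)   # no handler registered
```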
One issue that Jim Keniston identified is that we would want some way to
avoid tracing any of the instructions in the kernel code associated with
stap probes, the trap handler, etc. Your thoughts on how we might do this
are welcome!
[...] - instruction tracing enabled for a parent process id will
enable tracing for all of its children (threads). [...]
This is a sensible behavior, though so is a per-thread alternative.
Since the tracing flags are per-thread control registers anyway,
I suspect we'll have to build the former on top of the latter.
Yes.
[...] INITIALIZATION/CLEANUP
Initialization/cleanup of the instruction tracing feature could be
done by insertion of a call to an itrace initialization/cleanup
routine in the user's begin/end probes.
probe begin
itrace_init(<some params>)
probe end
itrace_cleanup()
Neither of these should be necessary. The existence of
instruction-trace type probes should imply automated setup/cleanup.
OK.
- FChE
--
Dave Nomura
LTC Linux Power Toolchain