This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: proposed instruction trace support in SystemTap


Frank Ch. Eigler wrote:
Dave Nomura <dcnltc@us.ibm.com> writes:
SINGLE_STEP/BRANCH TRAP HANDLER
[...]
probe branch_step label("branch handler 1")
{
        <do whatever you want for each branch instruction>
        itrace_output();        // write itrace binary output
}

where "label" is a language extension to attach a name [...]

Particularly, to turn the probe on and off by explicit function calls. This is an area we discussed at the face-to-face meeting in Ottawa last week, in relation to user-space probes. The same concept could apply to other probe types.

Regarding semantics, this is tricky business.  Turning off active
probes is relatively simple, because even if the underlying probe API
doesn't support instantaneous (atomic) disarming, we can simulate it
until the API catches up (by adding an "am I supposed to be disarmed?"
conditional to the handler).  Turning them *on* is different - we
can't help but possibly miss a couple of events as the API catches up.
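The asymmetry described above can be illustrated with a small model. This is only a sketch in Python, not SystemTap code: the `Probe` class, its `enabled` flag (the script's view) and its `api_armed` flag (the lagging underlying probe API) are all hypothetical names invented for the illustration.

```python
class Probe:
    """Toy probe whose underlying API arms and disarms lazily (hypothetical)."""

    def __init__(self, handler):
        self.handler = handler
        self.enabled = True      # script-visible state, changes instantly
        self.api_armed = True    # state of the slower underlying probe API

    def fire(self, event):
        # The underlying API only delivers events while it is armed.
        if not self.api_armed:
            return False
        # "Am I supposed to be disarmed?" check inside the handler:
        # drop events that arrive after the script turned the probe off.
        if not self.enabled:
            return False
        self.handler(event)
        return True


seen = []
p = Probe(seen.append)
p.fire("e1")            # delivered
p.enabled = False       # script turns the probe off; API has not caught up
p.fire("e2")            # filtered by the handler-side check
p.api_armed = False     # API finally disarms
p.enabled = True        # script turns the probe back on...
p.fire("e3")            # ...but the API is not armed yet, so e3 is missed
p.api_armed = True      # API catches up
p.fire("e4")            # delivered again
```

Turning off is exact in this model (e2 never reaches the handler body), while turning on loses e3 during the catch-up window.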

Maybe this is acceptable, maybe not. Some syntax may help tell us the
judgement of the script programmer.
The single_step and branch_step syntax identifies the handler code, and it needs to be set up before the user probe code that turns instruction tracing on and off. Since you have access to the pid() function in your *.stp script, I am assuming that the user program is invoked after all of the probes have been processed, so I'm not sure I understand how the actual instruction tracing events would get lost. I suppose it could be a big deal if you were trying to trace some very sensitive code, where it might be important that ALL instructions in the specified range are traced, although on PPC there are some reservation(?) instructions that cannot be traced using the single instruction trap. The strategy used by Performance Inspector's ITRACE is to turn off tracing for some number of instructions and then try to re-enable it by setting up a kprobe at specific places in the kernel, such as switch_to and return from interrupt.

Regarding syntax, we have more options than an opaque string and explicit function calls to turn things on and off. We could have a guard expression like dtrace's /.../ - though we would probably just spell it thusly:

probe PROBEPOINT if (expr) { }

where expr could be something as simple as (probe_1_enabled_p), which
had better be a global variable.

The compiler would analyze expr for dataflow, arrange to evaluate this
condition whenever appropriate (after another probe writes any of its
inputs), and arrange to promptly activate or deactivate the
appropriate probes.  Since "promptly" may take some time, script
programmers plopping a conditional like this in are implying consent
to a few events being missed.
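One way to picture the dataflow idea is a table that records which globals each guard expression reads, so that a write to a global re-evaluates only the guards that depend on it. The sketch below is a Python model under that assumption; `GuardedProbe`, `Session` and `write` are hypothetical names, and nothing here reflects how the translator would actually implement it.

```python
class GuardedProbe:
    """A probe with a guard expression over global variables (hypothetical)."""

    def __init__(self, name, inputs, cond):
        self.name = name
        self.inputs = set(inputs)   # globals the guard expression reads
        self.cond = cond            # guard: dict of globals -> truthy/falsy
        self.armed = False


class Session:
    def __init__(self):
        self.globals = {}
        self.probes = []

    def add(self, probe):
        self.probes.append(probe)
        probe.armed = bool(probe.cond(self.globals))

    def write(self, var, value):
        """Store a global and re-evaluate only the guards that read it.

        Returns the names of probes whose armed state changed.
        """
        self.globals[var] = value
        changed = []
        for p in self.probes:
            if var in p.inputs:
                new_state = bool(p.cond(self.globals))
                if new_state != p.armed:
                    p.armed = new_state
                    changed.append(p.name)
        return changed


s = Session()
s.add(GuardedProbe("p1", ["probe_1_enabled_p"],
                   lambda g: g.get("probe_1_enabled_p", 0)))
first = s.write("unrelated", 5)            # guard does not read this global
second = s.write("probe_1_enabled_p", 1)   # guard input changed: arm p1
third = s.write("probe_1_enabled_p", 1)    # same value: no state change
```

In a real implementation the arming step would take time, which is where the few missed events would come from; the model above treats it as instantaneous.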


The itrace_output() is a function that produces the raw trace data
that could then be post processed for consumption by various
performance analysis tools but the user could do something as simple
as printing out the PC value.

Is the "raw trace data" a well-defined thing? Why would this sort of
hard-coded data set be desirable, as opposed to letting the programmer
write something explicit like:
printf("%2b%8b%4b", cpu(), get_ticks(), pc())
(Of course this can be hidden in a function of his own, or in an
inspectable tapset.)
The user could do the simple printf that you suggest. The proposed callout to itrace_output() would only be used if you wanted more detailed information (like a timestamp), as required by a tool like qtrace (a sophisticated pipeline analysis tool). Since the instruction tracing will trace into the kernel, you need some indication of when this switch happens, and perhaps of things like switches to different threads. Since PI has other tools besides ITRACE (tprof, for example), I'm not sure whether the complexity of the raw data that it generates is strictly needed by qtrace; I'll have to ask the PI pros. We would design itrace_output() to generate the raw information needed by analysis tools like qtrace, and let a post-processing tool do the formatting for consumption by the analysis tools.
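As an illustration of what a post-processing step for the printf-style record could look like, here is a minimal Python sketch. It assumes the "%2b%8b%4b" format yields a packed 2-byte, 8-byte and 4-byte field in native byte order with no padding; that layout, and the field names cpu, ticks and pc, are assumptions for the example and not a defined itrace format.

```python
import struct

# Assumed layout: 2-byte cpu, 8-byte ticks, 4-byte pc, native byte order,
# no padding between fields ("=" selects standard sizes without alignment).
RECORD = struct.Struct("=HQI")


def encode(cpu, ticks, pc):
    """Pack one trace record (stands in for the probe-side binary printf)."""
    return RECORD.pack(cpu, ticks, pc)


def decode_stream(buf):
    """Split a byte buffer into (cpu, ticks, pc) tuples.

    A trailing partial record, if any, is ignored.
    """
    last_start = len(buf) - RECORD.size
    return [RECORD.unpack_from(buf, off)
            for off in range(0, last_start + 1, RECORD.size)]


stream = encode(0, 1000, 0x10000400) + encode(1, 1010, 0x10000404)
records = decode_stream(stream)
```

Note that a 4-byte pc field cannot hold a full 64-bit address, so a real record layout would have to choose field widths per architecture.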
It might be nice if there was some way to name the relay streams so
that they aren't intermingled. Maybe something analogous to the
stream parameter to fprintf.

Something similar was mentioned as desirable in the OLS2007 talk by Bligh / Desnoyers on Google's ktrace & lttng. There, the context was an occasional need to have separate buffers for high-volume and low-volume messages, so that buffer overflows did not penalize the smaller messages too much. Let's think about this some more.
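The benefit of separate buffers can be shown with a toy model of named channels. This is a Python sketch only: the `Channel` class, the channel names "trace" and "events", and the drop-newest overflow policy are assumptions made for the illustration, not a description of relay or of any existing SystemTap interface.

```python
from collections import deque


class Channel:
    """Bounded buffer that drops new messages once full (assumed policy)."""

    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
        self.dropped = 0

    def write(self, msg):
        if len(self.buf) >= self.capacity:
            self.dropped += 1
            return False
        self.buf.append(msg)
        return True


# One buffer per named stream, so a flood on one cannot evict the other.
channels = {"trace": Channel(4), "events": Channel(4)}

for i in range(100):
    channels["trace"].write(("pc", i))        # high-volume stream overflows
channels["events"].write("thread switch")     # low-volume stream unaffected
```

With a single shared buffer of the same total size, the lone "thread switch" message would have been dropped along with the overflowing trace records.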


The SystemTap translator would generate calls to target dependent
code to implement single instruction or branch trapping. This is
done a variety of ways on different architectures, but generally
involves setting a bit in a system register to enable single
instruction/branch trapping.

Is this sort of thing done/doable in kernel space also, or just on
user-space threads? Is there an existing kernel API for management of
these registers/fields?
Instruction tracing in the kernel is not something that PI ITRACE supports, but I don't know of any reason why we would have that restriction. Maybe the single_step/branch_step would have some sort of syntax to allow trap handler code for kernel routines. There is basically one single instruction trap handler that the stap translator will generate with logic to figure out what handler code to run, so I can't think of any reason why we wouldn't allow this.
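The "one trap handler with logic to figure out what handler code to run" idea could be modelled roughly as follows. This is a Python sketch under assumptions: the registry keyed by trap kind and label, the per-thread `active` table, and the function names are all hypothetical, chosen only to mirror the proposed single_step/branch_step and label() syntax.

```python
# (kind, label) -> script handler; kinds mirror single_step / branch_step.
handlers = {}

# tid -> (kind, label) for threads that currently have tracing turned on.
active = {}


def register(kind, label, fn):
    handlers[(kind, label)] = fn


def trap(tid, kind, pc):
    """Single entry point for every step/branch trap.

    Looks up which script handler, if any, applies to this thread and
    trap kind. Returns True if a handler ran.
    """
    key = active.get(tid)
    if key is None or key[0] != kind:
        return False          # tracing not enabled for this thread/kind
    handlers[key](tid, pc)
    return True


log = []
register("branch_step", "branch handler 1", lambda tid, pc: log.append((tid, pc)))
active[7] = ("branch_step", "branch handler 1")

ran_a = trap(7, "branch_step", 0x1000)   # dispatched to the labelled handler
ran_b = trap(7, "single_step", 0x1004)   # wrong kind for this thread
ran_c = trap(8, "branch_step", 0x2000)   # thread is not being traced
```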

One issue that Jim Keniston identified is that we would want some way to not trace any of the instructions in the kernel code associated with stap probes, the trap handler, etc. Your thoughts on how we might do this are welcome!

[...] - instruction tracing enabled for a parent process id will
enable tracing for all of its children (threads). [...]

This is a sensible behavior, though so is a per-thread alternative.
Since the tracing flags are per-thread control registers anyway,
I suspect we'll have to build the former on top of the latter.
Yes.
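Building the per-process behavior on top of per-thread flags might look like the following Python model. It is a sketch under assumptions: the `Tracer` class and its method names are hypothetical, and the rule that newly created threads inherit the flag from a traced process is one possible reading of the proposal.

```python
class Tracer:
    """Per-process tracing layered on per-thread trace flags (hypothetical)."""

    def __init__(self):
        self.owner = {}           # tid -> pid of the owning process
        self.flag = {}            # tid -> per-thread trace flag
        self.traced_pids = set()  # processes with tracing requested

    def spawn(self, pid, tid):
        # A new thread inherits tracing if its process is already traced.
        self.owner[tid] = pid
        self.flag[tid] = pid in self.traced_pids

    def enable_thread(self, tid):
        # The primitive: set one thread's trace flag.
        self.flag[tid] = True

    def enable_process(self, pid):
        # The per-process operation is a loop over the per-thread primitive.
        self.traced_pids.add(pid)
        for tid, owner_pid in self.owner.items():
            if owner_pid == pid:
                self.enable_thread(tid)


t = Tracer()
t.spawn(100, 101)
t.spawn(100, 102)
t.spawn(200, 201)
t.enable_process(100)   # flags threads 101 and 102, leaves 201 alone
t.spawn(100, 103)       # created later, inherits the flag
```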
[...] INITIALIZATION/CLEANUP
Initialization/cleanup of the instruction tracing feature could be
done by insertion of a call to an itrace initialization/cleanup
routine in the user's begin/end probes.

probe begin { itrace_init(<some params>) }
probe end   { itrace_cleanup() }

Neither of these should be necessary. The existence of
instruction-trace type probes should imply automated setup/cleanup.
OK.
- FChE


--
Dave Nomura
LTC Linux Power Toolchain


