This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Implementing a generic binary trace interface.


Jose R. Santos writes:
 > Hi folks,
 > 
 > My team is currently implementing a trace tool using SystemTap that 
 > currently does logging by means of printf mechanism.  We want to move to 
 > a binary trace format but there is no such mechanism on SystemTap for 
 > doing this.  I've looked at what the folks at Hitachi have done with 
 > BTI, but this seems to force a specific trace format that is not 
 > suitable for what we need.  Ideally, the trace format should be left to 
 > the tapset using the interface not the BTI.  I propose to slightly alter 
 > the BTI from Hitachi to allow other trace implementations to use the 
 > trace format that's most convenient for the people implementing them.
 > 
 > To facilitate this, Tom Zanussi has been talking about adding a 
 > basic form of struct to the SystemTap language.  The basic idea of how 
 > the tapset would use the new BTI and struct data types looks like:
 > 
 > probe kernel.function("sys_open")
 > {
 >       trace_extra.hookid = HOOKID_OPEN;
 >       trace_extra.flags = $flags;
 >       trace_extra.mode = $mode;
 >       trace_extra.name = $filename;
 >       trace_extra.fd = $fd;
 > 
 >       lket_trace(trace_extra);
 > }
 > 
 > 
 > function lket_trace(trace_extra:struct) {
 > 
 >           trace_format.timestamp = ...
 >           trace_format.cpu = ...
 >           trace_format.....
 >           .....
 > 
 >           trace(trace_format, trace_extra)
 > }
 > 
 > 
 > Unlike the current BTI implementation, the format of the trace hook is defined by lket_trace and not by the generic interface.  This design has the following benefits over the Hitachi interface.
 > 
 > 1) It allows anyone to implement their own trace hooks as they see fit, making it a truly generic interface.
 > 
 > 2) It does not limit the number or type of arguments that a trace hook can have.  The current implementation limits you to 16 data points of size long.
 > 
 > 
 > Aside from the support needed to add struct to the SystemTap language, the rest of the changes to the BTI should be pretty straightforward, and they would not significantly impact the current work that Hitachi has done with their implementation.
 > 
 > Thoughts?

I'm not sure what you mean by the trace_format part - it seems
confusing to me.  Maybe simplify it to this to make it clearer what
we want to be able to do:

probe kernel.function("sys_open")
{
	data.hook_id = HOOKID_OPEN

	fill_in_common(data)

	data.flags = $flags;
	data.mode = $mode;
	data.name = $filename;
	data.fd = $fd;

	trace(data);
}

function fill_in_common(data)
{
	data.timestamp = timestamp_us();
}

Basically, we want to be able to write a probe like the above, the end
result being that a block of data corresponding to the struct gets
logged by the trace() function, which is just a simple function that
logs N bytes of raw data, i.e. sizeof(data).  To me the struct syntax
is just a language convenience that makes it easy for the translator to
generate the code needed to do this, and it doesn't need to be
anything more than that.  So it seems that it should be relatively easy
to add, but I don't know the translator very well, so I may be mistaken.
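
As a rough illustration of "a simple function that logs N bytes of raw
data", here is a minimal user-space sketch in plain C.  It is not
SystemTap runtime code: trace_raw(), trace_buf and struct open_event are
hypothetical names made up for this example, and a static array stands
in for the real trace buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the stp trace buffer. */
#define TRACE_BUF_SIZE 4096
static unsigned char trace_buf[TRACE_BUF_SIZE];
static size_t trace_used;

/* Append len raw bytes to the buffer; return 0 on success, -1 if full. */
static int trace_raw(const void *data, size_t len)
{
	assert(data != NULL);
	if (len > TRACE_BUF_SIZE - trace_used)
		return -1;
	memcpy(trace_buf + trace_used, data, len);
	trace_used += len;
	return 0;
}

/* Example record; the layout is owned by the tapset, not by trace_raw(). */
struct open_event {
	unsigned long long timestamp;
	int flags;
	int mode;
	int fd;
};

/* What trace(data) would boil down to for this record type. */
static int trace_open_event(const struct open_event *ev)
{
	return trace_raw(ev, sizeof(*ev));
}
```

The point of the sketch is that trace_raw() knows nothing about the
record layout; only the caller's struct definition does.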

Here's an approximation of the code that might be generated for the
probe script (I didn't bother to look at the code being generated
currently for probes to stick in here, but this should give the
general idea).

First a struct is generated.  The field types are inferred from the
assignments.  Currently there are only 2 systemtap types that can be
used in scripts, and there's no way to specify a field's type
explicitly, e.g. u16 or char.  Explicit types would be nice to have in
order to save space when tracing, and for the sake of this example I'm
assuming the capability is there; otherwise just substitute longs for
everything but strings.  In any case, each field can be of any type,
and there's no arbitrary limit on the number of fields.

struct sys_open_trace_struct
{
	u16 hook_id;
	unsigned long long timestamp;
	int flags;
	int mode;
	char name[NAME_MAX+1];
	int fd;
};

Finally, in the probe handler that's generated, a hypothetical
function _stp_reserve() is called to reserve space in the stp buffer.
It can basically be treated as if it were a kmalloc(), though all it's
doing is reserving a spot in the tracing buffer.  The code that fills
up the struct is then generated as usual for a probe, and when the
last field is filled in, it's done - there's no trace function or
anything needed after that, because the handler is directly filling in
the trace buffer memory.  Again, I'm just making up the _stp-specific
code off the top of my head - I'll look up the actual generated code
more carefully if it would help make it more clear.

void sys_open_handler(void)
{
	struct sys_open_trace_struct *event;
	event = _stp_reserve(sizeof(struct sys_open_trace_struct));

	event->hook_id = HOOKID_OPEN;
	event->timestamp = _stp_gettimeofday_us();
	event->flags = _stp_get_target("flags");
	event->mode = _stp_get_target("mode");
	_stp_copy_user_string(event->name, _stp_get_target("filename"));
	event->fd = _stp_get_target("fd");
}
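
To make the reserve-then-fill idea concrete, here is a minimal
user-space sketch in plain C.  Everything in it is an assumption made
for illustration: reserve_event() is a hypothetical stand-in for the
_stp_reserve() imagined above, the buffer is a static array, NAME_LEN is
a small placeholder for NAME_MAX+1, and the hook id is a dummy value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096
#define NAME_LEN 64	/* small stand-in for NAME_MAX + 1 */

/* Backing store; the union member forces 8-byte-friendly alignment. */
static union {
	unsigned long long align;
	unsigned char bytes[BUF_SIZE];
} buf;
static size_t buf_used;

/* Hand out the next len bytes of the buffer, or NULL when it is full.
 * Lengths are rounded up to 8 so every record starts aligned. */
static void *reserve_event(size_t len)
{
	size_t rounded = (len + 7) & ~(size_t)7;
	void *p;

	if (rounded < len || rounded > BUF_SIZE - buf_used)
		return NULL;
	p = buf.bytes + buf_used;
	buf_used += rounded;
	return p;
}

struct open_event {
	unsigned short hook_id;
	unsigned long long timestamp;
	int flags;
	int mode;
	char name[NAME_LEN];
	int fd;
};

/* Fill the record in place: no separate trace call is needed afterwards. */
static int log_open(unsigned long long now, int flags, int mode,
		    const char *name, int fd)
{
	struct open_event *ev;

	assert(name != NULL);
	ev = reserve_event(sizeof(*ev));
	if (ev == NULL)
		return -1;	/* buffer full, event dropped */
	ev->hook_id = 1;	/* placeholder for HOOKID_OPEN */
	ev->timestamp = now;
	ev->flags = flags;
	ev->mode = mode;
	strncpy(ev->name, name, NAME_LEN - 1);
	ev->name[NAME_LEN - 1] = '\0';
	ev->fd = fd;
	return 0;
}
```

Unlike the handler above, the sketch checks for a failed reservation;
generated code would presumably need some equivalent policy for a full
buffer.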


Just for comparison, the above probe is really the equivalent of this
probe that uses globals and a dedicated 6-param trace function:

global timestamp, hook_id, flags, mode, filename, fd

probe kernel.function("sys_open")
{
	hook_id = HOOKID_OPEN

	fill_in_common()

	flags = $flags;
	mode = $mode;
	filename = $filename;
	fd = $fd;

	trace(hook_id, timestamp, flags, mode, filename, fd);
}

function fill_in_common()
{
	timestamp = timestamp_us();
}

To do the same logging as with the struct example, the trace function
here would reserve or allocate an array of 6 longs (or whatever the
trace6() function is hard-coded to handle) and a string, assign the
param values and log.  The problem with this is that there needs to be
a dedicated trace() function with 6 params and a string, and if it's
later decided to be more efficient and size each field according to
its real type, there's no opportunity to do that unless you want to
enumerate every possible number of params and param type.
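
The space argument can be sketched in plain C by comparing the two
layouts.  Both structs below are hypothetical: generic_record models
what a fixed-arity trace function might store (6 longs and a string),
typed_record sizes each field by its real type, and NAME_LEN is a
placeholder for NAME_MAX+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_LEN 256	/* stand-in for NAME_MAX + 1 */

/* Layout a fixed-arity trace function would use: one long per param. */
struct generic_record {
	long args[6];
	char str[NAME_LEN];
};

/* Layout sized to the real field types. */
struct typed_record {
	uint64_t timestamp;
	int32_t flags;
	int32_t mode;
	int32_t fd;
	uint16_t hook_id;
	char name[NAME_LEN];
};

/* Bytes saved per logged event by the typed layout (may be zero). */
static size_t bytes_saved_per_event(void)
{
	assert(sizeof(struct generic_record) >= sizeof(struct typed_record));
	return sizeof(struct generic_record) - sizeof(struct typed_record);
}
```

How much is saved depends on the ABI: where long is 64 bits the typed
layout is smaller, while on targets with a 32-bit long the difference
can disappear.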

Considering that the first probe is so similar to the second, which
can be done today, I think this should be a relatively minor addition
to the language, but I'll let the translator folks comment on that.

This scheme could also be used in the reverse direction with another
slight addition to the language for making trace events from external
tracers visible in systemtap.

global timestamp, hook_id, flags, mode, filename, fd

probe trace.ltt("open_event")
{
	flags = $ltt_event.flags;
	mode = $ltt_event.mode;
	filename = $ltt_event.filename;
	fd = $ltt_event.fd;
}

Here, the ltt event fields are available using the same struct syntax,
but prepended with '$'.

You could even combine the two and do 'retracing' if you wanted to.
For example, suppose the static tracepoint didn't provide some piece of
info that you needed.  You could grab what it does provide and, since
you're still in the context of the open syscall, also grab other
values at that point, stick it all into a new struct and log it:

probe trace.ltt("open_event")
{
	data.hook_id = HOOKID_OPEN

	fill_in_common(data)

	data.flags = $ltt_event.flags;
	data.filename = $ltt_event.filename;
	data.fd = $ltt_event.fd;
	data.mode = $mode;

	trace(data);
}

function fill_in_common(data)
{
	data.timestamp = timestamp_us();
}

Here, the first three fields come from the ltt event, and the mode
comes from the mode param, which wasn't available in the ltt event but
should be available from the DWARF debug info.
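
The retracing step amounts to merging two sources into one record.  A
minimal sketch in plain C, with the caveat that struct ltt_open_event,
struct open_record and retrace_open() are hypothetical names invented
here and do not reflect any actual LTT or SystemTap data structure:

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 64	/* small stand-in for NAME_MAX + 1 */

/* Hypothetical payload delivered by the external tracer. */
struct ltt_open_event {
	int flags;
	int fd;
	char filename[NAME_LEN];
};

/* Record we want to log: the event's fields plus the missing mode. */
struct open_record {
	unsigned long long timestamp;
	int flags;
	int mode;
	int fd;
	char filename[NAME_LEN];
};

/* Build the new record from the external event and the probe context. */
static void retrace_open(const struct ltt_open_event *in, int mode,
			 unsigned long long now, struct open_record *out)
{
	assert(in != NULL && out != NULL);
	memset(out, 0, sizeof(*out));	/* no stale bytes in padding or name */
	out->timestamp = now;
	out->flags = in->flags;
	out->fd = in->fd;
	out->mode = mode;		/* taken from the probe context */
	strncpy(out->filename, in->filename, NAME_LEN - 1);
}
```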

Tom


