This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: patches to actually use markers?


* Frank Ch. Eigler (fche@redhat.com) wrote:
> Hi -
> 
> On Fri, Nov 16, 2007 at 03:35:39PM -0500, Mathieu Desnoyers wrote:
> > [...]
> > > I see.  Yes, per-systemcall markers would be welcome by our group, and
> > > ones not dependent on TIF_TRACE or whatnot even more so.  But we're
> > > trying not to get too optimistic.
> > 
> > I use per-systemcall markers for the principally useful systemcalls, but
> > I also instrument syscall_trace() to get all the other syscalls (new
> > ones, etc..).
> 
> So then some system calls would get duplicate trace reports, and some
> would not get arguments at all?  Does not sound ideal.
> 

We currently have three distinct events for a system call:

syscall entry, with syscall id and instruction pointer
the syscall-specific instrumentation (optional)
syscall exit

One benefit of having syscall entry/exit events with minimal information
is that we can put them really close to the "real" event, i.e. the
transition from userspace to kernel space. This is useful when people
want a precise accounting of kernel vs userspace time: the results
will be as close as possible to those obtained with a profiler.
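
As an illustration of the accounting described above, here is a minimal
Python sketch that splits one thread's time into userspace and kernel
portions using only bare entry/exit events. The event tuples, the
`split_time` name and the timestamps are assumptions made for this example;
they are not the LTTng trace format.

```python
def split_time(events):
    """Return (user_cycles, kernel_cycles) for a single thread.

    `events` is a time-ordered list of (timestamp, kind) tuples, where kind
    is "entry" (userspace -> kernel) or "exit" (kernel -> userspace).
    Time before the first event is ignored.
    """
    user = kernel = 0
    in_kernel = False
    last_ts = None
    for ts, kind in events:
        if last_ts is not None:
            # Charge the elapsed interval to whichever side we were on.
            if in_kernel:
                kernel += ts - last_ts
            else:
                user += ts - last_ts
        in_kernel = (kind == "entry")
        last_ts = ts
    return user, kernel


# Hypothetical trace of one thread, timestamps in cycles.
trace = [(100, "entry"), (160, "exit"), (400, "entry"), (430, "exit")]
print(split_time(trace))  # (240, 90)
```

The closer the entry/exit events sit to the real transition, the smaller
the error in both sums.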

Limiting the information passed to the syscall entry/exit
instrumentation also helps us know how many cycles are wrongly
accounted. We do not currently adjust the statistics to take that into
account, but we plan to do so in the future. Anything more complicated
could cause the number of wrongly accounted cycles to vary from one
event to the next, which is unwanted.

Instrumentation within the syscall-specific function helps us know
when/if the operation has really been done _within the kernel_. It may
imply putting the event within the bounds of existing locks, to be as
sure as possible that two related events happening on different CPUs
won't be recorded in the wrong order. Ideally, the instrumentation of the
syscall's "effect on the internal data structures of the kernel" should
be as close as possible to the actual memory modification.
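
The ordering argument can be sketched in userspace with Python threads
standing in for CPUs. This is only an analogy under stated assumptions:
`counter` plays the role of a kernel data structure, `log` the trace
buffer, and `modify_and_record` is a hypothetical name. Because the event
is recorded while the lock protecting the data is still held, the order of
the trace matches the order of the modifications.

```python
import threading

lock = threading.Lock()
counter = 0   # stands in for a kernel data structure
log = []      # stands in for the trace buffer


def modify_and_record(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1
            # Recorded inside the lock, right after the modification,
            # so no other thread can slip an event in between.
            log.append(counter)


threads = [threading.Thread(target=modify_and_record, args=(1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The trace order is exactly the modification order.
print(log == list(range(1, 4001)))
```

If the `log.append` call were moved outside the `with lock:` block, two
threads could record their events in the opposite order from the one in
which they changed `counter`.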

Given these two opposite sets of constraints, I think having more than
one instrumentation site per syscall makes sense. Moreover, markers are
really cheap... :)

> > I add my own TIF_KERNEL_TRACE, which is a thread flag enabled in
> > each and every thread when tracing is active.  [...]
> 
> Who has responsibility to manage this flag?  Would it be reference
> counted, so that e.g.  two ltt and a third systemtap script all hook
> up to these markers, the flag will stay set?  It would be nice to
> measure the impact of ordinary, unconditional markers in the
> system-call functions.
> 

Already did. For inactive markers under high memory pressure, we must do
2 memory reads (that's the cycle difference we measure). If they are in
cache, it's hard to see a difference. I think I've documented that in
the markers or immediate values patch header.

For active markers, I did some testing a while ago. I could dig through
the ML archives to find these results.

Yes, refcount would be the way to go. The code is currently in
kernel/sched.c, since it touches the threads. I would have to add the
refcount. It will be in the next LTTng prerelease.
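
To make the refcount idea concrete, here is a minimal Python sketch of the
logic only. `TraceFlag`, `get` and `put` are hypothetical names, not the
code in kernel/sched.c, and a kernel version would additionally need
atomic operations or locking and would have to update the flag in every
thread.

```python
class TraceFlag:
    """Hypothetical refcounted stand-in for the per-thread trace flag."""

    def __init__(self):
        self.users = 0

    def get(self):
        # A tracer starts using the markers.
        self.users += 1

    def put(self):
        # A tracer stops using the markers.
        if self.users == 0:
            raise RuntimeError("put() without matching get()")
        self.users -= 1

    @property
    def active(self):
        # The flag stays set as long as at least one user remains.
        return self.users > 0


flag = TraceFlag()
for _ in range(3):   # e.g. two ltt sessions and one systemtap script
    flag.get()
flag.put()
flag.put()
print(flag.active)   # True: one user is still attached
flag.put()
print(flag.active)   # False: the last user has detached
```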

> > > If "we" is a marker callback function that is given the system call
> > > number, it can be taught.  This is the sort of thing we do currently
> > > in systemtap script code based upon kprobes.
> > 
> > Yeah.. but I fear that within the kernel it can become quickly very
> > ugly.
> 
> It's an inherent tradeoff between a small generic hook versus many
> specialized hooks.  Look how the audit system deals with decoding
> syscalls.  It's not THAT bad.
> 

Hrm, it's just that it centralizes something that would be better left
to each subsystem's expert: which information specific to a given
system call is interesting, and when is the best moment to record it.
Just like I would leave to the architecture experts the final word on
when it's best to record the system call entry/exit event.
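
For reference, the "small generic hook" approach under discussion can be
sketched as a single callback that dispatches on the syscall number. This
is a Python illustration only: `generic_hook`, `DECODERS` and the decoder
functions are hypothetical, and the two syscall numbers are the x86-64
Linux ones for read and write.

```python
def decode_read(args):
    fd, _buf, count = args
    return f"read(fd={fd}, count={count})"


def decode_write(args):
    fd, _buf, count = args
    return f"write(fd={fd}, count={count})"


# Central table: every syscall's argument layout has to be known here.
DECODERS = {
    0: decode_read,    # x86-64 syscall number for read
    1: decode_write,   # x86-64 syscall number for write
}


def generic_hook(syscall_id, args):
    """Single callback invoked for every system call."""
    decoder = DECODERS.get(syscall_id)
    if decoder is None:
        # New or unknown syscalls are still reported, without arguments.
        return f"syscall {syscall_id} (undecoded)"
    return decoder(args)


print(generic_hook(0, (3, 0x1000, 512)))
print(generic_hook(999, ()))
```

The table is where the centralization shows up: it has to encode, in one
place, what each subsystem considers worth recording.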

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

