This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.
RE: architecture paper draft

From: "Chen, Brad" <brad dot chen at intel dot com>
To: "Richard J Moore" <richardj_moore at uk dot ibm dot com>
Cc: <systemtap at sources dot redhat dot com>
Date: Tue, 15 Feb 2005 10:37:26 -0800
Subject: RE: architecture paper draft
> Again, comes down to purpose.  For extreme debugging then yes, being able
> to insert probes at arbitrary locations is absolutely necessary. For
> performance and regular tracing then not so. The extreme debugging I'm
> referring to here is where you need to monitor for a peculiar set of
> circumstances then force a dump (panic) when they happen.

It appears to me that DTrace is specifically restricted
to the performance problem, and as a result they have the
useful feature of being a relatively safe tool. Perhaps
this "safe" mode could be the default mode for systemtap,
with a "guru" mode that allowed intercepts at arbitrary
kernel locations?

I think safety is a very important feature for being competitive
with DTrace. I'd hate to see us come up short on safety.

Brad

-----Original Message-----
From: Richard J Moore [mailto:richardj_moore@uk.ibm.com] 
Sent: Monday, February 14, 2005 10:56 AM
To: Chen, Brad
Cc: systemtap@sources.redhat.com; systemtap-owner@sources.redhat.com
Subject: RE: architecture paper draft






systemtap-owner@sources.redhat.com wrote on 12/02/2005 02:26:53:

>
> [apologies if you already saw this; sources.redhat.com sent me a
> send failure notice.]
>
>
> This was very helpful history about dprobes motivation. Thanks.
> I agree with your observation that debugging and performance tools
> have different needs. A few comments.
>
> 1) You might have a look at kerninst from University of Wisconsin.
> They use branches when they can and traps when they can't.
>

Will do.

> 2a)
> - What if we just let the instrumentation do its thing anyway?
> How many cases are there where it is undesirable to commit the
> results of the script before the faulting instruction is launched?
> It seems to me that if the analysis is at the semantic level of
> procedures or source lines then it's okay if the instrumentation
> commits. Especially if we make it clear that the analysis occurs
> before the instruction is executed.
> - When it does matter: if we replace the instrumented instruction
> with a branch, and it generates a trap, then the trap handler
> might recognize that the instruction is in SystemTAP memory and
> know to do something special, such as schedule some kind of fix-up,
> or trigger undo code in the script via a speculation language
> feature.


Comes down to the purpose of the tool - debugging or performance.
For the types of problem I used dprobes on I would definitely like to
suppress the effect of benign retries. But if we execute the original
instruction ex situ then we have to single-step it in general, as I think
we have a hard time dealing with all instructions in a uniform way (think
about relative addressing). So we end up single-stepping almost as a
matter of necessity.
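
A rough illustration of the "benign retry" point, written as a user-space toy model in Python rather than kernel code. Every name here (`PageFault`, `make_faulting_instruction`, `run_probed`, `suppress_retries`) is hypothetical and is not taken from dprobes or kprobes. It only shows why committing the trace record after a clean single-step gives one record per apparent execution, whereas committing before the instruction runs gives one record per attempt.

```python
class PageFault(Exception):
    """Stands in for a fault raised by the probed instruction."""


def make_faulting_instruction():
    """Return a toy 'instruction' that faults on its first execution only."""
    state = {"paged_in": False}

    def instruction():
        if not state["paged_in"]:
            state["paged_in"] = True   # the toy "memory manager" fixes it up
            raise PageFault()
        return "done"

    return instruction


def run_probed(instruction, suppress_retries):
    """Run a probed instruction, retrying after faults; return the trace."""
    trace = []
    while True:
        record = "probe hit"           # handler runs before the instruction
        if not suppress_retries:
            trace.append(record)       # commit immediately
        try:
            instruction()              # stands in for the single-step
        except PageFault:
            continue                   # fault handled, instruction retried
        if suppress_retries:
            trace.append(record)       # commit only after a clean step
        return trace


print(len(run_probed(make_faulting_instruction(), suppress_retries=False)))
print(len(run_probed(make_faulting_instruction(), suppress_retries=True)))
```

With the toy instruction faulting exactly once, the first call records two events and the second records one.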


>
> 2b) Recursion - this we want to strictly disallow, right?
>

Absolutely. It's easy to protect against and not an issue.
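
One way such a guard could look, again as a user-space toy model in Python and not the actual dprobes/kprobes code. The class `ProbeTable`, its `in_handler` flag and the addresses are hypothetical. The sketch assumes the policy described elsewhere in this thread: a probe that is hit while a handler is already running is silently removed.

```python
class ProbeTable:
    """Toy probe dispatcher with a re-entrancy guard."""

    def __init__(self):
        self.handlers = {}         # address -> handler
        self.in_handler = False    # a real kernel would track this per CPU

    def register(self, addr, handler):
        self.handlers[addr] = handler

    def hit(self, addr):
        handler = self.handlers.get(addr)
        if handler is None:
            return
        if self.in_handler:
            # Recursive hit: drop the offending probe instead of re-entering.
            del self.handlers[addr]
            return
        self.in_handler = True
        try:
            handler(self)
        finally:
            self.in_handler = False


table = ProbeTable()
calls = []


def helper_probe(t):
    calls.append("helper")


def outer_probe(t):
    calls.append("outer")
    t.hit(0x2000)    # the handler's own code path runs into another probe


table.register(0x1000, outer_probe)
table.register(0x2000, helper_probe)
table.hit(0x1000)
print(calls, sorted(hex(a) for a in table.handlers))
```

After the run only the outer handler has fired, and the probe at the hypothetical address `0x2000` is gone from the table.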

> 3) Part of what I took away from the notion of probe points is
> that the instrumentation is placed not at arbitrary locations
> but at very specific locations. Do we want people to be able
> to put instrumentation at arbitrary places? Seems like this
> could be a safety problem.
>

Again, comes down to purpose.  For extreme debugging then yes, being able
to insert probes at arbitrary locations is absolutely necessary. For
performance and regular tracing then not so. The extreme debugging I'm
referring to here is where you need to monitor for a peculiar set of
circumstances then force a dump (panic) when they happen.


> Brad
>
> -----Original Message-----
> From: Richard J Moore [mailto:richardj_moore@uk.ibm.com]
> Sent: Friday, February 11, 2005 1:38 AM
> To: Chen, Brad
> Cc: Frank Ch. Eigler; Stephen C. Tweedie; systemtap@sources.redhat.com;
> systemtap-owner@sources.redhat.com
> Subject: RE: architecture paper draft
>
> The original design choice for an interrupt mechanism rather than a branch
> was based upon the following criteria:
>
> 1) for a global debugger - i.e. where breakpoints/probepoints can be placed
> in user and kernel space - we need to run the probe handler in kernel
> context to give maximum access to system resources. So a privilege level
> transfer to ring 0 is mandated.
>
> 2a) The probed instruction is single-stepped before normal control returns
> to the system. This is done for dynamic tracing purposes, where we discard
> the trace record if the probed instruction faults (not traps). If we don't
> do this we get multiple trace events for an apparent single execution of a
> given instruction where a page-fault it generates is handled seamlessly by
> the memory manager. There is an option to override this behaviour BTW.
> Single-stepping of the probed instruction has to be done in the correct
> context, hence for simplicity we temporarily restored the original
> instruction and single-stepped it in situ. However, that scheme opens the
> window for missing potential tracepoints in a multi-processor environment.
> Hence the later change to kprobes where we single-step a copy of the
> original instruction. To implement that change we store the original
> instruction in memory that is accessible by the same virtual address from
> all contexts - remember this is a global debugger by design, it doesn't
> privatize code as ptrace does; manipulation of the probed instruction is
> done by an aliased virtual address in kernel space. A probepoint on a
> shared library is active for all contexts - current and future - that call
> that library.
>
> 2b) If the probed instruction causes recursion into the probe handler then
> we silently remove the probepoint. We also provide an explicit means to do
> this from the probe handler (to satisfy various needs).  Thus while it's
> not valid to put a probe in the code path of the probe handler, it does no
> harm.
>
> 3) Both 2a and 2b require the ability to instate and remove probepoints in
> arbitrary contexts. We can't afford to have to deal with special locking
> requirements or the possibility of causing a fault on storing the probe
> instruction. Therefore we chose an instruction that could both be stored
> atomically and cause a transition to ring 0. There are very few that do
> this - in fact I think there's only one on IA32, which is the INT3.
>
> 4) In order to preserve order (for tracing purposes) we also required that
> the breakpoint interrupt be serviced by an interrupt gate and not a trap
> gate - the latter doesn't atomically disable interrupts.
>
>
> So, that's how we got into using the interrupt mechanism for probepoints. I
> believe it's still valid when kprobes/dprobes is used as a global debugger.
> And I guess this is where the requirements of profiling and performance
> tools differ.  The debugger's prime concern is to record the order of events
> and is less concerned about timing. The perftool is concerned with accurate
> timing and sampling and requires minimal disturbance to normal
> performance characteristics but is not concerned with recording the
> detailed sequence of events. Hence the preference by Sun to base the
> performance probe on a call.
>
> Have we come to a parting of the ways? Is kprobes the right mechanism on
> which to build a DTRACE-like capability?
>
>
>
> - -
> Richard J Moore
> IBM Advanced Linux Response Team - Linux Technology Centre
> MOBEX: 264807; Mobile (+44) (0)7739-875237
> Office: (+44) (0)1962-817072
>
> systemtap-owner@sources.redhat.com wrote on 11/02/2005 01:16:19:
>
> >
> > Frank Ch. Eigler wrote:
> > > In addition, this method may require that the kprobes handler not be
> > > started from an interrupt context wrapped around the "int 3" trap (x86).
> > > Changing this might require extensive changes to kprobes, to perhaps
> > > insert "simple" diversionary branches into the executable image instead
> > > of traps.  Intel folks prefer this sort of approach for performance
> > > reasons, but we may have come across an even better reason for it.
> >
> > Thank you for noting my earlier question about interrupt overhead.
> > I said I would do a little homework on interrupt overhead; here it is:
> >    Cycle delay by CPU    Branch   Trap
> >    1.6 GHz Pentium 4        149   1408
> >    AMD Athlon 1800           38    361
> >    1.6 GHz Pentium M         84    541
> >
> > These numbers are from the kerninst team from the University of Wisconsin
> > and I did not verify them myself. In general it looks like a trap is 7-10x
> > more expensive than a branch. It appears to me that kprobes requires three
> > traps, so that would make the overall impact 20-30x more expensive.
> >
> > For Example: Assume a 1.6GHz Pentium 4
> >    Branch overhead: 149 cycles
> >    Overhead for one trap: about 1400 cycles
> >    Kprobes requires 2-3 traps
> >    1% overhead => 16M cycles
> >    trap-based instrumentation: 5000 probes per second
> >    branch-based instrumentation: 94000 probes per second
> >
> > For many tools, most time will be spent in analysis code and this
> > issue is irrelevant. However, if you happen to be a performance
> > guy, and you're trying to do something even moderately aggressive
> > in terms of higher frequency or very low overhead, this might start
> > to matter. If this also helps to simplify some of the interrupt
> > management issues, that's great.
> >
> > I note in passing that the SPARC implementation of DTrace is
> > reported to use branches, and their x86 implementation uses
> > traps.
> >
> > Brad Chen
> >
>
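
The overhead arithmetic in Brad Chen's quoted message can be rechecked with a few lines of Python. The cycle counts are copied from the quoted table (kerninst figures, which the message says were not independently verified). The 1% budget and the 2-3 traps per kprobe are also taken from the message. The helper names (`trap_to_branch_ratio`, `probes_per_second`) are hypothetical. The message's own rounded figures (about 5000 and 94000 probes per second) are of the same order of magnitude; the exact inputs behind them are not stated, so this sketch is not expected to reproduce them digit for digit.

```python
# Cycle costs copied from the quoted table (kerninst figures, unverified).
CYCLES = {
    "1.6 GHz Pentium 4": {"branch": 149, "trap": 1408},
    "AMD Athlon 1800": {"branch": 38, "trap": 361},
    "1.6 GHz Pentium M": {"branch": 84, "trap": 541},
}

CLOCK_HZ = 1_600_000_000   # 1.6 GHz, as in the worked example
BUDGET_PERCENT = 1         # share of CPU time allowed for probe entry cost


def trap_to_branch_ratio(cpu):
    """How many times more expensive one trap is than one branch."""
    costs = CYCLES[cpu]
    return costs["trap"] / costs["branch"]


def probes_per_second(cycles_per_probe):
    """Whole probes per second that fit in the cycle budget."""
    budget_cycles = CLOCK_HZ * BUDGET_PERCENT // 100   # 16,000,000 cycles
    return budget_cycles // cycles_per_probe


for cpu in CYCLES:
    print(f"{cpu}: trap/branch = {trap_to_branch_ratio(cpu):.1f}x")

p4 = CYCLES["1.6 GHz Pentium 4"]
print("branch-based:", probes_per_second(p4["branch"]))
for traps in (2, 3):
    print(f"trap-based, {traps} traps:", probes_per_second(traps * p4["trap"]))
```

Under these assumptions the Pentium 4 budget allows roughly 107,000 branch-based probes per second, against roughly 3,800 to 5,700 trap-based probes per second depending on whether two or three traps are counted.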

- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072
Follow-Ups:
- RE: architecture paper draft
  - From: Richard J Moore