This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: kernel summit session on systemtap
- From: Theodore Tso <tytso at mit dot edu>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: systemtap at sources dot redhat dot com
- Date: Wed, 17 Sep 2008 18:13:49 -0400
- Subject: Re: kernel summit session on systemtap
- Bcc: tytso at mit dot edu
- References: <20080917144115.GA10231@redhat.com>
On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> Here are some things we need to work more on:
>
> - It's time to really improve & shrink debuginfo. Enough said.
The more I've played with debuginfo, the more I've been convinced that
at least for me, the costs vastly outweigh the benefits. It causes
the time to compile the kernel (and kernel developers need to compile
the kernel a lot) to explode, just simply due to disk I/O time; if
/lib is on a separate partition, you can simply not have the space to
store the huge, vastly bloated modules. From the benefits side, given
GCC's increasingly aggressive optimizations, being able to set
breakpoints at random lines is less important when (a) it often
doesn't work because the line has been optimized out, or (b) the symbol
you want to reference isn't easily available. Case (b) ends up being very
frustrating because you end up getting a highly confusing error
message, such as:
    semantic error: failed to retrieve location attribute for local 'sb'
    (dieoffset: 0x9cf22): identifier '$sb' at ext4-check-desk.stp:3:47
Not something that a system administrator will appreciate, never mind
the kernel developer. It just ends up leaving the developer and/or
administrator a very bad impression of Systemtap.
How could this be mitigated?
*) Promote the use of Steven Rostedt's streamline_config, telling
people that if they decide to compile with debuginfo, they
will very likely ***badly*** regret it unless they use a
special config file that aggressively restricts their
configuration in terms of not building modules they don't need
on that system.
*) Maybe for kernel developers there should be some suggested patches
that compile the kernel with some amount of optimization
suppressed, so that in particular, functions are never
inlined, and maybe in an extreme sense, optimizations are
disabled altogether --- or at least enough that if someone is
going to pay the vast cost of debuginfo, at least they will
get something useful out of it by actually being able to set
traces at arbitrary line numbers, and will hopefully be able
to access variables with much greater probability of success.
Yes, this goes against the Systemtap goal of not requiring
people to compile special kernels and reboot, but the supposed
advantage of using debuginfo is being able to set tracepoints
at arbitrary points, and at least for me, in the code I've
tried to instrument, I have absolutely no confidence that I
can set tracepoints where I want except at the beginning of
functions anyway. So if I'm going to slow down my
compile-edit-debug cycle in the kernel by an order of
magnitude, say to debug some really hard problem, I want to
really, truly and reliably be able to set tracepoints
**anywhere** and be able to usefully probe variables when and
where I want.
*) Alternatively, if we are going to take as a given that the only
kinds of probe points that are going to be reliable are the
beginning or end of functions (and specifically, non-static
functions), is there some way to generate a restricted set of
debuginfo that only gives enough information that it is
possible to decode the types of the function parameters, but
none of the line number information? Maybe some way of simply
running nm on vmlinux, and then creating some kind of magic
.c file that references all of the functions and forcing a
single .o with DWARF information with the function and type
information, and nothing else. I'm not a tools person, so
this may be a stupid way of doing it, but the basic idea is
simply having a highly compressed debuginfo file that only has
function parameter information, and nothing else, which
hopefully will only be a megabyte or two instead of hundreds
and hundreds of megabytes of debuginfo. And to do this
without having to write gargantuan .o files in the build
tree, since that slows down the compile.
I know that Systemtap can run without debuginfo, but if you
can't decode the function arguments, at that point I would
probably use ftrace because it's simpler than Systemtap.
Systemtap could add a huge amount of value over ftrace, if it
could decode function parameters without having to pay the
cost of debuginfo.
Quite frankly, these days the main reason why I haven't been
playing with Systemtap much lately is because I'm tired of
waiting for compiles to complete when compiling with
debuginfo. Sure, it's handy for getting line number
information when debugging oops, but compiling with debuginfo
is **so** painful that I'd much rather paw through
disassembled assembly code to figure out where the system died
when I need to analyze a kernel oops than to wait for a kernel
compile to finish. Pawing through assembly code takes much
less time for me, and is much more efficient, because I'm very
often recompiling the kernel tree. (This is a very different
scenario than when a distribution compiles a kernel once, on a
build machine, as opposed to multiple times during a
development cycle.)
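
The "run nm on vmlinux and generate a .c file" idea above could be
sketched roughly as follows. This is only an illustration in Python,
not an existing tool: the helper names (parse_nm_functions,
make_stub_source), the stub_refs array and the header list are all
hypothetical, and it is an untested assumption that a compiler would
emit usable parameter type information for functions that are merely
referenced like this.

```python
import re

# Only plain C identifiers can be referenced from generated C source.
_C_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_nm_functions(nm_output):
    """Return the sorted names of global text ('T') symbols in nm output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        if len(fields) == 3 and fields[1] == "T" and _C_IDENT.match(fields[2]):
            names.add(fields[2])
    return sorted(names)


def make_stub_source(names, headers):
    """Build C source text that references every function in `names`.

    `headers` must declare the functions; choosing them is left to the
    caller and is the hard part that this sketch does not solve.
    """
    lines = [f"#include <{header}>" for header in headers]
    lines.append("")
    lines.append("/* Taking each address forces a reference to the function. */")
    lines.append("void *const stub_refs[] = {")
    lines.extend(f"    (void *)&{name}," for name in names)
    lines.append("};")
    return "\n".join(lines) + "\n"
```

In practice the text passed to parse_nm_functions would be the output
of "nm vmlinux", and the generated file would be the only one compiled
with debuginfo enabled.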
> - The tool's generality. Linus is rightly skeptical of a tool that
> aims too high and turns out to be too hard to use. (I believe
> "piece of shit" was his shock-value opening comment. :-)
Speaking of that.... this isn't as big of a deal for kernel
developers, but if it really is true that Systemtap is aiming to be
used by System Administrators (and I believe that it is, based on the
assumption that debuginfo management would be done by RPM macros in
the distribution packaging, ignoring the kernel compile-edit-debug
time problem, plus some of the ways Systemtap has been marketed at
events such as the Red Hat Summit), then when looking at the Systemtap
vs. DTrace comparison chart, I have to agree with the DTrace folks;
the Systemtap project is very much being disingenuous about some of
the items on the chart, such as the comparison of speculative tracing.
The comment "(from first principles via auxiliary data and control
structures)", and the related one for thread-local variables "(from
first principles via tid-indexed auxiliary arrays)" is really lame.
Of *course* you can do anything from first principles. A systemtap
trace is (modulo the time constraint) Turing equivalent. That's like
saying there's no need for perl, I can in principle do everything in
assembly language. You *can*, but you might not want to.
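
For readers who haven't seen the idiom, "tid-indexed auxiliary arrays"
means keeping per-thread state by hand in a table keyed by thread ID.
The following is only a Python model of that bookkeeping, not Systemtap
code; start_times, on_entry and on_return are hypothetical names chosen
for the illustration.

```python
import threading
import time

# One shared table, indexed by thread ID, stands in for thread-local storage.
start_times = {}
_table_lock = threading.Lock()


def on_entry():
    """Model of an entry probe: remember when this thread entered."""
    with _table_lock:
        start_times[threading.get_ident()] = time.monotonic()


def on_return():
    """Model of a return probe: report elapsed time and clean up by hand."""
    with _table_lock:
        started = start_times.pop(threading.get_ident(), None)
    if started is None:
        return None  # no matching entry was recorded for this thread
    return time.monotonic() - started
```

Every probe pair has to repeat the indexing and the cleanup, which is
exactly the work a built-in thread-local variable would hide.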
One HUGE advantage DTrace has over Systemtap is that it has certain
constructs, such as its default report generation and speculative
tracing, which mean you can do things on a single command line, e.g.:
    dtrace -n 'syscall::exec*:return { trace(execname); }'
By default dtrace will print a line for each probe that fires, and if
you use the trace action, it will print the value of its argument
(here, execname). Or take this example:
    % dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'
This will automatically print out the number of system calls each
process (identified by pid and execname) made between the time
dtrace was started and when the administrator hit ^C:
         3104 gnome-terminal         2
         3153 gnome-terminal         2
         3098 nautilus               3
         4804 java                  10
          599 sshd                  24
         8117 acroread              45
        28921 dtrace                71
          113 nscd                 270
        28920 find                3418
You can do the same thing in systemtap, but you have to do it as a
full script, you have to explicitly have a print command in each
probe statement, and you have to explicitly dump out the contents of
each associative array. DTrace can suppress the automatic output (using
-q), and for any long, sophisticated script, a DTrace script probably
will do its own explicit output. However, system administrators can
copy simple DTrace one-liners and modify them to their needs much
more easily than they can under Systemtap. Remember, most
system administrators aren't necessarily programmers!
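
To make concrete what the aggregation one-liner does implicitly, here
is a rough Python model of the bookkeeping that otherwise has to be
written out by hand: count events per (pid, execname) key, then dump
the table sorted by count at exit. It is neither DTrace nor Systemtap
code, and the function names and sample events are invented for the
illustration.

```python
from collections import Counter


def count_syscalls(events):
    """Count events per (pid, execname) key, like @num[pid, execname] = count()."""
    counts = Counter()
    for pid, execname in events:
        counts[(pid, execname)] += 1
    return counts


def format_report(counts):
    """The explicit 'dump the array' step, sorted by ascending count."""
    rows = sorted(counts.items(), key=lambda item: item[1])
    return "\n".join(
        f"{pid:>8} {execname:<16} {n:>8}" for (pid, execname), n in rows
    )


if __name__ == "__main__":
    # Hypothetical sample events; a real tracer would supply these.
    sample = [(599, "sshd")] * 3 + [(3098, "nautilus")] * 2 + [(4804, "java")]
    print(format_report(count_syscalls(sample)))
```

The point is not that any of this is hard, but that it is a dozen
lines a non-programmer has to get right before seeing any output.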
If we are going to let distribution marketing folks claim that
Systemtap is meant for System Administrators, it has to be easy to use,
and not necessarily assume deep programming skills. (Such as
simulating thread local variables using tid's --- sorry, but that's
just LAME. :-)
- Ted