This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: kernel summit session on systemtap


On Wed, Sep 17, 2008 at 10:41:15AM -0400, Frank Ch. Eigler wrote:
> Here are some things we need to work more on:
> 
> - It's time to really improve & shrink debuginfo.  Enough said.

The more I've played with debuginfo, the more I've been convinced that
at least for me, the costs vastly outweigh the benefits.  It causes
the time to compile the kernel (and kernel developers need to compile
the kernel a lot) to explode, just simply due to disk I/O time; if
/lib is on a separate partition, you can simply not have the space to
store the huge, vastly bloated modules.  From the benefits side, given
GCC's increasingly aggressive optimizations, being able to set
breakpoints at random lines is less important when it (a) often
doesn't work because it's been optimized out, or (b) the symbol you
want to reference isn't easily available.  Case (b) ends up being very
frustrating because you end up getting a highly confusing error
message, such as:

	semantic error: failed to retrieve location attribute for local 'sb'
	(dieoffset: 0x9cf22): identifier '$sb' at ext4-check-desk.stp:3:47

Not something that a system administrator will appreciate, never mind
the kernel developer.  It just ends up leaving the developer and/or
administrator with a very bad impression of Systemtap.

How could this be mitigated:

*) Promote the use of Steven Rostedt's streamline_config, telling
	people that if they decide to compile with debuginfo, they
	will very likely ***badly*** regret it unless they use a
	special config file that aggressively restricts their
	configuration in terms of not building modules they don't need
	on that system.
 
*) Maybe for kernel developers there should be some suggested patches
	 that compile the kernel with some amount of optimization
	 suppressed, so that in particular, functions are never
	 inlined, and maybe in an extreme sense, optimizations are
	 disabled altogether --- or at least enough that if someone is
	 going to pay the vast cost of debuginfo, at least they will
	 get something useful out of it by actually being able to set
	 traces at arbitrary line numbers, and will hopefully be able
	 to access variables with much greater probability of success.

	 Yes, this goes against the Systemtap goal of not requiring
	 people to compile special kernels and reboot.  But the
	 supposed advantage of debuginfo is being able to set
	 tracepoints at arbitrary points, and at least for me, in the
	 code I've tried to instrument, I have absolutely no confidence
	 that I can set tracepoints where I want except at the
	 beginning of functions anyway.  So if I'm going to slow down
	 my compile-edit-debug cycle in the kernel by an order of
	 magnitude, say to debug some really hard problem, I want to
	 be able to really, truly and reliably set
	 tracepoints **anywhere** and be able to usefully probe
	 variables when and where I want.

*) Alternatively, if we are going to take as a given that the only
	kind of probe points that are going to be reliable is the
	beginning or end of functions (and specifically, non-static
	functions), is there some way to generate a restricted set of
	debuginfo that only gives enough information that it is
	possible to decode the types of the function parameters, but
	none of the line number information?  Maybe some way of simply
	running nm on vmlinux, and then creating some kind of magic
	.c file that references all of the functions and forcing a
	single .o with DWARF information with the function and type
	information, and nothing else.  I'm not a tools person, so
	this may be a stupid way of doing it, but the basic idea is
	simply having a highly compressed debuginfo file that only has
	function parameter information, and nothing else, which
	hopefully will only be a megabyte or two instead of hundreds
	and hundreds of megabytes of debuginfo.  And to do this
	without having to write gargantuan .o files in the build
	tree, since that slows down the compile.

	I know that Systemtap can run without debuginfo, but if you
	can't decode the function arguments, at that point I would
	probably use ftrace because it's simpler than Systemtap.
	Systemtap could add a huge amount of value over ftrace, if it
	could decode function parameters without having to pay the
	cost of debuginfo.

	Quite frankly, these days the main reason why I haven't been
	playing with Systemtap much lately is because I'm tired of
	waiting for compiles to complete when compiling with
	debuginfo.  Sure, it's handy for getting line number
	information when debugging oops, but compiling with debuginfo
	is **so** painful that I'd much rather paw through
	disassembled assembly code to figure out where the system died
	when I need to analyze a kernel oops than to wait for a kernel
	compile to finish.  Pawing through assembly code takes much
	less time for me, and is much more efficient, because I'm very
	often recompiling the kernel tree.  (This is a very different
	scenario than when a distribution compiles a kernel once, on a
	build machine, as opposed to multiple times during a
	development cycle.)

> - The tool's generality.  Linus is rightly skeptical of a tool that
>   aims too high and turns out to be too hard to use.  (I believe
>   "piece of shit" was his shock-value opening comment.  :-)


Speaking of that.... this isn't as big of a deal for kernel
developers, but if it really is true that Systemtap is aiming to be
used for System Administrators (and I believe that based on the
assumption that debuginfo management would be done by RPM macros in
the distribution packaging, and ignoring the kernel compile-edit-debug
time problem plus some of the ways Systemtap had been marketed at
events such as the Red Hat Summit), then when looking at the Systemtap
vs. Dtrace comparison chart, I have to agree with the DTrace folks;
the Systemtap project is very much being disingenuous about some of
the items on the chart, such as the comparison of speculative tracing.

The comment "(from first principles via auxiliary data and control
structures)", and the related one for thread-local variables "(from
first principles via tid-indexed auxiliary arrays)" is really lame.
Of *course* you can do anything from first principles.  A systemtap
trace is (modulo the time constraint) Turing equivalent.  That's like
saying there's no need for perl, I can in principle do everything in
assembly language.  You *can*, but you might not want to.
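
To make that concrete, here is a rough sketch of the tid-indexed idiom
the chart is referring to, timing read(2) per thread.  The array name
entry_ts and the choice of syscall are mine, purely for illustration;
DTrace would spell the same thing as a thread-local, self->ts.

```systemtap
global entry_ts   # stands in for a thread-local: one slot per thread id

probe syscall.read {
	# remember when this thread entered read()
	entry_ts[tid()] = gettimeofday_us()
}

probe syscall.read.return {
	if (tid() in entry_ts) {
		printf("%s(%d) read took %d us\n", execname(), tid(),
		       gettimeofday_us() - entry_ts[tid()])
		delete entry_ts[tid()]   # the slot has to be freed by hand
	}
}
```

It works, but the global declaration, the membership test and the
explicit delete are all bookkeeping the script author has to get right.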

One HUGE advantage DTrace has over Systemtap is that it has certain constructs,
such as its default report generation, and speculative tracing, which
means you can do things on a single command line, i.e.:

dtrace -n 'syscall::exec*:return { trace(execname); }'

By default dtrace will print a line for each probe that fires, and if
you use the trace action, it will print the value of its argument
(here, the name of the executable).

Or take this example:

% dtrace -n 'syscall:::entry { @num[pid, execname] = count(); }'

This will automatically print out the number of system calls each
process (printed with pid and execname) executed between the time
dtrace was started and when the administrator hit ^C:

 3104 gnome-terminal    2
 3153 gnome-terminal    2
 3098 nautilus          3
 4804 java             10
  599 sshd             24
 8117 acroread         45
28921 dtrace           71
  113 nscd            270
28920 find           3418


You can do the same thing in systemtap, but you have to do it as a
full script, and you have to explicitly have a print command in each
probe statement and you have to explicitly dump out the contents of
each associative array.  Dtrace can suppress the automatic output (using
-q), and for any long, sophisticated script, a Dtrace script probably
will do its own explicit output.  However, system administrators
can copy simple Dtrace one-liners and modify them to their needs
much more easily than they can under Systemtap.  Remember, most
system administrators aren't necessarily programmers!
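
For comparison, a Systemtap version of the system call count might look
roughly like the sketch below.  The array name num and the output
format are my own choices, and I'm assuming the syscall.* tapset
aliases are available on the system in question.

```systemtap
global num   # associative array keyed by (pid, execname)

probe syscall.* {
	num[pid(), execname()]++
}

# Nothing is printed automatically; dump the array when the
# session ends (e.g. when the user hits ^C).
probe end {
	foreach ([p, comm] in num+)   # '+' sorts by ascending count
		printf("%5d %-16s %6d\n", p, comm, num[p, comm])
}
```

Saved to a file (say, syscount.stp, a name I just made up) this would be
run as "stap syscount.stp".  The global declaration, the end probe and
the foreach loop are all things the dtrace one-liner gets for free.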

If we are going to let distribution marketing folks claim that
Systemtap is meant for System Administrators, it has to be easy to use,
and not necessarily assume deep programming skills.  (Such as
simulating thread local variables using tid's --- sorry, but that's
just LAME.  :-)

                                                        - Ted

