This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PR6580; patch for review] first step of unwind_cache enhancement

From: Serguei Makarov <smakarov at redhat dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>, systemtap at sourceware dot org
Date: Fri, 7 Sep 2012 10:17:07 -0400 (EDT)
Subject: Re: [PR6580; patch for review] first step of unwind_cache enhancement

> Looking at the complexity, I wonder if we need such a cache at all.
> Not many stap scripts seem to require more than one backtrace/stack
> type call within them.  The opportunity for savings in the PR6580
> world is mainly to avoid unwinding *deeper than necessary*, not to
> avoid unwinding *repeatedly*.  (The case of caching user-regs from a
> full kernel unwind is special.)

If we want to implement the more complex tapset functions in the script language in terms of stack(), some scheme like this is necessary. For instance, a script-language implementation of callers(n) calls stack() up to n times.

If we are to implement each tapset function in embedded-C, basically the only improvement I can see over the current git master is that instead of a string to tokenize, we might be passing back an array of PC values from doing the full backtrace.

> +        /* We always start by fetching the current PC if it's not yet known. */
> +        if (c->uwcache.depth == 0 && c->uwcache.data[0] == 0) {
> +                if (! c->kregs) {
> +                        /* Even the current PC is unknown; so we have absolutely no data
> +                         * at any depth. Note that unlike _stp_stack_kernel_print(), we
> +                         * can't fall back to calling dump_trace() to obtain the
> +                         * backtrace. */
> +                        c->uwcache.done = 1; return 0;

> Why can't we fall back to dump_trace in a non-printing case?  

Because dump_trace prints a string, the only way to use it as a fallback is to capture the string and, again, tokenize it. This is easier and safer to do on the tapset side, in the implementation of stack().

Given the confusion, I should probably make this much more explicit in comments.

> Were you going to change the _print case to rely on this _get?

The following portion of the patch I sent already does this. Note the invocation to _stp_stack_kernel_get(), and the fact that when this invocation (or its prerequisites) fail, the new code uses all of the same fallbacks as the old code (including ones corresponding to the logic in __stp_dwarf_stack_kernel_print, which is no longer invoked):

============================ BEGIN CODE =====================================

@@ -220,10 +280,14 @@ static struct pt_regs *_stp_get_uregs(struct context *c)
 
 static void _stp_stack_kernel_print(struct context *c, int sym_flags)
 {
-	struct pt_regs *regs = NULL;
+        unsigned n, remaining;
+        unsigned long l;
+
+        if (! c->kregs) {
+                /* This is a fatal block for _stp_stack_kernel_get,
+                   but when printing a backtrace we can use this
+                   inexact fallback.
 
-	if (! c->kregs) {
-		/* For the kernel we can use an inexact fallback.
 		   When compiled with frame pointers we can do
 		   a pretty good guess at the stack value,
 		   otherwise let dump_stack guess it
@@ -248,36 +312,33 @@ static void _stp_stack_kernel_print(struct context *c, int sym_flags)
 			_stp_print("\n");
 #endif
 		return;
-	} else {
-		regs = c->kregs;
-	}
+        }
+
+        /* print the current address */
+        if (c->probe_type == _STP_PROBE_HANDLER_KRETPROBE && c->ips.krp.pi
+            && (sym_flags & _STP_SYM_FULL) == _STP_SYM_FULL) {
+                _stp_print("Returning from: ");
+                _stp_print_addr((unsigned long)_stp_probe_addr_r(c->ips.krp.pi),
+                                sym_flags, NULL);
+                _stp_print("Returning to  : ");
+        }
+        _stp_print_addr(_stp_stack_kernel_get(c, 0), sym_flags, NULL);
 
-	/* print the current address */
-	if (c->probe_type == _STP_PROBE_HANDLER_KRETPROBE && c->ips.krp.pi) {
-		if ((sym_flags & _STP_SYM_FULL) == _STP_SYM_FULL) {
-			_stp_print("Returning from: ");
-			_stp_print_addr((unsigned long)_stp_probe_addr_r(c->ips.krp.pi),
-					sym_flags, NULL);
-			_stp_print("Returning to  : ");
-		}
-		_stp_print_addr((unsigned long)_stp_ret_addr_r(c->ips.krp.pi),
-				sym_flags, NULL);
-	} else {
-		_stp_print_addr(REG_IP(regs), sym_flags, NULL);
-	}
-
-	/* print rest of stack... */
 #ifdef STP_USE_DWARF_UNWINDER
-	if (c->uregs == &c->uwcontext.info.regs) {
-		/* Unwinder needs the reg state, clear uregs ref. */
-		c->uregs = NULL;
-		c->probe_flags &= ~_STP_PROBE_STATE_FULL_UREGS;
-	}
-	__stp_dwarf_stack_kernel_print(regs, sym_flags, MAXBACKTRACE,
-				       &c->uwcontext);
+        for (n = 1; n < MAXBACKTRACE; n++) {
+                l = _stp_stack_kernel_get(c, n);
+                if (l == 0) {
+                        remaining = MAXBACKTRACE - n;
+                        _stp_stack_print_fallback(UNW_SP(&c->uwcontext_kernel.info),
+                                                  sym_flags, remaining, 0);
+                        break;
+                } else {
+                        _stp_print_addr(l, sym_flags, NULL);
+                }
+        }
 #else
 	/* Arch specific fallback for kernel backtraces. */
-	__stp_stack_print(regs, sym_flags, MAXBACKTRACE);
+	__stp_stack_print(regs, sym_flags, MAXBACKTRACE);       
 #endif
 }
 
============================ END CODE =====================================

> > +        if (c->uwcache.depth == 0) { /* need to initialize uwcontext */
> > +                if (c->uwcache.depth == 0 && c->uregs == &c->uwcontext_kernel.info.regs) {
> > +                        /* Unwinder needs the reg state, clear uregs ref. */
> > +                        c->uregs = NULL;
> > +                        c->probe_flags &= ~_STP_PROBE_STATE_FULL_UREGS;

> We should have formal bitfields for this sort of thing, if we really need it.

I'll look into this issue. The fact that this code will probably run into problems when we interleave user- and kernel- backtrace invocations gives additional motivation.

> > +function __stack_raw (n:long) %{ /* pragma:unwind */
> > +         STAP_RETVALUE = _stp_stack_kernel_get (CONTEXT, (unsigned)STAP_ARG_n);
> > +%}

> Is there a risk here, considering that the new _stp_stack_kernel_get
> doesn't check its 'n' parameter super well?

Good catch, thank you.

> > +  o->newline() << "#if defined(STP_NEED_UNWIND_DATA)";
> > +  o->newline() << "c->uwcache.depth = 0;";
> > +  o->newline() << "c->uwcache.done = 0;";
> > +  o->newline() << "c->uwcache.data[0] = 0;";
> > +  o->newline() << "#endif";
>
> That seems like too much work to impose on all probes.

Agreed; the next patch I'll send (pending discussion on this one), improves this slightly by reducing the number of assignments down to one (marking the unwind cache 'uninitialized' by setting 'done' to a magic value). There's probably a much better solution I'm not seeing, though.

All the best,
       Serguei

References:
- Re: [PR6580; patch for review] first step of unwind_cache enhancement
  - From: Frank Ch. Eigler

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]