This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: kprobe fault handling
On Wed, 2006-02-08 at 23:23, Prasanna S Panchamukhi wrote:
> Hi,
>
> Please find the patch below to provide robust kprobe fault handling.
> Presently this patch is for i386 architecture, will port to other
> architectures if this approach looks ok. This patch is currently
> untested, appreciate any feedback.
...
> - if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> - return 1;
> -
I think it's right to move the above call, but if you do, your patch
should update Documentation/kprobes.txt, which states:
"If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
>>> or during single-stepping of the probed instruction, <<<
Kprobes calls kp->fault_handler."
> if (kcb->kprobe_status & KPROBE_HIT_SS) {
> + /*
> + * We are here because the instruction being single stepped
> + * caused a page fault. We reset the current kprobe and the
> + * eip points back to the probe address and allow the page
> + * fault handler.
> + */
> resume_execution(cur, regs, kcb);
I agree with Anil that resume_execution() can make adjustments we don't
want (since the probed instruction hasn't completed). Seems like what
we want is just
regs->eip = (unsigned long) cur->addr;
> regs->eflags |= kcb->kprobe_old_eflags;
>
> reset_current_kprobe();
> preempt_enable_no_resched();
Bibo said:
> When single stepped instruction cause page fault, eip will point to
> kprobe copied address, but not probed address, so execution table
> search will fail.
I don't think finding the fixup will be a problem, because we restore
regs->eip to the address of the actual probed instruction before
returning to do_page_fault().
> + } else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
> + (kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
> + /*
> + * We come here because instructions in the pre/post handler
> + * caused the page_fault, this could happen if handler tries
> + * to access user space by copy_from_user(), get_user() etc.
> + * Let the user-specified handler try to fix it first.
> + */
> + if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> + return 1;
> +
> + /*
> + * In case the user-specified fault handler returned zero,
> + * try to fix up.
> + */
> +
> + if (fixup_exception(regs))
> + return 1;
I think it's OK to call fixup_exceptions() here, but I believe it's
redundant. I understood Suparna to say
(http://sourceware.org/ml/systemtap/2006-q1/msg00423.html) that if we
return 0, do_page_fault() will call fixup_exceptions() instead of trying
to bring in the missing page (since it's a kernel instruction -- in a
handler -- that faulted). Her explanation made sense to me.
We still don't handle the case of an erroneous pre/post-handler that
incurs a fault with no fixup. (We'd like to do something other than let
the kernel crash.) I think we'd have to do a sort of setjmp before
calling the handler in order to do that.
Disclaimer: Some or all of this is probably wrong (again). :-}
Jim