This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: kprobe fault handling


On Wed, 2006-02-08 at 23:23, Prasanna S Panchamukhi wrote:
> Hi,
> 
> Please find the patch below to provide robust kprobe fault handling.
> Presently this patch is for i386 architecture, will port to other
> architectures if this approach looks ok. This patch is currently
> untested, appreciate any feedback.
...
> -	if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> -		return 1;
> -

I think it's right to move the above call, but if you do, your patch
should update Documentation/kprobes.txt, which states:

"If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
>>> or during single-stepping of the probed instruction, <<<
Kprobes calls kp->fault_handler."

>  	if (kcb->kprobe_status & KPROBE_HIT_SS) {
> +		/*
> +		 * We are here because the instruction being single stepped
> +		 * caused a page fault. We reset the current kprobe and the
> +		 * eip points back to the probe address and allow the page
> +		 * fault handler.
> +		 */
>  		resume_execution(cur, regs, kcb);

I agree with Anil that resume_execution() can make adjustments we don't
want (since the probed instruction hasn't completed).  Seems like what
we want is just
	regs->eip = (unsigned long) cur->addr;

>  		regs->eflags |= kcb->kprobe_old_eflags;
>  
>  		reset_current_kprobe();
>  		preempt_enable_no_resched();

Bibo said:
> When single stepped instruction cause page fault, eip will point to
> kprobe copied address, but not probed address, so execution table
> search will fail.

I don't think finding the fixup will be a problem, because we restore
regs->eip to the address of the actual probed instruction before
returning to do_page_fault().

> +	} else if ((kcb->kprobe_status & KPROBE_HIT_SSDONE) ||
> +			(kcb-kprobe_status & KPROBE_HIT_ACTIVE)) {
> +		/*
> +		 * We come here because instructions in the pre/post handler
> +		 * caused the page_fault, this could happen if handler tries
> +		 * to access user space by copy_from_user(), get_user() etc.
> +		 * Let the user-specified handler try to fix it first.
> +		 */
> +		if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr))
> +			return 1;
> +
> +		/*
> +		 * In case the user-specified fault handler returned zero,
> +		 * try to fix up.
> +		 */
> +
> +		if (fixup_exception(regs))
> +			return 1;

I think it's OK to call fixup_exceptions() here, but I believe it's
redundant.  I understood Suparna to say
(http://sourceware.org/ml/systemtap/2006-q1/msg00423.html) that if we
return 0, do_page_fault() will call fixup_exceptions() instead of trying
to bring in the missing page (since it's a kernel instruction -- in a
handler -- that faulted).  Her explanation made sense to me.

We still don't handle the case of an erroneous pre/post-handler that
incurs a fault with no fixup.  (We'd like to do something other than let
the kernel crash.)  I think we'd have to do a sort of setjmp before
calling the handler in order to do that.

Disclaimer: Some or all of this is probably wrong (again). :-}

Jim


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]