This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



RE: [3/3] Userspace probes prototype-take2


I left out an important comment. The patch is not aware of signal handling. After the kernel prepares the single-step instruction on the stack, if a signal is delivered to the thread, the kernel will save some state onto the stack and switch to the signal handler function, so the single-step instruction on the stack might be erased.
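
To make the window concrete, here is a minimal sketch (mine, not patch code) of why signal delivery can clobber the copy; it assumes the instruction was copied just below the stack pointer, as copy_insn_onstack() in the patch does, with struct sigframe standing in for the i386 signal frame:

	/*
	 * Illustrative only: would a signal frame built on delivery
	 * overlap the instruction copied below the user stack pointer?
	 */
	static int sigframe_clobbers_copy(struct pt_regs *regs)
	{
		/* copy site at the bottom of the stack page, as in
		 * copy_insn_onstack() */
		unsigned long copy_addr = regs->esp & PAGE_MASK;
		/* the signal frame grows down from esp */
		unsigned long frame_bottom =
			regs->esp - sizeof(struct sigframe);

		return frame_bottom <
		       copy_addr + MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
	}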

>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>Sent: February 17, 2006 17:20
>>To: prasanna@in.ibm.com; systemtap@sources.redhat.com
>>Subject: RE: [3/3] Userspace probes prototype-take2
>>
>>2 main issues:
>>1) task switch caused by external interrupt when single-step;
>>2) multi-thread:
>>
>>See below inline comments.
>>
>>Yanmin
>>
>>>>-----Original Message-----
>>>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>>>Sent: February 8, 2006 22:14
>>>>To: systemtap@sources.redhat.com
>>>>Subject: Re: [3/3] Userspace probes prototype-take2
>>>>
>>>>
>>>>This patch handles executing the registered callback
>>>>functions when a probe is hit.
>>>>
>>>>	Each userspace probe is uniquely identified by the
>>>>combination of inode and offset, hence during registration the
>>>>inode and offset combination is added to the kprobes hash table.
>>>>When the breakpoint instruction is hit, the kprobes hash table is
>>>>looked up for a matching inode and offset. The pre_handlers are
>>>>called in sequence if multiple probes are registered. The original
>>>>instruction is single stepped out-of-line, similar to kernel probes.
>>>>For kernel space probes, single stepping out-of-line is achieved by
>>>>copying the instruction to some location within the kernel address
>>>>space and then single stepping from that location. But for userspace
>>>>probes, an instruction copied into the kernel address space cannot
>>>>be single stepped, hence the instruction must be copied to the user
>>>>address space. The solution is to find free space in the current
>>>>process address space, copy the original instruction there, and
>>>>single step that instruction.
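
As a rough sketch of that lookup (the hash key derivation and the
helper name here are my assumptions, not necessarily the patch's
exact code):

	/*
	 * Sketch only: look up a userspace probe by (inode, offset).
	 * struct uprobe embeds a struct kprobe as "kp" in this patch;
	 * hash_long() over inode + offset is an assumed key.
	 */
	static struct uprobe *find_uprobe(struct inode *inode,
					  unsigned long offset)
	{
		struct hlist_head *head;
		struct hlist_node *node;
		struct kprobe *p;

		head = &kprobe_table[hash_long((unsigned long)inode + offset,
					       KPROBE_HASH_BITS)];
		hlist_for_each_entry(p, node, head, hlist) {
			struct uprobe *u = container_of(p, struct uprobe, kp);
			if (u->inode == inode && u->offset == offset)
				return u;	/* pre_handlers then run in sequence */
		}
		return NULL;
	}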
>>>>
>>>>User processes use stack space to store local variables, arguments
>>>>and return values. Normally the stack space either below or above
>>>>the stack pointer is free. If the stack grows downwards, the stack
>>>>space below the stack pointer is the unused free stack space, and
>>>>if the stack grows upwards, the stack space above the stack pointer
>>>>is the unused free stack space.
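
In code, that rule reads roughly as follows (a sketch of the
description above, not patch code):

	/* Sketch: how much unused stack space the rule above yields. */
	static unsigned long unused_stack_space(struct vm_area_struct *vma,
						unsigned long esp)
	{
		if (vma->vm_flags & VM_GROWSDOWN)
			return esp - vma->vm_start;	/* free space below esp */
		if (vma->vm_flags & VM_GROWSUP)
			return vma->vm_end - esp;	/* free space above esp */
		return 0;				/* not a stack vma */
	}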
>>>>
>>>>The instruction to be single stepped can itself modify the stack,
>>>>hence before using the unused free stack space, sufficient stack
>>>>space should be left. The instruction is copied to the bottom of
>>>>the page, and a check is made that the copied instruction does not
>>>>cross the page boundary. The copied instruction is then single
>>>>stepped. Several architectures do not allow an instruction to be
>>>>executed from a stack location, since the no-exec bit is set for
>>>>stack pages. On those architectures, the page table entry
>>>>corresponding to the stack page is identified and the no-exec bit
>>>>is cleared, allowing the instruction on that stack page to be
>>>>executed.
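
A sketch of those two steps, reusing the patch's get_uprobe_pte()
helper (the wrapper itself is mine):

	/*
	 * Sketch: reject a copy site that would cross the page
	 * boundary, then clear the no-exec restriction on the stack
	 * page so the copied instruction can execute.
	 */
	static int make_copy_site_executable(unsigned long addr, int size)
	{
		pte_t *pte;

		if ((addr & PAGE_MASK) != ((addr + size - 1) & PAGE_MASK))
			return -ENOMEM;	/* crosses the page boundary */

		pte = get_uprobe_pte(addr);	/* patch helper, below */
		if (!pte)
			return -EFAULT;
		set_pte(pte, pte_mkexec(*pte));	/* clear no-exec */
		pte_unmap(pte);
		return 0;
	}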
>>>>
>>>>There are situations where even the unused free stack space is not
>>>>enough for the user instruction to be copied and single stepped. In
>>>>such situations, the virtual memory area (vma) can be expanded
>>>>beyond the current stack vma. This expanded stack can be used to
>>>>copy the original instruction and single step it out-of-line.
>>>>
>>>>If the vma cannot be extended either, then the instruction must be
>>>>executed inline, by replacing the breakpoint instruction with the
>>>>original instruction.
>>>>
>>>>TODO list
>>>>--------
>>>>1. This patch is not stable yet, but should work under most conditions.
>>>>
>>>>2. This patch works only with the PREEMPT config option disabled; to
>>>>work with PREEMPT enabled, the handlers must be re-written and must
>>>>be separated out from kernel probes, allowing preemption.
>>One of my old comments was that an external device interrupt might happen while the cpu is single-stepping the original instruction,
>>and then the task might be switched to another cpu. If we disable irqs when exiting to user space to single step the instruction, the
>>kernel might switch the task off right on the kernel exit path. These 2 resources, 1) uprobe_page and 2) kprobe_ctlblk, shouldn't be
>>per-cpu, or we need to find another approach. How could you resolve the task switch issue?
>>
>>
>>
>>>>
>>>>3. Insert probes on copy-on-write pages. Track all the COW pages for
>>>>the page containing the specified probe point and insert/remove all
>>>>the probe points for that page.
>>>>
>>>>4. Optimize the insertion of probes through readpage hooks. Identify
>>>>all the probes to be inserted on the read page and insert them at
>>>>once.
>>>>
>>>>5. Resume execution should handle setting the proper eip and eflags
>>>>for special instructions, similar to kernel probes.
>>>>
>>>>6. Single stepping out-of-line expands the stack if there is not
>>>>enough stack space to copy the original instruction. The expanded
>>>>stack should be shrunk back to the original size after single
>>>>stepping, or the expanded stack should be reused for single stepping
>>>>out-of-line for other probes.
>>>>
>>>>7. Wrapper routines to calculate the offset from the beginning of
>>>>the probed file. In the case of a dynamic shared library, the offset
>>>>is calculated by subtracting the file's mapped base address from the
>>>>address of the probe point (see the sketch after this list).
>>>>
>>>>8. Handling of page faults while in kprobe_handler() and while
>>>>single stepping.
>>>>
>>>>9. Accessing user space pages not present in memory, from the
>>>>registered callback routines.
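
For TODO item 7 above, a sketch of the offset computation (assuming
probe_vaddr falls inside the library's mapping vma):

	/*
	 * Sketch: file offset of a probe inside a mapped shared
	 * library, per TODO item 7. probe_vaddr is assumed to lie
	 * inside vma.
	 */
	static unsigned long probe_file_offset(struct vm_area_struct *vma,
					       unsigned long probe_vaddr)
	{
		/* distance into the mapping, plus where the mapping
		 * itself starts within the file */
		return (probe_vaddr - vma->vm_start) +
		       ((unsigned long)vma->vm_pgoff << PAGE_SHIFT);
	}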
>>>>
>>>>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>>>
>>>>
>>>> arch/i386/kernel/kprobes.c |  460 +++++++++++++++++++++++++++++++++++++++++++--
>>>> include/asm-i386/kprobes.h |   13 +
>>>> include/linux/kprobes.h    |    7
>>>> kernel/kprobes.c           |    3
>>>> 4 files changed, 468 insertions(+), 15 deletions(-)
>>>>
>>>>diff -puN arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line arch/i386/kernel/kprobes.c
>>>>--- linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c	2006-02-08 19:26:10.000000000 +0530
>>>>@@ -30,6 +30,7 @@
>>>>
>>>> #include <linux/config.h>
>>>> #include <linux/kprobes.h>
>>>>+#include <linux/hash.h>
>>>> #include <linux/ptrace.h>
>>>> #include <linux/preempt.h>
>>>> #include <asm/cacheflush.h>
>>>>@@ -38,8 +39,12 @@
>>>>
>>>> void jprobe_return_end(void);
>>>>
>>>>+static struct uprobe_page *uprobe_page;
>>>>+static struct hlist_head uprobe_page_table[KPROBE_TABLE_SIZE];
>>>> DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
>>>> DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>>>>+DEFINE_PER_CPU(struct uprobe *, current_uprobe) = NULL;
>>>>+DEFINE_PER_CPU(unsigned long, singlestep_addr);
>>>>
>>>> /* insert a jmp code */
>>>> static inline void set_jmp_op(void *from, void *to)
>>>>@@ -125,6 +130,23 @@ void __kprobes arch_disarm_kprobe(struct
>>>> 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>>>> }
>>>>
>>>>+void __kprobes arch_disarm_uprobe(struct kprobe *p, kprobe_opcode_t *address)
>>>>+{
>>>>+	*address = p->opcode;
>>>>+}
>>>>+
>>>>+void __kprobes arch_arm_uprobe(unsigned long *address)
>>>>+{
>>>>+	*(kprobe_opcode_t *)address = BREAKPOINT_INSTRUCTION;
>>>>+}
>>>>+
>>>>+void __kprobes arch_copy_uprobe(struct kprobe *p, unsigned long *address)
>>>>+{
>>>>+	memcpy(p->ainsn.insn, (kprobe_opcode_t *)address,
>>>>+				MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
>>>>+	p->opcode = *(kprobe_opcode_t *)address;
>>>>+}
>>>>+
>>>> static inline void save_previous_kprobe(struct kprobe_ctlblk *kcb)
>>>> {
>>>> 	kcb->prev_kprobe.kp = kprobe_running();
>>>>@@ -151,15 +173,326 @@ static inline void set_current_kprobe(st
>>>> 		kcb->kprobe_saved_eflags &= ~IF_MASK;
>>>> }
>>>>
>>>>+struct uprobe_page __kprobes *get_upage_current(struct task_struct *tsk)
>>>>+{
>>>>+	struct hlist_head *head;
>>>>+	struct hlist_node *node;
>>>>+	struct uprobe_page *upage;
>>>>+
>>>>+	head = &uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)];
>>>>+	hlist_for_each_entry(upage, node, head, hlist) {
>>>>+		if (upage->tsk == tsk)
>>>>+			return upage;
>>>>+        }
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>+struct uprobe_page __kprobes *get_upage_free(struct task_struct *tsk)
>>>>+{
>>>>+	int cpu;
>>>>+
>>>>+	for_each_cpu(cpu) {
>>>>+		struct uprobe_page *upage;
>>>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+		if (upage->status & UPROBE_PAGE_FREE)
>>>>+			return upage;
>>>>+	}
>>>>+	return NULL;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routines get the pte of the page containing the specified address.
>>>>+ */
>>>>+static pte_t  __kprobes *get_uprobe_pte(unsigned long address)
>>>>+{
>>>>+	pgd_t *pgd;
>>>>+	pud_t *pud;
>>>>+	pmd_t *pmd;
>>>>+	pte_t *pte = NULL;
>>>>+
>>>>+	pgd = pgd_offset(current->mm, address);
>>>>+	if (!pgd)
>>>>+		goto out;
>>>>+
>>>>+	pud = pud_offset(pgd, address);
>>>>+	if (!pud)
>>>>+		goto out;
>>>>+
>>>>+	pmd = pmd_offset(pud, address);
>>>>+	if (!pmd)
>>>>+		goto out;
>>>>+
>>>>+	pte = pte_alloc_map(current->mm, pmd, address);
>>>>+
>>>>+out:
>>>>+	return pte;
>>>>+}
>>>>+
>>>>+/**
>>>>+ *  This routine check for space in the current process's stack address space.
>>>>+ *  If enough address space is found, it just maps a new page and copies the
>>>>+ *  new instruction on that page for single stepping out-of-line.
>>>>+ */
>>>>+static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe ,
>>>>+			struct pt_regs *regs, struct vm_area_struct *vma)
>>>>+{
>>>>+	unsigned long addr, *vaddr, stack_addr = regs->esp;
>>>>+	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+	struct uprobe_page *upage;
>>>>+	struct page *page;
>>>>+	pte_t *pte;
>>>>+
>>>>+
>>>>+	if (vma->vm_flags & VM_GROWSDOWN) {
>>>>+		if (((stack_addr - sizeof(long long))) < (vma->vm_start + size))
>>>>+			return -ENOMEM;
>>>>+
>>>>+		addr = vma->vm_start;
>>>>+	} else if (vma->vm_flags & VM_GROWSUP) {
>>>>+		if ((vma->vm_end - size) < (stack_addr + sizeof(long long)))
>>>>+			return -ENOMEM;
>>>>+
>>>>+		addr = vma->vm_end - size;
>>>>+	} else
>>>>+		return -EFAULT;
>>>>+
>>The multi-thread case is not resolved here. One typical multi-thread model is that all threads share the same vma and every thread
>>has an 8 KB stack. If 2 threads trigger uprobes (though perhaps not the same uprobe) at the same time, one thread might erase the
>>single-step instruction of another.
>>
>>
>>
>>>>+	preempt_enable_no_resched();
>>>>+
>>>>+	pte = get_uprobe_pte(addr);
>>>>+	preempt_disable();
>>>>+	if (!pte)
>>>>+		return -EFAULT;
>>>>+
>>>>+	upage = get_upage_free(current);
>>>>+	upage->status &= ~UPROBE_PAGE_FREE;
>>>>+	upage->tsk = current;
>>>>+	INIT_HLIST_NODE(&upage->hlist);
>>>>+	hlist_add_head(&upage->hlist,
>>>>+		&uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>>>+
>>>>+	upage->orig_pte = pte;
>>>>+	upage->orig_pte_val =  pte_val(*pte);
>>>>+	set_pte(pte, (*(upage->alias_pte)));
>>>>+
>>>>+	page = pte_page(*pte);
>>>>+	vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>>>+	vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>>>+	memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>>>+	kunmap_atomic(vaddr, KM_USER1);
>>>>+	regs->eip = addr;
>>So the temp page, upage->alias_addr, replaces the original one on the stack. If the replaced instruction operates on the stack, such
>>as "push eax", the result might land on the new page. After the single step, the pte is restored to the original page, which doesn't
>>have the value of eax.
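
Spelled out step by step (an illustration of the comment above, not
patch code):

	/*
	 * 1. set_pte() maps the alias page in place of the stack page;
	 * 2. the single-stepped "push %eax" stores %eax into the
	 *    *alias* page at (%esp);
	 * 3. resume_execution_user() restores orig_pte_val, bringing
	 *    the original stack page back;
	 * 4. the pushed value stays behind on the alias page, so the
	 *    program now reads stale data at (%esp).
	 */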
>>
>>
>>
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine expands the stack beyond the present process address space
>>>>+ * and copies the instruction to that location, so that processor can
>>>>+ * single step out-of-line.
>>>>+ */
>>>>+static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe,
>>>>+			struct pt_regs *regs, struct vm_area_struct *vma)
>>It has the same issues as copy_insn_on_new_page().
>>
>>
>>>>+{
>>>>+	unsigned long addr, *vaddr, vm_addr;
>>>>+	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+	struct vm_area_struct *new_vma;
>>>>+	struct uprobe_page *upage;
>>>>+	struct mm_struct *mm = current->mm;
>>>>+	struct page *page;
>>>>+	pte_t *pte;
>>>>+
>>>>+
>>>>+	if (vma->vm_flags & VM_GROWSDOWN)
>>>>+		vm_addr = vma->vm_start - size;
>>>>+	else if (vma->vm_flags & VM_GROWSUP)
>>>>+		vm_addr = vma->vm_end + size;
>>>>+	else
>>>>+		return -EFAULT;
>>>>+
>>>>+	preempt_enable_no_resched();
>>>>+
>>>>+	/* TODO: do we need to expand stack if extend_vma fails? */
>>>>+	new_vma = find_extend_vma(mm, vm_addr);
>>>>+	preempt_disable();
>>>>+	if (!new_vma)
>>>>+		return -ENOMEM;
>>>>+
>>>>+	/*
>>>>+	 * TODO: Expanding stack for every probe is not a good idea, stack must
>>>>+	 * either be shrunk to its original size after single stepping or the
>>>>+	 * expanded stack should be kept track of, for the probed application,
>>>>+	 * so it can be reused to single step out-of-line
>>>>+	 */
>>>>+	if (new_vma->vm_flags & VM_GROWSDOWN)
>>>>+		addr = new_vma->vm_start;
>>>>+	else
>>>>+		addr = new_vma->vm_end - size;
>>>>+
>>>>+	preempt_enable_no_resched();
>>>>+	pte = get_uprobe_pte(addr);
>>>>+	preempt_disable();
>>>>+	if (!pte)
>>>>+		return -EFAULT;
>>>>+
>>>>+	upage = get_upage_free(current);
>>>>+	upage->status &= ~UPROBE_PAGE_FREE;
>>>>+	upage->tsk = current;
>>>>+	INIT_HLIST_NODE(&upage->hlist);
>>>>+	hlist_add_head(&upage->hlist,
>>>>+		&uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>>>+	upage->orig_pte = pte;
>>>>+	upage->orig_pte_val =  pte_val(*pte);
>>>>+	set_pte(pte, (*(upage->alias_pte)));
>>>>+
>>>>+	page = pte_page(*pte);
>>>>+	vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>>>+	vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>>>+	memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>>>+	kunmap_atomic(vaddr, KM_USER1);
>>>>+	regs->eip = addr;
>>>>+
>>>>+	return  0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine checks for stack free space below the stack pointer and
>>>>+ * then copies the instructions at that location so that the processor can
>>>>+ * single step out-of-line. If there is no enough stack space or if
>>>>+ * copy_to_user fails or if the vma is invalid, it returns error.
>>>>+ */
>>>>+static int __kprobes copy_insn_onstack(struct uprobe *uprobe,
>>>>+			struct pt_regs *regs, unsigned long flags)
>>>>+{
>>>>+	unsigned long page_addr, stack_addr = regs->esp;
>>>>+	int  size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+	unsigned long *source = (unsigned long *)uprobe->kp.ainsn.insn;
>>>>+
>>>>+	if (flags & VM_GROWSDOWN) {
>>>>+		page_addr = stack_addr & PAGE_MASK;
>>>>+
>>>>+		if (((stack_addr - sizeof(long long))) < (page_addr + size))
>>>>+			return -ENOMEM;
>>>>+
>>>>+		if (__copy_to_user_inatomic((unsigned long *)page_addr, source,
>>>>+									size))
>>>>+			return -EFAULT;
>>>>+
>>>>+		regs->eip = page_addr;
>>>>+	} else if (flags & VM_GROWSUP) {
>>>>+		page_addr = stack_addr & PAGE_MASK;
>>>>+
>>>>+		if (page_addr == stack_addr)
>>>>+			return -ENOMEM;
>>>>+		else
>>>>+			page_addr += PAGE_SIZE;
>>>>+
>>>>+		if ((page_addr - size) < (stack_addr + sizeof(long long)))
>>>>+			return -ENOMEM;
>>>>+
>>>>+		if (__copy_to_user_inatomic((unsigned long *)(page_addr - size),
>>>>+								source, size))
>>>>+			return -EFAULT;
>>>>+
>>>>+		regs->eip = page_addr - size;
>>>>+	} else
>>>>+		return -EINVAL;
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routines get the page containing the probe, maps it and
>>>>+ * replaced the instruction at the probed address with specified
>>>>+ * opcode.
>>>>+ */
>>>>+void __kprobes replace_original_insn(struct uprobe *uprobe,
>>>>+				struct pt_regs *regs, kprobe_opcode_t opcode)
>>>>+{
>>>>+	kprobe_opcode_t *addr;
>>>>+	struct page *page;
>>>>+
>>>>+	page = find_get_page(uprobe->inode->i_mapping,
>>>>+					uprobe->offset >> PAGE_CACHE_SHIFT);
>>>>+	lock_page(page);
>>>>+
>>>>+	addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER0);
>>>>+	addr = (kprobe_opcode_t *)((unsigned long)addr +
>>>>+				 (unsigned long)(uprobe->offset & ~PAGE_MASK));
>>>>+	*addr = opcode;
>>>>+	/*TODO: flush vma ? */
>>>>+	kunmap_atomic(addr, KM_USER0);
>>>>+
>>>>+	unlock_page(page);
>>>>+
>>>>+	page_cache_release(page);
>>>>+	regs->eip = (unsigned long)uprobe->kp.addr;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine provides the functionality of single stepping out of line.
>>>>+ * If single stepping out-of-line cannot be achieved, it replaces with
>>>>+ * the original instruction allowing it to single step inline.
>>>>+ */
>>>>+static inline int uprobe_single_step(struct kprobe *p, struct pt_regs *regs)
>>>>+{
>>>>+	unsigned long stack_addr = regs->esp, flags;
>>>>+	struct vm_area_struct *vma = NULL;
>>>>+	struct uprobe *uprobe =  __get_cpu_var(current_uprobe);
>>>>+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>>>+	int err = 0;
>>>>+
>>>>+	down_read(&current->mm->mmap_sem);
>>>>+
>>>>+	vma = find_vma(current->mm, (stack_addr & PAGE_MASK));
>>>>+	if (!vma) {
>>>>+		/* TODO: Need better error reporting? */
>>>>+		printk("No vma found\n");
>>>>+		up_read(&current->mm->mmap_sem);
>>>>+		return -ENOENT;
>>>>+	}
>>>>+	flags = vma->vm_flags;
>>>>+	up_read(&current->mm->mmap_sem);
>>>>+
>>>>+	kcb->kprobe_status |= UPROBE_SS_STACK;
>>>>+	err = copy_insn_onstack(uprobe, regs, flags);
>>>>+
>>>>+	down_write(&current->mm->mmap_sem);
>>>>+
>>>>+	if (err) {
>>>>+		kcb->kprobe_status |= UPROBE_SS_NEW_STACK;
>>>>+		err = copy_insn_on_new_page(uprobe, regs, vma);
>>>>+	}
>>>>+	if (err) {
>>>>+		kcb->kprobe_status |= UPROBE_SS_EXPSTACK;
>>>>+		err = copy_insn_onexpstack(uprobe, regs, vma);
>>>>+	}
>>>>+
>>>>+	up_write(&current->mm->mmap_sem);
>>>>+
>>>>+	if (err) {
>>>>+		kcb->kprobe_status |= UPROBE_SS_INLINE;
>>>>+		replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>>>+	}
>>>>+
>>>>+	 __get_cpu_var(singlestep_addr) = regs->eip;
>>>>+
>>>>+
>>>>+	return 0;
>>>>+}
>>>>+
>>>> static inline void prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
>>>> {
>>>> 	regs->eflags |= TF_MASK;
>>>> 	regs->eflags &= ~IF_MASK;
>>>> 	/*single step inline if the instruction is an int3*/
>>>>+
>>>> 	if (p->opcode == BREAKPOINT_INSTRUCTION)
>>>> 		regs->eip = (unsigned long)p->addr;
>>>>-	else
>>>>-		regs->eip = (unsigned long)&p->ainsn.insn;
>>>>+	else {
>>>>+		if (!kernel_text_address((unsigned long)p->addr))
>>>>+			uprobe_single_step(p, regs);
>>>>+		else
>>>>+			regs->eip = (unsigned long)&p->ainsn.insn;
>>>>+	}
>>>> }
>>>>
>>>> /* Called with kretprobe_lock held */
>>>>@@ -194,6 +527,7 @@ static int __kprobes kprobe_handler(stru
>>>> 	kprobe_opcode_t *addr = NULL;
>>>> 	unsigned long *lp;
>>>> 	struct kprobe_ctlblk *kcb;
>>>>+	unsigned seg = regs->xcs & 0xffff;
>>>> #ifdef CONFIG_PREEMPT
>>>> 	unsigned pre_preempt_count = preempt_count();
>>>> #endif /* CONFIG_PREEMPT */
>>>>@@ -208,14 +542,21 @@ static int __kprobes kprobe_handler(stru
>>>> 	/* Check if the application is using LDT entry for its code segment and
>>>> 	 * calculate the address by reading the base address from the LDT entry.
>>>> 	 */
>>>>-	if ((regs->xcs & 4) && (current->mm)) {
>>>>+
>>>>+	if (regs->eflags & VM_MASK)
>>>>+		addr = (kprobe_opcode_t *)(((seg << 4) + regs->eip -
>>>>+			sizeof(kprobe_opcode_t)) & 0xffff);
>>>>+	else if ((regs->xcs & 4) && (current->mm)) {
>>>>+		local_irq_enable();
>>>>+		down(&current->mm->context.sem);
>>>> 		lp = (unsigned long *) ((unsigned long)((regs->xcs >> 3) * 8)
>>>> 					+ (char *) current->mm->context.ldt);
>>>> 		addr = (kprobe_opcode_t *) (get_desc_base(lp) + regs->eip -
>>>> 						sizeof(kprobe_opcode_t));
>>>>-	} else {
>>>>+		up(&current->mm->context.sem);
>>>>+		local_irq_disable();
>>>>+	} else
>>>> 		addr = (kprobe_opcode_t *)(regs->eip - sizeof(kprobe_opcode_t));
>>>>-	}
>>>> 	/* Check we're not actually recursing */
>>>> 	if (kprobe_running()) {
>>>> 		p = get_kprobe(addr);
>>>>@@ -235,7 +576,6 @@ static int __kprobes kprobe_handler(stru
>>>> 			save_previous_kprobe(kcb);
>>>> 			set_current_kprobe(p, regs, kcb);
>>>> 			kprobes_inc_nmissed_count(p);
>>>>-			prepare_singlestep(p, regs);
>>>> 			kcb->kprobe_status = KPROBE_REENTER;
>>>> 			return 1;
>>>> 		} else {
>>>>@@ -307,8 +647,8 @@ static int __kprobes kprobe_handler(stru
>>>> 	}
>>>>
>>>> ss_probe:
>>>>-	prepare_singlestep(p, regs);
>>>> 	kcb->kprobe_status = KPROBE_HIT_SS;
>>>>+	prepare_singlestep(p, regs);
>>>> 	return 1;
>>>>
>>>> no_kprobe:
>>>>@@ -498,6 +838,33 @@ no_change:
>>>> 	return;
>>>> }
>>>>
>>>>+static void __kprobes resume_execution_user(struct uprobe *uprobe,
>>>>+				struct pt_regs *regs, struct kprobe_ctlblk *kcb)
>>>>+{
>>>>+	unsigned long delta;
>>>>+	struct uprobe_page *upage;
>>>>+
>>>>+	/*
>>>>+	 * TODO :need to fixup special instructions as done with kernel probes.
>>>>+	 */
>>>>+	delta = regs->eip - __get_cpu_var(singlestep_addr);
>>>>+	regs->eip = (unsigned long)(uprobe->kp.addr + delta);
>>>>+
>>>>+	if ((kcb->kprobe_status & UPROBE_SS_EXPSTACK) ||
>>>>+			(kcb->kprobe_status & UPROBE_SS_NEW_STACK)) {
>>>>+		upage = get_upage_current(current);
>>>>+		set_pte(upage->orig_pte, __pte(upage->orig_pte_val));
>>>>+		pte_unmap(upage->orig_pte);
>>>>+
>>>>+		upage->status = UPROBE_PAGE_FREE;
>>>>+		hlist_del(&upage->hlist);
>>>>+
>>>>+	} else if (kcb->kprobe_status & UPROBE_SS_INLINE)
>>>>+		replace_original_insn(uprobe, regs,
>>>>+				(kprobe_opcode_t)BREAKPOINT_INSTRUCTION);
>>>>+	regs->eflags &= ~TF_MASK;
>>>>+}
>>>>+
>>>> /*
>>>>  * Interrupts are disabled on entry as trap1 is an interrupt gate and they
>>>>  * remain disabled thoroughout this function.
>>>>@@ -510,16 +877,19 @@ static inline int post_kprobe_handler(st
>>>> 	if (!cur)
>>>> 		return 0;
>>>>
>>>>-	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
>>>>-		kcb->kprobe_status = KPROBE_HIT_SSDONE;
>>>>+	if (!(kcb->kprobe_status & KPROBE_REENTER) && cur->post_handler) {
>>>>+		kcb->kprobe_status |= KPROBE_HIT_SSDONE;
>>>> 		cur->post_handler(cur, regs, 0);
>>>> 	}
>>>>
>>>>-	resume_execution(cur, regs, kcb);
>>>>+	if (!kernel_text_address((unsigned long)cur->addr))
>>>>+		resume_execution_user(__get_cpu_var(current_uprobe), regs, kcb);
>>>>+	else
>>>>+		resume_execution(cur, regs, kcb);
>>>> 	regs->eflags |= kcb->kprobe_saved_eflags;
>>>>
>>>> 	/*Restore back the original saved kprobes variables and continue. */
>>>>-	if (kcb->kprobe_status == KPROBE_REENTER) {
>>>>+	if (kcb->kprobe_status & KPROBE_REENTER) {
>>>> 		restore_previous_kprobe(kcb);
>>>> 		goto out;
>>>> 	}
>>>>@@ -547,7 +917,13 @@ static inline int kprobe_fault_handler(s
>>>> 		return 1;
>>>>
>>>> 	if (kcb->kprobe_status & KPROBE_HIT_SS) {
>>>>-		resume_execution(cur, regs, kcb);
>>>>+		if (!kernel_text_address((unsigned long)cur->addr)) {
>>>>+			struct uprobe *uprobe =  __get_cpu_var(current_uprobe);
>>>>+			/* TODO: Proper handling of all instruction */
>>>>+			replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>>>+			regs->eflags &= ~TF_MASK;
>>>>+		} else
>>>>+			resume_execution(cur, regs, kcb);
>>>> 		regs->eflags |= kcb->kprobe_old_eflags;
>>>>
>>>> 		reset_current_kprobe();
>>>>@@ -654,7 +1030,67 @@ int __kprobes longjmp_break_handler(stru
>>>> 	return 0;
>>>> }
>>>>
>>>>+static void free_alias(void)
>>>>+{
>>>>+	int cpu;
>>>>+
>>>>+	for_each_cpu(cpu) {
>>>>+		struct uprobe_page *upage;
>>>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+
>>>>+		if (upage->alias_addr) {
>>>>+			set_pte(upage->alias_pte, __pte(upage->alias_pte_val));
>>>>+			kfree(upage->alias_addr);
>>>>+		}
>>>>+		upage->alias_pte = 0;
>>>>+	}
>>>>+	free_percpu(uprobe_page);
>>>>+	return;
>>>>+}
>>>>+
>>>>+static int alloc_alias(void)
>>>>+{
>>>>+	int cpu;
>>>>+
>>>>+	uprobe_page = __alloc_percpu(sizeof(struct uprobe_page));
>>[YM] Does this code try to resolve the problem of task switches during single-step? If so, the per-cpu data might also be used up,
>>although get_upage_free goes through the uprobe_page of every cpu. I suggest allocating a series of uprobe_pages, and allocating
>>more when they are used up.
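
One possible shape for that suggestion (the names and locking here
are mine, purely a sketch; setting up each new entry's alias page,
as alloc_alias() does, is omitted):

	static HLIST_HEAD(upage_pool);
	static DEFINE_SPINLOCK(upage_pool_lock);

	/* Sketch: take a free uprobe_page from the pool, growing the
	 * pool when it runs dry. */
	static struct uprobe_page *upage_pool_get(void)
	{
		struct uprobe_page *upage = NULL;

		spin_lock(&upage_pool_lock);
		if (!hlist_empty(&upage_pool)) {
			upage = hlist_entry(upage_pool.first,
					    struct uprobe_page, hlist);
			hlist_del(&upage->hlist);
		}
		spin_unlock(&upage_pool_lock);

		if (!upage)	/* pool exhausted: grow it */
			upage = kmalloc(sizeof(*upage), GFP_ATOMIC);
		return upage;
	}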
>>
>>
>>
>>
>>>>+
>>>>+	for_each_cpu(cpu) {
>>>>+		struct uprobe_page *upage;
>>>>+		upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+		upage->alias_addr = kmalloc(PAGE_SIZE, GFP_USER);
>>[YM] Does kmalloc(PAGE_SIZE, ...) guarantee that the result is page-aligned? How about using alloc_page?
>>
>>
>>>>+		if (!upage->alias_addr) {
>>>>+			free_alias();
>>>>+			return -ENOMEM;
>>>>+		}
>>>>+		upage->alias_pte = lookup_address(
>>>>+					(unsigned long)upage->alias_addr);
>>>>+		upage->alias_pte_val = pte_val(*upage->alias_pte);
>>>>+		if (upage->alias_pte) {
>>[YM] If kmalloc returns a non-NULL address, upage->alias_pte cannot be NULL. So delete the above check?
>>
>>
>>>>+			upage->status = UPROBE_PAGE_FREE;
>>>>+			set_pte(upage->alias_pte,
>>>>+						pte_mkdirty(*upage->alias_pte));
>>>>+			set_pte(upage->alias_pte,
>>>>+						pte_mkexec(*upage->alias_pte));
>>>>+			set_pte(upage->alias_pte,
>>>>+						 pte_mkwrite(*upage->alias_pte));
>>>>+			set_pte(upage->alias_pte,
>>>>+						pte_mkyoung(*upage->alias_pte));
>>>>+		}
>>>>+	}
>>>>+	return 0;
>>>>+}
>>>>+
>>>> int __init arch_init_kprobes(void)
>>>> {
>>>>+	int ret = 0;
>>>>+	/*
>>>>+	 * user space probes requires a page to copy the original instruction
>>>>+	 * so that it can single step if there is no free stack space, allocate
>>>>+	 * per cpu page.
>>>>+	 */
>>>>+
>>>>+	if ((ret = alloc_alias()))
>>>>+		return ret;
>>>>+
>>>> 	return 0;
>>>> }
>>>>diff -puN include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/asm-i386/kprobes.h
>>>>--- linux-2.6.16-rc1-mm5/include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/include/asm-i386/kprobes.h	2006-02-08 19:26:10.000000000 +0530
>>>>@@ -42,6 +42,7 @@ typedef u8 kprobe_opcode_t;
>>>> #define JPROBE_ENTRY(pentry)	(kprobe_opcode_t *)pentry
>>>> #define ARCH_SUPPORTS_KRETPROBES
>>>> #define arch_remove_kprobe(p)	do {} while (0)
>>>>+#define UPROBE_PAGE_FREE 0x00000001
>>>>
>>>> void kretprobe_trampoline(void);
>>>>
>>>>@@ -74,6 +75,18 @@ struct kprobe_ctlblk {
>>>> 	struct prev_kprobe prev_kprobe;
>>>> };
>>>>
>>>>+/* per cpu uprobe page structure */
>>>>+struct uprobe_page {
>>>>+	struct hlist_node hlist;
>>>>+	pte_t *alias_pte;
>>>>+	pte_t *orig_pte;
>>>>+	unsigned long orig_pte_val;
>>>>+	unsigned long alias_pte_val;
>>[YM] I think the patch doesn't support CONFIG_X86_PAE, because with CONFIG_X86_PAE=y, pte_t becomes 64 bits.
>>How about changing the above 2 members' types to pte_t directly?
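
The suggested change would look like this (sketch):

	/* Sketch of the suggested fix: store whole pte_t values so the
	 * high 32 bits survive under CONFIG_X86_PAE. */
	struct uprobe_page {
		struct hlist_node hlist;
		pte_t *alias_pte;
		pte_t *orig_pte;
		pte_t orig_pte_val;	/* was unsigned long */
		pte_t alias_pte_val;	/* was unsigned long */
		void *alias_addr;
		struct task_struct *tsk;
		unsigned long status;
	};
	/* set_pte(upage->orig_pte, upage->orig_pte_val) then replaces
	 * the __pte(upage->orig_pte_val) conversion. */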
>>
>>
>>
>>>>+	void *alias_addr;
>>>>+	struct task_struct *tsk;
>>>>+	unsigned long status;
>>>>+};
>>>>+
>>>> /* trap3/1 are intr gates for kprobes.  So, restore the status of IF,
>>>>  * if necessary, before executing the original int3/1 (trap) handler.
>>>>  */
>>>>diff -puN include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/linux/kprobes.h
>>>>--- linux-2.6.16-rc1-mm5/include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/include/linux/kprobes.h	2006-02-08 19:26:10.000000000 +0530
>>>>@@ -45,11 +45,18 @@
>>>> #ifdef CONFIG_KPROBES
>>>> #include <asm/kprobes.h>
>>>>
>>>>+#define KPROBE_HASH_BITS 6
>>>>+#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>>>+
>>>> /* kprobe_status settings */
>>>> #define KPROBE_HIT_ACTIVE	0x00000001
>>>> #define KPROBE_HIT_SS		0x00000002
>>>> #define KPROBE_REENTER		0x00000004
>>>> #define KPROBE_HIT_SSDONE	0x00000008
>>>>+#define UPROBE_SS_STACK		0x00000010
>>>>+#define UPROBE_SS_EXPSTACK	0x00000020
>>>>+#define UPROBE_SS_INLINE	0x00000040
>>>>+#define UPROBE_SS_NEW_STACK	0x00000080
>>>>
>>>> /* Attach to insert probes on any functions which should be ignored*/
>>>> #define __kprobes	__attribute__((__section__(".kprobes.text")))
>>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line kernel/kprobes.c
>>>>--- linux-2.6.16-rc1-mm5/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line	2006-02-08 19:26:10.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/kernel/kprobes.c	2006-02-08 19:26:10.000000000 +0530
>>>>@@ -42,9 +42,6 @@
>>>> #include <asm/errno.h>
>>>> #include <asm/kdebug.h>
>>>>
>>>>-#define KPROBE_HASH_BITS 6
>>>>-#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>>>-
>>>> static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>>>> static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>>>> static struct list_head uprobe_module_list;
>>>>
>>>>_
>>>>--
>>>>Prasanna S Panchamukhi
>>>>Linux Technology Center
>>>>India Software Labs, IBM Bangalore
>>>>Email: prasanna@in.ibm.com
>>>>Ph: 91-80-51776329

