This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
RE: [3/3] Userspace probes prototype-take2
- From: "Zhang, Yanmin" <yanmin dot zhang at intel dot com>
- To: "Zhang, Yanmin" <yanmin dot zhang at intel dot com>, <prasanna at in dot ibm dot com>, <systemtap at sources dot redhat dot com>
- Date: Mon, 20 Feb 2006 11:15:31 +0800
- Subject: RE: [3/3] Userspace probes prototype-take2
I omitted an important comment. The patch is not aware of signal handling. After the kernel prepares the single-step instruction on the stack, if a signal is delivered to the thread, the kernel will save some state onto the stack and switch to the signal handler function, so the single-step instruction on the stack might be overwritten.
>>-----Original Message-----
>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Zhang, Yanmin
>>Sent: February 17, 2006 17:20
>>To: prasanna@in.ibm.com; systemtap@sources.redhat.com
>>Subject: RE: [3/3] Userspace probes prototype-take2
>>
>>Two main issues:
>>1) a task switch caused by an external interrupt during single-stepping;
>>2) multi-threading:
>>
>>See below inline comments.
>>
>>Yanmin
>>
>>>>-----Original Message-----
>>>>From: systemtap-owner@sourceware.org [mailto:systemtap-owner@sourceware.org] On Behalf Of Prasanna S Panchamukhi
>>>>Sent: February 8, 2006 22:14
>>>>To: systemtap@sources.redhat.com
>>>>Subject: Re: [3/3] Userspace probes prototype-take2
>>>>
>>>>
>>>>This patch handles executing the registered callback
>>>>functions when a probe is hit.
>>>>
>>>> Each userspace probe is uniquely identified by the
>>>>combination of inode and offset, hence during registration the inode
>>>>and offset combination is added to the kprobes hash table. When a
>>>>breakpoint instruction is hit, the kprobes hash table is looked up
>>>>for a matching inode and offset. The pre_handlers are called in
>>>>sequence if multiple probes are registered. The original instruction
>>>>is single-stepped out-of-line, as with kernel probes. For kernel-space
>>>>probes, single-stepping out-of-line is achieved by copying the
>>>>instruction to a location within the kernel address space and then
>>>>single-stepping it from that location. But for userspace probes, an
>>>>instruction copied into the kernel address space cannot be
>>>>single-stepped, hence the instruction must be copied into the user
>>>>address space. The solution is to find free space in the current
>>>>process address space, copy the original instruction there, and
>>>>single-step that instruction.
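The (inode, offset) registration and lookup described above can be sketched in plain user-space C. This is purely illustrative: `uprobe_entry`, `register_entry`, and `lookup` are hypothetical names, and the patch itself uses the kernel's `hlist` machinery and `hash_ptr` rather than this toy chained table.

```c
#include <stddef.h>

/* Toy chained hash table keyed by (inode, offset); the inode is modeled
 * as a plain integer standing in for the struct inode pointer. */
#define TABLE_SIZE 64

struct uprobe_entry {
	unsigned long inode;		/* stands in for the inode pointer */
	unsigned long offset;		/* file offset of the probed insn */
	struct uprobe_entry *next;	/* hash-chain link */
};

static struct uprobe_entry *table[TABLE_SIZE];

static unsigned int hash_key(unsigned long inode, unsigned long offset)
{
	return (unsigned int)((inode ^ offset) % TABLE_SIZE);
}

/* registration: chain the entry into its hash bucket */
static void register_entry(struct uprobe_entry *e)
{
	unsigned int h = hash_key(e->inode, e->offset);

	e->next = table[h];
	table[h] = e;
}

/* breakpoint hit: walk the bucket for a matching inode and offset */
static struct uprobe_entry *lookup(unsigned long inode, unsigned long offset)
{
	struct uprobe_entry *e;

	for (e = table[hash_key(inode, offset)]; e; e = e->next)
		if (e->inode == inode && e->offset == offset)
			return e;
	return NULL;
}
```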
>>>>
>>>>User processes use stack space to store local variables, arguments
>>>>and return values. Normally the stack space just beyond the stack
>>>>pointer is free: if the stack grows downwards, the space below the
>>>>stack pointer is unused, and if the stack grows upwards, the space
>>>>above the stack pointer is unused.
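The grows-down/grows-up free-space test reduces to address arithmetic. The sketch below is a hypothetical user-space model of the checks that `copy_insn_onstack()` in the patch performs (`SIM_PAGE_SIZE`, `free_stack_slot`, and the `grows_down` flag are illustrative; the `sizeof(long long)` headroom mirrors the patch):

```c
#include <stddef.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

/* Return the address the instruction would be copied to, or 0 if the
 * current stack page has no room. grows_down models VM_GROWSDOWN. */
static unsigned long free_stack_slot(unsigned long sp, size_t insn_size,
				     int grows_down)
{
	unsigned long page_addr = sp & SIM_PAGE_MASK;

	if (grows_down) {
		/* leave sizeof(long long) of headroom below the stack
		 * pointer, then use the bottom of the current page */
		if ((sp - sizeof(long long)) < (page_addr + insn_size))
			return 0;
		return page_addr;
	}
	/* grows-up stack: the free space sits above the stack pointer,
	 * up to the end of the current page */
	if (page_addr == sp)
		return 0;
	page_addr += SIM_PAGE_SIZE;
	if ((page_addr - insn_size) < (sp + sizeof(long long)))
		return 0;
	return page_addr - insn_size;
}
```

Because the copy target is clamped to the current page, the copied instruction cannot cross a page boundary, which is the check the next paragraph describes.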
>>>>
>>>>The instruction to be single-stepped can itself modify the stack, so
>>>>sufficient stack space must be left intact before the unused free
>>>>space is used. The instruction is copied to the bottom of the page,
>>>>and a check is made that the copied instruction does not cross the
>>>>page boundary. The copied instruction is then single-stepped. Several
>>>>architectures do not allow instructions to be executed from stack
>>>>locations, since the no-exec bit is set for stack pages. On those
>>>>architectures, the page table entry corresponding to the stack page
>>>>is identified and its no-exec bit is cleared, allowing the
>>>>instruction on that stack page to be executed.
>>>>
>>>>There are situations where even the unused free stack space is not
>>>>enough for the user instruction to be copied and single-stepped. In
>>>>such situations, the virtual memory area (vma) can be expanded beyond
>>>>the current stack vma. This expanded stack can then be used to copy
>>>>the original instruction and single-step it out-of-line.
>>>>
>>>>If the vma cannot be extended either, the instruction must be
>>>>executed inline, by replacing the breakpoint instruction with the
>>>>original instruction.
>>>>
>>>>TODO list
>>>>--------
>>>>1. This patch is not yet stable, though it should work under most conditions.
>>>>
>>>>2. This patch works only with the PREEMPT config option disabled. To
>>>>work with PREEMPT enabled, the handlers must be rewritten and
>>>>separated out from kernel probes so that preemption can be allowed.
>>One of my old comments was that an external device interrupt might occur while the cpu is single-stepping the
>>original instruction, and the task might then be switched to another cpu. If we disable irqs when exiting to user
>>space to single-step the instruction, the kernel might still switch the task out on the kernel-exit path. These two
>>resources, 1) uprobe_page and 2) kprobe_ctlblk, therefore shouldn't be per-cpu, or we need another
>>approach. How could you resolve the task-switch issue?
>>
>>
>>
>>>>
>>>>3. Insert probes on copy-on-write pages. Tracks all COW pages for the
>>>>page containing the specified probe point and inserts/removes all the
>>>>probe points for that page.
>>>>
>>>>4. Optimize the insertion of probes through readpage hooks. Identify
>>>>all the probes to be inserted on the read page and insert them at
>>>>once.
>>>>
>>>>5. Resume execution should handle setting the proper eip and eflags
>>>>for special instructions, as kernel probes do.
>>>>
>>>>6. Single-stepping out-of-line expands the stack if there is not
>>>>enough stack space to copy the original instruction. The expanded
>>>>stack should either be shrunk back to its original size after single
>>>>stepping, or be kept and reused for single-stepping out-of-line for
>>>>other probes.
>>>>
>>>>7. A wrapper routine to calculate the offset from the beginning of
>>>>the probed file. For a dynamic shared library, the offset is
>>>>calculated by subtracting the address at which the file is mapped
>>>>from the address of the probe point.
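The offset arithmetic in TODO item 7, plus the page-index/in-page split that `replace_original_insn()` applies to `uprobe->offset`, can be sketched as below. The addresses and the `SIM_` names are illustrative assumptions, not values from the patch:

```c
#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE  (1UL << SIM_PAGE_SHIFT)
#define SIM_PAGE_MASK  (~(SIM_PAGE_SIZE - 1))

/* offset from the start of the mapped file: probe address minus the
 * address at which the file is mapped */
static unsigned long probe_offset(unsigned long probe_vaddr,
				  unsigned long map_base)
{
	return probe_vaddr - map_base;
}

/* which file page holds the probe (cf. offset >> PAGE_CACHE_SHIFT) */
static unsigned long page_index(unsigned long offset)
{
	return offset >> SIM_PAGE_SHIFT;
}

/* where within that page the probe lands (cf. offset & ~PAGE_MASK) */
static unsigned long in_page_offset(unsigned long offset)
{
	return offset & ~SIM_PAGE_MASK;
}
```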
>>>>
>>>>8. Handling of page faults while in kprobes_handler() and while
>>>>single-stepping.
>>>>
>>>>9. Accessing user-space pages not present in memory from the
>>>>registered callback routines.
>>>>
>>>>Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
>>>>
>>>>
>>>> arch/i386/kernel/kprobes.c | 460 +++++++++++++++++++++++++++++++++++++++++++--
>>>> include/asm-i386/kprobes.h | 13 +
>>>> include/linux/kprobes.h | 7
>>>> kernel/kprobes.c | 3
>>>> 4 files changed, 468 insertions(+), 15 deletions(-)
>>>>
>>>>diff -puN arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line arch/i386/kernel/kprobes.c
>>>>--- linux-2.6.16-rc1-mm5/arch/i386/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line 2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/arch/i386/kernel/kprobes.c 2006-02-08 19:26:10.000000000 +0530
>>>>@@ -30,6 +30,7 @@
>>>>
>>>> #include <linux/config.h>
>>>> #include <linux/kprobes.h>
>>>>+#include <linux/hash.h>
>>>> #include <linux/ptrace.h>
>>>> #include <linux/preempt.h>
>>>> #include <asm/cacheflush.h>
>>>>@@ -38,8 +39,12 @@
>>>>
>>>> void jprobe_return_end(void);
>>>>
>>>>+static struct uprobe_page *uprobe_page;
>>>>+static struct hlist_head uprobe_page_table[KPROBE_TABLE_SIZE];
>>>> DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
>>>> DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>>>>+DEFINE_PER_CPU(struct uprobe *, current_uprobe) = NULL;
>>>>+DEFINE_PER_CPU(unsigned long, singlestep_addr);
>>>>
>>>> /* insert a jmp code */
>>>> static inline void set_jmp_op(void *from, void *to)
>>>>@@ -125,6 +130,23 @@ void __kprobes arch_disarm_kprobe(struct
>>>> (unsigned long) p->addr + sizeof(kprobe_opcode_t));
>>>> }
>>>>
>>>>+void __kprobes arch_disarm_uprobe(struct kprobe *p, kprobe_opcode_t *address)
>>>>+{
>>>>+ *address = p->opcode;
>>>>+}
>>>>+
>>>>+void __kprobes arch_arm_uprobe(unsigned long *address)
>>>>+{
>>>>+ *(kprobe_opcode_t *)address = BREAKPOINT_INSTRUCTION;
>>>>+}
>>>>+
>>>>+void __kprobes arch_copy_uprobe(struct kprobe *p, unsigned long *address)
>>>>+{
>>>>+ memcpy(p->ainsn.insn, (kprobe_opcode_t *)address,
>>>>+ MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
>>>>+ p->opcode = *(kprobe_opcode_t *)address;
>>>>+}
>>>>+
>>>> static inline void save_previous_kprobe(struct kprobe_ctlblk *kcb)
>>>> {
>>>> kcb->prev_kprobe.kp = kprobe_running();
>>>>@@ -151,15 +173,326 @@ static inline void set_current_kprobe(st
>>>> kcb->kprobe_saved_eflags &= ~IF_MASK;
>>>> }
>>>>
>>>>+struct uprobe_page __kprobes *get_upage_current(struct task_struct *tsk)
>>>>+{
>>>>+ struct hlist_head *head;
>>>>+ struct hlist_node *node;
>>>>+ struct uprobe_page *upage;
>>>>+
>>>>+ head = &uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)];
>>>>+ hlist_for_each_entry(upage, node, head, hlist) {
>>>>+ if (upage->tsk == tsk)
>>>>+ return upage;
>>>>+ }
>>>>+ return NULL;
>>>>+}
>>>>+
>>>>+struct uprobe_page __kprobes *get_upage_free(struct task_struct *tsk)
>>>>+{
>>>>+ int cpu;
>>>>+
>>>>+ for_each_cpu(cpu) {
>>>>+ struct uprobe_page *upage;
>>>>+ upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+ if (upage->status & UPROBE_PAGE_FREE)
>>>>+ return upage;
>>>>+ }
>>>>+ return NULL;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine gets the pte of the page containing the specified address.
>>>>+ */
>>>>+static pte_t __kprobes *get_uprobe_pte(unsigned long address)
>>>>+{
>>>>+ pgd_t *pgd;
>>>>+ pud_t *pud;
>>>>+ pmd_t *pmd;
>>>>+ pte_t *pte = NULL;
>>>>+
>>>>+ pgd = pgd_offset(current->mm, address);
>>>>+ if (!pgd)
>>>>+ goto out;
>>>>+
>>>>+ pud = pud_offset(pgd, address);
>>>>+ if (!pud)
>>>>+ goto out;
>>>>+
>>>>+ pmd = pmd_offset(pud, address);
>>>>+ if (!pmd)
>>>>+ goto out;
>>>>+
>>>>+ pte = pte_alloc_map(current->mm, pmd, address);
>>>>+
>>>>+out:
>>>>+ return pte;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine checks for space in the current process's stack address space.
>>>>+ * If enough address space is found, it just maps a new page and copies the
>>>>+ * new instruction on that page for single stepping out-of-line.
>>>>+ */
>>>>+static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe ,
>>>>+ struct pt_regs *regs, struct vm_area_struct *vma)
>>>>+{
>>>>+ unsigned long addr, *vaddr, stack_addr = regs->esp;
>>>>+ int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+ struct uprobe_page *upage;
>>>>+ struct page *page;
>>>>+ pte_t *pte;
>>>>+
>>>>+
>>>>+ if (vma->vm_flags & VM_GROWSDOWN) {
>>>>+ if (((stack_addr - sizeof(long long))) < (vma->vm_start + size))
>>>>+ return -ENOMEM;
>>>>+
>>>>+ addr = vma->vm_start;
>>>>+ } else if (vma->vm_flags & VM_GROWSUP) {
>>>>+ if ((vma->vm_end - size) < (stack_addr + sizeof(long long)))
>>>>+ return -ENOMEM;
>>>>+
>>>>+ addr = vma->vm_end - size;
>>>>+ } else
>>>>+ return -EFAULT;
>>>>+
>>The multi-thread case is not resolved here. A typical multi-thread model has all threads sharing the same vma
>>while each thread has an 8k stack. If 2 threads trigger uprobes (though perhaps not the same uprobe) at the
>>same time, one thread might erase the single-step instruction of another.
>>
>>
>>
>>>>+ preempt_enable_no_resched();
>>>>+
>>>>+ pte = get_uprobe_pte(addr);
>>>>+ preempt_disable();
>>>>+ if (!pte)
>>>>+ return -EFAULT;
>>>>+
>>>>+ upage = get_upage_free(current);
>>>>+ upage->status &= ~UPROBE_PAGE_FREE;
>>>>+ upage->tsk = current;
>>>>+ INIT_HLIST_NODE(&upage->hlist);
>>>>+ hlist_add_head(&upage->hlist,
>>>>+ &uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>>>+
>>>>+ upage->orig_pte = pte;
>>>>+ upage->orig_pte_val = pte_val(*pte);
>>>>+ set_pte(pte, (*(upage->alias_pte)));
>>>>+
>>>>+ page = pte_page(*pte);
>>>>+ vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>>>+ vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>>>+ memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>>>+ kunmap_atomic(vaddr, KM_USER1);
>>>>+ regs->eip = addr;
>>So the temp page, upage->alias_addr, replaces the original one on the stack. If the replaced instruction operates
>>on the stack, such as "push eax", the result might land on the new page. After the single step, the pte is restored
>>to the original page, which doesn't have the value of eax.
>>
>>
>>
>>>>+
>>>>+ return 0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine expands the stack beyond the present process address space
>>>>+ * and copies the instruction to that location, so that processor can
>>>>+ * single step out-of-line.
>>>>+ */
>>>>+static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe,
>>>>+ struct pt_regs *regs, struct vm_area_struct *vma)
>>It has the same issues as function copy_insn_on_new_page.
>>
>>
>>>>+{
>>>>+ unsigned long addr, *vaddr, vm_addr;
>>>>+ int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+ struct vm_area_struct *new_vma;
>>>>+ struct uprobe_page *upage;
>>>>+ struct mm_struct *mm = current->mm;
>>>>+ struct page *page;
>>>>+ pte_t *pte;
>>>>+
>>>>+
>>>>+ if (vma->vm_flags & VM_GROWSDOWN)
>>>>+ vm_addr = vma->vm_start - size;
>>>>+ else if (vma->vm_flags & VM_GROWSUP)
>>>>+ vm_addr = vma->vm_end + size;
>>>>+ else
>>>>+ return -EFAULT;
>>>>+
>>>>+ preempt_enable_no_resched();
>>>>+
>>>>+ /* TODO: do we need to expand stack if extend_vma fails? */
>>>>+ new_vma = find_extend_vma(mm, vm_addr);
>>>>+ preempt_disable();
>>>>+ if (!new_vma)
>>>>+ return -ENOMEM;
>>>>+
>>>>+ /*
>>>>+ * TODO: Expanding stack for every probe is not a good idea, stack must
>>>>+ * either be shrunk to its original size after single stepping or the
>>>>+ * expanded stack should be kept track of, for the probed application,
>>>>+ * so it can be reused to single step out-of-line
>>>>+ */
>>>>+ if (new_vma->vm_flags & VM_GROWSDOWN)
>>>>+ addr = new_vma->vm_start;
>>>>+ else
>>>>+ addr = new_vma->vm_end - size;
>>>>+
>>>>+ preempt_enable_no_resched();
>>>>+ pte = get_uprobe_pte(addr);
>>>>+ preempt_disable();
>>>>+ if (!pte)
>>>>+ return -EFAULT;
>>>>+
>>>>+ upage = get_upage_free(current);
>>>>+ upage->status &= ~UPROBE_PAGE_FREE;
>>>>+ upage->tsk = current;
>>>>+ INIT_HLIST_NODE(&upage->hlist);
>>>>+ hlist_add_head(&upage->hlist,
>>>>+ &uprobe_page_table[hash_ptr(current, KPROBE_HASH_BITS)]);
>>>>+ upage->orig_pte = pte;
>>>>+ upage->orig_pte_val = pte_val(*pte);
>>>>+ set_pte(pte, (*(upage->alias_pte)));
>>>>+
>>>>+ page = pte_page(*pte);
>>>>+ vaddr = (unsigned long *)kmap_atomic(page, KM_USER1);
>>>>+ vaddr = (unsigned long *)((unsigned long)vaddr + (addr & ~PAGE_MASK));
>>>>+ memcpy(vaddr, (unsigned long *)uprobe->kp.ainsn.insn, size);
>>>>+ kunmap_atomic(vaddr, KM_USER1);
>>>>+ regs->eip = addr;
>>>>+
>>>>+ return 0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine checks for free stack space below the stack pointer and
>>>>+ * then copies the instruction to that location so that the processor can
>>>>+ * single step out-of-line. If there is not enough stack space, if
>>>>+ * copy_to_user fails, or if the vma is invalid, it returns an error.
>>>>+ */
>>>>+static int __kprobes copy_insn_onstack(struct uprobe *uprobe,
>>>>+ struct pt_regs *regs, unsigned long flags)
>>>>+{
>>>>+ unsigned long page_addr, stack_addr = regs->esp;
>>>>+ int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>>>+ unsigned long *source = (unsigned long *)uprobe->kp.ainsn.insn;
>>>>+
>>>>+ if (flags & VM_GROWSDOWN) {
>>>>+ page_addr = stack_addr & PAGE_MASK;
>>>>+
>>>>+ if (((stack_addr - sizeof(long long))) < (page_addr + size))
>>>>+ return -ENOMEM;
>>>>+
>>>>+ if (__copy_to_user_inatomic((unsigned long *)page_addr, source,
>>>>+ size))
>>>>+ return -EFAULT;
>>>>+
>>>>+ regs->eip = page_addr;
>>>>+ } else if (flags & VM_GROWSUP) {
>>>>+ page_addr = stack_addr & PAGE_MASK;
>>>>+
>>>>+ if (page_addr == stack_addr)
>>>>+ return -ENOMEM;
>>>>+ else
>>>>+ page_addr += PAGE_SIZE;
>>>>+
>>>>+ if ((page_addr - size) < (stack_addr + sizeof(long long)))
>>>>+ return -ENOMEM;
>>>>+
>>>>+ if (__copy_to_user_inatomic((unsigned long *)(page_addr - size),
>>>>+ source, size))
>>>>+ return -EFAULT;
>>>>+
>>>>+ regs->eip = page_addr - size;
>>>>+ } else
>>>>+ return -EINVAL;
>>>>+
>>>>+ return 0;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine gets the page containing the probe, maps it and
>>>>+ * replaces the instruction at the probed address with the specified
>>>>+ * opcode.
>>>>+ */
>>>>+void __kprobes replace_original_insn(struct uprobe *uprobe,
>>>>+ struct pt_regs *regs, kprobe_opcode_t opcode)
>>>>+{
>>>>+ kprobe_opcode_t *addr;
>>>>+ struct page *page;
>>>>+
>>>>+ page = find_get_page(uprobe->inode->i_mapping,
>>>>+ uprobe->offset >> PAGE_CACHE_SHIFT);
>>>>+ lock_page(page);
>>>>+
>>>>+ addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER0);
>>>>+ addr = (kprobe_opcode_t *)((unsigned long)addr +
>>>>+ (unsigned long)(uprobe->offset & ~PAGE_MASK));
>>>>+ *addr = opcode;
>>>>+ /*TODO: flush vma ? */
>>>>+ kunmap_atomic(addr, KM_USER0);
>>>>+
>>>>+ unlock_page(page);
>>>>+
>>>>+ page_cache_release(page);
>>>>+ regs->eip = (unsigned long)uprobe->kp.addr;
>>>>+}
>>>>+
>>>>+/**
>>>>+ * This routine provides the functionality of single stepping out of line.
>>>>+ * If single stepping out-of-line cannot be achieved, it replaces with
>>>>+ * the original instruction allowing it to single step inline.
>>>>+ */
>>>>+static inline int uprobe_single_step(struct kprobe *p, struct pt_regs *regs)
>>>>+{
>>>>+ unsigned long stack_addr = regs->esp, flags;
>>>>+ struct vm_area_struct *vma = NULL;
>>>>+ struct uprobe *uprobe = __get_cpu_var(current_uprobe);
>>>>+ struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>>>>+ int err = 0;
>>>>+
>>>>+ down_read(&current->mm->mmap_sem);
>>>>+
>>>>+ vma = find_vma(current->mm, (stack_addr & PAGE_MASK));
>>>>+ if (!vma) {
>>>>+ /* TODO: Need better error reporting? */
>>>>+ printk("No vma found\n");
>>>>+ up_read(&current->mm->mmap_sem);
>>>>+ return -ENOENT;
>>>>+ }
>>>>+ flags = vma->vm_flags;
>>>>+ up_read(&current->mm->mmap_sem);
>>>>+
>>>>+ kcb->kprobe_status |= UPROBE_SS_STACK;
>>>>+ err = copy_insn_onstack(uprobe, regs, flags);
>>>>+
>>>>+ down_write(&current->mm->mmap_sem);
>>>>+
>>>>+ if (err) {
>>>>+ kcb->kprobe_status |= UPROBE_SS_NEW_STACK;
>>>>+ err = copy_insn_on_new_page(uprobe, regs, vma);
>>>>+ }
>>>>+ if (err) {
>>>>+ kcb->kprobe_status |= UPROBE_SS_EXPSTACK;
>>>>+ err = copy_insn_onexpstack(uprobe, regs, vma);
>>>>+ }
>>>>+
>>>>+ up_write(&current->mm->mmap_sem);
>>>>+
>>>>+ if (err) {
>>>>+ kcb->kprobe_status |= UPROBE_SS_INLINE;
>>>>+ replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>>>+ }
>>>>+
>>>>+ __get_cpu_var(singlestep_addr) = regs->eip;
>>>>+
>>>>+
>>>>+ return 0;
>>>>+}
>>>>+
>>>> static inline void prepare_singlestep(struct kprobe *p, struct pt_regs *regs)
>>>> {
>>>> regs->eflags |= TF_MASK;
>>>> regs->eflags &= ~IF_MASK;
>>>> /*single step inline if the instruction is an int3*/
>>>>+
>>>> if (p->opcode == BREAKPOINT_INSTRUCTION)
>>>> regs->eip = (unsigned long)p->addr;
>>>>- else
>>>>- regs->eip = (unsigned long)&p->ainsn.insn;
>>>>+ else {
>>>>+ if (!kernel_text_address((unsigned long)p->addr))
>>>>+ uprobe_single_step(p, regs);
>>>>+ else
>>>>+ regs->eip = (unsigned long)&p->ainsn.insn;
>>>>+ }
>>>> }
>>>>
>>>> /* Called with kretprobe_lock held */
>>>>@@ -194,6 +527,7 @@ static int __kprobes kprobe_handler(stru
>>>> kprobe_opcode_t *addr = NULL;
>>>> unsigned long *lp;
>>>> struct kprobe_ctlblk *kcb;
>>>>+ unsigned seg = regs->xcs & 0xffff;
>>>> #ifdef CONFIG_PREEMPT
>>>> unsigned pre_preempt_count = preempt_count();
>>>> #endif /* CONFIG_PREEMPT */
>>>>@@ -208,14 +542,21 @@ static int __kprobes kprobe_handler(stru
>>>> /* Check if the application is using LDT entry for its code segment and
>>>> * calculate the address by reading the base address from the LDT entry.
>>>> */
>>>>- if ((regs->xcs & 4) && (current->mm)) {
>>>>+
>>>>+ if (regs->eflags & VM_MASK)
>>>>+ addr = (kprobe_opcode_t *)(((seg << 4) + regs->eip -
>>>>+ sizeof(kprobe_opcode_t)) & 0xffff);
>>>>+ else if ((regs->xcs & 4) && (current->mm)) {
>>>>+ local_irq_enable();
>>>>+ down(&current->mm->context.sem);
>>>> lp = (unsigned long *) ((unsigned long)((regs->xcs >> 3) * 8)
>>>> + (char *) current->mm->context.ldt);
>>>> addr = (kprobe_opcode_t *) (get_desc_base(lp) + regs->eip -
>>>> sizeof(kprobe_opcode_t));
>>>>- } else {
>>>>+ up(&current->mm->context.sem);
>>>>+ local_irq_disable();
>>>>+ } else
>>>> addr = (kprobe_opcode_t *)(regs->eip - sizeof(kprobe_opcode_t));
>>>>- }
>>>> /* Check we're not actually recursing */
>>>> if (kprobe_running()) {
>>>> p = get_kprobe(addr);
>>>>@@ -235,7 +576,6 @@ static int __kprobes kprobe_handler(stru
>>>> save_previous_kprobe(kcb);
>>>> set_current_kprobe(p, regs, kcb);
>>>> kprobes_inc_nmissed_count(p);
>>>>- prepare_singlestep(p, regs);
>>>> kcb->kprobe_status = KPROBE_REENTER;
>>>> return 1;
>>>> } else {
>>>>@@ -307,8 +647,8 @@ static int __kprobes kprobe_handler(stru
>>>> }
>>>>
>>>> ss_probe:
>>>>- prepare_singlestep(p, regs);
>>>> kcb->kprobe_status = KPROBE_HIT_SS;
>>>>+ prepare_singlestep(p, regs);
>>>> return 1;
>>>>
>>>> no_kprobe:
>>>>@@ -498,6 +838,33 @@ no_change:
>>>> return;
>>>> }
>>>>
>>>>+static void __kprobes resume_execution_user(struct uprobe *uprobe,
>>>>+ struct pt_regs *regs, struct kprobe_ctlblk *kcb)
>>>>+{
>>>>+ unsigned long delta;
>>>>+ struct uprobe_page *upage;
>>>>+
>>>>+ /*
>>>>+ * TODO :need to fixup special instructions as done with kernel probes.
>>>>+ */
>>>>+ delta = regs->eip - __get_cpu_var(singlestep_addr);
>>>>+ regs->eip = (unsigned long)(uprobe->kp.addr + delta);
>>>>+
>>>>+ if ((kcb->kprobe_status & UPROBE_SS_EXPSTACK) ||
>>>>+ (kcb->kprobe_status & UPROBE_SS_NEW_STACK)) {
>>>>+ upage = get_upage_current(current);
>>>>+ set_pte(upage->orig_pte, __pte(upage->orig_pte_val));
>>>>+ pte_unmap(upage->orig_pte);
>>>>+
>>>>+ upage->status = UPROBE_PAGE_FREE;
>>>>+ hlist_del(&upage->hlist);
>>>>+
>>>>+ } else if (kcb->kprobe_status & UPROBE_SS_INLINE)
>>>>+ replace_original_insn(uprobe, regs,
>>>>+ (kprobe_opcode_t)BREAKPOINT_INSTRUCTION);
>>>>+ regs->eflags &= ~TF_MASK;
>>>>+}
>>>>+
>>>> /*
>>>> * Interrupts are disabled on entry as trap1 is an interrupt gate and they
>>>> * remain disabled thoroughout this function.
>>>>@@ -510,16 +877,19 @@ static inline int post_kprobe_handler(st
>>>> if (!cur)
>>>> return 0;
>>>>
>>>>- if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
>>>>- kcb->kprobe_status = KPROBE_HIT_SSDONE;
>>>>+ if (!(kcb->kprobe_status & KPROBE_REENTER) && cur->post_handler) {
>>>>+ kcb->kprobe_status |= KPROBE_HIT_SSDONE;
>>>> cur->post_handler(cur, regs, 0);
>>>> }
>>>>
>>>>- resume_execution(cur, regs, kcb);
>>>>+ if (!kernel_text_address((unsigned long)cur->addr))
>>>>+ resume_execution_user(__get_cpu_var(current_uprobe), regs, kcb);
>>>>+ else
>>>>+ resume_execution(cur, regs, kcb);
>>>> regs->eflags |= kcb->kprobe_saved_eflags;
>>>>
>>>> /*Restore back the original saved kprobes variables and continue. */
>>>>- if (kcb->kprobe_status == KPROBE_REENTER) {
>>>>+ if (kcb->kprobe_status & KPROBE_REENTER) {
>>>> restore_previous_kprobe(kcb);
>>>> goto out;
>>>> }
>>>>@@ -547,7 +917,13 @@ static inline int kprobe_fault_handler(s
>>>> return 1;
>>>>
>>>> if (kcb->kprobe_status & KPROBE_HIT_SS) {
>>>>- resume_execution(cur, regs, kcb);
>>>>+ if (!kernel_text_address((unsigned long)cur->addr)) {
>>>>+ struct uprobe *uprobe = __get_cpu_var(current_uprobe);
>>>>+ /* TODO: Proper handling of all instruction */
>>>>+ replace_original_insn(uprobe, regs, uprobe->kp.opcode);
>>>>+ regs->eflags &= ~TF_MASK;
>>>>+ } else
>>>>+ resume_execution(cur, regs, kcb);
>>>> regs->eflags |= kcb->kprobe_old_eflags;
>>>>
>>>> reset_current_kprobe();
>>>>@@ -654,7 +1030,67 @@ int __kprobes longjmp_break_handler(stru
>>>> return 0;
>>>> }
>>>>
>>>>+static void free_alias(void)
>>>>+{
>>>>+ int cpu;
>>>>+
>>>>+ for_each_cpu(cpu) {
>>>>+ struct uprobe_page *upage;
>>>>+ upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+
>>>>+ if (upage->alias_addr) {
>>>>+ set_pte(upage->alias_pte, __pte(upage->alias_pte_val));
>>>>+ kfree(upage->alias_addr);
>>>>+ }
>>>>+ upage->alias_pte = 0;
>>>>+ }
>>>>+ free_percpu(uprobe_page);
>>>>+ return;
>>>>+}
>>>>+
>>>>+static int alloc_alias(void)
>>>>+{
>>>>+ int cpu;
>>>>+
>>>>+ uprobe_page = __alloc_percpu(sizeof(struct uprobe_page));
>>[YM] Does this code try to resolve the task-switch problem during single-stepping? If so, the per-cpu data might
>>still be used up, even though get_upage_free goes through the uprobe_pages of all cpus. I suggest allocating a
>>pool of uprobe_pages, and allocating more when they are used up.
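Yanmin's grow-on-demand pool suggestion can be sketched in user-space C. Everything here is hypothetical (`sim_upage`, `pool_get`, `pool_put`, the batch size); a real kernel version would need locking and GFP-aware allocation rather than `calloc`:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hand out upage slots from a free list; allocate another batch
 * whenever the list runs dry, instead of a fixed per-cpu supply. */
#define POOL_BATCH 4

struct sim_upage {
	struct sim_upage *next;	/* free-list link */
	int in_use;
};

static struct sim_upage *free_list;

static int pool_grow(void)
{
	int i;

	for (i = 0; i < POOL_BATCH; i++) {
		struct sim_upage *p = calloc(1, sizeof(*p));

		if (!p)
			return -1;
		p->next = free_list;
		free_list = p;
	}
	return 0;
}

static struct sim_upage *pool_get(void)
{
	struct sim_upage *p;

	if (!free_list && pool_grow() < 0)
		return NULL;	/* out of memory */
	p = free_list;
	free_list = p->next;
	p->in_use = 1;
	return p;
}

static void pool_put(struct sim_upage *p)
{
	p->in_use = 0;
	p->next = free_list;
	free_list = p;
}
```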
>>
>>
>>
>>
>>>>+
>>>>+ for_each_cpu(cpu) {
>>>>+ struct uprobe_page *upage;
>>>>+ upage = per_cpu_ptr(uprobe_page, cpu);
>>>>+ upage->alias_addr = kmalloc(PAGE_SIZE, GFP_USER);
>>[YM] Does kmalloc(PAGE_SIZE...) guarantee the result is page-aligned? How about using alloc_page?
>>
>>
>>>>+ if (!upage->alias_addr) {
>>>>+ free_alias();
>>>>+ return -ENOMEM;
>>>>+ }
>>>>+ upage->alias_pte = lookup_address(
>>>>+ (unsigned long)upage->alias_addr);
>>>>+ upage->alias_pte_val = pte_val(*upage->alias_pte);
>>>>+ if (upage->alias_pte) {
>>[YM] If kmalloc returns a non-NULL address, upage->alias_pte will not be NULL. So delete the check above?
>>
>>
>>>>+ upage->status = UPROBE_PAGE_FREE;
>>>>+ set_pte(upage->alias_pte,
>>>>+ pte_mkdirty(*upage->alias_pte));
>>>>+ set_pte(upage->alias_pte,
>>>>+ pte_mkexec(*upage->alias_pte));
>>>>+ set_pte(upage->alias_pte,
>>>>+ pte_mkwrite(*upage->alias_pte));
>>>>+ set_pte(upage->alias_pte,
>>>>+ pte_mkyoung(*upage->alias_pte));
>>>>+ }
>>>>+ }
>>>>+ return 0;
>>>>+}
>>>>+
>>>> int __init arch_init_kprobes(void)
>>>> {
>>>>+ int ret = 0;
>>>>+ /*
>>>>+ * User-space probes require a page to which the original instruction
>>>>+ * can be copied and single-stepped when there is no free stack space,
>>>>+ * so allocate a per-cpu page.
>>>>+ */
>>>>+
>>>>+ if ((ret = alloc_alias()))
>>>>+ return ret;
>>>>+
>>>> return 0;
>>>> }
>>>>diff -puN include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/asm-i386/kprobes.h
>>>>--- linux-2.6.16-rc1-mm5/include/asm-i386/kprobes.h~kprobes_userspace_probes_ss_out-of-line 2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/include/asm-i386/kprobes.h 2006-02-08 19:26:10.000000000 +0530
>>>>@@ -42,6 +42,7 @@ typedef u8 kprobe_opcode_t;
>>>> #define JPROBE_ENTRY(pentry) (kprobe_opcode_t *)pentry
>>>> #define ARCH_SUPPORTS_KRETPROBES
>>>> #define arch_remove_kprobe(p) do {} while (0)
>>>>+#define UPROBE_PAGE_FREE 0x00000001
>>>>
>>>> void kretprobe_trampoline(void);
>>>>
>>>>@@ -74,6 +75,18 @@ struct kprobe_ctlblk {
>>>> struct prev_kprobe prev_kprobe;
>>>> };
>>>>
>>>>+/* per cpu uprobe page structure */
>>>>+struct uprobe_page {
>>>>+ struct hlist_node hlist;
>>>>+ pte_t *alias_pte;
>>>>+ pte_t *orig_pte;
>>>>+ unsigned long orig_pte_val;
>>>>+ unsigned long alias_pte_val;
>>[YM] I think the patch doesn't support CONFIG_X86_PAE, because with CONFIG_X86_PAE=y, pte_t becomes 64 bits.
>>How about changing the above 2 members' types to pte_t directly?
>>
>>
>>
>>>>+ void *alias_addr;
>>>>+ struct task_struct *tsk;
>>>>+ unsigned long status;
>>>>+};
>>>>+
>>>> /* trap3/1 are intr gates for kprobes. So, restore the status of IF,
>>>> * if necessary, before executing the original int3/1 (trap) handler.
>>>> */
>>>>diff -puN include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line include/linux/kprobes.h
>>>>--- linux-2.6.16-rc1-mm5/include/linux/kprobes.h~kprobes_userspace_probes_ss_out-of-line 2006-02-08 19:26:09.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/include/linux/kprobes.h 2006-02-08 19:26:10.000000000 +0530
>>>>@@ -45,11 +45,18 @@
>>>> #ifdef CONFIG_KPROBES
>>>> #include <asm/kprobes.h>
>>>>
>>>>+#define KPROBE_HASH_BITS 6
>>>>+#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>>>+
>>>> /* kprobe_status settings */
>>>> #define KPROBE_HIT_ACTIVE 0x00000001
>>>> #define KPROBE_HIT_SS 0x00000002
>>>> #define KPROBE_REENTER 0x00000004
>>>> #define KPROBE_HIT_SSDONE 0x00000008
>>>>+#define UPROBE_SS_STACK 0x00000010
>>>>+#define UPROBE_SS_EXPSTACK 0x00000020
>>>>+#define UPROBE_SS_INLINE 0x00000040
>>>>+#define UPROBE_SS_NEW_STACK 0x00000080
>>>>
>>>> /* Attach to insert probes on any functions which should be ignored*/
>>>> #define __kprobes __attribute__((__section__(".kprobes.text")))
>>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line kernel/kprobes.c
>>>>--- linux-2.6.16-rc1-mm5/kernel/kprobes.c~kprobes_userspace_probes_ss_out-of-line 2006-02-08 19:26:10.000000000 +0530
>>>>+++ linux-2.6.16-rc1-mm5-prasanna/kernel/kprobes.c 2006-02-08 19:26:10.000000000 +0530
>>>>@@ -42,9 +42,6 @@
>>>> #include <asm/errno.h>
>>>> #include <asm/kdebug.h>
>>>>
>>>>-#define KPROBE_HASH_BITS 6
>>>>-#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
>>>>-
>>>> static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>>>> static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>>>> static struct list_head uprobe_module_list;
>>>>
>>>>_
>>>>--
>>>>Prasanna S Panchamukhi
>>>>Linux Technology Center
>>>>India Software Labs, IBM Bangalore
>>>>Email: prasanna@in.ibm.com
>>>>Ph: 91-80-51776329