This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug runtime/16806] kernel crash during repeated module insertion


https://sourceware.org/bugzilla/show_bug.cgi?id=16806

--- Comment #8 from David Smith <dsmith at redhat dot com> ---
Created attachment 7553
  --> https://sourceware.org/bugzilla/attachment.cgi?id=7553&action=edit
potential patch

Here's a patch that fixed the problem for me. Here's my theory about what is
going on. When shutting down, the generated C code calls
stap_stop_task_finder(), which looks like the following (a bit simplified):

====
void stap_stop_task_finder(void)
{
        if (atomic_read(&__stp_task_finder_state) == __STP_TF_UNITIALIZED)
                return;
        atomic_set(&__stp_task_finder_state, __STP_TF_STOPPING);

        // The utrace_shutdown() function detaches and cleans up
        // everything for us - we don't have to go through each
        // engine. This also means that the attach_count could end up
        // > 0 (since we don't go through each engine individually).
        utrace_shutdown();

        atomic_set(&__stp_task_finder_state, __STP_TF_STOPPED);

        /* Now that all the engines are detached, make sure
         * all the callbacks are finished.  If they aren't, we'll
         * crash the kernel when the module is removed. */
        while (atomic_read(&__stp_inuse_count) != 0) {
                schedule();
        }

        /* Make sure all outstanding task work requests are canceled. */
        __stp_tf_cancel_task_work();

        utrace_exit();
}
====

The utrace_shutdown() function unregisters all the tracepoint probes, but some
probe handlers can still be running when it returns. The '__stp_inuse_count'
loop above waits until they have finished. Only after that, when utrace_exit()
gets called, does stp_task_work_exit() run to cancel any outstanding task_work
items that the utrace probe handlers left behind.

The patch does a couple of things. First, it adds a call to
stp_task_work_exit() in utrace_shutdown(), so it gets called sooner - before
the '__stp_inuse_count' loop. Second, it fixes a possible bug in the way
utrace_shutdown() knows it has already been called.
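
To make the two changes concrete, here is a rough userspace model of the
revised ordering. It is not the attached patch: every model_* name is a
hypothetical stand-in, the kernel atomics are replaced by C11 <stdatomic.h>,
and the compare-and-exchange guard is only an assumption about one way an
"already called" check could be made safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are SystemTap symbols. */
static atomic_int model_shutdown_done;  /* 0 = not shut down yet, 1 = done */
static atomic_int model_inuse_count;    /* probe handlers still executing */
static atomic_int model_pending_work;   /* queued task_work items */
static atomic_int model_cancel_calls;   /* how many times cancellation ran */

/* Stand-in for stp_task_work_exit(): drop every queued work item. */
static void model_task_work_exit(void)
{
        atomic_store(&model_pending_work, 0);
        atomic_fetch_add(&model_cancel_calls, 1);
}

/* Stand-in for utrace_shutdown(). Returns true only for the single
 * caller that actually performs the shutdown work. */
static bool model_utrace_shutdown(void)
{
        int expected = 0;

        /* Assumed guard: flip 0 -> 1 atomically so that a second call
         * becomes a no-op instead of repeating the teardown. */
        if (!atomic_compare_exchange_strong(&model_shutdown_done,
                                            &expected, 1))
                return false;

        /* (Detaching the engines would happen here.) */

        /* Change 1: cancel pending work now, before anyone waits. */
        model_task_work_exit();
        return true;
}

/* Stand-in for the shutdown sequence in stap_stop_task_finder(). */
static void model_stop_task_finder(void)
{
        (void)model_utrace_shutdown();

        /* With the earlier cancellation, no work item is left queued
         * by the time the wait loop starts. */
        assert(atomic_load(&model_pending_work) == 0);

        /* The kernel code calls schedule() here; this model only
         * spins, so it returns only once no handler is in use. */
        while (atomic_load(&model_inuse_count) != 0)
                ;
}
```

Calling model_stop_task_finder() with a few items queued in
model_pending_work leaves none behind, and a later call to
model_utrace_shutdown() returns false without cancelling again.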

These changes fix the problem for me.


