This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.


Thanks for the comments, Pedro.

> > On ppc-lynx178, resuming the execution of a program after hitting
> > a breakpoint sometimes triggers a spurious SIG61 event:
> 
> I'd like to understand this a little better.
>
> Could that mean the thread that gdbserver used for ptrace hadn't
> been ptrace stopped, or doesn't exist at all?  "sometimes" makes
> me wonder about the latter.

My interpretation of the clues I have been able to gather is that
the LynxOS thread library implementation does not like it when
we mess with the program's scheduling. Lynx178 is derived from
an old version of LynxOS, which may explain why newer versions
are a little more robust in that respect.

I tried to get more info directly from the people I thought
would know about this, but never managed to make progress in that
direction, so I gave up when I found this solution.

> So does that mean scheduler locking doesn't work?
> 
> E.g.,
> 
> (gdb) thread 2
> (gdb) si
> (gdb) thread 1
> (gdb) c 

Indeed, as expected, the same sort of symptom:

    (gdb) thread 1
    [Switching to thread 1 (Thread 30)]
    #0  0x1004ed94 in _trap_ ()
    (gdb) si
    0x1004ed98 in _trap_ ()
    (gdb) thread 2
    [Switching to thread 2 (Thread 36)]
    #0  task_switch.break_me () at task_switch.adb:42
    42            null;
    (gdb) cont
    Continuing.

    Program received signal SIG62, Real-time event 62.
    task_switch.break_me () at task_switch.adb:42
    42            null;

> BTW, vCont;c means "resume all threads", why is the current code just
> resuming one?

It's actually using a ptrace request that applies to the process
(either PTRACE_CONT or PTRACE_SINGLE_STEP).

I never tried to implement single-thread control (scheduler-locking
on), as this is not something we're interested in for this platform,
at least for now...

> lynx_wait_1 ()
> ...
>   if (ptid_equal (ptid, minus_one_ptid))
>     pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>   else
>     pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
> 
> retry:
> 
>   ret = lynx_waitpid (pid, &wstat);
> 
> 
> is suspicious also.

I understand... It's a bit of a hybrid between trying to deal with
thread-level execution control and process-level execution control.

> Doesn't that mean we're doing a waitpid on
> a possibly not-resumed current_inferior (that may not be the main task,
> if that matters)?  Could _that_ be reason for that magic signal 61?

Given the above (we resume processes, rather than threads individually),
I do not think that this is the source of the problem itself. I blame
the thread library for not liking it when you potentially alter the
program's scheduling by resuming a non-active thread. This patch does
not prevent this from happening, but at least makes an effort to
avoid it in the usual situations.

-- 
Joel

