This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
- From: Pedro Alves <palves at redhat dot com>
- To: Joel Brobecker <brobecker at adacore dot com>
- Cc: gdb-patches at sourceware dot org
- Date: Mon, 13 May 2013 15:27:53 +0100
- Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
- References: <1368441986-14478-1-git-send-email-brobecker at adacore dot com> <5190CCF9 dot 3020004 at redhat dot com> <20130513132802 dot GA32222 at adacore dot com>
On 05/13/2013 02:28 PM, Joel Brobecker wrote:
> Lynx178 is derived from
> an old version of LynxOS, which can explain why newer versions
> are a little more robust in that respect.
Ah. I really have no sense of whether 178 is old or recent. ;-)
>
> I tried to get more info directly from the people who I thought
> would know about this, but never managed to make progress in that
> direction, so I gave up when I found this solution.
>
>> So does that mean scheduler locking doesn't work?
>>
>> E.g.,
>>
>> (gdb) thread 2
>> (gdb) si
>> (gdb) thread 1
>> (gdb) c
> Indeed, as expected, same sort of symptom:
>
> (gdb) thread 1
> [Switching to thread 1 (Thread 30)]
> #0 0x1004ed94 in _trap_ ()
> (gdb) si
> 0x1004ed98 in _trap_ ()
> (gdb) thread 2
> [Switching to thread 2 (Thread 36)]
> #0 task_switch.break_me () at task_switch.adb:42
> 42 null;
> (gdb) cont
> Continuing.
>
> Program received signal SIG62, Real-time event 62.
> task_switch.break_me () at task_switch.adb:42
> 42 null;
>
>> BTW, vCont;c means "resume all threads", why is the current code just
>> resuming one?
>
> It's actually using a ptrace request that applies to the process
> (either PTRACE_CONT or PTRACE_SINGLE_STEP).
> I never tried to implement single-thread control (scheduler-locking
> on), as this is not something we're interested in for this platform,
> at least for now...
Okay... I see the file has references to PTRACE_CONT_ONE/PTRACE_SINGLE_STEP_ONE,
though they're not really being used. Since PTRACE_SINGLE_STEP resumes all
threads in the process, then when stepping over a breakpoint, other
threads may miss breakpoints...
Old lynx-nat.c did:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/Attic/lynx-nat.c?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=src
  /* If pid == -1, then we want to step/continue all threads, else
     we only want to step/continue a single thread.  */
  if (pid == -1)
    {
      pid = PIDGET (inferior_ptid);
      func = step ? PTRACE_SINGLESTEP : PTRACE_CONT;
    }
  else
    func = step ? PTRACE_SINGLESTEP_ONE : PTRACE_CONT_ONE;
I'd like to believe that just doing that in gdbserver too
would fix the scheduler-locking example. :-)
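For illustration, the old lynx-nat.c dispatch above boils down to the
following selection logic. This is only a sketch: the enum values are
placeholders standing in for the real PTRACE_* request codes from the
LynxOS headers, and the function name is invented here, not gdbserver API.

```c
/* Placeholder request codes; the real PTRACE_CONT / PTRACE_SINGLESTEP
   and their _ONE variants come from the LynxOS <sys/ptrace.h>.  */
enum resume_request
{
  REQ_CONT,            /* resume all threads in the process */
  REQ_SINGLESTEP,      /* step, resuming all threads */
  REQ_CONT_ONE,        /* resume only the given thread */
  REQ_SINGLESTEP_ONE   /* step only the given thread */
};

/* Mirror of the old lynx-nat.c logic: a pid of -1 means "resume the
   whole process"; any other value names a single thread, which is
   what scheduler-locking needs.  */
static enum resume_request
choose_resume_request (int pid, int step)
{
  if (pid == -1)
    return step ? REQ_SINGLESTEP : REQ_CONT;
  else
    return step ? REQ_SINGLESTEP_ONE : REQ_CONT_ONE;
}
```

Wiring the same two-way split into gdbserver's lynx_resume would give it
the per-thread resume path it currently lacks.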
For the SIG61 issue, I wonder whether, for PTRACE_CONT, we should
always target the main pid of the process instead of the last
reported thread id (that's what the old lynx-nat.c did too).
Did you try that?
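In other words, the pid selection for a process-wide continue would look
something like the sketch below. The struct and function names are made
up here to illustrate the idea; gdbserver's actual ptid accessors differ.

```c
/* Illustrative only: a ptid-like pair mimicking gdbserver's
   (pid, tid) addressing on LynxOS.  */
struct sketch_ptid
{
  int pid;   /* main pid of the process */
  int tid;   /* id of the thread that last reported a stop */
};

/* For a process-wide PTRACE_CONT, target the main pid unconditionally
   instead of the thread id that last reported a stop.  */
static int
continue_target_pid (struct sketch_ptid last_reported)
{
  return last_reported.pid;
}
```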
Sorry to be picky. IMO, it's good to have all these
experimentation results archived, for when somebody proposes
removing/changing the "make sure to resume last reported" code
at some point...
>
>> lynx_wait_1 ()
>> ...
>> if (ptid_equal (ptid, minus_one_ptid))
>> pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>> else
>> pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>>
>> retry:
>>
>> ret = lynx_waitpid (pid, &wstat);
>>
>>
>> is suspicious also.
>
> I understand... It's a bit of a hybrid between trying to deal with
> thread-level execution control, and process-level execution control.
I actually misread this. lynx_ptid_get_pid returns the main pid of the
process, while I read that as getting at the current_inferior's tid.
>> Doesn't that mean we're doing a waitpid on
>> a possibly not-resumed current_inferior (that may not be the main task,
>> if that matters)? Could _that_ be reason for that magic signal 61?
>
> Given the above (we resume processes, rather than threads individually),
> I do not think that this is the source of the problem itself. I blame
> the thread library for not liking it when you potentially alter the
> program scheduling by resuming the non-active thread. This patch does
> not prevent this from happening, but at least makes an effort into
> avoiding it for the usual situations.
--
Pedro Alves