This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: ttrace: Protocal error
- From: "John David Anglin" <dave at hiauly1 dot hia dot nrc dot ca>
- To: pedro at codesourcery dot com (Pedro Alves)
- Cc: gdb-patches at sourceware dot org
- Date: Fri, 8 Aug 2008 16:48:38 -0400 (EDT)
- Subject: Re: ttrace: Protocal error
> Note, I know nothing about ttrace and HP-UX.
That makes us equal.
> On Friday 08 August 2008 19:33:06, John David Anglin wrote:
> > While we're on the subject of threads, it seems we are still not in
> > a position to debug the vla6.f90 failure:
>
> What's this test doing different?
It's not entirely clear. However, it is using emulated TLS support
and multiple lwp threads. This support may be initialized by a constructor
run directly by the dynamic loader. There's a timing or some other
random effect associated with the failure (could be some variable is
being randomly initialized).
> > #4 0x000a3390 in target_resume (ptid=
> > {pid = 1953788513, lwp = 1667563520, tid = 774778670}, step=0,
> > signal=TARGET_SIGNAL_0) at ../../src/gdb/target.c:1789
>
> ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^
>
> I assume this ptid is GDB getting bogus info, right?
That's pretty common for optimized code.
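As a quick check that these values are garbage rather than real ids, each
field can be unpacked into bytes. This is only a sketch: it assumes the
fields are 32 bits wide and stored big-endian (as on PA-RISC HP-UX), and
decode_be32 is a hypothetical helper written for this note, not a GDB
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GDB.  Unpack VALUE into its four
   bytes, most significant first (assumed big-endian layout), and store
   them in OUT as a NUL-terminated string.  Bytes that are not printable
   ASCII are replaced with '_'.  */
static void
decode_be32 (uint32_t value, char out[5])
{
  int i;

  assert (out != NULL);
  for (i = 0; i < 4; i++)
    {
      unsigned char byte = (value >> (24 - 8 * i)) & 0xff;

      out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char) byte : '_';
    }
  out[4] = '\0';
}
```

Applied to the three fields in frame #4, this gives "ttra", "ce__" and
"../.". In other words, the bytes read as string data ("ttrace" followed
by NULs, then what looks like the start of a relative path), not as a
pid/lwp/tid triple. That suggests the displayed ptid was read from the
wrong location, not that the thread ids themselves are corrupt.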
> This should be setting the dying flag on the thread, but
> it is still listed in gdb's thread table.
Yes.
> case TTEVT_LWP_EXIT:
> if (print_thread_events)
> printf_unfiltered (_("[%s exited]\n"), target_pid_to_str (ptid));
> ti = find_thread_pid (ptid);
> gdb_assert (ti != NULL);
> ((struct inf_ttrace_private_thread_info *)ti->private)->dying = 1;
> inf_ttrace_num_lwps--;
> ttrace (TT_LWP_CONTINUE, ptid_get_pid (ptid),
> ptid_get_lwp (ptid), TT_NOPC, 0, 0);
> /* If we don't return -1 here, core GDB will re-add the thread. */
> ptid = minus_one_ptid;
> break;
The dying flag is set when the resume is attempted.
> inf_ttrace_resume:
>
> if (ptid_equal (ptid, minus_one_ptid))
> {
> /* Let all the other threads run too. */
> iterate_over_threads (inf_ttrace_resume_callback, NULL);
> iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
> }
>
> Is this the first resume after that "exit" notification?
> Any chance we're trying to resume a dead thread here then?
Yes. That's what I think is happening.
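To make the suspected ordering problem concrete, here is a toy model. It
is not GDB code: struct toy_thread and both functions are made up for
illustration, and the rule that resuming an exited LWP fails is an
assumption standing in for whatever ttrace actually reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread-table entry; this is not GDB's
   struct thread_info.  */
struct toy_thread
{
  int lwp;
  int dying;    /* Set once the exit event for this LWP has been seen.  */
  int present;  /* Nonzero while the entry is still in the table.  */
};

/* Try to resume every entry still in the table and return the number
   of resumes that fail.  ASSUMPTION: resuming an LWP that has already
   exited is an error.  */
static int
toy_resume_all (const struct toy_thread *table, size_t n)
{
  size_t i;
  int failures = 0;

  assert (table != NULL);
  for (i = 0; i < n; i++)
    if (table[i].present && table[i].dying)
      failures++;
  return failures;
}

/* Remove every entry marked dying; return the number removed.  */
static int
toy_delete_dying (struct toy_thread *table, size_t n)
{
  size_t i;
  int removed = 0;

  assert (table != NULL);
  for (i = 0; i < n; i++)
    if (table[i].present && table[i].dying)
      {
        table[i].present = 0;
        removed++;
      }
  return removed;
}
```

With three entries of which one is dying, toy_resume_all reports one
failure when it runs first, and none when toy_delete_dying runs before
it. Whether the real failure follows this pattern is exactly what still
needs checking.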
> What happens when you delete the dying threads before resuming?
>
> iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
> iterate_over_threads (inf_ttrace_resume_callback, NULL);
> iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
>
> Hmmm, I assume not: if my sources match yours, your program is stopped
> at a syscall event:
>
> /* Be careful not to try to gather much state about a thread
> that's in a syscall. It's frequently a losing proposition. */
> case TARGET_WAITKIND_SYSCALL_ENTRY:
> if (debug_infrun)
> fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_SYSCALL_ENTRY\n");
> resume (0, TARGET_SIGNAL_0);
> prepare_to_wait (ecs);
> return;
>
> So, there should have already been a resume in between.
>
> Could you check which thread got the syscall event? Is it the same
> thread we fail to resume? Is it possible to disable syscall events,
> just for checking if it is related?
I don't know how to disable syscall events.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)