This is the mail archive of the gdb-prs@sourceware.org mailing list for the GDB project.
[Bug gdb/12702] New: gdb can hang waiting for thread group leader
- From: "dje at google dot com" <sourceware-bugzilla at sourceware dot org>
- To: gdb-prs at sourceware dot org
- Date: Tue, 26 Apr 2011 00:39:33 +0000
- Subject: [Bug gdb/12702] New: gdb can hang waiting for thread group leader
- Auto-submitted: auto-generated
http://sourceware.org/bugzilla/show_bug.cgi?id=12702
Summary: gdb can hang waiting for thread group leader
Product: gdb
Version: HEAD
Status: NEW
Severity: normal
Priority: P2
Component: gdb
AssignedTo: unassigned@sourceware.org
ReportedBy: dje@google.com
Created attachment 5685
--> http://sourceware.org/bugzilla/attachment.cgi?id=5685
Patch to work around the issue.
When ptracing, waitpid with options == 0 will hang if the thread group leader
exits while there are still other threads around.
[Blech! IWBN if it could at least return an error instead of hanging.]
When gdb detects that a thread has stopped, it stops all other threads before
returning control to the user (in "all-stop" mode).
A race occurs when the main thread exits between gdb detecting that a second
thread has stopped and gdb waiting for the main thread to stop.
If the main thread has exited while other threads still exist, then
waitpid (main_thread_pid, &status, 0) hangs.
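One way to avoid blocking forever is to never pass options == 0: poll with WNOHANG and, while nothing is reported, check /proc/PID/status for a zombie. This is a minimal sketch under stated assumptions, not the attached patch: it uses an ordinary child process, omits the __WALL/__WCLONE flags a debugger would typically need for LWPs, and polls with usleep where a real implementation would rather sleep in sigsuspend until SIGCHLD. proc_state and wait_or_zombie are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter "State:" code of PID from
   /proc/PID/status ('R', 'S', 'T', 'Z', ...), or '?' if unreadable.  */
static char
proc_state (pid_t pid)
{
  char path[64];
  char line[256];
  char state = '?';
  FILE *f;

  snprintf (path, sizeof path, "/proc/%ld/status", (long) pid);
  f = fopen (path, "r");
  if (f == NULL)
    return '?';
  while (fgets (line, sizeof line, f) != NULL)
    if (sscanf (line, "State: %c", &state) == 1)
      break;
  fclose (f);
  return state;
}

/* Hypothetical replacement for waitpid (PID, STATUS, 0) that never
   blocks inside waitpid.  Returns PID once waitpid reports a status,
   0 if PID is a zombie that waitpid still does not report (the
   exited-leader case), or -1 on error.  */
static pid_t
wait_or_zombie (pid_t pid, int *status)
{
  for (;;)
    {
      pid_t ret = waitpid (pid, status, WNOHANG);

      if (ret != 0)
        return ret;             /* Reported, or -1 with errno set.  */
      if (proc_state (pid) == 'Z')
        {
          /* An ordinary child that has just become a zombie is
             reportable now, so try once more before giving up.  */
          return waitpid (pid, status, WNOHANG);
        }
      usleep (1000);            /* Simplification: poll every 1 ms.  */
    }
}
```

For an ordinary child, wait_or_zombie behaves like a blocking waitpid. A caller that gets 0 back instead knows the PID is a zombie whose status is being held back, and can go reap the remaining LWPs rather than hang.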
This patch to gdb and the accompanying testcase illustrate the issue.
Apply to CVS head as of 11apr25.
diff -u -p -r1.199 linux-nat.c
--- linux-nat.c 9 Mar 2011 12:48:55 -0000 1.199
+++ linux-nat.c 25 Apr 2011 23:48:13 -0000
@@ -4047,6 +4108,22 @@ linux_thread_alive (ptid_t ptid)
                         target_pid_to_str (ptid),
                         err ? safe_strerror (tmp_errno) : "OK");
 
+  if (debug_linux_nat && GET_PID (ptid) != GET_LWP (ptid))
+    {
+      char buf[200];
+      sprintf (buf, "cat /proc/%ld/task/%ld/status",
+               (long) GET_PID (ptid), (long) GET_LWP (ptid));
+      system (buf);
+      sleep (3);
+      err = kill_lwp (GET_LWP (ptid), 0);
+      tmp_errno = errno;
+      fprintf_unfiltered (gdb_stdlog,
+                          "LLTA: KILL(SIG0) %s (%s)\n",
+                          target_pid_to_str (ptid),
+                          err ? safe_strerror (tmp_errno) : "OK");
+      system (buf);
+    }
+
   if (err != 0)
     return 0;
 
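kill_lwp is GDB's internal helper. As a stand-alone sketch of the same signal-0 liveness probe, assuming a Linux target where tgkill(2) is reached through syscall(2), errno is copied immediately after the call, because later calls such as target_pid_to_str or fprintf may overwrite it (presumably why linux_thread_alive keeps tmp_errno). probe_lwp and current_tid are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for kill_lwp (LWP, 0): probe thread TID of
   thread group TGID with signal 0 via tgkill(2).  Returns 0 if the
   kernel still has the thread, else -1, storing errno in *SAVED_ERRNO
   before anything else can overwrite it.  */
static int
probe_lwp (pid_t tgid, pid_t tid, int *saved_errno)
{
  long ret = syscall (SYS_tgkill, tgid, tid, 0);

  *saved_errno = (ret == -1) ? errno : 0;
  return ret == -1 ? -1 : 0;
}

/* Hypothetical helper: the calling thread's kernel thread id (LWP).  */
static pid_t
current_tid (void)
{
  return (pid_t) syscall (SYS_gettid);
}
```

probe_lwp (getpid (), current_tid (), &e) returns 0 for the calling thread, while passing a tid of 0 fails with e set to EINVAL.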
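bash$ cat testcase.c
#include <stdio.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <stdarg.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the kernel thread id (LWP) of the calling thread.  */
int
gettid (void)
{
  return syscall (__NR_gettid);
}

/* printf, then flush so the output is not lost if the process dies.  */
void
printf_flushed (const char *msg, ...)
{
  va_list args;

  va_start (args, msg);
  vprintf (msg, args);
  va_end (args);
  fflush (stdout);
}

void *
thread_function (void *dummy_ptr)
{
  printf_flushed ("Thread self 0x%lx, pid %d, lwp %d\n",
                  (unsigned long) pthread_self (), (int) getpid (), gettid ());
  asm volatile ("int3");  /* x86 breakpoint trap: gdb reports SIGTRAP here.  */
  pthread_exit ((void *) 0);
  abort ();  /* Not reached.  */
}

int
main (void)
{
  pthread_t thread_id;

  pthread_create (&thread_id, NULL, thread_function, NULL);
  sleep (1);  /* Let the thread hit the int3, then exit the process.  */
  return 0;
}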
bash$ gcc -g testcase.c -o testcase.x64 -lpthread
bash$ ./gdb --batch -nx -ex "set debug infrun 1" -ex "set debug lin-lwp 1" -ex run -ex quit ./testcase.x64
[...]
LLW: waitpid 15050 received Trace/breakpoint trap (stopped)
LLTA: KILL(SIG0) Thread 0x7ffff783c700 (LWP 15050) (OK)
Name: testcase.x64
State: T (tracing stop)
Tgid: 15047
Pid: 15050
PPid: 15045
TracerPid: 15045
[...]
State: Z (zombie)
Tgid: 15047
Pid: 15050
PPid: 15045
TracerPid: 15045
[...]
LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff783c700 (LWP 15050).
SC: kill Thread 0x7ffff7fd7700 (LWP 15047) **<SIGSTOP>**
SC: lwp kill 0 ERRNO-OK
[gdb hangs here]
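The /proc dumps in the log show LWP 15050 turning into a zombie (State: Z). A signal-0 probe such as the kill_lwp (..., 0) call in linux_thread_alive does not distinguish that state: signal 0 keeps succeeding for a zombie until it is reaped. A minimal sketch with an ordinary child process rather than a traced thread; probe_zombie_signal0 is a hypothetical name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits immediately, then report
   whether kill (child, 0) succeeds (a) while the child is an unreaped
   zombie and (b) after it has been reaped.  Returns 0 on success, -1 if
   a setup call failed.  */
static int
probe_zombie_signal0 (int *ok_as_zombie, int *ok_after_reap)
{
  siginfo_t info;
  pid_t child = fork ();

  if (child == -1)
    return -1;
  if (child == 0)
    _exit (0);

  /* WNOWAIT: wait until the child has exited, but leave it a zombie.  */
  memset (&info, 0, sizeof info);
  if (waitid (P_PID, child, &info, WEXITED | WNOWAIT) == -1)
    return -1;
  *ok_as_zombie = (kill (child, 0) == 0);

  /* Reap the zombie for real; now the pid no longer exists.  */
  if (waitpid (child, NULL, 0) != child)
    return -1;
  *ok_after_reap = (kill (child, 0) == 0);
  return 0;
}
```

The first probe succeeds and the second fails with ESRCH, so a liveness check built on signal 0 alone cannot tell that a thread has already exited but not yet been reaped.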
The attached patch works around the bug^wissue.
I suspect all calls to waitpid with options == 0 need to be audited, and each
one either fixed or documented as unable to trip over this issue.